Yang Song

 

 


310 IST Building
Department of Computer Science & Engineering
The Pennsylvania State University
University Park, PA 16802, USA

yasong@cse.psu.edu
http://www.cse.psu.edu/~yasong
Office: (814) 865-6168
Cell: (814) 777-1087


RESEARCH INTERESTS

  • Machine Learning, Data Mining, Text Classification and Clustering, Search Engine, Information Retrieval,
    Personalized Classification, Entity Extraction.

 

EDUCATION

2004.8 - Present:

Ph.D. candidate, Department of Computer Science & Engineering, The Pennsylvania State University, University Park
GPA: 3.90 / 4.0

2000.9 - 2004.7:

B.S., College of Computer Science, Zhejiang University, China
GPA (major): 3.92 / 4.0

 

RESEARCH EXPERIENCE

2008.5 - 2008.8

Summer Research Intern (Mentor: Anca Sailer, Manager: Hidayatullah Shaikh)
IBM Research, T.J. Watson Research Center, New York

  • Hierarchical online text classification for real-time problem classification in large-scale distributed technical support services. Compared with traditional batch learning algorithms, this online learning framework significantly reduces the computational cost of training by learning from new instances incrementally. At the same time, it reduces the memory required to store training instances.

2007.5 - 2007.8

Summer Research Intern, (Supervisor: Aleksander Kolcz)
Microsoft Research, Redmond
Machine Learning and Applied Statistics Group & Live Labs

  • Large-scale text classification with application to email spam filtering. Proposed a novel two-level cascaded Naive Bayes model, which was further augmented with collaborative filtering to improve performance. Results on Hotmail data show substantial improvement over previous methods.

2005.5 - 2005.8

Summer Research Intern, (Supervisors: Dr. Eric Glover, Dr. Tomasz Imielinski)
Research Group: Ask, Piscataway, NJ

  • Co-designed and improved real-time online entity extraction algorithms. Designed and implemented three demo systems that may eventually lead to products on the live site. The algorithms showed better running time, as well as better precision and recall, than some widely used extraction algorithms. The demo systems processed both unstructured and semi-structured data, including entity extraction from generic HTML pages.

2004.9 - Present:

Research Assistant, (Supervisor: Dr. Lee Giles)
Laboratory of the Intelligent Information Systems, CSE Dept, Penn State Univ.

  • Personalized services for the Next Generation CiteSeer, including personalized search, automatic taxonomy generation, topic-based personalized document classification, and a submission system.
  • Name disambiguation and entity resolution for metadata in digital libraries, using statistical machine learning methods.
  • Efficient document classification in large-scale digital libraries, introducing a novel dimension reduction technique based on entity extraction and collaborative filtering.
  • Co-designed and implemented a novel multi-class boosting algorithm.
  • Distributed event management for the Next Generation CiteSeer: designed new two-phase commit algorithms for distributed user events, including post-validation, propagation-validation, and failure recovery.

2001.5 - 2003.7:

Research Assistant, (Supervisor: Dr. Bo Zhou)
State Street Technology Center, Zhejiang University, China

  • Oscar database platform, developed in cooperation with SSgA, a leading US investment company. I was in charge of designing the query optimizer.

 

PAPERS & PUBLICATIONS

Conference Papers

  • Yang Song, Anca Sailer, Hidayatullah Shaikh, "Problem Classification Method to Enhance the ITIL Incident, Problem and
    Change Management Process", in Proceedings of the 11th IFIP/IEEE International Symposium on Integrated Network
    Management (IM 2009), June 2009.
  • Yang Song, Lu Zhang, C. Lee Giles, "A Non-parametric Approach to Pair-wise Dynamic Topic Correlation Detection", in
    Proceedings of the IEEE International Conference on Data Mining series (ICDM 2008), Pisa, Italy, December 2008.
  • Yang Song, Lu Zhang, C. Lee Giles, "Sparse Gaussian Processes Classification for Fast Tag Recommendation", in
    Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM 2008), Napa Valley, California,
    USA, October 2008.
  • Yang Song, C. Lee Giles, "Efficient User Preference Predictions Using Collaborative Filtering", to appear at the 19th
    International Conference on Pattern Recognition (ICPR 2008), Tampa, Florida, USA.
  • Yang Song, Ziming Zhuang, Huajing Li, Jia Li, Wang-chien Lee, C. Lee Giles, "Real-time Automatic Tag Recommendation", 31st
    Annual International ACM SIGIR Conference (SIGIR 2008), Singapore.
  • Yang Song, Jian Huang, Ding Zhou, Hongyuan Zha, C. Lee Giles, "IKNN: Informative K-Nearest Neighbor Classification",
    ECML/PKDD 2007, Warsaw, Poland.
  • Farooq, U., Kannampallil, T., Song, Y., Ganoe, C.H., Carroll, J.M., and Giles, C.L. (2007). "Evaluating tagging behavior
    in social bookmarking systems: metrics and design heuristics." ACM Proceedings of the International GROUP Conference
    on Supporting Group Work (Sanibel Island, Florida, November 4-7, 2007), In Press. New York, NY: ACM Press.
  • Yang Song, Jian Huang, Isaac G. Councill, Jia Li, C. Lee Giles, "Efficient Topic-based Unsupervised Name
    Disambiguation", Proceedings of the 2007 ACM/IEEE Joint Conference on Digital Libraries (JCDL 2007), 2007.
  • Y. Song, J. Huang, I. G. Councill, J. Li, C. L. Giles, "Generative Models for Name Disambiguation", accepted
    by the 16th International World Wide Web Conference (WWW 2007) (Poster).
  • Yang Song, Ding Zhou, Jian Huang, Isaac Councill, Hongyuan Zha, C. Lee Giles, "Boosting the Feature Space:
    Text Categorization for Unstructured Data on the Web", in Proceedings of the Sixth IEEE International Conference
    on Data Mining (ICDM 2006).
  • J. Carroll, U. Farooq, C. Ganoe, Y. Song, I. Councill, and C. L. Giles, "Collaborative search and sensemaking in CiteSeer",
    Human Computer Interaction Consortium (HCIC 2007), accepted.
  • Jian Huang, Seyda Ertekin, Yang Song, Hongyuan Zha, C. Lee Giles, "Multi-class Boosting Procedures with K-class
    Weak Classifiers", accepted by the 2007 SIAM Conference on Data Mining (SDM07).
  • Li, H., Councill, I.G., Bolelli, L., Zhou, D., Song, Y., Lee, W., Sivasubramaniam, A., & Giles, C.L. (2006).
    "CiteSeerX - A scalable autonomous scientific digital library". In Proceedings of the First International Conference
    on Scalable Information Systems (INFOSCALE 06), Hong Kong, China.
  • Ding Zhou, Yang Song, Ya Zhang, Hongyuan Zha, "Towards Discovering Organizational Structure from Email Corpus",
    in Proceedings of the 4th IEEE International Conference on Machine Learning and Applications (ICMLA 2005),
    Los Angeles, CA, USA, 2005.

Journal Articles

  • Yang Song, C. Lee Giles. "Text Classification by Augmenting Feature Space", submitted for review.
  • Umer Farooq, Yang Song, John M. Carroll and C. Lee Giles. "Social Bookmarking for Scholarly Digital Libraries".
    IEEE Internet Computing, 29-35, Dec 2007.
  • Yang Song, Alek Kolcz, C. Lee Giles, "Better Naive Bayes Classification for High-Precision Spam Detection",
    in review.
  • Yang Song, Lu Zhang, C. Lee Giles, "Automatic Tag Recommendation Algorithms for Social Recommender
    Systems", in review.

HONORS & AWARDS

2004 - 2005:

College of Engineering Fellowship (The Graduate School Endowed Fellowship)
The Pennsylvania State University, University Park

2004 - 2005:

Graduate Teaching Assistantship
The Pennsylvania State University, University Park

1999 - 2003:

Various Awards from Zhejiang University

2001.9:

Excellent English Study Award (awarded to students who scored above 90 in the CET 4)

1999.4:

Directly admitted to ZJU, with the National College Entrance Examination waived

1995.11:

Third Prize, National Olympics of Mathematics (NOM) '95

STANDARD TESTS

TOEFL

650, TWE 5.0

GRE

2250 (verbal 650 (92%), analytical 800 (99%), quantitative 800 (99%))

AEOCPT (American English Oral Communicative Proficiency Test)

290/300 (equivalent to TSE 60/60; held by the ITA program at Penn State Univ.)

 

 

COMPUTER SKILLS

  • Platforms: DOS, Windows 3.x/95/98/NT/XP, Unix/Linux, MFC, Win32
  • Languages: C, C++, Java, JavaScript, HTML, Pascal, Perl, x86 Assembly language, Visual Basic, SML, Prolog
  • Proficient in Web application development with Java Servlets and JavaServer Pages (JSP)
  • Database application development, SQL, experience with Microsoft SQL Server, IBM DB2, Oracle and MySQL
  • Experience in hardware design and debugging

OTHER ACTIVITIES

  • Served as a reviewer for SIGMOD 2005, SIGIR 2005, JCDL 2005, IJCAI 2005, WWW 2005, CIKM 2005, WWW 2006, SIGIR 2006,
    JCDL 2006, CIKM 2006, WWW 2007, JCDL 2007, SIGIR 2007, JCDL 2008, SIGIR 2008.