Yang Song

 

 


Search Quality Group
Microsoft Research
Redmond, WA 98034
USA

yangsong AT microsoft DOT com
http://research.microsoft.com/people/yangsong
Office: (425) 704-9449
Cell: (814) 777-1087


RESEARCH INTERESTS

  • Machine Learning, Data Mining, Text Classification and Clustering, Search Engines, Information Retrieval,
    Personalized Classification, Entity Extraction.

 

EDUCATION

2004.8 - 2008.12:

Ph.D., Department of Computer Science & Engineering, The Pennsylvania State University, University Park
GPA: 3.90 / 4.0

1999.9 - 2003.7:

B.S., College of Computer Science, Zhejiang University, China
GPA (major): 3.92 / 4.0

 

PROFESSIONAL EXPERIENCE

2009.1 - Present

Research Developer, Microsoft Research, Redmond
Internet Services Research Center (ISRC)
Microsoft Corporation
One Microsoft Way
Redmond, WA 98052, USA

  • Responsible for basic and applied research on challenging optimization problems in search engines. Collaborate with researchers and engineers to improve the relevance of search results.

2008.5 - 2008.8

Summer Research Intern (Mentor: Anca Sailer, Manager: Hidayatullah Shaikh)
IBM Research, T.J. Watson Research Center, New York

  • Hierarchical online text classification for real-time problem classification in large-scale distributed technical support services. Compared to traditional batch learning algorithms, this online learning framework can significantly reduce the computational cost of training by learning from new instances incrementally. At the same time, it reduces the amount of memory required to store training instances.

2007.5 - 2007.8

Summer Research Intern (Supervisor: Aleksander Kolcz)
Microsoft Research, Redmond, Live Labs

  • Large-scale text classification with application to email spam filtering. Proposed a novel two-level cascaded Naive Bayes model, further augmented with collaborative filtering to improve performance. Results on Hotmail data show substantial improvement over previous methods.

2005.5 - 2005.8

Summer Research Intern (Supervisors: Dr. Eric Glover, Dr. Tomasz Imielinski)
Research Group: Ask, Piscataway, NJ

  • Co-designed and improved real-time online entity extraction algorithms; designed and implemented three demo systems that may eventually lead to products on the live site. The algorithms achieved better running time, precision, and recall than several widely used extraction algorithms. The demo systems processed both unstructured and semi-structured data, including entity extraction from generic HTML pages.

2004.9 – 2008.12:

Research Assistant (Supervisor: Dr. Lee Giles)
Laboratory of the Intelligent Information Systems, CSE Dept, Penn State Univ.

  • Personalized services for the next-generation CiteSeer, including personalized search, automatic taxonomy generation, topic-based personalized document classification, and a submission system.
  • Name disambiguation and entity resolution for metadata in digital libraries, using statistical machine learning methods.
  • Performed efficient document classification in large-scale digital libraries by introducing a novel dimension reduction technique based on entity extraction and collaborative filtering.
  • Co-designed and implemented a novel multi-class boosting algorithm.
  • Distributed event management for the next-generation CiteSeer. Designed new two-phase commit algorithms for distributed user events, including post-validation, propagation-validation, and failure recovery.

2001.5 - 2003.7:

Research Assistant (Supervisor: Dr. Bo Zhou)
State Street Technology Center, Zhejiang University, China

  • Oscar database platform, developed in cooperation with SSgA, a top US investment company. In charge of designing the query optimizer.

PAPERS & PUBLICATIONS

Journal Articles

  • Yang Song, Alek Kolcz, C. Lee Giles, "Better Naive Bayes Classification for High-Precision Spam Detection",
    accepted; to appear in Software: Practice and Experience (SPE).
  • Yang Song, C. Lee Giles. "Text Classification by Augmenting Feature Space", submitted for review.
  • Umer Farooq, Yang Song, John M. Carroll and C. Lee Giles. "Social Bookmarking for Scholarly Digital Libraries".
    IEEE Internet Computing, 29-35, Dec 2007.
  • Yang Song, Lu Zhang, C. Lee Giles, "Automatic Tag Recommendation Algorithms for Social Recommender Systems",
    in review.

Conference Papers

  • Yang Song, Anca Sailer, Hidayatullah Shaikh, "Problem Classification Method to Enhance the ITIL Incident, Problem and
    Change Management Process", in Proceedings of the 11th IFIP/IEEE International Symposium on Integrated Network Management
    (IM 2009), June 2009.
  • Yang Song, Lu Zhang, C. Lee Giles, "A Non-parametric Approach to Pair-wise Dynamic Topic Correlation Detection", in
    Proceedings of the IEEE International Conference on Data Mining (ICDM 2008), Pisa, Italy, December 2008.
  • Yang Song, Lu Zhang, C. Lee Giles, "Sparse Gaussian Processes Classification for Fast Tag Recommendation", in
    Proceedings of ACM 17th Conference on Information and Knowledge Management (CIKM 2008), Napa Valley, California, USA,
    October 2008.
  • Yang Song, C. Lee Giles, "Efficient User Preference Predictions Using Collaborative Filtering", to appear in the 19th
    International Conference on Pattern Recognition (ICPR 2008), Tampa, Florida, USA.
  • Yang Song, Ziming Zhuang, Huajing Li, Jia Li, Wang-chien Lee, C. Lee Giles, "Real-time Automatic Tag Recommendation", 31st
    Annual International ACM SIGIR Conference (SIGIR 2008), Singapore.
  • Yang Song, Jian Huang, Ding Zhou, Hongyuan Zha, C. Lee Giles, "IKNN: Informative K-Nearest Neighbor Classification",
    ECML/PKDD 2007, Warsaw, Poland.
  • Farooq, U., Kannampallil, T., Song, Y., Ganoe, C.H., Carroll, J.M., and Giles, C.L. (2007). "Evaluating tagging behavior
    in social bookmarking systems: metrics and design heuristics." ACM Proceedings of the International GROUP Conference
    on Supporting Group Work (Sanibel Island, Florida, November 4-7, 2007), In Press. New York, NY: ACM Press.
  • Yang Song, Jian Huang, Isaac G. Councill, Jia Li, C. Lee Giles, "Efficient Topic-based Unsupervised Name
    Disambiguation", Proceedings of the 2007 ACM/IEEE Joint Conference on Digital Libraries (JCDL 2007), 2007.
  • Y. Song, J. Huang, I. G. Councill, J. Li, C. L. Giles, "Generative Models for Name Disambiguation". Accepted
    by the 16th International World Wide Web Conference (WWW 2007) (Poster).
  • Yang Song, Ding Zhou, Jian Huang, Isaac Councill, Hongyuan Zha, C. Lee Giles, "Boosting the Feature Space:
    Text Categorization for Unstructured Data on the Web", In Proceedings of the Sixth IEEE International Conference
    on Data Mining (ICDM 2006).
  • J. Carroll, U. Farooq, C. Ganoe, Y. Song, I. Councill, and C. L. Giles. "Collaborative search and sensemaking in CiteSeer",
    Human-Computer Interaction Consortium (HCIC 2007), accepted.
  • Jian Huang, Seyda Ertekin, Yang Song, Hongyuan Zha, C. Lee Giles, "Multi-class Boosting Procedures with K-class
    Weak Classifiers", accepted by the 2007 SIAM Conference on Data Mining (SDM07).
  • Li, H., Councill, I.G., Bolelli, L., Zhou, D., Song, Y., Lee, W., Sivasubramaniam, A., & Giles, C.L. (2006).
    "CiteSeerX - A scalable autonomous scientific digital library". In Proceedings of the First International Conference
    on Scalable Information Systems (INFOSCALE 06), Hong Kong, China.
  • Ding Zhou, Yang Song, Ya Zhang, Hongyuan Zha, "Towards Discovering Organizational Structure from Email Corpus",
    In Proceedings of the 4th IEEE International Conference on Machine Learning and Applications, Los Angeles, CA, U.S.A.,
    2005 (ICMLA 2005).

US Patents

  • Problem Classification Method to Enhance the Incident Management Process for the IT Services, filed by IBM.
  • Business Partner Selection Algorithms for Remote Managed Services, filed by IBM.

HONORS & AWARDS

2008.11

NSF Student Travel Award

2008.9

CIKM Student Travel Award

2008.5

SIGIR Student Travel Award

2004 - 2005:

College of Engineering Fellowship (The Graduate School Endowed Fellowship)
The Pennsylvania State University, University Park

2004 - 2005:

Graduate Teaching Assistantship
The Pennsylvania State University, University Park

1999 - 2003:

Various Awards from Zhejiang University

2001.9:

Excellent English Study Award (awarded to students who scored above 90 on the CET-4)

1999.4:

Directly admitted to ZJU, exempt from the National College Entrance Examination

1995.11:

Third Prize, National Olympiad of Mathematics (NOM) '95

MEMBERSHIPS

Sigma Xi

Full Member

IEEE

Full Member

ACM

Student Member

OTHER ACTIVITIES

  • Served as a reviewer for SIGMOD 2005, SIGIR 2005, JCDL 2005, IJCAI 2005, WWW 2005, CIKM 2005, WWW 2006, SIGIR 2006, JCDL 2006,
    CIKM 2006, WWW 2007, JCDL 2007, SIGIR 2007, JCDL 2008, SIGIR 2008.

COMPUTER SKILLS

  • Platforms: DOS, Windows 3.x/95/98/NT/XP, Unix/Linux, MFC, Win32
  • Languages: C, C++, Java, JavaScript, HTML, Pascal, Perl, x86 Assembly language, Visual Basic, SML, Prolog
  • Proficient in Web application development with Java Servlets and JavaServer Pages (JSP)
  • Database application development, SQL, experience with Microsoft SQL Server, IBM DB2, Oracle and MySQL
  • Experience in hardware design and debugging