Xiaonan Lu

Ph.D. Candidate
Department of Computer Science and Engineering
The Pennsylvania State University, University Park
Office Address: 312 IST Building
University Park, PA 16802
Office Phone: 1-814-863-2556
E-Mail: xlu [at] cse.psu.edu


I am a Ph.D. candidate in the Department of Computer Science and Engineering. My co-advisers are Dr. James Z. Wang and Dr. C. Lee Giles. I am a member of the Intelligent Information Systems Research Laboratory.

My research interests include information retrieval, computer vision, data mining, machine learning, medical image analysis, document image analysis and recognition, and digital libraries.




Research

  • Intelligent Parsing of Scanned Volumes for Web Based Archives
    The increasing use of web-based digital libraries and the large number of existing documents raise important issues in handling documents efficiently and retrieving the information they contain effectively. In this work, we present our system for intelligent parsing of scanned volumes. By automatically analyzing the logical structure of scanned volumes and extracting metadata, the system aims to enable high-level understanding and intelligent retrieval of their semantic content. [See ICSC2007 paper for details]

  • Deriving Knowledge from Figures for Digital Libraries
    Figures in digital documents contain important information. Current digital libraries do not summarize and index the information available within figures for document retrieval. We design a system for deriving knowledge from figures, including the semantic type of each figure and the quantitative data embedded within it. This knowledge can be integrated with the textual information in documents to provide more effective document retrieval services for digital library users. [See WWW2007 paper for details]

  • Automated Data Extraction from Information Graphics in Scientific Documents
    Information graphics in digital documents contain important information. Often, the results of scientific experiments and the performance of businesses are summarized using information graphics. Although information graphics are easily understood by human users, current search engines rarely use the information contained in plots to enhance the results returned in response to end-user queries. We propose an automated algorithm for extracting information from line curves in 2-D plots, a popular type of information graphic in scientific documents. The extracted information can be stored in a database and indexed to answer end-user queries and enhance search results. [See ICDAR2007 paper for details]

  • Automatic Categorization of Figures in Scientific Documents
    Figures are an important source of non-textual information in scientific documents. Current digital libraries do not provide users with tools to retrieve documents based on the information available within figures. We propose an architecture for retrieving documents by integrating figures with other information. The initial step in enabling integrated document search is to categorize figures into a set of pre-defined types. We propose several categories of figures based on their function in scholarly articles, and we have developed a machine-learning-based approach for automatic categorization of figures using features extracted from their content. [See JCDL2006 paper for details]

  • Learning Representative Objects from Images Using Quadratic Optimization
    With the development of Content-Based Image Retrieval (CBIR) and ever-increasing computing power, interest in automatic learning from images is growing. We introduce a learning technique based on quadratic optimization that enables computers to learn the visual characteristics of a semantic concept from unlabeled images. In our work, images are represented by regions extracted through segmentation. Given a group of images conveying a semantic concept, we attempt to detect the region corresponding to the concept in every image using quadratic optimization. To characterize the visual properties of the concept, we take the feature vector describing the concept-associated region of each image and compute the mean of these vectors, which we refer to as the representative feature vector. The proposed learning technique can be applied to semantics-sensitive image retrieval and object recognition. [See ACIDCA2005 paper for details]


Publications

    Journal Publications

  • Xiaonan Lu, James Z. Wang, Prasenjit Mitra, and C. Lee Giles, ``Automated Analysis of Images in Documents for Intelligent Search'' (second-round review).

    Peer-Reviewed Conference Publications

  • Xiaonan Lu, Brewster Kahle, James Z. Wang and C. Lee Giles, ``A Metadata Generation System for Scanned Scientific Volumes'', Proceedings of the ACM and IEEE Joint Conference on Digital Libraries, pp. ???-???, Pittsburgh, PA, ACM, June 2008.

  • Xiaonan Lu, James Z. Wang and C. Lee Giles, ``Intelligent Parsing of Scanned Volumes for Web based Archives'', Proceedings of the International Conference on Semantic Computing (ICSC2007), pp. 559-566, Irvine, September 2007.

  • Xiaonan Lu, James Z. Wang, Prasenjit Mitra and C. Lee Giles, ``Automatic Extraction of Data from 2-D Plots in Documents'', Proceedings of the International Conference on Document Analysis and Recognition (ICDAR2007), pp. 188-192, Parana, Brazil, September 2007.

  • Xiaonan Lu, James Z. Wang, Prasenjit Mitra and C. Lee Giles, ``Deriving Knowledge from Figures for Digital Libraries,'' Proceedings of the International World Wide Web Conference (WWW2007), pp. 1229-1230, Banff, Alberta, Canada, May 2007.

  • Xiaonan Lu, Prasenjit Mitra, James Z. Wang and C. Lee Giles, ``Automatic Categorization of Figures in Scientific Documents,'' Proceedings of the ACM and IEEE Joint Conference on Digital Libraries (JCDL2006), pp. 129-138, Chapel Hill, NC, ACM, June 2006.

  • Xiaonan Lu, Jia Li and James Z. Wang, ``Learning Representative Objects from Images Using Quadratic Optimization,'' Proceedings of the Second International Conference on Machine Intelligence (ACIDCA2005), co-located with the UN World Summit on the Information Society, invited for a special session, pp. 730-737, Tozeur, Tunisia, ACIDCA, November 2005.


Teaching

    Instructor (recitation classes)
  • CMPSC201C: Computer Programming for Engineers using C++

    Teaching Assistant
  • CSE514: Computer Network
  • CMPSC360: Discrete Mathematics


Professional Activities

    Presentations
  • "Intelligent Parsing of Scanned Volumes for Web based Archives", the International Conference on Semantic Computing (ICSC2007), Irvine, September 2007.
  • "Automatic Categorization of Figures in Scientific Documents", the ACM and IEEE Joint Conference on Digital Libraries (JCDL2006), Chapel Hill, NC, June 2006.
  • "A region-based method for image annotation", 202 IST Building, February 2, 2005.
  • "An Introduction of Linear Programming", 205 IST Building, September 8, 2004.

    Reviewer for scientific publications
  • WWW, SIGIR, JCDL, VISUAL, ACM Multimedia, ICSC, CIKM, IEEE Transactions on Pattern Analysis and Machine Intelligence

    Membership
  • IEEE



Page maintained by: xlu@cse.psu.edu