I am a PhD candidate in the department of Computer Science and Engineering. My co-advisers are Dr. James Z. Wang and
Dr. C. Lee Giles. I am a member of the Intelligent Information Systems Research Laboratory.
My research interests include information retrieval, computer vision, data mining, machine learning, medical image analysis, document image analysis and recognition, and digital libraries.
Research
Intelligent Parsing of Scanned Volumes for Web Based Archives
The increasing use of web-based digital libraries and the large number of existing documents
raise important issues in the efficient handling of documents and the effective retrieval of the
information they contain. In this work, we present our
system for intelligent parsing of scanned volumes. By automatically analyzing the logical structure
of scanned volumes and automatically extracting metadata, the system
aims to enable high-level understanding and intelligent retrieval of
the semantic content of scanned volumes. [See ICSC2007 paper for details]
Deriving Knowledge from Figures for Digital Libraries
Figures in digital documents contain important information. Current
digital libraries do not summarize and index information available
within figures for document retrieval. We design a system for deriving
knowledge from figures, including the semantic type of each figure and the quantitative
data embedded within it. This knowledge can be integrated with the textual information in
documents to provide more effective document retrieval services for digital library users. [See WWW2007 paper]
Automated Data Extraction from Information Graphics in Scientific Documents
Information graphics in digital documents contain important information. Often, the results of
scientific experiments and performance of businesses are summarized using information graphics.
Although information graphics are easily understood by human users, current search engines
rarely utilize the information contained in the plots to enhance the results returned in response
to queries posed by end-users. We propose an automated algorithm for extracting information from
line curves in 2-D plots, a popular type of information graphics in scientific documents. The
extracted information can be stored in a database and indexed to answer end-user queries and enhance
search results. [See ICDAR2007 paper for details]
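As a rough illustration of the idea of turning a plotted line into data, the sketch below maps curve pixels in a plot area to data coordinates. It is not the algorithm from the ICDAR2007 paper. The function name `extract_curve_points`, the text-raster input format, and the assumptions that the plot area is already cropped, holds a single curve, and has known axis ranges are all hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple


def extract_curve_points(
    raster: Sequence[str],
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
    ink: str = "#",
) -> List[Tuple[float, float]]:
    """Map curve pixels in a plot-area raster to (x, y) data coordinates.

    Hypothetical input format: `raster` is a list of equal-length strings,
    one per pixel row (top row first), where `ink` marks a curve pixel.
    """
    height = len(raster)
    width = len(raster[0]) if height else 0
    x_min, x_max = x_range
    y_min, y_max = y_range

    points: List[Tuple[float, float]] = []
    for col in range(width):
        rows = [r for r in range(height) if raster[r][col] == ink]
        if not rows:
            continue  # no curve pixel in this column
        # Collapse a thick stroke to its vertical centre.
        mean_row = sum(rows) / len(rows)
        # Fractional position inside the plot area (0.0 to 1.0).
        fx = col / (width - 1) if width > 1 else 0.0
        fy = mean_row / (height - 1) if height > 1 else 0.0
        # Linear pixel-to-data mapping; row 0 is the top, so y is flipped.
        x = x_min + (x_max - x_min) * fx
        y = y_max - (y_max - y_min) * fy
        points.append((x, y))
    return points


# A 3x3 plot area containing a rising diagonal line.
points = extract_curve_points(["..#", ".#.", "#.."], (0.0, 2.0), (0.0, 2.0))
```

The resulting list of (x, y) pairs is the kind of record that could be stored in a database and indexed. A real system would also need to locate the axes, read the tick labels, and separate overlapping curves, none of which this sketch attempts.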
Automatic Categorization of Figures in Scientific Documents
Figures are an important form of non-textual information in scientific documents.
Current digital libraries do not provide users with tools to retrieve documents based on the
information available within the figures. We propose an architecture for retrieving documents
by integrating figures and other information. The initial step in enabling integrated
document search is to categorize figures into a set of pre-defined types. We propose
several categories of figures based on their functionalities in scholarly articles.
We have developed a machine-learning-based approach for automatic categorization of figures using
features extracted from the content of figures. [See JCDL2006 paper for details]
Learning Representative Objects from Images Using Quadratic Optimization
With the development of Content-Based Image Retrieval (CBIR) and ever-increasing computing power,
there is growing interest in automatic learning from images.
We introduce a quadratic-optimization-based learning
technique to enable computers to learn the visual characteristics of a semantic concept
from unlabeled images. In our work, images are represented by regions extracted
from segmentation. Given a group of images conveying a semantic concept, we attempt to
detect the region corresponding to the concept in every image using quadratic optimization.
To characterize the visual properties of the concept,
the mean of the feature vectors, each describing the concept-associated region of an image,
is calculated and referred to as the representative feature vector. The proposed learning
technique can be applied to semantics-sensitive image retrieval and object recognition applications.
[See ACIDCA2005 paper for details]
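The final averaging step described above can be sketched as follows. This covers only the computation of the representative feature vector and assumes the concept-associated region of each image has already been selected. The quadratic optimization that performs that selection is not shown, and the function name and list-of-lists input format are hypothetical.

```python
from typing import List, Sequence


def representative_vector(region_features: Sequence[Sequence[float]]) -> List[float]:
    """Return the element-wise mean of the given feature vectors.

    Each inner sequence is assumed to describe the concept-associated
    region of one image; all vectors must have the same length.
    """
    if not region_features:
        raise ValueError("need at least one feature vector")
    dim = len(region_features[0])
    if any(len(v) != dim for v in region_features):
        raise ValueError("all feature vectors must have the same length")
    n = len(region_features)
    return [sum(v[i] for v in region_features) / n for i in range(dim)]


# Two images, each contributing one 2-dimensional region feature vector.
rep = representative_vector([[1.0, 2.0], [3.0, 4.0]])
```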
Publications
Journal Publications
Xiaonan Lu, James Z. Wang, Prasenjit Mitra, and C. Lee Giles,
``Automated Analysis of Images in Documents for Intelligent Search'' (second-round review).
Peer-Reviewed Conference Publications
Xiaonan Lu, Brewster Kahle, James Z. Wang and C. Lee Giles,
``A Metadata Generation System for Scanned Scientific Volumes'',
Proceedings of the ACM and IEEE Joint Conference on Digital Libraries,
pp. ???-???, Pittsburgh, PA, ACM, June 2008.
(download)
(g-scholar)
Xiaonan Lu, James Z. Wang and C. Lee Giles,
``Intelligent Parsing of Scanned Volumes for Web based Archives'',
Proceedings of the International Conference on Semantic Computing (ICSC2007),
pp. 559-566, Irvine, September 2007.
(download)
(g-scholar)
Xiaonan Lu, James Z. Wang, Prasenjit Mitra and C. Lee Giles,
``Automatic Extraction of Data from 2-D Plots in Documents'',
Proceedings of the International Conference on Document Analysis and Recognition (ICDAR2007),
pp. 188-192, Parana, Brazil, September 2007.
(download)
(g-scholar)
Xiaonan Lu, James Z. Wang, Prasenjit Mitra and C. Lee Giles,
``Deriving Knowledge from Figures for Digital Libraries,''
Proceedings of the International World Wide Web Conference (WWW2007),
pp. 1229-1230, Banff, Alberta, Canada, May 2007.
(download)
(g-scholar)
Xiaonan Lu, Prasenjit Mitra, James Z. Wang and C. Lee Giles,
``Automatic Categorization of Figures in Scientific Documents,''
Proceedings of the ACM and IEEE Joint Conference on Digital Libraries (JCDL2006),
pp. 129-138, Chapel Hill, NC, ACM, June 2006.
(download)
(g-scholar)
Xiaonan Lu, Jia Li and James Z. Wang,
``Learning Representative Objects from Images Using Quadratic Optimization,''
Proceedings of the Second International Conference on Machine Intelligence (ACIDCA2005),
co-located with the UN World Summit on the Information Society, invited for a special session, pp. 730-737, Tozeur, Tunisia, ACIDCA, November 2005.
(download)
(g-scholar)
Teaching
Instructor (recitation classes)
CMPSC201C: Computer Programming for Engineers using C++
Teaching Assistant
CSE514: Computer Networks
CMPSC360: Discrete Mathematics
Professional Activities
Presentations
"Intelligent Parsing of Scanned Volumes for Web based Archives", Proceedings of the International Conference on Semantic Computing (ICSC2007), Irvine, September 2007.
"Automatic Categorization of Figures in Scientific Documents", the ACM and IEEE Joint Conference on Digital Libraries (JCDL2006), Chapel Hill, NC, June 2006.
"A region-based method for image annotation", 202 IST Building, February 2, 2005.
"An Introduction of Linear Programming", 205 IST Building, September 8, 2004.
Reviewer for scientific publications
WWW, SIGIR, JCDL, VISUAL, ACM Multimedia, ICSC, CIKM, IEEE Transactions on Pattern Analysis and Machine Intelligence
Membership
IEEE
Page maintained by: xlu@cse.psu.edu