Research on Vision-based Gesture Analysis and Multimodal Interfaces

In everyday life, natural communication between people consists of a complex mixture of speech, body movements, facial expressions, and eye motions. Clearly, the most natural means of human communication is multimodal. Our long-term goal is to develop a natural human-computer interaction (HCI) framework in which many different sensing modalities are used simultaneously and cooperatively to interpret human input to the computer. We are especially interested in exploring the use of computer vision to interpret human motion (e.g. hand gestures) as part of a multimodal interface.

The gesture recognition and interpretation task is set in the context of a combined speech/gesture interface for controlling a graphical display. An experimental testbed called iMAP has been developed for multimodal interaction with a computerized 3D Campus Map. It builds on our past research results from the analysis of "TV weather narration" and a speech/gesture interface for a virtual reality system for molecular biologists. The iMAP testbed enables the use of free hand gestures and spoken words for dialog with the map system. The task context makes it feasible to study the critical components of the multimodal interpretation problem and to define a system architecture for human-computer intelligent interaction.

The gesture analysis involves extracting the user's hand from the background, using context to distinguish meaningful gestures from unintentional hand movements, and resolving conflicts between gestures from multiple users. We have been exploring the use of Dynamic Bayesian Networks and Hidden Markov Models (HMMs) for the combined speech/gesture analysis. Current research efforts are in:

This research is partially supported by grants from the National Science Foundation (NSF) and the Army Research Lab (ARL).

Two views of the iMAP testbed:

3D Model University Park (.avi)

Selected Publications:


Back to Rajeev Sharma's Homepage