Research on Vision-based Gesture Analysis and Multimodal Interfaces
In everyday life, natural communication between people consists of a complex mixture
of speech, body movements, facial expressions, and eye motions. The most natural
means of human communication is thus multimodal. Our long-term goal is to develop
a natural human-computer interaction (HCI) framework in which many different sensing
modalities are used simultaneously and cooperatively to interpret the human input
to the computer. We are especially interested in exploring the use of computer vision
to interpret human motion (e.g., hand gestures) as part of a multimodal interface.
The gesture recognition and interpretation task is set in the context of a combined
speech/gesture interface for controlling a graphical display. An experimental testbed
called iMAP has been developed for multimodal interaction with a computerized 3D Campus
Map. It builds on our past research on the analysis of "TV weather narration" and on a
speech/gesture interface to a virtual reality system for molecular biologists. The iMAP
testbed enables the use of free-hand gestures and spoken words for dialog with the map
system. This task context makes it feasible to study the critical components of the
multimodal interpretation problem and to define a system architecture for human-computer
intelligent interaction.
The gesture analysis involves extracting the user's hand from the background, using
context to distinguish meaningful gestures from unintentional hand movements, and
resolving conflicts between gestures from multiple users. We have been exploring the
use of Dynamic Bayesian Networks and Hidden Markov Models (HMMs) for combined
speech/gesture analysis. Current research efforts focus on:
- Interpretation of natural speech/gestures in spatial and temporal contexts
- Developing dialog strategies for multimodal interaction
- Improving recognition of free-hand gestures and speech
- Adaptive tracking/identification of multiple users
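The HMM approach mentioned above is not spelled out on this page. As a minimal sketch only, assuming isolated gestures whose hand motion has already been quantized into discrete symbols by some hypothetical front end (the actual iMAP pipeline and its joint speech/gesture models are not described here), a common HMM-based recognizer keeps one model per gesture class and labels a new sequence with the class whose model gives it the highest forward-algorithm likelihood. The class names `point` and `wave`, the symbol alphabet, and every probability below are invented for illustration and are not taken from the testbed.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class DiscreteHMM:
    """Discrete-observation HMM; all entries are probabilities."""

    pi: List[float]        # pi[i]: probability of starting in state i
    A: List[List[float]]   # A[i][j]: probability of moving from state i to j
    B: List[List[float]]   # B[i][k]: probability that state i emits symbol k

    def log_likelihood(self, obs: Sequence[int]) -> float:
        """Return log P(obs | model) via the scaled forward algorithm."""
        if not obs:
            raise ValueError("observation sequence must be non-empty")
        n = len(self.pi)
        # Initialization: alpha_1(i) = pi_i * b_i(o_1)
        alpha = [self.pi[i] * self.B[i][obs[0]] for i in range(n)]
        log_prob = 0.0
        for t in range(len(obs)):
            if t > 0:
                # Induction: alpha_t(j) = [sum_i alpha_{t-1}(i) * a_ij] * b_j(o_t)
                alpha = [
                    sum(alpha[i] * self.A[i][j] for i in range(n)) * self.B[j][obs[t]]
                    for j in range(n)
                ]
            scale = sum(alpha)
            if scale == 0.0:
                return float("-inf")  # the model cannot produce this sequence
            # Rescale to sum to 1 so long sequences do not underflow;
            # the sum of the log scale factors equals log P(obs).
            alpha = [a / scale for a in alpha]
            log_prob += math.log(scale)
        return log_prob


def classify(obs: Sequence[int], models: Dict[str, DiscreteHMM]) -> str:
    """Return the gesture label whose HMM assigns obs the highest likelihood."""
    return max(models, key=lambda label: models[label].log_likelihood(obs))


# Hypothetical quantized hand-motion symbols (not from the iMAP system):
# 0 = hand moves toward the map, 1 = hand moves away, 2 = hand held still.
HYPOTHETICAL_MODELS = {
    # Left-to-right model: approach the display, then hold (a pointing stroke).
    "point": DiscreteHMM(
        pi=[1.0, 0.0],
        A=[[0.5, 0.5], [0.0, 1.0]],
        B=[[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]],
    ),
    # Two alternating states: back-and-forth hand motion.
    "wave": DiscreteHMM(
        pi=[0.5, 0.5],
        A=[[0.2, 0.8], [0.8, 0.2]],
        B=[[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]],
    ),
}

if __name__ == "__main__":
    print(classify([0, 0, 2, 2, 2], HYPOTHETICAL_MODELS))  # prints: point
    print(classify([0, 1, 0, 1], HYPOTHETICAL_MODELS))     # prints: wave
```

Rescaling the forward variables at every step keeps the computation numerically stable on long sequences. Continuous gesture recognition, as in the weather-narration work, would additionally require segmenting the motion stream or decoding over connected models, which this sketch does not attempt.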
This research is partially supported by grants from the National Science Foundation (NSF)
and the Army Research Lab (ARL).
Figure: Two views of the iMAP testbed.
Selected Publications:
- "Understanding Gestures in a Multimodal Human Computer Interaction." S. Kettebekov and R. Sharma. International Journal of Artificial Intelligence Tools, September 2000 (to appear).
- "Exploiting Speech/Gesture Co-occurrence for Improving Continuous Gesture Recognition in Weather Narration." R. Sharma, J. Cai, S. Chakravarthy, I. Poddar, and Y. Sethi. In Proc. International Conference on Face and Gesture Recognition, Grenoble, France, April 2000.
- "Toward multimodal interpretation in a natural speech/gesture interface." S. Kettebekov and R. Sharma. In Proc. IEEE Symposium on Image, Speech, and Natural Language Systems, pp. 328-335, November 1999.
- "Toward Interpretation of natural speech/gesture for spatial planning on a virtual map." R. Sharma, I. Poddar, E. Ozyildiz, S. Kettebekov, H. Kim, and T. S. Huang. In Proc. 1999 Advanced Display Federated Laboratory Symposium, pp. 35-39, Adelphi, MD, February 1999.
- "Toward Natural Gesture/Speech HCI: A Case Study of Weather Narration." I. Poddar, Y. Sethi, E. Ozyildiz, and R. Sharma. In Proc. Workshop on Perceptual User Interfaces (PUI98), pp. 1-6, San Francisco, CA, November 1998.
- "Reliable Tracking of Human Arm Dynamics by Multiple Cue Integration and Constraint Fusion." Y. Azoz, L. Devi, and R. Sharma. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 905-910, Santa Barbara, CA, June 1998.
- "Toward Multimodal Human-Computer Interface." R. Sharma, V. I. Pavlovic, and T. S. Huang. Proceedings of the IEEE, special issue on Multimedia Signal Processing, 86(5):853-869, May 1998.
- "Visual interpretation of hand gestures for human-computer interaction: A review." V. I. Pavlovic, R. Sharma, and T. S. Huang. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7):677-695, July 1997.
- "A multimodal framework for interacting with virtual environments." R. Sharma, V. I. Pavlovic, and T. S. Huang. In C. A. Ntuen and E. H. Park, editors, Human Interaction with Complex Systems, pp. 53-71, Kluwer Academic Publishers, 1996.
- "Speech/gesture interface to a visual computing environment for structural biology." R. Sharma, M. Zeller, V. I. Pavlovic, T. S. Huang, Z. Lo, S. Chu, Y. Zhao, J. Phillips, and K. Schulten. IEEE Computer Graphics and Applications, 20(2):29-37, 2000.