August 19, 1999
University Park, Pa. --- Penn State researchers have developed a
prototype system to help visitors locate campus parking lots and buildings
by talking with a computer-controlled map that responds not only to the
spoken word but also to natural hand gestures.
Project leader Dr. Rajeev Sharma, assistant professor of computer
science and engineering, says, "There still is a lot of work to be done but
we have a pretty fair, ground level, demonstration model of a system in
which a person can interact with a computer by using the most natural human
mode of communication - talking while gesturing.
"Besides the current application, the system could potentially be adapted
to help tourists locate the sights in large cities, shoppers to find stores
in malls, visitors to find patients in hospitals or even for roles in
crisis management, mission planning and briefing," he adds.
In a recent demonstration, Sanshzar Kettebekov, a doctoral student,
stood about five feet away from a map of the Penn State campus projected on
a 4-foot by 3-foot screen. "Scroll," he said gently into the cordless
microphone attached to his T-shirt and the map moved. "Stop," Kettebekov
directed and the map did. He waved his hand in the air and a little red
hand appeared on the screen. As Kettebekov continued to gesture with his
hand, the on-screen hand followed it, like a cursor obeying a mouse. When
the red hand settled on one of the buildings, Kettebekov said, "Show me the
nearest parking lot," and a bright blue line immediately appeared and
connected the building to the closest lot.
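
The last step of the demonstration, resolving a pointing gesture plus a spoken request into a route, can be illustrated with a minimal Python sketch. It is not the Penn State software. The place names, the coordinates and the helper functions nearest and handle are hypothetical, and straight-line distance stands in for whatever measure the real system uses.

```python
import math

# Hypothetical map data: names and coordinates are made up for illustration.
BUILDINGS = {"Building A": (2.0, 3.0), "Building B": (8.0, 1.0)}
PARKING_LOTS = {"Lot 1": (0.0, 0.0), "Lot 2": (9.0, 2.0)}


def nearest(point, places):
    """Return the name of the place closest to point (straight-line distance)."""
    return min(places, key=lambda name: math.dist(point, places[name]))


def handle(command, cursor):
    """Combine a spoken command with the on-screen cursor position.

    Returns (building, lot) for the one command this sketch understands,
    or None for anything else.
    """
    if command.strip().lower() == "show me the nearest parking lot":
        # The gesture supplies the "where": the building under the cursor.
        building = nearest(cursor, BUILDINGS)
        # The speech supplies the "what": find the closest lot to it.
        lot = nearest(BUILDINGS[building], PARKING_LOTS)
        return building, lot
    return None


if __name__ == "__main__":
    print(handle("Show me the nearest parking lot", (2.2, 2.9)))
```

In this sketch the gesture answers "which building" and the speech answers "what to do with it," which is the division of labor the demonstration suggests.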
The system is based on off-the-shelf equipment. The computer is a
standard PC workstation equipped with a video camera, the system's "eye" on
the gesturing human. A commercially available speech recognition package
currently takes care of the conversation.
However, the Penn State researchers developed new gesture recognition
software and used footage of TV weather broadcasters narrating the weather
to "train" it.
The new Penn State gesture recognition software is based on a
technique called Hidden Markov Models (HMMs), a method for recognizing
patterns that vary over time. HMMs had been used previously in gesture
recognition systems developed in 1996. However, those systems recognized
only predefined gestures, such as sign language. The new Penn State approach, based on
weathercaster movements, enables the computer to recognize and "understand"
a rich store of natural gestures that occur in combination with speech.
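
For readers unfamiliar with HMMs, the basic idea can be sketched in a few lines of Python. This is a toy illustration only, not the researchers' software: the two gesture models, their probabilities and the three-symbol coding of hand motion (0 = still, 1 = moving, 2 = circling) are all invented for the example. It classifies one isolated gesture, whereas the Penn State work addresses continuous recognition. Each gesture has its own model, the standard forward algorithm scores how well a motion sequence fits each model, and the best-scoring model wins.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) using the scaled forward algorithm."""
    if not obs:
        return 0.0  # an empty sequence has probability 1
    n = len(start)
    # Probability of starting in each state and emitting the first symbol.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_like = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Move one step through the transition table, then emit obs[t].
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        # Rescale to avoid underflow and accumulate the log of the scale.
        alpha = [a / scale for a in alpha]
        log_like += math.log(scale)
    return log_like


# Hypothetical two-state models: state 0 = hand moving, state 1 = final phase.
MODELS = {
    "point": {  # move toward the target, then hold still
        "start": [1.0, 0.0],
        "trans": [[0.6, 0.4], [0.0, 1.0]],
        "emit": [[0.1, 0.8, 0.1], [0.8, 0.1, 0.1]],
    },
    "circle": {  # move toward the target, then circle it
        "start": [1.0, 0.0],
        "trans": [[0.5, 0.5], [0.0, 1.0]],
        "emit": [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8]],
    },
}


def classify(obs):
    """Return the name of the gesture model that best explains obs."""
    scores = {
        name: forward_log_likelihood(obs, **model)
        for name, model in MODELS.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(classify([1, 1, 0, 0, 0]))
    print(classify([1, 2, 2, 2, 2]))
```

A real recognizer would learn the probabilities from training footage, such as the weathercaster video described above, rather than set them by hand.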
At this point, although the system recognizes quite a few human
gestures and spoken words, it doesn't like small talk. You can't tell it,
"Well, I'd like to go to the Creamery for an ice cream cone first and then
stop off at Old Main before parking at Beaver Stadium." At least not yet.
Yuhui Zhou, a master's degree candidate, has a background in
linguistics. She is working on dialog design and feedback systems that
will enable the computer to extract the most salient information from a
human conversation stream. Jiongyu Cai, a doctoral candidate, is working on
extracting the salient gestures from the random hand waving that most
people use while talking. Kettebekov is trying to understand the
combination of speech and gestures so that he can develop software that
enables the computer to interpret gestures in the speech context.
The research team is also paying attention to the fact that people
from different cultures gesture differently but, at present, plans call for
the map to respond only to English.
Sharma says, "Computer users have been slaves to the mouse and the
keyboard too long. The equipment has, so far, limited the potential for
human interaction with computers. Incorporating gesture, which computer
vision makes possible, allows us to imagine all kinds of potential
applications. For example, I can imagine a computer you wear on your head,
like a virtual reality helmet, that could help you repair your PC by
telling you what to do and then 'watching' as you do it. Or, a wearable
computerized surgical aide that could help direct a surgeon to the precise
location of a tumor.
"For now, our group will be working on trying to enable the
computer to more effectively talk back to the user. We'd like to model the
human/computer dialog so that the display could interactively influence the
user input enabling the computer to play a more active role in the natural
speech/gesture interface," he adds.
The research group has detailed the new system in a paper, "Toward
Interpretation of Natural Speech/Gesture: Spatial Planning on a Virtual Map,"
published in the Proceedings of the Army Research Laboratory Annual
Symposium on Advanced Display, held in February 1999 in Adelphi, Md. The
work on gesture recognition is detailed in Indrajit Poddar's master's
thesis, completed in May, entitled "Continuous Recognition of Deictic
Gestures for Multimodal Interfaces."
The research was supported, in part, by grants from the National
Science Foundation and the Army Research Laboratory.
Note: To contact Dr. Sharma, call (814) 863-0147 or e-mail
rsharma@cse.psu.edu. Also see his website at http://www.cse.psu.edu/~rsharma