IBM: Your Handheld Will Hear You

SAN FRANCISCO (02/28/2000) - A speaking Palm Inc. handheld, Web pages you can talk to, a courteous robot, and a first draft of the Star Trek universal translator--science fiction? Not in IBM Corp.'s labs.

The year of speech technology is almost here, according to W.S. "Ozzie" Osborne, general manager of IBM Voice Systems. He offered prototypes as proof, previewing the futuristic systems at a recent Speech Fair at IBM's Santa Teresa Laboratory in San Jose. Some 2500 research scientists are exploring voice technologies throughout IBM.

IBM is focusing about 80 percent of its investment on speech-enabling the enterprise. But those higher-level technologies will benefit consumers as well.

"Voice will be integral to computing as devices change from PCs to [handhelds]; the interface will have to change," Osborne says. "Carrying around a keyboard will be too hard." After all, he points out, devices keep getting smaller, but our fingers do not.

The Web and voice technology are already being wed in wireless phones. With the help of a pending standard called VoiceXML, you may be able to access Web page content by phone, or surf by asking questions like, "What's the latest list of the New York Times bestsellers?" Already, 64 developers support VoiceXML, which uses Enterprise Java Beans.

"The cell phone is the ultimate thin client," observes Osborne. "Human interface is what we're really working on."

Watch Out for Robby

Focus on the interface is not limited to voice. Interpersonal communications are also driven by interactive visual cues, so IBM is developing bipedal robots that can react to humans.

An early prototype is a table-top robot consisting of a lollipop-shaped head of transparent orange plastic, Muppet-like bug-eyes, and a tiny video camera hidden in its nose. The camera senses movement, so the robot can make eye contact with its audience as it moves within a 12-degree range, says robot-handler Dr. David Nahamoo, director of worldwide research for IBM Voice Systems.

The colorful contraption could also respond intelligently to conversation; for example, if told, "You're stupid," the robot could frown and then reply in a synthesized voice, "You're rude."

But don't expect to have your own personal C-3PO protocol droid anytime soon.

"Ten to 15 years is a reasonable guess," Nahamoo says, pointedly noncommittal.

"The future is really going to be about being able to interact with computers the same way we interact with people."

More than mechanical issues must be overcome before a fully interactive robot can become a reality.

"The user interface aspect needs to be worked out, as does the application integration," Nahamoo says. "Visual recognition is much more difficult than just speech recognition, since there are two dimensions [involved]."

From Hand to Voice

Coming soon, however, is a snap-on speech recognition base for Palm devices. A prototype contains a speaker, earphone jack, microphone, and--most importantly--a coprocessor that provides the necessary computing power to support voice technologies such as speech recognition and text-to-speech.

Using IBM's Personal Speech Assistant application, you can navigate through a to-do list, execute several hundred commands, and access your address book. For example, you can say, "Find Bill Smith," and the contact record for Bill Smith opens on-screen.

The integrated microphone offers only a limited degree of noise cancellation, but IBM's software is designed to compensate. Dictating a memo is as simple as holding down the record button and speaking into the unit's microphone. The prototype stores audio files in the base's 4MB of flash memory, which can hold 30 minutes of audio using IBM's compression scheme. The base can also be designed to accommodate removable media such as CompactFlash cards or even a 340MB IBM Microdrive.
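
For a sense of scale, the storage figures above imply a fairly low average bit rate. The sketch below is a back-of-the-envelope calculation only, not a description of IBM's compression scheme. It assumes that "4MB" means 4 x 1024 x 1024 bytes, that all of the flash memory is available for audio, and that a Microdrive would be filled at the same rate; the function names are made up for illustration.

```python
# Back-of-the-envelope estimate from the figures quoted in the article.
# Assumptions (not from IBM): 1MB = 1024 * 1024 bytes, all storage holds audio,
# and larger media would be filled at the same average bit rate.

BYTES_PER_MB = 1024 * 1024


def average_bitrate_bps(storage_mb: float, minutes: float) -> float:
    """Average bits per second needed to fit `minutes` of audio in `storage_mb`."""
    return storage_mb * BYTES_PER_MB * 8 / (minutes * 60)


def minutes_at_bitrate(storage_mb: float, bitrate_bps: float) -> float:
    """Minutes of audio that fit in `storage_mb` at a given average bit rate."""
    return storage_mb * BYTES_PER_MB * 8 / bitrate_bps / 60


if __name__ == "__main__":
    rate = average_bitrate_bps(4, 30)          # 4MB of flash, 30 minutes
    microdrive = minutes_at_bitrate(340, rate)  # 340MB Microdrive, same rate
    print(f"Implied average bit rate: {rate / 1000:.1f} kbit/s")
    print(f"340MB at that rate: {microdrive / 60:.1f} hours")
```

Under those assumptions the prototype's audio works out to roughly 18.6 kbit/s, and a 340MB Microdrive filled at the same rate would hold about 42 hours of dictation.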

When you sync the handheld with your desktop PC, IBM's ViaVoice engine on the PC automatically transcribes the audio clip and sends the transcript back to the handheld. As demonstrated, the prototype base adds slightly to the weight and length of an IBM WorkPad unit (running the Palm OS) but is not unwieldy.

Great Expectations

As speech technology is refined in real-life labs like IBM's, the challenge lies in meeting the high expectations of popular culture.

"Science fiction movies have created the expectation," Osborne says. "We're not there yet, but we've made great steps."

In the meantime, IBM will test another gee-whiz prototype at the 2000 Olympics in Sydney, Australia. It's a smart, speech-enabled soda machine that will dispense the drink you ask for, or tell you the current temperature of the machine. But the potential goes beyond the novelty of a talking soda machine.

By adding an Internet connection, such a device could become a vertical application that alerts route masters when the soda machine is malfunctioning or needs restocking.
