ReadWriteWeb

HSTP: Hyperspeech Transfer Protocol

Written by Lidija Davis / March 15, 2009 11:52 PM / 11 Comments

IBM's research scientists in India have developed a technology that will let users talk to the Web and create 'voice' sites using mobile phones, according to a news article in the Economic Times today.

Hyperspeech Transfer Protocol (HSTP), a protocol designed to seamlessly connect telephony voice applications, will enable users to browse across voice applications by navigating the Hyperspeech (the voice hyperlink) content in a voice application.

"People will talk to the Web and the Web will respond. The research technology is analogous to the Internet. Unlike personal computers it will work on mobile phones where people can simply create their voice sites," IBM India Research Laboratory Associate Director Manish Gupta told the Economic Times.

In a 2007 paper describing the technology (PDF), IBM scientists explain the concepts of Hyperspeech using this scenario:

Jonathan is a busy salesman who travels frequently. His work typically requires him to stay in a place for a few days. Once he is in a new place, he has to go around looking for grocery stores in his locality for his daily needs. He prefers taking phone numbers of the identified stores and places orders on the phone subsequently. Home delivery services deliver the goods to his home. However, often the home delivery boys don't accept credit cards and even if some do, Jonathan tries paying by cash since he doesn't want to share his credit card information with untrusted home delivery agents. This often causes problems since he often runs out of cash.

During his travel, he visits a city and finds out that there is a yellow pages service in the city that he can call up to receive phone numbers of several businesses. He promptly calls up the service and uses the telephony voice application to browse through the grocery stores in the vicinity of his hotel.

On Jonathan's prompt, the call gets transferred to a grocery store and goes to the voice application of the store. Jonathan easily specifies the items he needs to buy from the cataloger. The order is placed and a delivery guarantee is made within half an hour.

To his surprise, the grocery store's voice application also accepts credit cards securely over phone. Jonathan selects the option and his call gets transferred to yet another voice application of a secure payment gateway. The secure payment gateway already knows about the amount of money the grocery store wants to charge to Jonathan, and securely authorizes the payment by taking in Jonathan's credit card details and transacting with the credit card company's authorization system.

The delivery boy comes within half-hour and delivers the goods to Jonathan.
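
To make the hand-offs in this scenario concrete, here is a minimal Python sketch of voice applications joined by voice hyperlinks, with session state carried across each call transfer. It is only an illustration of the idea, not the HSTP protocol itself: the names `VoiceApp`, `hyperspeech_links` and `follow_link`, the spoken phrases, and the order total are all hypothetical and do not come from the IBM paper.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceApp:
    """Hypothetical stand-in for a telephony voice application."""
    name: str
    # Spoken choice -> the voice application that choice links to.
    hyperspeech_links: dict = field(default_factory=dict)


def follow_link(current, spoken_choice, session):
    """Transfer the call to the linked application, carrying the session along."""
    if spoken_choice not in current.hyperspeech_links:
        raise KeyError(f"{current.name} has no link for {spoken_choice!r}")
    target = current.hyperspeech_links[spoken_choice]
    # Copy the session so each hop hands over its own snapshot of the state.
    return target, dict(session, came_from=current.name)


# The three applications from the scenario, linked in one direction.
gateway = VoiceApp("payment gateway")
grocery = VoiceApp("grocery store", {"pay by card": gateway})
yellow_pages = VoiceApp("yellow pages", {"grocery store": grocery})

session = {}
app, session = follow_link(yellow_pages, "grocery store", session)
session["amount_due"] = 450  # illustrative order total, not from the paper
app, session = follow_link(app, "pay by card", session)
print(app.name, session)
```

Because the session travels with the call, the last application in the chain can already see the amount due when the caller arrives, which mirrors the payment gateway in the scenario knowing what the grocery store wants to charge.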

Given India's position as the fastest-growing mobile phone market in the world, with sales booming despite the current economic crisis, this new protocol may be particularly useful there.

If you're interested in reading the entire paper, HSTP: Hyperspeech Transfer Protocol, you can download it here (PDF).


Comments


  1. Voice recognition technology is an IBM specialty, so I don't think it is strange that IBM created the Hyperspeech Transfer Protocol using it.

     Posted by: Dezhi | March 16, 2009 3:43 AM



  2. Why no jobs for Americans, IBM?

    Globalization Sucks.

    Posted by: le roy | March 16, 2009 6:25 AM



  3. IBM has always been at the forefront of voice-enabled applications. IBM took the early lead with its ViaVoice products, and it also offers some of the most natural text-to-speech products currently available. Recently it has been eclipsed by Nuance's Dragon NaturallySpeaking line of products. I knew IBM had more planned in the area of voice-controlled apps, but the company has been silent about it until now.

    My mother is blind from glaucoma, and I'm always looking for new ways that might help her navigate and communicate in this increasingly digital world. This might be a solution for her. Most voice options have had too many errors to make them viable solutions for her. Kurzweil has a very nice screen reader, but it's still difficult for her to use and extremely expensive.
    She currently uses her cell phone, which is voice enabled, for making calls when she's out of the house. Verizon also has been kind enough to give her unlimited 411 connect calls at no additional charge. This helps her, but only to a certain degree. It doesn't allow her to access the web, which is what she wants to be able to do. Do you think this might help?
    If not, I remain optimistic that someone will come up with the perfect product for her even if this one isn't right for her. Besides, I love all of this too!

    Posted by: Michael Fidler | March 16, 2009 12:37 PM



  4. Hmm. I doubt that this will ever really take off. It seems that in the time it would take to implement, deploy, and attract service providers, hand-held devices will already have moved on to providing actual internet access, and a voice subset of that internet will be seen as nearly useless.

    Posted by: Brad | March 16, 2009 5:13 PM



  5. While this is an interesting development (it does resemble "voice over XML" to some extent, doesn't it?), at first glance it seems likely to have limited penetration. The challenge is that, as this technology progresses, hand-held devices are expected to become more innovative and friendlier, and will offer end users wider reach and more options. So I would assume that Jonathan (the busy salesman) will eventually use his mobile to connect to the net, browse to the yellow pages, go from there to the grocery store's site, place his order, pay online and get the delivery, all within 30 minutes (or perhaps less).

    Posted by: Sarbagnas | March 17, 2009 1:29 AM



  6. As important as text messaging is to deaf people, this would be the answer for blind people. Let's not forget how important accessibility is in the modern view of web standardisation.

    Personally, being neither vision nor hearing impaired, I prefer to speak on the phone (to a real person) or type on the computer. Texting is also a valuable tool when silence is required (as it is in many situations in our society). Speech recognition (such as Dragon NaturallySpeaking) is mainly used by older people who are used to dictating letters.
    Anyone who can type at a decent rate is quickly annoyed by that sort of software, and by having to correct all the mistakes it makes. Same story with OCR technology.

    Ringing up for a taxi and telling the computer where you want to go is just annoying! It also gets it wrong a lot of the time. Even if it does work, it's not preferred by humans. In India, with a high level of mobile phone penetration coupled with a high level of illiteracy, this could well take off.

    Posted by: Dude | March 17, 2009 4:08 AM



  7. Because we all love talking to those automated voices when we phone a store or a service...

    Posted by: TheStalker | March 17, 2009 4:52 AM



  8. Anyone with any experience of building commercial voice recognition IVR systems will realise that the above scenario is unrealistic for a number of reasons:

    - telephony does not scale cheaply. If you want to handle 1000 concurrent calls, you'll need 1000 ports/TTS/ASR licenses

    - speech recognition technology is not good enough. And this has ultimately been the sticking point for the entire industry for many years.

    - if you thought i18n was challenging for text based channels, it's a different animal for speech - recognisers can't handle even subtle variations in language/dialect/accent.

    - dynamic content is either terrible (due to TTS engines) or extremely expensive due to the cost of professional voice talents and recording studios - you choose which you want, but Mr Greengrocer certainly could not afford the latter.

    ...but IBM should know this already.

    Posted by: Thomas Carnell | March 17, 2009 6:28 AM



  9. Having worked in that industry for years, here's the more realistic scenario:

    "On Jonathan's 3rd recognition attempt, the call finally gets transferred to a grocery store and goes to the voice application of the store. Jonathan can't just scan the items list like on the web, so he spends the next half an hour finding out through trial-and-error what items the system might recognize. Finally, the order is placed. After this, Jonathan learns quickly to hit the 0 button multiple times to get to an operator."

    Posted by: Hans Qurst | March 17, 2009 8:26 AM



  10. "As important as text messaging is to deaf people, this would be the answer for blind people." It's true.

    "On Jonathan's 3rd recognition attempt, the call finally gets transferred to a grocery store and goes to the voice application of the store. Jonathan can't just scan the items list like on the web, so he spends the next half an hour finding out through trial-and-error what items the system might recognize. Finally, the order is placed. After this, Jonathan learns quickly to hit the 0 button multiple times to get to an operator." It's more realistic XD

    Posted by: May B | March 17, 2009 1:45 PM





