Analytical Views
August 2001

The Creature Comforts Of Speech Rec

BY BRIAN STRACHMAN


When I sit down to browse the Internet, I always sit in my comfortable desk chair. It's made of very soft black leather and has padding in all the right places. The armrests are high and in the perfect spot for my elbows to rest as I type. When I'm at home in this chair, I wind up online for at least an hour. I do my personal Web browsing here (not only because it's comfy, but also because I would never surf the Web for personal reasons at the office).

I'll buy J. Crew sweaters, check out prices for a new car, manage my stock portfolio, and maybe even do a bit of gaming. In short, I take my time. I look at things in detail and generally enjoy the process. Unlike most people, I have a DSL line and never turn off my computer, so it's easy for me to get online and begin surfing. Even so, when I sit down, it's usually for an extended stay.

But when I call for automated information, I'm not sitting in my comfy chair. The chair I'm in while on the telephone may have very little padding or, worse, might even be a stool. Sometimes I call for information while running for a plane, or while driving at high speeds. These are not the moments when I want to contemplate whether I want a turtleneck or a V-neck for my next sweater, or what my long-term plans are for retirement. I need information, I need it easily, and I need it in a big hurry. While it may be an oversimplification to categorize user interfaces based on the relative comfort of my rear end, it is oddly appropriate.

A FRIENDLY CONTEST
Even as an avid surfer, there are plenty of things I don't like to do on the Internet. Movie times, stock quotes, and airline tickets are good examples of information tidbits that I find easier to access over the phone. Data that is time-sensitive and doesn't require elaborate description, such as the weather, flight status, or a quick stock quote, makes a great application for the telephone. The problem is that many vendors don't understand what belongs on the phone and what belongs on the Web.

Clearly there is some overlap in these two categories, but the point remains. Even with always-on Internet access in the home, it's still easier to pick up the phone for a fast bit of information. And while the promise of always-on mobile data with 3G wireless has been heavily touted in the media, it's still a long way off (if it will be here at all) and is not the best format for every application.

For fun, and to placate my nerdy competitive nature, one of my techno-geek friends and I had a little contest. He had a WAP (Wireless Application Protocol) phone and I had my plain old vanilla one. We competed to see who could access a few tidbits of information faster: my geeky friend with his mobile data, or me using voice. For this test I made use of the free services of Tellme (which you will find reviewed here) and BeVocal, both voice portal providers that have 800 numbers set up for demonstration purposes.

Test Number 1: Find Tomb Raider playing at a close location.

  • Speech recognition interface: 45 seconds (I was familiar with the interface so I had a slight advantage).
     
  • WAP: 3 minutes 20 seconds (this time may have been affected by variable download rates and the expected lack of hand-eye coordination by my geeky friend).

Test Number 2: Find the current value of Microsoft.

  • Speech recognition interface: 12 seconds! Need I say more?
     
  • WAP: 1 minute 23 seconds (assuming the user knows the stock symbol, which is not needed with speech rec).

I agree that these are pretty basic applications that don't have much business use, and this is only an unscientific example. But imagine a similar situation with voice-enabled e-mail, customer shipping information, or inventory, and the case becomes very compelling. Couple that with the fact that both of the tasks above would be a practical impossibility on WAP while driving, and it's clear why speech recognition is so exciting.

BACK TO REALITY
The voice portal vendors need to understand that while they have a great interface, it's not always appropriate. No one is going to shop for any non-commodity products via speech rec. Nor are they going to administer their stock portfolio or manage their supply chain. They may purchase a stock or two, or check the inventory levels of a given product, but nothing very involved.

Speech recognition should be thought of as a shot of bourbon, not a pint of stout. It's very potent and very quick, not involved or drawn out. For complicated, time-consuming applications, or those meant purely for enjoyment, the pint (the Internet) is the far better choice. Speech rec and voice portal vendors will be far more successful if they focus their marketing and pair each interface with the applications it serves best. For a quick quaff, speech recognition is the obvious choice. For a leisurely and enjoyable beverage, the Internet still reigns supreme. Cheers!

Brian Strachman is senior analyst, Voice and Data Communications, Cahners In-Stat Group. To correspond with the author, please send your comments to brians@instat.com.
