Feature Article
March 2001

 

The Voice Portal -- Gateway To A Universe Of Data

BY TIM WALSH


With industry pundits estimating that 18 million consumers will use some kind of speech-recognition portal to access the Web by 2005, and that voice portal market revenues could top $12 billion by the same year (Source: The Kelsey Group), it is no wonder that voice portals have captured the attention of start-ups, investors, and the media. Indeed, the future of voice portals seems promising. But how that future will play out in terms of application development, consumer adoption, and business models remains to be seen. While exact predictions are best left to accredited prophets, we can nonetheless draw on what we do know about speech recognition technology, consumer behavior, and business logic to make some educated guesses.

Advancements in speech recognition technology have made voice portals possible, and it is safe to say that further enhancements and refinements in this technology will play a major role in shaping the future of the burgeoning voice portal industry. People want fast and easy access to information, and voice portals, like the Internet, the newspaper, and the radio, provide a means for accessing this information. The key differentiator is that voice portals allow people to use the most natural interface -- the human voice -- to access information when and where they need it.

THE POWER BEHIND THE PORTAL
While the voice portal concept is relatively straightforward, delivering on it poses a significant technological challenge. Cheaper, more powerful computers, the wide proliferation of wireless devices, and vastly improved speech recognition software have spurred the voice portal industry. Moving forward, success will increasingly hinge on the ability of a voice portal's speech recognition engine to create positive caller experiences by performing the following complex functions:

  • Support both open and closed grammars. These industry terms describe two ways developers and integrators can build applications: ones that recognize any and all ways a request or statement can be phrased (open grammars), or ones that recognize only specific sentence and question constructions (closed grammars). With open grammars, the application developer only needs to specify the categories of information that are required to proceed. Callers can speak "in their own words," a feature known as natural language understanding. Unknown words are allowed, and calls are not rejected. By supporting both techniques, the engine lets application developers choose whichever method suits their needs or mix the two in a single application.
  • Correctly interpret the meaning of a caller's request. Called "attribute semantics," this capability focuses on associating a meaning with a request or statement, rather than simply recognizing the individual words. For example, for the statement "I would like the score of last night's Knicks game," the recognizer understands the meaning of "last night" and applies the appropriate date.
  • Expedite the dialogue process by "accumulating confidences." With this technique, words that have the same meaning can be assigned the same attribute, and the system's confidences for each attribute can then be accumulated. As a result, the application can be highly certain when it recognizes a statement and interprets its meaning. The system is more likely to proceed intelligently, with less need to verify with the caller and thereby prolong the dialogue. Beyond the benefits to the caller, this feature also saves the speech application developer time.
  • Handle complex statements by employing "mixed models," in which whole-word and phonetic recognition methods are applied simultaneously for better understanding.
  • Facilitate "mixed initiative" dialogues, which eliminate complex menu structures and annoying prompts by enabling the caller to take control of the dialogue.
  • Support multiple languages and accommodate regional dialects.
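The Python sketch below illustrates two of these capabilities, attribute semantics and accumulating confidences, in a deliberately simplified form. It is not the method of Philips or of any particular speech engine: the recognizer output format (an n-best list of word sequences, each with a confidence score), the lexicon, the relative-date table, and the 0.8 confirmation threshold are all hypothetical assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical lexicon: words with the same meaning map to the same
# (attribute, value) pair, so their confidences can be pooled.
LEXICON = {
    "score": ("request", "score"),
    "result": ("request", "score"),
    "knicks": ("team", "New York Knicks"),
    "nets": ("team", "New Jersey Nets"),
}

# Hypothetical table of relative date phrases, as day offsets from "today".
RELATIVE_DATES = {"today": 0, "tonight": 0, "yesterday": -1, "last night": -1}


def resolve_date(phrase: str, today: date) -> date:
    """Attribute semantics: turn a relative phrase into a concrete date."""
    return today + timedelta(days=RELATIVE_DATES[phrase])


def accumulate(hypotheses):
    """Pool confidences per (attribute, value) across an n-best list.

    Each hypothesis is (list_of_words, confidence). A value is counted at
    most once per hypothesis, so synonyms appearing in different hypotheses
    add up instead of competing with one another.
    """
    totals = defaultdict(float)
    for words, confidence in hypotheses:
        values = {LEXICON[w] for w in words if w in LEXICON}
        for value in values:
            totals[value] += confidence
    return dict(totals)


def needs_confirmation(totals, attribute, threshold=0.8):
    """Re-prompt the caller only if no value was heard for this attribute,
    or the best value falls below the (hypothetical) threshold."""
    scores = [c for (attr, _), c in totals.items() if attr == attribute]
    return not scores or max(scores) < threshold
```

For example, given the n-best list `[(["score", "knicks"], 0.45), (["result", "knicks"], 0.30), (["score", "nets"], 0.15)]`, the pooled "score" request reaches about 0.90, so the sketch proceeds without asking; scored as separate words, "score" and "result" would reach only 0.60 and 0.30 and both would fall short. The team attribute peaks at 0.75 for the Knicks, so here the system would still confirm the team. Likewise, `resolve_date("last night", date(2001, 3, 15))` yields March 14, 2001.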

DEFINE MARKET NEEDS
While the scope of voice portal applications is seemingly infinite, developing unique services for targeted markets is paramount. With voice portals, offering a small set of frequently used services may be more feasible than hitting the market with a big-bang service offering. For example, the largest and longest-running voice portal, Italy's Omnitel 2000 with 3,000 ports, offers 100 services, but the vast majority of the calls Omnitel receives go to a relatively small number of them. (Horoscopes are number one, and lotto numbers are in the top five.) In this instance, a large-scale service offering turned out to be less important than the project's planners had envisioned.

Another emerging voice portal play is in enhancing the functionality of enterprise intranets. More and more, companies rely on intranets to disseminate information among employees. With the ubiquity of mobile devices and an ever-increasing mobile workforce, businesses can leverage enterprise voice portals to streamline communications, increase productivity and drive cost-saving efficiency gains.

As with consumer-facing voice portals, the key will be to build applications around specific, need-driven services. Applications will no doubt vary from company to company, but possible services include benefits information, the company stock price, help desk tickets, purchase requests, or simply voice dialing from the company directory.

A common denominator in the success of both consumer- and business-facing voice portals is ease-of-use. To this end, technological advances that "humanize" voice portal applications are vital. The beginning and end of voice portals lies in the power of the spoken word, with all of its idiosyncrasies, dialects, mispronunciations and phrasings.

Things like eliminating complex menu structures and annoying prompts; allowing callers to speak as they would in natural conversations, in their real voices as if they were talking to another person; and supporting multiple languages and accommodating regional dialects are the inviolable prerequisites to the widespread adoption of voice portals.

THE TELCO ADVANTAGE
Voice portal providers have varying business models. Most Internet-based voice portal companies rely on advertising dollars, subjecting callers to anything from a brief sponsorship mention or a five-second ad to a twenty-second commercial. But ads could pose problems. According to a recent survey of 1,000 consumers, nearly half are very likely to use voice portals, but fewer than a third are willing to use them if forced to listen to ads (Source: Cahners In-Stat Group).

Moreover, Internet-based voice portal companies employing an advertising-supported business model must learn to compete in the advertising sales business -- a core competency few start-ups possess. Finally, Internet-based voice portal companies must devote significant resources to promoting their names and toll-free numbers, and to educating consumers on a brand new way to search for information. The high expenditures associated with these activities present a formidable obstacle along the path to profitability.

On the other hand, telcos and other experienced service providers already have relationships with their customer base and a mechanism for generating revenue: by the minute, by the call, by monthly access rate, and so on. Some telcos may even offer the voice portal for free, as a differentiating service that creates a "sticky" relationship with the customer.

We have ample evidence that consumers will accept all the free services they can get.

As for the current business surge in voice portals, it must now stand the test of time. Perhaps no factor will be more critical to voice portals' sustained popularity than their success in exposing more and more people to the power of the human voice.

Tim Walsh is vice president of sales and marketing, Americas, Philips Speech Processing. Visit www.speech.philips.com for additional information.



Speak To The Web

BY STEVEN DUNCAN

Although the technology is still in its toddlerhood, the term "voice portal" typically elicits images of mobile consumer services providing everything from horoscopes to stock quotes to traffic reports. Voice portals' ease of use and entertaining nature are ensuring their place in the spotlight right now. The true value of voice portals, however, lies not in general consumer applications but in directed, or enterprise, applications. These targeted voice portals allow staff, partners, clients, or other specific audiences to use speech recognition technologies to gain access to personalized information while maintaining privacy.

The beauty of speech recognition technology lies in its familiarity to the end user, its speed (since users can skip menus and cut to the heart of their request), and its broad applicability. But in the business world, the largest benefit is giving a mobile audience access to information and the ability to complete transactions. In enterprise or business-to-business (B2B) applications, suppliers and buyers, clients and partners can access information from each other's Web sites without needing a PC. Functions such as scheduling, parts ordering and status, inventory tracking, and the like can be completed over the phone, 24/7, without the aid of a human agent. These voice-driven capabilities will help further cement existing partnerships and client relationships.

We are on the brink of developing true natural dialog applications, making Web surfing via voice and unconstrained dialogs with an application a fast-approaching reality. Combined with the industry's current high-quality voice verification technology for user authentication and security, this will add to the growing number of voice sites being implemented. With market projections for mobile commerce over the next five years ranging from $12 billion to $200 billion, depending on the analyst firm, this presents an attractive opportunity for the growth of speech.

From an end user perspective, B2B voice portals are made for speech recognition. Business portals have an audience that is attuned to the language used in a specific enterprise. Because they can therefore operate with a more limited grammar or vocabulary, targeted voice portals can deliver business information of intrinsic value, focused on the specific needs of their audience. The limited grammar greatly increases the accuracy of the speech recognition application, which in turn creates greater end user satisfaction. Finally, since delivery is by voice, business voice portals can be crafted with a personality, giving them the audio feel and persona that a business would like to present to its customers. This personality stays consistent, without the variation among individual customer service representatives (CSRs), and can reinforce an enterprise's brand or image.

As the technology and industry standards efforts mature, we will begin to see greater use of natural language recognition in applications. Much as we have shopping bots and virtual assistants on the Web today, we will soon see enterprise voice assistants that help us conduct business and let users more readily surf this growing "voice Web."

Steven Duncan is head of marketing for Applications and Messaging, Mitel Corp. For more information, visit www.mitel.com.