Feature Article
November 2001
 

The Wireless Renaissance Of Speech Recognition

BY BRIAN DEMERS


The cell phone. It’s a handy little device that allows you to take or place a call from just about anywhere. Anywhere, that is, but your car. Thanks to a recent 125-to-19 vote in the Empire State’s Assembly, beginning December 1, 2001, New York’s six million cell phone users may no longer make quick calls home, check stock quotes, or reschedule tee times using a handheld cell phone while driving. New York isn’t alone in this prohibition; 38 other states have pending laws limiting handheld cell phone use in automobiles. Plus, England, Italy, Israel, Japan, and 20 other countries have already outlawed arm-anchored cellular communication.

At first glance, all this appears to be bad news for the 40 percent of Americans who own cell phones; after all, 70 percent of all cell phone calls in North America originate from automobiles. It isn’t. These bans should make cell phone use safer and create robust growth within the speech recognition marketplace.

Speech recognition research began more than 30 years ago with government funding, and now thanks to these new regulations, the promise of speech recognition is again marching towards the forefront of consumer awareness. Legislatures may change laws, but these laws will not change personal habits and certainly will not make the average American’s commute less than 23 miles. People will still make calls from their cars. 

NEW EXPECTATIONS
According to Forrester Research, car buyers will begin expecting and purchasing pre-installed, hands-free, speech recognition-enabled cell phones. To accommodate consumers’ obvious desire to keep using telephones while commuting, automobile manufacturers and wireless service providers will need to deliver seamless speech-controlled operation of cellular phones.

For those minutes both to and from work, car manufacturers have a captive collection of drivers who use cell phones. Now that audience is being handcuffed. Clearly, the automotive industry views this as an opportunity. General Motors (GM) already has plans to include its hands-free OnStar service in all future models. Ford, along with Cellport, is planning to offer a universal cellular docking system as an option on its automobiles. Whether GM, Ford, and other automobile manufacturers decide to become wireless service providers themselves or merely provide the hands-free hardware, the number of U.S. wireless subscribers, which the Cellular Telecommunications and Internet Association currently estimates at 117 million, will undoubtedly grow as more cars come with wireless equipment.

OTHER SPEECH REC OPPORTUNITIES
The future of speech recognition isn’t limited to the automotive industry. By now, most businesses know speech recognition can save corporate call centers enormous amounts of money by simultaneously creating a 24/7 environment and reducing head count. Fewer call center employees and greater availability to customers mean better, cheaper service, and shorter call times yield high returns on investment and 800-number savings. Of course, this is old news; companies such as E*trade, HP, Continental Airlines, and FedEx already use speech recognition in conjunction with their customer service numbers.

Wireless advertising is new, and as it advances, it will provide opportunities for speech recognition to expand and to grow ever more pervasive. In order to generate the industry’s estimated $3.9 billion in revenue from wireless advertising and promotion by 2005, wireless service providers will need larger screens, grander real estate upon which to display those ever more personalized ads. Naturally, to enhance screen size without growing device size, cell phone manufacturers need to enable speech recognition in order to eliminate those tiny buttons. 

Removing those tiny buttons reduces the cognitive and physical burdens on cell phone users while creating amazing benefits. Rather than using memorized responses, which distract users, people can access bank accounts and other typically PIN-controlled information by using speaker verification technology without training a system. Training would typically involve repeating key words several times allowing a system to record a user’s voice patterns. Speech recognition, used in any venue, is growing more intuitive, permitting consumers to think less about using the correct trained commands. This leads to greater consumer acceptance of speech technology.

SATISFIED CUSTOMERS
A study conducted by Evans Research for Nuance Communications demonstrates a positive reception of speech recognition. In the study, 96 percent of the wireless telephone users of speech systems are satisfied, often preferring the speech systems to DTMF or even human operators. A full 98 percent of the users claimed they would continue to use voice-driven services in the future. 

Traditional cellular service providers will also benefit from the expected growth and expanded capabilities of speech recognition. As speech recognition systems evolve beyond 97 percent accuracy and the cognitive burden on speech users is reduced through ever more intuitive speech engines, the number of billable minutes will increase beyond present estimations. 

In other words, the easier these devices become to use, the more often people will use them. USA Today has estimated that cell phone use will grow from 105 billion minutes in 1998 to 554 billion minutes in 2004. More minutes mean greater profits. Mark Plakias of the Kelsey Group predicts that speech-related services will reach $41 billion worldwide by 2005, with more than half of that figure generated by consumer-focused applications.

Wireless service providers will be able to add these applications as premium services to their stables of services, creating more customized benefits and greater competitive differentiation. As customization improves, service providers will also decrease their churn rates, retaining ever more satisfied customers for longer periods. Through personalization, users can access information relevant to their own mobile activities. A simple example would be receiving verbal driving directions to a nearby restaurant offering a favorite cuisine. 

OPENING IT UP
To meet these new consumer-driven, network-based challenges, speech recognition hardware systems must be reliable, scalable, carrier-grade, network-ready, and available 24/7.

Open platforms are needed to allow multiple vendors to work in unison towards bringing applications into a deregulated marketplace faster. The Enterprise Computer Telephony Forum, a group of nearly 100 computer telephony companies, is working on such interoperability specifications in order to make certain new applications will be able to work together.

VoiceXML (Voice Extensible Markup Language) is an example of a potential non-proprietary open standard. While VoiceXML itself has yet to be adopted as the industry norm, the World Wide Web Consortium (W3C), the Internet standards organization, is evaluating its prospects for becoming the standard for developing voice Web applications.

The VoiceXML Forum, a consortium of companies led by IBM, Lucent Technologies, Motorola, and AT&T, designed VoiceXML to allow developers to build speech applications that access Web-based information and support multiple platform resources, audio and speech grammar formats, and Uniform Resource Identifier (URI) schemes. All this permits developers to build phone services without having to buy or run equipment.
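
To give a feel for what such a speech application looks like, here is a minimal sketch in Python that generates a one-question VoiceXML 1.0 dialog. The element names (vxml, form, field, prompt, grammar, filled, value) come from the VoiceXML specification; everything else, including the form id, the field name, the prompt wording, and the grammar file flight_numbers.gram, is a hypothetical placeholder chosen for illustration, not taken from any deployed system.

```python
import xml.etree.ElementTree as ET


def build_flight_status_dialog() -> str:
    """Return a small VoiceXML 1.0 document as a string.

    The form id, field name, prompts, and grammar file are
    hypothetical placeholders used only for illustration.
    """
    vxml = ET.Element("vxml", version="1.0")
    form = ET.SubElement(vxml, "form", id="flight_status")

    # A field collects one spoken value from the caller.
    field = ET.SubElement(form, "field", name="flight_number")
    prompt = ET.SubElement(field, "prompt")
    prompt.text = "Please say your flight number."

    # The grammar tells the recognizer what the caller may say.
    # "flight_numbers.gram" is an assumed file name.
    ET.SubElement(field, "grammar", src="flight_numbers.gram")

    # The filled block runs once the recognizer has a result.
    filled = ET.SubElement(field, "filled")
    reply = ET.SubElement(filled, "prompt")
    reply.text = "Looking up flight "
    value = ET.SubElement(reply, "value", expr="flight_number")
    value.tail = "."

    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_flight_status_dialog())
```

The resulting document would be served from an ordinary Web server and fetched by a voice gateway, which is what lets a developer offer a phone service without owning telephony equipment.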

CONCLUSION
In truth, speech recognition technology has been around for years, and historically it hasn’t lived up to its promise. Today’s renewed expectations are owed mainly to the availability of cheap computing power. Recognizing speech requires a high degree of processing power, and as vocabularies grow larger, the processing demands increase. Thankfully, as Moore’s law predicted, chip speed and processing power have increased exponentially, and the promises of speech recognition are now within reach.

Ultimately, thanks to the advances in speech recognition and a safety-minded group of legislatures, the cell phone may become more than a handy little device. It just might emerge as the only handy little device.

Brian Demers is vice president and general manager of NMS Communications’ New Network Solutions group. NMS products are open, accessible, standards-based, layered, network protocol- and operating system-independent, modular and scalable and have been incorporated into technology sold to end users in over 65 countries. 



Speech Rec And ROI: A Healthy Mix In A Sluggish Economy

BY MICHAEL THOMPSON

Unless you’ve been in hiding for the past 10 to 12 months, you’ve probably noticed that the steady stream of new IT projects that flooded your call center a few months ago has ebbed. Poor earnings results have decimated IT spending, freezing any software or infrastructure project that cannot show a clear, measurable return on investment (ROI) with a near-term payback. Enter speech recognition.

Automated speech recognition technology has become a powerful enhancement to leading call centers around the world with ROI metrics that would excite any CFO. Fortune 100 companies such as United Airlines, AT&T, and Federal Express have deployed automated speech systems to help control expanding call center costs while maintaining strong relations with customers. Callers can receive common information or conduct transactions 24 hours a day, 7 days a week simply by speaking into any telephone. 

REAL, TANGIBLE, MEASURABLE RESULTS
Call centers and industry analysts recognize that, on average, an agent-handled call costs five times as much as an automated call. Agents require salary, benefits, training, and management. It costs $6,200 on average to find and hire a new agent. Agent turnover ranges from 20 to 100 percent. What’s more, call centers are growing at 12 percent annually, so the cycle can be quite vicious. Every day skilled agents touch your customers, but they are expensive and often difficult to find. Leading companies recognize they need to automate calls to control costs and to focus their valuable call center agents on high value-add and revenue-generating services rather than on answering routine requests or processing common transactions.

Speech recognition drives automation in the call center. A speech system understands a caller’s request and provides immediate answers — automatically. The user interface is not constrained to the “enter your password and then press #” touchtone systems most customers deplore and often zero out of in order to speak with a live agent. Speech applications encourage users to stay in the automated system because of the ease of use and reliability. Furthermore, speech reduces toll charges. Callers can access their information in the automated environment without sitting in a queue waiting for an agent, while racking up 800-number charges for the company.

For example, United Airlines processes over 100,000 phone calls per day in its speech-activated flight arrival and departure information line. Now agents can focus on selling tickets to paying customers rather than answering mundane arrival and departure inquiries.

Many companies have deployed touchtone IVR systems to drive automation and reduce costs in the call center. But touchtone is an antiquated technology that in most cases has reached its maximum potential. Financial services organizations, which operate the most sophisticated IVR systems in the world, average only 60 percent automation with touchtone systems. When United’s flight information system still ran on touchtone, 30 percent of callers bailed out before getting the information they were looking for, opting to speak with a live representative instead. Since United deployed its speech application, fewer than 10 percent of callers bail out.
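
To put those bail-out rates in terms of agent workload, the short Python sketch below compares how many calls reach a live agent at a 30 percent and a 10 percent bail-out rate. The daily call volume of 10,000 is a hypothetical round number chosen for easy arithmetic, not a figure from United, and 10 percent is used as an upper bound because the reported rate is below it.

```python
def agent_calls(total_calls: int, bailout_rate: float) -> int:
    """Number of calls that end up with a live agent."""
    if not 0.0 <= bailout_rate <= 1.0:
        raise ValueError("bailout_rate must be between 0 and 1")
    return round(total_calls * bailout_rate)


# Hypothetical daily volume, chosen only for illustration.
DAILY_CALLS = 10_000

touchtone = agent_calls(DAILY_CALLS, 0.30)  # 30 percent bail out of touchtone
speech = agent_calls(DAILY_CALLS, 0.10)     # upper bound for the speech system

print(f"Touchtone: {touchtone} calls reach an agent each day")
print(f"Speech:    {speech} calls reach an agent each day")
print(f"Avoided:   {touchtone - speech} agent calls each day")
```

Under these assumptions, the speech system takes at least 2,000 calls a day off the agents’ plates, two thirds of the calls that touchtone was sending their way.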

Leading organizations like America Online, E*Trade, Credit Suisse First Boston, Continental Airlines, and United have invested in speech recognition solutions to continue delivering unsurpassed customer service. In May, it was announced that BBN Technologies had run a trial in which 11,000 calls over five weeks went through both a natural language engine and an improved touchtone IVR (with a standard touchtone IVR as the control). Eighty-two percent of the customers preferred the natural language engine, and agent time saved was double that of the improved touchtone IVR. So speech recognition not only lowers costs but also improves customer satisfaction.

THE BOTTOM LINE
Organizations worldwide recognize that speech delivers rapid return on investment. Industry analyst firm Frost & Sullivan surveyed 100 major call centers in its 2001 Contact Center End User Analysis and found: “When respondents were asked ‘If you could implement one and only one technology in your [contact] center, which one would you choose,’ speech recognition received the highest score.”

Costs vary by application, but the payback period for most companies is 6 to 12 months, and some companies have reported strong ROI in 3 to 4 months. Your CFO will be happy because speech delivers results that you can measure and track, such as increased automation, decreased hold times, and decreased toll charges. And, unlike other IT investments, the savings begin as soon as the system goes live to customers. If your company is looking to control costs and improve customer satisfaction, it is a need-to-have technology for a sluggish economy. The business justification speaks for itself.
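
For readers who want to run the numbers for their own center, here is a rough payback sketch in Python. It relies on the five-to-one cost ratio between agent-handled and automated calls cited earlier; the upfront cost, monthly call volume, and per-call agent cost in the example are hypothetical inputs, not figures from any SpeechWorks deployment.

```python
def payback_months(
    upfront_cost: float,
    calls_automated_per_month: int,
    agent_cost_per_call: float,
    cost_ratio: float = 5.0,
) -> float:
    """Months until cumulative savings cover the upfront cost.

    cost_ratio is how many times more an agent-handled call costs
    than an automated one (the article cites five times).
    """
    automated_cost_per_call = agent_cost_per_call / cost_ratio
    monthly_savings = calls_automated_per_month * (
        agent_cost_per_call - automated_cost_per_call
    )
    if monthly_savings <= 0:
        raise ValueError("automation must save money per call")
    return upfront_cost / monthly_savings


# Hypothetical inputs: a $500,000 system that automates 20,000 calls
# a month, each of which would cost $5.00 if an agent handled it.
months = payback_months(500_000, 20_000, 5.00)
print(f"Payback period: {months:.2f} months")
```

With these assumed inputs the system pays for itself in a little over six months. The sketch ignores ongoing license and maintenance fees as well as toll-charge savings, so treat it as a starting point rather than a full business case.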

Michael Thompson is director, Solutions Marketing for SpeechWorks International, Inc. Learn more about speech recognition and ROI at a free Web seminar by visiting www.speechworks.com/learn/seminars/reruns.cfm.