Alexa’s recent and, thankfully, brief habit of breaking into unprompted maniacal laughter freaked out some users and made for a good story. Turns out the devices thought they heard the prompt “Alexa, laugh.” (The company disabled that prompt.)
A funny story, but one that neatly encapsulates the state of voice tech. It’s cool, and it kind of works, but sometimes it doesn’t.
With sales of voice devices predicted to hit over 50 million this year, voice is being touted as the biggest consumer tech disruption since the smartphone. And where consumer tech goes, enterprise follows, as we saw with BYOD, social media, and tablets. The global voice tech industry is expected to reach US$126.5 billion by 2023.
Businesses are taking note. Last January, JPMorgan Chase hired VaynerMedia as its agency of record for voice technology to help the finance giant set up its customer voice strategy. In late 2017, Amazon introduced its Alexa for Business service, which uses Alexa devices and workplace software. Your future employees will have used voice tech since kindergarten. The voice revolution is coming.
Enterprise might be lagging behind consumer voice tech adoption, but 2018 could very well be the year when voice begins to make itself felt in workplaces across the world. Why should executives care? One word: productivity. The ability of computers to convert voice to text using techniques like machine learning has quietly reached near-perfect accuracy. A study by researchers from Stanford University, the University of Washington, and Baidu USA found that voice input was nearly three times faster than typing and that the error rates of the two types of input were nearly indistinguishable.
Further, voice is emerging as a powerful enabler for two other technologies that are hovering around the edges of the enterprise: augmented reality (AR) and virtual reality (VR). AR-equipped glasses have already made inroads into places like the warehouse, where a peek into the upper corner of a lens lets pickers find packages while leaving their hands free to work faster. Companies are already adding voice to the picture, ratcheting up productivity even further. Mixed reality apps—a combo meal of voice, AR, and VR, if you will—will hit $9 billion by 2022, according to Juniper Research, and most of that will be driven by voice.
So voice isn’t just for laughs anymore. But how far will it go in business and how will those changes manifest themselves? Will we say goodbye to keyboards, ciao to paper, never again to resetting a password?
There are still some pretty serious barriers to adoption that need to be tackled before voice becomes truly integrated into a business environment. And while spontaneous evil laughter can be funny, the room for error in consumer products simply does not exist in the enterprise sphere. That’s why it’s important to look at voice’s promise and challenges now, before unprompted hilarity causes a business disaster.
The Impending Enterprise Voice Tech Tsunami
Recent advances in voice tech mean that it can enable a more natural way to interact with computers and machines. Most people are already used to chatbots. Improvements in machine learning, artificial intelligence, and natural language processing all lead to voice. And its potential for improving accessibility for those with disabilities is enormous (see “Voicing a New Level of Accessibility”).
Enterprise technology trails consumer tech by about 18 months, says Mark Plakias, ex-vice president of knowledge transfer at telecom company Orange Silicon Valley, meaning voice is on track to hit offices this year. “The technology is going to continue to improve, the algorithms will improve, and there will be more functionality shipping with these devices because there will be more third-party apps,” he says.
Voice won’t replace all the other technologies you’re already using; rather it will likely be an add-on. The future of user experience will be multimodal, involving a combination of screens, AR, VR, voice, chat, stylus pens, and gestures.
An example is giving directions—a combination of text, voice, and visual works best, with visual leading the cast as the primary interface. How best to combine the various elements should be a case-by-case decision, says Alexander Rudnicky, a professor at Carnegie Mellon University and part of its Speech Group and Language Technologies Institute. The key is to ensure, as with giving directions, that you’re making the best choice of primary and secondary interfaces for each scenario. “Certain people like me need to always step back and think about what it is that the human actually needs in this situation rather than what sounds cool.”
Not every situation calls for the full menu of interface options, however. In cases where users are choosing between reading and voice, chatter won’t always win out. In some circumstances, reading will still be more efficient or convenient; in others, voice will be. For example, visually scanning an e-mail inbox is still the best way to figure out what’s important—does anyone want to listen to every e-mail? But voice might be the tool of choice when responding. The ability to switch between the two modes could help tame the e-mail beast.
As with any new technology in the workplace, there will be a period in which employees build trust that it is reliable, a help and not a hindrance. Voice comes with additional complications: How comfortable will employees be speaking an important memo to a voice-to-text application instead of typing it? Perhaps not very, at first. And especially not in front of colleagues.