One of the drawbacks of relying on a voice-based digital assistant like Alexa is that you may be interacting with it without a screen or any other visual element. That means much of the interaction depends on your memory of what the assistant’s software can actually do, and of the commands required to trigger those capabilities. A few new features for Amazon’s Alexa rolling out today in the US are intended to help Alexa slowly get better at understanding what you mean when you talk to her.
More specifically, machine learning running in the background is letting Alexa teach herself to better interpret what you’re really trying to say.
These new features enable the kind of interactions that let you speak more naturally to Alexa devices. For example, Amazon is expanding so-called “name-free interactions” today in the US, as well as in the UK, Canada, Australia, India, Germany and Japan. What’s a name-free interaction? It’s being able to say something like, “Alexa, get me a car” instead of having to name a specific ride-sharing service.
Such interactions are also being expanded to smart home-related skills in the US. Now you can simply ask Alexa to perform a task like, say, cleaning, without having to name a specific skill. Alexa will just figure out which skill you mean.
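To make the idea concrete, here is a minimal sketch of how name-free routing can work: candidate skills report whether they could fulfill a request, and a router picks the most confident one. The skill names, intents and scores below are invented for illustration, and this is not Amazon’s actual routing logic; in the real Alexa Skills Kit, developers opt skills into name-free interaction through the CanFulfillIntentRequest interface.

```python
# Toy illustration of name-free skill routing: given an utterance, each candidate
# skill reports whether (and how confidently) it could fulfill the request, and
# the router picks the best candidate. Skill names and scores are invented.

from dataclasses import dataclass

@dataclass
class Candidate:
    skill: str          # hypothetical skill name
    can_fulfill: bool   # could this skill handle the request at all?
    confidence: float   # how well the utterance matches the skill's intents

def ask_candidates(utterance: str) -> list[Candidate]:
    """Stand-in for asking each skill 'can you fulfill this?' with no skill named."""
    if "car" in utterance:
        return [
            Candidate("RideShareA", can_fulfill=True, confidence=0.92),
            Candidate("RideShareB", can_fulfill=True, confidence=0.88),
            Candidate("MusicPlayer", can_fulfill=False, confidence=0.05),
        ]
    if "clean" in utterance:
        return [
            Candidate("RobotVacuum", can_fulfill=True, confidence=0.95),
            Candidate("SmartMop", can_fulfill=True, confidence=0.40),
        ]
    return []

def route(utterance: str) -> str | None:
    """Pick the most confident skill that says it can fulfill the request."""
    viable = [c for c in ask_candidates(utterance) if c.can_fulfill]
    return max(viable, key=lambda c: c.confidence).skill if viable else None

print(route("Alexa, get me a car"))   # -> RideShareA
print(route("Alexa, start cleaning")) # -> RobotVacuum
```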
Ruhi Sarikaya, Alexa AI director of applied science, explains in a developer blog post published today what is involved in helping Alexa make better sense of the world by reading context clues. “There has been remarkable progress in conversational AI systems this decade, thanks in large part to the power of cloud computing, the abundance of the data required to train AI systems, and improvements in foundational AI algorithms,” he writes. “Increasingly, though, as customers expand their conversational-AI horizons, they expect Alexa to interpret their requests contextually; provide more personal, contextually relevant responses; expand her knowledge and reasoning capabilities; and learn from her mistakes.”
Alexa is increasingly relying on a growing set of contextual signals, he continues, to “resolve ambiguity.” These range from personal context, which covers historical activity and preferences, to the context of the current session and physical context, such as whether the device is in a home, car, hotel or office.
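As a rough illustration of what resolving ambiguity with contextual signals might look like, the sketch below scores two interpretations of an ambiguous smart home request using personal, session and physical signals. The signals, weights and candidate interpretations are hypothetical; Amazon has not published the details of its ranking model.

```python
# Toy sketch of using contextual signals to resolve an ambiguous request.
# The signals, weights and candidates are invented for illustration.

def resolve(utterance: str, context: dict) -> str:
    """Score candidate interpretations of an ambiguous 'turn on the lights'
    request using personal, session and physical context, and return the best."""
    candidates = {
        "living_room_lights": 0.0,
        "bedroom_lights": 0.0,
    }
    # Personal context: which light the user controls most often.
    if context.get("most_used_light") in candidates:
        candidates[context["most_used_light"]] += 0.5
    # Session context: a room mentioned earlier in the same conversation.
    if context.get("last_mentioned_room") == "bedroom":
        candidates["bedroom_lights"] += 0.8
    # Physical context: the room the Echo device itself is assigned to.
    if context.get("device_room") == "living room":
        candidates["living_room_lights"] += 0.6
    return max(candidates, key=candidates.get)

print(resolve("Alexa, turn on the lights", {
    "most_used_light": "living_room_lights",
    "last_mentioned_room": None,
    "device_room": "living room",
}))  # -> living_room_lights
```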
“Earlier this week we launched in the U.S. a new self-learning system that detects the defects in Alexa’s understanding and automatically recovers from these errors,” Ruhi writes. “This system is unsupervised, meaning that it doesn’t involve any manual human annotation; instead, it takes advantage of customers’ implicit or explicit contextual signals to detect unsatisfactory interactions or failures of understanding. The system learns how to address these issues and automatically deploys fixes to our production systems shortly after.”
During the beta phase of this work, Alexa learned to associate someone saying “Play Good for What” with the understanding that the user actually wanted Alexa to play Drake’s song “Nice for What.”
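The mechanics of that kind of self-learning can be sketched roughly as follows: treat a quick rephrase that ends in a satisfying result as implicit evidence that the original request was misunderstood, and promote a rewrite rule once enough evidence accumulates. The interaction log, failure heuristic and threshold below are invented for illustration; Amazon’s production system is unsupervised at far larger scale and is not described in this level of detail.

```python
# Toy sketch of unsupervised self-learning from implicit feedback, in the
# spirit of the system described above. Data and threshold are made up.

from collections import Counter, defaultdict

# Each interaction: (what Alexa acted on, what the user immediately rephrased
# it to, whether the rephrased request was then played to completion).
interactions = [
    ("play good for what", "play nice for what by drake", True),
    ("play good for what", "play nice for what by drake", True),
    ("play good for what", "play nice for what by drake", True),
    ("play the news", "play the news", True),  # no rephrase, no problem
]

# Count how often a failed request is followed by the same successful rephrase.
rewrite_counts = defaultdict(Counter)
for heard, rephrased, satisfied in interactions:
    if heard != rephrased and satisfied:
        rewrite_counts[heard][rephrased] += 1

# Promote a rewrite rule once the implicit evidence passes a (made-up) threshold.
THRESHOLD = 3
rewrites = {
    heard: candidates.most_common(1)[0][0]
    for heard, candidates in rewrite_counts.items()
    if candidates.most_common(1)[0][1] >= THRESHOLD
}

print(rewrites.get("play good for what"))
# -> "play nice for what by drake"
```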
Ruhi’s post concludes with this: “We’re on a multiyear journey to fundamentally change human-computer interaction. It’s still Day 1, and not unlike the early days of the Internet, when some suggested that the metaphor of a market best described the technology’s future. Nearly a quarter-century later, a market segment is forming around Alexa, and it’s clear that for that market segment to thrive, we must expand our use of contextual signals to reduce ambiguity and friction and increase customer satisfaction.”