Voice-based artificial intelligence assistants can understand human speech better than ever. But while AI can decipher our words and context to deliver fast results and actions, it still doesn’t understand tone or feelings. Researchers from MIT set out to change that, creating a wearable that can detect the tone of a conversation. In the future, the tool might help people with anxiety or Asperger’s syndrome better handle stressful situations.
Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Institute for Medical Engineering and Science (IMES) published their findings on an AI-based wearable system that correlates audio and vital-sign data to act as a social coach for the wearer.
The system analyzes not only the audio of a story but also text transcriptions and physiological signals to determine its tone. It detects the overall tone of a story with 83% accuracy, and can even produce a sentiment score using deep-learning techniques.
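To make the multimodal idea concrete, here is a minimal sketch of how three input streams like those described above might be fused into a single tone score. This is not MIT’s published model; every dimension, layer, and name below is an illustrative assumption.

```python
# A minimal sketch (not MIT's actual architecture) of fusing the three
# input streams the article describes: audio features, text features,
# and physiological signals. All dimensions are illustrative.
import torch
import torch.nn as nn

class ToneClassifier(nn.Module):
    def __init__(self, audio_dim=40, text_dim=300, physio_dim=8, hidden=64):
        super().__init__()
        # One small encoder per modality, fused by concatenation.
        self.audio_enc = nn.Linear(audio_dim, hidden)
        self.text_enc = nn.Linear(text_dim, hidden)
        self.physio_enc = nn.Linear(physio_dim, hidden)
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(3 * hidden, 2),  # two classes: positive / negative
        )

    def forward(self, audio, text, physio):
        fused = torch.cat(
            [self.audio_enc(audio), self.text_enc(text), self.physio_enc(physio)],
            dim=-1,
        )
        return self.classifier(fused)

model = ToneClassifier()
logits = model(torch.randn(1, 40), torch.randn(1, 300), torch.randn(1, 8))
sentiment_score = torch.softmax(logits, dim=-1)[0, 1].item()  # P(positive tone)
```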
“The system picks up on how, for example, the sentiment in the text transcription was more abstract than the raw accelerometer data,” MIT graduate student Tuka Alhanai said. “It’s quite remarkable that a machine could approximate how we humans perceive these interactions, without significant input from us as researchers.”
Rather than analyzing just the contents of speech, the system records other data with the help of a wearable device, including movement and heart rate, to assess the overall tone of the discussion.
The algorithm found that long pauses and monotonous vocal tones were associated with sadder stories. Similarly, increased fidgeting and cardiovascular activity, as well as certain hand postures, like putting one’s hands on one’s face, were strongly associated with sad stories.
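As a rough illustration of how cues like these could be measured, the sketch below derives pause, pitch-variance, and fidgeting proxies from raw signals. The inputs, thresholds, and function names are assumptions for illustration, not the features from the paper.

```python
import numpy as np

def simple_cues(audio, pitch_hz, accel, energy_thresh=0.01):
    """Crude per-story cues mirroring the associations described above.

    audio:    1-D waveform samples (hypothetical input)
    pitch_hz: per-frame pitch estimates, e.g. from a pitch tracker
    accel:    (n, 3) accelerometer samples from the wearable
    """
    # Fraction of low-energy frames -> proxy for long pauses.
    frames = np.array_split(audio, max(1, len(audio) // 800))
    energy = np.array([np.mean(f ** 2) for f in frames])
    pause_ratio = np.mean(energy < energy_thresh)

    # Low pitch variance -> proxy for a monotonous vocal tone.
    pitch_variance = np.var(pitch_hz)

    # Mean movement magnitude between samples -> proxy for fidgeting.
    fidgeting = np.mean(np.linalg.norm(np.diff(accel, axis=0), axis=1))

    return {"pause_ratio": pause_ratio,
            "pitch_variance": pitch_variance,
            "fidgeting": fidgeting}

# Hypothetical usage with synthetic signals:
audio = np.random.randn(16000 * 30) * 0.05          # 30 s of quiet waveform
pitch = 120 + np.random.randn(600) * 5              # fairly monotone pitch track
accel = np.cumsum(np.random.randn(3000, 3), axis=0) * 0.001
print(simple_cues(audio, pitch, accel))
```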
The researchers used Samsung Simband research wearables to collect the data behind the algorithm. Currently, the algorithm classifies the mood of each five-second interval with an accuracy that is 18% above chance and 7.5% better than existing approaches.
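The paper’s interval pipeline isn’t spelled out in the article, but a five-second windowing step like the one it mentions might look like the following sketch; the `predict` callback and sample rate are placeholders standing in for a trained model and a real sensor stream.

```python
import numpy as np

def classify_intervals(signal, sr, predict, window_s=5.0):
    """Split a recording into five-second chunks and label each one.

    `predict` stands in for any trained per-interval model; here it only
    needs to map a chunk of samples to a label string.
    """
    step = int(window_s * sr)
    return [predict(signal[i:i + step])
            for i in range(0, len(signal) - step + 1, step)]

# Hypothetical usage with a dummy model that thresholds loudness:
labels = classify_intervals(
    np.random.randn(16000 * 60), 16000,
    predict=lambda w: "positive" if np.mean(w ** 2) > 1.0 else "negative",
)
```

Labeling fixed windows rather than whole conversations is what would let a coaching app give feedback in near real time instead of only after the fact.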
However, the system is not yet ready to be deployed for social coaching. It currently labels interactions only as positive or negative; the goal is for the AI to determine tone at a finer grain and identify boring, tense, or exciting moments.
Interestingly, the researchers say that all the computing required to analyze the tone of a conversation happens locally on the device to protect privacy. However, a consumer version would also need clear protocols for obtaining consent from the other people involved in the conversation, not just the wearer.
A video showing MIT’s invention is below.