Seeing What We Say: Improving Siri And Alexa

Doug Clinton, January 6, 2017

Amazon, Apple, Artificial Intelligence, Future, Google

We’ve been talking a lot about digital assistants lately. They were a big theme at CES, and a recent survey of ours showed that US consumers view digital assistants as the fourth most frustrating tech product, behind devices, poor Internet service, and automated-telephone systems. Here’s a look at how we might improve digital assistants in the future.

Humans are non-verbal communicators by nature. Almost 60% of human-to-human communication is through body language, but our current natural language interfaces use only voice. This means robot assistants miss roughly 60% of the information we send them. How often do you say thanks to Siri or Alexa after you get the right answer? How often do you curse at them when you get a wrong one? Now, how often do you nod your head when Siri or Alexa gives you the right answer? How often do you scrunch up your face in anger when they give you a wrong one?

The most obvious answer to this problem would seem to be some form of computer vision. This would solve part of the body language problem, since the digital assistant could see any obvious gestures we make in response to its answers, but gestures are not all the device would need to know. The assistant would also need to know who’s talking when there are multiple people in a room, and what the speaker’s facial expressions mean in the context of the answer. You might frown at bad news, even if it was the correct answer to your question. You might smile at the hilarity of a wrong answer. This means the robot needs to build a model of what a human may interpret as good, bad, or tied to some other emotion, and that model must be specific to each user. Good and bad are subjective to the individual; politics is a dangerous example.

Another potential way to help digital assistants interpret body language might be to connect them with sensors on your body. Sensors could address one weakness of computer vision solutions: we aren’t always in the robot’s line of sight. Some of these sensors are already built into watches and advanced fitness trackers and detect biomarkers like changes in heart rate or blood pressure. A rise in blood pressure might signify anger at a wrong response. A drop in body temperature might imply sadness.

Both of these solutions raise the privacy question. Are we comfortable allowing our robot assistants to see us and our physical data? Privacy tends to be a point of contention with every evolution of technology. It was an issue for Facebook as it grew to be indispensable for over a billion users. It was an issue for Google Glass, the most recognizable wearable with a camera. Our belief is that we already live in a post-privacy world. One of the key trade-offs we make for the convenience of many of today’s technologies is giving up privacy. We trade privacy for the benefit of connecting with people on social platforms. We trade privacy for better recommendations in search and shopping. Yes, there will be some noise about the intrusion on privacy that comes with incorporating body language and body data into digital assistants, but we expect that concern to go about as far as it did with Facebook.

The bottom line is this: adding the ability to read and interpret body language would result in a step-function change in our experiences with digital assistants. Incorporating body language is a crucial step toward creating robots that can truly understand humans, allowing them to perform complex, human-like tasks. We view this as a complex and extremely interesting AI and robotics opportunity, and a problem we need to solve as we pursue The Future Perfect.

Disclaimer: We actively write about the themes in which we invest: virtual reality, augmented reality, artificial intelligence, and robotics. From time to time, we will write about companies that are in our portfolio. Content on this site including opinions on specific themes in technology, market estimates, and estimates and commentary regarding publicly traded or private companies is not intended for use in making investment decisions. We hold no obligation to update any of our projections. We express no warranties about any estimates or opinions we make.