Microsoft’s internal Tellme team is working on building speech recognition into the Redmond-based firm’s portfolio of software and hardware products. Microsoft will build the feature into its new Windows 8 operating system, its Bing search engine, Windows Phone, Kinect and Xbox, Azure and other products, ZDNET has learned. We already know Microsoft’s upcoming Windows Phone Mango release will offer voice-to-text and text-to-voice functionality, but Tellme senior director of sales and marketing Ilya Bukshteyn told ZDNET’s Mary Jo Foley that the HTML5 speech tag will allow Microsoft to develop Windows 8 applications that are “speech capable.” The Tellme team’s technology can also take conversational speech, query your social networks and create appointments. For example, one might say “I’m meeting Zach Epstein for sushi in Philadelphia on Wednesday,” and the voice-recognition tech can pull “Zach Epstein” from LinkedIn or Facebook, set up a calendar event and search for sushi in Philadelphia using Bing. Of note, it looks like we’re still several years away from seeing devices capable of deciphering natural conversation. Read on for more information.
Microsoft’s Tellme team recently posted the following explanation of its speech recognition work on its official blog:
We see a future where the service will know you: know your intent, your social and business connections, your likes and dislikes, your privacy preferences, and the things that define the context that’s important to you. The result will be a speech NUI service that helps you accomplish everyday tasks in a more natural and conversational manner. This service will simplify tasks that used to be tedious or impossible on a TV or other device, by combining an understanding of language and intent with a deep knowledge of you, the user. We envision a future where we build on the experiences we deliver today with Kinect for Xbox 360, Windows Phone, or Bing for iPad or iPhone apps, by enhancing the speech NUI experience to understand more layers of context: what you are doing, where you are doing it, the kinds of devices you are using and your historical preferences. Because this is a cloud-based service, your interactions will be able to persist over time, enabling you to pick up where you left off, regardless of what device you may be using.