The Reality of the Siri Voice Recognition System

Written by Randall | Oct 11, 2011 3:13:36 PM

It is easy to discount Siri as just another voice recognition application, albeit a rather good one. But Siri is an artificial intelligence infrastructure that combines continual learning and contextual awareness systems. It is the definition of true synergy, i.e. “two or more things functioning together to produce a result not independently obtainable”. None of the individual parts is "new", but the combination Siri created has never really been seen before.

With Siri, Apple is using the results of over 40 years of research funded by DARPA (https://www.darpa.mil/; Siri Inc. was a spin-off of SRI International) through the Personalized Assistant That Learns program (PAL, https://pal.sri.com) and the Cognitive Agent that Learns and Organizes program (CALO).

The research involved teams from Carnegie Mellon University, the University of Massachusetts, the University of Rochester, the Institute for Human and Machine Cognition, Oregon State University, the University of Southern California, and Stanford University. The technology has come a very long way in dialog and natural language understanding, machine learning, evidential and probabilistic reasoning, ontology and knowledge representation, planning, reasoning and service delegation.

Siri started in 1966, when SRI International was tasked by the Defense Department with the “development of computer capabilities for intelligent behavior in complex situations”. Earlier forms of voice recognition and AI failed at a number of break points. The primary ones were a lack of computational power and the lack of a workable model for an operable system. Moore’s Law, the Internet and Apple have delivered the computing horsepower, and some 40 years of university research have delivered the other part: Siri. Siri focuses on the three important points for this technology: Conversational Interface, Personal Context Awareness and Service Delegation.

Siri will become the fourth, and perhaps the most important, way to interact with devices. The keyboard, mouse and gestures are not going away. However, humans usually interact in an even flow of questions and answers, most effectively by speaking. Having to reach for a device and physically compose a question is a huge barrier for most simple questions. The old way of shaping just the right question to get just the right answer in a search field is also not going away anytime soon. But asking a device for a quick answer, in just the same manner you would ask a librarian or perhaps a friend, will become very, very powerful.
