Thursday, February 18, 2010

Military seeks one bad-ass universal speech translator

DARPA wants technology that can listen to speech, detect key words and identify the speaker in degraded communication lines.

By Layer 8

The military and other parts of the government have long sought technology that can listen to spoken words, translate them if necessary and identify the speaker's voice. That's the general idea behind a system the experimental researchers at the Defense Advanced Research Projects Agency want to develop.

Specifically, DARPA is looking to build technology that can accurately (99% of the time) listen to speech patterns, detect key words, and identify the language and speaker in highly degraded, weak or noisy communication channels. The system, called Robust Automatic Transcription of Speech (RATS), will be composed of a variety of software and speech processing algorithms that specialize in Arabic, Farsi, Pashto, Dari and Urdu, DARPA said.

Existing transcription, translation and speech signal processing technologies are insufficient for working with noisy or degraded speech signals, DARPA stated. No current technology effectively handles this kind of noisy and distorted speech, so operational units are forced to devote significant human resources to the task. Operators frequently work "in the dark," searching blindly over thousands of possible channels at any given moment without prior knowledge of the quality, relevance or language of the signal, if there is any signal at all, the agency said.

The agency says the government has "a compelling need for reliable, relevant information to directly support intelligence gathering in the field, to inform military decision makers and respond to national security requirements." The idea is that real-time language translation technology will help US forces better understand adversaries and the overall social and political context of particular situations. This improved awareness will reduce costly mistakes caused by misunderstandings, and of course let us in on nefarious activities.

The RATS system will focus on these areas:

  • Speech Activity Detection: The ability to determine whether a signal is actual speech or just background noise or music.
  • Language Identification: Once a signal is determined to be actual speech, LID is the capability that identifies the language being spoken.
  • Speaker Identification: Once a signal is identified as actual speech, SID is the capability that determines whether the person speaking is one of the people on a list of wanted speakers.
  • Key Word Spotting: Once a signal is identified as actual speech, KWS is the capability to identify specific words or phrases from a list of items in the language being spoken.
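The four capabilities form a cascade: speech activity detection acts as a gate, and LID, SID and KWS only need to run on segments it has marked as speech. The minimal Python sketch below illustrates two of those stages using deliberately simple textbook techniques, not anything DARPA has specified for RATS: SAD as a short-time energy threshold above an estimated noise floor, and SID as open-set matching of a voice embedding against a watch list using cosine similarity plus a rejection threshold. It assumes some upstream acoustic model has already reduced each utterance to a fixed-length embedding. The function names, the 160-sample frame (20 ms at the 8 kHz rate typical of telephone-grade audio), the 10 dB margin and the 0.8 threshold are all hypothetical choices for illustration. A plain energy threshold like this tends to break down on exactly the weak, noisy channels RATS targets.

```python
import math
from typing import Dict, List, Optional, Sequence


def frame_energies_db(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean-square energy of each non-overlapping frame, in decibels."""
    energies = []
    # Trailing samples that do not fill a whole frame are ignored.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(power + 1e-12))  # epsilon avoids log10(0)
    return energies


def speech_activity(samples: Sequence[float], frame_len: int = 160,
                    margin_db: float = 10.0) -> List[bool]:
    """Hypothetical energy-based SAD: flag frames well above the noise floor."""
    energies = frame_energies_db(samples, frame_len)
    if not energies:
        return []
    # Crude assumption: the quietest frame in the buffer is background noise only.
    noise_floor = min(energies)
    return [e > noise_floor + margin_db for e in energies]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(embedding: Sequence[float],
                     watch_list: Dict[str, Sequence[float]],
                     threshold: float = 0.8) -> Optional[str]:
    """Hypothetical open-set SID: best watch-list match, or None if nobody scores high enough."""
    best_name, best_score = None, -1.0
    for name, reference in watch_list.items():
        score = cosine(embedding, reference)
        if score > best_score:
            best_name, best_score = name, score
    # Rejecting low scores is what lets the system answer "not on the list".
    return best_name if best_score >= threshold else None


if __name__ == "__main__":
    noise = [0.001, -0.001] * 160  # two quiet frames (about -60 dB)
    tone = [0.5, -0.5] * 80        # one loud frame (about -6 dB)
    print(speech_activity(noise + tone))  # [False, False, True]

    # Toy 3-dimensional "voice embeddings"; real ones would come from an acoustic model.
    watch_list = {"speaker_a": [1.0, 0.0, 0.0], "speaker_b": [0.0, 1.0, 0.0]}
    print(identify_speaker([0.9, 0.1, 0.0], watch_list))  # speaker_a
    print(identify_speaker([0.0, 0.0, 1.0], watch_list))  # None
```

Because `speech_activity` takes its noise floor from the quietest frame in the buffer, it assumes every buffer contains some silence; a fielded system would more likely track the noise floor over time. Likewise, the rejection threshold in `identify_speaker` is the knob that trades missed wanted speakers against false alarms on everyone else.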

DARPA says the RATS program is currently planned as a three-phase effort, with Phase 1 lasting 18 months and Phases 2 and 3 lasting 12 months each.

Language translation software is big business. For example, researchers at Raytheon BBN have taken in more than $30 million from DARPA over the past few years to fill out the agency's Global Autonomous Language Exploitation (GALE) program. The goal of GALE is to translate and distill foreign-language material (television shows and newspapers) in near real time, highlight salient information, and store the results in a searchable database, all with more than 90% accuracy by the end of the program. Through this process, GALE would help US analysts recognize critical information in foreign languages quickly so they could act on it in a timely fashion.

Also with DARPA funding, BBN is working on the Multilingual Automatic Document Classification, Analysis and Translation (MADCAT) program, which aims to build a system that quickly provides relevant, distilled, actionable information to military commands and personnel by automatically converting images of foreign-language text into English transcripts, with high accuracy and without the use of linguists and analysts.

BBN is also developing a prototype machine reading system that transforms prose into knowledge that can be interpreted by an artificial intelligence application. The prototype is part of DARPA's Machine Reading Program (MRP), which aims to develop systems that can capture knowledge from naturally occurring text and transform it into the formal representations used by AI reasoning systems.

There have been other DARPA-backed speech recognition systems. For example, the Phraselator is a PDA-like device that was developed for American soldiers in Afghanistan and Iraq to communicate with locals who spoke Farsi, Dari, Pashto and other languages.