Amazon Alexa scientists find ways to improve speech and sound recognition

How do assistants like Alexa discern sounds? The answer lies in two Amazon research papers scheduled to be presented at this year’s International Conference on Acoustics, Speech, and Signal Processing (ICASSP) in Brighton, United Kingdom. Ming Sun, a senior speech scientist in the Alexa Speech group, detailed them this morning in a blog post.

“We develop[ed] a way to better characterize media audio by examining longer-duration audio streams versus merely classifying short audio snippets,” he said, “[and] we used semisupervised learning to train a system developed from an external dataset to do audio event detection.”
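Sun’s post names the technique (semi-supervised learning on top of an external labeled dataset) but not its exact recipe. One common instance of that approach is self-training with pseudo-labels, sketched below in Python; the pooled-feature embeddings, the logistic-regression classifier, and the 0.95 confidence threshold are illustrative assumptions, not details from the paper.

```python
# A minimal self-training (pseudo-labeling) sketch for audio event detection.
# Everything here is a stand-in: a real system would use learned audio
# embeddings and a neural classifier rather than random vectors.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins for clip embeddings (e.g., pooled log-mel features):
# a small labeled set from an external dataset plus a large unlabeled pool.
X_labeled = rng.normal(size=(200, 64))
y_labeled = rng.integers(0, 2, size=200)      # 1 = target audio event present
X_unlabeled = rng.normal(size=(5000, 64))

model = LogisticRegression(max_iter=1000)
for _ in range(3):                            # a few self-training rounds
    model.fit(X_labeled, y_labeled)
    proba = model.predict_proba(X_unlabeled)
    confident = proba.max(axis=1) > 0.95      # keep only confident pseudo-labels
    X_labeled = np.vstack([X_labeled, X_unlabeled[confident]])
    y_labeled = np.concatenate([y_labeled, proba[confident].argmax(axis=1)])
    X_unlabeled = X_unlabeled[~confident]
    if len(X_unlabeled) == 0:
        break
```

Each round, the model labels the unlabeled clips it is most confident about and folds them into its training set, which is how a detector bootstrapped from an external dataset can grow into new, unlabeled audio.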


The first paper addresses the problem of media detection: recognizing when voices captured by an assistant’s microphone originate from a TV or radio rather than from a live human speaker. To tackle this, Sun and his colleagues devised a machine learning model that identifies characteristics common to media sound, regardless of content, and uses them to distinguish that audio from live speech.
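The article doesn’t describe the model’s internals, but the longer-duration framing from the first paper suggests a simple picture: score short snippets, then smooth those scores over a longer window, since TV and radio audio tends to stay “media-like” for sustained stretches while live speech comes and goes in bursts. Below is a minimal Python sketch under those assumptions; the snippet_scores function is a hypothetical placeholder (random numbers here), not Amazon’s classifier, and the window and threshold defaults are arbitrary.

```python
# Illustrative media-vs-speech decision over a longer audio stream.
import numpy as np

def snippet_scores(snippets: np.ndarray) -> np.ndarray:
    """Placeholder per-snippet 'media' probability (random stand-in for a
    trained classifier operating on short audio snippets)."""
    rng = np.random.default_rng(0)
    return rng.uniform(size=len(snippets))

def is_media(stream: np.ndarray, sr: int = 16000, snippet_s: float = 1.0,
             window: int = 10, threshold: float = 0.7) -> bool:
    """Decide media vs. live speech from a longer stretch of audio by
    smoothing per-snippet scores over `window` consecutive snippets."""
    hop = int(snippet_s * sr)
    snippets = [stream[i:i + hop] for i in range(0, len(stream) - hop + 1, hop)]
    scores = snippet_scores(np.array(snippets))
    # Moving average: media audio should keep a high score for a sustained
    # stretch, while brief live speech produces only short-lived spikes.
    smoothed = np.convolve(scores, np.ones(window) / window, mode="valid")
    return bool(smoothed.max() > threshold)

# Example with 30 seconds of dummy audio:
print(is_media(np.zeros(16000 * 30)))
```

Smoothing over a window is a crude analogue of the paper’s longer-duration idea: the decision comes from a sustained stretch of audio rather than from any single short snippet.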