MSR CORE12 Project:
Evolution Strategy Based Design of Low-Power and High Performance Compact Hardware Speech Sensors
Takahiro Shinozaki, Kai Zhu, Boyu Qian, Jian Wang, Yi Liu
Project Goal
Developing a speech recognition system suitable for hardware implementation in speech sensors.
Motivation
In daily life, we often want to control electric devices such as an audio player
or a lamp, find a small item such as a wallet or eyeglasses, or notice an event such as a baby
crying or a dog barking. Doing so, however, can mean interrupting what you are doing to walk across
a room, spending time on a search, or, for some people, depending on someone else's help.
These problems can be solved if tiny, energy-efficient speech sensors are ubiquitously embedded in our
living environment. Moreover, such speech sensors would have wide applications in toys, education, safety, and security.
Problems
The sensors must be very small so that they can be attached to various things.
Their energy consumption must be minimal, since they must run continuously on a tiny energy source so that
they can react to a voice at any time. They must also be robust to noise, since they are used in noisy environments
at a distance from the user, where the signal-to-noise ratio (SNR) is low.
Approach
*Combine deep neural network (DNN) based speech feature extraction with template-based matching
By training a DNN on a large amount of data from many speakers,
robust speech features can be obtained.
With these DNN features, accurate speech recognition can be achieved with template matching.
The DNN is first trained and optimized on fast computers and then implemented in the speech sensor hardware.
Any keyword or speaker can be detected immediately just by registering a template with the sensor.
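As a rough illustration of template matching on DNN features, the sketch below compares a registered template with an incoming utterance using dynamic time warping (DTW). DTW is a common template matching technique, but the project description does not say which matching method is actually used, so this is an assumption. The function names, the Euclidean frame distance, and the threshold are likewise hypothetical, and each feature sequence is assumed to be a list of per-frame feature vectors already produced by the DNN.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two per-frame feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(template, utterance):
    """Length-normalized DTW distance between two feature sequences.

    Both arguments are lists of feature vectors (assumed DNN outputs).
    """
    n, m = len(template), len(utterance)
    inf = float("inf")
    # cost[i][j] = best accumulated distance aligning the first i template
    # frames with the first j utterance frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(template[i - 1], utterance[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # template frame repeated
                                 cost[i][j - 1],      # utterance frame repeated
                                 cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m] / (n + m)


def detect_keyword(template, utterance, threshold):
    """Report a detection when the utterance is close enough to the template.

    The threshold is a hypothetical tuning parameter.
    """
    return dtw_distance(template, utterance) <= threshold
```

Under these assumptions, registering a new keyword only requires storing one more feature sequence as a template; the DNN itself does not need to be retrained.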
*Apply evolution strategy based system optimization
Many design factors of the speech recognition system must be jointly optimized to realize
tiny and energy-efficient speech sensors.
Since the design factors influence each other, the optimization is a very complex problem.
To solve it, we apply evolutionary algorithms to optimize the design factors.
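To make the idea concrete, here is a minimal sketch of a (1+λ) evolution strategy that jointly tunes two interacting design factors. It is not the project's actual optimizer: the design factors (`hidden_units`, `bit_width`), the `toy_cost` function standing in for measured recognition error plus power consumption, and all parameter values are hypothetical and chosen only for illustration.

```python
import random


def toy_cost(design):
    """Hypothetical cost: recognition error plus power consumption.

    Both terms depend on the product of the two factors, so neither
    factor can be tuned independently of the other.
    """
    hidden_units, bit_width = design
    resources = hidden_units * bit_width
    error = 1.0 / (1.0 + resources)   # fewer resources, more errors
    power = 0.001 * resources         # more resources, more power
    return error + power


def mutate(design, sigma, rng):
    """Perturb every design factor with Gaussian noise, keeping it >= 1."""
    return tuple(max(1.0, x + rng.gauss(0.0, sigma)) for x in design)


def evolve(cost, initial, generations=50, offspring=8, sigma=2.0, seed=0):
    """(1+lambda) evolution strategy: keep the parent unless a child is at least as good."""
    rng = random.Random(seed)
    parent, parent_cost = initial, cost(initial)
    for _ in range(generations):
        children = [mutate(parent, sigma, rng) for _ in range(offspring)]
        best = min(children, key=cost)
        best_cost = cost(best)
        if best_cost <= parent_cost:
            parent, parent_cost = best, best_cost
    return parent, parent_cost


if __name__ == "__main__":
    start = (64.0, 16.0)
    design, final_cost = evolve(toy_cost, start)
    print("initial cost:", toy_cost(start))
    print("evolved design:", design, "cost:", final_cost)
```

In a real setting, each cost evaluation would involve training and evaluating a recognizer and estimating its hardware cost, which is far more expensive than this toy function. Because the search treats the cost as a black box, it does not require the design factors to be differentiable or independent.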
Hardware implementation
An FPGA-based hardware platform has been developed.
The FPGA-based implementation has the advantage that any speech sensor design can be tested easily.
Demo video
In this demo, the Japanese keyword "Onsei" is detected.