
Command Word

MultiNet Command Word Recognition Model

MultiNet is a lightweight model designed to recognize multiple speech command words offline on ESP32-S3. Currently, up to 200 speech commands, including customized commands, are supported.

MultiNet takes as input the audio processed by the audio front-end (AFE) algorithm, in 16 kHz, 16-bit, mono format. Speech commands are recognized from this audio stream.

Please refer to Models Benchmark to check the models supported by Espressif SoCs.

For details on flashing models, see Section Flashing Models.

Note

Models ending with Q8 are 8-bit quantized versions of the model, which are more lightweight.

Commands Recognition Process

Please see the flow diagram of the command recognition process below:

[Figure: speech command recognition system flow diagram]

Speech Commands Customization Methods

Note

Mixed Chinese and English is not supported in command words.

Command words cannot contain Arabic numerals or special characters.

Please refer to the Chinese version of this documentation for Chinese speech command customization methods.

MultiNet7 customize speech commands

MultiNet7 uses phonemes for English speech commands. Please modify the text file model/multinet_model/fst/commands_en.txt in the following format:

# command_id,command_grapheme,command_phoneme
1,tell me a joke,TfL Mm c qbK
2,sing a song,Sgl c Sel

If Column 3 is left empty, an internal grapheme-to-phoneme tool is called at runtime. This may cause a slight drop in accuracy due to differences between the grapheme-to-phoneme algorithms used.

MultiNet6 customize speech commands

MultiNet6 uses graphemes for English speech commands, so you can add or modify speech commands directly as words. Please modify the text file model/multinet_model/fst/commands_en.txt in the following format:

# command_id,command_grapheme
1,TELL ME A JOKE
2,MAKE A COFFEE

The extra column in the default commands_en.txt is there to keep it compatible with MultiNet7; there is no need to fill in the third column when using MultiNet6.

MultiNet5 customize speech commands

MultiNet5 uses phonemes for English speech commands. For simplicity, characters are used to denote the different phonemes. Please use tool/multinet_g2p.py to do the conversion.

Customize Speech Commands Via API calls

Alternatively, speech commands can be modified via API calls; this method works for MultiNet5, MultiNet6, and MultiNet7.

MultiNet5 requires the input command strings to be phonemes, while MultiNet6 and MultiNet7 only accept grapheme input to API calls.
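The sketch below illustrates this approach using the esp_mn_commands functions declared in ESP-SR's esp_mn_speech_commands.h. It is a minimal, hedged example, not the complete API: the exact signatures (for instance, whether esp_mn_commands_alloc() and esp_mn_commands_update() take the model handle) vary between ESP-SR releases, so check the header shipped with your version.

#include "esp_mn_iface.h"
#include "esp_mn_speech_commands.h"

/* Minimal sketch: replace the active command set at runtime for
 * MultiNet6/MultiNet7 (grapheme input). For MultiNet5 the strings
 * must be phonemes produced by tool/multinet_g2p.py instead.
 * Function signatures may differ between ESP-SR versions. */
void update_speech_commands(esp_mn_iface_t *multinet, model_iface_data_t *mn_data)
{
    esp_mn_commands_alloc(multinet, mn_data);   /* allocate the command list (older releases take no arguments) */
    esp_mn_commands_clear();                    /* remove any commands already in the list */
    esp_mn_commands_add(1, "tell me a joke");   /* command_id 1 */
    esp_mn_commands_add(2, "make a coffee");    /* command_id 2 */
    esp_mn_commands_update();                   /* apply the new command list */
}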

Use MultiNet

We suggest using MultiNet together with the audio front end (AFE) in ESP-SR. For details, see Section AFE Introduction and Use.

After configuring AFE, users can follow the steps below to configure and run MultiNet.

Initialize MultiNet Run MultiNet

Users can start MultiNet after enabling AFE and WakeNet, but must pay attention to the input constraints: the audio fed to MultiNet must be in the 16 kHz, 16-bit, mono format produced by AFE, and each detection call expects one frame of the size expected by the model.
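As a rough illustration of the initialization step, the sketch below loads an English MultiNet model using the esp_srmodel and esp_mn_iface interfaces from ESP-SR. The partition label "model" and the 6000 ms timeout follow the ESP-SR examples; treat the exact header names and arguments as assumptions to be checked against your ESP-SR version.

#include "esp_mn_iface.h"
#include "esp_mn_models.h"
#include "model_path.h"

/* Sketch: initialize MultiNet once AFE and WakeNet are configured. */
static esp_mn_iface_t *multinet;
static model_iface_data_t *mn_data;

void multinet_init(void)
{
    srmodel_list_t *models = esp_srmodel_init("model");                        /* load models from the "model" partition */
    char *mn_name = esp_srmodel_filter(models, ESP_MN_PREFIX, ESP_MN_ENGLISH); /* pick an English MultiNet model */
    multinet = esp_mn_handle_from_name(mn_name);
    mn_data = multinet->create(mn_name, 6000);                                 /* 6000 ms command detection timeout */
}

The number of samples expected per detection call can then be queried with multinet->get_samp_chunksize(mn_data).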

MultiNet Output

Speech command recognition must be used with WakeNet. After wake-up, MultiNet detection can start.

After running, MultiNet returns the recognition output of the current frame in real time as mn_state, which is currently divided into the following identification states:

ESP_MN_STATE_DETECTING: MultiNet is detecting, but no target speech command has been recognized yet.

ESP_MN_STATE_DETECTED: a target speech command has been detected; the recognition results can now be fetched.

ESP_MN_STATE_TIMEOUT: no speech command has been detected for a long time; detection exits and waits for the next wake-up.

Single recognition mode and continuous recognition mode:

* Single recognition mode: exit speech recognition when the returned state is ESP_MN_STATE_DETECTED
* Continuous recognition mode: exit speech recognition when the returned state is ESP_MN_STATE_TIMEOUT
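The following sketch shows how these states are typically handled in the detection loop. It assumes the AFE handle (afe_handle) and its data instance (afe_data) are already running, that WakeNet has triggered, and that multinet and mn_data come from the initialization sketch above; field and function names follow the esp_afe_sr_iface_t and esp_mn_iface_t interfaces but should be verified against your ESP-SR version.

#include <stdio.h>
#include "esp_afe_sr_iface.h"
#include "esp_mn_iface.h"

/* Assumed to be set up elsewhere (AFE/WakeNet configuration and the
 * initialization sketch above). */
extern esp_afe_sr_iface_t *afe_handle;
extern esp_afe_sr_data_t *afe_data;
extern esp_mn_iface_t *multinet;
extern model_iface_data_t *mn_data;

/* Sketch: feed each AFE output frame to MultiNet after wake-up and
 * react to mn_state. Implements single recognition mode; for continuous
 * mode, keep looping after ESP_MN_STATE_DETECTED and exit only on timeout. */
void run_multinet_after_wakeup(void)
{
    while (1) {
        afe_fetch_result_t *res = afe_handle->fetch(afe_data);          /* one frame of AFE-processed audio */
        esp_mn_state_t mn_state = multinet->detect(mn_data, res->data);

        if (mn_state == ESP_MN_STATE_DETECTED) {
            esp_mn_results_t *mn_result = multinet->get_results(mn_data);
            printf("detected command id: %d\n", mn_result->command_id[0]);
            break;   /* single recognition mode: exit on detection */
        } else if (mn_state == ESP_MN_STATE_TIMEOUT) {
            break;   /* no command for too long: exit and wait for the next wake-up */
        }
        /* ESP_MN_STATE_DETECTING: keep feeding frames */
    }
}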

Resource Occupancy

For the resource occupancy of this model, see Resource Occupancy.

