This guide will get you up and running with Spokestack for Python, and you'll have a voice interface in your application in no time.
Installation
System Dependencies
There are a few system dependencies that need to be installed before spokestack can be installed via pip.
macOS
brew install lame portaudio
Debian/Ubuntu
sudo apt-get install portaudio19-dev libmp3lame-dev
Windows
We currently do not support Windows 10 natively, and recommend you install Windows Subsystem for Linux (WSL) with the Debian dependencies. However, if you would like to work on native Windows support, we gladly accept pull requests.
Another potential avenue for using Spokestack on Windows 10 is via Anaconda. PortAudio can be installed via conda, but Lame cannot; hence, microphone input will be supported, but text-to-speech will not.
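As an illustration, PortAudio can usually be pulled from the anaconda channel (exact channel availability may vary by platform):
conda install -c anaconda portaudio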
Once the system dependencies have been satisfied, you can install the library with the following:
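pip install spokestack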
Setup
We use pyenv for virtual environments.
pyenv install 3.8.6
pyenv virtualenv 3.8.6 spokestack
pyenv local spokestack
pip install -r requirements.txt
Install TensorFlow
This library requires a way to run TFLite models. There are two ways to add this capability. The first is installing the full TensorFlow library:
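pip install tensorflow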
In use cases that require a small footprint, such as a Raspberry Pi or similar Internet of Things (IoT) device, you will want to install only the TFLite Interpreter instead. You can install it for your platform by following the TensorFlow Lite instructions.
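On many platforms, the standalone interpreter is published as the tflite-runtime package, so the install might look like the following (package availability depends on your platform and Python version; the official instructions are the authoritative source):
pip install tflite-runtime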
Integration
In order for your application to use Spokestack's features, there are a few things you will need: an NLU model and a SpeechPipeline instance.
Go to spokestack.io to set up your own account (it's free!). Once you've got that, go grab one of our free NLU models. We'll use the Highlow one in this example, but you can choose another, or create your own.
Once you've downloaded your NLU, unzip nlu.tar.gz with the three files inside (metadata.json, nlu.tflite, vocab.txt). The location of the directory isn't important, because we will pass the path on initialization.
The PyAudioInput
class will use the system default audio input device. Most personal computers have some form of microphone, but in the case of an embedded device, you may need to purchase a small USB microphone.
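The profile used below constructs the audio input for you, but if you want to see the input class directly, a minimal sketch looks like the following (the module path and constructor arguments are assumptions based on the spokestack-python source and may differ between versions):
from spokestack.io.pyaudio import PyAudioInput

# reads 20 ms frames of 16 kHz audio from the default input device
mic = PyAudioInput(sample_rate=16000, frame_width=20)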
SpeechPipeline Instance
Spokestack's speech pipeline handles collecting audio from the input device and transcribing speech directed at your app. The SpeechPipeline guide has a detailed explanation of how to set up the pipeline, so we will show the quickest way here: using a profile, which configures the pipeline's components for a specific use case. The profile we use here includes wake word activation and speech transcription using Spokestack's cloud ASR.
from spokestack.profile.wakeword_asr import WakewordSpokestackASR

# configure a pipeline with wake word activation and Spokestack cloud ASR
pipeline = WakewordSpokestackASR.create(
    "spokestack_id", "spokestack_secret", model_dir="path_to_tflite_model_dir"
)
pipeline.start()
From text to meaning
Translating the text into an action is the job of the Natural Language Understanding (NLU) component. A great thing about Spokestack NLU models is that they run entirely on device. The NLU can be initialized like this:
from spokestack.nlu.tflite import TFLiteNLU
nlu = TFLiteNLU("path_to_tflite_model_dir")
Input to the NLU model is the ASR transcript. The transcript can be accessed as a property of SpeechContext
. Below is a sample event handler for running inference on the speech transcript.
@pipeline.event
def on_recognize(context):
    results = nlu(context.transcript)
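The result gives you the classified intent, which you can then map to an action in your app. As a rough sketch, extending the handler above (the attribute names on the result object are assumptions; check the NLU documentation for the exact interface):
@pipeline.event
def on_recognize(context):
    results = nlu(context.transcript)
    # assumed attributes on the NLU result: intent, confidence, slots
    print(results.intent, results.confidence, results.slots)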
For more detail on configuring Spokestack's NLU, see the NLU concept guide.
Talking back to your users
If you want the full smart speaker experience, you will need to give your application a voice. This can be achieved with text-to-speech (TTS). For more information on TTS, see the TTS concept guide. TTS playback uses the PyAudioOutput class, which plays audio through the device's default speaker. Like the NLU, TTS can be used in an event handler. Take a look at the example below, which simply speaks what the ASR heard.
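The handler below assumes a tts object has already been created. A minimal sketch of that setup, using the TTS manager and Spokestack cloud TTS client from the spokestack-python library (module and class names as we understand them; consult the TTS concept guide for your version):
from spokestack.io.pyaudio import PyAudioOutput
from spokestack.tts.manager import TextToSpeechManager
from spokestack.tts.clients.spokestack import TextToSpeechClient

# synthesizes text with Spokestack's cloud TTS and plays it
# through the default output device
tts = TextToSpeechManager(
    TextToSpeechClient("spokestack_id", "spokestack_secret"),
    PyAudioOutput(),
)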
@pipeline.event
def on_recognize(context):
    # speak the recognized transcript back to the user
    tts.synthesize(context.transcript)
Conclusion
That's all there is to setting up an application with Spokestack. Your Python application can now accept and respond to voice commands.
Thank you for taking the time to read this!
Related Resources
Want to dive deeper into the world of Python voice integration? We've got a lot to say on the subject: