I'm fairly new to the Azure Speech service. I'm using Twilio/Plivo to connect a phone number to Azure speech-to-text and then process the transcription further.
My problem: when I speak, recognition works fine, but when I stop speaking or stay silent, the service still processes the silence and returns an empty transcription. This happens roughly every 10-15 seconds; it keeps "detecting speech" even though nothing is being said. I'm not cancelling continuous recognition until the end of the call.
Has anyone run into this, or is there anything I can change in the speech configuration? Please let me know.
I'm using the Azure Speech SDK and have tried both the initial silence timeout and the speech segmentation timeout, but it made no difference. Since this is a real-time scenario, I can't set them to more than about a second.
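For reference, this is roughly how I'm setting those timeouts; a minimal sketch, and the property names assume a recent SDK version:
import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="<speechKey>", region="<speechRegion>")
# How long the service waits for initial speech before returning a NoMatch result.
speech_config.set_property(speechsdk.PropertyId.SpeechServiceConnection_InitialSilenceTimeoutMs, "1000")
# How much trailing silence ends a phrase (exposed as Speech_SegmentationSilenceTimeoutMs in newer SDKs).
speech_config.set_property(speechsdk.PropertyId.Speech_SegmentationSilenceTimeoutMs, "1000")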
I tried the sample code below for continuous speech recognition to convert speech to text while avoiding empty transcriptions caused by silence or noise. I used InitialSilenceTimeoutMs and EndSilenceTimeoutMs to manage silence, last_recognition_time to filter out back-to-back recognitions, and evt.result.text.strip() to skip empty transcriptions.
Code :
import azure.cognitiveservices.speech as speechsdk
import time

SUBSCRIPTION_KEY = "<speechKey>"
REGION = "<speechRegion>"

speech_config = speechsdk.SpeechConfig(subscription=SUBSCRIPTION_KEY, region=REGION)
speech_config.speech_recognition_language = "en-US"
speech_config.set_service_property(name="InitialSilenceTimeoutMs", value="1000", channel=speechsdk.ServicePropertyChannel.UriQueryParameter)
speech_config.set_service_property(name="EndSilenceTimeoutMs", value="1000", channel=speechsdk.ServicePropertyChannel.UriQueryParameter)

audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

last_recognition_time = time.time()

def recognizing_handler(evt):
    """Handles partial recognition results."""
    if evt.result.text.strip():
        print(f"Recognizing: {evt.result.text}")

def recognized_handler(evt):
    """Handles final recognition results."""
    global last_recognition_time
    if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
        # Skip empty text and results arriving within 2 seconds of the previous one.
        if evt.result.text.strip() and (time.time() - last_recognition_time > 2):
            print(f"Recognized: {evt.result.text}")
            last_recognition_time = time.time()
    elif evt.result.reason == speechsdk.ResultReason.NoMatch:
        print("No speech recognized.")

def canceled_handler(evt):
    """Handles recognition cancellation events."""
    details = evt.cancellation_details
    print(f"Recognition canceled: {details.reason}")
    if details.reason == speechsdk.CancellationReason.Error:
        print(f"Error details: {details.error_details}")

def session_started_handler(evt):
    """Handles session start events."""
    print("Session started.")

def session_stopped_handler(evt):
    """Handles session stop events."""
    print("Session stopped.")

recognizer.recognizing.connect(recognizing_handler)
recognizer.recognized.connect(recognized_handler)
recognizer.canceled.connect(canceled_handler)
recognizer.session_started.connect(session_started_handler)
recognizer.session_stopped.connect(session_stopped_handler)

print("Starting continuous recognition...")
recognizer.start_continuous_recognition()

try:
    # Keep the main thread alive while recognition runs on a background thread.
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    print("Stopping recognition...")
    recognizer.stop_continuous_recognition()
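Since your audio comes from a Twilio/Plivo call rather than the default microphone, you would typically feed it to the recognizer through a push stream instead of AudioConfig(use_default_microphone=True). A rough sketch, assuming the media stream is decoded to 8 kHz, 16-bit mono PCM (the callback names here are hypothetical):
stream_format = speechsdk.audio.AudioStreamFormat(samples_per_second=8000, bits_per_sample=16, channels=1)
push_stream = speechsdk.audio.PushAudioInputStream(stream_format=stream_format)
audio_config = speechsdk.audio.AudioConfig(stream=push_stream)
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

# Hypothetical callbacks wired to your Twilio/Plivo media stream:
def on_media_chunk(pcm_bytes):
    # Push each decoded audio chunk into the recognizer's input stream.
    push_stream.write(pcm_bytes)

def on_call_ended():
    # Closing the stream signals end of audio so recognition can finish cleanly.
    push_stream.close()
This way the recognizer only ever sees call audio, and the same recognizing/recognized handlers above apply unchanged.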