Showing content from https://stackoverflow.com/questions/79388910/azure-speech-service-continuous-speech-recognition below:

Azure speech service continuous speech recognition

I'm pretty new to the Azure Speech service. I'm using a Twilio/Plivo service to connect a phone number to Azure STT and process the call further after transcription.

My problem: when I speak, recognition works well. But when I stop speaking or stay silent, it automatically processes the silence as speech and returns a result with empty transcription text. This happens every 10-15 seconds. I'm not cancelling continuous recognition until the end of the call.

Has anyone had a similar experience, or is there anything I can change in the speech configuration? Please let me know.

I used the Azure SDK and set both the initial silence timeout and the speech segmentation timeout, but nothing changed. I'm using this in real time, so I cannot add more than a second of delay.


I tried the sample code for continuous speech recognition to convert speech to text and to keep empty transcriptions caused by silence or noise from being processed.

I used InitialSilenceTimeoutMs and EndSilenceTimeoutMs to manage silence, last_recognition_time to filter valid recognitions, and evt.result.text.strip() to skip empty transcriptions.

Code :

import azure.cognitiveservices.speech as speechsdk
import time

SUBSCRIPTION_KEY = "<speechKey>"
REGION = "<speechRegion>"

speech_config = speechsdk.SpeechConfig(subscription=SUBSCRIPTION_KEY, region=REGION)
speech_config.speech_recognition_language = "en-US" 

# Silence handling: both timeouts are set to 1000 ms and passed as URI query parameters.
speech_config.set_service_property(name="InitialSilenceTimeoutMs", value="1000", channel=speechsdk.ServicePropertyChannel.UriQueryParameter)
speech_config.set_service_property(name="EndSilenceTimeoutMs", value="1000", channel=speechsdk.ServicePropertyChannel.UriQueryParameter)
audio_config = speechsdk.audio.AudioConfig(use_default_microphone=True)
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
last_recognition_time = time.time()

def recognizing_handler(evt):
    """Handles partial recognition results."""
    if evt.result.text.strip():
        print(f"Recognizing: {evt.result.text}")

def recognized_handler(evt):
    """Handles final recognition results."""
    global last_recognition_time
    if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
        # Keep only non-empty text, and only if more than 2 s have passed since the last kept result.
        if evt.result.text.strip() and (time.time() - last_recognition_time > 2):
            print(f"Recognized: {evt.result.text}")
            last_recognition_time = time.time()
    elif evt.result.reason == speechsdk.ResultReason.NoMatch:
        print("No speech recognized.")

def canceled_handler(evt):
    """Handles recognition cancellation events."""
    print(f"Recognition canceled: {evt.reason}")
    if evt.reason == speechsdk.CancellationReason.Error:
        print(f"Error details: {evt.error_details}")

def session_started_handler(evt):
    """Handles session start events."""
    print("Session started.")

def session_stopped_handler(evt):
    """Handles session stop events."""
    print("Session stopped.")

recognizer.recognizing.connect(recognizing_handler)
recognizer.recognized.connect(recognized_handler)
recognizer.canceled.connect(canceled_handler)
recognizer.session_started.connect(session_started_handler)
recognizer.session_stopped.connect(session_stopped_handler)

print("Starting continuous recognition...")
recognizer.start_continuous_recognition()

try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    print("Stopping recognition...")
    recognizer.stop_continuous_recognition()
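
The empty-text check in recognized_handler can also be pulled out into a small helper so it can be exercised without a microphone or a live call. The following is only a sketch: TranscriptFilter, min_chars, and on_text are hypothetical names that are not part of the Speech SDK, and the 2-second time gate from the code above is deliberately left out, since it would also drop a real utterance that follows closely after the previous one.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TranscriptFilter:
    """Hypothetical helper: forwards only final results that contain real text."""
    on_text: Callable[[str], None]  # called with each transcript that is kept
    min_chars: int = 1              # assumed threshold; raise it to drop very short noise hits
    dropped: int = 0                # number of results that were discarded

    def handle(self, recognized_speech: bool, text: str) -> bool:
        """Return True if the result was forwarded, False if it was dropped."""
        cleaned = (text or "").strip()
        if not recognized_speech or len(cleaned) < self.min_chars:
            self.dropped += 1
            return False
        self.on_text(cleaned)
        return True


forwarded: List[str] = []
flt = TranscriptFilter(on_text=forwarded.append)

# Simulated final results: speech, whitespace-only text, no match, speech.
events = [(True, "hello there"), (True, "   "), (False, ""), (True, "book a table")]
results = [flt.handle(ok, text) for ok, text in events]
print(forwarded, flt.dropped)
```

Inside recognized_handler, the call would presumably look like flt.handle(evt.result.reason == speechsdk.ResultReason.RecognizedSpeech, evt.result.text), with on_text pointing at whatever processes the transcript next.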


Dasari Kamali
