Showing content from https://developers.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-transcribe below:

The ML.TRANSCRIBE function | BigQuery

Note: This feature is automatically available in the Enterprise and Enterprise Plus editions. If you use the Standard edition or on-demand pricing and want to use this feature, send an email to bqml-feedback@google.com.

This document describes the ML.TRANSCRIBE function, which lets you transcribe audio files from an object table by using the Speech-to-Text API.

Syntax

ML.TRANSCRIBE(
  MODEL `PROJECT_ID.DATASET.MODEL_NAME`,
  TABLE `PROJECT_ID.DATASET.OBJECT_TABLE`,
  [RECOGNITION_CONFIG => ( JSON 'RECOGNITION_CONFIG')]
)

Arguments

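ML.TRANSCRIBE takes the following arguments:

PROJECT_ID: the ID of your project.
DATASET: the ID of the dataset that contains the model or the object table.
MODEL_NAME: the name of a remote model that has a REMOTE_SERVICE_TYPE of CLOUD_AI_SPEECH_TO_TEXT_V2.
OBJECT_TABLE: the name of the object table that contains the URIs of the audio files.
RECOGNITION_CONFIG: an optional configuration in JSON format that overrides the default configuration of the speech recognizer. You need it only if the model doesn't specify a recognizer.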

Output

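ML.TRANSCRIBE returns the following columns:

transcripts: the transcribed text.
ml_transcribe_result: the raw response from the Speech-to-Text API, in JSON format.
ml_transcribe_status: the status of the API call for the corresponding row.
The columns of the object table.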

Quotas

See Cloud AI service functions quotas and limits.

Known issues

This section contains information about known issues.

Resource exhausted errors

Sometimes after a query job that uses this function finishes successfully, some returned rows contain the following error message:

A retryable error occurred: RESOURCE EXHAUSTED error from <remote endpoint>

This issue occurs because BigQuery query jobs finish successfully even if the function fails for some of the rows. The function fails when the volume of API calls to the remote endpoint exceeds the quota limits for that service. This issue occurs most often when you are running multiple parallel batch queries. BigQuery retries these calls, but if the retries fail, the resource exhausted error message is returned.

To iterate through inference calls until all rows are successfully processed, you can use the BigQuery remote inference SQL scripts or the BigQuery remote inference pipeline Dataform package.
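
Before retrying, it can help to see which rows hit the quota error. The following is a minimal sketch, not the referenced scripts or Dataform package. It assumes a hypothetical results table named transcribe_results and relies on the uri, transcripts, ml_transcribe_result, and ml_transcribe_status columns that appear in the example output on this page.

```sql
-- Save the output once so that later checks don't call the API again.
-- transcribe_results is a hypothetical table name used for illustration.
CREATE OR REPLACE TABLE `myproject.mydataset.transcribe_results` AS
SELECT uri, transcripts, ml_transcribe_result, ml_transcribe_status
FROM ML.TRANSCRIBE(
  MODEL `myproject.mydataset.transcribe_model`,
  TABLE `myproject.mydataset.audio`
);

-- List the audio files whose rows returned the quota error.
SELECT uri, ml_transcribe_status
FROM `myproject.mydataset.transcribe_results`
WHERE ml_transcribe_status LIKE '%RESOURCE EXHAUSTED%';
```

Saving the output first means the filter query reads stored rows instead of sending the audio files to the Speech-to-Text API a second time.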

Invalid argument errors

Sometimes after a query job that uses this function finishes successfully, some returned rows contain the following error message:

INVALID_ARGUMENT: The audio file cannot be processed in time.

This issue occurs because one of the audio files being processed is too long. Check your input audio files to make sure they are all 30 minutes or less.
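
To narrow down which files to check, you can filter saved results on the status text. This is a minimal sketch that assumes the function output was previously saved to a hypothetical table named transcribe_results that keeps the uri and ml_transcribe_status columns.

```sql
-- transcribe_results is a hypothetical table that holds saved ML.TRANSCRIBE output.
-- STRPOS returns 0 when the substring is absent, so > 0 keeps only matching rows.
SELECT uri
FROM `myproject.mydataset.transcribe_results`
WHERE STRPOS(ml_transcribe_status, 'INVALID_ARGUMENT') > 0
ORDER BY uri;
```

The returned URIs point to the audio files to inspect for length.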

Locations

You can run the ML.TRANSCRIBE function in the following locations:

ML.TRANSCRIBE must run in the same region as the remote model that the function references.

Limitations

The function can't process audio files that are longer than 30 minutes. Any row that contains such a file returns an error.

Example

The following example transcribes the audio files represented by the audio table:

Create the model:

# Create a remote model that uses the Speech-to-Text V2 API.
CREATE OR REPLACE MODEL `myproject.mydataset.transcribe_model`
  REMOTE WITH CONNECTION `myproject.myregion.myconnection`
  OPTIONS (
    remote_service_type = 'CLOUD_AI_SPEECH_TO_TEXT_V2',
    speech_recognizer = 'projects/project_number/locations/recognizer_location/recognizers/recognizer_id'
  );

Transcribe the audio files without overriding the recognizer's default configuration:

SELECT *
FROM ML.TRANSCRIBE(
  MODEL `myproject.mydataset.transcribe_model`,
  TABLE `myproject.mydataset.audio`
);

Transcribe the audio files and override the recognizer's default configuration:

SELECT *
FROM ML.TRANSCRIBE(
  MODEL `myproject.mydataset.transcribe_model`,
  TABLE `myproject.mydataset.audio`,
  recognition_config => ( JSON '{"language_codes": ["en-US"], "model": "chirp", "auto_decoding_config": {}}')
);

The result is similar to the following:

transcripts | ml_transcribe_result | ml_transcribe_status | uri | ...
----------- | -------------------- | -------------------- | --- | ---
OK Google stream stranger things from Netflix to my TV. Okay, stranger things from Netflix playing on t v smart home and it's just... | {"metadata":{"total_billed_duration":{"seconds":56}},"results":[{"alternatives":[{"confidence":0.738729,"transcript"... | | gs://mybucket/audio_files | ...
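
Because ml_transcribe_result holds the raw API response, individual fields can be pulled out with the JSON functions. The following sketch extracts the confidence of the first alternative and the billed duration, using the field names visible in the sample row above. It assumes that a successful row has an empty or NULL ml_transcribe_status, as the sample suggests, and the output column names first_confidence and billed_seconds are chosen here for illustration.

```sql
SELECT
  uri,
  transcripts,
  -- JSON_VALUE returns a STRING, so cast each scalar to a numeric type.
  SAFE_CAST(JSON_VALUE(ml_transcribe_result,
    '$.results[0].alternatives[0].confidence') AS FLOAT64) AS first_confidence,
  SAFE_CAST(JSON_VALUE(ml_transcribe_result,
    '$.metadata.total_billed_duration.seconds') AS INT64) AS billed_seconds
FROM ML.TRANSCRIBE(
  MODEL `myproject.mydataset.transcribe_model`,
  TABLE `myproject.mydataset.audio`
)
-- Assumption: an empty or NULL status marks a successful row.
WHERE COALESCE(ml_transcribe_status, '') = '';
```

SAFE_CAST returns NULL instead of failing the query if a field is missing or not numeric, which keeps one unexpected response from stopping the whole job.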

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-08-07 UTC.
