Translate SQL queries with the translation API

This document describes how to use the translation API in BigQuery to translate scripts written in other SQL dialects into GoogleSQL queries. The translation API can simplify the process of migrating workloads to BigQuery.
Before you begin

Before you submit a translation job, complete the following steps:
To get the permissions that you need to create translation jobs using the translation API, ask your administrator to grant you the MigrationWorkflow Editor (roles/bigquerymigration.editor) IAM role on the parent resource. For more information about granting roles, see Manage access to projects, folders, and organizations.
This predefined role contains the permissions required to create translation jobs using the translation API. To see the exact permissions that are required, expand the Required permissions section:
Required permissions

The following permissions are required to create translation jobs using the translation API:
bigquerymigration.workflows.create
bigquerymigration.workflows.get
You might also be able to get these permissions with custom roles or other predefined roles.
Enable the BigQuery Migration API

If your Google Cloud CLI project was created before February 15, 2022, enable the BigQuery Migration API as follows:
In the Google Cloud console, go to the BigQuery Migration API page.
Click Enable.
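Alternatively, you can enable the API from the command line. The following is a minimal sketch that assumes you have the Google Cloud CLI installed and authenticated, and that bigquerymigration.googleapis.com is the service name to enable:

gcloud services enable bigquerymigration.googleapis.com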
If you want to use the Google Cloud console or the BigQuery Migration API to perform a translation job, you must upload the source files containing the queries and scripts you want to translate to Cloud Storage. You can also upload any metadata files or configuration YAML files to the same Cloud Storage bucket containing the source files. For more information about creating buckets and uploading files to Cloud Storage, see Create buckets and Upload objects from a filesystem.
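For example, the following sketch creates a bucket and uploads Teradata scripts with the Google Cloud CLI; the bucket name and local directory are hypothetical:

gcloud storage buckets create gs://my_data_bucket
gcloud storage cp ./teradata_scripts/*.sql gs://my_data_bucket/teradata/input/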
Supported task types

The translation API can translate the following SQL dialects into GoogleSQL:
Redshift2BigQuery_Translation
HiveQL2BigQuery_Translation
SparkSQL2BigQuery_Translation
AzureSynapse2BigQuery_Translation
Greenplum2BigQuery_Translation
Db22BigQuery_Translation
Netezza2BigQuery_Translation
MySQL2BigQuery_Translation
Oracle2BigQuery_Translation
Postgresql2BigQuery_Translation
Presto2BigQuery_Translation
Snowflake2BigQuery_Translation
SQLite2BigQuery_Translation
SQLServer2BigQuery_Translation
Teradata2BigQuery_Translation
Vertica2BigQuery_Translation
When translating SQL from a source dialect to BigQuery, some functions might not have a direct equivalent. To address this, the BigQuery Migration Service (and the broader BigQuery community) provide helper user-defined functions (UDFs) that replicate the behavior of these unsupported source dialect functions.
These UDFs are often found in the bqutil public dataset, allowing translated queries to initially reference them using the format bqutil.<dataset>.<function>(). For example, bqutil.fn.cw_count().
While bqutil offers convenient access to these helper UDFs for initial translation and testing, direct reliance on bqutil for production workloads is not recommended for several reasons:

The bqutil project hosts the latest version of these UDFs, which means their definitions can change over time. Relying directly on bqutil could lead to unexpected behavior or breaking changes in your production queries if a UDF's logic is updated.
Your organization's policies might restrict the use of external public datasets such as bqutil for production data processing. Copying UDFs to your controlled environment aligns with such policies.

For reliable and stable production use, you should deploy these helper UDFs into your own project and dataset. This gives you full control over their version, customization, and access. For detailed instructions on how to deploy these UDFs, refer to the UDFs deployment guide on GitHub. This guide provides the necessary scripts and steps to copy the UDFs into your environment.
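After you copy the UDFs into your own project, you might also repoint the translated queries so that they no longer reference the public dataset. The following is only a sketch of one possible approach using a text substitution; myproject is a hypothetical project ID, and the deployment guide may describe a better-supported workflow:

# Rewrite references from the public bqutil.fn dataset to your own copy of the UDFs (myproject is hypothetical).
sed -i 's/bqutil\.fn\./myproject.fn./g' translated_query.sql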
Locations

The translation API is available in the following processing locations:
Submit a translation job

To submit a translation job using the translation API, use the projects.locations.workflows.create method and supply an instance of the MigrationWorkflow resource with a supported task type.
Once the job is submitted, you can issue a query to get results.
Create a batch translation

The following curl command creates a batch translation job where the input and output files are stored in Cloud Storage. The source_target_mapping field contains a list that maps the source entries to an optional relative path for the target output.
curl -d "{ \"tasks\": { \"task_name\": { \"type\": \"TYPE\", \"translation_details\": { \"target_base_uri\": \"TARGET_BASE\", \"source_target_mapping\": { \"source_spec\": { \"base_uri\": \"BASE\" } }, \"target_types\": \"TARGET_TYPES\" } } } }" \
  -H "Content-Type:application/json" \
  -H "Authorization: Bearer TOKEN" \
  -X POST https://bigquerymigration.googleapis.com/v2alpha/projects/PROJECT_ID/locations/LOCATION/workflows
Replace the following:
TYPE: the task type of the translation, which determines the source and target dialect.
TARGET_BASE: the base URI for all translation outputs.
BASE: the base URI for all files read as sources for translation.
TARGET_TYPES (optional): the generated output types. If not specified, SQL is generated. The possible values are:
  sql (default): the translated SQL query files.
  suggestion: AI-generated suggestions.
The output is stored in a subfolder in the output directory. The subfolder is named based on the value in TARGET_TYPES.
TOKEN: the token for authentication. To generate a token, use the gcloud auth print-access-token command or the OAuth 2.0 playground (use the scope https://www.googleapis.com/auth/cloud-platform).
PROJECT_ID: the project to process the translation.
LOCATION: the location where the job is processed.
The preceding command returns a response that includes a workflow ID written in the format projects/PROJECT_ID/locations/LOCATION/workflows/WORKFLOW_ID.
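For convenience, you can capture the token and project values in shell variables before running the command. This is a minimal sketch that assumes an authenticated Google Cloud CLI:

# Store the values used in the curl commands in this document.
TOKEN=$(gcloud auth print-access-token)
PROJECT_ID=$(gcloud config get-value project)

You can then substitute $TOKEN and $PROJECT_ID where the commands expect TOKEN and PROJECT_ID.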
To translate the Teradata SQL scripts in the Cloud Storage directory gs://my_data_bucket/teradata/input/ and store the results in the Cloud Storage directory gs://my_data_bucket/teradata/output/, you might use the following query:
{
  "tasks": {
    "task_name": {
      "type": "Teradata2BigQuery_Translation",
      "translation_details": {
        "target_base_uri": "gs://my_data_bucket/teradata/output/",
        "source_target_mapping": {
          "source_spec": {
            "base_uri": "gs://my_data_bucket/teradata/input/"
          }
        }
      }
    }
  }
}
Note: The string "task_name" in this example is an identifier for the translation task and can be set to any value you prefer.
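To submit this example, you might save the request body to a file (request.json is a hypothetical name) and post it with curl; this is a sketch that reuses the endpoint shown earlier:

curl -d @request.json \
  -H "Content-Type:application/json" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -X POST https://bigquerymigration.googleapis.com/v2alpha/projects/PROJECT_ID/locations/LOCATION/workflows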
This call returns a message containing the created workflow ID in the "name" field:
{
"name": "projects/123456789/locations/us/workflows/12345678-9abc-def1-2345-6789abcdef00",
"tasks": {
"task_name": { /*...*/ }
},
"state": "RUNNING"
}
To get the updated status for the workflow, send a GET request (see Explore the translation output). The job sends outputs to Cloud Storage as it progresses. The job state changes to COMPLETED after all the requested target_types are generated. If the task succeeds, you can find the translated SQL query in gs://my_data_bucket/teradata/output.
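To verify the generated files, you can list the output directory, for example:

gcloud storage ls gs://my_data_bucket/teradata/output/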
Preview
This product or feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA products and features are available "as is" and might have limited support. For more information, see the launch stage descriptions.
Note: The translation API can call Gemini through the BigQuery Vertex AI integration to generate suggestions for your translated SQL query based on your AI configuration YAML file.

The following example translates the Teradata SQL scripts located in the gs://my_data_bucket/teradata/input/ Cloud Storage directory and stores the results in the gs://my_data_bucket/teradata/output/ Cloud Storage directory with additional AI suggestions:
{
  "tasks": {
    "task_name": {
      "type": "Teradata2BigQuery_Translation",
      "translation_details": {
        "target_base_uri": "gs://my_data_bucket/teradata/output/",
        "source_target_mapping": {
          "source_spec": {
            "base_uri": "gs://my_data_bucket/teradata/input/"
          }
        },
        "target_types": "suggestion"
      }
    }
  }
}
Note: To generate AI suggestions, the Cloud Storage source directory must contain at least one configuration YAML file with a suffix of .ai_config.yaml. To learn how to write the configuration YAML file for AI suggestions, see Create a Gemini-based configuration YAML file.
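For example, you might upload a configuration file to the source directory before submitting the job; the filename here is hypothetical, and only the .ai_config.yaml suffix matters:

gcloud storage cp gemini.ai_config.yaml gs://my_data_bucket/teradata/input/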
After the task runs successfully, you can find the AI suggestions in the gs://my_data_bucket/teradata/output/suggestion Cloud Storage directory.
The following curl command creates a translation job with string literal inputs and outputs. The source_target_mapping field contains a list that maps the source literal entries to an optional relative path for the target output.
curl -d "{ \"tasks\": { \"string\": { \"type\": \"TYPE\", \"translation_details\": { \"source_target_mapping\": { \"source_spec\": { \"literal\": { \"relative_path\": \"PATH\", \"literal_string\": \"STRING\" } } }, \"target_return_literals\": \"TARGETS\" } } } }" \
  -H "Content-Type:application/json" \
  -H "Authorization: Bearer TOKEN" \
  -X POST https://bigquerymigration.googleapis.com/v2alpha/projects/PROJECT_ID/locations/LOCATION/workflows
Replace the following:
TYPE: the task type of the translation, which determines the source and target dialect.
PATH: the identifier of the literal entry, similar to a filename or path.
STRING: the string of literal input data (for example, SQL) to be translated.
TARGETS: the expected targets that you want returned directly in the response in the literal format. These should be in the target URI format (for example, GENERATED_DIR + target_spec.relative_path + source_spec.literal.relative_path). Anything not in this list is not returned in the response. The generated directory, GENERATED_DIR, for general SQL translations is sql/.
TOKEN: the token for authentication. To generate a token, use the gcloud auth print-access-token command or the OAuth 2.0 playground (use the scope https://www.googleapis.com/auth/cloud-platform).
PROJECT_ID: the project to process the translation.
LOCATION: the location where the job is processed.

The preceding command returns a response that includes a workflow ID written in the format projects/PROJECT_ID/locations/LOCATION/workflows/WORKFLOW_ID.
When your job completes, you can view the results by querying the job and examining the inline translation_literals field in the response.
To translate the Hive SQL string select 1 interactively, you might use the following query:
"tasks": {
string: {
"type": "HiveQL2BigQuery_Translation",
"translation_details": {
"source_target_mapping": {
"source_spec": {
"literal": {
"relative_path": "input_file",
"literal_string": "select 1"
}
}
},
"target_return_literals": "sql/input_file",
}
}
}
Note: The task key ("string" in this example) is an identifier for the translation task and can be set to any value you prefer.
You can use any relative_path you would like for your literal, but the translated literal only appears in the results if you include sql/$relative_path in your target_return_literals. You can also include multiple literals in a single query, in which case each of their relative paths must be included in target_return_literals, as shown in the sketch after this paragraph.
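For the multiple-literal case, the following sketch shows one possible request shape. It assumes that source_target_mapping and target_return_literals accept JSON arrays as repeated fields, which is an assumption rather than something shown earlier in this document:

{
  "tasks": {
    "string": {
      "type": "HiveQL2BigQuery_Translation",
      "translation_details": {
        "source_target_mapping": [
          {
            "source_spec": {
              "literal": {
                "relative_path": "query_1",
                "literal_string": "select 1"
              }
            }
          },
          {
            "source_spec": {
              "literal": {
                "relative_path": "query_2",
                "literal_string": "select 2"
              }
            }
          }
        ],
        "target_return_literals": ["sql/query_1", "sql/query_2"]
      }
    }
  }
}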
The call returns a message containing the created workflow ID in the "name" field:
{
"name": "projects/123456789/locations/us/workflows/12345678-9abc-def1-2345-6789abcdef00",
"tasks": {
"task_name": { /*...*/ }
},
"state": "RUNNING"
}
To get the updated status for the workflow, send a GET request. The job is complete when "state" changes to COMPLETED. If the task succeeds, you can find the translated SQL in the response message:
{
"name": "projects/123456789/locations/us/workflows/12345678-9abc-def1-2345-6789abcdef00",
"tasks": {
"string": {
"id": "0fedba98-7654-3210-1234-56789abcdef",
"type": "HiveQL2BigQuery_Translation",
/* ... */
"taskResult": {
"translationTaskResult": {
"translatedLiterals": [
{
"relativePath": "sql/input_file",
"literalString": "-- Translation time: 2023-10-05T21:50:49.885839Z\n-- Translation job ID: projects/123456789/locations/us/workflows/12345678-9abc-def1-2345-6789abcdef00\n-- Source: input_file\n-- Translated from: Hive\n-- Translated to: BigQuery\n\nSELECT\n 1\n;\n"
}
],
"reportLogMessages": [
...
]
}
},
/* ... */
}
},
"state": "COMPLETED",
"createTime": "2023-10-05T21:50:49.543221Z",
"lastUpdateTime": "2023-10-05T21:50:50.462758Z"
}
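To pull just the translated SQL out of a completed response, you can pipe the GET call through a JSON processor such as jq. This is a minimal sketch that assumes jq is installed and that the task key is "string", as in this example:

curl -s \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  https://bigquerymigration.googleapis.com/v2alpha/projects/PROJECT_ID/locations/LOCATION/workflows/WORKFLOW_ID \
  | jq -r '.tasks.string.taskResult.translationTaskResult.translatedLiterals[].literalString'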
Explore the translation output
After running the translation job, retrieve the results by specifying the translation job workflow ID using the following command:
curl \
  -H "Content-Type:application/json" \
  -H "Authorization: Bearer TOKEN" \
  -X GET https://bigquerymigration.googleapis.com/v2alpha/projects/PROJECT_ID/locations/LOCATION/workflows/WORKFLOW_ID
Replace the following:
TOKEN: the token for authentication. To generate a token, use the gcloud auth print-access-token command or the OAuth 2.0 playground (use the scope https://www.googleapis.com/auth/cloud-platform).
PROJECT_ID: the project to process the translation.
LOCATION: the location where the job is processed.
WORKFLOW_ID: the ID generated when you created the translation workflow.

The response contains the status of your migration workflow and any completed files in target_return_literals.
You can poll this endpoint to check your workflow's status.
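If you want to wait for completion from a script, the following is a minimal polling sketch; it assumes bash and jq are available, and the five-second interval is arbitrary:

# Poll the workflow until its state is no longer RUNNING.
while true; do
  STATE=$(curl -s \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    https://bigquerymigration.googleapis.com/v2alpha/projects/PROJECT_ID/locations/LOCATION/workflows/WORKFLOW_ID \
    | jq -r '.state')
  echo "Workflow state: ${STATE}"
  if [ "${STATE}" != "RUNNING" ]; then
    break
  fi
  sleep 5
done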