Stay organized with collections Save and categorize content based on your preferences.
Introduction to federated queriesThis page introduces how to use federated queries and provides guidance on querying Spanner, AlloyDB, and Cloud SQL data from BigQuery.
Federated queries let you send a query statement to AlloyDB, Spanner, or Cloud SQL databases and get the result back as a temporary table. Federated queries use the BigQuery Connection API to establish a connection with AlloyDB, Spanner, or Cloud SQL. In your query, you use the EXTERNAL_QUERY
function to send a query statement to the external database, using that database's SQL dialect. The results are converted to GoogleSQL data types.
You can use federated queries with the following data stores:
Workflowbigquery.admin
user creates a connection resource in BigQuery.EXTERNAL_QUERY
SQL function.Another option to query operational databases such as Bigtable, Spanner, Cloud Storage, Google Drive, and Salesforce Data Cloud, is to use external tables and datasets. External datasets and tables let you view tables and their schemas and query them without using an EXTERNAL_QUERY
SQL function. You don't have to bring data back into BigQuery and you can use the BigQuery syntax instead of writing in the specific SQL database dialect of SQL.
For a list of supported locations, see the following sections:
AlloyDB and Cloud SQLFederated queries are only supported in regions that support both the external data source and BigQuery.
You can create a connection and run a federated query across regions according to the following rules:
Single regions
A BigQuery single region can only query a resource in the same region.
For example, if your dataset is in us-east4
, you can query Cloud SQL instances or AlloyDB instances that are located in us-east4
. The query processing location is the BigQuery single region.
Multi-regions
A BigQuery multi-region can query any data source region in the same large geographic area (US, EU). Multi-regional locations aren't available for Cloud SQL instances, because these are only used for backups.
A query that runs in the BigQuery US multi-region can query any single region in the US geographic area, such as us-central1
, us-east4
, or us-west2
.
southamerica-east1
from BigQuery datasets in the US multi-region isn't supported.A query that runs in the BigQuery EU multi-region can query any single region in member states of the European Union, such as europe-north1
or europe-west3
.
The location where the query runs must be the same as the location of the connection resource. For example, queries executed from the US multi-region must use a connection located in the US multi-region.
Caution: Queries that originate in multi-regions can no longer reference connections in single regions. If you have an affected connection, then recreate the connection in the same multi-region as your query.The query performance varies based on the proximity between the dataset and the external data source. For example, a federated query between a dataset in the US multi-region and a Cloud SQL instance in us-central1
is fast. However, if you run the same query between the US multi-region and a Cloud SQL instance in us-east4
, the performance might be slower.
The query processing location is the multi-region location, either US
or EU
.
For Spanner, both regional and multi-regional configurations are supported. A BigQuery single region/multi-region can query a Spanner instance in any supported Spanner region. For more details refer to cross region queries.
Data type mappingsWhen you execute a federated query, the data from the external data source is converted to GoogleSQL types. For more information, see Cloud SQL federated queries.
Quotas and limitsus-west1
while the BigQuery connection is based in the US multi-region. The BigQuery query processing location is US
.Federated queries are subject to the following limitations:
Performance. A federated query is likely to not be as fast as querying only BigQuery storage. BigQuery needs to wait for the source database to execute the external query and temporarily move data from the external data source to BigQuery. Also, the source database might not be optimized for complex analytical queries.
The query performance also varies based on the proximity between the dataset and the external data source. For more information, see Supported regions.
Federated queries are read-only. The external query that is executed in the source database must be read-only. Therefore, DML or DDL statements are not supported.
Unsupported data types. If your external query contains a data type that is unsupported in BigQuery, the query fails immediately. You can cast the unsupported data type to a different supported data type.
Customer-managed encryption keys (CMEK). CMEK is configured separately for BigQuery and for external data sources. If you configure the source database to use CMEK but not BigQuery, then the temporary table that contains results of a federated query is encrypted with a Google-owned and Google-managed encryption key.
If you are using the on-demand pricing model, you are charged for the number of bytes returned from the external query when executing federated queries from BigQuery. For more information, see On-demand analysis pricing.
If you are using BigQuery editions, you are charged based on the number of slots you use. For more information, see Capacity compute pricing.
Federated queries are subject to the optimization technique known as SQL pushdowns. They improve the performance of a query by delegating operations like filtering down to the external data source instead of performing them in BigQuery. Reducing the amount of data transferred from the external data source can reduce query execution time and lower costs. SQL pushdowns include column pruning (SELECT
clauses) and filter pushdowns (WHERE
clauses).
When you use theEXTERNAL_QUERY
function, SQL pushdowns work by rewriting the original query. In the following example, the EXTERNAL_QUERY
function is used to communicate with a Cloud SQL database:
SELECT COUNT(*) FROM ( SELECT * FROM EXTERNAL_QUERY("CONNECTION_ID", "select * from operations_table") ) WHERE a = 'Y' AND b NOT IN ('COMPLETE','CANCELLED');
Replace CONNECTION_ID
with the ID of the BigQuery connection.
Without SQL pushdowns, the following query is sent to Cloud SQL:
SELECT * FROM operations_table
When this query is executed, the entire table is sent back to BigQuery, even though only some rows and columns are needed.
With SQL pushdowns, the following query is sent to Cloud SQL:
SELECT `a`, `b` FROM ( SELECT * FROM operations_table) t WHERE ((`a` = 'Y') AND (NOT `b` IN ('COMPLETE', 'CANCELLED')))
When this query is executed, only two columns and the rows that match the filtering predicate are sent back to BigQuery.
SQL pushdowns are also applied when running federated queries with Spanner external datasets.
You can examine applied pushdowns (if any) in the query plan.
LimitationsSQL pushdowns have various limitations that vary depending on the external data source and on the way you query data.
Limitations for query federation when usingEXTERNAL_QUERY
SELECT * FROM T
.BOOL
, INT64
, FLOAT64
, STRING
, DATE
, DATETIME
, TIMESTAMP
. Literals that are structs aren't supported.BOOL
, INT64
, FLOAT64
, STRING
, DATE
, DATETIME
, TIMESTAMP
, BYTE
or Arrays. Literals that are structs aren't supported.The following are supported SQL functions by data source. No functions are supported for SAP Datasphere.
Cloud SQL MySQLAND
, OR
, NOT
.=
, >
, >=
, <
, <=
, <>
, IN
, BETWEEN
, IS NULL
.+
, -
, *
(only for INT64
and FLOAT64
).AND
, OR
, NOT
.=
, >
, >=
, <
, <=
, <>
, IN
, BETWEEN
, IS NULL
.+
, -
, *
, /
(only for INT64
, FLOAT64
, and DATE
types, except for DATE
subtraction).AND
, OR
, NOT
.=
, >
, >=
, <
, <=
, <>
, IN
, BETWEEN
, IS NULL
.+
, -
, *
, /
(only for INT64
, FLOAT64
, NUMERIC
).AND
, OR
, NOT
.=
, >
, >=
, <
, <=
, <>
, IN
, BETWEEN
, IS NULL
.+
, -
, *
, /
(only for INT64
, FLOAT64
, NUMERIC
).SAFE_ADD
, SAFE_SUBTRACT
, SAFE_MULTIPLY
, SAFE_DIVIDE
(only for INT64
, FLOAT64
, NUMERIC
).An external data source might have a collation set on a column (for example, case-insensitivity). When you execute a federated query, the remote database takes into account the configured collation.
Consider the following example where you have a flag
column with a case-insensitive collation in the external data source:
SELECT * FROM EXTERNAL_QUERY("CONNECTION_ID", "select * from operations_table where flag = 'Y'")
Replace CONNECTION_ID
with the ID of the BigQuery connection.
The preceding query returns rows where flag
is y
or Y
because the query is executed on the external data source.
However, for query federation with Cloud SQL, SAP Datasphere, or AlloyDB data sources, if you add a filter on your main query, the query is executed on the BigQuery side with the default collation. See the following query:
SELECT * FROM ( SELECT * FROM EXTERNAL_QUERY("CONNECTION_ID", "select * from operations_table") ) WHERE flag = 'Y'
Due to the default case-sensitive collation in BigQuery, the preceding query only returns rows where the flag is Y
and filters out rows where the flag is y
. To make your WHERE
clause case-insensitive, specify the collation in the query:
SELECT * FROM ( SELECT * FROM EXTERNAL_QUERY("CONNECTION_ID", "select * from operations_table") ) WHERE COLLATE(flag, 'und:ci') = 'Y'What's next
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4