Create clustered tables
You can reduce the amount of data processed by a query by using clustered tables in BigQuery.
With clustered tables, table data is organized based on the values of specified columns, also called the clustering columns. BigQuery sorts the data by the clustered columns, then stores the rows that have similar values in the same or nearby physical blocks. When a query filters on a clustered column, BigQuery efficiently scans only the relevant blocks and skips the data that doesn't match the filter.
Required permissions
To create a table, you need the following IAM permissions:
bigquery.tables.create
bigquery.tables.updateData
bigquery.jobs.create
Additionally, you might require the bigquery.tables.getData permission to access the data that you write to the table.
Each of the following predefined IAM roles includes the permissions that you need in order to create a table:
roles/bigquery.dataEditor
roles/bigquery.dataOwner
roles/bigquery.admin (includes the bigquery.jobs.create permission)
roles/bigquery.user (includes the bigquery.jobs.create permission)
roles/bigquery.jobUser (includes the bigquery.jobs.create permission)
Additionally, if you have the bigquery.datasets.create permission, you can create and update tables in the datasets that you create.
For more information on IAM roles and permissions in BigQuery, see Predefined roles and permissions.
Table naming requirements
When you create a table in BigQuery, the table name must be unique per dataset. The table name can:
Contain up to 1,024 UTF-8 bytes.
Contain Unicode characters in category L (letter), M (mark), N (number), Pc (connector, including underscore), Pd (dash), or Zs (space).
The following are all examples of valid table names: table 01, ग्राहक, 00_お客様, étudiant-01.
Caveats:
Table names are case-sensitive by default, so mytable and MyTable can coexist in the same dataset, unless they are part of a dataset with case-sensitivity turned off.
If you include multiple dot operators (.) in a sequence, the duplicate operators are implicitly stripped. For example, project_name....dataset_name..table_name becomes project_name.dataset_name.table_name.
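The dot-stripping behavior can be modeled with a short Python sketch (illustrative only; normalize_table_path is not a BigQuery API, just a model of the normalization):

```python
import re

def normalize_table_path(path: str) -> str:
    """Collapse runs of consecutive dot operators into a single dot,
    mirroring how BigQuery implicitly strips duplicate dots."""
    return re.sub(r"\.{2,}", ".", path)

print(normalize_table_path("project_name....dataset_name..table_name"))
# project_name.dataset_name.table_name
```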
You can specify the clustering columns when you create a table in BigQuery. After the table is created, you can modify the clustering columns. For details, see Modifying the clustering specification.
Clustering columns must be top-level, non-repeated columns, and they must be one of the following data types:
BIGNUMERIC
BOOL
DATE
DATETIME
GEOGRAPHY
INT64
NUMERIC
RANGE
STRING
TIMESTAMP
You can specify up to four clustering columns. When you specify multiple columns, the order of the columns determines how the data is sorted. For example, if the table is clustered by columns a, b, and c, the data is sorted in the same order: first by column a, then by column b, and then by column c. As a best practice, place the most frequently filtered or aggregated column first.
The order of your clustering columns also affects query performance and pricing. For more information about query best practices for clustered tables, see Querying clustered tables.
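The sort order described above behaves like a multi-key sort. The following Python sketch (illustrative only, not BigQuery internals) shows how rows clustered by columns a, b, and c end up ordered:

```python
rows = [
    {"a": 2, "b": 1, "c": 9},
    {"a": 1, "b": 3, "c": 4},
    {"a": 1, "b": 2, "c": 7},
    {"a": 2, "b": 1, "c": 3},
]

# Sort first by column a, then b, then c -- the same precedence
# BigQuery applies when a table is clustered by (a, b, c).
clustered_order = sorted(rows, key=lambda r: (r["a"], r["b"], r["c"]))

for row in clustered_order:
    print(row)
```

Rows that share a value of a are grouped together, ordered among themselves by b and then c, which is why filtering on the leading column prunes the most data.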
Create an empty clustered table with a schema definition
To create an empty clustered table with a schema definition, use one of the following options.
SQL
Use the CREATE TABLE DDL statement with the CLUSTER BY option. The following example creates a clustered table named myclusteredtable in mydataset:
In the Google Cloud console, go to the BigQuery page.
In the query editor, enter the following statement:
CREATE TABLE mydataset.myclusteredtable
(
  customer_id STRING,
  transaction_amount NUMERIC
)
CLUSTER BY customer_id
OPTIONS (
  description = 'a table clustered by customer_id');
Click Run.
For more information about how to run queries, see Run an interactive query.
bq
Use the bq mk command with the following flags:
--table (or the -t shortcut).
--schema. You can supply the table's schema definition inline or use a JSON schema file.
--clustering_fields. You can specify up to four clustering columns.
Optional parameters include --expiration, --description, --time_partitioning_type, --time_partitioning_field, --time_partitioning_expiration, --destination_kms_key, and --label.
If you are creating a table in a project other than your default project, add the project ID to the dataset in the following format: project_id:dataset.
--destination_kms_key is not demonstrated here. For information about using --destination_kms_key, see customer-managed encryption keys.
Enter the following command to create an empty clustered table with a schema definition:
bq mk \
  --table \
  --expiration INTEGER1 \
  --schema SCHEMA \
  --clustering_fields CLUSTER_COLUMNS \
  --description "DESCRIPTION" \
  --label KEY:VALUE,KEY:VALUE \
  PROJECT_ID:DATASET.TABLE
Replace the following:
INTEGER1: the default lifetime, in seconds, for the table. The minimum value is 3,600 seconds (one hour). The expiration time evaluates to the current UTC time plus the integer value. If you set the table's expiration time when you create a table, the dataset's default table expiration setting is ignored. Setting this value deletes the table after the specified time.
SCHEMA: an inline schema definition in the format COLUMN:DATA_TYPE,COLUMN:DATA_TYPE or the path to the JSON schema file on your local machine.
CLUSTER_COLUMNS: a comma-separated list of up to four clustering columns. The list cannot contain any spaces.
DESCRIPTION: a description of the table, in quotes.
KEY:VALUE: the key-value pair that represents a label. You can enter multiple labels using a comma-separated list.
PROJECT_ID: your project ID.
DATASET: a dataset in your project.
TABLE: the name of the table you're creating.
When you specify the schema on the command line, you cannot include a RECORD (STRUCT) type, you cannot include a column description, and you cannot specify the column's mode. All modes default to NULLABLE. To include descriptions, modes, and RECORD types, supply a JSON schema file instead.
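As an illustration of that difference, the following Python sketch writes a JSON schema file equivalent to the inline schema timestamp:timestamp,customer_id:string,transaction_amount:float, extended with the modes and descriptions that an inline schema cannot express (the descriptions and modes shown are hypothetical, not from the official sample):

```python
import json
import tempfile

# JSON-schema equivalent of the inline definition, with modes and
# descriptions added -- features that inline schemas cannot express.
schema = [
    {"name": "timestamp", "type": "TIMESTAMP", "mode": "REQUIRED",
     "description": "Time of the transaction"},
    {"name": "customer_id", "type": "STRING", "mode": "REQUIRED",
     "description": "Column used to cluster the table"},
    {"name": "transaction_amount", "type": "FLOAT", "mode": "NULLABLE",
     "description": "Transaction amount"},
]

# Write the schema file; its path can then be passed to bq mk --schema.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(schema, f, indent=2)
    schema_path = f.name

print(schema_path)
```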
Examples:
Enter the following command to create a clustered table named myclusteredtable in mydataset in your default project. The table's expiration is set to 2,592,000 seconds (30 days), the description is set to This is my clustered table, and the label is set to organization:development. The command uses the -t shortcut instead of --table.
The schema is specified inline as timestamp:timestamp,customer_id:string,transaction_amount:float. The specified clustering field customer_id is used to cluster the table.
bq mk \
-t \
--expiration 2592000 \
--schema 'timestamp:timestamp,customer_id:string,transaction_amount:float' \
--clustering_fields customer_id \
--description "This is my clustered table" \
--label organization:development \
mydataset.myclusteredtable
Enter the following command to create a clustered table named myclusteredtable in myotherproject, not your default project. The description is set to This is my clustered table, and the label is set to organization:development. The command uses the -t shortcut instead of --table. This command does not specify a table expiration. If the dataset has a default table expiration, it is applied. If the dataset has no default table expiration, the table never expires.
The schema is specified in a local JSON file: /tmp/myschema.json. The customer_id field is used to cluster the table.
bq mk \
-t \
--schema /tmp/myschema.json \
--clustering_fields=customer_id \
--description "This is my clustered table" \
--label organization:development \
myotherproject:mydataset.myclusteredtable
After the table is created, you can update the table's description and labels.
Terraform
Use the google_bigquery_table resource.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
The following example creates a table named mytable that is clustered on the ID and Created columns:
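The referenced Terraform sample is not reproduced on this page. As a minimal sketch (the dataset ID, location, and schema below are assumptions, not the official sample), a google_bigquery_table resource clustered on ID and Created might look like:

```hcl
resource "google_bigquery_dataset" "default" {
  dataset_id = "mydataset"
  location   = "US"
}

resource "google_bigquery_table" "default" {
  dataset_id = google_bigquery_dataset.default.dataset_id
  table_id   = "mytable"

  # Order matters: data is sorted by ID first, then Created.
  clustering = ["ID", "Created"]

  schema = jsonencode([
    { name = "ID",      type = "INT64",     mode = "NULLABLE" },
    { name = "Created", type = "TIMESTAMP", mode = "NULLABLE" },
    { name = "Item",    type = "STRING",    mode = "NULLABLE" },
  ])
}
```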
To apply your Terraform configuration in a Google Cloud project, complete the steps in the following sections.
Prepare Cloud Shell
Set the default Google Cloud project where you want to apply your Terraform configurations. You only need to run this command once per project, and you can run it in any directory.
export GOOGLE_CLOUD_PROJECT=PROJECT_ID
Environment variables are overridden if you set explicit values in the Terraform configuration file.
Prepare the directory
Each Terraform configuration file must have its own directory (also called a root module). In Cloud Shell, create a directory and a new file within that directory. The filename must have the .tf extension—for example main.tf. In this tutorial, the file is referred to as main.tf.
mkdir DIRECTORY && cd DIRECTORY && touch main.tf
If you are following a tutorial, you can copy the sample code in each section or step.
Copy the sample code into the newly created main.tf.
Optionally, copy the code from GitHub. This is recommended when the Terraform snippet is part of an end-to-end solution.
Apply the changes
Initialize Terraform. You only need to do this once per directory.
terraform init
Optionally, to use the latest Google provider version, include the -upgrade option:
terraform init -upgrade
Review the configuration and verify that the resources that Terraform is going to create or update match your expectations:
terraform plan
Make corrections to the configuration as necessary.
Apply the Terraform configuration by running the following command and entering yes at the prompt:
terraform apply
Wait until Terraform displays the "Apply complete!" message.
API
Call the tables.insert method with a defined table resource that specifies the clustering.fields property and the schema property.
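As a sketch, the table resource passed to tables.insert might look like the following JSON (the project, dataset, table, and field names are illustrative placeholders):

```json
{
  "tableReference": {
    "projectId": "myproject",
    "datasetId": "mydataset",
    "tableId": "myclusteredtable"
  },
  "schema": {
    "fields": [
      {"name": "customer_id", "type": "STRING"},
      {"name": "transaction_amount", "type": "NUMERIC"}
    ]
  },
  "clustering": {
    "fields": ["customer_id"]
  }
}
```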
Python
Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Go
Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Create a clustered table from a query result
There are two ways to create a clustered table from a query result:
Use a CREATE TABLE AS SELECT DDL statement. For more information about this method, see Creating a clustered table from the result of a query on the Using data definition language statements page.
Use a bq query command that includes a DDL statement.
You can create a clustered table by querying either a partitioned table or a non-partitioned table. You cannot change an existing table to a clustered table by using query results.
When you create a clustered table from a query result, you must use GoogleSQL. Currently, legacy SQL is not supported for querying clustered tables or for writing query results to clustered tables.
SQL
To create a clustered table from a query result, use the CREATE TABLE DDL statement with the CLUSTER BY option. The following example creates a new table clustered by customer_id by querying an existing unclustered table:
In the Google Cloud console, go to the BigQuery page.
In the query editor, enter the following statement:
CREATE TABLE mydataset.clustered_table
(
  customer_id STRING,
  transaction_amount NUMERIC
)
CLUSTER BY customer_id
AS (
  SELECT * FROM mydataset.unclustered_table
);
Click Run.
For more information about how to run queries, see Run an interactive query.
bq
Enter the following command to create a new, clustered destination table from a query result:
bq --location=LOCATION query \
  --use_legacy_sql=false \
  'QUERY'
Replace the following:
LOCATION: the name of your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
QUERY: a query in GoogleSQL syntax. Currently, you cannot use legacy SQL to query clustered tables or to write query results to clustered tables. The query can contain a CREATE TABLE DDL statement that specifies the options for creating your clustered table. You can use DDL rather than specifying the individual command-line flags.
Examples:
Enter the following command to write query results to a clustered destination table named myclusteredtable in mydataset, which is in your default project. The query retrieves data from a non-partitioned table, mytable. The table's customer_id column is used to cluster the table, and its timestamp column is used to partition the table.
bq query --use_legacy_sql=false \
'CREATE TABLE
mydataset.myclusteredtable
PARTITION BY
DATE(timestamp)
CLUSTER BY
customer_id
AS (
SELECT
*
FROM
`mydataset.mytable`
);'
API
To save query results to a clustered table, call the jobs.insert method, configure a query job, and include a CREATE TABLE DDL statement that creates your clustered table. Specify your location in the location property in the jobReference section of the job resource.
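As a sketch, the job resource sent to jobs.insert might look like the following JSON (project, location, and query text are illustrative placeholders):

```json
{
  "jobReference": {
    "projectId": "myproject",
    "location": "US"
  },
  "configuration": {
    "query": {
      "query": "CREATE TABLE mydataset.myclusteredtable CLUSTER BY customer_id AS SELECT * FROM mydataset.mytable",
      "useLegacySql": false
    }
  }
}
```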
Create a clustered table when you load data
You can create a clustered table by specifying clustering columns when you load data into a new table. You do not need to create an empty table before loading data into it. You can create the clustered table and load your data at the same time.
For more information about loading data, see Introduction to loading data into BigQuery.
To define clustering when defining a load job:
SQL
Use the LOAD DATA statement. The following example loads Avro data to create a table that is partitioned by the transaction_date field and clustered by the customer_id field. It also configures the partitions to expire after three days.
In the Google Cloud console, go to the BigQuery page.
In the query editor, enter the following statement:
LOAD DATA INTO mydataset.mytable
PARTITION BY transaction_date
CLUSTER BY customer_id
OPTIONS (
  partition_expiration_days = 3)
FROM FILES(
  format = 'AVRO',
  uris = ['gs://bucket/path/file.avro']);
Click Run.
For more information about how to run queries, see Run an interactive query.
API
To define a clustering configuration when creating a table through a load job, you can populate the Clustering properties for the table.
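As a sketch, the load configuration in the job resource might look like the following JSON (bucket, project, dataset, and field names are illustrative placeholders):

```json
{
  "configuration": {
    "load": {
      "sourceUris": ["gs://bucket/path/file.avro"],
      "sourceFormat": "AVRO",
      "destinationTable": {
        "projectId": "myproject",
        "datasetId": "mydataset",
        "tableId": "mytable"
      },
      "clustering": {
        "fields": ["customer_id"]
      }
    }
  }
}
```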
Go
Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Python
Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
What's next
For more information about INFORMATION_SCHEMA, see Introduction to BigQuery INFORMATION_SCHEMA.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-08-07 UTC.