Create object tables
This document describes how to access unstructured data in BigQuery by creating an object table.
To create an object table, you must complete the following tasks (see the example after this list):
1. Create a connection to read objects from Cloud Storage.
2. Grant the Storage Object Viewer (roles/storage.objectViewer) role to the service account associated with the connection.
3. Create the object table by using the CREATE EXTERNAL TABLE statement.
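As an illustration of the second task, granting the role on a bucket with gcloud might look like the following; the bucket name and the connection's service account address are placeholders for your own values:
# Grant the connection's service account read access to the bucket.
gcloud storage buckets add-iam-policy-binding gs://my-bucket \
    --member="serviceAccount:CONNECTION_SERVICE_ACCOUNT" \
    --role="roles/storage.objectViewer"
You can find the connection's service account in the connection details in the Google Cloud console.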
statement.In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.Verify that billing is enabled for your Google Cloud project.
Enable the BigQuery and BigQuery Connection API APIs.
Required roles
To work with object tables, your users need the following IAM permissions based on their role in your organization. For more information on user roles, see Security model. For more information about giving permissions, see Viewing the grantable roles on resources.
Data lake administrator
To get the permissions that you need to connect to Cloud Storage, ask your administrator to grant you the BigQuery Connection Admin (roles/bigquery.connectionAdmin) role on the project.
To get the permissions that you need to create and manage Cloud Storage buckets, ask your administrator to grant you the Storage Admin (roles/storage.admin) role on the project.
These predefined roles contain the permissions required to connect to Cloud Storage and to create and manage Cloud Storage buckets. The exact permissions that are required are the following:
bigquery.connections.create
bigquery.connections.get
bigquery.connections.list
bigquery.connections.update
bigquery.connections.use
bigquery.connections.delete
storage.buckets.*
storage.objects.*
Data warehouse administrator
To get the permissions that you need to create object tables, ask your administrator to grant you the following roles on the project:
BigQuery Data Editor (roles/bigquery.dataEditor) role
BigQuery Connection Admin (roles/bigquery.connectionAdmin) role
These predefined roles contain the permissions required to create object tables. The exact permissions that are required are the following:
bigquery.tables.create
bigquery.tables.update
bigquery.connections.delegate
Data analyst
To get the permissions that you need to query object tables, ask your administrator to grant you the following roles on the project:
BigQuery Data Viewer (roles/bigquery.dataViewer) role
BigQuery Connection User (roles/bigquery.connectionUser) role
These predefined roles contain the permissions required to query object tables. The exact permissions that are required are the following:
bigquery.jobs.create
bigquery.tables.get
bigquery.tables.getData
bigquery.readsessions.create
You might also be able to get these permissions with custom roles or other predefined roles.
Important: We recommend that you don't grant data analysts the ability to bind tables to connections (as the BigQuery Connection Admin role does). Otherwise, data analysts can create new object tables that don't have any access controls, thus circumventing controls placed by data warehouse administrators.
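As an illustration, granting the data analyst roles on a project with gcloud might look like the following; the project ID and user email are placeholders, and the same pattern applies to the administrator roles above:
# Grant the data analyst roles to a hypothetical user account.
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="user:analyst@example.com" \
    --role="roles/bigquery.dataViewer"
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="user:analyst@example.com" \
    --role="roles/bigquery.connectionUser"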
Before you create an object table, you must have an existing dataset to contain it. For more information, see Creating datasets.
To create an object table:
SQL
Use the CREATE EXTERNAL TABLE statement.
In the Google Cloud console, go to the BigQuery page.
In the query editor, enter the following statement:
CREATE EXTERNAL TABLE `PROJECT_ID.DATASET_ID.TABLE_NAME`
WITH CONNECTION {`PROJECT_ID.REGION.CONNECTION_ID` | DEFAULT}
OPTIONS(
  object_metadata = 'SIMPLE',
  uris = ['BUCKET_PATH'[,...]],
  max_staleness = STALENESS_INTERVAL,
  metadata_cache_mode = 'CACHE_MODE');
Replace the following:
PROJECT_ID: your project ID.
DATASET_ID: the ID of the dataset to contain the object table.
TABLE_NAME: the name of the object table.
REGION: the region or multi-region that contains the connection.
CONNECTION_ID: the ID of the cloud resource connection to use with this object table. The connection determines which service account is used to read data from Cloud Storage.
When you view the connection details in the Google Cloud console, the connection ID is the value in the last section of the fully qualified connection ID that is shown in Connection ID, for example projects/myproject/locations/connection_location/connections/myconnection.
To use a default connection, specify DEFAULT instead of the connection string containing PROJECT_ID.REGION.CONNECTION_ID.
BUCKET_PATH: the path to the Cloud Storage bucket that contains the objects represented by the object table, in the format ['gs://bucket_name/[folder_name/]*'].
You can use one asterisk (*) wildcard character in each path to limit the objects included in the object table. For example, if the bucket contains several types of unstructured data, you could create the object table over only PDF objects by specifying ['gs://bucket_name/*.pdf']. For more information, see Wildcard support for Cloud Storage URIs.
You can specify multiple buckets for the uris option by providing multiple paths, for example ['gs://mybucket1/*', 'gs://mybucket2/folder5/*'].
For more information about using Cloud Storage URIs in BigQuery, see Cloud Storage resource path.
STALENESS_INTERVAL: specifies whether cached metadata is used by operations against the object table, and how fresh the cached metadata must be in order for the operation to use it. For more information on metadata caching considerations, see Metadata caching for performance.
To disable metadata caching, specify 0. This is the default.
To enable metadata caching, specify an interval literal value between 30 minutes and 7 days. For example, specify INTERVAL 4 HOUR for a 4 hour staleness interval. With this value, operations against the table use cached metadata if it has been refreshed within the past 4 hours. If the cached metadata is older than that, the operation retrieves metadata from Cloud Storage instead.
CACHE_MODE: specifies whether the metadata cache is refreshed automatically or manually. For more information on metadata caching considerations, see Metadata caching for performance.
Set to AUTOMATIC for the metadata cache to be refreshed at a system-defined interval, usually somewhere between 30 and 60 minutes.
Set to MANUAL if you want to refresh the metadata cache on a schedule you determine. In this case, you can call the BQ.REFRESH_EXTERNAL_METADATA_CACHE system procedure to refresh the cache.
You must set CACHE_MODE if STALENESS_INTERVAL is set to a value greater than 0.
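For example, with MANUAL mode you can refresh the cache on demand by calling the system procedure; the fully qualified table name below is a placeholder:
-- Manually refresh the metadata cache for an object table.
CALL BQ.REFRESH_EXTERNAL_METADATA_CACHE('PROJECT_ID.DATASET_ID.TABLE_NAME');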
Click Run.
For more information about how to run queries, see Run an interactive query.
Examples
The following example creates an object table with a metadata cache staleness interval of 1 day:
CREATE EXTERNAL TABLE `my_dataset.object_table`
WITH CONNECTION `us.my-connection`
OPTIONS(
  object_metadata = 'SIMPLE',
  uris = ['gs://mybucket/*'],
  max_staleness = INTERVAL 1 DAY,
  metadata_cache_mode = 'AUTOMATIC');
The following example creates an object table over the objects in three Cloud Storage buckets:
CREATE EXTERNAL TABLE `my_dataset.object_table`
WITH CONNECTION `us.my-connection`
OPTIONS(
  object_metadata = 'SIMPLE',
  uris = ['gs://bucket1/*','gs://bucket2/folder1/*','gs://bucket3/*']);
The following example creates an object table over just the PDF objects in a Cloud Storage bucket:
CREATE EXTERNAL TABLE `my_dataset.object_table`
WITH CONNECTION `us.my-connection`
OPTIONS(
  object_metadata = 'SIMPLE',
  uris = ['gs://bucket1/*.pdf']);
bq
Use the bq mk command.
bq mk --table \
    --external_table_definition=BUCKET_PATH@REGION.CONNECTION_ID \
    --object_metadata=SIMPLE \
    --max_staleness=STALENESS_INTERVAL \
    --metadata_cache_mode=CACHE_MODE \
    PROJECT_ID:DATASET_ID.TABLE_NAME
Replace the following:
PROJECT_ID: your project ID.
DATASET_ID: the ID of the dataset to contain the object table.
TABLE_NAME: the name of the object table.
REGION: the region or multi-region that contains the connection.
CONNECTION_ID: the ID of the cloud resource connection to use with this external table. The connection determines which service account is used to read data from Cloud Storage.
When you view the connection details in the Google Cloud console, the connection ID is the value in the last section of the fully qualified connection ID that is shown in Connection ID, for example projects/myproject/locations/connection_location/connections/myconnection.
BUCKET_PATH: the path to the Cloud Storage bucket that contains the objects represented by the object table, in the format gs://bucket_name/[folder_name/]*.
You can use one asterisk (*) wildcard character in each path to limit the objects included in the object table. For example, if the bucket contains several types of unstructured data, you could create the object table over only PDF objects by specifying gs://bucket_name/*.pdf. For more information, see Wildcard support for Cloud Storage URIs.
You can specify multiple buckets for the uris option by providing multiple paths, for example gs://mybucket1/*,gs://mybucket2/folder5/*.
For more information about using Cloud Storage URIs in BigQuery, see Cloud Storage resource path.
STALENESS_INTERVAL: specifies whether cached metadata is used by operations against the object table, and how fresh the cached metadata must be in order for the operation to use it. For more information on metadata caching considerations, see Metadata caching for performance.
To disable metadata caching, specify 0. This is the default.
To enable metadata caching, specify an interval value between 30 minutes and 7 days, using the Y-M D H:M:S format described in the INTERVAL data type documentation. For example, specify 0-0 0 4:0:0 for a 4 hour staleness interval. With this value, operations against the table use cached metadata if it has been refreshed within the past 4 hours. If the cached metadata is older than that, the operation retrieves metadata from Cloud Storage instead.
CACHE_MODE: specifies whether the metadata cache is refreshed automatically or manually. For more information on metadata caching considerations, see Metadata caching for performance.
Set to AUTOMATIC for the metadata cache to be refreshed at a system-defined interval, usually somewhere between 30 and 60 minutes.
Set to MANUAL if you want to refresh the metadata cache on a schedule you determine. In this case, you can call the BQ.REFRESH_EXTERNAL_METADATA_CACHE system procedure to refresh the cache.
You must set CACHE_MODE if STALENESS_INTERVAL is set to a value greater than 0.
Examples
The following example creates an object table with a metadata cache staleness interval of 1 day:
bq mk --table \
    --external_table_definition=gs://mybucket/*@us.my-connection \
    --object_metadata=SIMPLE \
    --max_staleness='0-0 1 0:0:0' \
    --metadata_cache_mode=AUTOMATIC \
    my_dataset.object_table
The following example creates an object table over the objects in three Cloud Storage buckets:
bq mk --table \
    --external_table_definition=gs://bucket1/*,gs://bucket2/folder1/*,gs://bucket3/*@us.my-connection \
    --object_metadata=SIMPLE \
    my_dataset.object_table
The following example creates an object table over just the PDF objects in a Cloud Storage bucket:
bq mk --table \
    --external_table_definition=gs://bucket1/*.pdf@us.my-connection \
    --object_metadata=SIMPLE \
    my_dataset.object_table
API
Call the tables.insert method. Include an ExternalDataConfiguration object with the objectMetadata field set to SIMPLE in the Table resource that you pass in.
The following example shows how to call this method by using curl:
ACCESS_TOKEN=$(gcloud auth print-access-token)
curl \
-H "Authorization: Bearer ${ACCESS_TOKEN}" \
-H "Content-Type: application/json" \
-X POST \
-d '{"tableReference": {"projectId": "my_project", "datasetId": "my_dataset", "tableId": "object_table_name"}, "externalDataConfiguration": {"objectMetadata": "SIMPLE", "sourceUris": ["gs://mybucket/*"]}}' \
https://www.googleapis.com/bigquery/v2/projects/my_project/datasets/my_dataset/tables
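To confirm that the table was created, you can, for example, inspect it with the bq tool; the project, dataset, and table names match the request above:
# Show the table definition, including externalDataConfiguration.
bq show --format=prettyjson my_project:my_dataset.object_table_name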
Terraform
This example creates an object table with metadata caching enabled with manual refresh.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
The key fields to specify for an object table are google_bigquery_table.external_data_configuration.object_metadata, google_bigquery_table.external_data_configuration.metadata_cache_mode, and google_bigquery_table.max_staleness. For more information on each resource, see the Terraform BigQuery documentation.
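The original sample isn't reproduced here; the following is a minimal sketch of such a configuration, assuming an existing dataset and cloud resource connection (the project, dataset, connection, and bucket names are placeholders):
# Sketch of an object table with manually refreshed metadata caching.
resource "google_bigquery_table" "object_table" {
  dataset_id = "my_dataset"
  table_id   = "object_table"

  # Operations can use cached metadata refreshed within the past day.
  max_staleness = "0-0 1 0:0:0"

  external_data_configuration {
    # Accepts the {project}.{location}.{connection_id} form.
    connection_id       = "my-project.us.my-connection"
    autodetect          = false
    object_metadata     = "SIMPLE"
    metadata_cache_mode = "MANUAL"
    source_uris         = ["gs://mybucket/*"]
  }
}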
To apply your Terraform configuration in a Google Cloud project, complete the steps in the following sections.
Prepare Cloud Shell
Launch Cloud Shell, and set the default Google Cloud project where you want to apply your Terraform configurations.
You only need to run this command once per project, and you can run it in any directory.
export GOOGLE_CLOUD_PROJECT=PROJECT_ID
Environment variables are overridden if you set explicit values in the Terraform configuration file.
Prepare the directory
Each Terraform configuration file must have its own directory (also called a root module).
In Cloud Shell, create a directory and a new file within that directory. The filename must have the .tf extension, for example main.tf. In this tutorial, the file is referred to as main.tf.
mkdir DIRECTORY && cd DIRECTORY && touch main.tf
If you are following a tutorial, you can copy the sample code in each section or step.
Copy the sample code into the newly created main.tf.
Optionally, copy the code from GitHub. This is recommended when the Terraform snippet is part of an end-to-end solution.
Initialize Terraform. You only need to do this once per directory.
terraform init
Optionally, to use the latest Google provider version, include the -upgrade option:
terraform init -upgrade
Apply the changes
Review the configuration and verify that the resources that Terraform is going to create or update match your expectations:
terraform plan
Make corrections to the configuration as necessary.
Apply the Terraform configuration by running the following command and entering yes at the prompt:
terraform apply
Wait until Terraform displays the "Apply complete!" message.
Query object tables
You can query an object table like any other BigQuery table, for example:
SELECT * FROM mydataset.myobjecttable;
Querying an object table returns metadata for the underlying objects. For more information, see Object table schema.
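For example, assuming the column names documented in the object table schema (such as uri, content_type, and size), the following query lists only PDF objects:
-- List PDF objects and their metadata from the object table.
SELECT uri, content_type, size
FROM mydataset.myobjecttable
WHERE uri LIKE '%.pdf';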