Export table data to Cloud Storage
This page describes how to export or extract data from BigQuery tables to Cloud Storage.
After you've loaded your data into BigQuery, you can export the data in several formats. BigQuery can export up to 1 GB of logical data size to a single file. If you are exporting more than 1 GB of data, you must export your data to multiple files. When you export your data to multiple files, the size of the files will vary.
You can also export the results of a query by using the EXPORT DATA statement. You can use EXPORT DATA OPTIONS to specify the format of the exported data.
Finally, you can use a service such as Dataflow to read data from BigQuery instead of exporting it. For more information about using Dataflow to read from and write to BigQuery, see the BigQuery I/O documentation.
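For example, a minimal Apache Beam pipeline in Python that reads a BigQuery table and writes JSON files to Cloud Storage might look like the following sketch. The project, table, and bucket names are placeholders, and the pipeline options shown are only one possible configuration.

# Sketch: read a BigQuery table with Dataflow instead of running an export job.
# "my-project", "mydataset.table1", and the bucket paths are placeholders.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",  # Beam stages intermediate files here
)

with beam.Pipeline(options=options) as pipeline:
    rows = pipeline | "ReadTable" >> beam.io.ReadFromBigQuery(
        table="my-project:mydataset.table1"
    )
    # Each element is a dict keyed by column name; write it out as JSON lines.
    (
        rows
        | "ToJson" >> beam.Map(lambda row: json.dumps(row, default=str))
        | "Write" >> beam.io.WriteToText(
            "gs://my-bucket/output/table1", file_name_suffix=".json"
        )
    )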
Export limitations
When you export data from BigQuery, note the following:

Caution: If you are exporting data to a Cloud Storage bucket, we strongly recommend that you disable the Bucket Lock and Soft Delete retention policies on your bucket. When you export to a bucket with these retention policies, BigQuery attempts to rewrite files to the bucket, which can fail if the bucket's retention policy prevents a file from being overwritten, resulting in additional incurred charges. You can re-enable these policies after the export completes.

- The size of the files generated when you use the EXPORT DATA statement is not guaranteed.
- You cannot choose a compression type other than GZIP when you export data using the Google Cloud console.
- When you export data in JSON format, the symbols <, >, and & are converted by using the unicode notation \uNNNN, where N is a hexadecimal digit. For example, profit&loss becomes profit\u0026loss. This unicode conversion is done to avoid security vulnerabilities.
- The order of exported table data is not guaranteed unless you use the EXPORT DATA statement and specify an ORDER BY clause in the query_statement.
- BigQuery does not support Cloud Storage resource paths that include multiple consecutive slashes after the initial double slash. For example, the following path, although valid in Cloud Storage, does not work in BigQuery: gs://bucket/my//object//name.

Before you begin
Grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document.
Required permissions
To perform the tasks in this document, you need the following permissions.
Permissions to export data from a BigQuery table
To export data from a BigQuery table, you need the bigquery.tables.export IAM permission.
Each of the following predefined IAM roles includes the bigquery.tables.export permission:
roles/bigquery.dataViewer
roles/bigquery.dataOwner
roles/bigquery.dataEditor
roles/bigquery.admin
To run an export job, you need the bigquery.jobs.create IAM permission.
Each of the following predefined IAM roles includes the permissions that you need in order to run an export job:
roles/bigquery.user
roles/bigquery.jobUser
roles/bigquery.admin
To write the data to an existing Cloud Storage bucket, you need the following IAM permissions:
storage.objects.create
storage.objects.delete
Each of the following predefined IAM roles includes the permissions that you need in order to write the data to an existing Cloud Storage bucket:
roles/storage.objectAdmin
roles/storage.admin
For more information about IAM roles and permissions in BigQuery, see Predefined roles and permissions.
Export formats and compression types
BigQuery supports the following data formats and compression types for exported data.
Data format | Supported compression types | Details
CSV | GZIP | You can control the CSV delimiter in your exported data by using the --field_delimiter bq command-line tool flag or the configuration.extract.fieldDelimiter extract job property. Nested and repeated data is not supported.
JSON | GZIP | Nested and repeated data are supported.
Avro | DEFLATE, SNAPPY | GZIP is not supported for Avro exports. Nested and repeated data are supported. See Avro export details.
Parquet | SNAPPY, GZIP, ZSTD | Nested and repeated data are supported. See Parquet export details.
Export data
The following sections show you how to export your table data, table metadata, and query results to Cloud Storage.
Export table data
You can export table data by:
- Using the Google Cloud console
- Using the bq extract command in the bq command-line tool
- Submitting an extract job using the API or client libraries

Select one of the following:
Console
Open the BigQuery page in the Google Cloud console.
In the Explorer panel, expand your project and dataset, then select the table.
In the details panel, click Export and select Export to Cloud Storage.
In the Export to Google Cloud Storage dialog:
- For Select Google Cloud Storage location, browse for the bucket, folder, or file.
- For Export format, choose the format for your exported data.
- For Compression, select a compression format, or select None for no compression.
Click Save to export the table.
To check on the progress of the job, expand the Job history pane and look for the job with the EXTRACT type.
To export views to Cloud Storage, use the EXPORT DATA OPTIONS statement.
SQL
Use the EXPORT DATA statement. The following example exports selected fields from a table named mydataset.table1:
In the Google Cloud console, go to the BigQuery page.
In the query editor, enter the following statement:
EXPORT DATA
  OPTIONS (
    uri = 'gs://bucket/folder/*.csv',
    format = 'CSV',
    overwrite = true,
    header = true,
    field_delimiter = ';')
AS (
  SELECT field1, field2
  FROM mydataset.table1
  ORDER BY field1
);
Click Run.
For more information about how to run queries, see Run an interactive query.
bq
Use the bq extract command with the --destination_format flag.
(Optional) Supply the --location flag and set the value to your location.
Other optional flags include:
- --compression: The compression type to use for exported files.
- --field_delimiter: The character that indicates the boundary between columns in the output file for CSV exports. Both \t and tab are allowed for tab delimiters.
- --print_header: When specified, print header rows for formats that have headers, such as CSV.

bq extract --location=location \
  --destination_format format \
  --compression compression_type \
  --field_delimiter delimiter \
  --print_header=boolean \
  project_id:dataset.table \
  gs://bucket/filename.ext
Where:
- location is the name of your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
- format is the format for the exported data: CSV, NEWLINE_DELIMITED_JSON, AVRO, or PARQUET.
- compression_type is a supported compression type for your data format.
- delimiter is the character that indicates the boundary between columns in CSV exports. \t and tab are accepted names for tab.
- boolean is true or false. When set to true, header rows are printed to the exported data if the data format supports headers. The default value is true.
- project_id:dataset.table is the table that you're exporting. To export a single partition, append the partition's ID to the table name after a $ character.
- gs://bucket/filename.ext is the destination Cloud Storage bucket and filename.

Examples:
For example, the following command exports mydataset.mytable into a gzip compressed file named myfile.csv. myfile.csv is stored in a Cloud Storage bucket named example-bucket.

bq extract \
  --compression GZIP \
  'mydataset.mytable' \
  gs://example-bucket/myfile.csv
The default destination format is CSV. To export into JSON or Avro, use the destination_format flag and set it to either NEWLINE_DELIMITED_JSON or AVRO. For example:

bq extract \
  --destination_format NEWLINE_DELIMITED_JSON \
  'mydataset.mytable' \
  gs://example-bucket/myfile.json
The following command exports mydataset.mytable into an Avro file that is compressed using Snappy. The file is named myfile.avro. myfile.avro is exported to a Cloud Storage bucket named example-bucket.

bq extract \
  --destination_format AVRO \
  --compression SNAPPY \
  'mydataset.mytable' \
  gs://example-bucket/myfile.avro
The following command exports a single partition of mydataset.my_partitioned_table into a CSV file in Cloud Storage:

bq extract \
  --destination_format CSV \
  'mydataset.my_partitioned_table$0' \
  gs://example-bucket/single_partition.csv

API
To export data, create an extract job and populate the job configuration.
(Optional) Specify your location in the location property in the jobReference section of the job resource.
Create an extract job that points to the BigQuery source data and the Cloud Storage destination.
Specify the source table by using the sourceTable configuration object, which contains the project ID, dataset ID, and table ID.

The destination URI(s) property must be fully qualified, in the format gs://bucket/filename.ext. Each URI can contain one '*' wildcard character, and it must come after the bucket name.
Specify the data format by setting the configuration.extract.destinationFormat property. For example, to export a JSON file, set this property to the value NEWLINE_DELIMITED_JSON.
To check the job status, call jobs.get(job_id) with the ID of the job returned by the initial request.
- If status.state = DONE, the job completed successfully.
- If the status.errorResult property is present, the request failed, and that object includes information describing what went wrong.
- If status.errorResult is absent, the job finished successfully, although there might have been some nonfatal errors. Nonfatal errors are listed in the returned job object's status.errors property.

API notes:
As a best practice, generate a unique ID and pass it as jobReference.jobId when calling jobs.insert to create a job. This approach is more robust to network failure because the client can poll or retry on the known job ID.

Calling jobs.insert on a given job ID is idempotent; in other words, you can retry as many times as you like on the same job ID, and at most one of those operations will succeed.
C#
Before trying this sample, follow the C# setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery C# API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Go
Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Node.js
Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
PHP
Before trying this sample, follow the PHP setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery PHP API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Python
Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
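A minimal sketch of an extract job with the google-cloud-bigquery client library follows. The project, dataset, table, and bucket names are placeholders, and the options shown are illustrative rather than required.

# Sketch: export a table to Cloud Storage with the Python client library.
# "my-project", "mydataset", "mytable", and "example-bucket" are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.ExtractJobConfig(
    destination_format="CSV",  # or NEWLINE_DELIMITED_JSON, AVRO, PARQUET
    compression="GZIP",
    print_header=True,
)

extract_job = client.extract_table(
    "my-project.mydataset.mytable",
    "gs://example-bucket/mytable-*.csv.gz",  # wildcard shards output > 1 GB
    job_config=job_config,
    location="US",  # must match the dataset's location
)
extract_job.result()  # waits for completion and raises on error
print(f"Export finished as job {extract_job.job_id}")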
Ruby
Before trying this sample, follow the Ruby setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Ruby API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Export table metadata
To export table metadata from Iceberg tables, use the following SQL statement:

EXPORT TABLE METADATA FROM `[[PROJECT_NAME.]DATASET_NAME.]TABLE_NAME`;

Replace the following:
- PROJECT_NAME: the name of the project for the table. The value defaults to the project that runs the query.
- DATASET_NAME: the name of the dataset for the table.
- TABLE_NAME: the name of the table.

The exported metadata is located in the STORAGE_URI/metadata folder, where STORAGE_URI is the table's storage location set in the options.
Export query results
You can export your query results to Cloud Storage in the Google Cloud console with the following steps:
Open the BigQuery page in the Google Cloud console.
Click SQL query.
Enter a valid GoogleSQL query in the Query editor text area.
Click Run.
When the results are returned, click Save results > Cloud Storage.
In the Export to Google Cloud Storage dialog:
- For Select Google Cloud Storage location, browse for the bucket, folder, or file.
- For Export format, choose the format for your exported data.
- For Compression, select a compression format, or select None for no compression.
Click Save to export the query results.

To check on the progress of the job, expand the Job history pane and look for the job with the EXTRACT type.
Avro export details
BigQuery expresses Avro formatted data in the following ways:
- REQUIRED fields are represented as the corresponding Avro types. For example, a BigQuery INTEGER type maps to an Avro LONG type.
- NULLABLE fields are represented as an Avro union of the corresponding type and "null".
- REPEATED fields are represented as Avro arrays.
- TIMESTAMP data types are represented as the timestamp-micros logical type (it annotates an Avro LONG type) by default in both extract jobs and Export Data SQL. (Caution: you can add use_avro_logical_types=False to Export Data Options to disable the logical type so that the string type is used instead for the timestamp column, but extract jobs always use the Avro logical type.)
- DATE data types are represented as the date logical type (it annotates an Avro INT type) by default in Export Data SQL, but are represented as the string type by default in extract jobs. (Note: you can add use_avro_logical_types=False to Export Data Options to disable the logical type, or use the flag --use_avro_logical_types=True to enable the logical type in extract jobs.)
- TIME data types are represented as the time-micros logical type (it annotates an Avro LONG type) by default in Export Data SQL, but are represented as the string type by default in extract jobs. (Note: you can add use_avro_logical_types=False to Export Data Options to disable the logical type, or use the flag --use_avro_logical_types=True to enable the logical type in extract jobs.)
- DATETIME data types are represented as Avro STRING types (a string type with the custom named logical type datetime) by default in Export Data SQL, but are represented as the string type by default in extract jobs. (Note: you can add use_avro_logical_types=False to Export Data Options to disable the logical type, or use the flag --use_avro_logical_types=True to enable the logical type in extract jobs.)
- Parameterized NUMERIC(P[, S]) and BIGNUMERIC(P[, S]) data types transfer their precision and scale type parameters to the Avro decimal logical type.

Caution: If you export a DATETIME type to Avro, you cannot load the Avro file directly back into the same table schema, because the converted STRING won't match the schema. As a workaround, load the file into a staging table. Then use a SQL query to cast the field to a DATETIME type and save the result to a new table. For more information, see Changing a column's data type.

Note: The Export Data option use_avro_logical_types and the extract job flag --use_avro_logical_types are applied to all the logical types at the same time once specified.

The Avro format can't be used in combination with GZIP compression. To compress Avro data, use the bq command-line tool or the API and specify one of the supported compression types for Avro data: DEFLATE or SNAPPY.
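For illustration, the following Python sketch configures an extract job that writes Snappy-compressed Avro with Avro logical types enabled. The table and bucket names are placeholders.

# Sketch: extract job that writes Snappy-compressed Avro and keeps Avro logical
# types (timestamp-micros, date, and so on) instead of strings.
# The table and bucket names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.ExtractJobConfig(
    destination_format="AVRO",
    compression="SNAPPY",          # GZIP is not supported for Avro exports
    use_avro_logical_types=True,   # corresponds to --use_avro_logical_types
)

client.extract_table(
    "my-project.mydataset.mytable",
    "gs://example-bucket/mytable-*.avro",
    job_config=job_config,
).result()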
Parquet export details
BigQuery converts GoogleSQL data types to the following Parquet data types:

BigQuery data type | Parquet primitive type | Parquet logical type
Integer | INT64 | NONE
Numeric | FIXED_LEN_BYTE_ARRAY | DECIMAL (precision = 38, scale = 9)
Numeric(P[, S]) | FIXED_LEN_BYTE_ARRAY | DECIMAL (precision = P, scale = S)
BigNumeric | FIXED_LEN_BYTE_ARRAY | DECIMAL (precision = 76, scale = 38)
BigNumeric(P[, S]) | FIXED_LEN_BYTE_ARRAY | DECIMAL (precision = P, scale = S)
Floating point | FLOAT | NONE
Boolean | BOOLEAN | NONE
String | BYTE_ARRAY | STRING (UTF8)
Bytes | BYTE_ARRAY | NONE
Date | INT32 | DATE
Datetime | INT64 | TIMESTAMP (isAdjustedToUTC = false, unit = MICROS)
Time | INT64 | TIME (isAdjustedToUTC = true, unit = MICROS)
Timestamp | INT64 | TIMESTAMP (isAdjustedToUTC = false, unit = MICROS)
Geography | BYTE_ARRAY | GEOGRAPHY (edges = spherical)
The Parquet schema represents nested data as a group and repeated records as repeated groups. For more information about using nested and repeated data in BigQuery, see Specifying nested and repeated columns.
Caution: If you export a DATETIME type to Parquet, you cannot load the Parquet file directly back into the same table schema, because the converted value won't match the schema.

You can use the following workarounds for DATETIME types:
- Load the file into a staging table, then use a SQL query to cast the field to a DATETIME and save the result to a new table. For more information, see Changing a column's data type.
- Provide a schema for the table by using the --schema flag in the load job. Define the datetime column as col:DATETIME.

The GEOGRAPHY logical type is represented with GeoParquet metadata added to the exported file(s).
Exporting data into one or more files
The destinationUris property indicates the one or more locations and filenames where BigQuery should export your files.

BigQuery supports a single wildcard operator (*) in each URI. The wildcard can appear anywhere in the filename component. Using the wildcard operator instructs BigQuery to create multiple sharded files based on the supplied pattern. The wildcard operator is replaced with a number (starting at 0), left-padded to 12 digits. For example, a URI with a wildcard at the end of the filename would create files with 000000000000 appended to the first file, and 000000000001 appended to the second file, continuing in that pattern.
The following table describes several possible options for the destinationUris property:

Single URI
Use a single URI if you are exporting table data that is 1 GB or less. This option is the most common use case, as exported data is generally less than the 1 GB maximum value. This option is not supported for the EXPORT DATA statement; you must use a single wildcard URI.
Property definition: ['gs://my-bucket/file-name.json']
Creates: gs://my-bucket/file-name.json

Single wildcard URI
A single wildcard can be used only in the filename component of the URI. Use a single wildcard URI if you think your exported data will be larger than the 1 GB maximum value. BigQuery shards your data into multiple files based on the provided pattern. The size of the exported files will vary.
Property definition: ['gs://my-bucket/file-name-*.json']
Creates:
gs://my-bucket/file-name-000000000000.json
gs://my-bucket/file-name-000000000001.json
gs://my-bucket/file-name-000000000002.json
...
Property definition: ['gs://my-bucket/*']
Creates:
gs://my-bucket/000000000000
gs://my-bucket/000000000001
gs://my-bucket/000000000002
...

Limit the exported file size
When you export more than 1 GB of data in a single export, you must use a wildcard to export the data into multiple files and the size of the files varies. If you need to limit the maximum size of each exported file, one option is to randomly partition your data and then export each partition to a file:
Create a new table that is partitioned and clustered by a new randomly generated column called export_id. The following example shows how to create a new processed_table from an existing table called source_table, which requires n partitions to achieve the chosen file size:

CREATE TABLE my_dataset.processed_table
PARTITION BY RANGE_BUCKET(export_id, GENERATE_ARRAY(0, n, 1))
CLUSTER BY export_id
AS (
  SELECT *, CAST(FLOOR(n*RAND()) AS INT64) AS export_id
  FROM my_dataset.source_table
);
For each integer i between 0 and n-1, run an EXPORT DATA statement on the following query:

SELECT * EXCEPT(export_id)
FROM my_dataset.processed_table
WHERE export_id = i;
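The loop can be driven from a client library. The following Python sketch assumes the processed_table, bucket, and n values from the example above and issues one EXPORT DATA statement per export_id value.

# Sketch: run one EXPORT DATA statement per export_id partition so that each
# output stays near the chosen size. Dataset, table, bucket, and n are
# placeholders carried over from the example above.
from google.cloud import bigquery

client = bigquery.Client()
n = 10  # number of random partitions created in processed_table

for i in range(n):
    sql = f"""
    EXPORT DATA OPTIONS (
      uri = 'gs://example-bucket/export/part-{i}-*.csv',
      format = 'CSV',
      overwrite = true,
      header = true)
    AS (
      SELECT * EXCEPT(export_id)
      FROM my_dataset.processed_table
      WHERE export_id = {i}
    );
    """
    client.query(sql).result()  # wait for each export before starting the next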
Example use case
This example shows how you can export data to Cloud Storage.

Suppose you are streaming data to BigQuery from endpoint logs continuously. A daily snapshot is to be exported to Cloud Storage for backup and archival purposes. The best choice is an extract job, subject to certain quotas and limitations.
Submit an extract job with the API or client libraries, passing in a unique ID as jobReference.jobId. Extract jobs are asynchronous. Check the job status using the unique job ID used to create the job. The job completed successfully if status.state is DONE. If status.errorResult is present, the job failed and needs to be retried.
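For illustration, the following Python sketch submits the extract job with a client-generated job ID and then checks its status by that ID. The project, dataset, table, and bucket names are placeholders.

# Sketch: submit an extract job with a client-generated job ID, then check its
# status by that ID. Project, dataset, table, and bucket names are placeholders.
import uuid
from google.cloud import bigquery

client = bigquery.Client()
job_id = f"daily-snapshot-{uuid.uuid4()}"  # unique ID passed as jobReference.jobId

client.extract_table(
    "my-project.mydataset.endpoint_logs",
    "gs://example-bucket/snapshots/logs-*.json",
    job_config=bigquery.ExtractJobConfig(destination_format="NEWLINE_DELIMITED_JSON"),
    job_id=job_id,
)

# Later, or after a network failure, poll the job by its known ID.
job = client.get_job(job_id)
if job.state == "DONE":
    if job.error_result:
        print(f"Job failed and needs to be retried: {job.error_result}")
    else:
        print("Snapshot export completed successfully")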
Batch data processing
Suppose a nightly batch job is used to load data by a fixed deadline. After this load job completes, a table with statistics is materialized from a query as described in the preceding section. Data from this table is retrieved and compiled into a PDF report and sent to a regulator.
Since the amount of data that needs to be read is small, use the tabledata.list API to retrieve all rows of the table in JSON dictionary format. If there is more than one page of data, the results have the pageToken property set. To retrieve the next page of results, make another tabledata.list call and include the token value as the pageToken parameter. If the API call fails with a 5xx error, retry with exponential backoff. Most 4xx errors cannot be retried. For better decoupling of BigQuery export and report generation, results should be persisted to disk.
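For illustration, the following Python sketch reads the table through the client library's list_rows method, which wraps tabledata.list, and persists the rows to disk. The table name and output file are placeholders.

# Sketch: read a small results table page by page via tabledata.list (wrapped
# by list_rows in the Python client) and persist it to disk.
# The table name and output path are placeholders.
import json
from google.cloud import bigquery

client = bigquery.Client()
rows = client.list_rows("my-project.mydataset.daily_stats", page_size=500)

with open("daily_stats.jsonl", "w") as out:
    for page in rows.pages:          # the iterator follows pageToken for you
        for row in page:
            out.write(json.dumps(dict(row.items()), default=str) + "\n")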
For information on export job quotas, see Export jobs on the Quotas and limits page.
Usage for export jobs is available in INFORMATION_SCHEMA. The job entry in the JOBS_BY_* system tables for the export job contains a total_bytes_processed value that can be used to monitor the aggregate usage and ensure that it stays under 50 TiB per day. To learn how to query the INFORMATION_SCHEMA.JOBS view to get the total_bytes_processed value, see the INFORMATION_SCHEMA.JOBS schema.
You can view your current usage of query, load, extract, or copy jobs by running an INFORMATION_SCHEMA query to view metadata about the jobs run over a specified time period. You can compare your current usage against the quota limit to determine your quota usage for a particular type of job. The following example query uses the INFORMATION_SCHEMA.JOBS view to list the number of query, load, extract, and copy jobs by project:

SELECT
  SUM(CASE WHEN job_type = "QUERY" THEN 1 ELSE 0 END) AS QRY_CNT,
  SUM(CASE WHEN job_type = "LOAD" THEN 1 ELSE 0 END) AS LOAD_CNT,
  SUM(CASE WHEN job_type = "EXTRACT" THEN 1 ELSE 0 END) AS EXT_CNT,
  SUM(CASE WHEN job_type = "COPY" THEN 1 ELSE 0 END) AS CPY_CNT
FROM `region-REGION_NAME`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE DATE(creation_time) = CURRENT_DATE()
You can set up a Cloud Monitoring alerting policy that monitors the number of bytes exported.
In the Google Cloud console, go to the Alerting page:
If you use the search bar to find this page, then select the result whose subheading is Monitoring.
On the Alerting page, click Create Policy.
Under Policy configuration mode, select Code editor (MQL or PromQL).
In the PromQL query editor, enter the following query:
(
sum by (project_id, quota_metric, location) (increase({"serviceruntime.googleapis.com/quota/rate/net_usage", monitored_resource="consumer_quota", service="bigquery.googleapis.com"}[1m]))
/
max by (project_id, quota_metric, location) ({"serviceruntime.googleapis.com/quota/limit", monitored_resource="consumer_quota", service="bigquery.googleapis.com", limit_name="ExtractBytesPerDay"})
) > 0.01
If Auto-run isn't enabled, then click Run Query.
Configure the rest of your alert and click Create policy.
For detailed steps about creating PromQL-based alerting policies, see Create PromQL-based alerting policies (Console).
Troubleshooting
To diagnose issues with extract jobs, you can use the Logs Explorer to review the logs for a specific extract job and identify possible errors. The following Logs Explorer filter returns information about your extract jobs:
resource.type="bigquery_resource"
protoPayload.methodName="jobservice.insert"
(protoPayload.serviceData.jobInsertRequest.resource.jobConfiguration.query.query=~"EXPORT" OR
protoPayload.serviceData.jobCompletedEvent.eventName="extract_job_completed" OR
protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.query.query=~"EXPORT")
Pricing
For information on data export pricing, see the BigQuery pricing page.
Once the data is exported, you are charged for storing the data in Cloud Storage. For more information, see Cloud Storage pricing.
Table security
To control access to tables in BigQuery, see Control access to resources with IAM.