Loading CSV data from Cloud Storage

When you load CSV data from Cloud Storage, you can load the data into a new table or partition, or you can append to or overwrite an existing table or partition. When your data is loaded into BigQuery, it is converted into columnar format for Capacitor (BigQuery's storage format).
When you load data from Cloud Storage into a BigQuery table, the dataset that contains the table must be in the same regional or multi-regional location as the Cloud Storage bucket.
For information about loading CSV data from a local file, see Loading data into BigQuery from a local data source.
Limitations

You are subject to the following limitations when you load data into BigQuery from a Cloud Storage bucket:
When you load CSV files into BigQuery, note the following:
- DATE columns must use the dash (-) separator and the date must be in the following format: YYYY-MM-DD (year-month-day).
- TIMESTAMP columns must use a dash (-) or slash (/) separator for the date portion of the timestamp, and the date must be in one of the following formats: YYYY-MM-DD (year-month-day) or YYYY/MM/DD (year/month/day). The hh:mm:ss (hour-minute-second) portion of the timestamp must use a colon (:) separator.

Before you begin

Grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document, and create a dataset to store your data.
Required permissions

To load data into BigQuery, you need IAM permissions to run a load job and load data into BigQuery tables and partitions. If you are loading data from Cloud Storage, you also need IAM permissions to access the bucket that contains your data.
Permissions to load data into BigQuery

To load data into a new BigQuery table or partition or to append or overwrite an existing table or partition, you need the following IAM permissions:

- bigquery.tables.create
- bigquery.tables.updateData
- bigquery.tables.update
- bigquery.jobs.create

Each of the following predefined IAM roles includes the permissions that you need in order to load data into a BigQuery table or partition:

- roles/bigquery.dataEditor
- roles/bigquery.dataOwner
- roles/bigquery.admin (includes the bigquery.jobs.create permission)
- bigquery.user (includes the bigquery.jobs.create permission)
- bigquery.jobUser (includes the bigquery.jobs.create permission)

Additionally, if you have the bigquery.datasets.create permission, you can create and update tables using a load job in the datasets that you create.
For more information on IAM roles and permissions in BigQuery, see Predefined roles and permissions.
Permissions to load data from Cloud Storage

To get the permissions that you need to load data from a Cloud Storage bucket, ask your administrator to grant you the Storage Admin (roles/storage.admin) IAM role on the bucket. For more information about granting roles, see Manage access to projects, folders, and organizations.

This predefined role contains the permissions required to load data from a Cloud Storage bucket. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

The following permissions are required to load data from a Cloud Storage bucket:

- storage.buckets.get
- storage.objects.get
- storage.objects.list (required if you are using a URI wildcard)

You might also be able to get these permissions with custom roles or other predefined roles.
Create a dataset

Create a BigQuery dataset to store your data.
CSV compression

You can use the gzip utility to compress CSV files. Note that gzip performs full file compression, unlike the file content compression performed by compression codecs for other file formats, such as Avro. Using gzip to compress your CSV files might have a performance impact; for more information about the trade-offs, see Loading compressed and uncompressed data.
Loading CSV data into a table

To load CSV data from Cloud Storage into a new BigQuery table, select one of the following options:
Console

To follow step-by-step guidance for this task directly in the Cloud Shell Editor, click Guide me:

In the Google Cloud console, go to the BigQuery page.

Tip: You can view an existing table's schema in JSON format by running bq show --format=prettyjson dataset.table.

The following advanced options control how the CSV data is parsed:

- Number of errors allowed: Accept the default value of 0, or enter the maximum number of rows containing errors that can be ignored. If the number of rows with errors exceeds this value, the job results in an invalid message and fails. This option applies only to CSV and JSON files.
- Date Format (Preview): Format elements that define how the DATE values are formatted (for example, MM/DD/YYYY). If this value is present, this format is the only compatible DATE format. Schema autodetection also decides the DATE column type based on this format instead of the existing format. If this value is not present, the DATE field is parsed with the default formats.
- Datetime Format (Preview): Format elements that define how the DATETIME values are formatted (for example, MM/DD/YYYY HH24:MI:SS.FF3). If this value is present, this format is the only compatible DATETIME format. Schema autodetection also decides the DATETIME column type based on this format instead of the existing format. If this value is not present, the DATETIME field is parsed with the default formats.
- Time Format (Preview): Format elements that define how the TIME values are formatted (for example, HH24:MI:SS.FF3). If this value is present, this format is the only compatible TIME format. Schema autodetection also decides the TIME column type based on this format instead of the existing format. If this value is not present, the TIME field is parsed with the default formats.
- Timestamp Format (Preview): Format elements that define how the TIMESTAMP values are formatted (for example, MM/DD/YYYY HH24:MI:SS.FF3). If this value is present, this format is the only compatible TIMESTAMP format. Schema autodetection also decides the TIMESTAMP column type based on this format instead of the existing format. If this value is not present, the TIMESTAMP field is parsed with the default formats.
- Source column match (Preview): The strategy used to match loaded columns to the schema:
  - Default: The behavior is chosen based on how the schema is provided. If autodetect is enabled, the default is to match columns by name; otherwise, the default is to match columns by position. This keeps the behavior backward-compatible.
  - Position: Matches columns by position, assuming that the columns are ordered the same way as the schema.
  - Name: Matches by name by reading the header row as the column names and reordering columns to match the field names in the schema. Column names are read from the last skipped row based on Header rows to skip.
- Header rows to skip: The number of header rows to skip. The default value is 0.

After the table is created, you can update the table's expiration, description, and labels, but you cannot add a partition expiration after a table is created using the Google Cloud console. For more information, see Managing tables.
SQL

Use the LOAD DATA DDL statement. The following example loads a CSV file into the new table mytable:
In the Google Cloud console, go to the BigQuery page.
In the query editor, enter the following statement:
LOAD DATA OVERWRITE mydataset.mytable
  (x INT64, y STRING)
FROM FILES (
  format = 'CSV',
  uris = ['gs://bucket/path/file.csv']);
Click Run.
For more information about how to run queries, see Run an interactive query.
bq

Use the bq load command, specify CSV using the --source_format flag, and include a Cloud Storage URI. You can include a single URI, a comma-separated list of URIs, or a URI containing a wildcard. Supply the schema inline, in a schema definition file, or use schema auto-detect. If you don't specify a schema, and --autodetect is false, and the destination table exists, then the schema of the destination table is used.

(Optional) Supply the --location flag and set the value to your location.
Other optional flags include:

- --allow_jagged_rows: When specified, accept rows in CSV files that are missing trailing optional columns. The missing values are treated as nulls. If not specified, records with missing trailing columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false.
- --allow_quoted_newlines: When specified, allows quoted data sections that contain newline characters in a CSV file. The default value is false.
- --field_delimiter: The character that indicates the boundary between columns in the data. Both \t and tab are allowed for tab delimiters. The default value is a comma (,).
- --null_marker: An optional custom string that represents a NULL value in CSV data.
- --null_markers: (Preview) An optional comma-separated list of custom strings that represent NULL values in CSV data. This option cannot be used with the --null_marker flag.
- --source_column_match: (Preview) Specifies the strategy used to match loaded columns to the schema. You can specify POSITION to match loaded columns by position, assuming that the columns are ordered the same way as the schema. You can also specify NAME to match by name by reading the header row as the column names and reordering columns to match the field names in the schema. If this value is unspecified, then the default is based on how the schema is provided. If --autodetect is enabled, then the default behavior is to match columns by name. Otherwise, the default is to match columns by position.
- --skip_leading_rows: Specifies the number of header rows to skip at the top of the CSV file. The default value is 0.
- --quote: The quote character to use to enclose records. The default value is a double quote ("). To indicate no quote character, use an empty string.
- --max_bad_records: An integer that specifies the maximum number of bad records allowed before the entire job fails. The default value is 0. At most, five errors of any type are returned regardless of the --max_bad_records value.
- --ignore_unknown_values: When specified, allows and ignores extra, unrecognized values in CSV or JSON data.
- --time_zone: (Preview) An optional default time zone that is applied when parsing timestamp values that have no specific time zone in CSV or JSON data.
- --date_format: (Preview) An optional custom string that defines how DATE values are formatted in CSV or JSON data.
- --datetime_format: (Preview) An optional custom string that defines how DATETIME values are formatted in CSV or JSON data.
- --time_format: (Preview) An optional custom string that defines how TIME values are formatted in CSV or JSON data.
- --timestamp_format: (Preview) An optional custom string that defines how TIMESTAMP values are formatted in CSV or JSON data.
- --autodetect: When specified, enables schema auto-detection for CSV and JSON data.
- --time_partitioning_type: Enables time-based partitioning on a table and sets the partition type. Possible values are HOUR, DAY, MONTH, and YEAR. This flag is optional when you create a table partitioned on a DATE, DATETIME, or TIMESTAMP column. The default partition type for time-based partitioning is DAY. You cannot change the partitioning specification on an existing table.
- --time_partitioning_expiration: An integer that specifies (in seconds) when a time-based partition should be deleted. The expiration time evaluates to the partition's UTC date plus the integer value.
- --time_partitioning_field: The DATE or TIMESTAMP column used to create a partitioned table. If time-based partitioning is enabled without this value, an ingestion-time partitioned table is created.
- --require_partition_filter: When enabled, this option requires users to include a WHERE clause that specifies the partitions to query. Requiring a partition filter can reduce cost and improve performance. For more information, see Querying partitioned tables.
- --clustering_fields: A comma-separated list of up to four column names used to create a clustered table.
- --destination_kms_key: The Cloud KMS key for encryption of the table data.
- --column_name_character_map: Defines the scope and handling of characters in column names, with the option of enabling flexible column names. Requires the --autodetect option for CSV files. For more information, see load_option_list.
For more information, see the reference documentation for the bq load command, partitioned tables, clustered tables, and table encryption.
To load CSV data into BigQuery, enter the following command:

bq --location=location load \
    --source_format=format \
    dataset.table \
    path_to_source \
    schema

Where:

- location is your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
- format is CSV.
- schema is a valid schema. Supply the schema inline or in a schema definition file, or use the --autodetect flag instead of supplying a schema definition.

Examples:
The following command loads data from gs://mybucket/mydata.csv into a table named mytable in mydataset. The schema is defined in a local schema file named myschema.json.
bq load \
--source_format=CSV \
mydataset.mytable \
gs://mybucket/mydata.csv \
./myschema.json
The following command loads data from gs://mybucket/mydata.csv into a table named mytable in mydataset. The schema is defined in a local schema file named myschema.json. The CSV file includes two header rows. If --skip_leading_rows is unspecified, the default behavior is to assume the file does not contain headers.
bq load \
--source_format=CSV \
--skip_leading_rows=2 \
mydataset.mytable \
gs://mybucket/mydata.csv \
./myschema.json
The following command loads data from gs://mybucket/mydata.csv into an ingestion-time partitioned table named mytable in mydataset. The schema is defined in a local schema file named myschema.json.
bq load \
--source_format=CSV \
--time_partitioning_type=DAY \
mydataset.mytable \
gs://mybucket/mydata.csv \
./myschema.json
The following command loads data from gs://mybucket/mydata.csv into a new partitioned table named mytable in mydataset. The table is partitioned on the mytimestamp column. The schema is defined in a local schema file named myschema.json.
bq load \
--source_format=CSV \
--time_partitioning_field mytimestamp \
mydataset.mytable \
gs://mybucket/mydata.csv \
./myschema.json
The following command loads data from gs://mybucket/mydata.csv into a table named mytable in mydataset. The schema is auto-detected.
bq load \
--autodetect \
--source_format=CSV \
mydataset.mytable \
gs://mybucket/mydata.csv
The following command loads data from gs://mybucket/mydata.csv into a table named mytable in mydataset. The schema is defined inline in the format field:data_type,field:data_type.
bq load \
--source_format=CSV \
mydataset.mytable \
gs://mybucket/mydata.csv \
qtr:STRING,sales:FLOAT,year:STRING
Note: When you specify the schema using the bq command-line tool, you cannot include a RECORD (STRUCT) type, you cannot include a field description, and you cannot specify the field mode. All field modes default to NULLABLE. To include field descriptions, modes, and RECORD types, supply a JSON schema file instead.
The following command loads data from multiple files in gs://mybucket/ into a table named mytable in mydataset. The Cloud Storage URI uses a wildcard. The schema is auto-detected.
bq load \
--autodetect \
--source_format=CSV \
mydataset.mytable \
gs://mybucket/mydata*.csv
The following command loads data from multiple files in gs://mybucket/ into a table named mytable in mydataset. The command includes a comma-separated list of Cloud Storage URIs with wildcards. The schema is defined in a local schema file named myschema.json.
bq load \
--source_format=CSV \
mydataset.mytable \
"gs://mybucket/00/*.csv","gs://mybucket/01/*.csv" \
./myschema.json
API
1. Create a load job that points to the source data in Cloud Storage.
2. (Optional) Specify your location in the location property in the jobReference section of the job resource.
3. The source URIs property must be fully qualified, in the format gs://bucket/object. Each URI can contain one '*' wildcard character.
4. Specify the CSV data format by setting the sourceFormat property to CSV.
5. To check the job status, call jobs.get(job_id), where job_id is the ID of the job returned by the initial request.
   - If status.state = DONE, the job completed successfully.
   - If the status.errorResult property is present, the request failed, and that object includes information describing what went wrong. When a request fails, no table is created and no data is loaded.
   - If status.errorResult is absent, the job finished successfully, although there might have been some nonfatal errors, such as problems importing a few rows. Nonfatal errors are listed in the returned job object's status.errors property.

API notes:

- Load jobs are atomic and consistent; if a load job fails, none of the data is available, and if a load job succeeds, all of the data is available.
- As a best practice, generate a unique ID and pass it as jobReference.jobId when calling jobs.insert to create a load job. This approach is more robust to network failure because the client can poll or retry on the known job ID.
- Calling jobs.insert on a given job ID is idempotent. You can retry as many times as you like on the same job ID, and at most one of those operations will succeed.
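As an illustration of these notes, the following is a minimal sketch in Python using the google-cloud-bigquery client library, which wraps jobs.insert and jobs.get. The bucket gs://mybucket/mydata.csv and the table mydataset.mytable are placeholder names.

import uuid

from google.cloud import bigquery

client = bigquery.Client()

table_id = "your-project.mydataset.mytable"  # placeholder destination table
uri = "gs://mybucket/mydata.csv"             # placeholder source URI

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    autodetect=True,
)

# Generate a unique job ID so a retry after a network failure reuses the same job.
job_id = f"load_csv_{uuid.uuid4().hex}"

load_job = client.load_table_from_uri(
    uri, table_id, job_id=job_id, job_config=job_config
)

try:
    load_job.result()  # Polls the job until its state is DONE.
    # Nonfatal errors (if any) are listed in load_job.errors.
    print("Loaded table", table_id, "nonfatal errors:", load_job.errors)
except Exception:
    # The load job failed; no table is created and no data is loaded.
    print("Job failed:", load_job.error_result)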
C#

Before trying this sample, follow the C# setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery C# API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Go

Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
PHP

Before trying this sample, follow the PHP setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery PHP API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Use the Client.load_table_from_uri() method to load data from a CSV file in Cloud Storage. Supply an explicit schema definition by setting the LoadJobConfig.schema property to a list of SchemaField objects.
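A minimal sketch of that approach follows, assuming a placeholder destination table mydataset.us_states and a sample CSV with two string columns; the column names and types are illustrative only.

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder destination table; replace with your own project and dataset.
table_id = "your-project.mydataset.us_states"

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("name", "STRING"),
        bigquery.SchemaField("post_abbr", "STRING"),
    ],
    skip_leading_rows=1,  # Skip the header row in the CSV file.
    source_format=bigquery.SourceFormat.CSV,
)

uri = "gs://cloud-samples-data/bigquery/us-states/us-states.csv"

load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
load_job.result()  # Wait for the load job to complete.

table = client.get_table(table_id)
print(f"Loaded {table.num_rows} rows to {table_id}.")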
Ruby

Before trying this sample, follow the Ruby setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Ruby API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Loading CSV data into a table that uses column-based time partitioning

To load CSV data from Cloud Storage into a BigQuery table that uses column-based time partitioning:
Go

Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
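A minimal sketch of such a load with the Python client, assuming a hypothetical source file whose third column, date, is a DATE used as the partitioning column:

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder destination table and source URI.
table_id = "your-project.mydataset.mytable_partitioned"
uri = "gs://mybucket/mydata.csv"

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("name", "STRING"),
        bigquery.SchemaField("post_abbr", "STRING"),
        bigquery.SchemaField("date", "DATE"),
    ],
    skip_leading_rows=1,
    source_format=bigquery.SourceFormat.CSV,
    # Partition the destination table by day on the DATE column.
    time_partitioning=bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="date",
    ),
)

load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
load_job.result()

table = client.get_table(table_id)
print(f"Loaded {table.num_rows} rows into {table_id}.")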
Appending to or overwriting a table with CSV data

You can load additional data into a table either from source files or by appending query results.
In the Google Cloud console, use the Write preference option to specify what action to take when you load data from a source file or from a query result.
You have the following options when you load additional data into a table:
- Write if empty: No bq tool flag. BigQuery API property: WRITE_EMPTY. Writes the data only if the table is empty.
- Append to table: bq tool flag --noreplace or --replace=false; if --[no]replace is unspecified, the default is append. BigQuery API property: WRITE_APPEND (default). Appends the data to the end of the table.
- Overwrite table: bq tool flag --replace or --replace=true. BigQuery API property: WRITE_TRUNCATE. Erases all existing data in a table before writing the new data. This action also deletes the table schema and row-level security, and removes any Cloud KMS key.
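In the API and the client libraries, the write preference corresponds to the writeDisposition job property. A minimal sketch of setting it with the Python client follows; WRITE_APPEND is shown, and WRITE_EMPTY or WRITE_TRUNCATE can be substituted.

from google.cloud import bigquery

# Choose one of the three write preferences for a load job.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    # Alternatives:
    #   bigquery.WriteDisposition.WRITE_EMPTY     (write only if the table is empty)
    #   bigquery.WriteDisposition.WRITE_TRUNCATE  (overwrite the table)
)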
If you load data into an existing table, the load job can append the data or overwrite the table.
Note: This page does not cover appending or overwriting partitioned tables. For information on appending and overwriting partitioned tables, see Appending to and overwriting partitioned table data.

Console

In the Google Cloud console, go to the BigQuery page.

Tip: You can view an existing table's schema in JSON format by running bq show --format=prettyjson dataset.table.

The advanced options (Number of errors allowed, the Date, Datetime, Time, and Timestamp format options, Source column match, and Header rows to skip) behave as described in the preceding Console section.

SQL

Use the LOAD DATA DDL statement. The following example appends a CSV file to the table mytable:
In the Google Cloud console, go to the BigQuery page.
In the query editor, enter the following statement:
LOAD DATA INTO mydataset.mytable
FROM FILES (
  format = 'CSV',
  uris = ['gs://bucket/path/file.csv']);
Click Run.
For more information about how to run queries, see Run an interactive query.
bq

Use the bq load command, specify CSV using the --source_format flag, and include a Cloud Storage URI. You can include a single URI, a comma-separated list of URIs, or a URI containing a wildcard.

Supply the schema inline, in a schema definition file, or use schema auto-detect. If you don't specify a schema, and --autodetect is false, and the destination table exists, then the schema of the destination table is used.

Specify the --replace flag to overwrite the table. Use the --noreplace flag to append data to the table. If no flag is specified, the default is to append data.

It is possible to modify the table's schema when you append or overwrite it. For more information on supported schema changes during a load operation, see Modifying table schemas.

(Optional) Supply the --location flag and set the value to your location.
Other optional flags include:

- --allow_jagged_rows: When specified, accept rows in CSV files that are missing trailing optional columns. The missing values are treated as nulls. If not specified, records with missing trailing columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false.
- --allow_quoted_newlines: When specified, allows quoted data sections that contain newline characters in a CSV file. The default value is false.
- --field_delimiter: The character that indicates the boundary between columns in the data. Both \t and tab are allowed for tab delimiters. The default value is a comma (,).
- --null_marker: An optional custom string that represents a NULL value in CSV data.
- --null_markers: (Preview) An optional comma-separated list of custom strings that represent NULL values in CSV data. This option cannot be used with the --null_marker flag.
- --source_column_match: (Preview) Specifies the strategy used to match loaded columns to the schema. You can specify POSITION to match loaded columns by position, assuming that the columns are ordered the same way as the schema. You can also specify NAME to match by name by reading the header row as the column names and reordering columns to match the field names in the schema. If this value is unspecified, then the default is based on how the schema is provided. If --autodetect is enabled, then the default behavior is to match columns by name. Otherwise, the default is to match columns by position.
- --skip_leading_rows: Specifies the number of header rows to skip at the top of the CSV file. The default value is 0.
- --quote: The quote character to use to enclose records. The default value is a double quote ("). To indicate no quote character, use an empty string.
- --max_bad_records: An integer that specifies the maximum number of bad records allowed before the entire job fails. The default value is 0. At most, five errors of any type are returned regardless of the --max_bad_records value.
- --ignore_unknown_values: When specified, allows and ignores extra, unrecognized values in CSV or JSON data.
- --time_zone: (Preview) An optional default time zone that is applied when parsing timestamp values that have no specific time zone in CSV or JSON data.
- --date_format: (Preview) An optional custom string that defines how DATE values are formatted in CSV or JSON data.
- --datetime_format: (Preview) An optional custom string that defines how DATETIME values are formatted in CSV or JSON data.
- --time_format: (Preview) An optional custom string that defines how TIME values are formatted in CSV or JSON data.
- --timestamp_format: (Preview) An optional custom string that defines how TIMESTAMP values are formatted in CSV or JSON data.
- --autodetect: When specified, enables schema auto-detection for CSV and JSON data.
- --destination_kms_key: The Cloud KMS key for encryption of the table data.

To append to or overwrite a table, enter the following command:

bq --location=location load \
    --[no]replace \
    --source_format=format \
    dataset.table \
    path_to_source \
    schema

Where:

- location is your location. The --location flag is optional. You can set a default value for the location using the .bigqueryrc file.
- format is CSV.
- schema is a valid schema. Supply the schema inline or in a schema definition file, or use the --autodetect flag instead of supplying a schema definition.

Examples:
The following command loads data from gs://mybucket/mydata.csv and overwrites a table named mytable in mydataset. The schema is defined using schema auto-detection.
bq load \
--autodetect \
--replace \
--source_format=CSV \
mydataset.mytable \
gs://mybucket/mydata.csv
The following command loads data from gs://mybucket/mydata.csv and appends data to a table named mytable in mydataset. The schema is defined in a local JSON schema file named myschema.json.
bq load \
--noreplace \
--source_format=CSV \
mydataset.mytable \
gs://mybucket/mydata.csv \
./myschema.json
API
1. Create a load job that points to the source data in Cloud Storage.
2. (Optional) Specify your location in the location property in the jobReference section of the job resource.
3. The source URIs property must be fully qualified, in the format gs://bucket/object. You can include multiple URIs as a comma-separated list. Note that wildcards are also supported.
4. Specify the data format by setting the configuration.load.sourceFormat property to CSV.
5. Specify the write preference by setting the configuration.load.writeDisposition property to WRITE_TRUNCATE or WRITE_APPEND.
Go

Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
To replace the rows in an existing table, set the writeDisposition value in the metadata parameter to 'WRITE_TRUNCATE'.
PHP

Before trying this sample, follow the PHP setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery PHP API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
To replace the rows in an existing table, set the LoadJobConfig.write_disposition property to the WriteDisposition constant WRITE_TRUNCATE.
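A minimal sketch, assuming the destination table your-project.mydataset.us_states already exists and should be overwritten from a placeholder CSV in Cloud Storage:

from google.cloud import bigquery

client = bigquery.Client()

table_id = "your-project.mydataset.us_states"  # placeholder existing table
uri = "gs://mybucket/us-states.csv"            # placeholder source URI

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    # Overwrite the existing rows; use WRITE_APPEND to append instead.
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
load_job.result()

table = client.get_table(table_id)
print(f"Table {table_id} now contains {table.num_rows} rows.")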
Loading hive-partitioned CSV data

BigQuery supports loading hive-partitioned CSV data stored on Cloud Storage and populates the hive partitioning columns as columns in the destination BigQuery managed table. For more information, see Loading Externally Partitioned Data from Cloud Storage.
Details of loading CSV data

This section describes how BigQuery handles various CSV formatting options.
Encoding

BigQuery expects CSV data to be UTF-8 encoded. If you have CSV files with other supported encoding types, you should explicitly specify the encoding so that BigQuery can properly convert the data to UTF-8.

BigQuery supports the following encoding types for CSV files: UTF-8, ISO-8859-1, UTF-16BE, UTF-16LE, UTF-32BE, and UTF-32LE.

If you don't specify an encoding, or if you specify UTF-8 encoding when the CSV file is not UTF-8 encoded, BigQuery attempts to convert the data to UTF-8. Generally, if the CSV file is ISO-8859-1 encoded, your data will be loaded successfully, but it might not exactly match what you expect. If the CSV file is UTF-16BE, UTF-16LE, UTF-32BE, or UTF-32LE encoded, the load might fail. To avoid unexpected failures, specify the correct encoding by using the --encoding flag.
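For example, with the Python client library the encoding is set on the load job configuration. A minimal sketch, assuming a hypothetical Latin-1 encoded file gs://mybucket/latin1.csv:

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    autodetect=True,
    # Tell BigQuery the file is ISO-8859-1 so it is converted to UTF-8 correctly.
    encoding="ISO-8859-1",
)

load_job = client.load_table_from_uri(
    "gs://mybucket/latin1.csv",             # placeholder source URI
    "your-project.mydataset.latin1_table",  # placeholder destination table
    job_config=job_config,
)
load_job.result()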
If the --allow_quoted_newlines flag is set to true, then the CSV file has a maximum size limit of 1 GB.

Note: By default, if the CSV file contains the ASCII 0 (NULL) character, you can't load the data into BigQuery. If you want to allow ASCII 0 and other ASCII control characters, then set --preserve_ascii_control_characters=true on your load jobs.

If BigQuery can't convert a character other than the ASCII 0 character, BigQuery converts the character to the standard Unicode replacement character: �.
Delimiters

Delimiters in CSV files can be any single-byte character. If the source file uses ISO-8859-1 encoding, any character can be a delimiter. If the source file uses UTF-8 encoding, any character in the decimal range 1-127 (U+0001-U+007F) can be used without modification. You can insert an ISO-8859-1 character outside of this range as a delimiter, and BigQuery will interpret it correctly. However, if you use a multibyte character as a delimiter, some of the bytes will be interpreted incorrectly as part of the field value.
Generally, it's a best practice to use a standard delimiter, such as a tab, pipe, or comma. The default is a comma.
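As an illustration, a non-default delimiter is set on the job configuration. A minimal sketch with the Python client, using placeholder names and a pipe-delimited source file:

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    autodetect=True,
    field_delimiter="|",  # single-byte custom delimiter
)

load_job = client.load_table_from_uri(
    "gs://mybucket/pipe_delimited.csv",   # placeholder source URI
    "your-project.mydataset.pipe_table",  # placeholder destination table
    job_config=job_config,
)
load_job.result()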
Data types

Boolean. BigQuery can parse any of the following pairs for Boolean data: 1 or 0, true or false, t or f, yes or no, or y or n (all case insensitive). Schema autodetection automatically detects any of these except 0 and 1.
Bytes. Columns with BYTES types must be encoded as Base64.
Date. Columns with DATE types must be in the format YYYY-MM-DD.

Datetime. Columns with DATETIME types must be in the format YYYY-MM-DD HH:MM:SS[.SSSSSS].
Geography. Columns with GEOGRAPHY types must contain strings in one of the following formats: well-known text (WKT), well-known binary (WKB), or GeoJSON. If you use WKB, the value should be hex encoded.
The following list shows examples of valid data:
POINT(1 2)
{ "type": "Point", "coordinates": [1, 2] }
0101000000feffffffffffef3f0000000000000040
Before loading GEOGRAPHY data, also read Loading geospatial data.
Interval. Columns with INTERVAL types must be in the format Y-M D H:M:S[.F], where Y is years, M is months, D is days, and H:M:S[.F] is hours, minutes, seconds, and optional fractional seconds.
You can indicate a negative value by prepending a dash (-).
The following list shows examples of valid data:
10-6 0 0:0:0
0-0 -5 0:0:0
0-0 0 0:0:1.25
To load INTERVAL data, you must use the bq load command and use the --schema flag to specify a schema. You can't upload INTERVAL data by using the console.
JSON. Quotes are escaped by using the two-character sequence "". For more information, see the example of loading JSON data from a CSV file.
Time. Columns with TIME types must be in the format HH:MM:SS[.SSSSSS].
Timestamp. BigQuery accepts various timestamp formats. The timestamp must include a date portion and a time portion.

- The date portion can be formatted as YYYY-MM-DD or YYYY/MM/DD.
- The time portion must be formatted as HH:MM[:SS[.SSSSSS]] (seconds and fractions of seconds are optional).
- The date and time must be separated by a space or 'T'.
- Optionally, the date and time can be followed by a UTC offset or the UTC zone designator (Z). For more information, see Time zones.

For example, any of the following are valid timestamp values:

- 2018-08-19 12:11
- 2018-08-19 12:11:35.220
- 2018/08/19 12:11:35 -05:00
- 2018-08-19T12:11:35.220Z

If you provide a schema, BigQuery also accepts Unix epoch time for timestamp values. However, schema autodetection doesn't detect this case, and treats the value as a numeric or string type instead.

Example of a Unix epoch timestamp value: 1534680695 (which represents 2018-08-19 12:11:35 UTC).
RANGE. Represented in CSV files in the format [LOWER_BOUND, UPPER_BOUND), where LOWER_BOUND and UPPER_BOUND are valid DATE, DATETIME, or TIMESTAMP strings. NULL and UNBOUNDED represent unbounded start or end values.

The following are examples of CSV values for RANGE<DATE>:
"[2020-01-01, 2021-01-01)"
"[UNBOUNDED, 2021-01-01)"
"[2020-03-01, NULL)"
"[UNBOUNDED, UNBOUNDED)"
Schema auto-detection

This section describes the behavior of schema auto-detection when loading CSV files.
CSV delimiter

BigQuery detects the following delimiters: comma (,), pipe (|), and tab (\t).
CSV header

BigQuery infers headers by comparing the first row of the file with other rows in the file. If the first line contains only strings, and the other lines contain other data types, BigQuery assumes that the first row is a header row. BigQuery assigns column names based on the field names in the header row. The names might be modified to meet the naming rules for columns in BigQuery. For example, spaces will be replaced with underscores.

Otherwise, BigQuery assumes the first row is a data row, and assigns generic column names such as string_field_1. Note that the autodetected column names cannot be updated in the schema after the table is created, although you can change the names manually afterward. Another option is to provide an explicit schema instead of using autodetect.
You might have a CSV file with a header row, where all of the data fields are strings. In that case, BigQuery will not automatically detect that the first row is a header. Use the --skip_leading_rows option to skip the header row. Otherwise, the header will be imported as data. Also consider providing an explicit schema in this case, so that you can assign column names.
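A minimal sketch of that workaround with the Python client, combining an explicit schema with skip_leading_rows=1, and assuming a hypothetical all-string file gs://mybucket/all_strings.csv with a header row:

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,  # The header row would otherwise be loaded as data.
    schema=[
        bigquery.SchemaField("city", "STRING"),
        bigquery.SchemaField("state", "STRING"),
    ],
)

load_job = client.load_table_from_uri(
    "gs://mybucket/all_strings.csv",        # placeholder source URI
    "your-project.mydataset.string_table",  # placeholder destination table
    job_config=job_config,
)
load_job.result()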
Quoted new lines

BigQuery detects quoted new line characters within a CSV field and does not interpret the quoted new line character as a row boundary.
Troubleshoot parsing errors

If there's a problem parsing your CSV files, then the load job's errors resource is populated with the error details.

Generally, these errors identify the start of the problematic line with a byte offset. For uncompressed files you can use gcloud storage with the --recursive argument to access the relevant line.
For example, you run the bq load command and receive an error:

bq load --skip_leading_rows=1 \
    --source_format=CSV \
    mydataset.mytable \
    gs://my-bucket/mytable.csv \
    'Number:INTEGER,Name:STRING,TookOffice:STRING,LeftOffice:STRING,Party:STRING'
The error in the output is similar to the following:
Waiting on bqjob_r5268069f5f49c9bf_0000018632e903d7_1 ... (0s) Current status: DONE
BigQuery error in load operation: Error processing job
'myproject:bqjob_r5268069f5f49c9bf_0000018632e903d7_1': Error while reading data,
error message: Error detected while parsing row starting at position: 1405.
Error: Data between close quote character (") and field separator.
File: gs://my-bucket/mytable.csv
Failure details:
- gs://my-bucket/mytable.csv: Error while reading data, error message: Error
  detected while parsing row starting at position: 1405. Error: Data between close
  quote character (") and field separator. File: gs://my-bucket/mytable.csv
- Error while reading data, error message: CSV processing encountered too many
  errors, giving up. Rows: 22; errors: 1; max bad: 0; error percent: 0
Based on the preceding error, there's a format error in the file. To view the file's content, run the gcloud storage cat command:
gcloud storage cat 1405-1505 gs://my-bucket/mytable.csv --recursive
The output is similar to the following:
16,Abraham Lincoln,"March 4, 1861","April 15, "1865,Republican 18,Ulysses S. Grant,"March 4, 1869", ...
Based on the output of the file, the problem is a misplaced quote in "April 15, "1865.
Debugging parsing errors is more challenging for compressed CSV files, since the reported byte offset refers to the location in the uncompressed file. The following gcloud storage cat command streams the file from Cloud Storage, decompresses the file, identifies the appropriate byte offset, and prints the line with the format error:
gcloud storage cat gs://my-bucket/mytable.csv.gz | gunzip - | tail -c +1406 | head -n 1
The output is similar to the following:
16,Abraham Lincoln,"March 4, 1861","April 15, "1865,Republican
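When a load is submitted through the Python client library, the same parsing errors are available on the job object. A minimal sketch, using placeholder table and bucket names:

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
)

load_job = client.load_table_from_uri(
    "gs://my-bucket/mytable.csv",      # placeholder source URI
    "your-project.mydataset.mytable",  # placeholder destination table
    job_config=job_config,
)

try:
    load_job.result()
except Exception:
    # Each entry mirrors the load job's errors resource, including the
    # byte offset reported in the error message.
    for error in load_job.errors or []:
        print(error)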
CSV options

To change how BigQuery parses CSV data, specify additional options in the Google Cloud console, the bq command-line tool, or the API.
For more information on the CSV format, see RFC 4180.
Each option is listed with its Google Cloud console name, bq tool flag, and BigQuery API property:

- Field delimiter (Console: Field delimiter: Comma, Tab, Pipe, Custom; bq flag: -F or --field_delimiter; API property: fieldDelimiter (Java, Python)): (Optional) The separator for fields in a CSV file. The separator can be any ISO-8859-1 single-byte character. BigQuery converts the string to ISO-8859-1 encoding, and uses the first byte of the encoded string to split the data in its raw, binary state. BigQuery also supports the escape sequence "\t" to specify a tab separator. The default value is a comma (,).

- Header rows (Console: Header rows to skip; bq flag: --skip_leading_rows; API property: skipLeadingRows (Java, Python)): (Optional) An integer indicating the number of header rows in the source data.

- Source column match (Console: Source column match: Default, Position, Name; bq flag: --source_column_match; API property: none): (Preview) (Optional) Controls the strategy used to match loaded columns to the schema. Supported values include POSITION, which matches by position and assumes that the columns are ordered the same way as the schema, and NAME, which matches by name by reading the header row as column names and reordering columns to match the field names in the schema. Column names are read from the last skipped row based on the skipLeadingRows property.

- Number of errors allowed (bq flag: --max_bad_records; API property: maxBadRecords (Java, Python)): (Optional) The maximum number of bad records that BigQuery can ignore when running the job. If the number of bad records exceeds this value, an invalid error is returned in the job result. The default value is 0, which requires that all records are valid.

- Newline characters (Console: Allow quoted newlines; bq flag: --allow_quoted_newlines; API property: allowQuotedNewlines (Java, Python)): (Optional) Indicates whether to allow quoted data sections that contain newline characters in a CSV file. The default value is false.

- Custom null values (Console: none; bq flag: --null_marker; API property: nullMarker (Java, Python)): (Optional) Specifies a string that represents a null value in a CSV file. For example, if you specify "\N", BigQuery interprets "\N" as a null value when loading a CSV file. The default value is the empty string. If you set this property to a custom value, BigQuery throws an error if an empty string is present for all data types except for STRING and BYTE. For STRING and BYTE columns, BigQuery interprets the empty string as an empty value.

- Trailing optional columns (Console: Allow jagged rows; bq flag: --allow_jagged_rows; API property: allowJaggedRows (Java, Python)): (Optional) Accept rows that are missing trailing optional columns. The missing values are treated as nulls. If false, records with missing trailing columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false. Only applicable to CSV; ignored for other formats.

- Unknown values (Console: Ignore unknown values; bq flag: --ignore_unknown_values; API property: ignoreUnknownValues (Java, Python)): (Optional) Indicates if BigQuery should allow extra values that are not represented in the table schema. If true, the extra values are ignored. If false, records with extra columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false. The sourceFormat property determines what BigQuery treats as an extra value: for CSV, trailing columns; for JSON, named values that don't match any column names.

- Quote (bq flag: --quote; API property: quote (Java, Python)): (Optional) The value that is used to quote data sections in a CSV file. BigQuery converts the string to ISO-8859-1 encoding, and then uses the first byte of the encoded string to split the data in its raw, binary state. The default value is a double quote ("). If your data does not contain quoted sections, set the property value to an empty string. If your data contains quoted newline characters, you must also set the allowQuotedNewlines property to true. To include the specific quote character within a quoted value, precede it with an additional matching quote character. For example, if you want to escape the default character ' " ', use ' "" '.

- Encoding (Console: none; bq flag: -E or --encoding; API property: encoding (Java, Python)): (Optional) The character encoding of the data. The supported values are UTF-8, ISO-8859-1, UTF-16BE, UTF-16LE, UTF-32BE, or UTF-32LE. The default value is UTF-8. BigQuery decodes the data after the raw, binary data has been split using the values of the quote and fieldDelimiter properties.

- ASCII control character (Console: none; bq flag: --preserve_ascii_control_characters; API property: none): (Optional) If you want to allow ASCII 0 and other ASCII control characters, then set --preserve_ascii_control_characters to true on your load jobs.

- Null markers (Console: Null Markers; bq flag: --null_markers; API property: none): (Preview) (Optional) A list of custom strings that represent NULL values in CSV data. This option cannot be used with the --null_marker option.

- Time zone (Console: Time Zone; bq flag: --time_zone; API property: none): (Preview) (Optional) Default time zone that is applied when parsing timestamp values that have no specific time zone. Check valid time zone names. If this value is not present, timestamp values without a specific time zone are parsed using the default time zone UTC.

- Date format (Console: Date Format; bq flag: --date_format; API property: none): (Preview) (Optional) Format elements that define how the DATE values are formatted in the input files (for example, MM/DD/YYYY). If this value is present, this format is the only compatible DATE format. Schema autodetection will also decide the DATE column type based on this format instead of the existing format. If this value is not present, the DATE field is parsed with the default formats.

- Datetime format (Console: Datetime Format; bq flag: --datetime_format; API property: none): (Preview) (Optional) Format elements that define how the DATETIME values are formatted in the input files (for example, MM/DD/YYYY HH24:MI:SS.FF3). If this value is present, this format is the only compatible DATETIME format. Schema autodetection will also decide the DATETIME column type based on this format instead of the existing format. If this value is not present, the DATETIME field is parsed with the default formats.

- Time format (Console: Time Format; bq flag: --time_format; API property: none): (Preview) (Optional) Format elements that define how the TIME values are formatted in the input files (for example, HH24:MI:SS.FF3). If this value is present, this format is the only compatible TIME format. Schema autodetection will also decide the TIME column type based on this format instead of the existing format. If this value is not present, the TIME field is parsed with the default formats.

- Timestamp format (Console: Timestamp Format; bq flag: --timestamp_format; API property: none): (Preview) (Optional) Format elements that define how the TIMESTAMP values are formatted in the input files (for example, MM/DD/YYYY HH24:MI:SS.FF3). If this value is present, this format is the only compatible TIMESTAMP format. Schema autodetection will also decide the TIMESTAMP column type based on this format instead of the existing format. If this value is not present, the TIMESTAMP field is parsed with the default formats.
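Several of these options map directly onto LoadJobConfig properties in the Python client library. A minimal sketch combining a few of them, with placeholder table and bucket names:

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    field_delimiter=",",          # fieldDelimiter
    skip_leading_rows=1,          # skipLeadingRows
    max_bad_records=10,           # maxBadRecords
    allow_quoted_newlines=True,   # allowQuotedNewlines
    null_marker="\\N",            # nullMarker (the literal string \N)
    allow_jagged_rows=True,       # allowJaggedRows
    ignore_unknown_values=True,   # ignoreUnknownValues
    quote_character='"',          # quote
    encoding="UTF-8",             # encoding
)

load_job = client.load_table_from_uri(
    "gs://mybucket/mydata.csv",        # placeholder source URI
    "your-project.mydataset.mytable",  # placeholder destination table
    job_config=job_config,
)
load_job.result()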