Loading JSON data from Cloud Storage
You can load newline-delimited JSON (ndJSON) data from Cloud Storage into a new table or partition, or append to or overwrite an existing table or partition. When your data is loaded into BigQuery, it is converted into columnar format for Capacitor (BigQuery's storage format).
When you load data from Cloud Storage into a BigQuery table, the dataset that contains the table must be in the same regional or multi-regional location as the Cloud Storage bucket.
The ndJSON format is the same format as the JSON Lines format.
Limitations
You are subject to the following limitations when you load data into BigQuery from a Cloud Storage bucket:
When you load JSON files into BigQuery, note the following:
BigQuery supports the JSON type even if schema information is not known at the time of ingestion. A field that is declared as JSON type is loaded with the raw JSON values.
If you use the BigQuery API to load an integer outside the range of [-2^53+1, 2^53-1] (usually this means larger than 9,007,199,254,740,991) into an integer (INT64) column, pass it as a string to avoid data corruption (see the sketch after this list). This issue is caused by a limitation on integer size in JSON or ECMAScript. For more information, see the Numbers section of RFC 7159.
DATE columns must use the dash (-) separator and the date must be in the following format: YYYY-MM-DD (year-month-day).
TIMESTAMP columns must use a dash (-) or slash (/) separator for the date portion of the timestamp, and the date must be in one of the following formats: YYYY-MM-DD (year-month-day) or YYYY/MM/DD (year/month/day). The hh:mm:ss (hour-minute-second) portion of the timestamp must use a colon (:) separator.
Your files must meet the JSON file size limits described in the load jobs limits.
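For producers of ndJSON files, here is a minimal Python sketch of the pass-large-integers-as-strings workaround described above; the big_count field name is purely illustrative:

import json

# 2^53 is outside the integer range that JSON/ECMAScript parsers represent exactly.
big_value = 2**53  # 9007199254740992

# Emit the value as a string to avoid silent precision loss; BigQuery converts it
# back to INT64 when the destination column is INT64.
row = {"id": 1, "big_count": str(big_value)}
print(json.dumps(row))  # {"id": 1, "big_count": "9007199254740992"}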
Before you begin
Grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document, and create a dataset to store your data.
Required permissions
To load data into BigQuery, you need IAM permissions to run a load job and load data into BigQuery tables and partitions. If you are loading data from Cloud Storage, you also need IAM permissions to access the bucket that contains your data.
Permissions to load data into BigQuery
To load data into a new BigQuery table or partition or to append or overwrite an existing table or partition, you need the following IAM permissions:
bigquery.tables.create
bigquery.tables.updateData
bigquery.tables.update
bigquery.jobs.create
Each of the following predefined IAM roles includes the permissions that you need in order to load data into a BigQuery table or partition:
roles/bigquery.dataEditor
roles/bigquery.dataOwner
roles/bigquery.admin (includes the bigquery.jobs.create permission)
bigquery.user (includes the bigquery.jobs.create permission)
bigquery.jobUser (includes the bigquery.jobs.create permission)
Additionally, if you have the bigquery.datasets.create permission, you can create and update tables using a load job in the datasets that you create.
For more information on IAM roles and permissions in BigQuery, see Predefined roles and permissions.
Permissions to load data from Cloud Storage
To get the permissions that you need to load data from a Cloud Storage bucket, ask your administrator to grant you the Storage Admin (roles/storage.admin) IAM role on the bucket. For more information about granting roles, see Manage access to projects, folders, and organizations.
This predefined role contains the permissions required to load data from a Cloud Storage bucket. To see the exact permissions that are required, expand the Required permissions section:
Required permissions
The following permissions are required to load data from a Cloud Storage bucket:
storage.buckets.get
storage.objects.get
storage.objects.list (required if you are using a URI wildcard)
You might also be able to get these permissions with custom roles or other predefined roles.
Create a dataset
Create a BigQuery dataset to store your data.
JSON compression
You can use the gzip utility to compress JSON files. Note that gzip performs full file compression, unlike the file content compression performed by compression codecs for other file formats, such as Avro. Using gzip to compress your JSON files might have a performance impact; for more information about the trade-offs, see Loading compressed and uncompressed data.
Loading JSON data into a new table
To load JSON data from Cloud Storage into a new BigQuery table:
Console
In the Google Cloud console, go to the BigQuery page. You can view an existing table's schema by running bq show --format=prettyjson dataset.table.
In the advanced options, for the number of errors allowed, accept the default value of 0 or enter the maximum number of rows containing errors that can be ignored. If the number of rows with errors exceeds this value, the job will result in an invalid message and fail. This option applies only to CSV and JSON files.
For the Date format, Datetime format, Time format, and Timestamp format options (Preview), you can optionally provide format elements that define how the corresponding values are formatted in the source files (for example, MM/DD/YYYY or MM/DD/YYYY HH24:MI:SS.FF3). If a format is provided, it is the only compatible format for that type, and schema autodetection decides the column type based on this format instead of the default format. If a format is not provided, the field is parsed with the default formats.
After the table is created, you can update the table's expiration, description, and labels, but you cannot add a partition expiration after a table is created using the Google Cloud console. For more information, see Managing tables.
SQL
Use the LOAD DATA DDL statement. The following example loads a JSON file into the new table mytable:
In the Google Cloud console, go to the BigQuery page.
In the query editor, enter the following statement:
LOAD DATA OVERWRITE mydataset.mytable
  (x INT64, y STRING)
FROM FILES (
  format = 'JSON',
  uris = ['gs://bucket/path/file.json']);
Click Run.
For more information about how to run queries, see Run an interactive query.
bq
Use the bq load command, specify NEWLINE_DELIMITED_JSON using the --source_format flag, and include a Cloud Storage URI. You can include a single URI, a comma-separated list of URIs, or a URI containing a wildcard. Supply the schema inline, in a schema definition file, or use schema auto-detect.
(Optional) Supply the --location flag and set the value to your location.
Other optional flags include:
--max_bad_records: An integer that specifies the maximum number of bad records allowed before the entire job fails. The default value is 0. At most, five errors of any type are returned regardless of the --max_bad_records value.
--ignore_unknown_values: When specified, allows and ignores extra, unrecognized values in CSV or JSON data.
--time_zone: (Preview) An optional default time zone that applies when parsing timestamp values that have no specific time zone in CSV or JSON data.
--date_format: (Preview) An optional custom string that defines how DATE values are formatted in CSV or JSON data.
--datetime_format: (Preview) An optional custom string that defines how DATETIME values are formatted in CSV or JSON data.
--time_format: (Preview) An optional custom string that defines how TIME values are formatted in CSV or JSON data.
--timestamp_format: (Preview) An optional custom string that defines how TIMESTAMP values are formatted in CSV or JSON data.
--autodetect: When specified, enables schema auto-detection for CSV and JSON data.
--time_partitioning_type: Enables time-based partitioning on a table and sets the partition type. Possible values are HOUR, DAY, MONTH, and YEAR. This flag is optional when you create a table partitioned on a DATE, DATETIME, or TIMESTAMP column. The default partition type for time-based partitioning is DAY. You cannot change the partitioning specification on an existing table.
--time_partitioning_expiration: An integer that specifies (in seconds) when a time-based partition should be deleted. The expiration time evaluates to the partition's UTC date plus the integer value.
--time_partitioning_field: The DATE or TIMESTAMP column used to create a partitioned table. If time-based partitioning is enabled without this value, an ingestion-time partitioned table is created.
--require_partition_filter: When enabled, this option requires users to include a WHERE clause that specifies the partitions to query. Requiring a partition filter can reduce cost and improve performance. For more information, see Require a partition filter in queries.
--clustering_fields: A comma-separated list of up to four column names used to create a clustered table.
--destination_kms_key: The Cloud KMS key for encryption of the table data.
For Python client library equivalents of several of these flags, see the sketch after this list.
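If you use the Python client library instead of the bq tool, several of these flags correspond to LoadJobConfig properties. The following is a minimal, illustrative sketch rather than an exhaustive mapping; the bucket, table, and column names are placeholders:

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,                 # --autodetect
    max_bad_records=10,              # --max_bad_records
    ignore_unknown_values=True,      # --ignore_unknown_values
    time_partitioning=bigquery.TimePartitioning(  # --time_partitioning_type / --time_partitioning_field
        type_=bigquery.TimePartitioningType.DAY,
        field="mytimestamp",
    ),
    clustering_fields=["qtr", "sales"],  # --clustering_fields
)

load_job = client.load_table_from_uri(
    "gs://mybucket/mydata*.json",  # wildcard URIs work here too
    "mydataset.mytable",
    job_config=job_config,
)
load_job.result()  # wait for the load job to finish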
For more information, see the documentation for partitioned tables, clustered tables, and table encryption.
To load JSON data into BigQuery, enter the following command:
bq --location=LOCATION load \
--source_format=FORMAT \
DATASET.TABLE \
PATH_TO_SOURCE \
SCHEMA
Replace the following:
LOCATION: your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
FORMAT: NEWLINE_DELIMITED_JSON.
DATASET: an existing dataset.
TABLE: the name of the table into which you're loading data.
PATH_TO_SOURCE: a fully qualified Cloud Storage URI or a comma-separated list of URIs. Wildcards are also supported.
SCHEMA: a valid schema. The schema can be a local JSON file, or it can be typed inline as part of the command. If you use a schema file, do not give it an extension. You can also use the --autodetect flag instead of supplying a schema definition.
Examples:
The following command loads data from gs://mybucket/mydata.json into a table named mytable in mydataset. The schema is defined in a local schema file named myschema.
bq load \
--source_format=NEWLINE_DELIMITED_JSON \
mydataset.mytable \
gs://mybucket/mydata.json \
./myschema
The following command loads data from gs://mybucket/mydata.json into a new ingestion-time partitioned table named mytable in mydataset. The schema is defined in a local schema file named myschema.
bq load \
--source_format=NEWLINE_DELIMITED_JSON \
--time_partitioning_type=DAY \
mydataset.mytable \
gs://mybucket/mydata.json \
./myschema
The following command loads data from gs://mybucket/mydata.json into a partitioned table named mytable in mydataset. The table is partitioned on the mytimestamp column. The schema is defined in a local schema file named myschema.
bq load \
--source_format=NEWLINE_DELIMITED_JSON \
--time_partitioning_field mytimestamp \
mydataset.mytable \
gs://mybucket/mydata.json \
./myschema
The following command loads data from gs://mybucket/mydata.json into a table named mytable in mydataset. The schema is auto detected.
bq load \
--autodetect \
--source_format=NEWLINE_DELIMITED_JSON \
mydataset.mytable \
gs://mybucket/mydata.json
The following command loads data from gs://mybucket/mydata.json into a table named mytable in mydataset. The schema is defined inline in the format FIELD:DATA_TYPE, FIELD:DATA_TYPE.
bq load \
--source_format=NEWLINE_DELIMITED_JSON \
mydataset.mytable \
gs://mybucket/mydata.json \
qtr:STRING,sales:FLOAT,year:STRING
Note: When you specify the schema using the bq tool, you cannot include a RECORD (STRUCT) type, you cannot include a field description, and you cannot specify the field mode. All field modes default to NULLABLE. To include field descriptions, modes, and RECORD types, supply a JSON schema file instead.
The following command loads data from multiple files in gs://mybucket/ into a table named mytable in mydataset. The Cloud Storage URI uses a wildcard. The schema is auto detected.
bq load \
--autodetect \
--source_format=NEWLINE_DELIMITED_JSON \
mydataset.mytable \
gs://mybucket/mydata*.json
The following command loads data from multiple files in gs://mybucket/ into a table named mytable in mydataset. The command includes a comma-separated list of Cloud Storage URIs with wildcards. The schema is defined in a local schema file named myschema.
bq load \
--source_format=NEWLINE_DELIMITED_JSON \
mydataset.mytable \
"gs://mybucket/00/*.json","gs://mybucket/01/*.json" \
./myschema
API
Create a load job that points to the source data in Cloud Storage.
(Optional) Specify your location in the location property in the jobReference section of the job resource.
The source URIs property must be fully qualified, in the format gs://BUCKET/OBJECT. Each URI can contain one '*' wildcard character.
Specify the JSON data format by setting the sourceFormat property to NEWLINE_DELIMITED_JSON.
To check the job status, call jobs.get(JOB_ID), replacing JOB_ID with the ID of the job returned by the initial request.
If status.state = DONE, the job completed successfully.
If the status.errorResult property is present, the request failed, and that object includes information describing what went wrong. When a request fails, no table is created and no data is loaded.
If status.errorResult is absent, the job finished successfully, although there might have been some nonfatal errors, such as problems importing a few rows. Nonfatal errors are listed in the returned job object's status.errors property.
API notes:
Load jobs are atomic and consistent; if a load job fails, none of the data is available, and if a load job succeeds, all of the data is available.
As a best practice, generate a unique ID and pass it as jobReference.jobId
when calling jobs.insert
to create a load job. This approach is more robust to network failure because the client can poll or retry on the known job ID.
Calling jobs.insert on a given job ID is idempotent. You can retry as many times as you like on the same job ID, and at most one of those operations succeeds.
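A minimal Python sketch of the unique-job-ID pattern from the notes above, assuming the google-cloud-bigquery client library (which calls jobs.insert on your behalf); the URI and table names are placeholders:

import uuid

from google.api_core.exceptions import Conflict
from google.cloud import bigquery

client = bigquery.Client()
job_id = f"load_mytable_{uuid.uuid4().hex}"  # generate the job ID client-side

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,
)

try:
    load_job = client.load_table_from_uri(
        "gs://mybucket/mydata.json",
        "mydataset.mytable",
        job_id=job_id,  # a known ID makes retries safe
        job_config=job_config,
    )
except Conflict:
    # The insert already succeeded on an earlier attempt; fetch the existing job.
    load_job = client.get_job(job_id)

load_job.result()  # poll until the job finishes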
C#
Before trying this sample, follow the C# setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery C# API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Use the BigQueryClient.CreateLoadJob() method to start a load job from Cloud Storage. To use JSONL, create a CreateLoadJobOptions object and set its SourceFormat property to FileFormat.NewlineDelimitedJson.
Go
Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Use the LoadJobConfiguration.builder(tableId, sourceUri) method to start a load job from Cloud Storage. To use newline-delimited JSON, use the LoadJobConfiguration.setFormatOptions(FormatOptions.json()) method.
Node.js
Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
PHP
Before trying this sample, follow the PHP setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery PHP API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Python
Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Use the Client.load_table_from_uri() method to start a load job from Cloud Storage. To use JSONL, set the LoadJobConfig.source_format property to the string NEWLINE_DELIMITED_JSON and pass the job config as the job_config argument to the load_table_from_uri() method.
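The official sample is not reproduced here; a minimal sketch along the same lines, assuming the public gs://cloud-samples-data/bigquery/us-states/us-states.json sample file, looks like this:

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
table_id = "your-project.your_dataset.your_table_name"

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("name", "STRING"),
        bigquery.SchemaField("post_abbr", "STRING"),
    ],
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
)
uri = "gs://cloud-samples-data/bigquery/us-states/us-states.json"

load_job = client.load_table_from_uri(
    uri,
    table_id,
    location="US",  # Must match the location of the destination dataset.
    job_config=job_config,
)
load_job.result()  # Waits for the job to complete.

destination_table = client.get_table(table_id)
print("Loaded {} rows.".format(destination_table.num_rows))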
Ruby
Before trying this sample, follow the Ruby setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Ruby API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Use the Dataset.load_job() method to start a load job from Cloud Storage. To use JSONL, set the format parameter to "json".
Loading nested and repeated JSON data
BigQuery supports loading nested and repeated data from source formats that support object-based schemas, such as JSON, Avro, ORC, Parquet, Firestore, and Datastore.
One JSON object, including any nested or repeated fields, must appear on each line.
The following example shows sample nested or repeated data. This table contains information about people. It consists of the following fields:
id
first_name
last_name
dob (date of birth)
addresses (a nested and repeated field)
addresses.status (current or previous)
addresses.address
addresses.city
addresses.state
addresses.zip
addresses.numberOfYears (years at the address)
The JSON data file would look like the following. Notice that the address field contains an array of values (indicated by [ ]).
{"id":"1","first_name":"John","last_name":"Doe","dob":"1968-01-22","addresses":[{"status":"current","address":"123 First Avenue","city":"Seattle","state":"WA","zip":"11111","numberOfYears":"1"},{"status":"previous","address":"456 Main Street","city":"Portland","state":"OR","zip":"22222","numberOfYears":"5"}]} {"id":"2","first_name":"Jane","last_name":"Doe","dob":"1980-10-16","addresses":[{"status":"current","address":"789 Any Avenue","city":"New York","state":"NY","zip":"33333","numberOfYears":"2"},{"status":"previous","address":"321 Main Street","city":"Hoboken","state":"NJ","zip":"44444","numberOfYears":"3"}]}
The schema for this table would look like the following:
[ { "name": "id", "type": "STRING", "mode": "NULLABLE" }, { "name": "first_name", "type": "STRING", "mode": "NULLABLE" }, { "name": "last_name", "type": "STRING", "mode": "NULLABLE" }, { "name": "dob", "type": "DATE", "mode": "NULLABLE" }, { "name": "addresses", "type": "RECORD", "mode": "REPEATED", "fields": [ { "name": "status", "type": "STRING", "mode": "NULLABLE" }, { "name": "address", "type": "STRING", "mode": "NULLABLE" }, { "name": "city", "type": "STRING", "mode": "NULLABLE" }, { "name": "state", "type": "STRING", "mode": "NULLABLE" }, { "name": "zip", "type": "STRING", "mode": "NULLABLE" }, { "name": "numberOfYears", "type": "STRING", "mode": "NULLABLE" } ] } ]
For information on specifying a nested and repeated schema, see Specifying nested and repeated fields.
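If you define this schema with the Python client library instead of a JSON schema file, the nested and repeated fields map to nested SchemaField objects. A minimal sketch follows; the bucket path and table name are placeholders:

from google.cloud import bigquery

client = bigquery.Client()

schema = [
    bigquery.SchemaField("id", "STRING"),
    bigquery.SchemaField("first_name", "STRING"),
    bigquery.SchemaField("last_name", "STRING"),
    bigquery.SchemaField("dob", "DATE"),
    bigquery.SchemaField(
        "addresses",
        "RECORD",
        mode="REPEATED",  # repeated field: an array of records
        fields=[
            bigquery.SchemaField("status", "STRING"),
            bigquery.SchemaField("address", "STRING"),
            bigquery.SchemaField("city", "STRING"),
            bigquery.SchemaField("state", "STRING"),
            bigquery.SchemaField("zip", "STRING"),
            bigquery.SchemaField("numberOfYears", "STRING"),
        ],
    ),
]

job_config = bigquery.LoadJobConfig(
    schema=schema,
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
)
client.load_table_from_uri(
    "gs://mybucket/people.json", "mydataset.people", job_config=job_config
).result()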
Loading semi-structured JSON data
BigQuery supports loading semi-structured data, in which a field can take values of different types. The following example shows data similar to the preceding nested and repeated JSON data example, except that the address field can be a STRING, a STRUCT, or an ARRAY:
{"id":"1","first_name":"John","last_name":"Doe","dob":"1968-01-22","address":"123 First Avenue, Seattle WA 11111"} {"id":"2","first_name":"Jane","last_name":"Doe","dob":"1980-10-16","address":{"status":"current","address":"789 Any Avenue","city":"New York","state":"NY","zip":"33333","numberOfYears":"2"}} {"id":"3","first_name":"Bob","last_name":"Doe","dob":"1982-01-10","address":[{"status":"current","address":"789 Any Avenue","city":"New York","state":"NY","zip":"33333","numberOfYears":"2"}, "321 Main Street Hoboken NJ 44444"]}
You can load this data into BigQuery by using the following schema:
[ { "name": "id", "type": "STRING", "mode": "NULLABLE" }, { "name": "first_name", "type": "STRING", "mode": "NULLABLE" }, { "name": "last_name", "type": "STRING", "mode": "NULLABLE" }, { "name": "dob", "type": "DATE", "mode": "NULLABLE" }, { "name": "address", "type": "JSON", "mode": "NULLABLE" } ]
The address field is loaded into a column with type JSON that allows it to hold the mixed types in the example. You can ingest data as JSON whether it contains mixed types or not. For example, you could specify JSON instead of STRING as the type for the first_name field. For more information, see Working with JSON data in GoogleSQL.
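A minimal Python sketch for loading the semi-structured example above with a JSON-typed address column; the URI and table name are placeholders:

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("id", "STRING"),
        bigquery.SchemaField("first_name", "STRING"),
        bigquery.SchemaField("last_name", "STRING"),
        bigquery.SchemaField("dob", "DATE"),
        # The JSON type accepts strings, objects, or arrays in the source data.
        bigquery.SchemaField("address", "JSON"),
    ],
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
)

client.load_table_from_uri(
    "gs://mybucket/semi_structured.json",
    "mydataset.people_semi_structured",
    job_config=job_config,
).result()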
Appending to or overwriting a table
You can load additional data into a table either from source files or by appending query results.
In the Google Cloud console, use the Write preference option to specify what action to take when you load data from a source file or from a query result.
You have the following options when you load additional data into a table:
Each option below lists its Google Cloud console name, bq tool flag, and BigQuery API property:
Write if empty: bq tool flag not supported; BigQuery API property WRITE_EMPTY. Writes the data only if the table is empty.
Append to table: bq tool flag --noreplace or --replace=false (if --[no]replace is unspecified, the default is append); BigQuery API property WRITE_APPEND. (Default) Appends the data to the end of the table.
Overwrite table: bq tool flag --replace or --replace=true; BigQuery API property WRITE_TRUNCATE. Erases all existing data in a table before writing the new data. This action also deletes the table schema and row-level security, and removes any Cloud KMS key.
A Python sketch of the corresponding write dispositions follows this list.
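In the BigQuery API and client libraries, these write preferences are expressed as writeDisposition values. A minimal Python sketch of setting them on a load job configuration (the choice of WRITE_APPEND here is only illustrative):

from google.cloud import bigquery

# The three write preferences from the list above.
write_if_empty = bigquery.WriteDisposition.WRITE_EMPTY      # fail unless the table is empty
append_to_table = bigquery.WriteDisposition.WRITE_APPEND    # default for load jobs
overwrite_table = bigquery.WriteDisposition.WRITE_TRUNCATE  # replace existing data

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,
    write_disposition=append_to_table,
)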
If you load data into an existing table, the load job can append the data or overwrite the table.
You can append or overwrite a table by using one of the following:
The bq load command
The jobs.insert API method and configuring a load job
Console
In the Google Cloud console, go to the BigQuery page.
You can view an existing table's schema by running bq show --format=prettyjson dataset.table.
In the advanced options, for the number of errors allowed, accept the default value of 0 or enter the maximum number of rows containing errors that can be ignored. If the number of rows with errors exceeds this value, the job will result in an invalid message and fail. This option applies only to CSV and JSON files.
For the Date format, Datetime format, Time format, and Timestamp format options (Preview), you can optionally provide format elements that define how the corresponding values are formatted in the source files (for example, MM/DD/YYYY or MM/DD/YYYY HH24:MI:SS.FF3). If a format is provided, it is the only compatible format for that type, and schema autodetection decides the column type based on this format instead of the default format. If a format is not provided, the field is parsed with the default formats.
SQL
Use the LOAD DATA DDL statement. The following example appends a JSON file to the table mytable:
In the Google Cloud console, go to the BigQuery page.
In the query editor, enter the following statement:
LOAD DATA INTO mydataset.mytable
FROM FILES (
  format = 'JSON',
  uris = ['gs://bucket/path/file.json']);
Click Run.
For more information about how to run queries, see Run an interactive query.
bq
Use the bq load command, specify NEWLINE_DELIMITED_JSON using the --source_format flag, and include a Cloud Storage URI. You can include a single URI, a comma-separated list of URIs, or a URI containing a wildcard.
Supply the schema inline, in a schema definition file, or use schema auto-detect.
Specify the --replace flag to overwrite the table. Use the --noreplace flag to append data to the table. If no flag is specified, the default is to append data.
It is possible to modify the table's schema when you append or overwrite it. For more information on supported schema changes during a load operation, see Modifying table schemas.
(Optional) Supply the --location flag and set the value to your location.
Other optional flags include:
--max_bad_records: An integer that specifies the maximum number of bad records allowed before the entire job fails. The default value is 0. At most, five errors of any type are returned regardless of the --max_bad_records value.
--ignore_unknown_values: When specified, allows and ignores extra, unrecognized values in CSV or JSON data.
--time_zone: (Preview) An optional default time zone that applies when parsing timestamp values that have no specific time zone in CSV or JSON data.
--date_format: (Preview) An optional custom string that defines how DATE values are formatted in CSV or JSON data.
--datetime_format: (Preview) An optional custom string that defines how DATETIME values are formatted in CSV or JSON data.
--time_format: (Preview) An optional custom string that defines how TIME values are formatted in CSV or JSON data.
--timestamp_format: (Preview) An optional custom string that defines how TIMESTAMP values are formatted in CSV or JSON data.
--autodetect: When specified, enables schema auto-detection for CSV and JSON data.
--destination_kms_key: The Cloud KMS key for encryption of the table data.
bq --location=LOCATION load \
--[no]replace \
--source_format=FORMAT \
DATASET.TABLE \
PATH_TO_SOURCE \
SCHEMA
Replace the following:
LOCATION: your location. The --location flag is optional. You can set a default value for the location using the .bigqueryrc file.
FORMAT: NEWLINE_DELIMITED_JSON.
DATASET: an existing dataset.
TABLE: the name of the table into which you're loading data.
PATH_TO_SOURCE: a fully qualified Cloud Storage URI or a comma-separated list of URIs. Wildcards are also supported.
SCHEMA: a valid schema. The schema can be a local JSON file, or it can be typed inline as part of the command. You can also use the --autodetect flag instead of supplying a schema definition.
Examples:
The following command loads data from gs://mybucket/mydata.json and overwrites a table named mytable in mydataset. The schema is defined using schema auto-detection.
bq load \
--autodetect \
--replace \
--source_format=NEWLINE_DELIMITED_JSON \
mydataset.mytable \
gs://mybucket/mydata.json
The following command loads data from gs://mybucket/mydata.json and appends data to a table named mytable in mydataset. The schema is defined using a JSON schema file named myschema.
bq load \
--noreplace \
--source_format=NEWLINE_DELIMITED_JSON \
mydataset.mytable \
gs://mybucket/mydata.json \
./myschema
API
Create a load job that points to the source data in Cloud Storage.
(Optional) Specify your location in the location property in the jobReference section of the job resource.
The source URIs property must be fully qualified, in the format gs://BUCKET/OBJECT. You can include multiple URIs as a comma-separated list. Wildcards are also supported.
Specify the data format by setting the configuration.load.sourceFormat property to NEWLINE_DELIMITED_JSON.
Specify the write preference by setting the configuration.load.writeDisposition property to WRITE_TRUNCATE or WRITE_APPEND.
Go
Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Java
Node.js
Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
PHP
Before trying this sample, follow the PHP setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery PHP API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Python
To replace the rows in an existing table, set the LoadJobConfig.write_disposition property to the string WRITE_TRUNCATE.
Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
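The official sample is not reproduced here; a minimal sketch of the WRITE_TRUNCATE overwrite, with placeholder table and URI names, looks like this:

from google.cloud import bigquery

client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to overwrite.
table_id = "your-project.your_dataset.your_table_name"

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,
    # Replace any existing rows in the destination table.
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

load_job = client.load_table_from_uri(
    "gs://mybucket/mydata.json", table_id, job_config=job_config
)
load_job.result()  # Waits for the job to complete.

print("Loaded {} rows.".format(client.get_table(table_id).num_rows))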
Ruby
To replace the rows in an existing table, set the write parameter of Table.load_job() to "WRITE_TRUNCATE".
Before trying this sample, follow the Ruby setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Ruby API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
Loading hive-partitioned JSON data
BigQuery supports loading hive-partitioned JSON data stored on Cloud Storage and populates the hive partitioning columns as columns in the destination BigQuery managed table. For more information, see Loading externally partitioned data.
Details of loading JSON data
This section describes how BigQuery parses various data types when loading JSON data.
Data types
Boolean. BigQuery can parse any of the following pairs for Boolean data: 1 or 0, true or false, t or f, yes or no, or y or n (all case insensitive). Schema autodetection automatically detects any of these except 0 and 1.
Bytes. Columns with BYTES types must be encoded as Base64.
Date. Columns with DATE types must be in the format YYYY-MM-DD.
Datetime. Columns with DATETIME types must be in the format YYYY-MM-DD HH:MM:SS[.SSSSSS].
Geography. Columns with GEOGRAPHY types must contain strings in one of the following formats: well-known text (WKT), well-known binary (WKB), or GeoJSON.
If you use WKB, the value should be hex encoded.
The following list shows examples of valid data:
POINT(1 2)
{ "type": "Point", "coordinates": [1, 2] }
0101000000feffffffffffef3f0000000000000040
Before loading GEOGRAPHY data, also read Loading geospatial data.
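As an illustration, a minimal Python sketch that loads ndJSON containing WKT strings into a GEOGRAPHY column might look like the following; the file path, table, and field names are placeholders:

from google.cloud import bigquery

client = bigquery.Client()

# Each source line would look like:
# {"name": "point_a", "geo": "POINT(1 2)"}
job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("name", "STRING"),
        bigquery.SchemaField("geo", "GEOGRAPHY"),
    ],
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
)

client.load_table_from_uri(
    "gs://mybucket/geo_data.json", "mydataset.geo_table", job_config=job_config
).result()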
Interval. Columns with INTERVAL types must be in the ISO 8601 duration format PYMDTHMS, where P marks the start of the duration, Y, M, and D are the year, month, and day parts, T separates the date part from the time part, and H, M, and S are the hour, minute, and second parts.
You can indicate a negative value by prepending a dash (-).
The following list shows examples of valid data:
P-10000Y0M-3660000DT-87840000H0M0S
P0Y0M0DT0H0M0.000001S
P10000Y0M3660000DT87840000H0M0S
To load INTERVAL data, you must use the bq load command and use the --schema flag to specify a schema. You can't upload INTERVAL data by using the console.
Time. Columns with TIME types must be in the format HH:MM:SS[.SSSSSS].
Timestamp. BigQuery accepts various timestamp formats. The timestamp must include a date portion and a time portion.
The date portion can be formatted as YYYY-MM-DD or YYYY/MM/DD.
The timestamp portion must be formatted as HH:MM[:SS[.SSSSSS]] (seconds and fractions of seconds are optional).
The date and time must be separated by a space or 'T'.
Optionally, the date and time can be followed by a UTC offset or the UTC zone designator (Z). For more information, see Time zones.
For example, values such as 2018-08-19 12:11:35, 2018/08/19 12:11:35.22, and 2018-08-19T12:11:35Z are valid timestamp values.
If you provide a schema, BigQuery also accepts Unix epoch time for timestamp values. However, schema autodetection doesn't detect this case, and treats the value as a numeric or string type instead.
For example, 1534680695 is a Unix epoch timestamp value (seconds since the epoch).
Array (repeated field). The value must be a JSON array or null. JSON null is converted to SQL NULL. The array itself cannot contain null values.
This section describes the behavior of schema auto-detection when loading JSON files.
JSON nested and repeated fields
BigQuery infers nested and repeated fields in JSON files. If a field value is a JSON object, then BigQuery loads the column as a RECORD type. If a field value is an array, then BigQuery loads the column as a repeated column. For an example of JSON data with nested and repeated data, see Loading nested and repeated JSON data.
If you enable schema auto-detection, then BigQuery converts strings into Boolean, numeric, or date/time types when possible. For example, using the following JSON data, schema auto-detection converts the id field to an INTEGER column:
{ "name":"Alice","id":"12"}
{ "name":"Bob","id":"34"}
{ "name":"Charles","id":"45"}
Encoding types
BigQuery expects JSON data to be UTF-8 encoded. If you have JSON files with other supported encoding types, you should explicitly specify the encoding by using the --encoding flag so that BigQuery converts the data to UTF-8.
BigQuery supports the following encoding types for JSON files: UTF-8, ISO-8859-1, UTF-16BE, UTF-16LE, UTF-32BE, and UTF-32LE.
JSON options
To change how BigQuery parses JSON data, specify additional options in the Google Cloud console, the bq command-line tool, the API, or the client libraries.
Number of bad records allowed. Console option: Number of errors allowed. bq tool flag: --max_bad_records. BigQuery API property: maxBadRecords (Java, Python). (Optional) The maximum number of bad records that BigQuery can ignore when running the job. If the number of bad records exceeds this value, an invalid error is returned in the job result. The default value is 0, which requires that all records are valid.
Unknown values. Console option: Ignore unknown values. bq tool flag: --ignore_unknown_values. BigQuery API property: ignoreUnknownValues (Java, Python). (Optional) Indicates whether BigQuery should allow extra values that are not represented in the table schema. If true, the extra values are ignored. If false, records with extra columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false. The sourceFormat property determines what BigQuery treats as an extra value: CSV: trailing columns, JSON: named values that don't match any column names.
Encoding. Console option: None. bq tool flag: -E or --encoding. BigQuery API property: encoding (Python). (Optional) The character encoding of the data. The supported values are UTF-8, ISO-8859-1, UTF-16BE, UTF-16LE, UTF-32BE, or UTF-32LE. The default value is UTF-8.
Time zone. Console option: Time Zone. bq tool flag: --time_zone. BigQuery API property: None. (Preview) (Optional) Default time zone that is applied when parsing timestamp values that have no specific time zone. Check valid time zone names. If this value is not present, timestamp values without a specific time zone are parsed using the default time zone, UTC.
Date format. Console option: Date Format. bq tool flag: --date_format. BigQuery API property: None. (Preview) (Optional) Format elements that define how DATE values are formatted in the input files (for example, MM/DD/YYYY). If this value is present, this format is the only compatible DATE format. Schema autodetection will also decide the DATE column type based on this format instead of the existing format. If this value is not present, the DATE field is parsed with the default formats.
Datetime format. Console option: Datetime Format. bq tool flag: --datetime_format. BigQuery API property: None. (Preview) (Optional) Format elements that define how DATETIME values are formatted in the input files (for example, MM/DD/YYYY HH24:MI:SS.FF3). If this value is present, this format is the only compatible DATETIME format. Schema autodetection will also decide the DATETIME column type based on this format instead of the existing format. If this value is not present, the DATETIME field is parsed with the default formats.
Time format. Console option: Time Format. bq tool flag: --time_format. BigQuery API property: None. (Preview) (Optional) Format elements that define how TIME values are formatted in the input files (for example, HH24:MI:SS.FF3). If this value is present, this format is the only compatible TIME format. Schema autodetection will also decide the TIME column type based on this format instead of the existing format. If this value is not present, the TIME field is parsed with the default formats.
Timestamp format. Console option: Timestamp Format. bq tool flag: --timestamp_format. BigQuery API property: None. (Preview) (Optional) Format elements that define how TIMESTAMP values are formatted in the input files (for example, MM/DD/YYYY HH24:MI:SS.FF3). If this value is present, this format is the only compatible TIMESTAMP format. Schema autodetection will also decide the TIMESTAMP column type based on this format instead of the existing format. If this value is not present, the TIMESTAMP field is parsed with the default formats.
What's next
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-08-07 UTC."],[[["Newline-delimited JSON (ndJSON) data can be loaded from Cloud Storage into BigQuery, either creating a new table or partition or appending to/overwriting an existing one, where the data is converted into a columnar format."],["When loading data from Cloud Storage, the dataset containing the BigQuery table must be located in the same regional or multi-regional location as the Cloud Storage bucket."],["BigQuery offers flexibility in managing data load operations through write preferences, allowing users to write to empty tables, append data to existing tables, or overwrite entire tables."],["BigQuery supports loading semi-structured JSON data, allowing fields to take on different types, which can be handled using the `JSON` type in the table schema."],["The process of loading JSON data can be performed through the console, SQL statements, the `bq` command-line tool, API methods, or client libraries, with each method providing options for schema specification, error handling, and data manipulation."]]],[]]