
Load statements in GoogleSQL

LOAD DATA statement

Loads data from one or more files into a table. The statement can create a new table, append data into an existing table or partition, or overwrite an existing table or partition. If the LOAD DATA statement fails, the table into which you are loading data remains unchanged.

Syntax
LOAD DATA {OVERWRITE|INTO}  [{TEMP|TEMPORARY} TABLE]
[[project_name.]dataset_name.]table_name
[(
  column_list
)]
[[OVERWRITE] PARTITIONS (partition_column_name=partition_value)]
[PARTITION BY partition_expression]
[CLUSTER BY clustering_column_list]
[OPTIONS (table_option_list)]
FROM FILES(load_option_list)
[WITH PARTITION COLUMNS
  [(partition_column_list)]
]
[WITH CONNECTION connection_name]

column_list: column[, ...]

partition_column_list: partition_column_name, partition_column_type[, ...]
Arguments

If no table exists with the specified name, then the statement creates a new table. If a table already exists with the specified name, then the behavior depends on the INTO or OVERWRITE keyword. The INTO keyword appends the data to the table, and the OVERWRITE keyword overwrites the table.

If your external data uses a hive-partitioned layout, then include the WITH PARTITION COLUMNS clause. If you include the WITH PARTITION COLUMNS clause without partition_column_list, then BigQuery infers the partitioning from the data layout. If you include both column_list and WITH PARTITION COLUMNS, then partition_column_list is required.

To load data into a temporary table, include the TEMP TABLE or TEMPORARY TABLE clause, as shown in the syntax and in the examples later on this page.

column

(column_name column_schema[, ...]) contains the table's schema information in a comma-separated list.

Note: Constraints cannot be specified on ARRAY or STRUCT elements.
column :=
  column_name column_schema

column_schema :=
   {
     simple_type
     | STRUCT<field_list>
     | ARRAY<array_element_schema>
   }
   [PRIMARY KEY NOT ENFORCED | REFERENCES table_name(column_name) NOT ENFORCED]
   [DEFAULT default_expression]
   [NOT NULL]
   [OPTIONS(column_option_list)]

simple_type :=
  { data_type | STRING COLLATE collate_specification }

field_list :=
  field_name column_schema [, ...]

array_element_schema :=
  { simple_type | STRUCT<field_list> }
  [NOT NULL]
column_option_list

Specify a column option list in the following format:

NAME=VALUE, ...

NAME and VALUE must be one of the following combinations:

description

STRING

Example: description="a unique id"

This property is equivalent to the schema.fields[].description table resource property.

rounding_mode

STRING

Example: rounding_mode = "ROUND_HALF_EVEN"

This specifies the rounding mode that's used for values written to a NUMERIC or BIGNUMERIC type column or STRUCT field. The following values are supported:

This property is equivalent to the roundingMode table resource property.

data_policies

ARRAY<STRING>

Applies a data policy to a column in a table (Preview).

Example: data_policies = ["{'name':'myproject.region-us.data_policy_name1'}", "{'name':'myproject.region-us.data_policy_name2'}"]

The ALTER TABLE ALTER COLUMN statement supports the = and += operators to add data policies to a specific column.

Example: data_policies +=["data_policy1", "data_policy2"]

VALUE is a constant expression containing only literals, query parameters, and scalar functions.

The constant expression cannot contain:

Setting the VALUE replaces the existing value of that option for the column, if there was one. Setting the VALUE to NULL clears the column's value for that option.
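
For illustration only, a minimal sketch of column options inside a LOAD DATA column list; the table, column, and bucket names are placeholders, not from this page:

LOAD DATA INTO mydataset.table1(
    id STRING OPTIONS(description="a unique id"),
    amount NUMERIC OPTIONS(rounding_mode="ROUND_HALF_EVEN")
  )
  FROM FILES(
    format='CSV',
    uris = ['gs://bucket/path/file.csv']
  )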

partition_expression

PARTITION BY is an optional clause that controls table and vector index partitioning. partition_expression is an expression that determines how to partition the table or vector index. The partition expression can contain the following values:

table_option_list

The option list lets you set table options such as a label and an expiration time. You can include multiple options using a comma-separated list.

Specify a table option list in the following format:

NAME=VALUE, ...

NAME and VALUE must be one of the following combinations:

expiration_timestamp

TIMESTAMP

Example: expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC"

This property is equivalent to the expirationTime table resource property.

partition_expiration_days

FLOAT64

Example: partition_expiration_days=7

Sets the partition expiration in days. For more information, see Set the partition expiration. By default, partitions don't expire.

This property is equivalent to the timePartitioning.expirationMs table resource property but uses days instead of milliseconds. One day is equivalent to 86400000 milliseconds, or 24 hours.

This property can only be set if the table is partitioned.

require_partition_filter

BOOL

Example: require_partition_filter=true

Specifies whether queries on this table must include a predicate filter that filters on the partitioning column. For more information, see Set partition filter requirements. The default value is false.

This property is equivalent to the timePartitioning.requirePartitionFilter table resource property.

This property can only be set if the table is partitioned.

friendly_name

STRING

Example: friendly_name="my_table"

This property is equivalent to the friendlyName table resource property.

description

STRING

Example: description="a table that expires in 2025"

This property is equivalent to the description table resource property.

labels

ARRAY<STRUCT<STRING, STRING>>

Example: labels=[("org_unit", "development")]

This property is equivalent to the labels table resource property.

default_rounding_mode

STRING

Example: default_rounding_mode = "ROUND_HALF_EVEN"

This specifies the default rounding mode that's used for values written to any new NUMERIC or BIGNUMERIC type columns or STRUCT fields in the table. It does not impact existing fields in the table. The following values are supported:

This property is equivalent to the defaultRoundingMode table resource property.

enable_change_history

BOOL

In preview.

Example: enable_change_history=TRUE

Set this property to TRUE in order to capture change history on the table, which you can then view by using the CHANGES function. Enabling this table option has an impact on costs; for more information see Pricing and costs. The default is FALSE.

max_staleness

INTERVAL

Example: max_staleness=INTERVAL "4:0:0" HOUR TO SECOND

The maximum interval behind the current time where it's acceptable to read stale data. For example, with change data capture, when this option is set, the table copy operation is denied if data is more stale than the max_staleness value.

max_staleness is disabled by default.

enable_fine_grained_mutations

BOOL

In preview.

Example: enable_fine_grained_mutations=TRUE

Set this property to TRUE to enable fine-grained DML optimization on the table. The default is FALSE.

storage_uri

STRING

In preview.

Example: storage_uri=gs://BUCKET_DIRECTORY/TABLE_DIRECTORY/

A fully qualified location prefix for the external folder where data is stored. Supports gs: buckets.

Required for managed tables.

file_format

STRING

In preview.

Example: file_format=PARQUET

The open-source file format in which the table data is stored. Only PARQUET is supported.

Required for managed tables.

The default is PARQUET.

table_format

STRING

In preview.

Example: table_format=ICEBERG

The open table format in which metadata-only snapshots are stored. Only ICEBERG is supported.

Required for managed tables.

The default is ICEBERG.

tags

ARRAY<STRUCT<STRING, STRING>>

An array of IAM tags for the table, expressed as key-value pairs. The key should be the namespaced key name, and the value should be the short name.
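
To tie several of these table options together, here is a minimal, illustrative sketch; the dataset, table, column, and bucket names are placeholders, and the Avro source is assumed to contain a transaction_date column. The partition_expiration_days and require_partition_filter options apply only because the table is partitioned:

LOAD DATA INTO mydataset.table1
  PARTITION BY transaction_date
  OPTIONS(
    friendly_name="my_table",
    labels=[("org_unit", "development")],
    require_partition_filter=true,
    partition_expiration_days=7
  )
  FROM FILES(
    format='AVRO',
    uris = ['gs://bucket/path/file.avro']
  )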

VALUE is a constant expression containing only literals, query parameters, and scalar functions.

The constant expression cannot contain:

load_option_list

Specifies options for loading data from external files. The format and uris options are required. Specify the option list in the following format: NAME=VALUE, ...

allow_jagged_rows

BOOL

If true, allow rows that are missing trailing optional columns.

Applies to CSV data.

allow_quoted_newlines

BOOL

If true, allow quoted data sections that contain newline characters in the file.

Applies to CSV data.

bigtable_options

STRING

Only required when creating a Bigtable external table.

Specifies the schema of the Bigtable external table in JSON format.

For a list of Bigtable table definition options, see BigtableOptions in the REST API reference.

column_name_character_map

STRING

Defines the scope of supported column name characters and the handling behavior of unsupported characters. The default setting is STRICT, which means unsupported characters cause BigQuery to throw errors. V1 and V2 replace any unsupported characters with underscores.

Supported values include:

compression

STRING

The compression type of the data source. Supported values include: GZIP. If not specified, the data source is uncompressed.

Applies to CSV and JSON data.

decimal_target_types

ARRAY<STRING>

Determines how to convert a Decimal type. Equivalent to ExternalDataConfiguration.decimal_target_types

Example: ["NUMERIC", "BIGNUMERIC"].

enable_list_inference

BOOL

If true, use schema inference specifically for Parquet LIST logical type.

Applies to Parquet data.

enable_logical_types

BOOL

If true, convert Avro logical types into their corresponding SQL types. For more information, see Logical types.

Applies to Avro data.
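
For example, a minimal sketch (placeholder paths) that converts Avro logical types during a load:

LOAD DATA INTO mydataset.table1
  FROM FILES(
    format='AVRO',
    enable_logical_types=true,
    uris = ['gs://bucket/path/file.avro']
  )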

encoding

STRING

The character encoding of the data. Supported values include: UTF8 (or UTF-8), ISO_8859_1 (or ISO-8859-1), UTF-16BE, UTF-16LE, UTF-32BE, or UTF-32LE. The default value is UTF-8.

Applies to CSV data.

enum_as_string

BOOL

If true, infer Parquet ENUM logical type as STRING instead of BYTES by default.

Applies to Parquet data.
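
A minimal sketch (placeholder paths) that combines the two Parquet inference options described above:

LOAD DATA INTO mydataset.table1
  FROM FILES(
    format='PARQUET',
    enable_list_inference=true,
    enum_as_string=true,
    uris = ['gs://bucket/path/file.parquet']
  )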

field_delimiter

STRING

The separator for fields in a CSV file.

Applies to CSV data.

format

STRING

The format of the external data. Supported values for CREATE EXTERNAL TABLE include: AVRO, CLOUD_BIGTABLE, CSV, DATASTORE_BACKUP, DELTA_LAKE (preview), GOOGLE_SHEETS, NEWLINE_DELIMITED_JSON (or JSON), ORC, PARQUET.

Supported values for LOAD DATA include: AVRO, CSV, DELTA_LAKE (preview), NEWLINE_DELIMITED_JSON (or JSON), ORC, PARQUET.

The value JSON is equivalent to NEWLINE_DELIMITED_JSON.

hive_partition_uri_prefix

STRING

A common prefix for all source URIs before the partition key encoding begins. Applies only to hive-partitioned external tables.

Applies to Avro, CSV, JSON, Parquet, and ORC data.

Example: "gs://bucket/path".

file_set_spec_type

STRING

Specifies how to interpret source URIs for load jobs and external tables.

Supported values include FILE_SYSTEM_MATCH and NEW_LINE_DELIMITED_MANIFEST.

For example, if you have a source URI of "gs://bucket/path/file" and the file_set_spec_type is FILE_SYSTEM_MATCH, then the file is used directly as a data file. If the file_set_spec_type is NEW_LINE_DELIMITED_MANIFEST, then each line in the file is interpreted as a URI that points to a data file.
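
As an illustrative sketch (placeholder paths; assumes manifest.txt lists one Cloud Storage data-file URI per line), a load that reads a newline-delimited manifest instead of matching files directly:

LOAD DATA INTO mydataset.table1
  FROM FILES(
    format='CSV',
    file_set_spec_type = 'NEW_LINE_DELIMITED_MANIFEST',
    uris = ['gs://bucket/path/manifest.txt']
  )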

ignore_unknown_values

BOOL

If true, ignore extra values that are not represented in the table schema, without returning an error.

Applies to CSV and JSON data.

json_extension

STRING

For JSON data, indicates a particular JSON interchange format. If not specified, BigQuery reads the data as generic JSON records.

Supported values include:
GEOJSON. Newline-delimited GeoJSON data. For more information, see Creating an external table from a newline-delimited GeoJSON file.
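
An illustrative sketch (placeholder names) that loads newline-delimited GeoJSON, relying on schema auto-detection:

LOAD DATA INTO mydataset.geo_table
  FROM FILES(
    format='NEWLINE_DELIMITED_JSON',
    json_extension='GEOJSON',
    uris = ['gs://bucket/path/file.geojson']
  )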

max_bad_records

INT64

The maximum number of bad records to ignore when reading the data.

Applies to: CSV, JSON, and Google Sheets data.

max_staleness

INTERVAL

Applicable for BigLake tables and object tables.

Specifies whether cached metadata is used by operations against the table, and how fresh the cached metadata must be in order for the operation to use it.

To disable metadata caching, specify 0. This is the default.

To enable metadata caching, specify an interval literal value between 30 minutes and 7 days. For example, specify INTERVAL 4 HOUR for a 4 hour staleness interval. With this value, operations against the table use cached metadata if it has been refreshed within the past 4 hours. If the cached metadata is older than that, the operation falls back to retrieving metadata from Cloud Storage instead.

null_marker

STRING

The string that represents NULL values in a CSV file.

Applies to CSV data.

null_markers

ARRAY<STRING>

(Preview)

The list of strings that represent NULL values in a CSV file.

This option cannot be used with the null_marker option.

Applies to CSV data.

object_metadata

STRING

Only required when creating an object table.

Set the value of this option to SIMPLE when creating an object table.

preserve_ascii_control_characters

BOOL

If true, embedded ASCII control characters (the first 32 characters in the ASCII table, ranging from '\x00' to '\x1F') are preserved.

Applies to CSV data.

quote

STRING

The string used to quote data sections in a CSV file. If your data contains quoted newline characters, also set the allow_quoted_newlines property to true.

Applies to CSV data.

skip_leading_rows

INT64

The number of rows at the top of a file to skip when reading the data.

Applies to CSV and Google Sheets data.
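
To show several of the CSV parsing options together, a hedged sketch with placeholder values; the semicolon delimiter and the "NA" null marker are illustrative assumptions, not defaults:

LOAD DATA INTO mydataset.table1
  FROM FILES(
    format='CSV',
    skip_leading_rows=1,
    field_delimiter=';',
    quote='"',
    null_marker='NA',
    allow_quoted_newlines=true,
    max_bad_records=10,
    uris = ['gs://bucket/path/file.csv']
  )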

source_column_match

STRING

(Preview)

This controls the strategy used to match loaded columns to the schema.

If this value is unspecified, then the default is based on how the schema is provided. If autodetect is enabled, then the default behavior is to match columns by name. Otherwise, the default is to match columns by position. This is done to keep the behavior backward-compatible.

Supported values include:

tags

ARRAY<STRUCT<STRING, STRING>>

An array of IAM tags for the table, expressed as key-value pairs. The key should be the namespaced key name, and the value should be the short name.

time_zone

STRING

(Preview)

Default time zone that applies when parsing timestamp values that don't have a specific time zone.

Check valid time zone names.

If this value is not present, timestamp values without a specific time zone are parsed using the default time zone, UTC.

Applies to CSV and JSON data.

date_format

STRING

(Preview)

Format elements that define how the DATE values are formatted in the input files (for example, MM/DD/YYYY).

If this value is present, this format is the only compatible DATE format. Schema auto-detection also decides the DATE column type based on this format instead of the default formats.

If this value is not present, the DATE field is parsed with the default formats.

Applies to CSV and JSON data.

datetime_format

STRING

(Preview)

Format elements that define how the DATETIME values are formatted in the input files (for example, MM/DD/YYYY HH24:MI:SS.FF3).

If this value is present, this format is the only compatible DATETIME format. Schema auto-detection also decides the DATETIME column type based on this format instead of the default formats.

If this value is not present, the DATETIME field is parsed with the default formats.

Applies to CSV and JSON data.

time_format

STRING

(Preview)

Format elements that define how the TIME values are formatted in the input files (for example, HH24:MI:SS.FF3).

If this value is present, this format is the only compatible TIME format. Schema auto-detection also decides the TIME column type based on this format instead of the default formats.

If this value is not present, the TIME field is parsed with the default formats.

Applies to CSV and JSON data.

timestamp_format

STRING

(Preview)

Format elements that define how the TIMESTAMP values are formatted in the input files (for example, MM/DD/YYYY HH24:MI:SS.FF3).

If this value is present, this format is the only compatible TIMESTAMP format. Schema auto-detection also decides the TIMESTAMP column type based on this format instead of the default formats.

If this value is not present, the TIMESTAMP field is parsed with the default formats.

Applies to CSV and JSON data.
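
A hedged sketch (placeholder names; the time zone and format strings are illustrative) that combines the time zone and format-element options for a CSV load:

LOAD DATA INTO mydataset.table1(order_id STRING, order_date DATE, order_ts TIMESTAMP)
  FROM FILES(
    format='CSV',
    skip_leading_rows=1,
    time_zone='America/Los_Angeles',
    date_format='MM/DD/YYYY',
    timestamp_format='MM/DD/YYYY HH24:MI:SS.FF3',
    uris = ['gs://bucket/path/file.csv']
  )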

uris

For external tables, including object tables, that aren't Bigtable tables:

ARRAY<STRING>

An array of fully qualified URIs for the external data locations. Each URI can contain one asterisk (*) wildcard character, which must come after the bucket name. When you specify uris values that target multiple files, all of those files must share a compatible schema.

The following examples show valid uris values:
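
(The values below are illustrative placeholders, not taken from the original page.)

['gs://bucket/path/file1.csv']
['gs://bucket/path/file1.csv', 'gs://bucket/path/file2.csv']
['gs://bucket/path/*.csv']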

For Bigtable tables:

STRING

The URI identifying the Bigtable table to use as a data source. You can only specify one Bigtable URI.

Example: https://googleapis.com/bigtable/projects/project_id/instances/instance_id[/appProfiles/app_profile]/tables/table_name

For more information on constructing a Bigtable URI, see Retrieve the Bigtable URI.

Examples

The following examples show common use cases for the LOAD DATA statement.

Load data into a table

The following example loads an Avro file into a table. Avro is a self-describing format, so BigQuery infers the schema.

LOAD DATA INTO mydataset.table1
  FROM FILES(
    format='AVRO',
    uris = ['gs://bucket/path/file.avro']
  )

The following example loads two CSV files into a table, using schema autodetection.

LOAD DATA INTO mydataset.table1
  FROM FILES(
    format='CSV',
    uris = ['gs://bucket/path/file1.csv', 'gs://bucket/path/file2.csv']
  )
Load data using a schema

The following example loads a CSV file into a table, using a specified table schema.

LOAD DATA INTO mydataset.table1(x INT64, y STRING)
  FROM FILES(
    skip_leading_rows=1,
    format='CSV',
    uris = ['gs://bucket/path/file.csv']
  )
Set options when creating a new table

The following example creates a new table with a description and an expiration time.

LOAD DATA INTO mydataset.table1
  OPTIONS(
    description="my table",
    expiration_timestamp="2025-01-01 00:00:00 UTC"
  )
  FROM FILES(
    format='AVRO',
    uris = ['gs://bucket/path/file.avro']
  )
Overwrite an existing table

The following example overwrites an existing table.

LOAD DATA OVERWRITE mydataset.table1
  FROM FILES(
    format='AVRO',
    uris = ['gs://bucket/path/file.avro']
  )
Load data into a temporary table

The following example loads an Avro file into a temporary table.

LOAD DATA INTO TEMP TABLE mydataset.table1
  FROM FILES(
    format='AVRO',
    uris = ['gs://bucket/path/file.avro']
  )
Specify table partitioning and clustering

The following example creates a table that is partitioned by the transaction_date field and clustered by the customer_id field. It also configures the partitions to expire after three days.

LOAD DATA INTO mydataset.table1
  PARTITION BY transaction_date
  CLUSTER BY customer_id
  OPTIONS(
    partition_expiration_days=3
  )
  FROM FILES(
    format='AVRO',
    uris = ['gs://bucket/path/file.avro']
  )
Load data into a partition

The following example loads data into a selected partition of an ingestion-time partitioned table:

LOAD DATA INTO mydataset.table1
PARTITIONS(_PARTITIONTIME = TIMESTAMP '2016-01-01')
  PARTITION BY _PARTITIONTIME
  FROM FILES(
    format = 'AVRO',
    uris = ['gs://bucket/path/file.avro']
  )
Load a file that is externally partitioned

The following example loads a set of external files that use a hive partitioning layout.

LOAD DATA INTO mydataset.table1
  FROM FILES(
    format='AVRO',
    uris = ['gs://bucket/path/*'],
    hive_partition_uri_prefix='gs://bucket/path'
  )
  WITH PARTITION COLUMNS(
    field_1 STRING, -- column order must match the external path
    field_2 INT64
  )

The following example infers the partitioning layout:

LOAD DATA INTO mydataset.table1
  FROM FILES(
    format='AVRO',
    uris = ['gs://bucket/path/*'],
    hive_partition_uri_prefix='gs://bucket/path'
  )
  WITH PARTITION COLUMNS

If you include both column_list and WITH PARTITION COLUMNS, then you must explicitly list the partitioning columns. For example, the following query returns an error:

-- This query returns an error.
LOAD DATA INTO mydataset.table1
  (
    x INT64, -- column_list is given but the partition column list is missing
    y STRING
  )
  FROM FILES(
    format='AVRO',
    uris = ['gs://bucket/path/*'],
    hive_partition_uri_prefix='gs://bucket/path'
  )
  WITH PARTITION COLUMNS
Load data with cross-cloud transfer

Example 1

The following example loads a Parquet file named sample.parquet from an Amazon S3 bucket into the mydataset.testparquet table, using schema auto-detection:

LOAD DATA INTO mydataset.testparquet
  FROM FILES (
    uris = ['s3://test-bucket/sample.parquet'],
    format = 'PARQUET'
  )
  WITH CONNECTION `aws-us-east-1.test-connection`
Example 2

The following example loads CSV files that match the prefix sampled* from Azure Blob Storage into the test_csv table, using an explicit schema and partitioning by the Time column:

LOAD DATA INTO mydataset.test_csv (Number INT64, Name STRING, Time DATE)
  PARTITION BY Time
  FROM FILES (
    format = 'CSV', uris = ['azure://test.blob.core.windows.net/container/sampled*'],
    skip_leading_rows=1
  )
  WITH CONNECTION `azure-eastus2.test-connection`
Example 3

The following example overwrites the existing table mydataset.testparquet with data from a file named sample.parquet, using schema auto-detection:

LOAD DATA OVERWRITE mydataset.testparquet
  FROM FILES (
    uris = ['s3://test-bucket/sample.parquet'],
    format = 'PARQUET'
  )
  WITH CONNECTION `aws-us-east-1.test-connection`


[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-08-07 UTC."],[[["The `LOAD DATA` statement in GoogleSQL allows loading data from one or more files into a table, enabling the creation of new tables, appending data to existing tables or partitions, or overwriting existing tables or partitions."],["The behavior of `LOAD DATA` depends on whether `INTO` (appends data) or `OVERWRITE` (replaces data) is specified, and if no table exists, a new one is created, also if using `INTO` with the `PARTITIONS` clause, `OVERWRITE` is not permitted."],["The `column_list` option specifies the table's schema and, if omitted, BigQuery attempts schema auto-detection; when loading hive-partitioned data, the partitioned columns must be included in the schema."],["`PARTITION BY` clause allows for table partitioning based on a specified expression, while the `CLUSTER BY` clause allows clustering based on column names, and various options are available for managing table expiration and filtering requirements with the `OPTIONS` clause."],["The `load_option_list` allows the specification of options for loading data from external files such as `format`, `uris` and more, with support for various data formats including AVRO, CSV, JSON, ORC, and PARQUET, and allowing for cross-cloud transfers."]]],[]]


RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4