Transform data with data manipulation language (DML)

The BigQuery data manipulation language (DML) lets you update, insert, and delete data in your BigQuery tables.
You can execute DML statements just as you would execute a SELECT statement.
For more information about how to compute the number of bytes processed by a DML statement, see On-demand query size calculation.
Limitations

Each DML statement initiates an implicit transaction, which means that changes made by the statement are automatically committed at the end of each successful DML statement.
Rows that were recently written using the tabledata.insertAll streaming method can't be modified with data manipulation language (DML) statements, such as UPDATE, DELETE, MERGE, or TRUNCATE. The recent writes are those that occurred within the last 30 minutes. All other rows in the table remain modifiable by using UPDATE, DELETE, MERGE, or TRUNCATE statements. The streamed data can take up to 90 minutes to become available for copy operations.
Alternatively, rows that were recently written using the Storage Write API can be modified using UPDATE, DELETE, or MERGE statements. For more information, see Use data manipulation language (DML) with recently streamed data.
Correlated subqueries within a when_clause, search_condition, merge_update_clause, or merge_insert_clause are not supported for MERGE statements.
Queries that contain DML statements cannot use a wildcard table as the target of the query. For example, a wildcard table can be used in the FROM clause of an UPDATE query, but a wildcard table cannot be used as the target of the UPDATE operation.
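To illustrate, here is a minimal sketch of that rule, assuming hypothetical sharded tables dataset.shipments_* with product and quantity columns. The wildcard table appears only in the FROM clause, while the target of the UPDATE is a concrete table:

UPDATE dataset.Inventory AS i
SET quantity = i.quantity + s.total_quantity
FROM (
  -- a wildcard table is allowed here, in the FROM clause
  SELECT product, SUM(quantity) AS total_quantity
  FROM `dataset.shipments_*`
  GROUP BY product) AS s
WHERE i.product = s.product  -- the target, dataset.Inventory, is not a wildcard table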
The following sections describe the different types of DML statements and how you can use them.
INSERT statement

Use the INSERT statement to add new rows to an existing table. The following example inserts new rows into the table dataset.Inventory with explicitly specified values.
INSERT dataset.Inventory (product, quantity)
VALUES('whole milk', 10),
('almond milk', 20),
('coffee beans', 30),
('sugar', 0),
('matcha', 20),
('oat milk', 30),
('chai', 5)
/*
+-------------------+----------+
| product           | quantity |
+-------------------+----------+
| almond milk       |       20 |
| chai              |        5 |
| coffee beans      |       30 |
| matcha            |       20 |
| oat milk          |       30 |
| sugar             |        0 |
| whole milk        |       10 |
+-------------------+----------+
*/
For more information about INSERT statements, see INSERT statement.
DELETE statement

Use the DELETE statement to delete rows in a table. The following example deletes all rows in the table dataset.Inventory that have a quantity value of 0.
DELETE dataset.Inventory
WHERE quantity = 0
/*
+-------------------+----------+
| product           | quantity |
+-------------------+----------+
| almond milk       |       20 |
| chai              |        5 |
| coffee beans      |       30 |
| matcha            |       20 |
| oat milk          |       30 |
| whole milk        |       10 |
+-------------------+----------+
*/
To delete all rows in a table, use the TRUNCATE TABLE statement instead. For more information about DELETE statements, see DELETE statement.
TRUNCATE statement

Use the TRUNCATE statement to remove all rows from a table while leaving the table metadata intact, including the table schema, description, and labels. The following example removes all rows from the table dataset.Inventory.
TRUNCATE dataset.Inventory
To delete specific rows in a table, use the DELETE statement instead. For more information about the TRUNCATE statement, see TRUNCATE statement.
UPDATE statement

Use the UPDATE statement to update existing rows in a table. The UPDATE statement must include the WHERE keyword to specify a condition. The following example reduces the quantity value of rows by 10 for products that contain the string milk.
UPDATE dataset.Inventory
SET quantity = quantity - 10
WHERE product LIKE '%milk%'
/*
+-------------------+----------+
| product           | quantity |
+-------------------+----------+
| almond milk       |       10 |
| chai              |        5 |
| coffee beans      |       30 |
| matcha            |       20 |
| oat milk          |       20 |
| whole milk        |        0 |
+-------------------+----------+
*/
UPDATE statements can also include a FROM clause to join in other tables. For more information about UPDATE statements, see UPDATE statement.
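As a minimal sketch of that pattern, assuming a hypothetical table dataset.NewArrivals with product and quantity columns, the following statement joins the target table against it:

UPDATE dataset.Inventory AS i
SET quantity = i.quantity + n.quantity  -- add the newly arrived stock to the current count
FROM dataset.NewArrivals AS n
WHERE i.product = n.product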
MERGE statement

The MERGE statement combines the INSERT, UPDATE, and DELETE operations into a single statement and performs the operations atomically to merge data from one table into another. For more information and examples about the MERGE statement, see MERGE statement.
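As a minimal sketch, again assuming the hypothetical dataset.NewArrivals table, the following MERGE statement updates the quantity of products that already exist in dataset.Inventory and inserts rows for products that don't:

MERGE dataset.Inventory AS T
USING dataset.NewArrivals AS S
ON T.product = S.product
WHEN MATCHED THEN
  -- the product already exists, so add the arriving stock to its quantity
  UPDATE SET quantity = T.quantity + S.quantity
WHEN NOT MATCHED THEN
  -- the product is new, so insert a fresh row
  INSERT (product, quantity) VALUES (S.product, S.quantity)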
BigQuery manages the concurrency of DML statements that add, modify, or delete rows in a table.
Note: DML statements are subject to rate limits such as the maximum rate of table writes. You might hit a rate limit if you submit a high number of jobs against a table at one time. These rates do not limit the total number of DML statements that can be run. If you get an error message that says you've exceeded a rate limit, retry the operation using exponential backoff between retries.

INSERT DML concurrency

During any 24-hour period, the first 1,500 INSERT statements run immediately after they are submitted. After this limit is reached, the concurrency of INSERT statements that write to a table is limited to 10. Additional INSERT statements are added to a PENDING queue. Up to 100 INSERT statements can be queued against a table at any given time. When an INSERT statement completes, the next INSERT statement is removed from the queue and run.
If you must run DML INSERT statements more frequently, consider streaming data to your table using the Storage Write API.
The UPDATE, DELETE, and MERGE DML statements are called mutating DML statements. If you submit one or more mutating DML statements on a table while other mutating DML jobs on it are still running (or pending), BigQuery runs up to 2 of them concurrently, after which up to 20 are queued as PENDING. When a previously running job finishes, the next pending job is dequeued and run. Queued mutating DML statements share a per-table queue with a maximum length of 20. Additional statements past the maximum queue length for each table fail with the following error message:

Resources exceeded during query execution: Too many DML statements outstanding against table PROJECT_ID:DATASET.TABLE, limit is 20
Interactive priority DML jobs that are queued for more than 7 hours fail with the following error message:
DML statement has been queued for too long
Mutating DML statements that run concurrently on a table cause DML statement conflicts when the statements try to mutate the same partition. The statements succeed as long as they don't modify the same partition. BigQuery tries to rerun failed statements up to three times.
An INSERT DML statement that inserts rows into a table doesn't conflict with any other concurrently running DML statement.
A MERGE DML statement doesn't conflict with other concurrently running DML statements as long as the statement only inserts rows and doesn't delete or update any existing rows. This can include MERGE statements with UPDATE or DELETE clauses, as long as those clauses aren't invoked when the query runs.
Fine-grained DML

Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. You can process personal data for this feature as outlined in the Cloud Data Processing Addendum, subject to the obligations and restrictions described in the agreement under which you access Google Cloud. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.
Note: To provide feedback or request support for this feature, send an email to bq-fine-grained-dml-feedback@google.com.

Fine-grained DML is a performance enhancement designed to optimize the execution of UPDATE, DELETE, and MERGE statements (also known as mutating DML statements). Without fine-grained DML enabled, mutations are performed at the file-group level, which can lead to inefficient data rewrites. Fine-grained DML introduces a more granular approach that aims to reduce the amount of data that needs to be rewritten and to reduce overall slot consumption.
To enable fine-grained DML, set the enable_fine_grained_mutations table option to TRUE when you run a CREATE TABLE or ALTER TABLE DDL statement.
To create a new table with fine-grained DML, use the CREATE TABLE statement:
CREATE TABLE mydataset.mytable (
  product STRING,
  inventory INT64)
OPTIONS(enable_fine_grained_mutations = TRUE);
To alter an existing table with fine-grained DML, use the ALTER TABLE statement:
ALTER TABLE mydataset.mytable SET OPTIONS(enable_fine_grained_mutations = TRUE);
To alter all existing tables in a dataset with fine-grained DML, use the ALTER TABLE statement inside a procedural FOR loop:

FOR record IN (
  SELECT CONCAT(table_schema, '.', table_name) AS table_path
  FROM mydataset.INFORMATION_SCHEMA.TABLES)
DO
  EXECUTE IMMEDIATE "ALTER TABLE " || record.table_path
    || " SET OPTIONS(enable_fine_grained_mutations = TRUE)";
END FOR;
After the enable_fine_grained_mutations option is set to TRUE, mutating DML statements are run with fine-grained DML capabilities enabled and use existing DML statement syntax.
To disable fine-grained DML on a table, set enable_fine_grained_mutations to FALSE by using the ALTER TABLE DDL statement.
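For example, reusing the mydataset.mytable table from the earlier examples:

ALTER TABLE mydataset.mytable
SET OPTIONS(enable_fine_grained_mutations = FALSE);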
To determine whether a table has fine-grained DML enabled, query the INFORMATION_SCHEMA.TABLES view. The following example checks which tables within a dataset have the feature enabled:
SELECT
  table_schema AS datasetId,
  table_name AS tableId,
  is_fine_grained_mutations_enabled
FROM DATASET_NAME.INFORMATION_SCHEMA.TABLES;
Replace DATASET_NAME with the name of the dataset to check for tables that have fine-grained DML enabled.
Enabling fine-grained DML for a table can incur additional BigQuery storage costs to store the extra mutation metadata that is associated with fine-grained DML operations. The actual cost depends on the amount of data that is modified, but for most situations it's expected to be negligible in comparison to the size of the table itself.
Fine-grained DML operations process deleted data using a hybrid approach that distributes rewrite costs across numerous table mutations. Each DML operation processes a portion of the deleted data, and the system offloads the remaining processing to the background. These deleted data processing jobs incur additional BigQuery compute costs.
You can use BigQuery reservations to allocate dedicated BigQuery compute resources for these offline deleted data processing jobs. Reservations let you set a cap on the cost of performing these operations. This approach is particularly useful for very large tables with frequent fine-grained mutating DML operations, which otherwise would have high on-demand costs due to the large number of bytes processed when performing each offline fine-grained deleted data processing job.
Offline fine-grained deleted data processing jobs are considered background jobs and use the BACKGROUND assignment type, rather than the QUERY assignment type. Projects that perform fine-grained DML operations without a BACKGROUND assignment process deleted data using on-demand pricing.
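As a sketch of that setup, the following reservation DDL creates a reservation and a BACKGROUND assignment. The administration project, reservation name, slot capacity, edition, and assignee project are all hypothetical placeholders:

-- create a reservation that caps the slots available for background work
CREATE RESERVATION `admin-project.region-us.background-reservation`
OPTIONS (
  edition = 'ENTERPRISE',  -- hypothetical edition
  slot_capacity = 100);    -- hypothetical slot cap

-- route background jobs, such as deleted data processing, to the reservation
CREATE ASSIGNMENT `admin-project.region-us.background-reservation.background-assignment`
OPTIONS (
  assignee = 'projects/my-project',  -- hypothetical project that runs the DML
  job_type = 'BACKGROUND');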
For projects configured to use on-demand compute pricing, fine-grained DML statements won't reduce scanned bytes.
To find the offline fine-grained DML deleted data processing jobs, query the INFORMATION_SCHEMA.JOBS view:

SELECT *
FROM `region-us`.INFORMATION_SCHEMA.JOBS
WHERE job_id LIKE "%fine_grained_mutation_garbage_collection%"
If the table receives a large number of DML jobs, then the deleted data processing could be handled entirely by DMLs, eliminating the need for background processing.
Deleted data considerations

Projects that perform fine-grained DML operations with a BACKGROUND assignment process deleted data using slots, and are subject to the configured reservation's resource availability. If there aren't enough resources available within the configured reservation, processing deleted data might take longer than anticipated.

Projects that perform fine-grained DML operations by using on-demand pricing, or without a BACKGROUND assignment, process deleted data by using on-demand pricing and regularly have deleted data processed using internal BigQuery resources.
Tables enabled with fine-grained DML are subject to the following limitations:

- You can't use the tabledata.list method to read content from a table with fine-grained DML enabled. Instead, query the table with a SELECT statement to read table records.
- … with an UPDATE, DELETE, or MERGE statement.
- … with an UPDATE, DELETE, or MERGE statement.
- … the CREATE TEMP TABLE statement.

For best performance, Google recommends the following patterns:
Avoid submitting large numbers of individual row updates or insertions. Instead, group DML operations together when possible. For more information, see DML statements that update or insert single rows.
If updates or deletions generally happen on older data, or within a particular range of dates, consider partitioning your tables. Partitioning ensures that the changes are limited to specific partitions within the table.
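As a minimal sketch of that pattern, assuming a hypothetical table mydataset.transactions that is partitioned on a transaction_date column, the following statement only rewrites the partitions that fall before the cutoff date:

DELETE mydataset.transactions
WHERE transaction_date < DATE '2024-01-01'  -- limits the change to the affected partitions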
Avoid partitioning tables if the amount of data in each partition is small and each update modifies a large fraction of the partitions.
If you often update rows where one or more columns fall within a narrow range of values, consider using clustered tables. Clustering ensures that changes are limited to specific sets of blocks, reducing the amount of data that needs to be read and written. The following is an example of an UPDATE statement that filters on a range of column values:
UPDATE mydataset.mytable SET string_col = 'some string' WHERE id BETWEEN 54 AND 75;
Here is a similar example that filters on a small list of column values:
UPDATE mydataset.mytable SET string_col = 'some string' WHERE id IN (54, 57, 60);
Consider clustering on the id column in these cases.
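For instance, a clustered version of the hypothetical mydataset.mytable from the examples above could be defined as follows:

CREATE TABLE mydataset.mytable (
  id INT64,
  string_col STRING)
CLUSTER BY id;  -- co-locates rows with nearby id values in the same blocks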
If you need OLTP functionality, consider using Cloud SQL federated queries, which enable BigQuery to query data that resides in Cloud SQL.
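As a sketch, a federated query uses the EXTERNAL_QUERY function; the connection ID and the Cloud SQL orders table below are hypothetical:

SELECT *
FROM EXTERNAL_QUERY(
  'my-project.us.my-cloudsql-connection',                  -- hypothetical connection ID
  "SELECT id, status FROM orders WHERE status = 'open'");  -- runs in Cloud SQL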
For best practices to optimize query performance, see Introduction to optimizing query performance.