Applies to: Databricks SQL Databricks Runtime
Merges a set of updates, insertions, and deletions based on a source table into a target Delta table.
This statement is supported only for Delta Lake tables.
This page contains details for using the correct syntax with the MERGE
command. See Upsert into a Delta Lake table using merge for more guidance on how to use MERGE
operations to manage your data.
[ common_table_expression ]
MERGE [ WITH SCHEMA EVOLUTION ] INTO target_table_name [target_alias]
USING source_table_reference [source_alias]
ON merge_condition
{ WHEN MATCHED [ AND matched_condition ] THEN matched_action |
WHEN NOT MATCHED [BY TARGET] [ AND not_matched_condition ] THEN not_matched_action |
WHEN NOT MATCHED BY SOURCE [ AND not_matched_by_source_condition ] THEN not_matched_by_source_action } [...]
matched_action
{ DELETE |
UPDATE SET * |
UPDATE SET { column = { expr | DEFAULT } } [, ...] }
not_matched_action
{ INSERT * |
INSERT (column1 [, ...] ) VALUES ( expr | DEFAULT ] [, ...] )
not_matched_by_source_action
{ DELETE |
UPDATE SET { column = { expr | DEFAULT } } [, ...] }
Parametersâ
Common table expressions (CTE) are one or more named queries which can be reused multiple times within the main query block to avoid repeated computations or to improve readability of complex, nested queries.
WITH SCHEMA EVOLUTION
Applies to: Databricks Runtime 15.2 and above
Enables automatic schema evolution for this MERGE
operation. When enabled, the schema of the target Delta table is automatically updated to match the schema of the source table.
A Table name identifying the table being modified. The table referenced must be a Delta table.
The name must not include an options specification.
The table must not be a foreign table.
A Table alias for the target table. The alias must not include a column list.
A Table name identifying the source table to be merged into the target table.
A Table alias for the source table. The alias must not include a column list.
How the rows from one relation are combined with the rows of another relation. An expression with a return type of BOOLEAN.
WHEN MATCHED [ AND
matched_condition ]
WHEN MATCHED
clauses are executed when a source row matches a target table row based on the merge_condition
and the optional match_condition
.
matched_action
DELETE
Deletes the matching target table row.
Multiple matches are allowed when matches are unconditionally deleted. An unconditional delete is not ambiguous, even if there are multiple matches.
UPDATE
Updates the matched target table row.
To update all the columns of the target Delta table with the corresponding columns of the source dataset, use UPDATE SET *
. This is equivalent to UPDATE SET col1 = source.col1 [, col2 = source.col2 ...]
for all the columns of the target Delta table. Therefore, this action assumes that the source table has the same columns as those in the target table, otherwise the query will throw an analysis error.
Applies to: Databricks SQL Databricks Runtime 11.3 LTS and above
You can specify DEFAULT
as expr
to explicitly update the column to its default value.
If there are multiple WHEN MATCHED
clauses, then they are evaluated in the order they are specified. Each WHEN MATCHED
clause, except the last one, must have a matched_condition
. Otherwise, the query returns a NON_LAST_MATCHED_CLAUSE_OMIT_CONDITION error.
If none of the WHEN MATCHED
conditions evaluate to true for a source and target row pair that matches the merge_condition
, then the target row is left unchanged.
WHEN NOT MATCHED [BY TARGET] [ AND
not_matched_condition ]
WHEN NOT MATCHED
clauses insert a row when a source row does not match any target row based on the merge_condition
and the optional not_matched_condition
.
Applies to: Databricks SQL Databricks Runtime 12.2 LTS and above
WHEN NOT MATCHED BY TARGET
can be used as an alias for WHEN NOT MATCHED
.
not_matched_condition
must be a Boolean expression.
INSERT *
Inserts all the columns of the target Delta table with the corresponding columns of the source dataset. This is equivalent to INSERT (col1 [, col2 ...]) VALUES (source.col1 [, source.col2 ...])
for all the columns of the target Delta table. This action requires that the source table has the same columns as those in the target table.
INSERT ( ... ) VALUES ( ... )
The new row is generated based on the specified column and corresponding expressions. All the columns in the target table do not need to be specified. For unspecified target columns, the column default is inserted, or NULL
if none exists.
Applies to: Databricks SQL Databricks Runtime 11.3 LTS and above
You can specify DEFAULT
as an expression to explicitly insert the column default for a target column.
If there are multiple WHEN NOT MATCHED
clauses, then they are evaluated in the order they are specified. All WHEN NOT MATCHED
clauses, except the last one, must have not_matched_condition
s. Otherwise, the query returns a NON_LAST_NOT_MATCHED_CLAUSE_OMIT_CONDITION error.
WHEN NOT MATCHED BY SOURCE [ AND
not_matched_by_source_condition ]
Applies to: Databricks SQL Databricks Runtime 12.2 LTS and above
WHEN NOT MATCHED BY SOURCE
clauses are executed when a target row does not match any rows in the source table based on the merge_condition
and the optional not_match_by_source_condition
evaluates to true.
not_matched_by_source_condition
must be a Boolean expression that only references columns from the target table.
not_matched_by_source_action
DELETE
Deletes the target table row.
UPDATE
Updates the target table row. expr
may only reference columns from the target table, otherwise the query will throw an analysis error.
Applies to: Databricks SQL Databricks Runtime 11.3 LTS and above
You can specify DEFAULT
as expr
to explicitly update the column to its default value.
important
Adding a WHEN NOT MATCHED BY SOURCE
clause to update or delete target rows when the merge_condition
evaluates to false can lead to a large number of target rows being modified. For best performance, apply not_matched_by_source_condition
s to limit the number of target rows updated or deleted.
If there are multiple WHEN NOT MATCHED BY SOURCE clauses
, then they are evaluated in the order they are specified. Each WHEN NOT MATCHED BY SOURCE
clause, except the last one, must have a not_matched_by_source_condition
. Otherwise, the query returns a NON_LAST_NOT_MATCHED_BY_SOURCE_CLAUSE_OMIT_CONDITION error.
If none of the WHEN NOT MATCHED BY SOURCE
conditions evaluate to true for a target row that doesn't match any rows in the source table based on the merge_condition
, then the target row is left unchanged.
important
MERGE
operations fail with a DELTA_MULTIPLE_SOURCE_ROW_MATCHING_TARGET_ROW_IN_MERGE error if more than one row in the source table matches the same row in the target table based on the conditions specified in the ON
and WHEN MATCHED
clauses. According to the SQL semantics of merge, this type of update operation is ambiguous because it is unclear which source row should be used to update the matched target row. You can preprocess the source table to eliminate the possibility of multiple matches. See the change data capture example. This example preprocesses the change dataset (the source dataset) to retain only the latest change for each key before applying that change into the target Delta table. In Databricks Runtime 15.4 LTS and below, MERGE
only considers conditions in the ON
clause before evaluating multiple matches.
You can use MERGE INTO
for complex operations like deduplicating data, upserting change data, applying SCD Type 2 operations, etc. See Upsert into a Delta Lake table using merge for a few examples.
WHEN MATCHED
â
SQL
> MERGE INTO target USING source
ON target.key = source.key
WHEN MATCHED THEN DELETE
> MERGE INTO target USING source
ON target.key = source.key
WHEN MATCHED AND target.updated_at < source.updated_at THEN UPDATE SET *
> MERGE INTO target USING source
ON target.key = source.key
WHEN MATCHED AND target.marked_for_deletion THEN DELETE
WHEN MATCHED THEN UPDATE SET target.updated_at = source.updated_at, target.value = DEFAULT
WHEN NOT MATCHED [BY TARGET]
â
SQL
> MERGE INTO target USING source
ON target.key = source.key
WHEN NOT MATCHED THEN INSERT *
> MERGE INTO target USING source
ON target.key = source.key
WHEN NOT MATCHED BY TARGET AND source.created_at > now() - INTERVAL â1â DAY THEN INSERT (created_at, value) VALUES (source.created_at, DEFAULT)
WHEN NOT MATCHED BY SOURCE
â
SQL
> MERGE INTO target USING source
ON target.key = source.key
WHEN NOT MATCHED BY SOURCE THEN DELETE
> MERGE INTO target USING source
ON target.key = source.key
WHEN NOT MATCHED BY SOURCE AND target.marked_for_deletion THEN DELETE
WHEN NOT MATCHED BY SOURCE THEN UPDATE SET target.value = DEFAULT
WITH SCHEMA EVOLUTION
â
SQL
> MERGE WITH SCHEMA EVOLUTION INTO target USING source
ON source.key = target.key
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
WHEN NOT MATCHED BY SOURCE THEN DELETE
Related articlesâ
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4