A streaming table is a table with support for streaming or incremental data processing. Streaming tables are backed by Lakeflow Declarative Pipelines. Each time a streaming table is refreshed, data added to the source tables is appended to the streaming table. You can refresh streaming tables manually or on a schedule.
CREATE [OR REFRESH] [PRIVATE] STREAMING TABLE
table_name
[ table_specification ]
[ table_clauses ]
[ AS query ]
table_specification
( { column_identifier column_type [column_properties] } [, ...]
[ column_constraint ] [, ...]
[ , table_constraint ] [...] )
column_properties
{ NOT NULL | COMMENT column_comment | column_constraint | MASK clause } [ ... ]
table_clauses
{ USING DELTA |
PARTITIONED BY (col [, ...]) |
CLUSTER BY clause |
LOCATION path |
COMMENT table_comment |
TBLPROPERTIES clause |
WITH { ROW FILTER clause } } [ ... ]
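For example, a minimal statement that creates a streaming table and incrementally populates it from cloud files (a sketch; the table name, volume path, and format are illustrative):

CREATE OR REFRESH STREAMING TABLE raw_orders
  COMMENT 'Orders ingested incrementally from cloud storage.'
  AS SELECT *
  FROM STREAM read_files(
    '/Volumes/my_catalog/my_schema/landing/orders/',  -- illustrative path
    format => 'json'
  );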
REFRESH
If specified, creates the table or updates an existing table and its content.
PRIVATE
Creates a private streaming table.
Private streaming tables were previously created with the TEMPORARY parameter.
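For example, a sketch of a private streaming table used as an intermediate step within a pipeline (the table name and path are illustrative):

CREATE OR REFRESH PRIVATE STREAMING TABLE staging_events
  AS SELECT *
  FROM STREAM read_files(
    '/Volumes/my_catalog/my_schema/landing/events/',  -- illustrative path
    format => 'json'
  );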
table_name
The name of the newly created table. The fully qualified table name must be unique.
table_specification
This optional clause defines the list of columns, their types, properties, descriptions, and column constraints.
column_identifier
The column names must be unique and map to the output columns of the query.
column_type
Specifies the column's data type. Not all data types supported by Databricks are supported by streaming tables.
column_comment
An optional STRING literal describing the column. This option must be specified along with column_type. If the column type is not specified, the column comment is skipped.
column_constraint
Adds a constraint that validates data as it flows into the table. See Manage data quality with pipeline expectations.
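For example, a sketch of an expectation that drops rows failing validation (table and source names are illustrative):

CREATE OR REFRESH STREAMING TABLE clean_orders (
  order_id BIGINT NOT NULL,
  amount DOUBLE,
  CONSTRAINT valid_amount EXPECT (amount > 0) ON VIOLATION DROP ROW
)
AS SELECT order_id, amount
FROM STREAM orders_raw;  -- hypothetical append-only source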
MASK clause
Adds a column mask function to anonymize sensitive data.
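For example, a sketch applying a previously created masking function to a sensitive column (the function and table names are hypothetical):

CREATE OR REFRESH STREAMING TABLE customers (
  id BIGINT,
  ssn STRING MASK my_catalog.my_schema.ssn_mask  -- hypothetical SQL UDF returning STRING
)
AS SELECT id, ssn
FROM STREAM customers_raw;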
table_constraint
When specifying a schema, you can define primary and foreign keys. The constraints are informational and are not enforced. See the CONSTRAINT clause in the SQL language reference.
note
To define table constraints, your pipeline must be a Unity Catalog-enabled pipeline.
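For example, a sketch declaring informational primary and foreign keys (all names are illustrative):

CREATE OR REFRESH STREAMING TABLE sales (
  order_id BIGINT NOT NULL,
  customer_id BIGINT,
  CONSTRAINT pk_sales PRIMARY KEY (order_id),
  CONSTRAINT fk_customer FOREIGN KEY (customer_id)
    REFERENCES my_catalog.my_schema.customers  -- hypothetical parent table
)
AS SELECT order_id, customer_id
FROM STREAM sales_raw;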
table_clauses
Optionally specify partitioning, comments, and user-defined properties for the table. Each sub-clause may only be specified once.
USING DELTA
Specifies the data format. The only option is DELTA. This clause is optional and defaults to DELTA.
PARTITIONED BY
An optional list of one or more columns to use for partitioning in the table. Mutually exclusive with CLUSTER BY.
Liquid clustering provides a flexible, optimized solution for clustering. Consider using CLUSTER BY instead of PARTITIONED BY for Lakeflow Declarative Pipelines.
CLUSTER BY
Enables liquid clustering on the table and defines the columns to use as clustering keys. Use automatic liquid clustering with CLUSTER BY AUTO, and Databricks intelligently chooses clustering keys to optimize query performance. Mutually exclusive with PARTITIONED BY.
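For example, a sketch enabling liquid clustering on two keys (names are illustrative); CLUSTER BY AUTO could be used instead to let Databricks choose the keys:

CREATE OR REFRESH STREAMING TABLE events
  CLUSTER BY (event_date, user_id)
  AS SELECT * FROM STREAM events_raw;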
LOCATION
An optional storage location for table data. If not set, the system will default to the pipeline storage location.
COMMENT
An optional STRING literal to describe the table.
TBLPROPERTIES
An optional list of table properties for the table.
WITH ROW FILTER
Adds a row filter function to the table. Future queries for that table receive a subset of the rows for which the function evaluates to TRUE. This is useful for fine-grained access control, because it allows the function to inspect the identity and group memberships of the invoking user to decide whether to filter certain rows.
See ROW FILTER clause.
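For example, a sketch attaching a previously created filter function that returns TRUE for rows the invoking user may see (the function and table names are hypothetical):

CREATE OR REFRESH STREAMING TABLE orders
  WITH ROW FILTER my_catalog.my_schema.region_filter ON (region)  -- hypothetical BOOLEAN UDF
  AS SELECT * FROM STREAM orders_raw;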
AS query
This clause populates the table using the data from query. This query must be a streaming query. Use the STREAM keyword to apply streaming semantics when reading from the source. If the read encounters a change or deletion to an existing record, an error is thrown. It is safest to read from static or append-only sources. To ingest data that has change commits, you can use Python and the SkipChangeCommits option to handle errors.
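For example, a sketch of a streaming read from an append-only source table (names are illustrative):

CREATE OR REFRESH STREAMING TABLE filtered_clicks
  AS SELECT user_id, url, event_time
  FROM STREAM clicks_raw  -- hypothetical append-only source
  WHERE url IS NOT NULL;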
When you specify a query and a table_specification together, the table schema specified in table_specification must contain all the columns returned by the query; otherwise you get an error. Any columns specified in table_specification but not returned by query return null values when queried.
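For example, in this sketch (names are illustrative) the declared last_login column is not returned by the query, so it reads as null:

CREATE OR REFRESH STREAMING TABLE users (
  id BIGINT,
  name STRING,
  last_login TIMESTAMP  -- not produced by the query below; returns null when queried
)
AS SELECT id, name FROM STREAM users_raw;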
For more information on streaming data, see Transform data with pipelines.
For a user to be able to update the pipeline in which the streaming table is defined, they require: