CLUSTER BY
clause (TABLE)
Applies to: Databricks SQL Databricks Runtime 13.3 LTS and above Delta Lake only
Defines liquid, multi-dimensional clustering for a relation.
Databricks recommends using automatic liquid clustering and predictive optimization for all Unity Catalog managed tables. These features provide intelligent optimization of data layout based on your data usage patterns.
You can use this clause when you:
Updated rows do not get automatically re-clustered. Run OPTIMIZE to re-cluster updated rows..
For more information on liquid clustering see Use liquid clustering for tables
SyntaxâCLUSTER BY { ( column_name [, ...] ) |
AUTO |
NONE }
Parametersâ
Specifies columns of the relation by which to cluster the data. The column order does not matter. To benefit from altering clustering you should run OPTIMIZE.
AUTO
Applies to: Databricks SQL Databricks Runtime 15.4 and above
Directs Delta Lake to automatically determine and over time adapt to the best columns to cluster by. For more information on liquid clustering see Use liquid clustering for tables.
NONE
Turns off clustering for the relation being altered. Newly inserted or updated data will not be clustered by OPTIMIZE. To not use clustering when creating a relation, omit the CLUSTER BY
clause.
You can find more examples in Use liquid clustering for tables.
SQL
-- Create a table with a clustering column
> CREATE TABLE t(a int, b string) CLUSTER BY (a);
-- The clustering of an existing Delta table to add a second dimension
> ALTER TABLE t CLUSTER BY (a, b);
-- Recluster the table
> OPTIMIZE t;
-- Remove the clustering
> ALTER TABLE t CLUSTER BY NONE;
Related articlesâ
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4