Predictive I/O is a collection of Databricks optimizations that improve performance for data interactions. Predictive I/O capabilities are grouped into two categories: accelerated reads and accelerated updates.
Predictive I/O is exclusive to the Photon engine on Databricks.
Use predictive I/O to accelerate reads
Predictive I/O is used to accelerate data scanning and filtering performance for all operations on supported compute types.
important
Predictive I/O reads are supported by the serverless and pro types of SQL warehouses, and Photon-accelerated clusters running Databricks Runtime 11.3 LTS and above.
Predictive I/O improves scanning performance by applying deep learning techniques to do the following:
Predictive I/O for updates is used automatically for all tables that have deletion vectors enabled using the following Photon-enabled compute types:
note
Support for predictive I/O for updates is present in Databricks Runtime 12.2 LTS and above, but Databricks recommends using 14.0 and above for best performance.
See What are deletion vectors?.
important
A workspace admin setting controls whether deletion vectors are auto-enabled for new Delta tables. See Auto-enable deletion vectors.
You enable support for deletion vectors on a Delta Lake table by setting a Delta Lake table property. You can set this property when you create a table or by altering an existing table, as in the following examples:
SQL
CREATE TABLE <table-name> [options] TBLPROPERTIES ('delta.enableDeletionVectors' = true);
ALTER TABLE <table-name> SET TBLPROPERTIES ('delta.enableDeletionVectors' = true);
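After setting the property, it can be useful to confirm that it took effect. The following is a minimal sketch that assumes a hypothetical table named events; substitute your own table name.

```sql
-- Hypothetical table name: events.
-- Shows the current value of the deletion vectors table property.
SHOW TBLPROPERTIES events ('delta.enableDeletionVectors');
```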
Predictive I/O leverages deletion vectors to accelerate updates by reducing the frequency of full file rewrites during data modification on Delta tables. Predictive I/O optimizes DELETE, MERGE, and UPDATE operations.
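As an illustration, the statements below show the three kinds of operations in question. This is a sketch only: the tables events and events_updates and their columns (event_id, event_date, status) are hypothetical, and whether a given statement writes deletion vectors depends on your compute type and Databricks Runtime version.

```sql
-- Hypothetical target table: events. Hypothetical source table: events_updates.

-- Remove rows that match a predicate.
DELETE FROM events WHERE event_date < '2023-01-01';

-- Change values in rows that match a predicate.
UPDATE events SET status = 'archived' WHERE status = 'stale';

-- Upsert rows from a source table into the target table.
MERGE INTO events AS t
USING events_updates AS s
ON t.event_id = s.event_id
WHEN MATCHED THEN
  UPDATE SET t.status = s.status
WHEN NOT MATCHED THEN
  INSERT (event_id, event_date, status)
  VALUES (s.event_id, s.event_date, s.status);
```

No change to the statements themselves is needed to benefit from predictive I/O; the optimization applies when the target table has deletion vectors enabled.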
Rather than rewriting all records in a data file when any record is updated or deleted, predictive I/O uses deletion vectors to indicate records have been removed from the target data files. Supplemental data files are used to indicate updates.
Subsequent reads on the table resolve current table state by applying the noted changes to the most recent table version.
important
Predictive I/O updates share all limitations with deletion vectors. In Databricks Runtime 12.2 LTS and greater, the following limitations exist:
Run REORG TABLE ... APPLY (PURGE) and ensure no concurrent write operations are running in order to generate a manifest.
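The sequence for generating a manifest might look like the following sketch. The table name events is hypothetical, and the example assumes that no other writers are modifying the table while the two statements run.

```sql
-- Hypothetical table name: events.
-- Rewrite data files so that rows marked by deletion vectors are physically removed.
REORG TABLE events APPLY (PURGE);

-- With no concurrent writes in progress, generate the manifest.
GENERATE symlink_format_manifest FOR TABLE events;
```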