Main class for programmatically interacting with Delta tables. You can create DeltaTable instances using the path of the Delta table:
deltaTable = DeltaTable.forPath(spark, "/path/to/table")
In addition, you can convert an existing Parquet table in place into a Delta table:
deltaTable = DeltaTable.convertToDelta(spark, "parquet.`/path/to/table`")
New in version 0.4.
toDF() → pyspark.sql.dataframe.DataFrame
Get a DataFrame representation of this Delta table.
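Example (a minimal sketch, assuming a DeltaTable instance named deltaTable as created above):
df = deltaTable.toDF()   # latest snapshot of the table as a Spark DataFrame
df.show()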
New in version 0.4.
alias(aliasName: str) → delta.tables.DeltaTable
Apply an alias to the Delta table.
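Example (a minimal sketch; the alias name is arbitrary and is typically used to qualify columns, as in the merge examples below):
events = deltaTable.alias("events")
# "events" can now qualify columns in merge/update conditions, e.g. "events.eventId"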
New in version 0.4.
generate(mode: str) → None
Generate manifest files for the given delta table.
mode (str) – mode for the type of manifest file to be generated. The valid modes are as follows (not case sensitive): "symlink_format_manifest": generate manifests in symlink format for Presto and Athena read support.
See the online documentation for more information.
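Example (a minimal sketch using the symlink manifest mode described above):
deltaTable.generate("symlink_format_manifest")   # write manifest files for Presto/Athena read support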
New in version 0.5.
delete(condition: Union[pyspark.sql.column.Column, str, None] = None) → None
Delete data from the table that match the given condition.
Example:
deltaTable.delete("date < '2017-01-01'")        # predicate using SQL formatted string
deltaTable.delete(col("date") < "2017-01-01")   # predicate using Spark SQL functions
condition (str or pyspark.sql.Column) – condition of the delete
New in version 0.4.
update(condition: Union[pyspark.sql.column.Column, str, None] = None, set: Optional[Dict[str, Union[str, pyspark.sql.column.Column]]] = None) → None
Update data from the table on the rows that match the given condition, applying the rules defined by set.
Example:
# condition using SQL formatted string
deltaTable.update(
    condition = "eventType = 'clck'",
    set = { "eventType": "'click'" } )

# condition using Spark SQL functions
deltaTable.update(
    condition = col("eventType") == "clck",
    set = { "eventType": lit("click") } )
condition (str or pyspark.sql.Column) – Optional condition of the update
set (dict with str as keys and str or pyspark.sql.Column as values) – Defines the rules of setting the values of columns that need to be updated. Note: This param is required. Default value None is present to allow positional args in same order across languages.
New in version 0.4.
merge(source: pyspark.sql.dataframe.DataFrame, condition: Union[str, pyspark.sql.column.Column]) → delta.tables.DeltaMergeBuilder
Merge data from the source DataFrame based on the given merge condition. This returns a DeltaMergeBuilder object that can be used to specify the update, delete, or insert actions to be performed on rows based on whether the rows matched the condition or not. See DeltaMergeBuilder for a full description of this operation and what combinations of update, delete and insert operations are allowed.
Example 1 with conditions and update expressions as SQL formatted string:
deltaTable.alias("events").merge(
    source = updatesDF.alias("updates"),
    condition = "events.eventId = updates.eventId"
  ).whenMatchedUpdate(set = {
      "data": "updates.data",
      "count": "events.count + 1"
  }).whenNotMatchedInsert(values = {
      "date": "updates.date",
      "eventId": "updates.eventId",
      "data": "updates.data",
      "count": "1"
  }).execute()
Example 2 with conditions and update expressions as Spark SQL functions:
from pyspark.sql.functions import *

deltaTable.alias("events").merge(
    source = updatesDF.alias("updates"),
    condition = expr("events.eventId = updates.eventId")
  ).whenMatchedUpdate(set = {
      "data": col("updates.data"),
      "count": col("events.count") + 1
  }).whenNotMatchedInsert(values = {
      "date": col("updates.date"),
      "eventId": col("updates.eventId"),
      "data": col("updates.data"),
      "count": lit("1")
  }).execute()
source (pyspark.sql.DataFrame) – Source DataFrame
condition (str or pyspark.sql.Column) – Condition to match source rows with the Delta table rows.
builder object to specify whether to update, delete or insert rows based on whether the condition matched or not
New in version 0.4.
vacuum(retentionHours: Optional[float] = None) → pyspark.sql.dataframe.DataFrame
Recursively delete files and directories in the table that are not needed by the table for maintaining older versions up to the given retention threshold. This method will return an empty DataFrame on successful completion.
Example:
deltaTable.vacuum()     # vacuum files not required by versions more than 7 days old
deltaTable.vacuum(100)  # vacuum files not required by versions more than 100 hours old
retentionHours – Optional number of hours to retain history. If not specified, the default retention period of 168 hours (7 days) will be used.
New in version 0.4.
history(limit: Optional[int] = None) → pyspark.sql.dataframe.DataFrame
Get the information of the latest limit commits on this table as a Spark DataFrame. The information is in reverse chronological order.
Example:
fullHistoryDF = deltaTable.history()     # get the full history of the table
lastOperationDF = deltaTable.history(1)  # get the last operation
limit – Optional. Number of latest commits to return in the history.
Table's commit history. See the online Delta Lake documentation for more details.
pyspark.sql.DataFrame
New in version 0.4.
detail() → pyspark.sql.dataframe.DataFrame
Get the details of a Delta table such as the format, name, and size.
Example:
detailDF = deltaTable.detail() # get the full details of the table
Information of the table (format, name, size, etc.)
pyspark.sql.DataFrame
New in version 2.1.
convertToDelta(sparkSession: pyspark.sql.session.SparkSession, identifier: str, partitionSchema: Union[str, pyspark.sql.types.StructType, None] = None) → delta.tables.DeltaTable
Create a DeltaTable from the given parquet table. Takes an existing parquet table and constructs a delta transaction log in the base path of the table. Note: Any changes to the table during the conversion process may not result in a consistent state at the end of the conversion. Users should stop any changes to the table before the conversion is started.
Example:
# Convert unpartitioned parquet table at path 'path/to/table'
deltaTable = DeltaTable.convertToDelta(
    spark, "parquet.`path/to/table`")

# Convert partitioned parquet table at path 'path/to/table' and partitioned by
# integer column named 'part'
partitionedDeltaTable = DeltaTable.convertToDelta(
    spark, "parquet.`path/to/table`", "part int")
sparkSession (pyspark.sql.SparkSession) – SparkSession to use for the conversion
identifier (str) – Parquet table identifier formatted as "parquet.`path`"
partitionSchema – Hive DDL formatted string, or pyspark.sql.types.StructType
DeltaTable representing the converted Delta table
New in version 0.4.
forPath(sparkSession: pyspark.sql.session.SparkSession, path: str, hadoopConf: Dict[str, str] = {}) → delta.tables.DeltaTable
Instantiate a DeltaTable object representing the data at the given path. If the given path is invalid (i.e. either no table exists or an existing table is not a Delta table), it throws a "not a Delta table" error.
sparkSession (pyspark.sql.SparkSession) – SparkSession to use for loading the table
path (str) – path of the Delta table
hadoopConf (optional dict with str as key and str as value) – Hadoop configuration starting with "fs." or "dfs." will be picked up by DeltaTable to access the file system when executing queries. Other configurations will not be allowed.
loaded Delta table
Example:
hadoopConf = {"fs.s3a.access.key": "<access-key>",
              "fs.s3a.secret.key": "<secret-key>"}
deltaTable = DeltaTable.forPath(spark, "/path/to/table", hadoopConf)
New in version 0.4.
forName(sparkSession: pyspark.sql.session.SparkSession, tableOrViewName: str) → delta.tables.DeltaTable
Instantiate a DeltaTable object using the given table or view name. If the given tableOrViewName is invalid (i.e. either no table exists or an existing table is not a Delta table), it throws a "not a Delta table" error.
The given tableOrViewName can also be the absolute path of a Delta datasource (i.e. delta.`path`); in that case, it instantiates a DeltaTable object representing the data at the given path (consistent with forPath).
sparkSession – SparkSession to use for loading the table
tableOrViewName – name of the table or view
loaded Delta table
Example:
deltaTable = DeltaTable.forName(spark, "tblName")
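The path-based form mentioned above would look like this (a sketch, assuming a Delta table at /path/to/table):
deltaTable = DeltaTable.forName(spark, "delta.`/path/to/table`")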
New in version 0.7.
create(sparkSession: Optional[pyspark.sql.session.SparkSession] = None) → delta.tables.DeltaTableBuilder
Return a DeltaTableBuilder object that can be used to specify the table name, location, columns, partitioning columns, table comment, and table properties to create a Delta table; it errors if the table already exists (the same as SQL CREATE TABLE).
See DeltaTableBuilder for a full description and examples of this operation.
sparkSession – SparkSession to use for creating the table
an instance of DeltaTableBuilder
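Example (a minimal sketch using DeltaTableBuilder methods such as tableName, addColumn, partitionedBy and execute; the table and column names are hypothetical):
DeltaTable.create(spark) \
    .tableName("events") \
    .addColumn("eventId", "BIGINT") \
    .addColumn("date", "DATE") \
    .addColumn("data", "STRING") \
    .partitionedBy("date") \
    .execute()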
New in version 1.0.
createIfNotExists(sparkSession: Optional[pyspark.sql.session.SparkSession] = None) → delta.tables.DeltaTableBuilder
Return a DeltaTableBuilder object that can be used to specify the table name, location, columns, partitioning columns, table comment, and table properties to create a Delta table if it does not exist (the same as SQL CREATE TABLE IF NOT EXISTS).
See DeltaTableBuilder for a full description and examples of this operation.
sparkSession – SparkSession to use for creating the table
an instance of DeltaTableBuilder
New in version 1.0.
replace(sparkSession: Optional[pyspark.sql.session.SparkSession] = None) → delta.tables.DeltaTableBuilder
Return a DeltaTableBuilder object that can be used to specify the table name, location, columns, partitioning columns, table comment, and table properties to replace a Delta table; it errors if the table doesn't exist (the same as SQL REPLACE TABLE).
See DeltaTableBuilder for a full description and examples of this operation.
sparkSession – SparkSession to use for creating the table
an instance of DeltaTableBuilder
New in version 1.0.
createOrReplace(sparkSession: Optional[pyspark.sql.session.SparkSession] = None) → delta.tables.DeltaTableBuilder
Return a DeltaTableBuilder object that can be used to specify the table name, location, columns, partitioning columns, table comment, and table properties to create or replace a Delta table (the same as SQL CREATE OR REPLACE TABLE).
See DeltaTableBuilder for a full description and examples of this operation.
sparkSession – SparkSession to use for creating the table
an instance of DeltaTableBuilder
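Example (a sketch analogous to create above; the table and column names are hypothetical):
DeltaTable.createOrReplace(spark) \
    .tableName("events") \
    .addColumn("eventId", "BIGINT") \
    .addColumn("data", "STRING") \
    .execute()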
New in version 1.0.
isDeltaTable(sparkSession: pyspark.sql.session.SparkSession, identifier: str) → bool
Check if the provided identifier string, in this case a file path, is the root of a Delta table using the given SparkSession.
sparkSession – SparkSession to use to perform the check
path – location of the table
Whether the given path is the root of a Delta table
bool
Example:
DeltaTable.isDeltaTable(spark, "/path/to/table")
New in version 0.4.
upgradeTableProtocol(readerVersion: int, writerVersion: int) → None
Updates the protocol version of the table to leverage new features. Upgrading the reader version will prevent all clients that have an older version of Delta Lake from accessing this table. Upgrading the writer version will prevent older versions of Delta Lake from writing to this table. The reader or writer version cannot be downgraded.
See the online documentation and Delta's protocol specification at PROTOCOL.md for more details.
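Example (a minimal sketch; the target protocol versions shown are illustrative only):
deltaTable.upgradeTableProtocol(1, 3)   # set reader version to 1 and writer version to 3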
New in version 0.8.
restoreToVersion(version: int) → pyspark.sql.dataframe.DataFrame
Restore the DeltaTable to an older version of the table specified by version number.
Example:
deltaTable.restoreToVersion(1)   # restore the table to version 1
version – target version of restored table
Dataframe with metrics of restore operation.
pyspark.sql.DataFrame
New in version 1.2.
restoreToTimestamp(timestamp: str) → pyspark.sql.dataframe.DataFrame
Restore the DeltaTable to an older version of the table specified by a timestamp. The timestamp can be of the format yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.
Example:
deltaTable.restoreToTimestamp('2021-01-01')           # restore to a specific date
deltaTable.restoreToTimestamp('2021-01-01 01:01:01')  # restore to a specific timestamp
timestamp – target timestamp of restored table
Dataframe with metrics of restore operation.
pyspark.sql.DataFrame
New in version 1.2.
optimize() → delta.tables.DeltaOptimizeBuilder
Optimize the data layout of the table. This returns a DeltaOptimizeBuilder object that can be used to specify the partition filter to limit the scope of the optimize, and also to execute different optimization techniques such as file compaction or ordering data using Z-Order curves.
See the DeltaOptimizeBuilder for a full description of this operation.
Example:
deltaTable.optimize().where("date='2021-11-18'").executeCompaction()
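Z-Ordering, mentioned above, would be run through the same builder (a sketch assuming the executeZOrderBy method of DeltaOptimizeBuilder and a hypothetical column eventType):
deltaTable.optimize().where("date='2021-11-18'").executeZOrderBy("eventType")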
an instance of DeltaOptimizeBuilder.
New in version 2.0.