The following release notes provide information about Databricks Runtime 16.4 LTS, powered by Apache Spark 3.5.2.
Databricks released this LTS version in May 2025. There are two variants of this release: one supporting Scala 2.12 and one supporting Scala 2.13.
Starting with DBR 17 (Spark 4), only Scala 2.13 will be supported. To help you transition, two images are available in 16.4 LTS: one with Scala 2.12 and one with Scala 2.13. Use the 2.13 image to test and update your code base to Scala 2.13 before migrating to DBR 17.
Scala 2.13 migration guidance

If your code uses the sparklyr R library, you must use the image that supports Scala 2.12 instead.
Databricks Runtime considers a breaking change major when it requires you to make significant code changes to support it.
Collection incompatibility: Read the official Scala docs page on migrating collections to Scala 2.13 for details. If your code was written against an earlier Scala version, collections are the primary source of incompatibilities, particularly in API parameters and return types.
Hash algorithm: When reviewing code built with Scala 2.12, do not rely on the implicit order of data structures that do not guarantee ordering, such as HashMap and Set. Such collections may iterate their elements in a different order under Scala 2.13 than under 2.12.
Databricks Runtime considers a breaking change minor when there is no version-specific error messaging in place, but your code would generally fail to compile under the new version. In this case, the compiler error messages may provide sufficient information for updating your code.
Examples include the + operator for string concatenation when used with a non-String type on the left, and the postfix operator (use dot notation instead). For more details, see Scala dropped features in the official Scala docs. Error messages may also be formatted differently; for example, f: Foo = Foo(1) may become f: Foo = Foo(i = 1) in some messages.

For the list of library versions included with each Scala version of Databricks Runtime 16.4 LTS, see the installed library sections later in these release notes.
New features and improvements

listagg and string_agg functions

MERGE INTO to tables with fine-grained access control on dedicated compute is now generally available (GA)

Dashboards, alerts, and queries are now supported as workspace files. You can now programmatically interact with these Databricks objects from anywhere the workspace filesystem is available, including writing, reading, and deleting them like any other file. To learn more, see What are workspace files? and Programmatically interact with workspace files.
Liquid clustering auto-compaction improvement

Unity Catalog-managed liquid clustering tables now trigger auto-compaction to automatically reduce small-file problems between OPTIMIZE runs.
For more details, see Auto compaction for Delta Lake on Databricks.
Auto Loader can now clean processed files in the source directory

Customers can now instruct Auto Loader to automatically move or delete files after they have been processed. Opt in to this feature by using the cloudFiles.cleanSource Auto Loader option.
For more details, see Auto Loader options under cloudFiles.cleanSource.
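For illustration, a minimal sketch of opting in. The source and checkpoint paths, the JSON format, and the MOVE mode with its cloudFiles.cleanSource.moveDestination companion option are assumptions for this example:

Python
df = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    # Opt in to cleanup: MOVE relocates processed files to the assumed archive location.
    .option("cloudFiles.cleanSource", "MOVE")
    .option("cloudFiles.cleanSource.moveDestination", "/Volumes/main/default/archive")
    .load("/Volumes/main/default/landing"))

(df.writeStream
    .option("checkpointLocation", "/Volumes/main/default/checkpoints/clean_source_demo")
    .trigger(availableNow=True)
    .toTable("main.default.bronze_events"))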
This release adds support for streaming from a Delta table that has type-widened column data, and for sharing a Delta table with type widening enabled using Databricks-to-Databricks Delta Sharing. The type widening feature is currently in Public Preview.
For more details, see Type widening.
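For context, a minimal sketch of enabling type widening on a Delta table before widening a column. The table and column names are placeholders, and the delta.enableTypeWidening table property is an assumption based on the type widening documentation:

Python
# Enable type widening on an existing Delta table (Public Preview).
spark.sql("ALTER TABLE main.default.events SET TBLPROPERTIES ('delta.enableTypeWidening' = 'true')")
# Widen an INT column to BIGINT; streaming readers and Delta Sharing recipients can consume the widened column.
spark.sql("ALTER TABLE main.default.events ALTER COLUMN quantity TYPE BIGINT")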
IDENTIFIER support now available in DBSQL for catalog operations

Databricks customers can now use the IDENTIFIER clause when performing the following catalog operations:
CREATE CATALOG
DROP CATALOG
COMMENT ON CATALOG
ALTER CATALOG
This new syntax allows customers to dynamically specify catalog names using parameters defined for these operations, enabling more flexible and reusable SQL workflows. As an example of the syntax, consider CREATE CATALOG IDENTIFIER(:param), where param is a parameter provided to specify a catalog name.
For more details, see IDENTIFIER clause.
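A minimal Python sketch, assuming a named parameter supplied through the args argument of spark.sql; the catalog name dev_catalog is a placeholder:

Python
# :param is resolved from the args mapping at execution time.
spark.sql("CREATE CATALOG IDENTIFIER(:param)", args={"param": "dev_catalog"})
spark.sql("COMMENT ON CATALOG IDENTIFIER(:param) IS 'created via a parameterized identifier'", args={"param": "dev_catalog"})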
Collated expressions now provide autogenerated transient aliases

Autogenerated aliases for collated expressions now always deterministically incorporate COLLATE information. Autogenerated aliases are transient (unstable) and should not be relied on. Instead, as a best practice, use expression AS alias consistently and explicitly.
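For instance, a hedged sketch of the recommended explicit alias on a collated expression; the table, column, and UNICODE_CI collation are placeholders for this example:

Python
# Use an explicit alias instead of relying on the autogenerated one.
spark.sql("SELECT name COLLATE UNICODE_CI AS name_ci FROM main.default.customers ORDER BY name_ci").show()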
In Databricks Runtime 16.4 LTS, Databricks added support for filter pushdown for Python data source batch reads, with an API similar to the SupportsPushDownFilters interface. You can implement DataSourceReader.pushFilters to receive the filters that may be pushed down, select which of them to push down, track them, and return the remaining filters for Apache Spark to apply.
Filter pushdown allows the data source to handle a subset of filters. This can improve performance by reducing the amount of data that needs to be processed by Spark.
Filter pushdown is supported only for batch reads, not for streaming reads. The new method must be added to DataSourceReader, not to DataSource or DataSourceStreamReader. Your implementation must interpret the list of filters as a logical AND of its elements. The method is called once during query planning. The default implementation returns all filters, indicating that no filters can be pushed down. Override this method in your subclass to implement your filter pushdown logic.
Initially, and to keep the API simple, Databricks supports only V1 filters that have a column, a comparison operator, and a literal value. The filter serialization is a placeholder and will be implemented in a future PR. For example:
Python
from abc import ABC
from typing import Iterable, List

class DataSourceReader(ABC):
    ...
    def pushFilters(self, filters: List["Filter"]) -> Iterable["Filter"]:
        ...
Databricks recommends you implement this method only for data sources that natively support filtering, such as databases and GraphQL APIs.
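To illustrate, a minimal hedged sketch of a reader that keeps equality filters and returns everything else to Spark. The source name, schema, and the EqualTo type-name check are assumptions for this example; only the pushFilters signature shown above comes from the documented API:

Python
from typing import Iterable, List

from pyspark.sql.datasource import DataSource, DataSourceReader

class MyReader(DataSourceReader):
    def __init__(self, options):
        self.options = options
        self.pushed = []  # filters this reader will apply natively

    def pushFilters(self, filters: List["Filter"]) -> Iterable["Filter"]:
        # Called once during planning with the AND-ed filters of the query.
        for f in filters:
            # Checking the class name avoids importing specific filter classes,
            # whose exact names are an assumption in this sketch.
            if type(f).__name__ == "EqualTo":
                self.pushed.append(f)   # handled by the data source
            else:
                yield f                 # left for Spark to apply

    def read(self, partition):
        # Placeholder: a real reader would fetch only rows matching self.pushed.
        yield from []

class MyDataSource(DataSource):
    @classmethod
    def name(cls):
        return "my_pushdown_source"

    def schema(self):
        return "id INT, value STRING"

    def reader(self, schema):
        return MyReader(self.options)

After registering the class with spark.dataSource.register(MyDataSource), a query such as spark.read.format("my_pushdown_source").load().filter("id = 1") may surface the equality predicate to pushFilters during planning.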
Python UDF traceback improvement

The Python UDF traceback now includes frames from both the driver and executor, along with client frames, resulting in better error messages that show more relevant detail (such as the line content of frames inside a UDF).
UNION/EXCEPT/INTERSECT inside a view and EXECUTE IMMEDIATE now return correct results

Queries over temporary and persistent view definitions with a top-level UNION/EXCEPT/INTERSECT and un-aliased columns previously returned incorrect results because the UNION/EXCEPT/INTERSECT keywords were treated as aliases. Those queries now correctly perform the whole set operation.
EXECUTE IMMEDIATE ... INTO with a top-level UNION/EXCEPT/INTERSECT and un-aliased columns also wrote an incorrect result of the set operation into the specified variable, because the parser interpreted these keywords as aliases. Similarly, SQL queries with invalid trailing text were allowed. Set operations in these cases now write the correct result into the specified variable, or fail for invalid SQL text.
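As a hedged illustration of the view case described above (the view name and literal values are placeholders):

Python
# The view body has a top-level UNION with un-aliased columns.
spark.sql("CREATE OR REPLACE TEMPORARY VIEW v AS SELECT 1, 2 UNION SELECT 3, 4")
# The full set operation is now performed instead of treating UNION as an alias.
spark.sql("SELECT * FROM v").show()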
Since DBR 16.4.0, reading from a file source table correctly respects query options, such as delimiters. Previously, the first query plan was cached and subsequent option changes were ignored. To restore the previous behavior, set spark.sql.legacy.readFileSourceTableCacheIgnoreOptions to true.
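For example, a minimal sketch of restoring the legacy behavior at the session level:

Python
# Opt back in to ignoring per-query options on cached file source table reads.
spark.conf.set("spark.sql.legacy.readFileSourceTableCacheIgnoreOptions", "true")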
listagg and string_agg functions

Starting with this release, you can use the listagg or string_agg functions to aggregate STRING and BINARY values within a group. See string_agg for more details.
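A short sketch of the syntax, assuming the optional delimiter argument; the inline VALUES data is invented for illustration:

Python
# Concatenate the SKU values per region into one delimited string.
spark.sql("""
    SELECT region, string_agg(sku, ', ') AS skus
    FROM VALUES ('west', 'a'), ('west', 'b'), ('east', 'c') AS t(region, sku)
    GROUP BY region
""").show()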
MERGE INTO to tables with fine-grained access control on dedicated compute is now generally available (GA)

In Databricks Runtime 16.3 and above, dedicated compute supports MERGE INTO to Unity Catalog tables that use fine-grained access control. This feature is now generally available.
See Fine-grained access control on dedicated compute.
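For context, a minimal MERGE INTO sketch; the catalog, schema, table, and column names are placeholders, and the target is assumed to be a Unity Catalog table with fine-grained access control:

Python
spark.sql("""
    MERGE INTO main.default.customers AS t
    USING main.default.customer_updates AS s
    ON t.customer_id = s.customer_id
    WHEN MATCHED THEN UPDATE SET t.email = s.email
    WHEN NOT MATCHED THEN INSERT (customer_id, email) VALUES (s.customer_id, s.email)
""")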
DBR 16.4 LTS behavioral changes

DESCRIBE DETAIL {table} now shows the clusterByAuto status of the table (true or false) next to the current clustering columns. For more details on clusterByAuto, see Automatic liquid clustering.
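For example, assuming the fields surface as columns named clusteringColumns and clusterByAuto in the detail output (the table name is a placeholder):

Python
detail = spark.sql("DESCRIBE DETAIL main.default.events")
# clusterByAuto is reported next to the current clustering columns.
detail.select("clusteringColumns", "clusterByAuto").show()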
This update ensures that table reads respect the options set on each query, even when the data source plan is cached, rather than only the options from the first cached read. Previously, data source table reads cached the first plan and ignored different options in subsequent queries.
For example, the following query:
spark.sql("CREATE TABLE t(a string, b string) USING CSV".stripMargin)
spark.sql("INSERT INTO TABLE t VALUES ('a;b', 'c')")
spark.sql("SELECT * FROM t").show()
spark.sql("SELECT * FROM t WITH ('delimiter' = ';')")
would produce this output:
+----+----+
|col1|col2|
+----+----+
| a;b| c |
+----+----+
+----+----+
|col1|col2|
+----+----+
| a;b| c |
+----+----+
With this fix, it now returns the expected output:
+----+----+
|col1|col2|
+----+----+
| a;b| c |
+----+----+
+----+----+
|col1|col2|
+----+----+
| a | b,c|
+----+----+
If your workloads have dependencies on the previous incorrect behavior, you may see different results after this change.
Moved redaction rule from analyzer to optimizer

Previously, DataFrames could create tables that contained redacted values when valid SECRET SQL functions were used. With this change, the redaction rule has moved from the analyzer to the optimizer, and redaction is no longer applied when saving DataFrames with valid secret access to a table.
variant_get and get_json_object now consider leading spaces in paths in Apache Spark

Prior to this change, leading whitespace and tabs in paths passed to the variant_get and get_json_object expressions were ignored when Photon was disabled. For example, select get_json_object('{" key": "value"}', "$[' key']") did not extract the value of " key". Users can now extract such keys.
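A hedged sketch of the example above, with the path quoting adjusted so that it runs as a Spark SQL string literal:

Python
# The bracket path quotes a key whose name begins with a space; it now matches " key".
spark.sql("""SELECT get_json_object('{" key": "value"}', "$[' key']") AS v""").show()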
Previously, users could disable source materialization in MERGE by setting merge.materializeSource to none. With the new flag enabled, this is forbidden and causes an error. Databricks plans to enable the flag only for customers who have not used this configuration, so no customer should notice any change in behavior.
The partition metadata log feature has been changed so that the setting is anchored to a table when the table is created with spark.databricks.nonDelta.partitionLog.enabled = true. A cluster that sets spark.databricks.nonDelta.partitionLog.enabled = true no longer applies it to all tables the cluster processes.
Updated the snowflake-jdbc dependency from 3.16.1 to 3.22.0. This may impact users who directly use the 3.16.1 version of the library.
Customers cannot use databricks-connect 16.1+ and Apache Spark™ 3.5.x together in the same application because of significant discrepancies in API behavior between Json4s version 3.7.0-M11 and version 4.0.7. To address this, Databricks has downgraded Json4s to 3.7.0-M11.
Library upgrades (applies to the Scala 2.12 image only)

The Scala 2.13 Databricks Runtime release is considered a new version and may include different library versions than the Scala 2.12 image. Refer to the table below for the specific library versions in that release image. The Scala 2.13 image does not include sparklyr in this release.
Databricks Runtime 16.4 LTS includes Apache Spark 3.5.2. This release includes all Spark fixes and improvements included in Databricks Runtime 16.3, as well as the following additional bug fixes and improvements made to Spark:
scala.collection.Set instead of Set in ValidateExternalType
Spark Master Environment page support filters
if branch from TaskSchedulerImpl#statusUpdate
ExplainUtils.generateFieldString to directly call QueryPlan.generateFieldString
StreamingPythonRunnerInitializationException to PySpark base exception
Unstable from SparkSessionExtensionsProvider trait
QueryExecutionMetering instantiation
nonEmpty/isEmpty for empty check for explicit Iterable
HigherOrderFunction
InSubquery in InTypeCoercion if there are no type changes
to_pandas on an empty table
ColumnDefinition.toV1Column should preserve EXISTS_DEFAULT resolution
operation_id
Databricks supports ODBC/JDBC drivers released in the past 2 years. Download the recently released drivers here:
System environment

R libraries are installed from the Posit Package Manager CRAN snapshot on 2024-08-04.

Note: sparklyr is only supported in the Databricks Runtime 16.4 LTS release image with support for Scala 2.12. It is not supported in the 16.4 release image with Scala 2.13 support.
Installed Java and Scala libraries (Scala 2.13 cluster version)

Installed Java and Scala libraries (Scala 2.12 cluster version)