The following release notes provide information about Databricks Runtime 9.1 LTS and Databricks Runtime 9.1 LTS Photon, powered by Apache Spark 3.1.2. Databricks released this version in September 2021. Photon is in Public Preview.
New features and improvements
Array and map types are supported in schema hints for Auto Loader. See Override schema inference with schema hints.
Examples of schema hints for arrays include:
arr Array<TYPE>: changes the array type.
arr.element TYPE: changes the array type (by using the element keyword).
arr.element.x TYPE: changes a nested field type in an array of structures.
The first two examples are hints to change the type of array arr to TYPE, but they use different syntax. In the third example, arr is an array of structures with a field x. This example shows how to change the type of x to TYPE.
Examples of schema hints for maps include:
m Map<KEY_TYPE, VALUE_TYPE>: changes map key and value types.
m.key TYPE: changes the type of map keys.
m.value TYPE: changes the type of map values.
m.key.x TYPE: changes the field type in a map key.
m.value.x TYPE: changes the field type of a map value.
The first example changes the key and value types of map m to KEY_TYPE and VALUE_TYPE, respectively. The second and third examples can be used if only the key type or only the value type needs to be changed. In the fourth and fifth examples, m is a map whose key and value are structures with a field x. These examples show how to change the type of x to TYPE.
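The hint syntax above can be sketched in PySpark as follows. This is a minimal illustration, not a definitive implementation: the column names (tags, events, attrs), the JSON source format, and the helper names schema_hints and read_with_hints are hypothetical. The hints are passed through the cloudFiles.schemaHints option as a comma-separated string.

```python
def schema_hints(*hints: str) -> str:
    """Join individual hints into one comma-separated hint string."""
    return ", ".join(h.strip() for h in hints)


# Hypothetical columns: `tags` is an array, `events` is an array of
# structs with a field `ts`, and `attrs` is a map.
HINTS = schema_hints(
    "tags array<string>",           # arr Array<TYPE>
    "events.element.ts timestamp",  # arr.element.x TYPE
    "attrs.value double",           # m.value TYPE
)


def read_with_hints(spark, input_path: str, schema_location: str):
    """Start an Auto Loader stream that applies HINTS on top of inference.

    `spark` is an existing SparkSession on a Databricks cluster; the
    JSON format is an assumption made for this sketch.
    """
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", schema_location)
        .option("cloudFiles.schemaHints", HINTS)
        .load(input_path)
    )
```

Columns that are not named in a hint keep their inferred types.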
The Avro file format now supports the mergeSchema option when reading files. Setting mergeSchema to true when reading Avro files infers a schema from a set of Avro files rather than from a single file. This improves usability by inferring a schema that may be able to read all files even if their individual schemas differ. See Configuration.
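A minimal sketch of reading Avro files with schema merging enabled, assuming an existing SparkSession named spark. The helper name read_avro_merged and the idea of keeping the options in a dictionary are choices made for this sketch only.

```python
# Reader options for Avro; mergeSchema is the option described above.
AVRO_READ_OPTIONS = {"mergeSchema": "true"}


def read_avro_merged(spark, path: str):
    """Read every Avro file under `path`, inferring one schema across the set.

    `spark` is an existing SparkSession; `path` is a hypothetical directory.
    """
    return spark.read.format("avro").options(**AVRO_READ_OPTIONS).load(path)
```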
For lexicographically generated files, Auto Loader now leverages lexical file ordering and existing optimized APIs to make directory listing more efficient: it lists from previously ingested files rather than listing the entire directory. By default, Auto Loader automatically detects whether a given directory is suitable for incremental listing. To control this behavior explicitly, set the new cloudFiles.useIncrementalListing option to on (true), off (false), or automatic (auto). If you set this option to true, you can also set the cloudFiles.backfillInterval option to schedule regular backfills over your data, which makes sure all of your data is completely ingested. For background, see What is Auto Loader?
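The two options can be assembled as in the sketch below. The helper listing_options is hypothetical, and the interval value "1 day" used in the usage note is an assumed example of an interval string rather than a value taken from these notes.

```python
from typing import Dict, Optional

_ALLOWED_MODES = {"auto", "true", "false"}


def listing_options(mode: str = "auto",
                    backfill_interval: Optional[str] = None) -> Dict[str, str]:
    """Build Auto Loader options that control incremental listing.

    mode: "true" (on), "false" (off), or "auto" (automatic detection).
    backfill_interval: optional interval string for scheduled backfills.
    """
    if mode not in _ALLOWED_MODES:
        raise ValueError(f"mode must be one of {sorted(_ALLOWED_MODES)}")
    options = {"cloudFiles.useIncrementalListing": mode}
    if backfill_interval is not None:
        options["cloudFiles.backfillInterval"] = backfill_interval
    return options
```

The resulting dictionary can be passed to a stream reader, for example spark.readStream.format("cloudFiles").options(**listing_options("true", "1 day")).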
In combination with overwrite mode, the replaceWhere option can be used to overwrite only the data that matches a predicate defined in the option. Previously, replaceWhere supported a predicate only over partition columns, but it can now be an arbitrary expression. See Write to a table.
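As a sketch of a selective overwrite, assuming a DataFrame df whose rows all satisfy the predicate: the column names event_date and country, the predicate itself, and the helper name overwrite_matching are hypothetical.

```python
# Hypothetical predicate that mixes two columns; neither has to be a
# partition column of the target table.
PREDICATE = "event_date >= '2021-09-01' AND country = 'US'"


def overwrite_matching(df, table_path: str, predicate: str = PREDICATE) -> None:
    """Overwrite only the rows of the Delta table at `table_path` that match `predicate`.

    `df` is an existing DataFrame holding the replacement rows.
    """
    (
        df.write.format("delta")
        .mode("overwrite")
        .option("replaceWhere", predicate)
        .save(table_path)
    )
```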
Auto Loader now supports file notification mode on Google Cloud. Set .option("cloudFiles.useNotifications", "true") to allow Auto Loader to automatically set up Google Cloud Pub/Sub resources for you. With file notification mode, new files are detected and ingested as they arrive without listing the input directory. See Configure Auto Loader streams in file notification mode.
In addition to creating a scalar function that returns a scalar value, you can now create a table function that returns a set of rows. See CREATE FUNCTION (SQL and Python).
Kafka Streaming Source now reports estimatedTotalBytesBehindLatest metric
The Kafka streaming source now reports an estimate of how many bytes the consumer is behind the latest available byte after every batch. You can use this metric to track stream progress. See Retrieve Kafka metrics.
Example metric output:
StreamingQueryProgress {
  "batchId": 0,
  .....
  "sources": [ {
    "description" : "KafkaV2[Subscribe[topic-0]]",
    "metrics": {
      "avgOffsetsBehindLatest" : "1.0",
      "estimatedTotalBytesBehindLatest" : "80.0", // new
      "maxOffsetsBehindLatest" : "1",
      "minOffsetsBehindLatest" : "1"
    }
  } ],
  ....
}
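One way to track this metric from Python is sketched below. In PySpark, a running query's lastProgress attribute returns the most recent progress report as a dictionary shaped like the output above. The helper name bytes_behind_latest is hypothetical, and the sample dictionary simply mirrors the example output.

```python
from typing import Any, Dict, List


def bytes_behind_latest(progress: Dict[str, Any]) -> List[float]:
    """Return estimatedTotalBytesBehindLatest for each source that reports it."""
    values = []
    for source in progress.get("sources", []):
        metrics = source.get("metrics") or {}
        raw = metrics.get("estimatedTotalBytesBehindLatest")
        if raw is not None:
            # The metric is reported as a string, so convert it.
            values.append(float(raw))
    return values


# Sample progress report mirroring the example output above.
sample_progress = {
    "batchId": 0,
    "sources": [
        {
            "description": "KafkaV2[Subscribe[topic-0]]",
            "metrics": {
                "avgOffsetsBehindLatest": "1.0",
                "estimatedTotalBytesBehindLatest": "80.0",
                "maxOffsetsBehindLatest": "1",
                "minOffsetsBehindLatest": "1",
            },
        }
    ],
}

print(bytes_behind_latest(sample_progress))  # prints [80.0]
```

With a live stream, the same function could be called as bytes_behind_latest(query.lastProgress) once at least one batch has completed.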
For structs inside of arrays, Delta MERGE INTO now resolves struct fields by name and evolves struct schemas
Delta MERGE INTO now supports resolution of struct fields by name and automatic schema evolution for arrays of structs. When automatic schema evolution is enabled by setting spark.databricks.delta.schema.autoMerge.enabled to true, UPDATE and INSERT clauses resolve struct fields inside of an array by name, casting to the corresponding data type that is defined in the target array and filling additional or missing fields in the source or target with null values. When automatic schema evolution is disabled, UPDATE and INSERT clauses resolve struct fields inside of an array by name but cannot evolve the additional fields. See Update Delta Lake table schema.
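A minimal sketch of a merge with automatic schema evolution enabled, assuming an existing SparkSession named spark. The table names target and source, the join key id, and the helper name merge_with_schema_evolution are hypothetical.

```python
AUTO_MERGE_CONF = "spark.databricks.delta.schema.autoMerge.enabled"

# Hypothetical tables: `target` is a Delta table and `source` holds the
# incoming rows; both carry an array-of-structs column and a key `id`.
MERGE_SQL = """
MERGE INTO target t
USING source s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
"""


def merge_with_schema_evolution(spark):
    """Enable automatic schema evolution for this session, then run the merge."""
    spark.conf.set(AUTO_MERGE_CONF, "true")
    return spark.sql(MERGE_SQL)
```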
Fixed a memory leak in the Amazon S3 connector that could happen in long-running jobs or services, which was caused by JVM DeleteOnExit functionality.
Databricks Runtime 9.1 LTS includes Apache Spark 3.1.2. This release includes all Spark fixes and improvements included in Databricks Runtime 9.0 (EoS), as well as the following additional bug fixes and improvements made to Spark:
udf return value
path property when reading Hive tables
See Databricks Runtime 9.1 LTS maintenance updates.
System environment
R libraries are installed from the Microsoft CRAN snapshot on 2021-09-08. The snapshot is no longer available.
Installed Java and Scala libraries (Scala 2.12 cluster version)