This article contains Scala user-defined function (UDF) examples. It shows how to register UDFs, how to invoke UDFs, and caveats regarding evaluation order of subexpressions in Spark SQL. See External user-defined scalar functions (UDFs) for more details.
note
Scala UDFs on Unity Catalog-enabled compute resources with standard access mode (formerly shared access mode) require Databricks Runtime 14.2 and above.
Graviton instance support for Scala UDFs on Unity Catalog-enabled clusters is available in Databricks Runtime 15.2 and above.
Register a function as a UDF
Scala
val squared = (s: Long) => {
s * s
}
spark.udf.register("square", squared)
Call the UDF in Spark SQL
Scala
spark.range(1, 20).createOrReplaceTempView("test")
SQL
%sql select id, square(id) as id_squared from test
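Outside a %sql notebook cell, a roughly equivalent query can be issued from Scala with spark.sql. A small sketch, assuming the temp view and UDF registered above:
Scala
// Same query as the %sql cell above, run through the SparkSession.
spark.sql("select id, square(id) as id_squared from test").show()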
Use UDFs with DataFrames
Scala
import org.apache.spark.sql.functions.{col, udf}
val squared = udf((s: Long) => s * s)
display(spark.range(1, 20).select(squared(col("id")) as "id_squared"))
Evaluation order and null checking
Spark SQL (including SQL and the DataFrame and Dataset APIs) does not guarantee the order of evaluation of subexpressions. In particular, the inputs of an operator or function are not necessarily evaluated left-to-right or in any other fixed order. For example, logical AND and OR expressions do not have left-to-right "short-circuiting" semantics.
Therefore, it is dangerous to rely on the side effects or order of evaluation of Boolean expressions, or on the order of WHERE and HAVING clauses, since such expressions and clauses can be reordered during query optimization and planning. Specifically, if a UDF relies on short-circuiting semantics in SQL for null checking, there is no guarantee that the null check will happen before the UDF is invoked. For example:
Scala
spark.udf.register("strlen", (s: String) => s.length)
spark.sql("select s from test1 where s is not null and strlen(s) > 1")
This WHERE clause does not guarantee that the strlen UDF will be invoked only after nulls have been filtered out.
To perform proper null checking, we recommend that you do either of the following:
- Make the UDF itself null-aware and do the null checking inside the UDF.
- Use IF or CASE WHEN expressions to do the null check and invoke the UDF in a conditional branch.
Scala
spark.udf.register("strlen_nullsafe", (s: String) => if (s != null) s.length else -1)
spark.sql("select s from test1 where s is not null and strlen_nullsafe(s) > 1")
spark.sql("select s from test1 where if(s is not null, strlen(s), null) > 1")
Typed Dataset APIs
note
This feature is supported on Unity Catalog-enabled clusters with standard access mode in Databricks Runtime 15.4 and above.
Typed Dataset APIs allow you to run transformations such as map, filter, and aggregations on resulting Datasets with a user-defined function.
For example, the following Scala application uses the map() API to modify a number in a result column into a prefixed string.
Scala
spark.range(3).map(f => s"row-$f").show()
While this example uses the map() API, this also applies to other typed Dataset APIs, such as filter(), mapPartitions(), foreach(), foreachPartition(), reduce(), and flatMap().
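For instance, here is a brief sketch of filter() and flatMap() following the same pattern as the map() example above (the implicits import may already be in scope in a notebook):
Scala
import spark.implicits._

// Keep even numbers, then expand each into two prefixed strings.
spark.range(6)
  .filter(n => n % 2 == 0)
  .flatMap(n => Seq(s"row-$n", s"row-${n}-extra"))
  .show()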
Scala features such as those described above require minimum Databricks Runtime versions when used on Unity Catalog-enabled clusters with standard (formerly shared) access mode.