Applies to: Databricks SQL Databricks Runtime
Creates a SQL scalar or table function that takes a set of arguments and returns a scalar value or a set of rows.
Applies to: Databricks SQL Databricks Runtime 14.1 and above
Creates a Python scalar function that takes a set of arguments and returns a scalar value.
Python UDFs require Unity Catalog on pro SQL warehouses or a Unity Catalog-enabled compute resource.
Applies to: Databricks SQL Databricks Runtime 14.1 and above
In addition to positional parameter invocation, you can also invoke SQL and Python UDF using named parameter invocation.
Applies to: Databricks SQL Databricks Runtime 16.2 and above
Use the ENVIRONMENT
clause to specify the Python environment for a function declared with LANGUAGE PYTHON
. Used to install custom dependencies and set the environment version.
CREATE [OR REPLACE] [TEMPORARY] FUNCTION [IF NOT EXISTS]
function_name ( [ function_parameter [, ...] ] )
{ [ RETURNS data_type ] |
RETURNS TABLE [ ( column_spec [, ...]) ] }
[ characteristic [...] ]
{ AS dollar_quoted_string | RETURN { expression | query } }
function_parameter
parameter_name data_type [DEFAULT default_expression] [COMMENT parameter_comment]
column_spec
column_name data_type [COMMENT column_comment]
characteristic
{ LANGUAGE { SQL | PYTHON } |
[NOT] DETERMINISTIC |
COMMENT function_comment |
[CONTAINS SQL | READS SQL DATA] |
DEFAULT COLLATION default_collation_name } |
environment }
environment
ENVIRONMENT ( { environment_key = environment_value } [, ...] )
Parametersâ
OR REPLACE
If specified, the function with the same name and signature (number of parameters and parameter types) is replaced. You cannot replace an existing function with a different signature or a procedure. This is mainly useful to update the function body and the return type of the function. You cannot specify this parameter with IF NOT EXISTS
.
TEMPORARY
The scope of the function being created. When you specify TEMPORARY
, the created function is valid and visible in the current session. No persistent entry is made in the catalog.
IF NOT EXISTS
If specified, creates the function only when it does not exist. The creation of the function succeeds (no error is thrown) if the specified function already exists in the system. You cannot specify this parameter with OR REPLACE
.
A name for the function. For a permanent function, you can optionally qualify the function name with a schema name. If the name is not qualified the permanent function is created in the current schema.
The function name must be a unique for all routines (procedures and functions) in the schema.
function_parameter
Specifies a parameter of the function.
The parameter name must be unique within the function.
Any supported data type. For Python, data_type
is cast to a Python data type according to this language mapping.
For a STRING
data_type
, the default collation is the function default_collation_name
.
DEFAULT default_expression
Applies to: Databricks SQL Databricks Runtime 10.4 LTS and above
An optional default to be used when a function invocation does not assign an argument to the parameter. default_expression
must be castable to data_type
. The expression must not reference another parameter or contain a subquery.
When you specify a default for one parameter, all following parameters must also have a default.
DEFAULT
is supported for LANGUAGE SQL
only.
COMMENT comment
An optional description of the parameter. comment
must be a STRING
literal.
RETURNS data_type
The return data type of the scalar function. For Python UDFs, return values must exactly match the data type as specified in data_type
. Otherwise, to prevent unanticipated type conversions, the function will fail.
For SQL UDF this clause is optional. The data type will be derived from the function body if it is not provided.
RETURNS TABLE [ (column_spec [,â¦] ) ]
This clause marks the function as a table function. Optionally it also specifies the signature of the result of the table function. If no column_spec is specified it will be derived from the body of the SQL UDF.
RETURNS TABLE
is supported for LANGUAGE SQL
only.
The column name must be unique within the signature.
Any supported data type.
COMMENT column_comment
An optional description of the column. comment
must be a STRING
literal.
RETURN { expression | query }
The body of the function. For a scalar function, it can either be a query or an expression. For a table function, it can only be a query. The expression cannot contain:
Within the body of the function you can refer to parameter by its unqualified name or by qualifying the parameter with the function name.
AS dollar_quoted_definition
dollar_quoted_definition
is the Python function body
enclosed by two matching $[tag]$body$[tag]$
. tag
can be an empty string.
Examples:
$$
return âHello worldâ
$$
$py$
return "Hello World"
$py$
characteristic
All characteristic clauses are optional. You can specify any number of them in any order, but you can specify each clause only once.
LANGUAGE SQL or LANGUAGE PYTHON
The language of the function implementation.
[NOT] DETERMINISTIC
Whether the function is deterministic. A function is deterministic when it returns only one result for a given set of arguments. You may mark a function as DETERMINISTIC
when its body is not and vice versa. A reason for this may be to encourage or discourage query optimizations such as constant folding or query caching. If you do not specify this option it is derived from the function body.
COMMENT function_comment
A comment for the function. function_comment
must be String literal.
CONTAINS SQL or READS SQL DATA
Whether a function reads data directly or indirectly from a table or a view. When the function reads SQL data, you cannot specify CONTAINS SQL
. If you don't specify either clause, the property is derived from the function body.
DEFAULT COLLATION default_collation_name
Applies to: Databricks SQL Databricks Runtime 17.0 and above
Defines the default collation to use for:
STRING
parameters and RETURNS
datatype and RETURNS TABLE
fields of the function.DEFAULT
expression.If not specified, the default collation is derived from the schema in which the function is created.
environment
Specifies the Python environment for a function declared with LANGUAGE PYTHON
. The ENVIRONMENT
clause is not supported for SQL functions.
dependencies
A JSON array of strings specifying the required Python packages or wheel files for the function. The dependencies
key is case-insensitive. Supported formats:
environment_version
A String specifying the Python environment version. Use None
to use the default Python environment. If omitted, the default environment is used.
None
is currently supported.To use any dependencies, use import <package>
within the function body. For example, see the following:
SQL
CREATE FUNCTION [â¦]
AS $$
import json
[... (rest of function definition)]
$$
By default, dependencies are limited to the standard Python library and the following libraries:
Custom dependencies in Python UDFsâTo use additional dependencies beyond the standard library and the supported built-in packages, specify them in the ENVIRONMENT
clause.
SQL
> CREATE VIEW t(c1, c2) AS VALUES (0, 1), (1, 2);
SQL
> CREATE TEMPORARY FUNCTION hello() RETURNS STRING
RETURN 'Hello World!';
> SELECT hello();
Hello World!
> CREATE FUNCTION area(x DOUBLE, y DOUBLE) RETURNS DOUBLE RETURN x * y;
> SELECT area(c1, c2) AS area FROM t;
0.0
2.0
> SELECT * FROM t WHERE area(c1, c2) > 0;
1 2
> CREATE FUNCTION square(x DOUBLE) RETURNS DOUBLE RETURN area(x, x);
> SELECT c1, square(c1) AS square FROM t;
0 0.0
1 1.0
> CREATE FUNCTION roll_dice()
RETURNS INT
NOT DETERMINISTIC
CONTAINS SQL
COMMENT 'Roll a single 6 sided die'
RETURN (rand() * 6)::INT + 1;
> SELECT roll_dice();
3
Create and use a function that uses DEFAULTsâ
SQL
> DROP FUNCTION roll_dice;
> CREATE FUNCTION roll_dice(num_dice INT DEFAULT 1 COMMENT 'number of dice to roll (Default: 1)',
num_sides INT DEFAULT 6 COMMENT 'number of sides per die (Default: 6)')
RETURNS INT
NOT DETERMINISTIC
CONTAINS SQL
COMMENT 'Roll a number of n-sided dice'
RETURN aggregate(sequence(1, roll_dice.num_dice, 1),
0,
(acc, x) -> (rand() * roll_dice.num_sides)::int,
acc -> acc + roll_dice.num_dice);
> SELECT roll_dice();
3
> SELECT roll_dice(3);
15
> SELECT roll_dice(3, 10)
21
> SELECT roll_dice(10 => num_sides, num_dice => 3)
17
> CREATE VIEW scores(player, score) AS VALUES (0, 1), (0, 2), (1, 2), (1, 5);
> CREATE FUNCTION avg_score(p INT) RETURNS FLOAT
COMMENT 'get an average score of the player'
RETURN SELECT AVG(score) FROM scores WHERE player = p;
> SELECT c1, avg_score(c1) FROM t;
0 1.5
1 3.5
Create a SQL table functionâ
SQL
> CREATE FUNCTION weekdays(start DATE, end DATE)
RETURNS TABLE(day_of_week STRING, day DATE)
RETURN SELECT extract(DAYOFWEEK_ISO FROM day), day
FROM (SELECT sequence(weekdays.start, weekdays.end)) AS T(days)
LATERAL VIEW explode(days) AS day
WHERE extract(DAYOFWEEK_ISO FROM day) BETWEEN 1 AND 5;
> SELECT weekdays.day_of_week, day
FROM weekdays(DATE'2022-01-01', DATE'2022-01-14');
1 2022-01-03
2 2022-01-04
3 2022-01-05
4 2022-01-06
5 2022-01-07
1 2022-01-10
2 2022-01-11
3 2022-01-12
4 2022-01-13
5 2022-01-14
> SELECT weekdays.*
FROM VALUES (DATE'2020-01-01'),
(DATE'2021-01-01'),
(DATE'2022-01-01') AS starts(start),
LATERAL weekdays(start, start + INTERVAL '7' DAYS);
3 2020-01-01
4 2020-01-02
5 2020-01-03
1 2020-01-06
2 2020-01-07
3 2020-01-08
5 2021-01-01
1 2021-01-04
2 2021-01-05
3 2021-01-06
4 2021-01-07
5 2021-01-08
1 2022-01-03
2 2022-01-04
3 2022-01-05
4 2022-01-06
5 2022-01-07
Replace a SQL functionâ
SQL
> CREATE OR REPLACE FUNCTION square(x DOUBLE) RETURNS DOUBLE RETURN x * x;
> CREATE OR REPLACE FUNCTION getemps(deptno INT)
RETURNS TABLE (name STRING)
RETURN SELECT name FROM employee e WHERE e.deptno = getemps.deptno;
> DESCRIBE FUNCTION getemps;
Function: default.getemps
Type: TABLE
Input: deptno INT
Returns: id INT
name STRING
note
You cannot replace an existing function with a different signature.
Describe a SQL functionâSQL
> DESCRIBE FUNCTION hello;
Function: hello
Type: SCALAR
Input: ()
Returns: STRING
> DESCRIBE FUNCTION area;
Function: default.area
Type: SCALAR
Input: x DOUBLE
y DOUBLE
Returns: DOUBLE
> DESCRIBE FUNCTION roll_dice;
Function: default.roll_dice
Type: SCALAR
Input: num_dice INT
num_sides INT
Returns: INT
> DESCRIBE FUNCTION EXTENDED roll_dice;
Function: default.roll_dice
Type: SCALAR
Input: num_dice INT DEFAULT 1 'number of dice to roll (Default: 1)'
num_sides INT DEFAULT 6 'number of sides per dice (Default: 6)'
Returns: INT
Comment: Roll a number of m-sided dice
Deterministic: false
Data Access: CONTAINS SQL
Configs: ...
Owner: the.house@always.wins
Create Time: Sat Feb 12 09:29:02 PST 2022
Body: aggregate(sequence(1, roll_dice.num_dice, 1),
0,
(acc, x) -> (rand() * roll_dice.num_sides)::int,
acc -> acc + roll_dice.num_dice)
Create Python functionsâ
SQL
â- Hello World-like functionality using Python UDFs
> CREATE FUNCTION main.default.greet(s STRING)
RETURNS STRING
LANGUAGE PYTHON
AS $$
def greet(name):
return "Hello " + name + "!"
return greet(s) if s else None
$$
â- Can import functions from std library and environment
> CREATE FUNCTION main.default.isleapyear(year INT)
RETURNS BOOLEAN
LANGUAGE PYTHON
AS $$
import calendar
return calendar.isleap(year) if year else None
$$
â- Must return the correct type. Otherwise will fail at runtime.
> CREATE FUNCTION main.default.a_number()
RETURNS INTEGER
LANGUAGE PYTHON
AS $$
return 10
$$
â- Deal with exceptions.
> CREATE FUNCTION main.default.custom_divide(n1 INT, n2 INT)
RETURNS FLOAT
LANGUAGE PYTHON
AS $$
try:
return n1/n2
except ZeroDivisionException:
return None
$$
Define custom dependencies in Python functionsâ
SQL
> CREATE FUNCTION main.default.dump_json(data STRING)
RETURNS STRING
LANGUAGE PYTHON
ENVIRONMENT (
dependencies = '["simplejson==3.19.3", "/Volumes/my_catalog/my_schema/my_volume/packages/custom_package-1.0.0.whl", "https://my-bucket.s3.amazonaws.com/packages/special_package-2.0.0.whl?Expires=2043167927&Signature=abcd"]',
environment_version = 'None'
)
AS $$
import simplejson as json
import custom_package
return json.dumps(custom_package.process(data))
$$;
> SELECT dump_json('{"foo": "bar"}');
Related articlesâ
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4