This page is generated from the Apache DataFusion project's documents:
Scalar functions operate on a single row at a time and return a single value.
Math Functionsabs
Returns the absolute value of a number.
Argumentsacos
Returns the arc cosine or inverse cosine of a number.
Argumentsacosh
Returns the area hyperbolic cosine or inverse hyperbolic cosine of a number.
acosh(numeric_expression)
Arguments
asin
Returns the arc sine or inverse sine of a number.
Argumentsasinh
Returns the area hyperbolic sine or inverse hyperbolic sine of a number.
asinh(numeric_expression)
Arguments
atan
Returns the arc tangent or inverse tangent of a number.
Argumentsatan2
Returns the arc tangent or inverse tangent of expression_y / expression_x
.
atan2(expression_y, expression_x)
Arguments
atanh
Returns the area hyperbolic tangent or inverse hyperbolic tangent of a number.
atanh(numeric_expression)
Arguments
cbrt
Returns the cube root of a number.
Argumentsceil
Returns the nearest integer greater than or equal to a number.
Argumentscos
Returns the cosine of a number.
Argumentscosh
Returns the hyperbolic cosine of a number.
Argumentscot
Returns the cotangent of a number.
Argumentsdegrees
Converts radians to degrees.
degrees(numeric_expression)
Arguments
exp
Returns the base-e exponential of a number.
Argumentsfactorial
Factorial. Returns 1 if value is less than 2.
factorial(numeric_expression)
Arguments
floor
Returns the nearest integer less than or equal to a number.
floor(numeric_expression)
Arguments
gcd
Returns the greatest common divisor of expression_x
and expression_y
. Returns 0 if both inputs are zero.
gcd(expression_x, expression_y)
Arguments
isnan
Returns true if a given number is +NaN or -NaN otherwise returns false.
isnan(numeric_expression)
Arguments
iszero
Returns true if a given number is +0.0 or -0.0 otherwise returns false.
iszero(numeric_expression)
Arguments
lcm
Returns the least common multiple of expression_x
and expression_y
. Returns 0 if either input is zero.
lcm(expression_x, expression_y)
Arguments
ln
Returns the natural logarithm of a number.
Argumentslog
Returns the base-x logarithm of a number. Can either provide a specified base, or if omitted then takes the base-10 of a number.
log(base, numeric_expression)
log(numeric_expression)
Arguments
log10
Returns the base-10 logarithm of a number.
log10(numeric_expression)
Arguments
log2
Returns the base-2 logarithm of a number.
Argumentsnanvl
Returns the first argument if it's not NaN. Returns the second argument otherwise.
nanvl(expression_x, expression_y)
Arguments
pi
Returns an approximate value of π.
pow
Alias of power.
power
Returns a base expression raised to the power of an exponent.
Argumentsradians
Converts degrees to radians.
radians(numeric_expression)
Arguments
random
Returns a random float value in the range [0, 1). The random seed is unique to each row.
round
Rounds a number to the nearest integer.
round(numeric_expression[, decimal_places])
Arguments
signum
Returns the sign of a number. Negative numbers return -1
. Zero and positive numbers return 1
.
signum(numeric_expression)
Arguments
sin
Returns the sine of a number.
Argumentssinh
Returns the hyperbolic sine of a number.
Argumentssqrt
Returns the square root of a number.
Argumentstan
Returns the tangent of a number.
Argumentstanh
Returns the hyperbolic tangent of a number.
Argumentstrunc
Truncates a number to a whole number or truncated to the specified decimal places.
trunc(numeric_expression[, decimal_places])
Arguments
decimal_places
is a positive integer, truncates digits to the right of the decimal point. If decimal_places
is a negative integer, replaces digits to the left of the decimal point with 0
.coalesce
Returns the first of its arguments that is not null. Returns null if all arguments are null. This function is often used to substitute a default value for null values.
coalesce(expression1[, ..., expression_n])
Arguments
> select coalesce(null, null, 'datafusion');
+
| coalesce(NULL,NULL,Utf8("datafusion")) |
+
| datafusion |
+
greatest
Returns the greatest value in a list of expressions. Returns null if all expressions are null.
greatest(expression1[, ..., expression_n])
Arguments
> select greatest(4, 7, 5);
+
| greatest(4,7,5) |
+
| 7 |
+
ifnull
Alias of nvl.
least
Returns the smallest value in a list of expressions. Returns null if all expressions are null.
least(expression1[, ..., expression_n])
Arguments
> select least(4, 7, 5);
+
| least(4,7,5) |
+
| 4 |
+
nullif
Returns null if expression1 equals expression2; otherwise it returns expression1. This can be used to perform the inverse operation of coalesce
.
nullif(expression1, expression2)
Arguments
> select nullif('datafusion', 'data');
+
| nullif(Utf8("datafusion"),Utf8("data")) |
+
| datafusion |
+
> select nullif('datafusion', 'datafusion');
+
| nullif(Utf8("datafusion"),Utf8("datafusion")) |
+
| |
+
nvl
Returns expression2 if expression1 is NULL otherwise it returns expression1.
nvl(expression1, expression2)
Arguments
> select nvl(null, 'a');
+
| nvl(NULL,Utf8("a")) |
+
| a |
+
> select nvl('b', 'a');
+
| nvl(Utf8("b"),Utf8("a")) |
+
| b |
+
Aliases
nvl2
Returns expression2 if expression1 is not NULL; otherwise it returns expression3.
nvl2(expression1, expression2, expression3)
Arguments
> select nvl2(null, 'a', 'b');
+
| nvl2(NULL,Utf8("a"),Utf8("b")) |
+
| b |
+
> select nvl2('data', 'a', 'b');
+
| nvl2(Utf8("data"),Utf8("a"),Utf8("b")) |
+
| a |
+
String Functions
ascii
Returns the Unicode character code of the first character in a string.
Arguments> select ascii('abc');
+
| ascii(Utf8("abc")) |
+
| 97 |
+
> select ascii('🚀');
+
| ascii(Utf8("🚀")) |
+
| 128640 |
+
Related functions:
bit_length
Returns the bit length of a string.
Arguments> select bit_length('datafusion');
+
| bit_length(Utf8("datafusion")) |
+
| 80 |
+
Related functions:
btrim
Trims the specified trim string from the start and end of a string. If no trim string is provided, all whitespace is removed from the start and end of the input string.
Arguments> select btrim('__datafusion____', '_');
+
| btrim(Utf8("__datafusion____"),Utf8("_")) |
+
| datafusion |
+
Alternative Syntax
trim(BOTH trim_str FROM str)
Aliases
Related functions:
char_length
Alias of character_length.
character_length
Returns the number of characters in a string.
Arguments> select character_length('Ångström');
+
| character_length(Utf8("Ångström")) |
+
| 8 |
+
Aliases
Related functions:
chr
Returns the character with the specified ASCII or Unicode code value.
Arguments> select chr(128640);
+
| chr(Int64(128640)) |
+
| 🚀 |
+
Related functions:
concat
Concatenates multiple strings together.
concat(str[, ..., str_n])
Arguments
> select concat('data', 'f', 'us', 'ion');
+
| concat(Utf8("data"),Utf8("f"),Utf8("us"),Utf8("ion")) |
+
| datafusion |
+
Related functions:
concat_ws
Concatenates multiple strings together with a specified separator.
concat_ws(separator, str[, ..., str_n])
Arguments
> select concat_ws('_', 'data', 'fusion');
+
| concat_ws(Utf8("_"),Utf8("data"),Utf8("fusion")) |
+
| data_fusion |
+
Related functions:
contains
Return true if search_str is found within string (case-sensitive).
contains(str, search_str)
Arguments
> select contains('the quick brown fox', 'row');
+
| contains(Utf8("the quick brown fox"),Utf8("row")) |
+
| true |
+
ends_with
Tests if a string ends with a substring.
Arguments> select ends_with('datafusion', 'soin');
+
| ends_with(Utf8("datafusion"),Utf8("soin")) |
+
| false |
+
> select ends_with('datafusion', 'sion');
+
| ends_with(Utf8("datafusion"),Utf8("sion")) |
+
| true |
+
find_in_set
Returns a value in the range of 1 to N if the string str is in the string list strlist consisting of N substrings.
find_in_set(str, strlist)
Arguments
> select find_in_set('b', 'a,b,c,d');
+
| find_in_set(Utf8("b"),Utf8("a,b,c,d")) |
+
| 2 |
+
initcap
Capitalizes the first character in each word in the input string. Words are delimited by non-alphanumeric characters.
Arguments> select initcap('apache datafusion');
+
| initcap(Utf8("apache datafusion")) |
+
| Apache Datafusion |
+
Related functions:
instr
Alias of strpos.
left
Returns a specified number of characters from the left side of a string.
Arguments> select left('datafusion', 4);
+
| left(Utf8("datafusion"),Int64(4)) |
+
| data |
+
Related functions:
length
Alias of character_length.
levenshtein
Returns the Levenshtein distance
between the two given strings.
> select levenshtein('kitten', 'sitting');
+
| levenshtein(Utf8("kitten"),Utf8("sitting")) |
+
| 3 |
+
lower
Converts a string to lower-case.
Arguments> select lower('Ångström');
+
| lower(Utf8("Ångström")) |
+
| ångström |
+
Related functions:
lpad
Pads the left side of a string with another string to a specified string length.
lpad(str, n[, padding_str])
Arguments
> select lpad('Dolly', 10, 'hello');
+
| lpad(Utf8("Dolly"),Int64(10),Utf8("hello")) |
+
| helloDolly |
+
Related functions:
ltrim
Trims the specified trim string from the beginning of a string. If no trim string is provided, all whitespace is removed from the start of the input string.
Arguments> select ltrim(' datafusion ');
+
| ltrim(Utf8(" datafusion ")) |
+
| datafusion |
+
> select ltrim('___datafusion___', '_');
+
| ltrim(Utf8("___datafusion___"),Utf8("_")) |
+
| datafusion___ |
+
Alternative Syntax
trim(LEADING trim_str FROM str)
Related functions:
octet_length
Returns the length of a string in bytes.
Arguments> select octet_length('Ångström');
+
| octet_length(Utf8("Ångström")) |
+
| 10 |
+
Related functions:
overlay
Returns the string which is replaced by another string from the specified position and specified count length.
overlay(str PLACING substr FROM pos [FOR count])
Arguments
> select overlay('Txxxxas' placing 'hom' from 2 for 4);
+
| overlay(Utf8("Txxxxas"),Utf8("hom"),Int64(2),Int64(4)) |
+
| Thomas |
+
position
Alias of strpos.
repeat
Returns a string with an input string repeated a specified number.
Arguments> select repeat('data', 3);
+
| repeat(Utf8("data"),Int64(3)) |
+
| datadatadata |
+
replace
Replaces all occurrences of a specified substring in a string with a new substring.
replace(str, substr, replacement)
Arguments
> select replace('ABabbaBA', 'ab', 'cd');
+
| replace(Utf8("ABabbaBA"),Utf8("ab"),Utf8("cd")) |
+
| ABcdbaBA |
+
reverse
Reverses the character order of a string.
Arguments> select reverse('datafusion');
+
| reverse(Utf8("datafusion")) |
+
| noisufatad |
+
right
Returns a specified number of characters from the right side of a string.
Arguments> select right('datafusion', 6);
+
| right(Utf8("datafusion"),Int64(6)) |
+
| fusion |
+
Related functions:
rpad
Pads the right side of a string with another string to a specified string length.
rpad(str, n[, padding_str])
Arguments
> select rpad('datafusion', 20, '_-');
+
| rpad(Utf8("datafusion"),Int64(20),Utf8("_-")) |
+
| datafusion_-_-_-_-_- |
+
Related functions:
rtrim
Trims the specified trim string from the end of a string. If no trim string is provided, all whitespace is removed from the end of the input string.
Arguments> select rtrim(' datafusion ');
+
| rtrim(Utf8(" datafusion ")) |
+
| datafusion |
+
> select rtrim('___datafusion___', '_');
+
| rtrim(Utf8("___datafusion___"),Utf8("_")) |
+
| ___datafusion |
+
Alternative Syntax
trim(TRAILING trim_str FROM str)
Related functions:
split_part
Splits a string based on a specified delimiter and returns the substring in the specified position.
split_part(str, delimiter, pos)
Arguments
> select split_part('1.2.3.4.5', '.', 3);
+
| split_part(Utf8("1.2.3.4.5"),Utf8("."),Int64(3)) |
+
| 3 |
+
starts_with
Tests if a string starts with a substring.
Arguments> select starts_with('datafusion','data');
+
| starts_with(Utf8("datafusion"),Utf8("data")) |
+
| true |
+
strpos
Returns the starting position of a specified substring in a string. Positions begin at 1. If the substring does not exist in the string, the function returns 0.
Arguments> select strpos('datafusion', 'fus');
+
| strpos(Utf8("datafusion"),Utf8("fus")) |
+
| 5 |
+
Alternative Syntax
position(substr in origstr)
Aliases
substr
Extracts a substring of a specified number of characters from a specific starting position in a string.
substr(str, start_pos[, length])
Arguments
> select substr('datafusion', 5, 3);
+
| substr(Utf8("datafusion"),Int64(5),Int64(3)) |
+
| fus |
+
Alternative Syntax
substring(str from start_pos for length)
Aliases
substr_index
Returns the substring from str before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned.
substr_index(str, delim, count)
Arguments
> select substr_index('www.apache.org', '.', 1);
+
| substr_index(Utf8("www.apache.org"),Utf8("."),Int64(1)) |
+
| www |
+
> select substr_index('www.apache.org', '.', -1);
+
| substr_index(Utf8("www.apache.org"),Utf8("."),Int64(-1)) |
+
| org |
+
Aliases
substring
Alias of substr.
substring_index
Alias of substr_index.
to_hex
Converts an integer to a hexadecimal string.
Arguments> select to_hex(12345689);
+
| to_hex(Int64(12345689)) |
+
| bc6159 |
+
translate
Translates characters in a string to specified translation characters.
translate(str, chars, translation)
Arguments
> select translate('twice', 'wic', 'her');
+
| translate(Utf8("twice"),Utf8("wic"),Utf8("her")) |
+
| there |
+
trim
Alias of btrim.
upper
Converts a string to upper-case.
Arguments> select upper('dataFusion');
+
| upper(Utf8("dataFusion")) |
+
| DATAFUSION |
+
Related functions:
uuid
Returns UUID v4
string value which is unique per row.
> select uuid();
+
| uuid() |
+
| 6ec17ef8-1934-41cc-8d59-d0c8f9eea1f0 |
+
Binary String Functions
decode
Decode binary data from textual representation in string.
decode(expression, format)
Arguments
Related functions:
encode
Encode binary data into a textual representation.
encode(expression, format)
Arguments
base64
, hex
Related functions:
Regular Expression FunctionsApache DataFusion uses a PCRE-like regular expression syntax (minus support for several features including look-around and backreferences). The following regular expression functions are supported:
regexp_count
Returns the number of matches that a regular expression has in a string.
regexp_count(str, regexp[, start, flags])
Arguments
> select regexp_count('abcAbAbc', 'abc', 2, 'i');
+
| regexp_count(Utf8("abcAbAbc"),Utf8("abc"),Int64(2),Utf8("i")) |
+
| 1 |
+
regexp_like
Returns true if a regular expression has at least one match in a string, false otherwise.
regexp_like(str, regexp[, flags])
Arguments
select regexp_like('Köln', '[a-zA-Z]ö[a-zA-Z]{2}');
+
| regexp_like(Utf8("Köln"),Utf8("[a-zA-Z]ö[a-zA-Z]{2}")) |
+
| true |
+
SELECT regexp_like('aBc', '(b|d)', 'i');
+
| regexp_like(Utf8("aBc"),Utf8("(b|d)"),Utf8("i")) |
+
| true |
+
Additional examples can be found here
regexp_match
Returns the first regular expression matches in a string.
regexp_match(str, regexp[, flags])
Arguments
> select regexp_match('Köln', '[a-zA-Z]ö[a-zA-Z]{2}');
+
| regexp_match(Utf8("Köln"),Utf8("[a-zA-Z]ö[a-zA-Z]{2}")) |
+
| [Köln] |
+
SELECT regexp_match('aBc', '(b|d)', 'i');
+
| regexp_match(Utf8("aBc"),Utf8("(b|d)"),Utf8("i")) |
+
| [B] |
+
Additional examples can be found here
regexp_replace
Replaces substrings in a string that match a regular expression.
regexp_replace(str, regexp, replacement[, flags])
Arguments
> select regexp_replace('foobarbaz', 'b(..)', 'X\\1Y', 'g');
+
| regexp_replace(Utf8("foobarbaz"),Utf8("b(..)"),Utf8("X\1Y"),Utf8("g")) |
+
| fooXarYXazY |
+
SELECT regexp_replace('aBc', '(b|d)', 'Ab\\1a', 'i');
+
| regexp_replace(Utf8("aBc"),Utf8("(b|d)"),Utf8("Ab\1a"),Utf8("i")) |
+
| aAbBac |
+
Additional examples can be found here
Time and Date Functionscurrent_date
Returns the current UTC date.
The current_date()
return value is determined at query time and will return the same date, no matter when in the query plan the function executes.
current_time
Returns the current UTC time.
The current_time()
return value is determined at query time and will return the same time, no matter when in the query plan the function executes.
current_timestamp
Alias of now.
date_bin
Calculates time intervals and returns the start of the interval nearest to the specified timestamp. Use date_bin
to downsample time series data by grouping rows into time-based "bins" or "windows" and applying an aggregate or selector function to each window.
For example, if you "bin" or "window" data into 15 minute intervals, an input timestamp of 2023-01-01T18:18:18Z
will be updated to the start time of the 15 minute bin it is in: 2023-01-01T18:15:00Z
.
date_bin(interval, expression, origin-timestamp)
Arguments
interval: Bin interval.
expression: Time expression to operate on. Can be a constant, column, or function.
origin-timestamp: Optional. Starting point used to determine bin boundaries. If not specified defaults 1970-01-01T00:00:00Z (the UNIX epoch in UTC). The following intervals are supported:
> SELECT date_bin(interval '1 day', time) as bin
FROM VALUES ('2023-01-01T18:18:18Z'), ('2023-01-03T19:00:03Z') t(time);
+
| bin |
+
| 2023-01-01T00:00:00 |
| 2023-01-03T00:00:00 |
+
2 row(s) fetched.
> SELECT date_bin(interval '1 day', time, '2023-01-01T03:00:00') as bin
FROM VALUES ('2023-01-01T18:18:18Z'), ('2023-01-03T19:00:03Z') t(time);
+
| bin |
+
| 2023-01-01T03:00:00 |
| 2023-01-03T03:00:00 |
+
2 row(s) fetched.
date_format
Alias of to_char.
date_part
Returns the specified part of the date as an integer.
date_part(part, expression)
Arguments
part: Part of the date to return. The following date parts are supported:
expression: Time expression to operate on. Can be a constant, column, or function.
extract(field FROM source)
Aliases
date_trunc
Truncates a timestamp value to a specified precision.
date_trunc(precision, expression)
Arguments
precision: Time precision to truncate to. The following precisions are supported:
expression: Time expression to operate on. Can be a constant, column, or function.
datepart
Alias of date_part.
datetrunc
Alias of date_trunc.
from_unixtime
Converts an integer to RFC3339 timestamp format (YYYY-MM-DDT00:00:00.000000000Z
). Integers and unsigned integers are interpreted as seconds since the unix epoch (1970-01-01T00:00:00Z
) return the corresponding timestamp.
from_unixtime(expression[, timezone])
Arguments
> select from_unixtime(1599572549, 'America/New_York');
+
| from_unixtime(Int64(1599572549),Utf8("America/New_York")) |
+
| 2020-09-08T09:42:29-04:00 |
+
make_date
Make a date from year/month/day component parts.
make_date(year, month, day)
Arguments
> select make_date(2023, 1, 31);
+
| make_date(Int64(2023),Int64(1),Int64(31)) |
+
| 2023-01-31 |
+
> select make_date('2023', '01', '31');
+
| make_date(Utf8("2023"),Utf8("01"),Utf8("31")) |
+
| 2023-01-31 |
+
Additional examples can be found here
now
Returns the current UTC timestamp.
The now()
return value is determined at query time and will return the same timestamp, no matter when in the query plan the function executes.
to_char
Returns a string representation of a date, time, timestamp or duration based on a Chrono format. Unlike the PostgreSQL equivalent of this function numerical formatting is not supported.
to_char(expression, format)
Arguments
> select to_char('2023-03-01'::date, '%d-%m-%Y');
+
| to_char(Utf8("2023-03-01"),Utf8("%d-%m-%Y")) |
+
| 01-03-2023 |
+
Additional examples can be found here
Aliasesto_date
Converts a value to a date (YYYY-MM-DD
). Supports strings, integer and double types as input. Strings are parsed as YYYY-MM-DD (e.g. '2023-07-20') if no Chrono formats are provided. Integers and doubles are interpreted as days since the unix epoch (1970-01-01T00:00:00Z
). Returns the corresponding date.
Note: to_date
returns Date32, which represents its values as the number of days since unix epoch(1970-01-01
) stored as signed 32 bit value. The largest supported date value is 9999-12-31
.
to_date('2017-05-31', '%Y-%m-%d')
Arguments
> select to_date('2023-01-31');
+
| to_date(Utf8("2023-01-31")) |
+
| 2023-01-31 |
+
> select to_date('2023/01/31', '%Y-%m-%d', '%Y/%m/%d');
+
| to_date(Utf8("2023/01/31"),Utf8("%Y-%m-%d"),Utf8("%Y/%m/%d")) |
+
| 2023-01-31 |
+
Additional examples can be found here
to_local_time
Converts a timestamp with a timezone to a timestamp without a timezone (with no offset or timezone information). This function handles daylight saving time changes.
to_local_time(expression)
Arguments
> SELECT to_local_time('2024-04-01T00:00:20Z'::timestamp);
+
| to_local_time(Utf8("2024-04-01T00:00:20Z")) |
+
| 2024-04-01T00:00:20 |
+
> SELECT to_local_time('2024-04-01T00:00:20Z'::timestamp AT TIME ZONE 'Europe/Brussels');
+
| to_local_time(Utf8("2024-04-01T00:00:20Z")) |
+
| 2024-04-01T00:00:20 |
+
> SELECT
time,
arrow_typeof(time) as type,
to_local_time(time) as to_local_time,
arrow_typeof(to_local_time(time)) as to_local_time_type
FROM (
SELECT '2024-04-01T00:00:20Z'::timestamp AT TIME ZONE 'Europe/Brussels' AS time
);
+
| time | type | to_local_time | to_local_time_type |
+
| 2024-04-01T00:00:20+02:00 | Timestamp(Nanosecond, Some("Europe/Brussels")) | 2024-04-01T00:00:20 | Timestamp(Nanosecond, None) |
+
> SELECT date_bin(interval '1 day', to_local_time('2024-04-01T00:00:20Z'::timestamp AT TIME ZONE 'Europe/Brussels')) AS date_bin;
+
| date_bin |
+
| 2024-04-01T00:00:00 |
+
> SELECT date_bin(interval '1 day', to_local_time('2024-04-01T00:00:20Z'::timestamp AT TIME ZONE 'Europe/Brussels')) AT TIME ZONE 'Europe/Brussels' AS date_bin_with_timezone;
+
| date_bin_with_timezone |
+
| 2024-04-01T00:00:00+02:00 |
+
to_timestamp
Converts a value to a timestamp (YYYY-MM-DDT00:00:00Z
). Supports strings, integer, unsigned integer, and double types as input. Strings are parsed as RFC3339 (e.g. '2023-07-20T05:44:00') if no [Chrono formats] are provided. Integers, unsigned integers, and doubles are interpreted as seconds since the unix epoch (1970-01-01T00:00:00Z
). Returns the corresponding timestamp.
Note: to_timestamp
returns Timestamp(Nanosecond)
. The supported range for integer input is between -9223372037
and 9223372036
. Supported range for string input is between 1677-09-21T00:12:44.0
and 2262-04-11T23:47:16.0
. Please use to_timestamp_seconds
for the input outside of supported bounds.
to_timestamp(expression[, ..., format_n])
Arguments
> select to_timestamp('2023-01-31T09:26:56.123456789-05:00');
+
| to_timestamp(Utf8("2023-01-31T09:26:56.123456789-05:00")) |
+
| 2023-01-31T14:26:56.123456789 |
+
> select to_timestamp('03:59:00.123456789 05-17-2023', '%c', '%+', '%H:%M:%S%.f %m-%d-%Y');
+
| to_timestamp(Utf8("03:59:00.123456789 05-17-2023"),Utf8("%c"),Utf8("%+"),Utf8("%H:%M:%S%.f %m-%d-%Y")) |
+
| 2023-05-17T03:59:00.123456789 |
+
Additional examples can be found here
to_timestamp_micros
Converts a value to a timestamp (YYYY-MM-DDT00:00:00.000000Z
). Supports strings, integer, and unsigned integer types as input. Strings are parsed as RFC3339 (e.g. '2023-07-20T05:44:00') if no Chrono formats are provided. Integers and unsigned integers are interpreted as microseconds since the unix epoch (1970-01-01T00:00:00Z
) Returns the corresponding timestamp.
to_timestamp_micros(expression[, ..., format_n])
Arguments
> select to_timestamp_micros('2023-01-31T09:26:56.123456789-05:00');
+
| to_timestamp_micros(Utf8("2023-01-31T09:26:56.123456789-05:00")) |
+
| 2023-01-31T14:26:56.123456 |
+
> select to_timestamp_micros('03:59:00.123456789 05-17-2023', '%c', '%+', '%H:%M:%S%.f %m-%d-%Y');
+
| to_timestamp_micros(Utf8("03:59:00.123456789 05-17-2023"),Utf8("%c"),Utf8("%+"),Utf8("%H:%M:%S%.f %m-%d-%Y")) |
+
| 2023-05-17T03:59:00.123456 |
+
Additional examples can be found here
to_timestamp_millis
Converts a value to a timestamp (YYYY-MM-DDT00:00:00.000Z
). Supports strings, integer, and unsigned integer types as input. Strings are parsed as RFC3339 (e.g. '2023-07-20T05:44:00') if no Chrono formats are provided. Integers and unsigned integers are interpreted as milliseconds since the unix epoch (1970-01-01T00:00:00Z
). Returns the corresponding timestamp.
to_timestamp_millis(expression[, ..., format_n])
Arguments
> select to_timestamp_millis('2023-01-31T09:26:56.123456789-05:00');
+
| to_timestamp_millis(Utf8("2023-01-31T09:26:56.123456789-05:00")) |
+
| 2023-01-31T14:26:56.123 |
+
> select to_timestamp_millis('03:59:00.123456789 05-17-2023', '%c', '%+', '%H:%M:%S%.f %m-%d-%Y');
+
| to_timestamp_millis(Utf8("03:59:00.123456789 05-17-2023"),Utf8("%c"),Utf8("%+"),Utf8("%H:%M:%S%.f %m-%d-%Y")) |
+
| 2023-05-17T03:59:00.123 |
+
Additional examples can be found here
to_timestamp_nanos
Converts a value to a timestamp (YYYY-MM-DDT00:00:00.000000000Z
). Supports strings, integer, and unsigned integer types as input. Strings are parsed as RFC3339 (e.g. '2023-07-20T05:44:00') if no Chrono formats are provided. Integers and unsigned integers are interpreted as nanoseconds since the unix epoch (1970-01-01T00:00:00Z
). Returns the corresponding timestamp.
to_timestamp_nanos(expression[, ..., format_n])
Arguments
> select to_timestamp_nanos('2023-01-31T09:26:56.123456789-05:00');
+
| to_timestamp_nanos(Utf8("2023-01-31T09:26:56.123456789-05:00")) |
+
| 2023-01-31T14:26:56.123456789 |
+
> select to_timestamp_nanos('03:59:00.123456789 05-17-2023', '%c', '%+', '%H:%M:%S%.f %m-%d-%Y');
+
| to_timestamp_nanos(Utf8("03:59:00.123456789 05-17-2023"),Utf8("%c"),Utf8("%+"),Utf8("%H:%M:%S%.f %m-%d-%Y")) |
+
| 2023-05-17T03:59:00.123456789 |
+
Additional examples can be found here
to_timestamp_seconds
Converts a value to a timestamp (YYYY-MM-DDT00:00:00.000Z
). Supports strings, integer, and unsigned integer types as input. Strings are parsed as RFC3339 (e.g. '2023-07-20T05:44:00') if no Chrono formats are provided. Integers and unsigned integers are interpreted as seconds since the unix epoch (1970-01-01T00:00:00Z
). Returns the corresponding timestamp.
to_timestamp_seconds(expression[, ..., format_n])
Arguments
> select to_timestamp_seconds('2023-01-31T09:26:56.123456789-05:00');
+
| to_timestamp_seconds(Utf8("2023-01-31T09:26:56.123456789-05:00")) |
+
| 2023-01-31T14:26:56 |
+
> select to_timestamp_seconds('03:59:00.123456789 05-17-2023', '%c', '%+', '%H:%M:%S%.f %m-%d-%Y');
+
| to_timestamp_seconds(Utf8("03:59:00.123456789 05-17-2023"),Utf8("%c"),Utf8("%+"),Utf8("%H:%M:%S%.f %m-%d-%Y")) |
+
| 2023-05-17T03:59:00 |
+
Additional examples can be found here
to_unixtime
Converts a value to seconds since the unix epoch (1970-01-01T00:00:00Z
). Supports strings, dates, timestamps and double types as input. Strings are parsed as RFC3339 (e.g. '2023-07-20T05:44:00') if no Chrono formats are provided.
to_unixtime(expression[, ..., format_n])
Arguments
> select to_unixtime('2020-09-08T12:00:00+00:00');
+
| to_unixtime(Utf8("2020-09-08T12:00:00+00:00")) |
+
| 1599566400 |
+
> select to_unixtime('01-14-2023 01:01:30+05:30', '%q', '%d-%m-%Y %H/%M/%S', '%+', '%m-%d-%Y %H:%M:%S%#z');
+
| to_unixtime(Utf8("01-14-2023 01:01:30+05:30"),Utf8("%q"),Utf8("%d-%m-%Y %H/%M/%S"),Utf8("%+"),Utf8("%m-%d-%Y %H:%M:%S%#z")) |
+
| 1673638290 |
+
today
Alias of current_date.
Array Functionsarray_any_value
Returns the first non-null element in the array.
Arguments> select array_any_value([NULL, 1, 2, 3]);
+
| array_any_value(List([NULL,1,2,3])) |
+
| 1 |
+
Aliases
array_append
Appends an element to the end of an array.
array_append(array, element)
Arguments
> select array_append([1, 2, 3], 4);
+
| array_append(List([1,2,3]),Int64(4)) |
+
| [1, 2, 3, 4] |
+
Aliases
array_cat
Alias of array_concat.
array_concat
Concatenates arrays.
array_concat(array[, ..., array_n])
Arguments
> select array_concat([1, 2], [3, 4], [5, 6]);
+
| array_concat(List([1,2]),List([3,4]),List([5,6])) |
+
| [1, 2, 3, 4, 5, 6] |
+
Aliases
array_contains
Alias of array_has.
array_dims
Returns an array of the array's dimensions.
Arguments> select array_dims([[1, 2, 3], [4, 5, 6]]);
+
| array_dims(List([1,2,3,4,5,6])) |
+
| [2, 3] |
+
Aliases
array_distance
Returns the Euclidean distance between two input arrays of equal length.
array_distance(array1, array2)
Arguments
> select array_distance([1, 2], [1, 4]);
+
| array_distance(List([1,2], [1,4])) |
+
| 2.0 |
+
Aliases
array_distinct
Returns distinct values from the array after removing duplicates.
Arguments> select array_distinct([1, 3, 2, 3, 1, 2, 4]);
+
| array_distinct(List([1,2,3,4])) |
+
| [1, 2, 3, 4] |
+
Aliases
array_element
Extracts the element with the index n from the array.
array_element(array, index)
Arguments
> select array_element([1, 2, 3, 4], 3);
+
| array_element(List([1,2,3,4]),Int64(3)) |
+
| 3 |
+
Aliases
array_empty
Alias of empty.
array_except
Returns an array of the elements that appear in the first array but not in the second.
array_except(array1, array2)
Arguments
> select array_except([1, 2, 3, 4], [5, 6, 3, 4]);
+
| array_except([1, 2, 3, 4], [5, 6, 3, 4]); |
+
| [1, 2] |
+
> select array_except([1, 2, 3, 4], [3, 4, 5, 6]);
+
| array_except([1, 2, 3, 4], [3, 4, 5, 6]); |
+
| [1, 2] |
+
Aliases
Alias of array_element.
array_has
Returns true if the array contains the element.
array_has(array, element)
Arguments
> select array_has([1, 2, 3], 2);
+
| array_has(List([1,2,3]), 2) |
+
| true |
+
Aliases
array_has_all
Returns true if all elements of sub-array exist in array.
array_has_all(array, sub-array)
Arguments
> select array_has_all([1, 2, 3, 4], [2, 3]);
+
| array_has_all(List([1,2,3,4]), List([2,3])) |
+
| true |
+
Aliases
array_has_any
Returns true if any elements exist in both arrays.
array_has_any(array, sub-array)
Arguments
> select array_has_any([1, 2, 3], [3, 4]);
+
| array_has_any(List([1,2,3]), List([3,4])) |
+
| true |
+
Aliases
array_indexof
Alias of array_position.
array_intersect
Returns an array of elements in the intersection of array1 and array2.
array_intersect(array1, array2)
Arguments
> select array_intersect([1, 2, 3, 4], [5, 6, 3, 4]);
+
| array_intersect([1, 2, 3, 4], [5, 6, 3, 4]); |
+
| [3, 4] |
+
> select array_intersect([1, 2, 3, 4], [5, 6, 7, 8]);
+
| array_intersect([1, 2, 3, 4], [5, 6, 7, 8]); |
+
| [] |
+
Aliases
array_join
Alias of array_to_string.
array_length
Returns the length of the array dimension.
array_length(array, dimension)
Arguments
> select array_length([1, 2, 3, 4, 5], 1);
+
| array_length(List([1,2,3,4,5]), 1) |
+
| 5 |
+
Aliases
array_max
Returns the maximum value in the array.
Arguments> select array_max([3,1,4,2]);
+
| array_max(List([3,1,4,2])) |
+
| 4 |
+
Aliases
array_ndims
Returns the number of dimensions of the array.
array_ndims(array, element)
Arguments
> select array_ndims([[1, 2, 3], [4, 5, 6]]);
+
| array_ndims(List([1,2,3,4,5,6])) |
+
| 2 |
+
Aliases
array_pop_back
Returns the array without the last element.
Arguments> select array_pop_back([1, 2, 3]);
+
| array_pop_back(List([1,2,3])) |
+
| [1, 2] |
+
Aliases
array_pop_front
Returns the array without the first element.
Arguments> select array_pop_front([1, 2, 3]);
+
| array_pop_front(List([1,2,3])) |
+
| [2, 3] |
+
Aliases
array_position
Returns the position of the first occurrence of the specified element in the array.
array_position(array, element)
array_position(array, element, index)
Arguments
> select array_position([1, 2, 2, 3, 1, 4], 2);
+
| array_position(List([1,2,2,3,1,4]),Int64(2)) |
+
| 2 |
+
> select array_position([1, 2, 2, 3, 1, 4], 2, 3);
+
| array_position(List([1,2,2,3,1,4]),Int64(2), Int64(3)) |
+
| 3 |
+
Aliases
array_positions
Searches for an element in the array, returns all occurrences.
array_positions(array, element)
Arguments
> select array_positions([1, 2, 2, 3, 1, 4], 2);
+
| array_positions(List([1,2,2,3,1,4]),Int64(2)) |
+
| [2, 3] |
+
Aliases
array_prepend
Prepends an element to the beginning of an array.
array_prepend(element, array)
Arguments
> select array_prepend(1, [2, 3, 4]);
+
| array_prepend(Int64(1),List([2,3,4])) |
+
| [1, 2, 3, 4] |
+
Aliases
array_push_back
Alias of array_append.
array_push_front
Alias of array_prepend.
array_remove
Removes the first element from the array equal to the given value.
array_remove(array, element)
Arguments
> select array_remove([1, 2, 2, 3, 2, 1, 4], 2);
+
| array_remove(List([1,2,2,3,2,1,4]),Int64(2)) |
+
| [1, 2, 3, 2, 1, 4] |
+
Aliases
array_remove_all
Removes all elements from the array equal to the given value.
array_remove_all(array, element)
Arguments
> select array_remove_all([1, 2, 2, 3, 2, 1, 4], 2);
+
| array_remove_all(List([1,2,2,3,2,1,4]),Int64(2)) |
+
| [1, 3, 1, 4] |
+
Aliases
array_remove_n
Removes the first max
elements from the array equal to the given value.
array_remove_n(array, element, max))
Arguments
> select array_remove_n([1, 2, 2, 3, 2, 1, 4], 2, 2);
+
| array_remove_n(List([1,2,2,3,2,1,4]),Int64(2),Int64(2)) |
+
| [1, 3, 2, 1, 4] |
+
Aliases
array_repeat
Returns an array containing element count
times.
array_repeat(element, count)
Arguments
> select array_repeat(1, 3);
+
| array_repeat(Int64(1),Int64(3)) |
+
| [1, 1, 1] |
+
> select array_repeat([1, 2], 2);
+
| array_repeat(List([1,2]),Int64(2)) |
+
| [[1, 2], [1, 2]] |
+
Aliases
array_replace
Replaces the first occurrence of the specified element with another specified element.
array_replace(array, from, to)
Arguments
> select array_replace([1, 2, 2, 3, 2, 1, 4], 2, 5);
+
| array_replace(List([1,2,2,3,2,1,4]),Int64(2),Int64(5)) |
+
| [1, 5, 2, 3, 2, 1, 4] |
+
Aliases
array_replace_all
Replaces all occurrences of the specified element with another specified element.
array_replace_all(array, from, to)
Arguments
> select array_replace_all([1, 2, 2, 3, 2, 1, 4], 2, 5);
+
| array_replace_all(List([1,2,2,3,2,1,4]),Int64(2),Int64(5)) |
+
| [1, 5, 5, 3, 5, 1, 4] |
+
Aliases
array_replace_n
Replaces the first max
occurrences of the specified element with another specified element.
array_replace_n(array, from, to, max)
Arguments
> select array_replace_n([1, 2, 2, 3, 2, 1, 4], 2, 5, 2);
+
| array_replace_n(List([1,2,2,3,2,1,4]),Int64(2),Int64(5),Int64(2)) |
+
| [1, 5, 5, 3, 2, 1, 4] |
+
Aliases
array_resize
Resizes the list to contain size elements. Initializes new elements with value or empty if value is not set.
array_resize(array, size, value)
Arguments
> select array_resize([1, 2, 3], 5, 0);
+
| array_resize(List([1,2,3],5,0)) |
+
| [1, 2, 3, 0, 0] |
+
Aliases
array_reverse
Returns the array with the order of the elements reversed.
Arguments> select array_reverse([1, 2, 3, 4]);
+
| array_reverse(List([1, 2, 3, 4])) |
+
| [4, 3, 2, 1] |
+
Aliases
array_slice
Returns a slice of the array based on 1-indexed start and end positions.
array_slice(array, begin, end)
Arguments
> select array_slice([1, 2, 3, 4, 5, 6, 7, 8], 3, 6);
+
| array_slice(List([1,2,3,4,5,6,7,8]),Int64(3),Int64(6)) |
+
| [3, 4, 5, 6] |
+
Aliases
array_sort
Sort array.
array_sort(array, desc, nulls_first)
Arguments
ASC
or DESC
).NULLS FIRST
or NULLS LAST
).> select array_sort([3, 1, 2]);
+
| array_sort(List([3,1,2])) |
+
| [1, 2, 3] |
+
Aliases
array_to_string
Converts each element to its text representation.
array_to_string(array, delimiter[, null_string])
Arguments
> select array_to_string([[1, 2, 3, 4], [5, 6, 7, 8]], ',');
+
| array_to_string(List([1,2,3,4,5,6,7,8]),Utf8(",")) |
+
| 1,2,3,4,5,6,7,8 |
+
Aliases
array_union
Returns an array of elements that are present in both arrays (all elements from both arrays) with out duplicates.
array_union(array1, array2)
Arguments
> select array_union([1, 2, 3, 4], [5, 6, 3, 4]);
+
| array_union([1, 2, 3, 4], [5, 6, 3, 4]); |
+
| [1, 2, 3, 4, 5, 6] |
+
> select array_union([1, 2, 3, 4], [5, 6, 7, 8]);
+
| array_union([1, 2, 3, 4], [5, 6, 7, 8]); |
+
| [1, 2, 3, 4, 5, 6, 7, 8] |
+
Aliases
arrays_overlap
Alias of array_has_any.
cardinality
Returns the total number of elements in the array.
Arguments> select cardinality([[1, 2, 3, 4], [5, 6, 7, 8]]);
+
| cardinality(List([1,2,3,4,5,6,7,8])) |
+
| 8 |
+
empty
Returns 1 for an empty array or 0 for a non-empty array.
Arguments> select empty([1]);
+
| empty(List([1])) |
+
| 0 |
+
Aliases
flatten
Converts an array of arrays to a flat array.
The flattened array contains all the elements from all source arrays.
Arguments> select flatten([[1, 2], [3, 4]]);
+
| flatten(List([1,2], [3,4])) |
+
| [1, 2, 3, 4] |
+
generate_series
Similar to the range function, but it includes the upper bound.
generate_series(start, stop, step)
Arguments
> select generate_series(1,3);
+
| generate_series(Int64(1),Int64(3)) |
+
| [1, 2, 3] |
+
list_any_value
Alias of array_any_value.
list_append
Alias of array_append.
list_cat
Alias of array_concat.
list_concat
Alias of array_concat.
list_contains
Alias of array_has.
list_dims
Alias of array_dims.
list_distance
Alias of array_distance.
list_distinct
Alias of array_distinct.
list_element
Alias of array_element.
list_empty
Alias of empty.
list_except
Alias of array_except.
Alias of array_element.
list_has
Alias of array_has.
list_has_all
Alias of array_has_all.
list_has_any
Alias of array_has_any.
list_indexof
Alias of array_position.
list_intersect
Alias of array_intersect.
list_join
Alias of array_to_string.
list_length
Alias of array_length.
list_max
Alias of array_max.
list_ndims
Alias of array_ndims.
list_pop_back
Alias of array_pop_back.
list_pop_front
Alias of array_pop_front.
list_position
Alias of array_position.
list_positions
Alias of array_positions.
list_prepend
Alias of array_prepend.
list_push_back
Alias of array_append.
list_push_front
Alias of array_prepend.
list_remove
Alias of array_remove.
list_remove_all
Alias of array_remove_all.
list_remove_n
Alias of array_remove_n.
list_repeat
Alias of array_repeat.
list_replace
Alias of array_replace.
list_replace_all
Alias of array_replace_all.
list_replace_n
Alias of array_replace_n.
list_resize
Alias of array_resize.
list_reverse
Alias of array_reverse.
list_slice
Alias of array_slice.
list_sort
Alias of array_sort.
list_to_string
Alias of array_to_string.
list_union
Alias of array_union.
make_array
Returns an array using the specified input expressions.
make_array(expression1[, ..., expression_n])
Arguments
> select make_array(1, 2, 3, 4, 5);
+
| make_array(Int64(1),Int64(2),Int64(3),Int64(4),Int64(5)) |
+
| [1, 2, 3, 4, 5] |
+
Aliases
make_list
Alias of make_array.
range
Returns an Arrow array between start and stop with step. The range start..end contains all values with start <= x < end
. It is empty if start >= end
. Step cannot be 0.
> select range(2, 10, 3);
+
| range(Int64(2),Int64(10),Int64(3))|
+
| [2, 5, 8] |
+
> select range(DATE '1992-09-01', DATE '1993-03-01', INTERVAL '1' MONTH);
+
| range(DATE '1992-09-01', DATE '1993-03-01', INTERVAL '1' MONTH) |
+
| [1992-09-01, 1992-10-01, 1992-11-01, 1992-12-01, 1993-01-01, 1993-02-01] |
+
string_to_array
Splits a string into an array of substrings based on a delimiter. Any substrings matching the optional null_str
argument are replaced with NULL.
string_to_array(str, delimiter[, null_str])
Arguments
NULL
.> select string_to_array('abc##def', '##');
+
| string_to_array(Utf8('abc##def')) |
+
| ['abc', 'def'] |
+
> select string_to_array('abc def', ' ', 'def');
+
| string_to_array(Utf8('abc def'), Utf8(' '), Utf8('def')) |
+
| ['abc', NULL] |
+
Aliases
string_to_list
Alias of string_to_array.
Struct Functionsnamed_struct
Returns an Arrow struct using the specified name and input expressions pairs.
named_struct(expression1_name, expression1_input[, ..., expression_n_name, expression_n_input])
Arguments
For example, this query converts two columns a
and b
to a single column with a struct type of fields field_a
and field_b
:
> select * from t;
+
| a | b |
+
| 1 | 2 |
| 3 | 4 |
+
> select named_struct('field_a', a, 'field_b', b) from t;
+
| named_struct(Utf8("field_a"),t.a,Utf8("field_b"),t.b) |
+
| {field_a: 1, field_b: 2} |
| {field_a: 3, field_b: 4} |
+
row
Alias of struct.
struct
Returns an Arrow struct using the specified input expressions optionally named. Fields in the returned struct use the optional name or the cN
naming convention. For example: c0
, c1
, c2
, etc.
struct(expression1[, ..., expression_n])
Arguments
For example, this query converts two columns a
and b
to a single column with a struct type of fields field_a
and c1
:
> select * from t;
+
| a | b |
+
| 1 | 2 |
| 3 | 4 |
+
> select struct(a, b) from t;
+
| struct(t.a,t.b) |
+
| {c0: 1, c1: 2} |
| {c0: 3, c1: 4} |
+
select struct(a as field_a, b) from t;
+
| named_struct(Utf8("field_a"),t.a,Utf8("c1"),t.b) |
+
| {field_a: 1, c1: 2} |
| {field_a: 3, c1: 4} |
+
Aliases
element_at
Alias of map_extract.
map
Returns an Arrow map with the specified key-value pairs.
The make_map
function creates a map from two lists: one for keys and one for values. Each key must be unique and non-null.
map(key, value)
map(key: value)
make_map(['key1', 'key2'], ['value1', 'value2'])
Arguments
map
: Expression to be used for key. Can be a constant, column, function, or any combination of arithmetic or string operators. For make_map
: The list of keys to be used in the map. Each key must be unique and non-null.map
: Expression to be used for value. Can be a constant, column, function, or any combination of arithmetic or string operators. For make_map
: The list of values to be mapped to the corresponding keys.
SELECT MAP('type', 'test');
{type: test}
SELECT MAP(['POST', 'HEAD', 'PATCH'], [41, 33, null]);
{POST: 41, HEAD: 33, PATCH: NULL}
SELECT MAP([[1,2], [3,4]], ['a', 'b']);
{[1, 2]: a, [3, 4]: b}
SELECT MAP { 'a': 1, 'b': 2 };
{a: 1, b: 2}
SELECT MAKE_MAP(['POST', 'HEAD'], [41, 33]);
{POST: 41, HEAD: 33}
SELECT MAKE_MAP(['key1', 'key2'], ['value1', null]);
{key1: value1, key2: }
Returns a list containing the value for the given key or an empty list if the key is not present in the map.
ArgumentsSELECT map_extract(MAP {'a': 1, 'b': NULL, 'c': 3}, 'a');
[1]
SELECT map_extract(MAP {1: 'one', 2: 'two'}, 2);
['two']
SELECT map_extract(MAP {'x': 10, 'y': NULL, 'z': 30}, 'y');
[]
Aliases
map_keys
Returns a list of all keys in the map.
ArgumentsSELECT map_keys(MAP {'a': 1, 'b': NULL, 'c': 3});
[a, b, c]
SELECT map_keys(map([100, 5], [42, 43]));
[100, 5]
map_values
Returns a list of all values in the map.
ArgumentsSELECT map_values(MAP {'a': 1, 'b': NULL, 'c': 3});
[1, , 3]
SELECT map_values(map([100, 5], [42, 43]));
[42, 43]
Hashing Functions
digest
Computes the binary hash of an expression using the specified algorithm.
digest(expression, algorithm)
Arguments
> select digest('foo', 'sha256');
+
| digest(Utf8("foo"), Utf8("sha256")) |
+
| <binary_hash_result> |
+
md5
Computes an MD5 128-bit checksum for a string expression.
Arguments> select md5('foo');
+
| md5(Utf8("foo")) |
+
| <md5_checksum_result> |
+
sha224
Computes the SHA-224 hash of a binary string.
Arguments> select sha224('foo');
+
| sha224(Utf8("foo")) |
+
| <sha224_hash_result> |
+
sha256
Computes the SHA-256 hash of a binary string.
Arguments> select sha256('foo');
+
| sha256(Utf8("foo")) |
+
| <sha256_hash_result> |
+
sha384
Computes the SHA-384 hash of a binary string.
Arguments> select sha384('foo');
+
| sha384(Utf8("foo")) |
+
| <sha384_hash_result> |
+
sha512
Computes the SHA-512 hash of a binary string.
Arguments> select sha512('foo');
+
| sha512(Utf8("foo")) |
+
| <sha512_hash_result> |
+
Union Functions
Functions to work with the union data type, also know as tagged unions, variant types, enums or sum types. Note: Not related to the SQL UNION operator
Returns the value of the given field in the union when selected, or NULL otherwise.
union_extract(union, field_name)
Arguments
❯ select union_column, union_extract(union_column, 'a'), union_extract(union_column, 'b') from table_with_union;
+
| union_column | union_extract(union_column, 'a') | union_extract(union_column, 'b') |
+
| {a=1} | 1 | |
| {b=3.0} | | 3.0 |
| {a=4} | 4 | |
| {b=} | | |
| {a=} | | |
+
union_tag
Returns the name of the currently selected field in the union
union_tag(union_expression)
Arguments
❯ select union_column, union_tag(union_column) from table_with_union;
+
| union_column | union_tag(union_column) |
+
| {a=1} | a |
| {b=3.0} | b |
| {a=4} | a |
| {b=} | b |
| {a=} | a |
+
Other Functions
arrow_cast
Casts a value to a specific Arrow data type.
arrow_cast(expression, datatype)
Arguments
arrow_typeof
]> select arrow_cast(-5, 'Int8') as a,
arrow_cast('foo', 'Dictionary(Int32, Utf8)') as b,
arrow_cast('bar', 'LargeUtf8') as c,
arrow_cast('2023-01-02T12:53:02', 'Timestamp(Microsecond, Some("+08:00"))') as d
;
+
| a | b | c | d |
+
| -5 | foo | bar | 2023-01-02T12:53:02+08:00 |
+
arrow_typeof
Returns the name of the underlying Arrow data type of the expression.
Arguments> select arrow_typeof('foo'), arrow_typeof(1);
+
| arrow_typeof(Utf8("foo")) | arrow_typeof(Int64(1)) |
+
| Utf8 | Int64 |
+
get_field
Returns a field within a map or a struct with the given key. Note: most users invoke get_field
indirectly via field access syntax such as my_struct_col['field_name']
which results in a call to get_field(my_struct_col, 'field_name')
.
get_field(expression1, expression2)
Arguments
> create table t (idx varchar, v varchar) as values ('data','fusion'), ('apache', 'arrow');
> select struct(idx, v) from t as c;
+
| struct(c.idx,c.v) |
+
| {c0: data, c1: fusion} |
| {c0: apache, c1: arrow} |
+
> select get_field((select struct(idx, v) from t), 'c0');
+
| struct(t.idx,t.v)[c0] |
+
| data |
| apache |
+
> select get_field((select struct(idx, v) from t), 'c1');
+
| struct(t.idx,t.v)[c1] |
+
| fusion |
| arrow |
+
version
Returns the version of DataFusion.
Example> select version();
+
| version() |
+
| Apache DataFusion 42.0.0, aarch64 on macos |
+
404: Not Found
Aggregate FunctionsAggregate functions operate on a set of values to compute a single result.
General Functionsarray_agg
Returns an array created from the expression elements. If ordering is required, elements are inserted in the specified order. This aggregation function can only mix DISTINCT and ORDER BY if the ordering expression is exactly the same as the argument expression.
array_agg(expression [ORDER BY expression])
Arguments
> SELECT array_agg(column_name ORDER BY other_column) FROM table_name;
+
| array_agg(column_name ORDER BY other_column) |
+
| [element1, element2, element3] |
+
> SELECT array_agg(DISTINCT column_name ORDER BY column_name) FROM table_name;
+
| array_agg(DISTINCT column_name ORDER BY column_name) |
+
| [element1, element2, element3] |
+
avg
Returns the average of numeric values in the specified column.
Arguments> SELECT avg(column_name) FROM table_name;
+
| avg(column_name) |
+
| 42.75 |
+
Aliases
bit_and
Computes the bitwise AND of all non-null input values.
Argumentsbit_or
Computes the bitwise OR of all non-null input values.
Argumentsbit_xor
Computes the bitwise exclusive OR of all non-null input values.
Argumentsbool_and
Returns true if all non-null input values are true, otherwise false.
Arguments> SELECT bool_and(column_name) FROM table_name;
+
| bool_and(column_name) |
+
| true |
+
bool_or
Returns true if all non-null input values are true, otherwise false.
Arguments> SELECT bool_and(column_name) FROM table_name;
+
| bool_and(column_name) |
+
| true |
+
count
Returns the number of non-null values in the specified column. To include null values in the total count, use count(*)
.
> SELECT count(column_name) FROM table_name;
+
| count(column_name) |
+
| 100 |
+
> SELECT count(*) FROM table_name;
+
| count(*) |
+
| 120 |
+
first_value
Returns the first element in an aggregation group according to the requested ordering. If no ordering is given, returns an arbitrary element from the group.
first_value(expression [ORDER BY expression])
Arguments
> SELECT first_value(column_name ORDER BY other_column) FROM table_name;
+
| first_value(column_name ORDER BY other_column)|
+
| first_element |
+
grouping
Returns 1 if the data is aggregated across the specified column, or 0 if it is not aggregated in the result set.
Arguments> SELECT column_name, GROUPING(column_name) AS group_column
FROM table_name
GROUP BY GROUPING SETS ((column_name), ());
+
| column_name | group_column |
+
| value1 | 0 |
| value2 | 0 |
| NULL | 1 |
+
last_value
Returns the last element in an aggregation group according to the requested ordering. If no ordering is given, returns an arbitrary element from the group.
last_value(expression [ORDER BY expression])
Arguments
> SELECT last_value(column_name ORDER BY other_column) FROM table_name;
+
| last_value(column_name ORDER BY other_column) |
+
| last_element |
+
max
Returns the maximum value in the specified column.
Arguments> SELECT max(column_name) FROM table_name;
+
| max(column_name) |
+
| 150 |
+
mean
Alias of avg.
median
Returns the median value in the specified column.
Arguments> SELECT median(column_name) FROM table_name;
+
| median(column_name) |
+
| 45.5 |
+
min
Returns the minimum value in the specified column.
Arguments> SELECT min(column_name) FROM table_name;
+
| min(column_name) |
+
| 12 |
+
string_agg
Concatenates the values of string expressions and places separator values between them. If ordering is required, strings are concatenated in the specified order. This aggregation function can only mix DISTINCT and ORDER BY if the ordering expression is exactly the same as the first argument expression.
string_agg([DISTINCT] expression, delimiter [ORDER BY expression])
Arguments
> SELECT string_agg(name, ', ') AS names_list
FROM employee;
+
| names_list |
+
| Alice, Bob, Bob, Charlie |
+
> SELECT string_agg(name, ', ' ORDER BY name DESC) AS names_list
FROM employee;
+
| names_list |
+
| Charlie, Bob, Bob, Alice |
+
> SELECT string_agg(DISTINCT name, ', ' ORDER BY name DESC) AS names_list
FROM employee;
+
| names_list |
+
| Charlie, Bob, Alice |
+
sum
Returns the sum of all values in the specified column.
Arguments> SELECT sum(column_name) FROM table_name;
+
| sum(column_name) |
+
| 12345 |
+
var
Returns the statistical sample variance of a set of numbers.
Argumentsvar_pop
Returns the statistical population variance of a set of numbers.
Argumentsvar_population
Alias of var_pop.
var_samp
Alias of var.
var_sample
Alias of var.
Statistical Functionscorr
Returns the coefficient of correlation between two numeric values.
corr(expression1, expression2)
Arguments
> SELECT corr(column1, column2) FROM table_name;
+
| corr(column1, column2) |
+
| 0.85 |
+
covar
Alias of covar_samp.
covar_pop
Returns the sample covariance of a set of number pairs.
covar_samp(expression1, expression2)
Arguments
> SELECT covar_samp(column1, column2) FROM table_name;
+
| covar_samp(column1, column2) |
+
| 8.25 |
+
covar_samp
Returns the sample covariance of a set of number pairs.
covar_samp(expression1, expression2)
Arguments
> SELECT covar_samp(column1, column2) FROM table_name;
+
| covar_samp(column1, column2) |
+
| 8.25 |
+
Aliases
nth_value
Returns the nth value in a group of values.
nth_value(expression, n ORDER BY expression)
Arguments
> SELECT dept_id, salary, NTH_VALUE(salary, 2) OVER (PARTITION BY dept_id ORDER BY salary ASC) AS second_salary_by_dept
FROM employee;
+
| dept_id | salary | second_salary_by_dept |
+
| 1 | 30000 | NULL |
| 1 | 40000 | 40000 |
| 1 | 50000 | 40000 |
| 2 | 35000 | NULL |
| 2 | 45000 | 45000 |
+
regr_avgx
Computes the average of the independent variable (input) expression_x for the non-null paired data points.
regr_avgx(expression_y, expression_x)
Arguments
regr_avgy
Computes the average of the dependent variable (output) expression_y for the non-null paired data points.
regr_avgy(expression_y, expression_x)
Arguments
regr_count
Counts the number of non-null paired data points.
regr_count(expression_y, expression_x)
Arguments
regr_intercept
Computes the y-intercept of the linear regression line. For the equation (y = kx + b), this function returns b.
regr_intercept(expression_y, expression_x)
Arguments
regr_r2
Computes the square of the correlation coefficient between the independent and dependent variables.
regr_r2(expression_y, expression_x)
Arguments
regr_slope
Returns the slope of the linear regression line for non-null pairs in aggregate columns. Given input column Y and X: regr_slope(Y, X) returns the slope (k in Y = k*X + b) using minimal RSS fitting.
regr_slope(expression_y, expression_x)
Arguments
regr_sxx
Computes the sum of squares of the independent variable.
regr_sxx(expression_y, expression_x)
Arguments
regr_sxy
Computes the sum of products of paired data points.
regr_sxy(expression_y, expression_x)
Arguments
regr_syy
Computes the sum of squares of the dependent variable.
regr_syy(expression_y, expression_x)
Arguments
stddev
Returns the standard deviation of a set of numbers.
Arguments> SELECT stddev(column_name) FROM table_name;
+
| stddev(column_name) |
+
| 12.34 |
+
Aliases
stddev_pop
Returns the population standard deviation of a set of numbers.
Arguments> SELECT stddev_pop(column_name) FROM table_name;
+
| stddev_pop(column_name) |
+
| 10.56 |
+
stddev_samp
Alias of stddev.
Approximate Functionsapprox_distinct
Returns the approximate number of distinct input values calculated using the HyperLogLog algorithm.
approx_distinct(expression)
Arguments
> SELECT approx_distinct(column_name) FROM table_name;
+
| approx_distinct(column_name) |
+
| 42 |
+
approx_median
Returns the approximate median (50th percentile) of input values. It is an alias of approx_percentile_cont(0.5) WITHIN GROUP (ORDER BY x)
.
approx_median(expression)
Arguments
> SELECT approx_median(column_name) FROM table_name;
+
| approx_median(column_name) |
+
| 23.5 |
+
approx_percentile_cont
Returns the approximate percentile of input values using the t-digest algorithm.
approx_percentile_cont(percentile, centroids) WITHIN GROUP (ORDER BY expression)
Arguments
> SELECT approx_percentile_cont(0.75, 100) WITHIN GROUP (ORDER BY column_name) FROM table_name;
+
| approx_percentile_cont(0.75, 100) WITHIN GROUP (ORDER BY column_name) |
+
| 65.0 |
+
approx_percentile_cont_with_weight
Returns the weighted approximate percentile of input values using the t-digest algorithm.
approx_percentile_cont_with_weight(weight, percentile) WITHIN GROUP (ORDER BY expression)
Arguments
> SELECT approx_percentile_cont_with_weight(weight_column, 0.90) WITHIN GROUP (ORDER BY column_name) FROM table_name;
+
| approx_percentile_cont_with_weight(weight_column, 0.90) WITHIN GROUP (ORDER BY column_name) |
+
| 78.5 |
+
Window Functions
A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function. However, window functions do not cause rows to become grouped into a single output row like non-window aggregate calls would. Instead, the rows retain their separate identities. Behind the scenes, the window function is able to access more than just the current row of the query result
Here is an example that shows how to compare each employee's salary with the average salary in his or her department:
SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary;
+
| depname | empno | salary | avg |
+
| personnel | 2 | 3900 | 3700.0 |
| personnel | 5 | 3500 | 3700.0 |
| develop | 8 | 6000 | 5020.0 |
| develop | 10 | 5200 | 5020.0 |
| develop | 11 | 5200 | 5020.0 |
| develop | 9 | 4500 | 5020.0 |
| develop | 7 | 4200 | 5020.0 |
| sales | 1 | 5000 | 4866.666666666667 |
| sales | 4 | 4800 | 4866.666666666667 |
| sales | 3 | 4800 | 4866.666666666667 |
+
A window function call always contains an OVER clause directly following the window function's name and argument(s). This is what syntactically distinguishes it from a normal function or non-window aggregate. The OVER clause determines exactly how the rows of the query are split up for processing by the window function. The PARTITION BY clause within OVER divides the rows into groups, or partitions, that share the same values of the PARTITION BY expression(s). For each row, the window function is computed across the rows that fall into the same partition as the current row. The previous example showed how to count the average of a column per partition.
You can also control the order in which rows are processed by window functions using ORDER BY within OVER. (The window ORDER BY does not even have to match the order in which the rows are output.) Here is an example:
SELECT depname, empno, salary,
rank() OVER (PARTITION BY depname ORDER BY salary DESC)
FROM empsalary;
+
| depname | empno | salary | rank |
+
| personnel | 2 | 3900 | 1 |
| develop | 8 | 6000 | 1 |
| develop | 10 | 5200 | 2 |
| develop | 11 | 5200 | 2 |
| develop | 9 | 4500 | 4 |
| develop | 7 | 4200 | 5 |
| sales | 1 | 5000 | 1 |
| sales | 4 | 4800 | 2 |
| personnel | 5 | 3500 | 2 |
| sales | 3 | 4800 | 2 |
+
There is another important concept associated with window functions: for each row, there is a set of rows within its partition called its window frame. Some window functions act only on the rows of the window frame, rather than of the whole partition. Here is an example of using window frames in queries:
SELECT depname, empno, salary,
avg(salary) OVER(ORDER BY salary ASC ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS avg,
min(salary) OVER(ORDER BY empno ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cum_min
FROM empsalary
ORDER BY empno ASC;
+
| depname | empno | salary | avg | cum_min |
+
| sales | 1 | 5000 | 5000.0 | 5000 |
| personnel | 2 | 3900 | 3866.6666666666665 | 3900 |
| sales | 3 | 4800 | 4700.0 | 3900 |
| sales | 4 | 4800 | 4866.666666666667 | 3900 |
| personnel | 5 | 3500 | 3700.0 | 3500 |
| develop | 7 | 4200 | 4200.0 | 3500 |
| develop | 8 | 6000 | 5600.0 | 3500 |
| develop | 9 | 4500 | 4500.0 | 3500 |
| develop | 10 | 5200 | 5133.333333333333 | 3500 |
| develop | 11 | 5200 | 5466.666666666667 | 3500 |
+
When a query involves multiple window functions, it is possible to write out each one with a separate OVER clause, but this is duplicative and error-prone if the same windowing behavior is wanted for several functions. Instead, each windowing behavior can be named in a WINDOW clause and then referenced in OVER. For example:
SELECT sum(salary) OVER w, avg(salary) OVER w
FROM empsalary
WINDOW w AS (PARTITION BY depname ORDER BY salary DESC);
Syntax
The syntax for the OVER-clause is
function([expr])
OVER(
[PARTITION BY expr[, …]]
[ORDER BY expr [ ASC | DESC ][, …]]
[ frame_clause ]
)
where frame_clause is one of:
{ RANGE | ROWS | GROUPS } frame_start
{ RANGE | ROWS | GROUPS } BETWEEN frame_start AND frame_end
and frame_start and frame_end can be one of
UNBOUNDED PRECEDING
offset PRECEDING
CURRENT ROW
offset FOLLOWING
UNBOUNDED FOLLOWING
where offset is an non-negative integer.
RANGE and GROUPS modes require an ORDER BY clause (with RANGE the ORDER BY must specify exactly one column).
Aggregate functionsAll aggregate functions can be used as window functions.
Ranking Functionscume_dist
Relative rank of the current row: (number of rows preceding or peer with the current row) / (total rows).
Example
SELECT salary,
cume_dist() OVER (ORDER BY salary) AS cume_dist
FROM employees;
+
| salary | cume_dist |
+
| 30000 | 0.33 |
| 50000 | 0.67 |
| 70000 | 1.00 |
+
dense_rank
Returns the rank of the current row without gaps. This function ranks rows in a dense manner, meaning consecutive ranks are assigned even for identical values.
ntile
Integer ranging from 1 to the argument value, dividing the partition as equally as possible
Argumentspercent_rank
Returns the percentage rank of the current row within its partition. The value ranges from 0 to 1 and is computed as (rank - 1) / (total_rows - 1)
.
rank
Returns the rank of the current row within its partition, allowing gaps between ranks. This function provides a ranking similar to row_number
, but skips ranks for identical values.
row_number
Number of the current row within its partition, counting from 1.
Analytical Functionsfirst_value
Returns value evaluated at the row that is the first row of the window frame.
Argumentslag
Returns value evaluated at the row that is offset rows before the current row within the partition; if there is no such row, instead return default (which must be of the same type as value).
lag(expression, offset, default)
Arguments
last_value
Returns value evaluated at the row that is the last row of the window frame.
Argumentslead
Returns value evaluated at the row that is offset rows after the current row within the partition; if there is no such row, instead return default (which must be of the same type as value).
lead(expression, offset, default)
Arguments
nth_value
Returns the value evaluated at the nth row of the window frame (counting from 1). Returns NULL if no such row exists.
Arguments
CREATE TABLE employees (id INT, salary INT);
INSERT INTO employees (id, salary) VALUES
(1, 30000),
(2, 40000),
(3, 50000),
(4, 60000),
(5, 70000);
SELECT nth_value(salary, 2) OVER (
ORDER BY salary
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS nth_value
FROM employees;
+-----------+
| nth_value |
+-----------+
| 40000 |
| 40000 |
| 40000 |
| 40000 |
| 40000 |
+-----------+
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4