Spark SQL and DataFrames support the following data types:

- `ByteType`: Represents 1-byte signed integer numbers. The range of numbers is from -128 to 127.
- `ShortType`: Represents 2-byte signed integer numbers. The range of numbers is from -32768 to 32767.
- `IntegerType`: Represents 4-byte signed integer numbers. The range of numbers is from -2147483648 to 2147483647.
- `LongType`: Represents 8-byte signed integer numbers. The range of numbers is from -9223372036854775808 to 9223372036854775807.
- `FloatType`: Represents 4-byte single-precision floating point numbers.
- `DoubleType`: Represents 8-byte double-precision floating point numbers.
- `DecimalType`: Represents arbitrary-precision signed decimal numbers. Backed internally by `java.math.BigDecimal`. A `BigDecimal` consists of an arbitrary-precision integer unscaled value and a 32-bit integer scale.
- `StringType`: Represents character string values.
- `VarcharType(length)`: A variant of `StringType` which has a length limitation. Data writing will fail if the input string exceeds the length limitation. Note: this type can only be used in a table schema, not in functions/operators.
- `CharType(length)`: A variant of `VarcharType(length)` which is fixed length. Reading a column of type `CharType(n)` always returns string values of length `n`. Char type column comparison will pad the shorter one to the longer length.
- `BinaryType`: Represents byte sequence values.
- `BooleanType`: Represents boolean values.
- `DateType`: Represents values comprising values of fields year, month and day, without a time-zone.
- `TimestampType`: Timestamp with local time zone (TIMESTAMP_LTZ). It represents values comprising values of fields year, month, day, hour, minute, and second, with the session local time-zone. The timestamp value represents an absolute point in time.
- `TimestampNTZType`: Timestamp without time zone (TIMESTAMP_NTZ). It represents values comprising values of fields year, month, day, hour, minute, and second. All operations are performed without taking any time zone into account. Note: `TIMESTAMP` in Spark SQL is an alias that resolves to `TIMESTAMP_LTZ` (the default value) or `TIMESTAMP_NTZ` via the configuration `spark.sql.timestampType`, as the sketch below shows.
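A minimal sketch of switching that alias, assuming a running `SparkSession` named `spark` (the timestamp literal is illustrative):

```scala
// By default the TIMESTAMP keyword resolves to TIMESTAMP_LTZ; after
// changing the runtime config, the same literal resolves to TIMESTAMP_NTZ.
spark.conf.set("spark.sql.timestampType", "TIMESTAMP_NTZ")
val ts = spark.sql("SELECT TIMESTAMP'2021-07-01 08:30:00' AS ts")
ts.printSchema()
// root
//  |-- ts: timestamp_ntz (nullable = false)
```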
- `YearMonthIntervalType(startField, endField)`: Represents a year-month interval which is made up of a contiguous subset of the following fields:
  - MONTH, months within years `[0..11]`,
  - YEAR, years in the range `[0..178956970]`.

  Individual interval fields are non-negative, but an interval itself can have a sign, and be negative. `startField` is the leftmost field, and `endField` is the rightmost field of the type. Valid values of `startField` and `endField` are 0 (MONTH) and 1 (YEAR). Supported year-month interval types are:
| Year-month interval type | SQL type | An instance of the type |
|---|---|---|
| `YearMonthIntervalType(YEAR, YEAR)` or `YearMonthIntervalType(YEAR)` | INTERVAL YEAR | `INTERVAL '2021' YEAR` |
| `YearMonthIntervalType(YEAR, MONTH)` | INTERVAL YEAR TO MONTH | `INTERVAL '2021-07' YEAR TO MONTH` |
| `YearMonthIntervalType(MONTH, MONTH)` or `YearMonthIntervalType(MONTH)` | INTERVAL MONTH | `INTERVAL '10' MONTH` |
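A minimal sketch of producing one of these types from a SQL literal, assuming a running `SparkSession` named `spark`:

```scala
// The literal's qualifier (YEAR TO MONTH) determines the start and end fields.
val ym = spark.sql("SELECT INTERVAL '2021-07' YEAR TO MONTH AS ym")
ym.printSchema()
// root
//  |-- ym: interval year to month (nullable = false)
```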
- `DayTimeIntervalType(startField, endField)`: Represents a day-time interval which is made up of a contiguous subset of the following fields:
  - SECOND, seconds within minutes and possibly fractions of a second `[0..59.999999]`,
  - MINUTE, minutes within hours `[0..59]`,
  - HOUR, hours within days `[0..23]`,
  - DAY, days in the range `[0..106751991]`.

  Individual interval fields are non-negative, but an interval itself can have a sign, and be negative. `startField` is the leftmost field, and `endField` is the rightmost field of the type. Valid values of `startField` and `endField` are 0 (DAY), 1 (HOUR), 2 (MINUTE), 3 (SECOND). Supported day-time interval types are:
| Day-time interval type | SQL type | An instance of the type |
|---|---|---|
| `DayTimeIntervalType(DAY, DAY)` or `DayTimeIntervalType(DAY)` | INTERVAL DAY | `INTERVAL '100' DAY` |
| `DayTimeIntervalType(DAY, HOUR)` | INTERVAL DAY TO HOUR | `INTERVAL '100 10' DAY TO HOUR` |
| `DayTimeIntervalType(DAY, MINUTE)` | INTERVAL DAY TO MINUTE | `INTERVAL '100 10:30' DAY TO MINUTE` |
| `DayTimeIntervalType(DAY, SECOND)` | INTERVAL DAY TO SECOND | `INTERVAL '100 10:30:40.999999' DAY TO SECOND` |
| `DayTimeIntervalType(HOUR, HOUR)` or `DayTimeIntervalType(HOUR)` | INTERVAL HOUR | `INTERVAL '123' HOUR` |
| `DayTimeIntervalType(HOUR, MINUTE)` | INTERVAL HOUR TO MINUTE | `INTERVAL '123:10' HOUR TO MINUTE` |
| `DayTimeIntervalType(HOUR, SECOND)` | INTERVAL HOUR TO SECOND | `INTERVAL '123:10:59' HOUR TO SECOND` |
| `DayTimeIntervalType(MINUTE, MINUTE)` or `DayTimeIntervalType(MINUTE)` | INTERVAL MINUTE | `INTERVAL '1000' MINUTE` |
| `DayTimeIntervalType(MINUTE, SECOND)` | INTERVAL MINUTE TO SECOND | `INTERVAL '1000:01.001' MINUTE TO SECOND` |
| `DayTimeIntervalType(SECOND, SECOND)` or `DayTimeIntervalType(SECOND)` | INTERVAL SECOND | `INTERVAL '1000.000001' SECOND` |
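The same pattern applies to day-time intervals; a minimal sketch, again assuming a running `SparkSession` named `spark`:

```scala
// A DAY TO SECOND literal resolves to DayTimeIntervalType(DAY, SECOND).
val dt = spark.sql("SELECT INTERVAL '100 10:30:40.999999' DAY TO SECOND AS dt")
dt.printSchema()
// root
//  |-- dt: interval day to second (nullable = false)
```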
- `ArrayType(elementType, containsNull)`: Represents values comprising a sequence of elements with the type of `elementType`. `containsNull` is used to indicate if elements in an `ArrayType` value can have `null` values.
- `MapType(keyType, valueType, valueContainsNull)`: Represents values comprising a set of key-value pairs. The data type of keys is described by `keyType` and the data type of values is described by `valueType`. For a `MapType` value, keys are not allowed to have `null` values. `valueContainsNull` is used to indicate if values of a `MapType` value can have `null` values.
- `StructType(fields)`: Represents values with the structure described by a sequence of `StructField`s (`fields`).
- `StructField(name, dataType, nullable)`: Represents a field in a `StructType`. The name of a field is indicated by `name`. The data type of a field is indicated by `dataType`. `nullable` is used to indicate if values of these fields can have `null` values. A schema built from these complex types is sketched after the language mapping tables below.

All data types of Spark SQL are located in the package `pyspark.sql.types`. You can access them by doing:
```python
from pyspark.sql.types import *
```

| Data type | Value type in Python | API to access or create a data type |
|---|---|---|
| ByteType | int | ByteType() |
| ShortType | int | ShortType() |
| IntegerType | int | IntegerType() |
| LongType | int | LongType() |
| FloatType | float | FloatType() |
| DoubleType | float | DoubleType() |
| DecimalType | decimal.Decimal | DecimalType() |
| StringType | str | StringType() |
| BinaryType | bytearray | BinaryType() |
| BooleanType | bool | BooleanType() |
| TimestampType | datetime.datetime | TimestampType() |
| TimestampNTZType | datetime.datetime | TimestampNTZType() |
| DateType | datetime.date | DateType() |
| DayTimeIntervalType | datetime.timedelta | DayTimeIntervalType() |
| ArrayType | list, tuple, or array | ArrayType(elementType, [containsNull]) |
| MapType | dict | MapType(keyType, valueType, [valueContainsNull]) |
| StructType | list or tuple | StructType(fields) |
| StructField | the value type in Python of the data type of this field | StructField(name, dataType, [nullable]) |
All data types of Spark SQL are located in the package `org.apache.spark.sql.types`. You can access them by doing:

```scala
import org.apache.spark.sql.types._
```

Find full example code at "examples/src/main/scala/org/apache/spark/examples/sql/SparkSQLExample.scala" in the Spark repo.
| Data type | Value type in Scala | API to access or create a data type |
|---|---|---|
| ByteType | Byte | ByteType |
| ShortType | Short | ShortType |
| IntegerType | Int | IntegerType |
| LongType | Long | LongType |
| FloatType | Float | FloatType |
| DoubleType | Double | DoubleType |
| DecimalType | java.math.BigDecimal | DecimalType |
| StringType | String | StringType |
| CharType(length) | String | CharType(length) |
| VarcharType(length) | String | VarcharType(length) |
| BinaryType | Array[Byte] | BinaryType |
| BooleanType | Boolean | BooleanType |
| TimestampType | java.time.Instant or java.sql.Timestamp | TimestampType |
| TimestampNTZType | java.time.LocalDateTime | TimestampNTZType |
| DateType | java.time.LocalDate or java.sql.Date | DateType |
| YearMonthIntervalType | java.time.Period | YearMonthIntervalType |
| DayTimeIntervalType | java.time.Duration | DayTimeIntervalType |
| ArrayType | scala.collection.Seq | ArrayType(elementType, [containsNull]) |
| MapType | scala.collection.Map | MapType(keyType, valueType, [valueContainsNull]) |
| StructType | org.apache.spark.sql.Row | StructType(fields) |
| StructField | The value type in Scala of the data type of this field (for example, Int for a StructField with the data type IntegerType) | StructField(name, dataType, [nullable]) |

In Java, all data types of Spark SQL are likewise located in the package `org.apache.spark.sql.types`. To access or create a data type, please use the factory methods provided in `org.apache.spark.sql.types.DataTypes`.
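As a sketch of schema construction with these APIs (the field names here are illustrative, not from the Spark docs):

```scala
import org.apache.spark.sql.types._

// Scala-style constructors; StructField's nullable parameter defaults to true.
val schemaScala = StructType(Seq(
  StructField("name", StringType, nullable = false),
  StructField("scores", ArrayType(IntegerType, containsNull = false)),
  StructField("attrs", MapType(StringType, StringType, valueContainsNull = true))
))

// The equivalent schema via the Java-friendly factory methods in DataTypes.
val schemaFactory = DataTypes.createStructType(Array(
  DataTypes.createStructField("name", DataTypes.StringType, false),
  DataTypes.createStructField("scores",
    DataTypes.createArrayType(DataTypes.IntegerType, false), true),
  DataTypes.createStructField("attrs",
    DataTypes.createMapType(DataTypes.StringType, DataTypes.StringType, true), true)
))
```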
The following table shows the type names as well as the aliases used in the Spark SQL parser for each data type.

| Data type | SQL name |
|---|---|
| BooleanType | BOOLEAN |
| ByteType | BYTE, TINYINT |
| ShortType | SHORT, SMALLINT |
| IntegerType | INT, INTEGER |
| LongType | LONG, BIGINT |
| FloatType | FLOAT, REAL |
| DoubleType | DOUBLE |
| DateType | DATE |
| TimestampType | TIMESTAMP, TIMESTAMP_LTZ |
| TimestampNTZType | TIMESTAMP_NTZ |
| StringType | STRING |
| CharType(length) | CHAR(length) |
| VarcharType(length) | VARCHAR(length) |
| BinaryType | BINARY |
| DecimalType | DECIMAL, DEC, NUMERIC |
| YearMonthIntervalType | INTERVAL YEAR, INTERVAL YEAR TO MONTH, INTERVAL MONTH |
| DayTimeIntervalType | INTERVAL DAY, INTERVAL DAY TO HOUR, INTERVAL DAY TO MINUTE, INTERVAL DAY TO SECOND, INTERVAL HOUR, INTERVAL HOUR TO MINUTE, INTERVAL HOUR TO SECOND, INTERVAL MINUTE, INTERVAL MINUTE TO SECOND, INTERVAL SECOND |
| ArrayType | ARRAY<element_type> |
| MapType | MAP<key_type, value_type> |
| StructType | STRUCT<field1_name: field1_type, field2_name: field2_type, …> |
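These SQL names can be parsed back into Catalyst types; for instance, `StructType.fromDDL` in the Scala API accepts a DDL-formatted string using the names above. A minimal sketch, with illustrative column names:

```scala
import org.apache.spark.sql.types.StructType

// Parse a DDL string into a StructType using the SQL type names.
val parsed = StructType.fromDDL("id BIGINT, name STRING, tags ARRAY<STRING>")
println(parsed.simpleString)
// struct<id:bigint,name:string,tags:array<string>>
```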
Spark SQL supports several special floating point values in a case-insensitive manner:

- Inf/+Inf/Infinity/+Infinity: positive infinity
  - `FloatType`: equivalent to Scala `Float.PositiveInfinity`.
  - `DoubleType`: equivalent to Scala `Double.PositiveInfinity`.
- -Inf/-Infinity: negative infinity
  - `FloatType`: equivalent to Scala `Float.NegativeInfinity`.
  - `DoubleType`: equivalent to Scala `Double.NegativeInfinity`.
- NaN: not a number
  - `FloatType`: equivalent to Scala `Float.NaN`.
  - `DoubleType`: equivalent to Scala `Double.NaN`.

There is special handling for positive and negative infinity. They have the following semantics:

- Positive infinity multiplied by any positive value returns positive infinity.
- Negative infinity multiplied by any positive value returns negative infinity.
- Positive infinity multiplied by any negative value returns negative infinity.
- Negative infinity multiplied by any negative value returns positive infinity.
- Positive/negative infinity multiplied by 0 returns NaN.
- Positive/negative infinity is equal to itself.
- In aggregations, all positive infinity values are grouped together. Similarly, all negative infinity values are grouped together.
- Positive infinity and negative infinity are treated as normal values in join keys.
- Positive infinity sorts lower than NaN and higher than any other values. Negative infinity sorts lower than any other values.
There is special handling for not-a-number (NaN) when dealing with `float` or `double` types that does not exactly match standard floating point semantics. Specifically:

- NaN = NaN returns true.
- In aggregations, all NaN values are grouped together.
- NaN is treated as a normal value in join keys.
- NaN values go last when in ascending order, larger than any other numeric value.

Examples:
```sql
SELECT double('infinity') AS col;
+--------+
| col|
+--------+
|Infinity|
+--------+
SELECT float('-inf') AS col;
+---------+
| col|
+---------+
|-Infinity|
+---------+
SELECT float('NaN') AS col;
+---+
|col|
+---+
|NaN|
+---+
SELECT double('infinity') * 0 AS col;
+---+
|col|
+---+
|NaN|
+---+
SELECT double('-infinity') * (-1234567) AS col;
+--------+
| col|
+--------+
|Infinity|
+--------+
SELECT double('infinity') < double('NaN') AS col;
+----+
| col|
+----+
|true|
+----+
SELECT double('NaN') = double('NaN') AS col;
+----+
| col|
+----+
|true|
+----+
SELECT double('inf') = double('infinity') AS col;
+----+
| col|
+----+
|true|
+----+
CREATE TABLE test (c1 int, c2 double);
INSERT INTO test VALUES
(1, double('infinity')),
(2, double('infinity')),
(3, double('inf')),
(4, double('-inf')),
(5, double('NaN')),
(6, double('NaN')),
(7, double('-infinity'))
;
SELECT COUNT(*), c2
FROM test
GROUP BY c2
ORDER BY c2;
+---------+---------+
| count(1)| c2|
+---------+---------+
| 2|-Infinity|
| 3| Infinity|
| 2| NaN|
+---------+---------+
```