Introduced or updated: v1.2.713
Databend accepts a variety of file formats both as a source and as a target for data loading or unloading. This page explains the supported file formats and their available options.
SyntaxTo specify a file format in a statement, use the following syntax:
... FILE_FORMAT = ( TYPE = { CSV | TSV | NDJSON | PARQUET | ORC | AVRO } [ formatTypeOptions ] )
... FILE_FORMAT = ( FORMAT_NAME = '<your-custom-format>' )
formatTypeOptions
includes one or more options to describe other format details about the file. The options vary depending on the file format. See the sections below to find out the available options for each supported file format.
formatTypeOptions ::=
RECORD_DELIMITER = '<character>'
FIELD_DELIMITER = '<character>'
SKIP_HEADER = <integer>
QUOTE = '<character>'
ESCAPE = '<character>'
NAN_DISPLAY = '<string>'
ROW_TAG = '<string>'
COMPRESSION = AUTO | GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE | XZ | NONE
CSV Options
Databend accepts CVS files that are compliant with RFC 4180 and is subject to the following conditions:
Separates records in an input file.
Available Values:
\r\n
#
and |
.\b
, \f
, \r
, \n
, \t
, \0
, \xHH
Default: \n
Separates fields in a record.
Available Values:
#
and |
.\b
, \f
, \r
, \n
, \t
, \0
, \xHH
Default: ,
(comma)
Quotes strings in a CSV file. For data loading, the quote is not necessary unless a string contains the character of a QUOTE, ESCAPE, RECORD_DELIMITER, or FIELD_DELIMITER.
Available Values: '
, "
, or `(backtick)
Default: "
Escapes a quote in a quoted string.
Available Values: '\\'
or ''
Default: ''
Specifies how many lines to be skipped from the beginning of the file.
Default: 0
Specifies how "NaN" (Not-a-Number) values are displayed in query results.
Available Values: Must be literal 'nan'
or 'null'
(case-insensitive)
Default: 'NaN'
Specifies how NULL values are displayed in query results.
Default: '\N'
ERROR_ON_COLUMN_COUNT_MISMATCH is a boolean option that, when set to true, specifies that an error should be raised if the number of columns in the data file doesn't match the number of columns in the destination table. Setting it to true helps ensure data integrity and consistency during the loading process.
Default: true
Specifies the value that should be used when encountering empty fields, including both ,,
and ,"",
, in the CSV data being loaded into the table.
null
(Default) Interprets empty fields as NULL values. Applicable to nullable columns only. string
Interprets empty fields as empty strings (''). Applicable to String columns only. field_default
Uses the column's default value for empty fields.
Specifies whether to include a header row in the CSV file when exporting data with the COPY INTO <location>
command. Defaults to false
.
Controls the binary encoding format during both data export and import operations, with available values HEX
(default) and BASE64
.
Specifies the compression algorithm.
Available Values DescriptionNONE
(Default) Indicates that the files are not compressed. AUTO
Auto detect compression via file extensions GZIP
BZ2
BROTLI
Must be specified if loading/unloading Brotli-compressed files. ZSTD
Zstandard v0.8 (and higher) is supported. DEFLATE
Deflate-compressed files (with zlib header, RFC1950). RAW_DEFLATE
Deflate-compressed files (without any header, RFC1951). XZ
TSV Options
Databend is subject to the following conditions when dealing with a TSV file:
\b
, \f
, \r
, \n
, \t
, \0
, \\
, \'
, RECORD_DELIMITER, FIELD_DELIMITER.\N
.note
mysqldump
command with option --tab
without --fields-enclosed-by
or --fields-optinally-enclosed-by
, if the later two is specified, use CSV instead.ESCAPE_UNENCLOSED_FIELD
. if ESCAPE_UNENCLOSED_FIELD
is specified, use CSV instead.Separates records in an input file.
Available Values:
\r\n
#
and |
.\b
, \f
, \r
, \n
, \t
, \0
, \xHH
Default: \n
Separates fields in a record.
Available Values:
#
and |
.\b
, \f
, \r
, \n
, \t
, \0
, \xHH
Default: \t
(TAB)
Same as the COMPRESSION option for CSV.
NDJSON Options NULL_FIELD_AS (Load Only)Specifies how to handle null values during data loading. Refer to the options in the table below for possible configurations.
Available Values DescriptionNULL
(Default) Interprets null values as NULL for nullable fields. An error will be generated for non-nullable fields. FIELD_DEFAULT
Uses the default value of the field for null values. MISSING_FIELD_AS (Load Only)
Determines the behavior when encountering missing fields during data loading. Refer to the options in the table below for possible configurations.
Available Values DescriptionERROR
(Default) Generates an error if a missing field is encountered. NULL
Interprets missing fields as NULL values. An error will be generated for non-nullable fields. FIELD_DEFAULT
Uses the default value of the field for missing fields. COMPRESSION
Same as the COMPRESSION option for CSV.
PARQUET Options MISSING_FIELD_AS (Load Only)Determines the behavior when encountering missing fields during data loading. Refer to the options in the table below for possible configurations.
Available Values DescriptionERROR
(Default) Generates an error if a missing field is encountered. FIELD_DEFAULT
Uses the default value of the field for missing fields. COMPRESSION (Unload Only)
Specifies the compression algorithm, which is used for compressing internal blocks of the file rather than the entire file, so the output remains in Parquet format.
Available Values DescriptionZSTD
(default) Zstandard v0.8 (and higher) is supported. SNAPPY
Snappy is a popular and fast compression algorithm often used with Parquet. ORC Options MISSING_FIELD_AS (Load Only)
Determines the behavior when encountering missing fields during data loading. Refer to the options in the table below for possible configurations.
Available Values DescriptionERROR
(Default) Generates an error if a missing field is encountered. FIELD_DEFAULT
Uses the default value of the field for missing fields. AVRO Options MISSING_FIELD_AS (Load Only)
Determines the behavior when encountering missing fields during data loading. Refer to the options in the table below for possible configurations.
Available Values DescriptionERROR
(Default) Generates an error if a missing field is encountered. FIELD_DEFAULT
Uses the default value of the field for missing fields.
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4