Showing content from https://arrow.apache.org/docs/python/generated/pyarrow.dataset.partitioning.html below:

pyarrow.dataset.partitioning — Apache Arrow v21.0.0

pyarrow.dataset.partitioning
pyarrow.dataset.partitioning(schema=None, field_names=None, flavor=None, dictionaries=None)

Specify a partitioning scheme.

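The supported schemes include DirectoryPartitioning (the default), HivePartitioning (flavor="hive"), and FilenamePartitioning (flavor="filename").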

Parameters:
schema : pyarrow.Schema, default None

The schema that describes the partitions present in the file path. If not specified, and field_names and/or flavor are specified, the schema will be inferred from the file path (and a PartitioningFactory is returned).

field_names : list of str, default None

A list of strings (field names). If specified, the schema’s types are inferred from the file paths (only valid for DirectoryPartitioning).

flavor : str, default None

The default is DirectoryPartitioning. Specify flavor="hive" for a HivePartitioning, and flavor="filename" for a FilenamePartitioning.

dictionaries : dict[str, Array]

If the type of any field of schema is a dictionary type, the corresponding entry of dictionaries must be an array containing every value the corresponding column may take; otherwise an error is raised during parsing. Alternatively, pass dictionaries="infer" to have Arrow discover the dictionary values, in which case a PartitioningFactory is returned.

Returns:
Partitioning or PartitioningFactory

The partitioning scheme

Examples

Specify the Schema for paths like “/2009/June”:

>>> import pyarrow as pa
>>> import pyarrow.dataset as ds
>>> part = ds.partitioning(pa.schema([("year", pa.int16()),
...                                   ("month", pa.string())]))

or let the types be inferred by only specifying the field names:

>>> part = ds.partitioning(field_names=["year", "month"])

For paths like “/2009/June”, the year will be inferred as int32 while month will be inferred as string.

Specify a Schema with dictionary encoding, providing dictionary values:

>>> part = ds.partitioning(
...     pa.schema([
...         ("year", pa.int16()),
...         ("month", pa.dictionary(pa.int8(), pa.string()))
...     ]),
...     dictionaries={
...         "month": pa.array(["January", "February", "March"]),
...     })

Alternatively, specify a Schema with dictionary encoding, but have Arrow infer the dictionary values:

>>> part = ds.partitioning(
...     pa.schema([
...         ("year", pa.int16()),
...         ("month", pa.dictionary(pa.int8(), pa.string()))
...     ]),
...     dictionaries="infer")

Create a Hive scheme for a path like “/year=2009/month=11”:

>>> part = ds.partitioning(
...     pa.schema([("year", pa.int16()), ("month", pa.int8())]),
...     flavor="hive")

A Hive scheme can also be discovered from the directory structure (and types will be inferred):

>>> part = ds.partitioning(flavor="hive")
