Bases: object
Sorting specification for a single column.
Returned by RowGroupMetaData.sorting_columns() and used in ParquetWriter to specify the sort order of the data.
Parameters
column_index : int
Index of column that data is sorted by.
descending : bool, default False
Whether column is sorted in descending order.
nulls_first : bool, default False
Whether null values appear before valid values.
Notes
Column indices are zero-based, refer only to leaf fields, and are in depth-first order. For nested schemas, a column's index may therefore differ from the position of its field in the top-level schema. In most cases, it is easier to specify the sort order by column name and convert it to column indices with the from_ordering method.
Examples
In other APIs, sort order is specified by names, such as:
>>> sort_order = [('id', 'ascending'), ('timestamp', 'descending')]
For Parquet, the column index must be used instead:
>>> import pyarrow.parquet as pq
>>> [pq.SortingColumn(0), pq.SortingColumn(1, descending=True)]
[SortingColumn(column_index=0, descending=False, nulls_first=False), SortingColumn(column_index=1, descending=True, nulls_first=False)]
Convert the sort_order into the list of sorting columns with from_ordering (note that the schema must be provided as well):
>>> import pyarrow as pa
>>> schema = pa.schema([('id', pa.int64()), ('timestamp', pa.timestamp('ms'))])
>>> sorting_columns = pq.SortingColumn.from_ordering(schema, sort_order)
>>> sorting_columns
(SortingColumn(column_index=0, descending=False, nulls_first=False), SortingColumn(column_index=1, descending=True, nulls_first=False))
Convert back to the sort order with to_ordering:
>>> pq.SortingColumn.to_ordering(schema, sorting_columns)
((('id', 'ascending'), ('timestamp', 'descending')), 'at_end')
Methods
Attributes
column_index
Index of column data is sorted by (int).
descending
Whether column is sorted in descending order (bool).
from_ordering(schema, sort_keys, null_placement='at_end')
Create a tuple of SortingColumn objects from the same arguments as pyarrow.compute.SortOptions.
Parameters
schema : Schema
Schema of the input data.
sort_keys : Sequence of (name, order) tuples
Names of field/column keys (str) to sort the input on, along with the order each field/column is sorted in. Accepted values for order are "ascending", "descending".
null_placement : {'at_start', 'at_end'}, default 'at_end'
Where null values should appear in the sort order.
Returns
tuple of SortingColumn
nulls_first
Whether null values appear before valid values (bool).
to_dict()
Get dictionary representation of the SortingColumn.
Returns
dict
Dictionary with a key for each attribute of this class.
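to_ordering(schema, sorting_columns)
Convert a tuple of SortingColumn objects to the same format as pyarrow.compute.SortOptions.
Parameters
schema : Schema
Schema of the input data.
sorting_columns : tuple of SortingColumn
Columns to sort the input on.
Returns
sort_keys : tuple of (name, order) tuples
null_placement : {'at_start', 'at_end'}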