Bases: _Weakrefable
A named collection of types, a.k.a. a schema. A schema defines the column names and types in a record batch or table data structure. Schemas also contain metadata about the columns. For example, schemas converted from Pandas contain metadata about their original Pandas types so they can be converted back to the same types.
Warning
Do not call this class's constructor directly. Instead, use the pyarrow.schema() factory function, which makes a new Arrow Schema object.
Examples
Create a new Arrow Schema object:
>>> import pyarrow as pa
>>> pa.schema([
...     ('some_int', pa.int32()),
...     ('some_string', pa.string())
... ])
some_int: int32
some_string: string
Create Arrow Schema with metadata:
>>> pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())],
...     metadata={"n_legs": "Number of legs per animal"})
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'Number of legs per animal'
Methods
Attributes
DEPRECATED
dict
Keys and values must be string-like / coercible to bytes
Append a field at the end of the schema.
In contrast to Python's list.append(), it returns a new object, leaving the original Schema unmodified.
Field
Schema
New object with appended field.
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
Append a field 'extra' at the end of the schema:
>>> schema_new = schema.append(pa.field('extra', pa.bool_()))
>>> schema_new
n_legs: int64
animals: string
extra: bool
Original schema is unmodified:
>>> schema
n_legs: int64
animals: string
Provide an empty table according to the schema.
pyarrow.Table
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
Create an empty table with schema's fields:
>>> schema.empty_table()
pyarrow.Table
n_legs: int64
animals: string
----
n_legs: [[]]
animals: [[]]
Test if this schema is equal to the other
pyarrow.Schema
False
Key/value metadata must be equal too
Examples
>>> import pyarrow as pa
>>> schema1 = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())],
...     metadata={"n_legs": "Number of legs per animal"})
>>> schema2 = pa.schema([
...     ('some_int', pa.int32()),
...     ('some_string', pa.string())
... ])
Test two equal schemas:
>>> schema1.equals(schema1)
True
Test two unequal schemas:
>>> schema1.equals(schema2)
False
Select a field by its column name or numeric index.
int
or str
pyarrow.Field
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
Select the second field:
>>> schema.field(1)
pyarrow.Field<animals: string>
Select the field of the column named 'n_legs':
>>> schema.field('n_legs')
pyarrow.Field<n_legs: int64>
DEPRECATED
str
pyarrow.Field
Returns implied schema from dataframe
pandas.DataFrame
True
Whether to store the index as an additional column (or columns, for MultiIndex) in the resulting Table. The default of None stores the index as a column, except for a RangeIndex, which is stored as metadata only. Use preserve_index=True to force it to be stored as a column.
pyarrow.Schema
Examples
>>> import pandas as pd
>>> import pyarrow as pa
>>> df = pd.DataFrame({
...     'int': [1, 2],
...     'str': ['a', 'b']
... })
Create an Arrow Schema from the schema of a pandas dataframe:
>>> pa.Schema.from_pandas(df)
int: int64
str: string
-- schema metadata --
pandas: '{"index_columns": [{"kind": "range", "name": null, ...
Return sorted list of indices for the fields with the given name.
str
The name of the field to look up.
List
[int
]
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string()),
...     pa.field('animals', pa.bool_())])
Get the indexes of the fields named 'animals':
>>> schema.get_all_field_indices("animals")
[1, 2]
Return index of the unique field with the given name.
str
The name of the field to look up.
int
The index of the field with the given name; -1 if the name isn't found or there are several fields with the given name.
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
Get the index of the field named 'animals':
>>> schema.get_field_index("animals")
1
Index in case of several fields with the given name:
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string()),
...     pa.field('animals', pa.bool_())],
...     metadata={"n_legs": "Number of legs per animal"})
>>> schema.get_field_index("animals")
-1
Add a field at position i to the schema.
int
Field
Schema
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
Insert a new field on the second position:
>>> schema.insert(1, pa.field('extra', pa.bool_()))
n_legs: int64
extra: bool
animals: string
The schema's metadata (if any is set).
dict
or None
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())],
...     metadata={"n_legs": "Number of legs per animal"})
Get the schema's metadata:
>>> schema.metadata
{b'n_legs': b'Number of legs per animal'}
The schema's field names.
list
of str
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
Get the names of the schema's fields:
>>> schema.names
['n_legs', 'animals']
Return deserialized-from-JSON pandas metadata field (if it exists)
Examples
>>> import pyarrow as pa
>>> import pandas as pd
>>> df = pd.DataFrame({'n_legs': [2, 4, 5, 100],
...                    'animals': ["Flamingo", "Horse", "Brittle stars", "Centipede"]})
>>> schema = pa.Table.from_pandas(df).schema
Select pandas metadata field from Arrow Schema:
>>> schema.pandas_metadata
{'index_columns': [{'kind': 'range', 'name': None, 'start': 0, 'stop': 4, 'step': 1}], ...
Remove the field at index i from the schema.
int
Schema
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
Remove the second field of the schema:
>>> schema.remove(1)
n_legs: int64
Create new schema without metadata, if any
pyarrow.Schema
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())],
...     metadata={"n_legs": "Number of legs per animal"})
>>> schema
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'Number of legs per animal'
Create a new schema with the original's metadata removed:
>>> schema.remove_metadata()
n_legs: int64
animals: string
Write Schema to Buffer as encapsulated IPC message
MemoryPool
, default None
Uses default memory pool if not specified
Buffer
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
Write schema to Buffer:
>>> schema.serialize()
<pyarrow.Buffer address=0x... size=... is_cpu=True is_mutable=True>
Replace a field at position i in the schema.
int
Field
Schema
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
Replace the second field of the schema with a new field 'replaced':
>>> schema.set(1, pa.field('replaced', pa.bool_()))
n_legs: int64
replaced: bool
Return human-readable representation of Schema
True
Limit metadata key/value display to a single line of ~80 characters or less
True
Display Field-level KeyValueMetadata
True
Display Schema-level KeyValueMetadata
The formatted output.
The schema's field types.
list
of DataType
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
Get the types of the schema's fields:
>>> schema.types
[DataType(int64), DataType(string)]
Add metadata as dict of string keys and values to Schema
dict
Keys and values must be string-like / coercible to bytes
pyarrow.Schema
Examples
>>> import pyarrow as pa
>>> schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())])
Add metadata to an existing schema:
>>> schema.with_metadata({"n_legs": "Number of legs per animal"})
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'Number of legs per animal'