Create a pyarrow.RecordBatch from another Python data structure or sequence of arrays.
data : dict, list, pandas.DataFrame, Arrow-compatible table
    A mapping of strings to Arrays or Python lists, a list of Arrays, a pandas DataFrame, or any tabular object implementing the Arrow PyCapsule Protocol (has an __arrow_c_array__ or __arrow_c_device_array__ method).
names : list, default None
    Column names if a list of arrays is passed as data. Mutually exclusive with the 'schema' argument.
schema : Schema, default None
    The expected schema of the RecordBatch. If not passed, it will be inferred from the data. Mutually exclusive with the 'names' argument.
metadata : dict or Mapping, default None
    Optional metadata for the schema (if schema not passed).
Returns a RecordBatch.
Examples
>>> import pyarrow as pa
>>> n_legs = pa.array([2, 2, 4, 4, 5, 100])
>>> animals = pa.array(["Flamingo", "Parrot", "Dog", "Horse", "Brittle stars", "Centipede"])
>>> names = ["n_legs", "animals"]
Creating a RecordBatch from a Python dictionary:
>>> pa.record_batch({"n_legs": n_legs, "animals": animals})
pyarrow.RecordBatch
n_legs: int64
animals: string
----
n_legs: [2,2,4,4,5,100]
animals: ["Flamingo","Parrot","Dog","Horse","Brittle stars","Centipede"]
>>> pa.record_batch({"n_legs": n_legs, "animals": animals}).to_pandas()
   n_legs        animals
0       2       Flamingo
1       2         Parrot
2       4            Dog
3       4          Horse
4       5  Brittle stars
5     100      Centipede
Creating a RecordBatch from a list of arrays with names:
>>> pa.record_batch([n_legs, animals], names=names)
pyarrow.RecordBatch
n_legs: int64
animals: string
----
n_legs: [2,2,4,4,5,100]
animals: ["Flamingo","Parrot","Dog","Horse","Brittle stars","Centipede"]
Creating a RecordBatch from a list of arrays with names and metadata:
>>> my_metadata = {"n_legs": "How many legs does an animal have?"}
>>> pa.record_batch([n_legs, animals],
...                 names=names,
...                 metadata=my_metadata)
pyarrow.RecordBatch
n_legs: int64
animals: string
----
n_legs: [2,2,4,4,5,100]
animals: ["Flamingo","Parrot","Dog","Horse","Brittle stars","Centipede"]
>>> pa.record_batch([n_legs, animals],
...                 names=names,
...                 metadata=my_metadata).schema
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'How many legs does an animal have?'
Creating a RecordBatch from a pandas DataFrame:
>>> import pandas as pd
>>> df = pd.DataFrame({'year': [2020, 2022, 2021, 2022],
...                    'month': [3, 5, 7, 9],
...                    'day': [1, 5, 9, 13],
...                    'n_legs': [2, 4, 5, 100],
...                    'animals': ["Flamingo", "Horse", "Brittle stars", "Centipede"]})
>>> pa.record_batch(df)
pyarrow.RecordBatch
year: int64
month: int64
day: int64
n_legs: int64
animals: string
----
year: [2020,2022,2021,2022]
month: [3,5,7,9]
day: [1,5,9,13]
n_legs: [2,4,5,100]
animals: ["Flamingo","Horse","Brittle stars","Centipede"]
>>> pa.record_batch(df).to_pandas()
   year  month  day  n_legs        animals
0  2020      3    1       2       Flamingo
1  2022      5    5       4          Horse
2  2021      7    9       5  Brittle stars
3  2022      9   13     100      Centipede
Creating a RecordBatch from a pandas DataFrame with schema:
>>> my_schema = pa.schema([
...     pa.field('n_legs', pa.int64()),
...     pa.field('animals', pa.string())],
...     metadata={"n_legs": "Number of legs per animal"})
>>> pa.record_batch(df, my_schema).schema
n_legs: int64
animals: string
-- schema metadata --
n_legs: 'Number of legs per animal'
pandas: ...
>>> pa.record_batch(df, my_schema).to_pandas()
   n_legs        animals
0       2       Flamingo
1       4          Horse
2       5  Brittle stars
3     100      Centipede