Adding new functionality to pandas
When I have a Series of type ArrowDtype(struct(...)), I'd like to be able to extract sub-fields from it.
For example, I have a pandas Series with the dtype ArrowDtype(pyarrow.struct([("int_col", pyarrow.int64()), ("string_col", pyarrow.string())])). I'd like to extract just the int_col field from this Series as another Series.
Add a struct accessor which is accessible from a Series with ArrowDtype(struct(...)). This struct accessor provides a field() method which returns a Series containing only the specified sub-field.
    series = pandas.Series(struct_array, dtype=pandas.ArrowDtype(struct_type))
    int_series = series.struct.field("int_col")

Alternative Solutions
I can currently do this via pyarrow.compute.struct_field on the underlying pyarrow array:
    import pandas
    import pyarrow
    import pyarrow.compute  # imported explicitly so pyarrow.compute.struct_field resolves

    struct_type = pyarrow.struct([
        ("int_col", pyarrow.int64()),
        ("string_col", pyarrow.string()),
    ])
    struct_array = pyarrow.array([
        {"int_col": 1, "string_col": "a"},
        {"int_col": 2, "string_col": "b"},
        {"int_col": 3, "string_col": "c"},
    ], type=struct_type)

    series = pandas.Series(struct_array, dtype=pandas.ArrowDtype(struct_type))

    # Look up the field's position, then pull that child array out of the struct.
    int_col_index = struct_array.type.get_field_index("int_col")
    int_col_series = pandas.Series(
        pyarrow.compute.struct_field(struct_array, [int_col_index]),
        dtype=pandas.ArrowDtype(struct_array.type[int_col_index].type))

Additional Context
This issue is particularly relevant when working with data sources that support struct fields, such as BigQuery or Parquet.