Bases: object
A grouping of columns in a table on which to perform aggregations.
table : pyarrow.Table
Input table to execute the aggregation on.
keys : str or list[str]
Name of the grouped columns.
use_threads : bool, default True
Whether to use multithreading or not. When set to True (the default), no stable ordering of the output is guaranteed.
Examples
>>> import pyarrow as pa
>>> t = pa.table([
...     pa.array(["a", "a", "b", "b", "c"]),
...     pa.array([1, 2, 3, 4, 5]),
... ], names=["keys", "values"])
Grouping of columns:
>>> pa.TableGroupBy(t,"keys")
<pyarrow.lib.TableGroupBy object at ...>
Perform aggregations:
>>> pa.TableGroupBy(t,"keys").aggregate([("values", "sum")])
pyarrow.Table
keys: string
values_sum: int64
----
keys: [["a","b","c"]]
values_sum: [[3,7,5]]
Methods
aggregate(aggregations)
Perform an aggregation over the grouped columns of the table.
aggregations : list[tuple(str, str)] or list[tuple(str, str, FunctionOptions)]
List of tuples, where each tuple is one aggregation specification consisting of the aggregation column name, followed by the function name and, optionally, an aggregation function options object. Pass an empty list to get a single row for each group. The column name can be a string, an empty list, or a list of column names, for unary, nullary, and n-ary aggregation functions respectively.
For the list of function names and respective aggregation function options see Grouped Aggregations.
Table
Results of the aggregation functions.
Examples
>>> import pyarrow as pa
>>> t = pa.table([
...     pa.array(["a", "a", "b", "b", "c"]),
...     pa.array([1, 2, 3, 4, 5]),
... ], names=["keys", "values"])
Sum the column "values" over the grouped column "keys":
>>> t.group_by("keys").aggregate([("values", "sum")])
pyarrow.Table
keys: string
values_sum: int64
----
keys: [["a","b","c"]]
values_sum: [[3,7,5]]
Count the rows over the grouped column "keys":
>>> t.group_by("keys").aggregate([([], "count_all")])
pyarrow.Table
keys: string
count_all: int64
----
keys: [["a","b","c"]]
count_all: [[2,2,1]]
Do multiple aggregations:
>>> t.group_by("keys").aggregate([
...     ("values", "sum"),
...     ("keys", "count")
... ])
pyarrow.Table
keys: string
values_sum: int64
keys_count: int64
----
keys: [["a","b","c"]]
values_sum: [[3,7,5]]
keys_count: [[2,2,1]]
Count the number of non-null values for column "values" over the grouped column "keys":
>>> import pyarrow.compute as pc
>>> t.group_by(["keys"]).aggregate([
...     ("values", "count", pc.CountOptions(mode="only_valid"))
... ])
pyarrow.Table
keys: string
values_count: int64
----
keys: [["a","b","c"]]
values_count: [[2,2,1]]
Get a single row for each group in column "keys":
>>> t.group_by("keys").aggregate([])
pyarrow.Table
keys: string
----
keys: [["a","b","c"]]