Showing content from https://arrow.apache.org/docs/python/generated/pyarrow.TableGroupBy.html below:

pyarrow.TableGroupBy — Apache Arrow v21.0.0

pyarrow.TableGroupBy

class pyarrow.TableGroupBy(table, keys, use_threads=True)

Bases: object

A grouping of columns in a table on which to perform aggregations.

Parameters:

table (pyarrow.Table): Input table to execute the aggregation on.
keys (str or list[str]): Name or names of the columns to group by.
use_threads (bool, default True): Whether to use multithreading. When set to True (the default), no stable ordering of the output is guaranteed.

Examples

>>> import pyarrow as pa
>>> t = pa.table([
...       pa.array(["a", "a", "b", "b", "c"]),
...       pa.array([1, 2, 3, 4, 5]),
... ], names=["keys", "values"])

Grouping of columns:

>>> pa.TableGroupBy(t,"keys")
<pyarrow.lib.TableGroupBy object at ...>

Perform aggregations:

>>> pa.TableGroupBy(t,"keys").aggregate([("values", "sum")])
pyarrow.Table
keys: string
values_sum: int64
----
keys: [["a","b","c"]]
values_sum: [[3,7,5]]

__init__(self, table, keys, use_threads=True)

Methods

aggregate(self, aggregations)

Perform an aggregation over the grouped columns of the table.

Parameters:

aggregations (list[tuple(str, str)] or list[tuple(str, str, FunctionOptions)])

List of tuples, where each tuple is one aggregation specification consisting of the aggregation column name, the function name, and optionally the aggregation function options. Pass an empty list to get a single row for each group. The column name can be a string, an empty list, or a list of column names, for unary, nullary, and n-ary aggregation functions respectively.

For the list of function names and respective aggregation function options see Grouped Aggregations.

Returns:

Table: Results of the aggregation functions.

Examples

>>> import pyarrow as pa
>>> t = pa.table([
...       pa.array(["a", "a", "b", "b", "c"]),
...       pa.array([1, 2, 3, 4, 5]),
... ], names=["keys", "values"])

Sum the column "values" over the grouped column "keys":

>>> t.group_by("keys").aggregate([("values", "sum")])
pyarrow.Table
keys: string
values_sum: int64
----
keys: [["a","b","c"]]
values_sum: [[3,7,5]]

Count the rows over the grouped column "keys":

>>> t.group_by("keys").aggregate([([], "count_all")])
pyarrow.Table
keys: string
count_all: int64
----
keys: [["a","b","c"]]
count_all: [[2,2,1]]

Do multiple aggregations:

>>> t.group_by("keys").aggregate([
...    ("values", "sum"),
...    ("keys", "count")
... ])
pyarrow.Table
keys: string
values_sum: int64
keys_count: int64
----
keys: [["a","b","c"]]
values_sum: [[3,7,5]]
keys_count: [[2,2,1]]

Count the number of non-null values for column "values" over the grouped column "keys":

>>> import pyarrow.compute as pc
>>> t.group_by(["keys"]).aggregate([
...    ("values", "count", pc.CountOptions(mode="only_valid"))
... ])
pyarrow.Table
keys: string
values_count: int64
----
keys: [["a","b","c"]]
values_count: [[2,2,1]]

Get a single row for each group in column "keys":

>>> t.group_by("keys").aggregate([])
pyarrow.Table
keys: string
----
keys: [["a","b","c"]]
