Source: https://github.com/pandas-dev/pandas/issues/6124

See #6068

Use case

Facilitate DataFrame group/apply transformations when using a function that returns a Series. Right now, if we perform the following:

import pandas
df = pandas.DataFrame(
        {'a':  [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
         'b':  [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1],
         'c':  [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
         'd':  [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1],
         })

def count_values(group):
    # Summarize one group as a Series named 'metrics'
    return pandas.Series({'count': group['b'].sum(), 'mean': group['c'].mean()}, name='metrics')

result = df.groupby('a').apply(count_values)
print(result.stack().reset_index())

We get the following output:

   a level_1    0
0  0   count  2.0
1  0    mean  0.5
2  1   count  2.0
3  1    mean  0.5
4  2   count  2.0
5  2    mean  0.5

[6 rows x 3 columns]

Ideally, the Series name would be preserved and propagated through these operations, giving the following output:

   a metrics    0
0  0   count  2.0
1  0    mean  0.5
2  1   count  2.0
3  1    mean  0.5
4  2   count  2.0
5  2    mean  0.5

[6 rows x 3 columns]

Currently, the only way to achieve this is to set the column index name by hand:

result = df.groupby('a').apply(count_values)
result.columns.name = 'metrics'  # set manually after the apply
print(result.stack().reset_index())

However, this has two drawbacks: 1) it adds an extra line of code, and 2) the name of the Series created inside the applied function may not be known to the calling code, so we cannot reliably set result.columns.name there.

The other workaround is to name the index of the Series inside the applied function:

def count_values(df):
    series = pandas.Series({'count': df['b'].sum(), 'mean': df['c'].mean()})
    series.index.name = 'metrics'
    return series
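A runnable Python 3 sketch of this workaround, assuming a reasonably recent pandas. One deviation from the snippets above: it passes only columns `b` and `c` to the function via `df.groupby('a')[['b', 'c']]`, to avoid the deprecation warning newer pandas emits when apply() operates on the grouping column. The name `long_form` is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {'a': [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
     'b': [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1],
     'c': [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
     'd': [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1]})

def count_values(group):
    # Name the index (not just the Series) so groupby/apply can reuse it
    series = pd.Series({'count': group['b'].sum(), 'mean': group['c'].mean()})
    series.index.name = 'metrics'
    return series

# Selecting ['b', 'c'] keeps the grouping column out of the applied frames
result = df.groupby('a')[['b', 'c']].apply(count_values)

# The index name becomes the columns name, then a level name after stack()
long_form = result.stack().reset_index()
print(long_form)
```

Because the index name travels with the returned Series, the caller never needs to know it.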

One possible fix: during the group/apply operation, check whether series.index.name is set. If it is not, set index.name to the name of the Series, so that the name propagates to the columns of the result.
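The proposed check can be approximated at user level, without changing pandas, by wrapping the applied function. This is a minimal sketch of the idea, not pandas' internal implementation; the decorator name `propagate_series_name` is hypothetical. Newer pandas releases may already propagate a consistent Series name on their own; the wrapper only touches unnamed indexes, so it should be harmless in that case.

```python
import functools

import pandas as pd


def propagate_series_name(func):
    """Hypothetical decorator: if func returns a Series whose index is
    unnamed, copy the Series name onto the index so groupby/apply can
    carry it into the result's columns."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        out = func(*args, **kwargs)
        if isinstance(out, pd.Series) and out.index.name is None and out.name is not None:
            # rename_axis returns a new Series; the Series name is kept
            out = out.rename_axis(out.name)
        return out
    return wrapper


@propagate_series_name
def count_values(group):
    # Original function from the issue, unchanged apart from the decorator
    return pd.Series({'count': group['b'].sum(), 'mean': group['c'].mean()},
                     name='metrics')


df = pd.DataFrame(
    {'a': [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
     'b': [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1],
     'c': [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
     'd': [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1]})

# Only 'b' and 'c' are passed in, to sidestep grouping-column warnings
result = df.groupby('a')[['b', 'c']].apply(count_values)
long_form = result.stack().reset_index()
print(long_form)
```

A decorator keeps the name defined in one place (inside the function) while leaving functions that already name their index untouched.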