I am using groupby().apply() to compute new columns in a dataframe, for example like this:
    from pandas import DataFrame

    df1 = DataFrame([{"val1": 1, "val2": 20}, {"val1": 1, "val2": 19},
                     {"val1": 2, "val2": 27}, {"val1": 2, "val2": 12}])

    def func(dataf):
        # Center val2 on the mean of its group
        return dataf["val2"] - dataf["val2"].mean()

    print(type(df1.groupby("val1").apply(func)))  # this is a Series
    df1["centered"] = df1.groupby("val1").apply(func)
    print(df1)
However, if the grouped-by column ("val1") contains only a single distinct value, so that there is just one group, the groupby above returns a DataFrame instead of a Series, and assigning the result to a column fails:
    df2 = DataFrame([{"val1": 1, "val2": 20}, {"val1": 1, "val2": 19},
                     {"val1": 1, "val2": 27}, {"val1": 1, "val2": 12}])

    def func(dataf):
        return dataf["val2"] - dataf["val2"].mean()

    print(type(df2.groupby("val1").apply(func)))  # this is a DataFrame
    df2["centered"] = df2.groupby("val1").apply(func)  # this fails: cannot assign a DataFrame to a column
    print(df2)
As a result, my code is littered with checks for whether the grouped-by column holds a single distinct value:
    if len(df2["val1"].unique()) == 1:
        df2["centered"] = func(df2)
    else:
        df2["centered"] = df2.groupby("val1").apply(func)
Am I taking the wrong approach here? If not, is there a way to get a consistent return type?
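To make concrete what I mean by a consistent return type, here is a minimal sketch of the behaviour I am after. It assumes that selecting the column and calling `transform` on the groupby is an acceptable substitute for `apply` in this case; the helper name `center_within_groups` is hypothetical and only for illustration, not part of pandas.

```python
from pandas import DataFrame, Series


def center_within_groups(df, key, col):
    # Hypothetical helper: subtract each group's mean from `col`.
    # transform returns a Series aligned with df's original index,
    # whether `key` has one distinct value or several.
    return df.groupby(key)[col].transform(lambda s: s - s.mean())


df1 = DataFrame([{"val1": 1, "val2": 20}, {"val1": 1, "val2": 19},
                 {"val1": 2, "val2": 27}, {"val1": 2, "val2": 12}])
df2 = DataFrame([{"val1": 1, "val2": 20}, {"val1": 1, "val2": 19},
                 {"val1": 1, "val2": 27}, {"val1": 1, "val2": 12}])

# Same call for both frames, no uniqueness check needed
df1["centered"] = center_within_groups(df1, "val1", "val2")
df2["centered"] = center_within_groups(df2, "val1", "val2")

print(df1)
print(df2)
```

With this shape the single-group case would not need the special branch, because the result is indexed like the input frame rather than by group key.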
Thanks!