Given the following dataframe:
In [31]: rand = np.random.RandomState(1)
         df = pd.DataFrame({'A': ['foo', 'bar', 'baz'] * 2,
                            'B': rand.randn(6),
                            'C': rand.rand(6) > .5})
In [32]: df
Out[32]:
     A         B      C
0  foo  1.624345  False
1  bar -0.611756   True
2  baz -0.528172  False
3  foo -1.072969   True
4  bar  0.865408  False
5  baz -2.301539   True
I would like to sort it in groups (A) by the aggregated sum of B, and then by the value in C (not aggregated). So basically get the order of the A groups with
In [28]: df.groupby('A').sum().sort('B')
Out[28]:
            B  C
A
baz -2.829710  1
bar  0.253651  1
foo  0.551377  1
And then by True/False, so that it ultimately looks like this:
In [30]: df.ix[[5, 2, 1, 4, 3, 0]]
Out[30]:
     A         B      C
5  baz -2.301539   True
2  baz -0.528172  False
1  bar -0.611756   True
4  bar  0.865408  False
3  foo -1.072969   True
0  foo  1.624345  False
How can this be done?
asked Feb 18, 2013 at 16:55
beardc
Group by A:
In [0]: grp = df.groupby('A')
Within each group, sum over B and broadcast the values using transform. Then sort by B:
In [1]: grp[['B']].transform(sum).sort('B')
Out[1]:
          B
2 -2.829710
5 -2.829710
1  0.253651
4  0.253651
0  0.551377
3  0.551377
Index the original df by passing the index from above. This will re-order the A values by the aggregate sum of the B values:
In [2]: sort1 = df.ix[grp[['B']].transform(sum).sort('B').index]
In [3]: sort1
Out[3]:
     A         B      C
2  baz -0.528172  False
5  baz -2.301539   True
1  bar -0.611756   True
4  bar  0.865408  False
0  foo  1.624345  False
3  foo -1.072969   True
Finally, sort the 'C' values within groups of 'A' using the sort=False option to preserve the A sort order from step 1:
In [4]: f = lambda x: x.sort('C', ascending=False)
In [5]: sort2 = sort1.groupby('A', sort=False).apply(f)
In [6]: sort2
Out[6]:
         A         B      C
A
baz 5  baz -2.301539   True
    2  baz -0.528172  False
bar 1  bar -0.611756   True
    4  bar  0.865408  False
foo 3  foo -1.072969   True
    0  foo  1.624345  False
Clean up the df index by using reset_index with drop=True:
In [7]: sort2.reset_index(0, drop=True)
Out[7]:
     A         B      C
5  baz -2.301539   True
2  baz -0.528172  False
1  bar -0.611756   True
4  bar  0.865408  False
3  foo -1.072969   True
0  foo  1.624345  False
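The session above uses sort and .ix, which recent pandas releases no longer provide. As a rough sketch only (not the answerer's code), the same three steps might look like this with sort_values and .loc. The B and C values are copied from the question's printed dataframe rather than regenerated, and the names group_sum, sort1 and sort2 are illustrative. The per-group sort is done with a plain loop over the groups instead of groupby(...).apply, on the assumption that this behaves the same across pandas versions.

```python
import pandas as pd

# Values copied from the dataframe printed in the question.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'baz'] * 2,
    'B': [1.624345, -0.611756, -0.528172, -1.072969, 0.865408, -2.301539],
    'C': [False, True, False, True, False, True],
})

# Step 1: broadcast each group's sum of B back onto its rows.
group_sum = df.groupby('A')['B'].transform('sum')

# Step 2: reorder the rows by that per-group sum.
# mergesort is stable, so rows of the same group keep their original order.
sort1 = df.loc[group_sum.sort_values(kind='mergesort').index]

# Step 3: within each A group, put True before False.
# sort=False keeps the groups in the order produced by step 2.
sort2 = pd.concat(
    [g.sort_values('C', ascending=False) for _, g in sort1.groupby('A', sort=False)]
)
print(sort2)
```

Because the pieces are concatenated directly, the result keeps the original row labels and no reset_index step is needed.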
answered Feb 18, 2013 at 22:11
Zelazny7
Here's a more concise approach...
df['a_bsum'] = df.groupby('A')['B'].transform(sum)
df.sort(['a_bsum','C'], ascending=[True, False]).drop('a_bsum', axis=1)
The first line adds a column to the data frame with the groupwise sum. The second line performs the sort and then removes the extra column.
Result:
     A         B      C
5  baz -2.301539   True
2  baz -0.528172  False
1  bar -0.611756   True
4  bar  0.865408  False
3  foo -1.072969   True
0  foo  1.624345  False
NOTE: sort is deprecated (and has been removed from recent pandas releases); use sort_values instead.
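Following that note, a minimal sketch of the same two-line idea with sort_values might look like the block below. The data is copied from the question's printed dataframe, and using assign (so the helper column never lands on df itself) is a choice made for this illustration, not part of the original answer.

```python
import pandas as pd

# Values copied from the dataframe printed in the question.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'baz'] * 2,
    'B': [1.624345, -0.611756, -0.528172, -1.072969, 0.865408, -2.301539],
    'C': [False, True, False, True, False, True],
})

result = (
    # Temporary column holding the groupwise sum of B for every row.
    df.assign(a_bsum=df.groupby('A')['B'].transform('sum'))
      # Groups ascending by their sum; within a group, True before False.
      .sort_values(['a_bsum', 'C'], ascending=[True, False])
      # Remove the helper column again.
      .drop(columns='a_bsum')
)
print(result)
```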
answered May 14, 2013 at 14:03
Mark Byers
One way to do this is to insert a dummy column with the sums in order to sort:
In [10]: sum_B_over_A = df.groupby('A').sum().B
In [11]: sum_B_over_A
Out[11]:
A
bar    0.253652
baz   -2.829711
foo    0.551376
Name: B
In [12]: df['sum_B_over_A'] = df.A.apply(sum_B_over_A.get_value)
In [13]: df
Out[13]:
     A         B      C  sum_B_over_A
0  foo  1.624345  False      0.551376
1  bar -0.611756   True      0.253652
2  baz -0.528172  False     -2.829711
3  foo -1.072969   True      0.551376
4  bar  0.865408  False      0.253652
5  baz -2.301539   True     -2.829711
In [14]: df.sort(['sum_B_over_A', 'A', 'B'])
Out[14]:
     A         B      C  sum_B_over_A
5  baz -2.301539   True     -2.829711
2  baz -0.528172  False     -2.829711
1  bar -0.611756   True      0.253652
4  bar  0.865408  False      0.253652
3  foo -1.072969   True      0.551376
0  foo  1.624345  False      0.551376
and maybe you would drop the dummy column:
In [15]: df.sort(['sum_B_over_A', 'A', 'B']).drop('sum_B_over_A', axis=1)
Out[15]:
     A         B      C
5  baz -2.301539   True
2  baz -0.528172  False
1  bar -0.611756   True
4  bar  0.865408  False
3  foo -1.072969   True
0  foo  1.624345  False
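Series.get_value and DataFrame.sort appear to be unavailable in recent pandas releases, so the session above may not run as written today. A hedged sketch of the same dummy-column idea, using Series.map for the lookup and sort_values for the ordering, could look like this. The data is copied from the question's printed dataframe, and the name ordered is illustrative.

```python
import pandas as pd

# Values copied from the dataframe printed in the question.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'baz'] * 2,
    'B': [1.624345, -0.611756, -0.528172, -1.072969, 0.865408, -2.301539],
    'C': [False, True, False, True, False, True],
})

# One sum of B per value of A.
sum_B_over_A = df.groupby('A')['B'].sum()

ordered = (
    # Look up each row's group sum by its A label.
    df.assign(sum_B_over_A=df['A'].map(sum_B_over_A))
      # Same sort keys as the session above: group sum, then A, then B.
      .sort_values(['sum_B_over_A', 'A', 'B'])
      # Drop the dummy column once the order is fixed.
      .drop(columns='sum_B_over_A')
)
print(ordered)
```

Note that these sort keys order rows within a group by B, not by C. For this particular data the two orderings coincide; to follow the question literally, the keys could be changed to ['sum_B_over_A', 'C'] with ascending=[True, False].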
answered Feb 18, 2013 at 18:06
Andy Hayden
The question is difficult to understand. However, you can group by A, sum B within each group, and then sort the sums in descending order, so that the order of the A values depends on B. You can then use that order of A values to filter the original dataframe and build a new, reordered dataframe.
import numpy as np
import pandas as pd

rand = np.random.RandomState(1)
df = pd.DataFrame({'A': ['foo', 'bar', 'baz'] * 2,
                   'B': rand.randn(6),
                   'C': rand.rand(6) > .5})
# Sum B within each A group, then sort the sums in descending order.
grouped = df.groupby('A')['B'].sum().sort_values(ascending=False)
print(grouped)
print(grouped.index.get_level_values(0))
Output:
A
foo    0.551377
bar    0.253651
baz   -2.829710
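The answer stops at the sorted group sums. One possible way to carry out the filtering step it describes is sketched below; the names group_order and reordered are illustrative, and the data is copied from the question's printed dataframe. It keeps the answer's descending order, so passing ascending=True instead would give the baz, bar, foo order the question asks for. It does not sort by C within each group.

```python
import pandas as pd

# Values copied from the dataframe printed in the question.
df = pd.DataFrame({
    'A': ['foo', 'bar', 'baz'] * 2,
    'B': [1.624345, -0.611756, -0.528172, -1.072969, 0.865408, -2.301539],
    'C': [False, True, False, True, False, True],
})

# A labels ordered by their summed B, largest first (as in the answer).
group_order = df.groupby('A')['B'].sum().sort_values(ascending=False).index

# Filter the rows of each group in that order and stack the pieces.
reordered = pd.concat([df[df['A'] == name] for name in group_order])
print(reordered)
```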
answered Jul 12, 2021 at 14:58