The Pandas groupby() method in Python is a powerful tool for data aggregation and analysis. It splits the data into groups, applies a function to each group, and combines the results. This method is essential for data analysis tasks such as aggregation, transformation, and filtration.
The Pandas groupby() method can be used on both Pandas Series and DataFrame objects, including those with hierarchical indexes. This method is designed to −
Split data into groups based on specified criteria.
Apply a function to each group independently.
Combine the results into a structured format.
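The three steps above can be sketched by hand and compared with a single groupby() call. This is only an illustration: the sample data mirrors the examples used later on this page, and the names pieces, totals, manual, and grouped are made up for this sketch.

```python
import pandas as pd

df = pd.DataFrame({
    "Car": ["BMW", "Audi", "Mercedes", "Audi", "BMW"],
    "Price": [1000, 1400, 1000, 900, 1700],
})

# Split: collect the rows that belong to each distinct key.
pieces = {key: df[df["Car"] == key] for key in sorted(df["Car"].unique())}

# Apply: reduce each piece independently.
totals = {key: int(piece["Price"].sum()) for key, piece in pieces.items()}

# Combine: assemble the per-group results into one Series.
manual = pd.Series(totals, name="Price")
manual.index.name = "Car"

# groupby() performs the same three steps in one call.
grouped = df.groupby("Car")["Price"].sum()

print(manual)
print(grouped)
```

Both printed Series should hold the same totals per car, which is why groupby() is usually preferred over writing the loop manually.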
Following is the syntax of the Python Pandas groupby() method, shown here for a Series; DataFrame.groupby() accepts the same parameters −
Series.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, observed=<no_default>, dropna=True)
Parameters
The Python Pandas groupby() method accepts the below parameters −
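by: Defines how to group the data. It can be a mapping, function, label, Series, or list of labels.
axis: Determines grouping by rows (0) or columns (1). For DataFrames this argument is deprecated in pandas 2.1 and later.
level: If the index is a MultiIndex, groups by one or more of its levels, given by name or position.
as_index: If True (default), group labels are used as the index of the result. If False, the group labels are returned as regular columns and the result gets a default integer index. This is relevant only for DataFrame input.
sort: If True (default), sorts the group keys. If False, groups appear in the order in which they first occur, which can improve performance.
group_keys: If True (default), adds the group keys to the index of the result when calling apply().
observed: Applies only when grouping by categorical data. If True, shows only the observed categories; if False, shows all categories.
dropna: If True (default), rows whose group key is NA are dropped. If False, NA is treated as a group key of its own.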
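A short sketch of how as_index, sort, and observed change the result may help here. The data and variable names (default, flat, unsorted, sizes, seen, all_cats) are illustrative assumptions, not part of the original page.

```python
import pandas as pd

df = pd.DataFrame({
    "Car": ["BMW", "Audi", "Mercedes", "Audi", "BMW"],
    "Price": [1000, 1400, 1000, 900, 1700],
})

# Default: group labels become the index, sorted by key.
default = df.groupby("Car").sum()

# as_index=False: "Car" stays a regular column and the index is 0..n-1.
flat = df.groupby("Car", as_index=False).sum()

# sort=False: groups appear in order of first appearance.
unsorted = df.groupby("Car", sort=False).sum()

# observed only matters for categorical keys; "L" never occurs in the data.
sizes = pd.DataFrame({
    "size": pd.Categorical(["S", "M", "S"], categories=["S", "M", "L"]),
    "qty": [1, 2, 3],
})
seen = sizes.groupby("size", observed=True)["qty"].sum()
all_cats = sizes.groupby("size", observed=False)["qty"].sum()

print(default, flat, unsorted, seen, all_cats, sep="\n\n")
```

With observed=False the unused category "L" still gets a row in the result, whereas observed=True omits it.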
The Pandas groupby() method returns a special object depending on the input type. This object is either pandas.api.typing.DataFrameGroupBy or pandas.api.typing.SeriesGroupBy, representing grouped data for further operations.
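As a rough illustration of the "further operations" available on the returned object, the sketch below inspects a grouped DataFrame and then runs one aggregation, one transformation, and one filtration. The data and the names stats, share, and repeated are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "Car": ["BMW", "Audi", "Mercedes", "Audi", "BMW"],
    "Price": [1000, 1400, 1000, 900, 1700],
})

grouped = df.groupby("Car")
print(type(grouped).__name__)  # class name of the grouped object
print(grouped.ngroups)         # number of distinct groups

# Iterating yields (key, sub-DataFrame) pairs.
for key, frame in grouped:
    print(key, frame["Price"].tolist())

# Aggregation: one row per group.
stats = grouped["Price"].agg(["min", "max", "mean"])

# Transformation: result has the same length as the input.
share = df["Price"] / grouped["Price"].transform("sum")

# Filtration: keep only rows from groups with more than one member.
repeated = grouped.filter(lambda g: len(g) > 1)

print(stats, share, repeated, sep="\n\n")
```

Aggregation shrinks the data to one row per group, transformation keeps the original shape, and filtration drops whole groups.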
Example: Grouping a Series by Index Labels
This example demonstrates the basic functionality of the Series.groupby() method by grouping a Pandas Series using index labels.
    import pandas as pd

    s = pd.Series([1000, 1400, 1000, 900, 1700],
                  index=['BMW', 'Audi', 'Mercedes', 'Audi', 'BMW'],
                  name='Car')

    # Display the Input Series
    print("Original Series:")
    print(s)

    # Grouping the Series by Index Labels
    result = s.groupby(level=0).sum()

    print("\nSeries after Grouping:")
    print(result)
When we run the above program, it produces the following result −

    Original Series:
    BMW         1000
    Audi        1400
    Mercedes    1000
    Audi         900
    BMW         1700
    Name: Car, dtype: int64

    Series after Grouping:
    Audi        2300
    BMW         2700
    Mercedes    1000
    Name: Car, dtype: int64

Example: Grouping a DataFrame Column
The following example demonstrates using the Pandas groupby() method to group a DataFrame by one of its columns.
    import pandas as pd

    # Create a DataFrame
    df = pd.DataFrame({'Car': ['BMW', 'Audi', 'Mercedes', 'Audi', 'BMW'],
                       'Price': [1000, 1400, 1000, 900, 1700]})

    # Display the Input DataFrame
    print("Input DataFrame:")
    print(df)

    # Grouping a DataFrame Column
    result = df.groupby("Car").mean()

    print("\nDataFrame after Grouping Based on a Column:")
    print(result)
While executing the above code we get the following output −
    Input DataFrame:
            Car  Price
    0       BMW   1000
    1      Audi   1400
    2  Mercedes   1000
    3      Audi    900
    4       BMW   1700

    DataFrame after Grouping Based on a Column:
               Price
    Car
    Audi      1150.0
    BMW       1350.0
    Mercedes  1000.0

Example: Grouping while Handling Missing Values
Missing values in the group keys are handled with the dropna parameter. By default (dropna=True), rows whose key is NA are left out of the result. The following example sets dropna=False to include NA values as a separate group.
    import pandas as pd
    import numpy as np

    # Create a DataFrame
    df = pd.DataFrame({'Car': ['BMW', 'Audi', np.nan, 'Audi', 'BMW'],
                       'Price': [1000, 1400, 1000, 900, 1700]})

    # Display the Input DataFrame
    print("Input DataFrame:")
    print(df)

    # Including NA as a separate group
    result = df.groupby("Car", dropna=False).sum()

    print("\nDataFrame after Grouping:")
    print(result)
Following is the output of the above code −

    Input DataFrame:
        Car  Price
    0   BMW   1000
    1  Audi   1400
    2   NaN   1000
    3  Audi    900
    4   BMW   1700

    DataFrame after Grouping:
          Price
    Car
    Audi   2300
    BMW    2700
    NaN    1000

Example: Grouping by Multiple Columns
This example demonstrates grouping a Pandas DataFrame by multiple columns.
    import pandas as pd
    import numpy as np

    # Create a DataFrame
    df = pd.DataFrame({'Car': ['BMW', 'Audi', np.nan, 'Audi', 'BMW'],
                       'Price': [1000, 1400, 1000, 900, 1700],
                       'color': ['white', 'black', 'red', 'red', 'white']})

    # Display the Input DataFrame
    print("Input DataFrame:")
    print(df)

    # Grouping a DataFrame by multiple columns
    result = df.groupby(["Car", "color"], dropna=False).sum()

    print("\nDataFrame after Grouping Based on Multiple Column:")
    print(result)
When we run the above program, it produces the following result −

    Input DataFrame:
        Car  Price  color
    0   BMW   1000  white
    1  Audi   1400  black
    2   NaN   1000    red
    3  Audi    900    red
    4   BMW   1700  white

    DataFrame after Grouping Based on Multiple Column:
                Price
    Car  color
    Audi black   1400
         red      900
    BMW  white   2700
    NaN  red     1000

Example: Grouping with Hierarchical Indexes
A hierarchical index can be grouped by using the level parameter of the groupby() method. The following example demonstrates this.
    import pandas as pd

    # Arrays that supply the two index levels
    data = [['BMW', 'BMW', 'Audi', 'Audi'],
            ['white', 'black', 'black', 'white']]

    # Create a MultiIndex object
    index = pd.MultiIndex.from_arrays(data, names=("car", "color"))

    # Create a MultiIndexed DataFrame
    df = pd.DataFrame({'Price': [1000, 1400, 1000, 900]}, index=index)

    # Display the input MultiIndexed DataFrame
    print("Input MultiIndexed DataFrame:\n")
    print(df)

    # Group by the "car" level (df.groupby("car") gives the same result here)
    result = df.groupby(level="car").sum()

    print("\nMultiIndexed DataFrame after Grouping:")
    print(result)
Following is the output of the above code −

    Input MultiIndexed DataFrame:

                Price
    car  color
    BMW  white   1000
         black   1400
    Audi black   1000
         white    900

    MultiIndexed DataFrame after Grouping:
          Price
    car
    Audi   1900
    BMW    2400
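As a further sketch of the level parameter, a level can be selected either by name or by position, and any level of the index can serve as the key. The variable names by_name, by_position, and by_color are illustrative only.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["BMW", "BMW", "Audi", "Audi"], ["white", "black", "black", "white"]],
    names=("car", "color"),
)
df = pd.DataFrame({"Price": [1000, 1400, 1000, 900]}, index=index)

# Select the outer level by name or by position; both give the same groups.
by_name = df.groupby(level="car").sum()
by_position = df.groupby(level=0).sum()

# Group by the inner level instead.
by_color = df.groupby(level="color").sum()

print(by_name.equals(by_position))
print(by_color)
```

Using level names tends to be more readable than positions when the index has more than two levels.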