Last Updated : 11 Jul, 2025
The Pandas set_index() method sets one or more columns of a DataFrame as its index. This is useful when a column holds meaningful values that can act as identifiers, such as employee names, IDs or dates, because label-based lookups, alignment and index-based merges can then use those values directly.
Let's see a basic example:
Here we use an Employee Dataset stored in a file named employees.csv. Let's first load the dataset to see how to use set_index().
Python
import pandas as pd
data = pd.read_csv("/content/employees.csv")
print("Employee Dataset:")
display(data.head(5))
Output:
Employee Dataset
Now we use DataFrame.set_index() to set a single column as the index.
Python
data.set_index("First Name", inplace=True)
print("\nEmployee Dataset with 'First Name' as Index:")
display(data.head(5))
Output:
Index is replaced with the "First Name" column
We set the "First Name" column as the index, which makes it easier to access rows by the employee's first name.
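Once a column is the index, rows can be selected by label with .loc. The following is a minimal sketch on a small made-up DataFrame rather than the employees.csv file; the names and salary values are illustrative assumptions, not data from the original dataset.

```python
import pandas as pd

# Hypothetical stand-in for the employee data; all values are made up.
employees = pd.DataFrame({
    "First Name": ["Douglas", "Thomas", "Maria"],
    "Gender": ["Male", "Male", "Female"],
    "Salary": [97308, 61933, 130590],
})

# set_index returns a new DataFrame here; `employees` is left unchanged.
by_name = employees.set_index("First Name")

# Label-based lookup: row "Maria", column "Salary".
maria_salary = by_name.loc["Maria", "Salary"]
print(maria_salary)
```

If the index contains repeated names, the same .loc call returns several rows instead of a single value.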
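Syntax: DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)
Parameters:
keys: a column label, an array, or a list of labels or arrays to use as the new index.
drop: if True (default), the columns used as the new index are removed from the DataFrame's columns.
append: if True, the new index levels are added to the existing index instead of replacing it. Default is False.
inplace: if True, the DataFrame is modified in place instead of a new one being returned. Default is False.
verify_integrity: if True, the new index is checked for duplicates and an error is raised if any are found. Default is False.
Return: A new DataFrame with the specified index, or None if inplace=True, in which case the original DataFrame is modified directly.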
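The difference in return value can be checked directly. This is a small sketch on a made-up two-row DataFrame (the column names and values are assumptions chosen for illustration):

```python
import pandas as pd

df = pd.DataFrame({"Name": ["A", "B"], "Age": [25, 30]})

# Default (inplace=False): a new DataFrame comes back and df is untouched.
indexed = df.set_index("Name")

# inplace=True: df itself is modified and the call returns None.
result = df.set_index("Name", inplace=True)

print(list(indexed.columns))
print(result is None)
print(df.index.name)
```

Because the in-place call returns None, assigning its result back to the same variable (df = df.set_index(..., inplace=True)) would replace the DataFrame with None.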
Now let's look at some practical examples to better understand how to use the Pandas set_index() function.
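1. Setting Multiple Columns as Index (MultiIndex)
In this example, we set both First Name and Gender as index columns using the set_index() method with the append and drop parameters. Passing append=True keeps the existing index as the first level, so the result has three index levels, and drop=False keeps First Name and Gender as regular columns as well. This is useful when we want to organize data by multiple columns.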
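To see what append changes, here is a hedged sketch on a few made-up rows (the names and teams are assumptions, not taken from employees.csv). It compares the number of index levels with and without append=True and selects one row by a tuple of labels.

```python
import pandas as pd

# Made-up rows standing in for the employee data.
df = pd.DataFrame({
    "First Name": ["Douglas", "Maria", "Jerry"],
    "Gender": ["Male", "Female", "Male"],
    "Team": ["Marketing", "Finance", "Finance"],
})

# Without append, the two columns replace the default integer index.
two_level = df.set_index(["First Name", "Gender"])

# With append=True, the existing index is kept as the outermost level.
three_level = df.set_index(["First Name", "Gender"], append=True)

print(two_level.index.nlevels)
print(three_level.index.nlevels)

# Select one row from the two-level index with a tuple of labels.
team = two_level.loc[("Maria", "Female"), "Team"]
print(team)
```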
Python
import pandas as pd
data = pd.read_csv("employees.csv")
data.set_index(["First Name", "Gender"], inplace=True, append=True, drop=False)
data.head()
Output:
Set Multiple Columns as MultiIndex
2. Setting a Float Column as Index
In some cases, we may want to use a numeric or float column as the index, for example in datasets where a score or other measurement identifies each row. Here, we set Agg_Marks (a float column) as the index for a DataFrame containing student data.
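A numeric column only works as an identifier if its values are unique, and set_index() does not check this by default. The sketch below uses made-up scores with one deliberate duplicate to show how verify_integrity=True surfaces the problem.

```python
import pandas as pd

# Made-up scores; note that 85.25 appears twice on purpose.
scores = pd.DataFrame({
    "Name": ["Riti", "Vansh", "Nanyu"],
    "Agg_Marks": [95.20, 85.25, 85.25],
})

# By default, duplicates in the new index are accepted silently.
loose = scores.set_index("Agg_Marks")
print(loose.index.is_unique)

# verify_integrity=True raises ValueError when the new index has duplicates.
try:
    scores.set_index("Agg_Marks", verify_integrity=True)
    raised = False
except ValueError:
    raised = True
print(raised)
```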
Python
import pandas as pd
students = [['jack', 34, 'Sydney', 'Australia', 85.96],
            ['Riti', 30, 'Delhi', 'India', 95.20],
            ['Vansh', 31, 'Delhi', 'India', 85.25],
            ['Nanyu', 32, 'Tokyo', 'Japan', 74.21],
            ['Maychan', 16, 'New York', 'US', 99.63],
            ['Mike', 17, 'Las Vegas', 'US', 47.28]]
df = pd.DataFrame(students, columns=['Name', 'Age', 'City', 'Country', 'Agg_Marks'])
df.set_index('Agg_Marks', inplace=True)
display(df)
Output:
Setting a Float Column as Index
3. Setting Index of Specific Column (with drop=False)
By default, set_index() removes the column used as the index from the DataFrame's columns. To keep it as a regular column as well, we can pass drop=False.
Python
import pandas as pd
data = pd.read_csv("/content/employees.csv")
data.set_index("First Name", drop=False, inplace=True)
print(data.head())
Output:
Using drop=False
Using drop=False ensures that the "First Name" column is retained even after it is set as the index.
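The effect of drop can be compared side by side. This is a minimal sketch on a two-row made-up DataFrame (the names and teams are illustrative assumptions):

```python
import pandas as pd

df = pd.DataFrame({
    "First Name": ["Douglas", "Maria"],
    "Team": ["Marketing", "Finance"],
})

# Default drop=True: the column moves into the index.
dropped = df.set_index("First Name")

# drop=False: the column is used as the index and also stays as a column.
kept = df.set_index("First Name", drop=False)

print("First Name" in dropped.columns)
print("First Name" in kept.columns)
print(kept.loc["Maria", "First Name"])
```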
4. Setting Index Using inplace=True
When we want to modify the original DataFrame directly rather than creating a new DataFrame, we can use inplace=True.
Python
import pandas as pd
data = {'Name': ['Geek1', 'Geek2', 'Geek3'],
        'Age': [25, 30, 35],
        'City': ['New York', 'San Francisco', 'Los Angeles']}
df = pd.DataFrame(data)
df.set_index('Name', inplace=True)
display(df)
Output:
Setting Index Using inplace=True
With set_index(), we can turn meaningful columns into the index, which makes the data simpler to access by label and to analyze.