Last Updated : 11 Jul, 2025
Pandas is a widely used Python library for cleaning, analyzing and transforming data. An important part of cleaning data is identifying and handling duplicate rows, which can skew counts, sums and other results if left unchecked.
The duplicated() method in Pandas finds these duplicates quickly: it returns True for each row that repeats an earlier row and False for every other row. It is a simple way to check a dataset before starting analysis. In this article, we'll see how the duplicated() method works with easy examples.
Let's see an example:
Python
import pandas as pd
# Rows 0 and 2 are identical in every column
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Alice', 'Charlie'],
    'Age': [25, 32, 25, 37]
})
# Keep only the rows that duplicated() marks as True
duplicates = df[df.duplicated()]
print(duplicates)
Output:
    Name  Age
2  Alice   25
Syntax
DataFrame.duplicated(subset=None, keep='first')
Parameters:
1. subset: (Optional) Specifies which columns to check for duplicates. By default, it checks all columns.
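2. keep: Determines which occurrences of a duplicate are marked as True:
   - 'first' (default): marks every occurrence except the first.
   - 'last': marks every occurrence except the last.
   - False: marks all occurrences, including the first and the last.
Returns: A Boolean Series with one value per row: True if the row is marked as a duplicate under the chosen keep rule, False otherwise.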
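To make the parameters concrete, here is a minimal sketch that compares the keep options and a subset check side by side. The five-row DataFrame is an illustrative extension of the intro example (the extra 'Bob' row is an assumption added for this sketch, not part of the original data).

```python
import pandas as pd

# Illustrative data: rows 0 and 2 match in every column,
# rows 1 and 4 share only the Name
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Alice', 'Charlie', 'Bob'],
    'Age': [25, 32, 25, 37, 40]
})

# Default: compare all columns, leave the first occurrence unmarked
first = df.duplicated()
# keep='last': leave the last occurrence unmarked instead
last = df.duplicated(keep='last')
# keep=False: mark every member of a duplicate group
all_dupes = df.duplicated(keep=False)
# subset: compare only the Name column
by_name = df.duplicated(subset=['Name'])

# Show the four masks next to each other
print(pd.DataFrame({
    'first': first, 'last': last, 'all': all_dupes, 'by_name': by_name
}))
```

Only the subset check flags row 4, because 'Bob' repeats while the full row (Bob, 40) does not.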
Let's look at some examples of using the duplicated() method to identify duplicate rows in a DataFrame. The examples use a custom dataset, employees.csv.
Example 1: Returning a Boolean Series
In this example we will identify duplicate values in the First Name column using the default keep='first' parameter. This leaves the first occurrence of each name unmarked and marks every later occurrence as a duplicate.
Python
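import pandas as pd
# Load the dataset and sort it so repeated names sit next to each other
data = pd.read_csv("/content/employees.csv")
data.sort_values("First Name", inplace=True)
# True for every repeat of a name after its first occurrence
bool_series = data["First Name"].duplicated()
# Preview the sorted data, then show only the rows marked as duplicates
print(data.head())
print(data[bool_series])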
Output: the rows whose First Name has already appeared earlier in the sorted data.
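For readers without employees.csv at hand, the same steps can be tried on a small inline DataFrame. This is only a sketch: the names and teams below are made-up stand-ins, not rows from the real dataset, and kind="stable" is passed so that rows with equal names keep their original relative order.

```python
import pandas as pd

# Hypothetical stand-in for employees.csv (values invented for illustration)
data = pd.DataFrame({
    "First Name": ["Maria", "Douglas", "Maria", "Thomas", "Douglas", "Maria"],
    "Team": ["Finance", "Marketing", "Sales", "Legal", "Finance", "Product"],
})

# A stable sort keeps equal names in their original relative order
data = data.sort_values("First Name", kind="stable")

# Default keep='first': only later repeats of a name are marked True
bool_series = data["First Name"].duplicated()

# Select the repeated rows with the Boolean mask
repeats = data[bool_series]
print(repeats)
```

The first 'Douglas' and the first 'Maria' are not in the result; only their later repeats are.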
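Example 2: Removing duplicates
In this example we'll remove every row whose First Name appears more than once. With keep=False, duplicated() marks all occurrences of a repeated value as True, and filtering with ~bool_series drops them, leaving only the names that occur exactly once.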
Python
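import pandas as pd
data = pd.read_csv("/content/employees.csv")
data.sort_values("First Name", inplace=True)
# keep=False marks every occurrence of a repeated name as True
bool_series = data["First Name"].duplicated(keep=False)
print(bool_series)
# ~ inverts the mask, so only names that occur exactly once remain
data = data[~bool_series]
data.info()
print(data)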
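Output: the Boolean mask, a summary of the filtered DataFrame, and the rows whose First Name occurs exactly once.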
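The keep=False filter can also be checked on inline data. The sketch below uses invented names and salaries (assumptions for illustration only) and compares the mask-based result with drop_duplicates(), which accepts the same keep argument.

```python
import pandas as pd

# Hypothetical sample data (not taken from employees.csv)
data = pd.DataFrame({
    "First Name": ["Maria", "Douglas", "Maria", "Thomas", "Angela"],
    "Salary": [130590, 97308, 61933, 111737, 54568],
})

# keep=False: both 'Maria' rows are marked True
mask = data["First Name"].duplicated(keep=False)

# Invert the mask to keep only names that occur exactly once
unique_only = data[~mask]

# The same filter expressed with drop_duplicates
same = data.drop_duplicates(subset="First Name", keep=False)

print(unique_only)
print(unique_only.equals(same))
```

The mask form is handy when the duplicates themselves need inspecting first; drop_duplicates() is shorter when only the cleaned result matters.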
Example 3: Keeping the Last Occurrence of Duplicates
In this example, we will keep the last occurrence of each First Name and drop the earlier ones. With the keep='last' argument, duplicated() marks every occurrence except the last as True.
Python
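import pandas as pd
data = pd.read_csv("/content/employees.csv")
data.sort_values("First Name", inplace=True)
# keep='last' marks every occurrence of a name except the last one
bool_series_last = data["First Name"].duplicated(keep='last')
# Drop the marked rows so only the last occurrence of each name remains
data_last = data[~bool_series_last]
data_last.info()
print(data_last)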
Output: a summary of the filtered DataFrame, followed by the last occurrence of each First Name.
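A small inline sketch of the keep='last' behaviour follows. The names and start years are invented for illustration; the point is that each name survives once, represented by its final row.

```python
import pandas as pd

# Hypothetical sample data (not taken from employees.csv)
data = pd.DataFrame({
    "First Name": ["Maria", "Douglas", "Maria", "Thomas", "Douglas"],
    "Start Year": [2014, 2016, 2019, 2015, 2021],
})

# keep='last': earlier occurrences are marked True, the last one is not
mask_last = data["First Name"].duplicated(keep="last")

# Keep only the unmarked rows
data_last = data[~mask_last]

# drop_duplicates with the same keep rule gives the same rows
same = data.drop_duplicates(subset="First Name", keep="last")

print(data_last)
print(data_last.equals(same))
```

This pattern is useful when later rows are more recent, for example when a record was entered again with updated values.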
The duplicated() method makes data cleaning more effective: it lets us find duplicate entries, decide which occurrence to keep, and retain only the unique, meaningful data for analysis.