Visualizing data has become a critical skill for many professionals, especially in the field of data analysis. In this article, we’ll explore how to visualize data using Python’s powerful pandas library.
Start by obtaining the dataset we’ll be working with. This dataset, which can be fetched from depaul.edu, consists of information about US presidents, their associated political parties, professions, and other related data.
If you’re keen to dive deeper into data analysis with pandas, consider enrolling in this Data Analysis with Python Pandas course.
Plotting Data with Pandas
One of the most useful features of pandas is its built-in plotting. With a few lines of code, you can visualize data from large Excel files. For this demonstration, we focus on the “Occupation” column:
df['Occupation']
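Before plotting, it can help to look at the per-category counts that a pie chart of this column would be built from. The sketch below uses a small made-up DataFrame (the names and occupations are placeholders, not rows from the presidents file) and assumes only that the real data has an “Occupation” column, as above.

```python
import pandas as pd

# Hypothetical stand-in data; the real rows come from the presidents file.
sample = pd.DataFrame({
    "President": ["A", "B", "C", "D", "E"],
    "Occupation": ["Lawyer", "Soldier", "Lawyer", "Planter", "Lawyer"],
})

# value_counts() tallies each distinct value, most frequent first.
occupation_counts = sample["Occupation"].value_counts()
print(occupation_counts)
```

Each entry in the resulting Series becomes one slice when the counts are plotted with kind='pie'.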
The entire code to visualize the occupation distribution among the presidents is as follows:
import matplotlib.pyplot as plt
import pandas as pd
# Load the presidents dataset from the Excel file
file = r'data/Presidents.xls'
df = pd.read_excel(file)
# One color per pie slice
colors = ['yellowgreen', 'gold', 'lightskyblue', 'lightcoral', 'red', 'green', 'blue', 'orange', 'white', 'brown']
# Count the presidents per occupation and draw the counts as a pie chart
df['Occupation'].value_counts().plot(kind='pie', title='Occupation by President', colors=colors)
plt.show()
Data Cleaning and Plotting
Data visualization often requires clean data. So, before we visualize the popularity of each president, let’s first clean our dataset. Here’s an example of what the raw data looks like:
As the raw data shows, some cells in the “% popular” column hold the placeholder 'NA()' rather than a number. It’s best practice to either remove or replace such values; for this tutorial, we’ll remove them:
df = df[df['% popular'] != 'NA()']
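The filter above drops rows that hold exactly the string 'NA()'. As an alternative sketch, assuming the column might contain other non-numeric placeholders too, everything can be coerced to numbers with pd.to_numeric and the rows that fail to parse dropped. The sample values here are invented for illustration and do not come from the dataset.

```python
import pandas as pd

# Hypothetical sample mimicking a column that mixes numbers and placeholders.
sample = pd.DataFrame({"% popular": [52.9, "NA()", 47.3, "NA()", 60.1]})

# Approach from the text: keep rows that are not the 'NA()' placeholder.
filtered = sample[sample["% popular"] != "NA()"]

# Alternative: turn anything non-numeric into NaN, then drop those rows.
cleaned = sample.copy()
cleaned["% popular"] = pd.to_numeric(cleaned["% popular"], errors="coerce")
cleaned = cleaned.dropna(subset=["% popular"])

print(len(filtered), len(cleaned))
print(cleaned["% popular"].dtype)
```

Both approaches keep the same rows for this sample; the coercion variant also leaves the column with a numeric dtype, which plotting functions expect.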
Now, let’s plot the popularity:
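import matplotlib.pyplot as plt
import pandas as pd
# Load the presidents dataset from the Excel file
file = r'data/Presidents.xls'
df = pd.read_excel(file)
# Drop rows whose popularity is the placeholder string 'NA()'
df = df[df['% popular'] != 'NA()']
# Make sure the remaining values are numeric before plotting
popularity = pd.to_numeric(df['% popular'])
# density=True normalizes the histogram; it replaces the old normed argument,
# which has been removed from recent matplotlib releases
popularity.plot(kind='hist', bins=8, title='Popularity by President', facecolor='blue', alpha=0.5, density=True)
plt.show()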
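What a normalized histogram means can be checked numerically. The sketch below uses made-up percentages (not values from the dataset) and NumPy, which the original code does not import directly. With density=True, np.histogram returns bar heights scaled so that the bar areas sum to 1, which is the same normalization a density histogram in matplotlib uses.

```python
import numpy as np

# Hypothetical popularity percentages, for illustration only.
values = np.array([38.1, 42.5, 45.0, 47.9, 49.6, 50.7, 52.9, 55.2, 58.8, 61.1])

# Eight equal-width bins, with heights normalized to a probability density.
heights, edges = np.histogram(values, bins=8, density=True)

# Area of each bar is its height times its bin width.
areas = heights * np.diff(edges)
total_area = areas.sum()
print(total_area)
```

The y-axis of such a plot therefore shows density rather than a count of presidents per bin.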
With pandas, a few lines of Python are enough to load, clean, and visualize a dataset. Be sure to explore more of its functionality as you build your data analysis skills.