Pandas is a powerful tool for data analysis in Python, with two primary structures: Series and DataFrames. By understanding these structures, you’ll be better equipped to manipulate, analyze, and visualize data efficiently.
Pandas Series
A Series is a one-dimensional labeled array capable of holding data of various types, such as strings, numbers, and Python objects.
Example of a Series holding characters:
import pandas as pd
s = pd.Series(['a', 'b', 'c'])
Storing integers in a Series:
s = pd.Series([1, 2, 3, 4, 5])
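The "labeled" part of the definition is easier to see by inspecting the index. The following is a minimal sketch; the variable names and the `x`/`y`/`z` labels are illustrative assumptions, not part of the original examples.

```python
import pandas as pd

letters = pd.Series(['a', 'b', 'c'])

# Without an explicit index, pandas labels the values 0, 1, 2, ...
default_labels = list(letters.index)

# Custom labels can be supplied through the index argument
# (the labels 'x', 'y', 'z' are made up for this illustration)
labeled = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
value_at_y = labeled['y']

print(default_labels)
print(value_at_y)
```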
A Series can also be built from a dictionary, in which case the keys become the index labels:
countries_population = {'Netherlands': 17, 'US': 318, 'Canada': 35, 'France': 66, 'UK': 64}
population = pd.Series(countries_population)
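To see how the dictionary maps onto the Series, it can help to look at the index and the values separately. This sketch reuses the population figures from the example; the names `labels` and `values` are illustrative.

```python
import pandas as pd

countries_population = {'Netherlands': 17, 'US': 318, 'Canada': 35, 'France': 66, 'UK': 64}
population = pd.Series(countries_population)

# Dictionary keys become the index labels, in insertion order
labels = list(population.index)

# Dictionary values become the data
values = population.tolist()

print(labels)
print(values)
```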
Accessing values in a Series:
population['US']
Retrieving a subset of the Series:
population[['US', 'Canada', 'UK']]
Filtering the Series with a boolean condition:
populous_countries = population[population > 60]
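The filter works in two steps: the comparison produces a boolean Series (a mask), and indexing with that mask keeps only the entries marked `True`. A small sketch of those steps follows, with an extra combined condition added purely for illustration.

```python
import pandas as pd

population = pd.Series({'Netherlands': 17, 'US': 318, 'Canada': 35, 'France': 66, 'UK': 64})

# Step 1: the comparison is evaluated element by element
mask = population > 60

# Step 2: indexing with the mask keeps the True entries
populous_countries = population[mask]

# Conditions can be combined with & (and) or | (or); each side needs parentheses
mid_sized = population[(population > 60) & (population < 100)]

print(mask.tolist())
print(list(populous_countries.index))
print(list(mid_sized.index))
```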
Pandas DataFrames
A DataFrame is a two-dimensional labeled data structure. Think of it as a table or a spreadsheet. DataFrames are more versatile than Series and are extensively used in Pandas.
data = {
    'name': ['Bob', 'Bart', 'Bobby'],
    'occupation': ['Lawyer', 'Programmer', 'Teacher']
}
frame = pd.DataFrame(data, columns=['name', 'occupation'])
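A quick way to confirm the table-like structure is to inspect the shape and the column labels. The sketch below rebuilds the same DataFrame and checks that each column is itself a Series; the variable names are illustrative.

```python
import pandas as pd

data = {
    'name': ['Bob', 'Bart', 'Bobby'],
    'occupation': ['Lawyer', 'Programmer', 'Teacher']
}
frame = pd.DataFrame(data, columns=['name', 'occupation'])

# (rows, columns)
dimensions = frame.shape

# Column labels, in the order given by the columns argument
column_names = list(frame.columns)

# A single column of a DataFrame is a Series
name_is_series = isinstance(frame['name'], pd.Series)

print(dimensions, column_names, name_is_series)
```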
DataFrames support numerous operations, including indexing, selection, and arithmetic, which the following sections cover.
DataFrame Indexing & Selection
DataFrames allow fine-grained access to rows and columns.
Selecting a column:
# Each column is returned as a Series
names = frame['name']
occupations = frame['occupation']
Accessing specific rows by integer position with iloc:
first_row = frame.iloc[0]
second_row = frame.iloc[1]
Slicing rows for a subset of the DataFrame:
# Rows at positions 0 and 1; the end of the slice is excluded
subset = frame[0:2]
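Position-based and label-based selection are easy to confuse, so here is a hedged sketch contrasting `iloc` (integer positions) with `loc` (index labels) on the same name and occupation data. It rebuilds the DataFrame so it runs on its own; the variable names are illustrative.

```python
import pandas as pd

frame = pd.DataFrame({
    'name': ['Bob', 'Bart', 'Bobby'],
    'occupation': ['Lawyer', 'Programmer', 'Teacher'],
})

# iloc slices by position and excludes the end, like a Python list
first_two_by_position = frame.iloc[0:2]

# loc slices by label and includes the end label
first_two_by_label = frame.loc[0:1]

# loc also accepts a row label together with a column label
bart_job = frame.loc[1, 'occupation']

# A boolean condition selects rows; the second argument picks the column
teacher_names = frame.loc[frame['occupation'] == 'Teacher', 'name']

print(first_two_by_position)
print(bart_job)
print(teacher_names.tolist())
```

With the default integer index the two slices return the same rows; they differ once the index holds other labels, such as names or dates.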
Arithmetic Operations on Data
Both Series and DataFrames support arithmetic operations. Scalars can be applied to modify the data.
Applying a scalar to a Series, then multiplying the result by itself element-wise:
numbers = pd.Series([1, 2, 3, 4, 5])
doubled_numbers = numbers * 2
squared_numbers = doubled_numbers * doubled_numbers
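Arithmetic between two Series is matched by index label rather than by position. The sketch below repeats the scalar example and then adds two small Series with partly different labels; the Series `a` and `b` and their labels are assumptions made for illustration.

```python
import pandas as pd

numbers = pd.Series([1, 2, 3, 4, 5])
doubled_numbers = numbers * 2                          # scalar applied to every element
squared_numbers = doubled_numbers * doubled_numbers    # element-wise product

# The same result written with the power operator
squared_again = doubled_numbers ** 2

# Operands are aligned on their labels; a label missing on one side gives NaN
a = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
b = pd.Series([10, 20], index=['x', 'y'])
total = a + b

print(squared_numbers.tolist())
print(total)
```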
Applying scalars to a DataFrame:
import numpy as np
random_data = pd.DataFrame(np.random.randint(0, 5, size=(5, 4)), columns=list('ABCD'))
doubled_data = random_data * 2
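Because the example uses random values, its output changes on every run. The following sketch uses a seeded NumPy generator (an assumption, swapped in for `np.random.randint` to make runs repeatable) and shows two further operations: adding columns together and subtracting each column's mean.

```python
import numpy as np
import pandas as pd

# Seeded generator so the same table is produced on every run
rng = np.random.default_rng(seed=0)
random_data = pd.DataFrame(rng.integers(0, 5, size=(5, 4)), columns=list('ABCD'))

# A scalar is applied to every cell
doubled_data = random_data * 2

# Two columns are combined row by row
a_plus_b = random_data['A'] + random_data['B']

# A Series of column means is aligned on the column labels and subtracted
centered = random_data - random_data.mean()

print(doubled_data)
print(centered.mean())
```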
By mastering the concepts of Series and DataFrames, you’ll be well on your way to becoming proficient in data analysis using Pandas. Remember, practice is key, so keep experimenting and exploring these data structures!