Last Updated : 28 Jul, 2025
Time series data consists of sequential data points recorded over time and is used in industries such as finance, pharmaceuticals, social media and research. Analyzing and visualizing this data helps us find trends and seasonal patterns for forecasting and decision-making. In this article, we will look at time series analysis and visualization in depth.
What is Time Series Data Analysis?
Time series data analysis involves studying data points collected in chronological order to identify trends, patterns and other behaviors. This helps extract actionable insights and supports accurate forecasting and decision-making.
Key Concepts in Time Series Analysis
Time series data can be classified into two sections:
Let's implement this step by step:
Step 1: Installing and Importing Libraries
We will be using the stock dataset which you can download from here.
We will be using the NumPy, Pandas, Seaborn, Matplotlib and statsmodels libraries.
Python
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.tsa.stattools import adfuller
Step 2: Loading the Dataset
Here we will load the dataset and use the parse_dates parameter to convert the Date column to the DatetimeIndex format.
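As a quick check of what parse_dates=True combined with index_col does, the following minimal sketch reads a tiny in-memory CSV and confirms that the index becomes a DatetimeIndex. The rows are made-up values standing in for the stock dataset, and the name `sample` is only illustrative.

```python
import io
import pandas as pd

# Hypothetical three-row CSV standing in for stock_data.csv (values are made up)
csv_text = """Date,High,Low
2024-01-02,101.5,99.0
2024-01-03,103.0,100.2
2024-01-04,102.1,100.9
"""

# parse_dates=True asks pandas to parse the index column as dates
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=True, index_col="Date")

print(type(sample.index).__name__)   # DatetimeIndex
print(sample.index.min(), sample.index.max())
```

If the index came back as plain strings instead, date-based operations such as resampling would fail, so this check is worth running once after loading.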
Python
df = pd.read_csv("/content/stock_data.csv",
                 parse_dates=True,
                 index_col="Date")
df.head()
Output:
Dataset
Step 3: Cleaning of Data
We will drop the Unnamed: 0 column from the dataset because it is not needed for our visualization.
Python
df.drop(columns='Unnamed: 0', inplace=True)
df.head()
Output:
Drop columns
Step 4: Plotting High Stock Prices
Since the High column holds continuous values, we will use a line plot to visualize it.
sns.set(style="whitegrid")
plt.figure(figsize=(12, 6))
sns.lineplot(data=df, x='Date', y='High', label='High Price', color='blue')
plt.xlabel('Date')
plt.ylabel('High')
plt.title('Share Highest Price Over Time')
plt.show()
Output:
Line plot for Time Series data
Step 5: Resampling Data
To better understand the trend, we will resample the daily data to a monthly frequency and average each month, which gives a clearer view of trends and patterns.
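To make the effect of monthly resampling concrete, here is a small sketch on synthetic data (not the stock dataset): a daily series that equals 1.0 throughout January and 3.0 throughout February collapses to one averaged value per month. The helper name `monthly_mean` is hypothetical, and the fallback to the older 'M' alias is an assumption for pandas versions that predate the 'ME' alias.

```python
import numpy as np
import pandas as pd

# Synthetic daily series (illustrative only): 1.0 in January, 3.0 in February 2024
idx = pd.date_range("2024-01-01", "2024-02-29", freq="D")
daily = pd.Series(np.where(idx.month == 1, 1.0, 3.0), index=idx, name="High")

def monthly_mean(series):
    """Average a daily series per calendar month."""
    try:
        return series.resample("ME").mean()   # month-end alias in recent pandas
    except ValueError:
        return series.resample("M").mean()    # spelling used by older pandas

monthly = monthly_mean(daily)
print(monthly)
```

Each resulting row is labelled with the last day of its month, which is why the monthly plot has far fewer points than the daily one.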
df_resampled = df.resample('ME').mean(numeric_only=True)
sns.set(style="whitegrid")
plt.figure(figsize=(12, 6))
sns.lineplot(data=df_resampled, x=df_resampled.index, y='High', label='Month Wise Average High Price', color='blue')
plt.xlabel('Date (Monthly)')
plt.ylabel('High')
plt.title('Monthly Resampling Highest Price Over Time')
plt.show()
Output:
Monthly resampling
Step 6: Detecting Seasonality with Autocorrelation
We will detect seasonality using the autocorrelation function (ACF) plot. Peaks at regular intervals in the ACF plot suggest the presence of seasonality.
Python
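# 'Date' became the index when the CSV was loaded, so this normally takes the first branch
if 'Date' not in df.columns:
    print("'Date' is already the index or not present in the DataFrame.")
else:
    df.set_index('Date', inplace=True)

# Pass an explicit Axes so plot_acf draws here instead of opening a second figure
fig, ax = plt.subplots(figsize=(12, 6))
plot_acf(df['High'], lags=40, ax=ax)
ax.set_xlabel('Lag')
ax.set_ylabel('Autocorrelation')
ax.set_title('Autocorrelation Function (ACF) Plot')
plt.show()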
Output:
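Seasonality with Autocorrelation
Step 7: Testing Stationarity with ADF test
We will perform the Augmented Dickey-Fuller (ADF) test to formally test for stationarity. Its null hypothesis is that the series has a unit root, meaning it is non-stationary.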
Python
from statsmodels.tsa.stattools import adfuller
result = adfuller(df['High'])
print('ADF Statistic:', result[0])
print('p-value:', result[1])
print('Critical Values:', result[4])
Output:
Detecting Stationarity
Step 8: Differencing
Differencing involves subtracting the previous observation from the current observation to remove trends or seasonality.
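As a small worked example of why this removes a trend, the sketch below differences a made-up series that grows by exactly 2 at every step. The numbers are illustrative assumptions, not stock prices.

```python
import pandas as pd

# Made-up series with a linear trend: value = 5 + 2 * t
trend = pd.Series([5 + 2 * t for t in range(6)], dtype="float64")

# diff() computes x[t] - x[t-1]; the first entry has no predecessor
first_diff = trend.diff()
print(first_diff.tolist())   # [nan, 2.0, 2.0, 2.0, 2.0, 2.0]
```

The upward trend is gone and only the constant step size remains. For seasonal patterns, diff(periods=m) subtracts the value m steps back instead of the immediately preceding one.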
Python
df['high_diff'] = df['High'].diff()
plt.figure(figsize=(12, 6))
plt.plot(df['High'], label='Original High', color='blue')
plt.plot(df['high_diff'], label='Differenced High', linestyle='--', color='green')
plt.legend()
plt.title('Original vs Differenced High')
plt.show()
Output:
Smoothening the data
df['High'].diff() calculates the difference between consecutive values in the High column. This differencing operation transforms the time series into a new series that represents the changes between consecutive observations.
Step 9: Smoothing Data with Moving Average
Python
window_size = 120
df['high_smoothed'] = df['High'].rolling(window=window_size).mean()
plt.figure(figsize=(12, 6))
plt.plot(df['High'], label='Original High', color='blue')
plt.plot(df['high_smoothed'], label=f'Moving Average (Window={window_size})', linestyle='--', color='orange')
plt.xlabel('Date')
plt.ylabel('High')
plt.title('Original vs Moving Average')
plt.legend()
plt.show()
Output:
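Original vs Moving Average
This calculates the moving average of the High column over a window of 120 observations (roughly six months of trading days, if the data has one row per trading day), creating a smoother curve in the high_smoothed series. The first 119 values of high_smoothed are NaN because a full window is not yet available. The plot compares the original High values with the smoothed version.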
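The mechanics of the rolling window are easier to follow on a tiny series. This sketch uses a window of 3 on five made-up values and also shows the min_periods parameter, which is not used in the article but controls how incomplete windows at the start are handled.

```python
import pandas as pd

values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Each output is the mean of the current value and the two before it
smoothed = values.rolling(window=3).mean()
print(smoothed.tolist())     # [nan, nan, 2.0, 3.0, 4.0]

# min_periods=1 averages whatever is available instead of returning NaN
partial = values.rolling(window=3, min_periods=1).mean()
print(partial.tolist())      # [1.0, 1.5, 2.0, 3.0, 4.0]
```

A larger window gives a smoother curve but reacts more slowly to recent changes, so the window size is a trade-off rather than a fixed rule.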
Step 10: Original Data Vs Differenced Data
Printing the original and differenced data side by side we get:
Python
df_combined = pd.concat([df['High'], df['high_diff']], axis=1)
print(df_combined.head())
Output:
Original Data Vs Differenced Data
The high_diff column represents the differences between consecutive High values. The first value of high_diff is NaN because there is no previous value from which to calculate the difference.
As there is a NaN value, we will drop it and proceed with our test:
Python
df.dropna(subset=['high_diff'], inplace=True)
df['high_diff'].head()
Output:
Differences between consecutive high values
After that, we conduct the ADF test again on the differenced series:
Python
from statsmodels.tsa.stattools import adfuller
result = adfuller(df['high_diff'])
print('ADF Statistic:', result[0])
print('p-value:', result[1])
print('Critical Values:', result[4])
Output:
ADF test
Based on the ADF statistic and p-value for the differenced series, we reject the null hypothesis of a unit root and conclude that the differenced High series is stationary.
You can download the source code from here.