ML | Mini-Batch Gradient Descent with Python

Last Updated : 05 Jul, 2025

Gradient Descent is an optimization algorithm in machine learning used to determine the optimal parameters such as weights and bias for models. The idea is to minimize the model's error by iteratively updating the parameters in the direction of the steepest descent as determined by the gradient of the loss function.

Depending on how much data is used to compute the gradient during each update, gradient descent comes in three main variants:

- Batch Gradient Descent: uses the entire training set for every update.
- Stochastic Gradient Descent (SGD): uses a single training example for every update.
- Mini-Batch Gradient Descent: uses a small subset (mini-batch) of the training set for every update.

Each variant has its own strengths and trade-offs in terms of speed, stability and convergence behavior.

Convergence in BGD, SGD & MBGD

Working of Mini-Batch Gradient Descent

Mini-batch gradient descent is an optimization method that updates model parameters using small subsets of the training data called mini-batches. This technique offers a middle path between the high variance of stochastic gradient descent and the high computational cost of batch gradient descent. Because only a small subset of the data is used to perform each update, training is faster and more memory-efficient. It also helps stabilize convergence and introduces beneficial randomness during learning.

It is often preferred in modern machine learning applications because it combines the benefits of both batch and stochastic approaches.

Key advantages of mini-batch gradient descent:

- Faster updates than batch gradient descent, since only a subset of the data is processed per step.
- Lower memory requirements, because the full dataset never enters a single gradient computation.
- Smoother, more stable convergence than stochastic gradient descent.
- Efficient use of vectorized operations on each mini-batch.

Algorithm:

Let:

- θ be the model parameters, η the learning rate and b the mini-batch size
- max_iters be the number of passes (epochs) over the training data

For itr=1,2,3,…,max_iters:

For each mini-batch (X_mini, y_mini):

1. Forward Pass on the batch X_mini:

Make predictions on the mini-batch

\hat{y} = f(X_{\text{mini}},\ \theta)

Compute the error J(θ) of the predictions with the current parameter values

J(\theta) = L(\hat{y},\ y_{\text{mini}})

2. Backward Pass:

Compute gradient:

\nabla_{\theta} J(\theta) = \frac{\partial J(\theta)}{\partial \theta}

3. Update parameters:

Gradient descent rule: 

\theta = \theta - \eta \nabla_{\theta} J(\theta)
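To make the algorithm concrete, here is a minimal, self-contained NumPy sketch of one such training loop. The toy data, learning rate and batch size are illustrative choices, not taken from the article, and the model is a plain linear regression with squared-error loss and batch-averaged gradients.

Python
import numpy as np

# Toy data: y = 2*x1 - x2 plus a little noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([[2.0], [-1.0]]) + 0.1 * rng.normal(size=(100, 1))

theta = np.zeros((2, 1))                 # parameters
eta, batch_size, max_iters = 0.05, 16, 10

for itr in range(max_iters):
    idx = rng.permutation(len(X))        # shuffle once per epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        X_mini, y_mini = X[batch], y[batch]
        y_hat = X_mini @ theta                              # forward pass
        grad = X_mini.T @ (y_hat - y_mini) / len(batch)     # backward pass
        theta = theta - eta * grad                          # parameter update

print(theta.ravel())  # should move toward [2, -1]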

Python Implementation

Here we will use Mini-Batch Gradient Descent for Linear Regression.

1. Importing Libraries

We begin by importing NumPy for numerical computation and Matplotlib for plotting.

Python
import numpy as np
import matplotlib.pyplot as plt
2. Generating Synthetic 2D Data

Here, we generate 8000 two-dimensional data points sampled from a multivariate normal distribution:

Python
mean = np.array([5.0, 6.0])
cov = np.array([[1.0, 0.95], [0.95, 1.2]])
data = np.random.multivariate_normal(mean, cov, 8000)
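Since the samples are random, the numbers differ slightly between runs; calling np.random.seed(...) before the sampling step makes them reproducible. As a quick sanity check (not in the original article), the empirical statistics of the generated data can be compared with the requested mean and covariance:

Python
# Sanity check: the sample mean and covariance should be close to the
# parameters passed to np.random.multivariate_normal above
print(data.mean(axis=0))           # approximately [5.0, 6.0]
print(np.cov(data, rowvar=False))  # approximately [[1.0, 0.95], [0.95, 1.2]]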
3. Visualizing Generated Data

We plot the first 500 generated samples to see how the two features relate:

Python
plt.scatter(data[:500, 0], data[:500, 1], marker='.')
plt.title("Scatter Plot of First 500 Samples")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.grid(True)
plt.show()

Output:

Scatter plot of the first 500 generated samples

4. Splitting Data

We prepend a bias column of ones, use the first feature (together with the bias) as the input X and the second feature as the target y, then split the data into training and testing sets:

Python
data = np.hstack((np.ones((data.shape[0], 1)), data))  # prepend a bias column of ones -> shape: (8000, 3)

split_factor = 0.90
split = int(split_factor * data.shape[0])

X_train = data[:split, :-1]
y_train = data[:split, -1].reshape((-1, 1))
X_test = data[split:, :-1]
y_test = data[split:, -1].reshape((-1, 1))
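As a side note (not used in the rest of the article, and assuming scikit-learn is installed), the same 90/10 hold-out split could be produced with train_test_split; shuffle=False mirrors the order-preserving slicing above, and the variable names here are illustrative.

Python
from sklearn.model_selection import train_test_split

# Equivalent 90/10 split of the bias-augmented data
X_tr, X_te, y_tr, y_te = train_test_split(
    data[:, :-1], data[:, -1].reshape(-1, 1),
    test_size=0.10, shuffle=False
)
print(X_tr.shape, X_te.shape)  # (7200, 2) (800, 2)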
5. Displaying Datasets

We print the number of examples in the training and testing sets:

Python
print("Number of examples in training set = %d" % X_train.shape[0])
print("Number of examples in testing set = %d" % X_test.shape[0])

Output:

Number of examples in training set = 7200
Number of examples in testing set = 800

6. Defining Core Functions of Linear Regression

We define the hypothesis, the gradient of the cost and the cost function used during training:

Python
# Hypothesis function
def hypothesis(X, theta):
    return np.dot(X, theta)

# Gradient of the cost function
def gradient(X, y, theta):
    h = hypothesis(X, theta)
    grad = np.dot(X.T, (h - y))
    return grad

# Cost function: half the sum of squared errors over the batch
def cost(X, y, theta):
    h = hypothesis(X, theta)
    J = np.dot((h - y).T, (h - y)) / 2
    return J[0]
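A quick sanity check (not in the original article) shows how these functions behave on a tiny hand-made batch where the parameters already fit the data exactly:

Python
# Tiny illustrative batch: 2 examples, bias column + one feature
X_demo = np.array([[1.0, 2.0], [1.0, 3.0]])
y_demo = np.array([[5.0], [7.0]])
theta_demo = np.array([[1.0], [2.0]])       # y = 1 + 2*x fits the data exactly

print(hypothesis(X_demo, theta_demo))       # [[5.], [7.]]
print(cost(X_demo, y_demo, theta_demo))     # [0.]  -> perfect fit, zero cost
print(gradient(X_demo, y_demo, theta_demo)) # [[0.], [0.]] -> no update needed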
7. Creating Mini-Batches for Training

This function divides the dataset into random mini-batches used during training:

Python
# Create mini-batches from the dataset
def create_mini_batches(X, y, batch_size):
    mini_batches = []
    data = np.hstack((X, y))
    np.random.shuffle(data)
    n_minibatches = data.shape[0] // batch_size

    # Full-sized mini-batches
    for i in range(n_minibatches):
        mini_batch = data[i * batch_size:(i + 1) * batch_size, :]
        X_mini = mini_batch[:, :-1]
        Y_mini = mini_batch[:, -1].reshape((-1, 1))
        mini_batches.append((X_mini, Y_mini))

    # Remaining examples that do not fill a complete batch
    if data.shape[0] % batch_size != 0:
        mini_batch = data[n_minibatches * batch_size:, :]
        X_mini = mini_batch[:, :-1]
        Y_mini = mini_batch[:, -1].reshape((-1, 1))
        mini_batches.append((X_mini, Y_mini))
    return mini_batches
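As a quick illustration (not part of the original article), the helper can be exercised on a tiny array to inspect the resulting batch shapes; shuffling changes which rows land in which batch, but not the shapes.

Python
# Illustrative check: 10 examples, batch size 4 -> batches of 4, 4 and 2
X_toy = np.arange(20, dtype=float).reshape(10, 2)
y_toy = np.arange(10, dtype=float).reshape(10, 1)
for X_mini, y_mini in create_mini_batches(X_toy, y_toy, batch_size=4):
    print(X_mini.shape, y_mini.shape)   # (4, 2) (4, 1) twice, then (2, 2) (2, 1)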
8. Mini-Batch Gradient Descent Function

This function performs mini-batch gradient descent to train the linear regression model:

Python
# Mini-batch gradient descent
def gradientDescent(X, y, learning_rate=0.001, batch_size=32):
    theta = np.zeros((X.shape[1], 1))
    error_list = []
    max_iters = 3  # number of passes (epochs) over the training data

    for itr in range(max_iters):
        mini_batches = create_mini_batches(X, y, batch_size)
        for X_mini, y_mini in mini_batches:
            theta = theta - learning_rate * gradient(X_mini, y_mini, theta)
            error_list.append(cost(X_mini, y_mini, theta))

    return theta, error_list
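One design choice worth noting: gradient() returns the sum of the per-example gradients, so the size of each update grows with batch_size; the small learning rate (0.001) compensates for this. A common alternative, sketched below under that assumption (the function name gradientDescentAveraged is illustrative, not from the article), is to average the gradient over the mini-batch so the learning rate can be chosen independently of the batch size.

Python
def gradientDescentAveraged(X, y, learning_rate=0.01, batch_size=32, max_iters=3):
    # Same loop as gradientDescent(), but the gradient is divided by the
    # mini-batch size so the step size does not grow with batch_size
    theta = np.zeros((X.shape[1], 1))
    error_list = []
    for itr in range(max_iters):
        for X_mini, y_mini in create_mini_batches(X, y, batch_size):
            grad = gradient(X_mini, y_mini, theta) / X_mini.shape[0]
            theta = theta - learning_rate * grad
            error_list.append(cost(X_mini, y_mini, theta))
    return theta, error_list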
9. Training and Visualization

The model is trained using gradientDescent() on the training data. After training:

- The learned bias and feature coefficients are printed.
- The cost recorded after each mini-batch update is plotted against the update number.

This provides a visual and quantitative insight into how well mini-batch gradient descent is optimizing the regression model.

Python
theta, error_list = gradientDescent(X_train, y_train)
print("Bias = ", theta[0])
print("Coefficients = ", theta[1:])

# visualising gradient descent
plt.plot(error_list)
plt.xlabel("Number of iterations")
plt.ylabel("Cost")
plt.show()

Output:

Mini-batch gradient descent cost over the regression model

10. Final Prediction and Evaluation

Prediction: The hypothesis() function is used to compute predicted values for the test set.

Visualization: The actual test targets are plotted as points against Feature 1, and the predicted values are drawn as a line over them.

Evaluation: The mean absolute error (MAE) between the predicted and actual values is computed.

Python
# Predicting output for X_test
y_pred = hypothesis(X_test, theta)

# Visualizing predictions vs actual values
plt.scatter(X_test[:, 1], y_test, marker='.', label='Actual')
plt.plot(X_test[:, 1], y_pred, color='orange', label='Predicted')
plt.xlabel("Feature 1")
plt.ylabel("Target")
plt.title("Model Predictions vs Actual Values")
plt.legend()
plt.grid(True)
plt.show()

# Calculating mean absolute error
error = np.sum(np.abs(y_test - y_pred)) / y_test.shape[0]
print("Mean Absolute Error =", error)

Output:

Model predictions vs. actual values

The orange line represents the final hypothesis function learned by the model, i.e. ŷ = θ[0] + θ[1] * X_test[:, 1]

This is the linear equation learned by the model, where:

- θ[0] (theta[0]) is the bias (intercept) term
- θ[1] (theta[1]) is the weight of the input feature
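As an optional sanity check (not part of the original article), the parameters learned by mini-batch gradient descent can be compared with the closed-form least-squares solution from np.linalg.lstsq; the two should be close once training has converged.

Python
# Closed-form least-squares fit on the training data, for comparison
theta_exact, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
print("Mini-batch GD theta:", theta.ravel())
print("Least-squares theta:", theta_exact.ravel())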

Comparison Between Gradient Descent Variants

Let's look at a quick comparison of Batch Gradient Descent, Stochastic Gradient Descent (SGD) and Mini-Batch Gradient Descent.

Type | Update Strategy | Speed & Efficiency | Noise in Updates
Batch Gradient Descent | Updates parameters after computing the gradient using the entire training dataset | Slow, as it processes the full dataset before each update | Smooth and stable
Stochastic Gradient Descent (SGD) | Updates parameters after computing the gradient using one training example | Faster updates, but cannot fully utilize vectorized computations | Highly noisy updates
Mini-Batch Gradient Descent | Updates parameters using a small batch (subset) of training examples | Efficient; leverages vectorization for faster computation | Moderate noise, dependent on batch size
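Since the only difference between the three variants is how many examples feed each parameter update, the batch_size argument of the gradientDescent() function above can be used to emulate all of them. The sketch below is illustrative: the batch sizes and the learning-rate rescaling for the full-batch run are assumptions made here, needed because gradient() sums rather than averages over the batch.

Python
n = X_train.shape[0]

# Stochastic gradient descent: one example per update
theta_sgd, _ = gradientDescent(X_train, y_train, batch_size=1)

# Mini-batch gradient descent: a small subset per update (as in this article)
theta_mini, _ = gradientDescent(X_train, y_train, batch_size=32)

# Batch gradient descent: the entire training set per update; the learning
# rate is scaled down because the summed gradient grows with the batch size
theta_batch, _ = gradientDescent(X_train, y_train,
                                 learning_rate=0.001 * 32 / n, batch_size=n)

print("SGD:       ", theta_sgd.ravel())
print("Mini-batch:", theta_mini.ravel())
print("Batch:     ", theta_batch.ravel())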
