Last Updated : 23 Jul, 2025
AdaBoost stands for Adaptive Boosting, an ensemble learning technique that combines multiple weak classifiers to create a strong classifier. It works by sequentially adding classifiers, each one correcting the errors of the previous models by giving more weight to the misclassified data points. In this article we will implement the AdaBoost algorithm from scratch.
[Image: Boosting Algorithms]
The image above shows how boosting works. Training starts on the original data. After each round more weight is given to misclassified points so the next model focuses on them. This process repeats and in the end all models are combined to make a final, more accurate prediction.
Python provides dedicated packages for applying AdaBoost, but here we will see how to implement it ourselves on a machine learning problem. We will create a synthetic dataset to test the implementation.
1. Import Libraries
Let's begin by importing the libraries like NumPy and scikit-learn that are required for the classification task.
Python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score, f1_score, roc_auc_score
2. Defining the AdaBoost Class
In this step we define a custom class called AdaBoost that implements the AdaBoost algorithm from scratch. It consists of a constructor (__init__), a fit() method that handles the entire training process and a predict() method that makes the final predictions. We start with the constructor:
Python
class AdaBoost:
    def __init__(self, n_estimators=50):
        self.n_estimators = n_estimators
        self.alphas = []
        self.models = []
The constructor (__init__) initializes the number of weak learners (n_estimators), a list to store the model weights (self.alphas) and a list to store the weak classifiers (self.models).
3. Training the AdaBoost Model
In the fit() method we:

- Initialize every sample weight to 1/n.
- In each round, train a depth-1 decision tree (a stump) on the weighted data.
- Compute the stump's weighted error and its vote weight alpha.
- Increase the weights of misclassified samples and renormalize, so the next stump focuses on them.
Python
    def fit(self, X, y):
        n_samples, n_features = X.shape
        # Start with uniform sample weights
        w = np.ones(n_samples) / n_samples
        for _ in range(self.n_estimators):
            # Train a decision stump on the weighted data
            model = DecisionTreeClassifier(max_depth=1)
            model.fit(X, y, sample_weight=w)
            predictions = model.predict(X)
            # Weighted error of this stump
            err = np.sum(w * (predictions != y)) / np.sum(w)
            # Stump's vote weight; the small epsilon avoids division by zero
            alpha = 0.5 * np.log((1 - err) / (err + 1e-10))
            self.models.append(model)
            self.alphas.append(alpha)
            # Increase weights of misclassified samples (labels must be -1/+1)
            w *= np.exp(-alpha * y * predictions)
            w /= np.sum(w)
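For reference, the loop above implements the standard AdaBoost update rules, written here for labels $y_i \in \{-1, +1\}$, with $h_t$ the stump trained in round $t$:

$$\epsilon_t = \frac{\sum_i w_i\,[h_t(x_i) \neq y_i]}{\sum_i w_i}, \qquad \alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}, \qquad w_i \leftarrow \frac{w_i\,e^{-\alpha_t\,y_i\,h_t(x_i)}}{Z_t}$$

where $Z_t$ is the normalizing constant that makes the weights sum to 1. A correct prediction ($y_i h_t(x_i) = +1$) shrinks a sample's weight while a mistake grows it, which is exactly what the last two lines of the loop do.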
4. Defining the Predict Method
In the predict() method we combine the predictions of all weak classifiers, weighted by their respective alpha values, and take the sign of the sum to make the final prediction: 1 (positive class) or -1 (negative class).
Python
    def predict(self, X):
        # Weighted sum of each stump's -1/+1 votes
        strong_preds = np.zeros(X.shape[0])
        for model, alpha in zip(self.models, self.alphas):
            predictions = model.predict(X)
            strong_preds += alpha * predictions
        # Sign of the weighted vote gives the final class
        return np.sign(strong_preds).astype(int)
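As a quick sanity check on the voting logic, here is a toy run where the alphas and per-sample stump votes are made up purely for illustration:

Python
import numpy as np

# Hypothetical vote weights and -1/+1 votes from three stumps, for two samples
alphas = [0.9, 0.4, 0.2]
votes = [np.array([1, -1]), np.array([-1, -1]), np.array([1, 1])]

strong = sum(a * v for a, v in zip(alphas, votes))
print(strong)           # [ 0.7 -1.1]
print(np.sign(strong))  # [ 1. -1.]

The first sample ends up as +1 because the high-alpha stump outvotes the other two; the second is -1 because two stumps, including the strongest, vote negative.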
5. Example Usage
We generate a synthetic dataset with 1000 samples and 20 features, and convert the 0/1 labels to -1/+1 since the weight update and predict() assume that encoding. We then split the data into training and testing sets, initialize and train an AdaBoost classifier with 50 estimators, predict on the test set and evaluate the model using accuracy, precision, recall, F1 score and ROC-AUC. Note that the ROC-AUC here is computed from the hard -1/+1 predictions rather than probability scores, so it is a coarse estimate; a fallback is kept in case it cannot be computed. Finally, we print all evaluation metrics.
Python
if __name__ == "__main__":
    X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
    # Convert 0/1 labels to -1/+1, as assumed by fit() and predict()
    y = np.where(y == 0, -1, 1)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    adaboost = AdaBoost(n_estimators=50)
    adaboost.fit(X_train, y_train)
    predictions = adaboost.predict(X_test)

    accuracy = accuracy_score(y_test, predictions)
    precision = precision_score(y_test, predictions)
    recall = recall_score(y_test, predictions)
    f1 = f1_score(y_test, predictions)
    try:
        roc_auc = roc_auc_score(y_test, predictions)
    except ValueError:
        roc_auc = 'Undefined (requires probability scores)'

    print(f"Accuracy: {accuracy * 100}%")
    print(f"Precision: {precision}")
    print(f"Recall: {recall}")
    print(f"F1 Score: {f1}")
    print(f"ROC-AUC: {roc_auc}")
Output:
[Output: printed accuracy, precision, recall, F1 score and ROC-AUC values]
The model performs well on the synthetic test set; overall these metrics indicate good performance.
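As a sanity check on our from-scratch version, you can compare it against scikit-learn's built-in AdaBoostClassifier on the same split. This is a minimal sketch reusing the X_train, X_test, y_train and y_test variables from the example above; expect the scores to be in the same ballpark rather than identical:

Python
from sklearn.ensemble import AdaBoostClassifier

# scikit-learn's implementation also defaults to depth-1 decision stumps
sk_model = AdaBoostClassifier(n_estimators=50, random_state=42)
sk_model.fit(X_train, y_train)
sk_preds = sk_model.predict(X_test)
print(f"scikit-learn accuracy: {accuracy_score(y_test, sk_preds) * 100}%")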