Pipeline components that support partial_fit.
The goal of scikit-partial is to offer a pipeline that can run `partial_fit`. This enables online learning on an entire pipeline.
You can install everything with pip:

```
python -m pip install --upgrade pip
python -m pip install scikit-partial
```
Assuming that you use a stateless featurizer in your pipeline, such as HashingVectorizer or the language models from whatlies, you can choose to pre-train your scikit-learn model beforehand and fine-tune it later using models that offer the `.partial_fit()` API. If you're unfamiliar with this API, you might appreciate this course on calmcode.
```python
import pandas as pd
from sklearn.linear_model import SGDClassifier
from sklearn.feature_extraction.text import HashingVectorizer

from skpartial.pipeline import make_partial_pipeline

url = "https://raw.githubusercontent.com/koaning/icepickle/main/datasets/imdb_subset.csv"
df = pd.read_csv(url)
X, y = list(df['text']), df['label']

# Construct a pipeline with components that are `.partial_fit()` compatible
pipe = make_partial_pipeline(HashingVectorizer(), SGDClassifier(loss="log"))

# Run the learning algorithm on batches of data
for i in range(10):
    # We could also do a whole bunch of data augmentation here!
    pipe.partial_fit(X, y, classes=[0, 1])
```
When is this pattern useful? Let's consider spelling errors. Suppose that we'd like our algorithm to be robust against typos. Then we can simulate typos on our `X` inside of our learning loop.
The following pipeline components are provided:
```python
from skpartial.pipeline import (
    PartialPipeline,
    PartialFeatureUnion,
    make_partial_pipeline,
    make_partial_union,
)
```
These tools allow you to declare pipelines that support `.partial_fit()`. Note that all components used in these pipelines need to have `.partial_fit()` implemented.