
How to use SKADA

This is a short example to get started with SKADA and perform domain adaptation (DA) on a simple dataset. It illustrates the API choices that are specific to DA.

# Author: Remi Flamary
#
# License: BSD 3-Clause
# sphinx_gallery_thumbnail_number = 1
DA dataset

We generate a simple 2D DA dataset. Note that DA datasets provided by SKADA are organized as follows: X holds the samples and y the labels, while sample_domain encodes the domain of each sample, with positive values for source domains and negative values for target domains.

import matplotlib.pyplot as plt
import numpy as np

from skada import source_target_split
from skada.datasets import make_shifted_datasets

# Get DA dataset
X, y, sample_domain = make_shifted_datasets(
    20, 20, shift="concept_drift", random_state=42
)

# split source and target for visualization
Xs, Xt, ys, yt = source_target_split(X, y, sample_domain=sample_domain)
sample_domain_s = np.ones(Xs.shape[0])
sample_domain_t = -np.ones(Xt.shape[0]) * 2

# plot data
plt.figure(1, (10, 5))

plt.subplot(1, 2, 1)
plt.scatter(Xs[:, 0], Xs[:, 1], c=ys, cmap="tab10", vmax=9, label="Source")
plt.title("Source data")
ax = plt.axis()

plt.subplot(1, 2, 2)
plt.scatter(Xt[:, 0], Xt[:, 1], c=yt, cmap="tab10", vmax=9, label="Target")
plt.axis(ax)
plt.title("Target data")
DA Classifier estimator

SKADA estimators are used like scikit-learn estimators. The only difference is that the sample_domain array must be passed by name when fitting the estimator.
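The calling convention can be sketched with a toy stand-in (the ToyDAClassifier below is a hypothetical class for illustration only, not part of skada): the point is simply that sample_domain is passed by name to fit, exactly like a scikit-learn fit keyword.

```python
# Toy stand-in mimicking the SKADA estimator API (hypothetical class,
# not skada's implementation): ``sample_domain`` is a keyword argument
# of ``fit``, with positive values marking source samples.
class ToyDAClassifier:
    def fit(self, X, y, *, sample_domain=None):
        # keep only source samples (positive domain labels) for training
        src_labels = [t for t, d in zip(y, sample_domain) if d > 0]
        # trivial "model": remember the majority source label
        self.majority_ = max(set(src_labels), key=src_labels.count)
        return self

    def predict(self, X):
        # predict the majority source label for every sample
        return [self.majority_ for _ in X]


X_toy = [[0.0], [1.0], [2.0], [3.0]]
y_toy = [1, 1, 0, 0]
sample_domain_toy = [1, 1, -2, -2]  # positive = source, negative = target

clf = ToyDAClassifier().fit(X_toy, y_toy, sample_domain=sample_domain_toy)
print(clf.predict([[2.5]]))  # -> [1]
```

A real skada estimator is used the same way: fit(X, y, sample_domain=sample_domain), then predict and score as with any scikit-learn estimator.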

Accuracy on source: 0.84375
Accuracy on target: 1.0
DA Adapter pipeline

Several SKADA estimators include a data adapter that transforms the input data so that a standard scikit-learn estimator can then be used. For those methods, SKADA provides an Adapter class that can be placed in a DA pipeline built with make_da_pipeline.

Here is an example with the CORAL and GaussianReweight adapters.

#   Note that, as illustrated below for reweighting adapters, one needs a
#   subsequent estimator that takes sample_weight as an input parameter.
#   This can be done using the set_fit_request method of the estimator,
#   by calling .set_fit_request(sample_weight=True).
#   If the estimator (for pipeline or DA estimator) does not
#   require sample weights, the DA pipeline will raise an error.


from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

from skada import CORALAdapter, GaussianReweightAdapter, make_da_pipeline

# create a DA pipeline with CORAL adapter
pipe = make_da_pipeline(StandardScaler(), CORALAdapter(), SVC())
pipe.fit(X, y, sample_domain=sample_domain)

print("Accuracy on target:", pipe.score(Xt, yt))

# create a DA pipeline with GaussianReweight adapter (does not work well on
# concept drift).
pipe = make_da_pipeline(
    StandardScaler(),
    GaussianReweightAdapter(),
    LogisticRegression().set_fit_request(sample_weight=True),
)
pipe.fit(X, y, sample_domain=sample_domain)

print("Accuracy on target:", pipe.score(Xt, yt))
Accuracy on target: 1.0
Accuracy on target: 0.5
DA estimators with score cross-validation

DA estimators are compatible with scikit-learn cross-validation functions. Note that the sample_domain array must be passed in the params dictionary of the cross_val_score function.
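The mechanics of that params dictionary can be sketched with a stdlib-only toy (toy_cross_val_score and MajorityClassifier below are hypothetical helpers, not sklearn or skada code): each entry of params is sliced like X and y on every split and forwarded by name to fit, which is how sample_domain travels through cross-validation.

```python
# Toy stand-ins (hypothetical, for illustration only).
class MajorityClassifier:
    def fit(self, X, y, *, sample_domain=None):
        src = [t for t, d in zip(y, sample_domain) if d > 0]  # source rows only
        self.label_ = max(set(src), key=src.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def toy_cross_val_score(estimator, X, y, params, n_splits=2):
    """Toy sketch: entries of ``params`` are sliced like X and y and
    forwarded by name to ``fit`` on every split."""
    n, fold, scores = len(X), len(X) // n_splits, []
    for k in range(n_splits):
        test_idx = list(range(k * fold, (k + 1) * fold))
        train_idx = [i for i in range(n) if i not in test_idx]
        fit_params = {key: [v[i] for i in train_idx] for key, v in params.items()}
        estimator.fit(
            [X[i] for i in train_idx], [y[i] for i in train_idx], **fit_params
        )
        preds = estimator.predict([X[i] for i in test_idx])
        truth = [y[i] for i in test_idx]
        scores.append(sum(p == t for p, t in zip(preds, truth)) / len(truth))
    return scores


scores = toy_cross_val_score(
    MajorityClassifier(),
    X=[[0], [1], [2], [3]],
    y=[1, 1, 1, 1],
    params={"sample_domain": [1, -2, 1, -2]},
)
print(scores)  # -> [1.0, 1.0]
```

With the real library, cross_val_score(clf, X, y, params={"sample_domain": sample_domain}, ...) plays the role of this toy loop.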

Entropy score: -0.02 (+-0.01)
DA estimator with grid search

DA estimators are also compatible with the scikit-learn grid-search tools. Note that the sample_domain array must be passed to the fit method of the grid search.

from sklearn.model_selection import GridSearchCV

from skada.metrics import PredictionEntropyScorer
from skada.model_selection import SourceTargetShuffleSplit

reg_coral = [0.1, 0.5, 1, "auto"]

clf = make_da_pipeline(StandardScaler(), CORALAdapter(), SVC(probability=True))

# grid search
grid_search = GridSearchCV(
    estimator=clf,
    param_grid={"coraladapter__reg": reg_coral},
    cv=SourceTargetShuffleSplit(random_state=0),
    scoring=PredictionEntropyScorer(),
)

grid_search.fit(X, y, sample_domain=sample_domain)

print("Best regularization parameter:", grid_search.best_params_["coraladapter__reg"])
print("Accuracy on target:", np.mean(grid_search.predict(Xt) == yt))
Best regularization parameter: 0.1
Accuracy on target: 1.0
Advanced DA pipeline

The DA pipeline can be used with any estimator and any adapter. More importantly, every estimator in the pipeline is automatically wrapped in what SKADA calls a Selector: a wrapper that controls which samples are passed to each estimator during fit and predict/transform.

In the following example, one StandardScaler is trained per domain. Then a single SVC is trained on source data only. When predicting on target data the pipeline will automatically use the StandardScaler trained on target and the SVC trained on source.
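That dispatch logic can be sketched with toy stand-ins (PerSideCenterer and ThresholdClassifier below are hypothetical classes for illustration, not skada's implementation): the scaler keeps one statistic per domain side, while the classifier is fitted on source rows only.

```python
# Toy sketch of the Selector idea (hypothetical classes).
class PerSideCenterer:
    """Keeps one mean per domain side, like a per-domain StandardScaler."""

    def fit(self, X, *, sample_domain):
        self.mean_ = {}
        for side, keep in (("source", lambda d: d > 0), ("target", lambda d: d < 0)):
            vals = [x for x, d in zip(X, sample_domain) if keep(d)]
            self.mean_[side] = sum(vals) / len(vals)
        return self

    def transform(self, X, *, sample_domain):
        # route each batch to the statistics of its own domain side
        side = "source" if sample_domain[0] > 0 else "target"
        return [x - self.mean_[side] for x in X]


class ThresholdClassifier:
    """Fitted on source rows only, like an estimator under SelectSource."""

    def fit(self, X, y):
        # threshold halfway between the source class means
        m0 = sum(x for x, t in zip(X, y) if t == 0) / y.count(0)
        m1 = sum(x for x, t in zip(X, y) if t == 1) / y.count(1)
        self.thr_ = (m0 + m1) / 2
        return self

    def predict(self, X):
        return [int(x > self.thr_) for x in X]


X_toy = [9.0, 11.0, -11.0, -9.0]
y_toy = [0, 1, 0, 1]
sd_toy = [1, 1, -2, -2]

scaler = PerSideCenterer().fit(X_toy, sample_domain=sd_toy)
Xs_toy = scaler.transform(X_toy[:2], sample_domain=sd_toy[:2])  # source rows
Xt_toy = scaler.transform(X_toy[2:], sample_domain=sd_toy[2:])  # target rows

clf = ThresholdClassifier().fit(Xs_toy, y_toy[:2])  # source only
print(clf.predict(Xt_toy))  # -> [0, 1]
```

Centering each domain with its own mean is what makes the source-trained threshold transfer to the shifted target data here.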

Accuracy on source: 1.0
Accuracy on target: 1.0

Similarly, one can use the PerDomain selector to train a different estimator per domain, which makes it possible to handle multiple source and target domains. In this case sample_domain must be provided to both fit and predict/transform.
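The idea behind a per-domain selector can be sketched with a toy stand-in (ToyPerDomain and MajorityClassifier below are hypothetical classes, not skada's implementation): one sub-estimator is kept per domain label, and sample_domain is required again at predict time to route each row to its own sub-estimator.

```python
# Toy stand-ins (hypothetical, for illustration only).
class MajorityClassifier:
    def fit(self, X, y):
        self.label_ = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class ToyPerDomain:
    """Keeps one sub-estimator per distinct domain label."""

    def __init__(self, make_estimator):
        self.make_estimator = make_estimator

    def fit(self, X, y, *, sample_domain):
        self.estimators_ = {}
        for d in set(sample_domain):
            rows = [i for i, dd in enumerate(sample_domain) if dd == d]
            est = self.make_estimator()
            est.fit([X[i] for i in rows], [y[i] for i in rows])
            self.estimators_[d] = est
        return self

    def predict(self, X, *, sample_domain):
        # route each sample to the estimator of its own domain
        return [
            self.estimators_[d].predict([x])[0]
            for x, d in zip(X, sample_domain)
        ]


X_toy = [[0], [1], [2], [3]]
y_toy = [0, 0, 1, 1]
sd_toy = [1, 1, -2, -2]

clf = ToyPerDomain(MajorityClassifier).fit(X_toy, y_toy, sample_domain=sd_toy)
print(clf.predict(X_toy, sample_domain=sd_toy))  # -> [0, 0, 1, 1]
```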

Accuracy on all data: 1.0

One can also set a default selector on the whole pipeline, which makes it possible, for instance, to train the whole pipeline on source data only, as follows:

Accuracy on source: 1.0
Accuracy on target: 0.5

One can also set a default selector for the whole pipeline but override it for the last estimator. In the example below, a StandardScaler and a PCA are estimated per domain, but the final SVC is trained on source data only.

from sklearn.decomposition import PCA

from skada import SelectSource, SelectSourceTarget

pipe_perdomain = make_da_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    SelectSource(SVC()),
    default_selector=SelectSourceTarget,
)

pipe_perdomain.fit(X, y, sample_domain=sample_domain)
print(
    "Accuracy on source:", pipe_perdomain.score(Xs, ys, sample_domain=sample_domain_s)
)
print(
    "Accuracy on target:", pipe_perdomain.score(Xt, yt, sample_domain=sample_domain_t)
)
Accuracy on source: 1.0
Accuracy on target: 1.0

Total running time of the script: (0 minutes 1.697 seconds)

Gallery generated by Sphinx-Gallery

