This is a short example to get started with SKADA and perform domain adaptation on a simple dataset. It illustrates the API choices specific to domain adaptation (DA).
# Author: Remi Flamary
#
# License: BSD 3-Clause
# sphinx_gallery_thumbnail_number = 1

DA dataset
We generate a simple 2D DA dataset. Note that DA datasets provided by SKADA are organized as follows:

- X is the input data, including both the source and the target samples
- y is the output data to be predicted (labels on target samples are not used when fitting the DA estimator)
- sample_domain encodes the domain of each sample (integer >= 0 for source, < 0 for target)
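This sample_domain convention can be sketched with plain NumPy: samples with a non-negative domain label are source, negative ones are target. The helper below is illustrative only and is not part of SKADA (which provides source_target_split for this):

```python
import numpy as np

def split_by_domain(X, y, sample_domain):
    """Illustrative helper: split arrays into source (domain >= 0)
    and target (domain < 0) parts, following the SKADA convention."""
    src = sample_domain >= 0
    tgt = ~src
    return X[src], X[tgt], y[src], y[tgt]

# toy data: 3 source samples (domain 1) and 2 target samples (domain -2)
X = np.arange(10).reshape(5, 2)
y = np.array([0, 1, 0, 1, 1])
sample_domain = np.array([1, 1, 1, -2, -2])

Xs, Xt, ys, yt = split_by_domain(X, y, sample_domain)
print(Xs.shape, Xt.shape)  # (3, 2) (2, 2)
```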
```python
import numpy as np
import matplotlib.pyplot as plt

from skada import source_target_split
from skada.datasets import make_shifted_datasets

# Get DA dataset
X, y, sample_domain = make_shifted_datasets(
    20, 20, shift="concept_drift", random_state=42
)

# split source and target for visualization
Xs, Xt, ys, yt = source_target_split(X, y, sample_domain=sample_domain)
sample_domain_s = np.ones(Xs.shape[0])
sample_domain_t = -np.ones(Xt.shape[0]) * 2

# plot data
plt.figure(1, (10, 5))
plt.subplot(1, 2, 1)
plt.scatter(Xs[:, 0], Xs[:, 1], c=ys, cmap="tab10", vmax=9, label="Source")
plt.title("Source data")
ax = plt.axis()
plt.subplot(1, 2, 2)
plt.scatter(Xt[:, 0], Xt[:, 1], c=yt, cmap="tab10", vmax=9, label="Target")
plt.axis(ax)
plt.title("Target data")
```
Text(0.5, 1.0, 'Target data')

DA Classifier estimator
SKADA estimators are used like scikit-learn estimators. The only difference is that the sample_domain array must be passed by name when fitting the estimator.
Accuracy on source: 0.84375
Accuracy on target: 1.0

DA estimator in a pipeline

DA Adapter pipeline
Several SKADA estimators include a data adapter that transforms the input data so that a scikit-learn estimator can be used. For those methods, SKADA provides an Adapter class that can be used in a DA pipeline built with make_da_pipeline.
Here is an example with the CORAL and GaussianReweight adapters.
```python
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

from skada import CORALAdapter, GaussianReweightAdapter, make_da_pipeline

# Note that as illustrated below for reweighting adapters, one needs a
# subsequent estimator that takes :code:`sample_weight` as an input parameter.
# This can be done using the :code:`set_fit_request` method of the estimator
# by calling :code:`.set_fit_request(sample_weight=True)`.
# If the estimator (for pipeline or DA estimator) does not
# require sample weights, the DA pipeline will raise an error.

# create a DA pipeline with CORAL adapter
pipe = make_da_pipeline(StandardScaler(), CORALAdapter(), SVC())
pipe.fit(X, y, sample_domain=sample_domain)
print("Accuracy on target:", pipe.score(Xt, yt))

# create a DA pipeline with GaussianReweight adapter (does not work well on
# concept drift).
pipe = make_da_pipeline(
    StandardScaler(),
    GaussianReweightAdapter(),
    LogisticRegression().set_fit_request(sample_weight=True),
)
pipe.fit(X, y, sample_domain=sample_domain)
print("Accuracy on target:", pipe.score(Xt, yt))
```
Accuracy on target: 1.0
Accuracy on target: 0.5

DA estimators with score cross-validation
DA estimators are compatible with scikit-learn cross-validation functions. Note that the sample_domain array must be passed in the params dictionary of the cross_val_score function.
Entropy score: -0.02 (+-0.01)

DA estimator with grid search
DA estimators are also compatible with scikit-learn grid search functions. Note that the sample_domain array must be passed in the fit method of the grid search.
```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

from skada import CORALAdapter, make_da_pipeline
from skada.metrics import PredictionEntropyScorer
from skada.model_selection import SourceTargetShuffleSplit

reg_coral = [0.1, 0.5, 1, "auto"]

clf = make_da_pipeline(StandardScaler(), CORALAdapter(), SVC(probability=True))

# grid search
grid_search = GridSearchCV(
    estimator=clf,
    param_grid={"coraladapter__reg": reg_coral},
    cv=SourceTargetShuffleSplit(random_state=0),
    scoring=PredictionEntropyScorer(),
)

grid_search.fit(X, y, sample_domain=sample_domain)

print("Best regularization parameter:", grid_search.best_params_["coraladapter__reg"])
print("Accuracy on target:", np.mean(grid_search.predict(Xt) == yt))
```
Best regularization parameter: 0.1
Accuracy on target: 1.0

Advanced DA pipeline
The DA pipeline can be used with any estimator and any adapter. More importantly, every estimator in the pipeline is automatically wrapped in what SKADA calls a Selector. The selector is a wrapper that controls which data (source and/or target) is passed to the estimator during fit and predict/transform.
In the following example, one StandardScaler is trained per domain. Then a single SVC is trained on source data only. When predicting on target data, the pipeline automatically uses the StandardScaler trained on target and the SVC trained on source.
Accuracy on source: 1.0
Accuracy on target: 1.0
Similarly, one can use the PerDomain selector to train a different estimator per domain, which makes it possible to handle multiple source and target domains. In this case sample_domain must be provided to fit and predict/transform.
Accuracy on all data: 1.0
One can use a default selector on the whole pipeline, which allows, for instance, training the whole pipeline on the source data only as follows:
Accuracy on source: 1.0
Accuracy on target: 0.5
One can also use a default selector on the whole pipeline but overwrite it for the last estimator. In the example below, a StandardScaler and a PCA are estimated per domain, but the final SVC is trained on source data only.
```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

from skada import SelectSource, SelectSourceTarget, make_da_pipeline

pipe_perdomain = make_da_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    SelectSource(SVC()),
    default_selector=SelectSourceTarget,
)
pipe_perdomain.fit(X, y, sample_domain=sample_domain)
print(
    "Accuracy on source:", pipe_perdomain.score(Xs, ys, sample_domain=sample_domain_s)
)
print(
    "Accuracy on target:", pipe_perdomain.score(Xt, yt, sample_domain=sample_domain_t)
)
```
Accuracy on source: 1.0
Accuracy on target: 1.0
Total running time of the script: (0 minutes 1.697 seconds)
Gallery generated by Sphinx-Gallery