Validation curve.
Determine training and test scores for varying parameter values.
Compute scores for an estimator with different values of a specified parameter. This is similar to grid search with one parameter. However, this will also compute training scores and is merely a utility for plotting the results.
Read more in the User Guide.
An object of that type which is cloned for each validation. It must also implement “predict” unless scoring is a callable that doesn’t rely on “predict” to compute a score.
Training vector, where n_samples is the number of samples and n_features is the number of features.
Target relative to X for classification or regression; None for unsupervised learning.
Name of the parameter that will be varied.
The values of the parameter that will be evaluated.
Group labels for the samples used while splitting the dataset into train/test set. Only used in conjunction with a “Group” cv instance (e.g., GroupKFold).
Changed in version 1.6: groups can only be passed if metadata routing is not enabled via sklearn.set_config(enable_metadata_routing=True). When routing is enabled, pass groups alongside other metadata via the params argument instead. E.g.: validation_curve(..., params={'groups': groups}).
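As a minimal sketch of the groups parameter, the snippet below pairs it with GroupKFold while metadata routing is left at its default (disabled). The dataset, the grouping into six blocks of 20 samples, and the three values of C are illustrative assumptions, not part of the reference.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, validation_curve

X, y = make_classification(n_samples=120, random_state=0)
# Hypothetical grouping: 6 groups of 20 consecutive samples each.
groups = np.repeat(np.arange(6), 20)

param_range = np.logspace(-2, 2, 3)
train_scores, test_scores = validation_curve(
    LogisticRegression(),
    X,
    y,
    param_name="C",
    param_range=param_range,
    groups=groups,              # only consulted by "Group" splitters
    cv=GroupKFold(n_splits=3),  # keeps each group within a single fold
)
print(train_scores.shape, test_scores.shape)
```

Each group ends up entirely in either the training or the test part of a split, which is the point of passing groups.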
Determines the cross-validation splitting strategy. Possible inputs for cv are:
None, to use the default 5-fold cross validation,
int, to specify the number of folds in a (Stratified)KFold,
An iterable yielding (train, test) splits as arrays of indices.
For int/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used. These splitters are instantiated with shuffle=False so the splits will be the same across calls.
Refer to the User Guide for the various cross-validation strategies that can be used here.
Changed in version 0.22: cv default value if None changed from 3-fold to 5-fold.
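The accepted forms of cv can be compared side by side. The following sketch assumes a small synthetic classification problem and three illustrative values of C. It passes an int, an explicit splitter object, and an iterable of precomputed (train, test) index arrays.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, validation_curve

X, y = make_classification(n_samples=200, random_state=0)
param_range = [0.01, 1.0, 100.0]  # illustrative values of C

# int: for a classifier with a binary target this selects an unshuffled
# StratifiedKFold with 3 folds.
_, test_int = validation_curve(
    LogisticRegression(), X, y, param_name="C", param_range=param_range, cv=3
)

# Explicit splitter: shuffling is opt-in and seeded for reproducibility.
splitter = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
_, test_splitter = validation_curve(
    LogisticRegression(), X, y, param_name="C", param_range=param_range, cv=splitter
)

# Iterable of (train, test) index arrays, here precomputed from the splitter.
splits = list(splitter.split(X, y))
_, test_iterable = validation_curve(
    LogisticRegression(), X, y, param_name="C", param_range=param_range, cv=splits
)

print(test_int.shape, test_splitter.shape, test_iterable.shape)
```

Because the precomputed splits are the ones the seeded splitter produces, the last two calls evaluate the same folds.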
Scoring method to use to evaluate the training and test sets.
str: see String name scorers for options.
callable: a scorer callable object (e.g., function) with signature scorer(estimator, X, y). See Callable scorers for details.
None: the estimator’s default evaluation criterion is used.
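To illustrate the str and callable forms of scoring, the sketch below scores the same folds once with the string name "balanced_accuracy" and once with a hand-written callable. The helper name balanced_accuracy_scorer and the dataset are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import validation_curve

X, y = make_classification(n_samples=200, random_state=0)
param_range = [0.01, 1.0, 100.0]  # illustrative values of C


def balanced_accuracy_scorer(estimator, X, y):
    """Hypothetical scorer with the scorer(estimator, X, y) signature."""
    return balanced_accuracy_score(y, estimator.predict(X))


# String name of a predefined scorer.
_, test_named = validation_curve(
    LogisticRegression(), X, y, param_name="C", param_range=param_range,
    scoring="balanced_accuracy", cv=3,
)

# Callable scorer computing the same metric.
_, test_callable = validation_curve(
    LogisticRegression(), X, y, param_name="C", param_range=param_range,
    scoring=balanced_accuracy_scorer, cv=3,
)

print(np.allclose(test_named, test_callable))
```

Both calls use the same deterministic splits, so the two score arrays are expected to agree.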
Number of jobs to run in parallel. Training the estimator and computing the score are parallelized over the combinations of each parameter value and each cross-validation split. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
Number of predispatched jobs for parallel execution (default is all). The option can reduce the allocated memory. The str can be an expression like ‘2*n_jobs’.
Controls the verbosity: the higher, the more messages.
Value to assign to the score if an error occurs in estimator fitting. If set to ‘raise’, the error is raised. If a numeric value is given, FitFailedWarning is raised.
Added in version 0.20.
Parameters to pass to the fit method of the estimator.
Deprecated since version 1.6: This parameter is deprecated and will be removed in version 1.8. Use params instead.
Parameters to pass to the estimator, scorer and cross-validation object.
If enable_metadata_routing=False (default): Parameters directly passed to the fit method of the estimator.
If enable_metadata_routing=True: Parameters safely routed to the fit method of the estimator, to the scorer and to the cross-validation object. See Metadata Routing User Guide for more details.
Added in version 1.6.
Scores on training sets.
Scores on test set.
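Both returned arrays have shape (n_ticks, n_cv_folds): one row per parameter value and one column per cross-validation split. A common way to summarize them, sketched here with an illustrative dataset and range of C, is to average over the fold axis and pick the parameter value with the highest mean test score.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import validation_curve

X, y = make_classification(n_samples=200, random_state=0)
param_range = np.logspace(-3, 2, 6)  # illustrative values of C

train_scores, test_scores = validation_curve(
    LogisticRegression(), X, y, param_name="C", param_range=param_range, cv=5
)

# Aggregate over the fold axis (axis=1) to get one value per parameter value.
train_mean = train_scores.mean(axis=1)
test_mean = test_scores.mean(axis=1)
test_std = test_scores.std(axis=1)

# Parameter value with the highest mean test score.
best_C = param_range[int(np.argmax(test_mean))]

for C, tr, te, sd in zip(param_range, train_mean, test_mean, test_std):
    print(f"C={C:g}: train={tr:.3f} test={te:.3f} (+/- {sd:.3f})")
print(f"best C by mean test score: {best_C:g}")
```

The per-parameter means and standard deviations are the quantities usually plotted against param_range to draw the validation curve.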
Notes
See Effect of model regularization on training and test error
Examples
>>> import numpy as np
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import validation_curve
>>> from sklearn.linear_model import LogisticRegression
>>> X, y = make_classification(n_samples=1_000, random_state=0)
>>> logistic_regression = LogisticRegression()
>>> param_name, param_range = "C", np.logspace(-8, 3, 10)
>>> train_scores, test_scores = validation_curve(
...     logistic_regression, X, y, param_name=param_name, param_range=param_range
... )
>>> print(f"The average train accuracy is {train_scores.mean():.2f}")
The average train accuracy is 0.81
>>> print(f"The average test accuracy is {test_scores.mean():.2f}")
The average test accuracy is 0.81