This is the class and function reference of scikit-learn. Please refer to the full user guide for further details, as the class and function raw specifications may not be enough to give full guidelines on their uses.
sklearn.base: Base classes and utility functions
Base classes for all estimators.
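As a rough illustration of what base.clone does, the sketch below fits an estimator and then clones it. Ridge and the toy data are illustrative choices, not part of this reference; the point is only that the clone carries the same parameters but none of the fitted state.

```python
from sklearn.base import clone
from sklearn.linear_model import Ridge

# Fit an estimator so that it carries learned state (coef_).
original = Ridge(alpha=0.5)
original.fit([[0.0], [1.0], [2.0]], [0.0, 1.0, 2.0])

# clone() builds a new, unfitted estimator with the same parameters.
fresh = clone(original)

same_params = fresh.get_params() == original.get_params()
is_fitted = hasattr(fresh, "coef_")
print(same_params, is_fitted)
```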
Functions
base.clone(estimator[, safe]): Constructs a new estimator with the same parameters.
sklearn.cluster: Clustering
The sklearn.cluster module gathers popular unsupervised clustering algorithms.
User guide: See the Clustering section for further details.
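A minimal sketch of a typical clustering call, assuming synthetic data from datasets.make_blobs and an explicit n_init value (both are illustrative assumptions):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Three well-separated synthetic blobs in two dimensions.
X, y_true = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=0)

# Fit K-Means and assign every sample to one of the three clusters.
model = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = model.fit_predict(X)

print(model.cluster_centers_.shape, labels[:10])
```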
Classes
cluster.AffinityPropagation([damping, ...]): Perform Affinity Propagation Clustering of data.
cluster.AgglomerativeClustering([...]): Agglomerative Clustering
cluster.Birch([threshold, branching_factor, ...]): Implements the Birch clustering algorithm.
cluster.DBSCAN([eps, min_samples, metric, ...]): Perform DBSCAN clustering from vector array or distance matrix.
cluster.FeatureAgglomeration([n_clusters, ...]): Agglomerate features.
cluster.KMeans([n_clusters, init, n_init, ...]): K-Means clustering
cluster.MiniBatchKMeans([n_clusters, init, ...]): Mini-Batch K-Means clustering
cluster.MeanShift([bandwidth, seeds, ...]): Mean shift clustering using a flat kernel.
cluster.SpectralClustering([n_clusters, ...]): Apply clustering to a projection to the normalized laplacian.
Functions
cluster.estimate_bandwidth(X[, quantile, ...]): Estimate the bandwidth to use with the mean-shift algorithm.
cluster.k_means(X, n_clusters[, init, ...]): K-means clustering algorithm.
cluster.ward_tree(X[, connectivity, ...]): Ward clustering based on a Feature matrix.
cluster.affinity_propagation(S[, ...]): Perform Affinity Propagation Clustering of data
cluster.dbscan(X[, eps, min_samples, ...]): Perform DBSCAN clustering from vector array or distance matrix.
cluster.mean_shift(X[, bandwidth, seeds, ...]): Perform mean shift clustering of data using a flat kernel.
cluster.spectral_clustering(affinity[, ...]): Apply clustering to a projection to the normalized laplacian.
sklearn.covariance: Covariance Estimators
The sklearn.covariance module includes methods and algorithms to robustly estimate the covariance of features given a set of points. The precision matrix defined as the inverse of the covariance is also estimated. Covariance estimation is closely related to the theory of Gaussian Graphical Models.
User guide: See the Covariance estimation section for further details.
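The relationship between the covariance and the precision matrix can be checked numerically. The sketch below assumes the EmpiricalCovariance estimator (a plain maximum-likelihood estimator, not a robust one) and synthetic Gaussian data; both are illustrative choices.

```python
import numpy as np
from sklearn.covariance import EmpiricalCovariance

# Draw samples from a 2-D Gaussian with a known covariance.
rng = np.random.RandomState(0)
X = rng.multivariate_normal(mean=[0, 0], cov=[[2.0, 0.6], [0.6, 1.0]], size=500)

est = EmpiricalCovariance().fit(X)

# The precision matrix is the inverse of the covariance,
# so their product should be close to the identity.
product = est.covariance_ @ est.precision_
print(np.round(product, 6))
```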
sklearn.datasets: Datasets
The sklearn.datasets module includes utilities to load datasets, including methods to load and fetch popular reference datasets. It also features some artificial data generators.
User guide: See the Dataset loading utilities section for further details.
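A short sketch of the two families of utilities, one bundled loader and one artificial generator. The sample sizes and the random_state value are illustrative assumptions.

```python
from sklearn.datasets import load_iris, make_classification

# Loader: a small reference dataset shipped with the library.
iris = load_iris()
print(iris.data.shape, iris.target.shape)

# Generator: a synthetic binary classification problem.
X_syn, y_syn = make_classification(n_samples=100, n_features=20, random_state=0)
print(X_syn.shape, y_syn.shape)
```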
Loaders
datasets.clear_data_home([data_home]): Delete all the content of the data home cache.
datasets.get_data_home([data_home]): Return the path of the scikit-learn data dir.
datasets.fetch_20newsgroups([data_home, ...]): Load the filenames and data from the 20 newsgroups dataset.
datasets.fetch_20newsgroups_vectorized([...]): Load the 20 newsgroups dataset and transform it into tf-idf vectors.
datasets.load_boston(): Load and return the boston house-prices dataset (regression).
datasets.load_diabetes(): Load and return the diabetes dataset (regression).
datasets.load_digits([n_class]): Load and return the digits dataset (classification).
datasets.load_files(container_path[, ...]): Load text files with categories as subfolder names.
datasets.load_iris(): Load and return the iris dataset (classification).
datasets.fetch_lfw_pairs([subset, ...]): Loader for the Labeled Faces in the Wild (LFW) pairs dataset
datasets.fetch_lfw_people([data_home, ...]): Loader for the Labeled Faces in the Wild (LFW) people dataset
datasets.load_linnerud(): Load and return the linnerud dataset (multivariate regression).
datasets.mldata_filename(dataname): Convert a raw name for a data set to an mldata.org filename.
datasets.fetch_mldata(dataname[, ...]): Fetch an mldata.org data set
datasets.fetch_olivetti_faces([data_home, ...]): Loader for the Olivetti faces data-set from AT&T.
datasets.fetch_california_housing([...]): Loader for the California housing dataset from StatLib.
datasets.fetch_covtype([data_home, ...]): Load the covertype dataset, downloading it if necessary.
datasets.load_mlcomp(name_or_id[, set_, ...]): Load a dataset as downloaded from http://mlcomp.org
datasets.load_sample_image(image_name): Load the numpy array of a single sample image
datasets.load_sample_images(): Load sample images for image manipulation.
datasets.load_svmlight_file(f[, n_features, ...]): Load datasets in the svmlight / libsvm format into sparse CSR matrix
datasets.load_svmlight_files(files[, ...]): Load dataset from multiple files in SVMlight format
datasets.dump_svmlight_file(X, y, f[, ...]): Dump the dataset in svmlight / libsvm file format.
Samples generator
datasets.make_blobs([n_samples, n_features, ...]): Generate isotropic Gaussian blobs for clustering.
datasets.make_classification([n_samples, ...]): Generate a random n-class classification problem.
datasets.make_circles([n_samples, shuffle, ...]): Make a large circle containing a smaller circle in 2d.
datasets.make_friedman1([n_samples, ...]): Generate the “Friedman #1” regression problem
datasets.make_friedman2([n_samples, noise, ...]): Generate the “Friedman #2” regression problem
datasets.make_friedman3([n_samples, noise, ...]): Generate the “Friedman #3” regression problem
datasets.make_gaussian_quantiles([mean, ...]): Generate isotropic Gaussian and label samples by quantile
datasets.make_hastie_10_2([n_samples, ...]): Generates data for binary classification used in Hastie et al.
datasets.make_low_rank_matrix([n_samples, ...]): Generate a mostly low rank matrix with bell-shaped singular values
datasets.make_moons([n_samples, shuffle, ...]): Make two interleaving half circles
datasets.make_multilabel_classification([...]): Generate a random multilabel classification problem.
datasets.make_regression([n_samples, ...]): Generate a random regression problem.
datasets.make_s_curve([n_samples, noise, ...]): Generate an S curve dataset.
datasets.make_sparse_coded_signal(n_samples, ...): Generate a signal as a sparse combination of dictionary elements.
datasets.make_sparse_spd_matrix([dim, ...]): Generate a sparse symmetric definite positive matrix.
datasets.make_sparse_uncorrelated([...]): Generate a random regression problem with sparse uncorrelated design
datasets.make_spd_matrix(n_dim[, random_state]): Generate a random symmetric, positive-definite matrix.
datasets.make_swiss_roll([n_samples, noise, ...]): Generate a swiss roll dataset.
datasets.make_biclusters(shape, n_clusters): Generate an array with constant block diagonal structure for biclustering.
datasets.make_checkerboard(shape, n_clusters): Generate an array with block checkerboard structure for biclustering.
sklearn.decomposition: Matrix Decomposition
The sklearn.decomposition module includes matrix decomposition algorithms, including among others PCA, NMF or ICA. Most of the algorithms of this module can be regarded as dimensionality reduction techniques.
User guide: See the Decomposing signals in components (matrix factorization problems) section for further details.
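To make the dimensionality reduction reading concrete, here is a minimal sketch that projects the iris data onto two principal components. The dataset and the number of components are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data  # 150 samples, 4 features

# Keep the two directions of largest variance.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

# Fraction of the total variance retained by the two components.
retained = pca.explained_variance_ratio_.sum()
print(X_2d.shape, round(float(retained), 3))
```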
decomposition.PCA([n_components, copy, whiten]): Principal component analysis (PCA)
decomposition.IncrementalPCA([n_components, ...]): Incremental principal components analysis (IPCA).
decomposition.ProjectedGradientNMF([...]): Non-Negative matrix factorization by Projected Gradient (NMF)
decomposition.RandomizedPCA([n_components, ...]): Principal component analysis (PCA) using randomized SVD
decomposition.KernelPCA([n_components, ...]): Kernel Principal component analysis (KPCA)
decomposition.FactorAnalysis([n_components, ...]): Factor Analysis (FA)
decomposition.FastICA([n_components, ...]): FastICA: a fast algorithm for Independent Component Analysis.
decomposition.TruncatedSVD([n_components, ...]): Dimensionality reduction using truncated SVD (aka LSA).
decomposition.NMF([n_components, init, ...]): Non-Negative matrix factorization by Projected Gradient (NMF)
decomposition.SparsePCA([n_components, ...]): Sparse Principal Components Analysis (SparsePCA)
decomposition.MiniBatchSparsePCA([...]): Mini-batch Sparse Principal Components Analysis
decomposition.SparseCoder(dictionary[, ...]): Sparse coding
decomposition.DictionaryLearning([...]): Dictionary learning
decomposition.MiniBatchDictionaryLearning([...]): Mini-batch dictionary learning
sklearn.ensemble: Ensemble Methods
The sklearn.ensemble module includes ensemble-based methods for classification and regression.
User guide: See the Ensemble methods section for further details.
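A hedged sketch of an ensemble classifier in use. RandomForestClassifier is not listed in this section of the reference and is used here as an assumed representative of the module; the synthetic data and the number of trees are likewise illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# An ensemble of 25 decision trees, each trained on a bootstrap sample.
forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

predictions = forest.predict(X)
print(len(forest.estimators_), predictions.shape)
```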
partial dependence
Partial dependence plots for tree ensembles.
sklearn.linear_model: Generalized Linear Models
The sklearn.linear_model module implements generalized linear models. It includes Ridge regression, Bayesian Regression, Lasso and Elastic Net estimators computed with Least Angle Regression and coordinate descent. It also implements Stochastic Gradient Descent related algorithms.
User guide: See the Generalized Linear Models section for further details.
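The difference between the L2 penalty of Ridge and the L1 penalty of Lasso can be seen by counting exactly-zero coefficients. This is a minimal sketch on synthetic data; the alpha values and the dataset shape are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Ten features, of which only three actually influence the target.
X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=1.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty: shrinks coefficients
lasso = Lasso(alpha=1.0).fit(X, y)  # L1 penalty: can set them exactly to zero

n_zero_ridge = int(np.sum(ridge.coef_ == 0))
n_zero_lasso = int(np.sum(lasso.coef_ == 0))
print(n_zero_ridge, n_zero_lasso)
```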
linear_model.ARDRegression([n_iter, tol, ...]): Bayesian ARD regression.
linear_model.BayesianRidge([n_iter, tol, ...]): Bayesian ridge regression
linear_model.ElasticNet([alpha, l1_ratio, ...]): Linear regression with combined L1 and L2 priors as regularizer.
linear_model.ElasticNetCV([l1_ratio, eps, ...]): Elastic Net model with iterative fitting along a regularization path
linear_model.Lars([fit_intercept, verbose, ...]): Least Angle Regression model a.k.a.
linear_model.LarsCV([fit_intercept, ...]): Cross-validated Least Angle Regression model
linear_model.Lasso([alpha, fit_intercept, ...]): Linear Model trained with L1 prior as regularizer (aka the Lasso)
linear_model.LassoCV([eps, n_alphas, ...]): Lasso linear model with iterative fitting along a regularization path
linear_model.LassoLars([alpha, ...]): Lasso model fit with Least Angle Regression a.k.a.
linear_model.LassoLarsCV([fit_intercept, ...]): Cross-validated Lasso, using the LARS algorithm
linear_model.LassoLarsIC([criterion, ...]): Lasso model fit with Lars using BIC or AIC for model selection
linear_model.LinearRegression([...]): Ordinary least squares Linear Regression.
linear_model.LogisticRegression([penalty, ...]): Logistic Regression (aka logit, MaxEnt) classifier.
linear_model.LogisticRegressionCV([Cs, ...]): Logistic Regression CV (aka logit, MaxEnt) classifier.
linear_model.MultiTaskLasso([alpha, ...]): Multi-task Lasso model trained with L1/L2 mixed-norm as regularizer
linear_model.MultiTaskElasticNet([alpha, ...]): Multi-task ElasticNet model trained with L1/L2 mixed-norm as regularizer
linear_model.MultiTaskLassoCV([eps, ...]): Multi-task L1/L2 Lasso with built-in cross-validation.
linear_model.MultiTaskElasticNetCV([...]): Multi-task L1/L2 ElasticNet with built-in cross-validation.
linear_model.OrthogonalMatchingPursuit([...]): Orthogonal Matching Pursuit model (OMP)
linear_model.OrthogonalMatchingPursuitCV([...]): Cross-validated Orthogonal Matching Pursuit model (OMP)
linear_model.PassiveAggressiveClassifier([...]): Passive Aggressive Classifier
linear_model.PassiveAggressiveRegressor([C, ...]): Passive Aggressive Regressor
linear_model.Perceptron([penalty, alpha, ...]): Perceptron
linear_model.RandomizedLasso([alpha, ...]): Randomized Lasso.
linear_model.RandomizedLogisticRegression([...]): Randomized Logistic Regression
linear_model.RANSACRegressor([...]): RANSAC (RANdom SAmple Consensus) algorithm.
linear_model.Ridge([alpha, fit_intercept, ...]): Linear least squares with l2 regularization.
linear_model.RidgeClassifier([alpha, ...]): Classifier using Ridge regression.
linear_model.RidgeClassifierCV([alphas, ...]): Ridge classifier with built-in cross-validation.
linear_model.RidgeCV([alphas, ...]): Ridge regression with built-in cross-validation.
linear_model.SGDClassifier([loss, penalty, ...]): Linear classifiers (SVM, logistic regression, a.o.) with SGD training.
linear_model.SGDRegressor([loss, penalty, ...]): Linear model fitted by minimizing a regularized empirical loss with SGD
linear_model.TheilSenRegressor([...]): Theil-Sen Estimator: robust multivariate regression model.
linear_model.lars_path(X, y[, Xy, Gram, ...]): Compute Least Angle Regression or Lasso path using LARS algorithm [1]
linear_model.lasso_path(X, y[, eps, ...]): Compute Lasso path with coordinate descent
linear_model.lasso_stability_path(X, y[, ...]): Stability path based on randomized Lasso estimates
linear_model.orthogonal_mp(X, y[, ...]): Orthogonal Matching Pursuit (OMP)
linear_model.orthogonal_mp_gram(Gram, Xy[, ...]): Gram Orthogonal Matching Pursuit (OMP)
sklearn.metrics: Metrics
See the Model evaluation: quantifying the quality of predictions section and the Pairwise metrics, Affinities and Kernels section of the user guide for further details.
The sklearn.metrics module includes score functions, performance metrics and pairwise metrics and distance computations.
See the Classification metrics section of the user guide for further details.
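As a small worked example of two classification metrics, the sketch below scores six hand-written predictions. The label vectors are made up for illustration.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [0, 1, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1]  # one class-1 sample is misclassified as 0

# Fraction of predictions that match the true labels (5 of 6 here).
acc = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
print(acc)
print(cm)
```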
metrics.accuracy_score(y_true, y_pred[, ...]): Accuracy classification score.
metrics.auc(x, y[, reorder]): Compute Area Under the Curve (AUC) using the trapezoidal rule
metrics.average_precision_score(y_true, y_score): Compute average precision (AP) from prediction scores
metrics.brier_score_loss(y_true, y_prob[, ...]): Compute the Brier score.
metrics.classification_report(y_true, y_pred): Build a text report showing the main classification metrics
metrics.confusion_matrix(y_true, y_pred[, ...]): Compute confusion matrix to evaluate the accuracy of a classification
metrics.f1_score(y_true, y_pred[, labels, ...]): Compute the F1 score, also known as balanced F-score or F-measure
metrics.fbeta_score(y_true, y_pred, beta[, ...]): Compute the F-beta score
metrics.hamming_loss(y_true, y_pred[, classes]): Compute the average Hamming loss.
metrics.hinge_loss(y_true, pred_decision[, ...]): Average hinge loss (non-regularized)
metrics.jaccard_similarity_score(y_true, y_pred): Jaccard similarity coefficient score
metrics.log_loss(y_true, y_pred[, eps, ...]): Log loss, aka logistic loss or cross-entropy loss.
metrics.matthews_corrcoef(y_true, y_pred): Compute the Matthews correlation coefficient (MCC) for binary classes
metrics.precision_recall_curve(y_true, ...): Compute precision-recall pairs for different probability thresholds
metrics.precision_recall_fscore_support(...): Compute precision, recall, F-measure and support for each class
metrics.precision_score(y_true, y_pred[, ...]): Compute the precision
metrics.recall_score(y_true, y_pred[, ...]): Compute the recall
metrics.roc_auc_score(y_true, y_score[, ...]): Compute Area Under the Curve (AUC) from prediction scores
metrics.roc_curve(y_true, y_score[, ...]): Compute Receiver operating characteristic (ROC)
metrics.zero_one_loss(y_true, y_pred[, ...]): Zero-one classification loss.
Clustering metrics
See the Clustering performance evaluation section of the user guide for further details.
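The sklearn.metrics.cluster submodule contains evaluation metrics for cluster analysis results. There are two forms of evaluation: supervised, which uses ground truth class values for each sample, and unsupervised, which does not and instead measures the quality of the model itself.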
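The sketch below contrasts one metric that needs ground truth labels with one that does not. adjusted_rand_score and silhouette_score are assumed here as representatives of the two kinds; the data and the K-Means settings are illustrative.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score

X, y_true = make_blobs(n_samples=120, centers=3, cluster_std=0.4, random_state=1)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Needs ground truth: compares the clustering with the known classes.
ari = adjusted_rand_score(y_true, labels)

# Needs no ground truth: uses only the data and the assigned labels.
sil = silhouette_score(X, labels)

# Label names do not matter for the adjusted Rand index, only the grouping.
ari_permuted = adjusted_rand_score([0, 0, 1, 1], [1, 1, 0, 0])
print(ari, sil, ari_permuted)
```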
See the Pairwise metrics, Affinities and Kernels section of the user guide for further details.
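A minimal sketch of a pairwise distance and a pairwise kernel on two points whose Euclidean distance is 5. The points and the gamma value are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances, rbf_kernel

X = np.array([[0.0, 0.0], [3.0, 4.0]])

# Distance matrix between all pairs of rows of X.
D = euclidean_distances(X)

# RBF kernel: exp(-gamma * squared Euclidean distance).
K = rbf_kernel(X, gamma=0.1)
print(D)
print(K)
```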
metrics.pairwise.additive_chi2_kernel(X[, Y]): Computes the additive chi-squared kernel between observations in X and Y
metrics.pairwise.chi2_kernel(X[, Y, gamma]): Computes the exponential chi-squared kernel between X and Y.
metrics.pairwise.distance_metrics(): Valid metrics for pairwise_distances.
metrics.pairwise.euclidean_distances(X[, Y, ...]): Considering the rows of X (and Y=X) as vectors, compute the distance matrix between each pair of vectors.
metrics.pairwise.kernel_metrics(): Valid metrics for pairwise_kernels
metrics.pairwise.linear_kernel(X[, Y]): Compute the linear kernel between X and Y.
metrics.pairwise.manhattan_distances(X[, Y, ...]): Compute the L1 distances between the vectors in X and Y.
metrics.pairwise.pairwise_distances(X[, Y, ...]): Compute the distance matrix from a vector array X and optional Y.
metrics.pairwise.pairwise_kernels(X[, Y, ...]): Compute the kernel between arrays X and optional array Y.
metrics.pairwise.polynomial_kernel(X[, Y, ...]): Compute the polynomial kernel between X and Y.
metrics.pairwise.rbf_kernel(X[, Y, gamma]): Compute the rbf (gaussian) kernel between X and Y.
metrics.pairwise_distances(X[, Y, metric, ...]): Compute the distance matrix from a vector array X and optional Y.
metrics.pairwise_distances_argmin(X, Y[, ...]): Compute minimum distances between one point and a set of points.
metrics.pairwise_distances_argmin_min(X, Y): Compute minimum distances between one point and a set of points.
sklearn.multiclass: Multiclass and multilabel classification
Multiclass and multilabel classification strategies
The estimators provided in this module are meta-estimators: they require a base estimator to be provided in their constructor. For example, it is possible to use these estimators to turn a binary classifier or a regressor into a multiclass classifier. It is also possible to use these estimators with multiclass estimators in the hope that their accuracy or runtime performance improves.
All classifiers in scikit-learn implement multiclass classification; you only need to use this module if you want to experiment with custom multiclass strategies.
The one-vs-the-rest meta-classifier also implements a predict_proba method, so long as such a method is implemented by the base classifier. This method returns probabilities of class membership in both the single label and multilabel case. Note that in the multilabel case, each probability is the marginal probability that a given sample falls in the given class. As such, in the multilabel case these probabilities over all possible labels for a given sample will not sum to unity, as they do in the single label case.
User guide: See the Multiclass and multilabel algorithms section for further details.
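The behaviour of predict_proba in the single label and multilabel cases can be sketched as follows. LogisticRegression as the base classifier, its max_iter value, and the synthetic multilabel data are all illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris, make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Single label case: each row of probabilities sums to one.
iris = load_iris()
single = OneVsRestClassifier(LogisticRegression(max_iter=1000))
single.fit(iris.data, iris.target)
single_sums = single.predict_proba(iris.data).sum(axis=1)

# Multilabel case: Y_ml is a binary indicator matrix, one column per label.
X_ml, Y_ml = make_multilabel_classification(
    n_samples=100, n_features=10, n_classes=3, random_state=0)
multi = OneVsRestClassifier(LogisticRegression(max_iter=1000))
multi.fit(X_ml, Y_ml)

# Each column is a marginal probability, so rows need not sum to one.
multi_proba = multi.predict_proba(X_ml)
multi_sums = multi_proba.sum(axis=1)
print(single_sums[:3], multi_sums[:3])
```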
sklearn.naive_bayes: Naive Bayes
The sklearn.naive_bayes module implements Naive Bayes algorithms. These are supervised learning methods based on applying Bayes’ theorem with strong (naive) feature independence assumptions.
User guide: See the Naive Bayes section for further details.
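A minimal sketch of a Naive Bayes classifier, assuming the GaussianNB variant (not listed in this section) and the iris data as an illustrative dataset:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

iris = load_iris()

# Each feature is modelled as an independent Gaussian within each class.
model = GaussianNB().fit(iris.data, iris.target)

# Posterior class probabilities obtained through Bayes' theorem.
proba = model.predict_proba(iris.data)
train_accuracy = model.score(iris.data, iris.target)
print(proba[0], train_accuracy)
```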
sklearn.preprocessing: Preprocessing and Normalization
The sklearn.preprocessing module includes scaling, centering, normalization, binarization and imputation methods.
User guide: See the Preprocessing data section for further details.
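As an illustration of centering and scaling, the sketch below standardizes a tiny made-up matrix and checks the resulting column statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# Remove each column's mean and scale it to unit variance.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

col_means = X_scaled.mean(axis=0)
col_stds = X_scaled.std(axis=0)
print(col_means, col_stds)
```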
preprocessing.Binarizer([threshold, copy]): Binarize data (set feature values to 0 or 1) according to a threshold
preprocessing.Imputer([missing_values, ...]): Imputation transformer for completing missing values.
preprocessing.KernelCenterer: Center a kernel matrix
preprocessing.LabelBinarizer([neg_label, ...]): Binarize labels in a one-vs-all fashion
preprocessing.LabelEncoder: Encode labels with value between 0 and n_classes-1.
preprocessing.MultiLabelBinarizer([classes, ...]): Transform between iterable of iterables and a multilabel format
preprocessing.MinMaxScaler([feature_range, copy]): Standardizes features by scaling each feature to a given range.
preprocessing.Normalizer([norm, copy]): Normalize samples individually to unit norm.
preprocessing.OneHotEncoder([n_values, ...]): Encode categorical integer features using a one-hot aka one-of-K scheme.
preprocessing.PolynomialFeatures([degree, ...]): Generate polynomial and interaction features.
preprocessing.RobustScaler([with_centering, ...]): Scale features using statistics that are robust to outliers.
preprocessing.StandardScaler([copy, ...]): Standardize features by removing the mean and scaling to unit variance
preprocessing.add_dummy_feature(X[, value]): Augment dataset with an additional dummy feature.
preprocessing.binarize(X[, threshold, copy]): Boolean thresholding of array-like or scipy.sparse matrix
preprocessing.label_binarize(y, classes[, ...]): Binarize labels in a one-vs-all fashion
preprocessing.normalize(X[, norm, axis, copy]): Scale input vectors individually to unit norm (vector length).
preprocessing.scale(X[, axis, with_mean, ...]): Standardize a dataset along any axis
sklearn.random_projection: Random projection
Random Projection transformers
Random Projections are a simple and computationally efficient way to reduce the dimensionality of the data by trading a controlled amount of accuracy (as additional variance) for faster processing times and smaller model sizes.
The dimensions and distribution of Random Projections matrices are controlled so as to preserve the pairwise distances between any two samples of the dataset.
The main theoretical result behind the efficiency of random projection is the Johnson-Lindenstrauss lemma (quoting Wikipedia):
In mathematics, the Johnson-Lindenstrauss lemma is a result concerning low-distortion embeddings of points from high-dimensional into low-dimensional Euclidean space. The lemma states that a small set of points in a high-dimensional space can be embedded into a space of much lower dimension in such a way that distances between the points are nearly preserved. The map used for the embedding is at least Lipschitz, and can even be taken to be an orthogonal projection.
User guide: See the Random Projection section for further details.
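The link between the Johnson-Lindenstrauss lemma and the size of the projected space can be sketched as follows. The helper johnson_lindenstrauss_min_dim and the GaussianRandomProjection transformer are assumed from this module; the data shape and the eps value are illustrative.

```python
import numpy as np
from sklearn.random_projection import (
    GaussianRandomProjection,
    johnson_lindenstrauss_min_dim,
)

n_samples, n_features = 100, 2000
X = np.random.RandomState(0).rand(n_samples, n_features)

# Number of components that is sufficient, by the lemma, to keep
# pairwise distances within a relative distortion of eps.
k = int(johnson_lindenstrauss_min_dim(n_samples=n_samples, eps=0.5))

# Project the data onto k random Gaussian directions.
projector = GaussianRandomProjection(n_components=k, random_state=0)
X_small = projector.fit_transform(X)
print(k, X_small.shape)
```

Note that the bound depends on the number of samples and on eps, not on the original number of features.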
sklearn.semi_supervised: Semi-Supervised Learning
The sklearn.semi_supervised module implements semi-supervised learning algorithms. These algorithms utilize small amounts of labeled data and large amounts of unlabeled data for classification tasks. This module includes Label Propagation.
User guide: See the Semi-Supervised section for further details.
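A hedged sketch of Label Propagation with mostly unlabeled data. Marking unlabeled samples with -1 follows the library's convention; the knn kernel, the neighbour count, and the 70% masking rate are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelPropagation

iris = load_iris()
X, y = iris.data, iris.target

# Hide roughly 70% of the labels; -1 marks a sample as unlabeled.
rng = np.random.RandomState(0)
y_partial = y.copy()
y_partial[rng.rand(len(y)) < 0.7] = -1

model = LabelPropagation(kernel="knn", n_neighbors=7).fit(X, y_partial)

# transduction_ holds the label inferred for every training sample.
inferred = model.transduction_
print(int((y_partial == -1).sum()), inferred[:10])
```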
sklearn.svm: Support Vector Machines
The sklearn.svm module includes Support Vector Machine algorithms.
User guide: See the Support Vector Machines section for further details.
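A minimal sketch of a support vector classifier, assuming the iris data and an RBF kernel with C=1.0 as illustrative settings:

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()

# C-Support Vector Classification with an RBF kernel.
model = SVC(kernel="rbf", C=1.0).fit(iris.data, iris.target)

# Only the support vectors are kept to define the decision function.
n_support_vectors = model.support_vectors_.shape[0]
train_accuracy = model.score(iris.data, iris.target)
print(n_support_vectors, train_accuracy)
```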
Estimators
svm.SVC([C, kernel, degree, gamma, coef0, ...]): C-Support Vector Classification.
svm.LinearSVC([penalty, loss, dual, tol, C, ...]): Linear Support Vector Classification.
svm.NuSVC([nu, kernel, degree, gamma, ...]): Nu-Support Vector Classification.
svm.SVR([kernel, degree, gamma, coef0, tol, ...]): Epsilon-Support Vector Regression.
svm.LinearSVR([epsilon, tol, C, loss, ...]): Linear Support Vector Regression.
svm.NuSVR([nu, C, kernel, degree, gamma, ...]): Nu Support Vector Regression.
svm.OneClassSVM([kernel, degree, gamma, ...]): Unsupervised Outlier Detection.
svm.l1_min_c(X, y[, loss, fit_intercept, ...]): Return the lowest bound for C such that for C in (l1_min_C, infinity) the model is guaranteed not to be empty.