DBSCAN — scikit-learn 1.8.dev0 documentation

DBSCAN

class sklearn.cluster.DBSCAN(eps=0.5, *, min_samples=5, metric='euclidean', metric_params=None, algorithm='auto', leaf_size=30, p=None, n_jobs=None)

Perform DBSCAN clustering from vector array or distance matrix.

DBSCAN - Density-Based Spatial Clustering of Applications with Noise. Finds core samples of high density and expands clusters from them. Good for data which contains clusters of similar density.

This implementation has a worst-case memory complexity of $O(n^2)$, which can occur when the eps parameter is large and min_samples is low, while the original DBSCAN uses only linear memory. For further details, see the Notes below.

See also

OPTICS: A similar clustering at multiple values of eps. Our implementation is optimized for memory usage.

Notes

This implementation bulk-computes all neighborhood queries, which increases the memory complexity to $O(n \cdot d)$, where $d$ is the average number of neighbors, while the original DBSCAN had memory complexity $O(n)$. Querying these neighborhoods may incur an even higher memory complexity, depending on the algorithm.

One way to avoid the query complexity is to pre-compute sparse neighborhoods in chunks using NearestNeighbors.radius_neighbors_graph with mode='distance', then using metric='precomputed' here.
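The chunked pre-computation described above can be sketched as follows. This is a minimal illustration rather than the library's own recipe: the toy data, `chunk_size`, and the variable names are assumptions, and a realistic chunk size would be chosen to fit the available memory.

```python
import numpy as np
from scipy import sparse
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

# Toy data, assumed for illustration only.
X = np.array([[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]], dtype=float)
eps = 3.0
chunk_size = 2  # hypothetical value; pick one that fits in memory

nn = NearestNeighbors(radius=eps).fit(X)

# Query the eps-neighborhoods a few rows at a time and stack the sparse blocks.
blocks = [
    nn.radius_neighbors_graph(X[start:start + chunk_size], mode="distance")
    for start in range(0, X.shape[0], chunk_size)
]
D = sparse.vstack(blocks, format="csr")

# Cluster on the sparse distance graph instead of the raw features.
labels_sparse = DBSCAN(eps=eps, min_samples=2, metric="precomputed").fit_predict(D)

# Reference run on the raw features, for comparison.
labels_dense = DBSCAN(eps=eps, min_samples=2).fit_predict(X)
```

The sparse graph stores only pairs within eps of each other, so its size grows with the number of neighbors rather than with the square of the number of samples.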

Another way to reduce memory and computation time is to remove (near-)duplicate points and use sample_weight instead.
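A minimal sketch of the duplicate-removal idea, assuming exact duplicates only (near-duplicates would first need rounding or binning, which is not shown). The toy data and variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy data with exact duplicates, assumed for illustration only.
X = np.array([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0],
              [8.0, 8.0], [8.0, 8.0], [25.0, 80.0]])

# Collapse duplicate rows and remember how often each row occurred.
X_unique, inverse, counts = np.unique(
    X, axis=0, return_inverse=True, return_counts=True
)
inverse = np.ravel(inverse)  # guard against NumPy versions returning a 2-D inverse

# Cluster the unique rows, using the multiplicities as sample weights.
model = DBSCAN(eps=0.5, min_samples=3).fit(X_unique, sample_weight=counts)

# Map the labels of the unique rows back onto the original rows.
labels_full = model.labels_[inverse]

# Reference run on the full data, for comparison.
labels_reference = DBSCAN(eps=0.5, min_samples=3).fit_predict(X)
```

On this toy data the two runs assign the same labels. In general the cluster numbering may differ between the two runs, because the unique rows come back in sorted order.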

OPTICS provides a similar clustering with lower memory usage.
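As a rough sketch of that alternative, OPTICS can extract a DBSCAN-like clustering when `cluster_method="dbscan"` and `eps` are set. The toy data and parameter values below are assumptions chosen to mirror the example on this page, and the two estimators are not guaranteed to label border points identically in general.

```python
import numpy as np
from sklearn.cluster import OPTICS

# Toy data, assumed for illustration only.
X = np.array([[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]], dtype=float)

# Extract a DBSCAN-style clustering at a single eps from the OPTICS ordering.
optics = OPTICS(min_samples=2, cluster_method="dbscan", eps=3).fit(X)
optics_labels = optics.labels_
```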

References

Ester, M., H. P. Kriegel, J. Sander, and X. Xu, “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise”. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, Portland, OR, AAAI Press, pp. 226-231. 1996

Schubert, E., Sander, J., Ester, M., Kriegel, H. P., & Xu, X. (2017). “DBSCAN revisited, revisited: why and how you should (still) use DBSCAN.” ACM Transactions on Database Systems (TODS), 42(3), 19.

Examples

>>> from sklearn.cluster import DBSCAN
>>> import numpy as np
>>> X = np.array([[1, 2], [2, 2], [2, 3],
...               [8, 7], [8, 8], [25, 80]])
>>> clustering = DBSCAN(eps=3, min_samples=2).fit(X)
>>> clustering.labels_
array([ 0,  0,  0,  1,  1, -1])
>>> clustering
DBSCAN(eps=3, min_samples=2)

For an example, see Demo of DBSCAN clustering algorithm.

For a comparison of DBSCAN with other clustering algorithms, see Comparing different clustering algorithms on toy datasets.

fit(X, y=None, sample_weight=None)

Perform DBSCAN clustering from features, or distance matrix.

Parameters:

X ({array-like, sparse matrix} of shape (n_samples, n_features), or (n_samples, n_samples)): Training instances to cluster, or distances between instances if metric='precomputed'. If a sparse matrix is provided, it will be converted into a sparse csr_matrix.
y (Ignored): Not used, present here for API consistency by convention.
sample_weight (array-like of shape (n_samples,), default=None): Weight of each sample, such that a sample with a weight of at least min_samples is by itself a core sample; a sample with a negative weight may inhibit its eps-neighbor from being core. Note that weights are absolute, and default to 1.

Returns:

self (object): Returns a fitted instance of self.
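The effect of sample_weight on core-sample status can be illustrated with two tiny cases. This is a sketch under assumed toy data; the variable names are illustrative.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Case 1: an isolated point whose weight reaches min_samples is core by itself.
X_far = np.array([[0.0, 0.0], [10.0, 10.0]])
heavy = DBSCAN(eps=0.5, min_samples=5).fit(X_far, sample_weight=[5, 1])
heavy_labels = heavy.labels_              # first point forms a cluster, second is noise
heavy_core = heavy.core_sample_indices_

# Case 2: a negative weight lowers the weighted neighbor count of nearby points.
X_near = np.array([[0.0, 0.0], [0.1, 0.0]])
plain_labels = DBSCAN(eps=0.5, min_samples=2).fit(X_near).labels_
inhibited_labels = (
    DBSCAN(eps=0.5, min_samples=2).fit(X_near, sample_weight=[1, -1]).labels_
)
```

In the second case each point's eps-neighborhood has a total weight of 0, which is below min_samples, so neither point is core and both are labeled as noise.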

fit_predict(X, y=None, sample_weight=None)

Compute clusters from a data or distance matrix and predict labels.

Parameters:

X ({array-like, sparse matrix} of shape (n_samples, n_features), or (n_samples, n_samples)): Training instances to cluster, or distances between instances if metric='precomputed'. If a sparse matrix is provided, it will be converted into a sparse csr_matrix.
y (Ignored): Not used, present here for API consistency by convention.
sample_weight (array-like of shape (n_samples,), default=None): Weight of each sample, such that a sample with a weight of at least min_samples is by itself a core sample; a sample with a negative weight may inhibit its eps-neighbor from being core. Note that weights are absolute, and default to 1.

Returns:

labels (ndarray of shape (n_samples,)): Cluster labels. Noisy samples are given the label -1.
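A short sketch of how the returned labels are typically consumed, using the toy data from the Examples section. The names `n_clusters` and `n_noise` are illustrative assumptions, not attributes of the estimator.

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [25, 80]])

# fit_predict returns the same labels that fit(X).labels_ would expose.
labels = DBSCAN(eps=3, min_samples=2).fit_predict(X)

# The label -1 marks noise, so it is excluded when counting clusters.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
```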

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:

routing (MetadataRequest): A MetadataRequest encapsulating routing information.

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True): If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict): Parameter names mapped to their values.
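For illustration, a minimal sketch of reading parameters back from an estimator. The parameter values are arbitrary assumptions.

```python
from sklearn.cluster import DBSCAN

model = DBSCAN(eps=3, min_samples=2)

# Constructor parameters come back as a plain dict, including untouched defaults.
params = model.get_params()
eps_value = params["eps"]
metric_value = params["metric"]  # left at its default
```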

set_fit_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → DBSCAN

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

False: metadata is not requested and the meta-estimator will not pass it to fit.

None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED): Metadata routing for sample_weight parameter in fit.

Returns:

self (object): The updated object.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict): Estimator parameters.

Returns:

self (estimator instance): Estimator instance.
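The `<component>__<parameter>` form can be sketched with a small Pipeline. The step names "scale" and "dbscan" and the parameter values are assumptions chosen for illustration.

```python
from sklearn.cluster import DBSCAN
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Simple estimator: parameters are set by their own names.
model = DBSCAN().set_params(eps=0.3, min_samples=10)

# Nested object: address a step's parameter as <step name>__<parameter>.
pipe = Pipeline([("scale", StandardScaler()), ("dbscan", DBSCAN())])
pipe.set_params(dbscan__eps=0.3)
nested_eps = pipe.named_steps["dbscan"].eps
```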
