RetroSearch Browse

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Showing content from http://cran.rstudio.com/web/packages/Rcpp/../RcppDynProg/../Rcpp/../dbscan/readme/README.html below:

R package dbscan - Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Related Algorithms

Introduction

This R package (Hahsler, Piekenbrock, and Doran 2019) provides a fast C++ (re)implementation of several density-based algorithms with a focus on the DBSCAN family for clustering spatial data. The package includes:

Clustering

DBSCAN: Density-based spatial clustering of applications with noise (Ester et al.Â 1996).
Jarvis-Patrick Clustering: Clustering using a similarity measure based on shared near neighbors (Jarvis and Patrick 1973).
SNN Clustering: Shared nearest neighbor clustering (ErtÃ¶z, Steinbach, and Kumar 2003).
HDBSCAN: Hierarchical DBSCAN with simplified hierarchy extraction (Campello et al.Â 2015).
FOSC: Framework for optimal selection of clusters for unsupervised and semisupervised clustering of hierarchical cluster tree (Campello, Moulavi, and Sander 2013).
OPTICS/OPTICSXi: Ordering points to identify the clustering structure and cluster extraction methods (Ankerst et al. 1999).

Outlier Detection

LOF: Local outlier factor algorithm (Breunig et al. 2000).
GLOSH: Global-Local Outlier Score from Hierarchies algorithm (Campello et al.Â 2015).

Cluster Evaluation

DBCV: Density-based clustering validation (Moulavi et al. 2014).

Fast Nearest-Neighbor Search (using kd-trees)

kNN search
Fixed-radius NN search

The implementations use the kd-tree data structure (from library ANN) for faster k-nearest neighbor search, and are for Euclidean distance typically faster than the native R implementations (e.g., dbscan in package fpc), or the implementations in WEKA, ELKI and Pythonâs scikit-learn.

The following R packages use dbscan: AFM, AnimalSequences, bioregion, CLONETv2, ClustAssess, cordillera, CPC, crosshap, daltoolbox, DDoutlier, diceR, dobin, doc2vec, dPCP, EHRtemporalVariability, eventstream, evprof, FCPS, fdacluster, FORTLS, funtimes, FuzzyDBScan, immunaut, karyotapR, ksharp, LOMAR, maotai, MapperAlgo, metaCluster, metasnf, mlr3cluster, oclust, openSkies, opticskxi, OTclust, pagoda2, parameters, ParBayesianOptimization, performance, rMultiNet, seriation, sfdep, sfnetworks, sharp, shipunov, smotefamily, snap, spdep, squat, ssMRCD, stream, SuperCell, synr, tidySEM, weird

To cite package âdbscanâ in publications use:

Hahsler M, Piekenbrock M, Doran D (2019). âdbscan: Fast Density-Based Clustering with R.â Journal of Statistical Software, 91(1), 1-30. doi:10.18637/jss.v091.i01 https://doi.org/10.18637/jss.v091.i01.

@Article{,
  title = {{dbscan}: Fast Density-Based Clustering with {R}},
  author = {Michael Hahsler and Matthew Piekenbrock and Derek Doran},
  journal = {Journal of Statistical Software},
  year = {2019},
  volume = {91},
  number = {1},
  pages = {1--30},
  doi = {10.18637/jss.v091.i01},
}

Installation

Stable CRAN version: Install from within R with

install.packages("dbscan")

Current development version: Install from r-universe.

install.packages("dbscan",
    repos = c("https://mhahsler.r-universe.dev",
              "https://cloud.r-project.org/"))

Usage

Load the package and use the numeric variables in the iris dataset

library("dbscan")

data("iris")
x <- as.matrix(iris[, 1:4])

DBSCAN

db <- dbscan(x, eps = 0.42, minPts = 5)
db

## DBSCAN clustering for 150 objects.
## Parameters: eps = 0.42, minPts = 5
## Using euclidean distances and borderpoints = TRUE
## The clustering contains 3 cluster(s) and 29 noise points.
## 
##  0  1  2  3 
## 29 48 37 36 
## 
## Available fields: cluster, eps, minPts, metric, borderPoints

Visualize the resulting clustering (noise points are shown in black).

pairs(x, col = db$cluster + 1L)

OPTICS

opt <- optics(x, eps = 1, minPts = 4)
opt

## OPTICS ordering/clustering for 150 objects.
## Parameters: minPts = 4, eps = 1, eps_cl = NA, xi = NA
## Available fields: order, reachdist, coredist, predecessor, minPts, eps,
##                   eps_cl, xi

Extract DBSCAN-like clustering from OPTICS and create a reachability plot (extracted DBSCAN clusters at eps_cl=.4 are colored)

opt <- extractDBSCAN(opt, eps_cl = 0.4)
plot(opt)

HDBSCAN

hdb <- hdbscan(x, minPts = 4)
hdb

## HDBSCAN clustering for 150 objects.
## Parameters: minPts = 4
## The clustering contains 2 cluster(s) and 0 noise points.
## 
##   1   2 
## 100  50 
## 
## Available fields: cluster, minPts, coredist, cluster_scores,
##                   membership_prob, outlier_scores, hc

Visualize the hierarchical clustering as a simplified tree. HDBSCAN finds 2 stable clusters.

plot(hdb, show_flat = TRUE)

Using dbscan with tidyverse

dbscan provides for all clustering algorithms tidy(), augment(), and glance() so they can be easily used with tidyverse, ggplot2 and tidymodels.

library(tidyverse)
db <- x %>%
    dbscan(eps = 0.42, minPts = 5)

Get cluster statistics as a tibble

## # A tibble: 4 Ã 3
##   cluster  size noise
##   <fct>   <int> <lgl>
## 1 0          29 TRUE 
## 2 1          48 FALSE
## 3 2          37 FALSE
## 4 3          36 FALSE

Visualize the clustering with ggplot2 (use an x for noise points)

augment(db, x) %>%
    ggplot(aes(x = Petal.Length, y = Petal.Width)) + geom_point(aes(color = .cluster,
    shape = noise)) + scale_shape_manual(values = c(19, 4))

Using dbscan from Python

R, the R package dbscan, and the Python package rpy2 need to be installed.

import pandas as pd
import numpy as np

### prepare data
iris = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', 
                   header = None, 
                   names = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species'])
iris_numeric = iris[['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth']]

# get R dbscan package
from rpy2.robjects import packages
dbscan = packages.importr('dbscan')

# enable automatic conversion of pandas dataframes to R dataframes
from rpy2.robjects import pandas2ri
pandas2ri.activate()

db = dbscan.dbscan(iris_numeric, eps = 0.5, MinPts = 5)
print(db)

## DBSCAN clustering for 150 objects.
## Parameters: eps = 0.5, minPts = 5
## Using euclidean distances and borderpoints = TRUE
## The clustering contains 2 cluster(s) and 17 noise points.
## 
##  0  1  2 
## 17 49 84 
## 
## Available fields: cluster, eps, minPts, dist, borderPoints

# get the cluster assignment vector
labels = np.array(db.rx('cluster'))
labels

## array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
##         1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1,
##         1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 0, 2, 2, 0, 2, 2, 2, 2, 2,
##         2, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0,
##         2, 2, 2, 2, 2, 0, 2, 2, 2, 2, 0, 2, 2, 2, 2, 2, 2, 0, 0, 2, 0, 0,
##         2, 2, 2, 2, 2, 2, 2, 0, 0, 2, 2, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 0,
##         2, 2, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]],
##       dtype=int32)

License

The dbscan package is licensed under the GNU General Public License (GPL) Version 3. The OPTICSXi R implementation was directly ported from the ELKI frameworkâs Java implementation (GNU AGPLv3), with permission by the original author, Erich Schubert.

Changes

List of changes from NEWS.md

References

Ankerst, Mihael, Markus M Breunig, Hans-Peter Kriegel, and JÃ¶rg Sander. 1999. âOPTICS: Ordering Points to Identify the Clustering Structure.â In

ACM Sigmod Record

, 28:49â60. 2. ACM.

https://doi.org/10.1145/304181.304187

Breunig, Markus M, Hans-Peter Kriegel, Raymond T Ng, and JÃ¶rg Sander. 2000. âLOF: Identifying Density-Based Local Outliers.â In

ACM Int. Conf. On Management of Data

, 29:93â104. 2. ACM.

https://doi.org/10.1145/335191.335388

Campello, Ricardo JGB, Davoud Moulavi, and JÃ¶rg Sander. 2013. âDensity-Based Clustering Based on Hierarchical Density Estimates.â In

Pacific-Asia Conference on Knowledge Discovery and Data Mining

, 160â72. Springer.

https://doi.org/10.1007/978-3-642-37456-2_14

Campello, Ricardo JGB, Davoud Moulavi, Arthur Zimek, and Joerg Sander. 2015. âHierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection.â

ACM Transactions on Knowledge Discovery from Data (TKDD)

10 (1): 5.

https://doi.org/10.1145/2733381

ErtÃ¶z, Levent, Michael Steinbach, and Vipin Kumar. 2003. âFinding Clusters of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data.â In

Proceedings of the 2003 SIAM International Conference on Data Mining (SDM)

, 47â58.

https://doi.org/10.1137/1.9781611972733.5

Ester, Martin, Hans-Peter Kriegel, JÃ¶rg Sander, Xiaowei Xu, et al.Â 1996. âA Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise.â In

Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96)

, 226â31.

https://dl.acm.org/doi/10.5555/3001460.3001507

Hahsler, Michael, Matthew Piekenbrock, and Derek Doran. 2019. â

dbscan

: Fast Density-Based Clustering with R.â

Journal of Statistical Software

91 (1): 1â30.

https://doi.org/10.18637/jss.v091.i01

Jarvis, R. A., and E. A. Patrick. 1973. âClustering Using a Similarity Measure Based on Shared Near Neighbors.â

IEEE Transactions on Computers

C-22 (11): 1025â34.

https://doi.org/10.1109/T-C.1973.223640

Moulavi, Davoud, Pablo A. Jaskowiak, Ricardo J. G. B. Campello, Arthur Zimek, and JÃ¶rg Sander. 2014. âDensity-Based Clustering Validation.â In

Proceedings of the 2014 SIAM International Conference on Data Mining (SDM)

, 839â47.

https://doi.org/10.1137/1.9781611973440.96

RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4