serkor1/SLmetrics: A high-performance R package for supervised and unsupervised machine learning evaluation metrics written in 'C++'.

{SLmetrics}: Machine learning performance evaluation on steroids

{SLmetrics} is a lightweight R package written in C++ and {Rcpp} for memory-efficient and lightning-fast machine learning performance evaluation; it’s like using a supercharged {yardstick}, but without the risk of soft-to-super-hard deprecations. {SLmetrics} covers both regression and classification metrics, and provides (almost) the same array of metrics as {scikit-learn} and {PyTorch}, all without {reticulate} and the Python compile-run-(crash)-debug cycle.

Depending on the mood and the alignment of the planets, {SLmetrics} stands for Supervised Learning metrics or Statistical Learning metrics. If {SLmetrics} catches on, the latter will become the core philosophy and the package will include unsupervised learning metrics. If not, it will remain a {pkg} for Supervised Learning metrics, and a sandbox for me to develop my C++ skills.

Below you’ll find instructions to install {SLmetrics} and get started with your first metric, the Root Mean Squared Error (RMSE).

## install latest CRAN build
install.packages("SLmetrics")

Below is a minimal example demonstrating how to compute both unweighted and weighted RMSE.

library(SLmetrics)

actual    <- c(10.2, 12.5, 14.1)
predicted <- c(9.8, 11.5, 14.2)
weights   <- c(0.2, 0.5, 0.3)

cat(
  "Root Mean Squared Error", rmse(
    actual    = actual,
    predicted = predicted
  ),
  "Root Mean Squared Error (weighted)", weighted.rmse(
    actual    = actual,
    predicted = predicted,
    w         = weights
  ),
  sep = "\n"
)
#> Root Mean Squared Error
#> 0.6244998
#> Root Mean Squared Error (weighted)
#> 0.7314369
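
These values can be reproduced in base R: the unweighted RMSE is the square root of the mean squared error, and the weighted variant swaps the plain mean for a weighted mean. A quick sanity check:

## sanity check against base R
sqrt(mean((actual - predicted)^2))
#> [1] 0.6244998

sqrt(sum(weights * (actual - predicted)^2) / sum(weights))
#> [1] 0.7314369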

That’s all! Now you can explore the rest of this README for in-depth usage, performance comparisons, and more details about {SLmetrics}.

Machine learning can be a complicated task; the steps from feature engineering to model deployment require carefully measured actions and decisions. One low-hanging fruit to simplify this process is performance evaluation.

At its core, performance evaluation is essentially just comparing two vectors - a programmatically and, at times, mathematically trivial step in the machine learning pipeline, but one that can become complicated due to:

  1. Dependencies and potential deprecations
  2. Needlessly complex or repetitive arguments
  3. Performance and memory bottlenecks at scale

{SLmetrics} solves these issues by being:

  1. Fast: Powered by C++ and {Rcpp}
  2. Memory-efficient: Everything is structured around pointers and references
  3. Lightweight: Only depends on {Rcpp} and {lattice}
  4. Simple: S3-based, minimal overhead, and flexible inputs

Performance evaluation should be plug-and-play and “just work” out of the box - there’s no need to worry about quasiquotation, dependencies, deprecations, or variations of the same functions relative to their arguments when using {SLmetrics}.

One, obviously, can’t build an R package on C++ and {Rcpp} without a proper pissing contest at the urinals - below is a comparison of execution time and memory efficiency for two simple cases that any {pkg} should be able to handle gracefully: computing a 2 x 2 confusion matrix and computing the RMSE.¹

As shown in the chart, {SLmetrics} maintains consistently low(er) execution times across different sample sizes.

Below are the results for garbage collections and total memory allocations when computing a 2 x 2 confusion matrix (N = 1e7) and RMSE (N = 1e7).² Notice that {SLmetrics} requires no gc() calls for these operations.

|                | Iterations | Garbage Collections [gc()] | gc() per second | Memory Allocation (MB) |
|----------------|------------|----------------------------|-----------------|------------------------|
| {SLmetrics}    | 100        | 0                          | 0.00            | 0                      |
| {yardstick}    | 100        | 190                        | 4.44            | 381                    |
| {MLmetrics}    | 100        | 186                        | 4.50            | 381                    |
| {mlr3measures} | 100        | 371                        | 3.93            | 916                    |

2 x 2 Confusion Matrix (N = 1e7)

|                | Iterations | Garbage Collections [gc()] | gc() per second | Memory Allocation (MB) |
|----------------|------------|----------------------------|-----------------|------------------------|
| {SLmetrics}    | 100        | 0                          | 0.00            | 0                      |
| {yardstick}    | 100        | 149                        | 4.30            | 420                    |
| {MLmetrics}    | 100        | 15                         | 2.00            | 76                     |
| {mlr3measures} | 100        | 12                         | 1.29            | 76                     |

RMSE (N = 1e7)

In both tasks, {SLmetrics} remains extremely memory-efficient, even at large sample sizes.

Important

From the {bench} documentation: "Total amount of memory allocated by R while running the expression. Memory allocated outside the R heap, e.g. by malloc() or new directly, is not tracked. Take care to avoid misinterpreting the results if running code that may do this."
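
The full benchmark scripts are linked in the footnotes; as a rough sketch, a comparable measurement can be set up with {bench} along these lines (the data-generating process here is illustrative, not the one used in the results above):

library(bench)
library(SLmetrics)

## simulate a large regression task
set.seed(1903)
actual    <- rnorm(1e7)
predicted <- actual + rnorm(1e7)

## 100 iterations; bench::mark() reports
## memory allocations and gc() counts
bench::mark(
  rmse(actual, predicted),
  iterations = 100
)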

In its simplest form, {SLmetrics} functions work directly with pairs of <numeric> vectors (for regression) or <factor> vectors (for classification). Below we demonstrate this on two well-known datasets, mtcars (regression) and iris (classification).

We first fit a linear model to predict mpg in the mtcars dataset, then compute the in-sample RMSE:

## Evaluate a linear model on mpg (mtcars)
model <- lm(mpg ~ ., data = mtcars)
rmse(mtcars$mpg, fitted(model))
#> [1] 2.146905
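
Other regression metrics in {SLmetrics} follow the same two-vector interface; as a sketch, assuming mae() and mse() from the package's metric set:

## same interface, different metrics
mae(mtcars$mpg, fitted(model))
mse(mtcars$mpg, fitted(model))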

Now we recode the iris dataset into a binary problem (“virginica” vs. “others”) and fit a logistic regression. Then we generate predicted classes, compute the confusion matrix and summarize it.

## 1) recode iris
## to binary problem
iris$species_num <- as.numeric(
  iris$Species == "virginica"
)

## 2) fit the logistic
## regression
model <- glm(
  formula = species_num ~ Sepal.Length + Sepal.Width,
  data    = iris,
  family  = binomial(
    link = "logit"
  )
)

## 3) generate predicted
## classes
predicted <- factor(
  as.numeric(
    predict(model, type = "response") > 0.5
  ),
  levels = c(1,0),
  labels = c("Virginica", "Others")
)

## 4) generate actual
## values as factor
actual <- factor(
  x = iris$species_num,
  levels = c(1,0),
  labels = c("Virginica", "Others")
)
## 5) generate
## confusion matrix
summary(
  confusion_matrix <- cmatrix(
    actual    = actual,
    predicted = predicted
  )
)
#> Confusion Matrix (2 x 2) 
#> ================================================================================
#>           Virginica Others
#> Virginica        35     15
#> Others           14     86
#> ================================================================================
#> Overall Statistics (micro average)
#>  - Accuracy:          0.81
#>  - Balanced Accuracy: 0.78
#>  - Sensitivity:       0.81
#>  - Specificity:       0.81
#>  - Precision:         0.81

Important

OpenMP support in {SLmetrics} is experimental. Use it with caution, as performance gains and stability may vary based on your system configuration and workload.

You can control OpenMP usage within {SLmetrics} using openmp.on() and openmp.off(). Below are examples demonstrating how to enable and disable OpenMP:

## enable OpenMP
SLmetrics::openmp.on()
#> OpenMP enabled!

## disable OpenMP
SLmetrics::openmp.off()
#> OpenMP disabled!

To illustrate the impact of OpenMP on performance, consider the following benchmarks for calculating entropy on a 1,000,000 x 200 matrix over 100 iterations.³

📚 Entropy without OpenMP

| Iterations | Runtime (sec) | Garbage Collections [gc()] | gc() per second | Memory Allocation (MB) |
|------------|---------------|----------------------------|-----------------|------------------------|
| 100        | 0.86          | 0                          | 0               | 0                      |

1e6 x 200 matrix without OpenMP

📚 Entropy with OpenMP

| Iterations | Runtime (sec) | Garbage Collections [gc()] | gc() per second | Memory Allocation (MB) |
|------------|---------------|----------------------------|-----------------|------------------------|
| 100        | 0.15          | 0                          | 0               | 0                      |

1e6 x 200 matrix with OpenMP
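
To try the comparison locally, one can toggle OpenMP around repeated calls - a sketch, assuming entropy() accepts a numeric matrix as in the benchmark above (note that the matrix alone occupies roughly 1.6 GB):

## simulate a 1e6 x 200 matrix
set.seed(1903)
X <- matrix(runif(1e6 * 200), nrow = 1e6)

## with OpenMP
SLmetrics::openmp.on()
system.time(SLmetrics::entropy(X))

## without OpenMP
SLmetrics::openmp.off()
system.time(SLmetrics::entropy(X))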

## install github release
pak::pak(
    pkg = "serkor1/SLmetrics@*release",
    ask = FALSE
)

Clone the repository with submodules:

git clone --recurse-submodules https://github.com/serkor1/SLmetrics.git

Then install the nightly build with build tools:

## install nightly build
pak::pak(
    pkg = ".",
    ask = FALSE
)

Please note that the {SLmetrics} project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

  1. The source code is available here and here.

  2. The source code is available here.

  3. The source code is available here.

