{SLmetrics} is a lightweight R package written in C++ and {Rcpp} for memory-efficient and lightning-fast machine learning performance evaluation; it’s like using a supercharged {yardstick} but without the risk of soft to super-hard deprecations. {SLmetrics} covers both regression and classification metrics and provides (almost) the same array of metrics as {scikit-learn} and {PyTorch}, all without {reticulate} and the Python compile-run-(crash)-debug cycle.
Depending on the mood and the alignment of the planets, {SLmetrics} stands for Supervised Learning metrics, or Statistical Learning metrics. If {SLmetrics} catches on, the latter will be the core philosophy and include unsupervised learning metrics. If not, then it will remain a {pkg} for Supervised Learning metrics, and a sandbox for me to develop my C++ skills.
Below you’ll find instructions to install {SLmetrics} and get started with your first metric, the Root Mean Squared Error (RMSE).
    ## install latest CRAN build
    install.packages("SLmetrics")
Below is a minimal example demonstrating how to compute both unweighted and weighted RMSE.
    library(SLmetrics)

    actual    <- c(10.2, 12.5, 14.1)
    predicted <- c(9.8, 11.5, 14.2)
    weights   <- c(0.2, 0.5, 0.3)

    cat(
      "Root Mean Squared Error",
      rmse(
        actual    = actual,
        predicted = predicted
      ),
      "Root Mean Squared Error (weighted)",
      weighted.rmse(
        actual    = actual,
        predicted = predicted,
        w         = weights
      ),
      sep = "\n"
    )
    #> Root Mean Squared Error
    #> 0.6244998
    #> Root Mean Squared Error (weighted)
    #> 0.7314369
That’s all! Now you can explore the rest of this README for in-depth usage, performance comparisons, and more details about {SLmetrics}.
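For readers who want to see what the two calls above compute, here is a minimal base-R sketch of both formulas. The helper names `rmse_by_hand()` and `weighted_rmse_by_hand()` are hypothetical and not part of {SLmetrics}. The weighted version assumes the squared errors are averaged with weights normalized by their sum, which reproduces the output shown above.

```r
## hypothetical helpers, not part of {SLmetrics}
rmse_by_hand <- function(actual, predicted) {
  ## square root of the mean squared error
  sqrt(mean((actual - predicted)^2))
}

## assumption: weights are normalized by their sum
weighted_rmse_by_hand <- function(actual, predicted, w) {
  sqrt(sum(w * (actual - predicted)^2) / sum(w))
}

actual    <- c(10.2, 12.5, 14.1)
predicted <- c(9.8, 11.5, 14.2)
weights   <- c(0.2, 0.5, 0.3)

rmse_by_hand(actual, predicted)
#> [1] 0.6244998

weighted_rmse_by_hand(actual, predicted, weights)
#> [1] 0.7314369
```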
Machine learning can be a complicated task; the steps from feature engineering to model deployment require carefully measured actions and decisions. One low-hanging fruit to simplify this process is performance evaluation.
At its core, performance evaluation is essentially just comparing two vectors - a programmatically and, at times, mathematically trivial step in the machine learning pipeline, but one that can become complicated due to:
{SLmetrics} solves these issues by being:
C++ and {Rcpp}

Performance evaluation should be plug-and-play and “just work” out of the box - there’s no need to worry about quasiquotations, dependencies, deprecations, or variations of the same functions relative to their arguments when using {SLmetrics}.
One, obviously, can’t build an R-package on C++ and {Rcpp} without a proper pissing contest at the urinals - below is a comparison of execution time and memory efficiency for two simple cases that any {pkg} should be able to handle gracefully: computing a 2 x 2 confusion matrix and computing the RMSE.
As shown in the chart, {SLmetrics} maintains consistently low(er) execution times across different sample sizes.
Below are the results for garbage collections and total memory allocations when computing a 2 x 2 confusion matrix (N = 1e7) and RMSE (N = 1e7). Notice that {SLmetrics} requires no GC calls for these operations.
| | Iterations | Garbage Collections [gc()] | gc() pr. second | Memory Allocation (MB) |
|---|---|---|---|---|
| {SLmetrics} | 100 | 0 | 0.00 | 0 |
| {yardstick} | 100 | 190 | 4.44 | 381 |
| {MLmetrics} | 100 | 186 | 4.50 | 381 |
| {mlr3measures} | 100 | 371 | 3.93 | 916 |

2 x 2 Confusion Matrix (N = 1e7)

| | Iterations | Garbage Collections [gc()] | gc() pr. second | Memory Allocation (MB) |
|---|---|---|---|---|
| {SLmetrics} | 100 | 0 | 0.00 | 0 |
| {yardstick} | 100 | 149 | 4.30 | 420 |
| {MLmetrics} | 100 | 15 | 2.00 | 76 |
| {mlr3measures} | 100 | 12 | 1.29 | 76 |

RMSE (N = 1e7)
In both tasks, {SLmetrics} remains extremely memory-efficient, even at large sample sizes.
Important
From {bench} documentation: Total amount of memory allocated by R while running the expression. Memory allocated outside the R heap, e.g. by malloc() or new directly is not tracked, take care to avoid misinterpreting the results if running code that may do this.
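The garbage-collection and allocation figures above are the kind of numbers that `bench::mark()` reports. The following is a rough sketch of how such a measurement could be set up, assuming {bench} is installed. The base-R `rmse_by_hand()` is a hypothetical stand-in for a metric function and the sample size is reduced, so this will not reproduce the tables above.

```r
## hypothetical stand-in for a metric function
rmse_by_hand <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

set.seed(1903)
actual    <- rnorm(1e5)
predicted <- actual + rnorm(1e5)

## only run the benchmark if {bench} is available
if (requireNamespace("bench", quietly = TRUE)) {
  result <- bench::mark(
    rmse_by_hand(actual, predicted),
    iterations = 100
  )

  ## memory allocated on the R heap, and number of
  ## garbage collections during the benchmark
  print(result$mem_alloc)
  print(result$n_gc)
}
```

As the note above says, only allocations on the R heap are counted, so memory allocated inside compiled code does not show up in `mem_alloc`.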
In its simplest form, {SLmetrics}-functions work directly with pairs of <numeric> vectors (for regression) or <factor> vectors (for classification). Below we demonstrate this on two well-known datasets, mtcars (regression) and iris (classification).
We first fit a linear model to predict mpg in the mtcars dataset, then compute the in-sample RMSE:
    ## Evaluate a linear model on mpg (mtcars)
    model <- lm(mpg ~ ., data = mtcars)
    rmse(mtcars$mpg, fitted(model))
    #> [1] 2.146905
Now we recode the iris dataset into a binary problem (“virginica” vs. “others”) and fit a logistic regression. Then we generate predicted classes, compute the confusion matrix and summarize it.
    ## 1) recode iris
    ## to binary problem
    iris$species_num <- as.numeric(
      iris$Species == "virginica"
    )

    ## 2) fit the logistic
    ## regression
    model <- glm(
      formula = species_num ~ Sepal.Length + Sepal.Width,
      data    = iris,
      family  = binomial(
        link = "logit"
      )
    )

    ## 3) generate predicted
    ## classes
    predicted <- factor(
      as.numeric(
        predict(model, type = "response") > 0.5
      ),
      levels = c(1, 0),
      labels = c("Virginica", "Others")
    )

    ## 4) generate actual
    ## values as factor
    actual <- factor(
      x      = iris$species_num,
      levels = c(1, 0),
      labels = c("Virginica", "Others")
    )

    ## 5) generate
    ## confusion matrix
    summary(
      confusion_matrix <- cmatrix(
        actual    = actual,
        predicted = predicted
      )
    )
    #> Confusion Matrix (2 x 2)
    #> ================================================================================
    #>           Virginica Others
    #> Virginica        35     15
    #> Others           14     86
    #> ================================================================================
    #> Overall Statistics (micro average)
    #>  - Accuracy:          0.81
    #>  - Balanced Accuracy: 0.78
    #>  - Sensitivity:       0.81
    #>  - Specificity:       0.81
    #>  - Precision:         0.81
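The accuracy and balanced accuracy in the summary can be re-derived from the four cell counts. The sketch below rebuilds the printed matrix by hand in base R. It assumes rows hold the actual classes and columns the predicted classes (consistent with the 50 actual virginica flowers in iris), and that balanced accuracy is the unweighted mean of the per-class recalls. The object names are illustrative only.

```r
## rebuild the printed 2 x 2 matrix;
## assumption: rows = actual, columns = predicted
cm <- matrix(
  c(35, 15,
    14, 86),
  nrow     = 2,
  byrow    = TRUE,
  dimnames = list(
    actual    = c("Virginica", "Others"),
    predicted = c("Virginica", "Others")
  )
)

## share of observations on the diagonal
accuracy <- sum(diag(cm)) / sum(cm)

## per-class recall: correct predictions
## divided by the actual class counts
recall <- diag(cm) / rowSums(cm)

## assumption: balanced accuracy is the
## unweighted mean of per-class recall
balanced_accuracy <- mean(recall)

round(accuracy, 2)
#> [1] 0.81

round(balanced_accuracy, 2)
#> [1] 0.78
```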
Important
OpenMP support in {SLmetrics} is experimental. Use it with caution, as performance gains and stability may vary based on your system configuration and workload.
You can control OpenMP usage within {SLmetrics} using openmp.on() and openmp.off(). Below are examples demonstrating how to enable and disable OpenMP:
    ## enable OpenMP
    SLmetrics::openmp.on()
    #> OpenMP enabled!

    ## disable OpenMP
    SLmetrics::openmp.off()
    #> OpenMP disabled!
To illustrate the impact of OpenMP on performance, consider the following benchmarks for calculating entropy on a 1,000,000 x 200 matrix over 100 iterations.
📚 Entropy without OpenMP

| Iterations | Runtime (sec) | Garbage Collections [gc()] | gc() pr. second | Memory Allocation (MB) |
|---|---|---|---|---|
| 100 | 0.86 | 0 | 0 | 0 |

1e6 x 200 matrix without OpenMP

| Iterations | Runtime (sec) | Garbage Collections [gc()] | gc() pr. second | Memory Allocation (MB) |
|---|---|---|---|---|
| 100 | 0.15 | 0 | 0 | 0 |

1e6 x 200 matrix with OpenMP
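To make the benchmarked workload concrete, here is a small base-R sketch of a row-wise entropy calculation. It assumes that "entropy" means Shannon entropy (natural logarithm) computed per row of a matrix whose rows are probability distributions; the helper `row_entropy()` is hypothetical and is not the {SLmetrics} implementation, and the matrix is tiny compared to the benchmark.

```r
## hypothetical helper: row-wise Shannon entropy
## using the natural logarithm
row_entropy <- function(p) {
  ## treat 0 * log(0) as 0
  terms <- ifelse(p > 0, p * log(p), 0)
  -rowSums(terms)
}

## small random matrix with rows
## normalized to sum to one
set.seed(1903)
x <- matrix(runif(5 * 4), nrow = 5)
p <- x / rowSums(x)

## one entropy value per row
row_entropy(p)
```

Each row is independent of the others, which is why this kind of computation is a natural candidate for parallelization across rows.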
    ## install github release
    pak::pak(
      pkg = "serkor1/SLmetrics@*release",
      ask = FALSE
    )

Clone repository with submodules

    git clone --recurse-submodules https://github.com/serkor1/SLmetrics.git

Installing with build tools

    ## install nightly build
    pak::pak(
      pkg = ".",
      ask = FALSE
    )
Please note that the {SLmetrics} project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.