santoku is a versatile cutting tool for R. It provides chop(), a replacement for base::cut().
Install from r-universe:
install.packages("santoku", repos = c("https://hughjonesd.r-universe.dev",
"https://cloud.r-project.org"))
Or from CRAN:
install.packages("santoku")
Or get the development version from GitHub:
# install.packages("remotes")
remotes::install_github("hughjonesd/santoku")
Advantages
Here are some advantages of santoku:
By default, chop() always covers the whole range of the data, so you won't get unexpected NA values.
chop() can handle single values as well as intervals. For example, chop(x, breaks = c(1, 2, 2, 3)) will create a separate factor level for values exactly equal to 2.
chop() can handle many kinds of data, including numbers, dates and times, and units.
chop_* functions create intervals in many ways, using quantiles of the data, standard deviations, fixed-width intervals, equal-sized groups, or pretty intervals for use in graphs.
It's easy to label intervals: use names for your breaks vector, or use a lbl_* function to create interval notation like [1, 2), dash notation like 1-2, or arbitrary styles using glue::glue().
tab_* functions quickly chop data, then tabulate it.
These advantages make santoku especially useful for exploratory analysis, where you may not know the range of your data in advance.
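As a rough illustration of the first advantage, the sketch below runs base::cut() and chop() on the same data, where some values fall outside the breaks. The vectors x and breaks are made up for this example.

```r
library(santoku)

# Illustrative data: 0.5 and 3.5 lie outside the breaks 1, 2, 3
x <- c(0.5, 1.5, 2.5, 3.5)
breaks <- c(1, 2, 3)

# base::cut() returns NA for values outside the outermost breaks
cut_result <- cut(x, breaks)

# chop() extends the breaks by default, so every value gets a level
chop_result <- chop(x, breaks)

anyNA(cut_result)   # TRUE
anyNA(chop_result)  # FALSE
```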
Examples
chop() returns a factor:
chop(1:5, c(2, 4))
#> [1] [1, 2) [2, 4) [2, 4) [4, 5] [4, 5]
#> Levels: [1, 2) [2, 4) [4, 5]
Include a number twice to match it exactly:
chop(1:5, c(2, 2, 4))
#> [1] [1, 2) {2} (2, 4) [4, 5] [4, 5]
#> Levels: [1, 2) {2} (2, 4) [4, 5]
Use names in breaks for labels:
chop(1:5, c(Low = 1, Mid = 2, High = 4))
#> [1] Low Mid Mid High High
#> Levels: Low Mid High
Or use lbl_* functions:
chop(1:5, c(2, 4), labels = lbl_dash())
#> [1] 1—2 2—4 2—4 4—5 4—5
#> Levels: 1—2 2—4 4—5
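For labels in an arbitrary style, a minimal sketch using lbl_glue() might look like the following. It assumes the {l} and {r} placeholders stand for the left and right endpoints of each interval, as described in the santoku documentation. The label wording "to" is just an example.

```r
library(santoku)

# Label each interval with a glue-style template;
# {l} and {r} are filled with the interval's endpoints
res <- chop(1:5, c(2, 4), labels = lbl_glue("{l} to {r}"))

levels(res)
```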
Chop into fixed-width intervals:
chop_width(runif(10), 0.1)
#> [1] [0.1068, 0.2068) [0.6068, 0.7068) [0.9068, 1.007] [0.006763, 0.1068)
#> [5] [0.9068, 1.007] [0.3068, 0.4068) [0.6068, 0.7068) [0.1068, 0.2068)
#> [9] [0.4068, 0.5068) [0.5068, 0.6068)
#> 7 Levels: [0.006763, 0.1068) [0.1068, 0.2068) ... [0.9068, 1.007]
Or into fixed-size groups:
chop_n(1:10, 5)
#> [1] [1, 6) [1, 6) [1, 6) [1, 6) [1, 6) [6, 10] [6, 10] [6, 10] [6, 10]
#> [10] [6, 10]
#> Levels: [1, 6) [6, 10]
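The other chop_* functions follow the same pattern. The sketch below shows three of them on simulated data. The seed and sample size are arbitrary choices for this example, and printed labels are omitted because they depend on the data.

```r
library(santoku)

set.seed(1)      # arbitrary seed, for reproducibility only
x <- rnorm(100)  # simulated data for illustration

# Intervals bounded by the quartiles of the data
by_quartile <- chop_quantiles(x, c(0.25, 0.5, 0.75))

# Intervals measured in standard deviations from the mean
by_sd <- chop_mean_sd(x)

# "Pretty" round-number intervals, e.g. for graph axes
by_pretty <- chop_pretty(x)

# Each quartile group should hold a quarter of the observations
table(by_quartile)
```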
Chop dates by calendar month, then tabulate:
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
dates <- as.Date("2021-12-31") + 1:90
tab_width(dates, months(1), labels = lbl_discrete(fmt = "%d %b"))
#> 01 Jan—31 Jan 01 Feb—28 Feb 01 Mar—31 Mar
#>            31            28            31
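For the plain case, tab() can be sketched as chop() followed by a table. The example reuses the 1:5 data and breaks from the first example above.

```r
library(santoku)

# Chop 1:5 at 2 and 4, then count the values in each interval
counts <- tab(1:5, c(2, 4))

# The same counts, built in two steps
manual <- table(chop(1:5, c(2, 4)))

counts
```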
For more information, see the vignette.