Showing content from https://github.com/tdhock/nc below:

tdhock/nc: Named capture regular expressions for text parsing and data reshaping

User-friendly functions for extracting a data table (row for each match, column for each group) from non-tabular text data using regular expressions, and for melting/reshaping columns that match a regular expression. Please read and cite my related R Journal papers, if you use this code!

Quick demo of matching functions
fruit.vec <- c("granny smith apple", "blood orange and yellow banana")
fruit.pattern <- list(type=".*?", " ", fruit="orange|apple|banana")
nc::capture_first_vec(fruit.vec, fruit.pattern)
#>            type  fruit
#> 1: granny smith  apple
#> 2:        blood orange
nc::capture_all_str(fruit.vec, fruit.pattern)
#>            type  fruit
#> 1: granny smith  apple
#> 2:        blood orange
#> 3:   and yellow banana
Quick demo of reshaping functions
(one.iris <- iris[1,])
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
nc::capture_melt_single(one.iris, part=".*", "[.]", dim=".*")
#>    Species   part    dim value
#> 1:  setosa  Sepal Length   5.1
#> 2:  setosa  Sepal  Width   3.5
#> 3:  setosa  Petal Length   1.4
#> 4:  setosa  Petal  Width   0.2
nc::capture_melt_multiple(one.iris, part=".*", "[.]", column=".*")
#>    Species   part Length Width
#> 1:  setosa  Petal    1.4   0.2
#> 2:  setosa  Sepal    5.1   3.5
nc::capture_melt_multiple(one.iris, column=".*", "[.]", dim=".*")
#>    Species    dim Petal Sepal
#> 1:  setosa Length   1.4   5.1
#> 2:  setosa  Width   0.2   3.5
Installation
install.packages("nc")
## or:
if(!require(devtools))install.packages("devtools")
devtools::install_github("tdhock/nc")

Watch the screencast tutorial videos!

The main functions provided in nc are:

| Subject              | nc function           | Similar to                           | And                    |
|----------------------|-----------------------|--------------------------------------|------------------------|
| Single string        | capture_all_str       | stringr::str_match_all               | rex::re_matches        |
| Character vector     | capture_first_vec     | stringr::str_match                   | rex::re_matches        |
| Data frame chr cols  | capture_first_df      | tidyr::extract/separate_wider_regex  | data.table::tstrsplit  |
| Data frame col names | capture_melt_single   | tidyr::pivot_longer                  | data.table::melt       |
| Data frame col names | capture_melt_multiple | tidyr::pivot_longer                  | data.table::melt       |
| File paths           | capture_first_glob    | arrow::open_dataset                  |                        |
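The quick demos above do not cover capture_first_df, so here is a minimal sketch of how it might be used on character columns of a data frame. The data frame people.df and its column names are made up for illustration; each named argument is assumed to name a column of the input, with its value giving the pattern to match in that column.

```r
## Hypothetical input: two character columns to parse.
people.df <- data.frame(
  name=c("Ada Lovelace", "Alan Turing"),
  born=c("1815-12-10", "1912-06-23"),
  stringsAsFactors=FALSE)
## Each named argument is a column of people.df; its value is the
## pattern for that column. A function placed after a named group
## converts the captured text (here, to integer).
people.parsed <- nc::capture_first_df(
  people.df,
  name=list(first=".*?", " ", last=".*"),
  born=list(year="[0-9]{4}", as.integer, "-", month="[0-9]{2}", as.integer))
print(people.parsed)
```

The result should keep the original columns and add one column per capture group (first, last, year, month).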

By default, nc uses PCRE. Other options include ICU and RE2.

To tell nc that you would like to use a certain engine, pass the engine argument, which every function has, e.g.

nc::capture_first_vec(
  "foo a\U0001F60E# bar",
  before=".*?",
  emoji="\\p{EMOJI_Presentation}",
  after=".*",
  engine="ICU")
#>   before emoji after
#> 1  foo a     😎 # bar

For a detailed comparison of regex C libraries in R (ICU, PCRE, TRE, RE2), see my R Journal (2019) paper about namedCapture.

The nc reshaping functions provide functionality similar to packages tidyr, stats, data.table, reshape, reshape2, cdata, utils, etc. The main difference is that nc::capture_melt_* support named capture regular expressions with type conversion, which (1) makes it easier to create/maintain a complex regex, and (2) results in less repetition in user code. For a detailed comparison, see my R Journal (2021) paper about nc.
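As an illustration of type conversion during reshaping, the sketch below melts a small made-up data frame (temp.wide, with hypothetical columns day1.temp and day2.temp) and converts the captured day number to integer in the same call. A function placed directly after a named group is applied to the text captured by that group.

```r
## Hypothetical wide data: one temperature column per day.
temp.wide <- data.frame(
  id=1:2,
  day1.temp=c(20.5, 21.0),
  day2.temp=c(19.0, 22.5))
## Columns whose names match the pattern are melted; "id" does not
## match, so it is kept as an identifier column. as.integer converts
## the captured day from character to integer.
temp.tall <- nc::capture_melt_single(
  temp.wide,
  "day",
  day="[0-9]+", as.integer,
  "[.]temp")
print(temp.tall)
```

Because the conversion is part of the pattern, no separate step is needed to turn the day column into a number after melting.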

Below I list the main differences between the functions in nc and other analogous R functions:

