ropensci-review-tools/pkgmatch: Find R packages matching either descriptions or other R packages

A tool that uses language models to help find R packages, by matching packages either to a text description, or to entire packages. Can find matching packages either from rOpenSci’s suite of packages, or from all packages currently on CRAN.

This package relies on a locally-running instance of ollama. Procedures for setting that up are described in a separate vignette (vignette("ollama", package = "pkgmatch")). Although some functionality of this package may be used without ollama, the main functions require ollama to be installed.

Once ollama is running, the easiest way to install this package is via the associated r-universe. As shown there, simply enable the universe with

options (repos = c (
    ropenscireviewtools = "https://ropensci-review-tools.r-universe.dev",
    CRAN = "https://cloud.r-project.org"
))

Then install in the usual way with:

install.packages ("pkgmatch")

Alternatively, the package can be installed by first installing either the remotes or pak packages and running one of the following lines:

remotes::install_github ("ropensci-review-tools/pkgmatch")
pak::pkg_install ("ropensci-review-tools/pkgmatch")

The package can then be loaded for use with library (pkgmatch).

The ollama_check() function can then be used to confirm that ollama is up and running as expected.
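
In a script, it can help to guard later calls on that check. The following is a minimal sketch, assuming that ollama_check() signals an error when ollama is unavailable (its exact return value is not described in this README). The variable name ollama_ok is purely illustrative.

```r
# Assumption: ollama_check() throws an error when ollama is not reachable.
# Any error, or a missing pkgmatch installation, is treated as "not ready".
ollama_ok <- requireNamespace ("pkgmatch", quietly = TRUE) &&
    tryCatch (
        {
            pkgmatch::ollama_check ()
            TRUE
        },
        error = function (e) FALSE
    )

if (!ollama_ok) {
    message ("ollama does not appear to be ready; see the 'ollama' vignette.")
}
```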

Using the pkgmatch package

The ‘pkgmatch’ package takes input either from a text description or local path to an R package, and finds matching packages based on both Language Model (LM) embeddings, and more traditional text and code matching algorithms.

The package has two main functions, pkgmatch_similar_pkgs() and pkgmatch_similar_fns().

The following code demonstrates how these functions work, first matching a general text string against packages from rOpenSci:

input <- "
Packages for analysing evolutionary trees, with a particular focus
on visualising inter-relationships among distinct trees.
"
pkgmatch_similar_pkgs (input, corpus = "ropensci")
## [1] "phylogram"    "phruta"       "rotl"         "taxa"         "lingtypology"

The corpus parameter must be specified as one of “ropensci” or “cran” (case-insensitive). The CRAN corpus is much larger than the rOpenSci corpus, and matching for corpus = "cran" will generally take notably longer.
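
When the corpus name comes from user input or a configuration file, it can be validated before any matching starts. The helper below is hypothetical and not part of pkgmatch; it only mirrors the rule stated above (one of two names, case-insensitive), so it may differ from how the package checks the value internally.

```r
# Hypothetical helper, not part of pkgmatch: lower-case the value and
# require it to be one of the two corpus names described above.
normalise_corpus <- function (corpus) {
    corpus <- tolower (corpus)
    stopifnot (length (corpus) == 1L, corpus %in% c ("ropensci", "cran"))
    corpus
}

normalise_corpus ("CRAN")
normalise_corpus ("rOpenSci")
```

Any other value, such as "bioc", stops with an error before the slower matching step is reached.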

Websites of packages returned by the pkgmatch_similar_pkgs() function can be automatically opened, either by calling the function with browse = TRUE, or by storing the return value of the pkgmatch_similar_pkgs() function as an object and passing that to the pkgmatch_browse() function.
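
The two browsing options might look like the following sketch. It uses only the functions and parameters named above; the guard around the calls is an addition here, so that nothing is opened outside an interactive session or when pkgmatch is not installed.

```r
input <- "Packages for analysing evolutionary trees"

# Guarded so the sketch only runs in an interactive session with pkgmatch
# installed (and, as noted earlier, with ollama running).
if (interactive () && requireNamespace ("pkgmatch", quietly = TRUE)) {
    # Option 1: open the websites as part of the matching call.
    pkgmatch::pkgmatch_similar_pkgs (input, corpus = "ropensci", browse = TRUE)

    # Option 2: store the result, inspect it, then open the websites.
    res <- pkgmatch::pkgmatch_similar_pkgs (input, corpus = "ropensci")
    pkgmatch::pkgmatch_browse (res)
}
```

The second form is convenient when the matches should be looked over before any browser tabs are opened.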

The input parameter can also specify an entire package, either as a local path to a package directory, or the name of an installed package. To demonstrate that, the following code downloads a .tar.gz file of the httr2 package from CRAN:

pkg <- "httr2"
p <- available.packages () |>
    data.frame () |>
    dplyr::filter (Package == pkg)
url_base <- "https://cran.r-project.org/src/contrib/"
url <- paste0 (url_base, p$Package, "_", p$Version, ".tar.gz")
path <- fs::path (fs::path_temp (), basename (url))
download.file (url, destfile = path, quiet = TRUE)

The path to that package (in this case as a compressed tarball) can then be passed to the pkgmatch_similar_pkgs() function:

pkgmatch_similar_pkgs (path, corpus = "cran")
## $text
## [1] "luca"          "httr"          "tapLock"       "scatterplot3d"
## [5] "AzureAuth"    
## 
## $code
## [1] "paperplanes" "httr"        "prenoms"     "tapLock"     "AzureAuth"

The result includes the top five matches based on both the text and the code of the input package. In the output shown here, the input package itself, httr2, is not listed; its closely related predecessor, httr, is the second-placed match in both cases, and not the top match. This happens because embeddings are “chunked” or randomly permuted, and because matches are statistical and not deterministic. Nevertheless, three packages appear in the top five matches on both lists: httr, tapLock, and AzureAuth. See the vignette on Why are the results not what I expect? for more detail on how matches are generated.
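
The overlap between the two lists can be checked directly. The sketch below copies the printed output above by hand into a plain list; that the actual return value can be indexed with $text and $code in the same way is an assumption based only on how it prints.

```r
# Values copied from the output printed above; a plain list stands in for
# the object returned by pkgmatch_similar_pkgs().
res <- list (
    text = c ("luca", "httr", "tapLock", "scatterplot3d", "AzureAuth"),
    code = c ("paperplanes", "httr", "prenoms", "tapLock", "AzureAuth")
)

# Packages ranked in the top five by both text and code.
in_both <- intersect (res$text, res$code)
in_both
## [1] "httr"      "tapLock"   "AzureAuth"
```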

There is an additional function to find functions within packages which best match a text description.

input <- "A function to label a set of geographic coordinates"
pkgmatch_similar_fns (input)
## [1] "GSODR::nearest_stations"          "refsplitr::plot_addresses_points"
## [3] "slopes::elevation_extract"        "rnoaa::meteo_nearby_stations"    
## [5] "charlatan::CoordinateProvider"
input <- "Identify genetic sequences matching a given input fragment"
pkgmatch_similar_fns (input)
## [1] "charlatan::SequenceProvider" "beastier::is_alignment"     
## [3] "charlatan::ch_gene_sequence" "beautier::is_phylo"         
## [5] "textreuse::align_local"

Setting browse = TRUE will then open the documentation pages corresponding to those best-matching functions.
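
Because each match is printed in package::function form, the results are easy to split for further use, for example to see which packages the matches come from. This sketch works on the first output above, copied by hand into a character vector; that the function returns exactly such a vector is an assumption.

```r
# Values copied from the first pkgmatch_similar_fns() output above.
fns <- c (
    "GSODR::nearest_stations", "refsplitr::plot_addresses_points",
    "slopes::elevation_extract", "rnoaa::meteo_nearby_stations",
    "charlatan::CoordinateProvider"
)

# Split each "package::function" string on the literal "::" separator.
parts <- strsplit (fns, "::", fixed = TRUE)
matches <- data.frame (
    package = vapply (parts, `[`, character (1), 1),
    fn = vapply (parts, `[`, character (1), 2)
)
matches
```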

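The pkgmatch package includes several vignettes, among them the ollama setup vignette and “Why are the results not what I expect?”, both referenced above.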

All contributions to this project are gratefully acknowledged using the allcontributors package following the allcontributors specification. Contributions of any kind are welcome!

