The goal of unjoin is to provide unjoin() for data frames. This is part of what tidyr::nest does, but with two differences:
- unjoin returns two tables, main and data
- unjoin adds an index column that links main with the rows in data
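The idea of splitting one table into a key table and a linked data table can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the names `key_cols`, `keys`, `rest` and `key_id` are hypothetical, and the index values assigned here may differ from the ones unjoin produces.

```r
# Hypothetical base R illustration of the unjoin idea (not the package code).
key_cols <- c("Species", "Petal.Width")

# One row per unique combination of the key columns, plus an integer index.
keys <- unique(iris[key_cols])
keys$.idx0 <- seq_len(nrow(keys))

# Helper (hypothetical): collapse the key columns into one string per row.
key_id <- function(df) do.call(paste, c(df[key_cols], sep = "\r"))

# Keep the non-key columns and record which key row each data row links to.
rest <- iris[setdiff(names(iris), key_cols)]
rest$.idx0 <- keys$.idx0[match(key_id(iris), key_id(keys))]

str(keys)
str(rest)
```

Because `.idx0` is a plain row index into `keys` in this sketch, `keys$Species[rest$.idx0]` recovers the original `Species` column.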
Install unjoin from CRAN:

```r
install.packages("unjoin")
```

You can install the development version of unjoin from GitHub with:

```r
# install.packages("devtools")
devtools::install_github("hypertidy/unjoin")
```
This is a basic example which shows you how to unjoin a data frame.
```r
library(unjoin)
unjoin(iris)
#> $.idx0
#> # A tibble: 1 x 1
#>   .idx0
#>   <int>
#> 1     1
#>
#> $data
#> # A tibble: 150 x 6
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species .idx0
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>   <int>
#>  1          5.1         3.5          1.4         0.2 setosa      1
#>  2          4.9         3            1.4         0.2 setosa      1
#>  3          4.7         3.2          1.3         0.2 setosa      1
#>  4          4.6         3.1          1.5         0.2 setosa      1
#>  5          5           3.6          1.4         0.2 setosa      1
#>  6          5.4         3.9          1.7         0.4 setosa      1
#>  7          4.6         3.4          1.4         0.3 setosa      1
#>  8          5           3.4          1.5         0.2 setosa      1
#>  9          4.4         2.9          1.4         0.2 setosa      1
#> 10          4.9         3.1          1.5         0.1 setosa      1
#> # … with 140 more rows
#>
#> attr(,"class")
#> [1] "unjoin"

library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union

iris %>% unjoin(Species)
#> $.idx0
#> # A tibble: 3 x 2
#>   Species    .idx0
#>   <fct>      <int>
#> 1 setosa         1
#> 2 versicolor     2
#> 3 virginica      3
#>
#> $data
#> # A tibble: 150 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width .idx0
#>           <dbl>       <dbl>        <dbl>       <dbl> <int>
#>  1          5.1         3.5          1.4         0.2     1
#>  2          4.9         3            1.4         0.2     1
#>  3          4.7         3.2          1.3         0.2     1
#>  4          4.6         3.1          1.5         0.2     1
#>  5          5           3.6          1.4         0.2     1
#>  6          5.4         3.9          1.7         0.4     1
#>  7          4.6         3.4          1.4         0.3     1
#>  8          5           3.4          1.5         0.2     1
#>  9          4.4         2.9          1.4         0.2     1
#> 10          4.9         3.1          1.5         0.1     1
#> # … with 140 more rows
#>
#> attr(,"class")
#> [1] "unjoin"

iris %>% unjoin(Species, Petal.Width)
#> $.idx0
#> # A tibble: 27 x 3
#>    Species    Petal.Width .idx0
#>    <fct>            <dbl> <int>
#>  1 setosa             0.2     2
#>  2 setosa             0.4     4
#>  3 setosa             0.3     3
#>  4 setosa             0.1     1
#>  5 setosa             0.5     5
#>  6 setosa             0.6     6
#>  7 versicolor         1.4    11
#>  8 versicolor         1.5    12
#>  9 versicolor         1.3    10
#> 10 versicolor         1.6    13
#> # … with 17 more rows
#>
#> $data
#> # A tibble: 150 x 4
#>    Sepal.Length Sepal.Width Petal.Length .idx0
#>           <dbl>       <dbl>        <dbl> <int>
#>  1          5.1         3.5          1.4     2
#>  2          4.9         3            1.4     2
#>  3          4.7         3.2          1.3     2
#>  4          4.6         3.1          1.5     2
#>  5          5           3.6          1.4     2
#>  6          5.4         3.9          1.7     4
#>  7          4.6         3.4          1.4     3
#>  8          5           3.4          1.5     2
#>  9          4.4         2.9          1.4     2
#> 10          4.9         3.1          1.5     1
#> # … with 140 more rows
#>
#> attr(,"class")
#> [1] "unjoin"
```
This is used to build topological data structures, with a kind of inside-out version of a nested data frame. Whether it’s of broader use is unclear.
There is a record here of some of the thinking that led to unjoin: https://github.com/r-gris/babelfish
The function unjoin replaces the method here: http://rpubs.com/cyclemumner/iout_nest
```r
(d2 <- iris %>% unjoin(Species, Petal.Width))
#> $.idx0
#> # A tibble: 27 x 3
#>    Species    Petal.Width .idx0
#>    <fct>            <dbl> <int>
#>  1 setosa             0.2     2
#>  2 setosa             0.4     4
#>  3 setosa             0.3     3
#>  4 setosa             0.1     1
#>  5 setosa             0.5     5
#>  6 setosa             0.6     6
#>  7 versicolor         1.4    11
#>  8 versicolor         1.5    12
#>  9 versicolor         1.3    10
#> 10 versicolor         1.6    13
#> # … with 17 more rows
#>
#> $data
#> # A tibble: 150 x 4
#>    Sepal.Length Sepal.Width Petal.Length .idx0
#>           <dbl>       <dbl>        <dbl> <int>
#>  1          5.1         3.5          1.4     2
#>  2          4.9         3            1.4     2
#>  3          4.7         3.2          1.3     2
#>  4          4.6         3.1          1.5     2
#>  5          5           3.6          1.4     2
#>  6          5.4         3.9          1.7     4
#>  7          4.6         3.4          1.4     3
#>  8          5           3.4          1.5     2
#>  9          4.4         2.9          1.4     2
#> 10          4.9         3.1          1.5     1
#> # … with 140 more rows
#>
#> attr(,"class")
#> [1] "unjoin"
```
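Since both tables carry the `.idx0` column, it should be possible to reverse the split with an ordinary join. The following is a minimal sketch, assuming the result is a list whose elements are named `.idx0` (the default key name) and `data`, as the printed output suggests; `rejoined` is a hypothetical name used only for illustration.

```r
library(unjoin)
library(dplyr)

# Split iris on two columns, as in the example above.
d2 <- iris %>% unjoin(Species, Petal.Width)

# Assumption: the key table lives in d2[[".idx0"]] and the remaining
# columns in d2[["data"]], both sharing the integer column ".idx0".
rejoined <- inner_join(d2[["data"]], d2[[".idx0"]], by = ".idx0")

# Every original column should be present again, plus the link column.
dim(rejoined)
setdiff(names(iris), names(rejoined))
```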
We can chain unjoins together, but each unjoin in the chain must use a different key_col.
```r
unjoin(iris, Species, key_col = "vertex") %>%
  unjoin(Petal.Width, vertex, key_col = "branch")
#> $vertex
#> # A tibble: 3 x 2
#>   Species    vertex
#>   <fct>       <int>
#> 1 setosa          1
#> 2 versicolor      2
#> 3 virginica       3
#>
#> $branch
#> # A tibble: 27 x 3
#>    Petal.Width vertex branch
#>          <dbl>  <int>  <int>
#>  1         0.2      1      2
#>  2         0.4      1      4
#>  3         0.3      1      3
#>  4         0.1      1      1
#>  5         0.5      1      5
#>  6         0.6      1      6
#>  7         1.4      2     11
#>  8         1.5      2     13
#>  9         1.3      2     10
#> 10         1.6      2     15
#> # … with 17 more rows
#>
#> $data
#> # A tibble: 150 x 4
#>    Sepal.Length Sepal.Width Petal.Length branch
#>           <dbl>       <dbl>        <dbl>  <int>
#>  1          5.1         3.5          1.4      2
#>  2          4.9         3            1.4      2
#>  3          4.7         3.2          1.3      2
#>  4          4.6         3.1          1.5      2
#>  5          5           3.6          1.4      2
#>  6          5.4         3.9          1.7      4
#>  7          4.6         3.4          1.4      3
#>  8          5           3.4          1.5      2
#>  9          4.4         2.9          1.4      2
#> 10          4.9         3.1          1.5      1
#> # … with 140 more rows
#>
#> attr(,"class")
#> [1] "unjoin"
```
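A chained result can presumably be flattened again by joining the tables in reverse order, following each key from `data` back to `vertex`. This is a sketch under the assumption that the list elements are named after the `key_col` values shown in the output above; `chained` and `flat` are hypothetical names.

```r
library(unjoin)
library(dplyr)

# Two-level split: Species -> "vertex", then Petal.Width (+ vertex) -> "branch".
chained <- unjoin(iris, Species, key_col = "vertex") %>%
  unjoin(Petal.Width, vertex, key_col = "branch")

# Walk the links back: data -> branch (via "branch") -> vertex (via "vertex").
flat <- chained[["data"]] %>%
  inner_join(chained[["branch"]], by = "branch") %>%
  inner_join(chained[["vertex"]], by = "vertex")

# The original iris columns should all be present, plus the two key columns.
names(flat)
```

Using a distinct `key_col` at each step is what keeps the two joins unambiguous here.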
Also, there’s no escape hatch here: you can’t “unjoin” your way to normal nirvana. Each unjoin needs to carry the last unjoin key with it, and you just end up with a big link table with no attributes. It needs some kind of group semantic to cut the chain.
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.