Tidy Verbs for Dealing with Genomic Data Frames
Description
Handle genomic data within data frames just as you would with GRanges. This package provides methods for dealing with genomic intervals the "tidy way", which makes them simpler to integrate into the general data munging process. The API is inspired by the popular bedtools and the genome_join() method from the fuzzyjoin package.
install.packages("tidygenomics")
# Or to get the latest development version
devtools::install_github("const-ae/tidygenomics")
Documentation
genome_intersect
Joins 2 data frames based on their genomic overlap. Unlike the genome_join
function it updates the boundaries to reflect the overlap of the regions.
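The boundary update can be illustrated without the package. The following base-R sketch uses a hypothetical helper, interval_overlap(), which is not part of tidygenomics. It assumes closed intervals that already lie on the same chromosome, and it shows only the arithmetic of the overlap, not how the package implements the join.

```r
# Hypothetical helper (not part of tidygenomics): overlap of closed intervals.
# Assumes each pair of intervals is already on the same chromosome.
interval_overlap <- function(start_x, end_x, start_y, end_y) {
  start <- pmax(start_x, start_y)   # overlap begins at the later start
  end <- pmin(end_x, end_y)         # overlap stops at the earlier end
  overlaps <- start <= end          # otherwise the intervals are disjoint
  data.frame(start = ifelse(overlaps, start, NA_real_),
             end = ifelse(overlaps, end, NA_real_))
}

# The two overlapping pairs from the example that follows
res <- interval_overlap(start_x = c(100, 400), end_x = c(150, 450),
                        start_y = c(140, 400), end_y = c(160, 415))
print(res)
```

The two resulting intervals, 140 to 150 and 400 to 415, agree with the boundaries reported by genome_intersect in the example.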
library(tidygenomics)
x1 <- data.frame(id = 1:4,
                 chromosome = c("chr1", "chr1", "chr2", "chr2"),
                 start = c(100, 200, 300, 400),
                 end = c(150, 250, 350, 450))
x2 <- data.frame(id = 1:4,
                 chromosome = c("chr1", "chr2", "chr2", "chr1"),
                 start = c(140, 210, 400, 300),
                 end = c(160, 240, 415, 320))
genome_intersect(x1, x2, by=c("chromosome", "start", "end"), mode="both")
id.x  chromosome  id.y  start  end
1     chr1        1     140    150
4     chr2        3     400    415
genome_subtract
Subtracts one data frame from the other. This can be used to split the x data frame into smaller areas.
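How one interval is split by another can be sketched in base R. The helper subtract_interval() below is hypothetical and not part of tidygenomics. It assumes closed, integer-valued coordinates on a single chromosome, which matches the boundaries shown in the example output of this section.

```r
# Hypothetical helper (not part of tidygenomics): subtract the closed interval
# [s2, e2] from [s1, e1]. Assumes integer coordinates on one chromosome.
subtract_interval <- function(s1, e1, s2, e2) {
  # No overlap: the original interval is returned unchanged
  if (e2 < s1 || s2 > e1) {
    return(data.frame(start = s1, end = e1))
  }
  # Candidate pieces to the left and to the right of the subtracted interval
  pieces <- data.frame(start = c(s1, e2 + 1), end = c(s2 - 1, e1))
  # Drop empty pieces, e.g. when the subtracted interval covers one side
  pieces <- pieces[pieces$start <= pieces$end, , drop = FALSE]
  rownames(pieces) <- NULL
  pieces
}

both_sides <- subtract_interval(100, 150, 120, 125)  # 100-119 and 126-150
right_only <- subtract_interval(400, 450, 400, 415)  # 416-450
print(both_sides)
print(right_only)
```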
x1 <- data.frame(id = 1:4,
                 chromosome = c("chr1", "chr1", "chr2", "chr1"),
                 start = c(100, 200, 300, 400),
                 end = c(150, 250, 350, 450))
x2 <- data.frame(id = 1:4,
                 chromosome = c("chr1", "chr2", "chr1", "chr1"),
                 start = c(120, 210, 300, 400),
                 end = c(125, 240, 320, 415))
genome_subtract(x1, x2, by=c("chromosome", "start", "end"))
id  chromosome  start  end
1   chr1        100    119
1   chr1        126    150
2   chr1        200    250
3   chr2        300    350
4   chr1        416    450
genome_join_closest
Joins 2 data frames based on their genomic location. If no exact overlap is found the next closest interval is used.
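The distance between two intervals can be sketched as follows. The helper interval_distance() is hypothetical and not part of tidygenomics. Its convention (0 for overlapping intervals, otherwise the number of positions strictly between them) is an assumption inferred from the example output in this section, not a documented definition.

```r
# Hypothetical helper (not part of tidygenomics): distance between closed
# intervals on the same chromosome. The "gap minus one" convention is an
# assumption inferred from the example output.
interval_distance <- function(start_x, end_x, start_y, end_y) {
  gap <- pmax(start_x, start_y) - pmin(end_x, end_y) - 1
  pmax(0, gap)  # overlapping or adjacent intervals get distance 0
}

d <- interval_distance(start_x = c(100, 200, 300), end_x = c(150, 250, 350),
                       start_y = c(210, 220, 400), end_y = c(240, 225, 415))
print(d)
```

Under this convention the three pairs give 59, 0 and 49, which agrees with the distance column in the example.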
# tibble() replaces the deprecated data_frame()
x1 <- tibble::tibble(id = 1:4,
                     chr = c("chr1", "chr1", "chr2", "chr3"),
                     start = c(100, 200, 300, 400),
                     end = c(150, 250, 350, 450))
x2 <- tibble::tibble(id = 1:4,
                     chr = c("chr1", "chr1", "chr1", "chr2"),
                     start = c(220, 210, 300, 400),
                     end = c(225, 240, 320, 415))
genome_join_closest(x1, x2, by=c("chr", "start", "end"), distance_column_name="distance", mode="left")
id.x  chr.x  start.x  end.x  id.y  chr.y  start.y  end.y  distance
1     chr1   100      150    2     chr1   210      240    59
2     chr1   200      250    1     chr1   220      225    0
2     chr1   200      250    2     chr1   210      240    0
3     chr2   300      350    4     chr2   400      415    49
4     chr3   400      450    NA    NA     NA       NA     NA
genome_cluster
Adds a new column with the cluster ID. Two intervals are assigned to the same cluster if they overlap or lie within max_distance of each other.
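The clustering idea can be sketched with a single sweep over the sorted intervals. The function cluster_intervals() below is hypothetical and not part of tidygenomics. Both the threshold rule (a new cluster starts when the gap to the furthest end seen so far exceeds max_distance) and the numbering of the clusters are assumptions chosen to be consistent with the example in this section.

```r
# Hypothetical sketch (not the tidygenomics implementation): sweep over the
# intervals sorted by chromosome and start, opening a new cluster whenever the
# chromosome changes or the gap to the furthest end seen exceeds max_distance.
cluster_intervals <- function(df, max_distance = 0) {
  ord <- order(df$chromosome, df$start)
  chrom <- df$chromosome[ord]
  start <- df$start[ord]
  end <- df$end[ord]
  cluster <- integer(length(ord))
  current <- 0L
  running_end <- end[1]
  for (i in seq_along(ord)[-1]) {
    if (chrom[i] != chrom[i - 1] || start[i] > running_end + max_distance) {
      current <- current + 1L          # start a new cluster
      running_end <- end[i]
    } else {
      running_end <- max(running_end, end[i])
    }
    cluster[i] <- current
  }
  df$cluster_id <- integer(nrow(df))
  df$cluster_id[ord] <- cluster        # map back to the original row order
  df
}

x <- data.frame(id = 1:4,
                chromosome = c("chr1", "chr1", "chr2", "chr1"),
                start = c(100, 120, 300, 260),
                end = c(150, 250, 350, 450))
default_clusters <- cluster_intervals(x)$cluster_id
wide_clusters <- cluster_intervals(x, max_distance = 10)$cluster_id
print(default_clusters)
print(wide_clusters)
```

With max_distance = 10 the interval starting at 260 joins the cluster that ends at 250, so all three chr1 intervals share one cluster.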
x1 <- data.frame(id = 1:4, bla=letters[1:4],
                 chromosome = c("chr1", "chr1", "chr2", "chr1"),
                 start = c(100, 120, 300, 260),
                 end = c(150, 250, 350, 450))
genome_cluster(x1, by=c("chromosome", "start", "end"))
id  bla  chromosome  start  end  cluster_id
1   a    chr1        100    150  0
2   b    chr1        120    250  0
3   c    chr2        300    350  2
4   d    chr1        260    450  1
genome_cluster(x1, by=c("chromosome", "start", "end"), max_distance=10)
id  bla  chromosome  start  end  cluster_id
1   a    chr1        100    150  0
2   b    chr1        120    250  0
3   c    chr2        300    350  1
4   d    chr1        260    450  0
genome_complement
Calculates the complement of a genomic region.
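A base-R sketch of the complement follows. The helper complement_one() is hypothetical and not part of tidygenomics. It assumes closed, integer-valued coordinates that start at position 1, and it reports no region after the last interval because the chromosome length is not known. Both assumptions are consistent with the example output in this section.

```r
# Hypothetical helper (not part of tidygenomics): gaps before and between the
# intervals of one chromosome, with coordinates assumed to start at 1.
complement_one <- function(start, end) {
  ord <- order(start)
  start <- start[ord]
  end <- end[ord]
  # Furthest end seen before each interval; 0 before the first one
  prev_end <- c(0, cummax(end)[-length(end)])
  gap <- start > prev_end + 1
  data.frame(start = prev_end[gap] + 1, end = start[gap] - 1)
}

x <- data.frame(id = 1:4,
                chromosome = c("chr1", "chr1", "chr2", "chr1"),
                start = c(100, 200, 300, 400),
                end = c(150, 250, 350, 450))

# Apply the helper to each chromosome separately and stack the results
by_chrom <- split(x, x$chromosome)
comp <- do.call(rbind, lapply(names(by_chrom), function(chr) {
  gaps <- complement_one(by_chrom[[chr]]$start, by_chrom[[chr]]$end)
  data.frame(chromosome = rep(chr, nrow(gaps)), gaps)
}))
print(comp)
```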
x1 <- data.frame(id = 1:4,
                 chromosome = c("chr1", "chr1", "chr2", "chr1"),
                 start = c(100, 200, 300, 400),
                 end = c(150, 250, 350, 450))
genome_complement(x1, by=c("chromosome", "start", "end"))
chromosome  start  end
chr1        1      99
chr1        151    199
chr1        251    399
chr2        1      299
genome_join
Classical join function based on the overlap of the interval. Implemented and maintained in the fuzzyjoin package and documented here only for completeness.
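The overlap condition behind such a join can be sketched in base R. The code below is an illustration only and is not how fuzzyjoin implements genome_join. It pairs every interval with every interval on the same chromosome, which is quadratic per chromosome, and keeps the pairs whose closed intervals overlap.

```r
# Illustration only (not the fuzzyjoin implementation): an inner overlap join
# built from a per-chromosome cross product and an overlap filter.
x1 <- data.frame(id = 1:4,
                 chr = c("chr1", "chr1", "chr2", "chr3"),
                 start = c(100, 200, 300, 400),
                 end = c(150, 250, 350, 450))
x2 <- data.frame(id = 1:4,
                 chr = c("chr1", "chr1", "chr1", "chr2"),
                 start = c(220, 210, 300, 400),
                 end = c(225, 240, 320, 415))

# All pairs of intervals that share a chromosome
pairs <- merge(x1, x2, by = "chr", suffixes = c(".x", ".y"))

# Two closed intervals overlap if each starts no later than the other ends
hits <- pairs[pairs$start.x <= pairs$end.y & pairs$start.y <= pairs$end.x, ]
print(hits)
```

Only the interval 200 to 250 on chr1 overlaps anything in x2, which corresponds to the two rows of the inner join in the example.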
# tibble() replaces the deprecated data_frame()
x1 <- tibble::tibble(id = 1:4,
                     chr = c("chr1", "chr1", "chr2", "chr3"),
                     start = c(100, 200, 300, 400),
                     end = c(150, 250, 350, 450))
x2 <- tibble::tibble(id = 1:4,
                     chr = c("chr1", "chr1", "chr1", "chr2"),
                     start = c(220, 210, 300, 400),
                     end = c(225, 240, 320, 415))
fuzzyjoin::genome_join(x1, x2, by=c("chr", "start", "end"), mode="inner")
id.x  chr.x  start.x  end.x  id.y  chr.y  start.y  end.y
2     chr1   200      250    1     chr1   220      225
2     chr1   200      250    2     chr1   210      240
fuzzyjoin::genome_join(x1, x2, by=c("chr", "start", "end"), mode="left")
id.x  chr.x  start.x  end.x  id.y  chr.y  start.y  end.y
1     chr1   100      150    NA    NA     NA       NA
2     chr1   200      250    1     chr1   220      225
2     chr1   200      250    2     chr1   210      240
3     chr2   300      350    NA    NA     NA       NA
4     chr3   400      450    NA    NA     NA       NA
fuzzyjoin::genome_join(x1, x2, by=c("chr", "start", "end"), mode="anti")
id  chr   start  end
1   chr1  100    150
3   chr2  300    350
4   chr3  400    450
Inspiration
If you have any additional questions or encounter issues, please raise them on the GitHub page.