The goal of srt is to read SubRip text files as tabular data for easy analysis and manipulation.
You can install the development version of srt from GitHub with:
# install.packages("remotes")
remotes::install_github("k5cents/srt")
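The .srt standard is used to identify the subtitle components for the columns of a data frame:
1. A numeric counter identifying each sequential subtitle
2. The time the subtitle should appear on the screen, followed by --> and the time it should disappear
3. The subtitle text itself, on one or more lines
4. A blank line marking the end of the subtitle
library(srt)
library(tidyverse)
library(tidytext)
srt <- srt_example()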
#> 1
#> 00:01:25,210 --> 00:01:28,004
#> I owe everything to George Bailey.
#>
#> 2
#> 00:01:28,422 --> 00:01:30,298
#> Help him, dear Father.
#>
#> 3
#> 00:01:30,674 --> 00:01:33,718
#> Joseph, Jesus and Mary,
These subtitle files are parsed as data frames with separate columns.
(wonderful_life <- read_srt(path = srt, collapse = " "))
#> # A tibble: 2,268 × 4
#>        n start   end subtitle
#>    <int> <dbl> <dbl> <chr>
#>  1     1  85.2  88.0 I owe everything to George Bailey.
#>  2     2  88.4  90.3 Help him, dear Father.
#>  3     3  90.7  93.7 Joseph, Jesus and Mary,
#>  4     4  93.8  96.4 help my friend Mr. Bailey.
#>  5     5  96.9  99.5 Help my son George tonight.
#>  6     6 100.  102.  He never thinks about himself, God.
#>  7     7 102.  104.  That's why he's in trouble.
#>  8     8 104.  105.  George is a good guy.
#>  9     9 106.  108.  Give him a break, God.
#> 10    10 108.  110.  I love him, dear Lord.
#> # ℹ 2,258 more rows
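The start and end columns hold each timestamp as a number of seconds. As a rough illustration of that conversion, here is a minimal base R sketch. The helper `timestamp_to_seconds()` is hypothetical and written only for this example; it is not part of the srt package and may differ from how `read_srt()` works internally.

```r
# Hypothetical helper (not from the srt package): convert "HH:MM:SS,mmm"
# timestamps to numeric seconds.
timestamp_to_seconds <- function(x) {
  # Split each timestamp on ":" and "," into hours, minutes, seconds, milliseconds
  parts <- strsplit(x, "[:,]")
  vapply(parts, function(p) {
    p <- as.numeric(p)
    p[1] * 3600 + p[2] * 60 + p[3] + p[4] / 1000
  }, numeric(1))
}

# The timing line of the first subtitle in the example file
cue <- "00:01:25,210 --> 00:01:28,004"
bounds <- strsplit(cue, " --> ", fixed = TRUE)[[1]]
timestamp_to_seconds(bounds)
```

The two values, 85.21 and 88.004 seconds, correspond to the 85.2 and 88.0 shown (rounded for display) in the first row of the tibble above.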
This makes it easy to run text analyses on the subtitles, such as counting the most frequent words.
wonderful_life %>%
  unnest_tokens(word, subtitle) %>%
  count(word, sort = TRUE) %>%
  anti_join(stop_words)
#> # A tibble: 1,651 × 2
#>    word       n
#>    <chr>  <int>
#>  1 george   216
#>  2 mary      85
#>  3 bailey    74
#>  4 hey       56
#>  5 harry     53
#>  6 yeah      50
#>  7 gonna     45
#>  8 potter    45
#>  9 home      34
#> 10 money     34
#> # ℹ 1,641 more rows
The numeric timestamps can also be shifted uniformly:
wonderful_life <- srt_shift(wonderful_life, seconds = 9.99)
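Because the times are plain numbers, a shift amounts to adding the same offset to both the start and end columns. The base R sketch below shows that arithmetic on a small hand-built data frame. The data frame `subs` and the variable `offset` are made up for this example, and the code only illustrates the idea; it is not the implementation of `srt_shift()`.

```r
# Illustrative data: the first three subtitles, with times in seconds
subs <- data.frame(
  n = 1:3,
  start = c(85.210, 88.422, 90.674),
  end = c(88.004, 90.298, 93.718),
  subtitle = c(
    "I owe everything to George Bailey.",
    "Help him, dear Father.",
    "Joseph, Jesus and Mary,"
  )
)

# Add the same offset (in seconds) to both time columns
offset <- 9.99
shifted <- subs
shifted$start <- subs$start + offset
shifted$end <- subs$end + offset

# Each subtitle moves later, but stays on screen for the same duration
shifted[, c("n", "start", "end")]
```

A negative offset moves the subtitles earlier in the same way.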
The subtitle data frames can easily be written back out as valid SubRip files.
tmp <- tempfile(fileext = ".srt")
write_srt(wonderful_life, tmp, wrap = FALSE)
#> 1
#> 00:01:35,200 --> 00:01:37,994
#> I owe everything to George Bailey.
#>
#> 2
#> 00:01:38,412 --> 00:01:40,288
#> Help him, dear Father.
#>
#> 3
#> 00:01:40,664 --> 00:01:43,708
#> Joseph, Jesus and Mary,
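Writing the file requires turning the numeric seconds back into the HH:MM:SS,mmm form used by the format. One way this could be done is sketched below in base R. The helper `seconds_to_timestamp()` is hypothetical and not part of the srt package, so `write_srt()` may handle this differently.

```r
# Hypothetical helper (not from the srt package): format numeric seconds
# as "HH:MM:SS,mmm" timestamps.
seconds_to_timestamp <- function(x) {
  # Work in whole milliseconds; rounding guards against floating-point drift
  ms <- round(x * 1000)
  sprintf(
    "%02d:%02d:%02d,%03d",
    as.integer(ms %/% 3600000),             # hours
    as.integer((ms %% 3600000) %/% 60000),  # minutes
    as.integer((ms %% 60000) %/% 1000),     # seconds
    as.integer(ms %% 1000)                  # milliseconds
  )
}

# Shifted start and end of the first subtitle, in seconds
paste(seconds_to_timestamp(95.2), "-->", seconds_to_timestamp(97.994))
```

For the shifted first subtitle this gives the same timing line as the file written above.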