The SubRip file format is described on the
Matroska
multimedia container format website as "perhaps the most basic of all subtitle formats."SubRip (SubRip Text)
files are named with the extension.srt
, and contain formatted lines of plain text in groups separated by a blank line. Subtitles are numbered sequentially, starting at 1. The timecode format used is hours:minutes:seconds,milliseconds with time units fixed to two zero-padded digits and fractions fixed to three zero-padded digits (00:00:00,000). The fractional separator used is the comma, since the program was written in France.
How to load data from subtitle (.srt
) files
Please, download the example .srt file from here.
%pip install --upgrade --quiet pysrt
from langchain_community.document_loaders import SRTLoader
loader = SRTLoader(
"example_data/Star_Wars_The_Clone_Wars_S06E07_Crisis_at_the_Heart.srt"
)
docs[0].page_content[:100]
'<i>Corruption discovered\nat the core of the Banking Clan!</i> <i>Reunited, Rush Clovis\nand Senator A'
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4