
Showing content from https://github.com/joshy/striprtf below:

joshy/striprtf: Stripping rtf to plain old text

This is a library to convert Rich Text Format (RTF) files to plain text files. Many medical documents are written in RTF, which is not ideal for parsing and further processing. This library converts them to plain old text.

```python
from striprtf.striprtf import rtf_to_text

rtf = "some rtf encoded string"  # the RTF content to convert
text = rtf_to_text(rtf)
print(text)
```
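To give a rough idea of what stripping RTF involves, here is a heavily simplified sketch that uses only the standard library. It is not striprtf's implementation: `strip_rtf_sketch` and `SKIP_DESTINATIONS` are hypothetical names, and the sketch ignores things a real converter has to deal with, such as `\uN` Unicode escapes and tables. It walks the document token by token, drops formatting control words, skips destinations like `\fonttbl` and groups marked with `\*`, turns `\par` into a newline, and decodes `\'hh` byte escapes with a codepage.

```python
import re

# Simplified RTF tokenizer (illustrative sketch, not striprtf's code).
TOKEN = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?"  # control word, optional numeric arg, optional delimiting space
    r"|\\'([0-9a-fA-F]{2})"     # hex-escaped byte, e.g. \'e9
    r"|\\([^a-zA-Z])"           # control symbol, e.g. \{ \} \\ \*
    r"|([{}])"                  # group start or end
    r"|([^\\{}]+)"              # run of plain text
)

# Hypothetical, non-exhaustive set of destinations that never hold visible text.
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}


def strip_rtf_sketch(rtf: str, encoding: str = "cp1252") -> str:
    out = []
    stack = []    # skip state saved at each "{"
    skip = False  # True while inside a destination we ignore
    for word, _arg, hexbyte, symbol, brace, text in TOKEN.findall(rtf):
        if brace == "{":
            stack.append(skip)
        elif brace == "}":
            skip = stack.pop() if stack else False
        elif skip:
            continue
        elif word:
            if word in SKIP_DESTINATIONS:
                skip = True
            elif word in ("par", "line"):
                out.append("\n")
            elif word == "tab":
                out.append("\t")
            # every other control word is formatting and is dropped
        elif symbol:
            if symbol == "*":
                skip = True  # \* marks an ignorable destination
            elif symbol in "\\{}":
                out.append(symbol)  # escaped literal character
        elif hexbyte:
            # a \'hh escape is one byte in the document's codepage
            out.append(bytes.fromhex(hexbyte).decode(encoding))
        elif text:
            # raw line breaks in the RTF source are not part of the text
            out.append(text.replace("\r", "").replace("\n", ""))
    return "".join(out)


sample = r"{\rtf1\ansi\deff0{\fonttbl{\f0 Arial;}}\f0\fs24 Hello, World!\par Caf\'e9\par}"
print(strip_rtf_sketch(sample))
```

The group stack is what lets a skipped destination end cleanly: the skip flag is restored at the matching `}`, so text after the font table is kept.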

If you want to use an encoding other than cp1252, pass it via the `encoding` parameter. It is only used if the document does not set an explicit codepage.

```python
from striprtf.striprtf import rtf_to_text

rtf = "some rtf encoded string in latin1"
# only used when the document itself does not set a codepage
text = rtf_to_text(rtf, encoding="latin-1")
print(text)
```
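The codepage rule can be illustrated with two small hypothetical helpers, `resolve_encoding` and `decode_hex_escape`, which are invented for this sketch and are not part of striprtf. They assume the explicit codepage is declared with the standard RTF `\ansicpgN` control word, and they show why the same escaped byte `\'e9` can yield different characters depending on which encoding wins.

```python
import re

# Assumption: the explicit codepage is declared as \ansicpgN (e.g. \ansicpg1251).
ANSICPG = re.compile(r"\\ansicpg(\d+)")


def resolve_encoding(rtf: str, default: str = "cp1252") -> str:
    """Hypothetical helper: an explicit \\ansicpgN wins, else use the caller's default."""
    match = ANSICPG.search(rtf)
    return f"cp{match.group(1)}" if match else default


def decode_hex_escape(hexdigits: str, encoding: str) -> str:
    """Hypothetical helper: decode one \\'hh escape (a single byte)."""
    return bytes.fromhex(hexdigits).decode(encoding)


doc_with_codepage = r"{\rtf1\ansi\ansicpg1251 \'e9}"
doc_without_codepage = r"{\rtf1\ansi \'e9}"

for doc in (doc_with_codepage, doc_without_codepage):
    enc = resolve_encoding(doc, default="latin-1")
    print(enc, decode_hex_escape("e9", enc))
```

In the first document the declared cp1251 overrides the `latin-1` default, so byte 0xE9 becomes the Cyrillic "й"; in the second the default applies and it becomes "é".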

A `UnicodeDecodeError` can sometimes occur, for example when the document contains bytes that are not valid in the encoding being used. In that case you can relax error handling during decoding like this:

```python
from striprtf.striprtf import rtf_to_text

rtf = "some rtf encoded string"
# drop undecodable characters instead of raising UnicodeDecodeError
text = rtf_to_text(rtf, errors="ignore")
print(text)
```
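The effect of relaxing error handling can be seen with plain `bytes.decode`, without striprtf. This sketch assumes, based on the parameter name, that `errors` is used as the error handler of Python's decoding step; `decode_byte` is a hypothetical helper. Byte 0x81 has no character assigned in Python's cp1252 codec, so strict decoding fails.

```python
raw = bytes.fromhex("81")  # 0x81 is unassigned in Python's cp1252 codec


def decode_byte(data: bytes, encoding: str = "cp1252", errors: str = "strict") -> str:
    """Hypothetical helper: decode raw bytes with a standard Python error handler."""
    return data.decode(encoding, errors=errors)


for mode in ("strict", "ignore", "replace"):
    try:
        print(mode, repr(decode_byte(raw, errors=mode)))
    except UnicodeDecodeError as exc:
        # only the "strict" handler ends up here
        print(mode, "raised", type(exc).__name__)
```

Note that `"ignore"` silently drops undecodable bytes, so characters can go missing from the output. `"replace"` is the standard Python handler that keeps a visible U+FFFD marker instead; whether striprtf accepts it as well is an assumption, not something stated here.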

If you don't want to install the library, or just want to try it out, an online version is available.

There is also a PostgreSQL version available from Raffael Mancini.

Pyth did not work for the RTF files I had. The next best thing was this gist: https://gist.github.com/gilsondev/7c1d2d753ddb522e7bc22511cfb08676

~~Very few additions were made, e.g. better formatting of tables.~~

In the meantime, some encoding bugs have been fixed. :-)

