html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).
Usage: html2text [filename [encoding]]

--version            Show program's version number and exit
-h, --help           Show this help message and exit
--ignore-links       Don't include any formatting for links
--escape-all         Escape all special characters. Output is less readable, but avoids corner case formatting issues.
--reference-links    Use reference links instead of inline links to create markdown
--mark-code          Mark preformatted and code blocks with [code]...[/code]
For a complete list of options see the docs
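The command-line flags listed above appear to have counterparts as attributes on the `HTML2Text` object. The sketch below shows one way to apply them from Python. Only `ignore_links` is confirmed by this README; the names `escape_snob`, `inline_links` and `mark_code` are assumptions about the library's attributes, and `FLAG_TO_ATTRIBUTE`, `apply_flags` and `make_converter` are hypothetical helpers written for this illustration.

```python
# Assumed mapping from CLI flags to (attribute name, value) on HTML2Text.
# Only "ignore_links" is documented in this README; verify the others
# against the docs before relying on them.
FLAG_TO_ATTRIBUTE = {
    "--ignore-links": ("ignore_links", True),
    "--escape-all": ("escape_snob", True),
    "--reference-links": ("inline_links", False),
    "--mark-code": ("mark_code", True),
}


def apply_flags(converter, flags):
    """Set the attribute that corresponds to each CLI flag on `converter`."""
    for flag in flags:
        attribute, value = FLAG_TO_ATTRIBUTE[flag]
        setattr(converter, attribute, value)
    return converter


def make_converter(flags):
    """Build an HTML2Text instance configured like the given CLI flags."""
    # Imported here so apply_flags can be tried without html2text installed.
    import html2text

    return apply_flags(html2text.HTML2Text(), flags)
```

With html2text installed, `make_converter(["--ignore-links"]).handle(html)` should behave like running the command with `--ignore-links`, assuming the mapping above matches your installed version.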
Or you can use it from within Python:
>>> import html2text
>>>
>>> print(html2text.html2text("<p><strong>Zed's</strong> dead baby, <em>Zed's</em> dead.</p>"))
**Zed's** dead baby, _Zed's_ dead.
Or with some configuration options:
>>> import html2text
>>>
>>> h = html2text.HTML2Text()
>>> # Ignore converting links from HTML
>>> h.ignore_links = True
>>> print(h.handle("<p>Hello, <a href='https://www.google.com/earth/'>world</a>!"))
Hello, world!
>>> # Don't ignore links anymore, I like links
>>> h.ignore_links = False
>>> print(h.handle("<p>Hello, <a href='https://www.google.com/earth/'>world</a>!"))
Hello, [world](https://www.google.com/earth/)!
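The command-line form `html2text [filename [encoding]]` can be mirrored from Python by reading the file yourself and passing its contents to `html2text.html2text`. This is a minimal sketch, not the script's own implementation: `convert_file` and its `convert` parameter are hypothetical names, and the UTF-8 default is an assumption.

```python
from pathlib import Path


def convert_file(path, encoding="utf-8", convert=None):
    """Read an HTML file and return its conversion to Markdown text.

    `convert` is any callable taking an HTML string and returning text.
    When omitted, html2text.html2text is used.
    """
    if convert is None:
        # Imported lazily so a different converter can be supplied instead.
        import html2text

        convert = html2text.html2text
    html = Path(path).read_text(encoding=encoding)
    return convert(html)
```

For example, `print(convert_file("page.html", "latin-1"))` would convert a Latin-1 encoded file, where `page.html` is a placeholder path. To apply configuration options, pass the `handle` method of a configured `HTML2Text` object as `convert`.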
Originally written by Aaron Swartz. This code is distributed under the GPLv3.
How to install

html2text is available on PyPI: https://pypi.org/project/html2text/

$ pip install html2text

Development

How to run unit tests
$ tox
To see the coverage results:
$ coverage html
Then open the ./htmlcov/index.html file in your browser.
The CI runs several linting steps. To make sure the code passes them, run:
$ tox -e pre-commit

Documentation
Documentation lives here