Given an HTML document, extract and clean up the main body text and title.
This is a Python port of a Ruby port of arc90's Readability project.
Installation
It's easy using pip, just run:
$ pip install readability-lxml
Alternatively, you can install it with conda; just run:
$ conda install -c conda-forge readability-lxml
Usage
>>> import requests
>>> from readability import Document
>>> response = requests.get('http://example.com')
>>> doc = Document(response.content)
>>> doc.title()
'Example Domain'
>>> doc.summary()
"""<html><body><div><body id="readabilityBody">\n<div>\n <h1>Example Domain</h1>\n <p>This domain is established to be used for illustrative examples in documents. You may use this\n domain in examples without prior coordination or asking for permission.</p> \n <p><a href="http://www.iana.org/domains/example">More information...</a></p>\n</div> \n</body>\n</div></body></html>"""
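As the output above shows, doc.summary() returns HTML rather than plain text. If plain text is what you need, one option is to strip the tags afterwards. The following is a minimal sketch that uses only the standard library's html.parser. The names html_to_text and _TextCollector are hypothetical helpers, not part of readability-lxml, and SUMMARY_HTML is a shortened copy of the output above, inlined so the sketch runs without network access.

```python
from html.parser import HTMLParser

# Shortened copy of the doc.summary() output shown above (an assumption for
# illustration), inlined so the sketch runs without network access.
SUMMARY_HTML = (
    '<html><body><div><body id="readabilityBody">\n<div>\n'
    '    <h1>Example Domain</h1>\n'
    '    <p>This domain is established to be used for illustrative examples.</p>\n'
    '    <p><a href="http://www.iana.org/domains/example">More information...</a></p>\n'
    '</div>\n</body>\n</div></body></html>'
)


class _TextCollector(HTMLParser):
    """Hypothetical helper: gathers text nodes and ignores all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Collapse runs of whitespace and skip whitespace-only nodes.
        text = " ".join(data.split())
        if text:
            self.chunks.append(text)


def html_to_text(html):
    """Return the text of an HTML fragment, one text node per line."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return "\n".join(collector.chunks)


print(html_to_text(SUMMARY_HTML))
```

In practice you would pass the result of doc.summary() to html_to_text instead of SUMMARY_HTML.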
This code is licensed under the Apache License 2.0.