Source: https://pypi.python.org/pypi/readability-lxml

readability-lxml · PyPI

Project description

python-readability

Given an HTML document, extract and clean up the main body text and title.

This is a Python port of a Ruby port of arc90's Readability project.

Installation

Installing with pip is easy; just run:

$ pip install readability-lxml

Alternatively, you can install it with conda:

$ conda install -c conda-forge readability-lxml
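To confirm that either command worked, the installed version can be read back with the standard library's importlib.metadata. This is a minimal sketch; the helper name installed_version is hypothetical. Note that it queries the distribution name readability-lxml, while the module you import in code is readability.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        # Look up the distribution (pip/conda) name, not the import name.
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("readability-lxml")
    print(found or "readability-lxml is not installed")
```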

Usage

>>> import requests
>>> from readability import Document

>>> response = requests.get('http://example.com')
>>> doc = Document(response.content)
>>> doc.title()
'Example Domain'

>>> doc.summary()
"""<html><body><div><body id="readabilityBody">\n<div>\n    <h1>Example Domain</h1>\n
<p>This domain is established to be used for illustrative examples in documents. You may
use this\n    domain in examples without prior coordination or asking for permission.</p>
\n    <p><a href="http://www.iana.org/domains/example">More information...</a></p>\n</div>
\n</body>\n</div></body></html>"""

Change Log

0.8.4 Better CJK support, thanks @cdhigh
0.8.3.1 Support for Python 3.8 - 3.13
0.8.3 We can now save all images via keep_all_images=True (by default only one main image is kept), thanks @botlabsDev
0.8.2 Added article author(s) (thanks @mattblaha)
0.8.1 Fixed processing of non-ASCII HTML documents via regexps.
0.8 Replaced XHTML output with HTML5 output in summary().
0.7.1 Support for Python 3.7. Fixed a slowdown when processing documents with lots of spaces.
0.7 Improved handling of HTML5 tags. Fixed stripping of unwanted HTML nodes (previously only the first matching node was removed).
0.6 Finally, a release that supports Python 2.6, 2.7 and 3.3 - 3.6
0.5 Preparing a release to support Python 2.6, 2.7, 3.3 and 3.4
0.4 Added video loading and allowed more images per paragraph
0.3 Added Document.encoding, positive_keywords and negative_keywords

Licensing

This code is licensed under the Apache License 2.0.

Thanks to

Latest readability.js
Ruby port by starrhorne and iterationlabs
Python port by gfxmonk
Decruft effort to move to lxml
"BR to P" fix from readability.js which improves quality for smaller texts
Contributions from GitHub users

Download files

Download the file for your platform.

Source Distribution

Details for the file readability_lxml-0.8.4.1.tar.gz.

File metadata

Download URL: readability_lxml-0.8.4.1.tar.gz
Upload date: May 3, 2025
Size: 22.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.8.10

File hashes

SHA256: 9d2924f5942dd7f37fb4da353263b22a3e877ccf922d0e45e348e4177b035a53
MD5: 14af137865e8220ac2af2fcabf5ea931
BLAKE2b-256: 553edc87d97532ddad58af786ec89c7036182e352574c1cba37bf2bf783d2b15


Built Distribution

Details for the file readability_lxml-0.8.4.1-py3-none-any.whl.

File hashes

SHA256: 874c0cea22c3bf2b78c7f8df831bfaad3c0a89b7301d45a188db581652b4b465
MD5: 993c47451250d45104f41a4886e1ed77
BLAKE2b-256: c7752cc58965097e351415af420be81c4665cf80da52a17ef43c01ffbe2caf91

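The SHA256 digests listed for the two 0.8.4.1 files can be used to check a manual download before installing it. A minimal sketch using only hashlib; the helper names are hypothetical, and the script assumes the files were saved under their published names in the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digests as listed on this page for release 0.8.4.1.
PUBLISHED_SHA256 = {
    "readability_lxml-0.8.4.1.tar.gz": "9d2924f5942dd7f37fb4da353263b22a3e877ccf922d0e45e348e4177b035a53",
    "readability_lxml-0.8.4.1-py3-none-any.whl": "874c0cea22c3bf2b78c7f8df831bfaad3c0a89b7301d45a188db581652b4b465",
}


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large archives are never read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """hexdigest() is lowercase, so normalize the expected value before comparing."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumes the downloads sit in the current working directory.
    for name, expected in PUBLISHED_SHA256.items():
        path = Path(name)
        if not path.exists():
            print(f"{name}: not found")
        else:
            status = "OK" if matches_published_hash(path, expected) else "MISMATCH"
            print(f"{name}: {status}")
```

pip can enforce the same check itself in hash-checking mode, using --hash=sha256:... entries in a requirements file.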
