This is ada_url
, a Python library for parsing and joining URLs.
Unlike the standard library's urllib.parse
module, this library is compliant with the WHATWG URL spec.
urlstring = "https://www.GOoglé.com/./path/../path2/" import ada_url # prints www.xn--googl-fsa.com, # the correctly parsed domain name according to WHATWG print(ada_url.URL(urlstring).hostname) # prints /path2/ # the correctly parsed pathname according to WHATWG print(ada_url.URL(urlstring).pathname) import urllib #prints www.googlé.com print(urllib.parse.urlparse(urlstring).hostname) #prints /./path/../path2/ print(urllib.parse.urlparse(urlstring).path)
This package exposes a URL
class that is intended to match the one described in the WHATWG URL spec.
>>> import ada_url >>> ada_url.URL('https://example.org/path/../file.txt') as urlobj: >>> urlobj.host = 'example.com' >>> new_url = urlobj.href >>> new_url 'https://example.com/file.txt'
It also provides some higher level functions for parsing and manipulating URLs.
>>> import ada_url >>> ada_url.check_url('https://example.org') True >>> ada_url.join_url( 'https://example.org/dir/child.txt', '../parent.txt' ) 'https://example.org/parent.txt' >>> ada_url.normalize_url('https://example.org/dir/../parent.txt') 'https://example.org/parent.txt' >>> ada_url.parse_url('https://user:pass@example.org:80/api?q=1#2') { 'href': 'https://user:pass@example.org:80/api?q=1#2', 'username': 'user', 'password': 'pass', 'protocol': 'https:', 'host': 'example.org:80', 'port': '80', 'hostname': 'example.org', 'pathname': '/api', 'search': '?q=1', 'hash': '#2' } >>> ada_url.replace_url('http://example.org:80', protocol='https:') 'https://example.org/'
You can find more documentation at Read the Docs.
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4