Fast Python bindings for Ada, a fast and WHATWG spec-compliant URL parser. This is the URL parser used in projects like Node.js.
Binary wheels are available for most platforms. If no wheel is available for your platform, a compiler supporting C++17 or later is required to build the underlying Ada library.
Unlike the standard library's urllib.parse module, this library is compliant with the WHATWG URL specification.
    import can_ada
    urlstring = "https://www.GOoglé.com/./path/../path2/"
    url = can_ada.parse(urlstring)
    # prints www.xn--googl-fsa.com, the correctly parsed domain name according
    # to WHATWG
    print(url.hostname)
    # prints /path2/, which is the correctly parsed pathname according to WHATWG
    print(url.pathname)

    import urllib.parse
    urlstring = "https://www.GOoglé.com/./path/../path2/"
    url = urllib.parse.urlparse(urlstring)
    # prints www.googlé.com
    print(url.hostname)
    # prints /./path/../path2/
    print(url.path)
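To make the difference concrete, here is a rough standard-library sketch of the two normalizations that urllib.parse skips in the example above: converting the hostname to its ASCII (Punycode) form and collapsing "." and ".." path segments. The helper names `remove_dot_segments` and `approximate_whatwg_view` are hypothetical, Python's built-in `idna` codec implements the older IDNA 2003 rules rather than the processing WHATWG specifies, and the sketch covers only these two steps. It is an illustration of the gap, not a substitute for can_ada.

```python
import urllib.parse


def remove_dot_segments(path: str) -> str:
    """Collapse "." and ".." segments in an absolute path (simplified)."""
    segments = path.split("/")
    output = []
    for i, segment in enumerate(segments[1:], start=1):
        is_last = i == len(segments) - 1
        if segment == ".":
            # A trailing "." keeps the trailing slash.
            if is_last:
                output.append("")
        elif segment == "..":
            # Step up one level, never above the root.
            if output:
                output.pop()
            if is_last:
                output.append("")
        else:
            output.append(segment)
    return "/" + "/".join(output)


def approximate_whatwg_view(url_string: str) -> tuple[str, str]:
    """Return (ascii_hostname, normalized_path); an approximation only."""
    parts = urllib.parse.urlsplit(url_string)
    hostname = (parts.hostname or "").lower()
    # Assumption: the IDNA 2003 codec is close enough for this example.
    ascii_host = hostname.encode("idna").decode("ascii")
    return ascii_host, remove_dot_segments(parts.path)


host, path = approximate_whatwg_view("https://www.GOoglé.com/./path/../path2/")
print(host)
print(path)
```

For the sample URL this yields the same hostname and pathname that can_ada reports, but many other WHATWG rules (percent-encoding, special schemes, port handling, and so on) are not modeled here.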
Parsing is simple:
    from can_ada import parse
    url = parse("https://tkte.ch/search?q=canada")
    print(url.protocol) # https:
    print(url.host) # tkte.ch
    print(url.pathname) # /search
    print(url.search) # ?q=canada
You can also modify URLs:
    from can_ada import parse
    url = parse("https://tkte.ch/search?q=canada")
    url.host = "google.com"
    url.search = "?q=canada&safe=off"
    print(url) # https://google.com/search?q=canada&safe=off
can_ada also supports the URLSearchParams API:
    from can_ada import URLSearchParams
    params = URLSearchParams("q=canada&safe=off")
    params.append("page", "2")
    params.append("page", "3")
    params["q"] = "usa"
    print(params) # q=usa&safe=off&page=2&page=3
    print(params.has("q")) # True
    print(params.get("page")) # 2
    print(params.get_all("page")) # ["2", "3"]
    print(params.keys()) # ["q", "safe", "page"]
    print(params.values()) # ["usa", "off", "2", "3"]
We find that can_ada is typically ~4x faster than urllib:
---------------------------------------------------------------------------------
Name (time in ms)               Min                Max               Mean
---------------------------------------------------------------------------------
test_can_ada_parse          54.1304 (1.0)      54.6734 (1.0)      54.3699 (1.0)
test_ada_python_parse      107.5653 (1.99)    108.1666 (1.98)    107.7817 (1.98)
test_urllib_parse          251.5167 (4.65)    255.1327 (4.67)    253.2407 (4.66)
---------------------------------------------------------------------------------
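For a quick informal comparison on your own machine, a small timeit harness along the following lines may be enough. This is a sketch and not the project's benchmark suite: `SAMPLE_URLS` and `time_parser` are hypothetical names, the sample URLs are simply taken from the examples above, and can_ada is only timed if it can be imported.

```python
import timeit
import urllib.parse

try:
    import can_ada  # optional: only timed when installed
except ImportError:
    can_ada = None

# Hypothetical sample data, not the corpus behind the table above.
SAMPLE_URLS = [
    "https://tkte.ch/search?q=canada",
    "https://www.GOoglé.com/./path/../path2/",
    "https://google.com/search?q=canada&safe=off",
]


def time_parser(parse, urls, repeat=5, number=1000):
    """Return the best total time in seconds over `repeat` runs,
    where each run parses every URL in `urls` `number` times."""
    def run():
        for u in urls:
            parse(u)
    return min(timeit.repeat(run, repeat=repeat, number=number))


if __name__ == "__main__":
    results = {
        "urllib.parse.urlparse": time_parser(urllib.parse.urlparse, SAMPLE_URLS),
    }
    if can_ada is not None:
        results["can_ada.parse"] = time_parser(can_ada.parse, SAMPLE_URLS)
    # Fastest first.
    for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
        print(f"{name}: {seconds * 1000:.3f} ms")
```

Note that urllib.parse caches results for inputs it has seen recently, so re-parsing a tiny, repeated sample like this flatters it; a larger set of distinct URLs gives a fairer picture.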
To run the benchmarks locally, use: