Python's standard urllib.parse module (urllib and urlparse in Python 2) provides a number of URL-related functions, but using these functions to perform common URL operations proves tedious. Furl makes parsing and manipulating URLs easy.
Furl is well tested, released into the public domain under the Unlicense, and supports Python 3 and PyPy3.
Furl is looking for a lead contributor and maintainer. Would you love to lead furl and make working with URLs a joy for everyone in Python? Please reach out and let me know!
Code time: Paths and query arguments are easy. Really easy.
>>> from furl import furl
>>> f = furl('http://www.google.com/?one=1&two=2')
>>> f /= 'path'
>>> del f.args['one']
>>> f.args['three'] = '3'
>>> f.url
'http://www.google.com/path?two=2&three=3'
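For comparison, here is a rough sketch of the same three edits using only the standard library's urllib.parse, not furl. The helper name stdlib_equivalent is hypothetical; the point is the split, decode, re-encode, and reassemble bookkeeping that furl handles for you.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def stdlib_equivalent(url: str) -> str:
    """Append a 'path' segment, drop 'one', and add three=3 using urllib.parse only."""
    parts = urlsplit(url)
    # Join the new segment onto the existing path without doubling the '/'.
    path = parts.path.rstrip('/') + '/path'
    # Decode the query into ordered (key, value) pairs so duplicate keys survive.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    pairs = [(key, value) for key, value in pairs if key != 'one']
    pairs.append(('three', '3'))
    # Re-encode the query and reassemble the URL from its five components.
    return urlunsplit((parts.scheme, parts.netloc, path, urlencode(pairs), parts.fragment))


print(stdlib_equivalent('http://www.google.com/?one=1&two=2'))
# http://www.google.com/path?two=2&three=3
```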
Or use furl's inline modification methods.
>>> furl('http://www.google.com/?one=1').add({'two':'2'}).url
'http://www.google.com/?one=1&two=2'
>>> furl('http://www.google.com/?one=1&two=2').set({'three':'3'}).url
'http://www.google.com/?three=3'
>>> furl('http://www.google.com/?one=1&two=2').remove(['one']).url
'http://www.google.com/?two=2'
Encoding is handled for you. Unicode, too.
>>> f = furl('http://www.google.com/')
>>> f.path = 'some encoding here'
>>> f.args['and some encoding'] = 'here, too'
>>> f.url
'http://www.google.com/some%20encoding%20here?and+some+encoding=here,+too'
>>> f.set(host=u'ドメイン.テスト', path=u'джк', query=u'☃=☺')
>>> f.url
'http://xn--eckwd4c7c.xn--zckzah/%D0%B4%D0%B6%D0%BA?%E2%98%83=%E2%98%BA'
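The encoded output above can be reproduced with the standard library, which makes it easier to see which encoding applies to which URL component. This is a sketch, not furl's implementation: it assumes path segments are encoded like urllib.parse.quote(), query keys and values like quote_plus() (keeping the comma via safe=',' to match the output above), and hosts with Python's built-in IDNA codec.

```python
from urllib.parse import quote, quote_plus

# Path text: spaces become %20 and non-ASCII text is UTF-8 percent-encoded.
print(quote('some encoding here'))  # some%20encoding%20here
print(quote('джк'))                 # %D0%B4%D0%B6%D0%BA

# Query keys and values: spaces become '+'; safe=',' keeps the comma literal.
print(quote_plus('and some encoding') + '=' + quote_plus('here, too', safe=','))
# and+some+encoding=here,+too
print(quote_plus('☃') + '=' + quote_plus('☺'))  # %E2%98%83=%E2%98%BA

# Internationalized host names are converted to ASCII (punycode) via IDNA.
print('ドメイン.テスト'.encode('idna').decode('ascii'))  # xn--eckwd4c7c.xn--zckzah
```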
Fragments also have a path and a query.
>>> f = furl('http://www.google.com/')
>>> f.fragment.path.segments = ['two', 'directories']
>>> f.fragment.args = {'one': 'argument'}
>>> f.url
'http://www.google.com/#two/directories?one=argument'
Installation
Installing furl with pip is easy.
$ pip install furl
API
furl objects let you access and modify the various components of a URL.
scheme://username:password@host:port/path?query#fragment
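- scheme: None means no scheme; an empty string means a protocol-relative URL, like //www.google.com.
- username: the username for authentication.
- password: the password for authentication with username.
- host: the domain name, IPv4, or IPv6 address.
- port: if no port is specified, the default port for the scheme is inferred where possible (e.g. port 80 for the scheme http).
- path: a Path object comprised of path segments.
- query: a Query object comprised of key:value query arguments.
- fragment: a Fragment object comprised of a Path object and a Query object separated by an optional ? separator.
scheme, username, password, and host are strings or None. port is an integer or None.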
>>> f = furl('http://user:pass@www.google.com:99/')
>>> f.scheme, f.username, f.password, f.host, f.port
('http', 'user', 'pass', 'www.google.com', 99)
furl infers the default port for common schemes.
>>> f = furl('https://secure.google.com/')
>>> f.port
443
>>> f = furl('unknown://www.google.com/')
>>> print(f.port)
None
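A minimal sketch of the default-port inference described above, using urllib.parse.urlsplit(). DEFAULT_PORTS and infer_port are hypothetical names, and the table is a small illustrative subset, not furl's actual list of known schemes.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical, deliberately small scheme -> default port table.
DEFAULT_PORTS = {'ftp': 21, 'ssh': 22, 'http': 80, 'https': 443}


def infer_port(url: str) -> Optional[int]:
    """Return the explicit port if present, else the scheme's default, else None."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)


print(infer_port('http://user:pass@www.google.com:99/'))  # 99
print(infer_port('https://secure.google.com/'))           # 443
print(infer_port('unknown://www.google.com/'))            # None
```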
netloc is the string combination of username, password, host, and port, not including port if it's None or the default port for the provided scheme.
>>> furl('http://www.google.com/').netloc
'www.google.com'
>>> furl('http://www.google.com:99/').netloc
'www.google.com:99'
>>> furl('http://user:pass@www.google.com:99/').netloc
'user:pass@www.google.com:99'
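The netloc rule above can be sketched in a few lines. build_netloc is a hypothetical helper and DEFAULT_PORTS a hypothetical subset of schemes; the sketch also assumes the username and password are percent-encoded so that a literal @ or : inside them can't be misread as a delimiter.

```python
from typing import Optional
from urllib.parse import quote

# Hypothetical subset of scheme -> default port.
DEFAULT_PORTS = {'http': 80, 'https': 443}


def build_netloc(scheme: str, username: Optional[str], password: Optional[str],
                 host: str, port: Optional[int]) -> str:
    """Combine userinfo, host, and port, omitting a None or default port."""
    netloc = host
    if port is not None and port != DEFAULT_PORTS.get(scheme):
        netloc += ':%d' % port
    if username is not None:
        # safe='' encodes every reserved character, including '@' and ':'.
        userinfo = quote(username, safe='')
        if password is not None:
            userinfo += ':' + quote(password, safe='')
        netloc = userinfo + '@' + netloc
    return netloc


print(build_netloc('http', None, None, 'www.google.com', 80))      # www.google.com
print(build_netloc('http', None, None, 'www.google.com', 99))      # www.google.com:99
print(build_netloc('http', 'user', 'pass', 'www.google.com', 99))  # user:pass@www.google.com:99
```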
origin is the string combination of scheme, host, and port, not including port if it's None or the default port for the provided scheme.
>>> furl('http://www.google.com/').origin
'http://www.google.com'
>>> furl('http://www.google.com:99/').origin
'http://www.google.com:99'
Path
URL paths in furl are Path objects that have segments, a list of zero or more path segments that can be manipulated directly. Path segments in segments are percent-decoded and all interaction with segments should take place with percent-decoded strings.
>>> f = furl('http://www.google.com/a/large%20ish/path')
>>> f.path
Path('/a/large ish/path')
>>> f.path.segments
['a', 'large ish', 'path']
>>> str(f.path)
'/a/large%20ish/path'
Manipulation
>>> f.path.segments = ['a', 'new', 'path', '']
>>> str(f.path)
'/a/new/path/'
>>> f.path = 'o/hi/there/with%20some%20encoding/'
>>> f.path.segments
['o', 'hi', 'there', 'with some encoding', '']
>>> str(f.path)
'/o/hi/there/with%20some%20encoding/'
>>> f.url
'http://www.google.com/o/hi/there/with%20some%20encoding/'
>>> f.path.segments = ['segments', 'are', 'maintained', 'decoded', '^`<>[]"#/?']
>>> str(f.path)
'/segments/are/maintained/decoded/%5E%60%3C%3E%5B%5D%22%23%2F%3F'
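A sketch of the encoded-path and decoded-segments round trip described above, using only urllib.parse. The helper names are hypothetical, and the sketch assumes each segment is encoded with quote(segment, safe='') so that a literal / inside a segment becomes %2F rather than splitting it.

```python
from typing import List
from urllib.parse import quote, unquote


def segments_from_path(path: str) -> List[str]:
    """Split an encoded path on '/' and percent-decode each segment."""
    body = path[1:] if path.startswith('/') else path
    # A trailing '/' yields a final '' segment, which marks a directory.
    return [unquote(segment) for segment in body.split('/')] if body else []


def path_from_segments(segments: List[str], isabsolute: bool = True) -> str:
    """Percent-encode each decoded segment and join them with '/'."""
    encoded = '/'.join(quote(segment, safe='') for segment in segments)
    return '/' + encoded if isabsolute else encoded


print(segments_from_path('/a/large%20ish/path'))
# ['a', 'large ish', 'path']
print(path_from_segments(['a', 'new', 'path', '']))
# /a/new/path/
print(path_from_segments(['segments', 'are', 'maintained', 'decoded', '^`<>[]"#/?']))
# /segments/are/maintained/decoded/%5E%60%3C%3E%5B%5D%22%23%2F%3F
```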
A path that starts with / is considered absolute, and a Path can be absolute or not as specified (or set) by the boolean attribute isabsolute. URL Paths have a special restriction: they must be absolute if a netloc (username, password, host, and/or port) is present. This restriction exists because a URL path must start with / to separate itself from the netloc, if present. Fragment Paths have no such limitation, and their isabsolute can be True or False without restriction.
Here's a URL Path example that illustrates how isabsolute becomes True and read-only in the presence of a netloc.
>>> f = furl('/url/path')
>>> f.path.isabsolute
True
>>> f.path.isabsolute = False
>>> f.url
'url/path'
>>> f.host = 'blaps.ru'
>>> f.url
'blaps.ru/url/path'
>>> f.path.isabsolute
True
>>> f.path.isabsolute = False
Traceback (most recent call last):
  ...
AttributeError: Path.isabsolute is True and read-only for URLs with a netloc (a username, password, host, and/or port). URL paths must be absolute if a netloc exists.
>>> f.url
'blaps.ru/url/path'
In contrast, the isabsolute attribute of Fragment Paths isn't bound by the same read-only restriction. URL fragments are always prefixed by a # character and don't need to be separated from the netloc.
>>> f = furl('http://www.google.com/#/absolute/fragment/path/')
>>> f.fragment.path.isabsolute
True
>>> f.fragment.path.isabsolute = False
>>> f.url
'http://www.google.com/#absolute/fragment/path/'
>>> f.fragment.path.isabsolute = True
>>> f.url
'http://www.google.com/#/absolute/fragment/path/'
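A minimal sketch of the isabsolute rule above: a property that reports True while a netloc is present and refuses to be set to False. PathSketch and its has_netloc flag are hypothetical; with has_netloc left False, the class behaves like an unrestricted fragment path.

```python
from typing import List


class PathSketch:
    """Hypothetical path whose isabsolute is forced True when a netloc exists."""

    def __init__(self, segments: List[str], isabsolute: bool = True, has_netloc: bool = False):
        self.segments = list(segments)
        self._isabsolute = isabsolute
        self.has_netloc = has_netloc

    @property
    def isabsolute(self) -> bool:
        # With a netloc present, the path must start with '/' to be separated from it.
        return True if self.has_netloc else self._isabsolute

    @isabsolute.setter
    def isabsolute(self, value: bool) -> None:
        if self.has_netloc and not value:
            raise AttributeError('isabsolute is True and read-only when a netloc exists.')
        self._isabsolute = bool(value)

    def __str__(self) -> str:
        joined = '/'.join(self.segments)
        return '/' + joined if self.isabsolute else joined


path = PathSketch(['url', 'path'])
path.isabsolute = False
print(path)             # url/path
path.has_netloc = True  # e.g. a host was set on the URL
print(path)             # /url/path
```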
A path that ends with / is considered a directory, and otherwise considered a file. The Path attribute isdir returns True if the path is a directory, False otherwise. Conversely, the attribute isfile returns True if the path is a file, False otherwise.
>>> f = furl('http://www.google.com/a/directory/')
>>> f.path.isdir
True
>>> f.path.isfile
False
>>> f = furl('http://www.google.com/a/file')
>>> f.path.isdir
False
>>> f.path.isfile
True
A path can be normalized with normalize(), and normalize() returns the Path object for method chaining.
>>> f = furl('http://www.google.com////a/./b/lolsup/../c/')
>>> f.path.normalize()
>>> f.url
'http://www.google.com/a/b/c/'
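The directory test and normalization above can be approximated with posixpath from the standard library. This is a sketch under stated assumptions, not furl's code: is_dir(), is_file(), and normalize_path() are hypothetical helpers, and the sketch restores a trailing / so that a directory stays a directory after normalization.

```python
import posixpath


def is_dir(path: str) -> bool:
    """A path ending in '/' is treated as a directory."""
    return path.endswith('/')


def is_file(path: str) -> bool:
    return not is_dir(path)


def normalize_path(path: str) -> str:
    """Collapse duplicate slashes and resolve '.' and '..' segments."""
    if not path:
        return path
    normalized = posixpath.normpath(path)
    # normpath() preserves exactly two leading slashes (a POSIX rule), so collapse them.
    if normalized.startswith('//'):
        normalized = '/' + normalized.lstrip('/')
    # normpath() drops the trailing '/'; restore it so directories stay directories.
    if is_dir(path) and not normalized.endswith('/'):
        normalized += '/'
    return normalized


print(normalize_path('////a/./b/lolsup/../c/'))     # /a/b/c/
print(is_dir('/a/directory/'), is_file('/a/file'))  # True True
```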
Path segments can also be appended with the slash operator, like with pathlib.Path.
>>> from __future__ import division  # For Python 2.x.
>>>
>>> f = furl('path')
>>> f.path /= 'with'
>>> f.path = f.path / 'more' / 'path segments/'
>>> f.url
'/path/with/more/path%20segments/'
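The slash operator can be sketched with __truediv__, the same hook pathlib.Path uses. SlashPath is a hypothetical class: each / returns a new object with the right-hand text split on / and appended as segments. Because no __itruediv__ is defined, /= falls back to __truediv__ and rebinds the name.

```python
from typing import Iterable
from urllib.parse import quote


class SlashPath:
    """Hypothetical absolute path built by appending segments with '/'."""

    def __init__(self, segments: Iterable[str] = ()):
        self.segments = list(segments)

    def __truediv__(self, other: str) -> 'SlashPath':
        # Drop a trailing '' (directory marker) before appending new segments.
        base = self.segments[:-1] if self.segments and self.segments[-1] == '' else self.segments
        return SlashPath(base + other.split('/'))

    def __str__(self) -> str:
        return '/' + '/'.join(quote(segment, safe='') for segment in self.segments)


p = SlashPath(['path'])
p /= 'with'
p = p / 'more' / 'path segments/'
print(p)  # /path/with/more/path%20segments/
```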
For a dictionary representation of a path, use asdict().
>>> f = furl('http://www.google.com/some/enc%20oding')
>>> f.path.asdict()
{ 'encoded': '/some/enc%20oding',
  'isabsolute': True,
  'isdir': False,
  'isfile': True,
  'segments': ['some', 'enc oding'] }
Query
URL queries in furl are Query objects that have params, a one-dimensional ordered multivalue dictionary of query keys and values. Query keys and values in params are percent-decoded and all interaction with params should take place with percent-decoded strings.
>>> f = furl('http://www.google.com/?one=1&two=2')
>>> f.query
Query('one=1&two=2')
>>> f.query.params
omdict1D([('one', '1'), ('two', '2')])
>>> str(f.query)
'one=1&two=2'
furl objects and Fragment objects (covered below) contain a Query object, and args is provided as a shortcut on these objects to access query.params.
>>> f = furl('http://www.google.com/?one=1&two=2')
>>> f.query.params
omdict1D([('one', '1'), ('two', '2')])
>>> f.args
omdict1D([('one', '1'), ('two', '2')])
>>> f.args is f.query.params
True
Manipulation
params is a one-dimensional ordered multivalue dictionary that maintains method parity with Python's standard dictionary.
>>> f.query = 'silicon=14&iron=26&inexorable%20progress=vae%20victus'
>>> f.query.params
omdict1D([('silicon', '14'), ('iron', '26'), ('inexorable progress', 'vae victus')])
>>> del f.args['inexorable progress']
>>> f.args['magnesium'] = '12'
>>> f.args
omdict1D([('silicon', '14'), ('iron', '26'), ('magnesium', '12')])
params can also store multiple values for the same key because it's a multivalue dictionary.
>>> f = furl('http://www.google.com/?space=jams&space=slams')
>>> f.args['space']
'jams'
>>> f.args.getlist('space')
['jams', 'slams']
>>> f.args.addlist('repeated', ['1', '2', '3'])
>>> str(f.query)
'space=jams&space=slams&repeated=1&repeated=2&repeated=3'
>>> f.args.popvalue('space')
'slams'
>>> f.args.popvalue('repeated', '2')
'2'
>>> str(f.query)
'space=jams&repeated=1&repeated=3'
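Without an ordered multivalue dictionary, the same operations can be sketched on a plain list of (key, value) pairs from urllib.parse.parse_qsl(). The helpers below are hypothetical stand-ins for getlist(), addlist(), and popvalue(), not the orderedmultidict implementation; they assume popvalue() without a value removes the last value for the key, as in the example above.

```python
from typing import List, Optional, Tuple
from urllib.parse import parse_qsl, urlencode

Pairs = List[Tuple[str, str]]


def getlist(pairs: Pairs, key: str) -> List[str]:
    return [v for k, v in pairs if k == key]


def addlist(pairs: Pairs, key: str, values: List[str]) -> None:
    pairs.extend((key, value) for value in values)


def popvalue(pairs: Pairs, key: str, value: Optional[str] = None) -> str:
    """Remove and return the last value for key, or the given value if specified."""
    for i in range(len(pairs) - 1, -1, -1):
        k, v = pairs[i]
        if k == key and (value is None or v == value):
            del pairs[i]
            return v
    raise KeyError(key)


pairs = parse_qsl('space=jams&space=slams', keep_blank_values=True)
print(getlist(pairs, 'space'))  # ['jams', 'slams']
addlist(pairs, 'repeated', ['1', '2', '3'])
print(popvalue(pairs, 'space'), popvalue(pairs, 'repeated', '2'))  # slams 2
print(urlencode(pairs))  # space=jams&repeated=1&repeated=3
```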
params is one-dimensional. If a list of values is provided as a query value, that list is interpreted as multiple values.
>>> f = furl()
>>> f.args['repeated'] = ['1', '2', '3']
>>> f.add(args={'space':['jams', 'slams']})
>>> str(f.query)
'repeated=1&repeated=2&repeated=3&space=jams&space=slams'
This makes sense: URL queries are inherently one-dimensional, and query values can't have native subvalues.
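The standard library follows the same one-dimensional convention: urllib.parse.urlencode() with doseq=True expands a list value into repeated keys, and parse_qs() collects repeated keys back into lists. This is standard-library behavior shown for comparison, not furl's.

```python
from urllib.parse import parse_qs, urlencode

# A list value becomes one key=value pair per element.
query = urlencode({'repeated': ['1', '2', '3'], 'space': ['jams', 'slams']}, doseq=True)
print(query)  # repeated=1&repeated=2&repeated=3&space=jams&space=slams

# Parsing regroups the repeated keys into lists.
print(parse_qs(query))  # {'repeated': ['1', '2', '3'], 'space': ['jams', 'slams']}
```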
See the orderedmultidict documentation for more information on interacting with the ordered multivalue dictionary params.
Parameters
To produce an empty query argument, like http://sprop.su/?param=, set the argument's value to the empty string.
>>> f = furl('http://sprop.su')
>>> f.args['param'] = ''
>>> f.url
'http://sprop.su/?param='
To produce an empty query argument without a trailing =, use None as the parameter value.
>>> f = furl('http://sprop.su')
>>> f.args['param'] = None
>>> f.url
'http://sprop.su/?param'
encode(delimiter='&', quote_plus=True, dont_quote='') can be used to encode query strings with delimiters like ;, encode spaces as + instead of %20 (i.e. application/x-www-form-urlencoded encoding), or avoid percent-encoding valid query characters entirely (valid query characters are /?:@-._~!$&'()*+,;=).
>>> f.query = 'space=jams&woofs=squeeze+dog'
>>> f.query.encode()
'space=jams&woofs=squeeze+dog'
>>> f.query.encode(';')
'space=jams;woofs=squeeze+dog'
>>> f.query.encode(quote_plus=False)
'space=jams&woofs=squeeze%20dog'
dont_quote accepts True, False, or a string of valid query characters to not percent-encode. If True, all valid query characters /?:@-._~!$&'()*+,;= aren't percent-encoded.
>>> f.query = 'one,two/three'
>>> f.query.encode()
'one%2Ctwo%2Fthree'
>>> f.query.encode(dont_quote=True)
'one,two/three'
>>> f.query.encode(dont_quote=',')
'one,two%2Fthree'
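A sketch of how the delimiter, quote_plus, and dont_quote options could map onto urllib.parse, including the convention that a None value produces a bare key with no =. encode_query() is hypothetical and is not furl's Query.encode(); its quote_plus option is renamed use_plus to avoid shadowing urllib.parse.quote_plus, and it assumes dont_quote=True means "leave every valid query character unencoded".

```python
from typing import List, Optional, Tuple, Union
from urllib.parse import quote, quote_plus

VALID_QUERY_CHARS = "/?:@-._~!$&'()*+,;="


def encode_query(pairs: List[Tuple[str, Optional[str]]], delimiter: str = '&',
                 use_plus: bool = True, dont_quote: Union[bool, str] = '') -> str:
    """Encode (key, value) pairs; a None value yields a bare key without '='."""
    if dont_quote is True:
        safe = VALID_QUERY_CHARS
    elif dont_quote is False:
        safe = ''
    else:
        safe = dont_quote
    # quote_plus() turns spaces into '+'; quote() turns them into '%20'.
    quoter = quote_plus if use_plus else quote
    encoded = []
    for key, value in pairs:
        if value is None:
            encoded.append(quoter(key, safe=safe))
        else:
            encoded.append(quoter(key, safe=safe) + '=' + quoter(value, safe=safe))
    return delimiter.join(encoded)


pairs = [('space', 'jams'), ('woofs', 'squeeze dog')]
print(encode_query(pairs))                  # space=jams&woofs=squeeze+dog
print(encode_query(pairs, ';'))             # space=jams;woofs=squeeze+dog
print(encode_query(pairs, use_plus=False))  # space=jams&woofs=squeeze%20dog
print(encode_query([('one,two/three', None)], dont_quote=','))  # one,two%2Fthree
print(encode_query([('param', '')]), encode_query([('param', None)]))  # param= param
```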
For a dictionary representation of a query, use asdict().
>>> f = furl('http://www.google.com/?space=ja+ms&space=slams')
>>> f.query.asdict()
{ 'encoded': 'space=ja+ms&space=slams',
  'params': [('space', 'ja ms'),
             ('space', 'slams')] }
Fragment
URL fragments in furl are Fragment objects that have a Path path and Query query separated by an optional ? separator.
>>> f = furl('http://www.google.com/#/fragment/path?with=params')
>>> f.fragment
Fragment('/fragment/path?with=params')
>>> f.fragment.path
Path('/fragment/path')
>>> f.fragment.query
Query('with=params')
>>> f.fragment.separator
True
Manipulation of Fragments is done via the Fragment's Path and Query instances, path and query.
>>> f = furl('http://www.google.com/#/fragment/path?with=params')
>>> str(f.fragment)
'/fragment/path?with=params'
>>> f.fragment.path.segments.append('file.ext')
>>> str(f.fragment)
'/fragment/path/file.ext?with=params'
>>> f = furl('http://www.google.com/#/fragment/path?with=params')
>>> str(f.fragment)
'/fragment/path?with=params'
>>> f.fragment.args['new'] = 'yep'
>>> str(f.fragment)
'/fragment/path?with=params&new=yep'
Creating hash-bang fragments with furl illustrates the use of Fragment's boolean attribute separator. When separator is False, the ? that separates path and query isn't included.
>>> f = furl('http://www.google.com/')
>>> f.fragment.path = '!'
>>> f.fragment.args = {'a':'dict', 'of':'args'}
>>> f.fragment.separator
True
>>> str(f.fragment)
'!?a=dict&of=args'
>>> f.fragment.separator = False
>>> str(f.fragment)
'!a=dict&of=args'
>>> f.url
'http://www.google.com/#!a=dict&of=args'
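Splitting and reassembling a fragment can be sketched with str.partition(). split_fragment() and join_fragment() are hypothetical helpers; the separator flag mirrors the behavior described above, and the sketch assumes the ? is emitted only when the query is non-empty.

```python
from typing import Tuple


def split_fragment(fragment: str) -> Tuple[str, str, bool]:
    """Split 'path?query' at the first '?'; report whether a '?' was present."""
    path, sep, query = fragment.partition('?')
    return path, query, bool(sep)


def join_fragment(path: str, query: str, separator: bool = True) -> str:
    """Rebuild a fragment, leaving out the '?' when separator is False."""
    if not query:
        return path
    return path + ('?' if separator else '') + query


print(split_fragment('/fragment/path?with=params'))  # ('/fragment/path', 'with=params', True)
print(join_fragment('!', 'a=dict&of=args'))          # !?a=dict&of=args
print(join_fragment('!', 'a=dict&of=args', False))   # !a=dict&of=args
```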
For a dictionary representation of a fragment, use asdict().
>>> f = furl('http://www.google.com/#path?args=args')
>>> f.fragment.asdict()
{ 'encoded': 'path?args=args',
  'separator': True,
  'path': { 'encoded': 'path',
            'isabsolute': False,
            'isdir': False,
            'isfile': True,
            'segments': ['path']},
  'query': { 'encoded': 'args=args',
             'params': [('args', 'args')]} }
Encoding
Furl handles encoding for you, and furl's philosophy on encoding is simple: raw URL strings should always be percent-encoded.
>>> f = furl()
>>> f.netloc = '%40user:%3Apass@google.com'
>>> f.username, f.password
('@user', ':pass')
>>> f = furl()
>>> f.path = 'supply%20percent%20encoded/path%20strings'
>>> f.path.segments
['supply percent encoded', 'path strings']
>>> f.set(query='supply+percent+encoded=query+strings,+too')
>>> f.query.params
omdict1D([('supply percent encoded', 'query strings, too')])
>>> f.set(fragment='percent%20encoded%20path?and+percent+encoded=query+too')
>>> f.fragment.path.segments
['percent encoded path']
>>> f.fragment.args
omdict1D([('and percent encoded', 'query too')])
Raw, non-URL strings should never be percent-encoded.
>>> f = furl('http://google.com')
>>> f.set(username='@prap', password=':porps')
>>> f.url
'http://%40prap:%3Aporps@google.com'
>>> f = furl()
>>> f.set(path=['path segments are', 'decoded', '<>[]"#'])
>>> str(f.path)
'/path%20segments%20are/decoded/%3C%3E%5B%5D%22%23'
>>> f.set(args={'query parameters':'and values', 'are':'decoded, too'})
>>> str(f.query)
'query+parameters=and+values&are=decoded,+too'
>>> f.fragment.path.segments = ['decoded', 'path segments']
>>> f.fragment.args = {'and decoded':'query parameters and values'}
>>> str(f.fragment)
'decoded/path%20segments?and+decoded=query+parameters+and+values'
Python's urllib.parse.quote() and urllib.parse.unquote() can be used to percent-encode and percent-decode path strings. Similarly, urllib.parse.quote_plus() and urllib.parse.unquote_plus() can be used to percent-encode and percent-decode query strings.
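A quick round trip with the urllib.parse functions, using strings from the encoding examples in this section. The outputs are standard-library behavior; furl may make different choices about which characters it leaves unencoded.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

# Decoding raw URL strings into the plain values furl exposes.
print(unquote('supply%20percent%20encoded'))  # supply percent encoded
print(unquote_plus('query+strings,+too'))     # query strings, too

# Encoding raw, non-URL strings for use in a URL.
print(quote('@prap', safe=''))                # %40prap
print(quote('path segments are', safe=''))    # path%20segments%20are
print(quote_plus('query parameters'))         # query+parameters
```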
Inline manipulation
For quick, single-line URL manipulation, the add(), set(), and remove() methods of furl objects manipulate various URL components and return the furl object for method chaining.
>>> url = 'http://www.google.com/#fragment'
>>> furl(url).add(args={'example':'arg'}).set(port=99).remove(fragment=True).url
'http://www.google.com:99/?example=arg'
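The return-self pattern that makes this chaining possible can be sketched in a few lines. ChainableURL is a hypothetical class built on urllib.parse; it supports only the three arguments used in the example above and ignores usernames, passwords, and default ports, so it is an illustration of the pattern rather than a substitute for furl.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


class ChainableURL:
    """Hypothetical minimal URL object whose mutators return self."""

    def __init__(self, url: str):
        parts = urlsplit(url)
        self.scheme, self.host, self.port = parts.scheme, parts.hostname, parts.port
        self.path, self.fragment = parts.path, parts.fragment
        self.args = parse_qsl(parts.query, keep_blank_values=True)

    def add(self, args=None) -> 'ChainableURL':
        if args:
            self.args.extend(args.items())
        return self  # Returning self is what enables chaining.

    def set(self, port=None) -> 'ChainableURL':
        if port is not None:
            self.port = port
        return self

    def remove(self, fragment=False) -> 'ChainableURL':
        if fragment:
            self.fragment = ''
        return self

    @property
    def url(self) -> str:
        netloc = self.host or ''
        if self.port is not None:
            netloc += ':%d' % self.port
        return urlunsplit((self.scheme, netloc, self.path, urlencode(self.args), self.fragment))


url = 'http://www.google.com/#fragment'
print(ChainableURL(url).add(args={'example': 'arg'}).set(port=99).remove(fragment=True).url)
# http://www.google.com:99/?example=arg
```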
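add() adds items to a furl object with the optional arguments args (a shortcut for query_params), path, query_params, fragment_path, and fragment_args. path and fragment_path accept a list of path segments or a path string to join with the existing path; query_params and fragment_args accept a dictionary of query keys and values or a list of key:value items to add.
>>> f = furl('http://www.google.com/').add(
...     path='/search', fragment_path='frag/path', fragment_args={'frag':'arg'})
>>> f.url
'http://www.google.com/search#frag/path?frag=arg'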
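set() sets items of a furl object with the optional arguments args (a shortcut for query_params), path, scheme, netloc, origin, query, query_params, fragment, fragment_path, fragment_args, fragment_separator, host, port, username, and password. fragment_separator is a boolean that determines whether there should be a ? separator between the fragment path and the fragment query.
>>> f = furl().set(
...     scheme='https', host='secure.google.com', port=99, path='index.html',
...     args={'some':'args'}, fragment='great job')
>>> f.url
'https://secure.google.com:99/index.html?some=args#great%20job'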
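remove() removes items from a furl object with the optional arguments args (a shortcut for query_params), path, query, query_params, port, fragment, fragment_path, fragment_args, username, and password. args and fragment_args take a list of query keys to remove; path takes a list of segments or a path string to remove from the end of the existing path; port and fragment take True to remove that component entirely.
>>> url = 'https://secure.google.com:99/a/path/?some=args#great job'
>>> furl(url).remove(args=['some'], path='path/', fragment=True, port=True).url
'https://secure.google.com/a/'
Miscellaneous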
Like pathlib.Path, path segments can be appended to a furl object's Path with the slash operator.
>>> from __future__ import division  # For Python 2.x.
>>> f = furl('http://www.google.com/path?example=arg#frag')
>>> f /= 'add'
>>> f = f / 'seg ments/'
>>> f.url
'http://www.google.com/path/add/seg%20ments/?example=arg#frag'
tostr(query_delimiter='&', query_quote_plus=True, query_dont_quote='') creates and returns a URL string. query_delimiter, query_quote_plus, and query_dont_quote are passed unmodified to Query.encode() as delimiter, quote_plus, and dont_quote respectively.
>>> f = furl('http://spep.ru/?a+b=c+d&two%20tap=cat%20nap%24')
>>> f.tostr()
'http://spep.ru/?a+b=c+d&two+tap=cat+nap$'
>>> f.tostr(query_delimiter=';', query_quote_plus=False)
'http://spep.ru/?a%20b=c%20d;two%20tap=cat%20nap$'
>>> f.tostr(query_dont_quote='$')
'http://spep.ru/?a+b=c+d&two+tap=cat+nap$'
furl.url is a shortcut for furl.tostr().
>>> f.url
'http://spep.ru/?a+b=c+d&two+tap=cat+nap$'
>>> f.url == f.tostr() == str(f)
True
copy() creates and returns a new furl object with an identical URL.
>>> f = furl('http://www.google.com')
>>> f.copy().set(path='/new/path').url
'http://www.google.com/new/path'
>>> f.url
'http://www.google.com'
join() joins the furl object's URL with the provided relative or absolute URL and returns the furl object for method chaining. join()'s action is the same as navigating to the provided URL from the current URL in a web browser.
>>> f = furl('http://www.google.com')
>>> f.join('new/path').url
'http://www.google.com/new/path'
>>> f.join('replaced').url
'http://www.google.com/new/replaced'
>>> f.join('../parent').url
'http://www.google.com/parent'
>>> f.join('path?query=yes#fragment').url
'http://www.google.com/path?query=yes#fragment'
>>> f.join('unknown://www.yahoo.com/new/url/').url
'unknown://www.yahoo.com/new/url/'
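The standard library's urllib.parse.urljoin() resolves relative references in a similar browser-like way (following RFC 3986), so the sequence above can be reproduced without furl. Unlike join(), urljoin() returns a new string instead of updating an object in place, so this sketch feeds each result back in as the next base.

```python
from urllib.parse import urljoin

url = 'http://www.google.com'
for reference in ['new/path', 'replaced', '../parent',
                  'path?query=yes#fragment', 'unknown://www.yahoo.com/new/url/']:
    url = urljoin(url, reference)
    print(url)
# http://www.google.com/new/path
# http://www.google.com/new/replaced
# http://www.google.com/parent
# http://www.google.com/path?query=yes#fragment
# unknown://www.yahoo.com/new/url/
```

A reference with a different scheme, like the last one, is returned unchanged because it is already absolute.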
For a dictionary representation of a URL, use asdict().
>>> f = furl('https://xn--eckwd4c7c.xn--zckzah/path?args=args#frag')
>>> f.asdict()
{ 'url': 'https://xn--eckwd4c7c.xn--zckzah/path?args=args#frag',
  'scheme': 'https',
  'username': None,
  'password': None,
  'host': 'ドメイン.テスト',
  'host_encoded': 'xn--eckwd4c7c.xn--zckzah',
  'port': 443,
  'netloc': 'xn--eckwd4c7c.xn--zckzah',
  'origin': 'https://xn--eckwd4c7c.xn--zckzah',
  'path': { 'encoded': '/path',
            'isabsolute': True,
            'isdir': False,
            'isfile': True,
            'segments': ['path']},
  'query': { 'encoded': 'args=args',
             'params': [('args', 'args')]},
  'fragment': { 'encoded': 'frag',
                'path': { 'encoded': 'frag',
                          'isabsolute': False,
                          'isdir': False,
                          'isfile': True,
                          'segments': ['frag']},
                'query': { 'encoded': '',
                           'params': []},
                'separator': True} }