To simplify and speed up URL handling, the package provides a new type for working with URLs in an object-oriented way.
URL objects can be added to each other, and they can also appear as the right operand when added to a string; both cases give a joined URL object. The join semantics depend on the URL schemes and their attributes.
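The addition behaviour described above can be illustrated with a minimal sketch. `SketchURL` is a hypothetical stand-in written for this example, not the package's URL type, and the join itself is delegated to the standard library's `urljoin`, which may differ in detail from the package's own scheme-dependent join rules.

```python
from urllib.parse import urljoin


class SketchURL:
    """Hypothetical stand-in for a URL type; not the package's implementation."""

    def __init__(self, string):
        self.string = str(string)

    def __add__(self, other):
        # SketchURL + SketchURL, or SketchURL + str:
        # join the right operand onto this URL.
        return SketchURL(urljoin(self.string, str(other)))

    def __radd__(self, other):
        # Called for str + SketchURL, because str does not know
        # how to add a SketchURL itself.
        return SketchURL(urljoin(str(other), self.string))

    def __str__(self):
        return self.string


base = SketchURL("http://example.com/docs/index.html")
joined = base + SketchURL("../img/logo.png")
right_added = "http://example.com/docs/" + SketchURL("api.html")
```

In both cases the result is again a `SketchURL`, mirroring the statement that adding yields a joined URL object rather than a plain string.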
Predefined URL schemes
These schemes are predefined by the module. The function register_scheme() (see above) allows adding new ones or changing the behaviour of predefined ones.
The uses_* fields are integers, 0 or 1, describing what each scheme supports. When a feature is set to 0, the corresponding field is left out while parsing the URL, and characters that would normally be treated as separators for that field are ignored.
uses_relative is important when joining URLs. Only URLs whose scheme has uses_relative set will have their paths joined according to the common rules.
Note that the URL object constructors will raise a ValueError exception for unknown schemes they find in the construction string.
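How feature flags can steer parsing is easiest to see in a small sketch. Everything here is an assumption made for illustration: the table `SCHEME_FEATURES`, the made-up scheme `x-opaque`, the flag values, and the function `split_by_features` are not part of the package. Only the idea is taken from the text: a separator counts only when the scheme enables the matching feature, and an unknown scheme raises a ValueError.

```python
# Hypothetical feature table. The flag names follow the uses_* fields
# described above; the schemes and values are chosen for illustration only.
SCHEME_FEATURES = {
    "http": {"uses_query": 1, "uses_fragment": 1},
    "x-opaque": {"uses_query": 0, "uses_fragment": 0},
}


def split_by_features(url):
    """Split url into (scheme, rest, query, fragment) using the table."""
    scheme, colon, rest = url.partition(":")
    if not colon or scheme not in SCHEME_FEATURES:
        raise ValueError("unknown scheme in %r" % (url,))
    features = SCHEME_FEATURES[scheme]
    query = fragment = ""
    # A separator is only honoured when the scheme enables the feature;
    # otherwise it stays part of the remaining text.
    if features["uses_fragment"] and "#" in rest:
        rest, fragment = rest.split("#", 1)
    if features["uses_query"] and "?" in rest:
        rest, query = rest.split("?", 1)
    return scheme, rest, query, fragment
```

With this table, `split_by_features("x-opaque:some?text#here")` leaves `?` and `#` inside the remaining text, while the same characters split an `http` URL into query and fragment.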
These constructors are available in the package:
URL(url)
RawURL(url)
BuildURL(scheme='', netloc='', path='', params='', query='', fragment='')
Normalizing means that unnecessary relative components and slashes are removed from the URL prior to storing it. The stored URL will always be equivalent to the one given to the constructor.
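As a rough illustration of what removing unnecessary relative components and slashes can look like, here is a sketch built on the standard library. The helper `normalize_url` is hypothetical, and `posixpath.normpath` is used as an approximation; the package's exact normalization rules are not spelled out here and may differ, for example in how trailing slashes are treated.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Collapse '.', '..' and repeated slashes in the path of url."""
    parts = urlsplit(url)
    path = parts.path
    if path:
        normalized = posixpath.normpath(path)
        # normpath drops a trailing slash; restore it so that a
        # directory-style URL keeps its directory-style ending.
        if path.endswith("/") and not normalized.endswith("/"):
            normalized += "/"
        path = normalized
    return urlunsplit(parts._replace(path=path))
```

For instance, `normalize_url("http://example.com/a/./b/../c//d.html")` yields a URL with the path `/a/c/d.html`, which refers to the same resource as the input.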
Note: The URL type uses a scheme feature dictionary to figure out how to parse different schemes. Use add_scheme() to access this dictionary.
A URL instance url defines these methods:
depth()
normalized()
parsed()
basic()
In case the URL already fulfills this requirement, a new reference to it is returned.
relative(baseURL)
The URL and baseURL must both be absolute URLs for this to work; an exception is raised otherwise. The base URL should also provide scheme and netloc, because otherwise joining might lose the scheme information. If only the URL provides a scheme, the returned relative URL will also include that scheme.
Parameters, fragment and query of the URL object are preserved; only the path is made relative and the netloc removed (relative paths and netlocs don't go together).
In case both URLs provide schemes and/or netlocs that point to different resources, the method simply returns a new reference to the object.
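The rules for relative() can be approximated with the standard library. This is a sketch under stated assumptions, not the package's implementation: `relative_url` is a hypothetical helper, "absolute" is interpreted as "the path starts with a slash", the exception type is assumed to be ValueError, and parameters stay embedded in the path because `urlsplit` does not separate them.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def relative_url(url, base_url):
    """Return url expressed relative to base_url (approximation)."""
    u, b = urlsplit(url), urlsplit(base_url)
    # Assumption: "absolute" means the path starts with a slash.
    if not (u.path.startswith("/") and b.path.startswith("/")):
        raise ValueError("both URLs must have absolute paths")
    # Differing schemes or netlocs point to different resources:
    # hand back the URL unchanged.
    if u.scheme and b.scheme and u.scheme != b.scheme:
        return url
    if u.netloc and b.netloc and u.netloc != b.netloc:
        return url
    base_dir = posixpath.dirname(b.path)
    rel_path = posixpath.relpath(u.path, start=base_dir)
    # Keep the scheme only when the base URL does not provide one.
    scheme = u.scheme if not b.scheme else ""
    # The netloc is dropped; query and fragment are preserved.
    return urlunsplit((scheme, "", rel_path, u.query, u.fragment))
```

Note that `posixpath.relpath` drops a trailing slash from the result, so directory-style URLs would need extra handling in a fuller version.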
rebuild(scheme='', netloc='', path='', params='', query='', fragment='')
Arguments not given are taken unchanged from the URL object.
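A comparable "replace only what is given" pattern can be sketched with the standard library's `urlparse`, whose six fields happen to match the argument names above. The function `rebuild` below is a hypothetical free-standing helper, not the method of the URL object, and it treats an empty string as "not given".

```python
from urllib.parse import urlparse, urlunparse


def rebuild(url, scheme="", netloc="", path="", params="", query="", fragment=""):
    """Return url with the given components replaced; others are kept."""
    overrides = dict(scheme=scheme, netloc=netloc, path=path,
                     params=params, query=query, fragment=fragment)
    # Empty strings mean "keep the component from the original URL".
    changes = {name: value for name, value in overrides.items() if value}
    return urlunparse(urlparse(url)._replace(**changes))
```

One consequence of this design is that a component cannot be cleared by passing an empty string; whether the package behaves the same way is not stated here.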
pathentry(index)
index may be negative to indicate an entry counted from the right (with -1 being the rightmost entry). An IndexError is raised in case the index lies out of range. Leading and trailing slashes are not counted.
pathlen()
.pathentry() method.
Leading and trailing slashes are not counted.
pathtuple()
Leading and trailing slashes are ignored and the slashes are not included.
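The three path helpers share one idea: split the path on slashes after discarding leading and trailing ones. The sketch below uses hypothetical free functions (`path_tuple`, `path_len`, `path_entry`) that operate on a plain path string rather than on a URL object.

```python
def path_tuple(path):
    """Return the path entries as a tuple, without any slashes."""
    trimmed = path.strip("/")
    return tuple(trimmed.split("/")) if trimmed else ()


def path_len(path):
    """Return the number of path entries."""
    return len(path_tuple(path))


def path_entry(path, index):
    """Return one entry; negative indexes count from the right.

    Tuple indexing raises IndexError when index is out of range.
    """
    return path_tuple(path)[index]
```

Because `path_entry` simply indexes the tuple, every index from `-path_len(path)` to `path_len(path) - 1` is valid, which matches the negative-index behaviour described for pathentry().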
A URL instance url provides access to these (read-only) variables:
absolute
base
ext
file
fragment
netloc
host
user
passwd
port
params
path
scheme
string, url
absolute
normal
mimetype
Uses the types_map dictionary of the Python standard library's mimetypes module as the basis for finding out the MIME type. You can add entries to that dictionary at runtime to adapt the mechanism to your needs.
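Adding an entry to that dictionary at runtime looks like this. The helper `guess_mime` and the extension `.xyzdoc` are made up for the example, and looking the type up by lowercased file extension is an assumption about how such a mechanism typically works, not a description of the package's internals.

```python
import mimetypes
import posixpath


def guess_mime(path, default=None):
    """Look up the MIME type for path by its file extension."""
    ext = posixpath.splitext(path)[1].lower()
    return mimetypes.types_map.get(ext, default)


# Register a custom (made-up) extension at runtime.
mimetypes.types_map[".xyzdoc"] = "application/x-xyzdoc"

html_type = guess_mime("/docs/index.html")
custom_type = guess_mime("/reports/summary.xyzdoc")
unknown_type = guess_mime("/files/no_extension")
```

Since `types_map` is a module-level dictionary, an entry added this way is visible to every piece of code in the process that consults it.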