This PEP describes a mirroring infrastructure for PyPI.
PEP Withdrawal

The main PyPI web service was moved behind the Fastly caching CDN in May 2013: https://mail.python.org/pipermail/distutils-sig/2013-May/020848.html
Subsequently, this arrangement was formalised as an in-kind sponsorship with the PSF, and the PSF has also taken on the task of risk management in the event that the sponsorship arrangement ever ceases.
The download statistics that were previously provided directly on PyPI are now published indirectly via Google BigQuery: https://packaging.python.org/guides/analyzing-pypi-package-downloads/
Accordingly, the mirroring proposal described in this PEP is no longer required, and has been marked as Withdrawn.
Rationale

PyPI is hosting over 6000 projects and is used on a daily basis by people to build applications. Systems like easy_install and zc.buildout in particular make intensive use of PyPI.
For people making intensive use of PyPI, it can act as a single point of failure. People have started to set up some mirrors, both private and public. Those mirrors are active mirrors, which means that they are browsing PyPI to get synced.
In order to make the system more reliable, this PEP describes:
People who want to mirror PyPI make a proposal on catalog-SIG. When a mirror is proposed on the mailing list, it is manually added to a mirror list in the PyPI application after it has been checked for compliance with the mirroring rules.
The mirror list is provided as a list of host names of the form X.pypi.python.org
The values of X are the sequence a,b,c,…,aa,ab,… The host a.pypi.python.org is the master server; the mirrors start with b. A CNAME record last.pypi.python.org points to the last host name. Mirror operators should use a static address, and report planned changes to that address in advance to distutils-sig.
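Because the host names follow a fixed sequence, a client that knows the last label can enumerate every host. The following is a minimal sketch, assuming the sequence continues like spreadsheet column labels (a to z, then aa, ab, and so on); the function names `mirror_labels` and `mirror_hosts` are illustrative and not defined by this PEP:

```python
import itertools
import string


def mirror_labels():
    """Yield a, b, ..., z, aa, ab, ... indefinitely."""
    for length in itertools.count(1):
        for letters in itertools.product(string.ascii_lowercase, repeat=length):
            yield "".join(letters)


def mirror_hosts(last_label, domain="pypi.python.org"):
    """Return every host name from a.<domain> up to <last_label>.<domain>."""
    # Reject labels outside a-z so the loop below is guaranteed to terminate.
    if not last_label or not all(c in string.ascii_lowercase for c in last_label):
        raise ValueError("unexpected mirror label: %r" % (last_label,))
    hosts = []
    for label in mirror_labels():
        hosts.append("%s.%s" % (label, domain))
        if label == last_label:
            return hosts
```

The first entry of the returned list is the master server; the mirrors are the remaining entries.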
The new mirror also appears at http://pypi.python.org/mirrors, which is a human-readable page that gives the list of mirrors. This page also explains how to register a new mirror.
PyPI provides statistics on downloads at /stats. This page is calculated daily by PyPI, by reading all mirrors’ local stats and summing them.
The stats are presented in daily or monthly files, under /stats/days and /stats/months. Each file is a bzip2 file with these formats:
Examples:
With a distributed mirroring system, clients may want to verify that the mirrored copies are authentic. There are multiple threats to consider:
This specification only deals with the second threat. Some provisions are made to detect man-in-the-middle attacks. To detect the first attack, package authors need to sign their packages using PGP keys, so that users verify that the package comes from the author they trust.
The central index provides a DSA key at the URL /serverkey, in the PEM format as generated by “openssl dsa -pubout” (i.e. RFC 3280 SubjectPublicKeyInfo, with the algorithm 1.3.14.3.2.12). This URL must not be mirrored, and clients must fetch the official serverkey from PyPI directly, or use the copy that came with the PyPI client software. Mirrors should still download the key, to detect a key rollover.
For each package, a mirrored signature is provided at /serversig/<package>. This is the DSA signature of the parallel URL /simple/<package>, in DER form, using SHA-1 with DSA (i.e. as an RFC 3279 Dsa-Sig-Value, created by algorithm 1.2.840.10040.4.3).
Clients using a mirror need to perform the following steps to verify a package:
An implementation of the verification algorithm is available from https://svn.python.org/packages/trunk/pypi/tools/verify.py
Verification is not needed when downloading from the central index, and should be avoided to reduce the computation overhead.
About once a year, the key will be replaced with a new one. Mirrors will have to re-fetch all /serversig pages. Clients using mirrors need to find a trusted copy of the new server key. One way to obtain one is to download it from https://pypi.python.org/serverkey. To detect man-in-the-middle attacks, clients need to verify the SSL server certificate, which will be signed by the CACert authority.
Special pages a mirror needs to provide

A mirror is a subset copy of PyPI, so it provides the same structure by copying it.
It also needs to provide two specific elements:
CPAN uses a freshness date system where the mirror’s last synchronisation date is made available.
For PyPI, each mirror needs to maintain a URL with simple text content that represents the last synchronisation date the mirror maintains.
The date is provided in GMT, using the ISO 8601 format [2]. Each mirror is responsible for maintaining its last modified date.
This page must be located at /last-modified and must be a text/plain page.
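As an illustration, a mirror could render this page and a client could check it as follows. This is only a sketch: the PEP requires ISO 8601 in GMT but does not fix the exact layout, so the `STAMP_FORMAT` pattern is an assumption, and the `max_age` threshold is a hypothetical client policy rather than anything this PEP specifies.

```python
from datetime import datetime, timedelta

# Assumed layout of the page body; the PEP only requires ISO 8601 in GMT.
STAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"


def render_last_modified(synced_at):
    """Return the text/plain body for /last-modified (synced_at is naive GMT)."""
    return synced_at.strftime(STAMP_FORMAT)


def mirror_age(body, now):
    """Return how long ago the mirror last synchronised (now is naive GMT)."""
    return now - datetime.strptime(body.strip(), STAMP_FORMAT)


def is_fresh(body, now, max_age=timedelta(hours=24)):
    """Hypothetical client policy: accept mirrors synced within max_age."""
    return mirror_age(body, now) <= max_age
```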
Each mirror is responsible for counting all the downloads that were done via it. This is used by PyPI to sum up all downloads, to be able to display the grand total.
These statistics are in CSV-like form, with a header in the first line. The file needs to obey PEP 305; basically, it should be readable by Python’s csv module.
The fields in this file are:
The content will look like this:
    # package,filename,useragent,count
    zc.buildout,zc.buildout-1.6.0.tgz,MyAgent,142
    ...
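A file in this form can be read with the csv module and the counts from several mirrors added together. The sketch below assumes the four columns shown in the example header and treats any line starting with `#` as the header; the function names are illustrative only.

```python
import csv
import io
from collections import Counter


def read_counts(text):
    """Parse one stats file into a Counter keyed by (package, filename, useragent)."""
    counts = Counter()
    for row in csv.reader(io.StringIO(text)):
        # Skip blank lines and the header line, which starts with '#'.
        if not row or row[0].startswith("#"):
            continue
        package, filename, useragent, count = row
        counts[(package, filename, useragent)] += int(count)
    return counts


def sum_counts(texts):
    """Add up the counts found in several mirrors' files."""
    total = Counter()
    for text in texts:
        total.update(read_counts(text))
    return total
```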
The counting starts the day the mirror is launched, and there is one file per day, compressed using the bzip2 format. Each file is named after the day. For example, 2008-11-06.bz2 is the file for the 6th of November 2008.
They are then provided in a folder called days. For example:
This page must be located at /local-stats.
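A mirror could produce one day's file along these lines. This is a sketch under assumptions: `root` stands for the directory served at /local-stats, the header line is copied from the example content, and UTF-8 is an assumed encoding that the PEP does not specify.

```python
import bz2
import csv
import io
import os
from datetime import date


def write_daily_stats(root, day, rows):
    """Write rows of (package, filename, useragent, count) to <root>/days/<day>.bz2."""
    buffer = io.StringIO()
    buffer.write("# package,filename,useragent,count\n")
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    days_dir = os.path.join(root, "days")
    os.makedirs(days_dir, exist_ok=True)
    # date.isoformat() gives YYYY-MM-DD, matching names such as 2008-11-06.bz2.
    path = os.path.join(days_dir, day.isoformat() + ".bz2")
    with bz2.open(path, "wb") as handle:
        handle.write(buffer.getvalue().encode("utf-8"))
    return path
```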
A mirroring protocol called Simple Index was described and implemented by Martin v. Loewis and Jim Fulton, based on how easy_install works. This section synthesizes it and gives a few relevant links, plus a small part about User-Agent.
Mirrors must reduce the amount of data transferred between the central server and the mirror. To achieve that, they MUST use the changelog() PyPI XML-RPC call, and only refetch the packages that have been changed since the last time. For each package P, they MUST copy documents /simple/P/ and /serversig/P. If a package is deleted on the central server, they MUST delete the package and all associated files. To detect modification of package files, they MAY cache the file’s ETag, and MAY request skipping it using the If-none-match header.
Each mirroring tool MUST identify itself using a descriptive User-Agent header.
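The rules above can be sketched with a few helpers. The entries passed to `packages_to_refetch` would come from the changelog() XML-RPC call (for instance through `xmlrpc.client.ServerProxy`); the tuple layout (name, version, timestamp, action) and the `USER_AGENT` string are assumptions made for illustration, and deletion handling is left out.

```python
import urllib.request

# Hypothetical identification string; a real tool would use its own name and version.
USER_AGENT = "example-mirror/0.1 (contact: admin@example.org)"


def packages_to_refetch(changelog_entries):
    """Return the names of packages mentioned in a changelog() result.

    Each entry is assumed to be a sequence starting with the package name,
    e.g. (name, version, timestamp, action).
    """
    return sorted({entry[0] for entry in changelog_entries})


def documents_for(package):
    """Return the two documents a mirror must copy for a package."""
    return ["/simple/%s/" % package, "/serversig/%s" % package]


def conditional_request(url, etag=None):
    """Build a request that identifies the tool and skips unchanged files."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    if etag is not None:
        # The server can answer 304 Not Modified when the cached ETag still matches.
        request.add_header("If-None-Match", etag)
    return request
```

Only the packages returned by `packages_to_refetch` need their documents copied again, which keeps the traffic to the central server proportional to the number of changes.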
The pep381client package [1] provides an application that respects this protocol to browse PyPI.
How a client can use PyPI and its mirrors

Clients that are browsing PyPI should be able to use alternative mirrors, by getting the list of the mirrors using last.pypi.python.org.
Code example:
    >>> import socket
    >>> socket.gethostbyname_ex('last.pypi.python.org')[0]
    'h.pypi.python.org'
The clients so far that could use this mechanism:
Clients that are browsing PyPI should be able to use a fail-over mechanism when PyPI or the used mirror is not responding.
It is up to the client to decide which mirror should be used, maybe by looking at its geographical location and its responsiveness.
This PEP does not describe how this fail-over mechanism should work, but it is strongly encouraged that the clients try to use the nearest mirror.
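Since the mechanism is left open, the following is only one possible shape for it: try an ordered list of hosts and use the first one that answers. The `fetch` callable is supplied by the caller and is an assumption of this sketch, as is the choice to treat `OSError` (which covers socket and urllib errors) as "host not responding"; ordering the hosts by proximity or responsiveness is left to the client.

```python
def fetch_with_failover(hosts, path, fetch):
    """Try each host in order and return (host, content) from the first that answers.

    `fetch(host, path)` is supplied by the caller and is expected to raise
    OSError when a host cannot be reached.
    """
    errors = []
    for host in hosts:
        try:
            return host, fetch(host, path)
        except OSError as exc:
            # Remember the failure and move on to the next candidate.
            errors.append((host, exc))
    raise OSError("no mirror answered for %s: %r" % (path, errors))
```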
The clients so far that could use this mechanism:
When a client needs to get some packages from several distinct indexes, it should be able to use each one of them as a potential source of packages. Different indexes should be defined as a sorted list for the client to look for a package.
Each independent index can of course provide a list of its mirrors.
XXX define how to get the hostname for the mirrors of an arbitrary index.
That permits all combinations at client level, for a reliable packaging system with all levels of privacy.
It is up to the client to deal with the merging.
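One simple merging policy a client might choose is "first index in the sorted list wins". The sketch below assumes that policy; the `lookup` callable and the function name are hypothetical, and other policies (such as combining the results of all indexes) are equally compatible with this PEP.

```python
def find_package(indexes, package, lookup):
    """Return (index, result) from the first index in the list that knows `package`.

    `lookup(index, package)` is a caller-supplied function that returns the
    index's answer, or None when the index does not have the package.
    """
    for index in indexes:
        result = lookup(index, package)
        if result is not None:
            return index, result
    return None
```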
References

Acknowledgments

Georg Brandl.

Copyright

This document has been placed in the public domain.