A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from https://python.github.io/peps/pep-0385/ below:

PEP 385 – Migrating from Subversion to Mercurial

PEP 385 – Migrating from Subversion to Mercurial
Author:
Dirkjan Ochtman <dirkjan at ochtman.nl>, Antoine Pitrou <solipsis at pitrou.net>, Georg Brandl <georg at python.org>
Status:
Final
Type:
Process
Created:
25-May-2009
Table of Contents Motivation

After having decided to switch to the Mercurial DVCS, the actual migration still has to be performed. In the case of an important piece of infrastructure like the version control system for a large, distributed project like Python, this is a significant effort. This PEP is an attempt to describe the steps that must be taken for further discussion. It’s somewhat similar to PEP 347, which discussed the migration to SVN.

To make the most of hg, we would like to make a high-fidelity conversion, such that (a) as much of the svn metadata as possible is retained, and (b) all metadata is converted to formats that are common in Mercurial. This way, tools written for Mercurial can be optimally used. In order to do this, we want to use the hgsubversion software to do an initial conversion. This hg extension is focused on providing high-quality conversion from Subversion to Mercurial for use in two-way correspondence, meaning it doesn’t throw away as much available metadata as other solutions.

Such a conversion also seems like a good time to reconsider the contents of the repository and determine if some things are still valuable. In this spirit, the following sections also propose discarding some of the older metadata.

Timeline

The current schedule for conversion milestones:

Transition plan Branch strategy

Mercurial has two basic ways of using branches: cloned branches, where each branch is kept in a separate repository, and named branches, where each revision keeps metadata to note on which branch it belongs. The former makes it easier to distinguish branches, at the expense of requiring more disk space on the client. The latter makes it a little easier to switch between branches, but all branch names are a persistent part of history. [1]

Differences between named branches and cloned branches:

We propose to use named branches for release branches and adopt cloned branches for feature branches.

History management

In order to minimize the loss of information due to the conversion, we propose to provide several repositories as a conversion result:

Since all branches are present in the historic repo, they can later be extracted as separate repositories at any time should it prove to be necessary.

The final revision map between SVN revision numbers, Mercurial changesets and SVN branch names will be made available in a file stored in the Misc directory. Its format is as following:

[...]
88483 e65daae6cf4499a0863cb7645109a4798c28d83e issue10276-snowleopard
88484 835cb57abffeceaff0d85c2a3aa0625458dd3e31 py3k
88485 d880f9d8492f597a030772c7485a34aadb6c4ece release32-maint
88486 0c431b8c22f5dbeb591414c154acb7890c1809df py3k
88487 82cda1f21396bbd10db8083ea20146d296cb630b release32-maint
88488 8174d00d07972d6f109ed57efca8273a4d59302c release27-maint
[...]
Converting tags

The SVN tags directory contains a lot of old stuff. Some of these are not, in fact, full tags, but contain only a smaller subset of the repository. All release tags will be kept; other tags will be included based on requests from the developer community. We propose to make the tag naming scheme consistent, in this style: v3.2.1a2.

Author map

In order to provide user names the way they are common in hg (in the ‘First Last <user@example.org>’ format), we need an author map to map cvs and svn user names to real names and their email addresses. We have a complete version of such a map in the migration tools repository (not publicly accessible to avoid leaking addresses to harvesters). The email addresses in it might be out of date; that’s bound to happen, although it would be nice to try and have as many people as possible review it for addresses that are out of date. The current version also still seems to contain some encoding problems.

Generating .hgignore

The .hgignore file can be used in Mercurial repositories to help ignore files that are not eligible for version control. It does this by employing several possible forms of pattern matching. The current Python repository already includes a rudimentary .hgignore file to help with using the hg mirrors.

Since the current Python repository already includes a .hgignore file (for use with hg mirrors), we’ll just use that. Generating full history of the file was debated but deemed impractical (because it’s relatively hard with fairly little gain, since ignoring is less important for older revisions).

Repository size

A bare conversion result of the current Python repository weighs 1.9 GB; although this is smaller than the Subversion repository (2.7 GB) it is not feasible.

The size becomes more manageable by the trimming applied to the working repository, and by a process called “revlog reordering” that optimizes the layout of internal Mercurial storage very efficiently.

After all optimizations done, the size of the working repository is around 180 MB on disk. The amount of data transferred over the network when cloning is estimated to be around 80 MB.

Other repositories

There are a number of other projects hosted in svn.python.org’s “projects” repository. The “peps” directory will be converted along with the main Python one. Richard Tew has indicated that he’d like the Stackless repository to also be converted. What other projects in the svn.python.org repository should be converted?

There’s now an initial stab at converting the Jython repository. The current tip of hgsubversion unfortunately fails at some point. Pending investigation.

Other repositories that would like to converted to Mercurial can announce themselves to me after the main Python migration is done, and I’ll take care of their needs.

Infrastructure hg-ssh

Developers should access the repositories through ssh, similar to the current setup. Public keys can be used to grant people access to a shared hg@ account. A hgwebdir instance also has been set up at hg.python.org for easy browsing and read-only access. It is configured so that developers can trivially start new clones (for longer-term features that profit from development in a separate repository).

Also, direct creation of public repositories is allowed for core developers, although it is not yet decided which naming scheme will be enforced:

$ hg init ssh://hg@hg.python.org/sandbox/mywork
repo created, public URL is http://hg.python.org/sandbox/mywork
Hooks

A number of hooks is currently in use. The hg equivalents for these should be developed and deployed. The following hooks are being used:

The hooks repository contains ports of these server-side hooks to Mercurial, as well as a couple additional ones:

One additional hook could be beneficial:

End-of-line conversions

Discussion about the lack of end-of-line conversion support in Mercurial, which was provided initially by the win32text extension, led to the development of the new eol extension that supports a versioned management of line-ending conventions on a file-by-file basis, akin to Subversion’s svn:eol-style properties. This information is kept in a versioned file called .hgeol, and such a file has already been checked into the Subversion repository.

A hook also exists on the server side to reject any changeset introducing inconsistent newline data (see above).

hgwebdir

A more or less stock hgwebdir installation should be set up. We might want to come up with a style to match the Python website.

A small WSGI application has been written that can look up Subversion revisions and redirect to the appropriate hgweb page for the given changeset, regardless in which repository the converted revision ended up (since one big Subversion repository is converted into several Mercurial repositories). It can also look up Mercurial changesets by their hexadecimal ID.

roundup

By pointing Roundup to the URL of the lookup script mentioned above, links to SVN revisions will continue to work, and links to Mercurial changesets can be created as well, without having to give repository and changeset ID.

After migration Where to get code

After migration, the hgwebdir will live at hg.python.org. This is an accepted standard for many organizations, and an easy parallel to svn.python.org. The working repo might live at http://hg.python.org/cpython/, for example, with the archive repo at http://hg.python.org/cpython-archive/. For write access, developers will have to use ssh, which could be ssh://hg@hg.python.org/cpython/.

code.python.org was also proposed as the hostname. We think that using the VCS name in the hostname is good because it prevents confusion: it should be clear that you can’t use svn or bzr for hg.python.org.

hgwebdir can already provide tarballs for every changeset. This obviates the need for daily snapshots; we can just point users to tip.tar.gz instead, meaning they will get the latest. If desired, we could even use buildbot results to point to the last good changeset.

Python-specific documentation

hg comes with good built-in documentation (available through hg help) and a wiki that’s full of useful information and recipes, not to mention a popular book (readable online).

In addition to that, the recently overhauled Python Developer’s Guide already has a branch with instructions for Mercurial instead of Subversion; an online build of this branch is also available.

Proposed workflow

We propose two workflows for the migration of patches between several branches.

For migration within 2.x or 3.x branches, we propose a patch always gets committed to the oldest branch where it applies first. Then, the resulting changeset can be merged using hg merge to all newer branches within that series (2.x or 3.x). If it does not apply as-is to the newer branch, hg revert can be used to easily revert to the new-branch-native head, patch in some alternative version of the patch (or none, if it’s not applicable), then commit the merge. The premise here is that all changesets from an older branch within the series are eventually merged to all newer branches within the series.

The upshot is that this provides for the most painless merging procedure. This means that in the general case, people have to think about the oldest branch to which the patch should be applied before actually applying it. Usually, that is one of only two branches: the latest maintenance branch and the trunk, except for security fixes applicable to older branches in security-fix-only mode.

For merging bug fixes from the 3.x to the 2.7 maintenance branch (2.6 and 2.5 are in security-fix-only mode and their maintenance will continue in the Subversion repository), changesets should be transplanted (not merged) in some other way. The transplant extension, import/export and bundle/unbundle work equally well here.

Choosing this approach allows 3.x not to carry all of the 2.x history-since-it-was-branched, meaning the clone is not as big and the merges not as complicated.

The future of Subversion

What happens to the Subversion repositories after the migration? Since the svn server contains a bunch of repositories, not just the CPython one, it will probably live on for a bit as not every project may want to migrate or it takes longer for other projects to migrate. To prevent people from staying behind, we may want to move migrated projects from the repository to a new, read-only repository with a new name.

Build identification

Python currently provides the sys.subversion tuple to allow Python code to find out exactly what version of Python it’s running against. The current version looks something like this:

Another value is returned from Py_GetBuildInfo() in the C API, and available to Python code as part of sys.version:

I propose that the revision identifier will be the short version of hg’s revision hash, for example ‘dd3ebf81af43’, augmented with ‘+’ (instead of ‘M’) if the working directory from which it was built was modified. This mirrors the output of the hg id command, which is intended for this kind of usage. The sys.subversion value will also be renamed to sys.mercurial to reflect the change in VCS.

For the tag/branch identifier, I propose that hg will check for tags on the currently checked out revision, use the tag if there is one (‘tip’ doesn’t count), and uses the branch name otherwise. sys.subversion becomes

and the build info string becomes

This reflects that the default branch in hg is called ‘default’ instead of Subversion’s ‘trunk’, and reflects the proposed new tag format.

Mercurial also allows to find out the latest tag and the number of changesets separating the current changeset from that tag, allowing for a descriptive version string:

$ hg parent --template "{latesttag}+{latesttagdistance}-{node|short}\n"
v3.2+37-4b5d0d260e72
$ hg up 2.7
3316 files updated, 0 files merged, 379 files removed, 0 files unresolved
$ hg parent --template "{latesttag}+{latesttagdistance}-{node|short}\n"
v2.7.1+216-9619d21d8198
Copyright

This document has been placed in the public domain.


RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4