The history and evolution of the Unix operating system is made available as a revision management repository, covering the period from its inception in 1972 as a five thousand line kernel to 2016 as a widely-used 27 million line system. The 1.1GB repository contains 496 thousand commits and 2,523 branch merges. The repository employs the commonly used Git version control system for its storage, and is hosted on the popular GitHub archive. It was created by using custom software to synthesize 24 snapshots of systems developed at Bell Labs, the University of California at Berkeley, and the 386BSD team, two legacy repositories, and the modern repository of the open source FreeBSD system. In total, 973 individual contributors are identified, the early ones through primary research. The data set can be used for empirical research in software engineering, information systems, and software archaeology.
1 Introduction
The Unix operating system stands out as a major engineering breakthrough due to its exemplary design, its numerous technical contributions, its impact, its development model, and its widespread use [Gehani, 2003, pp. 27–29]. The design of the Unix programming environment has been characterized as one offering unusual simplicity, power, and elegance [McIlroy et al, 1978; Pike and Kernighan, 1984]. On the technical side, features that can be directly attributed to Unix or were popularized by it include [Ritchie and Thompson, 1978; Ritchie, 1978; Johnson and Ritchie, 1978]:
A large community contributed software to Unix from its early days. This community grew immensely over time and worked using what are now termed open source software development methods. Unix and its intellectual descendants have also helped the spread of:
Unix systems also form a large part of the modern internet infrastructure and the web.
The importance of Unix as an engineering artefact motivates the preservation of its development history, which can then be used for empirical research in software engineering, information systems, and software archaeology.
The availability of Unix source code has changed over the years. In the 1970s, when Unix came out of Bell Labs and became widely known in the scientific community [Ritchie and Thompson, 1974], AT&T was still operating under a 1956 "consent decree" entered by Judge Thomas F. Meaney [Lewis, 1956]. This was the result of a complaint filed by the US Department of Justice Antitrust Division in 1949 against the Western Electric Company and AT&T, claiming that the companies were unlawfully restraining and monopolizing trade and commerce in violation of the Sherman Antitrust Act. Under the terms of the consent decree, Western Electric (a fully owned subsidiary of AT&T and 50% owner of Bell Labs) was prohibited from manufacturing non-telecommunications equipment, and AT&T (owner of the other 50% of Bell Labs) was forbidden to engage in business other than communication services [Lewis, 1956]. Consequently, AT&T could not market or license Unix for profit, and therefore Unix was initially licensed royalty-free through simple letter agreements [Salus, 1994, p. 60]. Later, however, licenses became more intricate and restrictive, limiting the availability of its source code [Takahashi and Takamatsu, 2013], which was carefully guarded as a trade secret [Libes and Ressler, 1989, p. 20].
Fortunately, Unix material of historical importance has survived to this day, often on magnetic tapes kept by people who recognized its significance. Also, key parts of the early Unix development, namely the systems running on the 16-bit PDP-11 and early versions of the 32-bit Unix (excluding System III, System V, and their successors), were released in 2002 by one of its rights holders (Caldera International) under a liberal license. The license, which covers the 16-bit Unix Editions 1–7 and 32-bit Unix 32V, allows the redistribution and use of the material in source and binary forms, with or without modification, subject to conditions similar to those of the original BSD license.
Combining these parts with software that was developed or released as open source software by the University of California at Berkeley and the FreeBSD Project provides coverage of the system's development over a period ranging from June 20th 1972 until today.
Curating and processing available source code snapshots as well as old and modern configuration management repositories allows the reconstruction of a new synthetic Git repository that combines under a single roof most of the available data. This repository documents in a digital form the detailed history and evolution of an important digital artefact over a period of 44 years. The contributions of the work presented here are:
This work expands on a presentation [Gall et al, 2014] and a four-page conference paper [Spinellis, 2015] by including considerably more detailed information on the data and their generation process. The added material includes an overview of the code's licensing, the data set's key metrics (Table 1), a detailed description of the available software releases (Section 2.1), an expanded overview of data sources (Table 2) and available metadata (Section 2.2), GitHub integration (Section 2.3), known limitations (Section 2.4), the documentation of derived authorship data (Tables 3 and 4), details on the data import process and tools (Section 3.3), instructions for contributing to the project (Section 5), and a second example of using the data set (Figure 13).
The following sections describe the Unix history repository's structure and contents (Section 2), the way it was created (Section 3), how it can be used (Section 4), and how it can be extended (Section 5). The paper concludes with ideas for further work (Section 6).
2 Data Overview
Table 1: Key Repository Metrics of the Unix and Linux History Repositories
Metric               Unix history   Linux history
Start date           1972-06-20     1991-09-17
Start files          13             92
Start lines          4768           917,812
End files            63,049         51,396
End lines            27,388,943     21,525,436
Data size (.git)     1.1GB          1.0GB
Number of commits    495,622        611,735
Number of merges     2,523          48,821
Number of authors    973            18,465
Days with activity   13,004         5,126
The 1.1GB Unix history Git repository is made available for cloning on GitHub. Currently the repository contains 496 thousand commits and 2,523 merges from about 973 contributors (measured by counting unique email addresses). The contributors include 29 from the Bell Labs staff, 158 from Berkeley's Computer Systems Research Group (CSRG), 79 contributors of the 386BSD patch kit, and 691 from the FreeBSD Project. More metrics regarding the Unix history repository are listed in Table 1. For comparison purposes the table also includes details regarding the Linux kernel history repository.
The Unix history repository reported here differs from the Linux one in three ways: first, it covers a significantly longer timespan; second, after 1974 it contains code of a complete system (kernel and tools) rather than only a kernel; and third, it represents the work of four diverse communities.
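The contributor count in Table 1 is described as the number of unique email addresses. A minimal sketch of that measurement follows, assuming the input is the output of git log --all --format=%ae run in a local clone; whether the published figure was computed over all refs in exactly this way is not stated, and both function names are hypothetical.

```python
# Sketch: count contributors as distinct author email addresses (assumed method).
import subprocess
from typing import Iterable, List


def author_emails(repo: str = ".") -> List[str]:
    """Return one author email per commit reachable from any ref in repo."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--all", "--format=%ae"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()


def count_unique_authors(emails: Iterable[str]) -> int:
    """Count distinct, non-empty addresses, compared exactly as recorded."""
    return len({e.strip() for e in emails if e.strip()})
```

For a local clone, count_unique_authors(author_emails(path)) gives the count; addresses are not case-folded or merged, so one person using two addresses is counted twice.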
2.1 Available Operating System Releases
Figure 1: Timeline of releases in the repository
The repository starts its life at a tag identified as Epoch, which contains only licensing information and its modern README file. Various tag and branch names identify points of significance. A timeline of these releases based on their repository timestamps is depicted in Figure 1.
The Research-VX tags correspond to six so-called research editions of Unix that came out of Bell Labs. These start with Research-V1 (4768 lines of PDP-11 assembly) and end with Research-V7 (1820 mostly C files, 324 thousand lines of code, or kLOC). Following tradition, the numbers of these releases correspond to the edition of the manual [Libes and Ressler, 1989, p. 5]. For example, Research-V7 is variously called 7th Edition or Version 7 Unix.
Figure 2: Representative scanned pages from the 1st Edition Unix
The 1st Edition (November 3, 1971, tagged Research-V1) contains only the kernel; the 60 user commands that came with it [Salus, 1994, p. 41] are no longer available. Even the kernel, written in PDP-11 assembly language, has not survived in electronic form. It was recovered through a group effort that took a scanned 280-page June 1972 printout of 1st Edition UNIX source code and documentation [Bashkow, 1972] and restored it to an incomplete but running system [Toomey, 2010]. Two representative pages of the printout are shown in Figure 2.
The next four editions are also only partially available.
The 6th Edition (May 1975, tagged Research-V6) is the first that appears in the repository in complete form, and the first that became widely available outside Bell Labs through licenses to commercial and government users. It was also the last bearing the names of Thompson and Ritchie on the manuals' title page. The 6th Edition is the one John Lions used for teaching two operating systems courses at the University of New South Wales in Australia. In 1977 Lions produced a booklet with an indexed 9073-line listing of the entire Unix kernel, accompanied by an equal amount of commentary explaining its structure [Lions, 1996]. Although this was initially sold by mail order, a year later it was no longer available [Salus, 1994, p. 130]. Nevertheless, for the next two decades it circulated as multiple-generation samizdat photocopies [Lions, 1996, p. ix], until in late 1995 the lawyers of Santa Cruz Operation, Inc. gave permission for its official publication.
The 7th Edition (January 1979, tagged Research-V7) includes many new influential commands, such as awk [Aho et al, 1979], expr, find, lex [Lesk, 1975], lint [Johnson, 1977], m4 [Kernighan and Ritchie, 1979], make [Feldman, 1979], refer [Lesk, 1979a], sed [McMahon, 1979], tar, uucp [Nowitz and Lesk, 1979], and the Bourne shell [Bourne, 1979; Bourne, 1978]. It also supports larger file systems and more user accounts. It is the version that was widely ported to other architectures.
Unix 32V (or 32/V, tagged Bell-32V) is the port of the 7th Edition Unix to the DEC VAX architecture. It was created by John Reiser and Tom London, managed by Charlie Roberts, at Bell Labs in Holmdel in 1978. There seem to be two reasons why the port was not implemented by the original team: first, DEC's refusal to support Unix, favouring VMS instead; and second, the complexity of the VAX instruction set, which apparently went against the values of the Unix patriarchs [Salus, 1994, p. 154]. The port took about three months to complete by treating the VAX as a large PDP-11, keeping the existing swapping mechanism and ignoring the VAX's hardware paging capability [Libes and Ressler, 1989, p. 12]. In the fall of 1978 Bell-32V was sent to the University of California at Berkeley under a "special research agreement" [Salus, 1994, p. 154].
The BSD-X tags correspond to 15 snapshots released from Berkeley. Their contents are summarized in the following paragraphs, based on published descriptions and on the manual examination of their contents. The first Berkeley Software Distribution (BSD, tagged BSD-1), released in early 1978, contained the Unix Pascal System, the ex line editor, and a number of tools. The Second Berkeley Software Distribution (2BSD, tagged BSD-2) included the full-screen editor vi, the associated terminal capability database and management library termcap, and many more tools, such as the csh shell. The 3BSD release (tagged BSD-3), released in late 1979, extended Unix 32V with support for virtual memory [Babaoğlu and Joy, 1981] and the 2BSD additions. Subsequent releases [Salus, 1994, pp. 164–167] included in the repository are marked with the following tags.
tags correspond to 386/BSD: version 0.0 (March 1992, tagged 386BSD-0.0) and version 0.1 (July 1992, tagged 386BSD-0.1). This was a derivative of the BSD Networking 2 Release targeting the Intel 386 architecture, developed by Lynne and William Jolitz, who wrote the six kernel files missing from that release. A description of this system was published as a series of 18 articles in Dr. Dobb's Journal [Jolitz and Jolitz, 1991].
The 386BSD-0.1-patchkit branch contains 171 commits associated with patches made to 386BSD 0.1 by a group of volunteers from mid-1992 to mid-1993. Patches contain their changes in Unix "context diff" format, and can therefore be applied automatically to the 386BSD distribution. Each patch is accompanied by a metadata file listing its title, author, description, and prerequisites.
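Because each patch lists its prerequisites, patches must be applied in an order that respects them. This section does not say how the import tooling derives that order; one plausible approach is a topological sort, sketched below with Python's standard graphlib module (Python 3.9 or later). The sketch assumes the metadata files have already been parsed into a dictionary mapping each patch identifier to its prerequisites; the function name is hypothetical.

```python
# Sketch (assumed approach): order patch-kit patches so prerequisites come first.
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List


def application_order(prerequisites: Dict[str, Iterable[str]]) -> List[str]:
    """Return patch ids so that every patch follows all of its prerequisites.

    The dict maps a patch id to the ids it depends on; patches mentioned only
    as prerequisites are included too. graphlib.CycleError signals a cycle.
    """
    return list(TopologicalSorter(prerequisites).static_order())
```

Each patch could then be applied in turn with the standard patch utility, which understands context diffs.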
The FreeBSD-release/X tags and branches mark 69 releases derived from the FreeBSD Project. The names of tags and branches to be imported are obtained by excluding from the corresponding FreeBSD set the names matching one of the following patterns: projects/, user/, master, or svn_head. The FreeBSD Project started in early 1993 to address difficulties in maintaining 386/BSD through patches and to work with its author to secure the future of 386/BSD [FreeBSD, 2015]. The focus of the project was to support the PC architecture, appealing to a large, not necessarily highly technically sophisticated audience [McKusick and Neville-Neil, 2004, p. 11]. For legal reasons associated with the settlement of the USL case, versions up to 1.1.5.1 were derived from the BSD Networking 2 Release, while later ones were derived from the 4.4BSD-Lite Release 2 with 386/BSD additions. Two other BSD Unix descendants that could have been imported in place of FreeBSD or in parallel with it are NetBSD and OpenBSD. FreeBSD was chosen because it appears to be more popular than the other two, as measured by the results obtained by Google search (17 million results for FreeBSD, 366 thousand results for OpenBSD, and 350 thousand results for NetBSD).
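The exclusion rule for FreeBSD tag and branch names can be expressed as a single regular expression. In this sketch the four patterns are assumed to match anywhere in a name, since the text does not say whether they are anchored; the function and constant names are hypothetical.

```python
# Sketch: drop FreeBSD ref names that match any excluded pattern (substring match assumed).
import re
from typing import Iterable, List

EXCLUDED = re.compile(r"projects/|user/|master|svn_head")


def names_to_import(names: Iterable[str]) -> List[str]:
    """Keep, in input order, only names matching none of the excluded patterns."""
    return [n for n in names if not EXCLUDED.search(n)]
```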
All branches with a -Snapshot-Development suffix denote commits that have been synthesized from a time-ordered sequence of a snapshot's files, while tags with a -VCS-Development suffix mark the point along an imported version control history branch where a particular release occurred.
2.2 Available Metadata
The repository's history includes commits from the earliest days of the system's development, such as the ones listed in Figure 3. Commits that have been synthesized from snapshots and author-to-file maps, rather than imported from other revision control systems, can be recognized by the "Synthetic commit" phrase that appears in the commit's comment. Such commit comments follow exactly the format shown in Figure 3, identifying the snapshot from which the commit was synthesized (four Research Editions in this case) and the file corresponding to the commit's time stamp.
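Researchers who need to separate synthesized history from imported history can parse the commit messages. The sketch below relies only on the message layout visible in Figure 3 (a "Work on file" line and a "Synthesized-from:" line, possibly indented as in git log output); it is an illustration under that assumption, not the project's tooling, and the function name is hypothetical.

```python
# Sketch: extract snapshot and file from a synthetic commit message (Figure 3 layout assumed).
import re
from typing import Dict, Optional

WORK_ON = re.compile(r"^[ \t]*Work on file (\S+)[ \t]*$", re.MULTILINE)
SOURCE = re.compile(r"^[ \t]*Synthesized-from: (\S+)[ \t]*$", re.MULTILINE)


def synthetic_metadata(message: str) -> Optional[Dict[str, str]]:
    """Return {'snapshot': ..., 'file': ...}, or None for non-synthetic commits."""
    source = SOURCE.search(message)
    if source is None:
        return None  # e.g. a commit imported from SCCS, CVS, or Git
    work = WORK_ON.search(message)
    return {"snapshot": source.group(1), "file": work.group(1) if work else ""}
```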
commit c4b1db0397c78e91b554e3edff3350a8c80781b1
Author: Ken Thompson <ken@research.uucp>
Date:   Mon May 7 01:23:11 1979 -0500

    Research V7 development
    Work on file usr/sys/sys/nami.c

    Synthesized-from: v7

commit 08d62191ab22882194e5f7004b3c00fb39d99193
Author: Ken Thompson <ken@research.uucp>
Date:   Fri Jul 18 04:09:14 1975 -0500

    Research V6 development
    Work on file usr/sys/ken/nami.c

    Synthesized-from: v6

commit 90798d6e3caec237bab95d22f0650047c3e9d431
Author: Ken Thompson <ken@research.uucp>
Date:   Thu Jan 2 19:25:11 1975 -0500

    Research V5 development
    Work on file usr/sys/ken/nami.c

    Synthesized-from: v5

commit a8c0fddc39968d4669a1f75a5121b4acd8f9c699
Author: Ken Thompson <ken@research.uucp>
Date:   Thu Aug 30 19:30:51 1973 -0500

    Research V3 development
    Work on file sys/ken/nami.c

    Synthesized-from: v3
Figure 3: A log of file changes across Research Unix releases
Note that the commits derived from snapshot data are timestamped with the modification time of each file in the snapshot (see Figure 3). This means that they represent only the file's final change and state in the development of the given release. Furthermore, the timestamp may be incorrect in cases where the file's modification time was changed after it was last written by its author. This is almost certainly the case in the very early Unix Research Editions.
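The way a snapshot's files become a time-ordered sequence of synthetic commits can be approximated as below. This sketch only lists a snapshot's files oldest first by modification time; creating the commits themselves (for example with git fast-import) is omitted, and the function name is hypothetical. The resulting order inherits any errors in the recorded modification times noted above.

```python
# Sketch: list a snapshot's files oldest first, one entry per would-be synthetic commit.
import os
from datetime import datetime, timezone
from typing import List, Tuple


def files_by_mtime(root: str) -> List[Tuple[datetime, str]]:
    """Return (UTC modification time, path relative to root) for every file, oldest first."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            stamp = datetime.fromtimestamp(os.stat(full).st_mtime, tz=timezone.utc)
            entries.append((stamp, os.path.relpath(full, root)))
    entries.sort()  # equal timestamps fall back to path order, keeping output deterministic
    return entries
```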
Merges between releases that happened along the system's evolution, such as the development of 3BSD from 2BSD and Unix 32/V, are also correctly represented in the Git repository as graph nodes with two parents (see Figure 8).
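Because a merge is simply a commit with more than one parent, such nodes can be listed from the output of git log --all --format='%H %P', which prints each commit hash followed by its parent hashes (git rev-list --merges --all yields the same set directly). A minimal sketch, with a hypothetical function name:

```python
# Sketch: find merge commits in "hash parent1 [parent2 ...]" lines.
from typing import Iterable, List


def merge_commits(log_lines: Iterable[str]) -> List[str]:
    """Return hashes of commits listed with two or more parents."""
    merges = []
    for line in log_lines:
        fields = line.split()
        # fields[0] is the commit itself; the remaining fields are its parents
        if len(fields) >= 3:
            merges.append(fields[0])
    return merges
```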
78a8403693 usr/sys/ken/pipe.c (Ken Thompson 1975-07-17 10:33:37 -0500 48) iput(ip);
78a8403693 usr/sys/ken/pipe.c (Ken Thompson 1975-07-17 10:33:37 -0500 49) return;
78a8403693 usr/sys/ken/pipe.c (Ken Thompson 1975-07-17 10:33:37 -0500 50) }
9dd2619e6d usr/sys/sys/pipe.c (Ken Thompson 1979-01-10 15:19:35 -0500 51) u.u_r.r_val2 = u.u_r.r_val1;
9dd2619e6d usr/sys/sys/pipe.c (Ken Thompson 1979-01-10 15:19:35 -0500 52) u.u_r.r_val1 = r;
2c5a749b29 usr/sys/ken/pipe.c (Ken Thompson 1974-11-26 18:13:21 -0500 53) wf->f_flag = FWRITE|FPIPE;
2c5a749b29 usr/sys/ken/pipe.c (Ken Thompson 1974-11-26 18:13:21 -0500 54) wf->f_inode = ip;
2c5a749b29 usr/sys/ken/pipe.c (Ken Thompson 1974-11-26 18:13:21 -0500 55) rf->f_flag = FREAD|FPIPE;
2c5a749b29 usr/sys/ken/pipe.c (Ken Thompson 1974-11-26 18:13:21 -0500 56) rf->f_inode = ip;
2c5a749b29 usr/sys/ken/pipe.c (Ken Thompson 1974-11-26 18:13:21 -0500 57) ip->i_count = 2;
9dd2619e6d usr/sys/sys/pipe.c (Ken Thompson 1979-01-10 15:19:35 -0500 58) ip->i_mode = IFREG;
7fc472a9e2 usr/src/sys/sys/pipe.c (Bill Joy 1980-11-09 08:01:07 -0800 59) ip->i_flag = IACC|IUPD|ICHG|IPIPE;
Figure 4: Identification in a single file of commits spanning multiple snapshots
More importantly, the repository is constructed in a way that allows git blame, which annotates source code lines with the version, date, and author associated with their first appearance, to produce the expected code provenance results. For example, checking out the BSD-4 tag and running git blame -M -M -C -C on the kernel's pipe.c file will show lines spanning the 5th, 6th, and 7th Research Editions developed at Bell Labs, as well as 4BSD developed at Berkeley (see Figure 4). These lines are derived from snapshot files (probably) written by Ken Thompson in 1974, 1975, and 1979, and by Bill Joy in 1980. This feature allows the automatic (though computationally expensive) detection of the code's provenance at any point in time. Similarly, the git log command can also trace file changes across successive Unix releases. An example can be seen in Figure 3, which was obtained by running git log --follow -M20 -C20 ./usr/sys/sys/nami.c on the checked-out version of Research-V7.
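Provenance summaries such as those behind Figures 5 and 6 can be derived from git blame output. A minimal parser for the line layout shown in Figure 4 follows; its regular expression is an assumption based on that figure, so lines without a leading hash (as in Figure 6) or produced with other blame options are not matched. The function names are hypothetical.

```python
# Sketch: summarize git blame lines in the Figure 4 layout (format assumed from the figure).
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

BLAME_LINE = re.compile(
    r"^\^?[0-9a-f]+\s+(?:\S+\s+)?"        # abbreviated hash, optional originating path
    r"\((?P<author>.+?)\s+"               # author name, which may contain spaces
    r"(?P<date>(?P<year>\d{4})-\d{2}-\d{2}) \d{2}:\d{2}:\d{2} [+-]\d{4}\s+"
    r"(?P<line>\d+)\)"                    # line number in the blamed file
)


def lines_per_author_year(blame: Iterable[str]) -> Counter:
    """Count surviving lines for each (author, year) pair."""
    counts: Counter = Counter()
    for text in blame:
        m = BLAME_LINE.match(text)
        if m:
            counts[(m["author"], int(m["year"]))] += 1
    return counts


def oldest_line(blame: Iterable[str]) -> Optional[Tuple[str, int]]:
    """Return (date, line number) of the earliest-dated line, ignoring time zones."""
    dated = []
    for text in blame:
        m = BLAME_LINE.match(text)
        if m:
            dated.append((m["date"], int(m["line"])))
    return min(dated) if dated else None
```

Fed the lines produced by the git blame invocation described above, lines_per_author_year gives a per-author, per-year breakdown, and oldest_line points at the earliest surviving code.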
Figure 5: Code growth and provenance across representative Unix releases.
lib/libc/gen/timezone.c (Ed Schouten 2009-12-05 19:31:38 +0000 107) _tztab(int zone, int dst)
lib/libc/gen/timezone.c (Rodney Grimes 1994-05-27 05:00:24 +0000 108) {
lib/libc/gen/timezone.c (David E. O'Brien 2002-02-01 01:08:48 +0000 109) struct zone *zp;
lib/libc/gen/timezone.c (David E. O'Brien 2002-02-01 01:08:48 +0000 110) char sign;
usr/src/lib/libc/gen/timezone.c (Bill Joy 1980-12-22 00:40:25 -0800 111)
usr/src/lib/libc/gen/timezone.c (Keith Bostic 1987-03-28 19:27:07 -0800 112) for (zp = zonetab; zp->offset != -1;++zp) /* static tables */
usr/src/lib/libc/gen/timezone.c (Keith Bostic 1987-03-28 19:27:07 -0800 113) if (zp->offset == zone) {
usr/src/libc/gen/timezone.c (Dennis Ritchie 1979-01-10 14:58:45 -0500 114) if (dst && zp->dlzone)
usr/src/libc/gen/timezone.c (Dennis Ritchie 1979-01-10 14:58:45 -0500 115) return(zp->dlzone);
usr/src/libc/gen/timezone.c (Dennis Ritchie 1979-01-10 14:58:45 -0500 116) if (!dst && zp->stdzone)
usr/src/libc/gen/timezone.c (Dennis Ritchie 1979-01-10 14:58:45 -0500 117) return(zp->stdzone);
usr/src/libc/gen/timezone.c (Dennis Ritchie 1979-01-10 14:58:45 -0500 118) }
Figure 6: The oldest surviving code in a 2016 version of FreeBSD Unix (lines 114–118).
As can be seen in Figure 5, a modern version of Unix (FreeBSD 10.2) still contains visible chunks of code from 4.3BSD, 4.3BSD Net/2, and all releases starting from FreeBSD 2.0. Interestingly, the figure also shows that code developed during the 18-month dash to create an open source operating system out of the code released by Berkeley (386BSD and FreeBSD 1.0) does not seem to have survived.
The oldest significant code in the 2016 version of FreeBSD (10.2.0) appears to be an 18-line sequence in the C library file timezone.c. This was found by running git blame on that file, which takes a bit more than two minutes to complete on a modern PC. The output (see Figure 6) includes code with changes spanning three decades. The oldest part can also be found in the 7th Edition Unix file with the same name and a time stamp of January 10th, 1979 (36 years ago).
2.3 GitHub Integration
Figure 7: Integration of the repository with current GitHub accounts.
All commits included in the repository are associated with a single internet-standard [Resnick, 2008] email address that can be linked to GitHub accounts. Old-style UUCP addresses are expressed in domain-name format. Where a commit is associated with more than one contributor, the additional contributors are identified through Co-Authored-By: header-like lines added to the commit's comment. For example, most early commits whose authorship is not otherwise accounted for are attributed as instructed in the following quote [Ritchie, 1984].
The reader will not, on the average, go far wrong if he reads each occurrence of `we' with unclear antecedent as `Thompson, with some assistance from me.'
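The two address conventions described above can be illustrated with a short sketch. It assumes that old addresses are UUCP bang paths of the form host!user (possibly with extra leading hops) and that only the last host is kept in the .uucp pseudo-domain form; the exact conversion rules used by the project are not given here, and the function names are hypothetical.

```python
# Sketch: normalize a UUCP bang path and append co-author trailer lines (assumed conventions).
from typing import Iterable


def uucp_to_domain(bang_path: str) -> str:
    """Turn 'host!user' (or 'a!host!user') into 'user@host.uucp'."""
    parts = bang_path.split("!")
    if len(parts) < 2:
        raise ValueError(f"not a bang path: {bang_path!r}")
    host, user = parts[-2], parts[-1]  # keep only the user's own host
    return f"{user}@{host}.uucp"


def with_coauthors(message: str, coauthors: Iterable[str]) -> str:
    """Append one 'Co-Authored-By: Name <email>' line per additional contributor."""
    trailers = [f"Co-Authored-By: {c}" for c in coauthors]
    if not trailers:
        return message
    return message.rstrip("\n") + "\n\n" + "\n".join(trailers) + "\n"
```

For instance, a commit authored by Ken Thompson could carry a Co-Authored-By: Dennis Ritchie <dmr@research.uucp> line in the spirit of the quote above.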
A simple web-based search engine and a process outlined in the project's README file allow current GitHub users to associate their past commits with their current GitHub account through the email address listed in the commit. This can be seen in Figure 7: S. R. Bourne's commit (top) is not associated with a GitHub account, Ken Thompson's commit (second from the bottom) is associated with his current GitHub account, while the commits by Dennis Ritchie and J. F. Ossanna are associated with posthumously-created in memoriam accounts. Through direct emails and a message posted on The Unix Heritage Society mailing list, past authors were encouraged to link their current GitHub accounts to their past commits. Although some have responded enthusiastically, the response was not overwhelming.
2.4 Known Limitations
Researchers using the provided data set should note some limitations regarding its coverage and fidelity. Where applicable, these are discussed in detail in other parts of this work.
Many of the data set's limitations are associated with releases that are imported through snapshots (depicted by square boxes in Figure 8). These are the following.
Other limitations apply to the data set as a whole.
The goal of the work reported here is to consolidate data concerning the history of Unix in a form that helps the study of the system's evolution, by entering them into a modern revision repository. This involves collecting the data, curating them, and synthesizing them into a single Git repository.
The software and data files that were developed as part of this project are available online and, with appropriate network, CPU, and disk resources, they can be used to recreate the repository from scratch.
Figure 8: Imported Unix snapshots, repositories, and their mergers. (On the right: a model of the synthetic commits between any two snapshots.)
3.1 Primary Data
Table 2: Data Sources
The project is based on three types of data (see Figure 8 and the corresponding data sources listed in Table 2). The first type comprises snapshots of early released versions, which were obtained from the Unix Heritage Society archive [Toomey, 2009], the CD-ROM images containing the full source archives of CSRG, the OldLinux site, and the FreeBSD archive. These data are represented in the Unix history repository as synthetic commits, based on manually-added and extracted metadata. The second type comprises past and current repositories, namely the CSRG SCCS repository, the FreeBSD 1 CVS repository, and the Git mirror of modern FreeBSD development. These data were imported into the repository as commits matching the original ones. The last, and most labour-intensive, source of data was primary research, which is discussed in the next section. Information regarding merges between source code bases was obtained from a BSD family tree maintained by the NetBSD project.
3.2 Authorship
Table 3: Manually-Allocated Contributions in Research Unix Editions
Identifier Name                 Contributions
aho        Alfred V. Aho        awk, dbm, egrep, fgrep, libdbm
ark        Andrew Koenig        varargs
bsb        Brenda S. Baker      struct
bwk        Brian W. Kernighan   adv, awk, beg, beginners, ctut, ed, edtut, eqn, eqnchar, learn, m4, neqn, rat, ratfor, trofftut, uprog
cbh        Charles B. Haley     regen, setup, tar
csr        C. S. Roberts        tss
dan        D. A. Nowitz         uucp
dmr        Dennis Ritchie       a.out, ar, as, assembler, atan, bcd, c, cacm, cat, cc, cdb, check, chmod, chown, cmp, core, cp, ctime, ctour, date, db, dev, df, dir, dmr, dp, dsw, du, ed, exit, exp, f77, fc, fort, fptrap, getc, getty, glob, goto, hypot, if, init, iolib, iosys, istat, ld, libc, ln, login, ls, m4, man2, man3, man4, mesg, mkdir, mount, mv, nm, od, pr, ptx, putc, regen, rew, rf, rk, rm, rmdir, rp, secur, security, setup, sh, sin, sort, sqrt, strip, stty, su, switch, tp, tty, type, umount, unix, uprog, utmp, who, write, wtmp
doug       Doug McIlroy         diff, echo, graph, join, look, m6, sort, spell, spline, tmg
haight     Dick Haight          expr, find
jfm        J. F. Maranzano      adb
jfo        Joe Ossanna          azel, ed, getty, nroff, ov, roff, s7, stty, troff, wc
ken        Ken Thompson         ar, atan, atof, bas, bj, bproc, cacm, cal, cat, check, chess, chmod, chown, core, cp, dc, dd, df, dir, dli, dp, dsw, dtf, ed, exp, f77, fc, fed, form, fort, fptrap, getty, grep, hypot, implement, init, itoa, ken, libplot, ln, log, login, ls, mail, man, man2, man4, mesg, mkdir, moo, mount, mv, nlist, nm, od, password, plot, pr, qsort, rew, rf, rk, rm, rmdir, roff, rp, sa, sh, sin, sort, sqrt, stty, su, sum, switch, sync, sys, tabs, tp, ttt, tty, umount, uniq, unix, utmp, who, write, wtmp
lem        Lee E. McMahon       comm, cu, grep, qsort, sed
llc        Lorinda Cherry       bc, dc, deroff, eqn, eqnchar, fed, form, neqn
mel        Michael E. Lesk      iolib, learn, lex, ms, msmacros, refer, tbl, tmac, uucp
pjw        Peter J. Weinberger  awk, f77, libI77, libmp, mp
rhm        Robert Morris        atan, bc, crypt, dc, exp, factor, fed, form, libm, m6, man3, password, primes, sky, sqrt
schmidt    Eric Schmidt         lex
scj        Stephen C. Johnson   cc, lint, mip, pcc, porttour, yacc
sif        S. I. Feldman        f77, make
srb        S. R. Bourne         adb, sh, shell
xtp        Greg Chesson         mpx, mpxcall, mpxio, pk[01]

Table 4: Manually-Allocated Contributions in BSD Unix Releases
Identifier Name                 Contributions
arn        Rich Newton          spice
arnold     Ken Arnold           curses, fortune, fortunes, libcurses
cbh        Charles B. Haley     ex, eyacc, mkstr, pascal, pi, public, px
cohen      Ellis Cohen          where
cvw        Chris Van Wyk        ideal
dlw        David Wasley         libI77uc
dop        Don O. Pederson      spice
eric       Eric Allman          me, memacros, portlib, sendmail, trek, tset
erics      Eric Shienbrood      more
frodo      T. J. Kowalski       fsck
honey      Peter Honeyman       pathalias
hpk        Howard Katseff       box, crazy, froc, last, sdb, sess, syswatch, toc, watch
jeff       Jeff Schriebman      biorhythm, colrm, flt40, linerm, procp, repeat, strip
jfr        John Reiser          as
jkf        John Foderaro        lisp
ken        Ken Thompson         apl, pi, px
kurt       Kurt A. Shoens       fix, fixit, fleece, fmt, funny, lock, mail, Mail, pq, reset, rmtree, ucbmail, vpac
lem        Lee E. McMahon       gres
llc        Lorinda Cherry       diction
mark       Mark Horton          banner, chfn, curses, leave, libcurses, rewind, script, ul, w
mckusick   Kirk McKusick        gprof, num
mike       Mike Tilson          tmac, vcat
mja        Mike Accetta         enet, pty, tty_pty
ozalp      Ozalp Babaoglu       analyze, locore, vm, vmstat, vmunix
peter      Peter B. Kessler     gprof
presott    David Presotto       vgrind
rrh        Robert R. Henry      as
schmidt    Eric Schmidt         berknet, net, netcp, netlpr, netmail, netq, netrm
sif        S. I. Feldman        efl
sklower    Keith Sklower        arff, flcopy, libNS
tbl        Tom London           liszt
td         Tom Duff             tmac, vcat
toy        Michael Toy          33, libretro, num, rogue, shutdown, termcap, termlib
tuck       Richard Tuck         arff, flcopy
wnj        Bill Joy             analyze, apropos, ashell, cat3a, chessclock, chownall, colcrt, collpr, cptree, cr3, csh, cshms, cxref, dates, diffdir, double, dribble, edit, ex, ex-1, expand, exrecover, exrefm, eyacc, fold, from, glob2, head, htmp, htmpg, htmps, iul, list, lntree, locore, ls, makeTtyn, man, manwhere, mkstr, msgs, nm, num, number, osethome, pascal, pascals, pcc, pi, pi0, pi1, pix, print, Print, puman, px, pxp, pxref, rout, see, sethome, sh, sidebyside, size, soelim, squash, ssp, strings, strip, termcap, termlib, tests, tra, transcribe, ttycap, ttycap2, Ttyn, ttytype, typeof, ulpr, vgrind, vi, vm, vmstat, vmunix, wc, whatis, whereis, whoami, whoison, xstr
x-br       Bill Reeves          tmac, vcat
x-clm      Colin L. Mc Master   ccat, compact, uncompact
x-dl       Douglas Lanam        apl
x-dw       David Willcox        indent
x-etc      Earl T. Cohen        finger
x-im       Ivan Maltz           ticktock
x-jp       Juan Porcar          locore, vm, vmunix
x-le       Len Edmondson        lastcomm
x-or       Olivier Roubine      dribble
x-rd       R. Dowell            spice
x-rh       Ross Harvey          apl
x-rt       Robert Toxen         tod

The release snapshots do not provide information regarding their ancestors and the contributors of each file. Therefore, these pieces of information had to be determined through primary research. The authorship information was mainly obtained:
Precise details regarding the source of the authorship information are documented in the project's files that are used for mapping Unix source code files to their authors and the corresponding commit messages.
# 2. http://www.cs.bell-labs.com/who/doug/index.html
# "Text- and data-processing utilities:
# spell, diff, sort, join, graph, speak, etc."
usr/src/cmd/diff.* doug
usr/src/cmd/graph\.c doug
usr/src/cmd/join\.c doug
usr/src/cmd/spell/.* doug
bin/spell doug

# 3. [Morris] was also the author of the series of crypt programs
# that came with early Unix, including the final one distributed with the
# Seventh Edition
# http://cm.bell-labs.com/cm/cs/who/dmr/crypt.html
usr/man/man1/crypt\.1 rhm
usr/man/man3/crypt\.3 rhm
usr/src/cmd/crypt\.c rhm
usr/src/libc/gen/crypt\.c rhm

# 5. Volume 2 of the manual (supplementary documents)
# Based on the authors listed in each document
usr/doc/adb/.* jfm,srb
usr/doc/adv.ed/.* bwk
usr/doc/assembler dmr
usr/doc/awk aho,pjw,bwk
Figure 9: Example specifications of file authorship
The authorship information for major releases is stored in files under the project's author-path directory. These contain lines with a regular expression for a file path followed by the identifier of the corresponding author (Figure 9). Multiple authors can also be specified. The regular expressions are processed sequentially, so that a catch-all expression at the end of the file can specify a release's default authors.
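The sequential processing of author-path files amounts to first-match-wins lookup. The sketch below assumes that whitespace separates each regular expression from a comma-separated list of identifiers, that an expression must match the whole path, and that lines starting with # are comments; the function names and the catch-all rule used in the usage note are hypothetical.

```python
# Sketch: first-match-wins author lookup over author-path rules (format assumed from Figure 9).
from __future__ import annotations

import re
from typing import List, Optional, Tuple


def parse_rules(text: str) -> List[Tuple[re.Pattern[str], List[str]]]:
    """Parse '<regex> <id>[,<id>...]' lines, skipping blank lines and # comments."""
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        regex, ids = line.rsplit(None, 1)  # the identifiers form the last field
        rules.append((re.compile(regex), ids.split(",")))
    return rules


def authors_of(path: str, rules: List[Tuple[re.Pattern[str], List[str]]]) -> Optional[List[str]]:
    """Return the identifiers of the first rule whose regex matches the whole path."""
    for pattern, ids in rules:
        if pattern.fullmatch(path):
            return ids
    return None  # no rule applies, not even a catch-all
```

With the Figure 9 rules followed by a hypothetical catch-all line ".* ken,dmr", usr/src/cmd/diff.c maps to doug, while any path not matched earlier falls through to ken and dmr.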
Listing 1: Retrieving authorship information from documentation files
# Location of the Volume 2 documentation
cd archive/v7/usr/doc

# Find all files
find . -type f |
# List those containing the .AU macro
xargs fgrep .AU |
# Create path regular expressions
sed -n 's/^\.\/\([^-\/:]*\)\([:/]\).*/\/usr\/doc\/\1\2\.*/p' |
# Eliminate wildcard for single files
sed 's/:\.\*//;s/ //' |
# Remove duplicates
sort -u

# Find all files
find . -type f |
# List each .AU line together with the two lines that follow it
xargs fgrep -A 2 .AU
./adb/tut:.AU "MH2F-207" "3816"
./adb/tut-J. F. Maranzano
./adb/tut:.AU "MH2C-512" 7419
./adb/tut-S. R. Bourne
./adb/tut-.AI
--
./adv.ed/ae0:.AU "MH 2C518" 6021
./adv.ed/ae0-Brian W. Kernighan
./adv.ed/ae0-.AI
--
./assembler:.AU
./assembler-Dennis M. Ritchie
./assembler-.AI
--
./awk:.AU "MH 2C-522" 4862
./awk-Alfred V. Aho
./awk:.AU "MH 2C-518" 6021
./awk-Brian W. Kernighan
./awk:.AU "MH 2C-514" 7214
./awk-Peter J. Weinberger
./awk-.AI
Figure 10: Author names as listed in Unix documentation files
As an example of how file authorship was collected and processed, consider the authors of the documentation files comprising Volume 2 of the Unix Programmer's Manual in the 7th Research Edition. These files contain the names of their authors using the troff markup macro .AU. The path regular expressions for the corresponding files were obtained through the shell commands shown in lines 4–13 of Listing 1. The output consisted of lines similar to those appearing in the left column of Figure 9. Then, the author names were listed with the commands shown in lines 15–18 of Listing 1. The generated output, such as the one appearing in Figure 10, was then used to fill in by hand the author identifiers appearing in the right column of Figure 9. The authorship could then be propagated to the corresponding source code and Volume 1 manual pages.
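The output of the second pipeline was turned into author identifiers by hand. As a hypothetical aid for that manual step, the following Python sketch groups, by file, the names that follow each .AU line in `fgrep -A 2` output formatted as in Figure 10. The function name and the assumption that the name is always the line immediately after .AU are illustrative and not part of the original tooling.

```python
def authors_by_file(grep_output):
    """Group the name line that follows each .AU match by file.

    grep prints matches as 'file:line', context lines as 'file-line',
    and separates non-adjacent groups with '--'.
    """
    authors = {}
    current = None  # file whose .AU line was seen on the previous line
    for line in grep_output.splitlines():
        if ":.AU" in line:
            current = line.split(":.AU", 1)[0]
        elif current is not None and line.startswith(current + "-"):
            name = line[len(current) + 1:]
            if not name.startswith("."):  # skip troff requests such as .AI
                authors.setdefault(current, []).append(name)
            current = None  # assumed: only the line right after .AU is a name
        else:
            current = None  # '--' separators and unrelated lines
    return authors


sample = """\
./adb/tut:.AU "MH2F-207" "3816"
./adb/tut-J. F. Maranzano
./adb/tut:.AU "MH2C-512" 7419
./adb/tut-S. R. Bourne
./adb/tut-.AI
--
./assembler:.AU
./assembler-Dennis M. Ritchie
./assembler-.AI
"""
for path, names in authors_by_file(sample).items():
    print(path, names)
```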
```
# Email address template
%A $@research.uucp
# Id (used in path maps):Full name:email
aho:Alfred V. Aho
bsb:Brenda S. Baker
bwk:Brian W. Kernighan
csr:C. S. Roberts
dan:D. A. Nowitz
dmr:Dennis Ritchie
doug:Doug McIlroy
jfm:J. F. Maranzano
jfo:Joe Ossanna
[...]
schmidt:Eric Schmidt:schmidt@ucbvax.Berkeley.EDU
```
Figure 11: Specifications of author details
To avoid repetition, a separate file with a .au suffix is used to map author identifiers into their names and emails (Figure 11). One such file has been created for every community associated with the system's evolution: Bell Labs (bell.au), Berkeley (berkeley.au), 386BSD (386bsd.au), and FreeBSD (freebsd.au). For the sake of authenticity, emails for the early Bell Labs releases are listed using the UUCP [Quarterman and Hoskins, 1986] top-level pseudo-domain, e.g. ken@research.uucp.
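The following Python sketch shows how such a file could be read. Based only on Figure 11, it assumes that `%A` introduces an email template in which `$` stands for the author identifier, and that a third colon-separated field, when present, overrides the template. `parse_au` and `git_ident` are hypothetical names.

```python
def parse_au(text):
    """Map author identifiers to (full name, email) pairs.

    Assumed format (inferred from Figure 11): '%A <template>' sets a default
    email in which '$' stands for the identifier; other lines are
    'id:Full name[:email]'; lines starting with '#' are comments.
    """
    template = None
    people = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("%A"):
            template = line[2:].strip()
            continue
        ident, name, *rest = line.split(":")
        if rest and rest[0]:
            email = rest[0]  # explicit email overrides the template
        elif template is not None:
            email = template.replace("$", ident)
        else:
            email = None
        people[ident] = (name, email)
    return people


def git_ident(people, ident):
    """Format an author as Git expects it: 'Full Name <email>'."""
    name, email = people[ident]
    return f"{name} <{email}>"


people = parse_au("""
# Email address template
%A $@research.uucp
# Id (used in path maps):Full name:email
dmr:Dennis Ritchie
schmidt:Eric Schmidt:schmidt@ucbvax.Berkeley.EDU
""")
print(git_ident(people, "dmr"))  # Dennis Ritchie <dmr@research.uucp>
```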
The FreeBSD author identifier map, required for importing the early CVS repository, was constructed by extracting the corresponding data from the project's modern Git repository, which includes the full names of modern committers. In addition, the Unix finger command was used on a computer hosting FreeBSD Project developers to obtain the full names of another 60 contributors. In total, the commented authorship files (897 rules) comprise 1,215 lines, and another 988 lines map author identifiers to names.
3.3 Processing
The processing of the project's data sources has been codified into a 190-line Makefile. The processing involves five steps: data fetching, tool construction, data unpacking, data cleaning, and repository creation. The following paragraphs summarize how each step is performed.
Data fetching involves copying and cloning about 11GB of images, archives, and repositories from remote sites. Some of the snapshots used are available as compressed tar or cpio archives (sometimes split into multiple files), while others are available as (or can be converted into) CD-ROM images.
Under tool construction, an archiver required for processing old PDP-11 archives on modern platforms is compiled from source. The archiver's code stems from 2.9BSD. It was subsequently modified to work on non-PDP-11 architectures.¹¹ Further modifications introduced as part of the work reported here include changes to make it preserve the modification time of the extracted files and adjustments to allow its warning-free compilation under Linux.
The data unpacking of the archives is mainly performed using tar and cpio. In addition, three 6th Research Edition directories are combined into one, and all 1BSD archives are unpacked using the old PDP-11 archiver. Furthermore, the two sets of 8 and 62 386BSD floppy disk images are each combined into a single file. Finally, all CD-ROM images are made accessible so that they can be processed as file systems. This is done by mounting them via a loop-back device, which makes their contents accessible (read-only) as regular files.
The data cleaning involves tasks required to bring the data into a state suitable for the repository import tools. These are:
Finally, the synthesis of the various data sources into the single Unix history Git repository is performed by two scripts: a Perl script to feed Git with data and a shell script to invoke it for each data set.
The 780-line Perl script (import-dir.pl) can export the (real or synthesized) commit history from a single data source (snapshot directory, SCCS repository, or Git repository) in the Git fast export format.
The script takes as input a number of obligatory and optional arguments. These are used to specify:
Listing 2: Example of generated Git fast import data

```
 1 # 315830189 ../archive/3bsd/usr/src/cmd/ex/ex_addr.c
 2 blob
 3 mark :3
 4 data 5190
 5 /* Copyright (c) 1979 Regents of the University of California */
 6 #include "ex.h"
 7 #include "ex_re.h"
 8 [...]
 9
10 # Start development commits from a clean slate
11 commit refs/heads/BSD-3-Snapshot-Development
12 mark :10
13 author Bill Joy <wnj@ucbvax.Berkeley.EDU> 287674317 -0800
14 committer Bill Joy <wnj@ucbvax.Berkeley.EDU> 287674317 -0800
15 data 99
16 Start development on BSD 3
17 Create reference copy of all prior development files
18 (Synthetic commit)
19 merge Bell-32V
20 merge BSD-2
21 M 100644 1468bde18e292c07e5d30ecbd7fd2b91a60e4626 .ref-Bell-32V/usr/include/stat.h
22 M 100644 1468bde18e292c07e5d30ecbd7fd2b91a60e4626 .ref-Bell-32V/usr/include/sys/stat.h
23 M 100644 816685f1f60f44dfaed7e673294b9d07a12114e5 .ref-Bell-32V/usr/man/man2/open.2
24 [...]
25
26 # 315830189 ../archive/3bsd/usr/src/cmd/ex/ex_addr.c
27 commit refs/heads/BSD-3-Snapshot-Development
28 mark :13
29 author Bill Joy <wnj@ucbvax.Berkeley.EDU> 315830189 -0800
30 committer Bill Joy <wnj@ucbvax.Berkeley.EDU> 315830189 -0800
31 data 75
32 BSD 3 development
33 Work on file usr/src/cmd/ex/ex_addr.c
34 (Synthetic commit)
35 M 100644 :3 usr/src/cmd/ex/ex_addr.c
36 [...]
37
38 # Release
39 commit refs/heads/BSD-Release
40 mark :3700
41 author Bill Joy <wnj@ucbvax.Berkeley.EDU> 315928541 -0800
42 committer Bill Joy <wnj@ucbvax.Berkeley.EDU> 315928541 -0800
43 data 78
44 BSD 3 release
45 Snapshot of the completed development branch
46 (Synthetic commit)
47 from :3699
48 merge Bell-32V
49 merge BSD-2
50 D .ref-Bell-32V
51 D .ref-BSD-2
52
53 tag BSD-3
54 from :3700
55 tagger Bill Joy <wnj@ucbvax.Berkeley.EDU> 315928541 -0800
56 data 91
57 Tagged 3 release snapshot of BSD with 3
58 Source directory: ../archive/3bsd
59 (Synthetic tag)
60
61 done
```
The command produces output in the so-called Git fast import format: a simple text-based stream format that many Git tools use to import and export data. An excerpt of this format can be seen in Listing 2; its contents are explained below.
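To make the stream format concrete, the following Python sketch emits a blob and a commit in Git fast import syntax, mirroring the shape of lines 1–8 and 26–36 of Listing 2. The helper names are hypothetical. The sketch relies on two documented fast-import rules: `data` counts bytes rather than characters, and `merge` and `M` lines follow the commit message.

```python
def data_cmd(payload: bytes) -> bytes:
    """'data <count>' then exactly count raw bytes (trailing LF is optional)."""
    return b"data %d\n" % len(payload) + payload + b"\n"


def blob(mark: int, content: bytes) -> bytes:
    """Define file contents that later commands can reference as :mark."""
    return b"blob\nmark :%d\n" % mark + data_cmd(content)


def commit(branch, mark, ident, when, tz, message, changes, merges=()):
    """Emit a commit; changes is a list of (mode, dataref, path) triples."""
    out = [b"commit refs/heads/%s\n" % branch.encode(), b"mark :%d\n" % mark]
    for role in (b"author", b"committer"):
        out.append(b"%s %s %d %s\n" % (role, ident.encode(), when, tz.encode()))
    out.append(data_cmd(message.encode()))  # counts bytes, not characters
    out += [b"merge %s\n" % m.encode() for m in merges]
    out += [b"M %s %s %s\n" % (mode.encode(), ref.encode(), path.encode())
            for mode, ref, path in changes]
    return b"".join(out)


joy = "Bill Joy <wnj@ucbvax.Berkeley.EDU>"
stream = blob(3, b"/* Copyright (c) 1979 Regents of the University of California */\n")
stream += commit(
    "BSD-3-Snapshot-Development", 13, joy, 315830189, "-0800",
    "BSD 3 development\nWork on file usr/src/cmd/ex/ex_addr.c\n(Synthetic commit)\n",
    [("100644", ":3", "usr/src/cmd/ex/ex_addr.c")],
)
print(stream.decode())
```

The three-line commit message, including its final newline, is 75 bytes long. This matches the `data 75` on line 31 of Listing 2.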
An interesting part of the repository representation is how snapshots are imported and linked together in a way that allows git blame to identify old code in newer file versions. Snapshots are imported into the repository as sequential commits based on the time stamp of each file. When all files have been imported, the repository is tagged with the name of the corresponding release. At that point these files could be deleted, and the import of the next snapshot could begin. Note, however, that the git blame command works by traversing a repository's history backwards, using heuristics to detect code moves and copies within or across files. Consequently, deleting the snapshot files would create a discontinuity between snapshots and prevent the tracing of code between them.
Instead, before the next snapshot is imported, all the files of the preceding snapshot are moved into a hidden reference look-aside directory whose name starts with .ref (e.g. .ref-Bell-32V). (See the expanded synthetic commit series appearing on the right of Figure 8.) They remain there until all files of the next snapshot have been imported, at which point they are deleted. Because every file in the .ref directory exactly matches an original file, git blame can determine how source code moves from one version to the next via the .ref file, without ever displaying the .ref file. To further help the detection of code provenance, and to increase the representation's realism, each release is represented as a merge between the branch with the incremental file additions (-Development) and the preceding release.
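The look-aside scheme amounts to a fixed ordering of operations for each snapshot. The Python sketch below only plans that ordering as (description, operations) pairs and does not talk to Git. `snapshot_plan`, its input shapes, and breaking timestamp ties by path are assumptions made for illustration.

```python
def snapshot_plan(previous, new_files):
    """Order the operations used to import one snapshot.

    previous:  {release name: [paths]} for each release being merged
    new_files: {path: modification time} for the snapshot being imported
    Returns a list of (description, [(op, path), ...]) steps, where op is
    "M" (add or modify a file) or "D" (delete a path recursively).
    """
    steps = [("reference copy", [
        # Keep the preceding releases' files visible under .ref-<release>/
        # so that git blame can follow code across the snapshot boundary.
        ("M", f".ref-{rel}/{path}")
        for rel, paths in previous.items() for path in paths
    ])]
    # One synthetic commit per file, oldest first; ties broken by path.
    for path, mtime in sorted(new_files.items(), key=lambda kv: (kv[1], kv[0])):
        steps.append((f"add {path} at {mtime}", [("M", path)]))
    # The release merge removes the look-aside directories again.
    steps.append(("release", [("D", f".ref-{rel}") for rel in previous]))
    return steps


plan = snapshot_plan(
    {"Bell-32V": ["usr/include/stat.h"], "BSD-2": ["usr/src/cmd/ex/ex.c"]},
    {"usr/src/cmd/ex/ex_re.c": 315830200, "usr/src/cmd/ex/ex_addr.c": 315830189},
)
for description, operations in plan:
    print(description, operations)
```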
The small example of Git fast import data in Listing 2 demonstrates the concepts described in the preceding paragraphs. The data stream begins with the contents of files that will be stored in the repository. These are specified using the blob command (lines 1–8). For debugging purposes, the name and timestamp of the file from which the data were taken are first listed as a # line comment (line 1). The embedded mark command (line 3) associates the number 3 with the contents. These are specified through the data command (line 4). Its argument specifies the number of bytes supplied, which follow after a newline (lines 5–8).
Following the definitions of data elements come the commits associated with the import in the BSD-3-Snapshot-Development branch. The first commit (lines 10–24) creates a reference copy of the previous snapshot's files by moving them to a hidden directory (.ref-Bell-32V) by means of the M (filemodify) command (lines 21–24). This associates the existing blob object, identified through its SHA-1 hash, with the specified path. The number 100644 (an octal representation similar to the Unix file mode) specifies that this is a normal (non-executable) file. The associated branch is given as an argument to the commit command (line 11). The snapshot being imported (in this case BSD-3) is identified as a merge of two preceding snapshots (Bell-32V and BSD-2) using the merge command (lines 19–20). The name and email associated with the commit's author and committer, together with the corresponding timestamps (in seconds since 1970, followed by the UTC offset), are given in lines 13–14, while the commit's message is specified with a data command in lines 15–18.
Then comes a series of commits that add files to the repository, ordered according to the timestamps of the corresponding files. The example listed in lines 26–36 creates the file ex_addr.c. This is again specified with an M (filemodify) command, which now refers to the blob marked 3 (line 35) that holds the file's contents. The branch, author, committer, file mode, and commit message are specified in the same way as in the previous commit.
The last commit in a snapshot import (lines 38–51) marks a logical point on the BSD-Release branch. This is defined as a merge between the last commit in the BSD-3-Snapshot-Development branch (marked as 3699 and identified with the from command in line 47) and the two preceding snapshots (Bell-32V and BSD-2, lines 48–49). At this point two D (filedelete) commands remove the reference file copies that were created at the beginning (lines 50–51).
Finally, a tag command (lines 53–59) associates a symbolic name with the release, and the done command (line 61) signals the stream's end.
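The closing part of the stream (lines 38–61 of Listing 2) can be generated in the same way. In this hypothetical Python sketch, the release commit takes its first parent from `from`, adds the other parents with `merge`, and drops the look-aside directories with `D`. The function names are illustrative only.

```python
def data_cmd(payload: bytes) -> bytes:
    """'data <count>' then exactly count raw bytes (trailing LF is optional)."""
    return b"data %d\n" % len(payload) + payload + b"\n"


def release_commit(branch, mark, parent, merges, ref_dirs, ident, when, tz, message):
    """Merge commit that closes a snapshot import and drops look-aside dirs."""
    out = [b"commit refs/heads/%s\n" % branch.encode(), b"mark :%d\n" % mark]
    for role in (b"author", b"committer"):
        out.append(b"%s %s %d %s\n" % (role, ident.encode(), when, tz.encode()))
    out.append(data_cmd(message.encode()))
    out.append(b"from :%d\n" % parent)                   # first parent
    out += [b"merge %s\n" % m.encode() for m in merges]  # further parents
    out += [b"D %s\n" % d.encode() for d in ref_dirs]    # recursive delete
    return b"".join(out)


def tag(name, target, ident, when, tz, message):
    """Annotated tag pointing at the commit marked :target."""
    return (b"tag %s\nfrom :%d\n" % (name.encode(), target)
            + b"tagger %s %d %s\n" % (ident.encode(), when, tz.encode())
            + data_cmd(message.encode()))


joy = "Bill Joy <wnj@ucbvax.Berkeley.EDU>"
stream = release_commit(
    "BSD-Release", 3700, 3699, ["Bell-32V", "BSD-2"],
    [".ref-Bell-32V", ".ref-BSD-2"], joy, 315928541, "-0800",
    "BSD 3 release\nSnapshot of the completed development branch\n(Synthetic commit)\n")
stream += tag("BSD-3", 3700, joy, 315928541, "-0800",
              "Tagged 3 release snapshot of BSD with 3\n"
              "Source directory: ../archive/3bsd\n(Synthetic tag)\n")
stream += b"done\n"  # required when git fast-import runs with --done
print(stream.decode())
```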
A 620-line shell script (import.sh) creates the Git repository and calls the Perl script with appropriate arguments to import each one of the approximately 30 available historical data sources (see Table 2). As an example, consider the following (slightly simplified) invocation.

```
perl ../import-dir.pl $VERBOSE -m Bell-32V,BSD-2 \
    -c ../author-path/BSD-3 -n ../berkeley.au -r Bell-32V,BSD-2 \
    -i ../ignore/BSD-3 -u ../unmatched/BSD-3 $ARCHIVE/3bsd \
    BSD 3 -0800 | git fast-import --stats --done --quiet
```
The preceding shell command runs the import script over the 3bsd snapshot to create version 3 of the BSD branch. This will appear as a merge between the tags Bell-32V and BSD-2, whose files will also be retained until all the snapshot's files have been imported. The file author-path/BSD-3 specifies the authorship of each file, and berkeley.au the details of the authors. Files listed in ignore/BSD-3 will not be imported, and files matched with a wildcard (.*) authorship pattern will be listed in unmatched/BSD-3. The command's output is piped into the git fast-import command to convert it into actual Git commits.
For a period in the 1980s, only a subset of the files developed at Berkeley were under SCCS version control. During that period the Unix history repository contains imports of both the SCCS commits and the snapshots' incremental additions. At the point of each release, the SCCS commit with the nearest time stamp is found and is marked as a merge with the release's incremental import branch. These merges can be seen in the center of Figure 8.
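Finding the SCCS commit nearest to a release date is a simple search over sorted timestamps. A Python sketch using `bisect` follows. The input layout and the tie-breaking rule (prefer the earlier commit) are assumptions, because the text does not specify them.

```python
import bisect


def nearest_commit(commits, release_time):
    """Return the id of the commit whose timestamp is closest to release_time.

    commits: list of (timestamp, commit_id) pairs sorted by timestamp.
    Ties are resolved in favour of the earlier commit (an assumption).
    """
    if not commits:
        raise ValueError("no SCCS commits to choose from")
    times = [t for t, _ in commits]
    i = bisect.bisect_left(times, release_time)
    # The closest commit is either just before the insertion point or at it.
    candidates = commits[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda c: abs(c[0] - release_time))[1]


sccs = [(100, "a1"), (200, "b2"), (300, "c3")]
print(nearest_commit(sccs, 240))  # b2
```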
The import shell script also inserts various licensing files into all imported versions of Unix, as well as a file named README.md, which, among other things, contains the Git SHA sum of the software that created the repository and a timestamp of the import process. Provided the data sources are not modified, this allows the Unix repository to be uniquely identified in a replicable fashion.
The shell script also runs 30 tests that compare the repository at specific tags against the corresponding data sources, verify the appearance and disappearance of look-aside directories, and look for regressions in the count of tree branches and merges and in the output of git blame and git log.
Before the created repository is pushed to GitHub, git is called to garbage-collect and compress it from its initial 6.1GB size down to the distributed 1.1GB.
4 Data Uses
The Unix history repository can be used for empirical research in software engineering, information systems, and software archeology. Through its unique uninterrupted coverage of a period of more than 40 years, it can inform work on software evolution and handovers across generations. With thousandfold increases in processing speed and million-fold increases in storage capacity during that time, the data set can also be used to study the co-evolution of software and hardware technology.
Figure 12: Code style evolution along Unix releases.
As one concrete example, Figure 12 depicts trend lines of some interesting code metrics along 36 major releases of Unix. It demonstrates the evolution of code style and programming language use over very long timescales. This evolution can be driven by software and hardware technology affordances and requirements, software construction theory, and even social forces. The figure was obtained with R's local polynomial regression fitting function. The dates in the figure have been calculated as the average date of all files appearing in a given release. As the figure shows, over the past 40 years the mean length of identifiers has steadily increased from 4 characters to 7, and the mean length of file names has increased from 6 characters to 11. We can also see less steady increases in the number of comments and decreases in the use of the goto statement, as well as the virtual disappearance of the register type modifier. Building on these observations from an exploratory study [Spinellis et al, 2015], follow-up work [Spinellis et al, 2016] used the Unix history repository to examine seven concrete hypotheses. By extracting, aggregating, and synthesizing metrics from 66 snapshots in the period covered by the repository, it found that over the years developers of the Unix operating system appear to have evolved their coding style in tandem with advancements in hardware technology, promoted modularity to tame rising complexity, adopted valuable new language features, allowed compilers to allocate registers on their behalf, and reached broad agreement regarding code formatting. That work also showed that many trends point toward increasing code quality through adherence to numerous programming guidelines, that some other trends indicate adoption that has reached maturity, and that in the area of code commenting progress appears to have stalled.
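Metrics such as those in Figure 12 can be approximated with very little code. The Python sketch below is a crude lexical approximation and not the tooling used for the figure. It removes comments, string and character literals, and preprocessor lines with one regular expression. It then computes the mean identifier length and counts comments and uses of goto and register. `style_metrics` and `mean_file_name_length` are hypothetical names.

```python
import os
import re

C_KEYWORDS = {
    "auto", "break", "case", "char", "const", "continue", "default", "do",
    "double", "else", "enum", "extern", "float", "for", "goto", "if", "int",
    "long", "register", "return", "short", "signed", "sizeof", "static",
    "struct", "switch", "typedef", "union", "unsigned", "void", "volatile",
    "while",
}

# One alternation, so the leftmost construct wins: a "/*" inside a string
# literal is treated as part of the string, not as the start of a comment.
SKIP = re.compile(
    r"/\*.*?\*/"              # block comment
    r"|//[^\n]*"              # line comment
    r'|"(?:\\.|[^"\\\n])*"'   # string literal
    r"|'(?:\\.|[^'\\\n])*'"   # character literal
    r"|^[ \t]*#[^\n]*",       # preprocessor line
    re.S | re.M,
)
IDENT = re.compile(r"\b[A-Za-z_][A-Za-z0-9_]*")


def style_metrics(source):
    """Crude per-file style metrics for C source text."""
    comments = sum(1 for m in SKIP.finditer(source)
                   if m.group().startswith(("/*", "//")))
    words = IDENT.findall(SKIP.sub(" ", source))
    idents = [w for w in words if w not in C_KEYWORDS]
    return {
        "comments": comments,
        "mean_identifier_length":
            sum(map(len, idents)) / len(idents) if idents else 0.0,
        "goto": words.count("goto"),
        "register": words.count("register"),
    }


def mean_file_name_length(paths):
    """Mean length of the last path component, e.g. 'ex_addr.c' -> 9."""
    names = [os.path.basename(p) for p in paths]
    return sum(map(len, names)) / len(names) if names else 0.0


print(style_metrics("int alpha; int be;"))
```

Averaging such per-file values over each release would give one point per release. A smoother such as R's loess could then draw trend lines like those in the figure.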
Figure 13: Exponential decay of Unix source code.
As a second example, Figure 13 shows the distribution of minimum and maximum lifespan estimates of a line of code. The estimates were obtained as follows. First, git blame (with -w -C -C -C parameters) was run on all (1.5 million) source code files of 71 Unix releases selected in the repository. This task used considerable computing resources: 9.9 core years of CPU time, 3,815 cores, 7.6 TB RAM, and 588 GB of disk space. Its execution was made possible by running it as a set of tasks scheduled through SLURM [Yoo et al, 2003] on a supercomputer (IBM NeXtScale nx360M5, Intel Xeon E5-2680v2 10C 2.8GHz, Infiniband FDR14, 8,520 cores, 170 TFLOP/s). The run associated each line of code of each release with a timestamp indicating the time the line was last modified. By identifying the first release where a line of code stopped appearing, it was possible to estimate the minimum and maximum bounds of that line's lifespan in its initial form. The line was considered to "die" (it was removed or modified) sometime between the preceding release and the one where it stopped appearing. In total, minimum estimates were obtained for 117 million lines and maximum estimates for 89 million lines. (The minimum estimates also include lines that survived until the last available release.) Linear regression on the logarithm of the surviving lines of code $l$ and their lifespan $t$ indicates ($R^2 = 0.73$; $p = 2.2 \times 10^{-16}$) that the code's decay matches an exponential model of the form $l = a\,e^{-bt}$, where $a$ and $b$ are fitted constants.

Based on the lifespan's median value, we can bound the half-life of a line of code somewhere between 2.4 days and 9 years.
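The lifespan bounds and the half-life estimate can be sketched in Python as follows. The sketch assumes timestamps in a common unit and treats lines present in the last release as censored, so they get a minimum bound only. It fits $\ln l = \ln a - b\,t$ by ordinary least squares and uses $\ln 2 / b$, which under an exponential model equals the median lifespan, as the half-life. The function names and the evaluation grid are illustrative and do not reproduce the study's exact procedure.

```python
import math


def lifespan_bounds(modified, last_seen, release_dates):
    """Bounds on how long a line survived in its initial form.

    modified:      time the line was last modified (from git blame)
    last_seen:     index of the last release that still contains the line
    release_dates: release dates, oldest first, in the same unit
    The line died between release last_seen and the next one; lines still
    present in the final release only yield a minimum (censored) bound.
    """
    minimum = release_dates[last_seen] - modified
    if last_seen + 1 < len(release_dates):
        return minimum, release_dates[last_seen + 1] - modified
    return minimum, None


def survival_curve(lifespans, grid):
    """(t, number of lines whose lifespan is at least t) for each t in grid."""
    points = [(t, sum(1 for s in lifespans if s >= t)) for t in grid]
    return [(t, n) for t, n in points if n > 0]  # log(0) is undefined


def fit_exponential(points):
    """Least-squares fit of ln l = ln a - b t; returns (a, b)."""
    xs = [t for t, _ in points]
    ys = [math.log(n) for _, n in points]
    if len(set(xs)) < 2:
        raise ValueError("need at least two distinct values of t")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return math.exp(my - slope * mx), -slope


def half_life(b):
    """Time for l to halve under l = a * exp(-b t); the model's median lifespan."""
    return math.log(2) / b


print(lifespan_bounds(5, 1, [10, 20, 35]))  # (15, 30)
a, b = fit_exponential(survival_curve([1, 2, 2, 5, 7, 9], range(0, 10, 2)))
print(half_life(b))
```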
Apart from the preceding two concrete examples, many more areas of research present themselves. The move of the software's development from research labs to academia, and then to the open source community, can be used to study the effects of organizational culture on software development. In that area, an additional branch from Unix 32/V with System III, System V, and illumos could trace the evolution of Unix in corporate hands and its transition to another open source community.
The repository can also be used to study how notable individuals, such as Turing Award winners (Dennis Ritchie and Ken Thompson) and captains of the IT industry (Bill Joy and Eric Schmidt), actually programmed. Another phenomenon worthy of study concerns the longevity of code, either at the level of individual lines, or as complete systems that were at times distributed with Unix (Ingres, Lisp, Pascal, Ratfor, Snobol, TMG), as well as the factors that lead to code's survival or demise.
Finally, because the data set stresses Git, the underlying software repository storage technology, to its limits, it can be used to drive engineering progress in the field of revision management systems.
5 Contributing Extensions
The Unix history repository is managed as an open source project. The project can benefit from the addition of authorship information and entirely new data sources. Both can be contributed as changes to the repository containing the creation code, ideally as GitHub pull requests.
Adding authorship data for code that is imported via snapshots involves adding the author's login identifier, full name, and email (if different from the default for the corresponding community) to the author file associated with the repository: 386bsd.au, bell.au, berkeley.au, or freebsd.au. The fields are colon-separated; see Figure 11. Then, records must be added to the repository's authorship data file, which is located in the author-path directory. Each record consists of a regular expression that matches one or more files in the repository, followed by the login identifier of the files' author (see Figure 9). For example, the following two lines identify Alfred Aho as the author of all files in the libdbm directory and Doug McIlroy as the author of the file join.c. (Note the escaped "."; an unescaped one would match any character.) Records are matched from the top to the bottom of the file, so more specific patterns should be listed before more general ones.

```
usr/src/libdbm/.* aho
usr/src/cmd/join\.c doug
```
Both files allow comments starting with a "#" character. The associated comments and commit messages should clearly indicate the attribution's justification, e.g. a pointer to a publication or a timestamped excerpt of a personal communication. Following the change, the consistency of the added data should be verified by running import.sh -VI, the Unix history repository should be rebuilt, and the differences in the unmatched directory files should be closely examined to verify that they match the change made.
Adding a completely new release data source is more involved. First, note that the corresponding data should be legally available for further redistribution. For example, although various snapshots of System III and beyond seem to be floating around the internet, including them in the repository is not currently possible, because Caldera's license explicitly excludes them. In brief, the steps required are the following.
Many things can be done to increase the repository's faithfulness and usefulness. Given that the build process is shared as open source code, it is easy to contribute additions and fixes through GitHub pull requests. The most useful community contribution would be to increase the coverage of imported snapshot files that are attributed to a specific author. Currently, about 81 thousand snapshot commits (10% out of a total of 496 thousand commits) are assigned an author through a default rule. Similarly, there are about 40 authors (primarily early FreeBSD ones, responsible for 4,974 commits, or 1.6% of the total) for whom only the identifier is known. Both are listed in the build repository's unmatched directory, and contributions are welcome. Furthermore, the BSD SCCS and the FreeBSD CVS commits that share the same author and timestamp can be coalesced into a single Git commit. Support can be added for importing the SCCS file comment fields, in order to bring the corresponding metadata into the repository. Finally, and most importantly, more branches of open source systems can be added, such as Plan 9 from Bell Labs, NetBSD, OpenBSD, DragonFlyBSD, and illumos. Ideally, current rights holders of other important historical Unix releases, such as System III, System V, NeXTSTEP, and SunOS, will release their systems under a license that would allow their incorporation into this repository for study.
Acknowledgements
The author thanks the many individuals who contributed, directly or indirectly, to the effort. John Cowan, Brian W. Kernighan, Larry McVoy, Doug McIlroy, Jeremy C. Reed, Aharon Robbins, and Marc Rochkind helped with Bell Labs login identifiers. Clem Cole, John Cowan, Era Eriksson, Mary Ann Horton, Warner Losh, Kirk McKusick, Jeremy C. Reed, Ingo Schwarze, Anatole Shaw, and Norman Wilson helped with BSD login identifiers and code authorship information. The historical and current material used in the repository was made available thanks to efforts by the FreeBSD Project, Lynne Greer Jolitz, William F. Jolitz, Kirk McKusick, and the Unix Heritage Society. The early Unix editions were released under a BSD-style license thanks to the efforts of Bill Broderick, Paul Hatch, Dion L. Johnson II, Ransom Love, and Warren Toomey. The BSD SCCS import code is based on work by H. Merijn Brand and Jonathan Gray. The newoldar program is a result of work by Brandon Creighton and Dan Frasnelli. The First Research Edition Unix was restored by Johan Beiser, Tim Bradshaw, Brantley Coile, Christian David, Alex Garbutt, Hellwig Geisse, Cyrille Lefevre, Ralph Logan, James Markevitch, Doug Merritt, Tim Newsham, Brad Parker, and Warren Toomey.
The work has been partially funded by the Research Centre of the Athens University of Economics and Business, under the Original Scientific Publications framework (project code EP-2279-01), and supported by computational time granted from the Greek Research & Technology Network (GRNET) in the National HPC facility ARIS under project ID PA003005-CDOLPOT.
2 https://github.com/dspinellis/unix-history-repo
3 Updates may add or modify material. To ensure replicability the repository's users are encouraged to fork it on GitHub or archive it.
4 https://archive.org/details/git-history-of-linux
5 The dates provided here are given by Salus [1994], p. 43.
6 http://www.tuhs.org/Archive/PDP-11/Distributions/research/1972_stuff/
7 https://github.com/dspinellis/unix-history-make
8 https://www.mckusick.com/csrg/
9 http://ftp.netbsd.org/pub/NetBSD/NetBSD-current/src/share/misc/bsd-family-tree
10 http://unix.stackexchange.com/questions/64025/who-are-these-bsd-unix-contributors
11 ftp://ftp.tuhs.org.ua/PDP-11/Tools/Tapes/newoldar.c
12 https://github.com/jonathangray/csrg-git-patches/