Canterbury corpus - Wikipedia

From Wikipedia, the free encyclopedia

The Canterbury corpus is a collection of files intended for use as a benchmark for testing lossless data compression algorithms. It was created in 1997 at the University of Canterbury, New Zealand and designed to replace the Calgary corpus. The files were selected based on their ability to provide representative performance results.[1]

In its most commonly used form, the corpus consists of 11 files, selected as "average" documents from 11 classes of documents,[2] totaling 2,810,784 bytes as follows.

Size (bytes)  File name     Description
     152,089  alice29.txt   English text
     125,179  asyoulik.txt  Shakespeare
      24,603  cp.html       HTML source
      11,150  fields.c      C source
       3,721  grammar.lsp   LISP source
   1,029,744  kennedy.xls   Excel spreadsheet
     426,754  lcet10.txt    Technical writing
     481,861  plrabn12.txt  Poetry (Paradise Lost)
     513,216  ptt5          CCITT test set
      38,240  sum           SPARC executable
       4,227  xargs.1       GNU manual page
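As a quick arithmetic check, the stated total can be reproduced from the per-file sizes listed above. The following is a minimal sketch; the names `CANTERBURY_SIZES` and `total_size` are illustrative and not part of any official tooling for the corpus.

```python
# Per-file sizes in bytes, copied from the listing above.
CANTERBURY_SIZES = {
    "alice29.txt": 152_089,
    "asyoulik.txt": 125_179,
    "cp.html": 24_603,
    "fields.c": 11_150,
    "grammar.lsp": 3_721,
    "kennedy.xls": 1_029_744,
    "lcet10.txt": 426_754,
    "plrabn12.txt": 481_861,
    "ptt5": 513_216,
    "sum": 38_240,
    "xargs.1": 4_227,
}


def total_size(sizes: dict[str, int]) -> int:
    """Return the combined size in bytes of all listed files."""
    return sum(sizes.values())


print(f"{len(CANTERBURY_SIZES)} files, {total_size(CANTERBURY_SIZES):,} bytes")
# prints "11 files, 2,810,784 bytes"
```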

The University of Canterbury also offers the following corpora. Additional files may be added, so results should be reported only for individual files.[3]
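Per-file reporting might look like the sketch below, which measures each file separately instead of producing one aggregate figure. It rests on several assumptions: Python's standard `zlib` module stands in for whatever compressor is under test, the metric (compressed bits per input byte) is one common choice and not one prescribed by the text above, and the helper names `load_corpus` and `per_file_report` are hypothetical.

```python
import zlib
from pathlib import Path


def load_corpus(directory: str) -> dict[str, bytes]:
    """Read every regular file in a directory (assumed to hold corpus files)."""
    return {
        path.name: path.read_bytes()
        for path in sorted(Path(directory).iterdir())
        if path.is_file()
    }


def per_file_report(files: dict[str, bytes], level: int = 9) -> dict[str, float]:
    """Return compressed bits per input byte for each file, keyed by file name."""
    report = {}
    for name, data in files.items():
        if not data:
            continue  # skip empty files to avoid division by zero
        compressed = zlib.compress(data, level)
        report[name] = 8 * len(compressed) / len(data)
    return report


# In-memory stand-ins for real corpus files, so the sketch runs without downloads.
sample = {"repetitive": b"a" * 1000, "varied": bytes(range(256))}
for name, bits in per_file_report(sample).items():
    print(f"{name}: {bits:.3f} bits per byte")
```

With the corpus unpacked locally, `per_file_report(load_corpus("path/to/corpus"))` would produce one figure per file; the path here is a placeholder.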
