Showing content from https://en.wikipedia.org/wiki/Silesia_corpus below:

Silesia corpus - Wikipedia

From Wikipedia, the free encyclopedia

The Silesia corpus is a collection of files intended for use as a benchmark for testing lossless data compression algorithms. It was created in 2003 as an alternative to the Canterbury corpus and the Calgary corpus, out of concern that those older corpora no longer represented modern files well. It contains various data types, including large text documents, executable files, and databases.[1]

The corpus consists of 12 files, totaling about 211 MB. The files were chosen to represent data types that the author considered likely to grow rapidly in size over time, such as computer programs and databases, along with more traditional compression benchmarks, such as large text files.[1]

Overview of files, their sizes, descriptions, and data types

File     Size (B)   Description                                                      Type of data
dickens  10192446   The works of Charles Dickens                                     English text
mozilla  51220480   Executable files for Mozilla 1.0                                 Executable
mr       9970564    MRI images                                                       3D image
nci      33553445   A database of chemical structures                                Database
office   6152192    A shared library from OpenOffice                                 Executable
osdb     10085684   A sample MySQL database from the Open Source Database Benchmark  Database
reymont  6625583    The text of the book Chłopi by Władysław Reymont                 PDF in Polish
samba    21606400   The source code of Samba 2-2.3                                   Executable
sao      7251944    The SAO star catalogue                                           Binary database
webster  41458703   The 1913 Webster Unabridged Dictionary                           HTML
xml      5345280    Collected XML files                                              XML
x-ray    8474240    A medical X-ray                                                  Image
Total    211938580
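As an illustration of how a corpus like this is typically used, the following is a minimal sketch that measures the compression ratio (original size divided by compressed size) of each file in a directory. It assumes a local copy of the corpus files in a directory whose path is passed on the command line. The choice of codecs (zlib, bz2, and lzma from the Python standard library) and the function names are assumptions made for this example and are not part of the corpus or its documentation.

```python
import bz2
import lzma
import sys
import zlib
from pathlib import Path

# Codecs chosen for illustration only; the corpus does not prescribe any.
CODECS = {
    "zlib": lambda data: zlib.compress(data, 9),
    "bz2": lambda data: bz2.compress(data, 9),
    "lzma": lambda data: lzma.compress(data),
}


def compression_ratios(data: bytes) -> dict:
    """Return original_size / compressed_size for each codec."""
    if not data:
        raise ValueError("cannot compute a ratio for empty input")
    return {name: len(data) / len(compress(data))
            for name, compress in CODECS.items()}


def benchmark_directory(corpus_dir: str) -> dict:
    """Map each regular file in corpus_dir to its per-codec ratios."""
    results = {}
    for path in sorted(Path(corpus_dir).iterdir()):
        if path.is_file():
            results[path.name] = compression_ratios(path.read_bytes())
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python bench.py path/to/silesia
    for filename, ratios in benchmark_directory(sys.argv[1]).items():
        summary = "  ".join(f"{name}={ratio:.2f}" for name, ratio in ratios.items())
        print(f"{filename:10s} {summary}")
```

This sketch reads each file fully into memory, which is workable for files of the sizes listed above. A more careful benchmark would also record compression and decompression time.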

Because it has a broader and more modern selection of data types, it is considered a better source of test data for compression algorithms than the Calgary corpus.[2]
