Reference Sequence Construction for Relative Compression of Genomes

Abstract

Relative compression, in which a set of similar strings is compressed with respect to a reference string, is an effective method of compressing DNA datasets that contain multiple similar sequences. Moreover, it supports rapid random access to the underlying data. The main difficulty of relative compression lies in selecting an appropriate reference sequence. In this paper, we explore using the dictionaries of repeats generated by the COMRAD, RE-PAIR and DNA-X algorithms as reference sequences for relative compression. We show that this technique yields better compression and allows more general repetitive datasets to be compressed using relative compression.
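
To make the idea of compressing a string with respect to a reference concrete, the following is a minimal sketch of a greedy relative Lempel-Ziv style parse. It is an illustration only, not the authors' implementation: the function names `rlz_factorize` and `rlz_decode` are hypothetical, the longest-match search is a naive `str.find` loop (a practical system would more likely index the reference, for example with a suffix array), and characters absent from the reference are simply emitted as literals.

```python
from typing import List, Tuple, Union

# A factor is either (position, length) into the reference, or a single
# literal character that does not occur in the reference at all.
Factor = Union[Tuple[int, int], str]


def rlz_factorize(reference: str, target: str) -> List[Factor]:
    """Greedily parse `target` into longest substrings found in `reference`.

    Hypothetical helper for illustration; the search is naive and slow.
    """
    factors: List[Factor] = []
    i = 0
    while i < len(target):
        best_pos, best_len = -1, 0
        length = 1
        # Extend the candidate match one character at a time for as long
        # as it still occurs somewhere in the reference.
        while i + length <= len(target):
            pos = reference.find(target[i:i + length])
            if pos == -1:
                break
            best_pos, best_len = pos, length
            length += 1
        if best_len == 0:
            factors.append(target[i])  # literal: character not in reference
            i += 1
        else:
            factors.append((best_pos, best_len))
            i += best_len
    return factors


def rlz_decode(reference: str, factors: List[Factor]) -> str:
    """Rebuild the target by copying the referenced substrings."""
    parts = []
    for factor in factors:
        if isinstance(factor, str):
            parts.append(factor)
        else:
            pos, length = factor
            parts.append(reference[pos:pos + length])
    return "".join(parts)


reference = "ACGTACGGT"
target = "ACGGTTACGN"
factors = rlz_factorize(reference, target)
print(factors)  # [(4, 5), (3, 4), 'N']
print(rlz_decode(reference, factors) == target)  # True
```

The closer the reference is to the sequences being compressed, the fewer and longer the factors become, which is why the choice of reference matters so much. Because each factor points only into the fixed reference, a substring of the target can be recovered by decoding just the factors that cover it.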

This work was supported by the Royal Society and the NICTA Victorian Research Laboratory. NICTA is funded by the Australian Government as represented by the Department of Broadband, Communications and the Digital Economy and the Australian Research Council through the ICT Center of Excellence program.


Author information

National ICT Australia, Department of Computer Science & Software Engineering, University of Melbourne, Australia

Shanika Kuruppu & Justin Zobel
Department of Informatics, King’s College London, United Kingdom

Simon J. Puglisi

Editor information

Università di Pisa, Italy

Roberto Grossi
Consiglio Nazionale delle Ricerche, Area della Ricerca di Pisa, Istituto di Scienza e Tecnologia dell’Informazione “Alessandro Faedo”, Via Giuseppe Moruzzi 1, 56124, Pisa, Italy

Fabrizio Sebastiani & Fabrizio Silvestri

About this paper

Kuruppu, S., Puglisi, S.J., Zobel, J. (2011). Reference Sequence Construction for Relative Compression of Genomes. In: Grossi, R., Sebastiani, F., Silvestri, F. (eds) String Processing and Information Retrieval. SPIRE 2011. Lecture Notes in Computer Science, vol 7024. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24583-1_41

DOI: https://doi.org/10.1007/978-3-642-24583-1_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24582-4
Online ISBN: 978-3-642-24583-1
eBook Packages: Computer Science, Computer Science (R0)
