Showing content from https://deeptools.readthedocs.io/en/develop/content/help_glossary.html below:

Glossary of NGS terms — deepTools 3.5.6 documentation

Glossary of NGS terms

Like most specialized fields, next-generation sequencing has inspired many an acronym. We are trying to keep track of those abbreviations that we use heavily. Do make us aware if something is unclear by opening an issue on GitHub.

Abbreviations

Reference genomes are usually referred to by their abbreviations, such as:

For a more comprehensive list of available reference genomes and their abbreviations, see the UCSC database.

Acronym | Full phrase | Synonyms/Explanation
<ANYTHING>-seq | -sequencing | indicates that an experiment was completed by DNA sequencing using NGS
ChIP-seq | chromatin immunoprecipitation sequencing | NGS technique for detecting transcription factor binding sites and histone modifications (see entry Input for more information)
DNase | deoxyribonuclease I | DNase I digestion is used to determine active ("open") chromatin regions
HTS | high-throughput sequencing | next-generation sequencing, massively parallel short read sequencing, deep sequencing
MNase | micrococcal nuclease | MNase digestion is used to determine sites with nucleosomes
NGS | next-generation sequencing | high-throughput (DNA) sequencing, massively parallel short read sequencing, deep sequencing
RPGC | reads per genomic content | normalizes reads to 1x sequencing depth, where sequencing depth is defined as: (mapped reads x fragment length) / effective genome size
RPKM | reads per kilobase per million reads | normalizes read numbers: RPKM (per bin) = reads per bin / (mapped reads (in millions) x bin length (kb))
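
The two normalizations defined above can be sketched in a few lines of Python. This is only an illustration of the formulas as written in this glossary, not the deepTools implementation. The function names and all numbers are hypothetical. Treating the RPGC scale factor as the reciprocal of the sequencing depth is an assumption that follows from "normalize to 1x".

```python
def rpkm(reads_in_bin, total_mapped_reads, bin_length_bp):
    """RPKM (per bin) = reads per bin / (mapped reads in millions x bin length in kb)."""
    mapped_millions = total_mapped_reads / 1_000_000
    bin_length_kb = bin_length_bp / 1_000
    return reads_in_bin / (mapped_millions * bin_length_kb)


def sequencing_depth(total_mapped_reads, fragment_length_bp, effective_genome_size_bp):
    """Sequencing depth = (mapped reads x fragment length) / effective genome size."""
    return (total_mapped_reads * fragment_length_bp) / effective_genome_size_bp


def rpgc_scale_factor(total_mapped_reads, fragment_length_bp, effective_genome_size_bp):
    """Assumed factor that rescales coverage to 1x depth: the reciprocal of the depth."""
    depth = sequencing_depth(total_mapped_reads, fragment_length_bp, effective_genome_size_bp)
    return 1.0 / depth


if __name__ == "__main__":
    # Hypothetical experiment: 10 million mapped reads, 200 bp fragments,
    # an effective genome size of 2 Gb, and 50 reads falling into one 50 bp bin.
    print("RPKM for the bin:", rpkm(50, 10_000_000, 50))
    print("sequencing depth:", sequencing_depth(10_000_000, 200, 2_000_000_000))
    print("RPGC scale factor:", rpgc_scale_factor(10_000_000, 200, 2_000_000_000))
```

Note that both formulas depend on the number of mapped reads, so the values change if reads are filtered before normalization.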

For a review of popular *-seq applications, see Zentner and Henikoff.

NGS and generic terminology

The following are terms that may be new to some:

bin

Input

read

File Formats

Data obtained from next-generation sequencing must be processed several times. Most of the processing steps are aimed at extracting only the information needed for a specific downstream analysis, with redundant entries often discarded. Therefore, specific data formats are often associated with different steps of a data processing pipeline.

Here, we give only brief key descriptions of each file format; for more elaborate information, we link to external websites. Be aware that the formats are sorted alphabetically here, not according to their usage within an analysis pipeline, which is depicted here:

Follow the links for more information on the different tool collections mentioned in the figure:

samtools | UCSCtools | BEDtools

2bit

BAM

BED

bedGraph
chr1 10 20 1.5
chr1 20 30 1.7
chr1 30 40 2.0
chr1 40 50 1.8
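
To illustrate how the four columns of the bedGraph example above (chromosome, start, end, score) can be read, here is a minimal Python sketch. It assumes whitespace-separated columns and the zero-based, half-open coordinates of the UCSC bedGraph convention. The function names `parse_bedgraph` and `weighted_mean` are hypothetical helpers, not part of any tool mentioned here.

```python
EXAMPLE = """\
chr1 10 20 1.5
chr1 20 30 1.7
chr1 30 40 2.0
chr1 40 50 1.8
"""


def parse_bedgraph(text):
    """Return a list of (chrom, start, end, value) tuples from bedGraph text."""
    intervals = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and optional header/comment lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        intervals.append((chrom, int(start), int(end), float(value)))
    return intervals


def weighted_mean(intervals):
    """Mean score over all covered bases, weighting each interval by its length."""
    covered = sum(end - start for _, start, end, _ in intervals)
    total = sum((end - start) * value for _, start, end, value in intervals)
    return total / covered


if __name__ == "__main__":
    parsed = parse_bedgraph(EXAMPLE)
    for chrom, start, end, value in parsed:
        print(chrom, start, end, value)
    print("length-weighted mean score:", weighted_mean(parsed))
```

Weighting by interval length matters in practice because bedGraph intervals need not all have the same width, even though they do in this example.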
bigWig

FASTA
>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
 LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
 EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
 LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
 GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
 IENY
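
The structure of the FASTA record above (one header line starting with ">" followed by sequence lines) can be read with a small Python sketch. It is a minimal illustration that assumes the whole file fits in memory. `parse_fasta` is a hypothetical helper name and the short record in the usage example is made up.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()  # tolerate leading or trailing whitespace
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are concatenated; line breaks carry no meaning.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    # Made-up two-record example, not real sequence data.
    example = ">seq1 first record\nACGTAC\nGTTA\n>seq2 second record\nTTGACA\n"
    for name, sequence in parse_fasta(example):
        print(name, len(sequence), sequence)
```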
FASTQ

SAM

SAM alignment section

Warning

Although the SAM/BAM format is rather meticulously defined and documented, whether an alignment program will produce a SAM/BAM file that adheres to these principles is completely up to the programmer. The mapping score, CIGAR string, and particularly, all optional flags (fields >11) are often very differently defined depending on the program. If you plan on filtering your data based on any of these criteria, make sure you know exactly how these entries were calculated and set!
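
To make the distinction between the 11 mandatory fields and the optional fields (>11) concrete, the following Python sketch splits one tab-separated SAM alignment line into both parts. It is only an illustration: the read in the usage example is made up, `parse_sam_line` is a hypothetical helper, and, as the warning says, how a given aligner fills MAPQ or the optional tags has to be checked in that aligner's documentation.

```python
# Names of the 11 mandatory columns, in order, as given in the SAM specification.
MANDATORY_FIELDS = [
    "QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
    "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL",
]
INTEGER_FIELDS = ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN")


def parse_sam_line(line):
    """Split one alignment line into mandatory fields and optional TAG:TYPE:VALUE tags."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) < len(MANDATORY_FIELDS):
        raise ValueError("a SAM alignment line needs at least 11 tab-separated fields")
    record = dict(zip(MANDATORY_FIELDS, columns[:11]))
    for name in INTEGER_FIELDS:
        record[name] = int(record[name])
    tags = {}
    for field in columns[11:]:
        # Optional fields look like NM:i:0; the value itself may contain colons.
        tag, type_code, value = field.split(":", 2)
        tags[tag] = (type_code, value)
    record["tags"] = tags
    return record


if __name__ == "__main__":
    # Made-up alignment line for illustration only.
    example = "read1\t0\tchr1\t100\t42\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNM:i:0"
    parsed = parse_sam_line(example)
    print("mapping score (MAPQ):", parsed["MAPQ"])
    print("CIGAR string:", parsed["CIGAR"])
    print("optional tags:", parsed["tags"])
```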

