Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2
Michael I Love et al. Genome Biol. 2014.
doi: 10.1186/s13059-014-0550-8.
Abstract
In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html.
Figures
Figure 1. Shrinkage estimation of dispersion. Plot of dispersion estimates over the average expression strength (A) for the Bottomly et al. [16] dataset with six samples across two groups and (B) for five samples from the Pickrell et al. [17] dataset, fitting only an intercept term. First, gene-wise MLEs are obtained using only the respective gene’s data (black dots). Then, a curve (red) is fit to the MLEs to capture the overall trend of dispersion-mean dependence. This fit is used as a prior mean for a second estimation round, which results in the final MAP estimates of dispersion (arrow heads). This can be understood as a shrinkage (along the blue arrows) of the noisy gene-wise estimates toward the consensus represented by the red line. The black points circled in blue are detected as dispersion outliers and not shrunk toward the prior (shrinkage would follow the dotted line). For clarity, only a subset of genes is shown, which is enriched for dispersion outliers. Additional file 1: Figure S1 displays the same data but with dispersions of all genes shown. MAP, maximum a posteriori; MLE, maximum-likelihood estimate.
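The three steps in the Figure 1 caption can be illustrated with a small, dependency-free sketch. These steps are the gene-wise MLE, a trend fitted across genes, and a MAP estimate that uses the trend as the prior mean. The sketch rests on several simplifying assumptions:

- It uses an intercept-only model on counts that are already normalized.
- It replaces numerical optimization with a grid search.
- It fits the trend alpha ≈ a0 + a1/mean by plain least squares.
- It uses a log-normal prior with a hand-set width prior_sd.
- It flags a gene as a dispersion outlier when its log MLE lies more than outlier_sds prior widths above the trend.

DESeq2 itself additionally uses a Cox-Reid adjusted likelihood, estimates the prior width from the data, and fits the trend differently. All function names here are hypothetical.

```python
import math

# Log-spaced grid of candidate dispersions (1e-4 .. 10); grid search keeps the sketch dependency-free.
LO, HI, STEPS = math.log(1e-4), math.log(10.0), 400
LOG_GRID = [LO + i * (HI - LO) / STEPS for i in range(STEPS + 1)]


def nb_loglik(counts, mu, alpha):
    """Negative binomial log-likelihood, parameterized so that Var = mu + alpha * mu**2."""
    r = 1.0 / alpha  # NB "size" parameter
    total = 0.0
    for y in counts:
        total += (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
                  + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return total


def gene_wise_mle(counts):
    """Step 1: per-gene dispersion MLE (intercept-only model; counts must have a positive mean)."""
    mu = sum(counts) / len(counts)  # the NB MLE of the mean in an intercept-only model
    best = max(LOG_GRID, key=lambda la: nb_loglik(counts, mu, math.exp(la)))
    return mu, math.exp(best)


def fit_trend(means, alphas):
    """Step 2 (simplified): least-squares fit of alpha ~ a0 + a1 / mean."""
    xs = [1.0 / m for m in means]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(alphas) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (a - y_bar) for x, a in zip(xs, alphas))
    a1 = sxy / sxx if sxx > 0 else 0.0
    a0 = y_bar - a1 * x_bar
    return lambda m: max(a0 + a1 / m, 1e-8)  # clamp so the prior mean stays positive


def map_dispersion(counts, mu, trend_alpha, prior_sd):
    """Step 3: maximize log-likelihood plus a log-normal prior centred on the trend value."""
    center = math.log(trend_alpha)

    def log_post(la):
        return nb_loglik(counts, mu, math.exp(la)) - (la - center) ** 2 / (2 * prior_sd ** 2)

    return math.exp(max(LOG_GRID, key=log_post))


def shrink_dispersions(genes, prior_sd=0.5, outlier_sds=2.0):
    """Return {gene: (mle, trend, final)}; 'final' keeps the MLE for dispersion outliers."""
    mles = {g: gene_wise_mle(c) for g, c in genes.items()}
    trend = fit_trend([m for m, _ in mles.values()], [a for _, a in mles.values()])
    result = {}
    for g, counts in genes.items():
        mu, mle = mles[g]
        t = trend(mu)
        if math.log(mle) > math.log(t) + outlier_sds * prior_sd:
            final = mle  # dispersion outlier: not shrunk toward the prior
        else:
            final = map_dispersion(counts, mu, t, prior_sd)
        result[g] = (mle, t, final)
    return result


# Hypothetical normalized counts for four genes in six samples.
genes = {
    "g1": [10, 12, 9, 11, 30, 8],
    "g2": [100, 150, 90, 120, 110, 95],
    "g3": [5, 0, 7, 3, 2, 9],
    "g4": [500, 800, 450, 600, 700, 520],
}
for name, (mle, trend_value, final) in shrink_dispersions(genes).items():
    print(name, round(mle, 4), round(trend_value, 4), round(final, 4))
```

Each final estimate lands between the gene's own MLE and the trend value, apart from grid resolution. Genes with less information are pulled closer to the trend, which is the shrinkage drawn as blue arrows in the figure.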
Figure 2. Effect of shrinkage on logarithmic fold change estimates. Plots of the (A) MLE (i.e., no shrinkage) and (B) MAP estimate (i.e., with shrinkage) for the LFCs attributable to mouse strain, over the average expression strength for a ten vs eleven sample comparison of the Bottomly et al. [16] dataset. Small triangles at the top and bottom of the plots indicate points that would fall outside of the plotting window. Two genes with similar mean count and MLE logarithmic fold change are highlighted with green and purple circles. (C) The counts (normalized by size factors s_j) for these genes reveal low dispersion for the gene in green and high dispersion for the gene in purple. (D) Density plots of the likelihoods (solid lines, scaled to integrate to 1) and the posteriors (dashed lines) for the green and purple genes and of the prior (solid black line): due to the higher dispersion of the purple gene, its likelihood is wider and less peaked (indicating less information), and the prior has more influence on its posterior than for the green gene. The stronger curvature of the green posterior at its maximum translates to a smaller reported standard error for the MAP LFC estimate (horizontal error bar). adj., adjusted; LFC, logarithmic fold change; MAP, maximum a posteriori; MLE, maximum-likelihood estimate.
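The Figure 2D reasoning can be made concrete with a Gaussian approximation: a wider likelihood (larger standard error) lets a zero-centred prior pull the estimate further toward zero. The sketch below is a swapped-in simplification, not DESeq2's procedure. It treats the likelihood of each LFC as normal with the MLE as mean and a known standard error, which gives a closed-form posterior. DESeq2 instead fits the full negative binomial GLM with a penalty. The prior width prior_sd is a hypothetical hand-set value, and shrink_lfc is a hypothetical name.

```python
import math


def shrink_lfc(mle_lfc, se, prior_sd):
    """Posterior mean and sd of a log2 fold change under a N(0, prior_sd**2) prior,
    treating the likelihood as Gaussian: N(mle_lfc, se**2)."""
    prior_var, data_var = prior_sd ** 2, se ** 2
    weight = prior_var / (prior_var + data_var)  # share of the MLE that survives shrinkage
    post_mean = weight * mle_lfc
    post_sd = math.sqrt(1.0 / (1.0 / data_var + 1.0 / prior_var))  # precisions add
    return post_mean, post_sd


# Two hypothetical genes with the same MLE LFC but different standard errors,
# loosely mimicking the green (low dispersion) and purple (high dispersion) genes.
green = shrink_lfc(2.0, se=0.2, prior_sd=1.0)
purple = shrink_lfc(2.0, se=1.0, prior_sd=1.0)
print(green, purple)
```

The purple-like gene keeps only half of its MLE and gets a larger posterior standard deviation. The green-like gene is barely moved and keeps a narrow posterior, matching the caption's point about curvature and reported standard error.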
Figure 3. Stability of logarithmic fold changes. DESeq2 is run on equally split halves of the data of Bottomly et al. [16], and the LFCs from the halves are plotted against each other. (A) MLEs, i.e., without LFC shrinkage. (B) MAP estimates, i.e., with shrinkage. Points in the top left and bottom right quadrants indicate genes with a change of sign of LFC. Red points indicate genes with adjusted P value <0.1. The legend displays the root-mean-square error of the estimates in group I compared to those in group II. LFC, logarithmic fold change; MAP, maximum a posteriori; MLE, maximum-likelihood estimate; RMSE, root-mean-square error.
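The two summaries in Figure 3 are easy to compute for paired LFC estimates from two halves of the data:

- the root-mean-square error between the halves;
- the number of genes in the top left or bottom right quadrant, i.e., with opposite signs.

The sketch below assumes that genes with an LFC of exactly zero in either half are not counted as sign changes. The function name is hypothetical.

```python
import math


def compare_halves(lfc_a, lfc_b):
    """RMSE between two equal-length vectors of LFC estimates and the number of sign changes."""
    if len(lfc_a) != len(lfc_b) or not lfc_a:
        raise ValueError("need two non-empty vectors of equal length")
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(lfc_a, lfc_b)) / len(lfc_a))
    sign_changes = sum(1 for a, b in zip(lfc_a, lfc_b) if a * b < 0)  # opposite quadrants
    return rmse, sign_changes


rmse, flips = compare_halves([1.0, -2.0, 0.5], [1.0, 2.0, 0.5])
print(rmse, flips)
```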
Figure 4. Hypothesis testing involving non-zero thresholds. Shown are plots of the estimated fold change over average expression strength (“minus over average”, or MA-plots) for a ten vs eleven comparison using the Bottomly et al. [16] dataset, with highlighted points indicating low adjusted P values. The alternative hypotheses are that logarithmic (base 2) fold changes are (A) greater than 1 in absolute value or (B) less than 1 in absolute value. adj., adjusted.
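Wald-type P values against a threshold θ on the log2 scale can be sketched as follows, using θ = 1 as in Figure 4. The labels "greaterAbs" and "lessAbs" mirror the altHypothesis options of DESeq2's results() function. The formulas are my reading of those tests, not the package's code.

- For |β| > θ (panel A), the P value is twice the upper-tail probability of (|LFC| − θ)/SE, capped at 1.
- For |β| < θ (panel B), two one-sided tests are run, one against +θ and one against −θ, and the larger P value is reported.

The function names are hypothetical.

```python
import math


def upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def threshold_pvalue(lfc, se, theta, alternative):
    """Wald-type P value for an estimated log2 fold change against threshold theta."""
    if alternative == "greaterAbs":  # H1: |beta| > theta
        return min(1.0, 2.0 * upper_tail((abs(lfc) - theta) / se))
    if alternative == "lessAbs":  # H1: |beta| < theta
        p_below_upper = upper_tail((theta - lfc) / se)  # one-sided test against beta >= theta
        p_above_lower = upper_tail((lfc + theta) / se)  # one-sided test against beta <= -theta
        return max(p_below_upper, p_above_lower)
    raise ValueError("alternative must be 'greaterAbs' or 'lessAbs'")


# A large, precise change and a precise near-zero change, tested with theta = 1.
print(threshold_pvalue(3.0, 0.5, 1.0, "greaterAbs"), threshold_pvalue(3.0, 0.5, 1.0, "lessAbs"))
print(threshold_pvalue(0.0, 0.2, 1.0, "greaterAbs"), threshold_pvalue(0.0, 0.2, 1.0, "lessAbs"))
```

Taking the maximum of the two one-sided P values means that a gene is only called "smaller than the threshold" when its estimate is confidently inside both bounds.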
Figure 5. Variance stabilization and clustering after rlog transformation. Two transformations were applied to the counts of the Hammer et al. [26] dataset: the logarithm of normalized counts plus a pseudocount, i.e., f(K_ij) = log2(K_ij/s_j + 1), and the rlog. Using the logarithm, the gene-wise standard deviation of transformed values varies across the range of the mean of counts (A), whereas using the rlog it is relatively stable (B). Hierarchical clustering on Euclidean distances with complete linkage groups the samples by treatment and time when applied to the rlog-transformed data (D). Clustering the logarithm-transformed counts (C) produces a more ambiguous result. sd, standard deviation.
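The baseline transform in Figure 5, f(K_ij) = log2(K_ij/s_j + 1), needs size factors s_j, which the caption does not define. The sketch below assumes the median-of-ratios estimator that DESeq2 uses by default. In that estimator, each sample's factor is the median over genes of the count divided by the gene's geometric mean across samples. Genes with a zero in any sample are skipped. The rlog itself requires fitting a shrunken GLM and is not sketched here. The function names are hypothetical.

```python
import math
import statistics


def size_factors(counts):
    """Median-of-ratios size factors; counts[i][j] is the count of gene i in sample j."""
    n_samples = len(counts[0])
    usable = [row for row in counts if all(k > 0 for k in row)]  # geometric mean must be > 0
    log_geo = [sum(math.log(k) for k in row) / n_samples for row in usable]
    return [
        math.exp(statistics.median([math.log(row[j]) - g for row, g in zip(usable, log_geo)]))
        for j in range(n_samples)
    ]


def log_normalized(counts, sf):
    """The pseudocount transform from the caption: log2(K_ij / s_j + 1)."""
    return [[math.log2(k / s + 1) for k, s in zip(row, sf)] for row in counts]


# Hypothetical counts: sample 2 was sequenced twice as deeply as sample 1.
counts = [[10, 20], [50, 100], [3, 6], [0, 5]]
sf = size_factors(counts)
print(sf)
print(log_normalized(counts, sf))
```

After dividing by the size factors, the two samples give identical normalized values for every gene that had no zero count. The gene with a zero is excluded from the size-factor estimate but is still transformed.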
Figure 6. Sensitivity and precision of algorithms across combinations of sample size and effect size. DESeq2 and edgeR often had the highest sensitivity among the algorithms that controlled the FDR, i.e., those that fall on or to the left of the vertical black line. For a plot of sensitivity against false positive rate, rather than FDR, see Additional file 1: Figure S8, and for the dependence of sensitivity on the mean of counts, see Additional file 1: Figure S9. Note that EBSeq filters low-count genes (see main text for details). FDR, false discovery rate.
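In a simulation where the truly changed genes are known, sensitivity and precision follow directly from the calls made at an adjusted P value cutoff.

- Sensitivity is the share of true changes that are called.
- Precision is the share of calls that are true, so 1 − precision is the realized FDR.

The sketch below applies the Benjamini-Hochberg adjustment, which DESeq2's results() uses by default. The 0.1 cutoff is an assumption borrowed from Figure 3, not a value stated in this caption. The function names are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down, enforcing monotonicity
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def sensitivity_precision(padj, is_de, alpha=0.1):
    """Sensitivity and precision of calls padj < alpha against known truth labels."""
    called = [p < alpha for p in padj]
    tp = sum(c and t for c, t in zip(called, is_de))
    fp = sum(c and not t for c, t in zip(called, is_de))
    fn = sum(t and not c for c, t in zip(called, is_de))
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    precision = tp / (tp + fp) if tp + fp else float("nan")
    return sensitivity, precision


adj = bh_adjust([0.001, 0.01, 0.02, 0.04, 0.5])
print(adj)
print(sensitivity_precision(adj, [True, True, False, True, True]))
```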
Figure 7. Benchmark of false positive calling. Shown are estimates of P(P value<0.01) under the null hypothesis. The FPR is the number of P values less than 0.01 divided by the total number of tests, from randomly selected comparisons of five vs five samples from the Pickrell et al. [17] dataset, with no known condition dividing the samples. Type-I error control requires that the tool does not substantially exceed the nominal value of 0.01 (black line). EBSeq results were not included in this plot as it returns posterior probabilities, which unlike P values are not expected to be uniformly distributed under the null hypothesis. FPR, false positive rate.
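The FPR defined in the Figure 7 caption is a simple ratio: the number of P values below 0.01 divided by the number of tests. The sketch below assumes that missing P values (represented as None) are dropped before counting. The caption does not say how they were handled. Evenly spaced P values stand in for a perfectly calibrated test under the null. The function name is hypothetical.

```python
def false_positive_rate(pvalues, cutoff=0.01):
    """Share of tests with P < cutoff; a calibrated test under the null gives about `cutoff`.
    Missing P values (None) are dropped first, which is an assumption."""
    tested = [p for p in pvalues if p is not None]
    if not tested:
        raise ValueError("no P values to evaluate")
    return sum(p < cutoff for p in tested) / len(tested)


null_like = [(i + 0.5) / 1000 for i in range(1000)]  # uniform-looking P values
print(false_positive_rate(null_like))  # prints 0.01
```

A tool whose FPR on such null comparisons sits well above 0.01 would fail the type-I error criterion shown by the black line.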
Figure 8. Sensitivity estimated from experimental reproducibility. Each algorithm’s sensitivity in the evaluation set (box plots) is evaluated using the calls of each other algorithm in the verification set (panels with grey label).
Figure 9. Precision estimated from experimental reproducibility. Each algorithm’s precision in the evaluation set (box plots) is evaluated using the calls of each other algorithm in the verification set (panels with grey label).
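Figures 8 and 9 compare one algorithm's calls in an evaluation set with another algorithm's calls in a verification set. This sketch assumes the following definitions:

- Sensitivity is the fraction of verification-set calls that are also called in the evaluation set.
- Precision is the fraction of evaluation-set calls that are confirmed by the verification set.

The captions do not spell out these definitions, and the gene identifiers and function name are hypothetical.

```python
def reproducibility_metrics(eval_calls, verif_calls):
    """Sensitivity and precision of evaluation-set calls, scored against
    verification-set calls; both arguments are collections of gene IDs."""
    eval_calls, verif_calls = set(eval_calls), set(verif_calls)
    overlap = len(eval_calls & verif_calls)
    sensitivity = overlap / len(verif_calls) if verif_calls else float("nan")
    precision = overlap / len(eval_calls) if eval_calls else float("nan")
    return sensitivity, precision


print(reproducibility_metrics({"a", "b", "c"}, {"b", "c", "d", "e"}))
```

Repeating this for every pair of evaluation and verification algorithms, and over many random splits, produces the distributions summarized by the box plots.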
Similar articles
Varet H, Brillet-Guéguen L, Coppée JY, Dillies MA. PLoS One. 2016 Jun 9;11(6):e0157022. doi: 10.1371/journal.pone.0157022. eCollection 2016. PMID: 27280887. Free PMC article.
Delhomme N, Padioleau I, Furlong EE, Steinmetz LM. Bioinformatics. 2012 Oct 1;28(19):2532-3. doi: 10.1093/bioinformatics/bts477. Epub 2012 Jul 30. PMID: 22847932. Free PMC article.
Mayne BT, Leemaqz SY, Buckberry S, Rodriguez Lopez CM, Roberts CT, Bianco-Miotto T, Breen J. Sci Rep. 2018 Feb 1;8(1):2190. doi: 10.1038/s41598-018-19655-w. PMID: 29391490. Free PMC article.
Gaidatzis D, Lerch A, Hahne F, Stadler MB. Bioinformatics. 2015 Apr 1;31(7):1130-2. doi: 10.1093/bioinformatics/btu781. Epub 2014 Nov 21. PMID: 25417205. Free PMC article.
McDermaid A, Monier B, Zhao J, Liu B, Ma Q. Brief Bioinform. 2019 Nov 27;20(6):2044-2054. doi: 10.1093/bib/bby067. PMID: 30099484. Free PMC article. Review.
Jia S, Wen X, Zhu M, Fu X. Cell Mol Life Sci. 2024 Oct 26;81(1):440. doi: 10.1007/s00018-024-05465-z. PMID: 39460804. Free PMC article.
Millapán T, Gutiérrez Á, Rosas K, Buchegger K, Ili CG, Brebi P. Int J Mol Sci. 2024 Oct 16;25(20):11113. doi: 10.3390/ijms252011113. PMID: 39456895. Free PMC article.
Morikka J, Federico A, Möbus L, Inkala S, Pavel A, Sani S, Vaani M, Peltola S, Serra A, Greco D. Comput Struct Biotechnol J. 2024 Oct 8;25:194-204. doi: 10.1016/j.csbj.2024.10.010. eCollection 2024 Dec. PMID: 39430886. Free PMC article.
Noun T, Kurdi A, Maatouk N, Talhouk R, Dohna HZ. Sci Rep. 2024 Oct 21;14(1):24711. doi: 10.1038/s41598-024-73632-0. PMID: 39433788. Free PMC article.
Wang X, Yang C, Zhu W, Weng Z, Li F, Teng Y, Zhou K, Qian M, Deng Q. Plants (Basel). 2024 Oct 17;13(20):2903. doi: 10.3390/plants13202903. PMID: 39458853. Free PMC article.