A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from https://grunwaldlab.github.io/demulticoder/articles/Getting_started.html below:

Getting Started • demulticoder

Before You Start Components of the test dataset
  1. Paired-end short read amplicon data
  1. metadata.csv
  1. primerinfo_params.csv
  1. Taxonomy databases
Format of the paired-end read files Format of metadata file

Additional columns

Notes

Here is what the metadata.csv looks like for this test dataset:

sample_name primer_name organism S1 rps10 Cry S2 rps10 Cin S1 its Cry S2 its Cin Format of primer and parameters file

Primer sequence information and user-defined parameters are placed in primerinfo_params.csv

Example template for

primerinfo_params.csv
primer_name forward reverse already_trimmed minCutadaptlength multithread verbose maxN maxEE_forward maxEE_reverse truncLen_forward truncLen_reverse truncQ minLen maxLen minQ trimLeft trimRight rm.lowcomplex minOverlap maxMismatch min_asv_length rps10 GTTGGTTAGAGYARAAGACT ATRYYTAGAAAGAYTYGAACT FALSE 100 TRUE FALSE 1.00E+05 5 5 0 0 5 150 Inf 0 0 0 0 15 0 50 its CTTGGTCATTTAGAGGAAGTAA GCTGCGTTCTTCATCGATGC FALSE 50 TRUE FALSE 1.00E+05 5 5 0 0 5 50 Inf 0 0 0 0 15 0 50

For more info on parameter specifics, see Documentation

Reference Database Format

Databases are downloaded from these sources and then placed in the from the user-specified data folder where raw data files and csv files are located.

Additional Notes Installation

Dependencies:
First install Cutadapt program following the instructions here: https://cutadapt.readthedocs.io/en/stable/installation.html

Let’s locate where the Cutadapt executable is.

You must do this from a Terminal window:

#If you installed with pip or pipx, or homebrew, run this command from a Terminal window
which cutadapt
cutadapt --version

If you followed the cutadapt installation instructions to create a conda environment called Cutadapt (change to whatever you named your environment), to install it in.

Open up a Terminal window and type these commands:

#Run commands from a Terminal window
conda activate cutadapt
which cutadapt
cutadapt --version

Second, make sure the following R packages are installed:

Reorganize data tables and set-up data directory structure

The sample names, primer sequences, and other metadata are reorganized in preparation for running Cutadapt to remove primers.

output<-prepare_reads(
  data_directory = system.file("extdata", package = "demulticoder"),
  output_directory = tempdir(), # Change to your desired output directory path or leave as is", 
  overwrite_existing = TRUE)
#> Existing files found in the output directory. Overwriting existing files.
#> Rows: 2 Columns: 25
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr  (3): primer_name, forward, reverse
#> dbl (18): minCutadaptlength, maxN, maxEE_forward, maxEE_reverse, truncLen_fo...
#> lgl  (4): already_trimmed, count_all_samples, multithread, verbose
#> 
#>  Use `spec()` to retrieve the full column specification for this data.
#>  Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 2 Columns: 25
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr  (3): primer_name, forward, reverse
#> dbl (18): minCutadaptlength, maxN, maxEE_forward, maxEE_reverse, truncLen_fo...
#> lgl  (4): already_trimmed, count_all_samples, multithread, verbose
#> 
#>  Use `spec()` to retrieve the full column specification for this data.
#>  Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Rows: 4 Columns: 3
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (3): sample_name, primer_name, organism
#> 
#>  Use `spec()` to retrieve the full column specification for this data.
#>  Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> Creating output directory: /tmp/RtmpyNk8MG/demulticoder_run/prefiltered_sequences

Remove primers with Cutadapt

Before running Cutadapt, please ensure that you have installed it

cut_trim(
  output,
  cutadapt_path = "/usr/bin/cutadapt", #CHANGE LOCATION TO YOUR LOCAL INSTALLATION
  overwrite_existing = TRUE)
#> Running cutadapt 3.5 for its sequence data

#> Running cutadapt 3.5 for rps10 sequence data

ASV inference step

Raw reads will be merged and ASVs will be inferred

make_asv_abund_matrix(
  output,
  overwrite_existing = TRUE)
#> 80608 total bases in 307 reads from 2 samples will be used for learning the error rates.
#> Sample 1 - 163 reads in 84 unique sequences.
#> Sample 2 - 144 reads in 96 unique sequences.
#> 82114 total bases in 307 reads from 2 samples will be used for learning the error rates.
#> Sample 1 - 163 reads in 128 unique sequences.
#> Sample 2 - 144 reads in 119 unique sequences.

#> 91897 total bases in 327 reads from 2 samples will be used for learning the error rates.
#> Sample 1 - 145 reads in 107 unique sequences.
#> Sample 2 - 182 reads in 133 unique sequences.
#> 91567 total bases in 327 reads from 2 samples will be used for learning the error rates.
#> Sample 1 - 145 reads in 114 unique sequences.
#> Sample 2 - 182 reads in 170 unique sequences.

#> $its
#> [1] "/tmp/RtmpyNk8MG/demulticoder_run/asvabund_matrixDADA2_its.RData"
#> 
#> $rps10
#> [1] "/tmp/RtmpyNk8MG/demulticoder_run/asvabund_matrixDADA2_rps10.RData"
Taxonomic assignment step

Using the core assignTaxonomy function from DADA2, taxonomic assignments will be given to ASVs.

assign_tax(
  output,
  asv_abund_matrix,
  retrieve_files=TRUE,
  overwrite_existing = TRUE)
#>   samplename_barcode input filtered denoisedF denoisedR merged nonchim
#> 1             S1_its   299      163       146       141    132     132
#> 2             S2_its   235      144       113        99     99      99
#>   samplename_barcode input filtered denoisedF denoisedR merged nonchim
#> 1           S1_rps10   196      145       145       145    145     145
#> 2           S2_rps10   253      182       181       181    181     181
Format of output matrices

We can also load the matrices into our R environment and view then from R Studio.

We do that here with the rps10 matrix

rps10_matrix_file <- file.path(output$directory_paths$output_directory, "final_asv_abundance_matrix_rps10.csv")
rps10_matrix<-read.csv(rps10_matrix_file, header=TRUE)
head(rps10_matrix)
#>   asv_id
#> 1  asv_1
#> 2  asv_2
#> 3  asv_3
#> 4  asv_4
#> 5  asv_5
#>                                                                                                                                                                                                                                                                                                                                                                                                                                                    sequence
#> 1 GAAAATCTTTGTGTCGGTGGTTCAAATCCACCTCCAGACAAATTAATTTTTTAAAACTTATGTTTATATTAAGAATTACATTTAAATCTATTAAAAAAATAAATAACATTAAACCTATTTTATTAAATCAAAAAAATTTAAATAAATTTAATAATATTAAAATAAATGGAATATTTAAAACAAAAAATAAAAATAAAATTTTTACAGTATTAAAATCACCACATGTTAATAAAAAAGCACGTGAACATTTTATTTATAAAAATTTTACACAAAAAGTTGAAATTAAATGTTTAAATATTTTTCAATTATTAAATTTTTTAATTTTGATTAAAAAAATCTTAACAGAAAATTTTATTATTACTACAAAAATAATTAAACAATAATTAATATATGCTTATAGCTTAATGGATAAAGCATTAGATTGCGGATCTACAAAATGAA
#> 2      GAAAATCTTTGTGTCGGTGGTTCAAGTCCACCTCCAGACAAAAATAATAACTATGTATATTTTAAGAATAACTTTTAAATCAATACAAAAAATAAATAATATTAAAAAAAATTTATTAAAATTAAAAAAAATAAATAAATTTAAAAATATTCAAATAAAAGGAATATTTAAAATAAAAGATAAAAATAAAATTTTTACTTTATTAAAATCACCTCACGTAAATAAAAAATCTCGTGAACATTTTATTTATAAAAATTATACTCAAAAAATAGATGTAAAATTTTCAAATATTATTGAATTATTTAATTTTATAATACTTATTAAAAAAGTTTTAACTAAAAATTTTATAATAAATTTTAAAATTATAAAATATAATAAAAAAAAAATGCTTATAGCTTAATGGATAAAGCGTTAGATTGCGGATCTATAAAATGAA
#> 3  GAAAATCTTTGTGTCGGTGGTTCAAATCCGCCTCCAAACAAAAATATAAATTATTTTATATGTATATTTTAAGAATAAGTTTTAAATCTGTATCAAAAATAGATAAAATTAAAAAAGAATTATCAAGATTAAAAAAAATATATAATTTTAAAAATATAACAATTAATGGTATTTTTAAAACAAAAAATAAAGATAAAATTTTTACATTATTAAAATCACCTCATGTTAATAAAAAATCTCGCGAACATTTTATTTATAAAAATTATGTACAAAAAATAGATTTAAATTTTATTAATATTTTTCAATTAATAAATTTTTTAATAATTTTAAAAAAAAAATTATCAAAAAATGTATTAATAAATGTAAAAATAATTAAAAAATAAAAAAATATGCTTATAGCTTAATGGATAAAGCGTTAGATTGCGGATCTATAAAATGAA
#> 4    GAAAATCTTTGTGTCGGTGGTTCAAGTCCGCCTCCAAACATCTTATAAAAATGTATGTATATTTTACGAATTAGTTTTAAATCAGTAGAAAAAATAAATGAATTAAAAAAAAATATCTCAAAAATAAAAAAAATATATAATTTTAAAAATTTAAAAGTAAATGGTATTTTTAGAACAAAAAATAAAAATAAAATTTTTACATTATTAAAATCACCCCATGTTAATAAAAAATCACGTGAACATTTTATATATAAAAATTATATTCAAAAAATTGATTTAAATTTTTCAAATTTTTTTCAATTATTAAATTTTTTAGTAATTTTAAAAAAAATATTATCTAAAGATTTTTTAATCAATATCAAAATTATTAAAAAAAATAAAAAAAATATGCTTATAGCTTAATGGATAAAGCGTTAGATTGCGGATCTATAAAATGAA
#> 5     GAAAATCTTTGTGTCGGTGGTTCAAGTCCGCCTCCAAACATCTTATAAAAATGTATGTATATTTTACGAATTAGTTTTAAATCAGTAGAAAAAATAAATGAATTAAAAAAAAATATCTCAAAAATAAAAAAAATATATAATTTTAAAAATTTAAAAGTAAATGGTATTTTTAGAACAAAAAATAAAAATAAAATTTTTACATTATTAAAATCACCCCATGTTAATAAAAAATCACGTGAACATTTTATATATAAAAATTATATTCAAAAAATTGATTTAAATTTTTCAAATTTTTTTCAATTATTAAATTTTTTAGTAATTTTAAAAAAAACATTATCTAAAGATTTTTTAATCAATATTAAAATTATTAAAAAAAATAAAAAAATATGCTTATAGCTTAATGGATAAAGCGTTAGATTGCGGATCTATAAAATGAA
#>                                                                                                                                                                                                                   dada2_tax
#> 1 Stramenopila--100--Kingdom;Oomycota--100--Phylum;Oomycetes--100--Class;Peronosporales--26--Order;Peronosporaceae--26--Family;Phytophthora--23--Genus;Phytophthora_sp.kununurra--13--Species;oomycetedb_650--13--Reference
#> 2 Stramenopila--100--Kingdom;Oomycota--100--Phylum;Oomycetes--100--Class;Pythiales--100--Order;Pythiaceae--100--Family;Globisporangium--100--Genus;Globisporangium_cylindrosporum--28--Species;oomycetedb_41--28--Reference
#> 3                   Stramenopila--100--Kingdom;Oomycota--100--Phylum;Oomycetes--100--Class;Pythiales--93--Order;Pythiaceae--93--Family;Pythium--86--Genus;Pythium_aphanidermatum--22--Species;oomycetedb_727--22--Reference
#> 4                    Stramenopila--100--Kingdom;Oomycota--100--Phylum;Oomycetes--100--Class;Pythiales--100--Order;Pythiaceae--100--Family;Pythium--100--Genus;Pythium_dissotocum--58--Species;oomycetedb_750--28--Reference
#> 5                    Stramenopila--100--Kingdom;Oomycota--100--Phylum;Oomycetes--100--Class;Pythiales--100--Order;Pythiaceae--100--Family;Pythium--100--Genus;Pythium_oopapillum--64--Species;oomycetedb_771--64--Reference
#>   S1_rps10 S2_rps10
#> 1       87        0
#> 2        0       71
#> 3        0       63
#> 4       58        0
#> 5        0       47

We can also inspect the first rows of the ITS1 matrix

its_matrix_file <- file.path(output$directory_paths$output_directory, "final_asv_abundance_matrix_its.csv")
its_matrix<-read.csv(its_matrix_file, header=TRUE)
head(its_matrix)
#>   asv_id
#> 1  asv_1
#> 2  asv_2
#> 3  asv_3
#> 4  asv_4
#> 5  asv_5
#> 6  asv_6
#>                                                                                                                                                                                                                                                                                                                                 sequence
#> 1                                              AAAAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTAGTGAATCTTCAAAGTCGGCTCGTCGGATTGTGCTGGTGGAGACACATGTGCACGTCTACGAGTCGCAAACCCACACACCTGTGCATCTATGACTCTGAGTGCCGCTTTGCATGGCCCCTTGATTTGGGCCTGGCGCTCGAGTACTTTCACACACTCTCGAATGTAATGGAATGTCTTGTTGTGCATAACGTACAAACAGAAACAACTTTCAACAACGGATCTCTTGGCTCTC
#> 2                                              AAAAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTAGTGAATCTTCAAAGTCGGCTCGTCAGATTGTGCTGGTGGAGACACATGTGCACGTCTACGAGTCGCAAACCCACACACCTGTGCATCTATGACTCTGAGTGCCGCTTTGCATGGCCCCTTGATTTGGGCCTGGCGCTCGAGTACTTTCACACACTCTCGAATGTAATGGAATGTCTTCTTGTGCATAACGTACAAACAGAAACAACTTTCAACAACGGATCTCTTGGCTCTC
#> 3                                              AAAAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTAGTGAATCTTCAAAGTCGGCTCGTCGGATTGTGCTGGTGGAGACACATGTGCACGTCTACGAGTCGCAAACCCACACACCTGTGCATCTATGACTCTGAGTGCCGCTTTGCATGGCCCCTTGATTTGGGCCTGGCGCTCGAGTACTTTCACACACTCTCGAATGTAATGGAATGACTTGTTGTGCATAACGTACAAACAGAAACAACTTTCAACAACGGATCTCTTGGCTCTC
#> 4 AAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTAACGAGTTCAACTGGGGTCTGTGGGGATTCGATGCTGGCCTCCGGGCATGTGCTCGTCCCCCGGACCTCACATCCACTCACAACCCCTGTGCATCATGAGAGGTGTGGTCCTCCGTCATGGGAGCAGCTCCCCTCCGGGCGTGTTTGTACACCCCGGGCGGTCGAGCCGGGACTGCACTTCGACGCCTTTACACAAACCTTTGAATCAGTGATGTAGAATGTCATGGCCAGCGATGGTCGAACTTTAAATACAACTTTCAACAACGGATCTCTTGGCTCTC
#> 5                                              AAAAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTAGTGAATCTTCAAAGTCGGCTCGTCGGATTGTGCTGGTGGAGACACATGTGCACATCTACGAGTCGCAAACCCACACACCTGTGCATCTATGACTCTGAGTGCCGCTTTGCATGGCCCCTTGATTTGGGCCTGGCGCTCGAGTACTTTCACACACTCTCGAATGTAATGGAATGTCTTGTTGTGCATAACGTACAAACAGAAACAACTTTCAACAACGGATCTCTTGGCTCTC
#> 6                                                                                                    AAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTAAAAATGAAGCCGGGAAACCGGTCCTTCTAACCCTTGATTATCTTAATTTGTTGCTTTGGTGGGCCGCGCAAGCACCGGCTTCGGCTGGATCGTGCCCGCCAGAGGACCACAACTCTGTATCAAATGTCGTCTGAGTACTATATAATAGTTAAAACTTTCAACAACGGATCTCTTGGTTCTG
#>                                                                                                                                                                                    dada2_tax
#> 1             Fungi--100--Kingdom;Basidiomycota--100--Phylum;Agaricomycetes--100--Class;Sebacinales--100--Order;unidentified--92--Family;unidentified--92--Genus;Sebacinales_sp--92--Species
#> 2             Fungi--100--Kingdom;Basidiomycota--100--Phylum;Agaricomycetes--100--Class;Sebacinales--100--Order;unidentified--95--Family;unidentified--95--Genus;Sebacinales_sp--95--Species
#> 3             Fungi--100--Kingdom;Basidiomycota--100--Phylum;Agaricomycetes--100--Class;Sebacinales--100--Order;unidentified--96--Family;unidentified--96--Genus;Sebacinales_sp--96--Species
#> 4          Fungi--100--Kingdom;Basidiomycota--69--Phylum;Agaricomycetes--66--Class;Hymenochaetales--26--Order;Repetobasidiaceae--11--Family;Rickenella--11--Genus;Rickenella_sp--11--Species
#> 5             Fungi--100--Kingdom;Basidiomycota--100--Phylum;Agaricomycetes--100--Class;Sebacinales--100--Order;unidentified--92--Family;unidentified--92--Genus;Sebacinales_sp--92--Species
#> 6 Fungi--100--Kingdom;Ascomycota--96--Phylum;Leotiomycetes--72--Class;Helotiales--72--Order;Helotiales_fam_Incertae_sedis--44--Family;Chlorencoelia--41--Genus;Chlorencoelia_sp--41--Species
#>   S1_its S2_its
#> 1     60      0
#> 2     46      0
#> 3      0     35
#> 4     21      0
#> 5      0     21
#> 6      0     13
Reformat ASV matrix as taxmap and phyloseq objects after optional filtering of low abundance ASVs Objects can now be used for downstream data analysis

Here we make heattrees using taxmap object.

First we make a heat tree to examine fungal community composition using our ITS1 data.

objs$taxmap_its %>%
  filter_taxa(! grepl(x = taxon_names, "NA", ignore.case = TRUE)) %>%
  metacoder::heat_tree(node_label = taxon_names,
                       node_size = n_obs,
                       node_color = n_obs,
                       node_color_axis_label = "ASV count",
                       node_size_axis_label = "Total Abundance of Taxa",
                       layout = "da", initial_layout = "re")

Now we make a heat tree to showcase oomycete community composition using our rps10 data

objs$taxmap_rps10 %>%
  filter_taxa(! grepl(x = taxon_names, "NA", ignore.case = TRUE)) %>%
  metacoder::heat_tree(node_label = taxon_names,
                       node_size = n_obs,
                       node_color = n_obs,
                       node_color_axis_label = "ASV count",
                       node_size_axis_label = "Total Abundance of Taxa",
                       layout = "da", initial_layout = "re")

We can also do a variety of analyses, if we convert to phyloseq object

Here we demonstrate how to make a stacked bar plot of the relative abundance of taxa by sample for the ITS1-barcoded samples

data <- objs$phyloseq_its %>%
  phyloseq::transform_sample_counts(function(x) {x/sum(x)} ) %>% 
  phyloseq::psmelt() %>%                                        
  dplyr::filter(Abundance > 0.02) %>%                      
  dplyr::arrange(Genus)                                      

abund_plot <- ggplot2::ggplot(data, ggplot2::aes(x = Sample, y = Abundance, fill = Genus)) + 
  ggplot2::geom_bar(stat = "identity", position = "stack", color = "black", size = 0.2) +
  ggplot2::scale_fill_viridis_d() +
  ggplot2::theme_minimal() +
  ggplot2::labs(
    y = "Relative Abundance",
    title = "Relative abundance of taxa by sample",
    fill = "Genus"
  ) +
  ggplot2::theme(
    axis.text.x = ggplot2::element_text(angle = 90, hjust = 1, vjust = 0.5, size = 14),
    panel.grid.major = ggplot2::element_blank(),
    panel.grid.minor = ggplot2::element_blank(),
    legend.position = "top",
    legend.text = ggplot2::element_text(size = 14),
    legend.title = ggplot2::element_text(size = 14),  
    strip.text = ggplot2::element_text(size = 14),
    strip.background = ggplot2::element_blank()
  ) +
  ggplot2::guides(
    fill = ggplot2::guide_legend(
      reverse = TRUE,
      keywidth = 1,
      keyheight = 1,
      title.position = "top",
      title.hjust = 0.5,  # Center the legend title
      label.theme = ggplot2::element_text(size = 10)  
    )
  )

print(abund_plot)

Finally, we demonstrate how to make a stacked bar plot of the relative abundance of taxa by sample for the rps10-barcoded samples

data <- objs$phyloseq_rps10 %>%
  phyloseq::transform_sample_counts(function(x) {x/sum(x)} ) %>% 
  phyloseq::psmelt() %>%                                        
  dplyr::filter(Abundance > 0.02) %>%                      
  dplyr::arrange(Genus)                                      

abund_plot <- ggplot2::ggplot(data, ggplot2::aes(x = Sample, y = Abundance, fill = Genus)) + 
  ggplot2::geom_bar(stat = "identity", position = "stack", color = "black", size = 0.2) +
  ggplot2::scale_fill_viridis_d() +
  ggplot2::theme_minimal() +
  ggplot2::labs(
    y = "Relative Abundance",
    title = "Relative abundance of taxa by sample",
    fill = "Genus"
  ) +
  ggplot2::theme(
    axis.text.x = ggplot2::element_text(angle = 90, hjust = 1, vjust = 0.5, size = 14),
    panel.grid.major = ggplot2::element_blank(),
    panel.grid.minor = ggplot2::element_blank(),
    legend.position = "top",
    legend.text = ggplot2::element_text(size = 14),
    legend.title = ggplot2::element_text(size = 14),  # Adjust legend title size
    strip.text = ggplot2::element_text(size = 14),
    strip.background = ggplot2::element_blank()
  ) +
  ggplot2::guides(
    fill = ggplot2::guide_legend(
      reverse = TRUE,
      keywidth = 1,
      keyheight = 1,
      title.position = "top",
      title.hjust = 0.5,  # Center the legend title
      label.theme = ggplot2::element_text(size = 10)  # Adjust the size of the legend labels
    )
  )

print(abund_plot)


RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4