Place all your input files into a single directory. The directory should contain the following files:
Read Name FormatTo avoid errors, the only characters that are acceptable in sample names are letters and numbers. Characters can be separated by underscores, but no other symbols. The files must end with the suffix R1.fastq.gz
or R2.fastq.gz
Examples of permissible sample names are as follows:
Other permissible names are:
What is not permissible is:
The metadata.csv file contains information about the samples and primers (and associated metabarcodes) used in the experiment. It has the following two required columns:
rps10
, its
, r16S
, other1
, other2
)Please add your associated metadata to the file after these two required columns. This can then be used for your downstream exploratory or diversity analyses, as the sample data will be incorporated into the final phyloseq
and taxmap
objects.
Example file (with optional third column):
sample_name,primer_name,organism
S1,rps10,Cry
S2,rps10,Cin
S1,its,Cry
S2,its,Cin
Primer and parameter file components
The primerinfo_params.csv file contains information about the primer sequences used in the experiment, along with optional additional parameters that are part of the DADA2
pipeline. If anything is not specified, the default values will be used.
Required columns:
primer_name
: Name of the primer/barcode (e.g., its
, rps10
)forward
: Forward primer sequencereverse
: Reverse primer sequenceBelow are the parameters that can be input into the primerinfo_params.csv file along with the defaults. Refer to the DADA2
documentation and manual for additional information.
DADA2
filterAndTrim
function parameters:
already_trimmed
: Boolean indicating if primers are already trimmed (TRUE/FALSE
) (default: FALSE
)minCutadaptlength
: Cutadapt
parameter-Filter out processed reads that are shorter than specified length (default: 0
)multithread
: Boolean for multithreading (TRUE/FALSE
) (default: FALSE
)verbose
: Boolean for verbose output (TRUE/FALSE
) (default: FALSE
)maxN
: Maximum number of N bases allowed (default: 0
)maxEE_forward
: Maximum expected errors for forward reads (default: Inf
)maxEE_reverse
: Maximum expected errors for reverse reads (default: Inf
)truncLen_forward
: Truncation length for forward reads (default: 0
)truncLen_reverse
: Truncation length for reverse reads (default: 0
)truncQ
: Truncation quality threshold (default: 2
)minLen
: Minimum length of reads after processing (default: 20
)maxLen
: Maximum length of reads after processing (default: Inf
)minQ
: Minimum quality score (default: 0
)trimLeft
: Number of bases to trim from the start of reads (default: 0
)trimRight
: Number of bases to trim from the end of reads (default: 0
)rm.lowcomplex
: Boolean for removing low complexity sequences (default: TRUE
)DADA2
learnErrors
function parameters:
nbases
: Number of bases to use for error rate learning (default: 1e+08
)randomize
: Randomize reads for error rate learning (default: FALSE
)MAX_CONSIST
: Maximum number of self-consistency iterations (default: 10
)OMEGA_C
: Convergence threshold for the error rates (default: 0
)qualityType
: Quality score type ("Auto"
, "FastqQuality",
or "ShortRead"
) (default: "Auto"
)DADA2
plotErrors
parameters:
err_out
: Return the error rates used for inference (default: TRUE
)err_in
: Use input error rates instead of learning them (default: FALSE
)nominalQ
: Use nominal Q-scores (default: FALSE
)obs
: Return the observed error rates (default: TRUE
)DADA2
dada
function parameters:
OMP
: Use OpenMP multi-threading if available (default: TRUE
)n
: Number of reads to use for error rate estimation (default: 1e+05
)id.sep
: Character separating sample ID from sequence name (default: "\\s"
)orient.fwd
: NULL or TRUE/FALSE to orient sequences (default: NULL
)pool
: Pool samples for error rate estimation (default: FALSE
)selfConsist
: Perform self-consistency iterations (default: FALSE
)DADA2
mergePairs
function parameters:
minOverlap
: Minimum overlap for merging paired-end reads (default: 12
)maxMismatch
: Maximum mismatches allowed in the overlap region (default: 0
)DADA2
removeBimeraDenovo
function parameters:
method
: Method for sample inference ("consensus"
or "pooled"
) (default: "consensus"
)DADA2
assignTaxonomy
function parameters:
minBoot
: The minimum bootstrap confidence for assigning a taxonomic level (default: 0
)tryRC
: If TRUE, the reverse-complement of each sequences will be used for classification if it is a better match to the reference sequences than the forward sequence (default: FALSE
)Other parameters to include in CSV input file:
min_asv_length
: Minimum length of Amplicon Sequence Variants (ASVs) after core dada ASV inference steps (default = 0
)seed
: For greater reproducibility, user can specify an integer to set as a seed to use when the following DADA2
functions are run: plotQualityProfile
, learnErrors
, dada
, makeSequenceTable
, and assignTaxonomy
(default: NULL
)Example file (with select optional columns after forward and reverse primer sequence columns):
primer_name,forward,reverse,already_trimmed,minCutadaptlength,multithread,verbose,maxN,maxEE_forward,maxEE_reverse,truncLen_forward,truncLen_reverse,truncQ,minLen,maxLen,minQ,trimLeft,trimRight,rm.lowcomplex,minOverlap,maxMismatch,min_asv_length
rps10,GTTGGTTAGAGYARAAGACT,ATRYYTAGAAAGAYTYGAACT,FALSE,100,TRUE,FALSE,1.00E+05,5,5,0,0,5,150,Inf,0,0,0,0,15,0,50
its,CTTGGTCATTTAGAGGAAGTAA,GCTGCGTTCTTCATCGATGC,FALSE,50,TRUE,FALSE,1.00E+05,5,5,0,0,5,50,Inf,0,0,0,0,15,0,50
Reference Databases
Databases will be copied into the user-specified data folder where raw data files and csv files are located. The names will be parameters in the assignTax function.
For now, the package is compatible with the following databases:
oomyceteDB
from: https://grunwaldlab.github.io/OomyceteDB/
SILVA 16S database
with species assignments: https://www.arb-silva.de/
UNITE database
from https://unite.ut.ee/repository.php
Up to two other reference databases. The user will need to reformat headers exactly as outlined here. The user can then specify the path to the database in the input file. The database should be in FASTA format.
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4