A Snakemake workflow for benchmarking callsets of small genomic variants, using popular benchmark datasets like Genome in a Bottle or CHM-eval. A detailed description of the workflow, including the insights and design decisions behind it, can be found at https://doi.org/10.12688/f1000research.140344.1.
Germline:
Somatic:
Callsets must be provided as vcf.gz or .bcf. You can use bgzip <your vcf file>.vcf to compress the file.

    my-callset: # choose a descriptive name for your callset
      labels:
        site: # name of your institute, group, department etc.
        pipeline: # name of the pipeline
        trimming: # tool used to trim reads
        read-mapping: # used read mapper
        base-quality-recalibration: # base recalibration method (remove if unused)
        realignment: # realignment method (remove if unused)
        variant-detection: # variant callers (provide comma-separated list if multiple ones are used)
        genotyping: # genotyper/event-typer used
        url: # URL of used pipeline
        # add any additional relevant attributes (they will appear in the false positive and false negative tables of the online report)
      subcategory: # category of callsets to include this one (see other entries in the config file and align with them if possible)
      zenodo:
        deposition: # zenodo record id (e.g. 7734975)
        filename: # name of bcf/vcf.gz file in the zenodo record
      benchmark: # benchmark to use (one of giab-NA12878-agilent-200M, giab-NA12878-agilent-75M, giab-NA12878-twist, and more, see https://github.com/snakemake-workflows/dna-seq-benchmark/blob/main/workflow/resources/presets.yaml)
      rename-contigs: resources/rename-contigs/ucsc-to-ensembl.txt # rename contigs from UCSC (prefixed with chr) to Ensembl style (remove if your contigs are already in Ensembl style)
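Because the callset has to be compressed with bgzip rather than plain gzip, a quick check before uploading can save a round trip. The following is a minimal sketch and not part of the workflow: the helper name is_bgzf is hypothetical, and it assumes the BGZF "BC" subfield is the first gzip extra subfield, which is how bgzip writes it.

```python
import struct


def is_bgzf(path):
    """Return True if the file starts with a BGZF block header (as written by bgzip).

    Hypothetical helper for illustration; assumes the "BC" subfield comes first
    in the gzip extra field, which is what bgzip produces.
    """
    with open(path, "rb") as handle:
        header = handle.read(18)
    if len(header) < 18:
        return False
    # gzip magic (1f 8b), deflate method (08), FEXTRA flag (04)
    if header[:4] != b"\x1f\x8b\x08\x04":
        return False
    # Bytes 10-11 hold the extra field length; the BGZF subfield is tagged "BC" with length 2.
    xlen = struct.unpack("<H", header[10:12])[0]
    subfield_len = struct.unpack("<H", header[14:16])[0]
    return xlen >= 6 and header[12:14] == b"BC" and subfield_len == 2
```

A file produced by plain gzip fails this check because it lacks the extra field, so is_bgzf("calls.vcf.gz") returning False suggests the file needs to be recompressed with bgzip.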
The latest results for all contributed callsets are shown at https://ncbench.github.io.
For running ncbench locally, the following steps are required:
Add an entry for your callset to the variant-calls section in the file config/config.yaml of your local clone:

    my-callset: # choose a descriptive name for your callset
      path: # path to vcf/bcf/vcf.gz file containing your variant calls (both SNVs and indels, sorted by coordinate)
      benchmark: # benchmark to use (one of giab-NA12878-agilent-200M, giab-NA12878-agilent-75M, giab-NA12878-twist, and more, see https://github.com/snakemake-workflows/dna-seq-benchmark/blob/main/workflow/resources/presets.yaml)
      rename-contigs: resources/rename-contigs/ucsc-to-ensembl.txt # rename contigs from UCSC (prefixed with chr) to Ensembl style (remove if your contigs are already in Ensembl style)
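To illustrate what the rename-contigs option is for, here is a rough sketch of a UCSC-to-Ensembl name mapping. It is an assumption-laden illustration, not the contents of resources/rename-contigs/ucsc-to-ensembl.txt: the function names are hypothetical, only primary contigs and the mitochondrial contig are handled, and the whitespace-separated "old new" table layout is assumed (it is the layout that bcftools annotate --rename-chrs accepts).

```python
def ucsc_to_ensembl(name):
    """Map a UCSC-style contig name (chr1, chrX, chrM) to Ensembl style (1, X, MT).

    Illustrative only: alternate and unplaced contigs (e.g. chrUn_...) need an
    explicit lookup table and are not handled correctly here.
    """
    if name == "chrM":
        return "MT"
    if name.startswith("chr"):
        return name[3:]
    # Already without a "chr" prefix; leave unchanged.
    return name


def rename_table(contigs):
    """Render one "old new" pair per line (assumed table layout)."""
    return "\n".join(f"{contig} {ucsc_to_ensembl(contig)}" for contig in contigs)


print(rename_table(["chr1", "chrX", "chrM"]))
```

If your callset already uses names like 1, X and MT, no renaming is needed and the rename-contigs line can be removed from the entry.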
Run the workflow, first as a dry run with snakemake -n --sdm conda, and then in reality with snakemake --sdm conda --cores N, with N being your desired number of cores. You can also run it on cluster or cloud middleware. The Snakemake documentation provides all the details.
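If you prefer to script the dry run and the real run together, the two invocations can be wrapped as below. This is only a sketch: snakemake_commands and run_workflow are hypothetical helper names, the flags are exactly those from the commands above, and the script is assumed to be started from the root of your local clone with Snakemake installed.

```python
import subprocess


def snakemake_commands(cores):
    """Build the dry-run and real-run argument lists for the commands described above."""
    dry_run = ["snakemake", "-n", "--sdm", "conda"]
    real_run = ["snakemake", "--sdm", "conda", "--cores", str(cores)]
    return dry_run, real_run


def run_workflow(cores):
    """Execute the dry run first and start the real run only if the dry run succeeds."""
    dry_run, real_run = snakemake_commands(cores)
    subprocess.run(dry_run, check=True)  # raises CalledProcessError on failure
    subprocess.run(real_run, check=True)
```

Calling run_workflow(8) would then correspond to N = 8 in the command above.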