BreakPointSurveyor

A comprehensive pipeline to analyze and visualize structural variants

This is the master branch which implements the TCGA_Virus workflow.

BreakPointSurveyor (BPS) is a set of core libraries (BreakPointSurveyor-Core) and workflows (this project) which, with optional external tools, evaluate genomic sequence data to discover, analyze, and provide a visual summary of breakpoint events.

The BreakPointSurveyor project provides three reference workflows, each implemented as a separate git branch: TCGA_Virus (master branch), 1000SV, and Synthetic.

Citation

Matthew A. Wyczalkowski, Kristine M. Wylie, Song Cao, Michael D. McLellan, Jennifer Flynn, Mo Huang, Kai Ye, Xian Fan, Ken Chen, Michael C. Wendl, Li Ding; BreakPoint Surveyor: A Pipeline for Structural Variant Visualization. Bioinformatics 2017. doi: 10.1093/bioinformatics/btx362

Online preprint with supplemental information.

Download BreakPointSurveyor, together with the three example workflows, with:

git clone --recursive https://github.com/ding-lab/BreakPointSurveyor.git

See here for detailed installation instructions. The Getting started with the Synthetic branch section has instructions on working with a relatively small test dataset. See also the BPS developer guide for information about implementing your own workflow.

BPS generates two types of plots: structure plots and expression plots. Figures below are generated by the TCGA_Virus workflow.

Structure plots visualize breakpoints as points with X,Y coordinates given by the breakpoint position along each chromosome. Such figures also display read depth, gene and exon annotations, and a copy number histogram. In this workflow, read depth and discordant reads are obtained from aligned WGS data, and calls from various structural variant tools are shown. Breakpoint predictions from other tools, whether from WGS or RNA-Seq data, can be readily integrated into the structure plot.
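As a rough illustration of the X,Y idea, the sketch below selects the breakpoints that join one pair of chromosomes and prints their two positions as plot coordinates. The input layout (tab-separated chromA, posA, chromB, posB) and the function name `breakpoint_xy` are assumptions made for this example; they are not taken from the BPS file format documentation.

```shell
# Print "X<TAB>Y" for every breakpoint joining chromosome $1 and chromosome $2.
# Input on stdin: tab-separated chromA, posA, chromB, posB (assumed layout).
breakpoint_xy() {
    awk -F '\t' -v a="$1" -v b="$2" '
        $1 == a && $3 == b { print $2 "\t" $4 }   # X = posA, Y = posB
    '
}
```

For example, `breakpoint_xy chr1 chr2 < breakpoints.tsv` would list the points falling in the chr1/chr2 panel, where `breakpoints.tsv` is a hypothetical file name.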

See T_PlotStructure for interpretation and details.

Expression plots illustrate relative gene expression near breakpoints, with gene position, size, orientation, and name shown. Expression is obtained for the sample and a population of controls from either processed expression data (e.g., TCGA RSEM) or RNA-Seq data directly.

See U_PlotExpression for interpretation and details.

The BreakPointSurveyor project has three layers: Core, Workflow, and Data.

For convenience, the workflows demonstrated here combine the Workflow and Data layers; also, the Core layer is implemented as a submodule and downloaded together with this project.

The BPS workflow is designed for scalability, and has been used to process batches of hundreds of whole genome and RNA-Seq datasets. It consists of a series of directories, each of which implements a stage in the BPS workflow. The order of processing is indicated by the stage prefix. The figure below illustrates the stages and their relationship in the TCGA_Virus workflow.

Below is a list of the stages associated with the TCGA_Virus workflow (master branch) and their description:

The 1000SV and Synthetic workflows generally have a subset of these stages. See BPS Developer Guide for additional information about developing new workflow stages. The BreakPointSurveyor-Core project (distributed as a submodule of this project) has details about BPS utilities underlying these stages.

Genomic datasets tend to be very large and frequently have restrictions on access and distribution. Each of the three workflows operates on distinct datasets of various size, clinical relevance, and availability, to demonstrate different BreakPointSurveyor capabilities.

In general, the workflows include all intermediate data which is allowed to be distributed and which is not prohibitively large.

TCGA_Virus workflow (master branch)

The TCGA_Virus workflow provides an in-depth analysis of a virus integration event in the TCGA WGS sample (TCGA-BA-4077-01B-01D-2268-08), which is a head and neck cancer sample. Because of TCGA restrictions we do not distribute any sequence data. After downloading, sequence data was aligned to a custom reference which includes human and virus sequences (details). We do not distribute the reference because of size constraints.

Relative expression calculations require a case and a population of controls. We provide two examples of expression calculations:

The 1000SV workflow investigates interchromosomal human-human breakpoints in a publicly available human sample from the 1000 Genomes project, NA19240, which was sequenced at high (80X) coverage; this 65Gb file can be downloaded here.

The analysis focuses on two events with interchromosomal discordant reads. Expression analysis is not performed in the 1000SV workflow. We demonstrate using attributes to provide additional information about discordant reads.
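To make "interchromosomal discordant reads" concrete, here is a minimal sketch that filters SAM text for reads whose mate maps to a different reference sequence. It relies only on the standard SAM columns (RNAME in column 3, RNEXT in column 7, where `=` means "same as RNAME"). The function name `interchrom_reads` is hypothetical, and this is not necessarily how the BPS stages select such reads.

```shell
# Print SAM records whose mate maps to a different reference sequence.
# Reads SAM text on stdin; header lines (starting with "@") are skipped.
interchrom_reads() {
    awk -F '\t' '
        /^@/ { next }                      # skip header lines
        $3 == "*" || $7 == "*" { next }    # read or mate is unmapped
        $7 != "=" && $7 != $3 { print }    # RNEXT names another reference
    '
}
```

A typical use would be `samtools view sample.bam | interchrom_reads`, where `sample.bam` is a placeholder for an aligned BAM file.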

The Synthetic workflow generates simple breakpoints (inter- and intra-chromosomal) and corresponding synthetic read datasets of modest size which can be analyzed and visualized in BPS. We create a custom reference, consisting only of the chromosomes of interest, for improved performance (this reference is not distributed due to size).

We then generate a breakpoint sequence from sections of the human reference, and synthetic (simulated) reads are created. These are re-aligned to the custom reference. The resulting BAM file is then analyzed similarly to the 1000SV workflow. Expression analysis is not performed in the Synthetic workflow.
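The idea of building a breakpoint sequence from sections of a reference can be shown with a toy example. The sketch below joins the start of one sequence to the tail of another, which is the junction a simulated read would have to span. The function name `make_breakpoint_seq` and its arguments are hypothetical; the actual workflow operates on sections of the human reference rather than on short strings, and its scripts may work quite differently.

```shell
# Join the first END_A bases of sequence A to the bases of sequence B
# from START_B onward (1-based, inclusive), forming a synthetic junction.
make_breakpoint_seq() {
    seq_a=$1; end_a=$2; seq_b=$3; start_b=$4
    left=$(printf '%s\n' "$seq_a" | cut -c "1-$end_a")
    right=$(printf '%s\n' "$seq_b" | cut -c "$start_b-")
    printf '%s%s\n' "$left" "$right"
}
```

Calling `make_breakpoint_seq AAAACCCC 4 GGGGTTTT 5` joins bases 1 to 4 of the first sequence to bases 5 onward of the second.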

The Synthetic branch also illustrates more elaborate exon/gene annotations as well as an intrachromosomal inversion/duplication event.

Getting started with the Synthetic workflow

The Synthetic workflow utilizes a relatively small dataset which is created from scratch, and can be run relatively quickly on a laptop computer. It is a good place to start working with BPS.

There are a number of dependencies you'll need to install to get started. You'll need the Core dependencies as well as BWA, described here.

Get a fresh copy of BPS and switch to the Synthetic branch with,

git clone --recursive https://github.com/ding-lab/BreakPointSurveyor.git
cd BreakPointSurveyor
git checkout Synthetic

Next, edit bps.config to locate the installed software.
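Before editing the configuration it can help to confirm which tools are actually installed. The helper below is a small sketch that reports any named program not found on `PATH`. The function name `check_tools` is hypothetical, and the variable names used inside bps.config are not shown here, so this does not read or validate that file.

```shell
# Report which of the named programs cannot be found on PATH.
# Returns 0 if all are found, 1 if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf 'missing: %s\n' "$tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For instance, `check_tools bwa` checks for BWA. Tools installed outside `PATH` will be reported as missing even though bps.config may still be able to point at them by full path.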

The idea is to run each stage in order according to its first letter. You can run an entire stage with,

./run_bps A_Reference

Each of these eleven stages consists of one or more steps. These steps are named starting with a number (e.g., 1_get_BAM_paths.sh), and consist of shell scripts which execute a specific task. See the documentation for each stage, as well as the contents of each step's script file, for details about implementation and debugging.
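The naming convention above (stages ordered by letter prefix, steps ordered by numeric prefix) can be inspected with a short sketch like the following. The function names `list_stages` and `list_steps` are hypothetical, and the glob patterns are assumptions inferred from the examples `A_Reference` and `1_get_BAM_paths.sh`. This only lists names; it does not replace the project's own run script.

```shell
# List stage directories (names like A_Reference) under a workflow root,
# in the order implied by their letter prefix.
list_stages() {
    root=${1:-.}
    for d in "$root"/[A-Z]_*/; do
        [ -d "$d" ] || continue
        basename "$d"
    done | LC_ALL=C sort
}

# List step scripts (names like 1_get_BAM_paths.sh) inside one stage,
# ordered by numeric prefix so that 10_ sorts after 2_.
list_steps() {
    stage_dir=$1
    for f in "$stage_dir"/[0-9]*_*.sh; do
        [ -f "$f" ] || continue
        basename "$f"
    done | sort -n
}
```

Running `list_stages .` from the top of the checkout, followed by `list_steps A_Reference`, gives a quick overview of what a stage will execute and in which order.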

Performance per stage for TCGA_Virus branch, obtained with run_BPS <STAGE>.

Matthew A. Wyczalkowski, m.wyczalkowski@wustl.edu

This software is licensed under the GNU General Public License v3.0

This work was supported by the National Cancer Institute [R01CA178383 and R01CA180006 to Li Ding, R01CA172652 to Ken Chen]; and National Human Genome Research Institute [U01HG006517 to Li Ding].

This work was supported by the National Cancer Institute [R01CA178383, R01CA180006, 1U24CA211006-01, and 1U24CA210972-01 to Li Ding, R01CA172652 to Ken Chen]; and National Human Genome Research Institute [U01HG006517 to Li Ding].

