Showing content from https://github.com/Ensembl/postgap below:

Ensembl/postgap: Linking GWAS studies to genes through cis-regulatory datasets

Post-GWAS Analysis Pipeline

Copyright holder: EMBL-European Bioinformatics Institute (Apache 2 License)

This script is designed to automatically fine-map and highlight the causal variants behind GWAS results by cross-examining GWAS, population genetic, epigenetic and cis-regulatory datasets.

Its original design was based on STOPGAP. It takes as input a disease identifier, extracts associated SNPs via GWAS databases, expands them by LD, then searches an array of regulatory and cis-regulatory databases for gene associations.

If you wish to shortcut all of the instructions below, you can simply use our VirtualBox virtual machine.

Installing the Python library

Add the lib/ directory to your $PYTHONPATH environment variable.

The script scripts/installation/ubuntu_environment.sh is a recipe that installs all basic C and Python dependencies on a fresh Ubuntu server (requires root access).

To install all bioinformatics dependencies, run sh scripts/installation/install_dependencies.sh.

Add the ./bin/ directory to your $PATH environment variable.
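
As a quick sanity check of the two environment variables above, the following is a minimal sketch in Python. The helper missing_path_entries is hypothetical (it is not part of POSTGAP), and the repository location passed to it is whatever path you cloned into.

```python
import os


def missing_path_entries(repo_root, environ=None):
    """Return (variable, directory) pairs that still need to be set.

    Hypothetical helper, not part of POSTGAP: it only checks that
    <repo_root>/lib is listed in PYTHONPATH and <repo_root>/bin in PATH.
    """
    environ = os.environ if environ is None else environ
    wanted = {
        "PYTHONPATH": os.path.join(repo_root, "lib"),
        "PATH": os.path.join(repo_root, "bin"),
    }
    missing = []
    for variable, directory in wanted.items():
        # Split the variable on the platform separator and normalise each entry
        entries = [
            os.path.normpath(entry)
            for entry in environ.get(variable, "").split(os.pathsep)
            if entry
        ]
        if os.path.normpath(directory) not in entries:
            missing.append((variable, directory))
    return missing


if __name__ == "__main__":
    # Assumes the script is run from the root of the POSTGAP checkout
    for variable, directory in missing_path_entries(os.getcwd()):
        print("Please add %s to $%s" % (directory, variable))
```

If the function returns an empty list, both directories are already on the relevant search paths.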

Via the FTP site (recommended)

The following script downloads the required data files into the current directory ($PWD):

sh scripts/installation/download.sh

Ideally, save these files in a separate directory, which we will call databases_dir.

The following will create a databases_dir directory for you:

cd scripts/build_data_files
make download
make process

Warning: this may take days, as it needs to split the entire 1000 Genomes files by population.

Every time you run POSTGAP, add --database_dir /path/to/databases_dir to the command line, where the database directory path corresponds to the directory created above.

By default, run the following command from the root directory:

python POSTGAP.py --disease autism --population EUR 

Multiple disease names can be provided.
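
If you launch POSTGAP from another Python script, the command line can be assembled programmatically. This is only a sketch: build_postgap_command is a hypothetical helper (not part of POSTGAP), it uses only the --disease, --population and --database_dir flags mentioned in this README, and it assumes POSTGAP.py sits in the current working directory.

```python
def build_postgap_command(disease, population, database_dir):
    """Assemble the argument list for one POSTGAP run.

    Hypothetical helper, not part of POSTGAP. It assumes the command is
    launched from the repository root, where POSTGAP.py lives.
    """
    return [
        "python", "POSTGAP.py",
        "--disease", disease,
        "--population", population,
        "--database_dir", database_dir,
    ]


if __name__ == "__main__":
    cmd = build_postgap_command("autism", "EUR", "/path/to/databases_dir")
    print(" ".join(cmd))
    # To actually start the pipeline, pass the list to subprocess:
    #   import subprocess
    #   subprocess.check_call(cmd)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when a disease name or path contains spaces.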

You can also provide a list of EFOs:

python POSTGAP.py --efos EFO_0000196

Or an rsID:

python POSTGAP.py --rsID rs10009124

Or a manually defined variant:

python POSTGAP.py --coords my_variant 1 1234567

Analysing your own summary statistics

To bypass the GWAS databases and supply your own data in a file:

python POSTGAP.py --summary_stats tests/sample_data/example.tsv

The summary statistics file should be tab-delimited and follow the GWAS Catalog recommendations.

In particular, it must have the following columns:

Bayesian mode (EXPERIMENTAL)

For an EFO, you can trigger the Bayesian calculations with:

python POSTGAP.py --efos EFO_0000196 --bayesian --output2 output2.txt

In this case, POSTGAP produces a tab-delimited output file, 'output2.txt'. The columns represent:

  1. Gene ID
  2. Cluster description
  3. SNP ID
  4. Colocalisation posterior probability at that SNP
  5. Tissue
  6. Colocalisation posterior probability over the whole cluster

It can be displayed as:

python scripts/present_results/postgap_html_report.py --result_file output2.txt --template scripts/present_results/geneReport.html --output report.html
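
The six-column Bayesian output described above can also be read programmatically. The following is a minimal sketch, not POSTGAP code: the field names in ColocRow are made up for illustration, the sample line contains placeholder values rather than real results, and lines that do not have six fields or whose probability columns are not numeric (for example a header line, if the file has one) are simply skipped.

```python
import csv
import io
from collections import namedtuple

# Field names are illustrative; they follow the column order listed above
ColocRow = namedtuple(
    "ColocRow",
    ["gene_id", "cluster", "snp_id", "snp_posterior", "tissue", "cluster_posterior"],
)


def read_bayesian_output(handle):
    """Parse the tab-delimited Bayesian output into a list of ColocRow."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) != 6:
            continue  # blank or malformed line
        try:
            snp_posterior = float(fields[3])
            cluster_posterior = float(fields[5])
        except ValueError:
            continue  # non-numeric probabilities, e.g. a header line
        rows.append(
            ColocRow(fields[0], fields[1], fields[2],
                     snp_posterior, fields[4], cluster_posterior)
        )
    return rows


if __name__ == "__main__":
    # Placeholder line for demonstration only, not real POSTGAP output
    sample = io.StringIO("GENE_A\tcluster_1\trs_example\t0.25\ttissue_x\t0.5\n")
    for row in read_bayesian_output(sample):
        print(row.gene_id, row.snp_id, row.snp_posterior, row.cluster_posterior)
```

For a real run, replace the sample with open("output2.txt") and, for instance, keep only rows whose cluster_posterior exceeds a threshold of your choosing.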

By default, the script writes a tab-delimited file to standard output.

If you wish, you can redirect this into a file:

python POSTGAP.py --disease autism --output results.txt

If you want a JSON dump of all the data retrieved by the pipeline:

python POSTGAP.py --disease autism --output results.json --json
python POSTGAP.py --disease autism --json

You can check the output with the following commands using the data tests.

Check out our Wiki

