A VEP Plugin to annotate high-impact five prime UTR variants either creating new upstream ORFs or disrupting existing upstream ORFs
Currently, it annotates whether a small variant (1-5 bp) in the 5'UTR, including SNVs, indels and MNVs, would have any of the following molecular consequences: uAUG_gained, uAUG_lost, uSTOP_gained, uSTOP_lost or uFrameShift.
Highlights:
The annotation output is transcript-specific and is not restricted to canonical transcripts.
The plugin can be used to annotate 5'UTRs in eukaryotes.
About the role of 5'UTR variants in human genetic disease:
Whiffin, N., Karczewski, K.J., Zhang, X. et al. Characterising the loss-of-function impact of 5’ untranslated region variants in 15,708 individuals. Nat Commun 11, 2523 (2020). https://doi.org/10.1038/s41467-019-10717-9
About UTRannotator:
Zhang, X., Wakeling, M.N., Ware, J.S., Whiffin, N. Annotating high-impact 5'untranslated region variants with the UTRannotator. Bioinformatics. https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btaa783/5905476
To use the plugin with VEP, you need to add the plugin module to Perl's library path. To do this, you can either:
(1) download all the files of this repository to the VEP default plugin path $HOME/.vep/Plugins
or
(2) download the repository and add its path to the environment variable $PERL5LIB, e.g. add the line export PERL5LIB=$PERL5LIB:/path/to/UTRannotator to ~/.bash_profile.
A written document can be found in this tutorial.
To run the plugin with VEP, you can use the following command line:
vep -i test.vcf --tab --plugin UTRannotator -o test.output
If you are running VEP offline, you must supply a reference genome FASTA file:
vep -i test.vcf --cache --assembly GRCh38 --fasta /path/to/GRCh38.fa --offline --plugin UTRannotator -o test.output
Note: add the option --minimal to convert the alleles into their minimal representations if this has not been done beforehand, especially for variants specified by rs IDs from dbSNP.
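As a rough illustration of what a minimal representation means, the sketch below trims bases shared by the reference and alternate alleles from both ends and shifts the position accordingly. The function name `minimal_representation` is hypothetical and this is not VEP's own code; the order in which VEP trims the two ends and its coordinate conventions for insertions may differ.

```python
def minimal_representation(pos, ref, alt):
    """Trim bases shared by REF and ALT (illustrative sketch, not VEP's code).

    pos is the 1-based position of the first REF base.
    An allele that becomes empty is reported as "-".
    """
    # Trim identical leading bases, moving the start position forward.
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    # Trim identical trailing bases; the start position is unaffected.
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    return pos, ref or "-", alt or "-"


# A VCF-style record with a shared leading base: GCC>GA at position 100.
print(minimal_representation(100, "GCC", "GA"))   # (101, 'CC', 'A')
# An SNV padded on both sides: CAT>CGT at position 100.
print(minimal_representation(100, "CAT", "CGT"))  # (101, 'A', 'G')
```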
Currently, the plugin supports the default VEP output format, tab-delimited output and VCF output.
If a variant disrupts multiple uORFs, the plugin outputs the annotation for each uORF, and the per-uORF outputs are concatenated with an ampersand (&).
The plugin can also check whether an input variant disrupts a verified translated uORF.
To use this option, pass an evidence file listing verified translated uORFs to the plugin as an argument.
Translated small ORF files
For translated small ORFs in human, we have curated a list of uORFs previously identified by ribosome profiling from the online repository of small ORFs (www.sorfs.org).
This list is available in the repository:
Genome build GRCh37: uORF_starts_ends_GRCh37_PUBLIC.txt
Genome build GRCh38: uORF_starts_ends_GRCh38_PUBLIC.txt
The command to use the file is:
vep -i test.vcf --tab --plugin UTRannotator,/path/to/uORF_starts_ends_GRCh37_PUBLIC.txt -o test.output
To use a customized list of translated uORFs, prepare a tab-delimited text file with the following columns: CHR, START_POS, GENE, STRAND, TYPE and STOP_POS.
For example:
CHR START_POS GENE STRAND TYPE STOP_POS
19 45971469 FOSB forward five_prime_utr 45971714
START_POS and STOP_POS are the genomic start position and genomic end position of a small ORF, respectively.
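A custom evidence file in this layout could be produced with a short script such as the sketch below. The helper name `write_uorf_evidence` is hypothetical, and it is an assumption that the plugin expects the header line exactly as shown in the example above. The only checks made are that every column is present and that the two positions are integers.

```python
import csv
import io

# Column order taken from the example above.
COLUMNS = ["CHR", "START_POS", "GENE", "STRAND", "TYPE", "STOP_POS"]


def write_uorf_evidence(rows, handle):
    """Write uORF records (dicts keyed by COLUMNS) as tab-delimited text."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for row in rows:
        missing = [col for col in COLUMNS if col not in row]
        if missing:
            raise ValueError(f"missing columns: {missing}")
        # Positions must be integers; int() raises ValueError otherwise.
        int(row["START_POS"])
        int(row["STOP_POS"])
        writer.writerow([row[col] for col in COLUMNS])


# Usage: reproduce the FOSB example row in memory.
buffer = io.StringIO()
write_uorf_evidence(
    [{"CHR": "19", "START_POS": 45971469, "GENE": "FOSB",
      "STRAND": "forward", "TYPE": "five_prime_utr", "STOP_POS": 45971714}],
    buffer,
)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` in place of the `StringIO` buffer.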
The following list is a collection of curated translated small ORF files for other species:
https://github.com/AhmedArslan/orf_mm10 curated by Ahmed Arslan from www.sorfs.org.
The output annotation from the plugin includes 5 fields:
For any 5'UTR variant, the plugin will first output the number of existing uORFs of each subtype in the 5'UTR:
Field 1 - existing_InFrame_oORFs : The number of existing inframe overlapping ORFs (inFrame_oORF) already within the 5 prime UTR
Field 2 - existing_OutOfFrame_oORFs : The number of existing out-of-frame overlapping ORFs (OutOfFrame_oORF) already within the 5 prime UTR
Field 3 - existing_uORFs : The number of existing uORFs with a stop codon within the 5 prime UTR
If the 5'UTR variant is uORF-perturbing, the plugin will output each consequence and its detailed annotation. Otherwise it will output - for the following fields:
Field 4 - five_prime_UTR_variant_annotation : Output the annotation of a given 5 prime UTR variant.
Field 5 - five_prime_UTR_variant_consequence : Output the variant consequences of a given 5 prime UTR variant: uAUG_gained, uAUG_lost, uSTOP_gained, uSTOP_lost, uFrameShift.
If a 5'UTR variant perturbs multiple uORFs, the annotations for each uORF will be concatenated with an ampersand (&) in the fields five_prime_UTR_variant_consequence and five_prime_UTR_variant_annotation.
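The three counts in fields 1 to 3 can be illustrated with the sketch below, which scans a 5'UTR sequence for ATG start codons and classifies each resulting ORF. This is not the plugin's implementation. It assumes the input is the spliced 5'UTR in transcript orientation with the CDS starting immediately after it, and it ignores stop codons that span the UTR/CDS boundary. The function name `count_existing_uorfs` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def count_existing_uorfs(utr_seq):
    """Count upstream ORFs by subtype in a 5'UTR sequence (illustrative only)."""
    utr_seq = utr_seq.upper()
    utr_len = len(utr_seq)
    counts = {
        "existing_InFrame_oORFs": 0,
        "existing_OutOfFrame_oORFs": 0,
        "existing_uORFs": 0,
    }
    for start in range(utr_len - 2):
        if utr_seq[start:start + 3] != "ATG":
            continue
        # Look for an in-frame stop codon lying entirely within the 5'UTR.
        has_stop = any(
            utr_seq[i:i + 3] in STOP_CODONS
            for i in range(start + 3, utr_len - 2, 3)
        )
        if has_stop:
            counts["existing_uORFs"] += 1
        elif (utr_len - start) % 3 == 0:
            # No stop in the UTR and the start is in frame with the CDS.
            counts["existing_InFrame_oORFs"] += 1
        else:
            counts["existing_OutOfFrame_oORFs"] += 1
    return counts


# One ATG followed by an in-frame TAG inside the UTR: a single uORF.
print(count_existing_uorfs("ATGAAATAGCC"))
```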
#Uploaded_variation Location Allele Gene Feature Feature_type Consequence cDNA_position CDS_position Protein_position Amino_acids Codons Existing_variation Extra
5_36877039_CC/A 5:36877039-36877040 A 25836 NM_015384.5 Transcript 5_prime_UTR_variant 169-170 - - - - - IMPACT=MODIFIER;STRAND=1;REFSEQ_MATCH=rseq_mrna_match;existing_InFrame_oORFs=0;existing_OutOfFrame_oORFs=0;existing_uORFs=5;five_prime_UTR_variant_annotation=uFrameShift_Evidence:False,uFrameShift_KozakContext:GCGATGC,uFrameShift_KozakStrength:Moderate,uFrameShift_alt_type:uORF,uFrameShift_alt_type_length:189,uFrameShift_ref_StartDistanceToCDS:324,uFrameShift_ref_type:uORF,uFrameShift_ref_type_length:15;five_prime_UTR_variant_consequence=uFrameShift
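For downstream filtering, the Extra column of a tab-delimited row like the one above can be unpacked with a few lines of Python. The sketch below is not part of the plugin and its function names are hypothetical. It assumes that, when several uORFs are affected, the &-joined entries of the consequence and annotation fields appear in the same order.

```python
def parse_extra(extra):
    """Split a VEP Extra column (key=value pairs joined by ';') into a dict."""
    fields = {}
    for item in extra.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
    return fields


def parse_utr_annotation(fields):
    """Return one record per perturbed uORF, or [] if the variant perturbs none."""
    consequence = fields.get("five_prime_UTR_variant_consequence", "-")
    annotation = fields.get("five_prime_UTR_variant_annotation", "-")
    if consequence == "-" or annotation == "-":
        return []
    records = []
    # Assumption: both fields list the uORFs in the same order.
    for cons, ann in zip(consequence.split("&"), annotation.split("&")):
        details = dict(
            pair.split(":", 1) for pair in ann.split(",") if ":" in pair
        )
        records.append({"consequence": cons, "details": details})
    return records


# Extra column copied from the example row above.
extra = (
    "IMPACT=MODIFIER;STRAND=1;REFSEQ_MATCH=rseq_mrna_match;"
    "existing_InFrame_oORFs=0;existing_OutOfFrame_oORFs=0;existing_uORFs=5;"
    "five_prime_UTR_variant_annotation=uFrameShift_Evidence:False,"
    "uFrameShift_KozakContext:GCGATGC,uFrameShift_KozakStrength:Moderate,"
    "uFrameShift_alt_type:uORF,uFrameShift_alt_type_length:189,"
    "uFrameShift_ref_StartDistanceToCDS:324,uFrameShift_ref_type:uORF,"
    "uFrameShift_ref_type_length:15;"
    "five_prime_UTR_variant_consequence=uFrameShift"
)
fields = parse_extra(extra)
records = parse_utr_annotation(fields)
print(fields["existing_uORFs"])                           # 5
print(records[0]["consequence"])                          # uFrameShift
print(records[0]["details"]["uFrameShift_KozakStrength"])  # Moderate
```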