ELTEbioinformatics/GMT_files_for_mulea: GMT files for the mulea R package

This repository provides ready-to-use gene sets formatted in the standardized Gene Matrix Transposed (GMT) format, compatible with the mulea R package, a comprehensive tool for overrepresentation and functional enrichment analysis.

The GMT format is a tab-delimited text file used to represent collections of genes or proteins associated with specific ontology entries. Each row in a GMT file corresponds to a single ontology element and comprises three main columns:

  1. Ontology identifier: This column uniquely identifies the element within the referenced ontology.

  2. Ontology name or description: This column provides a user-friendly label or textual description for the ontology element.

  3. List of associated genes/proteins: This column lists the gene or protein identifiers belonging to the corresponding ontology element, separated by tabs.

Within the mulea package, these entities are referred to as ontology_id, ontology_name, and list_of_values, respectively. Additionally, rows starting with a “#” symbol in the GMT file are considered comment lines and may contain supplementary information about the referenced ontology, such as its type, source, species, version, and identifier.
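
The row layout described above can be illustrated with a minimal base R sketch. It is not the mulea implementation: the function name parse_gmt_lines and the example rows are made up for illustration, and the sketch assumes that every field, including each gene identifier, is separated by a tab.

```r
# Minimal base-R sketch of parsing GMT text; NOT the mulea implementation.
# parse_gmt_lines is a hypothetical helper name used only for this example.
parse_gmt_lines <- function(lines) {
  # Drop comment lines (starting with "#") and empty lines.
  lines <- lines[!startsWith(lines, "#") & nzchar(lines)]
  # Split every remaining row on tabs (assumption: all fields are tab-separated).
  fields <- strsplit(lines, "\t", fixed = TRUE)
  data.frame(
    ontology_id = vapply(fields, function(f) f[1], character(1)),
    ontology_name = vapply(fields, function(f) f[2], character(1)),
    # Everything after the second field is the gene/protein list.
    list_of_values = I(lapply(fields, function(f) f[-(1:2)])),
    stringsAsFactors = FALSE
  )
}

# Made-up example rows, not taken from the repository.
example_lines <- c(
  "# made-up example, not taken from the repository",
  "ID:0001\tfirst toy set\tgeneA\tgeneB\tgeneC",
  "ID:0002\tsecond toy set\tgeneB\tgeneD"
)
gmt <- parse_gmt_lines(example_lines)
print(gmt$ontology_id)
print(gmt$list_of_values[[1]])
```

The column names mirror the ontology_id, ontology_name, and list_of_values names used by mulea; for real analyses, mulea::read_gmt() remains the intended reader.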

This repository offers pre-processed gene sets for 27 model organisms (from Escherichia coli to human) with various identifiers including UniProt, Entrez, Gene Symbol, and Ensembl IDs.

The GMT files can be found in the GMT_files folder, and the scripts we applied to create them are available in the scripts_to_create_GMT_files folder. Also, there is a script for mapping between different ID types in the scripts_to_create_GMT_files/ID_mapping_scripts folder.

After downloading, the GMT files can be read with the mulea::read_gmt() function, for example:

mulea::read_gmt(file = "Transcription_factor_TFLink_Drosophila_melanogaster_LS_GeneSymbol.gmt")

Alternatively, they can be loaded directly from this GitHub repository, for example:

mulea::read_gmt(file = "https://raw.githubusercontent.com/ELTEbioinformatics/GMT_files_for_mulea/main/GMT_files/Drosophila_melanogaster_7227/Transcription_factor_TFLink_Drosophila_melanogaster_LS_GeneSymbol.gmt")
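
When several files are needed, the raw URL can be assembled from its parts. The sketch below is only an illustration: gmt_raw_url is a hypothetical helper, not part of mulea, and it assumes the other species folders follow the same GMT_files/<species folder>/<file name> layout as the Drosophila example above.

```r
# Hypothetical helper (not part of mulea): assemble the raw GitHub URL of a GMT file.
# Assumes the repository layout GMT_files/<species folder>/<file name>.
gmt_raw_url <- function(species_dir, file_name) {
  base_url <- "https://raw.githubusercontent.com/ELTEbioinformatics/GMT_files_for_mulea/main/GMT_files"
  paste(base_url, species_dir, file_name, sep = "/")
}

url <- gmt_raw_url(
  species_dir = "Drosophila_melanogaster_7227",
  file_name = "Transcription_factor_TFLink_Drosophila_melanogaster_LS_GeneSymbol.gmt"
)
print(url)

# Optional: download once and reuse the local copy
# (requires network access and the mulea package, so it is left commented out).
# local_file <- file.path(tempdir(), basename(url))
# download.file(url, destfile = local_file, mode = "wb")
# tf_ontology <- mulea::read_gmt(file = local_file)
```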

In addition, we created the muleaData ExperimentHubData Bioconductor package to make browsing and reading the ontologies easier.

List of species we cover:

Type, name, link and citation of the databases we cover:

| Ontology category | Ontology name | Short description of content | Reference |
|---|---|---|---|
| Gene expression | FlyAtlas | Tissue-specific expression data for Drosophila melanogaster. | Chintapalli,V.R. et al. (2007) Using FlyAtlas to identify better Drosophila melanogaster models of human disease. Nat Genet, 39, 715–720. |
| Gene expression | ModEncode | Functional characterization (cell line, temporal expression, tissue expression, treatment) of elements for Caenorhabditis elegans and Drosophila melanogaster. | The Modencode Consortium et al. (2010) Identification of functional elements and regulatory circuits by Drosophila modENCODE. Science, 330, 1787–1797. |
| Genomic location | Chromosomal Bands | Location of genes on the chromosome. | Martin,F.J. et al. (2023) Ensembl 2023. Nucleic Acids Res, 51, D933–D941. |
| Genomic location | Consecutive genes | n consecutive genes on the chromosome. | |
| miRNA regulation | miRTarBase | Experimentally validated miRNA - target interactions. | Huang,H.-Y. et al. (2022) miRTarBase update 2022: an informative resource for experimentally validated miRNA–target interactions. Nucleic Acids Res, 50, D222–D230. |
| Gene Ontology | GO | Gene Ontology (GO) categorizes genes into unified categories and attributes. | The Gene Ontology Consortium et al. (2023) The Gene Ontology knowledgebase in 2023. Genetics, 224, iyad031. |
| Pathway | Pathway Commons | Collection of biological pathway and interaction data. | Rodchenkov,I. et al. (2020) Pathway Commons 2019 Update: integration, analysis and exploration of pathway data. Nucleic Acids Res, 48, D489–D497. |
| Pathway | Reactome | Collection of biological pathway and interaction data. | Jassal,B. et al. (2020) The reactome pathway knowledgebase. Nucleic Acids Res, 48, D498–D503. |
| Pathway | Signalink | Interaction database focussing on pathways and interactions of pathways. | Csabai,L. et al. (2022) SignaLink3: a multi-layered resource to uncover tissue-specific signaling networks. Nucleic Acids Res, 50, D701–D709. |
| Pathway | Wikipathways | Collection of biological pathway and interaction data. | Martens,M. et al. (2021) WikiPathways: connecting communities. Nucleic Acids Res, 49, D613–D621. |
| Protein domain | PFAM | Protein domain structure database. | Mistry,J. et al. (2021) Pfam: The protein families database in 2021. Nucleic Acids Res, 49, D412–D419. |
| Transcription factor regulation | ATRM | Transcription factor - target gene interactions for Arabidopsis thaliana. | Jin,J. et al. (2015) An Arabidopsis transcriptional regulatory map reveals distinct functional and evolutionary features of novel transcription factors. Mol Biol Evol, 32, 1767–1773. |
| Transcription factor regulation | dorothEA | Transcription factor - target gene interactions for human and mouse. | Garcia-Alonso,L. et al. (2019) Benchmark and integration of resources for the estimation of human transcription factor activities. Genome Res, 29, 1363–1375. |
| Transcription factor regulation | RegulonDB | Transcription factor - target gene interactions for Escherichia coli bacteria. | Tierrafría,V.H. et al. (2022) RegulonDB 11.0: Comprehensive high-throughput datasets on transcriptional regulation in Escherichia coli K-12. Microb Genom, 8, 000833. |
| Transcription factor regulation | TFLink | Small- and large-scale transcription factor - target gene interactions for human and 6 model organisms. | Liska,O. et al. (2022) TFLink: an integrated gateway to access transcription factor–target gene interactions for multiple species. Database, 2022, baac083. |
| Transcription factor regulation | TRRUST | Transcription factor - target gene interactions for human. | Han,H. et al. (2018) TRRUST v2: an expanded reference database of human and mouse transcriptional regulatory interactions. Nucleic Acids Res, 46, D380–D386. |
| Transcription factor regulation | Yeastract | Transcription factor - target gene interactions for Saccharomyces cerevisiae. | Teixeira,M.C. et al. (2018) YEASTRACT: an upgraded database for the analysis of transcription regulatory networks in Saccharomyces cerevisiae. Nucleic Acids Res, 46, D348–D353. |

How to Cite the mulea Package?

To cite package mulea in publications use:

Turek, Cezary, Márton Ölbei, Tamás Stirling, Gergely Fekete, Ervin Tasnádi, Leila Gul, Balázs Bohár, Balázs Papp, Wiktor Jurkowski, and Eszter Ari. 2024. “mulea: An R Package for Enrichment Analysis Using Multiple Ontologies and Empirical False Discovery Rate.” BMC Bioinformatics 25 (1): 334. https://doi.org/10.1186/s12859-024-05948-7.

