A protocol for adding knowledge to Wikidata: aligning resources on human coronaviruses
Andra Waagmeester et al. BMC Biol. 2021.
doi: 10.1186/s12915-020-00940-y.
Erratum in: Waagmeester A, Willighagen EL, Su AI, Kutmon M, Gayo JEL, Fernández-Álvarez D, Groom Q, Schaap PJ, Verhagen LM, Koehorst JJ. BMC Biol. 2023 Nov 16;21(1):261. doi: 10.1186/s12915-023-01764-2. PMID: 37974169. Free PMC article.
Background: Pandemics, even more than other medical problems, require swift integration of knowledge. When a pandemic is caused by a new virus, understanding the underlying biology may help in finding solutions. In a setting where there is a large number of loosely related projects and initiatives, we need common ground, also known as a "commons." Wikidata, a public knowledge graph aligned with Wikipedia, is such a commons and uses unique identifiers to link knowledge in other knowledge bases. However, Wikidata may not always have the right schema for the urgent questions. In this paper, we address this problem by showing how a data schema required for the integration can be modeled with entity schemas represented by Shape Expressions.
Results: As a telling example, we describe the process of aligning resources on the genomes and proteomes of the SARS-CoV-2 virus and related viruses, as well as how Shape Expressions can be defined for Wikidata to model the knowledge, helping others studying the SARS-CoV-2 pandemic. We demonstrate how this model can be used to make data from various resources interoperable by integrating data from NCBI (National Center for Biotechnology Information) Taxonomy, NCBI Genes, UniProt, and WikiPathways. Based on that model, a set of automated applications, or bots, was written for regular updates of these sources in Wikidata and added to a platform for automatically running these updates.
Conclusions: Although this workflow was developed and applied in the context of the COVID-19 pandemic, to demonstrate its broader applicability it was also applied to other human coronaviruses (MERS, SARS, human coronavirus NL63, human coronavirus 229E, human coronavirus HKU1, human coronavirus OC43).
Keywords: COVID-19; Linked data; Open Science; ShEx; Wikidata.
Conflict of interest statement: All authors have declared that they have no competing interests.
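The idea in the abstract of checking Wikidata items against an expected schema can be illustrated with a small Python sketch. This is not ShEx and not the authors' code: it is a simplified stand-in that assumes an item is a list of statements, each with a property, a value, and a list of references, and it flags items that lack a required property or a reference. The class names, the `validate` function, the plain-text property label, and the placeholder item ID are all hypothetical; real Wikidata statements use P-identifiers and richer value types.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One simplified Wikidata-style statement: property, value, references."""
    prop: str                                   # property label (hypothetical; real items use P-IDs)
    value: str                                  # value, simplified here to a string
    references: list = field(default_factory=list)


@dataclass
class Item:
    """A simplified item: an identifier plus its statements."""
    qid: str
    statements: list = field(default_factory=list)


def validate(item, required_props):
    """Return a list of problems; an empty list means the item conforms."""
    problems = []
    for prop in required_props:
        matching = [s for s in item.statements if s.prop == prop]
        if not matching:
            problems.append(f"{item.qid}: missing required property '{prop}'")
        for statement in matching:
            if not statement.references:
                problems.append(f"{item.qid}: '{prop}' statement has no reference")
    return problems


# Example with placeholder data (the item ID and reference label are illustrative).
referenced = Item("Q_EXAMPLE", [
    Statement("molecular function", "peptidyl-dipeptidase activity", ["UniProt"]),
])
unreferenced = Item("Q_EXAMPLE", [
    Statement("molecular function", "peptidyl-dipeptidase activity"),
])
print(validate(referenced, ["molecular function"]))
print(validate(unreferenced, ["molecular function"]))
```

A real entity schema expresses such constraints declaratively, so the same shape can be shared, reviewed, and applied by any ShEx-aware validator rather than being embedded in bot code.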
Figures
Fig. 1. Structure of a Wikidata item, containing a set of statements which are key-value pairs, with qualifiers and references. Here the item for the angiotensin-converting enzyme 2 (ACE2) protein is given, containing a statement about its molecular function. This molecular function (peptidyl-dipeptidase activity) contains a reference stating when and where this information was obtained.
Fig. 2. Example of an RDF data model representing ACE2, created with RDFShape [32].
Fig. 3. Overview of the ShEx schemas and the relations between them. All shapes, properties, and items are available from within Wikidata.
Fig. 4. Application of the drafted ShEx schemas in the EntitySchema extension of Wikidata allows for confirmation of whether a set of on-topic items aligns with expressed expectations. In panel a, the application reports the Wikidata item as invalid because a missing reference means it does not conform to the expressed ShEx, whereas in panel b, the item (Q88292589) conforms to the applied schema.
Fig. 5. Screenshot of the SARS-CoV-2 and COVID-19 Pathway in WikiPathways (wikipathways:WP4846) showing the BridgeDb popup box for the ORF3a protein, with a link out to Scholia via the protein and gene's Wikidata identifiers.
Fig. 6. Screenshot of the Scholia page for the SARS-CoV-2 spike glycoprotein, showing four articles that specifically discuss this protein.
Fig. 7. Comparison of two Wikidata entries for the SARS-CoV-2 membrane protein. A Wikidata item and a concept from a primary source need to have some overlap to allow automatic reconciliation. If there is no overlap, duplicates will be created and left for human inspection. Since this screenshot was made, the entries have been merged in a manual curation process.
Fig. 8. Flow diagram for entity schema development and the executable workflow for the virus gene protein bot. a The workflow of creating shape expressions. b The computational workflow of how information was used from various public resources to populate Wikidata.
Fig. 9. JavaScript Object Notation (JSON) output from mygene.info for the gene with NCBI gene identifier 43740571.
Fig. 10. The UniProt SPARQL query used to obtain additional protein annotations, descriptions, and external resources.
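The reconciliation behaviour described in the caption of Fig. 7 can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not the bot's actual logic: it assumes that "overlap" means sharing at least one external identifier, that existing items are available as a dictionary keyed by item ID, and that multiple matches are sent to human review (the last point is an assumption not stated in the source). The function names, the identifier keys `ncbi_gene` and `uniprot`, and the `EXAMPLE` accession values are hypothetical.

```python
def reconcile(source_record, wikidata_items, id_keys):
    """Return IDs of existing items sharing at least one external identifier."""
    matches = []
    for qid, ids in wikidata_items.items():
        shares_identifier = any(
            key in source_record and ids.get(key) == source_record[key]
            for key in id_keys
        )
        if shares_identifier:
            matches.append(qid)
    return matches


def plan_action(source_record, wikidata_items, id_keys):
    """Decide what to do with one record from a primary source."""
    matches = reconcile(source_record, wikidata_items, id_keys)
    if len(matches) == 1:
        return ("update", matches[0])   # unambiguous overlap: update the existing item
    if not matches:
        return ("create", None)         # no overlap: a new item is created
    return ("review", matches)          # assumption: ambiguous matches go to a human


# Placeholder data: item IDs and UniProt values are illustrative only.
existing = {
    "Q_A": {"ncbi_gene": "43740571"},
    "Q_B": {"uniprot": "EXAMPLE1"},
}
ID_KEYS = ["ncbi_gene", "uniprot"]

print(plan_action({"ncbi_gene": "43740571"}, existing, ID_KEYS))
print(plan_action({"uniprot": "EXAMPLE9"}, existing, ID_KEYS))
```

Because a record with no shared identifier always yields "create", two descriptions of the same protein that carry different identifiers end up as duplicates, which is the situation the figure shows and which was resolved there by manual merging.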