Annotation

Map information
Entities
Protein
RNA
Gene
Metabolite
Drug
Complex
Compartment
Phenotype
Interactions

Annotation of a disease map includes 1) information about the map such as the title, authorship and licence, 2) identification of entities on the map such as proteins, RNAs, genes and metabolites, and 3) providing evidence for the interactions between map entities - references to publications.

In practice, because most entities in signalling pathways are proteins and in metabolic pathways – metabolites, the main attention should be directed to naming proteins according to HUGO Gene Nomenclature Committee (HGNC) names (it allows automatic annotation in MINERVA), and to manual annotation of metabolites, as well as to adding evidence for interactions.

Map information

Suggested fields for map annotation are listed in the table below. In CellDesigner this type of annotation can be added via Component > Model Information, Component > Model Description, Component > Model MIRIAM Info and Component > Model Notes. Some fields can be annotated on upload to MINERVA in the Add project window: map ID, map name, taxonomy ID, disease ID and map version.

Table 1. Suggested fields for map information, required* and optional.

Map annotation field	Comment
Map ID*	A short string of letters and numbers. It will be part of the map URL
Map name*	The title of the map. For example: Altzheimer’s Disease Map
Authors*	Authors’ names, affiliations, ORCIDs, contact email
Taxonomy (Organism)*	For human diseases: NCBI:txid9606
Disease name*	For example: Alzheimer’s disease
Disease ID*	For example, for Alzheimer’s disease: MESH:D000544 (DOID:10652, EFO:0000249, ICD10CM:G30, ICD9CM:331.0, KEGG:05010)
Licence*	Recommended licence for disease maps is Creative Commons Attribution 4.0 International License (CC BY 4.0)
Version	Specifies map version
Last updated	The date of the last update of the map
Derived from	If applicable - the source the map is derived from
Description	Map description: purpose, linked projects, objectives, content etc.
DOI	If available. Stable address for online browsing and exploration
Project homepage	If available. Map project homepage URL

Entities

Different types of entities in SBGN diagrams with the corresponding recommended annotation are shown in the table below.

Table 2. Recommended annotation of map entities. Resources: HGNC, UniProt, Complex Portal, ChEBI, PubChem, ChEMBL, DrugBank, Cell Ontology, BRENDA, Cellosaurus, Gene Ontology and MeSH.

Biochemical entity	Naming	Identifier
Protein	HGNC official symbol	UniProt / HGNC
RNA	HGNC official symbol	HGNC
Gene	HGNC official symbol	HGNC
Metabolite	ChEBI / PubChem recommended name	ChEBI / PubChem
Drug	ChEBI / PubChem recommended name	ChEBI / PubChem / ChEMBL / DrugBank
Complex	Specific name from literature or listing complex components: Element A:Element B	Not required. If available: Complex Portal
Compartment	Appropriate term from Cell Ontology, BRENDA, Cellosaurus, or a specific name from literature	Cell Ontology / BRENDA / Cellosaurus
Phenotype (biological processes)	Appropriate Gene Ontology (GO) Biological Process (BP) term if available	GO Biological Process
Phenotype (symptom, disease state)	Appropriate MeSH term if available	MeSH

To add MIRIAM annotation to entities in CellDesigner, please use the MIRIAM tab in the bottom panel of CellDesigner. For example, as instructed in CellDesigner Help, to add a UniProt ID to a protein in CellDesigner, click “Add relation” and then in the Relation field from the drop-down menu choose “bqbiol:isVersionOf”, then in the DataType field from the drop-down menu choose “UniProt”, and finally in the ID field add value, for example “P23219” for MAPK3 (ERK1).

Protein

Protein (“generic protein” in CellDesigner; “macromolecule” glyph in SBGN PD specification) should be annotated with UniProt ID and named according to HUGO Gene Nomenclature Committee (HGNC) names.

With the use of the MINERVA automatic annotation functionality, manually adding UniPort IDs in CellDesigner can be skipped as soon HGNC official names are used for naming proteins. The same rules can be applied for annotating genes and RNAs.

In some cases it is not possible or not convenient to provide entity ID, for example when a “generic entity” is used. ERK1/2 can be used instead of showing two specific proteins: ERK1 (MAPK3, UniProt:P27361) and ERK2 (MAPK1, UniProt:P28482). This may happen if information is incomplete or for creating a compact representation and avoiding combinatorial explosion in an attempt to show all possible specific entities and the corresponding processes.

RNA

RNA (“RNA” in CellDesigner; “nucleic acid feature” in SBGN PD specification with unit of information “ct:RNA”) should be annotated with HGNC ID. To skip manual annotation, please name RNAs using HGNC names, and the entities will be automatically annotated on upload to MINERVA.

Gene

Gene (“gene” in CellDesigner; “nucleic acid feature” in SBGN PD specification with unit of information “ct:gene”) should be annotated with HGNC ID. To skip manual annotation, please name genes using HGNC names, and the entities will be automatically annotated on upload to MINERVA.

Metabolite

Metabolite (“simple chemical” glyph in SBGN PD specification, or “simple molecule” in CellDesigner) should be annotated with ChEBI ID. To add a ChEBI ID to a metabolite in CellDesigner, select the MIRIAM tab in the bottom panel of CellDesigner > click “Add relation” > from the drop-down menu choose “bqmodel:isDescribedBy” in the Relation field > then from the drop-down menu choose “ChEBI” in the DataType field, and then add value in the ID field, for example “CHEBI:15843” for arachidonic acid.

Manual annotation is advised for metabolites. Automatic annotation functionality in MINERVA works for metabolites to some extent but normally there are many synonyms and the only proper way to identify a metabolite is by finding it in a metabolic database via synonyms or via its structure.

Drug

Drug (dedicated “drug” element or “generic protein”, “simple molecule” or “unknown” in CellDesigner; “macromolecule”, “simple chemical” or “unspecified entity” glyphs in SBGN PD specification) should be annotated with an ID from a drug databases such as DrugBank, or metabolic databases such as ChEBI, PubChem or ChEMBL.

Complex

Complex (“complex” in CellDesigner; “complex” in SBGN PD specification) should be named according to its content, for example “FCER1A:FCER1G:MS4A2”, unless there is a special name for a particular complex, for example “FcεR1”. Additional annotation is optional and for that GO Cellular Component (CC) term or Complex Portal identifier can be used.

Compartment

Compartment (“compartment” in CellDesigner; “compartment” in SBGN PD specification) refers to a subcellular location or cell type and should be annotated with Cell Ontology, for example, Cell Ontology GO:0005737 for “cytoplasm”.

Phenotype

Phenotype (“phenotype” in CellDesigner; “phenotype” in SBGN PD specification) is considered a type of process in SBGN PD (see the PD Reference Cards) but in CellDesigner can be used also as a trigger or as a reference to a submap (correspondingly “perturbing agent” glyph and “submap” glyph in SBGN PD). To annotate phenotype in CellDesigner as a biological process, please use appropriate GO Biological Process term and GO Biological Process identifier. To annotate phenotype as a symptom or a disease state, please use appropriate MeSH term and MeSH identifier.

Interactions

Interactions should be annotated with references in the form of PMIDs or DOIs in case PMIDs are not available. These references provide evidence that the interaction exists in the context of the diseases. Three references should be provided for each interaction (Kondratova et al, 2018 PMID:29688383).

If provided evidence are built on cell or animal models, it is important to annotate the corresponding interactions and annotate them with the NCBI taxon ID for the non-human organism. For example, if the phosphorylation of STAT3 by JAK2 was determined in mice, the NCBI Taxon NCBI:txid10090 should be added.

Recommended publications

Niarakis A, Kuiper M, Ostaszewski M, Malik Sheriff RS, Casals-Casas C, Thieffry D, Freeman TC, Thomas P, Touré V, Noël V, Stoll G, Saez-Rodriguez J, Naldi A, Oshurko E, Xenarios I, Soliman S, Chaouiya C, Helikar T, Calzone L. Setting the basis of best practices and standards for curation and annotation of logical models in biology-highlights of the [BC]2 2019 CoLoMoTo/SysMod Workshop. Brief Bioinform. 2021 Mar 22;22(2):1848-1859. doi: 10.1093/bib/bbaa046. PMID: 32313939.

Hanspers K, Kutmon M, Coort SL, Digles D, Dupuis LJ, Ehrhart F, Hu F, Lopes EN, Martens M, Pham N, Shin W, Slenter DN, Waagmeester A, Willighagen EL, Winckers LA, Evelo CT, Pico AR. Ten simple rules for creating reusable pathway models for computational analysis and visualization. PLoS Comput Biol. 2021 Aug 19;17(8):e1009226. doi: 10.1371/journal.pcbi.1009226. PMID: 34411100.

Touré V, Vercruysse S, Acencio ML, Lovering RC, Orchard S, Bradley G, Casals-Casas C, Chaouiya C, Del-Toro N, Flobak Å, Gaudet P, Hermjakob H, Hoyt CT, Licata L, Lægreid A, Mungall CJ, Niknejad A, Panni S, Perfetto L, Porras P, Pratt D, Saez-Rodriguez J, Thieffry D, Thomas PD, Türei D, Kuiper M. The Minimum Information about a Molecular Interaction CAusal STatement (MI2CAST). Bioinformatics. 2021 Apr 5;36(24):5712-5718. doi: 10.1093/bioinformatics/btaa622. PMID: 32637990.

Bernal-Llinares M, Ferrer-Gómez J, Juty N, Goble C, Wimalaratne SM, Hermjakob H. Identifiers.org: Compact Identifier services in the cloud. Bioinformatics. 2021 Jul 19;37(12):1781-1782. doi: 10.1093/bioinformatics/btaa864. PMID: 33031499.

Juty N, Le Novère N, Laibe C. Identifiers.org and MIRIAM Registry: community resources to provide persistent identification. Nucleic Acids Res. 2012 Jan;40(Database issue):D580-6. doi: 10.1093/nar/gkr1097. Epub 2011 Dec 2. PMID: 22140103.

Thiele I, Palsson BØ. A protocol for generating a high-quality genome-scale metabolic reconstruction. Nat Protoc. 2010 Jan;5(1):93-121. doi: 10.1038/nprot.2009.203. Epub 2010 Jan 7. PMID: 20057383.