Knowledge formalisation

PubMed search
PMC full-text search
Reference manager
Figure search
Pathway databases

In the context of developing disease maps, “knowledge formalisation” refers to the practice of transforming textual information and figures in publications to the structured format of the Systems Biology Graphical Notation (SBGN) standard (PMID: 19668183). Scattered information about disease mechanisms from hundreds of scientific articles is integrated into a single conceptual disease model.

Text to SBGN

An example of how textual information from an article is transformed into a diagram in the SBGN Process Description format in CellDesigner is shown on a dedicated “Text to SBGN” page.

Figure to SBGN

How to transform a static cartoon figure from a paper to a standard representation in SBGN is demonstrated in the “Figure to SBGN” project accessible on the Systems Biology Graphical Notation website. The “γCaMKII shuttles Ca²⁺/CaM to the nucleus to trigger CREB phosphorylation and gene expression” diagram was created based on the graphical abstract from the paper by Ma et al, Cell, 2014 (PMID: 25303525). Both Process Description and Activity Flow versions are available.

To search efficiently for articles of interest in PubMed, it is important to know basic search techniques for prioritising publications, focusing on a manageable number of highly relevant papers and then exploring from there.

An example search for reviewing disease mechanisms

A PubMed search for “asthma mechanisms” (accessed 2022-11-16) is shown in Table 1. Each change to the search filters or the search query reduces the number of results, until a fairly small number of highly relevant publications remains to be selected and read first.

Table 1. An example of optimising search query and search filters for finding and shortlisting relevant publications.

Search query | Filters | Search results
asthma mechanisms | no filters | 21,012
asthma mechanisms | Article Type: Review | 7,072
asthma mechanisms | Article Type: Review, Publication Date: 5 years | 1,900
asthma[title] mechanisms | Article Type: Review, Publication Date: 5 years | 623
asthma[title] mechanisms[title] | Article Type: Review, Publication Date: 5 years | 63
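
The same narrowing can be scripted against the NCBI E-utilities ESearch service. The sketch below is only an illustration: the helper names `build_term` and `esearch_url` are hypothetical, and it assumes that the web filters “Article Type: Review” and “Publication Date: 5 years” correspond to the field-tag expressions `review[pt]` and `"last 5 years"[dp]`. Counts returned this way may differ from those in Table 1, since the database grows and the web interface may apply its filters differently.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_term(query, review_only=False, last_years=None):
    """Append PubMed field-tag filters to a query (assumed filter equivalents)."""
    parts = [f"({query})"]
    if review_only:
        parts.append("review[pt]")  # assumed equivalent of Article Type: Review
    if last_years is not None:
        parts.append(f'"last {last_years} years"[dp]')  # assumed equivalent of the date filter
    return " AND ".join(parts)

def esearch_url(term, db="pubmed"):
    """Build an ESearch URL; retmax=0 asks for the hit count without any IDs."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": 0}
    return ESEARCH + "?" + urlencode(params)

# The five rows of Table 1 as (query, review_only, last_years)
TABLE_1_ROWS = [
    ("asthma mechanisms", False, None),
    ("asthma mechanisms", True, None),
    ("asthma mechanisms", True, 5),
    ("asthma[title] mechanisms", True, 5),
    ("asthma[title] mechanisms[title]", True, 5),
]

for query, review_only, last_years in TABLE_1_ROWS:
    print(esearch_url(build_term(query, review_only, last_years)))
```

Opening each printed URL (or fetching it with any HTTP client) returns a JSON document in which the number of hits is reported in the `count` field of `esearchresult`.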

The search query itself can be made more sophisticated, especially with the use of the Advanced Search. Normally, though, it is sufficient to combine terms with the Boolean operators AND, OR and NOT (must be entered in UPPERCASE): AND requires all of the combined terms to be present, while OR matches any of the listed terms, for example synonyms. Example query:

asthma AND (mechanisms OR pathway OR pathways)

Quotation marks make it possible to search for exact phrases when needed, for example:

“asthma mechanisms” OR “mechanisms of asthma”
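
When many synonyms are involved, queries of this kind can be assembled programmatically. The following is a minimal sketch with hypothetical helper names (`quote_phrase`, `any_of`, `all_of`); it only builds the query string and assumes that multi-word terms should always be treated as exact phrases.

```python
def quote_phrase(term):
    """Wrap multi-word terms in straight double quotes for exact-phrase search."""
    return f'"{term}"' if " " in term else term

def any_of(terms):
    """Join synonyms with OR; parenthesise when there is more than one term."""
    joined = " OR ".join(quote_phrase(t) for t in terms)
    return f"({joined})" if len(terms) > 1 else joined

def all_of(*groups):
    """Join concept groups with AND; each group is a list of synonyms."""
    return " AND ".join(any_of(group) for group in groups)

print(all_of(["asthma"], ["mechanisms", "pathway", "pathways"]))
# asthma AND (mechanisms OR pathway OR pathways)

print(any_of(["asthma mechanisms", "mechanisms of asthma"]))
# ("asthma mechanisms" OR "mechanisms of asthma")
```

The resulting strings can be pasted directly into the PubMed search bar.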

Search to confirm a connection or to find a specific detail

Often the objective is not to systematically review all relevant publications but to find out about a specific connection or a specific detail. For example, to find what site is phosphorylated during protein activation, it is enough to find a small number of papers with the required information.

After finding a good paper, another useful tactic is to explore related publications: find the “Similar articles” section at the bottom of the page and click “See all similar articles”.
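
Similar articles can also be retrieved programmatically through the NCBI E-utilities ELink service, where the link name `pubmed_pubmed` refers to PubMed's precomputed related articles. The sketch below only builds the request URL; the function name `similar_articles_url` is hypothetical, and the PMID is the Ma et al. paper cited above, used purely as an example.

```python
from urllib.parse import urlencode

ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"

def similar_articles_url(pmid):
    """Build an ELink URL requesting PubMed records related to the given PMID."""
    params = {
        "dbfrom": "pubmed",           # database the starting PMID belongs to
        "db": "pubmed",               # database to find linked records in
        "id": str(pmid),
        "linkname": "pubmed_pubmed",  # PubMed's related-articles link set
        "retmode": "json",
    }
    return ELINK + "?" + urlencode(params)

print(similar_articles_url(25303525))
```

The response lists the PMIDs of related records, which can then be screened in the same way as the results of a normal search.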

For a very specific topic with limited information, there is an option to search full-text papers in PubMed Central (PMC). Please note that PMC holds full text for only about 40% as many articles as are indexed in PubMed. The differences between PubMed and PMC are explained in the review by Williamson & Minter, 2019 (PMID: 30598645) and also discussed here. To compare the current content of PubMed and PMC, type “1800:2100[dp]” or “all[sb]” in the search bar of each database. For recent publications from 2010 onwards, which are our main interest (accessed 2022-11-16), the search “2010:2100[dp]” returns 14,934,340 articles in PubMed (100%) and 6,053,618 full-text articles in PMC (40.53%).
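
The percentage quoted above is simply the ratio of the two counts. A short sketch of the arithmetic, using the figures from the 2022-11-16 search (the function name `coverage_percent` is hypothetical), makes it easy to repeat the comparison with current counts:

```python
def coverage_percent(pmc_count, pubmed_count):
    """PMC full-text articles as a percentage of PubMed records."""
    return 100.0 * pmc_count / pubmed_count

# Counts for the query 2010:2100[dp], accessed 2022-11-16
pubmed_since_2010 = 14_934_340
pmc_since_2010 = 6_053_618

print(f"{coverage_percent(pmc_since_2010, pubmed_since_2010):.2f}%")
# 40.53%
```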

Europe PMC

Europe PMC offers other possibilities for exploring research papers and includes text mining capabilities for better search of relevant papers (Ferguson et al, 2021 PMID: 33180112).

Reference manager

To organise the publications found, a reference manager such as Zotero can be used. It is well integrated with various browsers, and papers can easily be saved individually or in bulk. Introductions to Zotero functionalities are available on many university websites and are easy to find. One useful functionality is a shared library that many users can access. We also suggest creating a system of folders and subfolders for references. They work as tags, and one paper can appear in several folders. This way, when revisiting the literature during map development and annotation, it is easier to find publications relevant to a certain topic or subpathway.

Figures in published articles are a special and very useful resource for developing disease maps. While they are not in a standard or machine-readable format, the understanding of mechanisms is often very well conveyed and there is a lot of work and expertise behind these conceptual representations and graphical summaries. A good figure from a review paper can be the key starting point for creating a disease map or a top-level overview layer of a disease map.

PMC Image Search is a good way to find such figures in PubMed Central. The way it works is described in the NLM Technical Bulletin in the “Direct Access to Images” section and Figures 7-11.

Google Images is another efficient and convenient search option. For example, a search for “asthma mechanisms” will immediately offer relevant images from scientific publications. Adding “nature.com” to the search will narrow it down to publications in Nature journals.

Cell SnapShot is one more interesting resource with many useful visualisations including ones dedicated to various diseases.

Pathway databases

Information from well-curated pathway databases such as Reactome, PANTHER and KEGG can be reused, provided it is put into the disease context: the content should be confirmed against disease-related publications and the original pathway modified if needed.