Ongoing Projects

  • Automated evidence synthesis from published literature, e.g. in human behaviour change and addiction. Automated evidence aggregation is essential for keeping up with the ever-increasing body of published literature, and for structuring and improving the reporting of novel results to further enable cumulative domain science.

  • Integration of symbolic and sub-symbolic AI, e.g. for the automated extension of ontologies of structured objects, with a case study on automatically extending the ChEBI ontology.

  • Semantic representation of entities for mental health and the social determinants of health. I am particularly interested in the role that (neutral) information systems can fulfil as a mediator between vastly different scientific and clinical perspectives, since in mental health research there are wide gaps in terminology and methodological approach between, for example, clinical and psychological research, the social sciences and the biological sciences. I currently develop the Mental Functioning suite of ontologies for mental health research.

  • Automated comparison and integration of theories in the behavioural sciences with a view to better supporting the use of theory in intervention development and the aggregation of evidence which has been gathered from different theoretical perspectives.

Historical Projects

For my PhD in computational biology, I investigated the metabolic influences on ageing in C. elegans, using a variety of data science methods to study a time-series dataset of interlinked gene expression and metabolite level measurements. In addition to implementing a standard workflow of data cleaning, harmonisation, normalisation and interpretation with regression-based models for each data layer, I was particularly interested in developing integrative ‘multi-omics’ approaches to co-analyse the two layers together. To this end I investigated methods for the data-driven inference of bipartite gene-metabolite networks. One strand of my research used constraint-based mathematical modelling of metabolism, specifically an approach known as flux balance analysis. I contributed to the development of the WormJam whole-genome model of C. elegans metabolism, and I developed a novel method for combining time-series metabolomics data with this type of metabolic model.
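As a rough illustration of the idea behind flux balance analysis, the sketch below searches for reaction fluxes that keep every internal metabolite at steady state while maximising a chosen objective flux. Everything in it is an assumption made for illustration: the four-reaction network, the bounds and the function names are hypothetical and are not taken from WormJam, and a brute-force search over an integer grid stands in for the linear programming solver that real tools use.

```python
from itertools import product

# Hypothetical toy network (not from WormJam):
#   uptake:  -> A      convert: A -> B
#   waste:   A ->      biomass: B ->
REACTIONS = ["uptake", "convert", "waste", "biomass"]

# Stoichiometric matrix: one row per metabolite, one column per reaction.
S = {
    "A": [1, -1, -1, 0],
    "B": [0, 1, 0, -1],
}

# Illustrative upper bounds on each flux; lower bounds are all zero.
UPPER = {"uptake": 10, "convert": 6, "waste": 10, "biomass": 10}


def is_steady_state(fluxes):
    """True if production equals consumption for every metabolite (S . v = 0)."""
    return all(
        sum(coeff * v for coeff, v in zip(row, fluxes)) == 0
        for row in S.values()
    )


def toy_fba(objective="biomass"):
    """Return a steady-state flux distribution that maximises the objective flux.

    Brute-force enumeration over integer fluxes replaces the LP solver
    used in practice; it is only workable for a network this small.
    """
    target = REACTIONS.index(objective)
    grids = [range(UPPER[name] + 1) for name in REACTIONS]
    feasible = (v for v in product(*grids) if is_steady_state(v))
    best = max(feasible, key=lambda v: v[target])
    return dict(zip(REACTIONS, best))


if __name__ == "__main__":
    print(toy_fba())
```

In this toy network the biomass flux is capped by the bound on the `convert` reaction. The optimal flux distribution is not unique, since any surplus uptake can leave through `waste`, which is a common feature of this kind of model.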

Historically, I was involved in the development of the ChEBI chemical ontology, the CHEMINF ontology of chemical information entities, the eNanoMapper ontology for nanomaterials, and the Basic Formal Ontology, a shared, standardised upper-level ontology. While working on ChEBI I became interested in the relationship between logic-based knowledge representations (i.e. ontologies) and chemical classifications based on graph-based representations of chemical structure, with the aim of achieving structure-based chemical ontology classification.

Structure-based chemical ontology classification refers, broadly, to two interconnected sets of capabilities. The first is for a hierarchy of chemical ontology classes to be dynamic and computable (i.e. self-rearranging when content is updated), including the specification of ‘full’ (i.e. necessary and sufficient) class definitions. The second is for novel structurally defined chemical entities (e.g. molecules) to be placed automatically and appropriately within this ontology hierarchy. Ontology technologies such as the semantic web standard Web Ontology Language (OWL) are able to perform very efficient automated classification of large knowledge bases based on logically encoded full class definitions. The cheminformatics domain, on the other hand, has technologies for encoding, defining and matching chemical structures at the class level, e.g. the SMARTS language for specifying classes of molecules based on structural patterns. Chemical ontology sits uneasily between these two traditions with their separate technological infrastructures, and creating bridges between them to enable structure-based chemical ontology classification is an exciting research frontier.

My early attempt to represent chemical structures within an extension of the OWL ontology language that supports graph-like structures was reported in 2010, and this work was later developed into a separate logical formalism dedicated to representing and reasoning with graph-based structures and rules. However, as this formalism is not part of the core OWL ontology language, it has not been widely adopted so far, and native cheminformatics technologies for performing operations on chemical structures are still more efficient.

Within the cheminformatics community, the SMARTS language provides a natural formalism for encoding chemical class definitions. Both commercial and non-commercial methods have been developed to associate chemical ontology classes with SMARTS patterns for the purpose of linking cheminformatics to chemical ontology. At the time of writing, the ClassyFire algorithm is the state of the art for structure-based chemical ontology classification, with the largest knowledge base of rules and an associated ontology of 4,825 classes. However, ClassyFire does not harness OWL technology: the SMARTS chemical class definitions are not integrated with the definitions of the ontology classes, and neither are the rules for selecting the most appropriate parent when several structural matches are obtained. Thus, the associated chemical ontology still has to be maintained manually, and updating the integrated knowledge system can only be accomplished by updating the custom software suite.
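
The two capabilities described above can be illustrated with a deliberately simplified sketch. It assumes that each class has a full definition given as a set of required structural features; plain strings stand in for what would in practice be SMARTS pattern matches or OWL class expressions. The class names, feature names and functions are hypothetical, and this is not how ChEBI, ClassyFire or an OWL reasoner is implemented.

```python
# Hypothetical full (necessary and sufficient) definitions:
# a molecule belongs to a class exactly when it has all listed features.
DEFINITIONS = {
    "molecule": frozenset(),
    "carboxylic acid": frozenset({"carboxy group"}),
    "amine": frozenset({"amino group"}),
    "amino acid": frozenset({"carboxy group", "amino group"}),
}


def subsumes(definitions, parent, child):
    """A class subsumes another if it requires strictly fewer features."""
    return definitions[parent] < definitions[child]


def most_specific(definitions, candidates):
    """Drop every candidate that subsumes another candidate."""
    return sorted(
        c for c in candidates
        if not any(subsumes(definitions, c, other) for other in candidates)
    )


def direct_parents(definitions, cls):
    """Capability 1: compute a class's parents from the definitions alone."""
    ancestors = [p for p in definitions if subsumes(definitions, p, cls)]
    return most_specific(definitions, ancestors)


def classify(definitions, features):
    """Capability 2: place a new entity under its most specific matching classes."""
    matches = [c for c, required in definitions.items() if required <= features]
    return most_specific(definitions, matches)


if __name__ == "__main__":
    print(direct_parents(DEFINITIONS, "amino acid"))
    print(classify(DEFINITIONS, {"carboxy group", "amino group", "hydroxy group"}))
```

Because the hierarchy is derived from the definitions rather than asserted by hand, adding a new definition to the dictionary changes the results of `direct_parents` and `classify` without any manual re-parenting. Keeping only the most specific matches is one simple way to choose among several structural matches; real systems use richer rules.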

I have also worked on developing methods to better harness the knowledge encoded in ontologies for research purposes, and on best practices in ontology development.