We carry out research on many topics in protein informatics, including the following areas.
Proteins can fold during their synthesis by the ribosome through the process of co-translational folding. Such co-translational folding occurs out of equilibrium, meaning that the speed of translation can drastically influence the ability of proteins to fold and function. Changes to translation speed can prevent folding, increase misfolding, cause aggregate formation, and are implicated as causal factors in cystic fibrosis, hemophilia, and certain cancers. Most protein structure prediction methods do not consider the biological process of translation when making predictions and thus neglect the influence of translation speed. I am working to incorporate aspects of the non-equilibrium nature of protein synthesis into the protein sequence-to-structure prediction method SAINT2. The updated software will hopefully improve predictions and provide a tool capable of predicting the influence of changes to translation speed on proteins.
The pharmaceutical industry regularly uses hydrogen-deuterium exchange mass spectrometry (HDX-MS) to inform key decisions in small molecule, antibody, and vaccine R&D. However, the statistical analysis of HDX-MS remains primitive, holding back important, potentially life-changing discoveries. One key complication is that peptide spectra are manually assessed for quality, and peptide masses are frequently corrected by domain experts. Furthermore, excessive amounts of HDX-MS data are discarded, and inappropriate statistical methods are routinely applied. I develop scalable and extensible software methods to improve reproducibility and interpretation in structural mass spectrometry, along with statistical and machine learning tools for analyzing such data.
Protein engineering — the design of protein variants with desirable properties — is a central pursuit in biotechnology. In therapeutic discovery, after a promising antibody candidate has been found, it is often necessary to reduce immunogenicity, eliminate aggregation or increase plasma half-life while preserving binding affinity. In synthetic biology, engineered enzymes — for example, PETases that can rapidly degrade plastic, or designed enzymes that can catalyse new reactions — can be improved by increasing thermal stability and enhancing expressibility while conserving, or even boosting, catalytic efficiency. These pursuits have traditionally been carried out experimentally, either by rationally designing mutations, or with directed evolution, techniques which are limited to a small number of tested variants. In recent years, novel computational tools have arisen that can screen hundreds of thousands or millions of variants in short times. I am interested in progressing this field by developing multimodal deep learning methods, which incorporate diverse sources of biological information, to deliver the next generation of protein engineering algorithms.
My research applies immunoinformatics to improve therapeutic design and to better our understanding of the immune response. During my DPhil, I captured and compared structural representations of therapeutic and natural antibodies, leading to new structure-aware approaches for in silico developability assessment and screening library design. My research is now focused on incorporating structural awareness to improve our ability to identify broader sets of antibodies with functional commonality and to define the functional boundaries of different classes of adaptive immune receptor.
Antibodies are proteins produced by the immune system to neutralise pathogens through interactions with their targets, called antigens, and present an attractive avenue for biotherapeutic development. The affinity of an antibody for its antigen is determined by the kinetic profile of the antibody-antigen interaction, and two antibodies with the same affinity for their antigens may have vastly different binding and unbinding rates. Even though affinity maturation is often part of antibody lead optimisation, high affinity does not always correlate with neutralising activity. My DPhil project investigates the determinants of binding rates, with the aim of providing an in silico tool with which these can be modulated. To experimentally validate my results, I plan to use antibodies effective against Plasmodium falciparum malaria whose neutralising activity has been shown to be highly dependent on binding rates.
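The point that equal affinity can hide very different kinetics can be illustrated with a minimal sketch. It assumes a simple 1:1 binding model, in which the equilibrium dissociation constant is K_D = k_off / k_on. The two antibodies and their rate constants below are hypothetical values chosen for illustration only, not measured data, and the function names are likewise illustrative.

```python
import math


def dissociation_constant(k_on: float, k_off: float) -> float:
    """Return K_D in M for a 1:1 binding model.

    k_on is the association rate constant (M^-1 s^-1),
    k_off is the dissociation rate constant (s^-1).
    """
    if k_on <= 0 or k_off <= 0:
        raise ValueError("rate constants must be positive")
    return k_off / k_on


def complex_half_life(k_off: float) -> float:
    """Return the half-life (s) of the bound complex, ln(2) / k_off."""
    if k_off <= 0:
        raise ValueError("k_off must be positive")
    return math.log(2) / k_off


# Hypothetical antibodies (illustrative values, not experimental data).
fast_binder = {"k_on": 1e6, "k_off": 1e-2}  # binds quickly, releases quickly
slow_binder = {"k_on": 1e4, "k_off": 1e-4}  # binds slowly, releases slowly

kd_fast = dissociation_constant(**fast_binder)
kd_slow = dissociation_constant(**slow_binder)
half_life_fast = complex_half_life(fast_binder["k_off"])
half_life_slow = complex_half_life(slow_binder["k_off"])

print(f"K_D fast binder: {kd_fast:.1e} M, half-life {half_life_fast:.0f} s")
print(f"K_D slow binder: {kd_slow:.1e} M, half-life {half_life_slow:.0f} s")
```

Under these assumed values, both antibodies have the same K_D, yet the bound complex of the slow binder persists about one hundred times longer, which is the kind of difference an affinity measurement alone would not reveal.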
Antibodies are a class of protein produced by B-cells during an immune response. Antibody binding is controlled by six loops known as the complementarity determining regions, or CDRs. The large structural diversity of the CDR-H3 loop enables antibodies to bind with high affinity and specificity to almost any antigen. On the other hand, this diversity also makes CDR-H3 structure prediction one of the main challenges in antibody modelling. Recent advances in deep learning have been shown to greatly improve general protein structure prediction. Applying the deep learning methods developed for general protein structure prediction, I aim to improve structural modelling of the CDR-H3 loop.
Antibodies are important as proteins of the immune system and as therapeutics. However, as a consequence of their complex nature, laboratory-based therapeutic antibody discovery is an expensive and time-consuming process. During my DPhil, I will be developing deep learning tools, inspired by the current advances in other machine learning fields (e.g. natural language processing), for improving antibody design and enabling the exploration of a larger antibody space.
Antibodies and alternative antibody molecules (e.g. nanobodies) are increasingly important classes of therapeutics characterized by high binding specificity and affinity for targets. However, developing antibody therapeutics is time-consuming and costly. My research aims to address this by developing machine learning methods to improve computational antibody design against a desired epitope. In particular, I will be focusing on in silico antibody affinity maturation and humanization.
Recently, the need for rapid vaccine and antibody-therapeutic development has become widely recognised. Thanks to the vast volume of antibody data now available, computational methods offer perhaps the greatest opportunity to both speed up and reduce the cost of this development process. My DPhil aims to advance this area further by using machine learning techniques to identify promising antibody-antigen leads with diverse sequence and structure profiles that can then be taken into the lab.
Antibodies are a highly successful class of biotherapeutic; however, their high molecular weight poses some challenges during production and manufacture. Nanobodies offer potential as an alternative, as they are much smaller and can show comparable specificity and affinity. However, developing biotherapeutics is non-trivial; issues can arise during manufacturing that may impede the success of the product. My research will concentrate on developing computational tools to highlight and predict developability issues in potential nanobody therapeutics.
Antibodies are an important component of the immune system and are increasingly used as therapeutics. Recent advances in protein structure modelling make it possible to accurately predict the structure of antibodies from their amino acid sequence. A limitation of current structure prediction tools is that they only predict the structure of a single conformation of an antibody. However, antibodies are flexible molecules that frequently transition between a set of distinct structural conformations and flexibility is key to many functional properties. During my DPhil, I aim to develop antibody structure prediction tools that capture the flexibility of antibodies and predict the structure of multiple conformations.
T cells are a key part of our immune system, responsible for fighting pathogens and regulating immune responses. To identify foreign invaders, T cells use their receptors (TCRs) to rapidly screen and identify antigens. Although key to our health and survival, the map between TCR composition and antigens is still poorly understood. I aim to apply newly developed deep learning models in protein structure prediction to TCR data to better understand the rules that govern antigen-specific T cell response.
My primary focus is on methods development in computer-aided drug discovery, chiefly in high throughput docking, ligand-based virtual screening, network pharmacology, cheminformatics, bioinformatics, machine learning and more recently protein engineering. Current research projects include: addressing the limitations of scoring functions in docking, in particular to improve our understanding of molecular recognition of small molecules; handling receptor flexibility in protein-ligand docking; and fragment-based drug discovery.
Molecular machine learning techniques have recently shown great promise for important computational drug discovery tasks such as molecular property prediction and activity cliff prediction. The success of such methods, however, crucially depends on the way in which molecular compounds are transformed into informative feature vectors that can be fed into a machine learning pipeline. This is referred to as the problem of molecular representation. In my DPhil project, I am investigating the potential of modern graph-based molecular representation techniques to outperform classical molecular representations such as structural fingerprints and physicochemical descriptor vectors. I am particularly interested in developing novel self-supervised learning strategies for graph neural networks operating on molecular graphs, identifying and removing hidden performance barriers of state-of-the-art molecular representation methods, and using the gained insights to design new tailored deep learning architectures for molecular property prediction and activity cliff prediction.
Accurately predicting the binding affinity of a small molecule to a protein target is a key problem in both molecular docking and virtual screening. My research involves investigating how machine learning techniques can be used to effectively leverage the increasing abundance of binding affinity and protein structure data to improve the scoring functions used to predict binding affinity. I am particularly interested in identifying which molecular features are most informative of binding activity and how this varies between families of proteins.
Fragment-based drug screening is a popular approach for drug discovery in which a set of small chemical compounds is assayed against biological targets in order to identify weakly binding hits that can later be exploited to produce lead compounds. My research focuses on the development of machine learning tools that exploit 3D information about hits and protein targets with the aim of facilitating drug discovery. I am particularly interested in automatic fragment merging and compound scoring using unsupervised machine learning techniques. In collaboration with the XChem team, we aim to implement ready-to-use applications that can help experimentalists with no computational skills to perform better experiments.
Immunomodulatory imide drugs (IMIDs) are an important class of medicinal compounds used in the treatment of multiple myeloma. They include the drug thalidomide, as well as its analogues lenalidomide and pomalidomide. Only in recent years has the mechanism of action for IMIDs in human cells been discovered: the IMID acts as a ‘molecular glue’, bridging together the ubiquitin E3 ligase cereblon with a variety of other proteins, known as ‘neosubstrates’. After binding to cereblon, the neosubstrates become polyubiquitinated and subsequently degraded - the cellular ‘kiss of death’. Many of cereblon’s neosubstrates share a common region based on a zinc finger motif. Structural studies have shown that this motif - known as a ‘degron’ - is required for the neosubstrate to bind to the cereblon/IMID binary complex. Furthermore, this degron can be added to genes for other, unrelated proteins, resulting in a fusion protein that contains an inducible off-switch. Addition of an IMID will cause the fusion protein to become polyubiquitinated and degraded. Thalidomide and its known analogues are not always suitable for this kind of system, as they have a variety of off-target effects. My research aims to explore how the structures of both the IMID small molecule inducer and the fusion protein’s degron can be modified so that the IMID/degron ‘off switch’ system is orthogonal: i.e. the small molecule will only interact with its intended target, and not any of the natural, degron-containing proteins present in human cells.
Fragment-based drug design campaigns rely on initial fragment screens thoroughly exploring the target binding site. However, this is not guaranteed with current fragment libraries, which are designed to be as diverse as possible. In collaboration with XChem at Diamond Light Source, I am using historic fragment screens to further understand protein-fragment interactions. This will provide a basis for the generation of target-specific fragment libraries that can more comprehensively explore the binding site and increase the potential for diverse lead compounds.
Ideally, in drug discovery, once a target has been identified and characterised, it should be possible to efficiently and exhaustively search chemical space for bioactive molecules with specific physicochemical properties. Recent progress in computational capabilities and machine learning methods means that de novo molecular generation tools are a step toward the above ideal. In collaboration with Exscientia, my research will focus on further developing machine learning tools for molecular design. Subsequent work will involve the elucidation and development of chemical synthesis pathways that will enable us to make new compounds to tackle disease more effectively.
My overarching research interest is to develop machine learning and specifically deep learning algorithms that work reliably and robustly in the setting of early-stage drug discovery. Recent years have seen a resurgence of interest in using deep learning algorithms to predict pharmaceutically relevant properties of small molecules and biologics. However, the settings in which they would have the most practical impact typically violate the explicit and implicit assumptions that underpin many state-of-the-art approaches to training and inference. The central objective of my work is to modify and adapt modern machine learning algorithms to ensure their usefulness in practical settings.
Fragment-based drug discovery (FBDD) involves screening low-molecular-weight compounds against a target of interest; the resulting hits can then be optimized into larger, more potent lead-like compounds. In collaboration with XChem, I will be exploring how to exploit the rich structural data that result from crystallographic fragment screens to guide fragment-to-lead optimization, primarily using fragment merging approaches. Initial work will focus on improving the efficiency with which we can sample accessible chemical space by identifying fragment merges from commercially available compound libraries, thus overcoming issues with synthetic accessibility. Subsequent work will explore how to prioritize molecules for purchase and/or synthesis and the use of de novo design to generate novel compounds.
Fragment-based drug discovery consists of developing compounds for a target beginning from fragments that are known to weakly bind to it. When elaborating on a fragment in such campaigns, information about known ligands and the protein pocket can both be leveraged to maximise the binding ability of the end result. However, using information about known ligands has been demonstrated to bias the drug design process towards compounds similar to those already in use. In collaboration with IBM Research, I am investigating ways to perform fragment elaboration using exclusively information from the protein pocket.
I am interested in machine learning models that predict protein-ligand binding. Currently I am working with graph neural networks and generative models.
Computational tools in drug discovery, especially machine-learning-based tools, are typically tested on clean and often flawed benchmarks, leading to an optimistic characterisation of a tool's ability. My research explores how this gap between reported performance and real-life performance can be accounted for and measured, specifically when predicting the binding affinity between a small molecule drug and a protein target. I hope to build upon this to improve the accuracy and generalisability of these models and then incorporate structural uncertainty into their decision-making.
My research aims to increase the efficiency of hit finding and development in small molecule drug discovery. By utilizing statistical and machine learning techniques, we can tackle the difficult task of deconvoluting structural and biophysical data of thousands of compounds, which can reduce the time and improve the accuracy of identifying hits. In my collaboration with XChem, I am working directly with crystallographers and chemists to develop these computational tools that will reduce their workload.
Antimicrobial resistance (AMR) is one of the leading public health concerns of the 21st century and is becoming an increasingly intractable problem, as the continued overuse of antimicrobials in health and agriculture is accelerating the rate at which resistance develops and propagates. In collaboration with Oracle, my DPhil project therefore focuses on building generalisable machine learning and deep learning models, featurised with structural and physicochemical information of the drug target, to predict AMR in Mycobacterium tuberculosis within a diagnostic framework. I am also a member of the Modernising Medical Microbiology group at the NDM.