We carry out research on many topics in protein informatics, including the following areas.
Proteins can fold while the ribosome is still synthesising them, a process known as co-translational folding. Co-translational folding occurs out of equilibrium, so the speed of translation can drastically influence a protein's ability to fold and function. Changes in translation speed can prevent folding, increase misfolding and cause aggregate formation, and they have been implicated as causal factors in cystic fibrosis, hemophilia, and certain cancers. Most protein structure prediction methods ignore the biological process of translation and therefore neglect the influence of translation speed. I am working to incorporate aspects of the non-equilibrium nature of protein synthesis into SAINT2, a protein sequence-to-structure prediction method. We hope the updated software will improve predictions and provide a tool for predicting how changes in translation speed affect proteins.
The pharmaceutical industry regularly uses hydrogen-deuterium exchange mass spectrometry (HDX-MS) to inform key decisions in small-molecule, antibody, and vaccine R&D. However, the statistical analysis of HDX-MS data remains primitive, holding back important and potentially life-changing discoveries. One key complication is that peptide spectra are assessed for quality manually, and peptide masses are frequently corrected by domain experts. Furthermore, large amounts of HDX-MS data are discarded, and inappropriate statistical methods are routinely applied. I develop scalable and extensible software to improve reproducibility and interpretation in structural mass spectrometry, along with statistical and machine learning tools for analyzing such data.
Allosteric regulation is common among proteins, but little is known about how it works at the residue level. This project links the analysis of allostery with the analysis of co-evolution between protein residues. Co-evolution occurs when the interaction between two (or more) residues is crucial for a protein’s stability or function, e.g. for allosteric signal transmission. My research aims to develop a pipeline for automated allostery analysis based solely on sequence information. Few tools are currently available to analyse allostery, and none of them are high-throughput, so developing and improving such basic tools could be a first step towards understanding allosteric signal transmission, and potentially towards improving combination therapies in which two drugs target a protein at different sites.
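Co-evolution between residues is usually detected by scoring pairs of columns in a multiple sequence alignment. As a minimal sketch, the code below uses plain mutual information, a simple and rather noisy co-evolution score that is not necessarily the measure used in this project, on a hypothetical toy alignment:

```python
import math
from collections import Counter
from itertools import combinations


def column(msa, i):
    """Return the residues at alignment position i across all sequences."""
    return [seq[i] for seq in msa]


def mutual_information(msa, i, j):
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(msa)
    counts_i = Counter(column(msa, i))
    counts_j = Counter(column(msa, j))
    counts_ij = Counter(zip(column(msa, i), column(msa, j)))
    mi = 0.0
    for (a, b), count in counts_ij.items():
        p_ab = count / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


def rank_pairs(msa):
    """Score every pair of columns and sort from most to least co-varying."""
    length = len(msa[0])
    scores = {
        (i, j): mutual_information(msa, i, j)
        for i, j in combinations(range(length), 2)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy alignment: columns 0 and 2 change together, column 1 is conserved.
msa = ["AKE", "AKE", "GKD", "GKD"]

top_pair, score = rank_pairs(msa)[0]
print(top_pair, round(score, 3))  # (0, 2) 0.693
```

Raw mutual information is confounded by conservation and phylogeny, which is why practical co-evolution methods apply corrections or use global statistical models on top of this basic idea.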
Rhizobium species are bacteria that establish a symbiotic relationship with legumes. The bacteria convert atmospheric nitrogen into ammonia, which is then used by the plant. Identifying all the genes and proteins involved in this process is fundamental to improving crop growth. The objective of my project is to build a gene coexpression network that yields new insights into nitrogen fixation and bacterial metabolism.
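A common starting point for a gene coexpression network is to link genes whose expression profiles are strongly correlated across samples. The sketch below is a generic illustration, not the method used in this project: it assumes a small table of expression values per gene, uses plain Pearson correlation with a hypothetical cutoff of 0.9, and uses made-up gene names. Real analyses usually add normalisation and more robust thresholding.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no co-variation signal
    return cov / (sd_x * sd_y)


def coexpression_network(expression, threshold=0.9):
    """Return an adjacency dict linking genes whose |Pearson r| >= threshold."""
    network = {gene: set() for gene in expression}
    for g1, g2 in combinations(expression, 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            network[g1].add(g2)
            network[g2].add(g1)
    return network


# Hypothetical expression values for three genes across four conditions.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [5.0, 1.0, 4.0, 2.0],
}
net = coexpression_network(expression)
print(net)
```

Here only geneA and geneB are linked, since their profiles rise together (r of about 0.999), while geneC correlates only weakly with either.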
Synaptic plasticity is the modulation of synapses to change the strength of the signal response in receiving neurons. It is a key mechanism in current models of learning and memory. Synapses are modulated in response to signals. These signals are mediated by small molecules such as calcium ions and are integrated to produce a signalling response in the receiving neuron, creating a cascade of information processing and dissemination within the brain. Dendrites form a geometrically complex, branching tree along which the signal-receiving, modifiable synaptic spines are embedded. Calcium dynamics within dendrites and between spines shape signal integration and synaptic plasticity in ways that are not well understood. I am working with collaborators in the Emptage Lab at the Department of Pharmacology to develop a systematic multilayer network approach to analysing 3D dynamic calcium images of dendrites. The data are generated using a novel light sheet microscopy protocol being developed by Nigel Emptage and Peter Haslehurst (Haslehurst et al., 2018). Apart from providing new hypotheses for these collaborators to test, we hope that this research will generate exciting insights into the complex relationship between dendritic tree topology, calcium signal integration and the many modes of synaptic plasticity.
Haslehurst, P. et al., 2018. Fast volume-scanning light sheet microscopy reveals transient neuronal events. Biomedical Optics Express, 9(5), pp. 2154-2167.
My research applies immunoinformatics to improve therapeutic design and to better understand the immune response. During my DPhil, I captured and compared structural representations of therapeutic and natural antibody “space”, leading to new structure-aware approaches for in silico developability assessment and screening library design. My research now focuses on incorporating structural awareness to improve our ability to identify broader sets of antibodies with functional commonality, and to define the functional boundaries of different classes of adaptive immune receptors.
Monoclonal antibodies have taken a leading role in the drug landscape in recent years, in large part due to their potential in immuno-oncology, and global sales of monoclonal antibody therapeutics are steadily increasing. Current early-stage antibody drug development relies heavily on time- and cost-intensive experimental screens.
In my research, I aim to develop machine learning methods, with a particular focus on deep learning, for the in silico prediction of antibody properties from sequence or structure, in order to enable rapid exploration of antibody space for drug development.
Advances in next-generation sequencing mean that the humoral immune response can be dissected with increasing precision. Structural annotation of this rapidly expanding collection of sequence data connects sequence to function and can be used as a tool in the development of antibody-based therapeutics. My work will place particular emphasis on predicting binding specificity, with a view to applications in vaccine development.
Antibodies are the principal effector proteins of the immune system; they target and inactivate pathogens. Next-generation sequencing of antibodies has made an abundance of immunoglobulin sequencing (Ig-seq) data available, and a challenging area of research is how to use these data to accurately predict the likelihood of an antibody binding to a target. My DPhil project explores how Ig-seq data can best be used to improve the design of antibody therapeutics, combining sequence and structural information to understand and predict antibody binding more accurately.
Antibodies are proteins produced by the immune system to neutralise pathogens through interactions with their targets, called antigens, and they present an attractive avenue for biotherapeutic development. The affinity of an antibody for its antigen is determined by the kinetic profile of the antibody-antigen interaction, and two antibodies with the same affinity for their antigens may have vastly different binding and unbinding rates. Even though affinity maturation is often part of antibody lead optimisation, high affinity does not always correlate with neutralising activity. My DPhil project investigates the determinants of binding rates, with the aim of providing an in silico tool with which these rates can be modulated. To validate my results experimentally, I plan to use antibodies effective against Plasmodium falciparum malaria whose neutralising activity has been shown to be highly dependent on binding rates.
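The link between affinity and kinetics can be written with the standard equilibrium dissociation constant. The rate constants below are hypothetical values, chosen only to illustrate how two antibodies can share an affinity while differing tenfold in both rates, and the half-life assumes simple first-order dissociation:

```latex
K_D = \frac{k_{\mathrm{off}}}{k_{\mathrm{on}}}, \qquad t_{1/2} = \frac{\ln 2}{k_{\mathrm{off}}}

\text{Antibody 1: } k_{\mathrm{on}} = 10^{5}\,\mathrm{M^{-1}\,s^{-1}},\; k_{\mathrm{off}} = 10^{-4}\,\mathrm{s^{-1}} \;\Rightarrow\; K_D = 10^{-9}\,\mathrm{M},\; t_{1/2} \approx 6.9 \times 10^{3}\,\mathrm{s}

\text{Antibody 2: } k_{\mathrm{on}} = 10^{6}\,\mathrm{M^{-1}\,s^{-1}},\; k_{\mathrm{off}} = 10^{-3}\,\mathrm{s^{-1}} \;\Rightarrow\; K_D = 10^{-9}\,\mathrm{M},\; t_{1/2} \approx 6.9 \times 10^{2}\,\mathrm{s}
```

Both antibodies have a dissociation constant of 1 nM, yet the complex formed by the second dissociates about ten times faster (a half-life of roughly 12 minutes rather than roughly 2 hours).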
Antibodies are a class of protein produced by B cells during an immune response. Antibody binding is controlled by six loops known as the complementarity-determining regions, or CDRs. The large structural diversity of the CDR-H3 loop enables antibodies to bind with high affinity and specificity to almost any antigen. However, this diversity also makes CDR-H3 structure prediction one of the main challenges in antibody modelling. Recent advances in deep learning have greatly improved general protein structure prediction. By applying deep learning methods developed for general protein structure prediction, I aim to improve structural modelling of the CDR-H3 loop.
Antibodies are important both as proteins of the immune system and as therapeutics. However, because of their complex nature, laboratory-based therapeutic antibody discovery is an expensive and time-consuming process. During my DPhil, I will develop deep learning tools, inspired by recent advances in other machine learning fields (e.g. natural language processing), to improve antibody design and enable the exploration of a larger antibody space.
Antibodies and alternative antibody molecules (e.g. nanobodies) are increasingly important classes of therapeutics, characterised by high binding specificity and affinity for their targets. However, developing antibody therapeutics is time-consuming and costly. My research aims to address this by developing machine learning methods to improve computational antibody design against a desired epitope. In particular, I will focus on in silico antibody affinity maturation and humanisation.
Recently, the need for rapid development of vaccines and antibody therapeutics has become widely recognised. Thanks to the vast volume of antibody data now available, computational methods offer perhaps the greatest opportunity to both speed up and reduce the cost of this development process. My DPhil aims to advance this area further by using machine learning techniques to identify promising antibody-antigen leads with diverse sequence and structure profiles that can then be taken into the lab.
My primary focus is on methods development in computer-aided drug discovery, chiefly in high-throughput docking, ligand-based virtual screening, network pharmacology, cheminformatics, bioinformatics, machine learning and, more recently, protein engineering. Current research projects include addressing the limitations of scoring functions in docking, in particular to improve our understanding of the molecular recognition of small molecules; handling receptor flexibility in protein-ligand docking; and fragment-based drug discovery.
Accurately predicting the binding affinity of a small molecule to a protein target is a key problem in both molecular docking and virtual screening. My research involves investigating how machine learning techniques can be used to effectively leverage the increasing abundance of binding affinity and protein structure data to improve the scoring functions used to predict binding affinity. I am particularly interested in identifying which molecular features are most informative of binding activity and how this varies between families of proteins.
Fragment-based drug screening is a popular approach to drug discovery in which a set of small chemical compounds is assayed against biological targets to identify weakly binding hits that can later be developed into lead compounds. My research focuses on developing machine learning tools that exploit 3D information about hits and protein targets to facilitate drug discovery. I am particularly interested in automatic fragment merging and compound scoring using unsupervised machine learning techniques. In collaboration with the XChem team, we aim to implement ready-to-use applications that help experimentalists without computational expertise to perform better experiments.
I am interested in using statistics and machine learning to robustly automate tasks in structure-based drug discovery. My current project is developing an algorithm to autofit ligands into PanDDA event maps. I work with Prof. Charlotte Deane (Oxford), Prof. Frank von Delft (Oxford) and Prof. Gerard Bricogne (Global Phasing).
Fragment screening is increasingly used in early-stage drug discovery, but designing efficient campaigns is a difficult and open problem. I hope to improve this efficiency, initially by using machine learning methods, such as convolutional neural networks, to more accurately predict protein-ligand and protein-fragment interactions. Subsequent work will include developing active learning techniques to better inform experimental decision making.
The goal of drug discovery is to design novel, non-patented molecules with desired molecular and therapeutic properties. Traditionally, medicinal chemists have relied on their own chemical intuition to design novel small-molecule drugs. However, one difficulty with this process is that multiple criteria, such as safety and bioavailability, must be satisfied simultaneously for a molecule to be a successful drug. Because many criteria must be met concurrently, the search for a new drug against a given target is a classic multi-objective optimization problem. Owing to dramatic improvements in GPU hardware and in the predictive power of machine learning and deep learning methods, there has been growing interest in applying these state-of-the-art techniques to generate new molecules. My work uses reinforcement learning to combine atom-based and functional-group-based modifications via synthetic organic reactions, enabling the prediction of compounds that are synthetically accessible while also addressing the multi-objective optimization problem in drug discovery.
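One standard way to reason about several criteria at once is Pareto dominance: a candidate is kept only if no other candidate is at least as good on every objective and strictly better on one. The sketch below is a generic illustration of that idea, not the reinforcement learning method described above; the molecule names and scores are hypothetical, and every objective is assumed to be "higher is better".

```python
def dominates(a, b):
    """True if score vector a is >= b on every objective and > b on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the names of candidates that no other candidate dominates."""
    return [
        name
        for name, score in candidates.items()
        if not any(
            dominates(other, score)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]


# Hypothetical scores: (predicted potency, predicted safety, predicted bioavailability)
candidates = {
    "mol1": (0.9, 0.4, 0.6),
    "mol2": (0.7, 0.8, 0.5),
    "mol3": (0.6, 0.3, 0.4),  # worse than mol1 on every objective
    "mol4": (0.5, 0.9, 0.9),
}
print(pareto_front(candidates))  # ['mol1', 'mol2', 'mol4']
```

None of the three surviving molecules is best on all objectives at once, which is exactly the trade-off a multi-objective generator has to navigate.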
Computational methods for drug discovery, such as virtual screening and, more recently, machine learning models, are slowly changing the way drug discovery is done. However, in pre-clinical drug discovery, the most challenging step (optimizing the desired properties of lead compounds once they have been identified) is still mostly done by hand. My research focuses on developing machine learning methods for the de novo generation of molecules to help chemists optimize lead compounds into drug candidates. More specifically, I am addressing the challenge of designing compounds with desired polypharmacology and selectivity profiles against the metallo-β-lactamase protein family, for the treatment of antibiotic-resistant bacteria.
Immunomodulatory imide drugs (IMIDs) are an important class of medicinal compounds used in the treatment of multiple myeloma. They include thalidomide and its analogues lenalidomide and pomalidomide. Only in recent years has the mechanism of action of IMIDs in human cells been discovered: the IMID acts as a ‘molecular glue’, bridging the ubiquitin E3 ligase cereblon with a variety of other proteins, known as ‘neosubstrates’. After binding to cereblon, the neosubstrates become polyubiquitinated and are subsequently degraded (the cellular ‘kiss of death’). Many of cereblon’s neosubstrates share a common region based on a zinc finger motif. Structural studies have shown that this motif, known as a ‘degron’, is required for the neosubstrate to bind to the cereblon/IMID binary complex. Furthermore, this degron can be added to the genes of other, unrelated proteins, producing a fusion protein that contains an inducible off-switch: adding an IMID causes the fusion protein to become polyubiquitinated and degraded. Thalidomide and its known analogues are not always suitable for this kind of system, as they have a variety of off-target effects. My research aims to explore how the structures of both the IMID small-molecule inducer and the fusion protein’s degron can be modified so that the IMID/degron ‘off-switch’ system is orthogonal, i.e. the small molecule interacts only with its intended target and not with any of the natural, degron-containing proteins present in human cells.
Lead optimisation is the phase of a drug development program in which promising compounds undergo slight modifications with the aim of improving selected properties while maintaining other favourable properties. Despite recent interest in computational methods that claim to facilitate de novo generation of molecules with desirable properties, such methods have yet to be widely deployed in lead optimisation programs. Most in silico generative processes perform constrained optimisation by generating molecules with high Tanimoto similarity to the original molecule; because this similarity is computed over the whole molecule, important functional groups can still be modified. I am working to develop a fragment-growing model that keeps the original fragment fixed and incorporates important protein-specific information, with the aim of producing a lead optimisation tool that medicinal chemists can use to suggest modifications to a lead.
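Tanimoto similarity between two binary fingerprints A and B is |A ∩ B| / |A ∪ B|. The sketch below uses hypothetical sets of "on" bit indices rather than fingerprints computed by a cheminformatics toolkit, and shows how a generated candidate can score highly against the lead while losing a bit that, in this toy example, stands for an important functional group:

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    union = fp_a | fp_b
    if not union:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    return len(fp_a & fp_b) / len(union)


lead = set(range(20))   # hypothetical lead fingerprint: bits 0-19 are set
KEY_GROUP_BIT = 7       # hypothetical bit standing for an important functional group

# Candidate loses the key-group bit and gains one unrelated bit.
candidate = (lead - {KEY_GROUP_BIT}) | {20}

sim = tanimoto(lead, candidate)
print(round(sim, 3), KEY_GROUP_BIT in candidate)  # 0.905 False
```

A similarity threshold of 0.9 would accept this candidate even though the key group is gone, which is the motivation for keeping the original fragment explicitly fixed.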
Fragment-based drug design campaigns rely on initial fragment screens that thoroughly explore the target binding site. However, this is not guaranteed with current fragment libraries, which are designed to be as diverse as possible. In collaboration with XChem at Diamond Light Source, I am using historical fragment screens to better understand protein-fragment interactions. This will provide a basis for generating target-specific fragment libraries that explore the binding site more comprehensively and increase the potential for diverse lead compounds.
Ideally, in drug discovery, once a target has been identified and characterised, it should be possible to search chemical space efficiently and exhaustively for bioactive molecules with specific physicochemical properties. Recent progress in computational capabilities and machine learning methods means that de novo molecular generation tools are a step toward this ideal. In collaboration with Exscientia, my research will focus on further developing machine learning tools for molecular design. Subsequent work will involve the elucidation and development of chemical synthesis pathways that will enable us to make new compounds to tackle disease more effectively.