We carry out research on many topics in protein informatics, including the following areas.
Proteins can fold during their synthesis by the ribosome through the process of co-translational folding. Such co-translational folding occurs out of equilibrium, meaning that the speed of translation can drastically influence the ability of proteins to fold and function. Changes to translation speed can prevent folding, increase misfolding, cause aggregate formation, and are implicated as causal factors in cystic fibrosis, hemophilia, and certain cancers. Most protein structure prediction methods do not consider the biological process of translation when making predictions and thus neglect the influence of translation speed. I am working to incorporate aspects of the non-equilibrium nature of protein synthesis into the protein sequence-to-structure prediction method SAINT2. The updated software will hopefully improve predictions and provide a tool capable of predicting the influence of changes to translation speed on proteins.
The structures and folding pathways of proteins are of vital importance but are experimentally challenging to study. Although a subset of soluble proteins are known to be capable of correct refolding from a denatured state in vitro, for many this is extremely difficult, inefficient or impossible. In cells, the folding process begins during synthesis, which contributes to the high efficiency of protein folding in vivo. Directional elongation, non-uniform translation speeds, and spatial restrictions due to the ribosome and cellular crowding are features of cotranslational folding that restrict the conformational search space and may promote energetically favourable folding intermediates. My research aims to use computational methods to reveal interesting folding mechanisms and inform our understanding of biology, which may in turn improve computational protein structure prediction.
Protein folding occurs in the context of tRNA, ribosomes, membranes, chaperones, and a great diversity of other biological components. Protein structure prediction, however, typically takes account only of protein primary sequence. Following from others' work in the group showing that cotranslational approaches can improve protein models, I work on incorporating the effect of other biological factors into computational models of folding. My principal tool for this work is the modelling software SAINT2.
Allosteric regulation is a common mechanism in the protein world, but little is known about the underlying mechanisms at the residue level. This project links the analysis of allostery with the analysis of co-evolution of protein residues. Co-evolution occurs when the interaction between two (or more) residues is crucial for a protein’s stability or functionality, e.g. allosteric signal transmission. My research aims to develop a pipeline for automated allostery analysis based solely on sequence information. Currently, few tools are available to analyse allostery, and none of them are high-throughput, so developing and improving such basic tools could be a first step in the process of understanding allosteric signal transmission and potentially improving combination therapies where two drugs target a protein at different sites.
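As a rough illustration of how co-evolution can be read from sequence alone, the sketch below scores pairs of alignment columns by mutual information. This is one simple, widely used covariation measure and is not necessarily the measure used in this project; the toy alignment and the function names are assumptions made for the example, and practical methods also correct for phylogeny and sampling bias.

```python
from collections import Counter
from math import log2

# Hypothetical toy alignment: four aligned sequences of length four.
alignment = ["AKLE", "AKLD", "GRLE", "GRLD"]

def column(alignment, i):
    """Return the residues found at position i across all sequences."""
    return [seq[i] for seq in alignment]

def mutual_information(col_x, col_y):
    """Mutual information (in bits) between two alignment columns,
    estimated from observed residue frequencies."""
    n = len(col_x)
    px = Counter(col_x)
    py = Counter(col_y)
    pxy = Counter(zip(col_x, col_y))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

# Positions 0 and 1 always change together (A with K, G with R),
# whereas positions 0 and 3 vary independently of each other.
mi_coupled = mutual_information(column(alignment, 0), column(alignment, 1))
mi_independent = mutual_information(column(alignment, 0), column(alignment, 3))
```

In this toy case the perfectly covarying pair scores higher than the independent pair, which is the kind of signal a sequence-only pipeline would rank and then inspect.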
Some proteins fold co-translationally on the ribosome; depending on the speed of translation, this can have an impact on the folding and downstream function of the protein. Working with others in the group, I hope that by introducing a timing element into existing protein sequence-to-structure prediction protocols we can improve structure prediction and begin to probe the impact that translation speeds can have on protein folding.
Protein interaction data is subject to experimental error, which means the protein interaction networks (PINs) we work with are noisy observations of the underlying “true”, i.e. biologically relevant, interaction networks. One way of quantifying the reliability of the data is by assigning each interaction a confidence score, such as an estimate of the likelihood that the interaction is “true” given the available evidence. This gives rise to uncertain networks, where the nodes (proteins) are known and the edges (interactions) are “uncertain”: each edge is truly present with probability equal to its score. The classical approach to uncertain networks is to impose a score threshold and convert them to simple deterministic networks. However, the result is very sensitive to the choice of threshold and violates some basic properties implied by the confidence scores. Instead, my research focuses on developing a robust stochastic methodology for extracting information about the structure of the biologically relevant network directly from the scored data.
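The contrast between thresholding and working with the scores directly can be sketched as follows. This is a minimal illustration rather than the methodology developed in this project: the protein names and scores are made up, and the edges are assumed to be present independently of one another.

```python
# Hypothetical scored interactions: (protein_a, protein_b, confidence score).
scored_edges = [
    ("A", "B", 0.9),
    ("A", "C", 0.6),
    ("A", "D", 0.4),
    ("B", "C", 0.4),
]

def threshold_degree(edges, node, threshold):
    """Degree of `node` after keeping only edges scoring at least `threshold`."""
    return sum(1 for a, b, p in edges if p >= threshold and node in (a, b))

def expected_degree(edges, node):
    """Expected degree of `node` when each incident edge is present
    with probability equal to its score."""
    return sum(p for a, b, p in edges if node in (a, b))

def degree_distribution(edges, node):
    """Exact distribution of the degree of `node`, assuming independent edges.
    Entry k is the probability that exactly k incident edges are present."""
    dist = [1.0]
    for a, b, p in edges:
        if node not in (a, b):
            continue
        new = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            new[k] += q * (1 - p)      # this edge is absent
            new[k + 1] += q * p        # this edge is present
        dist = new
    return dist

# The thresholded degree of protein "A" jumps as the cut-off moves,
# while the expected degree uses every score and needs no cut-off.
degree_at_05 = threshold_degree(scored_edges, "A", 0.5)
degree_at_07 = threshold_degree(scored_edges, "A", 0.7)
mean_degree = expected_degree(scored_edges, "A")
dist_A = degree_distribution(scored_edges, "A")
```

Moving the threshold from 0.5 to 0.7 changes the degree of the same protein, whereas the score-based summaries stay fixed and also describe how uncertain that degree is.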
Rhizobium spp. are bacteria that establish a symbiotic relationship with legumes. The bacteria transform atmospheric nitrogen into ammonia, which is used by the plant. Knowing all the genes and proteins involved in this process is fundamental to improving crop growth. The objective of my project is to generate a gene coexpression network that helps us gain new knowledge about nitrogen fixation and bacterial metabolism.
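A gene coexpression network can be sketched, under simplifying assumptions, by linking genes whose expression profiles are strongly correlated across samples. The gene names, expression values and the correlation cut-off below are hypothetical, and Pearson correlation with a hard cut-off is only one of several ways such a network could be built.

```python
from itertools import combinations
from math import sqrt

# Hypothetical expression levels for three genes across four samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 5.9, 8.0],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}

def pearson(x, y):
    """Pearson correlation of two equal-length profiles.
    Assumes neither profile is constant (otherwise the denominator is zero)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def coexpression_edges(expression, cutoff=0.9):
    """Return (gene, gene, correlation) for every pair whose absolute
    correlation reaches the cut-off."""
    edges = []
    for g, h in combinations(sorted(expression), 2):
        r = pearson(expression[g], expression[h])
        if abs(r) >= cutoff:
            edges.append((g, h, r))
    return edges

edges = coexpression_edges(expression)
```

With these made-up values only geneA and geneB are linked; genes that end up connected to known nitrogen-fixation genes in a real network would be candidates for follow-up.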
Synaptic plasticity is the modulation of synapses to effect change in the signal response strength of receiving neurons. It is a key mechanism in current models of learning and memory. Synapses are modulated in response to signals. These signals are mediated by small molecules such as calcium ions and integrated to produce a neuronal signalling response in the receiving neuron, creating a cascade of information processing and dissemination within the brain. Dendrites form a geometrically complex branching tree of wires along which the signal-receiving and modifiable synaptic spines are embedded. Calcium dynamics within dendrites and between spines shape signal integration and synaptic plasticity in ways that are not well understood. I am working with collaborators in the Emptage Lab at the Department of Pharmacology to develop a systematic multilayer networks approach to analysing 3D dynamic calcium images of dendrites. The data is generated using a novel light sheet microscopy protocol being developed by Nigel Emptage and Peter Haslehurst (Haslehurst et al., 2018). Apart from providing new hypotheses to be tested by these collaborators, we hope that this research will generate exciting insights into the complex relationship between dendritic tree topology, calcium signal integration and the many modes of synaptic plasticity.
Haslehurst et al., 2018. Fast volume-scanning light sheet microscopy reveals transient neuronal events. Biomedical Optics Express, 9(5), pp.2154-2167.
Antibodies are becoming increasingly important in a therapeutic capacity, due to their ability to bind with high specificity and affinity to an enormous variety of substances. Binding is mainly controlled by six loops known as the complementarity determining regions, or CDRs. Of these, one is far more variable than the others: the CDR-H3 loop. It is this loop that contributes the most to the binding properties of the antibody, but its structural diversity means its structure is the most difficult to predict. In my research, I am currently developing methods to improve the accuracy of CDR-H3 loop prediction.
Antibodies are products of the immune system that bind pathogens with high specificity, a property that has made them the most successful class of biopharmaceuticals. Successful exploitation of antibodies relies on our ability to interrogate their diversity. My DPhil project investigates antibody next-generation sequencing (Ig-seq) data to inform rational antibody engineering. The application space of Ig-seq data is vast. I am particularly interested in contrasting Ig-seq datasets across and within species. This could ultimately lead to a set of rules that govern antibody development in the respective species. This work is in close collaboration with UCB Pharma via the DTP iCASE scheme.
The global market for therapeutic monoclonal antibodies (mAbs) is exploding, with 8-10 new mAbs approved each year, and a positive outlook for the years ahead. There is high demand for a reliable in silico protocol that can aid in designing an antibody against any specific protein epitope. My DPhil aims to make significant strides towards this reality, by evaluating the typical properties of antibody therapeutics (either approved or in advanced development), and how lessons drawn from these can be incorporated into a computational pipeline that converts a diverse human antibody library into a set of putative specific, non-immunogenic antigen-binders.
Antibodies are proteins of the adaptive immune system. They are produced to specifically target foreign molecules, known as antigens, during the immune response. Given their high specificity and high binding affinity, they are an attractive platform for designing novel biotherapeutics. Currently, my research focuses on how the structures of the binding sites on the antibody and antigen relate to each other. By developing computational tools to exploit known antibody structures and next-generation sequencing data, we can observe the natural antibody repertoire and devise a set of rules for antibody design.
Monoclonal antibodies have taken a lead role in the drug landscape in recent years, in large part due to their potential in immuno-oncology, with global sales in monoclonal antibody therapeutics steadily increasing. Current early-stage antibody drug development relies heavily on time- and cost-intensive experimental screens.
In my research, I aim to develop machine learning methods, with a particular focus on deep learning approaches, for the in-silico predictions of antibody properties from sequence or structure, in order to enable rapid explorations of the antibody space for drug development purposes.
Due to advances in next-generation sequencing methodologies, the humoral immune response can be dissected with increasing precision. Structural annotation of this rapidly expanding collection of sequence data can be used as a tool in the development of antibody-based therapeutics, connecting sequence to function. In my work, there will be particular emphasis on prediction of binding specificity, with a view to its use in vaccine development.
Biotherapeutics is one of the fastest growing areas in the pharmaceutical industry, and antibodies make up its predominant class. Many of these antibody therapeutics are generated in non-human systems and have been shown to be immunogenic, inducing human immune responses when injected into patients. This neutralizes their therapeutic properties and limits the application of such antibodies in the treatment of human disease. My work involves data mining of antibody next-generation sequencing data to assist in humanization methods and reduce the immunogenicity of antibody therapeutics.
Antibodies are the principal effector proteins of the immune system, which target and inactivate pathogens. Next generation sequencing of antibodies has led to an abundance of immunoglobulin sequencing (Ig-seq) data becoming available, and a challenging area of research is investigating how we can use this sequencing data to accurately predict the likelihood of an antibody binding to a target. My DPhil project is exploring how we can best use Ig-seq data to improve our ability to design antibody therapeutics, by combining sequence and structural information to more accurately understand and predict antibody binding.
My primary focus is on methods development in computer-aided drug discovery, chiefly in high-throughput docking, ligand-based virtual screening, network pharmacology, cheminformatics, bioinformatics, machine learning and, more recently, protein engineering. Current research projects include: addressing the limitations of scoring functions in docking, in particular to improve our understanding of molecular recognition of small molecules; handling receptor flexibility in protein-ligand docking; and fragment-based drug discovery.
Accurately predicting the binding affinity of a small molecule to a protein target is a key problem in both molecular docking and virtual screening. My research involves investigating how machine learning techniques can be used to effectively leverage the increasing abundance of binding affinity and protein structure data to improve the scoring functions used to predict binding affinity. I am particularly interested in identifying which molecular features are most informative of binding activity and how this varies between different families of proteins.
Drug discovery remains a challenging and lengthy process, with the failure of drug candidates often occurring late in the pipeline due to properties such as poor pharmacokinetics, lack of efficacy or toxicity. My research focuses on the development of novel in silico methods in fragment-based lead discovery which consider these properties during hit-to-lead optimisation. The aim is to produce an efficient workflow which systematically samples chemical space for the most promising set of molecules for a particular biological target and can be viewed as an “idea generator” to assist medicinal chemists in choosing what to make next after a hit is identified from an initial fragment screen.
The aim of virtual screening is to identify novel chemical structures of molecules that bind to the protein of interest. Ligand-based virtual screening methods are based on 2D similarity and 3D similarity between compounds. My research focuses on the development of novel methods in ligand-based virtual screening which address some limitations of the current methods and improve the predictive performance.
Vast improvements in computational capabilities, combined with the increasing size and accuracy of protein structure and protein-ligand interaction data, have enabled the development of novel in silico methods in drug discovery. My research focuses on developing and applying state-of-the-art machine learning methods to structure-based drug design, while ensuring results are physically interpretable for medicinal chemists and readily testable. More broadly, I am interested in automation of the drug discovery process and in the decision making of chemists.
By targeting a protein, and thus by extension the biological pathway in which that protein is involved, small molecules can be used to modulate a disease phenotype on a cellular, tissue or organism level. As the mode of action is often unknown, various experimental and in silico approaches have been used to connect the phenotype, the protein target or targets, and/or gene expression with the chemical structure and activity of one or more small compounds (Ravindranath et al., 2015). In contrast to the traditional ‘one drug – one target’ paradigm, the so-called ‘magic bullet’ concept as postulated by Ehrlich (Strebhardt & Ullrich, 2008), compounds often modulate the (disease) phenotype by binding to multiple targets (Brouwers et al., 2011). My research is focused on novel methodologies to predict the effect of a potential drug on the biological system. This would include deconvoluting its mode of action and assessing its side-effect profile (thus deriving an indication of potential side effects in the later clinical stages).
Ravindranath, A. C. et al. Connecting gene expression data from connectivity map and in silico target predictions for small molecule mechanism-of-action analysis. Mol. BioSyst. 11, 86–96 (2015).
Strebhardt, K. & Ullrich, A. Paul Ehrlich’s magic bullet concept: 100 years of progress. Nat. Rev. Cancer 8, 473–480 (2008).
Brouwers, L., Iskar, M., Zeller, G., van Noort, V. & Bork, P. Network neighbors of drug targets contribute to drug side-effect similarity. PLoS One 6 (2011).
Fragment screening is increasingly used in early-stage drug discovery, but designing efficient campaigns is a difficult and open problem. I hope to improve this efficiency, initially by using machine learning methods, such as convolutional neural networks, to more accurately predict protein-ligand and protein-fragment interactions. Subsequent work will include developing active learning techniques to better inform experimental decision making.
Computational methods for drug discovery, such as virtual screening and, more recently, machine learning models, are slowly changing the way drug discovery is done. However, in pre-clinical drug discovery, the most challenging part, optimizing the desired properties of lead compounds after they have been identified, is still mostly done by hand. My research focuses on the development of machine learning methods for the de novo generation of new molecules in order to help chemists optimize lead compounds into drug candidates. More specifically, I am addressing the challenge of designing compounds with desired polypharmacology and selectivity patterns against the protein family of metallo-β-lactamases for the treatment of antibiotic-resistant bacteria.
I am interested in using statistics and machine learning to robustly automate tasks in structure-based drug discovery. My current project is an algorithm to automatically fit ligands into PanDDA event maps. I work with Prof. Charlotte Deane (Oxford), Prof. Frank von Delft (Oxford) and Prof. Gérard Bricogne (Global Phasing).
The goal of drug discovery is to design novel, non-patented molecules with desired molecular and therapeutic properties. Traditionally, medicinal chemists have relied on their own chemical intuition to design novel small-molecule drugs. However, one of the difficulties associated with such a process is that multiple criteria, such as safety and bioavailability, must be satisfied simultaneously for a molecule to be a successful drug. Because many criteria must be met concurrently, the search for a new drug against a given target is an example of a classic multi-objective optimization problem. Due to dramatic improvements in GPU hardware and the predictive power of machine learning and deep learning methods, there has been growing interest in applying these state-of-the-art techniques to facilitate the generation of new molecules. My work focuses on combining atom-based and functional-group-based modifications via synthetic organic reactions through reinforcement learning, to enable compound predictions that are synthetically accessible while at the same time addressing the classic multi-objective optimization problem in drug discovery.
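To make the multi-objective framing concrete, the sketch below filters a set of candidate molecules down to its Pareto front, i.e. the candidates that no other candidate beats on every criterion at once. It illustrates the problem statement only and is not the reinforcement learning approach described above; the molecule names, the three criteria and the scores are hypothetical, with higher scores taken to be better.

```python
# Hypothetical candidates scored on (potency, safety, bioavailability);
# every score is in [0, 1] and higher is better.
candidates = {
    "mol_1": (0.9, 0.2, 0.5),
    "mol_2": (0.6, 0.8, 0.6),
    "mol_3": (0.5, 0.7, 0.4),
    "mol_4": (0.9, 0.2, 0.4),
}

def dominates(a, b):
    """True if score vector `a` is at least as good as `b` on every
    criterion and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Names of candidates that are not dominated by any other candidate."""
    return [
        name
        for name, scores in candidates.items()
        if not any(
            dominates(other_scores, scores)
            for other, other_scores in candidates.items()
            if other != name
        )
    ]

front = pareto_front(candidates)
```

Here mol_3 is dominated by mol_2 and mol_4 by mol_1, so only mol_1 and mol_2 remain; choosing between those two requires a trade-off between criteria rather than a single ranking.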
Lead optimisation is the phase of a drug development program where promising compounds undergo slight modifications with the aim of improving selected properties whilst maintaining other favourable properties. Despite recent interest in computational methods which claim to be able to facilitate de-novo generation of molecules with desirable properties, such methods have yet to be widely deployed in lead-optimisation programs. Most in-silico generative processes perform constrained optimisation by generating molecules with a high Tanimoto similarity with the original molecule, meaning that important functional groups can be modified. I am working to develop a fragment-growing model which will keep the original fragment fixed and incorporate important protein-specific information, with the aim of generating a lead optimisation tool which can be used by medicinal chemists to suggest modifications to a lead.
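The limitation of a similarity constraint can be seen in a small sketch. Tanimoto similarity between two fingerprints is the number of shared features divided by the number of features present in either. The fingerprints below are hypothetical sets of feature identifiers, not the output of a real cheminformatics toolkit, and the "required" feature stands in for an important functional group.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of feature ids."""
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        return 1.0  # convention chosen for this sketch: two empty fingerprints match
    return len(a & b) / len(a | b)

def retains(fp, required):
    """True if every required feature is still present in the fingerprint."""
    return set(required) <= set(fp)

# Hypothetical fingerprints: feature 8 stands for a key functional group
# in the lead that the generated analogue has replaced with feature 9.
lead = {1, 2, 3, 4, 5, 6, 7, 8}
analogue = {1, 2, 3, 4, 5, 6, 7, 9}
required = {8}

similarity = tanimoto(lead, analogue)
keeps_group = retains(analogue, required)
```

The analogue shares seven of nine features with the lead, so it would pass a typical similarity constraint, yet the required group is gone. Keeping the original fragment fixed, as in a fragment-growing model, avoids this by construction.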