We carry out research on many topics in protein informatics, including the following areas.
The structures and folding pathways of proteins are of vital importance but are experimentally challenging to study. Although a subset of soluble proteins are known to be capable of correct refolding from a denatured state in vitro, for many this is extremely difficult, inefficient or impossible. In cells, the folding process begins during synthesis, which contributes to the high efficiency of protein folding in vivo. Directional elongation, non-uniform translation speeds, and spatial restrictions due to the ribosome and cellular crowding are features of cotranslational folding that restrict the conformational search space and may promote energetically favourable folding intermediates. My research aims to use computational methods to reveal interesting folding mechanisms and inform our understanding of biology, which may in turn improve computational protein structure prediction.
Incorporating cotranslational folding into protein structure prediction software has the potential to substantially improve model generation. I am investigating how the predicted contacts available at each stage of sequential decoy production can be used to enrich the population of good decoys and to increase the quality of the decoys generated. I will also investigate whether this information can aid our understanding of cotranslational folding in biological systems.
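A minimal sketch of how predicted contacts could be used stage by stage, under assumptions that are ours, not the project's: residues are numbered from the N-terminus, only the first n residues exist at a given stage, a predicted contact counts as satisfied when its two residues lie within 8 Å, and decoys are plain coordinate lists. All function names are hypothetical, and this is not SAINT2's scoring function.

```python
from math import dist

# Hypothetical illustration; not SAINT2's actual scoring.
# Assumption: at a given stage only residues 0..n-1 have been synthesised.


def available_contacts(contacts, n):
    """Predicted contacts whose two residues are both present in an n-residue nascent chain."""
    return [(i, j) for i, j in contacts if i < n and j < n]


def contact_satisfaction(coords, contacts, cutoff=8.0):
    """Fraction of currently available predicted contacts satisfied by one decoy.

    coords: list of (x, y, z) positions, one per residue built so far.
    cutoff: assumed contact distance in angstroms.
    """
    usable = available_contacts(contacts, len(coords))
    if not usable:
        return 0.0
    hits = sum(1 for i, j in usable if dist(coords[i], coords[j]) <= cutoff)
    return hits / len(usable)


def enrich(decoys, contacts, keep=2):
    """Keep the `keep` decoys that best satisfy the contacts available at this stage."""
    ranked = sorted(decoys, key=lambda c: contact_satisfaction(c, contacts), reverse=True)
    return ranked[:keep]
```

A contact such as (2, 9) is simply ignored until residue 9 has been built, which is the sense in which the growing chain restricts what the predicted contacts can say about a partial decoy.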
Protein folding occurs in the context of tRNA, ribosomes, membranes, chaperones, and a great diversity of other biological components. Protein structure prediction, however, typically takes account only of protein primary sequence. Building on work by others in the group showing that cotranslational approaches can improve protein models, I work on incorporating the effect of other biological factors into computational models of folding. My principal tool for this work is the modelling software SAINT2.
Protein interactions can be represented as networks. Accordingly, approaches developed in network science are well suited to the analysis of protein interactions, and they can lead to the detection of new drug targets. So far, only ordinary ("monolayer") protein interaction networks have been exploited for drug discovery. However, because "multilayer networks" can represent multiple types of interaction as well as time-dependent interactions, they have the potential to improve the insight gained from network-based approaches [1].
The aim of my PhD project is, first, to apply multilayer methods to well-established data to investigate potential use cases of multilayer protein interaction networks. For example, we can find time-resolved groups of proteins that show similar activity during inflammation [2]. Ultimately, we will explore integrating data sets from multiple sources to draw more robust insights into a biological system than monolayer approaches allow.
[1] Kivelä, Mikko, et al. "Multilayer networks." Journal of Complex Networks (2014)
[2] Calvano, Steve E., et al. "A network-based analysis of systemic inflammation in humans." Nature (2005)
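To make "multilayer" concrete, here is a minimal sketch of one standard representation from the multilayer-network formalism (Kivelä et al., 2014): a supra-adjacency matrix for a multiplex network whose layers are time points over the same proteins. The coupling scheme (each protein linked only to its own copy in the next layer) and the weight `omega` are illustrative assumptions, not choices taken from this project.

```python
def supra_adjacency(layers, omega=1.0):
    """Supra-adjacency matrix of a multiplex network with ordinal (e.g. temporal) coupling.

    layers: list of T adjacency matrices (N x N nested lists) over the same N proteins.
    omega: assumed inter-layer weight linking protein i in layer t to itself in layer t+1.
    Returns an (N*T) x (N*T) matrix in which protein i in layer t has index t*N + i.
    """
    T = len(layers)
    N = len(layers[0])
    size = N * T
    A = [[0.0] * size for _ in range(size)]
    # Diagonal blocks: the interactions observed within each layer.
    for t, layer in enumerate(layers):
        for i in range(N):
            for j in range(N):
                A[t * N + i][t * N + j] = float(layer[i][j])
    # Off-diagonal blocks: identity coupling between consecutive layers only.
    for t in range(T - 1):
        for i in range(N):
            a, b = t * N + i, (t + 1) * N + i
            A[a][b] = A[b][a] = omega
    return A
```

Because every layer keeps its own block, a community-detection or centrality method run on this matrix can follow a group of proteins across time instead of collapsing all time points into one monolayer network.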
Protein interaction data are subject to experimental error, which means that the protein interaction networks (PINs) we work with are noisy observations of the underlying “true”, i.e. biologically relevant, interaction networks. One way of quantifying the reliability of the data is to assign each interaction a confidence score, such as an estimate of the probability that the interaction is “true” given the available evidence. This gives rise to uncertain networks, in which the nodes (proteins) are known and each edge (interaction) is present with probability equal to its score. The classical approach to uncertain networks is to impose a score threshold and convert them into simple deterministic networks. However, the result is very sensitive to the choice of threshold and violates some basic properties implied by the confidence scores: an interaction scored just above the threshold is treated as certain, while one scored just below it is discarded entirely. Instead, my research focuses on developing a robust stochastic methodology for extracting information about the structure of the biologically relevant network directly from the scored data.
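The threshold sensitivity can be seen on a single node. The sketch below compares the thresholded degree of a protein with its expected degree under the uncertain-network model, and estimates the latter by sampling deterministic networks. It assumes edges are present independently with probability equal to their score; that independence assumption and the toy scores are ours, for illustration only.

```python
import random

# Assumption: each edge is present independently with probability equal to its score.


def thresholded_degree(edges, node, threshold):
    """Degree under the classical approach: keep only edges with score >= threshold."""
    return sum(1 for (u, v), p in edges.items() if node in (u, v) and p >= threshold)


def expected_degree(edges, node):
    """Exact expected degree in the uncertain network (sum of incident edge probabilities)."""
    return sum(p for (u, v), p in edges.items() if node in (u, v))


def sample_network(edges, rng):
    """Draw one deterministic network by keeping each edge with probability equal to its score."""
    return {e for e, p in edges.items() if rng.random() < p}


def monte_carlo_degree(edges, node, n_samples=10_000, seed=0):
    """Estimate the expected degree by averaging over sampled networks."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_samples):
        total += sum(1 for u, v in sample_network(edges, rng) if node in (u, v))
    return total / n_samples
```

With toy scores `{("A", "B"): 0.71, ("A", "C"): 0.69, ("A", "D"): 0.2}`, protein A has degree 1 at a threshold of 0.7 but degree 2 at 0.68, whereas its expected degree is about 1.6 regardless of any threshold. Sampling generalises to statistics that, unlike degree, have no simple closed form.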
My research interests concern the development and mathematical analysis of algorithms for large networks, certain inverse problems on graphs, and big data analysis, with applications to problems in engineering, machine learning, finance and biology. Particular areas of interest include spectral and SDP-relaxation algorithms and their applications, the group synchronization problem, ranking from noisy pairwise comparisons, lead-lag relationships in multivariate time series, clustering, core-periphery structure in networks, multiplex networks, dimensionality reduction and diffusion maps (with an eye towards heterogeneous data and nonlinear time series), and spectral algorithms for the analysis of signed graphs and correlation networks. These problems share an important feature: they can all be tackled by exploiting the spectrum of their corresponding graph Laplacian.
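As a generic illustration of the Laplacian idea (not any of the specific algorithms listed above), the sketch below performs spectral bisection: it builds the combinatorial Laplacian L = D - A and splits the nodes by the sign of the Fiedler vector, the eigenvector of the second-smallest eigenvalue. The deflated power iteration is used only to keep the example dependency-free; in practice a library eigensolver such as `scipy.sparse.linalg.eigsh` would be the usual choice.

```python
import math


def laplacian(n, edges):
    """Combinatorial graph Laplacian L = D - A for an undirected, unweighted graph."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][v] -= 1.0
        L[v][u] -= 1.0
        L[u][u] += 1.0
        L[v][v] += 1.0
    return L


def fiedler_vector(L, iters=500):
    """Eigenvector of the second-smallest Laplacian eigenvalue via deflated power iteration.

    Power iteration on M = c*I - L (with c >= the largest eigenvalue of L) converges to
    L's smallest eigenvalues. Removing the constant vector (eigenvalue 0 for every
    Laplacian) at each step leaves the Fiedler vector dominant. Assumes a connected graph.
    """
    n = len(L)
    c = 2.0 * max(L[i][i] for i in range(n))  # Laplacian eigenvalues are <= 2 * max degree
    x = [float(i) for i in range(n)]  # deterministic, non-constant start vector
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # project out the constant vector
        y = [c * x[i] - sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    return x


def spectral_bisection(n, edges):
    """Split the nodes into two clusters by the sign of the Fiedler vector."""
    f = fiedler_vector(laplacian(n, edges))
    return {i for i in range(n) if f[i] >= 0}, {i for i in range(n) if f[i] < 0}
```

For two triangles joined by a single edge, the sign pattern of the Fiedler vector recovers the two triangles; the overall sign of an eigenvector is arbitrary, so which cluster comes first carries no meaning.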
Protein interaction networks (PINs) are subject to biases both in their construction (better-studied proteins are more likely to be highly connected) and in the detection of functional modules (because of biases in the ways these proteins are studied). These biases make it harder to find modules that are genuinely functionally relevant rather than merely disproportionately well characterised because of research trends. One way to address this bias is to repeatedly sample the local network background of a candidate community using a random walk. CommWalker (Luecken et al., 2017) uses this approach to estimate the level of local functional homogeneity around a community. By accounting for these background levels of functional similarity within a network, communities that are truly, uniquely functionally cohesive can be brought to the fore. These communities may be just as interesting as better-studied parts of the network but may not yet have been thoroughly investigated, creating the potential for novel biomedical discoveries!
Luecken MD, Page MJT, Crosby AJ, Mason S, Reinert G, Deane CM. CommWalker: Correctly Evaluating Modules in Molecular Networks in Light of Annotation Bias. Bioinformatics, 2017
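The general idea of comparing a community against random-walk samples of its local background can be sketched as follows. This is a simplified illustration loosely inspired by the description above, not CommWalker's implementation: the homogeneity score (fraction of member pairs sharing at least one annotation term), the walk procedure and all names are hypothetical choices for this example.

```python
import random
from itertools import combinations

# Hypothetical sketch of random-walk background sampling; not the CommWalker code.


def homogeneity(nodes, annotations):
    """Assumed homogeneity score: fraction of node pairs sharing at least one annotation term."""
    pairs = list(combinations(sorted(nodes), 2))
    if not pairs:
        return 0.0
    shared = sum(1 for u, v in pairs if annotations.get(u, set()) & annotations.get(v, set()))
    return shared / len(pairs)


def random_walk_sample(adj, start, size, rng, max_steps=10_000):
    """Distinct nodes visited by a simple random walk from `start`, stopping at `size` nodes.

    adj: dict node -> list of neighbours. May return fewer nodes if `max_steps` runs out.
    """
    visited, node = {start}, start
    for _ in range(max_steps):
        if len(visited) >= size:
            break
        node = rng.choice(adj[node])
        visited.add(node)
    return visited


def background_score(adj, community, annotations, n_samples=200, seed=0):
    """Fraction of local background samples at least as homogeneous as `community`.

    Low values suggest the community is more functionally cohesive than its network
    neighbourhood, rather than simply sitting in a well-annotated region.
    """
    rng = random.Random(seed)
    observed = homogeneity(community, annotations)
    starts = sorted(community)
    as_good = 0
    for _ in range(n_samples):
        sample = random_walk_sample(adj, rng.choice(starts), len(community), rng)
        if homogeneity(sample, annotations) >= observed:
            as_good += 1
    return as_good / n_samples
```

Sampling from walks that start inside the community keeps the comparison local: a community embedded in a heavily annotated neighbourhood is judged against that neighbourhood rather than against the network as a whole.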
Rhizobium species are bacteria that establish a symbiotic relationship with legumes. The bacteria convert atmospheric nitrogen into ammonia, which is used by the plant. Identifying the genes and proteins involved in this process is fundamental to improving crop growth. The objective of my project is to generate a gene coexpression network that gives us new insight into nitrogen fixation and bacterial metabolism.
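One common way to build a gene coexpression network is to link pairs of genes whose expression profiles are strongly correlated across the same set of samples. The minimal sketch below uses Pearson correlation with a hard cutoff; the 0.8 threshold, the gene names and the expression values are illustrative assumptions (the values are made up), and real pipelines often tune the cutoff or use soft-thresholding approaches such as WGCNA instead.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def coexpression_network(expression, threshold=0.8):
    """Edges between genes whose profiles satisfy |r| >= threshold (assumed hard cutoff).

    expression: dict gene -> list of expression values, one per sample or condition.
    Returns dict (gene1, gene2) -> r, with gene names in sorted order.
    """
    edges = {}
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges[(g1, g2)] = r
    return edges
```

Using |r| keeps strongly anticorrelated genes as well, which can matter when one pathway is switched on as another is switched off; the sign of r is retained on each edge so that the two cases can be told apart.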
In Silico Antibody Affinity Maturation
Antibodies are a class of proteins indispensable for mediating immune responses. Thanks to their binding versatility, immunoglobulins can recognize virtually any antigen. In a process termed 'affinity maturation', antibody interaction interfaces undergo accelerated mutation, which accounts for the diverse binding capacities of these molecules. The ability to design high-affinity, specific binders is of huge interest to the pharmaceutical industry. Currently, few commercial or academic software packages (such as OptCDR; Pantazes et al. 2010) can design immunoglobulins for a specific antigen, even though proof of concept for computational antibody design was provided by Lippow et al. in 2007. The main focus of my research is the antibody affinity maturation process, with the ultimate aim of producing tools that streamline the current industrial antibody design process.
Antibodies are becoming increasingly important in a therapeutic capacity, due to their ability to bind with high specificity and affinity to an enormous variety of substances. Binding is mainly controlled by six loops known as the complementarity determining regions, or CDRs. One of these, the CDR-H3 loop, is far more variable than the others. This loop contributes the most to the binding properties of the antibody, but its structural diversity also makes its structure the most difficult to predict. In my research, I am currently developing methods to improve the accuracy of CDR-H3 loop prediction.
Prediction of Antibody Affinities
My research aims to predict an antibody's affinity from its structural features. Although many programs have been developed to predict the affinity of protein-protein interactions, none have been specifically designed for antibodies. Antibodies rely on a unique binding mode that differs significantly from general protein-protein interfaces, and we hope to harness these features to model an antibody's affinity for its antigen. In short, the aim is to probe possible relationships between an antibody's structure and its affinity for an antigen, and ultimately to construct a validated model that can guide tomorrow's antibody engineering solutions.
Antibodies are products of the immune system that bind pathogens with high specificity, and this specificity has helped make them the most successful class of biopharmaceuticals. Successful exploitation of antibodies relies on our ability to interrogate their diversity. My DPhil project investigates antibody next-generation sequencing (Ig-seq) data to inform rational antibody engineering. Ig-seq data have a very wide range of applications; I am particularly interested in contrasting Ig-seq datasets across and within various species. This could ultimately yield a set of rules that govern antibody development in each species. This work is in close collaboration with UCB Pharma via the DTP iCASE scheme.
Antibodies are proteins of the adaptive immune system. They are produced to specifically target foreign molecules, known as antigens, during the immune response. Given their high specificity and high binding affinity, they are an attractive platform for designing novel biotherapeutics. Currently, my research focuses on how the structures of the binding sites on the antibody and the antigen correlate with each other. By developing computational tools to exploit known antibody structures and next-generation sequencing data, we can observe the natural antibody repertoire and devise a set of rules for antibody design.
The global market for therapeutic monoclonal antibodies (mAbs) is growing rapidly, with 8-10 new mAbs approved each year and a positive outlook for the years ahead. There is high demand for a reliable in silico protocol that can aid in designing an antibody against any specific protein epitope. My DPhil aims to make significant strides towards this goal by evaluating the typical properties of antibody therapeutics (either approved or in advanced development) and investigating how lessons drawn from these can be incorporated into a computational pipeline that converts a diverse human antibody library into a set of putative specific, non-immunogenic antigen binders.
My primary focus is on methods development in computer-aided drug discovery, chiefly in high-throughput docking, ligand-based virtual screening, network pharmacology, cheminformatics, bioinformatics, machine learning and, more recently, protein engineering. Current research projects include: addressing the limitations of scoring functions in docking, in particular to improve our understanding of the molecular recognition of small molecules; handling receptor flexibility in protein-ligand docking; and fragment-based drug discovery.
Proteins can interact with small molecules at multiple sites on their surfaces: the primary, orthosteric site, where ligand binding is directly linked to the function of the protein, and allosteric sites, where binding causes a functional effect at a distant site. My research looks at these allosteric sites, aiming to understand how local conformational changes are associated with allosteric action. My research will use crystallographic fragment screening, in which small fragment compounds are soaked into crystals, to identify features that are conserved across many datasets and features that vary when fragments are bound. Analysing multiple datasets of the same protein should allow features present with and without ligands bound at the allosteric site to be detected with confidence.
With our increased understanding of the role epigenetics plays in various diseases, small-molecule inhibitors are proving vital for probing complex epigenetic regulatory networks. I am using in silico free energy calculations to direct inhibitor optimisation. This methodology will be applied to various bromodomains within family VII, to investigate how it can be used to increase the affinity and selectivity of known binders. This project will also find me in the lab, synthesising the computationally optimised compounds to validate the technique and, hopefully, to develop novel inhibitors.
Drug discovery remains a challenging and lengthy process, with drug candidates often failing late in the pipeline due to properties such as poor pharmacokinetics, lack of efficacy or toxicity. My research focuses on the development of novel in silico methods in fragment-based lead discovery that consider these properties during hit-to-lead optimisation. The aim is to produce an efficient workflow that systematically samples chemical space for the most promising set of molecules for a particular biological target, and that can be viewed as an “idea generator” to help medicinal chemists choose what to make next after a hit is identified in an initial fragment screen.
Accurately predicting the binding affinity of a small molecule to a protein target is a key problem in both molecular docking and virtual screening. My research involves investigating how machine learning techniques can be used to effectively leverage the increasing abundance of binding affinity and protein structure data to improve the scoring functions used to predict binding affinity. I am particularly interested in identifying which molecular features are most informative of binding activity and how this varies between different families of proteins.
Vast improvements in computational capabilities, combined with the increasing size and accuracy of protein structure and protein-ligand interaction data, have enabled the development of novel in silico methods in drug discovery. My research focuses on developing and applying state-of-the-art machine learning methods to structure-based drug design, while ensuring that results are physically interpretable for medicinal chemists and readily testable. More broadly, I am interested in automating the drug discovery process and in how chemists make decisions.
By targeting a protein, and by extension the biological pathway in which that protein is involved, small molecules can be used to modulate a disease phenotype at the cellular, tissue or organism level. As the mode of action is often unknown, various experimental and in silico approaches have been used to connect the phenotype, the protein target or targets, and/or gene expression with the chemical structure and activity of one or more small compounds [1]. In contrast to the traditional ‘one drug – one target’ paradigm, the so-called ‘magic bullet’ concept postulated by Ehrlich [2], compounds often modulate the (disease) phenotype by binding to multiple targets [3]. My research focuses on novel methodologies to predict the effect of a potential drug on the biological system. This includes deconvoluting its mode of action and assessing its side-effect profile (thus giving an indication of potential side effects in later clinical stages).
[1] Ravindranath, A. C. et al. Connecting gene expression data from connectivity map and in silico target predictions for small molecule mechanism-of-action analysis. Mol. BioSyst. 11, 86–96 (2015).
[2] Strebhardt, K. & Ullrich, A. Paul Ehrlich's magic bullet concept: 100 years of progress. Nat. Rev. Cancer 8, 473–480 (2008).
[3] Brouwers, L., Iskar, M., Zeller, G., van Noort, V. & Bork, P. Network neighbors of drug targets contribute to drug side-effect similarity. PLoS One 6 (2011).
The aim of virtual screening is to identify novel molecules that bind to the protein of interest. Ligand-based virtual screening methods rank candidate compounds by their 2D or 3D similarity to compounds already known to be active. My research focuses on developing novel ligand-based virtual screening methods that address some limitations of current methods and improve predictive performance.
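The 2D similarity behind ligand-based screening is often computed as the Tanimoto coefficient between binary fingerprints. The minimal sketch below represents each fingerprint as a set of on-bit indices and ranks library compounds by their maximum similarity to any known active. Generating the fingerprints (for example with RDKit) is outside the sketch, and max-fusion over actives is just one of several common choices (mean or group fusion are alternatives); the compound names and bit sets are made up.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_by_similarity(actives, library):
    """Rank library compounds by maximum Tanimoto similarity to any known active.

    actives, library: dict name -> set of on bits (e.g. from a hashed substructure fingerprint).
    Returns a list of (name, score) pairs, most similar first.
    """
    scores = {
        name: max(tanimoto(fp, ref) for ref in actives.values())
        for name, fp in library.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For a reference active with bits {1, 2, 3, 4}, a candidate with bits {1, 2, 3, 5} shares 3 of the 5 distinct bits and scores 0.6, while a candidate with no bits in common scores 0.0.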
Structure- and fragment-based lead design (SBLD and FBLD) offer an efficient and rational route towards developing potent and selective small molecules. Facilities enabling crystallographic high-throughput fragment screening, such as XChem, have greatly increased the amount of structural information available for SBLD/FBLD efforts. This influx of data has created a need for computational tools that drive decision making when progressing fragment hits cheaply and efficiently. My work focuses on three areas that address this need: 1) visualisation tools that enable human-driven analysis of structural data; 2) semi-automated methods for exploring chemical space and prioritising future experimental work; and 3) the combined application of energetic, experimental and deep learning methods to understand protein-ligand complexes.