We carry out research on many topics in protein informatics, including the following areas.
Novel Applications for Natural Move Monte Carlo Simulations: From Protein Complexes to Nanomachines
My research revolves around sampling the structural conformations of protein complexes and molecular machines. For this purpose I use the MOSAICS software package (Minary 2007) to perform Hierarchical Natural Move Monte Carlo simulations (Sim et al. 2012). This method combines several complexity-reducing concepts with the aim of making the molecular modeling of large complexes and their structural changes tractable. Key features include multicanonical sampling protocols, a 3pt representation and the use of knowledge-based potentials (Minary & Levitt 2008). In addition, we decompose our structures into groups of residues that are known or expected to move in concert "naturally". Sampling along these pre-defined degrees of freedom yields an ensemble of structures that can be represented by probability distributions describing the "event space". Subsequent clustering of the structures provides candidates for alternative conformers. Current work focuses on MHC complexes. In due course I will move on to larger proteins and nanomachines such as antibodies, the proteasome and the chaperonin.
Minary, P., 2007. MOSAICS versions [-3.9]. Available at: http://www.cs.ox.ac.uk/mosaics.
Minary, P. & Levitt, M., 2008. Probing protein fold space with a simplified model. Journal of Molecular Biology, 375(4), pp.920–33.
Sim, A.Y.L., Levitt, M. & Minary, P., 2012. Modeling and design by hierarchical natural moves. Proceedings of the National Academy of Sciences of the United States of America, 109(8), pp.2890–5.
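The idea of sampling along pre-defined collective degrees of freedom can be illustrated with a minimal sketch. It is not MOSAICS code and does not reproduce its hierarchical or multicanonical machinery. It assumes that each "natural move" is a rigid rotation of a hypothetical group of atoms about a hinge axis, and it accepts moves with the plain (canonical) Metropolis criterion using a user-supplied `energy` function.

```python
import numpy as np


def rotation_matrix(axis, angle):
    """Rotation by `angle` (radians) about `axis`, via Rodrigues' formula."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    k = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * k + (1.0 - np.cos(angle)) * (k @ k)


def natural_move(coords, group, hinge_point, hinge_axis, angle):
    """Rigidly rotate the atoms in `group` about a hinge; all other atoms stay fixed."""
    new = coords.copy()
    r = rotation_matrix(hinge_axis, angle)
    new[group] = (coords[group] - hinge_point) @ r.T + hinge_point
    return new


def metropolis_step(coords, energy, moves, rng, beta=1.0, max_angle=0.1):
    """One Metropolis step along a randomly chosen natural-move degree of freedom.

    `moves` is a list of (atom_indices, hinge_point, hinge_axis) tuples (a hypothetical
    format). The proposed angle is drawn symmetrically, so the plain Metropolis test applies.
    """
    group, point, axis = moves[rng.integers(len(moves))]
    trial = natural_move(coords, group, point, axis, rng.uniform(-max_angle, max_angle))
    delta_e = energy(trial) - energy(coords)
    if delta_e <= 0 or rng.random() < np.exp(-beta * delta_e):
        return trial, True
    return coords, False
```

For example, with `rng = np.random.default_rng(0)` and `moves = [(np.arange(5, 10), coords[4], np.array([0.0, 0.0, 1.0]))]`, repeated calls to `metropolis_step(coords, energy, moves, rng)` move atoms 5 to 9 as one rigid unit about atom 4, which is the sense in which sampling is restricted to a few collective degrees of freedom.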
Finding the protein structural information contained in codon usage
When life first appeared on this planet, the primitive genetic code was presumably both smaller and simpler than the one seen today. Most likely it contained only a handful of amino acids, each with its own unique properties. As evolution progressed, additional amino acids were incorporated into the code, turning the originally coarse-grained scheme into a sensitive system. My research asks why this increase in complexity stopped at just 20 amino acids when there are 64 codons. Why are there not 64 amino acids for 64 codons, one for each, which would allow even greater control over, and variation in, the protein product? Why does degeneracy still exist within the genetic code? My work is based on the combined theory of codon optimality and cotranslational folding, which holds that the choice of codon directly affects the translation efficiency of the ribosome. Consequently, different codons give the nascent polypeptide chain different amounts of time to explore fold space, and so the choice of codon directly affects the final structure of the protein.
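The degeneracy discussed above can be made concrete by counting synonymous codons in the standard genetic code. This sketch builds the standard table (NCBI translation table 1) from its usual compact string, with codons in TCAG order; the helper names are illustrative only.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1): one letter per codon, codons in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...); '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons(amino_acid):
    """All codons that encode `amino_acid` (one-letter code, or '*' for stop)."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def translate(dna):
    """Translate an in-frame DNA coding sequence codon by codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Number of codons per amino acid (and per stop signal)
degeneracy = Counter(CODON_TABLE.values())
n_amino_acids = len(degeneracy) - 1  # exclude the stop symbol
```

Here `degeneracy["L"]` is 6 while `degeneracy["M"]` is 1, and `translate("ATGGCT")` and `translate("ATGGCC")` both give `"MA"`: the protein sequence is identical, while the codon choice (and, under the hypothesis above, possibly the local translation speed) differs.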
I am investigating different areas within membrane protein structure and the process of membrane protein folding. One of these areas is kinks, which are common in alpha-helical transmembrane proteins. A kink is a site where an alpha helix changes direction, and kinks are thought to be important for flexibility and function. I have therefore investigated their conservation across families of homologous proteins. A major focus of this work was the superfamily of G-protein coupled receptors, which are very important drug targets. I am now particularly interested in membrane protein folding from the perspective of insertion into the membrane during translation. One way to learn more about the folding process is to test our ability to predict protein structures, so I am looking at how knowledge of co-translational folding can improve structure prediction of membrane proteins. Co-translational approaches have been used in structure prediction of soluble proteins, but current de novo membrane protein structure prediction methods do not take the direction of translation into account.
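Because a kink is defined as a change in helix direction, its size can be expressed as an angle between two local helix axes. The sketch below is a rough geometric illustration, not the method used in this work: it assumes the kink position is already known, fits a straight axis to the C-alpha atoms on each side by singular value decomposition, and reports the angle between the two axes. The 8-residue window (roughly two turns) is an arbitrary assumption; dedicated kink-analysis tools model helix geometry more carefully.

```python
import numpy as np


def helix_axis(ca_coords):
    """Approximate unit axis of a helical segment from its C-alpha coordinates (N x 3)."""
    centred = ca_coords - ca_coords.mean(axis=0)
    # The first right-singular vector is the direction of greatest spread
    _, _, vt = np.linalg.svd(centred)
    axis = vt[0]
    # The sign from SVD is arbitrary: orient the axis from the N- to the C-terminal end
    if np.dot(axis, ca_coords[-1] - ca_coords[0]) < 0:
        axis = -axis
    return axis


def kink_angle(ca_coords, kink_index, window=8):
    """Angle (degrees) between axes fitted before and after a putative kink residue."""
    if kink_index < window or kink_index + window > len(ca_coords):
        raise ValueError("need `window` residues on each side of the kink")
    before = helix_axis(ca_coords[kink_index - window:kink_index])
    after = helix_axis(ca_coords[kink_index:kink_index + window])
    return float(np.degrees(np.arccos(np.clip(np.dot(before, after), -1.0, 1.0))))
```

Comparing such angles at equivalent (aligned) positions across homologous structures is one simple way to ask whether a kink is conserved.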
The structures and folding pathways of proteins are of vital importance but are experimentally challenging to study. Although a subset of soluble proteins is known to refold correctly from a denatured state in vitro, for many proteins this is extremely difficult, inefficient or impossible. In cells, the folding process begins during synthesis, which contributes to the high efficiency of protein folding in vivo. Directional elongation, non-uniform translation speeds, and spatial restrictions due to the ribosome and cellular crowding are features of cotranslational folding that restrict the conformational search space and may promote energetically favourable folding intermediates. My research aims to use computational methods to reveal interesting folding mechanisms and inform our understanding of biology, which may in turn improve computational protein structure prediction.
The incorporation of co-translational folding into protein structure prediction software has the potential to greatly improve the process of model generation. I am investigating how we can use the information provided by the predicted contacts during the different stages of sequential decoy production to enrich the population of good decoys, and increase the quality of the decoys generated. I will also be looking at whether we can use this information to aid in the understanding of cotranslational folding in biological systems.
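One constraint that sequential (N-to-C) decoy production imposes is that, at each stage, only contacts between residues that have already been "synthesised" can be used. The sketch below illustrates just that bookkeeping with hypothetical helper names; the minimum sequence separation of 6 and the probability cut-off are assumptions, and nothing here describes how contacts are actually scored or enforced in the project.

```python
from typing import Dict, List, Tuple

# (residue_i, residue_j, predicted_probability), residues numbered from 1 at the N-terminus
Contact = Tuple[int, int, float]


def contacts_available(contacts: List[Contact], chain_length: int,
                       min_separation: int = 6, min_prob: float = 0.0) -> List[Contact]:
    """Predicted contacts whose residues both lie within the first `chain_length` residues."""
    return [
        (i, j, p) for i, j, p in contacts
        if max(i, j) <= chain_length and abs(i - j) >= min_separation and p >= min_prob
    ]


def contact_schedule(contacts: List[Contact], full_length: int,
                     step: int, **kwargs) -> Dict[int, List[Contact]]:
    """Map each intermediate chain length (N-to-C growth) to the contacts usable at that stage."""
    lengths = list(range(step, full_length, step)) + [full_length]
    return {n: contacts_available(contacts, n, **kwargs) for n in lengths}
```

For a 100-residue chain grown in steps of 30, `contact_schedule(contacts, 100, 30)` returns the usable contacts at lengths 30, 60, 90 and 100, so a decoy generator could restrain each partial chain only with contacts that are physically possible at that point.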
Complex systems and interaction phenomena are usually described as networks in which individuals (nodes) interact with (are linked to) each other. Examples of such systems are encountered on a daily basis, e.g. air traffic, world trade, social networks and the spread of diseases. It is therefore important to understand the behaviour of such systems and the random processes that underlie them. Random graphs are one of the tools that help us analyse this particular, yet general, behaviour. A particular case of these complex systems is the protein-protein interaction (PPI) network. Although current models have recently given good results on small viral PPI networks of around 50 to 120 nodes (Hayes et al., 2013), they seem to fail to represent the PPI networks of more complex organisms (Shao et al., 2013; Rito et al., 2010), such as yeast, whose PPI network has more than 5000 nodes (more than 40 times as many as the viral networks mentioned). A good model for these networks is therefore still needed, and I am currently developing new random graph models that aim to fit PPI networks, among others.
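To make "fitting" a network concrete, a random graph model can be compared with an observed network on summary statistics. The sketch below uses the simplest possible baseline, the Erdős–Rényi G(n, p) model with p matched to the observed edge density, together with the average clustering coefficient. It is not one of the models cited above or the new models under development; the pairwise loop is O(n²), so it is meant for small examples.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, seed=None):
    """G(n, p) random graph as an adjacency dict: each pair is linked independently with prob. p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def density_matched_p(n_nodes, n_edges):
    """Edge probability giving the same expected number of edges as an observed network."""
    return n_edges / (n_nodes * (n_nodes - 1) / 2)


def average_clustering(adj):
    """Mean local clustering coefficient (nodes with degree < 2 contribute 0)."""
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)
```

For instance, `average_clustering(erdos_renyi(500, density_matched_p(500, 1000), seed=0))` gives the clustering expected by chance at that density; a model that fits a real PPI network should also reproduce statistics such as this one, not only the number of edges.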
When investigating the causes of a complex disease, a standard approach is to find genes that are differentially expressed depending on whether or not test samples exhibit the disease phenotype. These differentially expressed genes are assumed to encode proteins that are relevant to the underlying biological process. While important individual proteins may be discovered in this way, the biological function or pathway regulated by these proteins cannot be understood without the context in which the proteins act to perform that function. My research aims to provide this functional context by identifying biological modules in protein interaction networks (PINs). A biological module is a group of interacting molecules that perform a common function, and modules are regarded as an important organizational scale in biology (Hartwell et al., 1999). This project builds upon results obtained by Lewis et al. (2010) showing that functional communities in yeast PINs can be identified at multiple scales. We aim to identify these functional modules in human PINs and use them as a biological vocabulary for understanding the function of differentially expressed genes. A long-term aim of this project is to identify activated pathways in phenotypes that are investigated via gene expression profiles. This represents an important step from identifying genes linked with a disease phenotype to producing biologically testable hypotheses of pathways that may cause a phenotype.
Hartwell et al., 1999. Nature, 402(6761).
Lewis et al., 2010. BMC Systems Biology, 4(1).
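Identifying modules "at multiple scales" is often framed as optimising modularity with a resolution parameter γ. The sketch below only evaluates generalised modularity for a given partition; finding the best partition requires a search heuristic such as Louvain. It illustrates the multiscale idea and is not necessarily the method used by Lewis et al. (2010).

```python
from collections import defaultdict


def modularity(edges, partition, resolution=1.0):
    """Generalised modularity Q(gamma) of a partition of an undirected, unweighted graph.

    Q = sum_c [ L_c / m - gamma * (d_c / (2m))**2 ], where m is the number of edges,
    L_c the number of edges inside community c and d_c the summed degree of its nodes.
    A smaller `resolution` (gamma) favours fewer, larger modules.
    """
    m = len(edges)
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[partition[u]] += 1
        degree_sum[partition[v]] += 1
        if partition[u] == partition[v]:
            internal[partition[u]] += 1
    return sum(internal[c] / m - resolution * (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)
```

For two triangles joined by a single edge, splitting the graph into the two triangles scores higher at γ = 1, whereas treating the whole graph as one module scores higher at γ = 0.2. Scanning γ in this way reveals modules of different sizes.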
Protein interactions can be represented as networks. Approaches developed in network science are therefore suitable for analysing protein interactions, and they can lead to the detection of new drug targets. Thus far, only ordinary ("monolayer") protein interaction networks have been exploited for drug discovery. However, because "multilayer networks" allow the representation of multiple types of interactions and of time-dependent interactions, they have the potential to improve the insight gained from network-based approaches (Kivelä et al., 2014).
The first aim of my PhD project is to apply multilayer methods to well-established data in order to investigate potential use cases of multilayer protein interaction networks. For example, we can find time-resolved groups of proteins that show similar activity during inflammation (Calvano et al., 2005). Ultimately, we aim to integrate data sets from multiple sources to draw more robust insights about a biological system than monolayer approaches allow.
Kivelä, M. et al., 2014. Multilayer networks. Journal of Complex Networks.
Calvano, S.E. et al., 2005. A network-based analysis of systemic inflammation in humans. Nature.
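A common way to encode a time-resolved multilayer network is a supra-adjacency matrix: each time point is a layer over the same set of proteins, and each protein is linked to its own copy in the next layer. The sketch assumes node-aligned layers and a single interlayer coupling weight `omega` (a free parameter chosen by the analyst); it shows one standard representation, not the project's specific pipeline.

```python
import numpy as np


def supra_adjacency(layers, omega):
    """Supra-adjacency matrix of a node-aligned, time-ordered multilayer network.

    `layers` is a sequence of T symmetric N x N adjacency matrices over the same N
    proteins (one per time point). Each protein is coupled to its own copy in the
    adjacent time layer with weight `omega` (ordinal interlayer coupling).
    """
    layers = [np.asarray(a, dtype=float) for a in layers]
    n = layers[0].shape[0]
    if any(a.shape != (n, n) for a in layers):
        raise ValueError("all layers must be N x N over the same node set")
    t_count = len(layers)
    supra = np.zeros((n * t_count, n * t_count))
    for t, a in enumerate(layers):
        supra[t * n:(t + 1) * n, t * n:(t + 1) * n] = a  # intralayer edges
    coupling = omega * np.eye(n)
    for t in range(t_count - 1):
        supra[t * n:(t + 1) * n, (t + 1) * n:(t + 2) * n] = coupling  # interlayer edges
        supra[(t + 1) * n:(t + 2) * n, t * n:(t + 1) * n] = coupling
    return supra
```

The resulting (N·T) x (N·T) matrix can be passed to matrix-based methods, such as spectral or community-detection algorithms, to look for groups of proteins whose interactions persist or change together over time. Larger `omega` values favour groups that stay the same across time points.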
Protein interaction data is subject to experimental error, which means the PINs we work with are noisy observations of the underlying “true”, i.e. biologically relevant, interaction networks. One way of quantifying the reliability of the data is by assigning each interaction a confidence score, such as an estimate of the likelihood that the interaction is “true” given the available evidence. This gives rise to uncertain networks, where the nodes (proteins) are known and the edges (interactions) are “uncertain” and are truly present with probability equal to their score. The classical approach to uncertain networks is to impose a score threshold and convert them to simple deterministic networks. However, the result is very sensitive to the choice of threshold and violates some basic properties implied by the confidence scores. Instead, my research focuses on developing a robust stochastic methodology for extracting information about the structure of the biologically relevant network directly from the scored data.
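The contrast between thresholding and working with the scores directly can be sketched in a few lines. This sketch assumes that edges are present independently, each with probability equal to its score. That is a simplifying assumption, and the functions are illustrative rather than the methodology being developed. Expected degrees follow directly from the scores, and any other statistic can be estimated by averaging over sampled realisations of the network.

```python
import random
from collections import defaultdict


def expected_degrees(scored_edges):
    """Expected degree of each protein: the sum of the probabilities of its edges."""
    degree = defaultdict(float)
    for u, v, p in scored_edges:
        degree[u] += p
        degree[v] += p
    return dict(degree)


def threshold_network(scored_edges, cutoff):
    """Classical approach: keep every edge whose score reaches `cutoff`."""
    return [(u, v) for u, v, p in scored_edges if p >= cutoff]


def sample_network(scored_edges, rng):
    """One realisation of the uncertain network: each edge kept independently with prob. p."""
    return [(u, v) for u, v, p in scored_edges if rng.random() < p]


def monte_carlo_mean(scored_edges, statistic, n_samples=1000, seed=0):
    """Estimate E[statistic] over network realisations instead of thresholding."""
    rng = random.Random(seed)
    return sum(statistic(sample_network(scored_edges, rng))
               for _ in range(n_samples)) / n_samples
```

With scores 0.9, 0.55 and 0.45 on three edges, a cut-off of 0.5 keeps two edges and silently discards the third, whereas `monte_carlo_mean(edges, len)` approaches the expected edge count of 1.9. Moving the cut-off from 0.5 to 0.45 changes the thresholded network abruptly, which illustrates the sensitivity described above.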
In Silico Antibody Affinity Maturation
Antibodies make up a class of proteins indispensable for mediating immune responses. Thanks to their binding versatility, immunoglobulins can recognize virtually any antigen. In a process termed 'affinity maturation', antibody interaction interfaces undergo an accelerated mutation process, which accounts for the diverse binding capacities of these molecules. The ability to design high-affinity, specific binders is therefore of great interest to the pharmaceutical industry. Currently, few commercial or academic software packages (such as OptCDR; Pantazes et al. 2010) can design immunoglobulins for a specific antigen, even though proof of concept for computational antibody design was provided by Lippow et al. in 2007. The main focus of my research is the study of the antibody affinity maturation process, with the ultimate aim of producing tools that would streamline the current industrial antibody design process.
High Resolution Modelling of Antibody Structures
My research focuses on studying and predicting the structure of the framework regions of antibody variable domains (VH and VL). The specificity of an antibody for a particular antigen is largely determined by its hypervariable loops (CDRs). However, the structure of the framework on which they are mounted is also thought to be important in determining antibody-antigen affinity. I aim to produce high-resolution models of antibody molecules and to study how the relative orientation of the variable domains affects the antigen-binding site.
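One simple way to see whether a model reproduces the relative orientation of the two variable domains is to superpose it on a reference using VH atoms only and then measure how far the VL atoms deviate. The sketch uses the standard Kabsch superposition. It assumes that equivalent atoms (for example framework C-alpha atoms under a common numbering scheme) have already been matched row by row. It is not necessarily how VH-VL orientation is characterised in this project.

```python
import numpy as np


def kabsch(mobile, target):
    """Rotation r and translation t minimising the RMSD of (mobile @ r.T + t) onto target."""
    mu_m, mu_t = mobile.mean(axis=0), target.mean(axis=0)
    h = (mobile - mu_m).T @ (target - mu_t)
    u, _, vt = np.linalg.svd(h)
    # Guard against an improper rotation (reflection)
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return r, mu_t - mu_m @ r.T


def rmsd(a, b):
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


def vl_displacement_after_vh_fit(model_vh, model_vl, ref_vh, ref_vl):
    """Superpose the model on the reference using VH atoms only, then measure the VL RMSD.

    Once the VH domains are aligned, any residual VL deviation reflects a difference in
    the relative VH-VL orientation (plus any error within VL itself).
    """
    r, t = kabsch(model_vh, ref_vh)
    return rmsd(model_vl @ r.T + t, ref_vl)
```

If the model differs from the reference only by a global rotation and translation, the function returns approximately zero. Displacing or rotating VL relative to VH produces a non-zero value, even when each domain on its own is modelled perfectly.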
Antibodies are becoming increasingly important as therapeutics, due to their ability to bind with high specificity and affinity to an enormous variety of substances. Binding is mainly controlled by six loops known as the complementarity determining regions, or CDRs. One of these, the CDR-H3 loop, is far more variable than the others. This loop contributes the most to the binding properties of the antibody, but its structural diversity also makes it the most difficult to predict. In my research, I am developing methods to improve the accuracy of CDR-H3 loop prediction.
Prediction of Antibody Affinities
My research aims to predict an antibody's affinity from its structural features. Although many programs have been developed to predict the affinity of protein-protein interactions, none have been specifically designed for antibodies. Antibodies rely on a unique binding mode that differs significantly from general protein-protein interfaces, and we hope to harness these features to model an antibody's affinity for its antigen. In the short term, the aim is to probe for relationships between an antibody's structure and its affinity for an antigen; ultimately, the idea is to construct a validated model that can guide tomorrow's antibody engineering solutions.
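Two ingredients of such a model are sketched below under simple assumptions. First, measured dissociation constants are converted to binding free energies with ΔG = RT ln(Kd / 1 M). Second, a model's predictive power is checked by leave-one-out cross-validation. The linear least-squares model is only the simplest possible baseline. The structural features it would use (e.g. buried surface area or interface hydrogen-bond counts) are hypothetical placeholders, not the features chosen in this project.

```python
import numpy as np

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_to_dg(kd_molar, temperature=298.15):
    """Binding free energy (kcal/mol) from a dissociation constant: dG = RT ln(Kd / 1 M)."""
    return R_KCAL * temperature * np.log(kd_molar)


def loo_predictions(features, targets):
    """Leave-one-out predictions from an ordinary least-squares linear model with intercept."""
    x = np.column_stack([np.ones(len(features)), features])
    preds = np.empty(len(targets))
    for i in range(len(targets)):
        keep = np.arange(len(targets)) != i
        coef, *_ = np.linalg.lstsq(x[keep], targets[keep], rcond=None)
        preds[i] = x[i] @ coef  # predict the held-out complex
    return preds
```

For example, a 1 nM binder corresponds to `kd_to_dg(1e-9)`, roughly -12.3 kcal/mol at 298 K. Comparing `loo_predictions(X, dg)` with the measured values (for instance by correlation or RMSE) gives an estimate of performance on antibodies the model has not seen, which is a minimal form of the validation mentioned above.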
Antibodies are an essential part of the immune system: they can attain high specificity and affinity for a large variety of antigens, keeping us safe from most of our molecular invaders. What is exciting is that their binding specificity is controlled through the residues of only six loops, collectively called the Complementarity Determining Regions (CDRs). In nature, diversity in the CDRs is created through V(D)J recombination and somatic hypermutation, which “program” antibodies to bind millions of different antigens. Synthetically, it has been shown that grafting different loops or motifs onto the CDRs can transfer binding properties from other antibodies or proteins, making the CDRs a key target for protein design and biotherapeutic use.
My work is based on the hypothesis that antibody loops are different from the loops in the rest of the protein world, and part of my work is to quantify this difference. The other part of my work is to capitalize on this difference by identifying what we can transplant from non-antibody proteins to antibodies to increase their binding repertoire.
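One very simple way to start quantifying a difference between two sets of loops is to compare their amino-acid compositions, for example with the Jensen-Shannon divergence. The sketch below illustrates that idea only. Composition ignores loop length, position and conformation, so it captures a small part of the structural difference this work aims to quantify. Any loop sequences passed to it are illustrative inputs, not data from the project.

```python
from collections import Counter
from math import log2

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequences):
    """Amino-acid frequencies pooled over a collection of loop sequences."""
    counts = Counter(aa for seq in sequences for aa in seq if aa in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for non-overlapping distributions."""
    m = {k: 0.5 * (p[k] + q[k]) for k in p}

    def kl(a):
        return sum(a[k] * log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)
```

Calling `jensen_shannon(composition(cdr_loops), composition(other_loops))` on two loop collections gives a single number between 0 and 1. Because the divergence is symmetric and bounded, it is convenient to compare across datasets of different sizes.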
Antibodies are proteins produced by the immune system to detect and act upon foreign objects (antigens). Their high affinity and specificity make lab-produced monoclonal antibodies (mAbs) attractive as potential drugs. Over the last three decades, the global market for therapeutic mAbs has grown exponentially. The binding properties of a typical antibody are primarily determined by the structure of just six hypervariable loops, called Complementarity Determining Regions or CDRs. In my DPhil I study the structural and chemical properties of CDRs and how they relate to the binding properties of an antibody, with the ultimate goal of being able to design a functional antibody for a novel target from the available structural and sequence repertoire of CDRs.
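As a minimal example of the chemical properties that can be computed per CDR, the sketch below reports length, mean Kyte-Doolittle hydropathy (GRAVY) and a crude net charge. These are common textbook descriptors, not the property set used in this DPhil. The charge estimate ignores histidine, the termini and pKa shifts. The input format (a dict from CDR name to sequence) is an assumption.

```python
# Kyte-Doolittle hydropathy scale
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathy (mean Kyte-Doolittle value) of a sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def approximate_net_charge(seq):
    """Crude net charge at neutral pH: +1 for K/R, -1 for D/E; His and termini ignored."""
    return sum((aa in "KR") - (aa in "DE") for aa in seq)


def cdr_profile(cdrs):
    """Per-CDR length, hydropathy and approximate charge, e.g. for {'H3': 'ARDY...'}."""
    return {name: {"length": len(s), "gravy": gravy(s), "charge": approximate_net_charge(s)}
            for name, s in cdrs.items()}
```

Profiles like these, computed over many antibodies, can then be related to binding data or compared between CDRs that bind different classes of antigen.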
My primary focus is on methods development in computer-aided drug discovery, chiefly high-throughput docking, ligand-based virtual screening, network pharmacology, cheminformatics, bioinformatics, machine learning and, more recently, protein engineering. Current research projects include addressing the limitations of scoring functions in docking, in particular to improve our understanding of the molecular recognition of small molecules; handling receptor flexibility in protein-ligand docking; and fragment-based drug discovery.
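The core step of ligand-based virtual screening can be sketched as ranking a library by fingerprint similarity to a known active. The sketch assumes fingerprints have already been computed by a cheminformatics toolkit (for example RDKit Morgan fingerprints) and are represented as sets of on-bit indices. It uses the Tanimoto coefficient and does not describe the group's own screening methods.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two binary fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_library(query_fp, library):
    """Rank (name, fingerprint) pairs by decreasing similarity to the query fingerprint."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

The top of the ranked list is then prioritised for testing or for follow-up with structure-based methods such as docking.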
Proteins can interact with small molecules at multiple sites on their surfaces: the primary (orthosteric) site, where ligand binding is directly linked to the function of the protein, and allosteric sites, where binding causes a functional effect at a distant site. My research looks at these allosteric sites, aiming to understand how local conformational changes are associated with allosteric action. I will use crystallographic fragment screening, in which crystals are soaked with small fragment compounds, to determine which features are conserved over many datasets and which vary when fragments are bound. Analysing multiple datasets of the same protein should allow features that are present with and without ligands bound at the allosteric site to be detected with confidence.
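The value of many datasets of the same protein can be illustrated statistically. Ligand-free datasets define what is "normal" at each point, and a single dataset can then be checked for unusual deviations. The sketch assumes each dataset has already been reduced to values at a common set of points (for example electron density sampled on an aligned grid). The z-score cut-off of 3 and the function names are illustrative assumptions, not this project's analysis.

```python
import numpy as np


def background_model(ground_state_maps):
    """Per-point mean and standard deviation over ligand-free datasets (n_datasets x n_points)."""
    maps = np.asarray(ground_state_maps, dtype=float)
    return maps.mean(axis=0), maps.std(axis=0, ddof=1)


def flag_deviations(dataset_map, mean, std, z_cutoff=3.0, min_std=1e-6):
    """Indices of points where one dataset departs from the background by > z_cutoff sigma."""
    z = (np.asarray(dataset_map, dtype=float) - mean) / np.maximum(std, min_std)
    return np.flatnonzero(np.abs(z) > z_cutoff), z
```

Points with small background variation are conserved features, while points flagged in a fragment-soaked dataset are candidates for ligand binding or for ligand-induced conformational change.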
With our increased understanding of the role epigenetics plays in various diseases, small-molecule inhibitors are proving vital for probing the complex networks of epigenetic regulation. I am looking at using in silico free energy calculations to direct inhibitor optimisation. This methodology will be applied to various bromodomains within family VII, to investigate how it can be used to increase the affinity and selectivity of known binders. This project will also take me into the lab to make the computationally optimised compounds, both to validate the technique and, hopefully, to develop novel inhibitors.
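One of the simplest free energy estimators is free energy perturbation via the Zwanzig equation. It can be combined with a thermodynamic cycle to give the change in binding free energy when ligand A is modified into ligand B. The sketch shows only the single-step formula. Practical protocols split the transformation into many intermediate λ windows and typically use estimators such as BAR or MBAR. Nothing here describes the protocol used in this project.

```python
import numpy as np

KT_298 = 1.987204e-3 * 298.15  # kT in kcal/mol at 298.15 K


def zwanzig_dg(delta_u, kt=KT_298):
    """Free-energy difference A -> B from energy differences U_B - U_A sampled in state A.

    Zwanzig (exponential averaging): dG = -kT ln < exp(-dU / kT) >_A,
    evaluated with a log-sum-exp shift for numerical stability.
    """
    x = -np.asarray(delta_u, dtype=float) / kt
    x_max = x.max()
    return float(-kt * (x_max + np.log(np.mean(np.exp(x - x_max)))))


def relative_binding_dg(dg_complex, dg_solvent):
    """ddG_bind for A -> B via the cycle: dG(A -> B in complex) - dG(A -> B in solvent)."""
    return dg_complex - dg_solvent
```

A negative `relative_binding_dg` suggests that the modification improves affinity. Running the same calculation against an off-target bromodomain gives a corresponding estimate for selectivity.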
Drug discovery remains a challenging and lengthy process, and drug candidates often fail late in the pipeline due to properties such as poor pharmacokinetics, lack of efficacy or toxicity. My research focuses on developing novel in silico methods for fragment-based lead discovery that take these properties into account during hit-to-lead optimisation. The aim is to produce an efficient workflow that systematically samples chemical space for the most promising set of molecules for a particular biological target. It can be viewed as an “idea generator” that assists medicinal chemists in choosing what to make next after a hit has been identified in an initial fragment screen.
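As a toy illustration of weighing potency against developability when choosing what to make next, the sketch ranks candidate molecules by ligand efficiency and removes those that break Lipinski's rule-of-five thresholds. Both are common heuristics, not this project's workflow. The `Candidate` fields are hypothetical and assumed to come from assays or property predictors.

```python
from dataclasses import dataclass
from math import log

RT_298 = 1.987204e-3 * 298.15  # kcal/mol


@dataclass
class Candidate:
    name: str
    p_activity: float  # e.g. pKd or pIC50, treated here as -log10(Kd)
    heavy_atoms: int
    mol_weight: float
    clogp: float
    h_donors: int
    h_acceptors: int


def ligand_efficiency(c):
    """LE = -dG / heavy atoms, with dG = -RT ln(10) * pActivity (kcal/mol per heavy atom)."""
    return RT_298 * log(10) * c.p_activity / c.heavy_atoms


def passes_rule_of_five(c):
    """Lipinski rule-of-five thresholds used as a crude developability filter."""
    return (c.mol_weight <= 500 and c.clogp <= 5
            and c.h_donors <= 5 and c.h_acceptors <= 10)


def prioritise(candidates):
    """Keep rule-of-five-compliant candidates, most ligand-efficient first."""
    kept = [c for c in candidates if passes_rule_of_five(c)]
    return sorted(kept, key=ligand_efficiency, reverse=True)
```

Ligand efficiency is used here because fragment hits are weak binders: normalising potency by size favours small, efficient starting points over larger molecules that bind only slightly more tightly.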