About Me

Hi there! I'm Claire, a researcher based in the Department of Statistics at the University of Oxford. I am a member of the Oxford Protein Informatics Group (OPIG), supervised by Prof. Charlotte Deane.

I am the group's lead Research Software Engineer - my main job is maintaining our suite of antibody tools, known as the SAbDab-SAbPred platform. I am responsible for facilitating in-house installation of this platform for any pharmaceutical companies wishing to use it, and I also look after our web servers.

Research

As Research Software Engineer for the Oxford Protein Informatics Group (OPIG), most of my time these days is spent maintaining the group's software. In particular I look after the tools related to our antibody research, known as the SAbDab-SAbPred platform. Many pharmaceutical companies use these tools, and I facilitate this by installing the platform in-house through consultancy. The tools are also available online - I developed the websites for SAbDab, SAbPred, and most recently OAS.

When I do have the chance to do my own research, my work focusses on protein structure prediction - mainly of loops, which are the regions of a protein that are not part of secondary structure elements. Loops often play a key role in protein function; understanding their properties and knowing their structures is therefore extremely valuable.
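
As a rough illustration of that definition of a loop (a minimal sketch, not code from any OPIG tool), the snippet below pulls loop segments out of a per-residue secondary structure string. It assumes DSSP-style one-letter codes in which H, G and I mark helices and E and B mark strands, and it treats every other symbol as loop; the function name `find_loops` and the `min_len` parameter are hypothetical.

```python
from typing import List, Tuple

# Assumption: DSSP-style codes. H/G/I = helix, E/B = strand.
# Any other symbol (e.g. "-", "T", "S") is treated as loop here.
REGULAR = set("HGIEB")


def find_loops(ss: str, min_len: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) pairs (0-based, end exclusive) for each loop run."""
    loops = []
    start = None  # index where the current loop run began, if any
    for i, code in enumerate(ss):
        if code not in REGULAR:
            if start is None:
                start = i
        elif start is not None:
            # A regular element closes the current loop run.
            if i - start >= min_len:
                loops.append((start, i))
            start = None
    # Close a loop run that extends to the end of the chain.
    if start is not None and len(ss) - start >= min_len:
        loops.append((start, len(ss)))
    return loops


if __name__ == "__main__":
    print(find_loops("--HHHHHH-TT-EEEEE--"))  # [(0, 2), (8, 12), (17, 19)]
```

Note that this simple version also reports the chain termini as loops; a loop modelling benchmark would normally keep only segments flanked by secondary structure on both sides.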

I developed a novel algorithm for the prediction of protein loop structures, called Sphinx. Sphinx uses a combination of knowledge-based and ab initio approaches to maximise our use of the available structural data. While it was originally designed to improve our ability to model the CDR H3 loops of antibodies, I have also developed versions for membrane protein loops and loops from non-specific protein types. I have also explored ways to improve the prediction of long loops, through the inclusion of properties that can be predicted from sequence. For example, I have used contact prediction to establish sets of spatial constraints, which can then be considered during decoy ranking to improve predictions.
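
The idea of using predicted contacts during decoy ranking can be sketched as follows. This is an illustrative toy, not the Sphinx scoring function: it assumes each decoy is given as a mapping from residue index to the coordinates of one representative atom, that predicted contacts are residue index pairs, and that a contact counts as satisfied when the two atoms lie within 8 Å (a commonly used cutoff, chosen here as an assumption). The names `satisfied_fraction` and `rank_decoys` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]
Decoy = Dict[int, Coord]  # residue index -> representative atom coordinates


def satisfied_fraction(decoy: Decoy,
                       contacts: Sequence[Tuple[int, int]],
                       cutoff: float = 8.0) -> float:
    """Fraction of predicted contacts whose residues lie within `cutoff` Å."""
    if not contacts:
        return 0.0
    hits = 0
    for i, j in contacts:
        # Contacts involving residues missing from the decoy count as unsatisfied.
        if i in decoy and j in decoy and math.dist(decoy[i], decoy[j]) <= cutoff:
            hits += 1
    return hits / len(contacts)


def rank_decoys(decoys: Sequence[Decoy],
                contacts: Sequence[Tuple[int, int]],
                cutoff: float = 8.0) -> List[int]:
    """Return decoy indices ordered from most to fewest satisfied contacts."""
    scores = [satisfied_fraction(d, contacts, cutoff) for d in decoys]
    return sorted(range(len(decoys)), key=lambda k: scores[k], reverse=True)


if __name__ == "__main__":
    decoy_a = {1: (0.0, 0.0, 0.0), 5: (5.0, 0.0, 0.0), 9: (20.0, 0.0, 0.0)}
    decoy_b = {1: (0.0, 0.0, 0.0), 5: (5.0, 0.0, 0.0), 9: (6.0, 0.0, 0.0)}
    predicted = [(1, 5), (1, 9)]
    print(rank_decoys([decoy_a, decoy_b], predicted))  # [1, 0]
```

In practice a term like this would be combined with other scoring terms rather than used on its own, since predicted contacts are themselves uncertain.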

More recently, I have carried out analyses of loops which are able to adopt multiple conformations, and investigated the ability of several algorithms to predict their structures. Unfortunately, it seems that current methodologies are accurate for loops of a single conformation, but fail when they attempt to model conformational ensembles. For more information on this, or my other research, see my Publications list!

Through student supervision I also work, or have worked, on the prediction of therapeutic antibody developability, structural analysis of immune repertoires, antibody humanisation, improving loop decoy ranking through deep learning, and predicting antibody binding.

Background

Research Software Engineer
University of Oxford, 2018-Present

Responsibilities: Maintaining the group's software, including web servers and our virtual machine SAbBox; consultancy work for pharmaceutical companies (e.g. installation of/assistance using our tools, research projects); student supervision; my own research sometimes!

Postdoctoral Researcher and Associate Director of the Systems Approaches to Biomedical Science CDT
University of Oxford, 2016-2018

Main research themes: analysis and structure prediction of protein loops with multiple conformations; membrane protein loop modelling; analysis of therapeutic antibodies (through supervision).

DPhil Systems Approaches to Biomedical Science
University of Oxford, 2012-2016

Thesis title: Hybrid methods for protein loop modelling

MChem (Master of Chemistry), First Class
Durham University, 2007-2011

Masters project title: Simulating photodissociation reactions of protonated aminophenols
Dissertation title: Cellular drug delivery

Gallery

PyMOL images

Maltoporin (PDB entry 1af6). One of seven images I created to divide the chapters of my doctoral thesis.
An antibody bound to its antigen, an HIV glycoprotein (PDB entry 4ydk). One of seven images I created to divide the chapters of my doctoral thesis.
An antibody binding site (PDB entry 1lo4). One of seven images I created to divide the chapters of my doctoral thesis.
DNA Methyltransferase (PDB entry 3pt6). One of seven images I created to divide the chapters of my doctoral thesis.
Kinase CK2 alpha (PDB entry 5cu6). One of seven images I created to divide the chapters of my doctoral thesis.
HIV Protease bound to the inhibitor Ritonavir (PDB entry 1hxw). One of seven images I created to divide the chapters of my doctoral thesis.
A zinc finger, from the splicing factor U2AF from yeast (PDB entry 4yh8). One of seven images I created to divide the chapters of my doctoral thesis.
β-barrel-assembly machinery (BAM) complex from E. coli (PDB entry 5ayw). I made this to use as the header on the OPIG Twitter profile!
An antibody binding site, with peptide antigen (PDB entry 2hh0). I made this for the re-design of the SAbDab-SAbPred website.
An alpha helix from a slightly unusual perspective... (PDB entry 3d31)
Major Histocompatibility Complex (PDB entry 1hsa).

Selected paper/thesis figures

Figure 1 from 'How repertoire data are changing antibody science', Journal of Biological Chemistry, 2020. Caption: A - antibody structure. An antibody is made up of four chains; two light (orange) and two heavy (blue). Each chain is made up of a series of domains — the variable domains of the light and heavy chains together are known as the Fv region (shown on the right; PDB entry 12E8). The Fv features six loops known as complementarity determining regions (CDRs, shown in dark blue); these are mainly responsible for antigen binding. B - example sequences for the VH and VL, highlighting the CDR regions and the genetic composition.
Figure 2 from 'Evidence of Antibody Repertoire Functional Convergence through Public Baseline and Shared Response Structures', 2020. Caption: Structural overlap analysis. Datasets are arranged in order of their internal structural diversity (most diverse first). Distinct baseline structures from individual 1 are clustered sequentially with all other repertoire snapshots. Distinct structures present in every tested dataset are classed as ‘public structures’, whereas those that are absent in at least one individual are termed ‘private structures’.
Figure from supplementary information for 'Evidence of Antibody Repertoire Functional Convergence through Public Baseline and Shared Response Structures', 2020. Caption: A schematic illustrating our repertoire structural profiling algorithm. Heavy (VH) and light (VL) chain sequences from a repertoire snapshot are first analysed separately for their FREAD modellability (unmodellable chains are crossed out). They are then clustered by sequence identity using CD-HIT (90% threshold) for computational tractability. All VH and VL cluster centre chains are subsequently paired, and VH-VL orientations that cannot be reliably modelled are removed (again shown by crosses). Finally, predicted modellable Fvs with identical combinations of CDR lengths are structurally clustered to identify ‘distinct structures’.
Antibody diversity (figure from doctoral thesis). A — V(D)J recombination. B — Junctional diversification. C — Combinatorial joining of heavy and light chains. D — Somatic hypermutation.
Antibody structure (figure from doctoral thesis). A — An antibody is made up of four chains; two light (white and pink) and two heavy (grey and blue). Each chain is made up of a series of domains — the variable domains of the light and heavy chains together are known as the Fv region. B — An actual antibody structure (PDB code 1IGT). C — The structure of the Fv, viewing the binding site from the side (left) and above (right). The complementarity determining regions (CDRs) are coloured according to the legend. D — The connectivity of the β strands in the VH and VL. The same colours are used for the CDRs as in part C of this figure.
Flowchart to explain the Sphinx loop modelling algorithm (figure from doctoral thesis; also appears in slightly different format in Sphinx paper).
Peptide bonds and dihedral angles (figure from doctoral thesis). A — Amino acids join together to form long chains, linked through the formation of peptide bonds via a condensation reaction. B — The backbone dihedral angles, φ, ψ and ω. C — A Ramachandran plot showing which pairs of φ/ψ angles form the core, allowed, generously allowed and disallowed regions. Only certain pairs of φ and ψ angles are permissible, since some combinations introduce clashes between sidechains.
Protein structure levels (figure from doctoral thesis). Caption: The four levels of protein structure, using haemoglobin (PDB code 1BUW) as an example case. The primary structure of a protein is its amino acid sequence. The secondary structure refers to local regions of regular conformation. The way the whole protein chain folds up into a three-dimensional structure is known as tertiary structure. Quaternary structure is only applicable to proteins formed of multiple chains, and refers to how these chains interact with each other.
Types of secondary structure (figure from doctoral thesis).

Other

A visual representation of the colours used on country flags. I used the colours given on flagcolor.com and plotted this using ggplot in R. Countries are grouped by continent. The distance of each block from the centre is given by the hue of the colour.
3D print of rat liver vault. Every year I run a 3D printing practical for the students of the Doctoral Training Centre; this was made by one of this year's students (although the photo was taken by me!)
3D print of a clathrin cage. Every year I run a 3D printing practical for the students of the Doctoral Training Centre; this is one I made last year.
My version of the graph of sequence identity vs RMSD from Chothia and Lesk's 1986 paper 'The relation between the divergence of sequence and structure in proteins'. I was asked to recreate this for a special collection of articles in the Frontiers in Molecular Biosciences journal, put together to commemorate the life of Cyrus Chothia.
I created this image for the header of the Observed Antibody Space website. It represents amino acid usage for antibody heavy chains - each column is a position in the antibody sequence (IMGT numbering), and the different colours represent the different amino acid types. The difference between the variable H3 region and the rest of the sequence is visible - however, this was intended to be more decorative than informative!

Posters

ISMB Conference 2014
ISMB Conference 2015
ISMB Conference 2018
OPIG Poster, created for an event to celebrate the opening of the new Statistics Department building in 2016.

Other

Besides science, I enjoy:

  • Music - I play clarinet, saxophone and piano and am a member of the OUP Orchestra
  • Art and Crafts - particularly pencil drawing and crocheting
  • Reading
  • Cryptic Crosswords
  • Baking
  • Swimming