Structural Bioinformatics Library
Template C++ / Python API for developing structural bioinformatics applications.
User Manual

Tertiary_quaternary_structure_annotator

Authors: F. Cazals and T. Dreyfus and R. Tetley

Introduction

This package provides annotations of tertiary and quaternary structures.

In a nutshell:

  • Consider a structure (molecule, complex) decomposed into units. A unit may be a polypeptide chain, a domain, a set of residues, etc.
  • We seek interactions between these units, which may be: covalent bonds (disulfide bonds), salt bridges, etc.

Pre-requisites

A decomposition for a polypeptide chain is a set of intervals such that (i) each interval specifies a set of consecutive amino acids (a.a.), and (ii) any two intervals are disjoint. Additionally, if the union of the intervals does not cover the whole a.a. sequence of the polypeptide chain, a gap refers to the set of a.a. lying between two consecutive intervals.
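The definition above can be illustrated with a minimal Python sketch. It is not part of the SBL API: the function names `check_decomposition` and `gaps` are hypothetical, and the sketch assumes that residue ids are plain consecutive integers (no insertion codes) and that intervals are inclusive.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # inclusive residue ids: (first, last)


def check_decomposition(intervals: List[Interval]) -> List[Interval]:
    """Return the intervals sorted along the sequence.

    Raises ValueError if an interval is empty or if two intervals overlap,
    since a decomposition requires pairwise disjoint intervals.
    """
    ordered = sorted(intervals)
    for first, last in ordered:
        if first > last:
            raise ValueError(f"empty interval {first}-{last}")
    for (_, prev_last), (next_first, _) in zip(ordered, ordered[1:]):
        if next_first <= prev_last:
            raise ValueError("intervals overlap")
    return ordered


def gaps(intervals: List[Interval]) -> List[Interval]:
    """Return the gaps, i.e. the residue ranges lying between two
    consecutive intervals. Residues before the first interval or after
    the last one are not gaps under the definition above."""
    ordered = check_decomposition(intervals)
    found = []
    for (_, prev_last), (next_first, _) in zip(ordered, ordered[1:]):
        if next_first > prev_last + 1:
            found.append((prev_last + 1, next_first - 1))
    return found
```

For instance, the intervals 111-122, 125-134 and 163-170 yield the two gaps 123-124 and 135-162.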


Two comments are in order:

  • Gaps are of special interest when a region of a molecule could not be reconstructed, e.g. a flexible loop in a crystal structure.
A unit in a structure refers to either (i) a polypeptide chain, (ii) an interval in a decomposition, or (iii) a gap in a decomposition.


We are interested in building graphs connecting units. Edges between units encode, in particular, features of biochemical interest:

Vertices. A vertex corresponds to one unit. There are 2 cases:

  1. A vertex represents a polypeptide chain or an interval in a decomposition.

  2. A vertex represents a gap in a decomposition if an annotation (salt-bridge, disulfide bond) involves the amino acids defined by this gap.

In addition, each vertex is decorated with two numbers $a/b$, with $a$ the number of residues found in the structure, and $b$ the number of residues specifying the interval.

Edges coding the primary sequence. Two units which are consecutive along the sequence are linked by a dashed line. In addition, if the two units are separated by a gap, the corresponding interval (in terms of residue ids) is displayed with the edge. (Corollary: a dashed edge with no interval connects two units linked by a peptide bond.)
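As an illustration of the dashed edges, the following Python sketch lists the pairs of consecutive units together with the gap separating them, if any. It is only a sketch under stated assumptions: `sequence_edges` is a hypothetical helper, not an SBL function, and units are given as a mapping from labels to inclusive integer residue ranges.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # inclusive residue ids: (first, last)


def sequence_edges(
    units: Dict[str, Interval],
) -> List[Tuple[str, str, Optional[Interval]]]:
    """Return one (unit, next_unit, gap) triple per pair of units that are
    consecutive along the sequence. gap is None when the two units are
    adjacent, i.e. linked by a peptide bond."""
    ordered = sorted(units.items(), key=lambda item: item[1])
    edges = []
    for (u, (_, u_last)), (v, (v_first, _)) in zip(ordered, ordered[1:]):
        gap = (u_last + 1, v_first - 1) if v_first > u_last + 1 else None
        edges.append((u, v, gap))
    return edges


edges = sequence_edges(
    {"a": (111, 122), "b": (125, 134), "c": (163, 170), "cd": (171, 183)}
)
```

Here the edge between a and b carries the interval 123-124, whereas the edge between c and cd carries no interval.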

Edges coding biophysical features.

The following edges, represented by bold lines, are sought between two units:

  1. edge S-B: edge counting the number of salt bridges between the two units. See the package Pointwise_interactions.

  2. edge S-S: edge counting the number of disulfide bonds between the two units. See the package Pointwise_interactions.

A particular case is that of a disulfide bond or salt bridge involving a region which corresponds to a gap (see the definition above), yet is present in the structure. Since, by definition, such a region does not have a label, we proceed as follows:

  • a new label is created by concatenating those of the units before and after, and a node $v$ is created.
  • node $v$ is linked to those before and after by dashed lines (indicating the sequentiality along the sequence).
  • node $v$ is endowed with edges reporting the S-S and/or S-B. Note that these edges may be loops if the disulfide bond or salt bridge is internal to the region.
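The labelling rule for gaps can be sketched as follows in Python. This is an illustration only: `locate_unit` is a hypothetical helper, residue ids are assumed to be plain integers, and the concatenation is done without separator, which is an assumption about the exact label format.

```python
from typing import Dict, Tuple

Interval = Tuple[int, int]  # inclusive residue ids: (first, last)


def locate_unit(residue: int, units: Dict[str, Interval]) -> Tuple[str, bool]:
    """Return (label, is_gap) for the unit containing the residue.

    If the residue lies in a gap, the label is the concatenation of the
    labels of the units before and after the gap."""
    ordered = sorted(units.items(), key=lambda item: item[1])
    for label, (first, last) in ordered:
        if first <= residue <= last:
            return label, False
    for (u, (_, u_last)), (v, (v_first, _)) in zip(ordered, ordered[1:]):
        if u_last < residue < v_first:
            return u + v, True
    raise ValueError(f"residue {residue} lies outside the decomposition")


units = {"a": (111, 122), "b": (125, 134), "c": (163, 170)}
# A bond between residues 140 and 150 has both ends in the gap between
# b and c, so the corresponding edge is a loop on the new node.
end_1, _ = locate_unit(140, units)
end_2, _ = locate_unit(150, units)
is_loop = end_1 == end_2
```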

Illustration. As a simple illustration where the units are polypeptide chains, package Space_filling_model_interface_finder provides an example of an antibody whose disulfide bonds are sought.

Illustration. Fig. TQ_structure_annotator_graph illustrates the annotation functionalities for the so-called domain II of a class II fusion protein [143].

Note that the nodes of the graph correspond on the structure to beta sheets and the loops connecting them. The solid lines illustrate salt bridges and disulfide bonds between these elements.

Annotating the domain II of a class II fusion protein: structure and graph

The specification file used for representing the domain is:

domains-template-begin EFF1
a 111-122
b 125-134
c 163-170
d 184-193
cd 171-183
e 199-204
f 269-274
h 303-306
h’ 309-313
i 324-330
j 334-340
k 364-368
l 375-380
end
chains-enumeration-begin
A like EFF1
BC
end
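To make the layout of the domain templates explicit, here is a minimal Python sketch of a reader for the domains-template blocks. It is not the SBL parser: the function `parse_domain_templates` is hypothetical, the grammar is inferred from the example above only (one `label first-last` pair per line, non-negative residue ids), and the chains-enumeration block is deliberately skipped since its semantics are not described here.

```python
from typing import Dict, Tuple

Interval = Tuple[int, int]  # inclusive residue ids: (first, last)

# Shortened copy of the specification shown above.
SPEC = """\
domains-template-begin EFF1
a 111-122
cd 171-183
h’ 309-313
end
chains-enumeration-begin
A like EFF1
BC
end
"""


def parse_domain_templates(text: str) -> Dict[str, Dict[str, Interval]]:
    """Map each template name to its {label: (first, last)} intervals."""
    templates: Dict[str, Dict[str, Interval]] = {}
    current = None  # intervals of the template being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("domains-template-begin"):
            name = line.split()[1]
            current = templates.setdefault(name, {})
        elif line == "end":
            current = None
        elif current is not None:
            label, span = line.split()
            first, last = span.split("-")
            current[label] = (int(first), int(last))
        # Lines outside a domains-template block are ignored.
    return templates


templates = parse_domain_templates(SPEC)
```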

The Graphviz command used for generating the graph is:

dot -Tsvg interface.dot -o interfaces.svg

Algorithms

Annotating a structure requires finding salt bridges and disulfide bonds, see the package Pointwise_interactions.

Implementation and functionalities

Computing the graph

The graph is computed through the functor SBL::CSB::T_Tertiary_quaternary_structure_annotator<ParticleTraits, MolecularSystemLabelsTraits>. The template parameter ParticleTraits defines the representation of a particle and the base molecular system used for representing a PDB / mmCIF file. In particular, the complete hierarchy defined by such a system makes it easy to navigate through polypeptide chains, residues and atoms. See ParticleTraits for more details. The template parameter MolecularSystemLabelsTraits defines the labels attached to the particles. In particular, it makes it possible to dissect the input molecule in terms of chains (SBL::Models::T_Chain_label_traits) or domains (SBL::Models::T_Domain_label_traits). See MolecularSystemLabelsTraits for more details.

The class SBL::CSB::T_Tertiary_quaternary_structure_annotator is a functor that takes as input a molecular model (see Molecular_system) and returns a graph of type SBL::CSB::T_Tertiary_quaternary_structure_annotator::Interfaces_graph. It also takes an optional tag specifying whether salt bridges and disulfide bonds should also be sought inside a unit; this may lead to edges forming loops in the output graph. The tag is false by default.
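The effect of the optional tag can be pictured with the following Python sketch, which counts interactions per pair of units. It does not use the SBL API: `count_edges` and its parameter `with_intra_unit` are hypothetical names, and the interactions are assumed to be already mapped to unit labels.

```python
from collections import Counter
from typing import Iterable, Tuple

Interaction = Tuple[str, str, str]  # (kind, unit of end 1, unit of end 2)


def count_edges(
    interactions: Iterable[Interaction], with_intra_unit: bool = False
) -> Counter:
    """Count interactions per (kind, unit, unit) edge.

    Interactions internal to a unit are skipped unless with_intra_unit is
    True, in which case they are counted on a loop edge."""
    counts: Counter = Counter()
    for kind, u, v in interactions:
        if u == v and not with_intra_unit:
            continue
        first, second = sorted((u, v))  # the graph is undirected
        counts[(kind, first, second)] += 1
    return counts


interactions = [("S-S", "a", "b"), ("S-S", "b", "a"), ("S-B", "c", "c")]
between_units = count_edges(interactions)
with_loops = count_edges(interactions, with_intra_unit=True)
```

With the default value of the flag, only the S-S edge between a and b (counted twice) is reported; enabling it adds a loop on c for the internal salt bridge.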

Visualizing the graph

To visualize the graph, the static method SBL::CSB::T_Tertiary_quaternary_structure_annotator::print dumps the graph into a .dot file, a file format used by the Graphviz software. The following command generates a PDF image of an output graph:

dot -Tpdf interface.dot -o interfaces.pdf 
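For readers unfamiliar with the DOT language, the Python sketch below writes a small undirected graph following the drawing conventions described earlier (vertices decorated with a/b, dashed edges along the sequence, bold edges for S-S and S-B). It does not reproduce the exact output of the print method: the function `to_dot`, the graph name and the label layout are assumptions made for illustration, and labels are assumed not to contain double quotes.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]


def to_dot(
    vertices: Dict[str, Tuple[int, int]],
    sequence_edges: List[Tuple[str, str, Optional[Interval]]],
    feature_edges: List[Tuple[str, str, str, int]],
) -> str:
    """Return the DOT text of an undirected graph.

    vertices maps a unit label to (a, b): residues found in the structure
    and residues specifying the interval. sequence_edges holds
    (u, v, gap or None); feature_edges holds (u, v, kind, count)."""
    lines = ["graph interfaces {"]
    for name, (found, expected) in vertices.items():
        lines.append(f'  "{name}" [label="{name} {found}/{expected}"];')
    for u, v, gap in sequence_edges:
        label = f', label="{gap[0]}-{gap[1]}"' if gap else ""
        lines.append(f'  "{u}" -- "{v}" [style=dashed{label}];')
    for u, v, kind, count in feature_edges:
        lines.append(f'  "{u}" -- "{v}" [style=bold, label="{kind}: {count}"];')
    lines.append("}")
    return "\n".join(lines) + "\n"


dot_text = to_dot(
    {"a": (12, 12), "b": (10, 10)},
    [("a", "b", (123, 124))],
    [("a", "b", "S-S", 1)],
)
```

Writing `dot_text` to a file named interface.dot gives an input suitable for the dot command shown above.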

Module

The package provides a module (see Module_base), SBL::Modules::T_Tertiary_quaternary_structure_annotator_module, which makes it easy to tune the parameters related to the different bonds and bridges (see the package Pointwise_interactions), to dump statistics, and to report the output graph in .dot format or in a format suitable for molecular visualisation software such as VMD or PyMOL.

Examples

The following examples show how to use the class T_Tertiary_quaternary_structure_annotator in two contexts: when looking at interfaces between polypeptide chains, or between domains.

Interfaces between chains

The following example loads an input PDB file and prints a Graphviz file representing the interfaces graph between the polypeptide chains. A Graphviz file can be processed using the dot software to produce an image of the graph.

Interfaces between domains

The following example loads an input PDB file and prints a Graphviz file representing the interfaces graph between the domains defined in an input specification file. A Graphviz file can be processed using the dot software to produce an image of the graph.

Using the module

The following example instantiates the module SBL::Modules::T_Tertiary_quaternary_structure_annotator_module for loading an input PDB file and printing a Graphviz file representing the interfaces graph between the polypeptide chains.

Applications

Functionalities of this package are made available via the package Space_filling_model_interface_finder.