Structural Bioinformatics Library
Template C++ / Python API for developing structural bioinformatics applications.
User Manual

Tertiary_quaternary_structure_annotator

Authors: F. Cazals, T. Dreyfus and R. Tetley

Introduction

This package provides annotations of tertiary and quaternary structures.

In a nutshell:

  • Consider a structure (molecule, complex) decomposed into units. A unit may be a polypeptide chain, a domain, a set of residues, etc.
  • We seek interactions between these units, such as covalent bonds (disulfide bonds) or salt bridges.

Pre-requisites

Definition. A decomposition of a polypeptide chain is a set of intervals such that (i) each interval specifies a set of consecutive a.a., and (ii) any two intervals are disjoint. If the union of the intervals does not cover the whole a.a. sequence of the chain, a gap refers to the set of a.a. located between two consecutive intervals.
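The definition can be made concrete with a minimal C++ sketch. It assumes that residue ids are plain integers (insertion codes are ignored) and that the intervals of a decomposition are given sorted along the chain; the names Interval and compute_gaps are hypothetical and not part of the SBL API.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Residue-id interval [begin, end], bounds included (e.g. "a 111-122").
struct Interval {
  int begin;
  int end;
};

// Returns the gaps of a decomposition: the residues lying strictly between two
// consecutive intervals. Intervals are assumed sorted by begin; overlapping
// intervals violate the definition and raise std::invalid_argument.
std::vector<Interval> compute_gaps(const std::vector<Interval>& decomposition) {
  std::vector<Interval> gaps;
  for (std::size_t i = 0; i + 1 < decomposition.size(); ++i) {
    const Interval& current = decomposition[i];
    const Interval& next = decomposition[i + 1];
    assert(current.begin <= current.end && next.begin <= next.end);
    if (next.begin <= current.end)
      throw std::invalid_argument("intervals of a decomposition must be disjoint");
    if (next.begin > current.end + 1)  // at least one residue in between
      gaps.push_back({current.end + 1, next.begin - 1});
  }
  return gaps;
}
```

For instance, the intervals a 111-122 and b 125-134 of the EFF1 specification file used later in this page leave the gap 123-124.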


Two comments are in order:

  • Gaps are of special interest when a region of a molecule could not be reconstructed, e.g. a flexible loop in a crystal structure.
  • A unit in a structure refers to (i) a polypeptide chain, (ii) an interval in a decomposition, or (iii) a gap in a decomposition.


We are interested in building graphs connecting units. Edges between units encode, in particular, features of biochemical interest:

Vertices. A vertex corresponds to one unit. There are 2 cases:

  1. A vertex represents a polypeptide chain or an interval in a decomposition.

  2. A vertex represents a gap in a decomposition if an annotation (salt bridge, disulfide bond) involves amino acids of this gap.

In addition, each vertex is decorated with two numbers $a/b$, with $a$ the number of residues found in the structure, and $b$ the number of residues specifying the interval.
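As an illustration, the a/b decoration of a vertex can be computed as follows, assuming that the ids of the residues present in the structure are available as a set of integers. The helper vertex_decoration is hypothetical, not an SBL function.

```cpp
#include <cassert>
#include <iterator>
#include <set>
#include <string>

// Residue-id interval [begin, end], bounds included.
struct Interval {
  int begin;
  int end;
};

// Returns the "a/b" decoration of a vertex: a = residues of the unit present
// in the structure, b = residues specified by the interval.
std::string vertex_decoration(const Interval& unit, const std::set<int>& resolved_ids) {
  assert(unit.begin <= unit.end);
  const int b = unit.end - unit.begin + 1;
  // The set is ordered, so the resolved ids of the unit form one contiguous range.
  const auto first = resolved_ids.lower_bound(unit.begin);
  const auto last = resolved_ids.upper_bound(unit.end);
  const auto a = std::distance(first, last);
  return std::to_string(a) + "/" + std::to_string(b);
}
```

With the interval 111-122 and residues 119-120 missing from the structure, the decoration is 10/12.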

Edges coding the primary sequence. Two units which are consecutive along the sequence are linked by a dashed line. In addition, if the two units are separated by a gap, the corresponding interval (in terms of residue ids) is displayed with the edge. (Corollary: a dashed edge with no interval connects two units linked by a peptide bond.)
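The rule for sequence edges can be sketched in C++ as follows, assuming units sorted along the chain and pairwise disjoint. An empty gap string encodes the case of a peptide bond between the two units; Unit, Sequence_edge and sequence_edges are illustrative names only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Unit {
  std::string label;
  int begin;
  int end;
};

// Dashed edge between two units that are consecutive along the sequence.
struct Sequence_edge {
  std::string source;
  std::string target;
  std::string gap;  // "first-last" ids of the gap, empty if linked by a peptide bond
};

// Units are assumed sorted along the chain and pairwise disjoint.
std::vector<Sequence_edge> sequence_edges(const std::vector<Unit>& units) {
  std::vector<Sequence_edge> edges;
  for (std::size_t i = 0; i + 1 < units.size(); ++i) {
    const Unit& u = units[i];
    const Unit& v = units[i + 1];
    assert(u.end < v.begin);
    std::string gap;
    if (v.begin > u.end + 1)
      gap = std::to_string(u.end + 1) + "-" + std::to_string(v.begin - 1);
    edges.push_back({u.label, v.label, gap});
  }
  return edges;
}
```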

Edges coding biophysical features.

The following edges, represented by bold lines, are sought between two units:

  1. edge S-B: counts the salt bridges between the two units. See the package Pointwise_interactions.

  2. edge S-S: counts the disulfide bonds between the two units. See the package Pointwise_interactions.
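Assuming the interactions themselves have already been found as residue pairs (for instance with the package Pointwise_interactions), building the S-B and S-S edges amounts to mapping each endpoint to its unit and counting per pair of units. The sketch below does this; the flag include_intra_unit mirrors the optional tag described in the implementation section, and all names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

enum class Interaction { salt_bridge, disulfide_bond };  // S-B and S-S edges

struct Unit {
  std::string label;
  int begin;
  int end;
};

// Label of the unit containing a residue id, or "" if no unit contains it.
std::string unit_of(int residue_id, const std::vector<Unit>& units) {
  for (const Unit& u : units) {
    assert(u.begin <= u.end);
    if (u.begin <= residue_id && residue_id <= u.end) return u.label;
  }
  return "";
}

// Key of an undirected edge: (smaller label, larger label, type).
using Edge_key = std::tuple<std::string, std::string, Interaction>;

// Counts residue-residue interactions per pair of units and per type.
// Interactions inside one unit would be loops; they are kept only on request.
std::map<Edge_key, int> count_edges(
    const std::vector<std::tuple<int, int, Interaction>>& interactions,
    const std::vector<Unit>& units, bool include_intra_unit = false) {
  std::map<Edge_key, int> counts;
  for (const auto& [r1, r2, type] : interactions) {
    std::string u1 = unit_of(r1, units);
    std::string u2 = unit_of(r2, units);
    if (u1.empty() || u2.empty()) continue;         // endpoint outside every unit
    if (u1 == u2 && !include_intra_unit) continue;  // loop not requested
    if (u2 < u1) std::swap(u1, u2);                 // undirected: order the labels
    ++counts[Edge_key{u1, u2, type}];
  }
  return counts;
}
```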

A particular case is that of a disulfide bond or salt bridge involving a region that corresponds to a gap (see Definition above) yet is present in the structure. Since, by definition, such a region has no label, we proceed as follows:

  • a new label is created by concatenating those of the units before and after, and a node $v$ is created.
  • node $v$ is linked to the nodes before and after it by dashed lines (indicating sequentiality along the sequence).
  • node $v$ is endowed with edges reporting the S-S and/or S-B. Note that these edges may be loops if the disulfide bond or salt bridge is internal to the region.
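A minimal sketch of the gap-node construction, assuming units sorted along the chain and disjoint; insert_gap_node is a hypothetical helper. Once the node is inserted between the units before and after it, the two dashed sequence edges follow from the fact that these units are now consecutive.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Unit {
  std::string label;
  int begin;
  int end;
};

// If residue_id lies in the gap between two consecutive units, inserts a node
// for that gap, labeled by concatenating the labels of the units before and
// after it. Units are assumed sorted along the chain and disjoint.
// Returns true if a gap node was created.
bool insert_gap_node(std::vector<Unit>& units, int residue_id) {
  for (std::size_t i = 0; i + 1 < units.size(); ++i) {
    const Unit& before = units[i];
    const Unit& after = units[i + 1];
    assert(before.end < after.begin);
    if (before.end < residue_id && residue_id < after.begin) {
      // Build the node before inserting: insertion invalidates references.
      const Unit gap_node{before.label + after.label, before.end + 1, after.begin - 1};
      units.insert(units.begin() + static_cast<std::ptrdiff_t>(i) + 1, gap_node);
      return true;
    }
  }
  return false;
}
```

For example, an annotation involving residue 175, between units c (163-170) and d (184-193), yields a node cd covering 171-183, which matches the unit cd of the specification file below.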

Illustration. As a simple illustration where the units are polypeptide chains, package Space_filling_model_interface_finder provides an example of an antibody whose disulfide bonds are sought.

Illustration. Fig. TQ_structure_annotator_graph illustrates the annotation functionalities for the so-called domain II of a class II fusion protein [140].

Note that the nodes of the graph correspond to beta sheets of the structure and to the loops connecting them. The solid lines represent salt bridges and disulfide bonds between these elements.

Annotating the domain II of a class II fusion protein–structure and graph

The specification file used to represent the domain is:

domains-template-begin EFF1
a 111-122
b 125-134
c 163-170
d 184-193
cd 171-183
e 199-204
f 269-274
h 303-306
h’ 309-313
i 324-330
j 334-340
k 364-368
l 375-380
end
chains-enumeration-begin
A like EFF1
BC
end
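The format of this file is only illustrated here, not specified. The following sketch reads the domains-template block under the assumption, inferred from this single example, that each line reads "label first-last"; the chains-enumeration block is not interpreted. The function read_domains_template is hypothetical; the actual loading is done by the labels loader of Primitive_labels_loader.hpp, used in the examples below.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Domain {
  std::string label;
  int begin;
  int end;
};

// Reads the "domains-template-begin NAME ... end" block of a specification
// stream; each line inside is "label first-last". Other blocks are skipped.
std::vector<Domain> read_domains_template(std::istream& in, std::string& template_name) {
  std::vector<Domain> domains;
  std::string line;
  bool inside = false;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string first;
    if (!(fields >> first)) continue;  // blank line
    if (!inside) {
      if (first == "domains-template-begin") {
        fields >> template_name;
        inside = true;
      }
      continue;
    }
    if (first == "end") break;
    int begin = 0, end = 0;
    char dash = 0;
    if (fields >> begin >> dash >> end && dash == '-' && begin <= end)
      domains.push_back({first, begin, end});
  }
  return domains;
}
```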

The Graphviz command used to generate the graph is:

dot -Tsvg interface.dot -o interfaces.svg

Algorithms

Annotating a structure requires finding salt bridges and disulfide bonds; see the package Pointwise_interactions.
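The exact criteria used are those of the package Pointwise_interactions and are not restated here. As a hedged illustration only, the sketch below uses two common geometric criteria, both assumptions of this sketch: a disulfide bond when the SG atoms of two cysteines are at most 2.5 Å apart, and a salt bridge when a side-chain oxygen of Asp/Glu and a side-chain nitrogen of Lys/Arg are at most 4.0 Å apart. It uses a brute-force quadratic loop over a single chain; Atom and find_bonds are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Minimal atom record; residue ids are assumed unique (single chain).
struct Atom {
  std::string residue_name;  // e.g. "CYS"
  std::string atom_name;     // e.g. "SG"
  int residue_id;
  double x, y, z;
};

double atom_distance(const Atom& a, const Atom& b) {
  const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
  return std::sqrt(dx * dx + dy * dy + dz * dz);
}

bool is_acidic_oxygen(const Atom& a) {  // Asp OD1/OD2, Glu OE1/OE2
  return (a.residue_name == "ASP" && (a.atom_name == "OD1" || a.atom_name == "OD2")) ||
         (a.residue_name == "GLU" && (a.atom_name == "OE1" || a.atom_name == "OE2"));
}

bool is_basic_nitrogen(const Atom& a) {  // Lys NZ, Arg NE/NH1/NH2
  return (a.residue_name == "LYS" && a.atom_name == "NZ") ||
         (a.residue_name == "ARG" &&
          (a.atom_name == "NE" || a.atom_name == "NH1" || a.atom_name == "NH2"));
}

using Residue_pairs = std::set<std::pair<int, int>>;

// Brute-force O(n^2) search. Returns {disulfide bonds, salt bridges} as residue pairs.
std::pair<Residue_pairs, Residue_pairs> find_bonds(const std::vector<Atom>& atoms,
                                                   double ss_max = 2.5, double sb_max = 4.0) {
  Residue_pairs disulfides, salt_bridges;
  for (std::size_t i = 0; i < atoms.size(); ++i) {
    for (std::size_t j = i + 1; j < atoms.size(); ++j) {
      const Atom& a = atoms[i];
      const Atom& b = atoms[j];
      if (a.residue_id == b.residue_id) continue;
      const double d = atom_distance(a, b);
      const std::pair<int, int> key{std::min(a.residue_id, b.residue_id),
                                    std::max(a.residue_id, b.residue_id)};
      const bool both_cys_sg = a.residue_name == "CYS" && a.atom_name == "SG" &&
                               b.residue_name == "CYS" && b.atom_name == "SG";
      if (both_cys_sg && d <= ss_max) disulfides.insert(key);
      const bool charged_pair = (is_acidic_oxygen(a) && is_basic_nitrogen(b)) ||
                                (is_acidic_oxygen(b) && is_basic_nitrogen(a));
      if (charged_pair && d <= sb_max) salt_bridges.insert(key);
    }
  }
  return {disulfides, salt_bridges};
}
```

On large structures, a spatial grid or similar neighbor search would replace the quadratic loop.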

Implementation and functionalities

Computing the graph

The graph is computed through the functor SBL::CSB::T_Tertiary_quaternary_structure_annotator<ParticleTraits, MolecularSystemLabelsTraits>. The template parameter ParticleTraits defines the representation of a particle and the base molecular system used to represent a PDB file, as defined in the ESBTL library. In particular, the complete hierarchy defined by such a system makes it easy to navigate through polypeptide chains, residues and atoms; see ParticleTraits for more details. The template parameter MolecularSystemLabelsTraits defines the labels attached to the particles. In particular, it is used to dissect the input molecule into chains (SBL::Models::T_Chain_label_traits) or domains (SBL::Models::T_Domain_label_traits); see MolecularSystemLabelsTraits for more details.
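The role of the labels template parameter can be illustrated with a toy, self-contained sketch: generic code sees particles only through a labeling policy, so swapping a chain labeling for a domain labeling changes the units without touching the algorithm. Toy_particle, Toy_chain_labels, Toy_domain_labels and collect_units are hypothetical stand-ins, not the interfaces of ParticleTraits or MolecularSystemLabelsTraits.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy particle: chain identifier and residue id (stand-in for an ESBTL atom).
struct Toy_particle {
  char chain;
  int residue_id;
};

// Labels a particle by its chain (the role played by T_Chain_label_traits).
struct Toy_chain_labels {
  std::string operator()(const Toy_particle& p) const { return std::string(1, p.chain); }
};

// Labels a particle by the domain containing it (the role of T_Domain_label_traits).
struct Toy_domain_labels {
  std::map<std::string, std::pair<int, int>> domains;  // label -> [first, last]
  std::string operator()(const Toy_particle& p) const {
    for (const auto& [label, range] : domains)
      if (range.first <= p.residue_id && p.residue_id <= range.second) return label;
    return "";  // particle outside every domain
  }
};

// Generic code only sees particles through the labeling policy, so the same
// function yields chain-level or domain-level units.
template <class Particle, class Labels>
std::set<std::string> collect_units(const std::vector<Particle>& particles, const Labels& labels) {
  std::set<std::string> units;
  for (const Particle& p : particles) {
    const std::string label = labels(p);
    if (!label.empty()) units.insert(label);
  }
  return units;
}
```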

The class SBL::CSB::T_Tertiary_quaternary_structure_annotator is a functor that takes as input a molecular model (see ESBTL) and returns a graph of type SBL::CSB::T_Tertiary_quaternary_structure_annotator::Interfaces_graph. It also takes an optional Boolean tag specifying whether salt bridges and disulfide bonds should be searched for within a unit; since such bonds connect a unit to itself, they yield loops in the output graph. The tag is false by default.

Visualizing the graph

To visualize the graph, the static method SBL::CSB::T_Tertiary_quaternary_structure_annotator::print dumps it into a .dot file, the file format used by the Graphviz software. The following command generates a PDF image of an output graph:

dot -Tpdf interface.dot -o interfaces.pdf 
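To show what such a file may contain, here is a minimal sketch writing a graph in the DOT language with the conventions described above: vertices decorated with a/b, dashed edges for the primary sequence, bold edges for S-B and S-S. It does not reproduce the exact output of print; Dot_vertex, Dot_edge and print_dot are hypothetical, and labels are assumed not to contain double quotes.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Vertex of the interfaces graph, displayed as "label" and "a/b".
struct Dot_vertex {
  std::string label;
  int resolved;   // a: residues present in the structure
  int specified;  // b: residues specified by the interval
};

// Edge: sequence (dashed, text = gap interval) or biophysical (bold, e.g. "S-B: 2").
struct Dot_edge {
  std::string source, target, text;
  bool sequence;
};

// Writes an undirected graph in the Graphviz DOT language.
void print_dot(const std::vector<Dot_vertex>& vertices, const std::vector<Dot_edge>& edges,
               std::ostream& out) {
  out << "graph interfaces {\n";
  for (const Dot_vertex& v : vertices)
    out << "  \"" << v.label << "\" [label=\"" << v.label << "\\n"
        << v.resolved << "/" << v.specified << "\"];\n";
  for (const Dot_edge& e : edges)
    out << "  \"" << e.source << "\" -- \"" << e.target << "\" [style="
        << (e.sequence ? "dashed" : "bold") << ", label=\"" << e.text << "\"];\n";
  out << "}\n";
}
```

Writing into a std::ofstream instead of a string stream produces a file that the dot command above can render.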

Module

The package provides a module (see Module_base), SBL::Modules::T_Tertiary_quaternary_structure_annotator_module, which makes it easy to tune the parameters of the different bonds and bridges (see the package Pointwise_interactions), to dump statistics, and to report the output graph in .dot format or for molecular visualisation software such as VMD or PyMOL.

Examples

The following examples show how to use the class T_Tertiary_quaternary_structure_annotator in two contexts: interfaces between polypeptide chains, and interfaces between domains.

Interfaces between chains

The following example loads an input PDB file and prints a Graphviz file representing the interfaces graph between the polypeptidic chains. A Graphviz file can be processed using the dot software to produce an image of the graph.

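#include <SBL/CSB/Tertiary_quaternary_structure_annotator.hpp>
#include <SBL/Models/Atom_with_hierarchical_info_and_annotations_traits.hpp>
#include <SBL/Models/Chain_label_traits.hpp>
#include <SBL/Models/PDB_file_loader.hpp>
#include <fstream>
#include <iostream>

// Required typedefs, not shown: Molecular_geometry_loader (PDB_file_loader.hpp) and
// Annotator, i.e. SBL::CSB::T_Tertiary_quaternary_structure_annotator instantiated with
// the atom traits of Atom_with_hierarchical_info_and_annotations_traits.hpp and the
// chain labels of SBL::Models::T_Chain_label_traits.
typedef Annotator::Interfaces_graph Interfaces_graph;

int main(int argc, char *argv[]){
  if(argc < 2)
    return -1;

  //Loads a PDB file, ignoring water molecules.
  Molecular_geometry_loader loader;
  loader.set_loaded_water(false);
  loader.add_input_file_name(argv[1]);
  loader.load(true, std::cout);

  //Builds the interfaces graph between chains and dumps it in dot format.
  Annotator annotator;
  Interfaces_graph G = annotator(loader.get_geometric_model());
  std::ofstream out("interfaces.dot");
  Annotator::print(G, out);
  out.close();
  return 0;
}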

Interfaces between domains

The following example loads an input PDB file and prints a Graphviz file representing the interfaces graph between the domains defined in an input specification file. A Graphviz file can be processed using the dot software to produce an image of the graph.

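#include <SBL/CSB/Tertiary_quaternary_structure_annotator.hpp>
#include <SBL/Models/Atom_with_hierarchical_info_and_annotations_traits.hpp>
#include <SBL/Models/Domain_label_traits.hpp>
#include <SBL/IO/Primitive_labels_loader.hpp>
#include <SBL/Models/PDB_file_loader.hpp>
#include <fstream>
#include <iostream>

// Required typedefs, not shown: Labels_loader (Primitive_labels_loader.hpp),
// Molecular_geometry_loader (PDB_file_loader.hpp) and Annotator, i.e.
// SBL::CSB::T_Tertiary_quaternary_structure_annotator instantiated with the atom
// traits and the domain labels of SBL::Models::T_Domain_label_traits.
typedef Annotator::Interfaces_graph Interfaces_graph;

int main(int argc, char *argv[]){
  if(argc < 3)
    return -1;

  //Loads the specification file defining the domains.
  Labels_loader labels_loader;
  labels_loader.get_partner_classifier().set_specification_file(argv[2]);
  labels_loader.load(true, std::cout);

  //Loads a PDB file, ignoring water molecules and hetero atoms.
  Molecular_geometry_loader loader;
  loader.set_loaded_water(false);
  loader.set_loaded_hetatoms(false);
  loader.add_input_file_name(argv[1]);
  loader.load(true, std::cout);

  //Builds the interfaces graph between domains and dumps it in dot format.
  Annotator annotator;
  Interfaces_graph G = annotator(loader.get_geometric_model());
  std::ofstream out("interfaces.dot");
  Annotator::print(G, out);
  out.close();
  return 0;
}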

Using the module

The following example instantiates the module SBL::Modules::T_Tertiary_quaternary_structure_annotator_module to load an input PDB file and print a Graphviz file representing the interfaces graph between the polypeptide chains.

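#include <SBL/Modules/Tertiary_quaternary_structure_annotator_module.hpp>
#include <SBL/Models/Atom_with_hierarchical_info_and_annotations_traits.hpp>
#include <SBL/Models/Chain_label_traits.hpp>
#include <SBL/Models/PDB_file_loader.hpp>
#include <iostream>

struct Traits
{
  // Typedefs of the particle traits and of the chain labels traits
  // (Chain_label_traits.hpp) used by the module, not shown.
};

// Not shown: typedefs for Molecular_geometry_loader (PDB_file_loader.hpp) and
// TQSA_module, the module SBL::Modules::T_Tertiary_quaternary_structure_annotator_module
// (template argument assumed to be Traits).

int main(int argc, char *argv[]){
  if(argc < 2)
    return -1;

  //Loads a PDB file.
  Molecular_geometry_loader loader;
  loader.add_input_file_name(argv[1]);
  loader.load(true, std::cout);

  //Runs the module on the loaded model, then reports statistics and output files.
  TQSA_module module;
  module.get_molecular_model() = &loader.get_geometric_model();
  module.run(3, std::cout);
  module.statistics(std::cout);
  module.report("example_");
  return 0;
}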

Applications

Functionalities of this package are made available via the package Space_filling_model_interface_finder.