Structural Bioinformatics Library
Template C++ / Python API for developping structural bioinformatics applications.

Authors: F. Cazals and T. Dreyfus
In this application, we present various volume based geometric approximation schemes, so as to design (hierarchical) coarse grain (CG) representations of a molecular geometric model. While our focus is on the preservation of molecular volumes, the geometric models generated can further be decorated with biophysical properties.
Consider a molecular geometric model defined as a union of balls, e.g. the solvent accessible model of an atomic resolution system. Also assume that one has a geometric approximation scheme to approximate a collection of particles (atoms or pseudoatoms) by a set of balls, with . In section Algorithms and Methods , we shall actually deal with three such schemes:
As we shall see, volume preserving approximations of atomic models can be obtained using a number of balls equal to 25% of the number of atoms.
Our algorithms are of special interest to coarse grain a molecular geometric model represented by its solvent accessible model. In this context, we actually provide several alternatives:
The aforementioned algorithm can be used iteratively to produce a hierarchy of models. For example, in the context of reconstruction by data integration [5], one may build a hierarchy of coarsetofine models. In a first step, one uses the program to compute an interpolated approximation of a molecule with N (=10000) atoms, using n (=1000) balls. Then, one iteratively feeds the balls of the approximation obtained to to obtain an ever simpler approximation of the same volume. For example, starting with N=10,000 atoms and selecting at each step 10% of the balls yields models of size 1000, 100 and 10.
This section describes how to use the program .
To present the user's workflow, we specify more carefully the geometric approximation schemes introduced in [30] to approximate a collection of balls by a smaller set of balls (Fig. Geometric approximations").
The following pieces of notation are used in the sequel: if refers to a collection of balls, refers to the corresponding domain i.e. .
Geometric approximations: inner / outer / interpolated. Illustrations on a protein complex consisting of 9060 atoms (PDB id: 1fin) 
In this package, we provide a general (and optimal) solution to problem pbinnercoverering. However, we only address the design of concentric outer and interpolated approximations. The reason for doing so is twofold. First, defining an outer approximation by growing the balls of an inner approximation defines a socalled Toleranced Model (TOM), namely a oneparameter family of shapes obtained by linearly interpolating the radii of the balls between the inner and outer radii. Such models proved instrumental to model large protein assemblies plagued with uncertainties on the shapes and positions of the subunits [23] , [54] , [55] . Second, as we shall see, growing tight inner approximations produces tight outer approximations.
The main input of is a collection of balls. The file format used is a text file listing the balls. A basic calculation is launched as follows:
> sblballcovortxt.exe f data/spheres.txt ninnerballs 20 outer interpolated directory results verbose outputprefix log
Note that a default radius of is added to all atoms to define the Solvent Accessible Model of the input molecule.
File Name  Description 
3D Spheres text file  A list of 3D spheres (a line contains the coordinates and the radius of one sphere) 
Preview  File Name  Description 
General: log and visualization file  
Log file  Log file containing high level information on the run of  
Approximations VMD file  VMD visualization state file of inner, outer and interpolated covers  
Module inner approximation  
Inner approximation text file  Text file listing the balls in the inner approximation  
Module outer approximation  
Outer approximation text file  Text file listing the balls in the outer approximation  
Module interpolated approximation  
Interpolated approximation text file  Text file listing the balls in the interpolated approximation 
This section describes how to use the programs and in different contexts. The prerequisites being identical to those of , the reader is referred to section Prerequisites: Volumetric Approximation Schemes .
The main input of is a molecular model loaded from a PDB file. A basic calculation is launched as follows:
> sblballcovorpdbU.exe f data/1vfb.pdb ninnerballs 20 outer interpolated directory results verbose outputprefix log
File Name  Description 
1vfb PDB file  PDB file of an ImmunoglobulineAntibody complex 
Preview  File Name  Description 
General  
Log file  Log file containing high level information on the run of  
Approximations VMD file  VMD visualization state file of inner, outer and interpolated covers  
Inner approximation  
Inner approximation text file  Text file listing the balls in the inner approximation  
Outer approximation  
Outer approximation text file  Text file listing the balls in the outer approximation  
Interpolated approximation  
Interpolated approximation text file  Text file listing the balls in the interpolated approximation 
We provide one example application of the algorithm . Given a macromolecular model, crosslink experiments report pairs of LYS which get crosslinked. In order for two LYS to crosslink, two conditions are necessary: these LYS must be exposed, and their distance must be compatible with the size of the crosslinker.
We can therefore use algorithm to design CG models of a molecular model, so as to CG separately the exposed LYS and the atoms of all remaining a.a.
The SBL provides VMD and PyMOL plugins to use the programs of Space_filling_model_coarse_graining . The plugins are accessible in the Extensions menu of VMD or in the Plugin menu of PyMOL . Upon termination of a calculation launched by the plugin, the following visualizations are available:
In this section, we sketch the algorithms used to compute the three geometric approximations.
The main ingredients of the inner approximation are as follows [30] :
That is the specification of the algorithm requires a collection of balls defining a region , a selection size or a target ratio between the volume of and that of — that is one expects . The output consists of an ordered set of balls , together with the increment in volume associated to each ball.
Lazy ball selection. One comment is in order about the incremental selection of the ball yielding the best increment, called the dominating ball in the sequel. We perform this operation using the following lazy strategy.
As a preprocessing step, we compute the intersection graph of the candidate balls, namely the graph with one vertex per ball , and one edge for every pair of intersecting balls. Incidences in this graph are used to identify the balls whose volume increment needs to be recomputed.
More precisely:
Denote the bounary of the domain . To solve the outer approximation problem, we enlarge the balls from the set , so as to cover the portion of sticking out from .
To see how, recall that the Apollonius diagram of a collection of balls is the Voronoi diagram defined for the following generalized distance [16] :
.
Note that the Apollonius distance is merely the Euclidean distance to the sphere bounding . For each ball of the selection , consider the restriction of the boundary of the input domain not covered yet by the domain , to its Voronoi cell in the Apollonius diagram.
If this restriction is non empty, we define the expansion radius of by the maximum distance of a point of that region to , that is:
If this restriction is empty, the expansion radius is set to the original radius, that is . Expanding each ball of the selection by its expansion radius yields an outer cover of the input domain.
The class T_Space_filling_model_coarse_graining_outer_cover implements this computation without computing the partition of the boundary of the input object induced by the Apollonius Voronoi diagram.
Consider an inner approximation together with the associated concentric outer approximation, and denote and the inner and outer radii of the ith ball.
Given a parameter , we define the interpolated radius of the th ball as
An interpolated approximation is the union of these interpolated balls; it is called volume preserving if its volume matches that of the input shape.
Increasing the value of in Eq. eqinterpolatedradius yields nested balls, whence nested interpolated approximations. Therefore, finding the volume preserving interpolated approximation requires a binary search on . Practically, the binary search is stopped when the discrepancy between the volumes is less than .
The class T_Space_filling_model_coarse_graining_interpolated_cover implements this algorithm.
The programs of Space_filling_model_coarse_graining described above are based upon generic C++ classes, so that additional versions can easily be developed.
In order to derive such versions, there are two important ingredients, that are the workflow class, and its traits class.
T_Space_filling_model_coarse_graining_traits:
This class defines the main types used in the modules of the workflow. It is templated by the classes of the concepts required by these modules. This design makes it possible to use the same workflow within different(biophysical) contexts to make new programs. To use the workflow T_Space_filling_model_coarse_graining_workflow , one needs to define:
ParticleTraits  Model of the concept ParticleTraits. 
MolecularGeometryLoader  Follow the concept MolecularGeometryLoader and can be either SBL::Models::T_PDB_file_loader when the input is a PDB file, or SBL::Models::T_Spheres_3_file_loader when the input is a file listing spheres. 
ParticleAnnotator  Follow the concept ParticleAnnotator. 
ParticlesBuilder  Functor building the input particles, as defined in the concept ParticleTraits. 
T_Space_filling_model_coarse_graining_workflow: