Structural Bioinformatics Library
Template C++ / Python API for developping structural bioinformatics applications.

Authors: F. Cazals and T. Dreyfus and N. MalodDognin
A binding patch (patch for short) of a molecular model is a set of atoms involved in an interaction with a partner in a complex, as defined in the applications of Space_filling_model_shelling_diagram_surface_encoding.
The classical way to compare two patches and consists of finding subsets of atoms and , in onetoone correspondence – in particular and are of the same size, in such a way that there exists a rigid motion superimposing onto . The quality of such a match is typically assessed using the least RMSD.
This approach has two main limitations:
Space_filling_model_shelling_diagram_comparison gets around these two difficulties by proposing structural comparisons favoring either geometric or topological features, as illustrated on Fig. figlegointerfaceflatcurvedmontage.
To present them, recall that Space_filling_model_shelling_diagram_surface_encoding computes a topological encoding of binding patches, by aggregating the atoms of a binding patch into shells, a shell being a collection of neighboring atoms located at a fixed distance of the binding patch boundary. The relative position of these shells yields an encoding of the patch as an ordered tree known as the atom shelling tree, see Fig. figshellingtree.
The fact that the tree is ordered is important: using dynamic programming, ordered trees can be compared via a simple operation known as the Tree Edit Distance (TED) [22] . In a nutshell, the algorithm for TED is a generic algorithm computing the cost of morphing a tree into a tree , based upon three costs, namely the deletion of a node from , the insertion of a node into , and the transformation of a node of into a node of .
Interestingly, adjusting the semantics of the three aforementioned operations allows defining two types of TED calculations:
Notations For both the intersection , the symmetric difference and the similarity score , stands for the used methodology ( : topological TED; : Clique; : geometrical TED). Abusing terminology, we shall indistinctly speak of a shell or its node representation in the shelling tree.
An idealized illustration of the topological rigidity handled in Space_filling_model_shelling_diagram_comparison For the sake of illustration, a fictitious binding patch is presented, each lego block meant to represent one atom. The color coding of the atoms codes the shelling order of the patch, as defined in Space_filling_model_shelling_diagram_surface_encoding. From left to right, the binding patch bends: while the relative position of the atoms in 3D changes, each particular atom retains exactly the same neighbors, which we illustrate with the blue atom. Such a binding patch is termed geometrically flexible, but topologically rigid. Space_filling_model_shelling_diagram_comparison provides methods to assess the rigidity / flexibility of binding patches, either from a geometric or topological standpoint. 
This section presents the program .
Isotopology. A shell of a shelling tree is an ancestor of a shell if there is an oriented path from to in the shelling tree. Two shells are isotopologic to the shells if either:
(i) and , or
(ii) is an ancestor of and is an ancestor of , or,
Two atoms of and two atoms of are isotopologic iff the shells containing them are isotopologic. Between and , a subset of atoms is isotopologic to a subset if there exists a onetoone mapping between the atoms of and the ones of such that any two atoms in are isotopologic to their counterparts in .
Problem statement. The comparison problem reduces to an ordered Tree Edit Distance problem (TED) having the following edition costs.
Adding or deleting a shell has a cost of , since all the atoms of are added/removed. Morphing a shell into a shell has cost equal to the size of their symmetric difference:
.
Denoting the number of isotopologic atoms between and , is the size of the symmetric difference
.
The similarity between two trees is then the number of their isotopologic atoms normalized by the size of the two trees to be in [0,1]:
Quasiisometry. Given two shelling trees and , two atoms and from are quasiisometric to the atoms and from if the euclidean distances (between and ) and (between and ) are such that , with a distance threshold circa 2 .
Between and , a subset of atoms is quasiisometric to a subset if there exists a onetoone matching between the atoms of and the ones of such that any two atoms in are quasiisometric to their counterparts in . Such matchings have the property that the corresponding Root Mean Squared Deviation of internal distances ( ) is smaller than .
Problem statement.
To favor geometric comparisons, we are interested in finding the largest subset of atoms between and satisfying the aforementioned quasiisometry condition.
Mimicking Eq. eqsimtedt, the similarity between and is:
It can be shown that the previous calculation reduces to a maximum clique calculation [36] .
To strike a balance between geometric and topological criteria, we are interested in finding the subsets of atoms that are isotopologic and where the quasiisometric constraints are satisfied between the matched shells. Meeting these criteria is amenable to a TED calculation, using the following costs.
The costs for inserting/deleting a shell is . The cost for morphing a shell into a shell is equal to the size of their symmetric difference , where is the subset of isotopologic and quasiisometric atoms, as found by applying the Clique method between the two shells and .
Problem statement.
To balance topological and geometric comparisons, we are interested in finding the largest subset of atoms between and that is both quasiisometric and isotopologic.
Let be the number of atoms between and that are both isopologic and quasiisometric.
In this case, is the size of the symmetric difference that is:
.
The similarity between two trees is then the size of normalized by the size of the two trees:
Given the computational complexity of purely geometric comparisons, we focus in the sequel on the topological and mixed comparisons.
Both methods respect the isotopologic constraints, but only Clique respects all the isometric constraints. verifies the isometric constraints only when and (and thus and ) come from the same shell. Finally, in , the isometric constraints are not verified at all.
This implies that the size of the atomic subsets returned by Clique are smaller than the size of the subsets returned by which are smaller than the values found by .
Thus, the following holds:
Also, because of the possibly broken isometric constraints, may return matchings having values larger than .
This section describes the application's options and file formats.
Given the atom shelling trees of two binding patches (XML file format), use the treeeditdistance based methods to measure their dissimilarity, and also record the optimum treeeditscript (the sequence of treeedit operations). An example run of is given as follows:
> sblcompatch.exe shellingforest data/1vfb_atom_shelling_tree_A.xml shellingforest data/1vfb_atom_shelling_tree_B.xml comparisonmode g directory results verbose outputprefix log
File Name  Description 
Source Atom Shelling Forest xml file  Atom shelling tree of the binding patch of the partner A of the IGAg complex 1vfb 
Target Atom Shelling Forest xml file  Atom shelling tree of the binding patch of the partner B of the IGAg complex 1vfb 
There are two output files that are created from the last command line:
During its execution, a record on the main steps undertaken is dumped into the called window. This information can be sent stored a log file with the option l .
Preview  File Name  Description 
General: log file  
Log file  Log file containing high level information on the run of  
Module shelling forests comparison  
Comparison xml file  XML file describing how to move from the source tree to the target tree  
Comparison dot file  dot file for visualizing the matching between the two input trees using GraphViz 
For visualizing the shelling forests comparison, we recommend you to install Graphviz (see the Graphviz web site), and using the dot software for drawing the graph from a .dot file. Note that there are other software from the Graphviz library for drawing the graphs with different embedding.
The computation of isotopologic atom sets reduces to the classical TED, see [22] , with the insertion / deletion / morphing costs discussed above.
The computation of quasiisometric subsets of atoms is rephrased as a maximum clique problem as follows. Let be a graph whose vertex set is depicted by a grid in which each row represents an atom of and each column represents an atom of . Matching the atoms and is represented by the vertex (on row , column ). and such that and , if and are quasiisometric and isotopologic to and then the edge is in . The largest subset of atoms between and that is both quasiisometric and isotopologic, denoted by , corresponds to a maximum clique in .
The maximum clique problem is NPHard [104] , and is solved by using Ostergard's algorithm [137] . Note that the isotopology constraints reduce the number of edges in , thus easing the maximum clique solving process.
The applications of Space_filling_model_shelling_diagram_comparison described above are based upon generic C++ classes, so that additional versions can easily be developed.
In order to derive such versions, there are two important ingredients, that are the workflow class, and its traits class.
T_Space_filling_model_shelling_diagram_comparison_traits:
T_Space_filling_model_shelling_diagram_comparison_workflow: