Structural Bioinformatics Library
Template C++ / Python API for developping structural bioinformatics applications.

Authors: F. Cazals and T. Dreyfus and C. Roth
This package provides methods to build transition graphs, i.e. graphs connecting local minima of an energy landscape, across index one saddles.
This package provides various methods dealing with the construction of transition graphs (TG), i.e. graphs connecting local minima and index one saddles (Fig. figtgbuilderhimmelblaubasinsTG and definition deftransitiongraph). The TGs built are meant to be used as input for the analysis presented in the following packages:
In the sequel, we consider a conformational ensemble . We may also assume that the local minimum associated with each conformation , that is obtained by following the negative gradient of the potential energy, is known. Following classical terminology, this minimization process is called quenching. This set is denoted . We assume that the (potential) energy of each conformation is known.
Methods to build TG are provided in three settings:
From a database of stationary points (local minima and index one saddles).
From a conformational ensemble with a set of conformations, and the local minima associated to all conformations – obtained upon quenching.
From a dense sampling .
Himmelblau: (Compressed) Transition graph (A) The landscape of Himmelblau, decomposed into the catchment basins of the four local minima (B) The transition graph, with one node for each critical (i.e., stationary) point (C) The compressed transition graph, where the information associated with saddles is stored in the edges joining local minima. 
We first recall the following two definitions (details deftransitiongraph)):
The construction of the transition graph also output a weight for each local minimum. The weight is expected to code the probability of occupancy of the basin associated with the local minimum. Note that when persistence is used, the weight is the sum of the weights from the basins associated to the persistent local minimum.
Because weights are only on minima, we use the compressed transition graph representation, and set the weights on the vertices. There are two ways to set the weights:
TG built from a DB of stationary points. The weight of a local minimum is the normalized Boltzmann's factor computed using the energy of this local minimum. Beware of units for systems with reduced units, such as BLN69 for which both Boltzmann constant and temperature are set to unity. The following Python script computes the normalized Boltzmann's factor:
import sys, numpy with open(sys.argv[1]) as fp: exps = [numpy.exp(float(l)) for l in fp] vals = [v/numpy.sum(exps) for v in exps] for v in vals: print(v)
TG built from sample points. The weight of a local minimum is the fraction of samples in the basin of the local minimum.
In that case, the user must specify energies only – since weights are computed from the fraction of samples in a basin.
When a database of connected critical points (local minima and index one saddles) is available, building a transition graph is trivial.
We note that compressed transition graphs (rather than transition graphs) are built (definition defcompressedtg), to minimize the memory footprint.
Main options.
Input. The input consists of five files:
Main output.
Optional output.
Comments.
Consider now the case where a plain sampling has been obtained, together with the local minimum associated with each sample (obtained by quenching) [133], [112] . Let us also assume that the transitions themselves have not been computed. In this case, one can locate pairs of conformations, say satisfying two conditions (see Fig. figmorsestableunstablemanifoldsalgoflowbackward):
the conformations and are neighbor – their distance (for some suitable distance metric) is less than a user defined threshold;
Such a pairs is said to witness a bifurcation.
Note that the same two local minima may be witnessed by multiple pairs. For small enough value of the distance threshold, the sample with lowest energy amidst such pairs witnesses the adjacency of the catchment basins of and . An algorithm to detect such pairs is sketched below, see section Algorithms and Methods.
Main options.
Input.
Main output.
Optional output.
Comments.
Consider finally the cases where only samples are known – that is the local minimum associated with a conformation is unknown. Also assume that the neighborhood of has been densely sampled, and that has been connected to a set of nearest neighbors, using a nearest neighbor graph (NNG, definition defnng). We estimate the gradient of the potential energy at using its neighbors in the NNG. More precisely, we estimate the gradient using the neighbor of yielding the maximum slope. Starting from and iteratively following this estimated gradient eventually leads to a sample having all its neighbors above it. One can use this sample as a substitute for .
Main options.
Input.
Main output.
Optional output.
Comments.
In the following, we sketch the construction of the TG from a collection of points and associated local minima. The reader is also referred to the package Morse_theory_based_analyzer for further details (see also the MTBA module in the workflow, Fig. figtgbworkflow).
Assume that one is given a conformational ensemble together with the corresponding quenched points . The algorithm consists of two main steps (Fig. figmorsestableunstablemanifoldsalgoflowbackward):
Step 1: Nearest neighbor graph. We build a NNG on the conformations – see package Nearest_neighbors_graph_builder, and in addition endow each sample with an edge to . Note that this connection directly links each sample to its local minimum.
Identifying the connexion between the basins of two local minima by quenching two samples and Sample is quenched to while is quenched to . The samples and sit across the (white) ridge separating the basins of their respective 
The programs of Transition_graph_of_energy_landscape_builders described above are based upon generic C++ classes, so that additional versions can easily be developed.
In order to derive other versions, there are two important ingredients, that are the workflow class, and its traits class.
T_Transition_graph_builder_from_sampled_energy_landscape_traits:
This class defines the main types used in the modules of the workflow. It is templated by the classes of the concepts required by these modules. This design makes it possible to use the same workflow within different(biophysical) contexts to make new programs. To use the workflow T_Transition_graph_builder_from_sampled_energy_landscape_workflow , one needs to define:
GeometricKernel  Traits class defining various geometric objects, in particular the number type used for representing the coordinates of a conformation, and the base representation of a point in dimension D – see class CGAL::Cartesian_d from the CGAL library 
DistanceFunction  Functor returning the distance between two conformations. It has to define the types Point (alias for a Conformation) and FT (alias for the returned number type) 
TDensityFunction  Functor returning the density of a conformation represented by a vertex in a nearest neighbor graph. The template parameter isthe type of the nearest neighbor graph. 
T_Transition_graph_builder_from_sampled_energy_landscape_workflow:
For an analysis on the star of a vertex in a weighted graph, we recommand the use of the program sblenergylandscapeanalysisgraphstar.exe, defined in the package Energy_landscape_analysis .
See the following jupyter notebook:
Useful functions
from SBL import SBL_pytools
from SBL_pytools import SBL_pytools as sblpyt
help(sblpyt)
When a database of connected critical points (local minima and index one saddles) is available, building a transition graph is trivial.
We note that compressed transition graphs (rather than transition graphs) are built, to minimize the memory footprint.
The options of the runFromDatabase method in the next cell are:
import re #regular expressions
import sys #misc system
import os
import pdb
import shutil # python 3 only
def run_from_database(metric, minimaPoints, minimaEnergies, transitionPoints, transitionEnergies, samples2Mins, weights = None, discardLoop = True):
odir = "tmpresultsdb%s" % metric
if os.path.exists(odir):
os.system("rm rf %s" % odir)
os.system( ("mkdir %s" % odir) )
# check executable exists and is visible
exe = shutil.which("sbltgbuilder%s.exe" % metric)
if not exe:
print("Executable not found")
return
print(("Using executable %s\n" % exe))
cmd = "sbltgbuilder%s.exe fromDB samplestomins %s \
pointsfile %s energies %s pointsfile %s energies %s \
directory %s verbose log" \
% (metric, samples2Mins, minimaPoints, minimaEnergies, transitionPoints, transitionEnergies, odir)
if discardLoop:
cmd += " discardloops"
if weights:
cmd += " weights %s" % weights
print(("Executing %s\n" % cmd))
os.system(cmd)
cmd = "ls %s" % odir
ofnames = os.popen(cmd).readlines()
print( ("All output files in %s:" % odir),ofnames)
#find the log file and display log file
#sblpyt.show_log_file(odir)
# For the BLN69 data, use the lrmsd
print("Marker : Calculation Started")
run_from_database("lrmsd", "data/bln69_database_minima_conformations.txt",\
"data/bln69_database_minima_energies.txt",\
"data/bln69_database_transitions_conformations.txt",\
"data/bln69_database_transitions_energies.txt",\
"data/bln69_database_transitions.txt",\
"data/bln69_database_minima_weights.txt")
#run_from_database("lrmsd", "data/bln69_database_minima_conformations_2.txt",\
# "data/bln69_database_minima_energies_2.txt",\
# "data/bln69_database_transitions_conformations_2.txt",\
# "data/bln69_database_transitions_energies_2.txt",\
# "data/bln69_database_transitions_2.txt",\
# "data/bln69_database_minima_weights_2.txt")
print("Marker : Calculation Ended")
Consider now the case where a plain sampling has been obtained, together with the local minimum associated with each sample (obtained by quenching). Let us also assume that the transitions themselves have not been computed. In this case, one can locate pairs of conformations, say $(c_i, c_j)$ satisfying two conditions:
the conformations $c_i$ and $c_j$ are neighbor â€“ their distance (for some suitable distance metric) is less than a user defined threshold;
the conformations $c_i$ and $c_j$ quench to different local minima.
Such a pairs is said to witness a bifurcation. Note that the same two local minima may be witnessed by multiple pairs. For small enough value of the distance threshold, the sample with lowest energy amidst such pairs witnesses the adjacency of the catchment basins of $q(c_i)$ and $q(c_j)$.
The options of the runFromMinima method in the next cell are:
import re #regular expressions
import sys #misc system
import os
import pdb
import shutil # python 3 only
def run_from_minima(metric, confPoints, confEnergies, quenchedPoints, quenchedEnergies, samples2Mins):
odir = "tmpresultsminima%s" % metric
if os.path.exists(odir):
os.system("rm rf %s" % odir)
os.system( ("mkdir %s" % odir) )
# check executable exists and is visible
exe = shutil.which("sbltgbuilder%s.exe" % metric)
if not exe:
print("Executable not found")
return
print(("Using executable %s\n" % exe))
cmd = "sbltgbuilder%s.exe fromsampling \
pointsfile %s energies %s pointsfile %s energies %s samplestomins %s \
directory %s verbose log numneighbors 8" \
% (metric, confPoints, confEnergies, quenchedPoints, quenchedEnergies, samples2Mins, odir)
print(("Executing %s\n" % cmd))
os.system(cmd)
cmd = "ls %s" % odir
ofnames = os.popen(cmd).readlines()
print("All output files:",ofnames)
#find the log file and display log file
#sblpyt.show_log_file(odir)
images = []
images.append( sblpyt.find_and_convert("disconnectivity_forest.eps", odir, 100) )
images.append( sblpyt.find_and_convert("persistence_diagram.pdf", odir, 100) )
sblpyt.show_row_n_images(images, 100)
print("Marker : Calculation Started")
run_from_minima("euclid", "data/bln69_sampling_samples_conformations.txt",\
#run_from_minima("lrmsd", "data/bln69_sampling_samples_conformations.txt",\
"data/bln69_sampling_samples_energies.txt",\
"data/bln69_sampling_quenched_conformations.txt",\
"data/bln69_sampling_quenched_energies.txt",\
"data/bln69_sampling_samples_to_quench.txt")
print("Marker : Calculation Ended")
Consider finally the cases where only samples are known â€“ that is the local minimum $q(c_i)$ associated with a conformation $c_i$ is unknown. Also assume that the neighborhood of $c_i$ has been densely sampled, and that $c_i$ has been connected to a set of nearest neighbors, using a nearest neighbor graph (NNG). We estimate the gradient of the potential energy at $c_i$ using its neighbors in the NNG. More precisely, we estimate the gradient using the neighbor $c_k$ of $c_i$ yielding the maximum slope. Starting from $c_i$ and iteratively following this estimated gradient eventually leads to a sample having all its neighbors above it. One can use this sample as a substitute $\tilde{q}(c_i)$ for $q(c_i)$.
The options of the runFromSamples method in the next cell are:
import re #regular expressions
import sys #misc system
import os
import pdb
import shutil # python 3 only
def run_from_samples(metric, confPoints, confEnergies, persistence = 1.):
odir = "tmpresultssamples%s" % metric
if os.path.exists(odir):
os.system("rm rf %s" % odir)
os.system( ("mkdir %s" % odir) )
# check executable exists and is visible
exe = shutil.which("sbltgbuilder%s.exe" % metric)
if not exe:
print("Executable not found")
return
print(("Using executable %s\n" % exe))
cmd = "sbltgbuilder%s.exe fromsampling \
pointsfile %s energies %s persistencethreshold %f \
directory %s verbose log numneighbors 8" \
% (metric, confPoints, confEnergies, persistence, odir)
print(("Executing %s\n" % cmd))
os.system(cmd)
cmd = "ls %s" % odir
ofnames = os.popen(cmd).readlines()
print("All output files:",ofnames)
#find the log file and display log file
#sblpyt.show_log_file(odir)
images = []
images.append( sblpyt.find_and_convert("disconnectivity_forest.eps", odir, 100) )
images.append( sblpyt.find_and_convert("persistence_diagram.pdf", odir, 100) )
sblpyt.show_row_n_images(images, 100)
print("Marker : Calculation Started")
#run_from_samples("euclid","data/himmelblau_grid_points.txt", "data/himmelblau_grid_heights.txt")
run_from_samples("euclid","data/himmelblau_rand_points.txt", "data/himmelblau_rand_heights.txt")
print("Marker : Calculation Ended")
print("Marker : Calculation Started")
#run_from_samples("euclid","data/bln69_sampling_samples_conformations.txt",\
#run_from_samples("lrmsd","data/bln69_sampling_samples_conformations.txt",\
# "data/bln69_sampling_samples_energies.txt")
run_from_samples("lrmsd","data/bln69_wales_samples.txt",\
"data/bln69_wales_energies.txt")
print("Marker : Calculation Ended")