Structural Bioinformatics Library
Template C++ / Python API for developping structural bioinformatics applications.
User Manual

Authors: F. Cazals and T. Dreyfus and C. Roth

Transition_graph_of_energy_landscape_builders

This package provides methods to build transition graphs, i.e. graphs connecting local minima of an energy landscape, across index one saddles.

Goals

This package provides various methods dealing with the construction of transition graphs (TG), i.e. graphs connecting local minima and index one saddles (Fig. fig-tg-builder-himmelblau-basins-TG and definition def-transition-graph). The TGs built are meant to be used as input for the analysis presented in the following packages:

In the sequel, we consider a conformational ensemble $ C = { c_1, \dots, c_n} $. We may also assume that the local minimum $ q(c_i) $ associated with each conformation $ c_i $, that is obtained by following the negative gradient of the potential energy, is known. Following classical terminology, this minimization process is called quenching. This set is denoted $ C_q = { q(c_1), \dots, q(c_n)} $. We assume that the (potential) energy $ E(\cdot) $ of each conformation is known.

Methods to build TG are provided in three settings:

  • From a database of stationary points (local minima and index one saddles).

  • From a conformational ensemble $ C\cup C_q $

  • From a dense sampling $ C = { c_1, \dots, c_n} $.

Himmelblau: (Compressed) Transition graph
(A) The landscape of Himmelblau, decomposed into the catchment basins of the four local minima (B) The transition graph, with one node for each critical (i.e., stationary) point (C) The compressed transition graph, where the information associated with saddles is stored in the edges joining local minima.

Using sbl-tg-builder-lrmsd.exe from a database of stationary points

Pre-requisites

When a database of connected critical points (local minima and index one saddles) is available, building a transition graph is trivial.

We note that compressed transition graphs (rather than transition graphs) are built (definition def-compressed-tg), to minimize the memory footprint.

Input: Specifications and File Types

The input consists on five files:

  • a text file listing the conformations of the local minima,
  • a text file listing the energies of the local minima,
  • a text file listing the pairs of minima linked by the transition states, on transition state per line,
  • a text file listing the conformations of the transition states,
  • a text file listing the energies of the transition states

A calculation is launched as follows:

> sbl-tg-builder-lrmsd.exe --from-DB --discard-loops --samples-to-mins data/bln69_databasetransitions.txt --points-file data/bln69_database_minima_conformations.txt --energies data/bln69_database_minima_energies.txt --points-file data/bln69_database_transitions_conformations.txt --energies data/bln69_database_transitions_energies.txt --directory results --verbose --output-prefix --log 
The main options of the program sbl-tg-builder-lrmsd.exe are:
–from-DB: run transition graph building from a database
–discard-loops: remove from the transition graph all transitions incident to only one minimum
–points-file string(= plain text file listing all conformations as D-dimensional points (first for minima): second for transitions)
–energies string(= plain text file listing all energies (first for minima): second for transitions)
–samples-to-mins string: plain text file listing transitions as pairs of indices of minima


Note that points and energies are specified using the same option for both transitions and minima: the minima correspond always to the first specified file.


File Name

Description

Transitions file Transitions as pairs of minima indices
BLN69 conformations file Conformations of sampled transition states of BLN69
Energies file Energies of sampled transition states of BLN69
BLN69 conformations file Conformations of sampled local minima of BLN69
Energies file

Energies of sampled local minima of BLN69

Input files for the run described in section Input: Specifications and File Types .

Output: Specifications and File Types

PreviewFile Name Description
General: log file
Log text file High level information on the run
Moduel TG builder from DB
Transition graph XML archive

Output transition graph serializedinto an XML archive

Output files for the runs described in section Input: Specifications and File Types, classified by modules – see Fig. fig-tgb-workflow .

Using sbl-tg-builder-lrmsd.exe from a set of samples and associated local minima

Pre-requisites

Consider now the case where a plain sampling has been obtained, together with the local minimum associated with each sample (obtained by quenching) [107], [89] . Let us also assume that the transitions themselves have not been computed. In this case, one can locate pairs of conformations, say $ (c_i, c_j) $ satisfying two conditions (see Fig. fig-morse-stable-unstable-manifolds-algo-flow-backward):

  • the conformations $ c_i $ and $ c_j $ are neighbor – their distance (for some suitable distance metric) is less than a user defined threshold;

  • the conformations $ c_i $ and $ c_j $ quench to different local minima.

Such a pairs is said to witness a bifurcation. Note that the same two local minima may be witnessed by multiple pairs. For small enough value of the distance threshold, the sample with lowest energy amidst such pairs witnesses the adjacency of the catchment basins of $ q(c_i) $ and $ q(c_j) $. An algorithm to detect such pairs is sketched below, see section Algorithms and Methods.

Input: Specifications and File Types

The input consists on five files:

  • a text file listing all the conformations,
  • a text file listing the energies of all the conformations,
  • a text file listing the quenched conformations,
  • a text file listing the energies of the quenched conformations,
  • a text file listing the pairs of indices (conformations, quenched conformations).

A calculation is launched as follows:

> sbl-tg-builder-lrmsd.exe --from-sampling --points-file data/bln69_sampling_samples_conformations.txt --energies data/bln69_sampling_samples_energies.txt --samples-to-mins data/bln69_sampling_samples_to_quench.txt --points-file data/bln69_sampling_quenched_conformations.txt --energies data/bln69_sampling_quenched_energies.txt  --directory results --verbose --output-prefix --log 
The main options of the program bl-tg-builder-lrmsd.exe are:
–from-sampling: run transition graph building from a sampling
–points-file string: plain text file listing all conformations or quenched conformations as D-dimensional points
–energies string: plain text file listing all energies of conformations or quenched conformations
–samples-to-mins string(= plain text file listing pairs of indices (conformations): quenched conformations)


File Name

Description

BLN69 conformations file A small sampling of BLN69
BLN69 energies file Energies of conformations in the sampling
BLN69 conformations file Quenched conformations from the sampling
BLN69 energies file Energies of the quenched conformations
Indices pairs file

List of pairs (conformation, quenched conformation) Input files for the run described in section Input: Specifications and File Types .

Input files for the run described in section Input: Specifications and File Types .

Output: Specifications and File Types

PreviewFile Name

Description

General: log file and weighted TG
Log file Log file containing high level information on the run
Transition graph XML archive

Transition graph of input sampling serialized into an XML archive

Module MTB analysis: minima weights in TG

Morse Smale Witten chain complex xml file

Morse Smale Witten chain complex serialized using Boost into an XML archive

Stable manifold partition xml file

XML archive listing the samples repartition by persistent basin

Disconnectivity forest image file

Disconnectivity forest drawn in eps file formnat

Sorted basins plain text file

List of all basins sorted by persistence

Persistence diagram plot script

Gnuplot script for the persistence diagram

Persistences plain text file

List of all finite persistences

Persistence histogram plot script

R script for the persistence histogram

Persistence histogram image

Persistence histogram drawn by R in pdf file format

Output files for the run described in section Input: Specifications and File Types, classified by modules – see Fig. fig-tgb-workflow .

Using sbl-tg-builder-euclid.exe from a set of samples

Pre-requisites

Consider finally the cases where only samples are known – that is the local minimum $ q(c_i) $ associated with a conformation $ c_i $ is unknown. Also assume that the neighborhood of $ c_i $ has been densely sampled, and that $ c_i $ has been connected to a set of nearest neighbors, using a nearest neighbor graph (NNG, definition def-nng). We estimate the gradient of the potential energy at $ c_i $ using its neighbors in the NNG. More precisely, we estimate the gradient using the neighbor $ c_k $ of $ c_i $ yielding the maximum slope. Starting from $ c_i $ and iteratively following this estimated gradient eventually leads to a sample having all its neighbors above it. One can use this sample as a substitute $ \tilde{q}(c_i) $ for $ q(c_i) $.

Input: Specifications and File Types

The input consists on two files:

  • a text file listing all the conformations,
  • a text file listing the energies of all the conformations.

A calculation is launched as follows:

> sbl-tg-builder-euclid.exe --from-sampling --points-file data/himmelbleau_sampling.txt --energies data/himmelbleau_heights.txt --persistence-threshold 10 --directory results --verbose --output-prefix --log 
The main options of the program bl-tg-builder-euclid.exe are:
–from-sampling: run transition graph building from a sampling
–points-file string: plain text file listing all conformations as D-dimensional points
–energies string: plain text file listing all energies
–persistence-threshold float: persistence threshold for removing not-persistent basins due to noisy data


File Name

Description

2D points conformations file 2D points on a grid for sampling Himmelbleau function
Energies file Height of points in the Himmelbleau sampling
input files for the run described in section Input: Specifications and File Types .

Output: Specifications and File Types

PreviewFile Name

Description

General: log file

Transition graph XML archive

Transition graph of input sampling serialized into an XML archive

Log file

Log file containing high level information on the run

Module NNG builder

NNG xml file

Nearest neighbor graph serialized using Boost into an XML archive

Module MTB abalysis: minima weights in TG

Morse Smale Witten chain complex xml file

Morse Smale Witten chain complex serialized using Boost into an XML archive

Stable manifold partition xml file

XML archive listing the samples repartition by persistent basin

Disconnectivity forest image file

Disconnectivity forest drawn in eps file formnat

Sorted basins plain text file

List of all basins sorted by persistence

Persistence diagram plot script

Gnuplot script for the persistence diagram

Persistence diagram image

Persistence diagram drawn by gnuplot in pdf file format

Persistences plain text file

List of all finite persistences

Persistence histogram plot script

R script for the persistence histogram

Persistence histogram image

Persistence histogram drawn by R in pdf file format

Output files for the run described in section Input: Specifications and File Types, classified by modules – see Fig. fig-tgb-workflow .

Algorithms and Methods

In the following, we sketch the construction of the TG from a collection of points and associated local minima. The reader is also referred to the package Morse_theory_based_analyzer for further details (see also the MTBA module in the workflow, Fig. fig-tgb-workflow).

Assume that one is given a conformational ensemble $ C $ together with the corresponding quenched points $ C_q $. The algorithm consists of two main steps (Fig. fig-morse-stable-unstable-manifolds-algo-flow-backward):

  • Step 1: Nearest neighbor graph. We build a NNG on the conformations $ C $ – see package Nearest_neighbors_graph_builder, and in addition endow each sample $ c_i $ with an edge to $ q(c_i) $. Note that this connection directly links each sample to its local minimum.

  • Step 2: Identification of transitions. Samples are sorted and processed by increasing energy. Upon inspecting a sample, we focus on its lower star, that is the subset of its neighbors with lower energy. We aim at identifying pairs of samples $ (c_i,c_j) $, such that (i) $ c_j $ belongs to the lower star of $ c_i $ (recall that samples are processed by increasing height), and (ii) $ q(c_i) \neq q(c_j) $. We also say that these samples witness a bifurcation. When such a pair of samples is found, we record the path from $ q(c_i) $ to $ q(c_j) $ through the samples $ c_i $ and $ c_j $. The collection of all such paths defines a transition graph between the vertices from the set $ C_q $.
The following remarks are in order:
  • The bifurcations, as defined above, may involve more than two local minima. This is e.g. the case if a sample has in its lower star two or more samples flowing to distinct minima. This situation may occur in different settings: (i) in case of coarse sampling, the connected samples are located in the basins of several minima, (ii) the samples are located in the neighborhood of a critical point of index $ >1 $, or (iii) they are located in the neighborhood of a degenerate critical point (a so-called monkey saddle). Our method, which uses solely the samples, does not distinguish between these situations.
  • The saddles detected provide pessimistic estimates of barriers' heights. Also, they may not be close to the real saddles (i.e., critical points of index one of the PEL). The intuitive reason for this is that the stable manifold (catchment basin) of an index one saddle has dimension $ d-1 $ – the dimension of the whole configuration space minus one, which leaves ample space.
But they nevertheless indicate transitions between two basins associated with $ q(c_i) $ and $ q(c_j) $.


Identifying the connexion between the basins of two local minima by quenching two samples $ c_i $ and $ c_j $
Sample $ c_i $ is quenched to $ q(c_i) $ while $ c_j $ is quenched to $ q(c_j) $. The samples $ c_i $ and $ c_j $ sit across the (white) ridge separating the basins of their respective

Programmer's Workflow

The programs of Transition_graph_of_energy_landscape_builders described above are based upon generic C++ classes, so that additional versions can easily be developed.

In order to derive other versions, there are two important ingredients, that are the workflow class, and its traits class.

The Traits Class

T_Transition_graph_builder_from_sampled_energy_landscape_traits:

This class defines the main types used in the modules of the workflow. It is templated by the classes of the concepts required by these modules. This design makes it possible to use the same workflow within different(biophysical) contexts to make new programs. To use the workflow T_Transition_graph_builder_from_sampled_energy_landscape_workflow , one needs to define:

  • what is the representation of a conformation (e.g number type of coordinates)
  • what is the distance between two conformations
  • what is the density of a conformation in the sample in a given nearest neighbor graph
Template Parameters
GeometricKernelTraits class defining various geometric objects, in particular the number type used for representing the coordinates of a conformation, and the base representation of a point in dimension D – see class CGAL::Cartesian_d from the CGAL library
DistanceFunctionFunctor returning the distance between two conformations. It has to define the types Point (alias for a Conformation) and FT (alias for the returned number type)
TDensityFunctionFunctor returning the density of a conformation represented by a vertex in a nearest neighbor graph. The template parameter isthe type of the nearest neighbor graph.

The Workflow Class

T_Transition_graph_builder_from_sampled_energy_landscape_workflow: