Structural Bioinformatics Library
Template C++ / Python API for developing structural bioinformatics applications.
User Manual

Authors: F. Cazals and T. Dreyfus and D. Mazauric

Energy_landscape_comparison

This package provides methods to perform various comparisons of two sampled energy landscapes, taking into account the location of local minima, their occupancy probabilities, and possibly their connections as encoded in a transition graph.

Goals

Comparing two (sampled) energy landscapes is of interest in various settings, e.g., to assess the coherence of two force fields for a given system (atomic, coarse grained), to compare two related systems (e.g. a wild type and mutant protein), or simply to compare simulations launched with different initial conditions (and check whether the same regions in conformational space have been visited).

In comparing two landscapes, two categories of criteria are of interest, namely

  • features of the basins, in particular the local minima and their associated occupancy probabilities, called masses for the sake of genericity in the sequel.

  • transitions between these basins.

This package provides methods to compare (sampled) energy landscapes, in two guises:

  • Earth Mover Distance: a comparison method exploiting solely the location of local minima, and the masses of the basins.

  • EMD with connectivity constraints: a comparison method also taking into account the connectivity of the basins.

These functionalities are available in the programs $\sblELCL$ and $\sblELCE$.

Prerequisites

Landscapes modeled as transition graphs

Landscapes and vertex weighted TG. In the sequel, we assume that the energy landscape (EL) is encoded in a compressed transition graph (TG), as defined in section Energy landscapes . We also assume that each vertex is endowed with a mass, typically the volume or the occupancy probability of its catchment basin. The reader is also referred to the package dedicated to the construction of transition graphs, see Transition_graph_of_energy_landscape_builders .

In the following we provide two comparison methods: the first one deals with features of the basins only, while the second one additionally exploits the information on transitions.

Basins and their masses. Consider the basin $B$ of a local minimum. Abusing notations, the footprint of the basin in the conformational space $\calC$ is also denoted $B$. We denote by $k_B$ the Boltzmann constant, by $T$ the temperature, and by $Z$ the partition function.

  • For small systems, the mass of the basin may be obtained by integrating the Boltzmann factor over the basin region, namely:

    $ w(B) = \frac{1}{Z} \int_B \exp{ \frac{-V(c)}{k_B T}} dc. $

  • If minima are found by optimization (e.g., basin hopping), the $w(B)$ can be estimated using the eigenvalues of the Hessian (curvature) matrix evaluated at those points [141] . Such a calculation amounts to focusing on the vibrational entropy of the system.

  • If, on the other hand, the samples are obtained from a thermodynamic ensemble, as is typically the case for molecular dynamics and Monte Carlo procedures, Boltzmann weighting is automatically satisfied; the basin weight can correspondingly be estimated from the number of points $n_B$ falling in the basin region, out of $n$ samples in total:

    $ w(B) = \frac{n_B}{n}. $
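The first and third estimates above can be illustrated on a toy example. The following is a minimal sketch and not the package's implementation: the one-dimensional double-well potential `V`, the integration bounds, and the helper names `mass_by_integration` and `mass_by_counting` are assumptions made for the illustration.

```python
import math
from collections import Counter

def mass_by_integration(V, in_basin, lo, hi, kT=1.0, n=20000):
    """Estimate w(B) = (1/Z) * integral over B of exp(-V(c)/kT) dc in 1D,
    with the midpoint rule on [lo, hi] (assumed to carry almost all the mass)."""
    h = (hi - lo) / n
    z = 0.0      # approximates the partition function Z
    z_b = 0.0    # approximates the integral restricted to the basin B
    for i in range(n):
        x = lo + (i + 0.5) * h
        term = math.exp(-V(x) / kT) * h
        z += term
        if in_basin(x):
            z_b += term
    return z_b / z

def mass_by_counting(sample_to_min):
    """Estimate w(B) = n_B / n from the local minimum assigned to each sample."""
    n = len(sample_to_min)
    return {m: c / n for m, c in Counter(sample_to_min).items()}

# Hypothetical symmetric double well, with minima at x = -1 and x = +1.
def V(x):
    return (x * x - 1.0) ** 2

w_left = mass_by_integration(V, lambda x: x < 0.0, -3.0, 3.0)
weights = mass_by_counting(["m0", "m1", "m0", "m0"])
print(w_left, weights)
```

By symmetry of the potential, the left basin should receive half of the mass; with samples drawn from a thermodynamic ensemble, the counting estimate requires only the sample-to-minimum assignment.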

Earth mover distance

Consider two landscapes, for which masses of the basins have been computed.

For the sake of exposure, we call these two landscapes the source landscape $\PELsource$ and the demand landscape $\PELdemand$. We also denote $n_s$ and $n_d$, respectively, their numbers of basins.

The local minima and the associated basins of the source landscape are denoted $s_i$ and $\basinof{s_i}$, respectively. For the demand landscape, local minima and basins are denoted $d_j$ and $\basinof{d_j}$.

We also assume that the weights of the basins have been computed. Denoting these $\sbasinw$ and $\dbasinw$ for a source and demand basin respectively, we define the sums of weights $W_s = \sum_i \sbasinw$ and $W_d = \sum_j \dbasinw$. Finally, we assume that the conformational space is equipped with a distance $\dCalC$, so that the distance between the two aforementioned local minima is $\dCalC(\PELsourcemin{i}, \PELdemandmin{j})$.

To compare the two landscapes, we use the earth mover distance [125] , which is a particular case of mass transportation [138] . Intuitively, the technique fills basins in the target (aka demand) landscape using mass from basins of the source landscape. A basin from $\PELsource$ can be split into several parts, and equivalently, a basin from $\PELdemand$ can be filled from several basins from $\PELsource$ (Fig. fig-comparison-of-basins-energy-landscapes). Denote $\flowij$ the mass from $\basinof{s_i}\in \PELsource$ moved into $\basinof{d_j}\in \PELdemand$. The cost of moving $\flowij$ units of mass depends linearly on the distances between the minima $\PELsourcemin{i}$ associated with $\basinof{s_i}$ and $\PELdemandmin{j}$ associated with $\basinof{d_j}$. A transport plan from $\PELsource$ to $\PELdemand$ is defined by triples $(\PELsourcemin{i}, \PELdemandmin{j}, \flowij)$, with $\flowij>0$. Note that there are at most $n_s\times n_d$ such triples.

Finding the optimal, i.e., least-cost, transport plan amounts to solving the following linear program (LP):

$ LP \begin{cases} \mbox{Min} \sum_{i=1,\dots,n_s, j=1,\dots,n_d} \flowij \times \dCalC(\PELsourcemin{i}, \PELdemandmin{j})\\ \sum_{i =1,\dots,n_s} \flowij = \dbasinw & \forall j \in 1,\dots,n_d, \\ \sum_{j = 1,\dots,n_d} \flowij \leq \sbasinw & \forall i \in 1,\dots,n_s, \\ \flowij \geq 0 & \forall i \in 1,\dots,n_s, \forall j \in 1,\dots,n_d. \end{cases} $

The first equation is the linear functional to be minimized, while the remaining ones define linear constraints. In particular, the second one expresses the fact that every basin from $\PELdemand$ needs to be filled, while the third one indicates that a basin from $\PELsource$ cannot provide more than it contains. (To simplify matters, we have assumed that $W_s \geq W_d$. Handling the case $W_s < W_d$ poses no difficulty, and the reader is referred to [125] .)
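The constraints of the LP can be checked mechanically for a candidate transport plan. The sketch below is only an illustration under stated assumptions: the plan is stored as a dictionary mapping index pairs $(i,j)$ to flows, and the function name `is_feasible` and the tolerance `tol` are hypothetical choices, not part of the package.

```python
def is_feasible(flow, w_s, w_d, tol=1e-9):
    """Check the LP constraints for a plan {(i, j): f_ij},
    given the source masses w_s and the demand masses w_d."""
    # Non-negativity of every flow.
    if any(f < -tol for f in flow.values()):
        return False
    exported = [0.0] * len(w_s)   # total mass leaving each source basin
    received = [0.0] * len(w_d)   # total mass entering each demand basin
    for (i, j), f in flow.items():
        exported[i] += f
        received[j] += f
    # Every demand basin is filled exactly.
    demand_ok = all(abs(received[j] - w_d[j]) <= tol for j in range(len(w_d)))
    # No source basin provides more than it contains.
    supply_ok = all(exported[i] <= w_s[i] + tol for i in range(len(w_s)))
    return demand_ok and supply_ok

# Toy instance with W_s = 1.0 >= W_d = 0.8.
w_s, w_d = [0.6, 0.4], [0.3, 0.5]
good_plan = {(0, 0): 0.3, (0, 1): 0.3, (1, 1): 0.2}
bad_plan = {(0, 0): 0.3, (0, 1): 0.5}   # source basin 0 exports 0.8 > 0.6
print(is_feasible(good_plan, w_s, w_d), is_feasible(bad_plan, w_s, w_d))
```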

Based on this linear program, we introduce the total number of edges, the total flow, the total cost, and the ratio of the total cost to the total flow, known as the earth mover distance [125] :

$ \begin{cases} \numedgesPemd = \sum_{i,j \mid \flowij > 0} 1,\\ \flowPemd = \sum_{i,j} \flowij,\\ \costPemd = \sum_{i,j} \flowij \dCalC(\PELsourcemin{i}, \PELdemandmin{j}),\\ \distPemd = \costPemd / \flowPemd. \end{cases} $
If the sums of basin weights of both landscapes are identical, i.e. $W_s = W_d$, then the LP is symmetric. If not, one should consider solving the two LPs, one in each direction.
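The four statistics above follow directly from a transport plan and the distance matrix. A minimal sketch, assuming the plan is given as a dictionary of flows indexed by $(i,j)$ and `dist[i][j]` holds $\dCalC(\PELsourcemin{i}, \PELdemandmin{j})$; the name `emd_statistics` is hypothetical.

```python
def emd_statistics(flow, dist):
    """Return (number of edges, total flow, total cost, cost / flow)
    for a transport plan {(i, j): f_ij}."""
    num_edges = sum(1 for f in flow.values() if f > 0)
    total_flow = sum(flow.values())
    total_cost = sum(f * dist[i][j] for (i, j), f in flow.items())
    return num_edges, total_flow, total_cost, total_cost / total_flow

# Toy plan between two source and two demand minima.
dist = [[0.0, 1.0],
        [1.0, 0.0]]
plan = {(0, 0): 0.25, (0, 1): 0.25, (1, 1): 0.5}
num_edges, total_flow, total_cost, emd = emd_statistics(plan, dist)
print(num_edges, total_flow, total_cost, emd)
```

Only the edge $(0,1)$ carries mass over a non-zero distance in this toy plan, so the cost reduces to the flow on that edge.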


Comparing two energy landscapes $\PELsource$ and $\PELdemand$
The landscape $\PELsource$ is partitioned into the basins associated with its local minima (three of them in this example), and likewise for $\PELdemand$ (four local minima). Comparing the landscapes is phrased as a mass transportation problem on the bipartite graph defined by the two sets of minima. Note that sets of connected basins from the top landscape are mapped to connected basins of the bottom landscape.

Earth mover distance with connectivity constraints

The previous comparison ignores transitions between local minima. To take these connections into account, we modify the method by imposing connectivity constraints on transport plans. To see whether a transport plan is valid, pick any connected subgraph $S$ from $\PELsource$, that is, $S$ connects selected local minima in $\PELsource$. Let $D$ be the set of vertices from $\PELdemand$ such that for each vertex $d_j$ in $D$, there is at least one edge emanating from a vertex $s_i$ of the subgraph $S$ with $\flowij>0$. The transport plan is called valid iff the subgraph induced by $D$ is also connected. We summarize this discussion with the following definition (see also Fig. fig-emd-LP-not-edge-connectivity-violated):

A transport plan is said to respect connectivity constraints iff any connected set of basins from $\PELsource$ induces, through edges carrying strictly positive flow, a connected set of basins from $\PELdemand$.
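For a given connected set $S$, the definition can be tested by collecting the demand vertices reached through positive flow and checking that they induce a connected subgraph. The sketch below applies the definition to one set $S$ at a time (it does not enumerate all connected sets); the function names, the edge-list encoding of the graphs, and the convention that an empty image counts as connected are assumptions of this illustration.

```python
from collections import deque

def is_connected(vertices, edges):
    """Breadth-first check that `vertices` induce a connected subgraph."""
    vertices = set(vertices)
    if not vertices:
        return True   # convention assumed here for an empty set
    adj = {v: set() for v in vertices}
    for u, v in edges:
        if u in vertices and v in vertices:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(vertices))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u] - seen:
            seen.add(v)
            queue.append(v)
    return seen == vertices

def image_of(S, flow):
    """Demand vertices receiving strictly positive flow from a vertex of S."""
    return {j for (i, j), f in flow.items() if i in S and f > 0}

def respects_connectivity_on(S, flow, source_edges, demand_edges):
    """Apply the definition to one connected set S of source vertices."""
    assert is_connected(S, source_edges), "S must be connected in the source graph"
    return is_connected(image_of(S, flow), demand_edges)

# Two linear chains of four vertices; s_1 exports to d_1 and s_2 to d_3.
chain = [(1, 2), (2, 3), (3, 4)]
flow = {(1, 1): 0.25, (2, 3): 0.25, (3, 3): 0.25, (4, 4): 0.25}
print(respects_connectivity_on({1, 2}, flow, chain, chain))
print(respects_connectivity_on({3, 4}, flow, chain, chain))
```

In this toy instance the set $\{s_1, s_2\}$ maps to $\{d_1, d_3\}$, which is not connected in the demand chain, whereas $\{s_3, s_4\}$ maps to the connected set $\{d_3, d_4\}$.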


An important remark is the following: a transport plan respecting connectivity constraints may not fully satisfy the demand. (It can actually be shown that there exist instances such that no transport plan fully satisfies the demand.)

Since EMD-CC does not necessarily admit a solution fully satisfying the demand, we define the problem Earth Mover Distance with Cost and Connectivity Constraints:

The problem EMD-CCC (maximum-flow under cost and connectivity constraints problem) aims at computing the largest volume of flow that can be supported respecting the connectivity constraints, and such that the total cost is less than a given bound C.


Note that the previous definition calls for two algorithms:

  • algorithm $\text{\algoemdcccg}$: computes a transport plan given an upper bound $C$ on the transport cost.

  • algorithm $\text{\algoemdcccgi}$: computes transport plans whose cost lies in a given interval.

The latter algorithm actually has 2 recursion modes. To see which, assume that an interval $[0,C_{max}]$ of possible costs is given. The two modes are:

Recursion mode: refined. Given two costs $C_{inf}$ and $C_{sup}$, $\text{\algoemdcccg}$ is called three times for the different costs $C_{inf}$, $C=(C_{inf}+C_{sup})/2$, and $C_{sup}$. Then:

  • a) If the two total volumes of flow with cost $C_{inf}$ and $C$ are different, then $\text{\algoemdcccgi}$ is called with the interval $[C_{inf},C]$.

  • b) If the two total volumes of flow with cost $C$ and $C_{sup}$ are different, then $\text{\algoemdcccgi}$ is called with the interval $[C,C_{sup}]$.

Recursion mode: coarse. Similar to refined mode, except that only option b) is considered.
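The two recursion modes can be sketched as a recursive bisection of the cost interval around a black-box solver. This is a hedged illustration only: `solve` stands in for $\text{\algoemdcccg}$ and is replaced here by a hypothetical step function `toy_solver`; the stopping rule `min_width` and the caching of already evaluated costs are assumptions, since the text above does not specify them.

```python
def explore_costs(solve, c_inf, c_sup, mode="refined", min_width=0.5, found=None):
    """Return {cost bound: total flow} collected by bisecting [c_inf, c_sup]."""
    if found is None:
        found = {}
    c_mid = 0.5 * (c_inf + c_sup)
    # Evaluate the solver at the three costs (cached to avoid repeated calls).
    for c in (c_inf, c_mid, c_sup):
        if c not in found:
            found[c] = solve(c)
    if c_sup - c_inf <= min_width:   # assumed stopping rule
        return found
    # Option a): explored in refined mode only.
    if mode == "refined" and found[c_inf] != found[c_mid]:
        explore_costs(solve, c_inf, c_mid, mode, min_width, found)
    # Option b): explored in both modes.
    if found[c_mid] != found[c_sup]:
        explore_costs(solve, c_mid, c_sup, mode, min_width, found)
    return found

def toy_solver(c):
    """Hypothetical stand-in: the supported flow is a step function of the cost bound."""
    if c < 1.0:
        return 0.0
    if c < 3.0:
        return 0.5
    return 1.0

refined = explore_costs(toy_solver, 0.0, 4.0, mode="refined")
coarse = explore_costs(toy_solver, 0.0, 4.0, mode="coarse")
print(sorted(refined), sorted(coarse))
```

On this toy solver, the refined mode samples costs around both jumps of the flow, while the coarse mode only follows the upper half-intervals and therefore evaluates fewer cost bounds.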

The solution of the linear program may not satisfy connectivity constraints
A transport plan between a source and a demand graph, each consisting of a linear chain of four vertices. The vertices of the edge $\{s_1,s_2\}$ of the source graph export towards the vertices $d_1$ and $d_3$ of the demand graph. The subgraph of the demand graph induced by these vertices is not connected: there is no edge linking $d_1$ to $d_3$.

Using sbl-energy-landscape-comparison-euclid.exe and sbl-energy-landscape-comparison-lrmsd.exe

In the following, we specify the two programs sbl-energy-landscape-comparison-euclid.exe and sbl-energy-landscape-comparison-lrmsd.exe, which differ by the metric used to compute distances between the points associated with the vertices of the graphs.

Main specifications

Main options.

The main options of the program sbl-energy-landscape-comparison-euclid.exe are:
--transition-graph string: transition graph XML archive (used twice, for the source and target transition graphs)
--with-connectivity-constraints string: run Earth Mover Distance with connectivity constraints
--symmetric-mode string (when using connectivity constraints): also move mass from target to source


Input.

  • Possibly, the options displayed above.

Main output.

  • (txt format, suffix log.txt) Main log file
  • (xml format, suffix emd_engine.xml) Transport plan in xml format, from first landscape (say A) to the second landscape (say B)
  • (xml format, suffix engine_symmetric.xml) As above, but for the transport plan from B to A
  • (dot format, suffix transportation_plan.dot) Transportation plan in dot format, to be rendered with graphviz (dot, circo)
  • (dot format, suffix transportation_plan_symmetric.dot) As above, but for the transport plan from B to A

Optional output. Comments.

Algorithms and Methods

Earth mover distance

Once the weights of basins have been computed, solving the linear program of Eq. (eq-emd-lp) has polynomial complexity [84] . Practically, various solvers can be used, e.g. the one from the Computational Geometry Algorithms Library [37] , lp_solve, the CPLEX solver from IBM, etc. In the following, the algorithm solving the linear program of Eq. (eq-emd-lp) is called $\text{\algoEMDLP}$ .
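For small instances, the transportation problem can also be solved without an LP library, since it is a minimum-cost flow problem on a bipartite network. The sketch below is not $\text{\algoEMDLP}$ nor the package's code: it swaps the LP solver for a successive-shortest-paths scheme with Bellman-Ford searches, assumes $W_s \geq W_d$, and uses hypothetical names (`emd_min_cost_flow`, `eps`).

```python
import math

def emd_min_cost_flow(w_s, w_d, dist, eps=1e-12):
    """Solve the transportation problem, assuming sum(w_s) >= sum(w_d).
    Returns {(i, j): flow} for the edges carrying positive flow."""
    n_s, n_d = len(w_s), len(w_d)
    src, snk, n = n_s + n_d, n_s + n_d + 1, n_s + n_d + 2
    edges = []                     # edge k: [head, residual capacity, cost]; k ^ 1 is its reverse
    adj = [[] for _ in range(n)]

    def add_edge(u, v, cap, cost):
        adj[u].append(len(edges)); edges.append([v, cap, cost])
        adj[v].append(len(edges)); edges.append([u, 0.0, -cost])

    plan_edges = {}
    for i in range(n_s):
        add_edge(src, i, w_s[i], 0.0)              # supply of source basin i
        for j in range(n_d):
            plan_edges[(i, j)] = len(edges)
            add_edge(i, n_s + j, math.inf, dist[i][j])
    for j in range(n_d):
        add_edge(n_s + j, snk, w_d[j], 0.0)        # demand of basin j

    while True:
        # Bellman-Ford: cheapest residual path from src to snk.
        best, pred = [math.inf] * n, [-1] * n
        best[src] = 0.0
        for _ in range(n - 1):
            changed = False
            for u in range(n):
                if best[u] == math.inf:
                    continue
                for k in adj[u]:
                    v, cap, cost = edges[k]
                    if cap > eps and best[u] + cost < best[v] - eps:
                        best[v], pred[v], changed = best[u] + cost, k, True
            if not changed:
                break
        if best[snk] == math.inf:
            break                                  # no augmenting path left
        # Bottleneck capacity along the path, then augmentation.
        push, v = math.inf, snk
        while v != src:
            push = min(push, edges[pred[v]][1])
            v = edges[pred[v] ^ 1][0]
        v = snk
        while v != src:
            edges[pred[v]][1] -= push
            edges[pred[v] ^ 1][1] += push
            v = edges[pred[v] ^ 1][0]

    # The flow on (i, j) is the residual capacity of the reverse edge.
    return {ij: edges[k ^ 1][1] for ij, k in plan_edges.items() if edges[k ^ 1][1] > eps}

dist = [[0.0, 1.0], [1.0, 0.0]]
plan = emd_min_cost_flow([0.5, 0.5], [0.25, 0.75], dist)
cost = sum(f * dist[i][j] for (i, j), f in plan.items())
print(plan, cost)
```

This scheme is convenient for checking small examples by hand; for larger landscapes, a dedicated LP or network-flow solver such as those cited above remains the natural choice.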

Earth mover distance with connectivity constraints

Finding transport plans respecting connectivity constraints turns out to be a hard combinatorial problem [28] . The problem is not in APX, which means that if $\Pcc \neq \NPcc$ holds, then no polynomial algorithm with constant approximation factor exists.

Following definition def-tp-cc, we provide two algorithms:

  • $\text{\algoemdcccg}$: computes a transport plan respecting connectivity constraints, given a bound C.

  • $\text{\algoemdcccgi}$: computes transport plans for several cost bounds that are not supplied by the user. More precisely, since the maximum cost is upper-bounded by $C_{max}$, $\text{\algoemdcccgi}$ computes different flow solutions by calling $\text{\algoemdcccg}$ in a loop, for different costs in the range $[0,C_{max}]$. There are two modes for determining this set of costs, namely the refined and coarse recursion modes described above.

The greedy algorithm $\text{\algoemdcccg}$ (for earth mover distance with cost and connectivity constraints), which provides admissible solutions respecting connectivity constraints, has been reported in [28] .


Using the previous algorithm(s), in a manner analogous to Eq. (eq-emd-lp-dist), we define the total number of edges, the total flow, the total cost, and their ratio:

$ \begin{cases} \numedgesAemdccc = \sum_{i,j \mid \flowij > 0} 1,\\ \flowAemdccc = \sum_{i,j} \flowij,\\ \costAemdccc = \sum_{i,j} \flowij \dCalC(\PELsourcemin{i}, \PELdemandmin{j}),\\ \distAemdccc = \costAemdccc / \flowAemdccc. \end{cases} $
It can be shown that the Earth Mover Distance as defined by Eq. (eq-emd-lp) yields a metric, provided that $\dCalC$ is itself a metric, and that the sums of weights for the source and the demand graphs are equal [125] . On the other hand, the EMD with connectivity constraints fails to satisfy the triangle inequality [28] . The EMD with connectivity constraints is not symmetric either, but is easily made so by taking a symmetric function of the one-sided quantities. To assess this lack of symmetry, we introduce the following ratios, respectively geared towards the flow and the cost:
$ \begin{cases} \ratiosymFlow = \frac{ \min(\flowAemdccc(A, B), \flowAemdccc(B, A)) } { \max(\flowAemdccc(A, B), \flowAemdccc(B, A))} ,\\ \\ \ratiosymCost = \frac{ \min(\costAemdccc(A, B), \costAemdccc(B, A)) } {\max(\costAemdccc(A, B), \costAemdccc(B, A))}. \end{cases} $
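Both ratios lie in $[0,1]$ and equal 1 when the two one-sided quantities coincide. A minimal sketch, with a hypothetical helper `symmetry_ratio`; the input values are the total flows and costs printed in the BLN69 run with connectivity constraints of the Jupyter demo in this manual, and returning 1 when both quantities vanish is an assumed convention.

```python
def symmetry_ratio(a_to_b, b_to_a):
    """min / max of the two one-sided quantities (flows or costs)."""
    lo, hi = min(a_to_b, b_to_a), max(a_to_b, b_to_a)
    return lo / hi if hi > 0 else 1.0   # convention assumed when both vanish

# Totals reported for the two directions in the BLN69 example.
flow_ab, flow_ba = 0.7316503125753201, 0.6995052012572126
cost_ab, cost_ba = 50.1654268596212, 48.52413363243705

ratio_flow = symmetry_ratio(flow_ab, flow_ba)
ratio_cost = symmetry_ratio(cost_ab, cost_ba)
print(ratio_flow, ratio_cost)
```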


Programmer's Workflow

The programs of Energy_landscape_comparison described above are based upon generic C++ classes, so that additional versions can easily be developed.

In order to derive other versions, there are two important ingredients: the workflow class and its traits class.

The Traits Class

T_Energy_landscape_comparison_traits:

This class defines the main types used in the modules of the workflow. It is templated by the classes of the concepts required by these modules. This design makes it possible to use the same workflow within different (biophysical) contexts to make new programs. To use the workflow T_Energy_landscape_comparison_workflow , one needs to define:

  • what is the representation of a conformation (e.g., the number type of coordinates).

Template Parameters

  • GeometricKernel: Traits class defining various geometric objects, in particular the number type used for representing the coordinates of a conformation, and the base representation of a point in dimension D (see class CGAL::Cartesian_d from the CGAL library).
  • ELSample: Representation of a conformation enriched by a height that is typically its energy.
  • Distance: Functor returning the distance between two conformations. It has to define the type Point representing the conformation, and the type FT representing the returned value type.

The Workflow Class

T_Energy_landscape_comparison_interface_workflow:

Jupyter demo

See the following jupyter notebook:

  • Jupyter notebook file
  • Energy_landscape_comparison

    Energy_landscape_comparison

    Useful functions

    In [1]:
    from SBL import SBL_pytools
    from SBL_pytools import SBL_pytools as sblpyt
    help(sblpyt)
    
    Help on class SBL_pytools in module SBL_pytools:
    
    class SBL_pytools(builtins.object)
     |  Static methods defined here:
     |  
     |  convert_eps_to_png(ifname, osize)
     |  
     |  convert_pdf_to_png(ifname, osize)
     |  
     |  find_and_convert(suffix, odir, osize)
     |      # find file with suffix, convert, and return image file
     |  
     |  find_and_show_images(suffix, odir, osize)
     |  
     |  find_file_in_output_directory(suffix, odir)
     |  
     |  show_eps_file(eps_file, osize)
     |  
     |  show_image(img)
     |  
     |  show_log_file(odir)
     |  
     |  show_pdf_file(pdf_file)
     |  
     |  show_row_n_images(images, size)
     |  
     |  show_text_file(file_suffix, odir)
     |  
     |  show_txt_file(file_suffix, odir)
     |  
     |  ----------------------------------------------------------------------
     |  Data descriptors defined here:
     |  
     |  __dict__
     |      dictionary for instance variables (if defined)
     |  
     |  __weakref__
     |      list of weak references to the object (if defined)
    
    

    Functions

    Options used to compare two landscapes

    The options of the compare method in the next cell are:

    • metric: euclid or lrmsd
    • transGraph1: first transition graph XML archive
    • transGraph2: second transition graph XML archive
    • symmetric: moving also from target to source
    • connectivityConstraints: run Earth Mover Distance with connectivity constraints
    In [2]:
    import re  #regular expressions
    import sys #misc system
    import os
    import pdb
    import shutil # python 3 only
    import matplotlib.pyplot as plt # python 3 only
    from SBL import PALSE
    
    def buildTransitionGraphFromDB(metric, minimaPoints, minimaEnergies, transitionPoints, transitionEnergies, samples2Mins, odir = "tmp-results-tg-db", weights = None, discardLoop = True):
    
        # start from an empty output directory
        if os.path.exists(odir):
            shutil.rmtree(odir)
        os.makedirs(odir)
        
        # check executable exists and is visible
        exe = shutil.which("sbl-tg-builder-%s.exe" % metric)
        if not exe: 
            print("Executable not found")
            return 
        
        cmd = "sbl-tg-builder-%s.exe --from-DB --samples-to-mins %s \
                --points-file %s --energies %s --points-file %s --energies %s \
                --directory %s --verbose --log" \
                % (metric, samples2Mins, minimaPoints, minimaEnergies, transitionPoints, transitionEnergies, odir)
        if discardLoop:
            cmd += " --discard-loops"
        if weights:
            cmd += " --weights %s" % weights
        print(("Executing %s\n" % cmd))        
        os.system(cmd)
        if os.path.isfile("%s/sbl-tg-builder-%s__tg.xml" % (odir, metric)):
            return "%s/sbl-tg-builder-%s__tg.xml" % (odir, metric)
        else:
            print("Something went wrong during TG build and there is no output TG")
            return None
            
                
    def buildTransitionGraphFromMinima(metric, confPoints, confEnergies, quenchedPoints, quenchedEnergies, samples2Mins, odir = "tmp-results-tg-minima"):
    
        # start from an empty output directory
        if os.path.exists(odir):
            shutil.rmtree(odir)
        os.makedirs(odir)
        
        # check executable exists and is visible
        exe = shutil.which("sbl-tg-builder-%s.exe" % metric)
        if not exe: 
            print("Executable not found")
            return 
        
        cmd = "sbl-tg-builder-%s.exe --from-sampling \
                  --points-file %s --energies %s --points-file %s --energies %s --samples-to-mins %s \
                  --directory %s" \
                  % (metric, confPoints, confEnergies, quenchedPoints, quenchedEnergies, samples2Mins, odir)
        print(("Executing %s\n" % cmd))
        os.system(cmd)
        if os.path.isfile("%s/sbl-tg-builder-%s__weighted_transition_graph.xml" % (odir, metric)):
            return "%s/sbl-tg-builder-%s__weighted_transition_graph.xml" % (odir, metric)
        else:
            print("Something went wrong during TG build and there is no output TG")
            return None
        
    # A function to run the calculation
    def compare(metric, transGraph1, transGraph2, symmetric = True, connectivityConstraints = True):
    
        odir = "tmp-results-%s" % metric
        # start from an empty output directory
        if os.path.exists(odir):
            shutil.rmtree(odir)
        os.makedirs(odir)
        
        # check executable exists and is visible
        exe = shutil.which("sbl-energy-landscape-comparison-%s.exe" % metric)
        if not exe:
            print("Executable not found")
            return
        
       
        print(("Using executable %s\n" % exe))
        cmd = "sbl-energy-landscape-comparison-%s.exe --transition-graph %s --transition-graph %s \
                  --directory %s  --log " \
                  % (metric, transGraph1, transGraph2, odir)
        if symmetric:
            cmd += " --symmetric-mode "
        if connectivityConstraints:
            cmd += " --with-connectivity-constraints "
        os.system(cmd)
            
        cmd = "ls %s" % odir
        ofnames = os.popen(cmd).readlines()
        print( ("All output files in %s:" % odir),ofnames)
            
        #find the log file and display log file
        #sblpyt.show_log_file(odir)
        return odir
            
    # Transport plan archives, e.g. sbl-energy-landscape-comparison-lrmsd__emd_engine.xml
    # and sbl-energy-landscape-comparison-lrmsd__emd_engine_symmetric.xml
    # if_suffix: emd_engine.xml or emd_engine_symmetric.xml
    def scatter_plot_cost_flow(odir, if_suffix):
        database = PALSE.PALSE_xml_DB()
        database.load_from_directory(odir, (".*%s" % if_suffix))
        costs = database.get_all_data_values_from_database("transportation_plan/edge-to-cost/item/second", float)[0]
        flows = database.get_all_data_values_from_database("transportation_plan/edge-to-flow/item/second", float)[0]
        
        print("Total cost:", sum(costs))
        print("Total flow:", sum(flows))
        print("\nNormalized cost:", sum(costs)/sum(flows))
        
        plt.scatter(costs, flows)
        plt.xlabel("Cost")
        plt.ylabel("Flow")
        plt.show()
            
            
    
    def plot_transportation_plans(metric, odir):
        # Render both transport plans (A to B and B to A) with graphviz, then show them side by side
        
        of_AB = "sbl-energy-landscape-comparison-%s__transportation_plan.dot" % metric
        of_BA = "sbl-energy-landscape-comparison-%s__transportation_plan_symmetric.dot" % metric
        
        cmd_AB = "dot %s/%s -Tpdf -o tp-AB.pdf" % (odir, of_AB); os.system(cmd_AB)
        cmd_BA = "dot %s/%s -Tpdf -o tp-BA.pdf" % (odir, of_BA); os.system(cmd_BA)
        
        sblpyt.convert_pdf_to_png("tp-AB.pdf", 100)
        sblpyt.convert_pdf_to_png("tp-BA.pdf", 100)
        
        images = ["tp-AB.png", "tp-BA.png"]
        print("Images", images)
        sblpyt.show_row_n_images(images, 100)
        
           
    

    Example 1: himmelblau

    First: comparison with the earth mover distance

    • Run the calculation
    • Plot the flows that circulate on selected privileged edges
    • Plot the transportation plan.
    In [3]:
    odir = compare("euclid", "data/himmelbleau_tg.xml","data/himmelbleau_transition_graph_noisy.xml", \
                connectivityConstraints = False)  
    scatter_plot_cost_flow(odir, "emd_engine.xml")
    scatter_plot_cost_flow(odir, "emd_engine_symmetric.xml")
    plot_transportation_plans("euclid", "tmp-results-euclid")
    
    Using executable /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-energy-landscape-comparison-euclid.exe
    
    All output files in tmp-results-euclid: ['sbl-energy-landscape-comparison-euclid__emd_engine_symmetric.xml\n', 'sbl-energy-landscape-comparison-euclid__emd_engine.xml\n', 'sbl-energy-landscape-comparison-euclid__log.txt\n', 'sbl-energy-landscape-comparison-euclid__transportation_plan.dot\n', 'sbl-energy-landscape-comparison-euclid__transportation_plan_symmetric.dot\n']
    XML: 1 / 1 files were loaded
    
    Total cost: 9.077130258083344
    Total flow: 0.9897999788518064
    
    Normalized cost: 9.170671299279123
    
    XML: 1 / 1 files were loaded
    
    Total cost: 5.507286012172699
    Total flow: 0.9898003360722214
    
    Normalized cost: 5.56403732294839
    
    Images ['tp-AB.png', 'tp-BA.png']
    Figs displayed
    

    Second: comparison imposing connectivity constraints

    In [4]:
    compare("euclid", "data/himmelbleau_tg.xml","data/himmelbleau_transition_graph_noisy.xml",\
                connectivityConstraints = True)
    scatter_plot_cost_flow("tmp-results-euclid", "emd_engine.xml")
    scatter_plot_cost_flow("tmp-results-euclid", "emd_engine_symmetric.xml")
     
    
    Using executable /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-energy-landscape-comparison-euclid.exe
    
    All output files in tmp-results-euclid: ['sbl-energy-landscape-comparison-euclid__emd_engine_symmetric.xml\n', 'sbl-energy-landscape-comparison-euclid__emd_engine.xml\n', 'sbl-energy-landscape-comparison-euclid__log.txt\n', 'sbl-energy-landscape-comparison-euclid__transportation_plan.dot\n', 'sbl-energy-landscape-comparison-euclid__transportation_plan_symmetric.dot\n']
    XML: 1 / 1 files were loaded
    
    Total cost: 9.07713026852846
    Total flow: 0.9898
    
    Normalized cost: 9.17067111389014
    
    XML: 1 / 1 files were loaded
    
    Total cost: 9.07713026852846
    Total flow: 0.9897999999999999
    
    Normalized cost: 9.17067111389014
    

    Example 2: BLN69

    First: comparison with the Earth mover distance

    We compare both landscapes, in both directions. Given that the sums of weights are identical, the transport plan retrieved is symmetric.

    In [5]:
    print("Marker : Calculation Started")
    tg1 = buildTransitionGraphFromDB("lrmsd", "data/bln69_database_minima_conformations_1.txt",\
                                    "data/bln69_database_minima_energies_1.txt",\
                                    "data/bln69_database_transitions_conformations_1.txt",\
                                    "data/bln69_database_transitions_energies_1.txt",\
                                    "data/bln69_database_transitions_1.txt",\
                                     "tmp-results-tg-1",\
                                    "data/bln69_database_minima_weights_1.txt")
    """tg1 = buildTransitionGraphFromMinima("lrmsd", "data/bln69_sampling_samples_conformations.txt",\
                  "data/bln69_sampling_samples_energies.txt",\
                  "data/bln69_sampling_quenched_conformations.txt",\
                  "data/bln69_sampling_quenched_energies.txt",\
                  "data/bln69_sampling_samples_to_quench.txt",\
                  "tmp-results-tg-1") 
    """
    tg2 = buildTransitionGraphFromDB("lrmsd", "data/bln69_database_minima_conformations_2.txt",\
                                    "data/bln69_database_minima_energies_2.txt",\
                                    "data/bln69_database_transitions_conformations_2.txt",\
                                    "data/bln69_database_transitions_energies_2.txt",\
                                    "data/bln69_database_transitions_2.txt",\
                                     "tmp-results-tg-2",\
                                    "data/bln69_database_minima_weights_2.txt")
    
    Marker : Calculation Started
    Executing sbl-tg-builder-lrmsd.exe --from-DB --samples-to-mins data/bln69_database_transitions_1.txt             --points-file data/bln69_database_minima_conformations_1.txt --energies data/bln69_database_minima_energies_1.txt --points-file data/bln69_database_transitions_conformations_1.txt --energies data/bln69_database_transitions_energies_1.txt             --directory tmp-results-tg-1 --verbose --log --discard-loops --weights data/bln69_database_minima_weights_1.txt
    
    Executing sbl-tg-builder-lrmsd.exe --from-DB --samples-to-mins data/bln69_database_transitions_2.txt             --points-file data/bln69_database_minima_conformations_2.txt --energies data/bln69_database_minima_energies_2.txt --points-file data/bln69_database_transitions_conformations_2.txt --energies data/bln69_database_transitions_energies_2.txt             --directory tmp-results-tg-2 --verbose --log --discard-loops --weights data/bln69_database_minima_weights_2.txt
    
    
    In [6]:
    compare("lrmsd", tg1, tg2, connectivityConstraints = False)
    scatter_plot_cost_flow("tmp-results-lrmsd", "emd_engine.xml")
    scatter_plot_cost_flow("tmp-results-lrmsd", "emd_engine_symmetric.xml")
    
    Using executable /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-energy-landscape-comparison-lrmsd.exe
    
    All output files in tmp-results-lrmsd: ['sbl-energy-landscape-comparison-lrmsd__emd_engine_symmetric.xml\n', 'sbl-energy-landscape-comparison-lrmsd__emd_engine.xml\n', 'sbl-energy-landscape-comparison-lrmsd__log.txt\n', 'sbl-energy-landscape-comparison-lrmsd__transportation_plan.dot\n', 'sbl-energy-landscape-comparison-lrmsd__transportation_plan_symmetric.dot\n']
    XML: 1 / 1 files were loaded
    
    Total cost: 82.05229536443949
    Total flow: 0.9999998508942526
    
    Normalized cost: 82.05230759891013
    
    XML: 1 / 1 files were loaded
    
    Total cost: 83.6860646083951
    Total flow: 0.9999998414315452
    
    Normalized cost: 83.68607787836716
    

    Second: comparison with connectivity constraints

    Using connectivity constraints, the transportation plan is no longer symmetric. Note also that connectivity constraints prevent the demand from being fully satisfied: the total flow obtained is strictly less than the maximum total flow.

    In [7]:
    tg1 = buildTransitionGraphFromDB("lrmsd", "data/bln69_database_minima_conformations_1.txt",\
                                    "data/bln69_database_minima_energies_1.txt",\
                                    "data/bln69_database_transitions_conformations_1.txt",\
                                    "data/bln69_database_transitions_energies_1.txt",\
                                    "data/bln69_database_transitions_1.txt",\
                                     "tmp-results-tg-1",\
                                    "data/bln69_database_minima_weights_1.txt")
    """tg1 = buildTransitionGraphFromMinima("lrmsd", "data/bln69_sampling_samples_conformations.txt",\
                  "data/bln69_sampling_samples_energies.txt",\
                  "data/bln69_sampling_quenched_conformations.txt",\
                  "data/bln69_sampling_quenched_energies.txt",\
                  "data/bln69_sampling_samples_to_quench.txt",\
                  "tmp-results-tg-1") 
    """
    tg2 = buildTransitionGraphFromDB("lrmsd", "data/bln69_database_minima_conformations_2.txt",\
                                    "data/bln69_database_minima_energies_2.txt",\
                                    "data/bln69_database_transitions_conformations_2.txt",\
                                    "data/bln69_database_transitions_energies_2.txt",\
                                    "data/bln69_database_transitions_2.txt",\
                                     "tmp-results-tg-2",\
                                    "data/bln69_database_minima_weights_2.txt")
    
    Executing sbl-tg-builder-lrmsd.exe --from-DB --samples-to-mins data/bln69_database_transitions_1.txt             --points-file data/bln69_database_minima_conformations_1.txt --energies data/bln69_database_minima_energies_1.txt --points-file data/bln69_database_transitions_conformations_1.txt --energies data/bln69_database_transitions_energies_1.txt             --directory tmp-results-tg-1 --verbose --log --discard-loops --weights data/bln69_database_minima_weights_1.txt
    
    Executing sbl-tg-builder-lrmsd.exe --from-DB --samples-to-mins data/bln69_database_transitions_2.txt             --points-file data/bln69_database_minima_conformations_2.txt --energies data/bln69_database_minima_energies_2.txt --points-file data/bln69_database_transitions_conformations_2.txt --energies data/bln69_database_transitions_energies_2.txt             --directory tmp-results-tg-2 --verbose --log --discard-loops --weights data/bln69_database_minima_weights_2.txt
    
    
    In [8]:
    compare("lrmsd", tg1, tg2) 
    scatter_plot_cost_flow("tmp-results-lrmsd", "emd_engine.xml")
    scatter_plot_cost_flow("tmp-results-lrmsd", "emd_engine_symmetric.xml")
    
    Using executable /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-energy-landscape-comparison-lrmsd.exe
    
    All output files in tmp-results-lrmsd: ['sbl-energy-landscape-comparison-lrmsd__emd_engine_symmetric.xml\n', 'sbl-energy-landscape-comparison-lrmsd__emd_engine.xml\n', 'sbl-energy-landscape-comparison-lrmsd__log.txt\n', 'sbl-energy-landscape-comparison-lrmsd__transportation_plan.dot\n', 'sbl-energy-landscape-comparison-lrmsd__transportation_plan_symmetric.dot\n']
    XML: 1 / 1 files were loaded
    
    Total cost: 50.1654268596212
    Total flow: 0.7316503125753201
    
    Normalized cost: 68.56475832429429
    
    XML: 1 / 1 files were loaded
    
    Total cost: 48.52413363243705
    Total flow: 0.6995052012572126
    
    Normalized cost: 69.36922491101593