Structural Bioinformatics Library Template C++ / Python API for developping structural bioinformatics applications.
User Manual

# Earth_mover_distance

Authors: F. Cazals and T. Dreyfus and D. Mazauric

# Introduction

Optimal transportation problems have a long standing history in mathematics and computer science, originating with the works of Monge on earth moving ( << la théorie des déblais et des remblais >> ) [128] . Such problems were later rephrased in terms of Riemannian geometry and measure theory [163] , one key concept being the distance between two distributions, namely the minimal amount of work that must be performed to transform one distribution into the other by moving distribution mass around.

Optimal transportation based metrics are now widely used, with applications in image processing [148] , statistics [114] , geometry processing [156] , biophysics [41] ...

In this context, this package provides two algorithms:

• The classical i.e. linear program based Earth Mover Distance (EMD),

• A version of EMD working on graphs rather than isolated sites, and accommodating connectivity constraints on these graphs.

## Earth Mover Distance

We consider two sets of nodes, namely the source nodes , and the demand nodes respectively. We assume that these nodes are embedded in some metric space , and denote a metric in this space. We assume that each source node has a mass of supply (aka weights) denoted , and equivalently, that each demand node has a mass of demand . The sums of masses for all source and demand nodes are denoted and , that is and . We also assume that transporting a unit mass of supply from to demand node has unit cost .

A transport plan is a mathematical program aiming at satisfying the demand of the demand nodes given the supply of the supply nodes, while minimizing the overall transportation cost. More formally, a transport plan from to is defined by triples , with the so-called flow quantities. Note that there are at most such triples.

Finding the optimal i.e. least cost transport plan amounts to solving the following linear program (LP):

The first equation is the linear functional to be minimized. Constraints (1,2) respectively state that a demand node does not receive more than its demand , and that a source node does not export more than it supply. Constraint (4) states that the total flow circulating is the minimum mass of supply or demand.

Based on this linear program, we introduce the total number of edges, the total flow, the total cost, and their ratio, known as the earth mover distance [148] :

## Earth Mover Distance with connectivity constraints

The basic problem.

Consider now the case where the source and demand nodes are the vertices of two graphs and . The solution provided by the EMD is oblivious to the connectivity of these two graphs. In selected applications, one may wish to preserve connectivity based properties. Typically, one would like a vertex or an edge of to export flow to a connected component of . In the sequel, we specify a new problem such that such constraints are respected–which requires using more stringent transport plans.

More formally, to see whether a transport plan is valid, pick any connected subgraph from . Let be the subgraph of induced by the vertices from such that for each such vertex , there is at least one edge emanating from a vertex of the subgraph with . The transport plan is called valid iff the subgraph is also connected.

We summarize this discussion with the following definition:

A transport plan is said to respect connectivity constraints iff any connected set of vertices from induces, through edges carrying strictly positive flow, a connected set of vertices from .

The following remarks are in order:

• A transport plan respecting connectivity constraints may not fully satisfy the demand.

• It is actually easy to respect all connectivity constraints, by using an arbitrary number of edges which may carry an arbitrarily small amount of flow, and guarantee that connectivity constraints are respected [35] . This motivates the definition of a new problem denoted .

The problem solved: .

An alternative to the ill-posed problem of minimizing the transport cost while conserving connectivity constraints is the so-called maximum-flow under cost and connectivity constraints problem ( ). In this problem one aims at computing the largest amount of flow that can be supported respecting the connectivity constraints and such that the total cost is less than a given bound . We define the following upper bound for the maximum total cost of any admissible flow as

Consider an upper bound , and an upper bound on the number of edges which carry non null flow. Denote and the set of all connected subgraphs of and , respectively. Problem is defined as follows:

Constraints (1,2,3) are the usual constraints found in EMD. Constraint (4) takes into account the upper bound on the transport cost, while constraint (5) bounds the number of edges used. Finally, constraint (6) encodes the connectivity constraints.

# Algorithms

## Earth Mover Distance

Algorithms.

Once the weights of all vertices, solving the linear program of Eq. (emd-eq-emd-lp) has polynomial complexity [102] . Practically, various solvers can be used, e.g. the one from the Computational Geometry Algorithms Library [45] , lp_solve, the CPLEX solver from IBM, etc.

This package currently proposes two options to solve the LP problems of Eq. emd-eq-emd-lp :

The choice of the solver is done via the option –lp-solver (values: lp_solve, clp) of the programs provided by this package, see below.

In the following, the algorithm solving the linear program of Eq. (emd-eq-emd-lp) is called .

## Earth Mover Distance with connectivity constraints

Algorithms.

Finding transport plans respecting connectivity constraints are hard combinatorial problems. The following results are proved in [35] :

• the decision version of is NP-complete,

• a Polynomial Time Approximation Scheme for exists when transport plans involving a large number of edges are allowed.

Practically, we solve with two algorithms denoted and . These algorithms are implemented within the class SBL::CADS::T_Earth_mover_distance_with_connectivity_constraints , and can be called by specifying a recursion mode:

• Recursion mode norec: algorithm . A greedy strategy returning an admissible solution, given a maximum cost provided by the user.

• Recursion mode refined: algorithm . Given a range of costs, this interval is iteratively halved, and the upper bound of each interval is used to compute a solution with this upper bound as constraint.

• Recursion mode coarse: algorithm . As in the previous option, except that upon splitting the interval into two, only the left interval (that associated with the smallest upper bound) is used to recurse.

For a given solution, in a manner analogous to Eq. (emd-eq-emd-lp-dist), we define the total number of edges, the total flow, the total cost, and their ratio:

It can be shown that the Earth Mover Distance as defined by Eq. (emd-eq-emd-lp) yields a metric, provided that is itself a metric, and that the sum of weights for the source and the demand graphs are equal [148] . On the other hand, earth mover distances with connectivity constraints are not symmetric in general [35] , but can easily be made so by taking a symmetric function of the one sided quantities. To assess this lack of symmetry, we introduce the following ratios, respectively geared towards the flow and the cost:

# Implementation and functionality

Vertices information.

The generic paradigm followed in the SBL implies that any kind of data structure could be used as input of the EMD algorithms. In particular, it could be two ensembles of geometric points, or two graphs where vertices are embedded in a metric space. For this reason, the algorithms are templated by a concept VerticesAccessor defining how to access the data used in the algorithms. (NB: an accessor allows iterating over the vertices and their neighbors, if any, and also allows access to the mass of vertices.) Two models of VerticesAccessor are predefined:

EMD algorithms.

Each algorithm is implemented in a separate class as a functor taking as argument the source and the demand, and returning the output transportation plan:

• SBL::CADS::T_Earth_mover_distance < VerticesAccessor , Distance > is the EMD algorithm described in section Earth Mover Distance . The template parameter VerticesAccessor is described just above, and Distance is a distance functor between the objects represented by the vertices.

Connectivity checker.

In order to compare both algorithms described in sections Earth Mover Distance and Earth Mover Distance with connectivity constraints , the class SBL::CADS::T_Earth_mover_distance_connectivity_constraints_checker< EarthMoverDistance > computes the number of edges in a transportation plan that do not respect the connectivity of the demand. The template parameter EarthMoverDistance is one of the EMD algorithms class. Note that for a transportation plan computed from the class SBL::CADS::T_Earth_mover_distance_with_connectivity_constraints, the connectivity must be respected.

Module.

This package provides a module, see Modules, to be integrated into a workflow : SBL::CADS::T_Earth_mover_distance_module< ModuleTraits > . The template parameter ModuleTraits should define two types:

• EMD_vertices_accessor that has to comply the concept VerticesAccessor
• EMD_distance that has to comply with the concept Distance .

This module allows adding options to select the algorithm used, to run symmetric instances of the problems (source and demand are reversed), etc.

In addition to the log file containing statistics on the cost and the flow, two output files are reported :

• The transportation plan, as a Graphviz file, suffixed by __transportation_plan.dot : this file can be processed by Graphviz to produce a visual representation of the transportation plan as a bipartite graph: the vertices shown are the source and demand nodes; for each edge, the flow and cost are represented.

• A serialized XML archive suffixed by __emd_engine.xml : the archive contains in particular the serialization of the transportation plan, to be used with the package PALSE for further analysis. In particular, for each edge of the transportation plan, the flow and associated cost are provided.

# Examples

## Earth Mover Distance

The following example computes the transportation plan of two sets of three weighted 2D points with no connectivity, using the algorithm described in section Earth Mover Distance , and then dumps statistics about the transportation plan.

#include <CGAL/Cartesian.h>
typedef CGAL::Cartesian<double>::Point_2 Point_2;
<Point_2, double> EMD_vertices_accessor;
class Distance
{
public:
typedef double FT;
inline FT operator()(const Point_2& u, const Point_2& v)const
{
return CGAL::sqrt(CGAL::squared_distance(u, v));
}
};
int main(int argc, char *argv[])
{
std::vector<EMD_vertices_accessor::Weighted_vertex> source;
source.push_back(EMD_vertices_accessor::Weighted_vertex(Point_2(0, 0), 4));
source.push_back(EMD_vertices_accessor::Weighted_vertex(Point_2(1, 0), 2));
source.push_back(EMD_vertices_accessor::Weighted_vertex(Point_2(2, 0), 4));
std::vector<EMD_vertices_accessor::Weighted_vertex> demand;
demand.push_back(EMD_vertices_accessor::Weighted_vertex(Point_2(0, 0), 2.5));
demand.push_back(EMD_vertices_accessor::Weighted_vertex(Point_2(0, 1), 5));
demand.push_back(EMD_vertices_accessor::Weighted_vertex(Point_2(0, 2), 2.5));
EMD emd;
EMD::Transportation_plan tplan = emd(source, demand);
std::cout << std::endl << tplan << std::endl << std::endl;
std::cout << "Source flow: " << tplan.get_final_flow() << " / " << tplan.get_total_source() << " = " << tplan.get_percentage_of_used_flow() << std::endl;
std::cout << "Remaining demand: " << tplan.get_remaining_demand() << " / " << tplan.get_total_demand() << " = " << tplan.get_percentage_of_remaining_demand() << std::endl;
std::cout << "Ratio flow/demand: " << tplan.get_demand_flow_ratio() << std::endl;
std::cout << "Transportation cost (normalized): " << tplan.get_transportation_cost() << " / " << tplan.get_final_flow() << " = " << tplan.get_normalized_transportation_cost() << std::endl;
return 0;
}
Transportation plan for the Earth Mover Distance algorithms.
Definition: Earth_mover_distance_transportation_plan.hpp:132
FT get_total_source(void) const
Definition: Earth_mover_distance_transportation_plan.hpp:228
FT get_transportation_cost(void) const
Definition: Earth_mover_distance_transportation_plan.hpp:266
FT get_final_flow(void) const
Definition: Earth_mover_distance_transportation_plan.hpp:238
FT get_demand_flow_ratio(void) const
Definition: Earth_mover_distance_transportation_plan.hpp:248
FT get_remaining_demand(void) const
Definition: Earth_mover_distance_transportation_plan.hpp:253
FT get_total_demand(void) const
Definition: Earth_mover_distance_transportation_plan.hpp:233
FT get_normalized_transportation_cost(void) const
Definition: Earth_mover_distance_transportation_plan.hpp:271
FT get_percentage_of_used_flow(void) const
Definition: Earth_mover_distance_transportation_plan.hpp:243
FT get_percentage_of_remaining_demand(void) const
Definition: Earth_mover_distance_transportation_plan.hpp:261
Accessors for weighted vertices stored in a simple vector with no connectivity.
Definition: Earth_mover_distance_vertices_accessor_vector.hpp:65
std::pair< VertexType, MassType > Weighted_vertex
Definition: Earth_mover_distance_vertices_accessor_vector.hpp:70
Earth mover distance algorithm using lp_solve software for solving linear program problems.
Definition: Earth_mover_distance.hpp:159
static void set_verbose_mode(unsigned verbose)
Definition: Earth_mover_distance.hpp:221
FT operator()(const Conformation &p, const Conformation &q)
Definition: Least_RMSD_cartesian.hpp:234

## Earth Mover Distance with connectivity constraints

The following example computes the transportation plan of two sets of three weighted 2D points with no connectivity, using the algorithm described in section Earth Mover Distance with connectivity constraints , and then dumps statistics about the transportation plan.

#include <CGAL/Cartesian.h>
typedef CGAL::Cartesian<double>::Point_2 Point_2;
<Point_2, double> EMD_vertices_accessor;
class Distance
{
public:
typedef double FT;
inline FT operator()(const Point_2& u, const Point_2& v)const
{
return CGAL::sqrt(CGAL::squared_distance(u, v));
}
};
int main(int argc, char *argv[])
{
Graph source;
Graph demand;
std::cout << "Modes: " << EMD::get_algorithm_type() << ", " << EMD::get_edge_selection_strategy() << ", " << EMD::get_recursion_mode() << std::endl;
EMD emd;
EMD::Transportation_plan tplan = emd(source, demand);
std::cout << std::endl << tplan << std::endl << std::endl;
std::cout << "Source flow: " << tplan.get_final_flow() << " / " << tplan.get_total_source() << " = " << tplan.get_percentage_of_used_flow() << std::endl;
std::cout << "Remaining demand: " << tplan.get_remaining_demand() << " / " << tplan.get_total_demand() << " = " << tplan.get_percentage_of_remaining_demand() << std::endl;
std::cout << "Ratio flow/demand: " << tplan.get_demand_flow_ratio() << std::endl;
std::cout << "Transportation cost (normalized): " << tplan.get_transportation_cost() << " / " << tplan.get_final_flow() << " = " << tplan.get_normalized_transportation_cost() << std::endl;
return 0;
}
Accessors for weighted vertices stored in a boost graph.
Definition: Earth_mover_distance_vertices_accessor_graph.hpp:66
std::vector< Weighted_vertex > Vertices_container
Definition: Earth_mover_distance_vertices_accessor_vector.hpp:72
Earth mover distance algorithm with connectivity constraints on the input data.
Definition: Earth_mover_distance_with_connectivity_constraints.hpp:216

## Earth Mover Distance connectivity constraints checker

The following example does the same calculations as in section Earth Mover Distance, but also checks that the connectivity constraints are respected. If not, the number of edges that violate the constraints is dumped.

#include <CGAL/Cartesian.h>
typedef CGAL::Cartesian<double>::Point_2 Point_2;
<Point_2, double> EMD_vertices_accessor;
class Distance
{
public:
typedef double FT;
inline FT operator()(const Point_2& u, const Point_2& v)const
{
return CGAL::sqrt(CGAL::squared_distance(u, v));
}
};
int main(int argc, char *argv[])
{
Graph source;
Graph demand;
EMD emd;
EMD::Transportation_plan tplan = emd(source, demand);
EMD_cc_checker checker;
EMD_cc_checker::result_type res = checker(source, demand, tplan);
tplan.dump(std::cout);
std::cout << "vertices fraction : " << res.first << std::endl;
std::cout << "edges fraction : " << res.second << std::endl;
return 0;
}
Functor measuring the violation of the connectivity constraints of any Earth Mover Distance algorithm...
Definition: Earth_mover_distance_connectivity_constraints_checker.hpp:80
static void set_verbose_mode(unsigned verbose)
Definition: Earth_mover_distance_connectivity_constraints_checker.hpp:167
std::pair< double, double > result_type
Definition: Earth_mover_distance_connectivity_constraints_checker.hpp:96
list res
Definition: dialanine_single_well_plot.py:19

## Earth Mover Distance module

The following example shows a simple instantiation of the EMD module within a workflow.

#include <SBL/Modules/Earth_mover_distance_module.hpp>
#include <SBL/Modules/Module_based_workflow.hpp>
#include <CGAL/Cartesian.h>
typedef CGAL::Cartesian<double>::Point_2 Point_2;
class Traits
{
public:
<Point_2, double> EMD_vertices_accessor;
class EMD_distance
{
public:
typedef double FT;
inline FT operator()(const Point_2& u, const Point_2& v)const
{
return CGAL::sqrt(CGAL::squared_distance(u, v));
}
};//end class EMD_distance
};//end class Traits
typedef Traits::EMD_vertices_accessor::Vertices_container Graph;
class Workflow
{
public:
private:
private:
Workflow_vertex m_EMD;
EMD_module* m_EMD_module;
public:
inline Workflow()
: m_workflow("example_emd_module")
{
this->m_workflow.add_helper("Example of instantiation of the EMD module.");
this->m_EMD = this->m_workflow.register_module<EMD_module>("EMD");
this->m_EMD_module = (EMD_module*)&this->m_workflow.get_module(this->m_EMD);
this->m_workflow.set_start_module(this->m_EMD);
this->m_workflow.set_end_module(this->m_EMD);
}
inline void start(int argc, char *argv[])
{
this->m_workflow.parse_command_line(argc, argv);
this->m_EMD_module->get_source() = new Graph();
*this->m_EMD_module->get_source());
*this->m_EMD_module->get_source());
*this->m_EMD_module->get_source());
this->m_EMD_module->get_demand() = new Graph();
*this->m_EMD_module->get_demand());
*this->m_EMD_module->get_demand());
*this->m_EMD_module->get_demand());
this->m_workflow.start();
}
};//end class Workflow
int main(int argc, char *argv[])
{
Workflow workflow;
workflow.start(argc, argv);
return 0;
}
Representaiont of the workflow of an application enriched with default options.
Definition: Module_based_workflow.hpp:703
Application's module instantiating the eartyh mover distance algorithm.
Definition: Earth_mover_distance_module.hpp:106
Workflow_graph_traits::vertex_descriptor Vertex
Representation of a verttex in the workflow that could be a module, or the Start or End vertices.
Definition: Module_based_workflow.hpp:292
Module_base & get_module(Vertex u)
Definition: Module_based_workflow.hpp:1764
void parse_command_line(int argc, char *argv[])
Parses the command line options and store the values of the options: the check_options method is call...
Definition: Module_based_workflow.hpp:2093
void set_start_module(Vertex u)
Modules connected to the start vertex will be first executed; in case of multiple vertices,...
Definition: Module_based_workflow.hpp:1967
Vertex register_module(const std::string &name="")
Registers a module with an input: note that modules are executed in a FIFO manner.
Definition: Module_based_workflow.hpp:1833
Set the helper of the application.
Definition: Module_based_workflow.hpp:1513
void set_end_module(Vertex u)
The results of all modules connected to the end vertex are reported.
Definition: Module_based_workflow.hpp:1974
void start(void)
Runs the application by pushing into the stack the Start vertex.
Definition: Module_based_workflow.hpp:2220

# Applications

## Programs

Executables in this package are named sbl-emd-OBJ-COST.exe, with

• OJB : weighted point, graph: graph, tg: transition graph.
• COST : the cost function used between points or vertices of the graphs compared.

Objects. The rationale for handling these three types of objects is as follows:

• Weighted points: a weighted point is a pair (point in a metric space, weight). See the example below. Note that several metrics/costs may be used for the same point type.
• Graph : a simple graph specified in text format. That is, assuming vertices are indexed in the range 0..n-1, edges are pairs of integers listed in a text file.
• Transition graph: a particular graph type whose edges correspond to saddle points on (potential) energy landscapes, see the pacakge Transition_graph_traits

Costs. A function assigning a positive value to two points/graph vertices. Recall that is this function is a metric (satisfies the triangle inequality), then EMD is also a metric.

Examples.

Example 1: Consider d-dimensional points in the Euclidean space, for which one uses the Euclidean distance. We provide the three programs, detailed below:

• sbl-emd-wp-euclid.exe, sbl-emd-graph-euclid.exe, sbl-emd-tg-euclid.exe

Example 2: Assume now that the d-dimensional points represent molecular conformations, for which one uses the least Root Mean Square Deviation as a distance. This package provides

Example 3: Assume now that the objects compared are sequences of nucleotides (nucleic acids) or amino-acids (polypeptide chains).

In the context of biophysics the package Energy_landscape_comparison uses the Earth_mover_distance algorithms to compare transition graphs (see Transition_graph_traits) derived from sampled energy landscapes.

The program sbl-emd-graph-euclid.exe can be used to compare the critical points of two elevated surfaces as defined in the package Morse_theory_based_analyzer : the program sbl-Morse-theory-based-analyzer-nng-euclid.exe used with the option –report-MSW-as-txt reports exactly the input of sbl-emd-graph-euclid.exe.

The program sbl-emd-tg-euclid.exe compares two transition graphs, using the Euclidean metric, as defined in the package Energy_landscape_comparison. For example, the application Transition_graph_of_energy_landscape_builders creates such transition graphs from samples of the energy landscape of target molecule.

The program sbl-emd-tg-lrmsd.exe compares two transition graphs using the lRMSD metric, as defined in the package Energy_landscape_comparison.

## Main specifications

Main options – comparing weighted points.

points1: Text file listing the first D-dimensional points weights1: Text file listing the first weights

The main options are:

–points-file string: Input points as D-dimensional points – passed twice
–weights-file string: Input weights – passed twice

Main option for the LP program used by EMD. One needs to specify the linear solver used:

–lp-solver string: Linear solver used; options are: lp_solve or clp

Main options – comparing graphs or transition graphs. The following options, each called twice i.e. once for each graph, are used to passe the input:

–points-file string: Description of points in point d format – one point per line
–weights-file string: Description of positive weights – the ordering as points
–edges-file string: File listing edges of the graph – two integers in 0..n-1

Main options for EMD with connectivity constraints. The main options are:

–constraints string: use EMD-CC
–algorithm string: with EMD-CC –select the algorithm (reg or star)
–edgeSelection string: with EMD-CC – selection scheme (min-cost or first-found)
–recursion string: with EMD-CC – recursion mode (norec / refined / coarse)

# Jupyter demo

See the following jupyter notebook:

• Jupyter notebook file
• Earth_mover_distance

# Earth_mover_distance¶

In [1]:
from SBL import SBL_pytools
from SBL_pytools import SBL_pytools as sblpyt
help(sblpyt)

from SBL import TG_weights_from_normalized_boltzmann
from TG_weights_from_normalized_boltzmann import *

from SBL import TG_builders
from TG_builders import *

from SBL import EMD_comparators
from EMD_comparators import *

Help on class SBL_pytools in module SBL_pytools:

class SBL_pytools(builtins.object)
|  Static methods defined here:
|
|  convert_eps_to_png(ifname, osize)
|
|  convert_pdf_to_png(ifname, osize)
|
|  find_and_convert(suffix, odir, osize)
|      # find file with suffix, convert, and return image file
|
|  find_and_show_images(suffix, odir, osize)
|
|  find_file_in_output_directory(suffix, odir)
|
|  show_eps_file(eps_file, osize)
|
|  show_image(img)
|
|  show_log_file(odir)
|
|  show_pdf_file(pdf_file)
|
|  show_row_n_images(images, size)
|
|  show_text_file(file_suffix, odir)
|
|  show_this_text_file(txt_file)
|
|  show_txt_file(file_suffix, odir)
|
|  ----------------------------------------------------------------------
|  Data descriptors defined here:
|
|  __dict__
|      dictionary for instance variables (if defined)
|
|  __weakref__
|      list of weak references to the object (if defined)



## Example : Comparing weighted points¶

### Options.¶

The main options of the compareSamples method in the next cell are:

• points1: Text file listing the first D-dimensional points
• weights1: Text file listing the first weights
• points2: Text file listing the second D-dimensional points
• weights2: Text file listing the second weights
In [2]:
weights1 = TG_weights_from_normalized_boltzmann.get_weights("../demos/data/bln69_energies_1.txt")
weights2 = TG_weights_from_normalized_boltzmann.get_weights("../demos/data/bln69_energies_2.txt")
exe_sbl = "sbl-emd-wp-lrmsd.exe"

lp_solver = "lp_solve"
odir = "results-samples-bln69-%s" % lp_solver
EMD_comparators.compare_weighted_points(exe_sbl, lp_solver,
"../demos/data/bln69_minima_1.txt", weights1,
"../demos/data/bln69_minima_2.txt", weights2, odir)
EMD_comparators.scatter_plot_unit_cost_flow_DB(odir, "emd_engine.xml")

lp_solver = "clp"
odir = "results-samples-bln69-%s" % lp_solver
EMD_comparators.compare_weighted_points(exe_sbl, lp_solver,
"../demos/data/bln69_minima_1.txt", weights1,
"../demos/data/bln69_minima_2.txt", weights2, odir)
EMD_comparators.scatter_plot_unit_cost_flow_DB(odir, "emd_engine.xml")


Sum of values:0.9999999999999998
Sum of values:0.9999999999999998
Using executable /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-emd-wp-lrmsd.exe

Executing /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-emd-wp-lrmsd.exe --points-file ../demos/data/bln69_minima_1.txt --points-file ../demos/data/bln69_minima_2.txt -v -l -d results-samples-bln69-lp_solve --weights-file bln69_energies_1-weights.txt  --weights-file bln69_energies_2-weights.txt --lp-solver lp_solve
All output files: ['sbl-emd-wp-lrmsd__emd_engine.xml\n', 'sbl-emd-wp-lrmsd__log.txt\n', 'sbl-emd-wp-lrmsd__transportation_plan.dot\n']
Showing file results-samples-bln69-lp_solve/sbl-emd-wp-lrmsd__log.txt

++Showing file results-samples-bln69-lp_solve/sbl-emd-wp-lrmsd__log.txt
Running:  /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-emd-wp-lrmsd.exe --points-file ../demos/data/bln69_minima_1.txt --points-file ../demos/data/bln69_minima_2.txt -v -l -d results-samples-bln69-lp_solve --weights-file bln69_energies_1-weights.txt --weights-file bln69_energies_2-weights.txt --lp-solver lp_solve

Statistics:
-- Number of loaded weighted points in ensemble: 100
-- Number of loaded weighted points in ensemble: 100

EMD
Starting module SBL::Modules::T_Earth_mover_distance_module...
cumulative mass source: 0.99999999999999977796
cumulative mass demand: 0.99999999999999977796
STEP: dumping the mps file for the LP i.e. linear_program_solver_input.mps

STEP: solving the LP with lp_solve...
Done with exit code 0

STEP: parsing the result from lp_solve i.e. linear_program_solver_input.res

Statistics:
EMD without connectivity constraint...

Source flow: 1.00000 / 1.00000 = 1.00000
Remaining demand: 0.00000 / 1.00000 = 0.00000
Ratio flow/demand: 1.00000
Transportation cost (normalized): 0.16320 / 1.00000 = 0.16320

Report...

Ending module SBL::Modules::T_Earth_mover_distance_module...

End Run

General Statistics:

Times elapsed for computations (in seconds):
-- EMD: 1.11253099999999993663
Total: 1.11253099999999993663

--Done

XML: 1 / 1 files were loaded

EMD_comparators: summary
Total cost: 0.16319675899339775
Total flow: 0.9999998508942526
Normalized cost: 0.1631967833269761

Using executable /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-emd-wp-lrmsd.exe

Executing /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-emd-wp-lrmsd.exe --points-file ../demos/data/bln69_minima_1.txt --points-file ../demos/data/bln69_minima_2.txt -v -l -d results-samples-bln69-clp --weights-file bln69_energies_1-weights.txt  --weights-file bln69_energies_2-weights.txt --lp-solver clp
All output files: ['sbl-emd-wp-lrmsd__emd_engine.xml\n', 'sbl-emd-wp-lrmsd__log.txt\n', 'sbl-emd-wp-lrmsd__transportation_plan.dot\n']
Showing file results-samples-bln69-clp/sbl-emd-wp-lrmsd__log.txt

++Showing file results-samples-bln69-clp/sbl-emd-wp-lrmsd__log.txt
Running:  /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-emd-wp-lrmsd.exe --points-file ../demos/data/bln69_minima_1.txt --points-file ../demos/data/bln69_minima_2.txt -v -l -d results-samples-bln69-clp --weights-file bln69_energies_1-weights.txt --weights-file bln69_energies_2-weights.txt --lp-solver clp

Statistics:
-- Number of loaded weighted points in ensemble: 100
-- Number of loaded weighted points in ensemble: 100

EMD
Starting module SBL::Modules::T_Earth_mover_distance_module...
cumulative mass source: 0.99999999999999977796
cumulative mass demand: 0.99999999999999977796
STEP: dumping the mps file for the LP i.e. linear_program_solver_input.mps

STEP: solving the LP with clp...
Done with exit code 0

STEP: parsing the result from clp i.e. linear_program_solver_input.txt

Statistics:
EMD without connectivity constraint...

Source flow: 0.99886 / 1.00000 = 0.99886
Remaining demand: 0.00114 / 1.00000 = 0.00114
Ratio flow/demand: 0.99886
Transportation cost (normalized): 0.16262 / 0.99886 = 0.16280

Report...

Ending module SBL::Modules::T_Earth_mover_distance_module...

End Run

General Statistics:

Times elapsed for computations (in seconds):
-- EMD: 1.10677999999999987502
Total: 1.10677999999999987502

--Done

XML: 1 / 1 files were loaded

EMD_comparators: summary
Total cost: 0.16261716647313668
Total flow: 0.9988600136857713
Normalized cost: 0.1628027593907608

In [ ]:


In [ ]:



## Example : Comparing graphs¶

### Options.¶

The main options of the compareGraphs method in the next cell are:

• points1: Text file listing the first D-dimensional points
• weights1: Text file listing the weights of the first points
• edges1: Text file listing the edges between the first points
• points2: Text file listing the second D-dimensional points
• weights2: Text file listing the weights of the second points
• edges2: Text file listing the edges between the second points
• constraints: use EMD-CC
• algorithm: with EMD-CC, select the algorithm (reg or star)
• edgeSelection: with EMD-CC, selection scheme (min-cost or first-found)
• recursion: with EMD-CC, recursion mode (norec, refined or coarse)
In [10]:
constraints = [True, False]
edges_sel = ["min-cost"]
recursion = ["norec", "coarse"]

weights1 = TG_weights_from_normalized_boltzmann.get_weights("../demos/data/bln69_energies_1.txt")
weights2 = TG_weights_from_normalized_boltzmann.get_weights("../demos/data/bln69_energies_2.txt")

exe_sbl = "sbl-emd-graph-lrmsd.exe"
for c in constraints:
for e in edges_sel:
for r in recursion:
odir = EMD_comparators.compare_graphs(exe_sbl,
"data/bln69_minima_1.txt", weights1, "data/bln69_edges_1.txt",
"data/bln69_minima_2.txt", weights2, "data/bln69_edges_2.txt",
constraints = c,  edgeSelection = e, recursion = r)
print("Output dir:", odir)
EMD_comparators.scatter_plot_unit_cost_flow_DB(odir, "emd_engine.xml")


Sum of values:0.9999999999999998
Sum of values:0.9999999999999998
Using executable /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-emd-graph-lrmsd.exe

Executing /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-emd-graph-lrmsd.exe --vertex-points-file data/bln69_minima_1.txt --vertex-points-file data/bln69_minima_2.txt -v -l -d results-graph-True-reg-min-cost-norec --vertex-weights-file bln69_energies_1-weights.txt  --vertex-weights-file bln69_energies_2-weights.txt --edges-file data/bln69_edges_1.txt  --edges-file data/bln69_edges_2.txt --with-connectivity-constraints --algorithm "reg" --edge-selection "min-cost" --recursion-mode "norec"
Showing file results-graph-True-reg-min-cost-norec/sbl-graph-emd-lrmsd__log.txt

++Showing file results-graph-True-reg-min-cost-norec/sbl-graph-emd-lrmsd__log.txt
Running:  /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-emd-graph-lrmsd.exe --vertex-points-file data/bln69_minima_1.txt --vertex-points-file data/bln69_minima_2.txt -v -l -d results-graph-True-reg-min-cost-norec --vertex-weights-file bln69_energies_1-weights.txt --vertex-weights-file bln69_energies_2-weights.txt --edges-file data/bln69_edges_1.txt --edges-file data/bln69_edges_2.txt --with-connectivity-constraints --algorithm reg --edge-selection min-cost --recursion-mode norec

Statistics:
-- Number of loaded vertices / edges: 100 / 124
-- Number of loaded vertices / edges: 100 / 123

EMD
Starting module SBL::Modules::T_Earth_mover_distance_module...

+++Running algo regular with edge selection min cost

g_source: nb of vertices = 100 and nb of edges = 124
Total production = 1.00
g_demand: nb of vertices = 100 and nb of edges = 123
Total demand = 1.00
Running greedy_main...
(Total cost = 0.07 ; Percentage of flow = 0.73 ; Number of edges = 121)
Statistics:
EMD with connectivity constraints...

Source flow: 0.73165 / 1.00000 = 0.73165
Remaining demand: 0.26835 / 1.00000 = 0.26835
Ratio flow/demand: 0.73165
Transportation cost (normalized): 0.06992 / 0.73165 = 0.09557

Report...

Ending module SBL::Modules::T_Earth_mover_distance_module...

End Run

General Statistics:

Times elapsed for computations (in seconds):
-- EMD: 8.521659
Total: 8.521659

--Done

Output dir: results-graph-True-reg-min-cost-norec
XML: 1 / 1 files were loaded

EMD_comparators: summary
Total cost: 0.06992068117924019
Total flow: 0.731650312575675
Normalized cost: 0.09556570943446191

Using executable /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-emd-graph-lrmsd.exe

Executing /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-emd-graph-lrmsd.exe --vertex-points-file data/bln69_minima_1.txt --vertex-points-file data/bln69_minima_2.txt -v -l -d results-graph-True-reg-min-cost-coarse --vertex-weights-file bln69_energies_1-weights.txt  --vertex-weights-file bln69_energies_2-weights.txt --edges-file data/bln69_edges_1.txt  --edges-file data/bln69_edges_2.txt --with-connectivity-constraints --algorithm "reg" --edge-selection "min-cost" --recursion-mode "coarse"
Showing file results-graph-True-reg-min-cost-coarse/sbl-graph-emd-lrmsd__log.txt

++Showing file results-graph-True-reg-min-cost-coarse/sbl-graph-emd-lrmsd__log.txt
Running:  /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-emd-graph-lrmsd.exe --vertex-points-file data/bln69_minima_1.txt --vertex-points-file data/bln69_minima_2.txt -v -l -d results-graph-True-reg-min-cost-coarse --vertex-weights-file bln69_energies_1-weights.txt --vertex-weights-file bln69_energies_2-weights.txt --edges-file data/bln69_edges_1.txt --edges-file data/bln69_edges_2.txt --with-connectivity-constraints --algorithm reg --edge-selection min-cost --recursion-mode coarse

Statistics:
-- Number of loaded vertices / edges: 100 / 124
-- Number of loaded vertices / edges: 100 / 123

EMD
Starting module SBL::Modules::T_Earth_mover_distance_module...

+++Running algo regular with edge selection min cost

g_source: nb of vertices = 100 and nb of edges = 124
Total production = 1.00
g_demand: nb of vertices = 100 and nb of edges = 123
Total demand = 1.00
Running greedy_main...
(Total cost = 0.00 ; Percentage of flow = 0.05 ; Number of edges = 1)
(Total cost = 0.00 ; Percentage of flow = 0.09 ; Number of edges = 2)
(Total cost = 0.00 ; Percentage of flow = 0.43 ; Number of edges = 5)
(Total cost = 0.00 ; Percentage of flow = 0.44 ; Number of edges = 6)
(Total cost = 0.01 ; Percentage of flow = 0.58 ; Number of edges = 9)
(Total cost = 0.02 ; Percentage of flow = 0.61 ; Number of edges = 61)
(Total cost = 0.06 ; Percentage of flow = 0.71 ; Number of edges = 68)
(Total cost = 0.07 ; Percentage of flow = 0.73 ; Number of edges = 121)
Statistics:
EMD with connectivity constraints...

Source flow: 0.73165 / 1.00000 = 0.73165
Remaining demand: 0.26835 / 1.00000 = 0.26835
Ratio flow/demand: 0.73165
Transportation cost (normalized): 0.06992 / 0.73165 = 0.09557

Report...

Ending module SBL::Modules::T_Earth_mover_distance_module...

End Run

General Statistics:

Times elapsed for computations (in seconds):
-- EMD: 10.599726
Total: 10.599726

--Done

Output dir: results-graph-True-reg-min-cost-coarse
XML: 1 / 1 files were loaded

EMD_comparators: summary
Total cost: 0.06992068117924019
Total flow: 0.731650312575675
Normalized cost: 0.09556570943446191

Using executable /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-emd-graph-lrmsd.exe

Executing /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-emd-graph-lrmsd.exe --vertex-points-file data/bln69_minima_1.txt --vertex-points-file data/bln69_minima_2.txt -v -l -d results-graph-False-reg-min-cost-norec --vertex-weights-file bln69_energies_1-weights.txt  --vertex-weights-file bln69_energies_2-weights.txt --edges-file data/bln69_edges_1.txt  --edges-file data/bln69_edges_2.txt
Showing file results-graph-False-reg-min-cost-norec/sbl-graph-emd-lrmsd__log.txt

++Showing file results-graph-False-reg-min-cost-norec/sbl-graph-emd-lrmsd__log.txt
Running:  /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-emd-graph-lrmsd.exe --vertex-points-file data/bln69_minima_1.txt --vertex-points-file data/bln69_minima_2.txt -v -l -d results-graph-False-reg-min-cost-norec --vertex-weights-file bln69_energies_1-weights.txt --vertex-weights-file bln69_energies_2-weights.txt --edges-file data/bln69_edges_1.txt --edges-file data/bln69_edges_2.txt

Statistics:
-- Number of loaded vertices / edges: 100 / 124
-- Number of loaded vertices / edges: 100 / 123

EMD
Starting module SBL::Modules::T_Earth_mover_distance_module...
cumulative mass source: 0.99999999999999977796
cumulative mass demand: 0.99999999999999977796
STEP: dumping the mps file for the LP i.e. linear_program_solver_input.mps

STEP: solving the LP with lp_solve...
Done with exit code 0

STEP: parsing the result from lp_solve i.e. linear_program_solver_input.res

Statistics:
EMD without connectivity constraint...

Source flow: 1.00000 / 1.00000 = 1.00000
Remaining demand: 0.00000 / 1.00000 = 0.00000
Ratio flow/demand: 1.00000
Transportation cost (normalized): 0.16320 / 1.00000 = 0.16320

Report...

Ending module SBL::Modules::T_Earth_mover_distance_module...

End Run

General Statistics:

Times elapsed for computations (in seconds):
-- EMD: 16.76989299999999971647
Total: 16.76989299999999971647

--Done

Output dir: results-graph-False-reg-min-cost-norec
XML: 1 / 1 files were loaded

EMD_comparators: summary
Total cost: 0.16319675899339775
Total flow: 0.9999998508942526
Normalized cost: 0.1631967833269761

Using executable /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-emd-graph-lrmsd.exe

Executing /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-emd-graph-lrmsd.exe --vertex-points-file data/bln69_minima_1.txt --vertex-points-file data/bln69_minima_2.txt -v -l -d results-graph-False-reg-min-cost-coarse --vertex-weights-file bln69_energies_1-weights.txt  --vertex-weights-file bln69_energies_2-weights.txt --edges-file data/bln69_edges_1.txt  --edges-file data/bln69_edges_2.txt
Showing file results-graph-False-reg-min-cost-coarse/sbl-graph-emd-lrmsd__log.txt

++Showing file results-graph-False-reg-min-cost-coarse/sbl-graph-emd-lrmsd__log.txt
Running:  /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-emd-graph-lrmsd.exe --vertex-points-file data/bln69_minima_1.txt --vertex-points-file data/bln69_minima_2.txt -v -l -d results-graph-False-reg-min-cost-coarse --vertex-weights-file bln69_energies_1-weights.txt --vertex-weights-file bln69_energies_2-weights.txt --edges-file data/bln69_edges_1.txt --edges-file data/bln69_edges_2.txt

Statistics:
-- Number of loaded vertices / edges: 100 / 124
-- Number of loaded vertices / edges: 100 / 123

EMD
Starting module SBL::Modules::T_Earth_mover_distance_module...
cumulative mass source: 0.99999999999999977796
cumulative mass demand: 0.99999999999999977796
STEP: dumping the mps file for the LP i.e. linear_program_solver_input.mps

STEP: solving the LP with lp_solve...
Done with exit code 0

STEP: parsing the result from lp_solve i.e. linear_program_solver_input.res

Statistics:
EMD without connectivity constraint...

Source flow: 1.00000 / 1.00000 = 1.00000
Remaining demand: 0.00000 / 1.00000 = 0.00000
Ratio flow/demand: 1.00000
Transportation cost (normalized): 0.16320 / 1.00000 = 0.16320

Report...

Ending module SBL::Modules::T_Earth_mover_distance_module...

End Run

General Statistics:

Times elapsed for computations (in seconds):
-- EMD: 16.55179599999999950910
Total: 16.55179599999999950910

--Done

Output dir: results-graph-False-reg-min-cost-coarse
XML: 1 / 1 files were loaded

EMD_comparators: summary
Total cost: 0.16319675899339775
Total flow: 0.9999998508942526
Normalized cost: 0.1631967833269761


## Let us display all graphs at once. On this simple example, it is seen that the no-recursion and coarse recursion mode yield the same results¶

In [11]:
cmd = "ls results-graph*/*scatter.png"
images = [line.rstrip() for line in files]
print(images)
sblpyt.show_row_n_images(images, 200)

['results-graph-False-reg-min-cost-coarse/scatter.png', 'results-graph-False-reg-min-cost-norec/scatter.png', 'results-graph-True-reg-min-cost-coarse/scatter.png', 'results-graph-True-reg-min-cost-norec/scatter.png']

Figs displayed

In [ ]:


In [ ]:


In [ ]: