Structural Bioinformatics Library
Template C++ / Python API for developping structural bioinformatics applications.
User Manual

Morse_theory_based_analyzer

Authors: F. Cazals and T. Dreyfus

Introduction

This package offers a set of topological analysis inspired by Morse theory, namely the study of real valued function defined on a While Morse theory is classically defined in the differential setting ([86] , [17] or [11]), we are instead concerned with discrete structures (coded as graphs, see details below).

In all generality, the following constructions are of interest :

  • The critical points and their connections in co-dimension one (i.e., connexion from local minima to index one saddles, from index one saddles to index two saddles, etc).

  • The stable manifolds partition, that is a partition of the input points induced by the flow of non critical points toward critical points.

  • The disconnectivity forest, namely the forest representing the merge events observed when increasing the function value.

  • The persistence diagram of local minima when studying sub-level sets of the function.

  • The sub-level sets assignment, that is the assignment of points whose heights is below a certain threshold to stable manifolds of minima whose critical value is below that threshold.

Again, these notions are classically defined in the differential setting, so that the discrete setting requires the appropriate generalizations, in particular that of a (pseudo-)gradient [59] .

Notice again that these notions are classically defined in the differential setting, so that the discrete setting requires the appropriate generalizations, in particular that of a (pseudo-)gradient [59] .

For applications, of particular interest is the case of index 0 and index 1 critical points. When studying sub-level sets of smooth function, as done in Morse theory, connected components appear with local minima, and merge at selected index one saddles.

In the sequel, we focus on the aforementioned constructions which involve local minima and index one saddles only – more precisely substitutes defined in the discrete setting. In focusing on index 0 and 1 local minima, it turns out that the aforementioned pieces of information can be easily computed using a Union-find data structure [54]. In this package, we take a different route in this package, working with the Morse Smale Witten (MSW) chain complex, namely the bipartite graph connecting local minima and index one saddles [11].

Working with the MSW complex instead of the initial graphs indeed yields several benefits:

  • The MSW is an abstract structure whose complexity (size) depends on the number of critical points, instead of the number of samples. Thus, operations on sub-level sets are carried out more efficiently (e.g. sub-level sets assignments for various thresholds).
  • The MSW can easily be simplified to get rid of all local minima which are not persistent.

Practically, three types of inputs are supported:

  • Nearest Neighbors Graph (NNG): a graph connecting points, each with a function value. Selected points shall play the role of critical points (local minima and index one saddles) in the smooth setting.
  • Weighted Graph (WG): an edge valued graph. The values on edges provide the ordering to process the edges and vertices by increasing function value. By construction, every edge of the graph shall be an index one saddle. Likewise, each vertex is an index 0 saddle. The value of a vertex is such that it is less than those of the incident saddles.
  • Vertex Weighted Graph (VWG): a vertex valued graph, the vertex weight being the function value. Every vertex in an index 0 saddle, and every edge is an index one saddle. A weight is defined on each edge, in such a way that this value is larger than of the incident vertices.
Persistence diagrams are typically used to simplify topological space. In particular, 0-order persistence can be used to simplify landscapes. Fig. fig-MSW-vs-DF illustrates the subtle difference between simplification procedures relying upon Union-find and the MSW complex.


Landscape simplification using topological persistence

Simplification: strategy using Union-Find versus refined strategy using the Morse-Smale-Witten (MSW) complex. (A,B) Two landscapes, with critical points of indices 0 (disks) and 1 (squares). (C) The disconnectivity graph (DG) of both landscapes, namely the tree depicting the evolution of connected components of sublevel sets. Despite the differences between their MSW complexes, both landscapes share the same DG: upon passing the critical point e, the stable manifold of c merges with that born at a. (D,E) The MSW complexes of (A,B), respectively. In cancelling the pair of critical points (c, e), one does not know from the DG with which (a or b) the basin of c should be merged. But the required information is found in the MSW complex: on the landscape (A), c is merged with a; on the landscape (B), c is merged with b.

Architecture

There are two main packages :

This section describes the main class to be used in Core algorithms, and the modules to be used in applications.

Main class

The class SBL::GT::T_Morse_theory_based_analyzer has only one template parameter, that is MorseSmaleWittenChainComplex , a representation of the MSW chain complex provided by the package Morse_theory_based_analyzer. Its constructor takes a MSW chain complex and constructs the corresponding disconnectivity forest. It offers the following functionalities.

Critical points and their connections.

The method SBL::GT::T_Morse_theory_based_analyzer::get_paths returns the set of all triples (0-saddle, 1-saddle, 0-saddle) that exists in the underlying MSW chain complex : the saddle points are represented by their vertices in the input MSW chain complex.

Stable manifolds partition.

The method SBL::GT::T_Morse_theory_based_analyzer::get_stable_manifold_partition returns a partition of all the points associated to the different 0-saddles. It proceeds as follows :

  • for each 0-saddle in the MSW chain complex, it creates a stable manifold data structure, that is an empty container of points;

This is done so that all points used for building the MSW chain complex are segregated into the stable manifolds of local minima. Note that points associated to saddles of strictly positive index may be arbitrarily assigned to different stable manifolds. To avoid duplication of such points, an arbitrary choice is made, that is, each such point is ascribed to a chosen stable manifold.

Disconnectivity forest.

The disconnectivity forest is the main data structure for understanding how index one saddles are linked together. While processing the saddles of the MSW chain complex by increasing function value, the connected components of points associated to local minima merge. The internal vertices of the disconnectivity forest are exactly the index one saddles from which two different connected components merge.

More precisely, consider the sub-level set of all points that are below a threshold: those points make different components associated to different local minima. When the threshold is under the elevation of the lowest local minimum, the sub-level set is empty. When rising the threshold value, two events may occur :

  • there is a 0-saddle whose associated value passes under the threshold : a new component is created with the point represented by the 0-saddle, corresponding to a leaf in the disconnectivity forest;
  • there is a saddle of index one whose associated value passes under the threshold, such that the components of its descendants were not merged : the descendant components are merged into a new one, corresponding to an internal vertex in the disconnectivity forest, linked by edges to the corresponding descendants;

When a point which is not a saddle has an associated value passing under the threshold, it is simply added to the connected component associated to the saddle of this point : there is no change in the disconnectivity forest.

Advanced. The algorithm used for computing the disconnectivity forest is the one described in [54], but applied to vertices of the MSW chain complex. This can be done since all events that occur are associated to the saddles, i.e the vertices of the MSW chain complex.


The class SBL::GT::T_Morse_theory_based_analyzer provides the following operations on the disconnectivity forest :

Persistence diagram.

The persistence of a saddle is the measure of the lifetime of the homological feature associated with this saddle. For example, the persistence of a local minimum is the lifetime of the connected component which gets created at the function value of the minimum.

When two components merge, the smallest component dies while the largest component grows. Hence, the persistence of a saddle s is the difference between the value associated to the saddle where the component of s dies, and the value of s itself. The persistence interval of a saddle is just defined by these two values. When a saddle cannot be merged anymore, it has an infinite persistence.

The persistence diagram is defined as the list of pairs of vertices in the disconnectivity forest such that a component is created at the value of the first vertex, and dies at the value of the second vertex. The list is sorted by ascending values of persistences. Three operations are available on persistence diagrams:

Sub-level sets assignment.

A sub-level set is a component of points as defined previously where all the points are below a given threshold. Hence, all points whose values are below the threshold are assigned to a sub-level set. Assignment refers to the fact that points above the threshold are not included in the sub-level sets, so that sub-level sets are not a partition of all the input points of the MSW chain complex. Accessing the sub-level sets assignment for a given threshold value is done using the method SBL::GT::T_Morse_theory_based_analyzer::get_sublevelsets_connected_components

Simplification.

While all the previous operations aim to access specific analysis on the MSW chain complex, the simplification operation aims to remove the least persistent saddles from the analysis. In particular, it helps to eliminate noise and spurious saddles, so to have more smooth analysis of the MSW chain complex. See Morse_Smale_Witten_chain_complex for a full description of the simplification algorithm. To simplify the MSW chain complex so as to eliminate all saddles having a persistence under a given threshold, one has to use the method SBL::GT::T_Morse_theory_based_analyzer::simplify .

Note that this method simplifies the MSW chain complex and recompute the disconnectivity forest and the persistence diagram.

Morse_theory_based_analyzer_module

Base module.

The module SBL::Modules::T_Morse_theory_based_analyzer_module< ModuleTraits , MorseSmaleWittenChainComplex > is a generic class computing the different described analysis over an input MSW chain. The template parameter MorseSmaleWittenChainComplex is the representation of the MSW chain complex and should be an instantiation of the class SBL::GT::T_Morse_Smale_Witten_chain_complex . When the template parameter ModuleTraits defines the type Morse_Smale_Witten_chain_complex, there is no need of specifying this second template parameter.

In addition to the previously described analysis, the module provides command-line options for customizing the analysis :

  • –persistence-threshold : removes from the analysis all the saddles with a persistence below the input threshold.
  • –sublevelset-threshold : computes the sub-level sets by applying the input threshold.

NB: when this option is used, selected samples are filtered out from stable manifolds whose union is computed to define the connected components of sublevel sets. Results are stored in files prefixed with subleveltset_cluster, not to be confused with files prefixed with stable_manifold_cluster (see below).

  • –keep-largest-ccs : keeps in the displayed disconnectivity forest only the k largest connected components.
  • –club-saddles : creates into the displayed disconnectivity forest bins of saddles of a given size, representing them as a unique vertex of the forest.
  • –view-labels : adds into the displayed disconnectivity forest labels for each vertex.
  • –split-basins : reports each stable manifold of the partition in a separated plain text file, rather than all in the same XML archive file. This feature helps in particular when partitioned points are then used by other programs reading plain text files of points.

NB: when this option is used, stable manifolds, aka clusters, are stored in output files prefixed with stable_manifold_cluster.

When the MSW chain complex has already been computed, it is stored in a XML archive file. It is possible to directly load this XML archive using the option –MSW-filename : this is useful when previous steps for computing the MSW chain complex are time-consuming.


Specialized modules.

Remind that the MSW chain complex can be built from a Nearest Neighbors Graph (NNG), a Weighted Graph (WG), or a Vertex Weighted Graph (VWG). The previous module is specialized for each possible input graph :

  • SBL::Modules::T_Morse_theory_based_analyzer_for_NNG_module< ModuleTraits > : computes the MSW chain complex from a NNG and performs the described analysis. The template parameter ModuleTraits has to define the types Graph representing the input NNG, Morse_function representing the real-value function over the vertices of the NNG, and Distance_graph_function computing the distance between two vertices of the NNG. An additional command-line option –normalize-gradient allows to normalize the gradient of the associated function by the distances associated to the edges. for analyzing a NNG:
  • SBL::Modules::T_Morse_theory_based_analyzer_for_weighted_graph_module< ModuleTraits > : computes the MSW chain complex from a WG, and performs the described analysis. When building the MSW chain complex, the edges of the WG are considered as 1-saddles in the MSW chain-complex. The template parameter ModuleTraits has to define the types Graph representing the input WG, and Get_weight computing the weight associated to an edge or a vertex. Note that a WG does not necessarily defines a weight over vertices, so that the class Get_weight has to define this weight.

For example, the weight of a vertex can be the minimum weight over its incident edges, minus a user-defined epsilon value. In all cases, the the value of a vertex is required to be less than the values of its incident edges.

  • SBL::Modules::T_Morse_theory_based_analyzer_for_vertex_weighted_graph_module< ModuleTraits > : computes the MSW chain complex from a VWG and performs the described analysis. When building the MSW chain complex, the edges of the VWG are considered as 1-saddles in the MSW chain-complex. The template parameter ModuleTraits has to define the types Graph representing the input VWG, and Get_weight computing the weight associated to a vertex of the input VWG.

Note that in using a VWG, weights on edges are not required. If undefined, the maximum weight over the incident vertices is used.

Examples

Using the analyzer

The following example shows how to manually build a MSW chain complex and perform analysis over it. It creates an abstract landscape as a graph where vertices are the critical points of the landscape and edges link neighbors in the landscape. Critical points are actually abstract, meaning that they have no geometrical representation (i.e they are just vertices in a graph). Then, it builds the MSW chain complex by adding the saddles and linking them. Finally, it builds the analyzer engine and displays the different analysis.

#include <SBL/GT/Morse_Smale_Witten_chain_complex.hpp>
#include <SBL/GT/Morse_theory_based_analyzer.hpp>
#include <boost/graph/adjacency_list.hpp>
#include <SBL/IO/Disconnectivity_forest_embedder.hpp>
#include <SBL/IO/Disconnectivity_forest_viewer_eps.hpp>
#include<fstream>
#include<iostream>
typedef boost::adjacency_list<boost::vecS, boost::vecS, boost::undirectedS, double, double> Graph;
class Morse_function
{
public:
typedef double FT;
typedef boost::graph_traits<Graph>::vertex_descriptor Point;
const Graph* m_G;
Morse_function(const Graph* G = NULL): m_G(G){}
const FT& operator()(const Point& p)const
{
return (*this->m_G)[p];
}
};
typedef MTB_analyzer::Disconnectivity_forest Disconnectivity_forest;
int main(int argc, char *argv[]){
std::cout << "Landscape is: " << std::endl;
std::cout << "\\ _" << std::endl;
std::cout << " \\ / \\ _ _" << std::endl;
std::cout << " \\_/ \\_/ \\ _ / \\ " << std::endl;
std::cout << " \\ / \\_/ \\ " << std::endl;
std::cout << " \\ / \\ " << std::endl;
std::cout << " \\_/ \\ " << std::endl;
std::cout << " \\_/ " << std::endl;
std::cout << " 0 1 2 3 4 5 6 7 8" << std::endl;
Graph G;
std::vector<boost::graph_traits<Graph>::vertex_descriptor> vertices;
vertices.push_back(boost::add_vertex(4, G));
vertices.push_back(boost::add_vertex(6, G));
vertices.push_back(boost::add_vertex(4, G));
vertices.push_back(boost::add_vertex(5, G));
vertices.push_back(boost::add_vertex(1, G));
vertices.push_back(boost::add_vertex(4, G));
vertices.push_back(boost::add_vertex(3, G));
vertices.push_back(boost::add_vertex(5, G));
vertices.push_back(boost::add_vertex(0, G));
for(unsigned i = 1; i < vertices.size(); i++)
boost::add_edge(vertices[i-1], vertices[i], G);
Morse_function function(&G);
MSW_CC msw(function);
std::vector<MSW_CC::Critical_vertex> critical_vertices;
for(unsigned i = 0; i < vertices.size(); i++)
critical_vertices.push_back(msw.add_critical_vertex(vertices[i], i % 2));
msw.connect_critical_vertices(critical_vertices[1], critical_vertices[0]);
msw.connect_critical_vertices(critical_vertices[1], critical_vertices[2]);
msw.connect_critical_vertices(critical_vertices[3], critical_vertices[2]);
msw.connect_critical_vertices(critical_vertices[3], critical_vertices[4]);
msw.connect_critical_vertices(critical_vertices[5], critical_vertices[4]);
msw.connect_critical_vertices(critical_vertices[5], critical_vertices[6]);
msw.connect_critical_vertices(critical_vertices[7], critical_vertices[6]);
msw.connect_critical_vertices(critical_vertices[7], critical_vertices[8]);
MTB_analyzer analyzer(msw);
std::cout << "Number of critical vertices: " << msw.get_number_of_critical_vertices() << std::endl;
std::cout << "Number of minima: " << msw.get_number_of_critical_vertices(0) << std::endl;
std::cout << "Number of 1-saddles: " << msw.get_number_of_critical_vertices(1) << std::endl;
if(argc > 1)
{
analyzer.simplify(atoi(argv[1]));
std::cout << "After persistence: " << std::endl;
std::cout << "Number of critical vertices: " << msw.get_number_of_critical_vertices() << std::endl;
std::cout << "Number of minima: " << msw.get_number_of_critical_vertices(0) << std::endl;
std::cout << "Number of 1-saddles: " << msw.get_number_of_critical_vertices(1) << std::endl;
}
std::ofstream out("disconnectivity_forest.eps");
SBL::IO::T_Disconnectivity_forest_embedder<Disconnectivity_forest, MSW_CC>
embedder(analyzer.get_disconnectivity_forest(), msw);
SBL::IO::T_Disconnectivity_forest_viewer_eps<Disconnectivity_forest,
SBL::IO::T_Disconnectivity_forest_embedder<Disconnectivity_forest,
MSW_CC> >
viewer(embedder, out);
for(boost::graph_traits<Disconnectivity_forest>::edge_iterator
it = boost::edges(analyzer.get_disconnectivity_forest()).first; it != boost::edges(analyzer.get_disconnectivity_forest()).second; it++)
viewer(boost::source(*it, analyzer.get_disconnectivity_forest()), boost::target(*it, analyzer.get_disconnectivity_forest()),
analyzer.get_disconnectivity_forest());
out.close();
std::string cmd = "ps2pdf disconnectivity_forest.eps; xpdf disconnectivity_forest.pdf&" ;
std::system(cmd.c_str());
return 0;
}

Using the module over a NNG

This example shows how to use the module SBL::Modules::T_Morse_theory_based_analyzer_for_NNG_module for analyzing an input set of 2D points elevated by input weights. It first computes a NNG of the input set of points, and then computes the MSW chain complex from which the analysis are made.

#include <SBL/GT/ANN_metric_space_proximity_tree.hpp>
#include <SBL/GT/ANN_meta.hpp>
#include <SBL/Modules/Nearest_neighbors_graph_builder_module.hpp>
#include <SBL/Modules/Morse_theory_based_analyzer_module.hpp>
#include <SBL/IO/Numbers_file_loader.hpp>
class Traits
{
public:
typedef double FT;
struct Point : public std::pair<FT, FT>
{
friend std::istream& operator>>(std::istream& in, Point& p)
{in >> p.first >> p.second; return in;}
friend std::ostream& operator<<(std::ostream& out, const Point& p)
{out << p.first << p.second; return out;}
template <class Archive>
void serialize(Archive& ar, unsigned version)
{
ar & boost::serialization::make_nvp("x", this->first);
ar & boost::serialization::make_nvp("y", this->second);
}
};
typedef std::pair<Point, FT> Elevated_point;
typedef boost::property<boost::edge_weight_t, FT> Edge_property;
typedef boost::adjacency_list<boost::vecS, boost::vecS,
boost::undirectedS,
Elevated_point, Edge_property> Nearest_neighbors_graph;
class Nearest_neighbors_distance
{
public:
typedef Elevated_point Point;
inline FT operator()(const Point& p, const Point& q)const
{
FT x = (p.first.first - q.first.first);
FT y = (p.first.second - q.first.second);
return std::sqrt(x*x + y*y);}
};//end class Distance
template <class DistanceType>
class Nearest_neighbors_query : public SBL::GT::T_ANN_meta<SBL::GT::T_ANN_metric_space_proximity_tree<DistanceType> >
{
public:
inline Nearest_neighbors_query(DistanceType& dist) : SBL::GT::T_ANN_meta<SBL::GT::T_ANN_metric_space_proximity_tree<DistanceType> >(dist){}
};//end class Nearest_neighbors_query
class Distance_graph_function
{
public:
typedef double FT;
typedef boost::graph_traits<Nearest_neighbors_graph>::vertex_descriptor Point;
private:
const Nearest_neighbors_graph* m_graph;
public:
inline Distance_graph_function(const Nearest_neighbors_graph* graph = NULL)
: m_graph(graph)
{
assert(graph != NULL);
}
inline FT operator()(const Point& u, const Point& v)const
{return Nearest_neighbors_distance()((*this->m_graph)[u], (*this->m_graph)[v]);}
};
class Morse_function
{
public:
typedef double FT;
typedef boost::graph_traits<Nearest_neighbors_graph>::vertex_descriptor Point;
const Nearest_neighbors_graph* graph;
inline Morse_function(void) : graph(NULL){}
inline FT operator()(const Point& u)const{return (*graph)[u].second;}
};
};
namespace std {
std::ostream& operator<<(std::ostream& out, const Traits::Elevated_point& p)
{out << p.first << p.second; return out;}
}
namespace boost {
namespace serialization
{
template <class Archive>
void serialize(Archive& ar, Traits::Elevated_point& p, unsigned version)
{
ar & boost::serialization::make_nvp("height", p.first);
ar & boost::serialization::make_nvp("point", p.second);
}
}
}
int main(int argc, char *argv[]){
if(argc < 3)
return 0;
Points_loader points_loader("points");
points_loader.get_file_names().push_back(argv[1]);
points_loader.load(true);
Weights_loader weights_loader("weights");
weights_loader.get_file_names().push_back(argv[2]);
weights_loader.load(true);
NNG_module nng_module;
nng_module.get_points() = new NNG_module::Points_vector();
for(unsigned i = 0; i != points_loader.get_data().size(); i++)
nng_module.get_points()->push_back(std::make_pair(points_loader.get_data()[i],
weights_loader.get_data()[i]));
nng_module.run(0, std::cout);
MTB_analyzer_module mtb_module;
mtb_module.get_nearest_neighbors_graph() = &nng_module.get_nearest_neighbors_graph();
mtb_module.get_Morse_function() = new Traits::Morse_function();
mtb_module.get_Morse_function()->graph = mtb_module.get_nearest_neighbors_graph();
mtb_module.run(3, std::cout);
mtb_module.statistics(std::cout);
mtb_module.report("example_Morse_theory_based_analyzer_NNG_");
return 0;
}

Using the module over a WG

This example shows how to use the module SBL::Modules::T_Morse_theory_based_analyzer_for_weighted_graph_module for analyzing an input WG, that is an edge weighted graph. Note that the input WG does not define weights over the vertices, but the algorithm requires such weights. Hence, it is the functor Get_weight that defines the weight of a vertex as the minimal weight among its incident edges.

#include <SBL/Modules/Morse_theory_based_analyzer_module.hpp>
#include "Graph_loader.hpp"
class Traits
{
public:
typedef double NT;
typedef boost::adjacency_list<boost::vecS, boost::vecS, boost::undirectedS,
NT,
boost::property<boost::edge_weight_t, NT> > Graph;
class Get_weight
{
typedef boost::graph_traits<Graph> Graph_traits;
typedef Graph_traits::vertex_descriptor Vertex;
typedef Graph_traits::edge_descriptor Edge;
public:
typedef Vertex value_type;
typedef NT FT;
inline const FT& operator()(Edge e, const Graph& G)const
{return boost::get(boost::edge_weight, G, e);}
inline FT operator()(Vertex u, const Graph& G)const
{
assert(boost::out_degree(u, G) > 0);
FT min_height = 0;
for(Graph_traits::out_edge_iterator it = boost::out_edges(u, G).first; it != boost::out_edges(u, G).second; it++)
if(it == boost::out_edges(u, G).first || Get_weight::operator()(*it, G) < min_height)
min_height = Get_weight::operator()(*it, G);
return min_height;
}
};
};
int main(int argc, char *argv[]){
if(argc < 3)
return 0;
Graph_loader loader;
loader.s_edges_file_name = argv[1];
loader.s_edge_weights_file_name = argv[2];
loader.load(true);
MTB_analyzer_module module;
module.get_weighted_graph() = &loader.m_graph;
module.get_Morse_function() = new MTB_analyzer_module::Morse_function();
module.run(3, std::cout);
module.statistics(std::cout);
module.report("example_Morse_theory_based_analyzer_WG_");
return 0;
}

Using the module over a VWG

This example shows how to use the module SBL::Modules::T_Morse_theory_based_analyzer_for_vertex_weighted_graph_module for analyzing an input VWG. Note that the input VWG does not define weights over the edges, and the algorithm does not require such weights since they can be estimated from the weights over the vertices.

#include <SBL/Modules/Morse_theory_based_analyzer_module.hpp>
#include "Graph_loader.hpp"
class Traits
{
public:
typedef double NT;
typedef boost::adjacency_list<boost::vecS, boost::vecS, boost::undirectedS,
NT,
boost::property<boost::edge_weight_t, NT> > Graph;
class Get_weight
{
typedef boost::graph_traits<Graph> Graph_traits;
typedef Graph_traits::vertex_descriptor Vertex;
typedef Graph_traits::edge_descriptor Edge;
public:
typedef Vertex value_type;
typedef NT FT;
inline FT operator()(Vertex u, const Graph& G)const
{
G[u];
}
};
};
int main(int argc, char *argv[]){
if(argc < 3)
return 0;
Graph_loader loader;
loader.s_vertex_weights_file_name = argv[1];
loader.s_edges_file_name = argv[2];
loader.load(true);
MTB_analyzer_module module;
module.get_weighted_graph() = &loader.m_graph;
module.get_Morse_function() = new MTB_analyzer_module::Morse_function();
module.run(3, std::cout);
module.statistics(std::cout);
module.report("example_Morse_theory_based_analyzer_VWG_");
return 0;
}

Applications

Programs

This package offers also a number of programs applying the analysis to the following cases :

  • sbl-Morse-theory-based-analyzer-nng-euclid.exe : (i) computes the NNG of a set of elevated points in the Euclidean space, (ii) builds the MSW chain complex with the elevation of the points as function, and (iii) run the analysis.
  • sbl-Morse-theory-based-analyzer-nng-euclid-density-gauss.exe : (i) computes the NNG of a set of points in the Euclidean space, (ii) computes an elevation for each of them based upon a Gaussian based kernel density estimate, (iii) builds the MSW chain complex, and (iv) run the analysis.
  • sbl-Morse-theory-based-analyzer-wg.exe : builds the MSW chain complex associated to the input weighted graph as discussed in the examples, and run the analyis.
  • sbl-Morse-theory-based-analyzer-vwg.exe : builds the MSW chain complex associated to the input vertex weighted graph as discussed in the examples, and run the analysis.

Use case

This section an example of use of the program sbl-Morse-theory-based-analyzer-nng-euclid-density-gauss.exe over a collection of 2D points. Consider a distribution of points following a mixture of gaussians, as shown on figure fig-points-mixture-input . These points are saved in a file "points.txt", each line representing a point, and each point being represented by its dimension followed by its coordinates :

2 -10.56576 7.42972
2 -4.15042 11.18972
...

Randomly generated 2D points

500 points randomly generated following a mixture of 5 gaussians (100 points per gaussian)

We want to cluster these input points so that each cluster is saved in a separate file :

./sbl-Morse-theory-based-analyzer-nng-euclid-density-gauss.exe --points-file points.txt --num-neighbors 8 --persistence-threshold 1 --split-basins --no-popup

The different options are :

  • –num-neighbors : determines the minimal number of neighbors that each points has in the NNG;
  • –persistence-threshold : allows to eliminate non persistent minima, i.e to merge small clusters into larger ones.
  • –split-basins : prints the clusters into different files.
  • –no-popup : prevents the program from popping graphs such as the persistence diagram and the disconnectivity forest.

Once the program terminated, the files prefixed by sbl-Morse-theory-based-analyzer-nng-euclid-density-gauss__stable_manifold_cluster_ contains the list of indices of points for each cluster. The index of a points is the line of the original input file describing this point. Note that the indices start at 0, not 1. In order to convert indices to points, one can use the following bash script using sed command-line :

for f in sbl-Morse-theory-based-analyzer-nng-euclid-density-gauss__stable_manifold_cluster_*.txt; do
x=$(echo $f | egrep -o '[0-9]+')
rm -rf points_$x.txt
while read i; do; sed -n "$((${i}+1))p" points.txt >> points_$x.txt; done < $f;
done;

or equivalently using perl command-line :

for f in sbl-Morse-theory-based-analyzer-nng-euclid-density-gauss__stable_manifold_cluster_*.txt; do
x=$(echo $f | egrep -o '[0-9]+')
rm -rf points_$x.txt
while read i; do; perl -n -e 'print $_ if $.==(${i}+1)' points.txt >> points_$x.txt; done < $f;
done;

As a result, there is one file prefixed by points_ per cluster with the index of the corresponding minimum. From those files, software like gnuplot can be easily used for displaying the points colored by cluster, as shown on figure fig-points-mixture-cluster .

Clusters of input 2D points
The 500 input points from the mixture of gaussians (left) have been separated in four clusters (right).