Structural Bioinformatics Library
Template C++ / Python API for developping structural bioinformatics applications.
T_Cluster_engine_k_means< KMTraits > Class Template Reference

K-means algorithm. More...

#include <Cluster_engine_k_means.hpp>

Constructors

 T_Cluster_engine_k_means (void)
 
 T_Cluster_engine_k_means (void)
 

Functor

template<class InputIterator , class OutputIterator >
OutputIterator operator() (InputIterator begin, InputIterator end, OutputIterator out)
 
template<class InputIterator , class OutputIterator >
OutputIterator operator() (InputIterator begin, InputIterator end, OutputIterator out)
 

Internal Methods

template<class InputIterator >
FT get_score (InputIterator begin, InputIterator end, const std::vector< Center_of_mass > &centers, const std::vector< std::size_t > &p2c_map) const
 
template<class InputIterator >
K_means_scores get_scores (InputIterator begin, InputIterator end) const
 

Print

template<class PointsIterator , class ClustersIterator , class OutputIterator >
OutputIterator get_centers_of_mass (PointsIterator p_begin, PointsIterator p_end, ClustersIterator c_begin, ClustersIterator c_end, OutputIterator out) const
 
template<class InputIterator >
std::ostream & dump_clusters (InputIterator begin, InputIterator end, std::ostream &out) const
 
template<class InputIterator >
std::ostream & dump_centers (InputIterator begin, InputIterator end, std::ostream &out) const
 
template<class PointsIterator , class ClustersIterator >
void report (PointsIterator p_begin, PointsIterator p_end, ClustersIterator c_begin, ClustersIterator c_end, const std::string &prefix) const
 
template<class InputIterator >
std::ostream & dump_clusters (InputIterator begin, InputIterator end, std::ostream &out) const
 
std::ostream & dump_centers (std::ostream &out) const
 
void dump_centers (std::string cfname) const
 
template<class InputIterator >
void dump_input_points (InputIterator begin, InputIterator end) const
 
template<class PointsIterator , class ClustersIterator >
void report (PointsIterator p_begin, PointsIterator p_end, const std::string &prefix) const
 

Parameters

static void set_maximal_number_of_iterations (std::size_t max)
 
static void set_k (std::size_t k)
 
static void set_mode (const std::string &mode)
 
static void set_seed (std::size_t seed)
 
static void set_maximal_number_of_iterations (std::size_t max)
 
static void set_k (std::size_t k)
 
static void set_mode (const std::string &mode)
 
static void set_seed (std::size_t seed)
 
static void set_verbose (bool b)
 

Command line options

template<class OptionsDescription >
static void add_options (OptionsDescription &options)
 
static bool check_options (std::string &message)
 
static std::string get_output_prefix (void)
 
template<class OptionsDescription >
static void add_options (OptionsDescription &options)
 
static bool check_options (std::string &message)
 
static std::string get_output_prefix (void)
 

Detailed Description

template<class KMTraits>
class SBL::GT::T_Cluster_engine_k_means< KMTraits >

K-means algorithm.

\sbl_add_package_main_class{Cluster_engines, T_Cluster_engine_k_means, K-means algorithm,

Centers are stored in a vector. At the end of each iteration\, each center is replaced by the center of mass of points assigned to it. Note that the com is actually computed by accumulating the coords of points during the assignment\, with proper normalization once all assignments have been done.

The assignment of points to centers is recorded indirectly using the vector m_point_to_NN1_index: its i-th entry is the index of the NN of the i-th data point Likewise, m_point_NN2_index retains the second NN of each data point.

Complexity-wise\, the assignment of points to centers is made in a linear pass. we do not use a search DS since the centers only "live" one iteration.

Template Parameters
PointTypeRepresentation of a point.
VectorTypeRepresentation of a vector.
DistanceCompute the distance between two points.

}

Centers are stored in a vector. At the end of each iteration, each center is replaced by the center of mass of points assigned to it. Note that the com is actually computed by accumulating the coords of points during the assignment, with proper normalization once all assignments have been done.

The assignment of points to centers is recorded indirectly. That is, we maintain a vector (point2center_index) whose i-th entry is an iterator (pointer) onto the vector of centers.

Complexity-wise, the assignment of points to centers is made in a linear pass. we do not use a search DS since the centers only "live" one iteration.

Template Parameters
PointTypeRepresentation of a point.
VectorTypeRepresentation of a vector.
DistanceCompute the distance between two points.