Structural Bioinformatics Library
Template C++ / Python API for developing structural bioinformatics applications.
User Manual

Molecular_distances

Authors: F. Cazals and T. Dreyfus

Introduction

Molecular distances aim to compare different conformations of the same molecule. Conformations can be represented with cartesian coordinates or with internal coordinates. With cartesian coordinates, a conformation is a D-dimensional point in which each successive triple of coordinates is a 3D point representing a particle of the molecule. Computing the distance between two conformations then consists of matching the particles of both conformations, and computing the distance between the two resulting aligned sets of 3D points.



This package offers a variety of distances for comparing conformations, depending on their coordinate system.

Implementation

This section lists and defines the distances provided by this package.

Least-RMSD

Let $ C_1 $ and $ C_2 $ be two conformations of a molecule of $ N $ atoms. We denote by $ x_{1, 3i}, x_{1, 3i+1}, x_{1, 3i+2} $ the $ x, y, z $ coordinates of the $ i $-th atom of the conformation $ C_1 $ (atoms being indexed from 0), and similarly $ x_{2, j} $ for $ C_2 $. The rigid registration of $ C_2 $ onto $ C_1 $ is denoted $ \hat{C}_2 $, and its coordinates $ \hat{x}_{2, j} $. The least-RMSD of $ C_1 $ and $ C_2 $ is the root mean square deviation between the coordinates of $ C_1 $ and $ \hat{C}_2 $:
$ \mathrm{lRMSD}(C_1, C_2) = \sqrt{\frac{1}{3N}\sum_{j=0}^{3N-1}\left(\hat{x}_{2, j} - x_{1, j}\right)^2} $
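As a minimal sketch of the formula above, the following C++ function computes the RMSD of two flat coordinate vectors, assuming that the registration $ \hat{C}_2 $ has already been computed (the registration itself is not performed here). The function name rmsd_of_registered is hypothetical and not part of the SBL API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Root mean square deviation between two flat coordinate vectors
// (x0, y0, z0, x1, y1, z1, ...), following the formula above:
// the sum runs over all 3N coordinates and is divided by 3N.
// ASSUMPTION: c2_registered is already the rigid registration of C2 onto C1.
double rmsd_of_registered(const std::vector<double>& c1,
                          const std::vector<double>& c2_registered)
{
  if (c1.size() != c2_registered.size() || c1.empty() || c1.size() % 3 != 0)
    throw std::invalid_argument("expected two non-empty vectors of 3N coordinates");
  double sum = 0.0;
  for (std::size_t j = 0; j < c1.size(); ++j) {
    const double d = c2_registered[j] - c1[j];
    sum += d * d;
  }
  return std::sqrt(sum / static_cast<double>(c1.size()));  // c1.size() == 3N
}
```

For two atoms whose registered copies are both shifted by 1 along $ z $, the sum is 2 over 6 coordinates, giving $ \sqrt{1/3} $.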


The rigid registration is performed using the package Point_cloud_rigid_registration_3. Note that this rigid registration only accounts for rotations and translations. Since this package also takes chirality into account, a mirror transformation is additionally considered in the registration.
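The role of the mirror transformation can be sketched as follows. Reflecting a conformation through the plane $ z = 0 $ is enough: any other reflection is this one composed with a rotation, which the registration already optimizes. In this sketch the registration-based distance is passed in as a callable, and the names mirror_image and distance_with_chirality are hypothetical; this is not the SBL implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Coords = std::vector<double>;  // flat (x0, y0, z0, x1, ...) layout

// Mirror image of a conformation: reflection through the plane z = 0,
// obtained by negating every z coordinate. Any reflection would do, since
// composing it with a rotation yields every other reflection.
Coords mirror_image(Coords c)
{
  for (std::size_t i = 2; i < c.size(); i += 3)
    c[i] = -c[i];
  return c;
}

// Distance accounting for chirality: the smaller of the distance to C2 and
// the distance to the mirror image of C2. HYPOTHETICAL: `registered_distance`
// stands for any rigid-registration-based distance (e.g. an l-RMSD that
// only optimizes rotations and translations).
double distance_with_chirality(
    const Coords& c1, const Coords& c2,
    const std::function<double(const Coords&, const Coords&)>& registered_distance)
{
  return std::min(registered_distance(c1, c2),
                  registered_distance(c1, mirror_image(c2)));
}
```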

The least-RMSD is defined over conformations with cartesian coordinates.


Angular distance

While the l-RMSD is a distance between two conformations in a Euclidean space, the angular distance is defined in a space of angles: the coordinates of the particles are internal coordinates (angles), taking values in the interval $ [0, 2\pi] $. The angular distance is an RMSD in which the distance between two matched coordinates is an angular distance defined over $ [0, 2\pi] $.
The angular distance is defined over the angles of conformations with internal coordinates.
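The following sketch illustrates an RMSD over angles in $ [0, 2\pi] $, assuming that the difference between two angles is measured along the circle (the shorter arc) and that the sum is normalized by the number of angles. Both choices, as well as the names circular_difference and angular_rmsd, are assumptions of this illustration rather than a description of SBL::CSB::T_Squared_angular_internal_distance.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

const double kTwoPi = 2.0 * std::acos(-1.0);

// Distance between two angles on the circle: the shorter of the two arcs,
// so that 0.1 and 2*pi - 0.1 are 0.2 apart rather than 2*pi - 0.2.
// ASSUMPTION: both angles lie in [0, 2*pi].
double circular_difference(double a, double b)
{
  const double d = std::fabs(a - b);
  return std::fmin(d, kTwoPi - d);
}

// RMSD over matched angles using the circular difference above.
// ASSUMPTION: normalization by the number of angles; the SBL functor
// may normalize differently, and it returns the squared value.
double angular_rmsd(const std::vector<double>& a, const std::vector<double>& b)
{
  if (a.size() != b.size() || a.empty())
    throw std::invalid_argument("expected two non-empty angle vectors of equal size");
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const double d = circular_difference(a[i], b[i]);
    sum += d * d;
  }
  return std::sqrt(sum / static_cast<double>(a.size()));
}
```

Measuring along the circle avoids the artifact of a plain difference, which would report angles on either side of $ 0 $ as almost $ 2\pi $ apart.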



RMSD internal distance

The previous distances are RMSDs computed over pairs of matched particles, one from each conformation. Another way to compare two conformations is to compute the RMSD between the distances of all possible pairs of particles of one conformation and the distances of the matching pairs of particles of the other conformation. The resulting distance is called the RMSD internal distance.

The RMSD internal distance is defined over conformations with cartesian coordinates.
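A minimal sketch of the RMSD internal distance follows, assuming that particles are already matched by index, that internal distances are Euclidean, and that the average runs over the $ N(N-1)/2 $ unordered pairs. The names below are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Point3 = std::array<double, 3>;

double euclidean(const Point3& p, const Point3& q)
{
  const double dx = p[0] - q[0], dy = p[1] - q[1], dz = p[2] - q[2];
  return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// RMSD between the internal distances of two conformations: for every pair
// (i, j), i < j, compare the distance d1(i, j) within the first conformation
// with d2(i, j) within the second. ASSUMPTIONS: identity alignment,
// Euclidean internal distances, averaging over the N(N-1)/2 pairs.
double rmsd_internal_distance(const std::vector<Point3>& c1,
                              const std::vector<Point3>& c2)
{
  if (c1.size() != c2.size() || c1.size() < 2)
    throw std::invalid_argument("expected two conformations with the same N >= 2 particles");
  double sum = 0.0;
  std::size_t pairs = 0;
  for (std::size_t i = 0; i < c1.size(); ++i)
    for (std::size_t j = i + 1; j < c1.size(); ++j) {
      const double d = euclidean(c1[i], c1[j]) - euclidean(c2[i], c2[j]);
      sum += d * d;
      ++pairs;
    }
  return std::sqrt(sum / static_cast<double>(pairs));
}
```

Since Euclidean distances within a conformation are invariant under rotations, translations and reflections, this distance needs no rigid registration, unlike the least-RMSD.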


Other distances

This package also offers the possibility to plug in any executable that takes as input two D-dimensional points and returns a single distance value. In this way, any externally defined distance can be used in the SBL.

The coordinate system to use with the conformations depends on the definition of the distance in the external program.


Design

Alignment

Except for the distance described in section Other distances, all described molecular distances require an alignment process to match particles of the two input conformations.

The concept MolecularAlignment is a simple functor taking as input the position of a particle in a conformation and a boolean tag indicating the direction of the alignment (first conformation to second, or opposite direction). It then returns the position of the input particle in the other conformation. The default functor is SBL::CSB::Molecular_alignment_default and simply returns the input position, assuming that both conformations are already aligned.
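To illustrate the MolecularAlignment concept, the hypothetical functor below handles conformations whose particles are listed in different orders: it stores a permutation and its inverse, and answers queries in both directions. Its signature only follows the description above (a position and a direction tag in, a position out); it is not guaranteed to match the exact requirements of the SBL concept.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// HYPOTHETICAL alignment functor in the spirit of MolecularAlignment.
class Permutation_alignment
{
public:
  // first_to_second[i] is the position, in the second conformation, of the
  // particle found at position i in the first conformation.
  // ASSUMPTION: first_to_second is a valid permutation of 0..n-1.
  explicit Permutation_alignment(const std::vector<std::size_t>& first_to_second)
    : m_first_to_second(first_to_second),
      m_second_to_first(first_to_second.size())
  {
    for (std::size_t i = 0; i < first_to_second.size(); ++i)
      m_second_to_first[first_to_second[i]] = i;  // invert the permutation
  }

  // Position of the matching particle in the other conformation.
  std::size_t operator()(std::size_t position, bool first_to_second) const
  {
    return first_to_second ? m_first_to_second[position]
                           : m_second_to_first[position];
  }

private:
  std::vector<std::size_t> m_first_to_second;
  std::vector<std::size_t> m_second_to_first;
};
```

With the identity permutation, this functor behaves like the default SBL::CSB::Molecular_alignment_default.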

Particles

The l-RMSD and the RMSD internal distance both require a geometric definition of a particle. As mentioned, a particle is represented by a 3D point. Thus, the concept GetParticle defines a functor that takes as input a conformation and a position, and returns the 3D point corresponding to the target particle. The default functor is SBL::CSB::T_Get_particle_default and simply returns the i-th particle of the input conformation as a CGAL Point_3.
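The following hypothetical functor sketches the behavior described for GetParticle on a conformation stored as a flat vector of $ D = 3N $ coordinates. To stay self-contained, it returns a std::array rather than the CGAL Point_3 used by SBL::CSB::T_Get_particle_default.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Point3 = std::array<double, 3>;

// HYPOTHETICAL stand-in for a GetParticle functor: given a conformation
// stored as a flat D-dimensional vector (D = 3N) and a particle position i,
// return the 3D point made of coordinates 3i, 3i+1 and 3i+2.
struct Get_particle_flat
{
  Point3 operator()(const std::vector<double>& conformation, std::size_t i) const
  {
    if (3 * i + 2 >= conformation.size())
      throw std::out_of_range("particle index beyond the conformation");
    return {conformation[3 * i], conformation[3 * i + 1], conformation[3 * i + 2]};
  }
};
```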

Least-RMSD

The l-RMSD is defined in the class SBL::CSB::T_Least_RMSD_cartesian< Conformation , GetParticle , MolecularAlignment > . The parameter Conformation is the representation of the input conformations and should be compliant with the CGAL::Point_d class of the CGAL library. The parameters GetParticle and MolecularAlignment were both previously described.

In addition, the class SBL::CSB::T_Least_RMSD_cartesian_with_chirality< Conformation , GetParticle , MolecularAlignment > accounts for chirality: it computes the distances between the first conformation and both the second conformation and its mirror image, and returns the smaller of the two.

Angular distance

The angular distance is defined in the class SBL::CSB::T_Squared_angular_internal_distance< Conformation , FT , MolecularAlignment > . The parameter FT is the number type used for computing the distance (double by default), while the two other parameters were previously described. Note that this functor returns the squared distance, avoiding a square root operation that is unnecessary when, for instance, distances are only compared to each other.
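Returning squared distances is harmless whenever distances are only compared: since the square root is increasing on $ [0, +\infty) $, the ordering is preserved. The sketch below, built around the hypothetical function nearest_by_squared_distance, finds the conformation nearest to a query without computing any square root.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <stdexcept>
#include <vector>

using Conformation = std::vector<double>;

// Index of the candidate closest to `query`, using a functor that returns
// SQUARED distances. Comparing squared values selects the same nearest
// neighbor as comparing distances, so no square root is taken.
std::size_t nearest_by_squared_distance(
    const Conformation& query, const std::vector<Conformation>& candidates,
    const std::function<double(const Conformation&, const Conformation&)>& squared_distance)
{
  if (candidates.empty())
    throw std::invalid_argument("no candidate conformation");
  std::size_t best = 0;
  double best_value = std::numeric_limits<double>::infinity();
  for (std::size_t k = 0; k < candidates.size(); ++k) {
    const double v = squared_distance(query, candidates[k]);
    if (v < best_value) { best_value = v; best = k; }
  }
  return best;
}
```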


RMSD internal distance

The RMSD internal distance is defined in the class SBL::CSB::T_Squared_RMSD_internal_distance< Conformation , InternalDistance , GetParticle , MolecularAlignment > . The parameter InternalDistance is a functor computing the distance between two particles of the same conformation. The three other parameters were previously described. As for the angular distance, this functor returns the squared distance, avoiding a square root operation that may not be necessary.

Other distances

Externally defined distances can be wrapped using the class SBL::CSB::T_External_distance< Conformation , FT > , whose parameters were previously described. The additional requirements are:

  • the external executable computing the distance must be named sbl-conf-distance.exe (a symbolic link to the actual program can be used);
  • the executable takes as input two text files, each representing a D-dimensional point;
  • the executable prints the distance as a single value on the standard output.
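A program satisfying these three requirements could be organized as sketched below. The Euclidean distance and the whitespace-separated file format are assumptions chosen for illustration (the exact format of the files written by SBL::CSB::T_External_distance is not specified here), and the helpers read_point, external_distance and run are hypothetical. The executable's main function would reduce to `return run(argc, argv);`, and the resulting program would be named, or linked as, sbl-conf-distance.exe.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <fstream>
#include <iostream>
#include <istream>
#include <iterator>
#include <sstream>
#include <stdexcept>
#include <vector>

// Reads a D-dimensional point as whitespace-separated coordinates.
// ASSUMPTION: illustrative plain format, not necessarily SBL's serialization.
std::vector<double> read_point(std::istream& in)
{
  return std::vector<double>(std::istream_iterator<double>(in),
                             std::istream_iterator<double>());
}

// Example distance computed by the external program (Euclidean distance
// in dimension D); any other definition could be plugged in here.
double external_distance(std::istream& in_p, std::istream& in_q)
{
  const std::vector<double> p = read_point(in_p), q = read_point(in_q);
  if (p.size() != q.size() || p.empty())
    throw std::runtime_error("points must be non-empty and of the same dimension");
  double sum = 0.0;
  for (std::size_t i = 0; i < p.size(); ++i)
    sum += (p[i] - q[i]) * (p[i] - q[i]);
  return std::sqrt(sum);
}

// argv[1] and argv[2] are the two input files; the distance is printed
// as a single value on the standard output.
int run(int argc, char* argv[])
{
  if (argc < 3) { std::cerr << "usage: sbl-conf-distance.exe <file1> <file2>\n"; return 1; }
  std::ifstream in_p(argv[1]), in_q(argv[2]);
  if (!in_p || !in_q) { std::cerr << "cannot open input files\n"; return 1; }
  std::cout << external_distance(in_p, in_q) << std::endl;
  return 0;
}
```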

Examples

Least-RMSD

The following example loads two conformations from two input files, and computes the least-RMSD between those two conformations.

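#include <iostream>
#include <fstream>
#include <vector>
#include <SBL/CSB/Least_RMSD_cartesian.hpp>
#include <CGAL/Cartesian_d.h>
#include <CGAL/Cartesian.h>
#include <SBL/Models/Conformation_traits_point_d.hpp>
#include <SBL/Models/Conformation_traits_vector.hpp>
//Conformations may be stored as CGAL::Point_d or, as here, as a std::vector of numbers
typedef CGAL::Cartesian_d<double> K;
//typedef K::Point_d Conformation;
typedef std::vector<K::FT> Conformation;
//l-RMSD functor; GetParticle and MolecularAlignment keep their default values
typedef SBL::CSB::T_Least_RMSD_cartesian<Conformation> LRMSD;
int main(int argc, char *argv[])
{
  //Two input files are required, one per conformation
  if(argc < 3)
  {
    std::cerr << "Usage: " << argv[0] << " <conformation-file-1> <conformation-file-2>" << std::endl;
    return -1;
  }
  //Read the two conformations
  Conformation p, q;
  std::ifstream in_p(argv[1]);
  in_p >> p;
  in_p.close();
  std::ifstream in_q(argv[2]);
  in_q >> q;
  in_q.close();
  //Compute and print the least-RMSD
  LRMSD distance;
  std::cout << "LRMSD: " << distance(p, q) << std::endl;
  return 0;
}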

Angular distance

The following example loads two conformations from two input files, and computes the angular distance between those two conformations.

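#include <iostream>
#include <fstream>
#include <SBL/CSB/Squared_angular_internal_distance.hpp>
#include <CGAL/Cartesian_d.h>
#include <CGAL/Cartesian.h>
#include <SBL/Models/Conformation_traits_point_d.hpp>
//Conformations of angles stored as CGAL d-dimensional points
typedef CGAL::Cartesian_d<double> K;
typedef K::Point_d Conformation;
//Squared angular distance with FT = double; MolecularAlignment keeps its default value
typedef SBL::CSB::T_Squared_angular_internal_distance<Conformation, double> Distance;
int main(int argc, char *argv[])
{
  //Two input files are required, one per conformation
  if(argc < 3)
  {
    std::cerr << "Usage: " << argv[0] << " <conformation-file-1> <conformation-file-2>" << std::endl;
    return -1;
  }
  //Read the two conformations
  Conformation p, q;
  std::ifstream in_p(argv[1]);
  in_p >> p;
  in_p.close();
  std::ifstream in_q(argv[2]);
  in_q >> q;
  in_q.close();
  //The functor returns the squared distance: take the square root for display
  Distance distance;
  std::cout << "Angular distance: " << CGAL::sqrt(distance(p, q)) << std::endl;
  return 0;
}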


RMSD internal distance

The following example loads two conformations from two input files, and computes the RMSD internal distance between those two conformations.

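#include <iostream>
#include <fstream>
#include <SBL/CSB/Squared_RMSD_internal_distance.hpp>
#include <CGAL/Cartesian_d.h>
#include <CGAL/Cartesian.h>
#include <SBL/Models/Conformation_traits_point_d.hpp>
typedef CGAL::Cartesian_d<double> K;
typedef K::Point_d Conformation;
//InternalDistance: CGAL's squared distance functor between two 3D points,
//with the FT and Point typedefs added on top of it
typedef CGAL::Cartesian<double>::Compute_squared_distance_3 Internal_distance_base;
struct Internal_distance : public Internal_distance_base
{
  typedef double FT;
  typedef CGAL::Cartesian<double>::Point_3 Point;
};
//GetParticle and MolecularAlignment keep their default values
typedef SBL::CSB::T_Squared_RMSD_internal_distance<Conformation, Internal_distance> RMSD;
int main(int argc, char *argv[])
{
  //Two input files are required, one per conformation
  if(argc < 3)
  {
    std::cerr << "Usage: " << argv[0] << " <conformation-file-1> <conformation-file-2>" << std::endl;
    return -1;
  }
  Conformation p, q;
  std::ifstream in_p(argv[1]);
  in_p >> p;
  in_p.close();
  std::ifstream in_q(argv[2]);
  in_q >> q;
  in_q.close();
  //The functor returns the squared distance: take the square root for display
  RMSD distance;
  std::cout << "RMSD for internal distances: " << CGAL::sqrt(distance(p, q)) << std::endl;
  return 0;
}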

Other distances

The following example loads two conformations from two input files, and computes their distance using the external executable sbl-conf-distance.exe.

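#include <iostream>
#include <fstream>
#include <SBL/CSB/External_distance.hpp>
#include <CGAL/Cartesian_d.h>
#include <SBL/Models/Conformation_traits_point_d.hpp>
typedef CGAL::Cartesian_d<double> K;
typedef K::Point_d Conformation;
//Wrapper around the external executable sbl-conf-distance.exe, with FT = double
typedef SBL::CSB::T_External_distance<Conformation, double> Distance;
int main(int argc, char *argv[])
{
  //Two input files are required, one per conformation
  if(argc < 3)
  {
    std::cerr << "Usage: " << argv[0] << " <conformation-file-1> <conformation-file-2>" << std::endl;
    return -1;
  }
  Conformation p, q;
  std::ifstream in_p(argv[1]);
  in_p >> p;
  in_p.close();
  std::ifstream in_q(argv[2]);
  in_q >> q;
  in_q.close();
  Distance distance;
  std::cout << "External distance: " << distance(p, q) << std::endl;
  return 0;
}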

Applications

This package also offers several programs performing different tasks related to the least-RMSD:

  • sbl-lrmsd-pdb.exe : simply computes the least-RMSD between two conformations of a molecule defined in PDB files;
  • sbl-lrmsd-all-against-one.exe : computes the least-RMSD between a conformation defined in a first file and a collection of conformations defined in a second file;
  • sbl-lrmsd-all-pairs.exe : computes the matrix of least-RMSD between all possible distinct pairs of conformations defined in an input file;
  • sbl-lrmsd-conformational-ensembles-intersection.exe : computes the common set of conformations of two input sets using the least-RMSD (two conformations are considered identical if their distance is below an input threshold); the output is a file listing pairs of indices, each index being the position of a matched conformation in the file where it is defined;
  • sbl-lrmsd-conformational-ensembles-centroid-per-atom.exe : computes a conformation in which each particle is the centroid of the matching particles of a set of input conformations.
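The core logic of the last two programs can be sketched as follows, assuming a caller-supplied distance (the programs use the least-RMSD), a brute-force comparison of all pairs, and conformations already expressed in a common frame for the centroid computation. The function names are hypothetical, and this is not the code of the SBL executables.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

using Point3 = std::array<double, 3>;
using Conformation = std::vector<Point3>;
using Distance = std::function<double(const Conformation&, const Conformation&)>;

// Pairs (i, j) such that the i-th conformation of `first` and the j-th of
// `second` are closer than `threshold`, i.e. considered identical.
// ASSUMPTION: brute-force O(|first| * |second|) comparisons.
std::vector<std::pair<std::size_t, std::size_t>>
ensembles_intersection(const std::vector<Conformation>& first,
                       const std::vector<Conformation>& second,
                       const Distance& distance, double threshold)
{
  std::vector<std::pair<std::size_t, std::size_t>> matches;
  for (std::size_t i = 0; i < first.size(); ++i)
    for (std::size_t j = 0; j < second.size(); ++j)
      if (distance(first[i], second[j]) < threshold)
        matches.emplace_back(i, j);
  return matches;
}

// Per-particle centroid: particle k of the result is the mean of particle k
// over all input conformations. ASSUMPTION: conformations are matched
// particle by particle and expressed in a common frame.
Conformation centroid_per_atom(const std::vector<Conformation>& ensemble)
{
  if (ensemble.empty())
    throw std::invalid_argument("empty ensemble");
  Conformation centroid(ensemble.front().size(), Point3{0.0, 0.0, 0.0});
  for (const Conformation& c : ensemble) {
    if (c.size() != centroid.size())
      throw std::invalid_argument("conformations differ in number of particles");
    for (std::size_t k = 0; k < c.size(); ++k)
      for (std::size_t d = 0; d < 3; ++d)
        centroid[k][d] += c[k][d];
  }
  const double n = static_cast<double>(ensemble.size());
  for (Point3& p : centroid)
    for (double& x : p) x /= n;
  return centroid;
}
```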