Molecular_distances
Authors: F. Cazals and T. Dreyfus
Introduction
Goal. Molecular distances aim to compare different conformations of the same molecule or complex.
A conformation is represented either in Cartesian coordinates (using the Point_d format) or in internal coordinates. Computing distances between conformations occurs in essentially two situations:
- Conformations of a given molecule (or complex) yielded by a simulation method.
- Conformations of the same molecule (or complex) yielded by different experimental methods.
In the former case, it is clear that the same atoms are available. In the latter, the hydrogen atoms are typically not reported in a coherent way.
Modes. For distances based on Cartesian coordinates, this accounts for the following four modes used in distance calculations:
- Calpha atoms: distances are computed on Calpha.
- Backbone atoms: only Calpha, C and N atoms.
- Heavy atoms: distances are computed on all atoms but hydrogen atoms.
- All atoms: distances are computed on all atoms, including hydrogen atoms.
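The four modes above amount to a filter on atoms. The following is a minimal sketch of such a filter, assuming atoms are identified by their PDB atom name and chemical element; the names Atom_mode and is_selected are hypothetical and are not part of the SBL.

```cpp
#include <string>

// Hypothetical enumeration of the four modes listed above.
enum class Atom_mode { CALPHA, BACKBONE, HEAVY, ALL };

// Returns true if an atom takes part in the distance calculation.
// atom_name follows the PDB naming ("CA", "C", "N", ...); element is the
// chemical symbol ("H" for hydrogen). Checking the element for "CA" avoids
// confusing a Calpha carbon with a calcium ion.
bool is_selected(Atom_mode mode, const std::string& atom_name, const std::string& element)
{
  const bool is_calpha = (atom_name == "CA" && element == "C");
  switch (mode) {
    case Atom_mode::CALPHA:   return is_calpha;
    case Atom_mode::BACKBONE: return is_calpha || atom_name == "C" || atom_name == "N";
    case Atom_mode::HEAVY:    return element != "H";
    case Atom_mode::ALL:      return true;
  }
  return false;
}
```

For instance, is_selected(Atom_mode::HEAVY, "HA", "H") is false, while the same atom is kept in the all atoms mode.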
For proteins which do not have the same sequence, distance calculations require an alignment. Such calculations will be available shortly in the package Molecular_distances_flexible.
Implementation: distances based on Cartesian coordinates
Mapping between atoms
As noted in the Introduction, a mapping between atoms is required to compute distances based on Cartesian coordinates. The trivial identity mapping is computed as follows:
- Chains. It is assumed that the two files compared contain the same number of chains – if applicable. The trivial mapping between chains is used – i-th chain of the first file compared with the i-th chain of the second file.
- Amino-acids. When comparing two chains, amino acids (a.a.) are sorted by residue sequence number, and common a.a. are selected. (NB: this allows handling the case where one chain misses selected a.a., e.g. located in a flexible loop.)
- Atoms. Common atoms of two a.a. are selected, using the atom names.
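The residue and atom selection described above can be sketched as an intersection of two sorted maps. This is an illustration only: the types Residue and Chain and the function common_atoms are hypothetical stand-ins, not the data structures used by the package.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal model: a residue maps atom names to atom indices,
// and a chain maps residue sequence numbers to residues.
typedef std::map<std::string, int> Residue;
typedef std::map<int, Residue>     Chain;

// Pairs of atom indices (first chain, second chain) sharing the same
// residue sequence number and the same atom name.
std::vector<std::pair<int, int> > common_atoms(const Chain& c1, const Chain& c2)
{
  std::vector<std::pair<int, int> > mapping;
  // std::map iterates by increasing key, so residues are visited sorted
  // by residue sequence number.
  for (Chain::const_iterator r1 = c1.begin(); r1 != c1.end(); ++r1) {
    Chain::const_iterator r2 = c2.find(r1->first);
    if (r2 == c2.end()) continue; // residue missing in the second chain
    for (Residue::const_iterator a1 = r1->second.begin(); a1 != r1->second.end(); ++a1) {
      Residue::const_iterator a2 = r2->second.find(a1->first);
      if (a2 != r2->second.end()) // atom name present in both residues
        mapping.push_back(std::make_pair(a1->second, a2->second));
    }
  }
  return mapping;
}
```

A residue present in only one chain, e.g. in a flexible loop, is simply skipped, and so is an atom reported in only one of the two residues.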
Least-RMSD
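Let $A$ and $B$ be two conformations of a molecule of $n$ atoms. We note $a_i$ (resp. $b_i$) the $x, y, z$ coordinates of the $i$-th atom of the conformation $A$ (resp. $B$). The rigid registration of $A$ over $B$ is denoted $A^B$, and its coordinates $a_i^B$. The least-RMSD of $A$ and $B$ is the root mean square distance between coordinates of $A^B$ and $B$:
$$ \text{lRMSD}(A, B) = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \| a_i^B - b_i \|^2 } $$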
The rigid registration is performed using the package Point_cloud_rigid_registration_3. Note that this rigid registration accounts only for rotations and translations. In this package, we also need to account for chirality, which additionally requires a mirror transformation in the registration.
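Once the registration has been applied, the least-RMSD reduces to a root mean square distance over matched atoms. The sketch below covers that last step only, assuming the second point set has already been registered on the first; the registration itself is not shown, and the names Point and rmsd are illustrative rather than part of the SBL.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

typedef std::array<double, 3> Point;

// Root mean square distance between the matched points a[i] and b[i].
// Assumes a.size() == b.size() > 0 and that b is already registered on a.
double rmsd(const std::vector<Point>& a, const std::vector<Point>& b)
{
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
    for (std::size_t k = 0; k < 3; ++k) {
      const double d = a[i][k] - b[i][k];
      sum += d * d; // accumulates the squared distance between a[i] and b[i]
    }
  return std::sqrt(sum / a.size());
}
```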
Implementation: other distances
RMSD internal distance
The previous distance is an RMSD defined over pairs of matching particles. Another way to compare conformations is to compute the RMSD between the distances separating all pairs of particles in one conformation and the distances separating the matching pairs of particles in the other conformation. The resulting distance is called the RMSD internal distance.
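A minimal sketch of this computation follows. It assumes that particle i of the first conformation matches particle i of the second, that the internal distance is the Euclidean distance, and that the average is taken over the n(n-1)/2 unordered pairs; the normalization used by the package may differ, and all names are illustrative.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <vector>

typedef std::array<double, 3> Point;

// Euclidean distance between two particles of the same conformation.
double internal_distance(const Point& p, const Point& q)
{
  double s = 0.0;
  for (std::size_t k = 0; k < 3; ++k) s += (p[k] - q[k]) * (p[k] - q[k]);
  return std::sqrt(s);
}

// Squared RMSD between the internal distances of a and b, averaged over
// the pairs i < j. Assumes a.size() == b.size() >= 2.
double squared_rmsd_internal(const std::vector<Point>& a, const std::vector<Point>& b)
{
  double sum = 0.0;
  std::size_t pairs = 0;
  for (std::size_t i = 0; i < a.size(); ++i)
    for (std::size_t j = i + 1; j < a.size(); ++j) {
      const double d = internal_distance(a[i], a[j]) - internal_distance(b[i], b[j]);
      sum += d * d;
      ++pairs;
    }
  return sum / pairs;
}
```

Since only distances within each conformation are used, the result is unchanged by any rigid motion of either conformation, so no registration is needed.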
Other distances
This package also offers the possibility to plug in any executable that takes two D-dimensional points as input and returns a single distance value. In this way, any other kind of distance that is externally defined can be used in the SBL.
Design
Alignment
Except for the distance described in section Other distances, all described molecular distances require an alignment process to match particles of the two input conformations.
The concept MolecularAlignment is a simple functor taking as input the position of a particle in a conformation and a boolean tag indicating the direction of the alignment (first conformation to second, or opposite direction). It then returns the position of the input particle in the other conformation. The default functor is SBL::CSB::Molecular_alignment_default and simply returns the input position, assuming that both conformations are already aligned.
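To illustrate the concept, here is a sketch of two functors with the interface described above. The exact signature required by MolecularAlignment is an assumption (a position and a boolean, returning a position), and the names Identity_alignment and Permutation_alignment are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Identity alignment: particle i of one conformation matches particle i of
// the other, whatever the direction.
struct Identity_alignment
{
  std::size_t operator()(std::size_t i, bool /*first_to_second*/) const { return i; }
};

// Alignment defined by a permutation: forward[i] is the position, in the
// second conformation, of particle i of the first one.
// Assumes forward is a permutation of 0, ..., n-1.
class Permutation_alignment
{
  std::vector<std::size_t> forward_, backward_;
public:
  explicit Permutation_alignment(const std::vector<std::size_t>& forward)
    : forward_(forward), backward_(forward.size())
  {
    // Precompute the inverse mapping for the opposite direction.
    for (std::size_t i = 0; i < forward.size(); ++i) backward_[forward[i]] = i;
  }
  std::size_t operator()(std::size_t i, bool first_to_second) const
  {
    return first_to_second ? forward_[i] : backward_[i];
  }
};
```

Applying the functor in one direction and then in the other returns the initial position, which is what the boolean tag is meant to guarantee.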
Particles
The l-RMSD and the RMSD internal distances both require a geometric definition of a particle. As mentioned, a particle is represented by a 3D point. Thus, the concept GetParticle defines a functor that takes as input a conformation and a position, and returns a 3D point corresponding to the target particle. The default functor is SBL::CSB::T_Get_particle_default and simply returns a CGAL Point_3 structure of the i-th particle in the input conformation.
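As an illustration, a functor of this kind could look as follows for a conformation stored as a flat array of coordinates. The memory layout (x, y, z of particle i at positions 3i, 3i+1, 3i+2) is an assumption, std::array stands in for the CGAL Point_3 type, and Get_particle_flat is a hypothetical name.

```cpp
#include <array>
#include <cstddef>
#include <vector>

typedef std::array<double, 3> Point; // stand-in for a CGAL Point_3

// Returns the i-th particle of a conformation stored as
// x0, y0, z0, x1, y1, z1, ...
struct Get_particle_flat
{
  Point operator()(const std::vector<double>& conformation, std::size_t i) const
  {
    Point p = {{ conformation[3 * i], conformation[3 * i + 1], conformation[3 * i + 2] }};
    return p;
  }
};
```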
Least-RMSD
The l-RMSD is defined in the class SBL::CSB::T_Least_RMSD_cartesian< Conformation , GetParticle , MolecularAlignment > . The parameter Conformation is the representation of the input conformations and should be compliant with the CGAL::Point_d class of the CGAL library. The parameters GetParticle and MolecularAlignment were both previously described.
In addition, the class SBL::CSB::T_Least_RMSD_cartesian_with_chirality< Conformation , GetParticle , MolecularAlignment > computes the distances between mirror images of the input conformations, and returns the smallest distance.
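The idea can be sketched generically: take the mirror image of one conformation, evaluate the distance with and without mirroring, and keep the smallest value. The code below is an illustration under stated assumptions (flat x, y, z storage, reflection through the plane x = 0, any callable as distance); it is not the implementation of the class above, and the names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Mirror image of a conformation stored as x0, y0, z0, x1, y1, z1, ...:
// the reflection through the plane x = 0 negates every x coordinate.
std::vector<double> mirror_image(std::vector<double> c)
{
  for (std::size_t i = 0; i < c.size(); i += 3) c[i] = -c[i];
  return c;
}

// Smallest of the distances from p to q and from the mirror image of p to q.
// distance is any callable taking two conformations and returning a double.
template <class Distance>
double distance_with_chirality(const Distance& distance,
                               const std::vector<double>& p,
                               const std::vector<double>& q)
{
  return std::min(distance(p, q), distance(mirror_image(p), q));
}
```

When the distance includes a rigid registration, the choice of the mirror plane does not matter, since two mirror images of the same conformation differ by a rigid motion.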
RMSD internal distance
The RMSD internal distance is defined in the class SBL::CSB::T_Squared_RMSD_internal_distance< Conformation , InternalDistance , GetParticle , MolecularAlignment > . The parameter InternalDistance is a functor for computing the distance between two particles in the same conformation. The three other parameters were all previously described. Note that this functor returns the squared distance, which avoids a square root computation that may not be needed.
Other distances
Externally defined distances can be wrapped using the class SBL::CSB::T_External_distance< Conformation , FT > . Parameters were all already described. The only additional requirements are:
- the external executable computing the distance is named sbl-conf-distance.exe (this can be achieved with a symbolic link);
- the executable takes as input two text files, each representing a D-dimensional point;
- the executable prints the distance to the standard output as a single value.
Examples
Least-RMSD
The following example loads two conformations from two input files, and computes the least-RMSD between those two conformations.
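#include <iostream>
#include <fstream>
#include <vector>
#include <SBL/CSB/Least_RMSD_cartesian.hpp>
#include <CGAL/Cartesian_d.h>
#include <CGAL/Cartesian.h>
#include <SBL/Models/Conformation_traits_point_d.hpp>
#include <SBL/Models/Conformation_traits_vector.hpp>

typedef CGAL::Cartesian_d<double> K;
typedef std::vector<K::FT> Conformation;
// Assumption: GetParticle and MolecularAlignment fall back to their default functors.
typedef SBL::CSB::T_Least_RMSD_cartesian<Conformation> Distance;

int main(int argc, char *argv[])
{
  // Two input files are expected, one per conformation.
  if(argc < 3) return -1;
  Conformation p, q;
  // Load the first conformation.
  std::ifstream in_p(argv[1]);
  in_p >> p;
  in_p.close();
  // Load the second conformation.
  std::ifstream in_q(argv[2]);
  in_q >> q;
  in_q.close();
  // Compute and print the least-RMSD.
  Distance distance;
  std::cout << "LRMSD: " << distance(p, q) << std::endl;
  return 0;
}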
RMSD internal distance
The following example loads two conformations from two input files, and computes the RMSD internal distance between those two conformations.
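#include <iostream>
#include <fstream>
#include <SBL/CSB/Squared_RMSD_internal_distance.hpp>
#include <CGAL/Cartesian_d.h>
#include <CGAL/Cartesian.h>
#include <SBL/Models/Conformation_traits_point_d.hpp>

typedef CGAL::Cartesian_d<double> K;
typedef K::Point_d Conformation;
typedef CGAL::Cartesian<double>::Compute_squared_distance_3 Internal_distance_base;

// Internal distance between two particles of the same conformation.
struct Internal_distance : public Internal_distance_base
{
  typedef double FT;
  typedef CGAL::Cartesian<double>::Point_3 Point;
};

// Assumption: GetParticle and MolecularAlignment fall back to their default functors.
typedef SBL::CSB::T_Squared_RMSD_internal_distance<Conformation, Internal_distance> Distance;

int main(int argc, char *argv[])
{
  // Two input files are expected, one per conformation.
  if(argc < 3) return -1;
  Conformation p, q;
  std::ifstream in_p(argv[1]);
  in_p >> p;
  in_p.close();
  std::ifstream in_q(argv[2]);
  in_q >> q;
  in_q.close();
  // The functor returns a squared value, hence the square root.
  Distance distance;
  std::cout << "RMSD for internal distances: " << CGAL::sqrt(distance(p, q)) << std::endl;
  return 0;
}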
Other distances
The following example loads two conformations from two input files, and computes the distance from an external executable called sbl-conf-distance.exe.
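#include <iostream>
#include <fstream>
#include <SBL/CSB/External_distance.hpp>
#include <CGAL/Cartesian_d.h>
#include <SBL/Models/Conformation_traits_point_d.hpp>

typedef CGAL::Cartesian_d<double> K;
typedef K::Point_d Conformation;
// Assumption: FT is the number type of the value returned by the executable.
typedef SBL::CSB::T_External_distance<Conformation, K::FT> Distance;

int main(int argc, char *argv[])
{
  // Two input files are expected, one per conformation.
  if(argc < 3) return -1;
  Conformation p, q;
  std::ifstream in_p(argv[1]);
  in_p >> p;
  in_p.close();
  std::ifstream in_q(argv[2]);
  in_q >> q;
  in_q.close();
  // The functor delegates the computation to sbl-conf-distance.exe.
  Distance distance;
  std::cout << "External distance : " << distance(p, q) << std::endl;
  return 0;
}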
Applications
This package also offers a variety of programs performing different tasks related to the least-RMSD:
- sbl-lrmsd-for-pdb-pair.exe : computes the least-RMSD between two conformations of a molecule defined in two PDB files passed as arguments. The chains loaded can be selected.
- sbl-lrmsd-for-pdb-list.exe : computes the least-RMSD between any two conformations of a molecule defined in PDB files listed in a given text file.
- sbl-binet-cauchy-for-pdb-pairs.exe : computes the Binet Cauchy kernel score between two conformations of a molecule defined in two PDB files passed as arguments. The chains loaded can be selected.
- sbl-binet-cauchy-for-pdb-list.exe : computes the Binet Cauchy kernel score between any two conformations of a molecule defined in PDB files listed in a given text file.
- sbl-lrmsd-all-against-one.exe : computes the least-RMSD between a conformation defined in a first file and a collection of conformations defined in a second file.
- sbl-lrmsd-all-pairs.exe : computes the matrix of least-RMSD between all possible distinct pairs of conformations defined in an input file.
- sbl-lrmsd-conformational-ensembles-intersection.exe : computes the common set of conformations between two input sets using the least-RMSD (two conformations are considered identical if their distance is below an input threshold); the output is a file listing pairs of indices, each index representing the position of a matched conformation in the file where it was defined.
- sbl-lrmsd-conformational-ensembles-centroid-per-atom.exe : computes a conformation where each particle is a centroid of all matching particles of a set of input conformations.
- sbl-lrmsd-subdomain-comparator.exe : given a list of proteins (as PDB files) and a sub-domain label classification (as defined in MolecularSystemLabelsTraits) for each, computes the least-RMSD between all sub-domains which have equal labels across partners.
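The threshold-based matching used to intersect two conformational ensembles can be sketched as follows. This is a brute-force illustration that compares every pair; the actual program may proceed differently, and the name matched_pairs is hypothetical. The distance is passed as a callable, so that any of the distances of this package could be plugged in.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Pairs of indices (i, j) such that the distance between first[i] and
// second[j] is below the threshold, i.e. the two conformations are
// considered identical.
template <class Conformation, class Distance>
std::vector<std::pair<std::size_t, std::size_t> >
matched_pairs(const std::vector<Conformation>& first,
              const std::vector<Conformation>& second,
              const Distance& distance, double threshold)
{
  std::vector<std::pair<std::size_t, std::size_t> > result;
  for (std::size_t i = 0; i < first.size(); ++i)
    for (std::size_t j = 0; j < second.size(); ++j)
      if (distance(first[i], second[j]) < threshold)
        result.push_back(std::make_pair(i, j));
  return result;
}
```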