Structural Bioinformatics Library
Template C++ / Python API for developing structural bioinformatics applications.
User Manual

MolecularGeometryLoader

Authors: F. Cazals and T. Dreyfus

Introduction

This package defines a C++ concept for loading into main memory molecular geometric models stored in a file. The type of molecular geometric model, and the file format from which the geometric model is loaded, determine the model of MolecularGeometryLoader to use.

There are two important types of molecular geometric models used within the SBL:

  • the Work Package: Space Filling Models, that is, collections of 3D balls representing particles in a molecule or complex. Abusing terminology, since a ball is bounded by a sphere, we use balls and spheres interchangeably.
  • the Conformations of a flexible molecule or complex, represented using either Cartesian or internal coordinates. In the former case, the conformation is represented as a d-dimensional point (dD point for short); in the latter, various formats may be used to store the conformation.

Note that the SBL uses the CGAL library for representing geometric primitives in general, and 3D balls as well as d-dimensional points in particular. It also uses the ESBTL library for loading the molecular models from PDB files.

Three file formats are currently supported in the SBL for storing the molecular geometric models:

  • the PDB file format: the standard file format to store molecular geometric models from bio-informatics or bio-physical experiments:
ATOM    137  CG  GLU A  17      46.697  26.649  -9.999  1.00 34.67           C  
ATOM    138  CD  GLU A  17      45.260  26.976 -10.409  1.00 39.87           C
...
  • the plain text format for 3D balls: a simple text file listing 3D balls, one per line, with first the coordinates of the center, then the radius:
x_1 y_1 z_1 r_1
x_2 y_2 z_2 r_2
...
  • the plain text format for dD points: a simple text file listing the Conformations as dD points, one per line with first the dimension, then the coordinates (in the following example each line contains 6 coordinates):
6 x_1_1 x_1_2 x_1_3 x_1_4 x_1_5 x_1_6
6 x_2_1 x_2_2 x_2_3 x_2_4 x_2_5 x_2_6
...
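
The two plain text formats above can be read with a few lines of standard C++. The following is a minimal sketch rather than the SBL implementation: the types Ball_3 and Point_d and the functions read_balls and read_points are hypothetical stand-ins introduced for illustration (the SBL itself stores these objects in CGAL primitives), and skipping blank lines and rejecting negative radii are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in types; the SBL itself relies on CGAL primitives.
struct Ball_3 { double x, y, z, r; };
typedef std::vector<double> Point_d;

// True if the line contains only blanks.
inline bool is_blank(const std::string& line)
{
  return line.find_first_not_of(" \t\r") == std::string::npos;
}

// Reads one ball per line, formatted as "x y z r".
// Blank lines are skipped; returns false on the first malformed line.
bool read_balls(std::istream& in, std::vector<Ball_3>& balls)
{
  std::string line;
  while (std::getline(in, line)) {
    if (is_blank(line)) continue;
    std::istringstream fields(line);
    Ball_3 b;
    if (!(fields >> b.x >> b.y >> b.z >> b.r) || b.r < 0.) return false;
    balls.push_back(b);
  }
  return true;
}

// Reads one dD point per line, formatted as "d x_1 ... x_d".
// Returns false if the dimension is missing, non-positive,
// or followed by fewer than d coordinates.
bool read_points(std::istream& in, std::vector<Point_d>& points)
{
  std::string line;
  while (std::getline(in, line)) {
    if (is_blank(line)) continue;
    std::istringstream fields(line);
    long d = 0;
    if (!(fields >> d) || d <= 0) return false;
    Point_d p(static_cast<std::size_t>(d));
    for (std::size_t i = 0; i < p.size(); ++i)
      if (!(fields >> p[i])) return false;
    points.push_back(p);
  }
  return true;
}
```

Since the dimension is stored on each line, a file may in principle mix points of different dimensions; this sketch accepts such files and leaves any consistency check to the caller.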

Using existing models in existing Applications : I/O

As explained in section Introduction, the SBL relies on two types of molecular geometric models, each type being used in a group of Applications.

Thus, the type of molecular geometry used is generally clear from the application context.

In an application using the concept MolecularGeometryLoader, programs using different models of MolecularGeometryLoader carry in their name a keyword specifying which model is used. Consider for example the application computing molecular surfaces and volumes:

  • the program using plain text files for loading the molecular geometric model is named sbl-vorlume-txt.exe.
Molecular geometric models stored in text files (collections of balls, dD points) contain a minimal amount of information. In contrast, PDB files contain many pieces of information, so that options can be set when loading a molecular geometric model. For example, from the command line, one can set the number (i.e. the id) of the model to load when the PDB file contains multiple models, set a threshold on the B factor, discard water molecules, load hydrogen atoms, etc.
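
To make the effect of such loading options concrete, the following sketch parses a PDB ATOM/HETATM line using the fixed column layout of the PDB format, then applies three filters. It is an illustration only and does not use ESBTL: the types Atom_record and Load_options, their field names, and the functions parse_atom_line and is_selected are hypothetical. It further assumes that atoms with a B factor above the threshold are discarded, that water molecules are residues named HOH, and that hydrogens are recognized from the element columns (or from the first letter of the atom name when these columns are absent).

```cpp
#include <cassert>
#include <cctype>
#include <exception>
#include <string>

// Hypothetical record holding the fields used by the filters below.
struct Atom_record {
  std::string name, res_name;
  char chain_id = ' ';
  double x = 0., y = 0., z = 0., b_factor = 0.;
  bool is_hydrogen = false;
};

// Hypothetical options; NOT the option set of the SBL loaders.
struct Load_options {
  double max_b_factor = 1e9;   // assumption: atoms above this are dropped
  bool keep_waters = false;
  bool keep_hydrogens = false;
};

// Removes leading and trailing spaces.
inline std::string trimmed(const std::string& s)
{
  const std::string::size_type b = s.find_first_not_of(' ');
  if (b == std::string::npos) return "";
  return s.substr(b, s.find_last_not_of(' ') - b + 1);
}

// Parses an ATOM or HETATM line using the fixed PDB columns:
// name 13-16, resName 18-20, chainID 22, x 31-38, y 39-46, z 47-54,
// tempFactor 61-66, element 77-78 (1-based, inclusive).
bool parse_atom_line(const std::string& line, Atom_record& a)
{
  if (line.size() < 66) return false;
  const std::string tag = line.substr(0, 6);
  if (tag != "ATOM  " && tag != "HETATM") return false;
  a.name = trimmed(line.substr(12, 4));
  a.res_name = trimmed(line.substr(17, 3));
  a.chain_id = line[21];
  try {
    a.x = std::stod(line.substr(30, 8));
    a.y = std::stod(line.substr(38, 8));
    a.z = std::stod(line.substr(46, 8));
    a.b_factor = std::stod(line.substr(60, 6));
  } catch (const std::exception&) {
    return false;  // a numeric field could not be converted
  }
  // Element symbol when present, else the first letter of the atom name.
  std::string element = line.size() >= 78 ? trimmed(line.substr(76, 2)) : "";
  if (element.empty())
    for (char c : a.name)
      if (std::isalpha(static_cast<unsigned char>(c))) { element = std::string(1, c); break; }
  a.is_hydrogen = (element == "H");
  return true;
}

// Applies the three filters to one atom.
bool is_selected(const Atom_record& a, const Load_options& opt)
{
  if (a.b_factor > opt.max_b_factor) return false;
  if (!opt.keep_waters && a.res_name == "HOH") return false;
  if (!opt.keep_hydrogens && a.is_hydrogen) return false;
  return true;
}
```

Selecting the model to load would additionally require tracking the MODEL/ENDMDL records of the file, which this sketch leaves out.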


For any program dealing with particles, the particles can be annotated with user-defined annotations using the option --annotations-file, which takes a specification file as input (see section Optional Annotations for the specification). Multiple annotations can be loaded using multiple specification files. For more details about annotations, the reader is referred to the package ParticleAnnotator.


Using existing models to develop novel applications

Space Filling Model Loaders

Two loaders can be used to load Work Package: Space Filling Models :

  • SBL::Models::T_Spheres_3_file_loader< Sphere3 , Point3 > loads a text file listing 3D balls. Sphere3 determines the type of the 3D ball, represented by its bounding sphere, while Point3 determines the type of the center of the 3D ball. Sphere3 can be a geometric object from the CGAL library (CGAL::Weighted_point_3 or CGAL::Sphere_3), or the particle type of the class SBL::Models::T_Geometric_particle_traits . By default, Point3 is inferred from the center type of Sphere3 .
  • SBL::Models::T_PDB_file_loader< ESBTLMolecularSystem , PDBLineFormat > loads a PDB file using the ESBTL machinery. ESBTLMolecularSystem determines the architecture of the containers of the atoms to be loaded. PDBLineFormat represents the format of a line in a PDB file. Note that all the template parameters have default values, so that the class SBL::Models::T_PDB_file_loader<> can be used for default ESBTL molecular systems, with an input PDB file using the default line format.

Conformation Loaders

The class SBL::Models::T_Conformations_file_loader< ConformationType , ConformationBuilder , ESBTLMolecularSystem , PDBLineFormat > loads a list of dD points, or a list of PDB files – one per conformation, using the ESBTL machinery.

The parameters of this class are as follows:

  • ConformationType is the geometric type of the conformation (e.g., a dD point),
  • ConformationBuilder is a functor building one conformation from an input file (PDB file, dD point file, etc.). For example, if the conformation is loaded from a PDB file and the conformation type is represented with internal coordinates, the builder constructs the internal coordinates from the PDB file.
  • ESBTLMolecularSystem determines the architecture of the containers of the atoms to be loaded.
  • PDBLineFormat represents the format of a line in a PDB file.

Note that all the template parameters have default values, so that the class SBL::Models::T_Conformations_file_loader<> can be used for default ESBTL molecular systems, with an input PDB file using the default line format, the Conformations being stored in CGAL dD points whose coordinates use a floating point number type. Note also that the class SBL::Models::T_Conformations_file_loader can save the loaded Conformations in a file as a simple list of dD points.
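
The division of labor between the conformation type, the builder functor and the defaulted template parameters can be sketched as follows. This is not the SBL class: Conformations_loader_sketch, Point_d_builder and Bond_length_builder are hypothetical names, the builders work on one text line rather than on a file, and the "internal coordinates" computed here are merely the distances between consecutive atoms, chosen to keep the example short.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Default builder: parses "d x_1 ... x_d" into a vector of d coordinates.
struct Point_d_builder {
  bool operator()(const std::string& line, std::vector<double>& conf) const {
    std::istringstream fields(line);
    long d = 0;
    if (!(fields >> d) || d <= 0) return false;
    conf.assign(static_cast<std::size_t>(d), 0.);
    for (double& c : conf)
      if (!(fields >> c)) return false;
    return true;
  }
};

// Alternative builder (toy internal coordinates): reads the Cartesian
// coordinates of consecutive atoms and stores the distances between them.
struct Bond_length_builder {
  bool operator()(const std::string& line, std::vector<double>& conf) const {
    std::vector<double> xyz;
    if (!Point_d_builder()(line, xyz)) return false;
    if (xyz.size() % 3 != 0 || xyz.size() < 6) return false;
    conf.clear();
    for (std::size_t i = 3; i < xyz.size(); i += 3) {
      const double dx = xyz[i] - xyz[i - 3];
      const double dy = xyz[i + 1] - xyz[i - 2];
      const double dz = xyz[i + 2] - xyz[i - 1];
      conf.push_back(std::sqrt(dx * dx + dy * dy + dz * dz));
    }
    return true;
  }
};

// Hypothetical loader: every template parameter has a default, so that
// Conformations_loader_sketch<> reads plain dD points.
template <class ConformationType = std::vector<double>,
          class ConformationBuilder = Point_d_builder>
class Conformations_loader_sketch {
public:
  // Builds one conformation per non-blank line; false on the first failure.
  bool load(std::istream& in) {
    std::string line;
    while (std::getline(in, line)) {
      if (line.find_first_not_of(" \t\r") == std::string::npos) continue;
      ConformationType conf;
      if (!m_builder(line, conf)) return false;
      m_conformations.push_back(conf);
    }
    return true;
  }
  // Writes the loaded conformations back as a list of dD points.
  void save(std::ostream& out) const {
    for (const ConformationType& c : m_conformations) {
      out << c.size();
      for (double v : c) out << ' ' << v;
      out << '\n';
    }
  }
  const std::vector<ConformationType>& conformations() const { return m_conformations; }
private:
  ConformationBuilder m_builder;
  std::vector<ConformationType> m_conformations;
};
```

Switching representation then only requires changing the builder, e.g. Conformations_loader_sketch<std::vector<double>, Bond_length_builder>, while the loading loop is untouched.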

Developing new models of MolecularGeometryLoader concept

One may need to load a molecular geometric model from a file that is neither a PDB file nor a plain text file with the syntax specified in section Introduction . In such a case, it is recommended to develop a new model of the concept MolecularGeometryLoader.

Any C++ model of MolecularGeometryLoader inherits from the class SBL::Modules::Loader_base and has to implement the pure virtual methods declared in SBL::Modules::Loader_base. In addition, a model of MolecularGeometryLoader must provide access to the loaded molecular geometric models. Loaders for Work Package: Space Filling Models focus on individual models, whereas Conformations are currently grouped into ensembles. Depending on the context, one therefore has to implement the following methods:

  • Work Package: Space Filling Models require the method get_geometric_model() that returns the loaded geometric model, assuming a single model was loaded, and get_geometric_model(i) that returns the ith loaded geometric model if several models were loaded;
#include <SBL/IO/Loader_base.hpp>
class Space_filling_model_loader : public SBL::IO::Loader_base
{
public:
  // Virtual methods from Loader_base
  // Adds the options of the loader.
  boost::program_options::options_description add_options(void);
  // Loads the input.
  bool load(unsigned verbose, std::ostream& out);
  // Checks that the values of the input options are coherent.
  bool check_options(std::string& message)const;
  // Returns a prefix concatenating the command line options used.
  std::string get_output_prefix(void)const;
  // Returns the name of the class itself.
  std::string get_name(void)const;
  // Accessing molecular geometric models
  const Molecular_geometric_model& get_geometric_model(unsigned i)const;
  Molecular_geometric_model& get_geometric_model(unsigned i);
  const Molecular_geometric_model& get_geometric_model(void)const;
  Molecular_geometric_model& get_geometric_model(void);
};
  • Conformations require the method get_geometric_model_ensemble() that returns the loaded geometric models ensemble, if only one ensemble was loaded, and get_geometric_model_ensemble(i) that returns the ith loaded ensemble if several were loaded.
#include <SBL/IO/Loader_base.hpp>
class Conformations_loader : public SBL::IO::Loader_base
{
public:
  // Virtual methods from Loader_base
  boost::program_options::options_description add_options(void);
  bool load(unsigned verbose, std::ostream& out);
  bool check_options(std::string& message)const;
  std::string get_output_prefix(void)const;
  std::string get_name(void)const;
  // Accessing ensembles of molecular geometric models
  const Molecular_geometric_model_ensemble& get_geometric_model_ensemble(unsigned i)const;
  Molecular_geometric_model_ensemble& get_geometric_model_ensemble(unsigned i);
  const Molecular_geometric_model_ensemble& get_geometric_model_ensemble(void)const;
  Molecular_geometric_model_ensemble& get_geometric_model_ensemble(void);
};
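
As an illustration of the overall shape of a new loader, the sketch below implements a loader for a hypothetical comma-separated format (one ball per line, "x,y,z,r"). To remain self-contained it does not include any SBL or Boost header: Loader_base_sketch is a reduced, hypothetical stand-in for the real base class (without add_options), Csv_balls_loader and add_source are invented names, and the inputs are passed as in-memory strings instead of file names. A real model would derive from the SBL base class and read the file names from its command line options.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, reduced stand-in for the SBL base class
// (no Boost option handling).
class Loader_base_sketch {
public:
  virtual ~Loader_base_sketch() {}
  virtual bool load(unsigned verbose, std::ostream& out) = 0;
  virtual bool check_options(std::string& message) const = 0;
  virtual std::string get_output_prefix() const = 0;
  virtual std::string get_name() const = 0;
};

struct Ball_3 { double x, y, z, r; };

// Loader for a hypothetical CSV format: one ball per line, "x,y,z,r".
class Csv_balls_loader : public Loader_base_sketch {
public:
  typedef std::vector<Ball_3> Molecular_geometric_model;

  // Registers one input (a name and its text content); one model per input.
  void add_source(const std::string& name, const std::string& content) {
    m_names.push_back(name);
    m_contents.push_back(content);
  }
  // Coherence check: at least one input is required.
  bool check_options(std::string& message) const override {
    if (m_names.empty()) { message = "no input given"; return false; }
    return true;
  }
  // Parses every registered input; false on the first malformed line.
  bool load(unsigned verbose, std::ostream& out) override {
    m_models.clear();
    for (std::size_t i = 0; i < m_contents.size(); ++i) {
      Molecular_geometric_model model;
      std::istringstream lines(m_contents[i]);
      std::string line;
      while (std::getline(lines, line)) {
        if (line.find_first_not_of(" \t\r") == std::string::npos) continue;
        for (char& c : line) if (c == ',') c = ' ';
        std::istringstream fields(line);
        Ball_3 b;
        if (!(fields >> b.x >> b.y >> b.z >> b.r)) return false;
        model.push_back(b);
      }
      if (verbose > 0) out << m_names[i] << ": " << model.size() << " balls\n";
      m_models.push_back(model);
    }
    return true;
  }
  std::string get_name() const override { return "Csv_balls_loader"; }
  // Assumption: the prefix is simply the name of the first input.
  std::string get_output_prefix() const override {
    return m_names.empty() ? std::string() : m_names.front();
  }
  // Accessing molecular geometric models
  const Molecular_geometric_model& get_geometric_model(unsigned i) const {
    assert(i < m_models.size());
    return m_models[i];
  }
  const Molecular_geometric_model& get_geometric_model() const {
    return get_geometric_model(0);
  }
private:
  std::vector<std::string> m_names, m_contents;
  std::vector<Molecular_geometric_model> m_models;
};
```

A caller would typically run check_options() first, then load(), and only then access the models; a loader for Conformations would follow the same pattern with get_geometric_model_ensemble() accessors.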