Structural Bioinformatics Library
Template C++ / Python API for developing structural bioinformatics applications.
User Manual

ParticleAnnotator

Authors: F. Cazals and T. Dreyfus

Introduction

Annotations: Principles. In the SBL, particles are either atoms or pseudo-atoms. Annotating such particles with properties (chemical, physical, biological) is a common task faced while modeling macro-molecular structures. These annotations are associated with molecular geometric models. In particular, for atoms, the annotations aim at complementing those typically found in a PDB file.

Typical atomic annotations are the following ones:

  • Atomic radii, be they van der Waals radii or so-called group radii (which account for the absence of H atoms, see section Atomic Radii and Group Radii)
  • Atomic groups, based on chemical properties.
  • Amino acid (a.a.) groups, also based on chemical properties.
  • Solvation properties, for atoms or a.a.

Note that annotations may depend upon one another, which we illustrate with atomic radii: an atomic radius depends on the atom type, which itself depends on the PDB atom name. We show in section Handling Multiple Annotators how to handle such dependencies.

Compulsory versus optional.

In the context of a given application, we distinguish two types of annotations:

Compulsory: an annotation strictly required by the application.

Such annotations are static in the sense that each particle has data members to store them.

Consider the application $\text{\vorlumeEP}$ from the package Space_filling_model_surface_volume, which computes molecular surfaces and volumes. Computing the volume of a molecule requires atomic radii, which are not found in PDB files; radii are therefore a compulsory annotation for this application.


Optional: an annotation not strictly required by the application, but typically used during the post-processing phase.

Such annotations are dynamic, i.e. the corresponding data members are created iff the corresponding annotations are loaded.

Consider the exposed surface areas, computed on a per atom basis, returned by $\text{\vorlumeEP}$. One may wish to aggregate these by atom types, say polar or charged. Such an annotation is used at the post-processing stage and is optional.
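
The distinction between static and dynamic annotations can be pictured with a minimal C++ sketch. The type Sketch_particle and the function load_optional below are hypothetical and are not SBL classes; for brevity, only double-valued optional annotations are handled. The compulsory radius is a plain data member, whereas an optional annotation only exists once it has been loaded.

```cpp
#include <map>
#include <string>

// Hypothetical particle type (not an SBL class), for illustration only.
struct Sketch_particle {
  // Compulsory (static) annotation: the data member always exists.
  double radius;

  // Optional (dynamic) annotations, keyed by annotation name: an entry is
  // created only when the corresponding annotation is loaded.
  std::map<std::string, double> optional;

  Sketch_particle() : radius(0.0) {}

  bool has_annotation(const std::string& name) const {
    return optional.find(name) != optional.end();
  }
};

// Attach an optional annotation, e.g. at the post-processing stage.
inline void load_optional(Sketch_particle& p, const std::string& name, double value) {
  p.optional[name] = value;
}
```

With this layout, an application can always read radius, but must check has_annotation before reading an optional value.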


Using existing models in existing Applications : I/O

Compulsory Annotations

Such annotations can be loaded in two ways: from a file, or from a data structure present in main memory. We now detail these two options.

Compulsory annotations loaded from file: file format. A file specifying the annotations can be loaded by the application. Note that the header, if any, is specific to the annotator loading this file; that is, the syntax and semantics of this file must be specified by the annotator. See, e.g., the file format of SBL::Models::T_Radius_annotator_for_particles_with_annotated_name:

# the first line is a keyword used by the annotator
# then, the pairs (key, value) are listed
annotated_name
Cali 1.87
Caro 1.76
...

Compulsory annotations loaded from data structures present in main memory. In that case, no file is loaded. This mechanism must be integrated into the design and implementation of the application, as explained in section Using existing models to develop novel applications.

Optional Annotations

Such annotations must be loaded from a file. Note that all optional annotations use the same annotator, so that the header of any optional annotation file specifies the same three properties (see example in section Example : Running a Program With a Particle Annotator):

  • the name of the annotation: a simple string (one word, without any space character) that identifies the set of annotations in the file. In the example below: solvation_parameter.
  • the composition of the key: a sequence of tags separated by the minus character, which specifies both how to read the key in the file and how to build a key from a particle. The possible tags are CHAIN, RES_NAME, RES_ID, ATOM_NAME, ATOM_ID and ELEMENT. For example, a key composed of a chain identifier and a residue name is specified with CHAIN-RES_NAME. In the example below: RES_NAME-ATOM_NAME.
  • the type of the annotation: a C++ primitive type or a string; it specifies the C++ type used to store the annotation. In the example below: double.

Note that a line containing the character '#' is discarded entirely.

Optional annotation file: the example of Eisenberg's solvation parameters to estimate solvation free energies [66].

# First line: 3 keywords i.e. (i) annotation name (ii) key composition (iii) type of annotation
# Subsequent lines: one (key, value) pair per line
solvation_parameter RES_NAME-ATOM_NAME double
ALA-N -9
ALA-CA 18
ALA-C 18
...

Note that it is possible to use wild cards to specify the same value for several keys.

If one wants to set the solvation parameter to -9 for all atoms named N, or to set a default value (e.g., 0) for all atoms, the following expressions are valid:
*-N -9
*-* 0


Note that for a given particle, wild cards are evaluated iff no key matches this particle exactly. Note also that if several expressions with wild cards are valid for the same particle, the annotation is taken from the first valid expression encountered.
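
The file format and the wild card rules above can be sketched as follows. This is an illustration written for this manual, not the SBL implementation: the class Annotation_table_sketch and the function load_annotations_sketch are hypothetical names, only double-valued annotations are handled, and keys are assumed not to contain the minus character inside a tag.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the SBL): stores the (key, value) pairs of an
// optional annotation file and resolves wild cards as described above.
class Annotation_table_sketch {
public:
  // Register one pair; keys containing '*' are kept in file order.
  void add(const std::string& key, double value) {
    if (key.find('*') == std::string::npos) m_exact[key] = value;
    else m_wildcards.push_back(std::make_pair(key, value));
  }

  // Exact keys take precedence; otherwise the first matching wild card wins.
  bool find(const std::string& key, double& value) const {
    std::map<std::string, double>::const_iterator it = m_exact.find(key);
    if (it != m_exact.end()) { value = it->second; return true; }
    for (std::size_t i = 0; i < m_wildcards.size(); ++i)
      if (matches(m_wildcards[i].first, key)) { value = m_wildcards[i].second; return true; }
    return false;
  }

private:
  // Split a key such as "ALA-CA" into its tags, using '-' as separator.
  static std::vector<std::string> split(const std::string& key) {
    std::vector<std::string> tags;
    std::istringstream in(key);
    std::string tag;
    while (std::getline(in, tag, '-')) tags.push_back(tag);
    return tags;
  }

  // A pattern matches when each of its tags is '*' or equals the key's tag.
  static bool matches(const std::string& pattern, const std::string& key) {
    const std::vector<std::string> p = split(pattern), k = split(key);
    if (p.size() != k.size()) return false;
    for (std::size_t i = 0; i < p.size(); ++i)
      if (p[i] != "*" && p[i] != k[i]) return false;
    return true;
  }

  std::map<std::string, double> m_exact;
  std::vector<std::pair<std::string, double> > m_wildcards;
};

// Read a file: lines containing '#' are skipped, the first remaining line is the
// header (name, key composition, type), then one (key, value) pair per line.
inline bool load_annotations_sketch(std::istream& in, std::string& name,
                                    Annotation_table_sketch& table) {
  std::string line, composition, type;
  bool header_read = false;
  while (std::getline(in, line)) {
    if (line.find('#') != std::string::npos) continue;
    std::istringstream fields(line);
    if (!header_read) {
      if (fields >> name >> composition >> type) header_read = true;
    } else {
      std::string key;
      double value;
      if (fields >> key >> value) table.add(key, value);
    }
  }
  return header_read;
}
```

In this sketch a key is a plain string such as ALA-CA; building it from a particle according to the key composition is left out.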

Example : Running a Program With a Particle Annotator

Assume that one wishes to run the program $\text{\vorlumeEP}$ on a molecule, and to analyze the volume of Secondary Structure Elements (SSE) as well as the exposure of atoms in relation to their hydrophobic status. Such analyses require three different annotations:

  • (i) the radii of atoms, for representing the atoms of the input molecule as 3D balls,
  • (ii) the SSE containing an atom, if any,
  • (iii) the hydrophobicity class of an atom.

The status of these annotations is as follows:

(i) A compulsory annotation: all particles have a field radius, which is initialized with the corresponding annotator. The annotator setting the radius is SBL::Models::T_Radius_annotator_for_particles_with_annotated_name; it is initialized either with default values (van der Waals radii or group radii) or from an annotations file.

(ii) An optional annotation, used only for post-processing. However, SSE are specified in the headers of PDB files, and are loaded with the particles. Thus, SSE annotations are treated as compulsory; in particular, each particle is decorated with its SSE in the output files.

(iii) An optional annotation, since it is only used as post-processing. Since this information is not present in PDB files, a file must be passed which specifies the hydrophobic properties of atoms/particles. See the example of Eisenberg's solvation parameters mentioned above.

Out of our three annotations, only the hydrophobic status and the radii must be loaded from files. When running the program of the example, the options related to the annotations are:

Particle Annotator:
  --annotated-names arg                 Annotated names file name
  --atomic-group-radii arg              Atomic group radii file name
  --radius-water arg (=1.4)             Radius of the water probe for SAS model
                                        (default is 1.4).
  --annotations-file arg                Dynamic annotations file name

  • --annotated-names: file listing the names of atoms, used to set the radii associated with these names; if the file is omitted, default values are loaded as described in the class SBL::Models::T_Name_annotator_for_atoms
  • --radius-water: value added to the radii of all atoms, mimicking a water probe; if omitted, the default value 1.4 is used
  • --annotations-file: file listing optional annotations (e.g., the hydrophobic status of atoms); the option is repeated once for each set of optional annotations to add to the atoms.

Using existing models to develop novel applications

In this section, we show how to use and compose existing annotators, that are either compulsory or optional.

Using Annotators in a Workflow

Within an application, an annotator may be used in two ways:

  • Coupled to a file loader: the annotator annotates from information read in a file.
  • Coupled to a data structure present in main memory: the annotator annotates from information found in that data structure.

In both cases, the loader mechanism from the package Module_base is used.

A Simple Compulsory Annotator

A C++ model of ParticleAnnotator is a functor taking a particle as input, and modifying it by annotating the relevant data members. Consequently, the particle must have the corresponding field, together with a method to set it. For example, the annotator SBL::Models::T_Name_annotator_for_atoms requires the method SBL::Models::T_Particle_with_annotations::get_annotations, which returns a structure with a field atom_name. If one wants to use the annotator with another particle data structure, SBL::Models::T_Name_annotator_for_atoms is templated by a functor SetAnnotation that allows customizing the way an annotation is set on a particle.

Note that for the annotator SBL::Models::T_Name_annotator_for_atoms, there are default annotations used when no annotation file is provided – see the reference manual of the corresponding annotator.

The important C++ models of the C++ concept ParticleAnnotator are listed on the main page of the package (ParticleAnnotator). Two of them in particular are discussed in the sequel.

A Generic Compulsory Annotator

The C++ model SBL::Models::T_Generic_annotator provides a basic, generic way to load annotations from a plain text file and to annotate particles. The class SBL::Models::T_Generic_annotator has seven template arguments (the first four are mandatory, the remaining three are optional):

  • KeyType : the type of the key (generally a simple string or an enum type is sufficient). It must be comparable since it is stored in a map, and readable from an input stream.
  • AnnotationType : the type of the annotation to load. It must be readable from an input stream.
  • MakeKey : the functor that takes as input a particle, and returns the key corresponding to the input particle.
  • SetAnnotation : the functor that takes as input an annotation and a reference to a particle, and annotates the particle with the input annotation.
  • GetOptionName : a functor returning the command line option name used to specify the input annotations file (by default, the option name is "--annotations-file").
  • GetOptionHelp : a functor returning the command line option help (by default, the option help is "Annotations file name").
  • GetOptionDisplayName : a functor returning the command line option display name for this annotator (by default, the display name is "Annotator"). Note that if the annotator is used within a collection of annotators, it has no display name.
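
The role of the four mandatory template parameters can be illustrated with a simplified stand-in. The class Generic_annotator_sketch below is not the SBL class: it omits the loader machinery and the three option-related parameters, and the particle Toy_particle and the functors Make_name_key and Set_radius are hypothetical.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Simplified stand-in for the design described above (not the SBL class):
// only the four mandatory template parameters are modelled.
template <class KeyType, class AnnotationType, class MakeKey, class SetAnnotation>
class Generic_annotator_sketch {
public:
  // Read (key, annotation) pairs; both types must be readable from a stream.
  void load(std::istream& in) {
    KeyType key;
    AnnotationType annotation;
    while (in >> key >> annotation) m_annotations[key] = annotation;
  }

  // Functor call: build the key of the particle, then annotate the particle
  // if the key is known. Returns false when no annotation was found.
  template <class Particle>
  bool operator()(Particle& p) const {
    typename std::map<KeyType, AnnotationType>::const_iterator it =
        m_annotations.find(MakeKey()(p));
    if (it == m_annotations.end()) return false;
    SetAnnotation()(it->second, p);
    return true;
  }

private:
  std::map<KeyType, AnnotationType> m_annotations;
};

// Hypothetical particle and functors used to instantiate the sketch.
struct Toy_particle {
  std::string annotated_name;
  double radius;
};

struct Make_name_key {
  std::string operator()(const Toy_particle& p) const { return p.annotated_name; }
};

struct Set_radius {
  void operator()(double r, Toy_particle& p) const { p.radius = r; }
};

typedef Generic_annotator_sketch<std::string, double, Make_name_key, Set_radius>
    Radius_annotator_sketch;
```

Loading the pairs Cali 1.87 and Caro 1.76 into a Radius_annotator_sketch and calling it on a particle whose annotated name is Caro sets the radius of that particle to 1.76.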

If one wants to use the generic annotator while loading the annotations from a data structure in main memory, it is possible to use a simpler class SBL::Models::T_Generic_annotator_without_file having three template parameters:

  • AnnotationType : the type of the annotation to load. It must be readable from an input stream.
  • SetAnnotation : the functor that takes as input an annotation and a reference to a particle, and annotates the particle with the input annotation.
  • GetInstanceName : the functor that returns a name specific to an instance of a generic annotator (useful when using several generic annotators)

Handling Multiple Annotators

It is possible to use multiple annotations using the class SBL::Models::T_Particle_annotator_collector<ParticleAnnotator1, ParticleAnnotator2>, which is itself a C++ model of the C++ concept ParticleAnnotator. The two template parameters are different C++ models of the C++ concept ParticleAnnotator, and the class SBL::Models::T_Particle_annotator_collector combines the functionality of both annotators (for annotating, and for loading). The class SBL::Models::T_Particle_annotator_collector can be composed with itself to use more than two annotations at once.

The annotators in a collection may be initialized either from annotation files or from C++ data structures.

Note that some annotations depend on one another. For example, the class SBL::Models::T_Radius_annotator_for_particles_with_annotated_name requires that the particle has an annotated name. For this reason, there is an ordering induced by the order of the template parameters of the class SBL::Models::T_Particle_annotator_collector: the first annotator is always called before the second one. In the case of SBL::Models::T_Radius_annotator_for_particles_with_annotated_name, one must first set the annotated name (first template parameter), then the radius (second template parameter).
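
The ordering rule can be illustrated with a minimal stand-in for a collector of two annotators. Everything below is hypothetical and is not SBL code: Annotator_collector_sketch, the toy atom type and the two toy annotators are illustrative names, and the mapping from the PDB name CA to the annotated name Cali is an assumption made for the example.

```cpp
#include <string>

// Hypothetical stand-in for a collector of two annotators (not the SBL class):
// the first template parameter is always applied before the second one.
template <class Annotator1, class Annotator2>
struct Annotator_collector_sketch {
  Annotator1 first;
  Annotator2 second;

  template <class Particle>
  void operator()(Particle& p) const {
    first(p);   // e.g. sets the annotated name
    second(p);  // e.g. sets the radius, which reads the annotated name
  }
};

// Toy atom and annotators; the names and radii are illustrative only.
struct Toy_atom {
  std::string pdb_name;        // as read from the PDB file
  std::string annotated_name;  // set by the first annotator
  double radius;               // set by the second annotator
};

struct Toy_name_annotator {
  void operator()(Toy_atom& a) const {
    a.annotated_name = (a.pdb_name == "CA") ? "Cali" : "Unknown";
  }
};

struct Toy_radius_annotator {
  void operator()(Toy_atom& a) const {
    a.radius = (a.annotated_name == "Cali") ? 1.87 : 0.0;
  }
};

// The name annotator comes first because the radius annotator depends on it.
typedef Annotator_collector_sketch<Toy_name_annotator, Toy_radius_annotator>
    Name_then_radius;
```

Swapping the two template parameters would compute the radius before the annotated name is set, which is the dependency problem described above. Since the collector is itself a functor on particles, it can serve as a template parameter of another collector to chain more than two annotators.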

Example: Instantiating the Annotators

Consider the example of section Example : Running a Program With a Particle Annotator, where one wants to run the program $\text{\vorlumeEP}$ on a molecule, and post process the result to analyze the volume of Secondary Structure Elements (SSE), and the exposure of atoms depending on their hydrophobic properties. We now show how to instantiate annotations and their annotators to carry out such analysis.

In the SBL, annotated particles are minimally decorated with a name, a radius and a list of optional annotations. This means that each time an annotated particle type is used, it has at least these requirements. For example, in the source file of the program $\text{\vorlumeEP}$, the class SBL::Models::Atom_with_flat_info_and_annotations_traits_epick is used to represent a particle type with an annotated radius deduced from an annotated name, and possibly holding optional annotations loaded from a file.

Example: Defining Custom Annotators

We now show how to define a custom annotator for using SSE annotators. First, in addition to the previous annotations, a serializable annotation class has to be defined for representing the SSE information. See Generalities: IO operations, Serialization for a definition of serializable.

#include <boost/serialization/access.hpp>
#include <boost/serialization/nvp.hpp>

// Annotation for checking whether an atom is in a secondary structure element
struct SSE_annotation
{
  bool is_in_sse;

  // Grant Boost.Serialization access to serialize()
  friend class boost::serialization::access;
  template <class Archive>
  void serialize(Archive& ar, const unsigned int /* version */)
  { ar & boost::serialization::make_nvp("is_in_sse", is_in_sse); }
};
// Make a particle type with default annotations (name, radius and optional annotations) and SSE annotations

Note that the class SBL::Models::T_Atom_with_hierarchical_info_and_annotations_traits_epick is now templated by the SSE_annotation type, creating annotations for a name and a radius field, a list of optional annotations, and SSE information.

The class SBL::Models::T_Generic_annotator_without_file is then used for defining the annotator that will assign the SSE annotations. In addition to the SSE annotations class, two template parameters have to be defined:

  • a functor returning the name of the annotator: (Get_SSE_annotator_name)
  • a functor setting the annotation of a particle: (Set_SSE_annotation)

// Defining the annotator for the SSE: its name, and how to set the annotation
struct Get_SSE_annotator_name
{
  std::string operator()(void) const { return "SSE Annotator"; }
};
struct Set_SSE_annotation
{
  void operator()(Particle_type& p) const
  { p.get_annotations().is_in_sse = (p.residue().chain().get_secondary_structure_element_of(p) != NULL); }
};
typedef SBL::Models::T_Generic_annotator_without_file<SSE_annotation,
                                                      Set_SSE_annotation,
                                                      Get_SSE_annotator_name> SSE_annotator;
// Combining the default annotator of the particle type with this annotator

Note that for using all the annotators, one must use the class SBL::Models::T_Particle_annotator_collector with the SSE annotator and the default annotator available from the used particle traits class.

This completes the example, with the SSE annotations (case (ii)) now handled.

Developing new models of ParticleAnnotator concept

Requirements

If the model SBL::Models::T_Generic_annotator does not provide the required functionalities, one may develop one's own C++ model of ParticleAnnotator. A C++ model of the C++ concept ParticleAnnotator has the following requirements:

  • It must inherit from SBL::IO::Loader_base, even if the annotations are not loaded from a file (e.g., they are loaded from a C++ data structure); this requires re-implementing a number of virtual methods, as explained in the package Module_base (e.g., the method SBL::IO::Loader_base::load is re-implemented for loading the annotations into a map (key, annotation)).
  • It must be a functor with a unique argument, namely the particle to annotate. This functor is the method called for annotating the input particle: it builds a key from the particle and looks for the corresponding annotation in a map loaded from an annotation file or a C++ data structure.

Example: Defining a New Annotator

If the provided generic annotators do not fulfill one's needs, it is also possible to create a new model of ParticleAnnotator following the requirements of section Developing new models of ParticleAnnotator concept. The following example is the source code of the class SBL::Models::No_particle_annotator, which is a dummy annotator. All requirements are nevertheless present, so that it can serve as a template for models of ParticleAnnotator:

class No_particle_annotator : public SBL::IO::Loader_base
{
public:
  typedef No_particle_annotator Self;
  typedef SBL::IO::Loader_base Base;
  // Requirements from the loader base
  inline bool check_options(std::string& message) const { return true; }
  inline std::string get_output_prefix(void) const { return ""; }
  inline bool load(unsigned verbose, std::ostream& out) { return true; }
  inline std::string get_name(void) const { return "SBL::Models::No_particle_annotator"; }
  // Functor: takes a particle as input and does nothing
  template <class Particle> inline void operator()(Particle& p) const {}
  // Get annotator: this annotator should never be accessed, so NULL is always returned
  inline Base* get_annotator(const std::string& name) { return NULL; }
};