Structural Bioinformatics Library
Template C++ / Python API for developing structural bioinformatics applications.
Authors: F. Cazals and T. Dreyfus
Annotations: Principles. In the SBL, particles are either atoms or pseudo-atoms. Annotating such particles with properties (chemical, physical, biological) is a common task faced while modeling macro-molecular structures. These annotations are associated with molecular geometric models. In particular, for atoms, the annotations aim at complementing those typically found in a PDB file.
Typical atomic annotations are the following ones:
Note that annotations may depend upon one another, which we illustrate with atomic radii: an atomic radius depends on the atom type, which itself depends on the PDB atom name. We show in section Handling Multiple Annotators how to handle such dependencies.
Compulsory versus optional.
In the context of a given application, we distinguish two types of annotations:
Compulsory: an annotation strictly required by the application.
Such annotations are static in the sense that each particle has data members to store them.
Optional: an annotation not strictly required by the application, but typically used during the post-processing phase.
Such annotations are dynamic, i.e. the corresponding data members are created iff the corresponding annotations are loaded.
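The static / dynamic distinction can be pictured with a minimal sketch. It is not the SBL particle class: the type Sketch_particle and its members are hypothetical, and optional values are assumed to be of type double for simplicity.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical particle type (NOT an SBL class), for illustration only.
struct Sketch_particle {
    // Compulsory (static) annotations: plain data members, always present.
    std::string name;
    double radius = 0.0;

    // Optional (dynamic) annotations: an entry exists iff it was loaded.
    std::map<std::string, double> optional_annotations;

    void set_optional(const std::string& annotation_name, double value) {
        assert(!annotation_name.empty());
        optional_annotations[annotation_name] = value;
    }

    // Returns a null pointer when the annotation was never loaded.
    const double* find_optional(const std::string& annotation_name) const {
        auto it = optional_annotations.find(annotation_name);
        return it == optional_annotations.end() ? nullptr : &it->second;
    }
};
```

In this sketch, the radius member exists even if no annotator ever sets it, whereas a query for an optional annotation that was not loaded simply finds nothing.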
Compulsory annotations can be loaded in two ways: from a file, or from a data structure present in main memory. We now detail these two options.
Compulsory annotations loaded from file: file format. A file specifying the annotations can be loaded by the application. Note that the header, if any, is specific to the annotator loading this file; that is, the syntax and semantics of this file must be specified by the annotator. See e.g. SBL::Models::T_Radius_annotator_for_particles_with_annotated_name.
    # the first line is a keyword used by the annotator
    # then, the pairs (key, value) are listed
    annotated_name
    Cali 1.87
    Caro 1.76
    ...
Compulsory annotations loaded from data structures present in main memory. In that case, no file is loaded. This mechanism must be integrated in the application design / implementation, as explained in section Using existing models to develop novel applications.
Optional annotations must be loaded from a file. Note that all optional annotations use the same annotator, so that the header of any optional annotation file specifies the same three properties, namely the annotation name, the key composition, and the type of the annotation (see the example in section Example: Running a Program With a Particle Annotator).
Note that the character '#' on a line causes the entire line to be discarded.
Optional annotation file: the example of Eisenberg's solvation parameters to estimate solvation free energies [82].
    # First line: 3 keywords i.e. (i) annotation name (ii) key composition (iii) type of annotation
    # Subsequent lines:
    solvation_parameter RES_NAME-ATOM_NAME double
    ALA-N -9
    ALA-CA 18
    ALA-C 18
    ...
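To make the file layout concrete, here is a minimal sketch of a reader for such a file. It is not the SBL loader: Annotation_table and parse_annotations are hypothetical names, values are assumed to be of type double, and a line is assumed to be a comment when its first non-blank character is '#'.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical container for the content of an optional annotation file.
struct Annotation_table {
    std::string annotation_name;  // e.g. solvation_parameter
    std::string key_composition;  // e.g. RES_NAME-ATOM_NAME
    std::string value_type;       // e.g. double
    std::map<std::string, double> values;
};

// Assumption: a line is skipped when blank or when it starts with '#'.
inline bool is_comment_or_blank(const std::string& line) {
    const auto pos = line.find_first_not_of(" \t\r");
    return pos == std::string::npos || line[pos] == '#';
}

// Reads the three-keyword header, then the (key, value) pairs.
inline Annotation_table parse_annotations(std::istream& in) {
    Annotation_table table;
    bool header_read = false;
    std::string line;
    while (std::getline(in, line)) {
        if (is_comment_or_blank(line)) continue;
        std::istringstream fields(line);
        if (!header_read) {
            if (!(fields >> table.annotation_name >> table.key_composition
                         >> table.value_type))
                throw std::runtime_error("header must hold three keywords");
            header_read = true;
        } else {
            std::string key;
            double value = 0.0;
            if (!(fields >> key >> value))
                throw std::runtime_error("malformed (key, value) line: " + line);
            table.values[key] = value;
        }
    }
    if (!header_read) throw std::runtime_error("missing header line");
    return table;
}
```

The sketch stores the declared type as a string without interpreting it; a real loader would presumably dispatch on that keyword to pick the value type.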
Note that it is possible to use wild cards to specify the same value for several keys.
    *-N -9
    *-* 0
Note that for a given particle, wild cards are evaluated iff there is no key that matches this particle exactly. Note also that if several expressions with wild cards are valid for the same particle, the annotation stems from the first valid expression encountered.
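The two precedence rules just stated (exact key first, then the first valid wild-card expression) can be sketched as follows. This is an illustration, not the SBL implementation: Wildcard_table is a hypothetical class, and '*' is assumed to stand for one whole '-'-separated component of the key.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical lookup table: exact keys live in a map, wild-card
// expressions are kept in the order in which they were inserted.
class Wildcard_table {
public:
    void insert(const std::string& key, double value) {
        if (key.find('*') == std::string::npos) exact_[key] = value;
        else wildcards_.push_back(std::make_pair(key, value));
    }

    // Writes the value into 'out' and returns true if some entry applies.
    bool find(const std::string& key, double& out) const {
        // Rule 1: an exact key takes precedence over any wild card.
        auto it = exact_.find(key);
        if (it != exact_.end()) { out = it->second; return true; }
        // Rule 2: otherwise, the first valid wild-card expression wins.
        for (std::size_t i = 0; i < wildcards_.size(); ++i) {
            if (matches(wildcards_[i].first, key)) {
                out = wildcards_[i].second;
                return true;
            }
        }
        return false;
    }

private:
    static std::vector<std::string> split(const std::string& s) {
        std::vector<std::string> parts;
        std::istringstream in(s);
        std::string part;
        while (std::getline(in, part, '-')) parts.push_back(part);
        return parts;
    }

    // Assumption: '*' matches exactly one component of the key.
    static bool matches(const std::string& pattern, const std::string& key) {
        const std::vector<std::string> p = split(pattern);
        const std::vector<std::string> k = split(key);
        if (p.size() != k.size()) return false;
        for (std::size_t i = 0; i < p.size(); ++i)
            if (p[i] != "*" && p[i] != k[i]) return false;
        return true;
    }

    std::map<std::string, double> exact_;
    std::vector<std::pair<std::string, double> > wildcards_;
};
```

With the two expressions of the listing above inserted in that order, any backbone nitrogen gets -9 and every other atom falls through to 0, unless an exact key such as ALA-N is also present.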
Assume that one wishes to run the program on a molecule, and to analyze the volume of Secondary Structure Elements (SSE) as well as the exposure of atoms in relation to their hydrophobic status. Such analyses require three different annotations: (i) the atomic radii, (ii) the SSE, and (iii) the hydrophobic status of atoms.
The status of these annotations is as follows:
(i) A compulsory annotation: all particles have a field radius, which is initialized by the corresponding annotator. The annotator setting the radius is SBL::Models::T_Radius_annotator_for_particles_with_annotated_name; it uses either default values (van der Waals radii or group radii) or an annotation file for its initialization.
(ii) An optional annotation, only used for post-processing. However, SSE are specified in PDB file headers and are loaded with the particles. Thus, SSE annotations are treated as a compulsory annotation; in particular, each particle is naturally decorated with its SSE in the output files.
(iii) An optional annotation, since it is only used as post-processing. Since this information is not present in PDB files, a file must be passed which specifies the hydrophobic properties of atoms/particles. See the example of Eisenberg's solvation parameters mentioned above.
Out of our three annotations, only the hydrophobic status and the radii must be loaded from files. When running the program of the example, the options related to the annotations are:
    Particle Annotator:
      --annotated-names arg        Annotated names file name
      --atomic-group-radii arg     Atomic group radii file name
      --radius-water arg (=1.4)    Radius of the water probe for SAS model (default is 1.4).
      --annotations-file arg       Dynamic annotations file name
In this section, we show how to use and compose existing annotators, which are either compulsory or optional.
Within an application, an annotator may be used in two ways:
In both cases, the loader mechanism from the package Module_base is used.
A C++ model of ParticleAnnotator is a functor taking a particle as input, and modifying it by annotating the relevant data members. Consequently, the particle must have the corresponding field, together with a method to set it. For example, the annotator SBL::Models::T_Name_annotator_for_atoms requires the method SBL::Models::T_Particle_with_annotations::get_annotations, which returns a structure with a field atom_name. If one wants to use the annotator with another particle data structure, SBL::Models::T_Name_annotator_for_atoms is templated by a functor SetAnnotation that allows customizing the way an annotation is set on a particle.
Note that for the annotator SBL::Models::T_Name_annotator_for_atoms, there are default annotations used when no annotation file is provided – see the reference manual of the corresponding annotator.
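The functor design described above, including the role of a SetAnnotation policy, can be sketched as follows. None of these types belong to the SBL: Sketch_atom, Legacy_atom, Sketch_radius_annotator and the two setter policies are hypothetical names chosen for illustration, and the radii table is assumed to be keyed by annotated name.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical particle (NOT an SBL type) with a radius field and a setter.
struct Sketch_atom {
    std::string annotated_name;
    double radius = 0.0;
    void set_radius(double r) { radius = r; }
};

// Default policy: write the value through the particle's set_radius method.
struct Set_radius_by_method {
    template <class Particle>
    void operator()(Particle& p, double r) const { p.set_radius(r); }
};

// Hypothetical annotator functor: looks up the particle in a table and
// delegates the actual write to the SetAnnotation policy.
template <class Particle, class SetAnnotation = Set_radius_by_method>
class Sketch_radius_annotator {
public:
    explicit Sketch_radius_annotator(std::map<std::string, double> radii,
                                     SetAnnotation setter = SetAnnotation())
        : radii_(std::move(radii)), setter_(setter) {}

    // Annotates the particle in place; returns false if no radius is known.
    bool operator()(Particle& p) const {
        auto it = radii_.find(p.annotated_name);
        if (it == radii_.end()) return false;
        setter_(p, it->second);
        return true;
    }

private:
    std::map<std::string, double> radii_;
    SetAnnotation setter_;
};

// Another particle layout, without set_radius, handled by a custom policy.
struct Legacy_atom {
    std::string annotated_name;
    double vdw = 0.0;
};

struct Set_vdw_field {
    void operator()(Legacy_atom& p, double r) const { p.vdw = r; }
};
```

The annotator itself never touches the particle layout: swapping the policy is enough to reuse it with a particle type that stores the radius differently.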
The important C++ models of the C++ concept ParticleAnnotator are listed on the main page of the package (ParticleAnnotator). Two of them in particular are discussed in the sequel.
The C++ model SBL::Models::T_Generic_annotator provides a generic basic way to load annotations from a plain text file and to annotate particles. The class SBL::Models::T_Generic_annotator has seven template arguments (the first four are mandatory, the remaining three are optional):
If one wants to use the generic annotator while loading the annotations from a data structure in main memory, it is possible to use a simpler class SBL::Models::T_Generic_annotator_without_file having three template parameters:
Multiple annotations can be combined using the class SBL::Models::T_Particle_annotator_collector<ParticleAnnotator1, ParticleAnnotator2>, which is itself a C++ model of the C++ concept ParticleAnnotator. The two template parameters are different C++ models of the C++ concept ParticleAnnotator, and the collector exposes the functionality of both annotators (for annotating, and for loading). Since the collector is itself a model of ParticleAnnotator, collectors can be nested to use more than two annotations at once.
The annotators in a collection may be initialized either from annotation files or from C++ data structures.
Note that some annotations depend on one another. For example, the class SBL::Models::T_Radius_annotator_for_particles_with_annotated_name requires that the particle has an annotated name. For this reason, there is an ordering induced by the order of the template parameters of the class SBL::Models::T_Particle_annotator_collector: the first annotator is always called before the second one. In the case of SBL::Models::T_Radius_annotator_for_particles_with_annotated_name, one must first set the annotated name (first template parameter), then the radius (second template parameter).
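The ordering constraint can be illustrated with a minimal sketch of a two-annotator collector. This is not the SBL collector: Sketch_collector and both annotators are hypothetical, loading is left out, and the mapping from the PDB name CA to the annotated name Cali is an assumption made for the sake of the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical particle (NOT an SBL type).
struct Sketch_atom {
    std::string pdb_name;        // as read from the PDB file
    std::string annotated_name;  // set by the name annotator
    double radius = 0.0;         // set by the radius annotator
};

// First annotator: derives an annotated name from the PDB atom name.
// The mapping below is purely illustrative.
struct Sketch_name_annotator {
    void operator()(Sketch_atom& a) const {
        a.annotated_name = (a.pdb_name == "CA") ? "Cali" : "unknown";
    }
};

// Second annotator: relies on the annotated name being set already.
struct Sketch_radius_annotator {
    void operator()(Sketch_atom& a) const {
        if (a.annotated_name == "Cali") a.radius = 1.87;
        else if (a.annotated_name == "Caro") a.radius = 1.76;
    }
};

// Hypothetical collector: itself an annotator, calling the first
// annotator and then the second one, in template-parameter order.
template <class Annotator1, class Annotator2>
class Sketch_collector {
public:
    Sketch_collector(Annotator1 a1 = Annotator1(), Annotator2 a2 = Annotator2())
        : first_(a1), second_(a2) {}

    template <class Particle>
    void operator()(Particle& p) const {
        first_(p);
        second_(p);
    }

private:
    Annotator1 first_;
    Annotator2 second_;
};
```

Instantiating the sketch as Sketch_collector<Sketch_name_annotator, Sketch_radius_annotator> yields a radius, whereas swapping the two parameters leaves the radius unset because the name is not yet available when the radius annotator runs. Since the collector is itself callable on a particle, it can in turn be used as a template parameter of another collector.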
Consider the example of section Example : Running a Program With a Particle Annotator, where one wants to run the program on a molecule, and post process the result to analyze the volume of Secondary Structure Elements (SSE), and the exposure of atoms depending on their hydrophobic properties. We now show how to instantiate annotations and their annotators to carry out such analysis.
In the SBL, annotated particles are minimally decorated with a name, a radius and a list of optional annotations. This means that each time an annotated particle type is used, it has at least these requirements. For example, in the source file of the program , the class SBL::Models::Atom_with_flat_info_and_annotations_traits_epick is used to represent a particle type with an annotated radius deduced from an annotated name, and possibly holding optional annotations loaded from a file.
We now show how to define a custom annotator for SSE annotations. First, in addition to the previous annotations, a serializable annotation class has to be defined to represent the SSE information. See Generalities: IO operations, Serialization for a definition of serializable.
Note that the class SBL::Models::T_Atom_with_hierarchical_info_and_annotations_traits_epick is now templated by the SSE_annotation type, creating annotations for a name and a radius field, a list of optional annotations, and SSE information.
The class SBL::Models::T_Generic_annotator_without_file is then used for defining the annotator that will assign the SSE annotations. In addition to the SSE annotations class, two template parameters have to be defined:
Note that for using all the annotators, one must use the class SBL::Models::T_Particle_annotator_collector with the SSE annotator and the default annotator available from the used particle traits class.
This completes the example by handling the SSE annotations (case (ii)).
If the model SBL::Models::T_Generic_annotator does not provide the required functionalities, one may develop one's own C++ model of ParticleAnnotator. A C++ model of the C++ concept ParticleAnnotator has the following three requirements:
If the provided generic annotators do not fulfill one's needs, it is also possible to create a new model of ParticleAnnotator following the requirements of section Developing new models of ParticleAnnotator concept. The following example is the source code of the class SBL::Models::No_particle_annotator, which is a dummy annotator. However, all requirements are present, giving a template for models of ParticleAnnotator: