Structural Bioinformatics Library
Template C++ / Python API for developing structural bioinformatics applications.
Authors: F. Cazals and T. Dreyfus and T. O'Donnell and R. Tetley
Illustration of Proteus, a god of changing nature, by Andrea Alciato. Similarly to Proteus, proteins keep changing shape, which is of paramount importance for the regulation of their functions.
Proteins: a mix of geometry, topology, biophysics, and biology. Biomolecules in general and polypeptide chains (PC in this package) in particular are complex objects. Their description indeed involves
It should be stressed that these three categories of information exist independently and may be used independently. For example:
Functionalities offered. Because of these varying needs, this package provides functionalities to access all the information available at once. The main classes provided are:
The following piece of code illustrates how to access protein representations and their chains. For the latter, one gets access to
Consider a PDB file containing one or several chains. The loader creates one SBL::CSB::T_Polypeptide_chain_representation for each chain, and these are stored within a SBL::CSB::T_Protein_representation .
Note that one can specify the chains to be loaded. In that case, SBL::CSB::T_Protein_representation contains a map mapping a chain id to the corresponding SBL::CSB::T_Polypeptide_chain_representation .
In the sequel, we show how to access the various pieces of information. The following point should be stressed:
Covalent Structure File Loader statistics:
Number of loaded covalent structures: 1
Details for each covalent structure :
-- structure 1:
-- -- Number of loaded atoms: 3408
-- -- Number of particles: 5406
-- -- Number of modeled particles: 3408
-- -- Number of loaded conformations: 1
-- -- Number of bonds: 5467
-- -- Number of modeled bonds: 3469
-- -- Number of built disulfide bonds: 4 / 4
In the sequel, we focus on SBL::CSB::T_Polypeptide_chain_representation , and show how to access information associated with atoms. As an illustration, the first part of the snippet shows how to count elements of each type via a map; the second one collects the temperature factors of all atoms.
In terms of data structures, dereferencing the iterator on atoms via (*it) gives access to the Molecular_system::Molecular_atom data structure.
Number of elements of each type :
C : 613
H : 264
N : 193
O : 185
S : 10
min / max / average of temperature factors : 0 84.09 28.3898
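The two computations described above (counting elements via a map, then summarizing temperature factors) can be sketched without the library as follows. This is a minimal illustration only: Atom_record, count_elements and temperature_factor_stats are hypothetical names introduced here, not part of the SBL API, and the atoms are assumed to be available in a plain vector rather than through the chain iterators.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an atom record; NOT the Molecular_atom class.
struct Atom_record {
    std::string element;        // chemical element symbol, e.g. "C"
    double temperature_factor;  // B-factor read from the PDB file
};

// Count how many atoms of each element occur, using a map keyed by element.
std::map<std::string, int> count_elements(const std::vector<Atom_record>& atoms) {
    std::map<std::string, int> counts;
    for (const Atom_record& a : atoms) ++counts[a.element];
    return counts;
}

struct Temperature_factor_stats {
    double min_value;
    double max_value;
    double mean;
};

// Minimum, maximum and mean of the temperature factors (atoms must be non-empty).
Temperature_factor_stats temperature_factor_stats(const std::vector<Atom_record>& atoms) {
    assert(!atoms.empty());
    double lo = atoms.front().temperature_factor;
    double hi = lo;
    double sum = 0.0;
    for (const Atom_record& a : atoms) {
        lo = std::min(lo, a.temperature_factor);
        hi = std::max(hi, a.temperature_factor);
        sum += a.temperature_factor;
    }
    return {lo, hi, sum / static_cast<double>(atoms.size())};
}
```

With the library, the same loops would run over the atom iterators of the chain representation instead of a vector.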
residue.residue_sequence_number();
residue.insertion_code();
In the following, an iterator on the backbone is used to store all backbone atoms into three containers, respectively for Calpha, C, N. As previously, backbone atoms are returned as Molecular_system::Molecular_atom .
Found in the backbone : 129 CA, 129 C, 129 N.
Molecular residues are accessed via the class Molecular_system::Molecular_residue. Collecting residues of a given type is straightforward:
Number of residues of each type :
ALA : 12
ARG : 11
ASN : 14
ASP : 7
CYS : 8
GLN : 3
GLU : 2
GLY : 12
HIS : 1
ILE : 6
LEU : 8
LYS : 6
MET : 2
PHE : 3
PRO : 2
SER : 10
THR : 7
TRP : 6
TYR : 3
VAL : 6
In the following, we show how to compute the center of mass of the Calpha carbons explicitly, by iterating over the atoms.
The example also calls the function SBL::CSB::T_Polypeptide_chain_representation::compute_heavy_atoms_center_of_mass() , whose name is self-explanatory.
Center of mass of CA : (53.2452, -17.1442, 7.57498)
Center of mass of Heavy atoms: (53.1526, -17.1834, 7.31916)
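For reference, the explicit computation amounts to averaging the coordinates of the selected atoms. The sketch below is library-independent: Point_3 and center_of_mass are hypothetical names introduced here, and all atoms are assumed to carry the same weight, which may differ from what the library function does.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical point type for this sketch; not an SBL or CGAL type.
using Point_3 = std::array<double, 3>;

// Centroid of a non-empty set of points, i.e. the center of mass when
// every atom is given the same weight (an assumption of this sketch).
Point_3 center_of_mass(const std::vector<Point_3>& points) {
    assert(!points.empty());
    Point_3 sum = {0.0, 0.0, 0.0};
    for (const Point_3& p : points)
        for (int k = 0; k < 3; ++k) sum[k] += p[k];
    for (int k = 0; k < 3; ++k) sum[k] /= static_cast<double>(points.size());
    return sum;
}
```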
The following example shows how to change Cartesian coordinates:
Distance between new and old center of mass: 2
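A rigid translation is the simplest way to see this effect: moving every atom by the same vector moves the center of mass by the length of that vector. The sketch below assumes a conformation stored as a flat vector of doubles (x0, y0, z0, x1, y1, z1, ...); translate and centroid_distance are hypothetical helpers introduced here, and the actual example may modify the coordinates differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// A conformation stored as a flat vector (x0, y0, z0, x1, y1, z1, ...).
using Conformation = std::vector<double>;

// Translate every atom of the conformation by (dx, dy, dz), in place.
void translate(Conformation& c, double dx, double dy, double dz) {
    assert(c.size() % 3 == 0);
    for (std::size_t i = 0; i < c.size(); i += 3) {
        c[i] += dx;
        c[i + 1] += dy;
        c[i + 2] += dz;
    }
}

// Distance between the centroids of two conformations with the same atoms.
double centroid_distance(const Conformation& a, const Conformation& b) {
    assert(a.size() == b.size() && !a.empty() && a.size() % 3 == 0);
    double d[3] = {0.0, 0.0, 0.0};  // per-axis sum of coordinate differences
    for (std::size_t i = 0; i < a.size(); ++i) d[i % 3] += b[i] - a[i];
    const double n = static_cast<double>(a.size() / 3);  // number of atoms
    return std::sqrt(d[0] * d[0] + d[1] * d[1] + d[2] * d[2]) / n;
}
```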
In the following, we show how to access internal coordinates, namely bond lengths, valence angles, and dihedral angles. Note in particular that the latter can be used to produce the so-called Ramachandran plot. See also Fig. dihedral-angles-backbone.
This first snippet illustrates the dedicated iterators that give access to all internal coordinates:
Mean bond length: 2.0449
Mean valence angle: 1.63797
Mean dihedral angle: 0.222753
Mean phi angle: -1.05698
Mean psi angle: 0.52554
Mean omega angle: 0.425752
This second snippet focuses on the dihedral angles associated with a given residue:
Residue 2 has Phi angle: -1.87983, Psi angle: 2.15193 and Omega angle: 3.11996
Residue 33 has Phi angle: -1.04222, Psi angle: -0.895227 and Omega angle: -3.08817
Dihedral angles along the backbone of a polypeptide chain. By convention, the three angles displayed, namely phi, psi, and omega, are associated with the i-th amino acid.
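As a reminder of what a dihedral angle measures, the sketch below computes the signed angle defined by four consecutive points with the usual atan2 formula. It is library-independent: Vec3 and the helper functions are hypothetical names introduced here, and the library computes these values through its own internal coordinate classes. The result is in radians, in the range (-pi, pi], which is consistent with the magnitudes printed above.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical 3D vector type for this sketch.
using Vec3 = std::array<double, 3>;

Vec3 sub(const Vec3& a, const Vec3& b) {
    return {a[0] - b[0], a[1] - b[1], a[2] - b[2]};
}

Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]};
}

double dot(const Vec3& a, const Vec3& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Signed dihedral angle (radians) defined by the points p0, p1, p2, p3:
// the angle between the plane (p0, p1, p2) and the plane (p1, p2, p3).
double dihedral_angle(const Vec3& p0, const Vec3& p1, const Vec3& p2, const Vec3& p3) {
    const Vec3 b1 = sub(p1, p0);
    const Vec3 b2 = sub(p2, p1);
    const Vec3 b3 = sub(p3, p2);
    const Vec3 n1 = cross(b1, b2);  // normal of the first plane
    const Vec3 n2 = cross(b2, b3);  // normal of the second plane
    const double y = std::sqrt(dot(b2, b2)) * dot(b1, n2);
    const double x = dot(n1, n2);
    assert(x != 0.0 || y != 0.0);   // collinear points have no defined dihedral
    return std::atan2(y, x);
}
```

For the backbone, phi of residue i is classically defined by the atoms C(i-1), N(i), CA(i), C(i), and psi by N(i), CA(i), C(i), N(i+1).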
Various types of contacts are of interest in a polypeptide chain:
The class SBL::CSB::Pairwise_contacts_for_polypeptide_chain contains a set of contacts for each of the previous types. By default, pairs of residues in contact, identified by their sequence number (i.e. resid), are recorded.
In particular, the class SBL::CSB::T_Polypeptide_chain_contacts_finder uses CGAL's Kd_tree to perform a spatial search and find pairs of heavy atoms within a distance threshold.
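The query answered by the contacts finder can be illustrated with a brute-force search over all pairs. Note that this sketch deliberately replaces the Kd_tree spatial search by a quadratic loop, which is only reasonable for small inputs; Point_3 and find_contacts are hypothetical names introduced here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical point type for this sketch; not an SBL or CGAL type.
using Point_3 = std::array<double, 3>;

// Return all index pairs (i, j), i < j, such that the two atoms lie within
// the distance threshold. Brute force in O(n^2), unlike a Kd-tree search.
std::vector<std::pair<std::size_t, std::size_t>>
find_contacts(const std::vector<Point_3>& atoms, double threshold) {
    assert(threshold >= 0.0);
    const double t2 = threshold * threshold;  // compare squared distances
    std::vector<std::pair<std::size_t, std::size_t>> contacts;
    for (std::size_t i = 0; i < atoms.size(); ++i) {
        for (std::size_t j = i + 1; j < atoms.size(); ++j) {
            double d2 = 0.0;
            for (int k = 0; k < 3; ++k) {
                const double diff = atoms[i][k] - atoms[j][k];
                d2 += diff * diff;
            }
            if (d2 <= t2) contacts.emplace_back(i, j);
        }
    }
    return contacts;
}
```

To obtain residue-level contacts as recorded by default, each atom pair would then be mapped to the pair of sequence numbers of the residues containing the two atoms.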
As mentioned in the Introduction, PCs come with topological, geometric, and biophysical information. This package provides three main classes:
Internally, the topological, geometric, and biophysical pieces of information are stored in the following data structures:
topological information: SBL::CSB::T_Molecular_covalent_structure, from package Molecular_covalent_structure
geometric information: Molecular_conformation, namely any data structure accommodating a conformation i.e. a d-dimensional point. The simplest such data structure is a vector of doubles, but more involved options are made available in package Molecular_conformation .
The loader SBL::IO::T_Protein_representation_loader has a number of options, listed below, that can be set either from the command line using the Module_base framework, or directly using the appropriate methods:
--pdb-file string: Input PDB file
--water: loads water molecules
--hetatoms: loads hetero-atoms
--coarse-level int: coarse level for the covalent structure (0: none, 1: heavy atoms, 2: residues)
--no-ss-bonds: does not perform disulfide bond calculations
--occupancy-policy: Selection policy for atoms with no alternate location and occupancy not equal to 1. Allowed values are: 1 (all), 2 (forbidden, default), 3 (none), 4 (max), 5 (min)
--alternate string: Alternate coordinates: alternative to be used (char)
--model-number int: ID of the model to be loaded from the PDB file (default is 1)
--chains string: Subset of chains to be loaded.
--allow-incomplete-chains: Allows loading residues even if some are missing.
The following example is the tutorial example presented in section Using Polypeptide chain and protein representations snippet by snippet:
Prerequisites. The manipulation of atoms involves three sets of indices:
Finally, recall that an atom is termed embedded if it has Cartesian coordinates. In particular, all missing atoms in a PDB file are stored in the graph representing the molecule, but are not embedded.
Particle info for proteins. The class SBL::CSB::T_Particle_info_for_proteins is a record providing the required information, namely: pointer to the atom itself, res_id, ins_code, res_name, atom_name, chain_id.
The particle info is used as a key in a map as follows:
typename T_Molecular_covalent_structure<ParticleInfo>::Particle_rep
T_Molecular_covalent_structure<ParticleInfo>::add_particle(const Particle_info& info)
{
  Particle_rep p = boost::add_vertex(this->m_graph);
  this->m_graph[p] = info;
  // Particle_info to Particle_rep
  this->m_particleInfo_to_particleRep.insert(std::make_pair(this->m_graph[p], p));
  // ParticleRep to linearPosition
  this->m_particleRep_to_linearPosition.push_back(p);
  return p;
}
Molecular_covalent_structure. The covalent structure is represented using the class SBL::CSB::T_Molecular_covalent_structure. It is effectively built by the class SBL::CSB::T_Molecular_covalent_structure_builder_for_proteins, which creates the individual amino acids and links them into the graph representing the molecule. This creation involves all atoms of all residues, be they present in the PDB file or not – see the notion of embedded atom below.
We use the following types, as defined above:
Key members of the class regarding indices:
// The boost graph used to represent the structure
Covalent_structure_graph m_graph;
// Mapping particleInfo to graph vertex/particleRep
ParticleInfo_to_particleRep_map_type m_particleInfo_to_particleRep;
// Mapping the particleRep (an int, actually), to the linear position used in the conformation
std::vector<int> m_particleRep_to_linearPosition;
In the class SBL::CSB::T_Molecular_covalent_structure, the linear position is set/obtained as follows:
void set_particle_linearPosition(Particle_rep p, int position){
  this->m_particleRep_to_linearPosition[p] = position;
}

int get_particle_linearPosition(Particle_rep p)const{
  assert(p < this->m_particleRep_to_linearPosition.size());
  return this->m_particleRep_to_linearPosition[p];
}
Then, the linear position is used to obtain the x/y/z coordinates from the conformation, as follows:
<ParticleInfo>::get_x(const Conformation& C, Particle_rep p)const{
  assert(this->is_embedded(p));
  return SBL::Models::T_Conformation_traits<Conformation>::at(C, 3*this->get_particle_linearPosition(p));
}
The linear position is also used to know whether a particle has been assigned Cartesian coordinates:
bool is_embedded(Particle_rep p)const{
  return this->get_particle_linearPosition(p) >= 0;
}
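The interplay between the linear position, the embedded status, and the coordinate lookup can be condensed into a small standalone sketch. Everything here is a simplified stand-in, not the library class: Index_sketch is a hypothetical name, the conformation is assumed to be a flat vector of doubles, the value -1 is assumed to mark a particle without coordinates (the text above only states that embedded particles have a position >= 0), and the offsets +1 and +2 for y and z are an assumption by analogy with get_x.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for the index bookkeeping described above.
struct Index_sketch {
    // linear_position[p] is the rank of particle p in the conformation,
    // or a negative value (here -1, by assumption) if p has no coordinates.
    std::vector<int> linear_position;

    bool is_embedded(std::size_t p) const {
        assert(p < linear_position.size());
        return linear_position[p] >= 0;
    }

    // Coordinates of particle p: three consecutive entries of the conformation
    // starting at 3 * linear_position[p].
    double get_x(const std::vector<double>& conformation, std::size_t p) const {
        return coordinate(conformation, p, 0);
    }
    double get_y(const std::vector<double>& conformation, std::size_t p) const {
        return coordinate(conformation, p, 1);
    }
    double get_z(const std::vector<double>& conformation, std::size_t p) const {
        return coordinate(conformation, p, 2);
    }

private:
    double coordinate(const std::vector<double>& conformation,
                      std::size_t p, std::size_t offset) const {
        assert(is_embedded(p));
        const std::size_t i = 3 * static_cast<std::size_t>(linear_position[p]) + offset;
        assert(i < conformation.size());
        return conformation[i];
    }
};
```

The indirection lets the graph hold every atom of every residue while the conformation only stores coordinates for the atoms actually present in the PDB file.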
Conformations. The class SBL::Model::T_Conformation_traits makes it possible to store a conformation as a vector of floating-point values representing coordinates.
Such conformations are then stored by the PDB loader SBL::IO::T_Protein_representation_loader:
std::vector<std::vector<Conformation_type> > m_conformations_ensembles;
These conformations are then passed to the individual chains stored as instances of the class SBL::CSB::T_Polypeptide_chain_representation.
Summary. The construction of instances of the class SBL::CSB::T_Polypeptide_chain_representation involves the following steps:
T_Polypeptide_chain_representation. The class SBL::CSB::T_Polypeptide_chain_representation offers high-level operations, in particular the ability to obtain internal coordinates from atom ids, as specified in the PDB file. Such operations use the internal representation of atoms via their particle representations. As an example, consider:
To provide these high level operations, we use the map:
//! The covalent structure of the protein
Molecular_covalent_structure& m_covalent_structure;
//! The conformation of the protein
Conformation_type m_conformation;
// Mapping an atom of package Molecular_system, via its atom id, to the corresponding Particle_rep, i.e. Particle_vertex_descriptor
typedef typename std::map<unsigned, Particle_rep> AtomId_to_particleRep_map_type;
AtomId_to_particleRep_map_type m_atomId_to_particleRep;
As an example, SBL::CSB::T_Polypeptide_chain_representation::Bond_angle proceeds as follows:
//! Return the value of this angle.
FT get_valence_angle() const {
  // get the internal particle reps
  Particle_rep a = this->m_P->m_atomId_to_particleRep.at(boost::get<0>(this->m_bond_angle).atom_serial_number());
  Particle_rep b = this->m_P->m_atomId_to_particleRep.at(boost::get<1>(this->m_bond_angle).atom_serial_number());
  Particle_rep c = this->m_P->m_atomId_to_particleRep.at(boost::get<2>(this->m_bond_angle).atom_serial_number());
  // get the internal bond reps
  std::pair<bool, Bond_rep> bond1 = this->m_P->m_covalent_structure.get_bond_rep(a, b);
  std::pair<bool, Bond_rep> bond2 = this->m_P->m_covalent_structure.get_bond_rep(b, c);
  assert(bond1.first && bond2.first);
  // get the internal bond angle rep
  std::pair<bool, Bond_angle_rep> bond_angle = this->m_P->m_covalent_structure.get_bond_angle_rep(bond1.second, bond2.second);
  // return the valence angle
  return this->m_ic.get_valence_angle(this->m_P->m_conformation, this->m_P->m_covalent_structure, bond_angle.second);
}
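Once the three particles have been resolved, the geometric part of the computation is the angle at the middle atom between the two bonds. The following standalone sketch shows that last step only; Vec3 and valence_angle are hypothetical names introduced here, and the library delegates this computation to its own internal coordinates class.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical 3D point type for this sketch.
using Vec3 = std::array<double, 3>;

// Valence angle (radians, in [0, pi]) at atom b, between bonds b-a and b-c.
double valence_angle(const Vec3& a, const Vec3& b, const Vec3& c) {
    double uu = 0.0, vv = 0.0, uv = 0.0;
    for (int k = 0; k < 3; ++k) {
        const double u = a[k] - b[k];  // bond vector from b to a
        const double v = c[k] - b[k];  // bond vector from b to c
        uu += u * u;
        vv += v * v;
        uv += u * v;
    }
    assert(uu > 0.0 && vv > 0.0);      // the three atoms must be distinct from b
    // Clamp to [-1, 1] to guard against rounding before calling acos.
    const double cosine = std::max(-1.0, std::min(1.0, uv / std::sqrt(uu * vv)));
    return std::acos(cosine);
}
```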
Coherence. For polypeptide chains represented by the class SBL::CSB::T_Protein_representation, the coherence between these maps is ensured by the following function:
void T_Polypeptide_chain_representation<ParticleTraits, MolecularCovalentStructure, ConformationType>::
build_model_to_covalent_structure_mapping()
{
  for(Particles_iterator it = this->particles_begin(); it != this->particles_end(); it++){
    this->m_atomId_to_particleRep[(*this)[*it]->atom_serial_number()] = *it;
    this->m_covalent_structure.set_particle_linearPosition((*it), m_atomId_to_particleRep.size()-1);
  }
}
As a result, the iterators of m_covalent_structure can be used to extract particles and compute internal coordinates, using m_conformation for the coordinates of the particles, or for any other purpose requiring the simultaneous use of m_covalent_structure and m_conformation.
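The bookkeeping performed by the mapping function can be mimicked in isolation as follows. This is a sketch under simplifying assumptions: particles are identified by their index in a vector, serial numbers are assumed to be distinct, and Mapping_sketch and build_mapping are hypothetical names that do not exist in the library.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical container for the two maps kept coherent by the function above.
struct Mapping_sketch {
    std::map<unsigned, std::size_t> atom_id_to_particle;  // PDB serial number -> particle
    std::vector<int> particle_to_linear_position;         // particle -> rank in the conformation
};

// serial_numbers[p] is the PDB serial number of particle p (assumed distinct).
// Each particle receives as linear position its insertion rank in the map,
// mirroring the use of the map size minus one in the function above.
Mapping_sketch build_mapping(const std::vector<unsigned>& serial_numbers) {
    Mapping_sketch m;
    m.particle_to_linear_position.assign(serial_numbers.size(), -1);
    for (std::size_t p = 0; p < serial_numbers.size(); ++p) {
        m.atom_id_to_particle[serial_numbers[p]] = p;
        m.particle_to_linear_position[p] =
            static_cast<int>(m.atom_id_to_particle.size()) - 1;
    }
    // Duplicate serial numbers would break the one-to-one correspondence.
    assert(m.atom_id_to_particle.size() == serial_numbers.size());
    return m;
}
```

Looking up an atom by its serial number then yields the particle, and the particle yields the offset of its coordinates in the conformation.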
This package also offers several useful programs to inspect properties of proteins / their conformations: