Structural Bioinformatics Library
Template C++ / Python API for developing structural bioinformatics applications.
Authors: F. Cazals and S. Loriot and C. Le Breton
Hierarchy. This package loads and stores molecular systems from the PDB and mmCIF files described below.
A Molecular system, in the context of a PDB (Protein Data Bank) or mmCIF (macromolecular Crystallographic Information File) file, refers to a representation of biomolecules such as proteins, nucleic acids, or complexes thereof. It serves as a hierarchical data structure that organizes and stores the information extracted from these files in a standardized format.
A Molecular system represents a filtered version of the biomolecular entity described in the file. It includes a subset of the information extracted and organized from the file. The Molecular system selectively captures and stores relevant data necessary for structural analysis and modeling. It provides a systematic and organized framework to represent and navigate through the various components of the biomolecule.
Let us inspect these levels:
File formats. Molecular systems are (generally) loaded from PDB/mmCIF files parsed using the libcifpp library. PDB/mmCIF files contain a variety of pieces of information (SSE, etc.); see the package Molecular_system for those that are loaded into SBL data structures. In practice, the following file formats are used:
mmCIF and loaded pieces of information. A molecular system is based on the atomic information provided in PDB or mmCIF files.
The following pieces of information are retrieved from an mmCIF file: atomic information, secondary structure elements (SSE), and disulfide (SS) bonds.
Filtering of atoms. The class SBL::IO::T_Molecular_system_loader from the package Molecular_system implements the following default behaviors:
Several sets of these molecular items are provided in the SBL:
Since each of the molecular items is templated by the Molecular_system, it is possible to determine the type of each item based on the others. For instance, with a reference to a Molecular_atom, the Molecular_residue, Molecular_chain, Molecular_model, and Molecular_system to which it belongs can be inferred and retrieved.
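The upward navigation described above can be illustrated with a minimal Python sketch. All names here (Node, System, Model, Chain, Residue, Atom, ancestors) are hypothetical and are not the SBL API. In the library the item types are tied together through the Molecular_system template parameter; this sketch only mimics the idea at run time with parent links.

```python
class Node:
    """One level of the hierarchy; keeps a link to its enclosing level."""

    def __init__(self, ident, parent=None):
        self.ident = ident
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


# Hypothetical stand-ins for the five levels of the hierarchy.
class System(Node): pass
class Model(Node): pass
class Chain(Node): pass
class Residue(Node): pass
class Atom(Node): pass


def ancestors(item):
    """Return the enclosing levels of item, innermost first."""
    out = []
    while item.parent is not None:
        item = item.parent
        out.append(item)
    return out


# Build a one-atom system and walk back up from the atom.
system = System("example")
model = Model(1, system)
chain = Chain("A", model)
residue = Residue(("CYS", 22), chain)
atom = Atom("SG", residue)

path = [type(level).__name__ for level in ancestors(atom)]
```

Starting from the atom alone, the residue, chain, model, and system are recovered in that order.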
Relationship to covalent structure Molecular_covalent_structure. A molecular covalent structure is a graph whose vertices have properties called particle info, which in the simplest case is a string. For a protein SBL::CSB::Particle_info_for_protein contains a pointer to an instance of SBL::CSB::Molecular_atom. (NB: this pointer is null if the atom is absent from the input file, e.g. a hydrogen atom.)
For a polypeptide chain, the covalent structure can be built from the molecular system if all amino acids are present. If not, using the amino acid sequence is mandatory.
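As a rough illustration of the completeness condition, the sketch below compares the residue ids observed in a chain against the full amino acid sequence. The function missing_residues is hypothetical, and it assumes that residue ids are consecutive integers starting at first_id, which real files do not always guarantee.

```python
def missing_residues(sequence, observed_ids, first_id=1):
    """Return (residue id, one-letter code) for residues absent from the structure.

    sequence     -- full amino acid sequence, one letter per residue
    observed_ids -- residue ids actually present in the loaded chain
    first_id     -- id assigned to the first residue (assumed numbering)
    """
    seen = set(observed_ids)
    return [
        (res_id, sequence[res_id - first_id])
        for res_id in range(first_id, first_id + len(sequence))
        if res_id not in seen
    ]


sequence = "MKCA"
gaps = missing_residues(sequence, observed_ids=[1, 2, 4])

# An empty list means the covalent structure can be built from the
# molecular system alone; otherwise the sequence is needed to fill the gaps.
needs_sequence = bool(gaps)
```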
We also note in passing that disulfide bridges are defined from pairs of CYS residues within a prescribed distance threshold; see Pointwise_interactions. This strategy makes it possible to identify SS bonds that are not listed in the mmCIF file.
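The distance-based definition can be sketched as follows. This is not the Pointwise_interactions implementation: the function name, the choice of measuring the distance between SG sulfur atoms, and the cutoff value are all assumptions made for illustration.

```python
import math
from itertools import combinations

# Hypothetical cutoff in angstroms; the threshold used by the library
# is not stated in this section.
SG_SG_CUTOFF = 2.5


def disulfide_pairs(sg_atoms, cutoff=SG_SG_CUTOFF):
    """Return pairs of CYS residues whose SG atoms lie within cutoff.

    sg_atoms maps (chain id, residue id) to the (x, y, z) of the SG atom.
    """
    pairs = []
    for (key_a, xyz_a), (key_b, xyz_b) in combinations(sorted(sg_atoms.items()), 2):
        if math.dist(xyz_a, xyz_b) <= cutoff:
            pairs.append((key_a, key_b))
    return pairs


# Three cysteines; only the first two are close enough to be bridged.
sg = {
    ("A", 22): (0.0, 0.0, 0.0),
    ("A", 87): (2.0, 0.0, 0.0),
    ("A", 140): (10.0, 0.0, 0.0),
}
bridges = disulfide_pairs(sg)
```

Because the test relies on geometry only, a bridge is reported whether or not the input file annotates it.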
Relationship to molecular conformations Molecular_conformation. A molecular model corresponds to one conformation, setting aside alternate locations. A conformation can be built from a molecular model, see Molecular_conformation.
Relationship to protein representation Protein_representation. The classes SBL::CSB::Polypeptide_chain and SBL::CSB::Protein_representation bridge the gap between atoms in the covalent structure and coordinates in the conformation. The mapping between the two is done using the chain/resid/atom ids. Atoms in the covalent structure that have no coordinates are termed not embedded.
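The id-based mapping can be pictured with a small sketch, assuming each atom is keyed by a (chain id, residue id, atom name) triple. The function split_embedded and the data layout are hypothetical and do not reflect the interfaces of SBL::CSB::Polypeptide_chain or SBL::CSB::Protein_representation.

```python
def split_embedded(covalent_atoms, coordinates):
    """Attach coordinates to covalent-structure atoms through their ids.

    covalent_atoms -- list of (chain id, residue id, atom name) keys
    coordinates    -- dict from the same keys to (x, y, z)
    Returns (embedded, not_embedded): a dict of atoms with coordinates and
    a list of atoms that have none.
    """
    embedded = {key: coordinates[key] for key in covalent_atoms if key in coordinates}
    not_embedded = [key for key in covalent_atoms if key not in coordinates]
    return embedded, not_embedded


# Backbone atoms of one residue plus a hydrogen missing from the input file.
covalent_atoms = [
    ("A", 1, "N"), ("A", 1, "CA"), ("A", 1, "C"), ("A", 1, "O"), ("A", 1, "H"),
]
coordinates = {
    ("A", 1, "N"): (0.0, 0.0, 0.0),
    ("A", 1, "CA"): (1.5, 0.0, 0.0),
    ("A", 1, "C"): (2.0, 1.4, 0.0),
    ("A", 1, "O"): (1.3, 2.4, 0.0),
}

embedded, not_embedded = split_embedded(covalent_atoms, coordinates)
```

Here the hydrogen belongs to the covalent structure but has no coordinates, so it ends up in the not embedded list.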
Labels. The previous hierarchy does not take into account the possible assignment of chains/residues/atoms to specific sub-systems or partners. This possibility is provided by classes from the package MolecularSystemLabelsTraits.