Structural Bioinformatics Library
Template C++ / Python API for developing structural bioinformatics applications.
T_Molecular_covalent_structure_loader< Molecular_covalent_structure_builder_ > Class Template Reference

Loader for covalent structures from PDB / mmCIF files. More...

#include <Molecular_covalent_structure_loader.hpp>

Public Member Functions

virtual boost::program_options::options_description add_options (void) override
 Virtual method for adding options to the module. More...
 
virtual bool check_options (std::string &message) const override
 Checks that the input options' values are coherent. More...
 
bool load (unsigned verbose=false, std::ostream &out=std::cout) override
 Load function. More...
 
std::string get_name (void) const override
 Return the name of the class itself. More...
 
std::string get_output_prefix (void) const override
 Returns a prefix that concatenates the command-line options used when running the module. More...
 
const std::vector< Molecular_covalent_structure > & get_molecular_covalent_structures (void) const
 Retrieves the molecular covalent structures loaded from PDB / mmCIF file(s). More...
 
std::vector< Molecular_covalent_structure > & get_molecular_covalent_structures (void)
 Retrieves the molecular covalent structures loaded from PDB / mmCIF file(s). More...
 
const std::vector< bool > & are_files_valid () const
 Tells whether a loaded file led to the construction of a Molecular_covalent_structure. More...
 
void statistics (std::ostream &out)
 Prints statistics about the loaded PDB files. More...
 
 SET_AND_GET (allow_incomplete_chains, bool)
 By default set to false: incomplete chains are not allowed, so neither incomplete chains nor the corresponding Molecular_covalent_structures are built. More...
 
 SET_AND_GET (coarse_level, unsigned)
 By default set to 0, loading every single atom as a particle in the Molecular_covalent_structure. More...
 
 SET_AND_GET (max_bond_distance, double)
 By default set to 3 angstroms. More...
 
 SET_AND_GET (ss_bond_search, bool)
 By default set to false. More...
 
const std::vector< std::shared_ptr< Molecular_system > > & get_molecular_systems (void) const
 Retrieves the molecular systems loaded from the PDB / mmCIF file(s), as a constant reference to the vector containing them. More...
 
void statistics (std::ostream &out) const
 Prints statistics about the loaded PDB files, including the number of files, their details, and various counts related to the loaded molecular systems. More...
 
void check_validity (void) const
 Performs validity / sanity checks on the loaded PDB file. It notably checks that each residue has a Calpha and identifies missing residues in the sequence. As it returns void, it is useful for informational purposes only. More...
 
 SET_AND_GET (loaded_file_paths, std::vector< std::string >)
 Provides the list of file paths to load. The pdb and cif formats are accepted, as well as these formats compressed as gz / tar.gz. More...
 
 SET_AND_GET (loaded_chains, std::vector< std::vector< std::string >>)
 By default, all chains are loaded for all files. For each file, one can select a different subset of chains, which will be the same for all models extracted from the file. If no chain is provided for a given file, then all chains are loaded. The rationale is that a user who does not want to load any chain should not add the file at all. More...
 
 SET_AND_GET (loaded_models, std::vector< std::vector< std::size_t >>)
 By default, all models are loaded for all files. For each file, one can select a different subset of models. If no model is provided for a given file, then all models are loaded. The rationale is that a user who does not want to load any model should not add the file at all. More...
 
 SET_AND_GET (loaded_water, bool)
 By default, water molecules are not loaded: the associated atoms are filtered out. If set to true, water molecules are loaded (even if loaded_hetatoms is set to false); if set to false, water molecules are not loaded (even if loaded_hetatoms is set to true). More...
 
 SET_AND_GET (loaded_hetatoms, bool)
 By default, hetero atoms are not loaded. These are filtered. More...
 
 SET_AND_GET (loaded_hydrogens, bool)
 By default, hydrogen atoms are not loaded. These are filtered. More...
 
 SET_AND_GET (loaded_occupancy_mode, unsigned)
 By default, the occupancy policy is set to MAX. More...
 
 SET_AND_GET (loaded_alternate_selected, char)
 By default, the alternate atom is chosen according to the occupancy policy. If specified, the alternate atom with the alternate_selected character will be selected. More...
 
 SET_AND_GET (loaded_b_factor_limit, double)
 By default, there is no limit to the b_factor (double numeric limit). Hence, by default, no atom is filtered according to this property. More...
 
 SET_AND_GET (pdb_checker, unsigned)
 By default, no check is performed on the molecular systems. More...
 

Static Public Member Functions

static boost::program_options::options_description *& get_options (void)
 Access to the options' description of the module. More...
 

Protected Member Functions

void load_molecular_covalent_structures (void)
 Loads the Molecular_covalent_structures. More...
 
void load_molecular_systems (void)
 Loads the molecular systems from the specified files taking into account all the provided options. More...
 
void delete_molecular_systems (void)
 Deletes instances of molecular systems allocated by the loader. More...
 
void add_cif_atom_to_sbl_residue (const cif::mm::atom &cif_atom, typename Molecular_system::Residue &sbl_residue, bool is_hetatm)
 Adds a CIF atom or hetero atom to an SBL residue. More...
 

Protected Attributes

std::vector< std::size_t > m_atm_discarded
 Number of actually discarded atoms for each loaded PDB file (not the sum of all discarded atoms in each category). More...
 
std::vector< std::size_t > m_model_discarded
 Number of atoms discarded for not being in the loaded models. More...
 
std::vector< std::size_t > m_chain_discarded
 Number of atoms discarded for not being in the loaded chains. More...
 
std::vector< std::size_t > m_temp_discarded
 Number of atoms discarded for having too high a B factor, for each loaded PDB file. More...
 
std::vector< std::size_t > m_htm_discarded
 Number of discarded hetero-atoms for each loaded PDB file. More...
 
std::vector< std::size_t > m_hoh_discarded
 Number of discarded water atoms for each loaded PDB file. More...
 
std::vector< std::size_t > m_h_discarded
 Number of discarded hydrogens for each loaded PDB file. More...
 
std::vector< std::size_t > m_alt_discarded
 Number of discarded alternate location atoms for each loaded PDB file. More...
 

Management

void set_loader_instance_name (const std::string &loader_instance_name)
 
const std::string & get_loader_instance_name (void) const
 

Detailed Description

template<class Molecular_covalent_structure_builder_>
class SBL::IO::T_Molecular_covalent_structure_loader< Molecular_covalent_structure_builder_ >

Loader for covalent structures from PDB / mmCIF files.

Template Parameters
MolecularCovalentStructureBuilder  Builder of the covalent structure.

Member Function Documentation

◆ add_cif_atom_to_sbl_residue()

void add_cif_atom_to_sbl_residue ( const cif::mm::atom &  cif_atom,
typename Molecular_system::Residue &  sbl_residue,
bool  is_hetatm 
)
protected inherited

Adds a CIF atom or hetero atom to an SBL residue.

Parameters
cif_atom  The CIF atom to be added.
sbl_residue  The SBL residue to which the CIF atom / hetero atom will be added.
is_hetatm  Flag indicating whether the atom is a hetero atom.

◆ add_options()

boost::program_options::options_description add_options ( void  )
override virtual

Virtual method for adding options to the module.

Adds options specific to the "Covalent structures loader" to the option parser and also as command line arguments.

Reimplemented from T_Molecular_system_loader< Molecular_covalent_structure_builder_::Molecular_covalent_structure::Particle_info::Particle_traits::Molecular_system >.

◆ are_files_valid()

const std::vector< bool > & are_files_valid

Tells whether a loaded file led to the construction of a Molecular_covalent_structure.

To be accessed after load is called.

Returns
A constant reference to the vector of booleans
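
The ordering constraint above (query validity only after load has run) can be sketched with a small generic helper. This is a minimal sketch under assumptions: count_valid_files and Fake_loader are hypothetical names, and the stand-in only mimics the load and are_files_valid signatures listed on this page; it is not the SBL class.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <ostream>
#include <vector>

// Hypothetical helper: runs any loader exposing load() and are_files_valid()
// with the signatures documented on this page, and counts the files that led
// to a structure. Returns 0 when load() itself fails.
template <class Loader>
std::size_t count_valid_files(Loader& loader, std::ostream& out) {
    if (!loader.load(1u, out)) {
        out << "loading failed\n";
        return 0;
    }
    // Only meaningful once load() has returned.
    const std::vector<bool>& valid = loader.are_files_valid();
    std::size_t n = 0;
    for (bool v : valid) {
        if (v) ++n;
    }
    return n;
}

// Stand-in used for illustration only; NOT the SBL loader.
struct Fake_loader {
    std::vector<bool> m_valid;
    bool m_loaded = false;

    bool load(unsigned /*verbose*/ = 0, std::ostream& /*out*/ = std::cout) {
        m_valid = {true, false, true};  // pretend the second file was rejected
        m_loaded = true;
        return true;
    }
    const std::vector<bool>& are_files_valid() const {
        assert(m_loaded && "call load() first");
        return m_valid;
    }
};
```

With the real loader, the same helper would presumably be instantiated on the concrete T_Molecular_covalent_structure_loader type; the index of each boolean is assumed to follow the order of the input files.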

◆ check_options()

bool check_options ( std::string &  message) const
override virtual

◆ check_validity()

void check_validity ( void  ) const
inherited

Performs validity / sanity checks on the loaded PDB file. It notably checks that each residue has a Calpha and identifies missing residues in the sequence. As it returns void, it is useful for informational purposes only.

◆ delete_molecular_systems()

void delete_molecular_systems ( void  )
protected inherited

Deletes instances of molecular systems allocated by the loader.

◆ get_molecular_covalent_structures() [1/2]

std::vector< typename Molecular_covalent_structure_builder_::Molecular_covalent_structure > & get_molecular_covalent_structures ( void  )

Retrieves the molecular covalent structures loaded from PDB / mmCIF file(s).

This function returns a reference to the vector containing the molecular covalent structures loaded from the PDB / mmCIF file(s).

Returns
A reference to the vector of molecular covalent structures

◆ get_molecular_covalent_structures() [2/2]

const std::vector< typename Molecular_covalent_structure_builder_::Molecular_covalent_structure > & get_molecular_covalent_structures ( void  ) const

Retrieves the molecular covalent structures loaded from PDB / mmCIF file(s).

This function returns a constant reference to the vector containing the molecular covalent structures loaded from the PDB / mmCIF file(s).

Returns
A constant reference to the vector of molecular covalent structures

◆ get_molecular_systems()

const std::vector< std::shared_ptr< Molecular_covalent_structure_builder_::Molecular_covalent_structure::Particle_info::Particle_traits::Molecular_system > > & get_molecular_systems ( void  ) const
inline inherited

Retrieves the molecular systems loaded from the PDB / mmCIF file(s).

Returns
A constant reference to the vector of molecular systems

◆ get_name()

std::string get_name ( void  ) const
override virtual

Return the name of the class itself.

Reimplemented from Loader_base.

◆ get_options()

static boost::program_options::options_description*& get_options ( void  )
inline static inherited

Access to the options' description of the module.

◆ get_output_prefix()

std::string get_output_prefix ( void  ) const
override virtual

Returns a prefix that concatenates the command-line options used when running the module.

Reimplemented from T_Module_option_description< Dummy >.

◆ load()

bool load ( unsigned  verbose = false,
std::ostream &  out = std::cout 
)
override virtual

Load function.

This is the function to call to execute the main task of a Loader: it loads the molecular covalent structures based on the specified options.

Parameters
verbose  Verbosity level.
[out] out  Output stream.
Returns
True if loading is successful, false otherwise.

Reimplemented from Loader_base.

◆ load_molecular_covalent_structures()

void load_molecular_covalent_structures ( void  )
protected

Loads the Molecular_covalent_structures.

Retrieves the sequences of residues from the files, then builds the corresponding Molecular_covalent_structures from these sequences. Incomplete chains (with missing residues) can be allowed, in which case the parts of a chain are isolated from one another and the number of components increases accordingly. The Molecular_covalent_structures are then updated: terminal residues can be ionised (one proton added), water molecules can be added, and disulfide bonds can be added from file information and from a geometrical search.

Returns
nothing

◆ load_molecular_systems()

void load_molecular_systems ( void  )
protected inherited

Loads the molecular systems from the specified files taking into account all the provided options.

◆ SET_AND_GET() [1/14]

SET_AND_GET ( allow_incomplete_chains  ,
bool   
)

By default set to false: incomplete chains are not allowed, so neither incomplete chains nor the corresponding Molecular_covalent_structures are built.

◆ SET_AND_GET() [2/14]

SET_AND_GET ( coarse_level  ,
unsigned   
)

By default set to 0, which loads every single atom as a particle in the Molecular_covalent_structure.

Set to 1 to load only heavy atoms as particles, and to 2 to load residues as particles.
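
To make the three levels concrete, the following sketch counts how many particles a chain would contribute at each level. It is an illustration only: Atom, Residue and count_particles are hypothetical names, and hydrogens are recognised by the element symbol "H" alone, which is a simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative atom record; the real SBL types are richer.
struct Atom {
    std::string element;  // e.g. "C", "N", "H"
};
using Residue = std::vector<Atom>;

// Number of particles a chain would contribute at a given coarse level:
// 0 = every atom, 1 = heavy (non-hydrogen) atoms only, 2 = one per residue.
std::size_t count_particles(const std::vector<Residue>& chain, unsigned coarse_level) {
    assert(coarse_level <= 2);
    if (coarse_level == 2) return chain.size();
    std::size_t n = 0;
    for (const Residue& res : chain) {
        for (const Atom& a : res) {
            if (coarse_level == 0 || a.element != "H") ++n;
        }
    }
    return n;
}
```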

◆ SET_AND_GET() [3/14]

SET_AND_GET ( loaded_alternate_selected  ,
char   
)
inherited

By default, the alternate atom is chosen according to the occupancy policy. If specified, the alternate atom with the alternate_selected character will be selected.
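
The selection rule can be sketched as follows, assuming the MAX occupancy policy mentioned for loaded_occupancy_mode. Alt_atom and pick_alternate are hypothetical names, and falling back to the occupancy policy when the requested character is absent is an assumption, not documented behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Alt_atom {
    char alt_id;       // alternate location identifier, e.g. 'A' or 'B'
    double occupancy;  // occupancy read from the file
};

// Returns the index of the alternate to keep. If `selected` is set and present
// among the alternates, it wins; otherwise the highest occupancy is kept (MAX).
// The fallback to MAX when `selected` is absent is an assumption.
std::size_t pick_alternate(const std::vector<Alt_atom>& alts, char selected = '\0') {
    assert(!alts.empty());
    if (selected != '\0') {
        for (std::size_t i = 0; i < alts.size(); ++i) {
            if (alts[i].alt_id == selected) return i;
        }
    }
    std::size_t best = 0;
    for (std::size_t i = 1; i < alts.size(); ++i) {
        if (alts[i].occupancy > alts[best].occupancy) best = i;
    }
    return best;
}
```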

◆ SET_AND_GET() [4/14]

SET_AND_GET ( loaded_b_factor_limit  ,
double   
)
inherited

By default, there is no limit to the b_factor (double numeric limit). Hence, by default, no atom is filtered according to this property.
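
The "no limit" default can be pictured as a threshold set to the largest finite double, so that every B factor passes. A minimal sketch, where passes_b_factor and no_b_factor_limit are hypothetical names and the inclusive comparison is an assumption:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Default limit: the largest finite double, so no finite B factor exceeds it.
constexpr double no_b_factor_limit = std::numeric_limits<double>::max();

// True if an atom with this B factor passes the filter.
// Whether the real comparison is strict or inclusive is an assumption.
inline bool passes_b_factor(double b_factor, double limit = no_b_factor_limit) {
    assert(!std::isnan(limit));  // a NaN limit would silently reject every atom
    return b_factor <= limit;
}
```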

◆ SET_AND_GET() [5/14]

SET_AND_GET ( loaded_chains  ,
std::vector< std::vector< std::string >>   
)
inherited

By default, all chains are loaded for all files. For each file, one can select a different subset of chains, which will be the same for all models extracted from the file. If no chain is provided for a given file, then all chains are loaded. The rationale is that a user who does not want to load any chain should not add the file at all.
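
The "empty selection means everything" convention can be sketched as below; the same logic would apply to loaded_models with model indices instead of chain identifiers. Chain_selection and is_chain_selected are hypothetical names, and treating a missing entry for a file like an empty one is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One list of chain identifiers per input file, in the order of the files.
using Chain_selection = std::vector<std::vector<std::string>>;

// True if chain `chain_id` of file number `file_index` should be loaded.
// An empty (or, by assumption, missing) entry for a file means "load every chain".
bool is_chain_selected(const Chain_selection& selection, std::size_t file_index,
                       const std::string& chain_id) {
    assert(!chain_id.empty());
    if (file_index >= selection.size()) return true;
    const std::vector<std::string>& wanted = selection[file_index];
    if (wanted.empty()) return true;
    return std::find(wanted.begin(), wanted.end(), chain_id) != wanted.end();
}
```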

◆ SET_AND_GET() [6/14]

SET_AND_GET ( loaded_file_paths  ,
std::vector< std::string >   
)
inherited

Provides the list of file paths to load. The pdb and cif formats are accepted, as well as these formats compressed as gz / tar.gz.

◆ SET_AND_GET() [7/14]

SET_AND_GET ( loaded_hetatoms  ,
bool   
)
inherited

By default, hetero atoms are not loaded. These are filtered.

◆ SET_AND_GET() [8/14]

SET_AND_GET ( loaded_hydrogens  ,
bool   
)
inherited

By default, hydrogen atoms are not loaded. These are filtered.

◆ SET_AND_GET() [9/14]

SET_AND_GET ( loaded_models  ,
std::vector< std::vector< std::size_t >>   
)
inherited

By default, all models are loaded for all files. For each file, one can select a different subset of models. If no model is provided for a given file, then all models are loaded. The rationale is that a user who does not want to load any model should not add the file at all.

◆ SET_AND_GET() [10/14]

SET_AND_GET ( loaded_occupancy_mode  ,
unsigned   
)
inherited

By default, the occupancy policy is set to MAX.

◆ SET_AND_GET() [11/14]

SET_AND_GET ( loaded_water  ,
bool   
)
inherited

By default, water molecules are not loaded: the associated atoms are filtered out. If set to true, water molecules are loaded (even if loaded_hetatoms is set to false); if set to false, water molecules are not loaded (even if loaded_hetatoms is set to true).
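
The precedence between the two flags reads more easily as a truth table. The sketch below encodes it in a hypothetical keep_atom function; how the loader itself recognises a water record is not covered here.

```cpp
#include <cassert>

// Decides whether an atom survives the water / hetero atom filters.
// Water is governed by loaded_water alone; other hetero atoms by loaded_hetatoms.
inline bool keep_atom(bool is_hetatm, bool is_water,
                      bool loaded_water, bool loaded_hetatoms) {
    if (is_water) return loaded_water;
    if (is_hetatm) return loaded_hetatoms;
    return true;  // standard atoms are not affected by these two flags
}

// Spot checks of the rule stated above.
inline void check_keep_atom() {
    assert(keep_atom(true, true, true, false));    // water kept although hetatoms are off
    assert(!keep_atom(true, true, false, true));   // water dropped although hetatoms are on
    assert(keep_atom(true, false, false, true));   // ligand kept when hetatoms are on
    assert(!keep_atom(true, false, true, false));  // ligand dropped when hetatoms are off
}
```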

◆ SET_AND_GET() [12/14]

SET_AND_GET ( max_bond_distance  ,
double   
)

By default set to 3 angstroms.

Controls the maximum distance tolerated between two bonded atoms when checking for incomplete chains.
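
One way to picture this check, together with the splitting of incomplete chains into components, is the sketch below. It is an assumption-laden illustration: Point, Link and split_chain are hypothetical names, and the choice of which two atoms are compared between consecutive residues (for a peptide chain, typically C of one residue and N of the next) is not taken from the library.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x, y, z; };

inline double euclidean_distance(const Point& a, const Point& b) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// links[i] holds the positions of the two atoms expected to be bonded
// between residue i and residue i+1 (assumed choice of atoms).
struct Link { Point from, to; };

// Splits a chain of links.size()+1 residues into fragments: a new fragment
// starts wherever the expected bond is longer than max_bond_distance.
// Returns the number of residues in each fragment.
std::vector<std::size_t> split_chain(const std::vector<Link>& links,
                                     double max_bond_distance = 3.0) {
    assert(max_bond_distance > 0.0);
    std::vector<std::size_t> fragments(1, 1);  // the first residue opens the first fragment
    for (const Link& l : links) {
        if (euclidean_distance(l.from, l.to) > max_bond_distance) {
            fragments.push_back(1);  // gap: open a new fragment
        } else {
            ++fragments.back();
        }
    }
    return fragments;
}
```

A chain is complete in this picture when split_chain returns a single fragment; with allow_incomplete_chains set to true, each extra fragment would correspond to one more component.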

◆ SET_AND_GET() [13/14]

SET_AND_GET ( pdb_checker  ,
unsigned   
)
inherited

By default, no check is performed on the molecular systems.

◆ SET_AND_GET() [14/14]

SET_AND_GET ( ss_bond_search  ,
bool   
)

By default set to false.

If true, a geometrical search based on the positions of the sulfur atoms of cysteines is conducted. Sulfur atoms that are close enough are bonded.
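
A brute-force version of such a geometrical search might look like the sketch below. Sulfur and find_close_sulfurs are hypothetical names, and the 2.5 angstrom cutoff is an illustrative assumption (a disulfide S-S bond is about 2.05 angstroms long); the threshold actually used by the loader is not stated on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Sulfur { double x, y, z; };  // position of a cysteine sulfur atom

// Returns the index pairs (i < j) of sulfur atoms at most `cutoff` angstroms apart.
// The default cutoff is an assumption; a real implementation would also make sure
// that each sulfur takes part in at most one bond.
std::vector<std::pair<std::size_t, std::size_t>>
find_close_sulfurs(const std::vector<Sulfur>& sg, double cutoff = 2.5) {
    assert(cutoff > 0.0);
    std::vector<std::pair<std::size_t, std::size_t>> pairs;
    const double cutoff2 = cutoff * cutoff;  // compare squared distances, no sqrt needed
    for (std::size_t i = 0; i < sg.size(); ++i) {
        for (std::size_t j = i + 1; j < sg.size(); ++j) {
            const double dx = sg[i].x - sg[j].x;
            const double dy = sg[i].y - sg[j].y;
            const double dz = sg[i].z - sg[j].z;
            if (dx * dx + dy * dy + dz * dz <= cutoff2) pairs.emplace_back(i, j);
        }
    }
    return pairs;
}
```

The quadratic loop is acceptable here because a structure usually contains few cysteines; a spatial index would only matter for very large inputs.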

◆ statistics() [1/2]

void statistics ( std::ostream &  out)
inline

Prints statistics about the loaded PDB files.

This function prints statistics about the loaded covalent structures.

Parameters
out  The output stream to which the statistics will be printed.

◆ statistics() [2/2]

void statistics ( std::ostream &  out) const
inherited

Prints statistics about the loaded PDB files, including the number of files, their details, and various counts related to the loaded molecular systems.

Parameters
out  The output stream to which the statistics will be printed.

Member Data Documentation

◆ m_alt_discarded

std::vector<std::size_t> m_alt_discarded
protected inherited

Number of discarded alternate location atoms for each loaded PDB file.

◆ m_atm_discarded

std::vector<std::size_t> m_atm_discarded
protected inherited

Number of actually discarded atoms for each loaded PDB file (not the sum of all discarded atoms in each category).
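
The distinction between this total and the per-category counters can be illustrated as follows. The sketch assumes that one atom may match several rejection criteria at once (for example a hydrogen of a water molecule), which would explain why the total is not the sum of the categories; Reject_flags, Discard_counts and tally are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reasons for which one atom may be rejected; several can apply at once.
struct Reject_flags {
    bool hydrogen = false;
    bool water = false;
    bool hetatm = false;
    bool b_factor = false;
};

struct Discard_counts {
    std::size_t total = 0;                          // atoms actually discarded, each counted once
    std::size_t h = 0, hoh = 0, htm = 0, temp = 0;  // per-category counts
};

// Tallies the discarded atoms of one file.
Discard_counts tally(const std::vector<Reject_flags>& atoms) {
    Discard_counts c;
    for (const Reject_flags& f : atoms) {
        c.h += f.hydrogen ? 1 : 0;
        c.hoh += f.water ? 1 : 0;
        c.htm += f.hetatm ? 1 : 0;
        c.temp += f.b_factor ? 1 : 0;
        if (f.hydrogen || f.water || f.hetatm || f.b_factor) ++c.total;
    }
    assert(c.total <= atoms.size());  // an atom is never counted twice in the total
    return c;
}
```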

◆ m_chain_discarded

std::vector<std::size_t> m_chain_discarded
protected inherited

Number of atoms discarded for not being in the loaded chains.

◆ m_h_discarded

std::vector<std::size_t> m_h_discarded
protected inherited

Number of discarded hydrogens for each loaded PDB file.

◆ m_hoh_discarded

std::vector<std::size_t> m_hoh_discarded
protected inherited

Number of discarded water atoms for each loaded PDB file.

◆ m_htm_discarded

std::vector<std::size_t> m_htm_discarded
protected inherited

Number of discarded hetero-atoms for each loaded PDB file.

◆ m_model_discarded

std::vector<std::size_t> m_model_discarded
protected inherited

Number of atoms discarded for not being in the loaded models.

◆ m_temp_discarded

std::vector<std::size_t> m_temp_discarded
protected inherited

Number of atoms discarded for having too high a B factor, for each loaded PDB file.