Structural Bioinformatics Library
Template C++ / Python API for developing structural bioinformatics applications.
T_Molecular_system_loader< Molecular_system_ > Class Template Reference

Class for loading molecular systems. More...

#include <Molecular_system_loader.hpp>

Public Member Functions

 T_Molecular_system_loader (void)
 Default constructor. More...
 
virtual ~T_Molecular_system_loader (void)
 Destructor. More...
 
virtual boost::program_options::options_description add_options (void) override
 Virtual method for adding options to the module. More...
 
virtual bool check_options (std::string &message) const override
 Checks that the input options' values are coherent. More...
 
std::string get_output_prefix (void) const override
 Returns a prefix that concatenates the command-line options used when running the module. More...
 
bool load (unsigned verbose, std::ostream &out) override
 Load function. More...
 
std::string get_name (void) const override
 Return the name of the class itself. More...
 
const std::vector< std::shared_ptr< Molecular_system > > & get_molecular_systems (void) const
 Retrieves the molecular systems loaded from the PDB / mmCIF file(s). This function returns a constant reference to the vector containing the molecular systems loaded from the PDB file(s). More...
 
const std::vector< bool > & are_files_valid () const
 Tells whether a loaded file led to the construction of a Molecular_system. More...
 
void statistics (std::ostream &out) const
 Prints statistics about the loaded PDB files. This function prints statistics about the loaded PDB files, including the number of files, their details, and various counts related to the loaded molecular systems. More...
 
void check_validity (void) const
 Performs validity / sanity checks on the loaded PDB file. It notably checks that each residue has a Calpha, and identifies missing residues in the sequence. Since it returns void, it serves informational purposes only. More...
 
 SET_AND_GET (loaded_file_paths, std::vector< std::string >)
 Provides the list of file paths to load. The pdb and cif formats are accepted, as well as these formats compressed as gz / tar.gz. More...
 
 SET_AND_GET (loaded_chains, std::vector< std::vector< std::string >>)
 By default, all chains are loaded for all files. For each file, one can select a different subset of chains, which will be the same for all possible models extracted from the file. If no chain is provided for a given file, then all chains are loaded. The assumption is that a user who does not want to load any chain should not add the file at all. More...
 
 SET_AND_GET (loaded_models, std::vector< std::vector< std::size_t >>)
 By default, all models are loaded for all files. For each file, one can select a different subset of models. If no model id is provided for a given file, then all models are loaded. The assumption is that a user who does not want to load any model should not add the file at all. More...
 
 SET_AND_GET (loaded_water, bool)
 By default, water molecules are not loaded: the associated atoms are filtered. If set to true, water molecules are loaded (even if loaded_hetatoms is set to false); if set to false, water molecules are not loaded (even if loaded_hetatoms is set to true). More...
 
 SET_AND_GET (loaded_hetatoms, bool)
 By default, hetero atoms are not loaded. These are filtered. More...
 
 SET_AND_GET (loaded_hydrogens, bool)
 By default, hydrogen atoms are not loaded. These are filtered. More...
 
 SET_AND_GET (loaded_occupancy_mode, unsigned)
 By default, the occupancy policy is set to MAX. More...
 
 SET_AND_GET (loaded_alternate_selected, char)
 By default, the alternate atom is chosen according to the occupancy policy. If specified, the alternate atom with the alternate_selected character will be selected. More...
 
 SET_AND_GET (loaded_b_factor_limit, double)
 By default, there is no limit to the b_factor (double numeric limit). Hence, by default, no atom is filtered according to this property. More...
 
 SET_AND_GET (pdb_checker, unsigned)
 By default, no check is performed on the molecular systems. More...
 

Static Public Member Functions

static boost::program_options::options_description *& get_options (void)
 Access to the options' description of the module. More...
 

Protected Member Functions

void load_molecular_systems (void)
 Loads the molecular systems from the specified files taking into account all the provided options. More...
 
void delete_molecular_systems (void)
 Deletes instances of molecular systems allocated by the loader. More...
 
void add_cif_atom_to_sbl_residue (const cif::mm::atom &cif_atom, typename Molecular_system::Residue &sbl_residue, bool is_hetatm)
 Adds a CIF atom or hetatom to an SBL residue. More...
 

Protected Attributes

std::vector< bool > m_file_is_valid
 One flag per input file: true if the file was valid, false if it was discarded because it did not exhibit the minimum set of required variables. More...
 
std::vector< std::size_t > m_atm_discarded
 Number of actually discarded atoms for each loaded PDB file (not the sum of all discarded atoms in each category). More...
 
std::vector< std::size_t > m_model_discarded
 Number of atoms discarded for not being in the loaded models. More...
 
std::vector< std::size_t > m_chain_discarded
 Number of atoms discarded for not being in the loaded chains. More...
 
std::vector< std::size_t > m_temp_discarded
 Number of atoms discarded for having a B factor above the limit, for each loaded PDB file. More...
 
std::vector< std::size_t > m_htm_discarded
 Number of discarded hetero-atoms for each loaded PDB file. More...
 
std::vector< std::size_t > m_hoh_discarded
 Number of discarded water atoms for each loaded PDB file. More...
 
std::vector< std::size_t > m_h_discarded
 Number of discarded hydrogens for each loaded PDB file. More...
 
std::vector< std::size_t > m_alt_discarded
 Number of discarded alternate location atoms for each loaded PDB file. More...
 

Management

void set_loader_instance_name (const std::string &loader_instance_name)
 
const std::string & get_loader_instance_name (void) const
 

Detailed Description

template<typename Molecular_system_ = Default_molecular_system>
class SBL::IO::T_Molecular_system_loader< Molecular_system_ >

Class for loading molecular systems.

This class provides functionality for loading molecular systems using various options. The requirements on the files are the following :

  • Cartesian coordinates x, y, z : while a SBL::IO::T_Molecular_covalent_structure_loader could reasonably work without Cartesian coordinates, a molecular system cannot, since every Molecular_atom needs to be positioned.
  • Ids : atom serial numbers must be provided. They would not be strictly necessary, as a meaningful implicit sequence of ordered serial numbers could be inferred, especially if no alternate locations are provided or all occupancy factors are equal to 1. Since this would require heavy checks in all other cases, atom serial numbers are required in the file.
  • Chain ids, residue ids : they are also required to build a Molecular_system.
  • If alternate locations are provided, occupancy factors must be provided as well.

The various options act as filters on the atoms / heteroatoms to load. If no model ids are provided, all models are loaded. If no chain ids are provided, all chains are loaded. Residues for which all atoms have been filtered are not added to the molecular system. Chains for which all residues have been filtered are not added to the molecular system. A model for which all chains have been filtered is still added to the molecular system.

Template Parameters
Molecular_system_ : The type of molecular system to load
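
The pruning rules stated for residues, chains and models can be sketched on a toy hierarchy. The types Atom, Residue, Chain, Model and the function prune below are hypothetical stand-ins written for this illustration; they are not the SBL types, and the actual loader may well prune while parsing rather than afterwards.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in types; the real Molecular_system hierarchy differs.
struct Atom    { std::string name; bool kept; };
struct Residue { int id; std::vector<Atom> atoms; };
struct Chain   { std::string id; std::vector<Residue> residues; };
struct Model   { std::size_t id; std::vector<Chain> chains; };

// Drops filtered atoms, then residues left without atoms, then chains
// left without residues. The model itself is always kept, even when it
// ends up with no chain.
inline void prune(Model& model) {
    for (Chain& chain : model.chains) {
        for (Residue& residue : chain.residues) {
            std::vector<Atom>& atoms = residue.atoms;
            atoms.erase(std::remove_if(atoms.begin(), atoms.end(),
                                       [](const Atom& a) { return !a.kept; }),
                        atoms.end());
        }
        std::vector<Residue>& residues = chain.residues;
        residues.erase(std::remove_if(residues.begin(), residues.end(),
                                      [](const Residue& r) { return r.atoms.empty(); }),
                       residues.end());
    }
    std::vector<Chain>& chains = model.chains;
    chains.erase(std::remove_if(chains.begin(), chains.end(),
                                [](const Chain& c) { return c.residues.empty(); }),
                 chains.end());
    // Postcondition: no surviving chain is empty.
    assert(std::none_of(chains.begin(), chains.end(),
                        [](const Chain& c) { return c.residues.empty(); }));
}
```

A model whose chains are all pruned stays in the result with an empty chain list, which mirrors the last rule of the paragraph.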

Constructor & Destructor Documentation

◆ T_Molecular_system_loader()

Default constructor.

◆ ~T_Molecular_system_loader()

~T_Molecular_system_loader ( void  )
virtual

Destructor.

Member Function Documentation

◆ add_cif_atom_to_sbl_residue()

void add_cif_atom_to_sbl_residue ( const cif::mm::atom &  cif_atom,
typename Molecular_system::Residue &  sbl_residue,
bool  is_hetatm 
)
protected

Adds a CIF atom or hetatom to an SBL residue.

Parameters
cif_atom : The CIF atom to be added.
sbl_residue : The SBL residue to which the CIF atom / hetatom will be added.
is_hetatm : Flag indicating whether the atom is a heteroatom.

◆ add_options()

boost::program_options::options_description add_options ( void  )
override virtual

◆ are_files_valid()

const std::vector< bool > & are_files_valid

Tells whether a loaded file led to the construction of a Molecular_system.

To be accessed after load is called.

Returns
A constant reference to the vector of booleans, one per loaded file.
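
Because load can return true even if a file path is incorrect, these flags are the way to find out which files produced no molecular system. The following is a minimal sketch, assuming the flags are stored in the same order as the file paths given to the loader (the documentation does not state this explicitly). The helper rejected_files is hypothetical and not part of the SBL.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the paths whose validity flag is false. The two vectors are
// assumed to be index-aligned: flag i describes path i.
inline std::vector<std::string> rejected_files(const std::vector<std::string>& paths,
                                               const std::vector<bool>& valid) {
    assert(paths.size() == valid.size());
    std::vector<std::string> rejected;
    for (std::size_t i = 0; i < paths.size(); ++i) {
        if (!valid[i]) rejected.push_back(paths[i]);
    }
    return rejected;
}
```

With a loader instance, the second argument would be the result of are_files_valid(), queried after load has been called.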

◆ check_options()

◆ check_validity()

void check_validity ( void  ) const

Performs validity / sanity checks on the loaded PDB file. It notably checks that each residue has a Calpha, and identifies missing residues in the sequence. Since it returns void, it serves informational purposes only.
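
The two checks named above can be illustrated on a toy residue type. This is a sketch under stated assumptions, not the SBL implementation: Residue_stub, has_calpha and missing_residues are hypothetical, the alpha carbon is assumed to be named "CA", and a gap in the residue numbering is taken as a hint that residues are missing, which is only a heuristic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a residue: sequence number and atom names.
struct Residue_stub {
    int seq_id;
    std::vector<std::string> atom_names;
};

// True if the residue holds an atom named "CA" (the alpha carbon).
inline bool has_calpha(const Residue_stub& r) {
    return std::find(r.atom_names.begin(), r.atom_names.end(), "CA") != r.atom_names.end();
}

// Sequence numbers absent between consecutive residues of one chain.
// Assumes the residues are sorted by increasing seq_id.
inline std::vector<int> missing_residues(const std::vector<Residue_stub>& chain) {
    assert(std::is_sorted(chain.begin(), chain.end(),
                          [](const Residue_stub& a, const Residue_stub& b) {
                              return a.seq_id < b.seq_id;
                          }));
    std::vector<int> missing;
    for (std::size_t i = 1; i < chain.size(); ++i) {
        for (int id = chain[i - 1].seq_id + 1; id < chain[i].seq_id; ++id) {
            missing.push_back(id);
        }
    }
    return missing;
}
```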

◆ delete_molecular_systems()

void delete_molecular_systems ( void  )
protected

Deletes instances of molecular systems allocated by the loader.

◆ get_molecular_systems()

const std::vector< std::shared_ptr< Molecular_system_ > > & get_molecular_systems ( void  ) const
inline

Retrieves the molecular systems loaded from the PDB / mmCIF file(s). This function returns a constant reference to the vector containing the molecular systems loaded from the PDB file(s).

Returns
A constant reference to the vector of molecular systems.

◆ get_name()

std::string get_name ( void  ) const
override virtual

Return the name of the class itself.

Reimplemented from Loader_base.

◆ get_options()

static boost::program_options::options_description*& get_options ( void  )
inline static inherited

Access to the options' description of the module.

◆ get_output_prefix()

std::string get_output_prefix ( void  ) const
override virtual

Returns a prefix that concatenates the command-line options used when running the module.

Reimplemented from T_Module_option_description< Dummy >.

◆ load()

bool load ( unsigned  verbose,
std::ostream &  out 
)
override virtual

Load function.

Parameters
verbose : Verbosity level.
[out] out : Output stream.
Returns
True if loading is successful, false otherwise.

Function to call to execute the main function of a Loader.

Loads the molecular systems. This function loads the molecular systems based on the specified options. Returns false if the options are not consistent with each other. Returns true otherwise, even if a filepath is incorrect.

May trigger runtime error for malformed files.

Reimplemented from Loader_base.

◆ load_molecular_systems()

void load_molecular_systems ( void  )
protected

Loads the molecular systems from the specified files taking into account all the provided options.

◆ SET_AND_GET() [1/10]

SET_AND_GET ( loaded_alternate_selected  ,
char   
)

By default, the alternate atom is chosen according to the occupancy policy. If specified, the alternate atom with the alternate_selected character will be selected.
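
The interplay between an explicit alternate character and the occupancy policy can be sketched as follows. Everything here is hypothetical: Alt_atom and pick_alternate are illustration names, the value '\0' is used as a stand-in for "no character specified", and falling back to the highest occupancy when the requested character is absent is an assumption of this sketch, not documented behaviour.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for one alternate location of an atom.
struct Alt_atom {
    char alt_id;       // alternate location indicator, e.g. 'A' or 'B'
    double occupancy;  // occupancy factor
};

// Picks one alternate. If `selected` is not '\0' and an alternate carries
// that character, it wins; otherwise the alternate with the highest
// occupancy is returned (the MAX policy).
inline Alt_atom pick_alternate(const std::vector<Alt_atom>& alternates,
                               char selected = '\0') {
    assert(!alternates.empty());
    if (selected != '\0') {
        std::vector<Alt_atom>::const_iterator it =
            std::find_if(alternates.begin(), alternates.end(),
                         [selected](const Alt_atom& a) { return a.alt_id == selected; });
        if (it != alternates.end()) return *it;
    }
    return *std::max_element(alternates.begin(), alternates.end(),
                             [](const Alt_atom& a, const Alt_atom& b) {
                                 return a.occupancy < b.occupancy;
                             });
}
```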

◆ SET_AND_GET() [2/10]

SET_AND_GET ( loaded_b_factor_limit  ,
double   
)

By default, there is no limit to the b_factor (double numeric limit). Hence, by default, no atom is filtered according to this property.
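
A minimal sketch of why the default filters nothing, assuming "double numeric limit" means std::numeric_limits<double>::max() and that an atom is discarded when its B factor is strictly greater than the limit. Both points are assumptions; the names no_b_factor_limit and exceeds_b_factor_limit are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Assumed default: the largest finite double, so no finite B factor exceeds it.
constexpr double no_b_factor_limit = std::numeric_limits<double>::max();

// True if an atom with this B factor would be filtered out. Whether the
// real comparison is strict is an assumption of this sketch.
inline bool exceeds_b_factor_limit(double b_factor, double limit = no_b_factor_limit) {
    // A NaN limit compares false against everything and would silently
    // disable the filter, so it is rejected up front.
    assert(!std::isnan(limit));
    return b_factor > limit;
}
```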

◆ SET_AND_GET() [3/10]

SET_AND_GET ( loaded_chains  ,
std::vector< std::vector< std::string >>   
)

By default, all chains are loaded for all files. For each file, one can select a different subset of chains, which will be the same for all possible models extracted from the file. If no chain is provided for a given file, then all chains are loaded. The assumption is that a user who does not want to load any chain should not add the file at all.
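
The "empty means all" convention of the per-file selection can be sketched with a small predicate. Chain_selection and is_chain_loaded are hypothetical names, and the sketch assumes that entry i of the outer vector applies to file i.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One list of chain ids per file; an empty list selects every chain.
using Chain_selection = std::vector<std::vector<std::string>>;

// True if chain `chain_id` of file number `file_index` should be loaded.
inline bool is_chain_loaded(const Chain_selection& selection,
                            std::size_t file_index,
                            const std::string& chain_id) {
    assert(file_index < selection.size());
    const std::vector<std::string>& wanted = selection[file_index];
    if (wanted.empty()) return true;  // no chain given for this file: load them all
    return std::find(wanted.begin(), wanted.end(), chain_id) != wanted.end();
}
```

The same rule is documented for loaded_models, with model ids in place of chain ids.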

◆ SET_AND_GET() [4/10]

SET_AND_GET ( loaded_file_paths  ,
std::vector< std::string >   
)

Provides the list of file paths to load. The pdb and cif formats are accepted, as well as these formats compressed as gz / tar.gz.

◆ SET_AND_GET() [5/10]

SET_AND_GET ( loaded_hetatoms  ,
bool   
)

By default, hetero atoms are not loaded. These are filtered.

◆ SET_AND_GET() [6/10]

SET_AND_GET ( loaded_hydrogens  ,
bool   
)

By default, hydrogen atoms are not loaded. These are filtered.

◆ SET_AND_GET() [7/10]

SET_AND_GET ( loaded_models  ,
std::vector< std::vector< std::size_t >>   
)

By default, all models are loaded for all files. For each file, one can select a different subset of models. If no model id is provided for a given file, then all models are loaded. The assumption is that a user who does not want to load any model should not add the file at all.

◆ SET_AND_GET() [8/10]

SET_AND_GET ( loaded_occupancy_mode  ,
unsigned   
)

By default, the occupancy policy is set to MAX.

◆ SET_AND_GET() [9/10]

SET_AND_GET ( loaded_water  ,
bool   
)

By default, water molecules are not loaded: the associated atoms are filtered. If set to true, water molecules are loaded (even if loaded_hetatoms is set to false); if set to false, water molecules are not loaded (even if loaded_hetatoms is set to true).
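
The precedence of loaded_water over loaded_hetatoms for water atoms can be written as a small predicate. Atom_kind and keep_atom are hypothetical names introduced for this sketch; how the loader actually recognises a water atom is not covered here.

```cpp
#include <cassert>

// Hypothetical flags describing one atom record.
struct Atom_kind {
    bool is_hetatm;  // hetero atom record
    bool is_water;   // belongs to a water molecule
};

// Water is decided by `load_water` alone, whatever `load_hetatoms` says;
// any other hetero atom follows `load_hetatoms`. Both options default to
// false, matching the documented defaults.
inline bool keep_atom(const Atom_kind& atom,
                      bool load_water = false,
                      bool load_hetatoms = false) {
    if (atom.is_water) return load_water;
    if (atom.is_hetatm) return load_hetatoms;
    return true;
}

// Usage: a water atom and a ligand atom under opposite settings.
inline void example() {
    const Atom_kind water{true, true};
    const Atom_kind ligand{true, false};
    assert(keep_atom(water, true, false));    // water kept although hetero atoms are off
    assert(!keep_atom(water, false, true));   // water dropped although hetero atoms are on
    assert(keep_atom(ligand, false, true));   // ligand follows load_hetatoms
    assert(!keep_atom(ligand, true, false));  // ligand is not affected by load_water
}
```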

◆ SET_AND_GET() [10/10]

SET_AND_GET ( pdb_checker  ,
unsigned   
)

By default, no check is performed on the molecular systems.

◆ statistics()

void statistics ( std::ostream &  out) const

Prints statistics about the loaded PDB files. This function prints statistics about the loaded PDB files, including the number of files, their details, and various counts related to the loaded molecular systems.

Parameters
out : The output stream to which the statistics will be printed.

Member Data Documentation

◆ m_alt_discarded

std::vector<std::size_t> m_alt_discarded
protected

Number of discarded alternate location atoms for each loaded PDB file.

◆ m_atm_discarded

std::vector<std::size_t> m_atm_discarded
protected

Number of actually discarded atoms for each loaded PDB file (not the sum of all discarded atoms in each category).
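
One plausible reading of the parenthetical is that an atom can be rejected by several filters at once (a hydrogen of a water molecule, for instance), so the per-category counters may add up to more than the number of atoms actually discarded. The sketch below illustrates that reading only; Rejections, Discard_counts and count_discarded are hypothetical, and the real bookkeeping of the loader may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-atom filter results: a flag is true if that filter
// rejects the atom.
struct Rejections {
    bool hetatm;
    bool water;
    bool hydrogen;
};

struct Discard_counts {
    std::size_t total = 0;     // atoms discarded, each counted once
    std::size_t hetatm = 0;    // per-category counters
    std::size_t water = 0;
    std::size_t hydrogen = 0;
};

inline Discard_counts count_discarded(const std::vector<Rejections>& atoms) {
    Discard_counts c;
    for (const Rejections& r : atoms) {
        if (r.hetatm) ++c.hetatm;
        if (r.water) ++c.water;
        if (r.hydrogen) ++c.hydrogen;
        if (r.hetatm || r.water || r.hydrogen) ++c.total;  // counted once per atom
    }
    // The total can never exceed the sum of the categories.
    assert(c.total <= c.hetatm + c.water + c.hydrogen);
    return c;
}
```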

◆ m_chain_discarded

std::vector<std::size_t> m_chain_discarded
protected

Number of atoms discarded for not being in the loaded chains.

◆ m_file_is_valid

std::vector<bool> m_file_is_valid
protected

One flag per input file: true if the file was valid, false if it was discarded because it did not exhibit the minimum set of required variables.

◆ m_h_discarded

std::vector<std::size_t> m_h_discarded
protected

Number of discarded hydrogens for each loaded PDB file.

◆ m_hoh_discarded

std::vector<std::size_t> m_hoh_discarded
protected

Number of discarded water atoms for each loaded PDB file.

◆ m_htm_discarded

std::vector<std::size_t> m_htm_discarded
protected

Number of discarded hetero-atoms for each loaded PDB file.

◆ m_model_discarded

std::vector<std::size_t> m_model_discarded
protected

Number of atoms discarded for not being in the loaded models.

◆ m_temp_discarded

std::vector<std::size_t> m_temp_discarded
protected

Number of atoms discarded for having a B factor above the limit, for each loaded PDB file.