Structural Bioinformatics Library
Template C++ / Python API for developing structural bioinformatics applications.
User Manual

Multiple_archives_serialization

Authors: F. Cazals and T. Dreyfus

Introduction

This package uses the serialization paradigm of the Boost library to save and reconstruct data across several files:

– the main archive, an xml file containing the main data structure;

– the secondary archive, a plain text or binary file containing the secondary information required to reconstruct the data from the main archive.

Serialization from the Boost library

For a complete description of the Serialization package of the Boost library, we refer the reader to the Boost user manual. Serialization is done in two steps:

  • first, the description of the information of a data structure to save into / load from an archive;
  • second, the description of the archive file that contains or will contain the information related to a data structure.

For the first step, there are two ways to describe the serialization of a data structure:

  • in the first version, the existing function boost::serialization::serialize is overloaded: this is called non-intrusive serialization, since the function is defined outside of the structure it serializes (see section Non Intrusive Serialization).
  • in the second version, a method serialize is declared in the class itself: this is called intrusive serialization (see section Splitted Serialization).

Furthermore, the serialization can be split into two methods: one for loading (method load) and one for saving (method save). Examples of split serialization are given in the section Splitted Serialization.

For the second step, Boost provides several archives corresponding to different file formats (binary, plain text, xml). Each format comes with two types of archives: one for loading (i.e. input archives) and one for saving (i.e. output archives). The archives can be used in the same way as streams, with the operators << and >>. However, for the xml archives, a name has to be associated with the data to serialize: this name is used as the tag that encloses the serialized data in the xml file. To associate a name with a piece of data, Boost provides the function boost::serialization::make_nvp.

Multiple Archives Serialization

The two main classes are SBL::IO::T_Multiple_archives_serialization_xml_oarchive for output archives and SBL::IO::T_Multiple_archives_serialization_xml_iarchive for input archives. These two classes are described in the following.

Output

To save a data structure in several archives, this package provides the class SBL::IO::T_Multiple_archives_serialization_xml_oarchive< DataType , OutputArchive , IsLessData >:

  • the first template argument is the type of the data that will be stored in the secondary archive. When a piece of data is saved, an index is created and associated with it in a map from the data to their indices.
  • the second template argument is the type of the secondary archive that will contain the data. It is by default a text output archive, but can be any Boost output archive.
  • the third template argument is an ordering over the data, which is required because the data are stored in a map. By default, the natural ordering over the data is used.
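As an illustration of the third template argument, the sketch below defines a strict weak ordering for a data type that has no natural ordering, and uses it as the comparator of a map from data to indices. The names Atom, Is_less_atom, Atom_index_map and index_of are hypothetical and chosen for this sketch only; they are not part of the package.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <tuple>

// Hypothetical data type without operator<, so it has no natural ordering.
struct Atom {
    double x, y, z;
};

// Hypothetical ordering functor: lexicographic comparison of the coordinates.
struct Is_less_atom {
    bool operator()(const Atom& a, const Atom& b) const {
        return std::tie(a.x, a.y, a.z) < std::tie(b.x, b.y, b.z);
    }
};

// Map from data to index, ordered by the functor above.
using Atom_index_map = std::map<Atom, std::size_t, Is_less_atom>;

// Returns the index of a, assigning the next free index the first time a is seen.
std::size_t index_of(Atom_index_map& m, const Atom& a) {
    const auto result = m.emplace(a, m.size());
    return result.first->second;
}
```

Two atoms with identical coordinates compare equivalent under this ordering, so they share a single index.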

An example of use of SBL::IO::T_Multiple_archives_serialization_xml_oarchive is given in section Output XML Archive.

Constructing the Archive

There are three ways for creating an object of type SBL::IO::T_Multiple_archives_serialization_xml_oarchive< DataType , OutputArchive >:

– by giving output streams for the xml output archive and the secondary archive,

– by giving output streams only for the xml output archive,

– by giving another archive with already serialized data in the secondary archive.

In the second case, no secondary archive is created, and all information on the data that is not stored in the xml output archive is lost: this is useful when the data need not be reconstructed from the xml archive, and saving all the information of all data in the xml archive would be too heavy.

Saving the Data

During the serialization of a data structure, each piece of data is stored in a map together with its serialization index, i.e. an index identifying the data in the map. If a secondary archive exists, the data is also stored in this archive. Note that a unique index and a unique copy of each piece of data are stored, even if the data exists in multiple copies.
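The bookkeeping described above can be sketched with the standard library alone. This is a simplified model under stated assumptions, not the implementation of the package: the class Index_registry and its methods store and size are hypothetical names, and the secondary archive is modeled as a plain output stream with one data item per line.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the data-to-index bookkeeping; not the SBL class.
template <class Data, class Less = std::less<Data>>
class Index_registry {
public:
    // secondary may be null: data are then indexed but written nowhere.
    explicit Index_registry(std::ostream* secondary = nullptr)
        : secondary_(secondary) {}

    // Returns the index of d. A new index is created, and d is written to the
    // secondary stream, only the first time d is seen.
    std::size_t store(const Data& d) {
        const auto it = index_of_.find(d);
        if (it != index_of_.end()) return it->second;
        const std::size_t idx = index_of_.size();
        index_of_.emplace(d, idx);
        if (secondary_ != nullptr) *secondary_ << d << '\n';
        return idx;
    }

    std::size_t size() const { return index_of_.size(); }

private:
    std::map<Data, std::size_t, Less> index_of_;
    std::ostream* secondary_;
};
```

Storing the same value twice returns the same index and writes a single copy, which mirrors the uniqueness property stated above.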

It is also possible to call the method SBL::IO::T_Multiple_archives_serialization_xml_oarchive::store_data for storing the data.

Serializing the Data Structure

This step is exactly the same as usual, except that the index of each serialized object of type DataType is also saved. Since all the data must already be stored, pointers and references to objects of type DataType are treated in exactly the same manner (which is not the case for the Boost archives). Note that if one wants to partially serialize an object of type DataType in the xml archive (and not only its index), a mechanism exists for selecting the information to put in the xml archive: the method SBL::IO::is_main_archive determines whether a given archive is the main one. In the serialization method of the class DataType, one should test the archive with this method in order to select which information goes into the main archive.
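The selection mechanism can be illustrated with a compile-time test on the archive type. The sketch below is an assumption-laden model: Main_archive, Secondary_archive, Is_main, Particle and save are hypothetical names, and the trait Is_main only plays the role that SBL::IO::is_main_archive plays in the package, whose actual signature is not reproduced here.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical archive types, each wrapping an output stream.
struct Main_archive {
    std::ostream& os;
};
struct Secondary_archive {
    std::ostream& os;
};

// Hypothetical trait telling whether an archive type is the main one.
template <class Archive>
struct Is_main : std::false_type {};
template <>
struct Is_main<Main_archive> : std::true_type {};

// Hypothetical data type with one light field and one heavier field.
struct Particle {
    std::string name;
    int n_atoms;
};

// Writes only the name into the main archive, and all fields elsewhere.
template <class Archive>
void save(Archive& ar, const Particle& p) {
    ar.os << p.name;
    if (!Is_main<Archive>::value) ar.os << ' ' << p.n_atoms;
}
```

With this layout, the main archive stays light while the secondary archive keeps everything needed for reconstruction.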

Input

To load a data structure from several archives, this package provides the class SBL::IO::T_Multiple_archives_serialization_xml_iarchive< DataType , OutputArchive >. It works in the same manner as SBL::IO::T_Multiple_archives_serialization_xml_oarchive, with the following differences:

– the class uses a map from the indices to the objects of type DataType. Since the indices are natural numbers from 0 to n-1, n being the number of stored data, the map is a std::vector. When loading the data from the secondary archive, new objects are created and pointers to them are stored in this vector, at the position corresponding to their index. The set of all data can be accessed with the method SBL::IO::T_Multiple_archives_serialization_xml_iarchive::get_data, which fills a container of DataType using an output iterator.

– it is not possible to use the class SBL::IO::T_Multiple_archives_serialization_xml_iarchive without a secondary archive.
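The loading side can be modeled in the same spirit with a vector indexed by serialization index. This is a sketch under assumptions, not the class of the package: Index_table is a hypothetical name, the secondary archive is modeled as an input stream of whitespace-separated items, and the get_data member shown here only imitates the output-iterator style described above.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <iterator>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the index-to-data table; not the SBL class.
template <class Data>
class Index_table {
public:
    // Reads every item of the secondary stream; the k-th item gets index k.
    explicit Index_table(std::istream& secondary) {
        Data d;
        while (secondary >> d) data_.push_back(std::make_unique<Data>(d));
    }

    // Resolves an index found in the main archive to the stored object.
    const Data& at(std::size_t idx) const {
        assert(idx < data_.size());
        return *data_[idx];
    }

    // Copies all loaded data through an output iterator.
    template <class OutputIterator>
    OutputIterator get_data(OutputIterator out) const {
        for (const auto& p : data_) *out++ = *p;
        return out;
    }

    std::size_t size() const { return data_.size(); }

private:
    std::vector<std::unique_ptr<Data>> data_;
};
```

A caller would typically pass std::back_inserter on a container of DataType to collect the loaded objects.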

An example of use of SBL::IO::T_Multiple_archives_serialization_xml_iarchive is given in section Input XML Archive.

Examples

Non Intrusive Serialization

This example shows how to serialize a simple data structure. First, it defines the serialization of the data structure in a non-intrusive way. Then, in the main method, it saves the data in an xml archive, and loads the same data back from this xml archive.

Splitted Serialization

This example shows how to split the serialization of a simple data structure. First, it defines the save and load methods in the data structure. Then, in the main method, it saves the data in an xml archive, and loads the same data back from this xml archive.

Output XML Archive

This example shows how to save a simple data structure in two archives: a text archive contains the list of serialized data to retain, and an xml archive contains the main data structure, in which each serialized piece of data is replaced by its index.

Input XML Archive

This example shows how to load a simple data structure from two archives: a text archive contains the list of serialized data, and an xml archive contains the main data structure, in which each serialized piece of data is replaced by its index.