Structural Bioinformatics Library
Template C++ / Python API for developing structural bioinformatics applications.
User Manual

Authors: F. Cazals and T. Dreyfus and A. Lheritier and N. Malod-Dognin

PALSE

Goals: Alleviating Data Parsing and Analysis

On Data Perenniality, Availability, and Analysis

A tenet of Science is the ability to reproduce the results, and a related issue is the possibility to archive and interpret the raw results of (computer) experiments. This package presents an elementary python framework focusing on the archival and the analysis of raw data, with three goals in mind:

  • Raw data perenniality. Make the raw data perennial by ensuring that anyone can make sense of them even once decoupled from the program that generated them. This is nontrivial, since the program and the scripts / other programs parsing its output need to co-evolve.
  • Raw data availability. Make the perennial raw data accessible, to allow novel analysis by scientists equipped with different methodological tools.
  • Raw data parsing and analysis. Ease the parsing of the raw data, so as to prepare the graphical and statistical analysis.

To see how these goals are met, consider a computing pipeline consisting of raw data generation, raw data parsing, and data analysis i.e. graphical and statistical analysis. PALSE addresses these last two steps by leveraging the hierarchical structure of XML documents.

More precisely, assume that the raw results of a program are stored in XML format, possibly generated by the serialization mechanism of the boost C++ libraries.

For raw data parsing, PALSE imports the raw data as XML documents, and exploits the tree structure of the XML together with the XML Path Language to access and select specific values. For graphical and statistical analysis, PALSE gives direct access to ScientificPython, R, and gnuplot.

In a nutshell, PALSE combines standard languages (python, XML, XML Path Language) and tools (Boost serialization, ScientificPython, R, gnuplot) in such a way that once the raw data have been generated, graphical plots and statistical analysis just require a handful of lines of python code.

The framework applies to virtually any type of data, and may find a broad class of applications.

Using PALSE : Making Statistics from a Database

In this section, we present a short example to compute the average volume of atoms in a molecule. Consider the (power) Voronoi diagram of the 3D balls representing the atoms, and let the restriction of an atom be the intersection between its ball and the Voronoi region of that ball. Also assume that the volumes of restrictions have been computed using the program $ \text{\vorlumeEP} $ from the package Space_filling_model_surface_volume. In the sequel, having introduced basic notions on PALSE, we show how to compute statistics on volumes of restrictions.

Prerequisites

The XML Hierarchical Representation

XML provides a hierarchical representation of a document, which may be seen as a rooted ordered tree of nodes called Elements (see Fig. fig-palse-xml-tree). Each Element is characterized by a tag-name, and possibly contains attributes, a text field and a number of child Elements (there is also an optional tail string, which is not supported by PALSE):

  • The tag-name of an Element $ e $ corresponds to the name of the start-tag (<tag-name>) and the end-tag (</tag-name>) of $ e $ in its XML representation.
  • An attribute of an Element $ e $ is a variable having a particular value, defined in its XML representation within the start-tag of $ e $ (<tag-name att='value'>).
  • The text field of an Element $ e $ is defined as any text in-between the start-tag and the end-tag of $ e $ in its XML representation, which is not embedded in-between another (start-tag, end-tag) pair (<tag-name> text-field </tag-name>).
  • The children of $ e $ are the Elements which are defined in its XML representation in-between the start-tag and the end-tag of $ e $ (<tag-name> <child-tag-name> </child-tag-name> </tag-name>).


In the sequel, by data value, we refer either to the value of an attribute or of a text field.
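The four notions above can be illustrated with a minimal sketch that uses the standard library module xml.etree.ElementTree rather than PALSE itself; the toy document and the variable names are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# A toy document (not produced by any SBL program): one Element with
# an attribute, a text field, and a single child Element.
doc = "<item id='7'>hello<child>42</child></item>"
elem = ET.fromstring(doc)

tag_name = elem.tag                        # name of the start-tag / end-tag
attribute_value = elem.get("id")           # data value stored in an attribute
text_field = elem.text                     # text not embedded in a child
child_tags = [child.tag for child in elem] # tag-names of the child Elements
child_value = elem.find("child").text      # data value stored in a text field

print(tag_name, attribute_value, text_field, child_tags, child_value)
```

Note that data values are always read as strings; converting them to numbers is left to the caller.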

An Element may be also embedded in a namespace, that is represented by a pair (prefix, uri). The uri is a string of characters identifying a name or a resource, and the prefix is a simple name associated to the uri. Thus, if an Element $ e $ having a tag-name tag is embedded in a namespace (prefix, uri), its tag-name is represented in the XML file by prefix:tag.

However, in the XML tree, all prefixes are replaced by their corresponding uri, so that the tag name of the Element $ e $ is represented in the XML tree by {uri}tag.
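As a small illustration of the prefix-to-uri substitution, the following sketch parses a namespaced document with the standard library module xml.etree.ElementTree. The uri http://example.org/ns and the prefix p are placeholders chosen for the example, not values used by PALSE.

```python
import xml.etree.ElementTree as ET

# In the file, the tag-names carry the prefix "p" (placeholder namespace).
doc = '<p:root xmlns:p="http://example.org/ns"><p:leaf>1</p:leaf></p:root>'
root = ET.fromstring(doc)

# In the tree, the prefix has been replaced by the uri between braces.
root_tag = root.tag

# A child can be addressed with the {uri}tag form ...
leaf = root.find("{http://example.org/ns}leaf")
# ... or with a prefix, provided a prefix-to-uri mapping is supplied.
leaf_via_prefix = root.find("p:leaf", {"p": "http://example.org/ns"})

print(root_tag, leaf.text)
```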

The XML trees handled by PALSE. Note that a tree $T$ is recursively defined from Elements.

The XPath Query Language

Given an XML document, the XPath language has been developed to address specific parts of this document (see the XPath documentation page). Recall that an XML document is represented as a rooted ordered tree: an XPath query always specifies a path to one or more of its Element(s). An XPath may be specified in three different ways (see the python documentation page):

  • (i) a sequence of tag-names only (e.g. $ tag\_1/tag\_2 $), denoting all the Elements having as tag $ tag\_2 $ and that are children of Elements with tag $ tag\_1 $.
  • (ii) or a sequence of tag-names, each possibly enriched by one or more attribute names (e.g. $ tag\_1/tag\_2[@att] $, or $ tag\_1[@att]/tag\_2 $), denoting the subset of the Elements previously described that have the attribute $ att $.
  • (iii) or a sequence of tag-names, each possibly enriched by one or more attribute names and possibly a value of these attributes (e.g. $ tag\_1/tag\_2[@att='val'] $, or $ tag\_1[@att='val']/tag\_2 $), denoting the subset of the Elements previously described whose attribute $ att $ has the value $ val $.
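The three query modes can be tried out with the limited XPath support of the standard library module xml.etree.ElementTree. The sketch below uses a toy document whose tags and attribute mirror the placeholders tag_1, tag_2, att and val used above; the helper name texts is hypothetical.

```python
import xml.etree.ElementTree as ET

doc = """
<root>
  <tag_1 att="val"><tag_2>a</tag_2></tag_1>
  <tag_1>
    <tag_2 att="val">b</tag_2>
    <tag_2 att="other">c</tag_2>
    <tag_2>d</tag_2>
  </tag_1>
</root>
"""
root = ET.fromstring(doc)


def texts(elements):
    """Return the text fields of a list of Elements."""
    return [e.text for e in elements]


# (i) tag-names only: every tag_2 that is a child of a tag_1
mode_i = texts(root.findall("tag_1/tag_2"))
# (ii) tag-names plus an attribute name: tag_2 Elements carrying att
mode_ii = texts(root.findall("tag_1/tag_2[@att]"))
# (iii) attribute name and value, on the last or on an inner tag
mode_iii = texts(root.findall("tag_1/tag_2[@att='val']"))
mode_iii_inner = texts(root.findall("tag_1[@att='val']/tag_2"))

print(mode_i, mode_ii, mode_iii, mode_iii_inner)
```

Queries are relative to the Element they are called on, here the root of the tree.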

PALSE assumes that a given (computer) experiment yields one file storing the results. To make these data perennial and foster their availability, PALSE uses XML, since this language is a standard one, and most importantly, provides an abstract structure amenable to high-level querying and filtering operations.

Example Python Script : Average Volume of Atoms

Assume that one's python path contains the PALSE python directory:

export PYTHONPATH=${PYTHONPATH}:${SBL_DIR}/Applications/PALSE/python

In PALSE, each result file is loaded into an XML ETree, so that one gathers data by traversing paths in this tree.

Consider for example a case where one has generated an XML file to describe the volume of the restrictions of the atoms to their Voronoi cells. First, to inspect the hierarchy embedded in an XML file, one can use the method SBL::PALSE::PALSE_xml_DB::get_common_hierarchy_as_xml for generating a string representation of this hierarchy:

<boost_serialization>
  <restrictions>
    <count></count>
    <item>
      <hierarchical_archive_index></hierarchical_archive_index>
      <total_area></total_area>
      <is_buried></is_buried>
      <particle>
        <insertion_code></insertion_code>
        <atom_serial_number></atom_serial_number>
        <residue_name></residue_name>
        <atom_name></atom_name>
        <chain_identifier></chain_identifier>
        <residue_sequence_number></residue_sequence_number>
        <annotations>
          <dynamic_annotations>
            <count></count>
          </dynamic_annotations>
          <radius></radius>
          <name></name>
        </annotations>
      </particle>
      <total_volume></total_volume>
    </item>
  </restrictions>
  <exposed_particles></exposed_particles>
  <total_area></total_area>
  <total_volume></total_volume>
  <total_particles></total_particles>
  <buried_particles></buried_particles>
</boost_serialization>

Note that a database has to be created first, even with only one XML file. If the database contains several XML files, only the common hierarchy from the root of the XML files will be shown.


Instead of the XML format, it is also possible to obtain the list of all paths from the root of the XML files, using the method SBL::PALSE::PALSE_xml_DB::get_common_hierarchy_as_list:

/boost_serialization
/boost_serialization/restrictions
/boost_serialization/restrictions/count
/boost_serialization/restrictions/item
/boost_serialization/restrictions/item/hierarchical_archive_index
/boost_serialization/restrictions/item/total_area
/boost_serialization/restrictions/item/is_buried
/boost_serialization/restrictions/item/particle
/boost_serialization/restrictions/item/particle/insertion_code
/boost_serialization/restrictions/item/particle/atom_serial_number
/boost_serialization/restrictions/item/particle/residue_name
/boost_serialization/restrictions/item/particle/atom_name
/boost_serialization/restrictions/item/particle/chain_identifier
/boost_serialization/restrictions/item/particle/residue_sequence_number
/boost_serialization/restrictions/item/particle/annotations
/boost_serialization/restrictions/item/particle/annotations/dynamic_annotations
/boost_serialization/restrictions/item/particle/annotations/dynamic_annotations/count
/boost_serialization/restrictions/item/particle/annotations/radius
/boost_serialization/restrictions/item/particle/annotations/name
/boost_serialization/restrictions/item/total_volume
/boost_serialization/exposed_particles
/boost_serialization/total_area
/boost_serialization/total_volume
/boost_serialization/total_particles
/boost_serialization/buried_particles
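To make the notion of a list of paths concrete, here is a minimal sketch that enumerates the distinct root-to-Element paths of a single tree with the standard library. The function list_paths is a hypothetical helper, not the PALSE implementation, and the toy document keeps only a few tags of the hierarchy above.

```python
import xml.etree.ElementTree as ET


def list_paths(elem, prefix=""):
    """Return the distinct paths from the root, in document order."""
    path = prefix + "/" + elem.tag
    paths = [path]
    for child in elem:
        for sub_path in list_paths(child, path):
            # repeated siblings (e.g. several <item>) yield the same path
            if sub_path not in paths:
                paths.append(sub_path)
    return paths


doc = (
    "<boost_serialization><restrictions><count>2</count>"
    "<item><total_volume>1.0</total_volume></item>"
    "<item><total_volume>2.0</total_volume></item>"
    "</restrictions><total_volume>3.0</total_volume></boost_serialization>"
)
paths = list_paths(ET.fromstring(doc))
print("\n".join(paths))
```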

Computing the average volume of the restrictions requires the following steps:

  • creating an empty database,
  • loading the target XML files into the database,
  • looking for the volume of each restriction,
  • making the average of all volumes.

This is simply done with the following python script:

from SBL import PALSE
from PALSE import *

database = PALSE_xml_DB()

database.load_from_directory("data",".*volumes.xml")

volumes = database.get_all_data_values_from_database("restrictions/item/total_volume", float)

volumes = PALSE_DS_manipulator.convert_listoflists_to_list(volumes)
print(sum(volumes)/float(len(volumes)))

The search path does not contain the global tag <boost_serialization>, since by default, the XML tree is represented by its root.


If the database is partitioned over several subdirectories, it is possible to recursively load all the XML files using the method load_from_directory(input_dir, regex, True)


Since one XML file may list several volumes, and a database may contain several XML files, the method get_all_data_values_from_database always returns a list of lists, with one inner list per XML file. If one does not want to discriminate between the different XML files, the method convert_listoflists_to_list returns a single concatenated list from an input list of lists.


In the example above, the files involved in the database do not obey any specific ordering. As explained below, the function database.sort_databases can be used to sort these files based on a regular expression acting on their names.
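The list-of-lists convention can be mimicked with the standard library alone. The sketch below is not PALSE code: the two XML strings stand in for two hypothetical result files, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET
from itertools import chain

# Two toy archives standing in for two XML files of a database.
files = [
    "<boost_serialization><restrictions>"
    "<item><total_volume>10.0</total_volume></item>"
    "<item><total_volume>20.0</total_volume></item>"
    "</restrictions></boost_serialization>",
    "<boost_serialization><restrictions>"
    "<item><total_volume>30.0</total_volume></item>"
    "</restrictions></boost_serialization>",
]

# One inner list per file; the query is relative to the root Element,
# hence it does not mention <boost_serialization>.
volumes_per_file = [
    [float(e.text) for e in ET.fromstring(xml).findall("restrictions/item/total_volume")]
    for xml in files
]

# Concatenate the inner lists when the file of origin does not matter.
all_volumes = list(chain.from_iterable(volumes_per_file))
average_volume = sum(all_volumes) / len(all_volumes)
print(volumes_per_file, average_volume)
```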


Example Python Script : Average Volume by Increasing Radius

When analyzing the results of a run, the analysis may be driven by the variation of a given parameter. For example, the volume of a molecule can be computed for different sets of atomic radii, in particular with different Van der Waals radii. In this case, one wants to sort the different analyses from PALSE by increasing (or decreasing) Van der Waals radius. When XML files correspond to archives from different runs of the same SBL executable, this is achieved in two steps:

  • adding the option --report-options to the run of the SBL executable, which reports an XML file listing all the options used for the calculations;

  • sorting the database of loaded XML files w.r.t. a given option using the method PALSE_xml_DB::sort_databases.

This is done with the following python script:

from SBL import PALSE
from PALSE import *

database = PALSE_xml_DB()

database.load_from_directory("data",".*volumes.xml")
option_values = database.sort_databases("radius-water", float)

volumes = database.get_all_data_values_from_database_compare_to("restrictions/item/total_volume", float)

for i in range(len(volumes)):
    print("%f : %f" % (option_values[i], volumes[i][0]))

The sort is done over strings by default. To change this behaviour, the second positional argument (or the type argument) can be set to any comparable type.


The names of the XML archives containing all the options of a run of an SBL executable always end with "__options.xml". However, if for any reason the files listing the options end differently, the third positional argument (or the options_suffix argument) can be set to any file suffix.
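The remark on the default string ordering matters as soon as option values have different numbers of digits. The following sketch, which uses plain python lists and made-up run names rather than the PALSE API, contrasts the two orderings.

```python
# Option values as read from XML are strings (illustrative values).
option_values = ["10", "1.4", "9", "0.5"]

# Lexicographic order: "10" comes before "9".
sorted_as_strings = sorted(option_values)

# Numeric order, obtained by comparing the values as floats.
sorted_as_floats = sorted(option_values, key=float)

# Sorting hypothetical run names according to their option value.
runs = {"run_a": "10", "run_b": "1.4", "run_c": "9", "run_d": "0.5"}
runs_by_radius = sorted(runs, key=lambda name: float(runs[name]))

print(sorted_as_strings, sorted_as_floats, runs_by_radius)
```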


Example Python Script : Average area of specific residue type

It is also possible to analyze specific elements amongst the data, such as the volume of a specific residue type, or the volume of a residue identified by its sequence number. To do so, three steps are required:

  • gathering all elements in the database at the desired level; for example, if the analysis undertaken focuses on residues, one has to gather all elements corresponding to residues; this is done with the method PALSE_xml_DB::get_all_elements_from_database that returns one list of elements for each dataset in the database;

  • filtering elements matching a given requirement; for example, assume that one wishes to select elements corresponding to residues of type "GLY", or only in the chain "A"; this is done through the method PALSE_xml_DB::filter_elements_by_data_values_equal_to that returns a filtered sublist of elements from the input list of elements;

  • accessing desired values from the filtered elements; for example, for each filtered residue, one may want to access its identifiers, volume and area; this is done with the method PALSE_xml_DB::get_all_data_values_from_elements that returns a list of values matching the input XPath query for each element in the input list of elements;

    The following script shows how to print the average area for all non-buried glycines using the three methods:

    from SBL import PALSE
    from PALSE import *

    database = PALSE_xml_DB()

    database.load_from_directory("data",".*volume.xml")

    residues = database.get_all_elements_from_database("residues/item")[0]
    gly_residues = database.filter_elements_by_data_values_equal_to(residues, "residue_name", "GLY")

    areas = database.get_all_data_values_from_elements(gly_residues, "area", float)
    areas = [x for x in PALSE_DS_manipulator.convert_listoflists_to_list(areas) if x > 0]
    print(sum(areas)/float(len(areas)))
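The gather / filter / access pattern can be reproduced on a toy tree with the standard library. This is only a sketch of the idea under assumed data: the document below is invented, and list comprehensions stand in for the PALSE methods.

```python
import xml.etree.ElementTree as ET

doc = """
<root><residues>
  <item><residue_name>GLY</residue_name><area>12.0</area></item>
  <item><residue_name>ALA</residue_name><area>30.0</area></item>
  <item><residue_name>GLY</residue_name><area>0.0</area></item>
  <item><residue_name>GLY</residue_name><area>8.0</area></item>
</residues></root>
"""
root = ET.fromstring(doc)

# Step 1: gather all Elements at the residue level.
residues = root.findall("residues/item")
# Step 2: keep the Elements whose residue_name text field equals "GLY".
gly_residues = [e for e in residues if e.findtext("residue_name") == "GLY"]
# Step 3: access the desired data value of each filtered Element.
areas = [float(e.findtext("area")) for e in gly_residues]

# Discard null areas before averaging, as in the script above.
exposed_areas = [a for a in areas if a > 0]
average_area = sum(exposed_areas) / len(exposed_areas)
print(average_area)
```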

Example Python Script : Average Volume by Atom Property

It is also possible to compute the volume statistics of atoms sharing a common property. For example, one may want to know the average volume of carbons depending on the residue containing them. The python script is similar to the one presented in section Example Python Script : Average Volume of Atoms, but one has to collect separately all the carbons of each residue type instead of just collecting all the atoms together. This is done in three steps:

  • collecting all the particles in the database,
  • collecting all the existing residues from the particles,
  • for each such residue, filtering the particles having the same residue name and that are carbons.

This is done with the following python script:

from SBL import PALSE
from PALSE import *

database = PALSE_xml_DB()

database.load_from_directory("data",".*volumes.xml")

particles = database.get_all_elements_from_database("restrictions/item")
particles = PALSE_DS_manipulator.convert_listoflists_to_list(particles)

residue_names = database.get_all_data_values_from_elements(particles, "particle/residue_name")
residue_names = sorted(set(PALSE_DS_manipulator.convert_listoflists_to_list(residue_names)))

for resname in residue_names:
    particles_of_residue = database.filter_elements_by_data_values_equal_to(particles, "particle/residue_name", resname)
    carbons_of_residue = database.filter_elements_by_data_values_equal_to(particles_of_residue, "particle/atom_name", "C")
    volumes = database.get_all_data_values_from_elements(carbons_of_residue, "total_volume", float)
    volumes = PALSE_DS_manipulator.convert_listoflists_to_list(volumes)
    if len(volumes) > 0:
        print(resname, sum(volumes)/float(len(volumes)))
    else:
        print(resname, 0)
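Once the values have been extracted, the same grouping can also be expressed with a dictionary keyed by residue name. The sketch below assumes the data are already available as (residue name, atom name, volume) tuples; the records are invented for illustration.

```python
from collections import defaultdict

# (residue_name, atom_name, total_volume) records, illustrative values.
records = [
    ("ALA", "C", 9.0),
    ("ALA", "CA", 12.0),
    ("GLY", "C", 8.0),
    ("ALA", "C", 11.0),
    ("GLY", "N", 13.0),
]

# Group the volumes of the atoms named "C" by residue name.
volumes_by_residue = defaultdict(list)
for residue_name, atom_name, volume in records:
    if atom_name == "C":
        volumes_by_residue[residue_name].append(volume)

# Average volume per residue name, in alphabetical order.
averages = {
    name: sum(vols) / len(vols)
    for name, vols in sorted(volumes_by_residue.items())
}
print(averages)
```

A single pass over the records replaces one filtering call per residue name, which may be preferable when the number of residue types is large.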

Example Python Script : Distribution of Residue Volumes

PALSE also offers the possibility to plot simple diagrams such as 2D plots or histograms. For example, it is possible to plot the distribution of the volumes of the residues. To do so, the following steps are required:

  • collecting all the particles in the database per input XML file,
  • for each XML file, (i) collecting all the existing residue ids from the particles, (ii) classifying the particles per residue id, and (iii) summing the volumes of the particles of each residue,
  • plotting the histogram of the volumes of the residues.

This is done with the following python script:

from SBL import PALSE
from PALSE import *

database = PALSE_xml_DB()

database.load_from_directory("data",".*volumes.xml")

residue_volumes = []

particles = database.get_all_elements_from_database("restrictions/item")

for particles_per_file in particles:

    chain_identifiers = database.get_all_data_values_from_elements(particles_per_file, "particle/chain_identifier")
    chain_identifiers = sorted(set(PALSE_DS_manipulator.convert_listoflists_to_list(chain_identifiers)))
    for chain in chain_identifiers:
        particles_per_chain = database.filter_elements_by_data_values_equal_to(particles_per_file, "particle/chain_identifier", chain)
        residue_sequence_numbers = database.get_all_data_values_from_elements(particles_per_chain, "particle/residue_sequence_number", int)
        residue_sequence_numbers = sorted(set(PALSE_DS_manipulator.convert_listoflists_to_list(residue_sequence_numbers)))
        for i in residue_sequence_numbers:
            particles_of_residue = database.filter_elements_by_data_values_equal_to(particles_per_chain, "particle/residue_sequence_number", str(i))
            volumes_of_particles = database.get_all_data_values_from_elements(particles_of_residue, "total_volume", float)
            volumes_of_particles = PALSE_DS_manipulator.convert_listoflists_to_list(volumes_of_particles)
            residue_volumes.append(sum(volumes_of_particles))

PALSE_statistic_handle.Rhist2d(residue_volumes, "results/residue_volumes_histogram.ps", 'Volume_of_Residues', 16, 20)
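For readers who want to inspect the distribution without a plotting backend, the two computational steps (summing volumes per residue, then binning) can be sketched in plain python. The particle records and the histogram helper are hypothetical, and the binning rule is a simple equal-width choice that is not claimed to match the one used by the plotting function above.

```python
from collections import Counter, defaultdict

# (chain_identifier, residue_sequence_number, total_volume), made-up data.
particles = [
    ("A", 1, 10.0), ("A", 1, 5.0),
    ("A", 2, 20.0),
    ("B", 1, 30.0), ("B", 1, 2.0),
]

# Sum the volumes of the particles of each residue; a residue is
# identified by its chain and its sequence number.
volume_of_residue = defaultdict(float)
for chain, resnum, volume in particles:
    volume_of_residue[(chain, resnum)] += volume
residue_volumes = sorted(volume_of_residue.values())


def histogram(values, n_bins):
    """Count the values falling into n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid a null width
    # the maximum value is assigned to the last bin
    counts = Counter(min(int((v - lo) / width), n_bins - 1) for v in values)
    return [counts.get(i, 0) for i in range(n_bins)]


bin_counts = histogram(residue_volumes, 2)
print(residue_volumes, bin_counts)
```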

Advanced script : combining Batch_manager, PALSE, and SQLite

Often, the advanced analysis of results requires selecting specific records from the results obtained. Such an endeavor is best done using database queries and selection mechanisms, see e.g. SQLite.

In the context of python coding, sqlite3 provides an interface for SQLite databases.

The following example shows how to combine Batch_manager, PALSE, and SQLite. For the sake of conciseness, advanced selection mechanisms have been omitted, and the reader is referred to standard sources on SQLite and sqlite3 for the description of such mechanisms.

#! /usr/bin/python

#################################################################################
# This example combines three ingredients:
# Step 1. Running a batch of calculations with the batch manager
# Step 2. Retrieving the results with PALSE
# Step 3. Storing the results into a SQLite database
#################################################################################

import os

# Batch manager
from SBL import Batch_manager
from Batch_manager import *

# Palse
from SBL import PALSE
from PALSE import *

# sqlite, see https://docs.python.org/2/library/sqlite3.html
import sqlite3


# We shall record all the output directories of the batch manager
output_dirs = []

# Step 0, pre-processing: preparing a SQLite database. For each PDB file,
# we shall store the total surface area and volume of the asymmetric unit
#################################################################################
def create_sqlite_DB():
    # DB creation
    dbname = "PDB-surface-volume.sqlite"

    # start from a fresh database file
    if os.path.isfile(dbname):
        os.remove(dbname)

    conn = sqlite3.connect(dbname)

    # DB parameterization
    c = conn.cursor()
    c.execute('''CREATE TABLE PDBvorlume (filename text, Asymmetric_unit_surface_area real, Asymmetric_unit_volume real)''')
    return conn


# Step 1. Running a batch of calculations with the batch manager
#################################################################################
def run_batches():
    batch = BM_Batch()
    #batch.load_dataset("pdb", ".*.pdb", True)
    batch.load_dataset("data", ".*.pdb", True)

    batch.load_run_specification("batch-palse-sqlite.spec")
    batches = batch.split_per_NFO()

    for b in batches:
        output_dirs.append(b.get_output_directory())
        b.run()

    print("\n--The following directories were created by the batch manager:", output_dirs)

# We perform steps 2 and 3, that is:
# Step 2. Retrieving the results with PALSE
# Step 3. Storing the results into a SQLite database
#
# by processing the output directories generated by the batch manager
#################################################################################
def process_output_directory(adir, conn):

    palse_DB = PALSE_xml_DB()
    palse_DB.load_from_directory(adir,".*volumes.xml")

    # filenames of the XML files loaded by PALSE
    filenames = palse_DB.filenames

    # retrieving the total surface areas
    areas_ = palse_DB.get_all_data_values_from_database("total_area", float)
    areas = PALSE_DS_manipulator.convert_listoflists_to_list(areas_)

    # retrieving the total volumes
    volumes_ = palse_DB.get_all_data_values_from_database("total_volume", float)
    volumes = PALSE_DS_manipulator.convert_listoflists_to_list(volumes_)

    # prepare the list of tuples
    tuples = []
    for i in range(0, len(areas)):
        t = (filenames[i], areas[i], volumes[i])
        tuples.append( t )

    # adding the tuples into the DB, at once, using a statement
    statement = "insert into PDBvorlume (filename, Asymmetric_unit_surface_area, Asymmetric_unit_volume) values (?, ?, ?)"

    cur = conn.cursor()
    cur.executemany(statement, tuples)
    cur.close()

    # make the insertions persistent in the database file
    conn.commit()

    # Finally, let us make sure the sqlite DB contains all the records
    cur = conn.cursor()

    print("\n--DB content:")
    for row in cur.execute('SELECT * FROM PDBvorlume ORDER BY Asymmetric_unit_volume'):
        print(row)


# Executions
#################################################################################

run_batches()

# create the DB and return the connector to the DB
conn = create_sqlite_DB()

# We process each output directory, in turn
for output_dir in output_dirs:
    process_output_directory(output_dir, conn)

conn.close()
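As a pointer to the selection mechanisms left out of the script, the following self-contained sketch runs a parameterized query with sqlite3. It reuses the table layout of the script above, but works on an in-memory database filled with invented rows, so the filenames and numbers are illustrative only.

```python
import sqlite3

# In-memory database with the same table layout as in the script above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PDBvorlume (filename text, "
    "Asymmetric_unit_surface_area real, Asymmetric_unit_volume real)"
)

# Invented records: (filename, surface area, volume).
rows = [
    ("a__volumes.xml", 100.0, 500.0),
    ("b__volumes.xml", 250.0, 1500.0),
    ("c__volumes.xml", 400.0, 3000.0),
]
conn.executemany("INSERT INTO PDBvorlume VALUES (?, ?, ?)", rows)
conn.commit()

# Select the records whose volume exceeds a threshold, largest first.
query = (
    "SELECT filename, Asymmetric_unit_volume FROM PDBvorlume "
    "WHERE Asymmetric_unit_volume > ? ORDER BY Asymmetric_unit_volume DESC"
)
selected = conn.execute(query, (1000.0,)).fetchall()
print(selected)
conn.close()
```

Passing the threshold through the ? placeholder, rather than formatting it into the SQL string, lets sqlite3 handle the quoting of the value.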

Algorithms and Methods

This section describes more precisely the five different steps used in the PALSE methodology. These five steps are summarized on Fig. fig-palse-workflow.

The PALSE workflow. The five steps: (1) the native data types of the application(s) are first serialized into XML archives, stored into XML files; it is assumed that one file is generated for each (computer) experiment; (2) PALSE loads these XML files into databases of XML trees; (3) raw data are extracted from the databases using XPath queries; (4) these data are prepared (sorted, filtered, etc.); (5) the manipulated data are used to perform various data analyses (statistics, etc.).

Step 1: Raw Data Generation

By raw data we refer to the results of some (computer) experiment; PALSE can be used to handle data generated by any device, or even manually archived. Archives and their (de-)construction are a tenet of PALSE, following the Boost library serialization engines. As specified in the Boost documentation,

we use the term "serialization" to mean the reversible deconstruction of an arbitrary set of C++ data structures to a sequence of bytes. Such a system can be used to reconstitute an equivalent structure in another program context.

That is, an archive refers to a specific rendering of the aforementioned sequence of bytes, and PALSE takes as input XML archives.

For computer experiments, the generation of XML data naturally depends on the programming language used, two of them being of particular interest: C++ and python.

In C++, the Boost Serialization tools accommodate the native C++ types and the data structures of the Standard Template Library.

In python, the recursive structure of dictionaries makes the conversion of a dictionary into an XML tree (and vice-versa) a simple task.
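A minimal sketch of such a conversion, written with the standard library module xml.etree.ElementTree, is given below. The helper names dict_to_element and element_to_dict, as well as the sample dictionary, are hypothetical; the sketch assumes that dictionary keys are valid tag-names and ignores attributes.

```python
import xml.etree.ElementTree as ET


def dict_to_element(tag, value):
    """Recursively convert a (nested) dictionary into an Element."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, sub_value in value.items():
            elem.append(dict_to_element(key, sub_value))
    else:
        elem.text = str(value)  # leaves are stored as text fields
    return elem


def element_to_dict(elem):
    """Inverse conversion; leaf values come back as strings."""
    children = list(elem)
    if not children:
        return elem.text
    return {child.tag: element_to_dict(child) for child in children}


results = {"total_volume": 12.5, "particle": {"residue_name": "GLY", "radius": 1.4}}
root = dict_to_element("results", results)
xml_text = ET.tostring(root, encoding="unicode")
round_trip = element_to_dict(ET.fromstring(xml_text))
print(xml_text)
```

Since a dictionary cannot hold two entries with the same key, repeated sibling tags (such as several item Elements) would require lists on the python side, which this sketch does not handle.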

Step 2: Raw Data Parsing and Database Creation

Consider a set of XML archives, one per (computer) experiment. PALSE imports each such file as an XML tree. A database corresponds to a set of isomorphic trees corresponding to experiments generated with varying parameters. Therefore, several databases may be used to accommodate several sets of parameters, or to store results from different sources. The creation of the trees from the files is delegated to the lxml library (lxml is an easy-to-use library for handling XML and HTML in Python, see the documentation page).
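The idea of building a database from the files of a directory can be sketched as follows. This is not the PALSE loader: the function load_trees is hypothetical, it uses the standard library parser instead of lxml, and it assumes that the regular expression is matched against the file name from its beginning.

```python
import os
import re
import tempfile
import xml.etree.ElementTree as ET


def load_trees(input_dir, regex, recursive=False):
    """Map each matching file path to the root of its XML tree."""
    pattern = re.compile(regex)
    trees = {}
    for dirpath, _dirnames, filenames in os.walk(input_dir):
        for name in sorted(filenames):
            if pattern.match(name):
                path = os.path.join(dirpath, name)
                trees[path] = ET.parse(path).getroot()
        if not recursive:
            break  # only the top-level directory is visited
    return trees


# Demonstration on a temporary directory with one subdirectory.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    for rel in ("a__volumes.xml", os.path.join("sub", "b__volumes.xml"), "notes.txt"):
        with open(os.path.join(tmp, rel), "w") as handle:
            handle.write("<boost_serialization/>")
    flat_count = len(load_trees(tmp, r".*volumes\.xml"))
    recursive_count = len(load_trees(tmp, r".*volumes\.xml", recursive=True))

print(flat_count, recursive_count)
```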

Step 3: Raw Data Querying

Given the XML trees stored in database(s), the retrieval of the values of interest for graphical and statistical analysis combines features of the XML and XPath, as discussed in section The XPath Query Language.

Functions offered in PALSE. The three XPath modes discussed in section The XPath Query Language are used by PALSE functions to query the database, in order to retrieve selected Elements or their data values (either from text fields or attributes):

For data values, if the XPath query ends with a tag, it is a text field. Otherwise, it ends with an attribute and the data value is the value of this attribute.


While queries on data values directly return a list that can be processed with simple statistical tools, queries on Elements return a list that can be used to filter the Elements further. Given a list of Elements $ L $, an XPath query $ X $, a data value $ v $ and a comparator $ comp $, the method SBL::PALSE::PALSE_xml_DB::filter_elements_by_data_values_compare_to ( $ L $, $ X $, $ v $, $ comp $) selects the Elements $ e $ of $ L $ such that the specified XPath $ X $ from $ e $ has a data value $ v_e $ positively compared with $ v $ using $ comp $, a functionality also provided for basic comparators (e.g., SBL::PALSE::PALSE_xml_DB::filter_elements_by_data_values_lower_than ( $ L $, $ X $, $ v $)).
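The comparator-based selection can be sketched with the standard library as follows. The function filter_elements is a hypothetical stand-in written for this illustration, not the PALSE method; it assumes that the data value is a text field and that it can be converted with the supplied conversion function.

```python
import operator
import xml.etree.ElementTree as ET


def filter_elements(elements, xpath, value, comp, convert=float):
    """Keep the Elements e such that comp(data value at xpath, value) holds."""
    selected = []
    for elem in elements:
        text = elem.findtext(xpath)
        # Elements without the requested text field are skipped
        if text is not None and comp(convert(text), value):
            selected.append(elem)
    return selected


doc = """
<root><restrictions>
  <item><total_volume>5.0</total_volume></item>
  <item><total_volume>15.0</total_volume></item>
  <item><total_volume>25.0</total_volume></item>
</restrictions></root>
"""
items = ET.fromstring(doc).findall("restrictions/item")

small = filter_elements(items, "total_volume", 20.0, operator.lt)
large = filter_elements(items, "total_volume", 20.0, operator.gt)
small_volumes = [float(e.findtext("total_volume")) for e in small]
print(small_volumes, len(large))
```

Any callable taking two arguments and returning a boolean can play the role of the comparator, which is what makes a single generic function sufficient.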

We also note that upon calling a function returning Elements, the selection can be converted into strings thanks to the function SBL::PALSE::PALSE_xml_DB::get_data_values_from_elements ( $ L $, $ X $), with $ L $ and $ X $ the arguments described above.

PALSE using BioPython: BioPALSE. PALSE also offers functionality for handling PDB files processed by the BioPython module. There are three main functions for converting PDB files into XML trees:

  • SBL::PALSE::BioPDB_vs_XML_Etree::load_PDB_files_from_directory (input_dir, regex = ""): it returns a database of XML trees corresponding to the list of PDB files in the input directory that matches the input regular expression. Note that if no regular expression is given, all the PDB files in the input directory are treated.

Step 4: Data Manipulation

The data collected by the previous step typically need to undergo processing before being amenable to analysis. Since the previous step supplies python lists of strings, PALSE provides mechanisms to select, sort, and convert these lists into lists of elementary types. Furthermore, PALSE provides tools for combining lists into dictionaries, for further filtering using regular expressions (for strings) or lists of allowed keys or values.
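The kind of processing meant here can be illustrated in plain python, independently of the PALSE helpers. In the sketch below, the two input lists are invented and stand for values retrieved by two queries over the same Elements.

```python
import re

# Lists of strings as they would come out of XML queries (made-up values).
residue_names = ["GLY", "ALA", "GLU", "ALA"]
raw_volumes = ["60.1", "88.6", "138.4", "90.0"]

# Conversion of the strings into an elementary type.
volumes = [float(v) for v in raw_volumes]

# Combination of the two lists into a dictionary: name -> list of volumes.
volumes_by_name = {}
for name, volume in zip(residue_names, volumes):
    volumes_by_name.setdefault(name, []).append(volume)

# Filtering of the keys with a regular expression: names starting with G.
g_only = {k: v for k, v in volumes_by_name.items() if re.match(r"G", k)}

# Filtering with a list of allowed keys.
allowed = ["ALA"]
ala_only = {k: v for k, v in volumes_by_name.items() if k in allowed}

print(g_only, ala_only)
```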

Step 5: Data Analysis

The lists just produced are ready for graphical and statistical analysis. PALSE encapsulates functionalities from SciPython and R for computing Pearson, Spearman or Kendall's tau correlations between the collected values, as well as the Mann-Whitney rank-sum test. More generally, any package interfaced with python can be used, e.g. gmpy if unlimited-precision integers / rationals / floats must be used to ensure numerical correctness, or gnuplot to generate eye-candy plots and charts, etc.
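To recall what two of these statistics measure, here is a dependency-free sketch of the Pearson and Spearman correlation coefficients. It is written from the textbook definitions rather than taken from PALSE, the function names are hypothetical, and the ranking helper assumes that there are no tied values.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def ranks(values):
    """Rank of each value, starting at 1 (ties are not handled)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = float(rank)
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


radii = [1.0, 2.0, 3.0, 4.0]
areas = [1.0, 4.0, 9.0, 16.0]  # monotone but non-linear in the radii
r_pearson = pearson(radii, areas)
r_spearman = spearman(radii, areas)
print(r_pearson, r_spearman)
```

On such monotone but non-linear data, the rank-based coefficient reaches 1 while the Pearson coefficient stays below it, which is the usual reason for reporting both.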

Programming
Here is a small example of how to serialize a simple data structure into an archive:
//nvp stands for name-value pair: it attaches a name to a
//serializable value. In this context, it is used for decomposing the
//serialization of a data structure
#include <boost/serialization/nvp.hpp>

//type of the archive file of the serialization: here, it is an output
//xml file (we will save the serialization in this file) 
#include <boost/archive/xml_oarchive.hpp>

#include <fstream> 
#include <iostream>

//Small data structure that we want to serialize
struct Data_type
{
  int index;
};

//Definition of the serialize method for our small data structure. It
//takes an archive and loads or saves the serializable fields of the data
//structure, as described in the method. The symbol "&" means that the
//operation is the same for loading or saving the data structure.
namespace boost
{
  namespace serialization
  {
    template <class Archive>
    void serialize(Archive& ar, Data_type& data, const unsigned int version)
    {
      ar & boost::serialization::make_nvp("index", data.index);
    }
  }
}

//The main method creates an object of type Data_type, and then saves
//it in a serialized xml archive.
int main() 
{ 

  Data_type data; data.index = 1;

  //first, open the file, then open the file as an output xml archive.
  std::ofstream out("test.xml");
  boost::archive::xml_oarchive* ar = new boost::archive::xml_oarchive(out);

  //save the data in the archive
  *ar << boost::serialization::make_nvp("data", data);

  //first close the archive, then close the file (the order is
  //important, since some additional information is dumped in the file
  //when closing the archive)

  delete ar;
  out.close();

  return 0;
}
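On the python side, an archive of this kind can be read back with any XML parser. The sketch below uses the standard library and an XML string that mimics the body of the file test.xml; the exact attributes written by Boost (signature, version, class_id, tracking_level) depend on the Boost version and are given here as assumptions, and the XML declaration and DOCTYPE lines are omitted.

```python
import xml.etree.ElementTree as ET

# Assumed shape of the archive written by the C++ program above.
archive = """
<boost_serialization signature="serialization::archive" version="17">
<data class_id="0" tracking_level="0" version="0">
    <index>1</index>
</data>
</boost_serialization>
"""
root = ET.fromstring(archive)

# The names given to make_nvp ("data", "index") become the tag-names.
root_tag = root.tag
index = int(root.find("data/index").text)
print(root_tag, index)
```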