Structural Bioinformatics Library Template C++ / Python API for developping structural bioinformatics applications.
User Manual

Authors: F. Cazals and T. Dreyfus and S. Loriot

# Goals: Computing Surfaces, Volumes, and Packing Properties

Surface areas and volumes of molecular models are fundamental quantities in structural bioinformatics. Two widely used models are the van der Waals model and the solventaccessible models". The programs of Space_filling_model_surface_volume compute the surface area and the volume of a model defined by a union of balls, following the two aforementioned models. The underlying algorithm partitions the volume into the restrictions of the balls to their respective Voronoi cells [32] , exploiting properties of the -complex of the balls. Two points need to be stressed.

Surface area and volume computations are numerically challenging, due to the geometric constructions involved . Unlike all previous approaches which have systematically used fixed precision floating point (FP) calculations, programs of Space_filling_model_surface_volume use advanced numerics (interval arithmetic and calculations on degree two algebraic numbers) to certify the surface areas and volumes reported. That is, all results are returned in the form of an interval of FP numbers, which is certified to contain the true result. This strategy has been used to shown that (i) calculations based solely on FP numbers systematically incur errors — of typically 20%, and (ii) that the running time overhead of applications of Space_filling_model_surface_volume is of about two.

The computation of restrictions being based on the -complex of the balls processed, it is possible to compute along the way information on cavities of the model.

To recap, the main features of Space_filling_model_surface_volume are as follows:

• Calculation of surface areas and volumes for a whole molecular model, or on a per ball basis – see package Union_of_balls_surface_volume_3. For the particular case of an atomic model of a protein, surface areas and volumes are also reported on a per residue level.

• Calculation of the Betti numbers of the union of balls, which respectively characterize the number of connected components, the number of tunnels, and the number of cavities (i.e. pockets inside the molecule) – see package Betti_numbers.

• Dissection of the surface area into its connected components. In particular, for a connected molecule, the surface areas of the cavities are returned – see package Union_of_balls_surface_volume_3.

• Computation of the volume of the cavities, if any – see package Union_of_balls_surface_volume_3.

 A union of five balls in 3D. This union decomposes into five restrictions, each bounded by spherical caps and planar faces. The 1-skeleton of the restrictions are represented by green curves.
The software Vorlume used to be distributed as a stand-alone program. It now embedded in the Space_filling_model_surface_volume application of the SBL .

# Pre-requisites

We consider a collection of balls, namely . We assume that the coordinates of the center, and the radius of each ball are rational numbers. We wish to compute the volume of the domain , as well as the surface area of its boundary .

We denote the Voronoi region of ball in the Voronoi (power) diagram of the balls, and we define the restriction of the ball as its intersection with its Voronoi region, that is . A restriction is termed exposed if it contributes to the boundary of the union.

As proved in [32] and illustrated on Fig. Restrictions, it is sufficient to compute restrictions to report surfaces and volumes:

• The volume of the union of balls is the sum of the volumes of the restrictions.
• The boundary of the union is the union of the boundaries contributed by the restrictions.

These facts also account for the difficulty of computing surfaces and volumes, from a numerical standpoint. To understand why, it should be observed that two types of points are found on the boundary of a restriction:

• points found at the intersection of three Voronoi facets. Such points are weighted circumcenters of the Delaunay triangulation of the input balls. These points have rational coordinates if one assumes that the input balls have a rational specification.
• points on found at the intersection of three spheres. The coordinates of these points are degree two algebraic numbers.

The coordinates of these points appear in the formulae for the volume and surface area of a restriction, whence the need to compute them accurately. In a nutshell, these points can be handled in two ways, namely using interval arithmetic or using exact number types (rational numbers or algebraic numbers). The former is a fast strategy providing coarse approximations of the coordinates, while the latter provides more precise approximations, at the cost of more intensive calculations. Combining these strategies, we define the following three levels of precision:

• 1: approximating boundary points and weighted circumcenters using interval arithmetic;
• 2: approximating boundary points from their exact counterparts and weighted circumcenters using interval arithmetic;
• 3: approximating boundary points and weighted circumcenters from their exact counterparts.

In any case, a surface area or a volume is returned as an interval certified to contain the exact unknown value. See [32] for the typical width of these intervals.

# Using Vorlume: Computing Surface Area and Volume of a Molecule

## Main specifications

Input. The main input of is a molecular structure loaded from a PDB file. In this case, the ESBTL library is used for deducing the set of balls corresponding to the molecule. Various options are available for tunning the set of balls representing the molecule (see the class SBL::Models::T_PDB_file_loader). Note that a default radius of is added to all atoms to define the Solvent Accessible Model of the input molecule.

Main options.

The main options of the program are:
-f string: PDB file of the molecule to coarse grain
–boundary-viewer string(=none): create a file for visualizing the boundary arcs with vmd or pymol
-E: switch to more exact but longer calculations

Main output.

The files generated are:

• Main log file: suffix log.txt
• VMD file displaying the boundary of the union of balls (points, circle arcs): suffix boundary.vmd
• Surface areas and volumes on a per ball restriction basis: suffix surface_volumes.xml
• Surface areas and volumes added up on a per residue basis: suffix residues_volume.xml
• Statistics on the alpha complex (Euler characteristic, Betti numbers): suffix alpha_complex.xml

## Visualization

For visualizing the halfedges of the boundary, we recommend you to install VMD (see the VMD web site): first load the input PDB file, then load the output visualization state files. The loading of visualization state files may be long, but is a lot faster using the fastload VMD script delivered with the library (in the scripts/vmd directory).

# Using Vorlume: Computing Surface Area and Volume of a Family of 3D Balls

## Input : Specifications and main options

Input. The main input of is a collection of balls from which the (power) Voronoi diagram will be computed. The file format used is a text file listing the balls (x y z radius).

Main options.

The main options of the program are:
-f string: file listing the input family of 3D balls
–boundary-viewer string(=none): create a file for visualizing the boundary arcs with vmd or pymol
-E: switch to more exact but longer calculations

Main output. Identical to the PDB mode – except that the file on a per residue basis is not reported.

Remarks.

The option -E switches to a computation mode using an exact representation for the intersection points on the boundary of the union of balls – instead of using a representation based on interval arithmetic. This results in more precise surface areas and volumes, but a more time consuming computation–see package Union_of_balls_surface_volume_3 for more details.

# Visualization, Plugins, GUIs

The SBL provides VMD and PyMOL plugins to use the programs of Space_filling_model_surface_volume . The plugins are accessible in the Extensions menu of VMD or in the Plugin menu of PyMOL . Upon termination of a calculation launched by the plugin, the following visualizations are available:

• (for VMD only) the edges (circle arcs) found on the boundary of the union of balls representing the atoms.

The new command "sbl_vorlume" is also defined: given a selection of atoms, it returns the volume and the solvent accessible surface area of this selection. For the sake of illustration, let us assume that the selection consists of all atoms. For VMD, the syntax is:

> sbl_vorlume [atomselect top "all"]


For PyMOL, the syntax is:

> sbl_vorlume("all")


# Programmer's Workflow

The programs of Space_filling_model_surface_volume described above are based on generic C++ classes, so that additional versions can easily be developed.

In order to derive such versions, there are two important ingredients: the workflow class and the traits class.

## The Traits Class

This class defines the main types used in the modules of the workflow. It is templated by the classes of the concepts required by these modules. This design makes it possible to use the same workflow within different(biophysical) contexts to make new programs. To use the workflow T_Space_filling_model_surface_volume_workflow , one needs to define:

• what is a particle, and how to represent it,
• how to load particles from an input file,
• how to annotate the particles,
• how to iterate over the set of particles,
• how to build a particle if non-trivial steps are required (e.g., building pseudo-atoms from residues in a protein).
Template Parameters
 ParticleTraits Model of the concept ParticleTraits. MolecularGeometryLoader Follow the concept MolecularGeometryLoader and can be either SBL::Models::T_PDB_file_loader when the input is a PDB file, or SBL::Models::T_Spheres_3_file_loader when the input is a file listing spheres. ParticleAnnotator Follow the concept ParticleAnnotator. ParticlesContainer Type of container for particles used in the -complex module. This is used in particular for iterating either over particles inserted in this container, or over an existing container that needs to be wrapped by this data structure. The type ParticlesContainer just needs to define the methods begin() and end(), and the method push_back(const_reference). ParticlesBuilder Functor building the input particles, as defined in the concept ParticleTraits.

# Jupyter demo

See the following jupyter demos:

• Jupyter notebook file
• jupyter

# Computing Surfaces, Volumes, and Packing Properties¶

## Run the calculation¶

We proceed as follows, using as an example PDB id 1vfb.pdb:

• Call the sbl executable to perform the calculation
• Dump the main log file, which contains human readble information on the calculation: number of atoms processed, total surface area of exposed atoms, total volume of the molecular model, etc.

Note that statistics on a per atom basis are dumped in an xml file. A finer exploitation of these statistics on a per atom basis is provided below.

In [5]:
import re  #regular expressions
import sys #misc system
import os
import pdb
import shutil # python 3 only

#input filename  and output directory
ifname = ".././../../demos/data/1vfb.pdb"
odir = "results"
if not os.path.exists(odir):
os.system("mkdir results")

# check executable exists and is visible
exe = shutil.which("sbl-vorlume-pdb.exe")
print( ("Using executable %s\n" % exe) )

# run command
cmd = "sbl-vorlume-pdb.exe -f %s --directory %s --verbose --output-prefix --log --boundary-viewer vmd" % (ifname,odir)
os.system(cmd)

# list output files
cmd = "ls %s" % odir
print("\nAll output files:",ofnames)

# find the lof file and display log file
cmd = "find %s -name *log.txt" % odir
print("\nLog file is:", lfname)
# for line in log:         print(line.rstrip())


Using executable /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-vorlume-pdb.exe

Log file is: results/sbl-vorlume-pdb__radius_water_1dot4__f_1vfb__alpha_0__sample_eps_1__log.txt


## Extract statistics on a per atom basis, with PALSE¶

We now use the package PALSE , see https://sbl.inria.fr/doc/PALSE-user-manual.html , to collect all volumes and surface areas, on a per atom basis.

In [6]:
from SBL import PALSE
from PALSE import *

# create a DB of xml output files
database = PALSE_xml_DB()

# from the XML files, retrive the  volume of atomic restrictions and compute the average
volumes = database.get_all_data_values_from_database("restrictions/item/total_volume", float)
volumes = PALSE_DS_manipulator.convert_listoflists_to_list(volumes)
ave_volume = sum(volumes)/float(len(volumes))
print( ("Average volume of atomic restrictions: %s" % ave_volume) )

# from the XML files, retrive the  surface areas of atomic restrictions and compute the average
areas = database.get_all_data_values_from_database("restrictions/item/total_area", float)
areas = PALSE_DS_manipulator.convert_listoflists_to_list(areas)
ave_area = sum(areas)/float(len(areas))
print( ("Average surface area of atomic restrictions: %s" % ave_volume) )

XML: 1 / 1 files were loaded

Average volume of atomic restrictions: 23.2299597337172
Average surface area of atomic restrictions: 23.2299597337172


## Plot the distribition of surface areas of atomic restrictions¶

The list of areas can be used to inspect the distribution of surface areas on a per atom basis:

In [7]:
import matplotlib.pyplot as plt
import numpy as np

plt.hist(areas, density=1, bins=30)
plt.xlabel('Areas')
plt.ylabel('Probability')

Out[7]:
Text(0,0.5,'Probability')

## Plot the distribution of volumes of atomic restrictions¶

Likewise, one can plot the distribution of volumes of Voronoi restrictions:

In [8]:
plt.hist(volumes, density=1, bins=30)
plt.xlabel('Areas')
plt.ylabel('Probability')

Out[8]:
Text(0,0.5,'Probability')