Structural Bioinformatics Library
Template C++ / Python API for developing structural bioinformatics applications.
Data Models

We present various data types used within the SBL.

PDB models

The PDB format is widely used. The following molecular systems are used in several places, for testing / illustration purposes:

Force fields and potential energies

The potential energy of a molecular system is defined from a force field. In the SBL, all force fields are handled in a coherent way – see package Molecular_potential_energy .

Force field parameters are specified in an XML file – see the FFXML format from the OpenMM library. We provide XML files for the force fields used within the SBL:

Space Filling Models

Space filling models are defined from unions of balls; see Terminology and Concepts and the corresponding applications in the work package Space Filling Models.

The following are used to define / illustrate such data.

Radii for space filling models

The atomic group radii define different sets of radii for the atoms. They are used as annotations of atoms from the package ParticleAnnotator and are particularly important when particles (atoms or residues) are represented by 3D balls. The SBL provides three particular sets of group radii borrowed from ESBTL:

3D Balls

A number of algorithms are purely geometric and can be used without any biophysical semantics. For these cases, the following family of balls has been used for testing purposes:

Conformational Analysis

Conformational analysis focuses on the space describing molecular conformations, and the associated energy landscapes – see Terminology and Concepts.

In the following, we provide landscapes defined by mathematical functions, and also data related to specific molecular systems. The interested user is also referred to the comprehensive Cambridge Energy Landscape Database.

Example Landscapes defined from mathematical functions

This section briefly presents selected landscapes used to test our sampling algorithms.

Himmelblau

As a simple illustration, we use the Himmelblau function, see Fig. fig-himmelblau and also Himmelblau's function on Wikipedia.

The Himmelblau function

Function value. The function value is a degree four bivariate polynomial:

$ f(x,y) = (x^2 + y - 11)^2 + (x+y^2-7)^2. $

Gradient. Easily computed by hand.
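
As an illustration, the function value and its hand-derived gradient can be written as the following minimal Python sketch. The function names are chosen here for illustration and are not part of the SBL API; the gradient follows from the chain rule applied to the two squared terms.

```python
def himmelblau(x, y):
    """Himmelblau function value at (x, y)."""
    return (x**2 + y - 11.0) ** 2 + (x + y**2 - 7.0) ** 2


def himmelblau_gradient(x, y):
    """Analytic gradient (df/dx, df/dy), obtained with the chain rule."""
    a = x**2 + y - 11.0  # inner expression of the first squared term
    b = x + y**2 - 7.0   # inner expression of the second squared term
    return (4.0 * x * a + 2.0 * b, 2.0 * a + 4.0 * y * b)
```

The point (3, 2) is one of the minima of the function: both inner expressions vanish there, so the value and the gradient are zero.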

Move set to generate a new sample. To generate a new conformation, one picks a random conformation at a predefined distance $\delta$ from the current sample.
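
In the 2D case, such a move can be sketched as follows. The text only states that the new sample lies at distance $\delta$; drawing the direction uniformly at random is an assumption of this sketch, and the function name is hypothetical.

```python
import math
import random


def move_on_circle(x, y, delta, rng=random):
    """Return a point at Euclidean distance delta from (x, y).

    Assumption: the direction is drawn uniformly in [0, 2*pi).
    """
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return x + delta * math.cos(theta), y + delta * math.sin(theta)
```

Passing a seeded `random.Random` instance as `rng` makes the sampling reproducible.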

Rastrigin

The Rastrigin function is a classical non-convex function used in optimization benchmarks.

Function value. Denoting $X$ the d-dimensional vector of coordinates, the function value is defined as follows:

$ f(X) = A d + \sum_{i=1,\dots,d} (x_i^2 - A \cos(2\pi x_i)). $

Gradient. Easily computed by hand.
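
A minimal Python sketch of the function value and of its gradient follows. The default $A = 10$ is the value commonly used in optimization benchmarks; it is an assumption here, since the text does not fix $A$. The function names are illustrative only.

```python
import math


def rastrigin(X, A=10.0):
    """Rastrigin function value for a d-dimensional point X (A=10 is assumed)."""
    d = len(X)
    return A * d + sum(x * x - A * math.cos(2.0 * math.pi * x) for x in X)


def rastrigin_gradient(X, A=10.0):
    """Analytic gradient: each coordinate contributes 2 x_i + 2 pi A sin(2 pi x_i)."""
    return [2.0 * x + 2.0 * math.pi * A * math.sin(2.0 * math.pi * x) for x in X]
```

The global minimum is at the origin, where the value is zero and the gradient vanishes.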

Move set to generate a new sample. To generate a new conformation, one picks a random conformation at a predefined distance $\delta$ from the current sample.

The trigonometric terrain

The trigonometric terrain is a more complex function [152], challenging exploration algorithms in the 2D case, see Fig. fig-termino-trigo-terrain .

The Trigonometric terrain function

Function value. The function value is:

$ f(x,y) = (x\sin(20y)+y\sin(20x))^2\cosh(\sin(10x)x)+(x\cos(10y)-y\sin(10x))^2\cosh(\cos(20y)y) $

Gradient. Computed by hand.
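
Since the hand-derived gradient of this function is lengthy, a central finite-difference approximation is a convenient way to check it. The sketch below implements the function value and such a numerical check; it is not the analytic gradient used in the SBL, and the step `h` is an arbitrary choice.

```python
import math


def trig_terrain(x, y):
    """Trigonometric terrain function value at (x, y)."""
    t1 = (x * math.sin(20.0 * y) + y * math.sin(20.0 * x)) ** 2
    t2 = (x * math.cos(10.0 * y) - y * math.sin(10.0 * x)) ** 2
    return (t1 * math.cosh(math.sin(10.0 * x) * x)
            + t2 * math.cosh(math.cos(20.0 * y) * y))


def numerical_gradient(f, x, y, h=1e-6):
    """Central finite-difference approximation of (df/dx, df/dy)."""
    dfdx = (f(x + h, y) - f(x - h, y)) / (2.0 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2.0 * h)
    return dfdx, dfdy
```

Comparing `numerical_gradient(trig_terrain, x, y)` with a hand-derived gradient at a few random points is a quick sanity check before using the latter in a sampler.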

Move set to generate a new sample. To generate a new conformation, one picks a random conformation at a predefined distance $\delta$ from the current sample.

Specific molecular systems

BLN69

Model. To illustrate our sampling algorithms, we use a 69-residue BLN model protein [31], whose landscape has been extensively sampled [174], [129].

The BLN model represents each protein residue as one of 3 types of beads, namely hydrophobic (B), hydrophilic (L) and neutral (N).

Potential energy. The potential energy of the BLN69 model is given by:

\begin{align} V = \frac{1}{2} K_r \sum_{i}^{N-1} (R_{i,i+1} - R_\text{e})^2 + \frac{1}{2} K_\theta \sum_{i}^{N-2} (\theta_{i} - \theta_\text{e})^2 \\ + \epsilon \sum_{i}^{N-3} [A_i (1 + \cos \phi_{i}) + B_i(1 + \cos 3\phi_{i})] \\ + 4 \epsilon \sum_{i}^{N-2} \sum_{j=i+2}^{N} C_{i,j} [ (\frac{\sigma}{R_{i,j}})^{12} - D_{i,j} (\frac{\sigma}{R_{i,j}})^6 ]. \end{align}

Note that the first three terms are bonded terms, while the fourth is the non-bonded term (Lennard-Jones potential). Parameter definitions and values are as specified in [174].
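
To make the index ranges of the fourth term concrete, here is a minimal Python sketch of the non-bonded contribution only. The coefficient tables `C` and `D` are passed in by the caller; the defaults `epsilon=1.0` and `sigma=1.0` are placeholders for illustration and are not the values of [174]. The function name is hypothetical.

```python
import math


def bln_nonbonded_energy(coords, C, D, epsilon=1.0, sigma=1.0):
    """Non-bonded (Lennard-Jones like) term of the BLN potential.

    coords: list of (x, y, z) bead positions.
    C, D:   square tables of pair coefficients, indexed from 0.
    epsilon, sigma: placeholder values, to be replaced by those of the model.
    """
    n = len(coords)
    total = 0.0
    for i in range(n - 2):             # i = 1, ..., N-2 in the 1-based formula
        for j in range(i + 2, n):      # j = i+2, ..., N: skips bonded neighbours
            r = math.dist(coords[i], coords[j])
            s6 = (sigma / r) ** 6
            total += C[i][j] * (s6 * s6 - D[i][j] * s6)
    return 4.0 * epsilon * total
```

With three beads, only the pair formed by the first and third beads contributes; with unit coefficients and a separation of $2^{1/6}\sigma$, the term takes the Lennard-Jones minimum value $-\epsilon$.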

Gradient. To compute the gradient of expression eq-bln-potential, we use the automatic differentiation tool Tapenade [95].

Move sets to generate new conformations. A move set is an elementary operation that generates a new conformation from a given one, typically at a predefined distance $\delta$ called the step size. Designing move sets for condensed matter in general and proteins in particular is a topic in itself, as one wishes to avoid useless conformations (e.g., those with steric clashes).

Three classical movesets, illustrated here for BLN69, are the following ones:

  • global moveset: the new conformation is chosen uniformly at random on the sphere of radius $\delta$ centered on the current conformation.

  • interpolation moveset: the new conformation is chosen (uniformly at random) on the line-segment joining two conformations.

  • atomic moveset: each atom is moved to a point on a sphere centered on its current location.
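
The first two move sets can be sketched in a few lines of Python, with a conformation stored as a flat list of coordinates. This is an illustration under stated assumptions rather than the SBL implementation: the function names are hypothetical, and the uniform direction is obtained by normalizing a vector of independent Gaussian samples, which is a standard way to sample a sphere in any dimension.

```python
import math
import random


def global_move(conf, delta, rng=random):
    """New conformation, uniform on the sphere of radius delta centered on conf."""
    direction = [rng.gauss(0.0, 1.0) for _ in conf]
    norm = math.sqrt(sum(g * g for g in direction))
    return [c + delta * g / norm for c, g in zip(conf, direction)]


def interpolation_move(conf_a, conf_b, rng=random):
    """New conformation, uniform on the line segment joining conf_a and conf_b."""
    t = rng.random()  # interpolation parameter in [0, 1)
    return [a + t * (b - a) for a, b in zip(conf_a, conf_b)]
```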

For the sake of clarity, let us detail the atomic move set. Let $N$ denote the number of pseudo-atoms of the BLN model, and let $\eps=\delta/\sqrt{N}$. Denoting $(x_i,y_i,z_i)$ the coordinates of the $i$th atom, the new coordinates are generated uniformly at random on the sphere of radius $\eps$ centered at $(x_i,y_i,z_i)$. That is, with $u$ and $z$ uniform random numbers in $[0,1]$ and $[-1,1]$ respectively:

\begin{equation} \begin{cases} x^{'}_i &= x_i + \eps \sqrt{1-z^2} \cos 2\pi u,\\ y^{'}_i &= y_i + \eps \sqrt{1-z^2} \sin 2\pi u,\\ z^{'}_i &= z_i + \eps z. \end{cases} \end{equation}

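Note that when applying such a move set, each atom is displaced by $\eps$, so the Euclidean distance between the old and the new conformations, seen as points in $\mathbb{R}^{3N}$, is equal to $\delta$ (equivalently, the root mean square displacement per atom is $\delta/\sqrt{N}$).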
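
The atomic move set can be sketched as follows in Python. The function name is hypothetical. The random number in $[-1,1]$ is named `w` in the code to avoid a clash with the coordinate $z_i$, and the third coordinate is displaced by $\eps$ times this random number, so that each displacement lies on the sphere of radius $\eps$.

```python
import math
import random


def atomic_move(coords, delta, rng=random):
    """Move each atom to a uniform random point on a sphere of radius eps.

    coords: list of (x, y, z) tuples; eps = delta / sqrt(N), N = len(coords).
    """
    eps = delta / math.sqrt(len(coords))
    new_coords = []
    for (x, y, z) in coords:
        u = rng.uniform(0.0, 1.0)
        w = rng.uniform(-1.0, 1.0)      # the random number called z in the text
        r = math.sqrt(1.0 - w * w)      # radius of the circle at height w
        new_coords.append((x + eps * r * math.cos(2.0 * math.pi * u),
                           y + eps * r * math.sin(2.0 * math.pi * u),
                           z + eps * w))
    return new_coords
```

Each atom moves by exactly $\eps$, so the norm of the full displacement vector is $\eps\sqrt{N} = \delta$.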

Example conformation of the BLN69 model
The three types of beads are represented as follows: hydrophobic (B) in red, hydrophilic (L) in blue, and neutral (N) in green. Note the formation of a hydrophobic core.

Transition graph connecting minima and saddles.

Conformations are central in applications from the Conformational Analysis group. Knowing the minima and saddles of the potential energy landscape defined over the conformational space is important for testing the algorithms. A set of 458082 minima linked by 378913 saddles of BLN69 is available here. They were originally generated by Wales using a basin hopping algorithm. The archive is composed of 5 files:

  • Wales_minima_conformations.txt : the conformations of the minima in the Point_d format;
  • Wales_minima_energies.txt : the energy of each minimum, in the same order as the conformations;
  • Wales_transitions_conformations.txt : the conformations of the saddles in the Point_d format;
  • Wales_transitions_energies.txt : the energy of each saddle, in the same order as the conformations;
  • Wales_graph.txt : the pairs of indices of minima (starting at 0) that are linked by the saddles, in the same order as the saddle conformations.
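
A possible way to read these files in Python is sketched below. The exact file layout is an assumption of this sketch and should be checked against the archive: each conformation is assumed to occupy one line written as the dimension followed by the coordinates, each energy file is assumed to hold one value per line, and the graph file is assumed to hold two whitespace-separated indices per line. All function names are hypothetical.

```python
def parse_point_d(line):
    """Parse one conformation, assumed to be 'd c_1 ... c_d' on a single line."""
    tokens = line.split()
    dim = int(tokens[0])
    coords = [float(t) for t in tokens[1:]]
    if len(coords) != dim:
        raise ValueError("expected %d coordinates, got %d" % (dim, len(coords)))
    return coords


def parse_energies(lines):
    """Parse one energy per non-empty line (assumed layout)."""
    return [float(line) for line in lines if line.strip()]


def parse_graph(lines):
    """Parse one pair of 0-based minimum indices per non-empty line (assumed layout)."""
    return [tuple(int(t) for t in line.split()[:2]) for line in lines if line.strip()]


def saddles_of_minimum(edges, minimum_index):
    """Indices of the saddles connected to a given minimum."""
    return [k for k, (i, j) in enumerate(edges) if minimum_index in (i, j)]
```

Because the saddle files and the graph file share the same order, the index returned by `saddles_of_minimum` can be used directly to look up the conformation and the energy of the corresponding saddle.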