Structural Bioinformatics Library
Template C++ / Python API for developing structural bioinformatics applications.
Installation Guide

Tutorial guiding the installation of the library.

Introduction

There are three ways to use the SBL:

  • Using the SBL: Conda based packaging of the whole library.
  • Using the SBL: pre-compiled static programs and plugins for VMD and PyMOL.
  • Using the SBL: (advanced) compilation and installation.

Using the SBL: Conda based packaging

Conda and the SBL channel

Conda and Miniconda. Conda is a cross-platform, multi-language environment management system, see Conda.
There are several distributions, including Anaconda and Miniconda. The latter only ships the package management system, which is sufficient for our purposes. Thus, visit the Miniconda download page and install Miniconda for your operating system. In the sequel, we assume that the Conda directory created is named miniconda2.

Conda channels and the SBL channel.

To distribute an environment, one creates a Conda channel, from which packages are distributed.

  • The channel for the Structural Bioinformatics Library is SBL.

Note that the web page of a package indicates which operating systems are supported.

Installing the SBL from the Conda channel

Conda allows one to create local environments. A local environment has all required resources (libraries and dependencies in particular) to compile and run executables from a library. It may be seen as a virtual environment, except that all files are located in one's Conda directory. Two convenient features are the following:

  • A local environment is created in one's Conda directory. That is, no root privileges are required.
  • The number of local environments is arbitrary, which is convenient to handle different versions of libraries in particular.

This said, the main commands are as follows (see also the Conda package management commands):

  • To create an environment for the SBL:
    conda create -n sbl
    
  • To activate / deactivate the sbl environment:
    conda activate sbl
    conda deactivate
    
  • To install a package available from the SBL channel, e.g. the sbl package:
    conda install -c sbl sbl
    
  • For those interested in running applications dealing with sequence alignments, which use HMMER, Muscle, and Clustal-Omega, add the biobuilds channel:
    conda install -c sbl sbl -c biobuilds
    
  • Finally, all resources can be accessed as follows:
    cd miniconda2/envs/sbl/
    

The main resources are:

  • Documentation
    File miniconda2/envs/sbl/share/doc/SBL/html/index.html
    
  • Executables
    In directory miniconda2/envs/sbl/bin
    
  • Include files for development:
    In directory miniconda2/envs/sbl/include/SBL
    
On Linux-like systems, activating (resp. deactivating) the environment updates the PATH environment variable to expose (resp. hide) the executables, and also sets environment variables prefixed by CONDA.
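To locate these resources from a script, the paths above can be derived from the Conda root. The following is a minimal sketch, assuming Miniconda was installed as ~/miniconda2 and the environment is named sbl as in this guide; sbl_env_resources is a hypothetical helper, not part of the SBL.

```python
from pathlib import Path


def sbl_env_resources(conda_root, env_name="sbl"):
    """Return the main SBL resource locations of a Conda environment.

    Assumes the layout listed above: <conda_root>/envs/<env_name>/...
    """
    env = Path(conda_root) / "envs" / env_name
    return {
        "documentation": env / "share" / "doc" / "SBL" / "html" / "index.html",
        "executables": env / "bin",
        "includes": env / "include" / "SBL",
    }


if __name__ == "__main__":
    # Assumption: Miniconda was installed as ~/miniconda2, as in this guide.
    for name, path in sbl_env_resources(Path.home() / "miniconda2").items():
        status = "found" if path.exists() else "missing"
        print(f"{name:<13} {path} ({status})")
```

The includes entry is the directory to add to one's compiler include path when developing against the Conda-installed headers.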


Using the SBL: pre-compiled static programs and plugins


It is also possible to install the library using the script sbl-plugins-installer-unix.py (works on Linux and macOS), available from the Applications page of the website.

This script:

  • (i) clones the SBL,

  • (ii) downloads the pre-compiled static programs,

  • (iii) installs the VMD plugins (in ~/sblvmdplugins) and the PyMOL plugins (in ~/.pymol/startup/SBL) in local directories, allowing one to use the SBL programs through a GUI.

The option --help prints all available options of this script. If the SBL is not already installed, the following command performs tasks (i), (ii) and (iii) on a Linux platform:

> python sbl-plugins-installer-unix.py --sbl-dir=</path/to/target/install/dir> --sbl-install=clone --platform=linux

If the SBL is already installed, the following command performs tasks (ii) and (iii) on a Linux platform:

> python sbl-plugins-installer-unix.py --sbl-dir=</path/to/target/install/dir> --sbl-install=pull --platform=linux

On macOS, just replace linux by macos. For PyMOL, if the local plugins directory has never been used, it must first be registered within the GUI: Plugin > Settings > Add new directory, select the directory ~/.pymol/startup, and then restart PyMOL.
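The choice between clone and pull, and between linux and macos, can be captured in a small helper when the installer is driven from another script. This is a sketch only: installer_command is a hypothetical name, and it uses solely the three options shown in the commands above.

```python
import shlex


def installer_command(target_dir, already_installed=False, platform="linux"):
    """Build the sbl-plugins-installer-unix.py command line shown above.

    clone: tasks (i), (ii), (iii); pull: tasks (ii), (iii) for an existing SBL.
    """
    if platform not in ("linux", "macos"):
        raise ValueError(f"unsupported platform: {platform!r}")
    mode = "pull" if already_installed else "clone"
    return [
        "python", "sbl-plugins-installer-unix.py",
        f"--sbl-dir={target_dir}",
        f"--sbl-install={mode}",
        f"--platform={platform}",
    ]


if __name__ == "__main__":
    # Prints a shell-ready command; the target directory is a placeholder.
    print(shlex.join(installer_command("/home/me/sbl", platform="macos")))
```

Returning a list rather than a string lets the command be passed as is to subprocess.run without shell quoting issues.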

Compilation dependencies

  • General dependencies are used in almost all applications, so it is better to install them whatever the use of the SBL.
  • Specific dependencies are used in a specific program, which will not be compiled without the necessary resource. This means that most packages can be used even if such a library is missing, since the specific dependency is restricted to a file (see details below).

(Mandatory) General dependencies

GMP and MPFR

These libraries provide classes implementing number types and the accompanying operations, allowing the development of algorithms with specific arithmetic requirements:

  • GMP is a library for arbitrary-precision arithmetic;
  • MPFR is a library for multiple-precision floating-point computations with correct rounding.

Note that GMP and MPFR are mandatory to use CGAL. If either library is installed in a non-standard location, the corresponding <LIBRARY_NAME>_DIR environment variable (i.e. GMP_DIR or MPFR_DIR) needs to point to the root directory of that library.

Boost

The reference C++ Boost libraries provide various tools used throughout the library (see the Boost home page for more details).

While the generic components of Boost are directly integrated in Core, the following non generic Boost packages are used in the applications:

  • Boost Serialization: provides tools for serializing data structures, i.e. to store them in an archive such that it is possible to reconstruct them from this archive. In Applications, an archive is an XML file, which allows PALSE to be used for analyzing the output data of an application (see PALSE).
  • Boost Program Options: provides tools to handle (generate and parse) the command line options of applications. (Note that program options are used together with the workflows and the Modules, allowing one to assign the options to the modules used to define the workflow of an application.)
  • Boost Regex : provides tools to manipulate regular expressions.
  • Boost Thread : provides classes and functions to manage multiple threads.
  • Boost System : provides simple and light-weight tools to manage errors.

If Boost is installed in a non-standard location, the BOOST_ROOT environment variable needs to point to the root directory of Boost.

CGAL

The Computational Geometry Algorithms Library (CGAL) provides various core geometric constructions. In particular, CGAL is used in a large number of packages in Core. Some libraries, such as GMP and MPFR, are provided with the CGAL library. A detailed explanation on how to install the CGAL library is provided in the CGAL installation guide.

Note that if the CGAL library is installed in a non-standard location, the CGAL_DIR environment variable must point to the root directory of CGAL.

ESBTL

The Easy Structural Biology Template Library (ESBTL) library is a generic C++ library (header-only) for parsing and managing data in a PDB file. It also provides geometric representations of molecules using CGAL.

The ESBTL library is provided with the SBL library as a third-party library, so no installation is required. To use one's own version of ESBTL, just set the environment variable ESBTL_DIR to the root directory of the custom ESBTL.

Eigen

The Eigen library is used for linear algebra, in particular to represent matrices and compute eigenvalues.

If Eigen is installed in a non-standard location, the EIGEN_DIR environment variable needs to point to the root directory of the Eigen library.
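Before running cmake, it may help to check that the location variables named in this section point to existing directories. A minimal sketch follows; check_dependency_vars is a hypothetical helper, and it only inspects the environment variables: a library installed in a standard location is reported as unset, which is fine, and nothing here probes the system for the library itself.

```python
import os

# Environment variables named in this section for the general dependencies.
MANDATORY_DEP_VARS = {
    "GMP": "GMP_DIR",
    "MPFR": "MPFR_DIR",
    "Boost": "BOOST_ROOT",
    "CGAL": "CGAL_DIR",
    "ESBTL": "ESBTL_DIR",  # optional override: ESBTL ships with the SBL
    "Eigen": "EIGEN_DIR",
}


def check_dependency_vars(env=None):
    """Report how each dependency's location variable is set.

    Returns name -> "unset" (standard location assumed),
    "ok" (points to an existing directory) or "invalid" (set, not a directory).
    """
    env = os.environ if env is None else env
    report = {}
    for name, var in MANDATORY_DEP_VARS.items():
        value = env.get(var)
        if not value:
            report[name] = "unset"
        elif os.path.isdir(value):
            report[name] = "ok"
        else:
            report[name] = "invalid"
    return report


if __name__ == "__main__":
    for name, status in check_dependency_vars().items():
        print(f"{name:<6} {status}")
```

An "invalid" entry usually means a typo in the path, which would otherwise only surface later as a cmake error.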

(Specific) Optimization


Ipopt

Ipopt (Interior Point OPTimizer; a.k.a. coin-or-Ipopt) is a library for non-linear optimization. It is used in the package Real_value_function_minimizer for finding local minima of real value functions. If the library is installed in a non-standard location, the IPOPT_ROOT environment variable needs to point to the root directory of the library.

The package Real_value_function_minimizer requires at least one of the two libraries Ipopt or LBFGS++.


LBFGS++

LBFGS++ (Limited-memory BFGS) is a header-only C++ library that implements the eponymous algorithm (see the Download or clone page). It is used in the package Real_value_function_minimizer for finding local minima of real value functions. Since it is header-only, it is easy to install and to use, making it a convenient alternative to the Ipopt library. If the library is installed in a non-standard location, the LBFGSPP_DIR environment variable needs to point to the include directory of the library.



LP solve

The binary program lp_solve is used to solve the linear programs arising when comparing energy landscapes. The environment variable LP_SOLVE must be set to indicate the location of the binary.
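Since the comparison of energy landscapes fails if LP_SOLVE is wrong, a quick check can save a failed run. This is a sketch under an assumption: the guide only says LP_SOLVE indicates the location of the binary, so the hypothetical find_lp_solve helper accepts either the full path of the binary or the directory containing it.

```python
import os


def find_lp_solve(env=None):
    """Return the path of the lp_solve binary designated by LP_SOLVE, or None.

    Assumption: LP_SOLVE holds either the binary's full path or its directory.
    """
    env = os.environ if env is None else env
    path = env.get("LP_SOLVE")
    if not path:
        return None
    if os.path.isdir(path):
        # Directory form: look for an lp_solve executable inside it.
        path = os.path.join(path, "lp_solve")
    if os.path.isfile(path) and os.access(path, os.X_OK):
        return path
    return None


if __name__ == "__main__":
    print(find_lp_solve() or "LP_SOLVE is unset or does not designate an executable lp_solve")
```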

(Specific) Biological sequence analysis

Seqan

Seqan is a C++ header-only library providing a collection of sequence alignment algorithms. It is used in the package Alignment_engines. If the library is installed in a non-standard location, the SEQAN_DIR environment variable needs to point to the include directory of the library.

The package Alignment_engines relies on Seqan for sequence alignments only.


Muscle


  • Muscle: a multiple sequence alignment program; used in FunChaT

Clustal Omega

HMMER

  • HMMER: a biosequence analysis tool using profile hidden Markov models; used in FunChaT

Phobius

  • Phobius: a combined transmembrane topology and signal peptide predictor; used in FunChaT

(Specific) Misc

FLANN

FLANN (Fast Library for Approximate Nearest Neighbors) is a library offering a collection of approximate nearest neighbor algorithms, together with methods for picking the best algorithm depending on the input dataset. It is wrapped in the class SBL::GT::T_ANN_FLANN_wrapper of the package Spatial_search for comparing / replacing the other algorithms implemented in the package. If the library is installed in a non-standard location, the FLANN_DIR environment variable needs to point to the root directory of the library.

The packages Spatial_search and MolecularGeometryLoader can be used without this dependency, which is optional.


Gromacs

Gromacs is used for loading trajectories of conformations from XTC files in the package MolecularGeometryLoader. If the library is installed in a non-standard location, the GROMACS_DIR environment variable needs to point to the root directory of the library.


rapidxml

rapidxml is a header-only C++ XML parser used in the SBL for loading force field parameters from XML files in the package Molecular_potential_energy. If the library is installed in a non-standard location, the RAPIDXML_DIR environment variable needs to point to the root directory of the library.

The package Molecular_potential_energy relies heavily on rapidxml and cannot be used without it.


OpenMP

OpenMP is used to parallelize loops in the SBL. It is in particular used to parallelize the execution of collections of modules within workflows (see Modules when using workflows). When OpenMP is used for the compilation of a program, the C++ macro SBL_WITH_OPENMP is automatically defined, enforcing the use of OpenMP for parallelizing the for loops.

MPFI

In the SBL, interval arithmetic is managed using the Boost library. However, in order to use multi-precision interval arithmetic, the library MPFI has to be installed. It is used in particular to compute the volume of unions of balls in the SBL. When MPFI is used for the compilation of a program, the C++ macro SBL_WITH_MPFI is automatically defined, enforcing the use of MPFI rather than Boost for managing multi-precision interval arithmetic. If MPFI is installed in a non-standard location, the MPFI_DIR environment variable needs to point to the root directory of the library.
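The specific dependencies above only affect selected packages or features. The sketch below encodes the relationships stated in this section, so that one can see which of these packages become usable for a given set of installed libraries. The tables and the plan function are hypothetical and cover only the cases described here; the build system itself is the authority.

```python
# Packages that cannot be used without a specific dependency.
# Each set is a group of alternatives: at least one member must be found.
PACKAGE_REQUIREMENTS = {
    "Real_value_function_minimizer": [{"Ipopt", "LBFGS++"}],
    "Alignment_engines": [{"Seqan"}],
    "Molecular_potential_energy": [{"rapidxml"}],
}

# Features that are simply disabled when the dependency is missing.
OPTIONAL_FEATURES = {
    "FLANN": "FLANN wrapper in Spatial_search",
    "Gromacs": "XTC trajectory loading in MolecularGeometryLoader",
    "OpenMP": "parallel loops (SBL_WITH_OPENMP)",
    "MPFI": "multi-precision interval arithmetic (SBL_WITH_MPFI)",
}


def plan(found):
    """Return (usable packages, enabled features) for a set of found libraries."""
    found = set(found)
    usable = sorted(
        pkg for pkg, groups in PACKAGE_REQUIREMENTS.items()
        if all(group & found for group in groups)
    )
    enabled = sorted(desc for dep, desc in OPTIONAL_FEATURES.items() if dep in found)
    return usable, enabled


if __name__ == "__main__":
    usable, enabled = plan({"LBFGS++", "rapidxml", "OpenMP"})
    print("usable packages:", usable)
    print("enabled features:", enabled)
```

Modelling each requirement as a set of alternatives captures the Ipopt or LBFGS++ rule without special-casing it.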

Using the SBL: compilation and installation

Getting the source code

The source code is available from the following tarball.

It may also be obtained by cloning the read-only git repository as follows:

> git clone git://sbl.inria.fr/git/sbl.git

In the sequel, we assume that the environment variable SBL_DIR points to the directory containing the source code.

Compiling the library from the source code

To compile and install the library from this source code, CMake is used. Version 2.6 or later of CMake is recommended. Note that the following installation requires root privileges: if you do not have them, refer to section Non standard installation directory.

The installation runs through three steps:

  • 1) Creating the build directory. From the directory containing the source code:
> mkdir build_sbl; cd build_sbl
  • 2) Running cmake. To compile the programs during the installation, set the SBL_APPLICATIONS variable to ON. It is also possible to compile only part of the programs by specifying a value for SBL_APPLICATIONS (Core for all applications in Core, SFM for all Space Filling Model applications, CA for Conformational Analysis applications, DM for Data Management applications):
> cmake <path/to/your/sbl/git/directory> -DSBL_APPLICATIONS=ON
  • 3) Running the compilation and the installation (note that the compilation of the programs may take up to about twenty minutes):
> make; make install

This last step will compile the programs (if SBL_APPLICATIONS is set to ON). It will also copy files around, into the standard locations indicated below (or into the directory pointed at by the CMAKE_INSTALL_PREFIX, see below):

  • the include directory of each package is copied into the /usr/include directory,
  • the include directory of the ESBTL library is copied into the /usr/include directory,
  • the Python source code of Python packages is copied into /usr/lib/python,
  • the compiled programs and python scripts are copied into /usr/bin,
  • the cmake files of the library are copied into /usr/share/cmake,
  • the documentation, the demos of the applications and the source code of the examples are copied into /usr/share/doc.

Note that if a new version of the library is available, the installation must be carried out again upon updating the git repository.

The variable CMAKE_INSTALL_PREFIX calls for one comment. This variable should contain the name of the directory containing all subdirectories to be installed, namely include, doc, bin. Phrased differently, if one sets CMAKE_INSTALL_PREFIX to /path/to/my/directory/bin, then all sub-directories will be found below .../bin, which is clearly undesired.
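When the configuration step is scripted, the pitfall above can be caught before cmake runs. The sketch below builds the cmake command of step 2 with the SBL_APPLICATIONS values listed earlier and, optionally, a CMAKE_INSTALL_PREFIX. The function name and the list of suspicious directory names are hypothetical, and rejecting a prefix whose last component is bin, include, doc, share or python is only a heuristic.

```python
from pathlib import PurePosixPath

# Values of SBL_APPLICATIONS listed in the compilation steps above.
APPLICATION_SETS = {"ON", "Core", "SFM", "CA", "DM"}
# Heuristic: names of sub-directories that the prefix should contain.
INSTALL_SUBDIRS = {"bin", "include", "doc", "share", "python"}


def cmake_configure_command(source_dir, applications="ON", install_prefix=None):
    """Build the cmake configuration command of step 2 above."""
    if applications not in APPLICATION_SETS:
        raise ValueError(f"unexpected SBL_APPLICATIONS value: {applications!r}")
    cmd = ["cmake", str(source_dir), f"-DSBL_APPLICATIONS={applications}"]
    if install_prefix is not None:
        # A prefix ending in bin/ would put bin/, include/, ... below .../bin.
        if PurePosixPath(install_prefix).name in INSTALL_SUBDIRS:
            raise ValueError(f"{install_prefix!r} looks like a sub-directory, "
                             "not an installation prefix")
        cmd.append(f"-DCMAKE_INSTALL_PREFIX={install_prefix}")
    return cmd


if __name__ == "__main__":
    # Placeholder paths, for illustration only.
    print(" ".join(cmake_configure_command("../sbl", "SFM", "/home/me/local")))
```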


When the library is used without being installed, updating it only requires updating one's local git repository. However, the examples, tests and programs then have to be compiled one by one. Therefore, using the library without installing it is only recommended for those willing to use the SBL library as a header-only library.


To uninstall the library i.e. remove all the installed files, proceed as follows:
> make; make uninstall


Using the SBL: advanced compilation and installation

In this section, we show the various options for compiling the different parts of the library, and installing it.

Non standard installation directory

When installing the SBL library, one may lack root privileges, or may prefer to install the SBL into a local directory. Doing so merely requires setting the cmake variable CMAKE_INSTALL_PREFIX to your local install directory when running cmake:

> cmake </path/to/sbl/directory> -DCMAKE_INSTALL_PREFIX=</path/to/local/install/directory>
> make
> make install

Given this target directory, executables are installed in the bin sub-folder of the target install directory while Python modules are installed in the python sub-folder.

Note that executables and Python modules installed in a non-standard location are not directly usable unless the corresponding environment variables have been set:
  • PATH for executables,
  • PYTHONPATH for Python modules.
For example, Unix users with a bash or zsh shell should set the environment variables as follows:
> export PATH=$PATH:</path/to/local/install/directory>/bin
> export PYTHONPATH=$PYTHONPATH:</path/to/local/install/directory>/python
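When SBL programs are launched from a Python driver script rather than from an interactive shell, the same two variables can be set for the child processes only. A minimal sketch mirroring the export lines above; env_for_local_install is a hypothetical helper, and the bin and python sub-folders follow the layout described in this section.

```python
import os


def env_for_local_install(prefix, env=None):
    """Return a copy of env with <prefix>/bin appended to PATH and
    <prefix>/python appended to PYTHONPATH (no duplicates are added)."""
    env = dict(os.environ if env is None else env)
    for var, sub in (("PATH", "bin"), ("PYTHONPATH", "python")):
        entry = os.path.join(prefix, sub)
        current = env.get(var, "")
        parts = current.split(os.pathsep) if current else []
        if entry not in parts:
            parts.append(entry)
        env[var] = os.pathsep.join(parts)
    return env
```

The result can be passed as the env argument of subprocess.run, which leaves the parent process environment untouched.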


Examples and test compilation

Within a package from Core, examples are short programs showing the basic functionality provided in that package.

In addition, such packages may provide tests, i.e. short test programs checking various functionalities of the package. To compile all the examples and the tests of the SBL library while installing it, just turn ON the variables SBL_EXAMPLES and SBL_TESTS when running cmake:

> cmake </path/to/sbl/directory> -DSBL_EXAMPLES=ON -DSBL_TESTS=ON
> make

Then, to test all the packages from Core, just run the tests with the following command:

> make test

Note that the previous commands do not require any installation step, since the examples and tests are only compiled locally; only the sources of the examples are installed, for documentation purposes (in /usr/share/doc).

Debug vs release

It is possible to compile the programs in Debug or in Release mode using the cmake variable CMAKE_BUILD_TYPE. However, since the SBL library only consists of headers and programs, we recommend the Debug mode for developers only. For compiling in Debug mode, one can run the following command:

> cmake </path/to/sbl/directory> -DCMAKE_BUILD_TYPE=Debug
> make

Note that the Release mode is used by default. Note also that debugging code from other libraries that are not header-only requires compiling these libraries in Debug mode as well.

Static programs

It is possible to create static versions of the programs by setting to ON the cmake variable BUILD_STATIC_SBL:

> cmake </path/to/sbl/directory> -DBUILD_STATIC_SBL=ON
> make
In order to obtain purely static programs, all the dependencies must also be available in static mode:
  • mandatory dependencies :
    • ESBTL and Eigen are header-only libraries; so is CGAL from the version 4.10 onward.
    • GMP and Boost are generally provided by Unix systems with static versions.
    • MPFR (and CGAL before the version 4.10) require a local static compilation if not provided by the OS.
  • specific dependencies :
    • Seqan, LBFGS++ and rapidxml are header-only libraries.
    • FLANN is generally provided by Unix systems with a static version.
    • Ipopt and Gromacs require a local static compilation if not provided by the OS.
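The list above can be turned into a small checklist of the dependencies that may require a local static compilation. This sketch encodes only the classification stated above; static_build_todo is a hypothetical helper, and "may require" remains the operative word, since an OS may already provide a static version of MPFR, Ipopt or Gromacs.

```python
HEADER_ONLY = {"ESBTL", "Eigen", "Seqan", "LBFGS++", "rapidxml"}
USUALLY_STATIC_FROM_OS = {"GMP", "Boost", "FLANN"}
# These may need a local static build if the OS does not provide one.
NEEDS_LOCAL_STATIC_BUILD = {"MPFR", "Ipopt", "Gromacs"}


def static_build_todo(dependencies, cgal_version):
    """Return the dependencies that may need a local static compilation.

    cgal_version is a tuple such as (4, 9); CGAL is header-only from 4.10 on.
    """
    todo = set()
    for dep in dependencies:
        if dep == "CGAL":
            if tuple(cgal_version) < (4, 10):
                todo.add(dep)
        elif dep in NEEDS_LOCAL_STATIC_BUILD:
            todo.add(dep)
        elif dep not in HEADER_ONLY | USUALLY_STATIC_FROM_OS:
            raise ValueError(f"dependency not covered by this list: {dep}")
    return sorted(todo)


if __name__ == "__main__":
    print(static_build_todo(["GMP", "MPFR", "Boost", "CGAL", "Eigen"], (4, 9)))
```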


Note also that only the programs are compiled in static mode since the examples and tests are not installed. If one wants to compile static examples or static tests, such compilations should be done locally.

Installing VMD plugins

The SBL library provides VMD plugins for visualizing the output of the programs (details in section VMD (Visual Molecular Dynamics)).

It should be recalled that VMD plugins involve three ingredients:

  • The executables from the SBL, which are called by the plugins. In the sequel, we assume that these have been installed in a directory referenced in one's PATH, as explained above.
  • A folder containing the tcl code for the plugins. For the SBL, this folder is called sblvmdplugins, and we create/update it in one's home directory.
  • The so-called .vmdrc file, located in one's home directory, which parameterizes one's VMD. For SBL plugins, it indicates where the plugins are installed.

To automatically install the VMD plugins, i.e. create or update the sblvmdplugins directory and create or update the .vmdrc file, proceed as follows with cmake:

> cmake </path/to/sbl/directory> -DSBL_VMD_PLUGINS=ON
> make ; make install
Note that the VMD plugins do not require any compilation. That is, the aforementioned install merely installs .tcl files. These plugins call executables from the SBL, found from one's PATH environment variable.


Installing PyMOL plugins

The SBL library also provides PyMOL plugins (details in section PyMOL (Python Based Molecular Visualization System)). The installation works as for the VMD plugins, but using instead the cmake variable SBL_PYMOL_PLUGINS:

> cmake </path/to/sbl/directory> -DSBL_PYMOL_PLUGINS=ON
> make ; make install

The previous command looks for a folder .pymol in one's home directory, creates it if necessary, and installs the plugins. It manages the directory architecture used by PyMOL and creates or updates initialization files if necessary.

Note that the PyMOL plugins do not require any compilation. That is, the aforementioned install merely installs .py files. These plugins call executables from the SBL, found from one's PATH environment variable.
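After either installation, one may quickly check that the per-user locations mentioned in this guide exist. This is a sketch only: plugin_install_status is a hypothetical helper, it checks for the presence of directories and files but not their contents, and the ~/.pymol/startup/SBL entry is the location used by sbl-plugins-installer-unix.py, which the cmake-based install may not use.

```python
from pathlib import Path


def plugin_install_status(home=None):
    """Check the per-user locations used by the SBL VMD and PyMOL plugins."""
    home = Path.home() if home is None else Path(home)
    return {
        "vmd_plugins_dir": (home / "sblvmdplugins").is_dir(),
        "vmdrc": (home / ".vmdrc").is_file(),
        "pymol_dir": (home / ".pymol").is_dir(),
        # Location used by sbl-plugins-installer-unix.py.
        "pymol_startup_sbl": (home / ".pymol" / "startup" / "SBL").is_dir(),
    }


if __name__ == "__main__":
    for name, ok in plugin_install_status().items():
        print(f"{name:<18} {'yes' if ok else 'no'}")
```

Recall that the plugins also need the SBL executables to be reachable from one's PATH, which this check does not cover.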


Documentation compilation

The documentation is written in Doxygen format, and can be compiled as follows using the script sbl-doc-manager.py from the scripts directory at the root of the project. This script produces the documentation and prints out the path to the index.html file, to be opened with a web browser:

> sbl-doc-manager.py -w <path/to/sbl/directory> -d <path/to/output/directory>

Note that the option -w can be omitted if the environment variable $SBL_DIR is set.
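The rule above (pass -w unless SBL_DIR is set) can be captured as follows when the documentation build is automated. A minimal sketch: doc_manager_command is a hypothetical helper, and it uses only the -w and -d options shown in the command above.

```python
import os


def doc_manager_command(output_dir, sbl_dir=None, env=None):
    """Build the sbl-doc-manager.py command; -w is omitted when SBL_DIR is set."""
    env = os.environ if env is None else env
    if sbl_dir is None and not env.get("SBL_DIR"):
        raise ValueError("pass sbl_dir or set the SBL_DIR environment variable")
    cmd = ["sbl-doc-manager.py"]
    if sbl_dir is not None:
        cmd += ["-w", str(sbl_dir)]
    cmd += ["-d", str(output_dir)]
    return cmd
```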

The documentation being written in Doxygen, it can also be compiled directly as follows:
> cd </path/to/installed/sbl/directory>/share/doc/SBL; doxygen
In doing so, the files are generated in the current directory. The benefit of using the aforementioned script is that it also performs a number of house-keeping tasks (moving pictures, creating symbolic links, etc).


Note that some optional post-processing is done internally on the documentation, in particular for adding logos to the short descriptions of the packages. This is done using an internal Python script, see sbl-devel-for-sbl-tutorial.