Structural Bioinformatics Library
Template C++ / Python API for developing structural bioinformatics applications.
User Manual

Authors: F. Cazals and T. Dreyfus

Space_filling_model_interface_finder

Goals: Interfaces and interactions within macro-molecular complexes

We consider a structure (molecule, complex) decomposed into units. A unit may be a polypeptide chain, a domain, a set of residues, etc. Our focus is on interactions between these units, and we ascribe such interactions to two categories, referred to as geometric interactions/interfaces and biochemical interactions/interfaces.

Geometric interactions/interfaces. Such interfaces typically correspond to non-covalent interactions between atoms ascribed to units. Typical cases where such interfaces are of interest are:

  • Given the asymmetric unit of a crystal structure, report the pairwise interfaces observed.
  • Given a protein complex, be it a homomer or a multimer, report the contacts between the domains of the proteins involved.

For each geometric interface between two units, high-level information is provided, including (i) the number of particles involved at each interface, and (ii) the number of connected components, i.e. patches, of each interface. These pieces of information provide a first indication of the various contacts between the molecules present, e.g. to select interfaces to investigate further using the applications of the package Space_filling_model_interface.

For protein-protein interfaces, the number of interface atoms contributed by a sub-unit strongly correlates with the interface area (measured by the buried surface area), each atom contributing on average 10 Å² [100].
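As a rough worked example of this rule of thumb, the sketch below converts interface atom counts into an estimated buried surface area. The function name and the atom counts are hypothetical; only the average of 10 squared Angstroms per atom comes from the text above.

```python
# Average buried surface area per interface atom, in squared Angstroms
# (value quoted in the text above).
AREA_PER_ATOM = 10.0

def estimate_bsa(num_interface_atoms, area_per_atom=AREA_PER_ATOM):
    """Rough estimate of the buried surface area contributed by one sub-unit."""
    return num_interface_atoms * area_per_atom

# Hypothetical atom counts for the two partners of one interface.
atoms_a, atoms_b = 85, 92
total = estimate_bsa(atoms_a) + estimate_bsa(atoms_b)
print("Estimated buried surface area: %.0f squared Angstroms" % total)
```

Such an estimate is only a first-order indication; an actual buried surface area calculation remains necessary for quantitative work.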


Biochemical interactions/interfaces. The focus here is on more local interactions between units, such as disulfide bonds and ionic interactions (see the package Pointwise_interactions).

Applications provided. Two applications are provided, sbl-bif-chainsW-atomic.exe and sbl-bif-domainsW-atomic.exe, where bif stands for Binary Interfaces Finder.

Prerequisites

Geometric - Voronoi interfaces

Contacts. Consider a model in a PDB file, consisting of chains made of particles. We assume (details below) that each particle has been assigned a primitive label, as defined in the concept MolecularSystemLabelsTraits. For the program sbl-bif-chainsW-atomic.exe, there is one primitive label per chain, without any hierarchy.

Consider the $\alpha$-complex of the solvent accessible representation of the particles of the model (see the package Alpha_complex_of_molecular_model).

Our focus is on contacts, which are of two types, bicolor and mediated. In the sequel, we briefly recall the definitions of these concepts, and refer the reader to the package Space_filling_model_interface for more details.

An edge in the $\alpha$-complex defines a bicolor contact provided that its endpoints carry two different primitive labels. To define a mediated contact, consider a mediator particle sandwiched between two particles with different primitive labels (i.e., there is an edge from each partner's particle to the mediator particle). Such a particle induces two mediated contacts, one with each particle carrying a primitive label; see the figure Voronoi interfaces: bicolor and mediated.

Interfaces. The contacts between partners are edges of the $\alpha$-complex: each such edge has a dual face, termed a Voronoi tile, in the associated Voronoi diagram. The set of all Voronoi tiles involving only bicolor (resp. mediated) contacts is called the bicolor (resp. mediated) interface. The union of the bicolor and mediated interfaces of two given partners defines the tricolor interface, plainly called interface if there is no ambiguity.

The program sbl-bif-chainsW-atomic.exe collects all bicolor and mediated contacts in a molecular structure, and groups them by interface. For each interface, two parameters are reported: first, the number of particles involved, and second, the number of connected components.

Voronoi interfaces: bicolor and mediated.

This fictitious molecular structure involves two partners A (red) and B (blue) and two water molecules (gray). A and B make one bicolor contact; the water molecule $w_1$ makes two mediated contacts.
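The definitions of bicolor and mediated contacts can be illustrated with a minimal sketch, which is not the SBL implementation. It assumes that the edges of the $\alpha$-complex are given as pairs of particle names, that mediators carry the hypothetical label "W", and that every edge from a mediator to a partner particle counts as one mediated contact. The toy data mimic the figure: one bicolor contact, a water w1 touching both partners, and a water w2 assumed here to touch only one partner.

```python
from collections import defaultdict

def classify_contacts(edges, labels, mediator="W"):
    """Split edges into bicolor and mediated contacts.

    edges:  list of (particle, particle) pairs
    labels: dict mapping each particle to its primitive label;
            mediator particles carry the label `mediator` (an assumption).
    """
    neighbors = defaultdict(set)
    for p, q in edges:
        neighbors[p].add(q)
        neighbors[q].add(p)

    # Bicolor contact: the two endpoints carry different partner labels.
    bicolor = [(p, q) for p, q in edges
               if mediator not in (labels[p], labels[q])
               and labels[p] != labels[q]]

    # Mediated contacts: a mediator adjacent to particles of at least two
    # different partner labels yields one contact per adjacent partner particle.
    mediated = []
    for w in sorted(neighbors):
        if labels[w] != mediator:
            continue
        partners = sorted(n for n in neighbors[w] if labels[n] != mediator)
        if len({labels[n] for n in partners}) >= 2:
            mediated.extend((w, n) for n in partners)
    return bicolor, mediated

# Toy structure: partners A and B, waters w1 and w2 (all names hypothetical).
labels = {"a1": "A", "a2": "A", "b1": "B", "w1": "W", "w2": "W"}
edges = [("a1", "b1"), ("w1", "a1"), ("w1", "b1"), ("w2", "a2")]
bicolor, mediated = classify_contacts(edges, labels)
print("bicolor contacts: ", bicolor)
print("mediated contacts:", mediated)
```

In this toy case w2 is adjacent to partner A only, so it mediates nothing and contributes no contact.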

Biochemical annotations

This package relies on the functionalities of the package Tertiary_quaternary_structure_annotator to report a number of specific interactions between regions in a structure or complex, such as:

  • salt bridges,
  • disulfide bonds,
  • other covalent interactions.

Using BIF: analysis between chains of a molecule

This section presents the program sbl-bif-chainsW-atomic.exe, which deals with the case where one unit refers to one polypeptide chain.

Main specifications

Input. A PDB file. Note that sbl-bif-chainsW-atomic.exe loads water molecules by default; they are discarded with the option --no-water.

Main options.

The main option of the program sbl-bif-chainsW-atomic.exe is:
-f string: PDB file of the input molecule


Note that a default radius of $1.4 \AA$ is added to all atoms to define the Solvent Accessible Model of the input molecule.

Main output. Main output files:

  • One general log file, in txt format
  • One dot file for the geometric interfaces, and the associated xml file
  • One dot file for the biochemical interfaces, and the associated xml file

Remarks.

  • To visualize the Interfaces Graph, we recommend installing Graphviz (see the Graphviz web site) and using its circo program to draw the graph from a .dot file. Note that Graphviz provides other programs that draw graphs with different embeddings.
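As an illustration of the remark above, the following sketch renders a .dot file to PNG with circo from Python. The helper names and the file names are hypothetical; only the standard Graphviz options -Tpng and -o are used.

```python
import shutil
import subprocess

def circo_command(dot_file, png_file):
    """Build the Graphviz command line rendering a .dot file to PNG with circo."""
    return ["circo", "-Tpng", dot_file, "-o", png_file]

def render_with_circo(dot_file, png_file):
    """Run circo if it is installed; return True when the command was run."""
    if shutil.which("circo") is None:
        print("circo not found: install Graphviz first")
        return False
    subprocess.run(circo_command(dot_file, png_file), check=True)
    return True

# Hypothetical file names, for illustration only.
print(" ".join(circo_command("interfaces_graph.dot", "interfaces_graph.png")))
```

Replacing circo by another Graphviz layout program, such as dot or neato, yields a different embedding of the same graph.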

Using BIF: analysis between units of a molecule

This section presents the program sbl-bif-domainsW-atomic.exe, which deals with the case where one unit refers to a set of residues in a protein.

Main specifications

Input. Running sbl-bif-domainsW-atomic.exe requires two input files:

  • the input PDB file,
  • the labels specification file, which specifies the decomposition of a chain into regions called units. Using the specification file, one can define a template dissecting a chain into domains (or other segments), resulting in one primitive label for each segment. This template can be used for several chains, resulting in a hierarchy of labels for each chain. See the package MolecularSystemLabelsTraits.

Main options.

The main options of the program sbl-bif-domainsW-atomic.exe are:
-f string: PDB file of the input molecule
--domain-labels string: file specifying the domains of the input molecule


Main output.

  • One general log file, in txt format
  • One dot file for the geometric interfaces, and the associated xml file
  • One dot file for the biochemical interfaces, and the associated xml file

Comments.

For the program sbl-bif-domainsW-atomic.exe, the labels are defined from the class SBL::Models::Domain_label_traits, which allows users to specify their own labels. These labels typically define domains within a protein.


If an incomplete template is attributed to a chain, i.e., some atoms of this chain are not included in any domain defined by the template, these atoms are attributed to the chain itself. This may lead to uncontrolled behaviors: an extra virtual domain representing all the remaining atoms of a chain should always be defined.
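To help define such an extra virtual domain, the sketch below lists the residue ranges of a chain that no domain covers. It is a hypothetical helper, not part of the SBL, and it assumes that residues are numbered with consecutive integers (no insertion codes) and that domains are given as inclusive ranges.

```python
def uncovered_ranges(chain_range, domains):
    """Return the residue ranges of a chain not covered by any domain.

    chain_range: (first, last) residue numbers of the chain, inclusive
    domains:     list of (first, last) inclusive residue ranges
    """
    first, last = chain_range
    gaps = []
    cursor = first  # first residue not yet known to be covered
    for start, end in sorted(domains):
        if start > cursor and cursor <= last:
            gaps.append((cursor, min(start - 1, last)))
        cursor = max(cursor, end + 1)
    if cursor <= last:
        gaps.append((cursor, last))
    return gaps

# Hypothetical chain of 300 residues with two domains.
remainder = uncovered_ranges((1, 300), [(10, 120), (121, 250)])
print("Residue ranges for the extra virtual domain:", remainder)
```

The ranges returned can then be gathered into the extra domain of the labels specification file, so that every atom of the chain carries a domain label.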


Visualization, plugins, GUIs

The SBL provides VMD and PyMOL plugins to use the programs of Space_filling_model_interface_finder. The plugins are accessible in the Extensions menu of VMD or in the Plugin menu of PyMOL. Upon termination of a calculation launched by the plugin, the following visualizations are available:

  • A graph, called the Binary Interfaces Graph, representing the binary interfaces found:

    • each vertex of this graph is embedded at the center of mass of the atoms of its corresponding label,
    • each edge of this graph is decorated by the number of atoms involved in the interface, and also by the number of connected components of the corresponding interface.
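The vertex embedding mentioned above can be sketched as a mass-weighted average of atomic coordinates. This is a generic illustration under stated assumptions, not the plugin code: the function name, the coordinates and the masses are hypothetical.

```python
def center_of_mass(atoms):
    """Mass-weighted centroid of a list of (x, y, z, mass) tuples."""
    total_mass = sum(atom[3] for atom in atoms)
    return tuple(sum(atom[i] * atom[3] for atom in atoms) / total_mass
                 for i in range(3))

# Hypothetical atoms of one label: coordinates in Angstroms, masses in daltons.
atoms = [(0.0, 0.0, 0.0, 12.0), (2.0, 0.0, 0.0, 12.0), (1.0, 3.0, 0.0, 16.0)]
print("Vertex position:", center_of_mass(atoms))
```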

Algorithms and methods

Interfaces

Using the notions of bicolor and mediated contacts recalled in the section Prerequisites, assume that

  • each particle of the partners has been attributed a primitive partner's label
  • water molecules sandwiched between any pair of partners have been identified.

The applications perform two tasks:

  • First, for each pair of primitive labels defining a tricolor interface (composed of bicolor and mediated contacts), the following pieces of information are reported: the number of atoms contributed by each partner, the number of water molecules sandwiched, and the number of contacts of the three types.
  • Second, a Union-Find algorithm is run on the mediated interfaces, so as to identify the connected components between the previous interfaces.
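The second task can be sketched with a textbook Union-Find structure. The sketch assumes that the contacts of one interface are given as pairs of particle names and that two contacts sharing a particle belong to the same connected component; which elements the SBL actually merges is not specified here, and all names are hypothetical.

```python
class UnionFind:
    """Disjoint-set structure with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving: point x to its grandparent, then move up.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        root_x, root_y = self.find(x), self.find(y)
        if root_x != root_y:
            self.parent[root_y] = root_x

def count_components(contacts):
    """Number of connected components (patches) spanned by the contacts."""
    uf = UnionFind()
    for p, q in contacts:
        uf.union(p, q)
    return len({uf.find(x) for x in list(uf.parent)})

# Hypothetical contacts of one interface: two separate patches.
contacts = [("a1", "b1"), ("b1", "a2"), ("a7", "b9"), ("w1", "a7")]
num_patches = count_components(contacts)
print("Number of connected components:", num_patches)
```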

Ionic interactions

See algorithms in the package Pointwise_interactions.

Programmer's workflow

The programs of Space_filling_model_interface_finder described above are based upon generic C++ classes, so that additional versions can easily be developed for proteins or nucleic acids, at either atomic or coarse-grain resolution.

Deriving such versions requires two important ingredients: the workflow class and its traits class.

The traits class

T_Space_filling_model_interface_finder_traits:

The workflow class

T_Space_filling_model_interface_finder_workflow:

Jupyter demo

  • Jupyter notebook file
  • jupyter

    Finding interfaces and interactions within macro-molecular complexes

    In the sequel, we study two cases:

    • Case 1: using sbl-bif-chainsW-atomic.exe to investigate contacts between chains
    • Case 2: using sbl-bif-domainsW-atomic.exe to investigate contacts between regions/domains of a protein

    Case 1, example 1: the ternary antibody-antigen complex -- pdbid 1vfb.pdb

    Run the calculation

    In [63]:
    #!/usr/bin/python
    
    import glob
    import os
    import shutil  # shutil.which requires Python 3.3+
    
    
    def find_file_in_output_directory(suffix, odir):
        # return the first file found under odir whose name ends with suffix
        pattern = os.path.join(odir, "**", "*" + suffix)
        matches = sorted(glob.glob(pattern, recursive=True))
        if not matches:
            raise FileNotFoundError("no file matching *%s in %s" % (suffix, odir))
        return matches[0]
        
    def run_calculation(pdbid):
        # input filename and output directory
        pdb_prefix = pdbid.rsplit(".", 1)[0]
        ifname = "../../../demos/data/%s" % pdbid
        odir = "results-%s" % pdb_prefix
        os.makedirs(odir, exist_ok=True)
    
        # check executable exists and is visible
        exe = shutil.which("sbl-bif-chainsW-atomic.exe")
        if exe is None:
            raise RuntimeError("sbl-bif-chainsW-atomic.exe not found in PATH")
        print( ("Using executable %s\n" % exe) )
    
        # run command
        cmd = "%s -f %s -p 1 --directory %s --verbose --output-prefix --log" % (exe,ifname,odir)
        os.system(cmd)
    
        # list output files
        ofnames = os.popen( ("ls %s" % odir) ).readlines()
        print("\nAll output files are:",ofnames)
    
        # find the log file and report its name
        lfname = find_file_in_output_directory("log.txt", odir) 
        print("\nLog file is:", lfname)
        # uncomment the next two lines to display the log file
        #log = open(lfname).readlines()
        # for line in log:         print(line.rstrip())
    
    run_calculation("1vfb.pdb")
    
    
    Using executable /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-bif-chainsW-atomic.exe
    
    
    All output files are: ['sbl-bif-chainsW-atomic__radius_water_1dot4__f_1vfb__p_1__biochemical_interfaces_graph.dot\n', 'sbl-bif-chainsW-atomic__radius_water_1dot4__f_1vfb__p_1__biochemical_interfaces_graph.xml\n', 'sbl-bif-chainsW-atomic__radius_water_1dot4__f_1vfb__p_1__log.txt\n', 'sbl-bif-chainsW-atomic__radius_water_1dot4__f_1vfb__p_1_vor_interfaces_graph.dot\n', 'sbl-bif-chainsW-atomic__radius_water_1dot4__f_1vfb__p_1_vor_interfaces_graph.xml\n']
    
    Log file is: results-1vfb/sbl-bif-chainsW-atomic__radius_water_1dot4__f_1vfb__p_1__log.txt
    

    Display the output

    The 1vfb file contains three polypeptide chains named A, B, and C.

    In [64]:
    from IPython.display import Image, display
    
    def display_graphs(pdbid):
        pdb_prefix = pdbid.rsplit(".", 1)[0]
        odir = "results-%s" % pdb_prefix
    
        # find the dot file listing the geometric (Voronoi) interfaces
        of_interf_dot = find_file_in_output_directory("vor_interfaces_graph.dot", odir)
    
        # render the graph to PNG with Graphviz, then display the image
        of_geom_png = "%s-geom_interfaces.png" % pdb_prefix
        cmd = "dot -Tpng %s -o %s" % (of_interf_dot, of_geom_png)
        os.system(cmd)
    
        print("Interfaces within chains of   PDB file:")
        img_geom = Image(filename = of_geom_png, width=300, height=300)
        display(img_geom)
    
        # likewise for biochemical interfaces i.e. S-S bonds and salt bridges
        of_biochem_dot = find_file_in_output_directory("biochemical_interfaces_graph.dot", odir)
    
        # render the graph to PNG with Graphviz, then display the image
        of_biochem_png = "%s-biochem_interfaces.png" % pdb_prefix
        cmd = "dot -Tpng %s -o %s" % (of_biochem_dot, of_biochem_png)
        os.system(cmd)
    
        print("S-S bonds and salt bridges within PDB file:")
        img_biochem = Image(filename = of_biochem_png, width=300, height=300)
        display(img_biochem)
    
    
    display_graphs("1vfb.pdb")
    
    Interfaces within chains of   PDB file:
    
    S-S bonds and salt bridges within PDB file:
    

    For the graph of interfaces:

    • one node for each chain,
    • one edge for each interface between two chains. Furthermore, each edge indicates the number of atoms involved and the number of connected components (patches) at this interface.

    For the graph of S-S bonds and salt bridges: no relevant information here.

    Case 1, example 2: complete immunoglobulin -- pdbid 1igt.pdb

    The following example is of particular interest: for this complete immunoglobulin, we investigate the contacts between the four chains, both from a geometric standpoint and in terms of S-S bonds and salt bridges.

    In [65]:
    run_calculation("1igt.pdb")
    display_graphs("1igt.pdb")
    
    Using executable /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-bif-chainsW-atomic.exe
    
    
    All output files are: ['sbl-bif-chainsW-atomic__radius_water_1dot4__f_1igt__p_1__biochemical_interfaces_graph.dot\n', 'sbl-bif-chainsW-atomic__radius_water_1dot4__f_1igt__p_1__biochemical_interfaces_graph.xml\n', 'sbl-bif-chainsW-atomic__radius_water_1dot4__f_1igt__p_1__log.txt\n', 'sbl-bif-chainsW-atomic__radius_water_1dot4__f_1igt__p_1_vor_interfaces_graph.dot\n', 'sbl-bif-chainsW-atomic__radius_water_1dot4__f_1igt__p_1_vor_interfaces_graph.xml\n']
    
    Log file is: results-1igt/sbl-bif-chainsW-atomic__radius_water_1dot4__f_1igt__p_1__log.txt
    Interfaces within chains of   PDB file:
    
    S-S bonds and salt bridges within PDB file:
    

    The graph of interfaces illustrates the contacts between the heavy and light chains of the IG.

    The graph of S-S bonds illustrates disulfide bonds found across the chains. S-S bonds within a chain can also be reported with the option --internal, as illustrated below.

    (left) The structure involves four chains. Note in particular the 3 disulfide bonds connecting the two heavy chains. (right) The graph produced, with edges counting disulfide bonds and salt bridges. Note that there is a total of 17 / 17 S-S bonds and 18 salt bridges.

    In [70]:
    display(Image(filename ="fig/1igt-biochem-annotations--montage.png", width=500, height=500));
    

    Case 1, example 3: COP9 signalosome -- pdbid 4d10.pdb

    This more complex example involves 16 subunits.

    In [66]:
    run_calculation("4D10.pdb")
    display_graphs("4D10.pdb")
    
    Using executable /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-bif-chainsW-atomic.exe
    
    
    All output files are: ['sbl-bif-chainsW-atomic__radius_water_1dot4__f_4D10__p_1__biochemical_interfaces_graph.dot\n', 'sbl-bif-chainsW-atomic__radius_water_1dot4__f_4D10__p_1__biochemical_interfaces_graph.xml\n', 'sbl-bif-chainsW-atomic__radius_water_1dot4__f_4D10__p_1__log.txt\n', 'sbl-bif-chainsW-atomic__radius_water_1dot4__f_4D10__p_1_vor_interfaces_graph.dot\n', 'sbl-bif-chainsW-atomic__radius_water_1dot4__f_4D10__p_1_vor_interfaces_graph.xml\n']
    
    Log file is: results-4D10/sbl-bif-chainsW-atomic__radius_water_1dot4__f_4D10__p_1__log.txt
    Interfaces within chains of   PDB file:
    
    S-S bonds and salt bridges within PDB file:
    

    As can be seen, there are many more interfaces and biochemical features of interest. A full description of the findings is reported in the xml files dumped into the results directory. Such files should be parsed with PALSE, see https://sbl.inria.fr/doc/PALSE-user-manual.html

    Case 2, example 1: contacts within domains for a class II fusion protein -- pdbid 4ojc.pdb

    Recall that class II fusogens act as trimers. Assume that a decomposition of a monomer into domains has been provided using labels. All contacts between such domains are then easily inferred.

    In [67]:
    #input filename  and output directory
    pdbid = "4ojc--EFF1-trimer.pdb"
    pdb_prefix = pdbid.rsplit(".")[0]
    ifname = "../../../demos/data/4ojc--EFF1-trimer.pdb"
    domain_labels = "../../../demos/data/4ojc--EFF1-trimer-partners.txt"
        
    odir = "results-%s" % pdb_prefix
    os.makedirs(odir, exist_ok=True)
    
    # check executable exists and is visible
    exe = shutil.which("sbl-bif-domainsW-atomic.exe")
    print( ("Using executable %s\n" % exe) )
    
    # run command
    cmd = "%s -f %s --domain-labels  %s --directory %s --verbose --output-prefix --log"\
        % (exe, ifname,domain_labels, odir)
    print("Running ",cmd)
    os.system(cmd)
    
    Using executable /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-bif-domainsW-atomic.exe
    
    Running  /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-bif-domainsW-atomic.exe -f ../../../demos/data/4ojc--EFF1-trimer.pdb --domain-labels  ../../../demos/data/4ojc--EFF1-trimer-partners.txt --directory results-4ojc--EFF1-trimer --verbose --output-prefix --log
    
    Out[67]:
    0
    In [68]:
    display_graphs("4ojc--EFF1-trimer.pdb")
    
    Interfaces within chains of   PDB file:
    
    S-S bonds and salt bridges within PDB file:
    

    As for the previous example, an automatic processing of the xml files delivered is in order.