Structural Bioinformatics Library
Template C++ / Python API for developing structural bioinformatics applications.
User Manual

Molecular_distances_flexible

Authors: F. Cazals and R. Tetley

Goals

General goals

Goals. The root mean square deviation (RMSD) and the least RMSD, both provided in the companion package Molecular_distances, are two widely used similarity measures in structural bioinformatics. Yet, they stem from global comparisons, possibly obliterating locally conserved motifs.

To foster our understanding of molecular flexibility, this package, based upon developments presented in [40], provides so-called combined $\rmsd$, which mixes independent $\lrmsd$ measures, each computed with its own optimal rigid motion.

The structural units, which may be domains, subdomains, SSE, etc., are defined using the machinery from the package MolecularSystemLabelsTraits.

The combined $\rmsd$ can be used to compare (quaternary) structures based on motifs defined from the sequence (domains, SSE), or to compare structures based on structural motifs yielded by local structural alignment methods. To handle these situations, the following four executables are provided:

  • $\text{\sblrmsdcombconf}$ to compare conformations of a molecule, which can be done without computing a non-trivial alignment.
  • $\text{\sblrmsdcombprotkpax}$ to compare homologous proteins, using the structural aligner $\text{\kpax}$ provided in the package Iterative_alignment.
  • $\text{\sblrmsdcombprotapurva}$ to compare homologous proteins, using the structural aligner $\text{\apurva}$ provided in the package Apurva.
  • $\text{\sblrmsdcombmot}$ to compare structural motifs – see also the companion package Structural_motifs .

Modes. As in the package Molecular_distances, four modes are provided for the previous programs:

  • Calpha atoms: distances are computed on Calpha.
  • Backbone atoms : only Calpha, C and N atoms.
  • Heavy atoms : distances are computed on all atoms but hydrogen atoms.
  • All atoms: distances are computed on all atoms, including hydrogen atoms.
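The four modes amount to four filters on the atoms of a structure. The following is a minimal sketch, assuming atoms are given as (name, element) pairs with PDB-style atom names; the function `select_atoms` and the mode strings are hypothetical and are not part of the SBL API.

```python
# Hypothetical helper: not part of the SBL API.
# Atoms are (atom_name, element) pairs, e.g. ("CA", "C") for a Calpha.

BACKBONE_NAMES = {"CA", "C", "N"}  # backbone atoms as listed above

def select_atoms(atoms, mode):
    """Return the atoms retained by one of the four modes."""
    if mode == "calpha":
        # The element check avoids confusing a Calpha with a calcium ion.
        return [a for a in atoms if a[0] == "CA" and a[1] == "C"]
    if mode == "backbone":
        return [a for a in atoms if a[0] in BACKBONE_NAMES and a[1] in ("C", "N")]
    if mode == "heavy":
        return [a for a in atoms if a[1] != "H"]
    if mode == "all":
        return list(atoms)
    raise ValueError("unknown mode: %s" % mode)

# One glycine residue with two of its hydrogen atoms.
glycine = [("N", "N"), ("CA", "C"), ("C", "C"), ("O", "O"), ("H", "H"), ("HA2", "H")]
for mode in ("calpha", "backbone", "heavy", "all"):
    print(mode, len(select_atoms(glycine, mode)))
```

Whatever the mode, the two structures compared must be filtered in the same way so that the retained particles remain in one-to-one correspondence.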
Two related packages are the following ones:
  • Molecular_distances provides implementations of the standard methods to compute distances on whole molecules.
  • Point_cloud_rigid_registration_3 provides the elementary geometric operations used in the current package as well as Molecular_distances. The package also provides tools to align collections of 3D points – without any semantic related to atoms.


The case of polypeptide chains: sequences, labels and local alignments

As noticed above, regions are defined using the labels, as indicated in the package MolecularSystemLabelsTraits.

Two regions with the same label specification may not contain the same number of amino-acids, which typically happens in two cases:

  • for proteins corresponding to the same sequence, when selected regions could not be crystallized.
  • for (homologous) proteins whose sequences differ.

In any case, for each label, a local alignment is run to ensure a 1-1 correspondence between a.a. In the SBL, alignments are defined in the package Alignment_engines.

We actually consider structural alignments in two settings:

  • Two chains with identical sequences: the alignment is trivial, as common residues are retained and aligned. That is, ensuring the 1-1 correspondence amounts to taking the intersection of the lists of resids.

We note that for PDB files containing structures of the same polypeptide chain (same SEQRES section), the numberings of amino acids in the ATOM section are expected to be aligned. See primary-sequences-and-the-pdb-format for details.

The executable corresponding to this case is $\text{\sblrmsdcombconf}$.

  • Two chains whose sequences differ: a structural alignment is computed using $\text{\kpax}$ or $\text{\apurva}$ – see above. The corresponding executables are $\text{\sblrmsdcombprotkpax}$ and $\text{\sblrmsdcombprotapurva}$.

Two remarks are in order:
  • The rationale for performing a structural alignment is that motifs are primarily defined based on structural elements – whence the use of $\text{\kpax}$. For long motifs, a sequence alignment could be envisioned too.
  • In the sequel, it is assumed that any local alignment involves 4 or more residues; if not, a warning is issued, and the $\text{\lrmsd}$ output for the corresponding comparisons is set to -1.
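For chains with identical sequences, the trivial alignment and the threshold on its size can be sketched as follows. This is an illustration only: residues are reduced to integer resids, and the names `identity_alignment`, `is_usable` and `MIN_ALIGNED` are hypothetical.

```python
# Hypothetical sketch: residues are identified by their resid (an integer).
MIN_ALIGNED = 4  # threshold stated in the remark above

def identity_alignment(resids_a, resids_b):
    """Pair the residues present in both regions, ordered by resid."""
    common = sorted(set(resids_a) & set(resids_b))
    return [(r, r) for r in common]

def is_usable(alignment):
    """Alignments with fewer than MIN_ALIGNED pairs are those reported with lRMSD = -1."""
    return len(alignment) >= MIN_ALIGNED

# The region of conformation 1 misses residue 12; that of conformation 2 misses residue 15.
region_1 = [10, 11, 13, 14, 15]
region_2 = [10, 11, 12, 13, 14]
pairs = identity_alignment(region_1, region_2)
print(pairs)             # the four common residues: 10, 11, 13, 14
print(is_usable(pairs))
```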


Combined RMSD for proteins

Pre-requisites

Vertex weighted $\lrmsd$. Consider two point sets $A=\{ a_i \}$ and $B=\{ b_i \}$ of size $N$. Naturally, each point corresponds to an atom or pseudo-atom – which we generically call a particle.

Also consider a set of positive weights $\{ w_i\}_{i=1,\dots, N}$, meant to stress the importance of certain points. The weighted $\rmsd$ reads as

\begin{equation} \rmsdw{A}{B} = \sqrt{\frac{1}{\sum_i w_i} \sum_{i=1,\dots, N} w_i \vvnorm{a_i-b_i}^2} \end{equation}
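The weighted $\rmsd$ can be evaluated directly from its definition. Below is a minimal pure-Python sketch, with no superposition involved; the function name `weighted_rmsd` is chosen for the illustration and does not refer to the SBL API.

```python
import math

def weighted_rmsd(points_a, points_b, weights):
    """Weighted RMSD between two equally sized lists of 3D points (no superposition)."""
    if not (len(points_a) == len(points_b) == len(weights)):
        raise ValueError("the two point sets and the weights must have the same size")
    total = 0.0
    for a, b, w in zip(points_a, points_b, weights):
        total += w * sum((x - y) ** 2 for x, y in zip(a, b))
    return math.sqrt(total / sum(weights))

A = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
B = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
print(weighted_rmsd(A, B, [1.0, 1.0]))  # unit weights: sqrt(4 / 2)
print(weighted_rmsd(A, B, [3.0, 1.0]))  # displaced point down-weighted: sqrt(4 / 4)
```

Increasing the weight of the first, perfectly matched point lowers the value, which is the intended effect of stressing certain points.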

Let $g$ be a rigid motion from the special Euclidean group $SE(3)$. To compare $A$ and $B$ independently of rigid motions, we use the so-called least RMSD [103]:

The vertex weighted $\lrmsd$ is defined by

\begin{equation} \lrmsdvw{A}{B} = \min_{g\in SE(3)} \rmsdw{A}{g(B)}. \end{equation}

The rigid motion yielding the minimum is denoted $\lrmsdoptrm{}(A,B)$ or $\lrmsdoptrm{}$ for short. The weight of the $\lrmsdvw$ is defined as $\rmsdW{vw}{A}{B} = \sum_i w_i$.
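As a reminder, and as a standard property of weighted least-squares rigid registration rather than a description of this package's implementation, the translation part of the optimal rigid motion is fixed by the weighted centroids:

\begin{equation} \bar{a} = \frac{\sum_i w_i a_i}{\sum_i w_i}, \qquad \bar{b} = \frac{\sum_i w_i b_i}{\sum_i w_i}, \qquad g(b) = R\,(b - \bar{b}) + \bar{a}, \end{equation}

so that only the rotation $R$ remains to be optimized. Expanding the squared norms shows that minimizing $\rmsdw{A}{g(B)}$ amounts to maximizing $\sum_i w_i\, (a_i - \bar{a})^T R\, (b_i - \bar{b})$, which is classically solved via a singular value decomposition of the $3 \times 3$ weighted covariance matrix $\sum_i w_i\, (b_i - \bar{b}) (a_i - \bar{a})^T$.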


Note that the celebrated $\lrmsd$ is the particular case of the previous definition with unit weights:

\begin{equation} \lrmsd{A}{B} = \lrmsdvw{A}{B} \text{ with } w_i \equiv 1, \forall i. \end{equation}

We arrive at the main definition, which combines individual RMSD:

Consider two structures $A$ and $B$ for which non-overlapping regions $ \{ \motifacc{i}, \motifbcc{i} \}_{i=1,\dots,m}$ have been identified. Assume that a $\lrmsd$ has been computed for each pair $(\motifacc{i}, \motifbcc{i} )$. Let $w_i$ be the weight associated with the $i$-th $\lrmsd$. The combined $\rmsd$ is defined by

\begin{equation} \rmsdcomb{A}{B} = \sqrt{ \sum_{i=1}^m \frac{w_i}{\sum_{j=1}^m w_j} \lrmsds{ \motifacc{i}}{\motifbcc{i}} }. \end{equation}
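The combination step itself is a weighted quadratic mean. The sketch below assumes that the term under the square root is the squared $\lrmsd$ of each region; the function name `combined_rmsd` is hypothetical.

```python
import math

def combined_rmsd(lrmsds, weights):
    """Combine per-region lRMSD values into a combined RMSD (weighted quadratic mean)."""
    if len(lrmsds) != len(weights) or not lrmsds:
        raise ValueError("need one weight per lRMSD, and at least one region")
    total_weight = sum(weights)
    return math.sqrt(sum(w * d * d for d, w in zip(lrmsds, weights)) / total_weight)

# Two regions: a well conserved one (weight 3) and a more variable one (weight 1).
print(combined_rmsd([1.0, 2.0], [3.0, 1.0]))  # sqrt((3*1 + 1*4) / 4) = sqrt(1.75)
```

With a single region, the combined value reduces to the $\lrmsd$ of that region, whatever its weight.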


Chain mappings for quaternary structures. Consider the case where one wishes to compare chains from different quaternary structures. In that case, one needs to know the correspondence between the individual chains across these structures. To this end, we define:

A mapping between the $m$ chains of $n$ proteins is given in matrix form as follows:
  • one line provides an ordered list of the $m$ chains of one protein. Phrased differently, two lines define a matching between the $m$ chains of the two corresponding proteins.
  • one column corresponds to the chains of the $n$ proteins to be compared.
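To make the matrix form concrete, the following sketch enumerates the pairwise comparisons induced by each column. The in-memory representation, the protein names and the chain identifiers are made up for the illustration; the actual format of the mapping file is not described here.

```python
from itertools import combinations

# Hypothetical in-memory mapping for n = 3 proteins with m = 3 chains each:
# one line per protein, one column per group of corresponding chains.
mapping = [
    ("P1", ["A", "B", "C"]),
    ("P2", ["A", "C", "B"]),
    ("P3", ["B", "A", "C"]),
]

def comparisons_per_column(mapping):
    """For each column, list the pairs of (protein, chain) to be compared."""
    m = len(mapping[0][1])
    if any(len(chains) != m for _, chains in mapping):
        raise ValueError("every protein must list the same number of chains")
    result = []
    for col in range(m):
        column = [(name, chains[col]) for name, chains in mapping]
        result.append(list(combinations(column, 2)))
    return result

for col, pairs in enumerate(comparisons_per_column(mapping)):
    print("column", col, pairs)
```

Each column yields $\binom{n}{2}$ comparisons, here three per column.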


Functionalities and scenarios

Functionalities. The executables $\text{\sblrmsdcombprotkpax}$ and $\text{\sblrmsdcombprotapurva}$ compute a vertex weighted $\rmsd$ for homologous proteins. A pairwise comparison therefore requires an alignment, as discussed below.

Scenarios. The scenarios discussed below depend on

  • the number of proteins processed – one, two or more
  • the number of polypeptide chains of interest.

Options to compare polypeptide chains. The programs $\text{\sblrmsdcombprotkpax}$ and $\text{\sblrmsdcombprotapurva}$ share the same options, the main ones being:

The main options of the program $\sblrmsdcombprotkpax$ are:
--pdb-file string: PDB file
--domain-labels: Spec. file defining the subdomain labels
--chains: Which chains to load
-d: Output directory
--allow-incomplete-chains: Load the polypeptide chains even if they are incomplete
-p: Set occupancy policy


Options to compare conformations. As noticed above, the identity alignment between residues can be used. The options are:

The main options of the program $\sblrmsdcombconf$ are:
--pdb-file string: PDB file
--domain-labels: Spec. file defining the subdomain labels
--chains: Which chains to load
-d: Output directory


The reader is referred to the jupyter notebook – section Jupyter demo for illustrations.

Comparing n polypeptide chains of one protein

This case is that of a protein with quaternary structure, in which case we compare the polypeptide chains.

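Input. A PDB file for the protein, a spec file defining the labels, and the identifiers of the chains to compare.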

Output. We report the matrix of $\text{\rmsdcomb}$ distances.

sbl-rmsd-flexible-proteins.exe --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains ABC -d results/onepnc -v -l


Comparing two polypeptide chains of two proteins

Consider two proteins, with a focus on one chain for each.

Input.

  • A PDB file for each chain.
  • One spec file per chain for the sub-domain definitions (as in MolecularSystemLabelsTraits).

Output. The $\text{\rmsdcomb}$ is reported.

$SBL_DIR/Applications/Molecular_distances_flexible/src/Molecular_distances_flexible/build/sbl-rmsd-flexible-proteins.exe --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains A --pdb-file data/pdb_files/TBEV.pdb --domain-labels data/spec_files/TBEV.spec --chains A -d results/twoponec -v -l --allow-incomplete-chains -p 3


Comparing n polypeptide chains of two proteins

Input. Consider two proteins, each with n chains, specified as follows:

  • A PDB file for each protein,
  • A spec file for each protein.

Output. A matrix of $\text{\rmsdcomb}$ is reported – one entry for each pair of chains.

$SBL_DIR/Applications/Molecular_distances_flexible/src/Molecular_distances_flexible/build/sbl-rmsd-flexible-proteins.exe --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains ABC --pdb-file data/pdb_files/TBEV.pdb --domain-labels data/spec_files/TBEV.spec --chains ABC -d results/twoponec -v -l --allow-incomplete-chains -p 3


Comparing n polypeptide chains

Input.

  • A PDB file for each chain.
  • One spec file per chain for the sub-domain definitions (as in MolecularSystemLabelsTraits).

Output. The matrix of all pairwise $\text{\rmsdcomb}$ distances is reported – one entry for each pair of chains.

$SBL_DIR/Applications/Molecular_distances_flexible/src/Molecular_distances_flexible/build/sbl-rmsd-flexible-proteins.exe --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains A --pdb-file data/pdb_files/TBEV.pdb --domain-labels data/spec_files/TBEV.spec --chains A --pdb-file data/pdb_files/EFF1.pdb --domain-labels data/spec_files/EFF1.spec --chains A -d results/nponec -v -l --allow-incomplete-chains -p 3
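Since the command repeats one group of --pdb-file, --domain-labels and --chains options per structure, it can be convenient to assemble it programmatically. The helper `build_command` below is a hypothetical convenience, not part of the SBL; it only uses flags appearing in the commands of this manual.

```python
import shlex

def build_command(executable, structures, out_dir, extra_args=()):
    """Assemble the argument list for a run on several structures.

    structures: list of (pdb_file, spec_file, chains) triples.
    """
    argv = [executable]
    for pdb_file, spec_file, chains in structures:
        argv += ["--pdb-file", pdb_file, "--domain-labels", spec_file, "--chains", chains]
    argv += ["-d", out_dir]
    argv += list(extra_args)
    return argv

argv = build_command(
    "sbl-rmsd-flexible-proteins.exe",
    [("data/pdb_files/SFV-1RER.pdb", "data/spec_files/SFV.spec", "A"),
     ("data/pdb_files/TBEV.pdb", "data/spec_files/TBEV.spec", "A"),
     ("data/pdb_files/EFF1.pdb", "data/spec_files/EFF1.spec", "A")],
    "results/nponec",
    extra_args=["-v", "-l", "--allow-incomplete-chains", "-p", "3"],
)
print(shlex.join(argv))
# The list can then be passed to subprocess.run(argv, check=True).
```

Passing a list of arguments rather than a formatted string avoids quoting issues when file names contain spaces.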


Comparing m polypeptide chains of n proteins

Input. Consider now a set of $n$ proteins, each involving $m$ chains defining the common quaternary structure.

We assume that a mapping between chains is provided – see Def. def-chain-mapping. The corresponding options are the following ones:

The main options of the program $\sblrmsdcombprotkpax$ are:
--chain-mapping: The mapping between the chains which should be compared


The main options of the program $\sblrmsdcombprotapurva$ are:
--chain-mapping: The mapping between the chains which should be compared


Output. We report $\binom{n}{2}(m+1)$ numbers, corresponding to $m+1$ distance matrices, stored in $m+1$ files:

  • For each column of the mapping i.e. each chain, we report in a file the $\binom{n}{2}$ numbers corresponding to pairwise comparisons between the chains identified by the column – that is each entry corresponds to a comparison between two files.
  • The last file contains weighted lRMSD for all pairs of proteins. For a pair of proteins $P_i$ and $P_j$, we also mix the weighted lRMSD obtained by comparing their chains (pairwise).
$SBL_DIR/Applications/Molecular_distances_flexible/src/Molecular_distances_flexible/build/sbl-rmsd-flexible-proteins.exe --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains ABC --pdb-file data/pdb_files/TBEV.pdb --domain-labels data/spec_files/TBEV.spec --chains ABC --pdb-file data/pdb_files/EFF1.pdb --domain-labels data/spec_files/EFF1.spec --chains ABC --chain-mapping data/mapping.txt -d results/npnc -v -l --allow-incomplete-chains -p 3


../examples/results/nponec/sbl-rmsd-flexible-proteins__weighted_lrmsd_0.txt Distance matrix One file per chain mapping. Contains the distance matrix for the N proteins and a given chain.


Combined RMSD for conformations

Pre-requisites

Functionalities. The executable $\text{\sblrmsdcombconf}$ computes the $\text{\rmsdcomb}$ (vertex weighted) for conformations of a given protein. Note that a structural alignment is no longer necessary: the trivial identity alignment is used instead. The executable behaves as follows:

  • Compares N conformations of the same molecule, on a pairwise basis. That is, for each pair, the $\lrmsd$ of specified structural units are computed, and combined into a $\rmsdcomb$.
  • If several chains are loaded (per conformation), a chain mapping is required – see Definition def-chain-mapping.

Note that the run scenarios are the same as for the previous executables (see Pre-requisites). For example runs, the user is referred to Combined RMSD for proteins.

Combined RMSD for structural motifs

Pre-requisites

Structural motifs. As explained in the companion package Structural_motifs , structural motifs are regions showing a structural conservation higher than that of the structures defining them. For two structures, a motif is defined by two sets of a.a. in one-to-one correspondence – that is one set of a.a. on each structure.

Motif graph for overlapping motifs. When several motifs exist for two structures, an important question is to handle them coherently. Since motifs may overlap, we define:

(Motif graph) The motif graph of a list of motifs $\{ (\motifa{i}, \motifb{i}) \}_{i=1,\dots,p}$ is defined as follows: its node set is the union of the particles $A$ and $B$; its edge set is the union of two types of edges:
  • matching edges: the edges associated with the matchings defined by the motifs. NB: such edges are counted without multiplicity, that is, a matching edge present in several motifs is counted once.
  • motif edges: edges defining a path connecting all amino acids in a motif.
Consider a connected component (c.c.) of the motif graph. Restricting each c.c. to each structure yields two subgraphs. The set of all such subgraphs is denoted $\{ \motifacc{i}, \motifbcc{i} \}_{i=1,\dots,m}$.


Consider now the case where motifs have been defined for the two structures $A$ and $B$. We wish to compare $A$ and $B$ exploiting the information yielded by the connected components of the motif graph.

Consider the i-th c.c. of the motif graph. Let $e_i$ be the number of matching edges of this c.c. As usual, let $g(b_j)$ be the position of atom $b_j$ from $\motifbcc{i}$ matched with atom $a_j$ from $\motifacc{i}$, upon applying a rigid motion $g$. We define:

The edge weighted $\lrmsdew$ of the i-th c.c. of the motif graph is defined by

\begin{align} \lrmsdew{\motifacc{i}}{\motifbcc{i}} &= \min_{g\in SE(3)} \sqrt{\frac{1}{e_i} \sum_{j=1}^{e_i} \vvnorm{a_j - g(b_j)}^2 } \end{align}

The rigid motion yielding the minimum is denoted $\lrmsdoptrm{i}$. The weight of the $\lrmsdew$ is defined as $\rmsdW{ew}{\motifacc{i}}{\motifbcc{i}} = e_i$.
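The connected components of the motif graph and their weights $e_i$ can be obtained with a union-find structure. The sketch below is an illustration under simplifying assumptions: a motif is reduced to a list of matched residue identifiers, motif edges connect consecutive residues of a motif on each structure, and the function name is hypothetical.

```python
# Hypothetical sketch: a motif is a list of matched residue pairs (resid in A, resid in B).

def motif_graph_components(motifs):
    """Return, per connected component, the sorted list of its distinct matching edges."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    matching_edges = set()  # matching edges are counted without multiplicity
    for motif in motifs:
        for ra, rb in motif:
            matching_edges.add((ra, rb))
            union(("A", ra), ("B", rb))               # matching edge
        for (ra1, rb1), (ra2, rb2) in zip(motif, motif[1:]):
            union(("A", ra1), ("A", ra2))             # motif edge on structure A
            union(("B", rb1), ("B", rb2))             # motif edge on structure B

    components = {}
    for ra, rb in matching_edges:
        components.setdefault(find(("A", ra)), []).append((ra, rb))
    return sorted(sorted(edges) for edges in components.values())

motifs = [
    [(1, 11), (2, 12), (3, 13)],   # motif 1
    [(3, 13), (4, 14)],            # motif 2 overlaps motif 1 on the pair (3, 13)
    [(8, 30), (9, 31)],            # motif 3 is disjoint from the others
]
for edges in motif_graph_components(motifs):
    print(len(edges), edges)       # e_i, then the matching edges of the component
```

In this example the two overlapping motifs merge into a single component with four matching edges, the shared pair being counted once.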


This definition recalled, we note that the $\rmsdcomb$ from Eq. def-rmsd-comb generalizes, using edge weighted rather than vertex weighted $\rmsd$.

Using

The input requires two structures (PDB files) and a specification of motifs. The motifs are defined as an identifier followed by a list of aligned residues (example file below).

sbl-rmsd-flexible-motifs.exe --pdb-file data/pdb_files/SFV-1RER.pdb --chains A --pdb-file data/pdb_files/RVFV.pdb --motif-file data/motifs.txt --allow-incomplete-chains -p 3 -d results/motifs -v -l


The main options of the program $\sblrmsdcombmot$ are:
--pdb-file string: PDB file
--motif-file: Spec. file defining the motifs
--chains: Which chains to load


../examples/results/onepnc/sbl-rmsd-flexible-motifs_motif_graph.xml Motif graph XML archive containing all the residues in the motif graph components


Programmer's Workflow

Pre-requisites

The implementation rationale behind the four executables is straightforward. Each executable has a workflow consisting of one loader (SBL::IO::T_Protein_representation_loader, see Protein_representation) and one module. Two different modules are used among the workflows:

  • SBL::Modules::T_Subdomain_comparator_module< ModuleTraits > is used in the $\sblrmsdcombconf$, $\sblrmsdcombprotkpax$ and $\sblrmsdcombprotapurva$ executables. It simply instantiates the pairwise comparisons between all the specified labels. For $\sblrmsdcombconf$, it uses the trivial identity alignment (from the ModuleTraits class); for the other two, it uses $\text{\kpax}$ from Iterative_alignment or $\text{\apurva}$ from Apurva.
  • SBL::Modules::T_RMSD_comb_for_motifs_module< ModuleTraits > is used in $\sblrmsdcombmot$. See Molecular_distances for more details.

Workflow example

The workflows are extremely basic. One of them is displayed below as an example.

T_Local_structural_comparison_workflow:

Implementation using Kpax

We note in passing that the implementation of $\sblrmsdcombprotkpax$ is done as follows: this package i.e. Molecular_distances_flexible defines in the file Structure_for_kpax.hpp the structure which is used to instantiate $\text{\kpax}$ from package Iterative_alignment; in turn, $\text{\kpax}$ uses alignment data structures from Alignment_engines.

Jupyter demo

See the following jupyter notebook:

  • Jupyter notebook file
  • Molecular_distances_flexible

    Molecular_distances_flexible

    We illustrate calculations involving the so-called combined RMSD or RMSD Comb. As test case, we use class II fusion proteins, decomposing each polypeptide chain into 23 regions (see preprint by Tetley et al).

    Preparing directories for the calculations to be carried out

    In [24]:
    import os
    import sys
    import pdb
    
    from SBL import SBL_pytools
    from SBL_pytools import SBL_pytools as sblpyt
    
    
    odir = "results-new"
    sdirs = ["n-pc-one-protein", "two-pc", "n-pc-two-proteins", "n-pc", "m-pc-n-proteins", "motifs"]
    
    # Create one output sub-directory per scenario
    for sdir in sdirs:
        os.makedirs(os.path.join(odir, sdir), exist_ok=True)
    

    (User manual, section 2.3) Comparing n polypeptide chains of one protein

    • We report the matrix of RMSDcomb distances
    • NB: calculations using kpax and apurva are shown for the sake of completeness. But since the sequences of the three chains are identical, the alignments are trivial, and the results for kpax and apurva are identical.
    In [32]:
    def cmp_proteins_with_aligner_n_pc_one_protein(odir, aligner, aligner_tag):
    
        osdir = "n-pc-one-protein"
        cmd = "%s --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains ABC -d %s/%s -v -l" % (aligner,odir,osdir)
        print(("Running %s" % cmd))
        os.system(cmd)
    
        ofn = "%s/%s/sbl-rmsd-flexible-proteins-%s__weighted_lrmsd.txt" % (odir,osdir,aligner_tag)
        odir_osdir = "%s/%s" % (odir,osdir)
        file_suffix = "%s__weighted_lrmsd.txt" % aligner_tag
        sblpyt.show_text_file(file_suffix, odir_osdir)
     
    aligner_kpax = "sbl-rmsd-flexible-proteins-kpax.exe"   # kpax
    cmp_proteins_with_aligner_n_pc_one_protein(odir, aligner_kpax, "kpax")
    
    aligner_apurva = "sbl-rmsd-flexible-proteins-apurva.exe" # apurva
    cmp_proteins_with_aligner_n_pc_one_protein(odir, aligner_apurva, "apurva")
    
    Running sbl-rmsd-flexible-proteins-kpax.exe --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains ABC -d results-new/n-pc-one-protein -v -l
    Showing file results-new/n-pc-one-protein/sbl-rmsd-flexible-proteins-kpax__weighted_lrmsd.txt
    
    ++Showing file results-new/n-pc-one-protein/sbl-rmsd-flexible-proteins-kpax__weighted_lrmsd.txt
    	A	B	C
    A
    B	0.0316797
    C	0.150864	0.155959
    --Done
    
    
    Running sbl-rmsd-flexible-proteins-apurva.exe --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains ABC -d results-new/n-pc-one-protein -v -l
    Showing file results-new/n-pc-one-protein/sbl-rmsd-flexible-proteins-apurva__weighted_lrmsd.txt
    
    ++Showing file results-new/n-pc-one-protein/sbl-rmsd-flexible-proteins-apurva__weighted_lrmsd.txt
    	A	B	C
    A
    B	0.0316797
    C	0.150864	0.155959
    --Done
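The matrix files displayed above can be post-processed with a few lines of Python. The parser below is a sketch which assumes that the file is tab-separated and lower-triangular, as suggested by the display; the function name is hypothetical.

```python
def parse_lower_triangular(text):
    """Parse a tab-separated lower-triangular matrix into {(row, col): value}."""
    lines = [line for line in text.splitlines() if line.strip()]
    labels = lines[0].split("\t")[1:]          # header: empty cell, then column labels
    distances = {}
    for line in lines[1:]:
        fields = line.rstrip().split("\t")
        row = fields[0]
        for col, value in zip(labels, fields[1:]):
            distances[(row, col)] = float(value)
    return distances

# Content of the kpax output file displayed above.
text = "\tA\tB\tC\nA\nB\t0.0316797\nC\t0.150864\t0.155959\n"
distances = parse_lower_triangular(text)
print(distances[("C", "A")], distances[("C", "B")])
print(max(distances, key=distances.get))       # most dissimilar pair of chains
```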
    
    
    

    (User manual, section 2.4) Comparing two polypeptide chains of two proteins

    • The decomposition of each chain is specified using labels
    • The RMSDcomb is reported
    In [28]:
    def cmp_proteins_with_aligner_two_pc(odir, aligner, aligner_tag):
    
        osdir = "two-pc"
        cmd = "%s --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains A --pdb-file data/pdb_files/TBEV.pdb --domain-labels data/spec_files/TBEV.spec --chains A -d %s/%s -v -l --allow-incomplete-chains -p 3" % (aligner,odir,osdir)
        print(("Running %s" % cmd))
        os.system(cmd)
         
        odir_osdir = "%s/%s" % (odir,osdir)
        file_suffix = "%s__labels_lrmsd.txt" % aligner_tag
        sblpyt.show_text_file(file_suffix, odir_osdir)
    
    aligner_kpax = "sbl-rmsd-flexible-proteins-kpax.exe"   # kpax
    cmp_proteins_with_aligner_two_pc(odir, aligner_kpax, "kpax")
    
    aligner_apurva = "sbl-rmsd-flexible-proteins-apurva.exe" # apurva
    cmp_proteins_with_aligner_two_pc(odir, aligner_apurva, "apurva")
    
    Running sbl-rmsd-flexible-proteins-kpax.exe --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains A --pdb-file data/pdb_files/TBEV.pdb --domain-labels data/spec_files/TBEV.spec --chains A -d results-new/two-pc -v -l --allow-incomplete-chains -p 3
    Showing file results-new/two-pc/sbl-rmsd-flexible-proteins-kpax__labels_lrmsd.txt
    
    ++Showing file results-new/two-pc/sbl-rmsd-flexible-proteins-kpax__labels_lrmsd.txt
    Label comparison between chain A of SFV-1RER and chain A of TBEV
    A 2.969	9.000
    B 1.555	7.000
    B0 0.857	7.000
    C 0.373	4.000
    C0 0.394	4.000
    D0 1.379	11.000
    E 0.359	5.000
    E0 0.943	7.000
    F 1.136	7.000
    F0 0.514	5.000
    G 0.345	6.000
    G0 1.276	5.000
    H0 0.561	5.000
    I0 1.362	8.000
    a 0.503	6.000
    b 0.833	9.000
    c 1.228	7.000
    d 0.817	9.000
    e 0.650	5.000
    f 0.303	4.000
    g 0.150	4.000
    h -1.000	0.000
    i 0.609	6.000
    j 0.688	5.000
    k -1.000	0.000
    l -1.000	0.000
    --Done
    
    
    Running sbl-rmsd-flexible-proteins-apurva.exe --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains A --pdb-file data/pdb_files/TBEV.pdb --domain-labels data/spec_files/TBEV.spec --chains A -d results-new/two-pc -v -l --allow-incomplete-chains -p 3
    Showing file results-new/two-pc/sbl-rmsd-flexible-proteins-apurva__labels_lrmsd.txt
    
    ++Showing file results-new/two-pc/sbl-rmsd-flexible-proteins-apurva__labels_lrmsd.txt
    Label comparison between chain A of SFV-1RER and chain A of TBEV
    A 2.969	9.000
    B 1.555	7.000
    B0 1.642	7.000
    C 0.691	5.000
    C0 0.590	5.000
    D0 2.125	11.000
    E 0.862	5.000
    E0 1.434	8.000
    F 1.136	7.000
    F0 1.013	5.000
    G 1.505	6.000
    G0 1.276	5.000
    H0 0.807	5.000
    I0 1.362	8.000
    a 0.503	6.000
    b 2.288	10.000
    c 1.826	7.000
    d 1.496	9.000
    e 0.650	5.000
    f 0.501	4.000
    g 1.288	4.000
    h -1.000	0.000
    i 1.619	6.000
    j 0.841	5.000
    k -1.000	0.000
    l -1.000	0.000
    --Done
    
    
    

    (User manual, section 2.5) Comparing n polypeptide chains of two proteins

    • The decomposition of each chain is specified using labels
    • The matrix of all pairwise distances RMSDcomb (between pairs of chains) is reported
    In [30]:
    def cmp_proteins_with_aligner_n_pc_two_proteins(odir, aligner, aligner_tag):
    
        osdir = "n-pc-two-proteins"
        cmd = "%s --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains ABC --pdb-file data/pdb_files/TBEV.pdb --domain-labels data/spec_files/TBEV.spec --chains ABC -d %s/%s -v -l --allow-incomplete-chains -p 3" % (aligner,odir,osdir)
        print(("Running %s" % cmd))
        os.system(cmd)
    
        odir_osdir = "%s/%s" % (odir,osdir)
        file_suffix = "%s__weighted_lrmsd.txt" % aligner_tag
        sblpyt.show_text_file(file_suffix, odir_osdir)
    
    aligner_kpax = "sbl-rmsd-flexible-proteins-kpax.exe"   # kpax
    cmp_proteins_with_aligner_n_pc_two_proteins(odir, aligner_kpax, "kpax")
    
    aligner_apurva = "sbl-rmsd-flexible-proteins-apurva.exe" # apurva
    cmp_proteins_with_aligner_n_pc_two_proteins(odir, aligner_apurva, "apurva")
    
    Running sbl-rmsd-flexible-proteins-kpax.exe --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains ABC --pdb-file data/pdb_files/TBEV.pdb --domain-labels data/spec_files/TBEV.spec --chains ABC -d results-new/n-pc-two-proteins -v -l --allow-incomplete-chains -p 3
    Showing file results-new/n-pc-two-proteins/sbl-rmsd-flexible-proteins-kpax__weighted_lrmsd.txt
    
    ++Showing file results-new/n-pc-two-proteins/sbl-rmsd-flexible-proteins-kpax__weighted_lrmsd.txt
    	SFV-1RER_A	SFV-1RER_B	SFV-1RER_C
    TBEV_A	1.167	1.171	1.185
    TBEV_B	1.165	1.168	1.203
    TBEV_C	1.161	1.164	1.199
    --Done
    
    
    Running sbl-rmsd-flexible-proteins-apurva.exe --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains ABC --pdb-file data/pdb_files/TBEV.pdb --domain-labels data/spec_files/TBEV.spec --chains ABC -d results-new/n-pc-two-proteins -v -l --allow-incomplete-chains -p 3
    Showing file results-new/n-pc-two-proteins/sbl-rmsd-flexible-proteins-apurva__weighted_lrmsd.txt
    
    ++Showing file results-new/n-pc-two-proteins/sbl-rmsd-flexible-proteins-apurva__weighted_lrmsd.txt
    	SFV-1RER_A	SFV-1RER_B	SFV-1RER_C
    TBEV_A	1.575	1.580	1.581
    TBEV_B	1.574	1.579	1.579
    TBEV_C	1.567	1.572	1.572
    --Done
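The matrix entries can be related to the per-label output of section 2.4. The sketch below recombines the kpax per-label values for chain A of SFV-1RER and chain A of TBEV, assuming that the second numeric column is the weight of the label and that labels flagged with -1 are skipped; the function name is hypothetical.

```python
import math

# Lines copied from the kpax per-label file shown in section 2.4: label, lRMSD, weight.
labels_kpax = """
A 2.969 9.000
B 1.555 7.000
B0 0.857 7.000
C 0.373 4.000
C0 0.394 4.000
D0 1.379 11.000
E 0.359 5.000
E0 0.943 7.000
F 1.136 7.000
F0 0.514 5.000
G 0.345 6.000
G0 1.276 5.000
H0 0.561 5.000
I0 1.362 8.000
a 0.503 6.000
b 0.833 9.000
c 1.228 7.000
d 0.817 9.000
e 0.650 5.000
f 0.303 4.000
g 0.150 4.000
h -1.000 0.000
i 0.609 6.000
j 0.688 5.000
k -1.000 0.000
l -1.000 0.000
"""

def combined_from_labels(text):
    """Weighted quadratic mean of the per-label lRMSD values."""
    total, total_weight = 0.0, 0.0
    for line in text.strip().splitlines():
        _label, lrmsd, weight = line.split()
        lrmsd, weight = float(lrmsd), float(weight)
        if lrmsd < 0:          # -1 flags a label with too few aligned residues
            continue
        total += weight * lrmsd ** 2
        total_weight += weight
    return math.sqrt(total / total_weight)

print(round(combined_from_labels(labels_kpax), 3))  # 1.167
```

Under these assumptions, the value agrees with the entry for TBEV_A and SFV-1RER_A in the kpax matrix above.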
    
    
    

    (User manual, section 2.6) Comparing n polypeptide chains

    • The decomposition of each chain is specified using labels
    • The chain of interest in each file is specified
    • The matrix of all pairwise distances RMSDcomb (between pairs of chains) is reported
    In [31]:
    def cmp_proteins_with_aligner_n_pc(odir,aligner, aligner_tag):
        osdir = "n-pc"
        cmd = "%s --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains A --pdb-file data/pdb_files/TBEV.pdb --domain-labels data/spec_files/TBEV.spec --chains A --pdb-file data/pdb_files/EFF1.pdb --domain-labels data/spec_files/EFF1.spec --chains A -d %s/%s -v -l --allow-incomplete-chains -p 3" % (aligner,odir,osdir)
        print(("Running %s" % cmd))
        os.system(cmd)
    
      
        odir_osdir = "%s/%s" % (odir,osdir)
        file_suffix = "%s__weighted_lrmsd.txt" % aligner_tag
        sblpyt.show_text_file(file_suffix, odir_osdir)
    
    aligner_kpax = "sbl-rmsd-flexible-proteins-kpax.exe"   # kpax
    cmp_proteins_with_aligner_n_pc(odir,aligner_kpax, "kpax")
     
    aligner_apurva = "sbl-rmsd-flexible-proteins-apurva.exe" # apurva
    cmp_proteins_with_aligner_n_pc(odir,aligner_apurva, "apurva")
     
    
    Running sbl-rmsd-flexible-proteins-kpax.exe --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains A --pdb-file data/pdb_files/TBEV.pdb --domain-labels data/spec_files/TBEV.spec --chains A --pdb-file data/pdb_files/EFF1.pdb --domain-labels data/spec_files/EFF1.spec --chains A -d results-new/n-pc -v -l --allow-incomplete-chains -p 3
    Showing file results-new/n-pc/sbl-rmsd-flexible-proteins-kpax__weighted_lrmsd.txt
    
    ++Showing file results-new/n-pc/sbl-rmsd-flexible-proteins-kpax__weighted_lrmsd.txt
    	SFV-1RER	TBEV	EFF1
    SFV-1RER	0	1.167	0.982
    TBEV	1.167	0	1.158
    EFF1	0.982	1.158	0
    --Done
    
    
    Running sbl-rmsd-flexible-proteins-apurva.exe --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains A --pdb-file data/pdb_files/TBEV.pdb --domain-labels data/spec_files/TBEV.spec --chains A --pdb-file data/pdb_files/EFF1.pdb --domain-labels data/spec_files/EFF1.spec --chains A -d results-new/n-pc -v -l --allow-incomplete-chains -p 3
    Showing file results-new/n-pc/sbl-rmsd-flexible-proteins-apurva__weighted_lrmsd.txt
    
    ++Showing file results-new/n-pc/sbl-rmsd-flexible-proteins-apurva__weighted_lrmsd.txt
    	SFV-1RER	TBEV	EFF1
    SFV-1RER	0	1.575	1.356
    TBEV	1.575	0	1.539
    EFF1	1.356	1.539	0
    --Done
    
    
    

    (User manual, section 2.7) Comparing m polypeptide chains of n proteins

    • As above, except that one matrix of RMSDcomb per chain is reported
    In [33]:
    def cmp_proteins_with_aligner_m_pc_n_chains(odir,aligner, aligner_tag):
        osdir = "m-pc-n-proteins"
        cmd = "%s --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains ABC --pdb-file data/pdb_files/TBEV.pdb --domain-labels data/spec_files/TBEV.spec --chains ABC --pdb-file data/pdb_files/EFF1.pdb --domain-labels data/spec_files/EFF1.spec --chains ABC --chain-mapping data/mapping.txt -d %s/%s -v -l --allow-incomplete-chains -p 3" % (aligner,odir,osdir)
        print(("Running %s" % cmd))
        os.system(cmd)
        
        odir_osdir = "%s/%s" % (odir,osdir)
            
        file_suffix = "%s__weighted_lrmsd_chain_0.txt" % aligner_tag
        sblpyt.show_text_file(file_suffix, odir_osdir)
        
        file_suffix = "%s__weighted_lrmsd_chain_1.txt" % aligner_tag
        sblpyt.show_text_file(file_suffix, odir_osdir)
          
        file_suffix = "%s__weighted_lrmsd_chain_2.txt" % aligner_tag
        sblpyt.show_text_file(file_suffix, odir_osdir)
    
    
        
    aligner_kpax = "sbl-rmsd-flexible-proteins-kpax.exe"   # kpax
    cmp_proteins_with_aligner_m_pc_n_chains(odir,aligner_kpax, "kpax")
     
    aligner_apurva = "sbl-rmsd-flexible-proteins-apurva.exe" # apurva
    cmp_proteins_with_aligner_m_pc_n_chains(odir,aligner_apurva, "apurva")
     
    
    Running sbl-rmsd-flexible-proteins-kpax.exe --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains ABC --pdb-file data/pdb_files/TBEV.pdb --domain-labels data/spec_files/TBEV.spec --chains ABC --pdb-file data/pdb_files/EFF1.pdb --domain-labels data/spec_files/EFF1.spec --chains ABC --chain-mapping data/mapping.txt -d results-new/m-pc-n-proteins -v -l --allow-incomplete-chains -p 3
    Showing file results-new/m-pc-n-proteins/sbl-rmsd-flexible-proteins-kpax__weighted_lrmsd_chain_0.txt
    
    ++Showing file results-new/m-pc-n-proteins/sbl-rmsd-flexible-proteins-kpax__weighted_lrmsd_chain_0.txt
    	SFV-1RER	TBEV	EFF1
    SFV-1RER			1.167	0.982
    TBEV	1.167			1.158
    EFF1	0.982	1.158
    --Done
    
    
    Showing file results-new/m-pc-n-proteins/sbl-rmsd-flexible-proteins-kpax__weighted_lrmsd_chain_1.txt
    
    ++Showing file results-new/m-pc-n-proteins/sbl-rmsd-flexible-proteins-kpax__weighted_lrmsd_chain_1.txt
    	SFV-1RER	TBEV	EFF1
    SFV-1RER			1.168	0.858
    TBEV	1.168			1.128
    EFF1	0.858	1.128
    --Done
    
    
    Showing file results-new/m-pc-n-proteins/sbl-rmsd-flexible-proteins-kpax__weighted_lrmsd_chain_2.txt
    
    ++Showing file results-new/m-pc-n-proteins/sbl-rmsd-flexible-proteins-kpax__weighted_lrmsd_chain_2.txt
    	SFV-1RER	TBEV	EFF1
    SFV-1RER			1.199	0.853
    TBEV	1.199			1.119
    EFF1	0.853	1.119
    --Done
    
    
    Running sbl-rmsd-flexible-proteins-apurva.exe --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains ABC --pdb-file data/pdb_files/TBEV.pdb --domain-labels data/spec_files/TBEV.spec --chains ABC --pdb-file data/pdb_files/EFF1.pdb --domain-labels data/spec_files/EFF1.spec --chains ABC --chain-mapping data/mapping.txt -d results-new/m-pc-n-proteins -v -l --allow-incomplete-chains -p 3
    Showing file results-new/m-pc-n-proteins/sbl-rmsd-flexible-proteins-apurva__weighted_lrmsd_chain_0.txt
    
    ++Showing file results-new/m-pc-n-proteins/sbl-rmsd-flexible-proteins-apurva__weighted_lrmsd_chain_0.txt
    	SFV-1RER	TBEV	EFF1
    SFV-1RER			1.575	1.356
    TBEV	1.575			1.539
    EFF1	1.356	1.539
    --Done
    
    
    Showing file results-new/m-pc-n-proteins/sbl-rmsd-flexible-proteins-apurva__weighted_lrmsd_chain_1.txt
    
    ++Showing file results-new/m-pc-n-proteins/sbl-rmsd-flexible-proteins-apurva__weighted_lrmsd_chain_1.txt
    	SFV-1RER	TBEV	EFF1
    SFV-1RER			1.579	1.353
    TBEV	1.579			1.537
    EFF1	1.353	1.537
    --Done
    
    
    Showing file results-new/m-pc-n-proteins/sbl-rmsd-flexible-proteins-apurva__weighted_lrmsd_chain_2.txt
    
    ++Showing file results-new/m-pc-n-proteins/sbl-rmsd-flexible-proteins-apurva__weighted_lrmsd_chain_2.txt
    	SFV-1RER	TBEV	EFF1
    SFV-1RER			1.572	1.367
    TBEV	1.572			1.531
    EFF1	1.367	1.531
    --Done
    
    
    

    Comparing proteins using precomputed motifs

    In [23]:
    # cmp proteins using motifs
    ##################################################################################
    def cmp_proteins_with_motifs():
        ## 4.2 Combined RMSD for structural motifs
        osdir = "motifs"
        aligner = "sbl-rmsd-flexible-motifs.exe"
        cmd = "%s --pdb-file data/pdb_files/SFV-1RER.pdb --chains A --pdb-file data/pdb_files/RVFV.pdb --motif-file data/motifs.txt --allow-incomplete-chains -p 3 -d %s/%s -v -l" % (aligner,odir,osdir)
        print(("Running %s" % cmd))
        os.system(cmd)
    cmp_proteins_with_motifs()
    
    Running sbl-rmsd-flexible-motifs.exe --pdb-file data/pdb_files/SFV-1RER.pdb --chains A --pdb-file data/pdb_files/RVFV.pdb --motif-file data/motifs.txt --allow-incomplete-chains -p 3 -d results-new/motifs -v -l
    