Molecular_distances_flexible

Authors: F. Cazals and R. Tetley

Goals

General goals

Goals. The root mean square deviation (RMSD) and the least RMSD, both provided in the companion package Molecular_distances, are two widely used similarity measures in structural bioinformatics. Yet, they stem from global comparisons, possibly obliterating locally conserved motifs.

To foster our understanding of molecular flexibility, this package, based upon developments presented in [40], provides so-called combined $\rmsd$ , which mixes independent $\lrmsd$ measures, each computed with its own optimal rigid motion.

The structural units, which may be domains, subdomains, SSE, etc., are defined using the machinery from the package MolecularSystemLabelsTraits.

The combined $\rmsd$ can be used to compare (quaternary) structures based on motifs defined from the sequence (domains, SSE), or to compare structures based on structural motifs yielded by local structural alignment methods. To handle these situations, the following four executables are provided:

$\text{\sblrmsdcombconf}$ to compare conformations of a molecule, which can be done without computing a non-trivial alignment.

$\text{\sblrmsdcombprotkpax}$ to compare homologous proteins, using the structural aligner $\text{\kpax}$ provided in the package Iterative_alignment.

$\text{\sblrmsdcombprotapurva}$ to compare homologous proteins, using the structural aligner $\text{\apurva}$ provided in the package Apurva.

$\text{\sblrmsdcombmot}$ to compare structural motifs – see also the companion package Structural_motifs .

Modes. As in the package Molecular_distances, the previous programs provide four modes:

Calpha atoms: distances are computed on Calpha atoms only.
Backbone atoms: distances are computed on Calpha, C and N atoms only.
Heavy atoms: distances are computed on all atoms but hydrogen atoms.
All atoms: distances are computed on all atoms, including hydrogen atoms.
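The four modes amount to filters on the atoms retained for the distance calculation. The following is a minimal sketch, not the SBL implementation: the mode names and the representation of an atom as an (atom name, element) pair are hypothetical, chosen for illustration only.

```python
# Hypothetical atom representation: (atom_name, element), e.g. ("CA", "C").
BACKBONE_NAMES = {"CA", "C", "N"}   # backbone atoms as listed in the text above
HYDROGEN_ELEMENTS = {"H", "D"}      # hydrogen and deuterium


def select_atoms(atoms, mode):
    """Return the atoms retained by one of the four modes (names are illustrative)."""
    if mode == "calpha":
        # Checking the element excludes calcium ions, also named "CA" in PDB files.
        return [a for a in atoms if a[0] == "CA" and a[1] == "C"]
    if mode == "backbone":
        return [a for a in atoms if a[0] in BACKBONE_NAMES and a[1] in {"C", "N"}]
    if mode == "heavy":
        return [a for a in atoms if a[1] not in HYDROGEN_ELEMENTS]
    if mode == "all":
        return list(atoms)
    raise ValueError("unknown mode: %s" % mode)


# A toy residue with four heavy atoms and two hydrogen atoms.
residue = [("N", "N"), ("CA", "C"), ("C", "C"), ("O", "O"), ("H", "H"), ("HA2", "H")]
for mode in ("calpha", "backbone", "heavy", "all"):
    print(mode, len(select_atoms(residue, mode)))
```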

Two related packages are the following ones:

Molecular_distances provides implementations of the standard methods to compute distances on whole molecules.
Point_cloud_rigid_registration_3 provides the elementary geometric operations used in the current package as well as Molecular_distances. The package also provides tools to align collections of 3D points – without any semantic related to atoms.

The case of polypeptide chains: sequences, labels and local alignments

As noticed above, regions are defined using the labels, as indicated in the package MolecularSystemLabelsTraits.

Two regions with the same label specification may not contain the same number of amino-acids, which typically happens in two cases:

for proteins corresponding to the same sequence, when selected regions could not be crystallized.

for (homologous) proteins whose sequences differ.

In any case, for each label, a local alignment is run to ensure a 1-1 correspondence between a.a. In the SBL, alignments are defined in the package Alignment_engines.

We actually consider structural alignments in two settings:

Two chains with identical sequences: the alignment is trivial, as common residues are retained and aligned. That is, ensuring the 1-1 correspondence amounts to taking the intersection of the lists of resids.

We note that for PDB files containing structures of the same polypeptide chain (same SEQRES section), the numberings of amino acids in the ATOM records are expected to be aligned. See primary-sequences-and-the-pdb-format for details.

The executable corresponding to this case is $\text{\sblrmsdcombconf}$ .

Two chains whose sequences differ: a structural alignment is computed using $\text{\kpax}$ or $\text{\apurva}$ (see above). The corresponding executables are $\text{\sblrmsdcombprotkpax}$ and $\text{\sblrmsdcombprotapurva}$ .

Two remarks are in order:

The rationale for performing a structural alignment is that motifs are primarily defined based on structural elements – whence the use of $\text{\kpax}$ . For long motifs, a sequence alignment could be envisioned too.
In the sequel, it is assumed that any local alignment involves 4 or more residues; if not, a warning is issued, and the $\text{\lrmsd}$ output for the corresponding comparisons is set to -1.
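The identity alignment used for chains with identical sequences, and the rule on short alignments, can be sketched as follows. This is an illustration under stated assumptions, not the SBL code: the function names are hypothetical, residues are identified by integer resids, and `compute_lrmsd` stands for any user-supplied function returning the $\text{\lrmsd}$ of the aligned residues.

```python
import warnings

MIN_ALIGNED_RESIDUES = 4  # threshold stated in the text above


def common_resids(resids_a, resids_b):
    """Identity alignment: keep the resids present in both chains."""
    return sorted(set(resids_a) & set(resids_b))


def lrmsd_or_flag(resids_a, resids_b, compute_lrmsd):
    """Return the lRMSD over the common residues, or -1 if fewer than 4 are aligned."""
    common = common_resids(resids_a, resids_b)
    if len(common) < MIN_ALIGNED_RESIDUES:
        warnings.warn("local alignment involves fewer than 4 residues")
        return -1.0
    return compute_lrmsd(common)


# Residue 1 is missing in the second chain, residues 4 and 7 in the first one.
print(common_resids([1, 2, 3, 5, 6], [2, 3, 4, 5, 6, 7]))
```

In this example, the residues retained are 2, 3, 5 and 6, so that the comparison would proceed; with three common residues or fewer, the flag value -1 would be returned instead.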

Combined RMSD for proteins

Pre-requisites

Vertex weighted $\lrmsd$ . Consider two point sets $A=\{ a_i \}$ and $B=\{ b_i \}$ of size $N$. Naturally, each point corresponds to an atom or pseudo-atom, which we generically call a particle.

Also consider a set of positive weights $\{ w_i\}_{i=1,\dots, N}$ , meant to stress the importance of certain points. The weighted $\rmsd$ reads as

$\begin{equation} \rmsdw{A}{B} = \sqrt{\frac{1}{\sum_i w_i} \sum_{i=1,\dots, N} w_i \vvnorm{a_i-b_i}^2} \end{equation}$
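The weighted $\rmsd$ above can be evaluated directly, without any superposition. The following pure Python sketch is meant as an illustration of the formula; the function name is hypothetical.

```python
import math


def weighted_rmsd(A, B, w):
    """Weighted RMSD between two point lists in 1-1 correspondence (no superposition)."""
    if not (len(A) == len(B) == len(w)):
        raise ValueError("A, B and w must have the same length")
    # Numerator: sum_i w_i * ||a_i - b_i||^2
    num = sum(wi * sum((x - y) ** 2 for x, y in zip(a, b))
              for a, b, wi in zip(A, B, w))
    return math.sqrt(num / sum(w))


A = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
B = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
print(weighted_rmsd(A, B, [1.0, 3.0]))  # sqrt((1*1 + 3*0) / 4) = 0.5
```

With unit weights, the same call returns the usual RMSD.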

Let $g$ be a rigid motion from the special Euclidean group $SE(3)$. To perform a comparison of $A$ and $B$ oblivious to rigid motions, we use the so-called least RMSD [103]:

The vertex weighted $\lrmsd$ is defined by

$\begin{equation} \lrmsdvw{A}{B} = \min_{g\in SE(3)} \rmsdw{A}{g(B)}. \end{equation}$

The rigid motion yielding the minimum is denoted $\lrmsdoptrm{}(A,B)$ or $\lrmsdoptrm{}$ for short. The weight of the $\lrmsdvw$ is defined as $\rmsdW{vw}{A}{B} = \sum_i w_i$ .

Note that the celebrated $\lrmsd$ is the particular case of the previous with unit weights:

$\begin{equation} \lrmsd{A}{B} = \lrmsdvw{A}{B} \text{ with } w_i \equiv 1, \forall i. \end{equation}$
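For illustration, the minimization over $SE(3)$ can be carried out with the classical SVD-based (Kabsch) superposition, extended to weights. This is a sketch assuming NumPy is available; it is not claimed to be the method implemented in Point_cloud_rigid_registration_3, and the function name is hypothetical.

```python
import numpy as np


def weighted_lrmsd(A, B, w):
    """Return (lrmsd, R, t), where b -> R @ b + t optimally superimposes B onto A."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    w = np.asarray(w, dtype=float)
    W = w.sum()
    # Weighted centroids of both point sets
    ca = (w[:, None] * A).sum(axis=0) / W
    cb = (w[:, None] * B).sum(axis=0) / W
    Ac, Bc = A - ca, B - cb
    # 3x3 weighted covariance matrix H = sum_i w_i b_i a_i^T
    H = (w[:, None] * Bc).T @ Ac
    U, _, Vt = np.linalg.svd(H)
    # Enforce a proper rotation (determinant +1), i.e. exclude reflections
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = ca - R @ cb
    diff = A - (B @ R.T + t)
    lrmsd = np.sqrt((w * (diff ** 2).sum(axis=1)).sum() / W)
    return lrmsd, R, t


# B is a rotated and translated copy of A: the least RMSD is numerically zero.
A = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
c, s = np.cos(0.3), np.sin(0.3)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
B = A @ Rz.T + np.array([1.0, -2.0, 0.5])
value, R, t = weighted_lrmsd(A, B, np.ones(len(A)))
print(value < 1e-8)
```

Unit weights, as in the example, yield the standard $\lrmsd$.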

We arrive at the main definition, which combines individual RMSD:

Consider two structures $A$ and $B$ for which non-overlapping regions $\{ \motifacc{i}, \motifbcc{i} \}_{i=1,\dots,m}$ have been identified. Assume that a $\lrmsd$ has been computed for each pair $(\motifacc{i}, \motifbcc{i} )$ . Let $\{ w_i \}_{i=1,\dots,m}$ be the weights associated with the individual $\lrmsd$ . The combined $\rmsd$ is defined by

$\begin{equation} \rmsdcomb{A}{B} = \sqrt{ \sum_{i=1}^m \frac{w_i}{\sum_{j=1}^m w_j} \lrmsds{ \motifacc{i}}{\motifbcc{i}} }. \end{equation}$
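As a worked example, the sketch below evaluates the combined $\rmsd$ from per-region values. It reads $\lrmsds$ as the squared $\lrmsd$ and uses, as data, the per-label values printed for the kpax comparison of chain A of SFV-1RER and chain A of TBEV in the Jupyter demo of this page, taking the first column as the $\lrmsd$ and the second as its weight. Both readings are assumptions; labels flagged with -1 (weight 0) are left out. The function name is hypothetical.

```python
import math


def combined_rmsd(lrmsds, weights):
    """Combined RMSD: square root of the weighted mean of the squared lRMSD values."""
    if len(lrmsds) != len(weights):
        raise ValueError("one weight per lRMSD is required")
    total = sum(weights)
    return math.sqrt(sum(w * d * d for d, w in zip(lrmsds, weights)) / total)


# (lRMSD, weight) pairs for the labels A, B, B0, ..., j of the kpax run.
kpax_labels = [
    (2.969, 9), (1.555, 7), (0.857, 7), (0.373, 4), (0.394, 4), (1.379, 11),
    (0.359, 5), (0.943, 7), (1.136, 7), (0.514, 5), (0.345, 6), (1.276, 5),
    (0.561, 5), (1.362, 8), (0.503, 6), (0.833, 9), (1.228, 7), (0.817, 9),
    (0.650, 5), (0.303, 4), (0.150, 4), (0.609, 6), (0.688, 5),
]
value = combined_rmsd([d for d, _ in kpax_labels], [w for _, w in kpax_labels])
print(round(value, 3))  # 1.167
```

The value obtained agrees with the entry 1.167 reported for the pair SFV-1RER / TBEV in the kpax distance matrices of the demo, which supports the two readings made above.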

Chain mappings for quaternary structures. Consider the case where one wishes to compare chains from different quaternary structures. In that case, one needs to know the correspondence between the individual chains across these structures. To this end, we define:

A mapping between the m chains of n proteins is given in matrix form as follows:

each line provides an ordered list of the chains of one protein. Phrased differently, two lines define a matching between the chains of these two proteins.
each column corresponds to the chains of the proteins to be compared.
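The matrix view of a chain mapping can be illustrated as follows. The sketch uses an in-memory list of lists and hypothetical protein names; it makes no assumption on the format of the mapping file read by the executables.

```python
from itertools import combinations

# Hypothetical mapping for n = 3 proteins with m = 3 chains each:
# one line per protein, one column per group of chains to be compared.
proteins = ["P0", "P1", "P2"]
chain_mapping = [
    ["A", "B", "C"],  # chains of P0
    ["A", "B", "C"],  # chains of P1
    ["B", "C", "A"],  # chains of P2, listed in a different order
]


def comparisons_per_column(proteins, mapping):
    """For each column, list the pairs of (protein, chain) to be compared."""
    m = len(mapping[0])
    if any(len(row) != m for row in mapping):
        raise ValueError("all lines of the mapping must have the same length")
    result = []
    for col in range(m):
        column = [(p, row[col]) for p, row in zip(proteins, mapping)]
        result.append(list(combinations(column, 2)))
    return result


for col, pairs in enumerate(comparisons_per_column(proteins, chain_mapping)):
    print(col, pairs)
```

In this example, the first column matches chain A of P0 with chain A of P1 and chain B of P2, and yields three pairwise comparisons.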

Functionalities and scenarios

Functionalities. The executables $\text{\sblrmsdcombprotkpax}$ and $\text{\sblrmsdcombprotapurva}$ compute a vertex weighted $\rmsd$ for homologous proteins. A pairwise comparison therefore requires an alignment, as discussed above.

Scenarios. The scenarios discussed below depend on

the number of proteins processed: one, two or more;
the number of polypeptide chains of interest.

Options to compare polypeptide chains. The programs $\text{\sblrmsdcombprotkpax}$ and $\text{\sblrmsdcombprotapurva}$ share the same options.

The main options of the program $\sblrmsdcombprotkpax$ are:
--pdb-file string: PDB file
--domain-labels: Spec. file defining the subdomain labels
--chains: Which chains to load
-d: Output directory
--allow-incomplete-chains: Load the polypeptide chains even if they are incomplete
-p: Set occupancy policy

Options to compare conformations. As noticed above, the identity alignment between residues can be used. The options are:

The main options of the program $\sblrmsdcombconf$ are:
--pdb-file string: PDB file
--domain-labels: Spec. file defining the subdomain labels
--chains: Which chains to load
-d: Output directory

The reader is referred to the Jupyter notebook (section Jupyter demo) for illustrations.

Comparing n polypeptide chains of one protein

This case is that of a protein with quaternary structure, in which case we compare the polypeptide chains.

Input.

One PDB file containing the protein in question
A spec file for the sub-domain definitions (as in MolecularSystemLabelsTraits).

Output. We report the matrix of $\text{\rmsdcomb}$ distances.

sbl-rmsd-flexible-proteins.exe --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains ABC -d results/onepnc -v -l

Comparing two polypeptide chains of two proteins

Consider two proteins, with a focus on one chain for each.

Input.

A PDB file for each chain.
One spec file per chain for the sub-domain definitions (as in MolecularSystemLabelsTraits).

Output. The $\text{\rmsdcomb}$ is reported.

$SBL_DIR/Applications/Molecular_distances_flexible/src/Molecular_distances_flexible/build/sbl-rmsd-flexible-proteins.exe --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains A --pdb-file data/pdb_files/TBEV.pdb --domain-labels data/spec_files/TBEV.spec --chains A -d results/twoponec -v -l --allow-incomplete-chains -p 3

Comparing n polypeptide chains of two proteins

Input. Consider two proteins, each with n chains, specified as follows:

A PDB file for each protein,
A spec file for each protein.

Output. A matrix of $\text{\rmsdcomb}$ is reported – one entry for each pair of chains.

$SBL_DIR/Applications/Molecular_distances_flexible/src/Molecular_distances_flexible/build/sbl-rmsd-flexible-proteins.exe --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains ABC --pdb-file data/pdb_files/TBEV.pdb --domain-labels data/spec_files/TBEV.spec --chains ABC -d results/twoponec -v -l --allow-incomplete-chains -p 3

Comparing n polypeptide chains

Input.

A PDB file for each chain.
One spec file per chain for the sub-domain definitions (as in MolecularSystemLabelsTraits).

Output. The matrix of all pairwise $\text{\rmsdcomb}$ distances is reported – one entry for each pair of chains.

$SBL_DIR/Applications/Molecular_distances_flexible/src/Molecular_distances_flexible/build/sbl-rmsd-flexible-proteins.exe --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains A --pdb-file data/pdb_files/TBEV.pdb --domain-labels data/spec_files/TBEV.spec --chains A --pdb-file data/pdb_files/EFF1.pdb --domain-labels data/spec_files/EFF1.spec --chains A -d results/nponec -v -l --allow-incomplete-chains -p 3

Comparing m polypeptide chains of n proteins

Input. Consider now a set of $n$ proteins, each involving $m$ chains defining the common quaternary structure.

We assume that a mapping between chains is provided, see Def. def-chain-mapping. The corresponding options are the following ones:

The main options of the program $\sblrmsdcombprotkpax$ are:
--chain-mapping: The mapping between the chains which should be compared

The main options of the program $\sblrmsdcombprotapurva$ are:
--chain-mapping: The mapping between the chains which should be compared

Output. We report $\binom{n}{2}(m+1)$ numbers, corresponding to $m+1$ distance matrices, stored in files:

For each column of the mapping, i.e. each chain, we report in a file the $\binom{n}{2}$ numbers corresponding to pairwise comparisons between the chains identified by the column; that is, each entry corresponds to a comparison between two files.

The last file contains the weighted lRMSD for all pairs of proteins: for each pair of proteins, we mix the weighted lRMSD obtained by comparing their chains (pairwise).
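The count of reported numbers follows from having one matrix per chain (column of the mapping) plus one matrix for whole proteins, each holding one value per pair of proteins. A minimal sketch, with a hypothetical function name:

```python
from math import comb


def number_of_reported_values(n, m):
    """n proteins, m chains each: (m per-chain matrices + 1 global matrix) x C(n, 2) pairs."""
    return comb(n, 2) * (m + 1)


# Three proteins with three chains each: 3 pairs of proteins, 4 matrices.
print(number_of_reported_values(3, 3))  # 12
```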

$SBL_DIR/Applications/Molecular_distances_flexible/src/Molecular_distances_flexible/build/sbl-rmsd-flexible-proteins.exe --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains ABC --pdb-file data/pdb_files/TBEV.pdb --domain-labels data/spec_files/TBEV.spec --chains ABC --pdb-file data/pdb_files/EFF1.pdb --domain-labels data/spec_files/EFF1.spec --chains ABC --chain-mapping data/mapping.txt -d results/npnc -v -l --allow-incomplete-chains -p 3

Output file: ../examples/results/nponec/sbl-rmsd-flexible-proteins__weighted_lrmsd_0.txt
Content: Distance matrix. One file per chain mapping; contains the distance matrix for the N proteins and a given chain.

Combined RMSD for conformations

Pre-requisites

Functionalities. The executable $\text{\sblrmsdcombconf}$ computes the $\text{\rmsdcomb}$ (vertex weighted) for conformations of a given protein. Note that a structural alignment is no longer necessary: the trivial identity alignment is used instead. It behaves as follows:

Compares N conformations of the same molecule, on a pairwise basis. That is, for each pair, the $\lrmsd$ of the specified structural units are computed, and combined into a $\rmsdcomb$ .

The structural units are specified using the mechanisms from MolecularSystemLabelsTraits.

If several chains are loaded (per conformation), a chain mapping is required, see Definition def-chain-mapping.

Note that the run scenarios are the same as for the previous executables (see Pre-requisites). For example runs, the user is referred to Combined RMSD for proteins .

Combined RMSD for structural motifs

Pre-requisites

Structural motifs. As explained in the companion package Structural_motifs , structural motifs are regions showing a structural conservation higher than that of the structures defining them. For two structures, a motif is defined by two sets of a.a. in one-to-one correspondence – that is one set of a.a. on each structure.

Motif graph for overlapping motifs. When several motifs exist for two structures, an important question is to handle them coherently. Since motifs may overlap, we define:

(Motif graph) The motif graph of a list of motifs $\{ (\motifa{i}, \motifb{i}) \}_{i=1,\dots,p}$ is defined as follows: its node set is the union of the particles of the sets $\motifa{i}$ and $\motifb{i}$, $i=1,\dots,p$; its edge set is the union of two types of edges:

matching edges: the edges associated with the matchings defined by the motifs. NB: such edges are counted without multiplicity, that is, a matching edge present in several motifs is counted once.
motif edges: edges defining a path connecting all amino acids in a motif.

Consider a connected component (c.c.) of the motif graph. Restricting each c.c. to each structure yields two subgraphs. The set of all such subgraphs is denoted $\{ \motifacc{i}, \motifbcc{i} \}_{i=1,\dots,m}$ .
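The construction of the motif graph and of its connected components can be sketched with a union-find structure. This is an illustration, not the SBL implementation: motifs are given as hypothetical pairs of resid lists in one-to-one correspondence, and the motif edges are taken to be the path linking consecutive amino acids of each list.

```python
def motif_graph_components(motifs):
    """Return the c.c. of the motif graph, each restricted to structures A and B."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[ry] = rx

    matching_edges = set()  # a set, so that shared matching edges are counted once
    for resids_a, resids_b in motifs:
        if len(resids_a) != len(resids_b):
            raise ValueError("a motif requires a 1-1 correspondence")
        nodes_a = [("A", r) for r in resids_a]
        nodes_b = [("B", r) for r in resids_b]
        for na, nb in zip(nodes_a, nodes_b):  # matching edges
            matching_edges.add((na, nb))
            union(na, nb)
        for nodes in (nodes_a, nodes_b):      # motif edges: one path per structure
            for u, v in zip(nodes, nodes[1:]):
                union(u, v)

    comps = {}
    for node in list(parent):
        comp = comps.setdefault(find(node), {"A": set(), "B": set(), "edges": 0})
        comp[node[0]].add(node[1])
    for na, _ in matching_edges:
        comps[find(na)]["edges"] += 1
    return sorted(comps.values(), key=lambda c: min(c["A"]))


# Motifs 1 and 2 overlap (residue 3 of A matched with residue 13 of B); motif 3 is apart.
motifs = [([1, 2, 3], [11, 12, 13]), ([3, 4], [13, 14]), ([10, 11], [20, 21])]
for comp in motif_graph_components(motifs):
    print(sorted(comp["A"]), sorted(comp["B"]), comp["edges"])
```

In this example, the two overlapping motifs merge into a single component with four matching edges, the shared edge being counted once, while the third motif forms its own component with two matching edges.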

Consider now the case where motifs have been defined for the two structures $A$ and $B$. We wish to compare $A$ and $B$ exploiting the information yielded by the connected components of the motif graph.

Consider the i-th c.c. of the motif graph. Let $e_i$ be the number of matching edges of this c.c. As usual, let $g(b_j)$ be the position of atom $b_j$ from $\motifbcc{i}$ matched with atom $a_j$ from $\motifacc{i}$ , upon applying a rigid motion $g$. We define:

The edge weighted $\lrmsdew$ of the i-th c.c. of the motif graph is defined by

$\begin{align} \lrmsdew{\motifacc{i}}{\motifbcc{i}} &= \min_{g\in SE(3)} \sqrt{\frac{1}{e_i} \sum_{j=1}^{e_i} \vvnorm{a_j - g(b_j)}^2 } \end{align}$

The rigid motion yielding the minimum is denoted $\lrmsdoptrm{i}$ . The weight of the $\lrmsdew$ is defined as $\rmsdW{ew}{\motifacc{i}}{\motifbcc{i}} = e_i$ .

This definition recalled, we note that the $\rmsdcomb$ from Eq. def-rmsd-comb generalizes, using edge weighted rather than vertex weighted $\rmsd$ .

Using

The input requires two structures (PDB files) and a specification of motifs. The motifs are defined as an identifier followed by a list of aligned residues (example file below).

sbl-rmsd-flexible-motifs.exe --pdb-file data/pdb_files/SFV-1RER.pdb --chains A --pdb-file data/pdb_files/RVFV.pdb --motif-file data/motifs.txt --allow-incomplete-chains -p 3 -d results/motifs -v -l

The main options of the program $\sblrmsdcombmot$ are:
--pdb-file string: PDB file
--motif-file: Spec. file defining the motifs
--chains: Which chains to load

Output file: ../examples/results/onepnc/sbl-rmsd-flexible-motifs_motif_graph.xml
Content: Motif graph. XML archive containing all the residues in the motif graph components.

Programmer's Workflow

Pre-requisites

The implementation rationale behind the executables is straightforward. Each executable has a workflow consisting of one loader (SBL::IO::T_Protein_representation_loader, see Protein_representation) and one module. Two different modules are used across the workflows:

SBL::Modules::T_Subdomain_comparator_module< ModuleTraits > is used in the $\sblrmsdcombconf$, $\sblrmsdcombprotkpax$ and $\sblrmsdcombprotapurva$ executables. It simply instantiates the pairwise comparisons between all the specified labels. For $\sblrmsdcombconf$, it uses the trivial identity alignment (from the ModuleTraits class); for the other two, it uses $\text{\apurva}$ (package Apurva) or $\text{\kpax}$ (package Iterative_alignment).

SBL::Modules::T_RMSD_comb_for_motifs_module< ModuleTraits > is used in $\sblrmsdcombmot$ . See Molecular_distances for more details.

Workflow example

The workflows are extremely basic. One of them is displayed below as an example.

T_Local_structural_comparison_workflow:

Implementation using Kpax

We note in passing that the implementation of $\sblrmsdcombprotkpax$ is done as follows: this package i.e. Molecular_distances_flexible defines in the file Structure_for_kpax.hpp the structure which is used to instantiate $\text{\kpax}$ from package Iterative_alignment; in turn, $\text{\kpax}$ uses alignment data structures from Alignment_engines.

Jupyter demo

See the following Jupyter notebook:

Molecular_distances_flexible
We illustrate calculations involving the so-called combined RMSD or RMSD Comb. As test case, we use class II fusion proteins, decomposing each polypeptide chain into 23 regions (see preprint by Tetley et al).

Preparing directories for the calculations to be carried out

import os
import sys
import pdb
from SBL import SBL_pytools
from SBL_pytools import SBL_pytools as sblpyt

odir = "results-new"
sdirs = ["n-pc-one-protein", "two-pc", "n-pc-two-proteins", "n-pc", "m-pc-n-proteins", "motifs"]
for sdir in sdirs:
    w = "%s/%s" % (odir, sdir)
    # Create the output sub-directory if it does not exist yet
    os.makedirs(w, exist_ok=True)

(User manual, section 2.3) Comparing n polypeptide chains of one protein

We report the matrix of RMSDcomb distances.

NB: calculations using kpax and apurva are shown for the sake of completeness. Since the sequences of the three chains are identical, the alignments are trivial, and the results for kpax and apurva are identical.

def cmp_proteins_with_aligner_n_pc_one_protein(odir, aligner, aligner_tag):
    osdir = "n-pc-one-protein"
    cmd = "%s --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains ABC -d %s/%s -v -l" % (aligner, odir, osdir)
    print("Running %s" % cmd)
    os.system(cmd)
    # Display the matrix of combined RMSD written by the executable
    odir_osdir = "%s/%s" % (odir, osdir)
    file_suffix = "%s__weighted_lrmsd.txt" % aligner_tag
    sblpyt.show_text_file(file_suffix, odir_osdir)

aligner_kpax = "sbl-rmsd-flexible-proteins-kpax.exe"  # kpax
cmp_proteins_with_aligner_n_pc_one_protein(odir, aligner_kpax, "kpax")
aligner_apurva = "sbl-rmsd-flexible-proteins-apurva.exe"  # apurva
cmp_proteins_with_aligner_n_pc_one_protein(odir, aligner_apurva, "apurva")

Running sbl-rmsd-flexible-proteins-kpax.exe --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains ABC -d results-new/n-pc-one-protein -v -l
Showing file results-new/n-pc-one-protein/sbl-rmsd-flexible-proteins-kpax__weighted_lrmsd.txt
++Showing file results-new/n-pc-one-protein/sbl-rmsd-flexible-proteins-kpax__weighted_lrmsd.txt
     A           B          C
A
B    0.0316797
C    0.150864    0.155959
--Done
Running sbl-rmsd-flexible-proteins-apurva.exe --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains ABC -d results-new/n-pc-one-protein -v -l
Showing file results-new/n-pc-one-protein/sbl-rmsd-flexible-proteins-apurva__weighted_lrmsd.txt
++Showing file results-new/n-pc-one-protein/sbl-rmsd-flexible-proteins-apurva__weighted_lrmsd.txt
     A           B          C
A
B    0.0316797
C    0.150864    0.155959
--Done

(User manual, section 2.4) Comparing two polypeptide chains of two proteins

The decomposition of each chain is specified using labels.

The RMSDcomb is reported.

def cmp_proteins_with_aligner_two_pc(odir, aligner, aligner_tag):
    osdir = "two-pc"
    cmd = "%s --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains A --pdb-file data/pdb_files/TBEV.pdb --domain-labels data/spec_files/TBEV.spec --chains A -d %s/%s -v -l --allow-incomplete-chains -p 3" % (aligner, odir, osdir)
    print("Running %s" % cmd)
    os.system(cmd)
    # Display the per-label lRMSD values written by the executable
    odir_osdir = "%s/%s" % (odir, osdir)
    file_suffix = "%s__labels_lrmsd.txt" % aligner_tag
    sblpyt.show_text_file(file_suffix, odir_osdir)

aligner_kpax = "sbl-rmsd-flexible-proteins-kpax.exe"  # kpax
cmp_proteins_with_aligner_two_pc(odir, aligner_kpax, "kpax")
aligner_apurva = "sbl-rmsd-flexible-proteins-apurva.exe"  # apurva
cmp_proteins_with_aligner_two_pc(odir, aligner_apurva, "apurva")

Running sbl-rmsd-flexible-proteins-kpax.exe --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains A --pdb-file data/pdb_files/TBEV.pdb --domain-labels data/spec_files/TBEV.spec --chains A -d results-new/two-pc -v -l --allow-incomplete-chains -p 3
Showing file results-new/two-pc/sbl-rmsd-flexible-proteins-kpax__labels_lrmsd.txt
++Showing file results-new/two-pc/sbl-rmsd-flexible-proteins-kpax__labels_lrmsd.txt
Label comparison between chain A of SFV-1RER and chain A of TBEV
A     2.969   9.000
B     1.555   7.000
B0    0.857   7.000
C     0.373   4.000
C0    0.394   4.000
D0    1.379  11.000
E     0.359   5.000
E0    0.943   7.000
F     1.136   7.000
F0    0.514   5.000
G     0.345   6.000
G0    1.276   5.000
H0    0.561   5.000
I0    1.362   8.000
a     0.503   6.000
b     0.833   9.000
c     1.228   7.000
d     0.817   9.000
e     0.650   5.000
f     0.303   4.000
g     0.150   4.000
h    -1.000   0.000
i     0.609   6.000
j     0.688   5.000
k    -1.000   0.000
l    -1.000   0.000
--Done
Running sbl-rmsd-flexible-proteins-apurva.exe --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains A --pdb-file data/pdb_files/TBEV.pdb --domain-labels data/spec_files/TBEV.spec --chains A -d results-new/two-pc -v -l --allow-incomplete-chains -p 3
Showing file results-new/two-pc/sbl-rmsd-flexible-proteins-apurva__labels_lrmsd.txt
++Showing file results-new/two-pc/sbl-rmsd-flexible-proteins-apurva__labels_lrmsd.txt
Label comparison between chain A of SFV-1RER and chain A of TBEV
A     2.969   9.000
B     1.555   7.000
B0    1.642   7.000
C     0.691   5.000
C0    0.590   5.000
D0    2.125  11.000
E     0.862   5.000
E0    1.434   8.000
F     1.136   7.000
F0    1.013   5.000
G     1.505   6.000
G0    1.276   5.000
H0    0.807   5.000
I0    1.362   8.000
a     0.503   6.000
b     2.288  10.000
c     1.826   7.000
d     1.496   9.000
e     0.650   5.000
f     0.501   4.000
g     1.288   4.000
h    -1.000   0.000
i     1.619   6.000
j     0.841   5.000
k    -1.000   0.000
l    -1.000   0.000
--Done

(User manual, section 2.5) Comparing n polypeptide chains of two proteins

The decomposition of each chain is specified using labels.

The matrix of all pairwise distances RMSDcomb (between pairs of chains) is reported.

def cmp_proteins_with_aligner_n_pc_two_proteins(odir, aligner, aligner_tag):
    osdir = "n-pc-two-proteins"
    cmd = "%s --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains ABC --pdb-file data/pdb_files/TBEV.pdb --domain-labels data/spec_files/TBEV.spec --chains ABC -d %s/%s -v -l --allow-incomplete-chains -p 3" % (aligner, odir, osdir)
    print("Running %s" % cmd)
    os.system(cmd)
    # Display the matrix of combined RMSD written by the executable
    odir_osdir = "%s/%s" % (odir, osdir)
    file_suffix = "%s__weighted_lrmsd.txt" % aligner_tag
    sblpyt.show_text_file(file_suffix, odir_osdir)

aligner_kpax = "sbl-rmsd-flexible-proteins-kpax.exe"  # kpax
cmp_proteins_with_aligner_n_pc_two_proteins(odir, aligner_kpax, "kpax")
aligner_apurva = "sbl-rmsd-flexible-proteins-apurva.exe"  # apurva
cmp_proteins_with_aligner_n_pc_two_proteins(odir, aligner_apurva, "apurva")

Running sbl-rmsd-flexible-proteins-kpax.exe --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains ABC --pdb-file data/pdb_files/TBEV.pdb --domain-labels data/spec_files/TBEV.spec --chains ABC -d results-new/n-pc-two-proteins -v -l --allow-incomplete-chains -p 3
Showing file results-new/n-pc-two-proteins/sbl-rmsd-flexible-proteins-kpax__weighted_lrmsd.txt
++Showing file results-new/n-pc-two-proteins/sbl-rmsd-flexible-proteins-kpax__weighted_lrmsd.txt
          SFV-1RER_A  SFV-1RER_B  SFV-1RER_C
TBEV_A    1.167       1.171       1.185
TBEV_B    1.165       1.168       1.203
TBEV_C    1.161       1.164       1.199
--Done
Running sbl-rmsd-flexible-proteins-apurva.exe --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains ABC --pdb-file data/pdb_files/TBEV.pdb --domain-labels data/spec_files/TBEV.spec --chains ABC -d results-new/n-pc-two-proteins -v -l --allow-incomplete-chains -p 3
Showing file results-new/n-pc-two-proteins/sbl-rmsd-flexible-proteins-apurva__weighted_lrmsd.txt
++Showing file results-new/n-pc-two-proteins/sbl-rmsd-flexible-proteins-apurva__weighted_lrmsd.txt
          SFV-1RER_A  SFV-1RER_B  SFV-1RER_C
TBEV_A    1.575       1.580       1.581
TBEV_B    1.574       1.579       1.579
TBEV_C    1.567       1.572       1.572
--Done

(User manual, section 2.6) Comparing n polypeptide chains

The decomposition of each chain is specified using labels.

The chain of interest in each file is specified.

The matrix of all pairwise distances RMSDcomb (between pairs of chains) is reported.

def cmp_proteins_with_aligner_n_pc(odir, aligner, aligner_tag):
    osdir = "n-pc"
    cmd = "%s --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains A --pdb-file data/pdb_files/TBEV.pdb --domain-labels data/spec_files/TBEV.spec --chains A --pdb-file data/pdb_files/EFF1.pdb --domain-labels data/spec_files/EFF1.spec --chains A -d %s/%s -v -l --allow-incomplete-chains -p 3" % (aligner, odir, osdir)
    print("Running %s" % cmd)
    os.system(cmd)
    # Display the matrix of combined RMSD written by the executable
    odir_osdir = "%s/%s" % (odir, osdir)
    file_suffix = "%s__weighted_lrmsd.txt" % aligner_tag
    sblpyt.show_text_file(file_suffix, odir_osdir)

aligner_kpax = "sbl-rmsd-flexible-proteins-kpax.exe"  # kpax
cmp_proteins_with_aligner_n_pc(odir, aligner_kpax, "kpax")
aligner_apurva = "sbl-rmsd-flexible-proteins-apurva.exe"  # apurva
cmp_proteins_with_aligner_n_pc(odir, aligner_apurva, "apurva")

Running sbl-rmsd-flexible-proteins-kpax.exe --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains A --pdb-file data/pdb_files/TBEV.pdb --domain-labels data/spec_files/TBEV.spec --chains A --pdb-file data/pdb_files/EFF1.pdb --domain-labels data/spec_files/EFF1.spec --chains A -d results-new/n-pc -v -l --allow-incomplete-chains -p 3
Showing file results-new/n-pc/sbl-rmsd-flexible-proteins-kpax__weighted_lrmsd.txt
++Showing file results-new/n-pc/sbl-rmsd-flexible-proteins-kpax__weighted_lrmsd.txt
            SFV-1RER  TBEV   EFF1
SFV-1RER    0         1.167  0.982
TBEV        1.167     0      1.158
EFF1        0.982     1.158  0
--Done
Running sbl-rmsd-flexible-proteins-apurva.exe --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains A --pdb-file data/pdb_files/TBEV.pdb --domain-labels data/spec_files/TBEV.spec --chains A --pdb-file data/pdb_files/EFF1.pdb --domain-labels data/spec_files/EFF1.spec --chains A -d results-new/n-pc -v -l --allow-incomplete-chains -p 3
Showing file results-new/n-pc/sbl-rmsd-flexible-proteins-apurva__weighted_lrmsd.txt
++Showing file results-new/n-pc/sbl-rmsd-flexible-proteins-apurva__weighted_lrmsd.txt
            SFV-1RER  TBEV   EFF1
SFV-1RER    0         1.575  1.356
TBEV        1.575     0      1.539
EFF1        1.356     1.539  0
--Done

(User manual, section 2.7) Comparing m polypeptide chains of n proteins

As above, except that one matrix of RMSDcomb per chain is reported.

def cmp_proteins_with_aligner_m_pc_n_chains(odir, aligner, aligner_tag):
    osdir = "m-pc-n-proteins"
    cmd = "%s --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains ABC --pdb-file data/pdb_files/TBEV.pdb --domain-labels data/spec_files/TBEV.spec --chains ABC --pdb-file data/pdb_files/EFF1.pdb --domain-labels data/spec_files/EFF1.spec --chains ABC --chain-mapping data/mapping.txt -d %s/%s -v -l --allow-incomplete-chains -p 3" % (aligner, odir, osdir)
    print("Running %s" % cmd)
    os.system(cmd)
    # Display one distance matrix per column of the chain mapping
    odir_osdir = "%s/%s" % (odir, osdir)
    for chain_idx in range(3):
        file_suffix = "%s__weighted_lrmsd_chain_%d.txt" % (aligner_tag, chain_idx)
        sblpyt.show_text_file(file_suffix, odir_osdir)

aligner_kpax = "sbl-rmsd-flexible-proteins-kpax.exe"  # kpax
cmp_proteins_with_aligner_m_pc_n_chains(odir, aligner_kpax, "kpax")
aligner_apurva = "sbl-rmsd-flexible-proteins-apurva.exe"  # apurva
cmp_proteins_with_aligner_m_pc_n_chains(odir, aligner_apurva, "apurva")

Running sbl-rmsd-flexible-proteins-kpax.exe --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains ABC --pdb-file data/pdb_files/TBEV.pdb --domain-labels data/spec_files/TBEV.spec --chains ABC --pdb-file data/pdb_files/EFF1.pdb --domain-labels data/spec_files/EFF1.spec --chains ABC --chain-mapping data/mapping.txt -d results-new/m-pc-n-proteins -v -l --allow-incomplete-chains -p 3
Showing file results-new/m-pc-n-proteins/sbl-rmsd-flexible-proteins-kpax__weighted_lrmsd_chain_0.txt
++Showing file results-new/m-pc-n-proteins/sbl-rmsd-flexible-proteins-kpax__weighted_lrmsd_chain_0.txt
            SFV-1RER  TBEV   EFF1
SFV-1RER              1.167  0.982
TBEV        1.167            1.158
EFF1        0.982     1.158
--Done
Showing file results-new/m-pc-n-proteins/sbl-rmsd-flexible-proteins-kpax__weighted_lrmsd_chain_1.txt
++Showing file results-new/m-pc-n-proteins/sbl-rmsd-flexible-proteins-kpax__weighted_lrmsd_chain_1.txt
            SFV-1RER  TBEV   EFF1
SFV-1RER              1.168  0.858
TBEV        1.168            1.128
EFF1        0.858     1.128
--Done
Showing file results-new/m-pc-n-proteins/sbl-rmsd-flexible-proteins-kpax__weighted_lrmsd_chain_2.txt
++Showing file results-new/m-pc-n-proteins/sbl-rmsd-flexible-proteins-kpax__weighted_lrmsd_chain_2.txt
            SFV-1RER  TBEV   EFF1
SFV-1RER              1.199  0.853
TBEV        1.199            1.119
EFF1        0.853     1.119
--Done
Running sbl-rmsd-flexible-proteins-apurva.exe --pdb-file data/pdb_files/SFV-1RER.pdb --domain-labels data/spec_files/SFV.spec --chains ABC --pdb-file data/pdb_files/TBEV.pdb --domain-labels data/spec_files/TBEV.spec --chains ABC --pdb-file data/pdb_files/EFF1.pdb --domain-labels data/spec_files/EFF1.spec --chains ABC --chain-mapping data/mapping.txt -d results-new/m-pc-n-proteins -v -l --allow-incomplete-chains -p 3
Showing file results-new/m-pc-n-proteins/sbl-rmsd-flexible-proteins-apurva__weighted_lrmsd_chain_0.txt
++Showing file results-new/m-pc-n-proteins/sbl-rmsd-flexible-proteins-apurva__weighted_lrmsd_chain_0.txt
            SFV-1RER  TBEV   EFF1
SFV-1RER              1.575  1.356
TBEV        1.575            1.539
EFF1        1.356     1.539
--Done
Showing file results-new/m-pc-n-proteins/sbl-rmsd-flexible-proteins-apurva__weighted_lrmsd_chain_1.txt
++Showing file results-new/m-pc-n-proteins/sbl-rmsd-flexible-proteins-apurva__weighted_lrmsd_chain_1.txt
            SFV-1RER  TBEV   EFF1
SFV-1RER              1.579  1.353
TBEV        1.579            1.537
EFF1        1.353     1.537
--Done
Showing file results-new/m-pc-n-proteins/sbl-rmsd-flexible-proteins-apurva__weighted_lrmsd_chain_2.txt
++Showing file results-new/m-pc-n-proteins/sbl-rmsd-flexible-proteins-apurva__weighted_lrmsd_chain_2.txt
            SFV-1RER  TBEV   EFF1
SFV-1RER              1.572  1.367
TBEV        1.572            1.531
EFF1        1.367     1.531
--Done

Comparing proteins using precomputed motifs

# Compare proteins using motifs (User manual, section 4.2: Combined RMSD for structural motifs)
def cmp_proteins_with_motifs():
    osdir = "motifs"
    aligner = "sbl-rmsd-flexible-motifs.exe"
    cmd = "%s --pdb-file data/pdb_files/SFV-1RER.pdb --chains A --pdb-file data/pdb_files/RVFV.pdb --motif-file data/motifs.txt --allow-incomplete-chains -p 3 -d %s/%s -v -l" % (aligner, odir, osdir)
    print("Running %s" % cmd)
    os.system(cmd)

cmp_proteins_with_motifs()

Running sbl-rmsd-flexible-motifs.exe --pdb-file data/pdb_files/SFV-1RER.pdb --chains A --pdb-file data/pdb_files/RVFV.pdb --motif-file data/motifs.txt --allow-incomplete-chains -p 3 -d results-new/motifs -v -l
