Structural Bioinformatics Library
Template C++ / Python API for developing structural bioinformatics applications.
User Manual

Molecular_cradle

Authors: M. Simsir and F. Cazals

Molecular cradle: a combined analysis based on conformations, states, and rigid blocks

Main goals. A suitable framework to describe certain biomolecules is that of almost rigid blocks whose relative position changes over time. Stable relative positions are often characteristic of certain states of the mechanism studied. More specifically, the notion of state refers to a meta-stable state, that is, a coherent ensemble of conformations which may be observed experimentally. Understanding the connection between such states and the groups of (sub)domains which characterize them is the goal of this package. In short, in a manner akin to the analysis of Newton's cradle, we wish to study the complex dynamics of a biomolecular system by identifying which regions account for specific states, using static structures only.

This endeavor is tractable in particular when an ensemble of (high-resolution) structures is available, as recently formalized in [145]. This package provides three types of analyses to deal with such systems. To describe them, we take the following for granted:

  • A list of conformations of sequence identical or homologous polypeptide chains. A conformation is also called a chain instance or instance for short. We may also assume that a number of chain instances have a label corresponding to a particular state in the biomolecular mechanism studied. Instances which do not have a label are termed unlabeled.
  • A template decomposing the polypeptide chain of interest into subdomains. Typically, subdomains are regions connected by linkers. See e.g. the decompositions used in the package Molecular_distances_flexible .

The three analyses provided in this package. These analyses are:

1. Classifying states of unlabeled monomers. This step is a clustering step aiming at identifying groups of coherent structures, with one state per cluster. The clustering defining these states is called the reference clustering.

2. Identifying subdomains compatible with the states of the reference clustering. The goal is to identify those subdomains characteristic of (selected) states.

3. Characterizing the evolution of Voronoi interfaces between subdomains across states. Voronoi interfaces characterize noncovalent interactions between molecules or domains. The goal is to identify interfaces characteristic of (selected) states. We therefore compare interface sizes between subdomains across states.

These steps are implemented by three scripts, called sbl-cradle-step1.py, sbl-cradle-step2.py and sbl-cradle-step3.py respectively.

Case study: AcrB. As an illustration, we study AcrB, a trimeric membrane protein member of the resistance-nodulation-cell-division superfamily, involved in the active transport of a variety of molecules [145]. Numerous crystal structures of AcrB have been obtained, from which a complex mechanism akin to a peristaltic pump has been inferred. Along this mechanism, which uses the proton-motive force, monomers cycle across three states denoted Access (A), Binding (B), and Extrusion (E). The reader is referred to [145] for a comprehensive bibliography and results obtained with all crystal structures known to date.

General pre-requisites

Molecular distances

The methods provided in this package are suitable in two settings, for which different molecular distances are used:

Case 1: all chain instances share the same primary sequence: the molecular distance used is the classical least RMSD (lRMSD), or possibly the combined RMSD [31] (RMSDcomb), which mixes the lRMSD of individual domains or subdomains.

In that case, the executable used is:

  • $\text{\sblrmsdcombconf}$ : comparing two conformations of the same polypeptide chain, so that amino acids are numbered identically for all chains. (Nb: if selected a.a. are unresolved in one structure or the other, only those a.a. present in both structures are used.)

The executable is provided by the package Molecular_distances_flexible .

Case 2: chain instances are homologous: computing a molecular distance requires performing an alignment. Two options are proposed:

  • $\text{\sblrmsdcombprotkpax}$: uses the kpax structural alignment method, which is the default of the scripts (see the option -x below).
  • $\text{\sblrmsdcombprotapurva}$: uses the Apurva combinatorial alignment method from the package Apurva.

In the Structural Bioinformatics Library, all molecular distance calculations for proteins can be carried out at four levels: Calpha atoms, backbone atoms, heavy atoms, all atoms (see Molecular_distances_flexible). Since the cradle package makes it possible to handle homologous proteins, the Calpha option is retained.


Mean displacement and the significance of lRMSDs

Consider a (sub)domain of a chain which is known to adopt several states in a mechanism. Given a set of instances of this (sub)domain, for different states of the containing monomer, several lRMSD comparisons can be undertaken, as introduced in [145]:

  • Intra state comparison: distribution of the lRMSD observed for all pairs of instances of (sub)domains within a state.
  • Inter state comparison: distribution of the lRMSD observed for all pairs of (sub)domains in the Cartesian product of the two states.
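The two comparison types can be sketched as follows. This is a minimal illustration, assuming the chain-to-state mapping is held in a plain dictionary; the names states, members, intra_pairs and inter_pairs are hypothetical and are not part of the package.

```python
from itertools import combinations, product

# Hypothetical chain-to-state mapping (a subset of the AcrB case study).
states = {
    "2dr6_C": "A", "3aoa_C": "A", "3w9h_A": "A",
    "2dr6_A": "B", "3aoa_A": "B",
}

def members(state):
    """Chain instances labeled with the given state, in sorted order."""
    return sorted(c for c, s in states.items() if s == state)

def intra_pairs(state):
    """All unordered pairs of instances within one state."""
    return list(combinations(members(state), 2))

def inter_pairs(state1, state2):
    """Cartesian product of the instances of two distinct states."""
    return list(product(members(state1), members(state2)))

print(len(intra_pairs("A")))       # 3 pairs among 3 instances
print(len(inter_pairs("A", "B")))  # 3 x 2 = 6 pairs
```

Each pair would then be passed to the lRMSD calculation, which yields one distribution per intra state or inter state comparison.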

On the other hand, in a crystal structure, atomic oscillation amplitudes $\overline{u}$ are related to B-factors by the formula $B = 8\pi^2 \overline{u}^2$ .
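As a small worked example, the formula can be inverted to obtain the oscillation amplitude from a B-factor. The B-factor value below is illustrative only, and how the package aggregates atomic B-factors into a mean displacement per subdomain is not described here.

```python
import math

def mean_displacement(b_factor):
    """Invert B = 8 * pi^2 * u^2 to get the oscillation amplitude u."""
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))

# Illustrative B-factor (in squared angstroms), not taken from any structure.
b = 80.0
u = mean_displacement(b)
print(f"B = {b:.1f} -> u = {u:.3f}")
```

A B-factor close to 79 thus corresponds to an amplitude of about one angstrom, the order of magnitude of the mean displacements reported for the AcrB subdomains in the demo.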

Comparing mean displacements against the lRMSD of (intra, inter) state comparisons then makes it possible to single out those comparisons which are positive, that is, those whose lRMSD is larger than the mean displacement.
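The test for positive comparisons can be sketched on one row of the table subdomains_lrmsd_u.csv displayed in the Jupyter demo of this page (subdomain Loop2 of AcrB). The helper positive_comparisons is hypothetical; the sketch simply applies the definition to the single value reported per comparison.

```python
# Mean displacement u and lRMSD values for the Loop2 subdomain of AcrB,
# copied from the table subdomains_lrmsd_u.csv of the demo.
u = 1.02
lrmsd = {"AvsA": 0.287, "AvsB": 1.746, "AvsE": 0.601,
         "BvsB": 1.126, "BvsE": 0.919, "EvsE": 0.38}

def positive_comparisons(lrmsd, u):
    """Names of the comparisons whose lRMSD exceeds the mean displacement."""
    return [name for name, value in lrmsd.items() if value > u]

positive = positive_comparisons(lrmsd, u)
# A name such as 'BvsB' denotes an intra state comparison,
# a name such as 'AvsB' an inter state comparison.
intra = [n for n in positive if n[0] == n[-1]]
inter = [n for n in positive if n[0] != n[-1]]
print("intra:", intra, "inter:", inter)
```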

Based on these quantities, a classification of subdomains as static, dynamic or unstable is proposed in [145].

For example, a subdomain which does not exhibit any positive intra state comparison, but has at least one positive inter state comparison, is termed dynamic. (Nb: this is a sufficient but not necessary condition, see [145].) Such subdomains are used in the second step.

Voronoi interfaces

A molecular space filling model (SFM) is a molecular representation with one ball per atom. The SFM can be partitioned by a so-called Voronoi diagram, which assigns one Voronoi cell to each atom. Furthermore, one can compute the restriction of each atomic ball, as the intersection between the ball and its Voronoi region.

Consider now two domains or subdomains in a molecule. An interfacial pair for these two (sub)domains is a pair of atoms, one on each (sub)domain, such that their Voronoi restrictions are in contact, i.e. they share a so-called Voronoi facet. The interface size between these two (sub)domains is then defined as the number of interfacial atoms.
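The definition of the interface size can be illustrated as follows. This sketch assumes that an interfacial atom is any atom involved in at least one interfacial pair, and that the pairs have already been computed; the atom identifiers and the function interface_size are hypothetical.

```python
# Hypothetical interfacial pairs: (atom id in subdomain 1, atom id in subdomain 2).
pairs = [(12, 301), (12, 305), (15, 305), (18, 310)]

def interface_size(interfacial_pairs):
    """Count the distinct atoms, on either side, involved in at least one pair."""
    side1 = {a for a, _ in interfacial_pairs}
    side2 = {b for _, b in interfacial_pairs}
    return len(side1) + len(side2)

print(interface_size(pairs))  # 3 atoms on one side + 3 on the other = 6
```

Note that the number of interfacial atoms (6) differs from the number of interfacial pairs (4), since an atom may take part in several pairs.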

In this package, we focus in particular on interfaces between pairs of subdomains, whose size indicates the spatial proximity of the two subdomains as a function of the state considered.

In practice, Voronoi interfaces are computed using the package Space_filling_model_interface_finder .

Using Molecular_cradle: script sbl-cradle-step1.py

Pre-requisites

Reference clustering and chain-to-state mapping. In step 1, we compute a hierarchical clustering of the chain instances using one of the aforementioned molecular distances. The output is a dendrogram, for which several linkage options can be used (see options below). It is up to the user to decide whether this clustering yields well separated clusters. If so, a chain-to-state id mapping can be defined, providing a state id for each chain instance. See the illustration below.

The state id of a chain is the character string providing the corresponding state for that chain. (Nb: the number of different strings must equal the number of clusters in the reference clustering.)
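To make the clustering step concrete, here is a naive pure-Python sketch of agglomerative clustering with average linkage, run on six chain instances whose pairwise lRMSD are copied from the distance matrix of the Jupyter demo. It is not the implementation used by the script; the function average_linkage is hypothetical and stops when a requested number of clusters is reached instead of building the full dendrogram.

```python
# Pairwise lRMSD between six AcrB chain instances (values copied from the
# distance matrix shown in the Jupyter demo of this page).
labels = ["2dr6_A", "2dr6_B", "2dr6_C", "3aoa_A", "3aoa_B", "3aoa_C"]
dist = [
    [0.000, 3.065, 2.358, 0.686, 3.133, 2.364],
    [3.065, 0.000, 3.286, 3.137, 0.820, 3.278],
    [2.358, 3.286, 0.000, 2.395, 3.348, 0.700],
    [0.686, 3.137, 2.395, 0.000, 3.174, 2.397],
    [3.133, 0.820, 3.348, 3.174, 0.000, 3.331],
    [2.364, 3.278, 0.700, 2.397, 3.331, 0.000],
]

def average_linkage(dist, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average distance over all cross pairs of the two clusters.
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]
        del clusters[b]
    return clusters

groups = sorted(sorted(labels[i] for i in c) for c in average_linkage(dist, 3))
print(groups)
```

On this subset, the three clusters obtained pair each chain of 2dr6 with the chain of 3aoa having the same letter, which agrees with the chain-to-state mapping given for AcrB further down this page.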


Protein decomposition template specification. This template decomposes a chain into subdomains, using the following format:

  • polypeptide chain and its hierarchical decomposition specified by a triple:

    • protein name (below: protein-name)
    • domain name (below: domain). Nb: use e.g. Whole if the whole protein is considered
    • subdomains, each specified by a list of integer intervals. See example below.

  • The template, one per chain, is defined in a spec file whose name follows the convention:
    • protein-name_rmsd-type_domain_chain-id.spec
    See example below.

Here is the spec file for the whole chain of AcrB:

#the first line starts the template and gives it a name
domains-template-begin AcrB
#the following lines contain: the name of the label, then the ranges
#of residues corresponding to this label (including the bounds)
Whole 1-1044
#the star denotes the complementary, i.e. all residues not
#mentioned before in the template
# Coil is not taken into account here to maximize interest of flexible RMSD
#COIL **
#terminates the template
end
#enumerates the chains and possibly associates template to them
chains-enumeration-begin
A like AcrB
end
#groups hierarchically the chains
chains-hierarchy-begin
M1 A
end

The decomposition template is optional for step 1, since the clustering is carried out by default on whole molecules. If a decomposition template with at least two subdomains is provided, the global lRMSD is replaced by the RMSDcomb of these subdomains. The file below provides such a specification, for the so-called Coil2 subdomain of AcrB.


The (sub)domain specification in these spec files must use the exact same name, even for two different proteins.


#the first line starts the template and gives it a name
domains-template-begin AcrB
#the following lines contain: the name of the label, then the ranges
#of residues corresponding to this label (including the bounds)
Coil2 132-137
#the star denotes the complementary, i.e. all residues not
#mentioned before in the template
# Coil is not taken into account here to maximize interest of flexible RMSD
#COIL **
#terminates the template
end
#enumerates the chains and possibly associates template to them
chains-enumeration-begin
A like AcrB
end
#groups hierarchically the chains
chains-hierarchy-begin
M1 A
end

Protein database specification. The chain instances to be processed are listed in a CSV file, with one line per structure providing the triple (PDB id, protein name, chain id(s)). Note that the previous decomposition template is applied to each chain.

Example:

pdb;protein;chains
2dr6;AcrB;ABC
3aoa;AcrB;ABC
3w9h;AcrB;ABC
4dx5;AcrB;ABC
4zit;AcrB;ABCDEF
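The database file can be expanded into the list of chain instances with the standard csv module. This is only a sketch: it assumes one-letter chain ids concatenated in the chains column, as in the example, and uses the pdb_chain naming found in the chain-to-state map file; the function chain_instances is hypothetical.

```python
import csv
import io

# Database specification, copied from the example above.
text = """pdb;protein;chains
2dr6;AcrB;ABC
3aoa;AcrB;ABC
3w9h;AcrB;ABC
4dx5;AcrB;ABC
4zit;AcrB;ABCDEF
"""

def chain_instances(handle):
    """Return one 'pdb_chain' identifier per chain listed in the database."""
    reader = csv.DictReader(handle, delimiter=";")
    return [f"{row['pdb']}_{chain}" for row in reader for chain in row["chains"]]

instances = chain_instances(io.StringIO(text))
print(len(instances), instances[:3])
```

The five structures yield 18 chain instances, the number of monomers processed in the Jupyter demo.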

Optional: chain-to-state map file. It may happen that the state of selected (but not all) chains is known. If so, the chain-to-state mapping is specified in a text file (csv format). For the sake of presentation, one color per state must also be specified; it is used to display the leaves of the dendrogram produced by the hierarchical clustering.

The number of entries in this file is at most the number of chain instances in the database. A chain instance whose state and color are not specified is displayed in black in the clustering.

The following file illustrates this mapping for AcrB, using the three states A, B, and E:

chain-to-state mapping

PDB;state;color
3w9h_A;A;red
4dx5_A;A;red
4zit_A;A;red
2dr6_C;A;red
3aoa_C;A;red
4zit_D;A;red
2dr6_A;B;cyan
3aoa_A;B;cyan
3w9h_B;B;cyan
4dx5_B;B;cyan
4zit_B;B;cyan
4zit_E;B;cyan
3w9h_C;E;green
4dx5_C;E;green
4zit_C;E;green
2dr6_B;E;green
3aoa_B;E;green
4zit_F;E;green
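Such a file can be read and checked along the following lines. The sketch uses a few lines of the file above and verifies that each state is associated with a single color; the function read_state_map is hypothetical and this check is not necessarily performed by the script.

```python
import csv
import io

# A few lines of the chain-to-state map file shown above.
text = """PDB;state;color
3w9h_A;A;red
2dr6_A;B;cyan
3w9h_C;E;green
4zit_F;E;green
"""

def read_state_map(handle):
    """Return {chain instance: state} and {state: color}."""
    chain_to_state, state_to_color = {}, {}
    for row in csv.DictReader(handle, delimiter=";"):
        chain_to_state[row["PDB"]] = row["state"]
        # Each state is expected to be associated with a single color.
        previous = state_to_color.setdefault(row["state"], row["color"])
        if previous != row["color"]:
            raise ValueError(f"state {row['state']} has two colors")
    return chain_to_state, state_to_color

chain_to_state, state_to_color = read_state_map(io.StringIO(text))
print(chain_to_state["4zit_F"], state_to_color)
```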

Input of the script

The main options of the program sbl-cradle-step1.py are:
(-d, --idir) string: directory containing all input files (required)
(-x, --exe_type) string: alignment type used in case of multiple protein comparisons (default: kpax)
(-c, --csv_file_info) string: CSV file containing the database specification (pdb;Protein;chains) (required)
(-sf, --statefile) string: CSV file providing the chain-to-state mapping (pdb_chain;state id) (optional)
(-l, --linkage) string: Linkage type for the hierarchical clustering: "average" (default), "single", "complete", "ward"
(-w, --workflow) string: Workflow type: lRMSD computation (computation), clustering (analysis), both (both, default)


The script is called as follows:

sbl-cradle-step1.py  -d data/cradle_step1 -x kpax -c data/cradle-input-structures.csv -sf data/cradle-chain-instance-to-state-map.csv
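When the script has to be launched from Python rather than from a shell, the same call can be assembled with the standard subprocess module. The sketch below only uses the options listed above; the helper step1_command is hypothetical, and the script is run only if it is found on the PATH.

```python
import shutil
import subprocess

def step1_command(idir, csv_file, statefile=None, exe_type="kpax",
                  linkage="average", workflow="both"):
    """Build the argument list for sbl-cradle-step1.py from its documented options."""
    cmd = ["sbl-cradle-step1.py", "-d", idir, "-x", exe_type,
           "-c", csv_file, "-l", linkage, "-w", workflow]
    if statefile is not None:
        cmd += ["-sf", statefile]
    return cmd

cmd = step1_command("data/cradle_step1",
                    "data/cradle-input-structures.csv",
                    statefile="data/cradle-chain-instance-to-state-map.csv")

if shutil.which(cmd[0]) is not None:
    subprocess.run(cmd, check=True)
else:
    print("script not found on PATH; the command would be:", " ".join(cmd))
```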

Output of the script

The main outputs generated are:

  • (Main output step1-1) Dendrogram generated by the hierarchical clustering. As noted above, the leaves are colored by states when the chain-to-state mapping is provided.
  • (Main output step1-2) Aggregated matrix with all pairwise distances (RMSD or RMSDcomb).
  • (Main output step1-3) Aggregated matrix with the sizes of the alignments which accompany the distance calculations. This file is of special interest if homologous chains are compared, to make sure that comparable numbers of a.a. are used in all distance calculations.

The comparison of molecular distances should be accompanied by a check of the number of residues involved, as a small alignment size typically results in a smaller distance. This is particularly critical when comparing homologous proteins.
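Such a check can be as simple as flagging the pairs whose alignment size falls below a fraction of the largest one. The sketch below uses a few values copied from the alignment-size matrix of the Jupyter demo; the function small_alignments and its threshold min_fraction are hypothetical choices, not package defaults.

```python
# Alignment sizes for a few pairs of chain instances (values copied from the
# alignment-size matrix shown in the Jupyter demo of this page).
sizes = {
    ("2dr6_A", "2dr6_B"): 1022,
    ("2dr6_A", "3w9h_A"): 1019,
    ("3w9h_A", "4zit_A"): 1032,
    ("4dx5_A", "4zit_B"): 1043,
}

def small_alignments(sizes, min_fraction=0.95):
    """Pairs whose alignment size is below min_fraction of the largest size."""
    largest = max(sizes.values())
    return [pair for pair, n in sizes.items() if n < min_fraction * largest]

print(small_alignments(sizes))        # no pair below 95% of the largest size
print(small_alignments(sizes, 0.98))  # a stricter threshold flags two pairs
```

For AcrB all alignment sizes are within a few percent of each other, so the distances are directly comparable; with homologous proteins the spread is expected to be larger.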


Using Molecular_cradle: script sbl-cradle-step2.py

Pre-requisites

Recall that step 2 aims to identify subdomains which are coherent with the reference clustering. Two pieces of information are required to do so:

  • decomposition template for each chain,
  • chain-to-state mapping for all chains.

For complex molecules, it is advised to carry out this step for conformations of the same chain only. Indeed, homologous proteins may not have the same dynamic subdomains.


Input of the script

The main options of the program sbl-cradle-step2.py are:
(-d, --pdbdir) string: directory containing input pdb files (required)
(-s, --specdir) string: directory containing input spec files (required)
(-x, --exe_type) string: alignment type used in case of multiple protein comparisons (default: kpax)
(-c, --csv_file_info) string: CSV file containing the database specification (pdb;Protein;chains) (required)
(-sf, --statefile) string: CSV file providing the chain-to-state mapping (pdb_chain;state id) (optional)
(-l, --linkage) string: Linkage type for the hierarchical clustering: "average" (default), "single", "complete", "ward"
(-w, --workflow) string: Workflow type: lRMSD computation (computation), clustering (analysis), both (both, default)


Output of the script

The main outputs generated are:

  • (Main output step2-1) Mean displacement for each subdomain.
  • (Main output step2-2) For each subdomain: table comparing the mean displacement against the inter-state and intra-state lRMSD.
  • (Main output step2-3) Table listing the dynamic subdomains.
  • (Main output step2-4) Table summarizing the correctness of the clusterings obtained for each subdomain under the lRMSD.
  • (Main output step2-5) Table summarizing the correctness of the clusterings obtained for subsets of dynamic subdomains under RMSDcomb.

Using Molecular_cradle: script sbl-cradle-step3.py

Pre-requisites

Recall that step 3 aims at characterizing the stability/evolution of interfaces between subdomains when changing states. The pre-requisites are identical to those of step 2.

For complex molecules, it is advised to carry out this step for conformations of the same chain only. Indeed, homologous proteins may not have identical interfaces between subdomains.


Input of the script

The main options of the program sbl-cradle-step3.py are:
(-d, --pdbdir) string: directory containing input pdb files (required)
(-s, --specdir) string: directory containing input spec files (required)
(-c, --csv_file_info) string: CSV file containing the database specification (pdb;Protein;chains) (required)
(-sf, --statefile) string: CSV file providing the chain-to-state mapping (pdb_chain;state id) (optional)


Output of the script

The main outputs generated are:

  • (Main output step3-1) Matrix of interface sizes between subdomains. Since the matrix is symmetric, only the upper triangular part is filled. For a given pair of subdomains, three values are reported, one per state; the value for a state is the median of the interface sizes (number of interfacial atoms) observed for the two subdomains of interest over all chain instances in that state.
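The aggregation described above can be sketched with the standard statistics module. The interface sizes below are made up for illustration, and the function median_by_state is hypothetical.

```python
from statistics import median

# Hypothetical interface sizes for one pair of subdomains,
# one value per chain instance.
sizes = {"3w9h_A": 40, "4dx5_A": 38, "4zit_A": 39,
         "3w9h_B": 36, "4dx5_B": 35, "3w9h_C": 31}
states = {"3w9h_A": "A", "4dx5_A": "A", "4zit_A": "A",
          "3w9h_B": "B", "4dx5_B": "B", "3w9h_C": "E"}

def median_by_state(sizes, states):
    """Median interface size over the chain instances of each state."""
    by_state = {}
    for chain, size in sizes.items():
        by_state.setdefault(states[chain], []).append(size)
    return {state: median(values) for state, values in sorted(by_state.items())}

print(median_by_state(sizes, states))
```

Using the median rather than the mean limits the influence of a single structure with an atypical interface.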

Algorithms and Methods

The analyses provided by the previous scripts involve the following three steps:

Step 1 : Classifying states of unlabeled monomers. The main classes are:

Step 2 : Identifying subdomains compatible with the states of the reference clustering. The main classes are:

Step 3 : Characterizing the evolution of Voronoi interfaces between subdomains. The main class is:

Dependencies

Packages from the SBL.

Other packages.

  • Handling PDB files was done using Biopython PDB:

Biopython Structural Bioinformatics and [41].

Installation. To use the aforementioned three scripts:

  • Make sure all executables used are visible from one's PATH environment variable. The executables are $\text{\sblrmsdcombconf}$, $\text{\sblrmsdcombprotkpax}$, $\text{\sblintervorabw}$ .

Jupyter demo

See the following jupyter notebook:

  • Jupyter notebook file
  • Molecular_cradle

    This jupyter notebook illustrates the three steps of our workflow, which are

    • Classifying states of unlabeled monomers,
    • Identifying subdomains compatible with states ABE,
    • Characterizing the evolution of interfaces between subdomains along state changes.

    For the sake of execution time, the notebook uses only 5 PDB files containing a total of 18 monomers.

    Performing all pairwise structural comparisons using kpax is indeed a costly operation. The notebook runs in less than 10 minutes on a laptop computer.

    In [1]:
    # Loading required python libraries
    from SBL import Molecular_cradle_step1
    from Molecular_cradle_step1 import *
    
    from SBL import Molecular_cradle_step2
    from Molecular_cradle_step2 import *
    
    from SBL import Molecular_cradle_step3
    from Molecular_cradle_step3 import *
    
    from SBL import SBL_pytools
    from SBL_pytools import SBL_pytools as sblpyt
    

    (Step 1) Classifying states of unlabeled monomers

    The main input consists of:

    • data: the directory containing PDB and spec files
    • cradle-input-structures.csv: database of structures
    • kpax: name of the method used to perform structural alignments
    • cradle-chain-instance-to-state-map.csv: the chain-to-state mapping
    • average: average linkage used to perform the hierarchical clustering

    The output consists of:

    • dendrogram, lRMSD based, showing the three known clusters associated with the states A, B, and E (png file)
    • the matrix of pairwise distances from which the dendrogram is built (csv file)
    In [2]:
    # python3 ./sbl-cradle-step1.py  -d data/cradle_step1 -c data/cradle-input-structures.csv -x kpax -sf data/cradle-chain-instance-to-state-map.csv -w both -l average -v 1
    
    
    step1_options = Cradle_step1_options("./data/cradle_step1", "./data/cradle-input-structures.csv", "kpax", "./data/cradle-chain-instance-to-state-map.csv", "both", "average", 0)
    step1 = Cradle_step1(step1_options)
    step1.run() 
    
    lrmsd_whole
    
    In [3]:
    # We display the dendrogram (Main output step1-1)
    sblpyt.show_image("lrmsd_whole/full_matrix_lrmsd_whole_average.png")
    
    In [4]:
    # We display the pairwise distance matrix (Main output step1-2)
    sblpyt.show_this_text_file("lrmsd_whole/full_matrix_lrmsd_whole.csv")
    
    ++Showing file lrmsd_whole/full_matrix_lrmsd_whole.csv
    	2dr6_A	2dr6_B	2dr6_C	3aoa_A	3aoa_B	3aoa_C	3w9h_A	3w9h_B	3w9h_C	4dx5_A	4dx5_B	4dx5_C	4zit_A	4zit_B	4zit_C	4zit_D	4zit_E	4zit_F
    2dr6_A	0.000	3.065	2.358	0.686	3.133	2.364	2.223	0.834	2.946	2.384	0.980	2.994	2.072	1.026	3.117	1.877	1.056	3.065
    2dr6_B	3.065	0.000	3.286	3.137	0.820	3.278	3.205	3.078	0.903	3.319	3.249	0.932	3.068	3.149	1.123	3.056	3.127	1.234
    2dr6_C	2.358	3.286	0.000	2.395	3.348	0.700	0.975	2.317	3.238	1.066	2.435	3.216	1.283	2.375	3.441	1.468	2.390	3.356
    3aoa_A	0.686	3.137	2.395	0.000	3.174	2.397	2.237	0.807	3.025	2.387	0.940	3.070	2.064	0.941	3.174	1.842	0.965	3.106
    3aoa_B	3.133	0.820	3.348	3.174	0.000	3.331	3.251	3.135	0.999	3.358	3.314	1.029	3.091	3.176	1.190	3.081	3.166	1.282
    3aoa_C	2.364	3.278	0.700	2.397	3.331	0.000	0.991	2.318	3.230	1.116	2.429	3.211	1.314	2.378	3.420	1.487	2.398	3.334
    3w9h_A	2.223	3.205	0.975	2.237	3.251	0.991	0.000	2.142	3.137	0.652	2.273	3.111	0.952	2.193	3.325	1.192	2.224	3.230
    3w9h_B	0.834	3.078	2.317	0.807	3.135	2.318	2.142	0.000	2.949	2.284	0.726	2.998	1.942	0.770	3.116	1.730	0.830	3.059
    3w9h_C	2.946	0.903	3.238	3.025	0.999	3.230	3.137	2.949	0.000	3.241	3.131	0.611	2.953	3.023	0.890	2.944	3.001	1.015
    4dx5_A	2.384	3.319	1.066	2.387	3.358	1.116	0.652	2.284	3.241	0.000	2.406	3.188	1.053	2.350	3.461	1.356	2.355	3.359
    4dx5_B	0.980	3.249	2.435	0.940	3.314	2.429	2.273	0.726	3.131	2.406	0.000	3.160	2.060	0.759	3.285	1.798	0.843	3.220
    4dx5_C	2.994	0.932	3.216	3.070	1.029	3.211	3.111	2.998	0.611	3.188	3.160	0.000	2.947	3.069	0.820	2.949	3.037	0.952
    4zit_A	2.072	3.068	1.283	2.064	3.091	1.314	0.952	1.942	2.953	1.053	2.060	2.947	0.000	1.907	3.174	0.767	2.000	3.081
    4zit_B	1.026	3.149	2.375	0.941	3.176	2.378	2.193	0.770	3.023	2.350	0.759	3.069	1.907	0.000	3.186	1.683	0.601	3.124
    4zit_C	3.117	1.123	3.441	3.174	1.190	3.420	3.325	3.116	0.890	3.461	3.285	0.820	3.174	3.186	0.000	3.186	3.153	0.590
    4zit_D	1.877	3.056	1.468	1.842	3.081	1.487	1.192	1.730	2.944	1.356	1.798	2.949	0.767	1.683	3.186	0.000	1.787	3.095
    4zit_E	1.056	3.127	2.390	0.965	3.166	2.398	2.224	0.830	3.001	2.355	0.843	3.037	2.000	0.601	3.153	1.787	0.000	3.107
    4zit_F	3.065	1.234	3.356	3.106	1.282	3.334	3.230	3.059	1.015	3.359	3.220	0.952	3.081	3.124	0.590	3.095	3.107	0.000
    --Done
    
    
    
    In [5]:
    # We display the alignment size matrix (Main output step1-3)
    sblpyt.show_this_text_file("lrmsd_whole/alignment-size_lrmsd_whole.csv")
    
    ++Showing file lrmsd_whole/alignment-size_lrmsd_whole.csv
    	2dr6_A	2dr6_B	2dr6_C	3aoa_A	3aoa_B	3aoa_C	3w9h_A	3w9h_B	3w9h_C	4dx5_A	4dx5_B	4dx5_C	4zit_A	4zit_B	4zit_C	4zit_D	4zit_E	4zit_F
    2dr6_A	1022	1022	1022	1022	1022	1022	1019	1019	1019	1022	1019	1019	1021	1021	1021	1021	1021	1021
    2dr6_B	1022	1022	1022	1022	1022	1022	1019	1019	1019	1022	1019	1019	1021	1021	1021	1021	1021	1021
    2dr6_C	1022	1022	1022	1022	1022	1022	1019	1019	1019	1022	1019	1019	1021	1021	1021	1021	1021	1021
    3aoa_A	1022	1022	1022	1022	1022	1022	1019	1019	1019	1022	1019	1019	1021	1021	1021	1021	1021	1021
    3aoa_B	1022	1022	1022	1022	1022	1022	1019	1019	1019	1022	1019	1019	1021	1021	1021	1021	1021	1021
    3aoa_C	1022	1022	1022	1022	1022	1022	1019	1019	1019	1022	1019	1019	1021	1021	1021	1021	1021	1021
    3w9h_A	1019	1019	1019	1019	1019	1019	1033	1033	1033	1033	1033	1033	1032	1032	1032	1032	1032	1032
    3w9h_B	1019	1019	1019	1019	1019	1019	1033	1033	1033	1033	1033	1033	1032	1032	1032	1032	1032	1032
    3w9h_C	1019	1019	1019	1019	1019	1019	1033	1033	1033	1033	1033	1033	1032	1032	1032	1032	1032	1032
    4dx5_A	1022	1022	1022	1022	1022	1022	1033	1033	1033	1044	1033	1033	1042	1043	1042	1042	1042	1042
    4dx5_B	1019	1019	1019	1019	1019	1019	1033	1033	1033	1033	1033	1033	1032	1032	1032	1032	1032	1032
    4dx5_C	1019	1019	1019	1019	1019	1019	1033	1033	1033	1033	1033	1033	1032	1032	1032	1032	1032	1032
    4zit_A	1021	1021	1021	1021	1021	1021	1032	1032	1032	1042	1032	1032	1042	1042	1042	1042	1042	1042
    4zit_B	1021	1021	1021	1021	1021	1021	1032	1032	1032	1043	1032	1032	1042	1043	1042	1042	1042	1042
    4zit_C	1021	1021	1021	1021	1021	1021	1032	1032	1032	1042	1032	1032	1042	1042	1042	1042	1042	1042
    4zit_D	1021	1021	1021	1021	1021	1021	1032	1032	1032	1042	1032	1032	1042	1042	1042	1042	1042	1042
    4zit_E	1021	1021	1021	1021	1021	1021	1032	1032	1032	1042	1032	1032	1042	1042	1042	1042	1042	1042
    4zit_F	1021	1021	1021	1021	1021	1021	1032	1032	1032	1042	1032	1032	1042	1042	1042	1042	1042	1042
    --Done
    
    
    

    (Step 2) Identifying subdomains compatible with states ABE

    The main input consists of:

    • PDB structures and the associated pieces of information (see step 1)
    • the spec files
    • chain instance to state map (see step 1)
    • the linkage type (see step 1)
    • alignment method (see step 1)

    The output consists of:

    • the mean displacement for each subdomain (csv file)
    • values obtained for the intra state and inter state comparisons (csv file)
    • dynamic sub-domains found (csv file)
    • table providing the correctness of clusterings obtained with lRMSD and RMSDcomb (csv file)
    • dendrogram of RMSDcomb with dynamic subdomains (png file)
    In [8]:
    # python3 ./sbl-cradle-step2.py  -d data/cradle_step1/PDB -s data/cradle_step2/spec_files -c data/cradle-input-structures.csv -x kpax -sf data/cradle-chain-instance-to-state-map.csv -l average -v 1
    
    
    step2_options = Cradle_step2_options("./data/cradle_step1/PDB", "./data/cradle_step2/spec_files", "./data/cradle-input-structures.csv", "kpax", "./data/cradle-chain-instance-to-state-map.csv", "average", 0)
    step2 = Cradle_step2(step2_options)
    step2.run() 
    
    rmsdc_subdomains
    lrmsd_TM
    lrmsd_aHelix
    lrmsd_Loop2
    lrmsd_Loop8
    lrmsd_Loop9
    lrmsd_Loop11
    rmsdc_TMLoop2
    rmsdc_TMLoop8
    rmsdc_TMLoop11
    rmsdc_Loop2Loop8
    rmsdc_Loop2Loop11
    rmsdc_Loop8Loop11
    rmsdc_TMLoop2Loop8
    rmsdc_TMLoop2Loop11
    rmsdc_TMLoop8Loop11
    rmsdc_Loop2Loop8Loop11
    rmsdc_TMLoop2Loop8Loop11
    
    In [9]:
    # We display the mean displacement for each subdomain (Main output step2-1)
    sblpyt.show_this_text_file("mean_displacement.csv")
    
    ++Showing file mean_displacement.csv
    DC 1.0127
    DN 1.0230
    Loop1 1.0325
    Loop10 1.0189
    Loop11 1.1413
    Loop2 1.0246
    Loop3 0.9940
    Loop4 0.9955
    Loop5 1.0403
    Loop6 1.1378
    Loop7 1.0903
    Loop8 1.1868
    Loop9 1.0529
    PC1 1.0855
    PC2 1.0877
    PN1 0.9510
    PN2 1.0374
    TM 1.1000
    aHelix 1.1870
    --Done
    
    
    
    In [10]:
    # We display the values for intra and inter subdomain comparisons. (Main output step2-2)
    # u: mean displacement, AvsA: lRMSD of A versus A, ...
    sblpyt.show_this_text_file("subdomains_lrmsd_u.csv")
    
    ++Showing file subdomains_lrmsd_u.csv
    Subdomain	u	AvsA	AvsB	AvsE	BvsB	BvsE	EvsE
    DC	1.01	0.56	0.571	0.545	0.624	0.649	0.549
    DN	1.02	0.658	0.778	0.716	0.776	0.822	0.699
    Loop1	1.03	0.637	0.884	0.477	0.695	0.892	0.342
    Loop10	1.01	0.122	0.144	0.19	0.317	0.211	0.346
    Loop11	1.14	2.004	1.54	0.797	2.112	2.535	0.44
    Loop2	1.02	0.287	1.746	0.601	1.126	0.919	0.38
    Loop3	0.99	0.179	0.346	0.5	0.348	0.232	0.206
    Loop4	0.99	0.322	0.477	0.361	0.486	0.348	0.354
    Loop5	1.04	0.231	0.217	0.245	0.201	0.249	0.173
    Loop6	1.13	0.354	0.361	0.374	0.428	0.545	0.649
    Loop7	1.09	0.455	0.761	0.561	0.671	0.71	0.603
    Loop8	1.18	2.424	1.568	1.605	3.098	3.694	1.926
    Loop9	1.05	1.229	0.366	0.341	0.408	0.407	0.357
    PC1	1.08	0.88	0.938	0.741	0.803	0.84	0.748
    PC2	1.08	0.807	0.627	0.562	0.682	1.025	0.969
    PN1	0.95	0.471	0.693	0.509	0.75	0.657	0.708
    PN2	1.03	0.55	0.975	0.683	0.985	0.721	0.672
    TM	1.1	0.789	1.299	0.84	1.768	1.608	0.91
    aHelix	1.18	2.29	1.217	1.416	1.463	1.265	1.45
    --Done
    
    
    
    In [11]:
    # We display the previous table restricted to dynamic subdomains (Main output step2-3)
    sblpyt.show_this_text_file("selected_subdomains_lrmsd_u.csv")
    
    ++Showing file selected_subdomains_lrmsd_u.csv
    Subdomain	u	AvsA	AvsB	AvsE	BvsB	BvsE	EvsE
    Loop11	1.14	2.004	1.54	0.797	2.112	2.535	0.44
    Loop2	1.02	0.287	1.746	0.601	1.126	0.919	0.38
    Loop8	1.18	2.424	1.568	1.605	3.098	3.694	1.926
    Loop9	1.05	1.229	0.366	0.341	0.408	0.407	0.357
    TM	1.1	0.789	1.299	0.84	1.768	1.608	0.91
    aHelix	1.18	2.29	1.217	1.416	1.463	1.265	1.45
    --Done
    
    
    
    In [13]:
    # We display the correctness for single subdomains (lRMSD) (Main output step2-4)
    sblpyt.show_this_text_file("table_rmsdc_average_summary.csv")
    
    ++Showing file table_rmsdc_average_summary.csv
    domain;correctness
    rmsdc_TMLoop2;C
    rmsdc_TMLoop8;C
    rmsdc_TMLoop11;C
    rmsdc_Loop2Loop8;C
    rmsdc_Loop2Loop11;C
    rmsdc_Loop8Loop11;C(B) C(E)
    rmsdc_TMLoop2Loop8;C
    rmsdc_TMLoop2Loop11;C
    rmsdc_TMLoop8Loop11;C
    rmsdc_Loop2Loop8Loop11;C(B) C(E)
    rmsdc_TMLoop2Loop8Loop11;C
    --Done
    
    
    
    In [14]:
    # We display the correctness for combined subdomains (RMSDcomb) (Main output step2-5)
    sblpyt.show_this_text_file("table_rmsdc_average_summary.csv")
    
    ++Showing file table_rmsdc_average_summary.csv
    domain;correctness
    rmsdc_TMLoop2;C
    rmsdc_TMLoop8;C
    rmsdc_TMLoop11;C
    rmsdc_Loop2Loop8;C
    rmsdc_Loop2Loop11;C
    rmsdc_Loop8Loop11;C(B) C(E)
    rmsdc_TMLoop2Loop8;C
    rmsdc_TMLoop2Loop11;C
    rmsdc_TMLoop8Loop11;C
    rmsdc_Loop2Loop8Loop11;C(B) C(E)
    rmsdc_TMLoop2Loop8Loop11;C
    --Done
    
    
    
    In [15]:
    # We display an illustrative dendrogram with RMSDcomb
    sblpyt.show_image("rmsdc_TMLoop8Loop11/full_matrix_rmsdc_TMLoop8Loop11_average.png")
    

    (Step 3) Characterizing the evolution of interfaces between subdomains along state changes.

    The main input: identical to step 2

    The output consists of:

    • table listing all intra interfaces (csv file)
    In [6]:
    # python3 ./sbl-cradle-step3.py  -d data/cradle_step1/PDB -s data/cradle_step2/spec_files -c data/cradle-input-structures.csv -sf data/cradle-chain-instance-to-state-map.csv -v 1
    
    
    step3_options = Cradle_step3_options("./data/cradle_step1/PDB", "./data/cradle_step2/spec_files", "./data/cradle-input-structures.csv", "./data/cradle-chain-instance-to-state-map.csv", 0)
    step3 = Cradle_step3(step3_options)
    step3.run() 
    
    In [7]:
    # We display the table with all interfaces and their size for each state (see paper for notations) (Main output step3-1)
    sblpyt.show_this_text_file("median_result_matrix.csv")
    
    ++Showing file median_result_matrix.csv
    ;TM;aHelix;DC;DN;PC1;PC2;PN1;PN2;Loop1;Loop2;Loop3;Loop4;Loop5;Loop6;Loop7;Loop8;Loop9;Loop10;Loop11
    TM;o;211 / 210 / 209;o;o;39 / 36 / 31;0 / 0 / 9;0 / 6 / 12;15 / 22 / 12;145 / 151 / 147;o;o;o;35 / 34 / 35;141 / 138 / 135;154 / 153 / 159;9 / 12 / 0;o;o;70 / 69 / 89
    aHelix;;o;o;o;o;o;o;o;o;o;o;o;o;o;o;o;o;o;o
    DC;;;o;331 / 325 / 325;o;o;50 / 50 / 12;26 / 25 / 24;o;o;30 / 29 / 30;24 / 22 / 23;o;o;o;o;37 / 37 / 30;18 / 17 / 18;o
    DN;;;;o;o;o;o;0 / 5 / 4;o;o;25 / 26 / 25;33 / 35 / 33;o;o;o;o;o;o;o
    PC1;;;;;o;0 / 0 / 45;27 / 25 / 21;108 / 78 / 98;o;21 / 0 / 18;24 / 5 / 18;61 / 58 / 52;44 / 28 / 44;o;128 / 124 / 122;17 / 15 / 47;21 / 35 / 60;14 / 9 / 14;o
    PC2;;;;;;o;106 / 108 / 104;o;o;o;o;o;o;o;0 / 0 / 39;37 / 39 / 45;67 / 65 / 61;26 / 29 / 30;81 / 79 / 76
    PN1;;;;;;;o;34 / 7 / 19;67 / 61 / 63;14 / 19 / 20;o;6 / 7 / 0;o;o;o;0 / 0 / 5;o;32 / 34 / 36;35 / 35 / 34
    PN2;;;;;;;;o;45 / 35 / 36;90 / 74 / 82;92 / 88 / 92;29 / 33 / 34;66 / 67 / 63;o;o;0 / 0 / 5;o;o;o
    Loop1;;;;;;;;;o;7 / 10 / 11;o;o;0 / 10 / 0;o;14 / 13 / 13;35 / 33 / 35;o;o;6 / 5 / 6
    Loop2;;;;;;;;;;o;5 / 0 / 4;o;21 / 21 / 23;o;o;12 / 20 / 33;o;o;o
    Loop3;;;;;;;;;;;o;48 / 31 / 45;o;o;o;o;o;o;o
    Loop4;;;;;;;;;;;;o;o;o;o;o;o;o;o
    Loop5;;;;;;;;;;;;;o;o;25 / 15 / 27;0 / 0 / 6;o;o;o
    Loop6;;;;;;;;;;;;;;o;o;o;o;o;o
    Loop7;;;;;;;;;;;;;;;o;42 / 58 / 35;o;o;0 / 4 / 33
    Loop8;;;;;;;;;;;;;;;;o;o;o;37 / 23 / 42
    Loop9;;;;;;;;;;;;;;;;;o;49 / 46 / 49;o
    Loop10;;;;;;;;;;;;;;;;;;o;o
    Loop11;;;;;;;;;;;;;;;;;;;o
    --Done
    
    
    