Molecular_cradle

Authors: M. Simsir and F. Cazals

Molecular cradle: a combined analysis based on conformations, states, and rigid blocks

Main goals. A suitable framework to describe certain biomolecules is that of almost rigid blocks whose relative position changes over time. Stable relative positions are often characteristic of certain states of the mechanism studied. More specifically, the notion of state refers to meta-stable state, that is a coherent ensemble of conformations which may be observed experimentally. Understanding the connexion between such states and groups of (sub)domains which characterize them is the goal of this package. In short, in a manner akin to the analysis of Newton's cradle, we wish to study complex dynamics of a biomolecular system by identifying which regions account for specific states, using static structures only.

This endeavor is in particular tractable when an ensemble of (high-resolution) structures is available, as recently formalized in [162]. This package provides three types of analyses to deal with such systems. To describe them, we take the following for granted:

A list of conformations of sequence identical or homologous polypeptide chains. A conformation is also called a chain instance or instance for short. We may also assume that a number of chain instances have a label corresponding to a particular state in the biomolecular mechanism studied. Instances which do not have a label are termed unlabeled.

A template decomposing the polypeptide chain of interest into subdomains. Typically, subdomains are regions connected by linkers. See e.g. the decompositions used in the package Molecular_distances_flexible .

The three analyses provided in this package. These analyses are:

1. Classifying states of unlabeled monomers. This step is a clustering step aiming at identifying groups of coherent structures, with one state per cluster. The clustering defining these states is called the reference clustering.

2. Identifying subdomains compatible with the states of the reference clustering. The goal is to identify those subdomains characteristic of (selected) states.

3. Characterizing the evolution of Voronoi interfaces between subdomains across states. Voronoi interfaces characterize noncovalent interactions between molecules or domains. The goal is to identify interfaces characteristic of (selected) states. We therefore compare interface sizes between subdomains across states.

These steps are implemented by three scripts, called sbl-cradle-step1.py, sbl-cradle-step2.py and sbl-cradle-step3.py respectively.

Case study: AcrB. As an illustration, we study AcrB, a trimeric membrane protein member of the resistance-nodulation-cell-division superfamily, involved in the active transport of a variety of molecules [162]. Numerous crystal structures of AcrB have been obtained, from which a complex mechanism akin to a peristaltic pump has been inferred. Along this mechanism, which uses the proton-motive force, monomers cycle across three states denoted Access (A), Binding (B), and Extrusion (E). The reader is referred to [162] for a comprehensive bibliography and results obtained with all crystal structures known to date.

General pre-requisites

Molecular distances

The methods provided in this package are suitable in two settings, for which different molecular distances are used:

Case 1: all chain instances share the same primary sequence: the molecular distance used is the classical least RMSD (lRMSD), or possibly the combined RMSD [40] (RMSDcomb), which mixes the lRMSD of individual domains or subdomains.

In that case, the executable used is:

$\text{\sblrmsdcombconf}$ : comparing two conformations of the same polypeptide chain, so that amino acids are numbered identically for all chains. (Nb: if selected a.a. are unresolved in one structure or the other, only those a.a. present in both structures are used.)

The executable is provided by the package Molecular_distances_flexible .

Case 2: chain instances are homologous: computing a molecular distance requires performing an alignment. Two options are proposed:

$\text{\sblrmsdcombprotkpax}$ : computes the alignment with the $\text{\kpax}$ aligner from the package Iterative_alignment;

$\text{\sblrmsdcombprotapurva}$ : uses the $\text{\apurva}$ combinatorial alignment method from the package Apurva .

In the Structural Bioinformatics Library, all molecular distance calculations for proteins can be carried out at four levels: Calpha atoms, backbone atoms, heavy atoms, all atoms – see Molecular_distances_flexible . Since the cradle package makes it possible to handle homologous proteins, the Calpha option is retained.
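To make the idea of mixing per-subdomain lRMSD values concrete, here is a minimal sketch. It assumes that the combined RMSD is the size-weighted quadratic mean of the individual lRMSD values, with the number of atoms (here Calpha) of each subdomain as weight; this form is an assumption made for illustration, and [40] remains the reference for the actual definition. The function name and the numerical values are hypothetical.

```python
import math

def rmsd_comb(lrmsds, sizes):
    """Size-weighted quadratic mean of per-subdomain lRMSD values.

    Assumed form: sqrt(sum_i w_i * d_i^2 / sum_i w_i), with w_i the number
    of atoms of subdomain i and d_i its lRMSD.
    """
    if len(lrmsds) != len(sizes) or not sizes:
        raise ValueError("need one size per lRMSD value")
    total = sum(sizes)
    return math.sqrt(sum(w * d * d for d, w in zip(lrmsds, sizes)) / total)

# Hypothetical example: two subdomains of 100 and 300 Calpha atoms.
value = rmsd_comb([1.0, 2.0], [100, 300])
print(round(value, 3))
```

With a single subdomain the function returns its lRMSD unchanged, which is consistent with the global lRMSD being used when no decomposition is supplied.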

Mean displacement and the significance of lRMSDs

Consider a (sub)domain of a chain which is known to adopt several states in a mechanism. Given a set of instances of this (sub)domain, for different states of the containing monomer, several lRMSD comparisons can be undertaken, as introduced in [162]:

Intra state comparison: distribution of the lRMSD observed for all pairs of instances of (sub)domains within a state.

Inter state comparison: distribution of the lRMSD observed for all pairs of (sub)domains in the Cartesian product of the two states.
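The two families of pairs just defined can be enumerated as follows. This is a minimal sketch: the helper names are hypothetical, and the chain labels are those of states A and B in the AcrB chain-to-state mapping given on this page.

```python
from itertools import combinations, product

def intra_state_pairs(instances):
    """All unordered pairs of instances within one state."""
    return list(combinations(instances, 2))

def inter_state_pairs(instances_1, instances_2):
    """Cartesian product of the instances of two distinct states."""
    return list(product(instances_1, instances_2))

state_a = ["3w9h_A", "4dx5_A", "4zit_A", "2dr6_C", "3aoa_C", "4zit_D"]
state_b = ["2dr6_A", "3aoa_A", "3w9h_B", "4dx5_B", "4zit_B", "4zit_E"]

n_intra = len(intra_state_pairs(state_a))           # 6 choose 2
n_inter = len(inter_state_pairs(state_a, state_b))  # 6 times 6
print(n_intra, n_inter)
```

Each pair then gives one lRMSD value, and the values of a family form the distribution referred to above.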

On the other hand, in a crystal structure, atomic oscillation amplitudes $\overline{u}$ are related to B-factors by the formula $B = 8\pi^2 \overline{u}^2$ .
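The relation above can be inverted to obtain a displacement from a B-factor. The sketch below does only that, for a single atom; how the package aggregates per-atom values into a mean displacement for a subdomain is not specified here, so no aggregation is shown. The function name is hypothetical.

```python
import math

def mean_displacement(b_factor):
    """Return u (Angstrom) such that B = 8 * pi^2 * u^2, with B in Angstrom^2."""
    if b_factor < 0:
        raise ValueError("B-factor must be non-negative")
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))

# A B-factor of 8 * pi^2 (about 79 Angstrom^2) corresponds to u = 1 Angstrom.
u = mean_displacement(8.0 * math.pi ** 2)
print(u)
```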

Comparing mean displacements against the lRMSD of (intra, inter) state comparisons then allows one to single out those comparisons which are positive, that is, whose lRMSD is larger than the mean displacement.

Based on these quantities, a classification of subdomains as static, dynamic or unstable is proposed in [162] .

For example, a subdomain which does not exhibit any positive intra state comparison, but has at least one positive inter state comparison, is termed dynamic. (Nb: a sufficient but not necessary condition, see [162].) Such subdomains are used in the second step.
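The sufficient condition just stated can be written down directly. The sketch below assumes that each intra or inter state comparison has been summarized by a single lRMSD value; it implements only this sufficient condition, not the full classification of [162]. Function names and numerical values are hypothetical.

```python
def is_positive(lrmsd, u):
    """A comparison is positive when its lRMSD exceeds the mean displacement u."""
    return lrmsd > u

def is_dynamic_sufficient(u, intra, inter):
    """True if no intra state comparison is positive and at least one inter is."""
    no_positive_intra = not any(is_positive(v, u) for v in intra.values())
    some_positive_inter = any(is_positive(v, u) for v in inter.values())
    return no_positive_intra and some_positive_inter

# Hypothetical subdomain with mean displacement 1.0 Angstrom.
u = 1.0
intra = {"AvsA": 0.4, "BvsB": 0.6, "EvsE": 0.5}
inter = {"AvsB": 1.7, "AvsE": 0.6, "BvsE": 0.9}
print(is_dynamic_sufficient(u, intra, inter))
```

Since the condition is not necessary, a subdomain for which this test fails may still be classified as dynamic by the criteria of [162].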

Voronoi interfaces

A molecular space filling model (SFM) is a molecular representation with one ball per atom. The SFM can be partitioned by a so-called Voronoi diagram, which assigns one Voronoi cell to each atom. Furthermore, one can compute the restriction of each atomic ball, as the intersection between the ball and its Voronoi region.

Consider now two domains or subdomains in a molecule. An interfacial pair for these two (sub)domains is a pair of atoms, one on each (sub)domain, such that their Voronoi restrictions are in contact, i.e. they share a so-called Voronoi facet. The interface size between these two (sub)domains is then defined as the number of interfacial atoms.
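Given a list of interfacial pairs, the interface size can be obtained by counting atoms. The sketch below assumes that an interfacial atom is an atom involved in at least one interfacial pair, and that the pairs have already been computed elsewhere; atoms are represented by hypothetical (subdomain, atom id) tuples.

```python
def interface_size(interfacial_pairs):
    """Number of distinct atoms involved in at least one interfacial pair."""
    atoms = set()
    for atom_1, atom_2 in interfacial_pairs:
        atoms.add(atom_1)
        atoms.add(atom_2)
    return len(atoms)

# Hypothetical pairs between two subdomains; atom ("TM", 10) is shared by two pairs.
pairs = [
    (("TM", 10), ("Loop1", 3)),
    (("TM", 10), ("Loop1", 4)),
    (("TM", 11), ("Loop1", 4)),
]
size = interface_size(pairs)
print(size)
```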

In this package, we focus in particular on interfaces between pairs of subdomains, whose size gives an indication of their spatial proximity as a function of the state considered.

In practice, Voronoi interfaces are computed using the package Space_filling_model_interface_finder .

Using Molecular_cradle: script sbl-cradle-step1.py

Pre-requisites

Reference clustering and chain-to-state mapping. In step 1, we compute a hierarchical clustering of the chain instances using one of the aforementioned molecular distances. The output is a dendrogram, for which several linkage options can be used (see options below). It is up to the user to decide whether this clustering yields well separated clusters. If so, a chain-to-state id mapping can be defined, providing a state id for each chain instance. See the illustration below.

The state id of a chain is the character string providing the corresponding state for that chain. (Nb: the number of different strings must equal the number of clusters in the reference clustering.)
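To illustrate how a hierarchical clustering groups chain instances, here is a dependency-free sketch of agglomerative clustering with average linkage. The package itself relies on SciPy for this step; the naive implementation below is only meant to show the principle. The distances are the lRMSD values between six chains, copied from the pairwise matrix of the Jupyter demo on this page; the function name is hypothetical.

```python
from itertools import combinations

def average_linkage(labels, dist, n_clusters):
    """Naive agglomerative clustering with average linkage.

    dist maps frozenset({a, b}) to the distance between leaves a and b.
    """
    clusters = [[name] for name in labels]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Average linkage: mean distance over all cross-cluster leaf pairs.
            values = [dist[frozenset((a, b))] for a in clusters[i] for b in clusters[j]]
            avg = sum(values) / len(values)
            if best is None or avg < best[0]:
                best = (avg, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]                          # j > i, so index i stays valid
    return [sorted(c) for c in clusters]

labels = ["2dr6_A", "2dr6_B", "2dr6_C", "3aoa_A", "3aoa_B", "3aoa_C"]
# Upper triangle of the lRMSD matrix restricted to these six chains.
rows = [
    [3.065, 2.358, 0.686, 3.133, 2.364],
    [3.286, 3.137, 0.820, 3.278],
    [2.395, 3.348, 0.700],
    [3.174, 2.397],
    [3.331],
]
dist = {}
for i, row in enumerate(rows):
    for offset, value in enumerate(row, start=1):
        dist[frozenset((labels[i], labels[i + offset]))] = value

clusters = average_linkage(labels, dist, n_clusters=3)
print(clusters)
```

On these six chains, cutting at three clusters pairs each chain of 2dr6 with the chain of 3aoa bearing the same chain id, in agreement with the AcrB chain-to-state mapping listed on this page.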

Protein decomposition template specification. This template decomposes a chain into subdomains, using the following format:

polypeptide chain and its hierarchical decomposition specified by a triple:
- protein name (below: protein-name)
- domain name (below: domain). nb: use eg whole if the whole protein is considered
- subdomains, each specified by a list of integer intervals. See example below.
The template, one per chain, is defined in a spec file whose name follows the following convention:
- protein-name_rmsd-type_domain_chain-id.spec
See example below.

Here is the spec file for the whole chain of AcrB:

#the first line starts the template and give it a name
domains-template-begin AcrB
 
#the following lines contain: the name of the label, then the ranges
#of residues corresponding to this label (including the bounds)
Whole 1-1044
 
 
#the star denotes the complementary, i.e all residues not
#mentionned before in the template
# Coil is not taken into account here to maximize interest of flexible RMSD
#COIL    **
 
#terminates the template
end
 
#enumerates the chains and possibly associates template to them
chains-enumeration-begin
A like AcrB
end
 
#groups hierarchically the chains 
chains-hierarchy-begin
M1 A
end
 

The decomposition template is optional for step 1, since the clustering is carried out on whole molecules by default. If a decomposition template with at least two subdomains is provided, the global lRMSD is replaced by the RMSDcomb of these subdomains. The file below provides such a specification, for the so-called Coil2 subdomain of AcrB.

The (sub)domain specification in these spec files must use the exact same name, even for two different proteins.

#the first line starts the template and give it a name
domains-template-begin AcrB
 
#the following lines contain: the name of the label, then the ranges
#of residues corresponding to this label (including the bounds)
Coil2 132-137
 
#the star denotes the complementary, i.e all residues not
#mentionned before in the template
# Coil is not taken into account here to maximize interest of flexible RMSD
#COIL    **
 
#terminates the template
end
 
#enumerates the chains and possibly associates template to them
chains-enumeration-begin
A like AcrB
end
 
#groups hierarchically the chains 
chains-hierarchy-begin
M1 A
end
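For readers who wish to inspect such templates programmatically, the following sketch extracts the template name and the residue ranges of each label. It is not the parser used by the SBL: it only handles the domains-template block shown above, and it assumes that several intervals for one label are separated by whitespace. The function name is hypothetical.

```python
def parse_domains_template(text):
    """Return (template name, {label: [(first, last), ...]}) from a spec file."""
    name, subdomains, inside = None, {}, False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("domains-template-begin"):
            name = line.split()[1]
            inside = True
        elif line == "end":
            if inside:
                break  # end of the template block; later blocks are ignored
        elif inside:
            label, *ranges = line.split()
            subdomains[label] = [tuple(int(x) for x in r.split("-")) for r in ranges]
    return name, subdomains

spec = """
domains-template-begin AcrB
#the following lines contain: the name of the label, then the ranges
Coil2 132-137
#COIL    **
end
chains-enumeration-begin
A like AcrB
end
"""
name, subdomains = parse_domains_template(spec)
n_residues = sum(last - first + 1 for first, last in subdomains["Coil2"])
print(name, subdomains, n_residues)
```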

Protein database specification. The database of structures is specified in a CSV file listing the chain instances to be processed: each line provides a triple (PDB id, protein name, chain ids). Note that the previous decomposition template is applied to each chain.

Example:

pdb;protein;chains
2dr6;AcrB;ABC
3aoa;AcrB;ABC
3w9h;AcrB;ABC
4dx5;AcrB;ABC
4zit;AcrB;ABCDEF
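The chain instances described by such a file can be enumerated with the standard csv module. The sketch below assumes that each character of the chains field is one chain id, and names instances with the pdb_chain convention used in the chain-to-state mapping; the function name is hypothetical.

```python
import csv
import io

def chain_instances(csv_text):
    """List the chain instances (pdb_chain) described by a database file."""
    reader = csv.DictReader(io.StringIO(csv_text.strip()), delimiter=";")
    return [f"{row['pdb']}_{chain}" for row in reader for chain in row["chains"]]

database = """
pdb;protein;chains
2dr6;AcrB;ABC
3aoa;AcrB;ABC
3w9h;AcrB;ABC
4dx5;AcrB;ABC
4zit;AcrB;ABCDEF
"""
instances = chain_instances(database)
print(len(instances), instances[0], instances[-1])
```

For the example above, the five PDB files yield 18 chain instances.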

Optional: chain-to-state map file. It may happen that the state of selected (but not all) chains is known. If so, the chain-to-state mapping is specified in a text file (csv format). For the sake of presentation, it is also requested to specify one color per state, to be used to display the leaves of the dendrogram produced by the hierarchical clustering.

The number of entries in this file is at most the number of structures in the database. A chain whose state and color are not specified is displayed in black in the clustering.

The following file illustrates this mapping for AcrB, using the three states A, B, and E:

chain-to-state mapping

PDB;state;color
3w9h_A;A;red
4dx5_A;A;red
4zit_A;A;red
2dr6_C;A;red
3aoa_C;A;red
4zit_D;A;red
2dr6_A;B;cyan
3aoa_A;B;cyan
3w9h_B;B;cyan
4dx5_B;B;cyan
4zit_B;B;cyan
4zit_E;B;cyan
3w9h_C;E;green
4dx5_C;E;green
4zit_C;E;green
2dr6_B;E;green
3aoa_B;E;green
4zit_F;E;green
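Such a mapping is easily loaded and grouped by state, e.g. to check how many chains carry each label. The sketch below uses a subset of the file above (chains of 2dr6 and 3aoa only); the function name is hypothetical.

```python
import csv
import io
from collections import defaultdict

def read_state_map(csv_text):
    """Return ({state: [chain instances]}, {state: color}) from a mapping file."""
    states, colors = defaultdict(list), {}
    for row in csv.DictReader(io.StringIO(csv_text.strip()), delimiter=";"):
        states[row["state"]].append(row["PDB"])
        colors[row["state"]] = row["color"]
    return dict(states), colors

mapping = """
PDB;state;color
2dr6_C;A;red
3aoa_C;A;red
2dr6_A;B;cyan
3aoa_A;B;cyan
2dr6_B;E;green
3aoa_B;E;green
"""
states, colors = read_state_map(mapping)
print(states)
print(colors)
```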

Input of the script

The main options of the program sbl-cradle-step1.py are:
(-d, --idir) string: directory containing all input files (required)
(-x, --exe_type) string: alignment type used in case of multiple protein comparisons (default: kpax)
(-c, --csv_file_info) string: CSV file containing the database specification (pdb;Protein;chains) (required)
(-sf, --statefile) string: CSV file providing the chain-to-state mapping (pdb_chain;state id) (optional)
(-l, --linkage) string: Linkage type for the hierarchical clustering: "average" (default), "single", "complete", "ward"
(-w, --workflow) string: Workflow type: lRMSD computation (computation), clustering (analysis), both (both, default)

The script is called as follows:

sbl-cradle-step1.py  -d data/cradle_step1 -x kpax -c data/cradle-input-structures.csv -sf data/cradle-chain-instance-to-state-map.csv
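To make the defaults explicit, the option table can be mirrored with argparse. This is only an illustration of how the documented options combine, not the parser of sbl-cradle-step1.py itself; the long option names assume the usual double-dash form, and the help strings are paraphrased.

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring the options documented for step 1."""
    p = argparse.ArgumentParser(prog="sbl-cradle-step1.py")
    p.add_argument("-d", "--idir", required=True, help="directory with all input files")
    p.add_argument("-x", "--exe_type", default="kpax", help="alignment type")
    p.add_argument("-c", "--csv_file_info", required=True, help="database specification")
    p.add_argument("-sf", "--statefile", default=None, help="chain-to-state mapping")
    p.add_argument("-l", "--linkage", default="average",
                   choices=["average", "single", "complete", "ward"])
    p.add_argument("-w", "--workflow", default="both",
                   choices=["computation", "analysis", "both"])
    return p

command = ("-d data/cradle_step1 -x kpax -c data/cradle-input-structures.csv "
           "-sf data/cradle-chain-instance-to-state-map.csv")
args = build_parser().parse_args(command.split())
print(args.linkage, args.workflow)
```

With the command shown above, the linkage and workflow options are left unspecified and therefore take their defaults, average and both.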

Output of the script

The main outputs generated are:

(Main output step1-1) Dendrogram generated by the hierarchical clustering. As noted above, the leaves are colored by states when the chain-to-state mapping is provided.

(Main output step1-2) Aggregated matrix with all pairwise distances (RMSD or RMSDcomb).

(Main output step1-3) Aggregated matrix with the size of the alignments which accompany the distance calculation. This file is of special interest if homologous chains are compared, to make sure that comparable numbers of a.a. are used in all distance calculations.

The comparison of molecular distances should be accompanied by a check of the number of residues involved, as a small alignment size typically results in a smaller distance. This is particularly critical when comparing homologous proteins.
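Such a check can be automated by flagging the pairs whose alignment is markedly shorter than the largest one. In the sketch below, the 95% threshold and the third pair (with its size) are hypothetical; the first two sizes are taken from the alignment size matrix of the Jupyter demo on this page.

```python
def small_alignments(sizes, min_fraction=0.95):
    """Pairs whose alignment size is below min_fraction of the largest size."""
    largest = max(sizes.values())
    return sorted(pair for pair, n in sizes.items() if n < min_fraction * largest)

sizes = {
    ("2dr6_A", "3w9h_A"): 1019,
    ("4dx5_A", "4zit_B"): 1043,
    ("4dx5_A", "homolog_X"): 600,  # hypothetical distant homolog
}
flagged = small_alignments(sizes)
print(flagged)
```

Distances computed for the flagged pairs should be interpreted with care, since they rely on fewer residues than the others.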

Using Molecular_cradle: script sbl-cradle-step2.py

Pre-requisites

Recall that step 2 aims to identify subdomains which are coherent with the reference clustering. Two pieces of information are required to do so:

decomposition template for each chain,

chain-to-state mapping for all chains.

For complex molecules, it is advised to carry out this step for conformations of the same chain only. Indeed, homologous proteins may not have the same dynamic subdomains.

Input of the script

The main options of the program sbl-cradle-step2.py are:
(-d, --pdbdir) string: directory containing input pdb files (required)
(-s, --specdir) string: directory containing input spec files (required)
(-x, --exe_type) string: alignment type used in case of multiple protein comparisons (default: kpax)
(-c, --csv_file_info) string: CSV file containing the database specification (pdb;Protein;chains) (required)
(-sf, --statefile) string: CSV file providing the chain-to-state mapping (pdb_chain;state id) (optional)
(-l, --linkage) string: Linkage type for the hierarchical clustering: "average" (default), "single", "complete", "ward"
(-w, --workflow) string: Workflow type: lRMSD computation (computation), clustering (analysis), both (both, default)

Output of the script

The main outputs generated are:

(Main output step2-1) Mean displacement for each subdomain.

(Main output step2-2) For each subdomain: table comparing the mean displacement against the inter state and intra state lRMSD.

(Main output step2-3) Table listing the dynamic subdomains.

(Main output step2-4) Table summarizing the correctness of the clusterings obtained for each subdomain under the lRMSD.

(Main output step2-5) Table summarizing the correctness of the clusterings obtained for subsets of dynamic subdomains under RMSDcomb.

Using Molecular_cradle: script sbl-cradle-step3.py

Pre-requisites

Recall that step 3 aims at characterizing the stability/evolution of interfaces between subdomains when changing states. The pre-requisites are identical to those of step 2.

For complex molecules, it is advised to carry out this step for conformations of the same chain only. Indeed, homologous proteins may not have identical interfaces between subdomains.

Input of the script

The main options of the program sbl-cradle-step3.py are:
(-d, --pdbdir) string: directory containing input pdb files (required)
(-s, --specdir) string: directory containing input spec files (required)
(-c, --csv_file_info) string: CSV file containing the database specification (pdb;Protein;chains) (required)
(-sf, --statefile) string: CSV file providing the chain-to-state mapping (pdb_chain;state id) (optional)

Output of the script

The main output generated is:

(Main output step3-1) Matrix of interface sizes between subdomains. Since the matrix is symmetric, the upper triangular part only is filled. For a given pair, three values are reported, one per state; the value for a state is the median value of the interface size (number of interfacial atoms) observed for all interfaces between the two domains of interest, in a given state.
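The entry reported for a pair of subdomains can be sketched as follows: the interface sizes observed in the individual chain instances are grouped by state, and the median is taken within each state. The interface sizes below are hypothetical; the chain-to-state assignments are taken from the AcrB mapping of this page, and the function name is an assumption.

```python
from statistics import median

def median_interface_sizes(sizes_by_instance, state_of, states=("A", "B", "E")):
    """Median interface size per state, for one pair of subdomains."""
    per_state = {s: [] for s in states}
    for instance, size in sizes_by_instance.items():
        per_state[state_of[instance]].append(size)
    return {s: median(values) for s, values in per_state.items() if values}

state_of = {
    "2dr6_C": "A", "3aoa_C": "A", "3w9h_A": "A",
    "2dr6_A": "B", "3aoa_A": "B", "3w9h_B": "B",
    "2dr6_B": "E",
}
# Hypothetical interface sizes (number of interfacial atoms) for one pair.
sizes = {
    "2dr6_C": 40, "3aoa_C": 38, "3w9h_A": 39,
    "2dr6_A": 36, "3aoa_A": 34, "3w9h_B": 37,
    "2dr6_B": 31,
}
medians = median_interface_sizes(sizes, state_of)
cell = " / ".join(str(medians[s]) for s in ("A", "B", "E"))
print(cell)
```

The resulting string follows the three-value layout of the matrix cells, one value per state.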

Algorithms and Methods

The analyses provided by the previous scripts involve the following three steps:

Step 1 : Classifying states of unlabeled monomers. The main classes are:

SBL::Molecular_cradle_step1::Cradle_step1 : step 1 implementation, based on the classes below.
SBL::Molecular_cradle_utils_distance::Distance_calculator : class performing the all pairwise distance calculations, with the choice of the proper distance to be used, as discussed in section General pre-requisites .
SBL::Molecular_cradle_utils_dendogram::Matrix_processor : a utility class to collect the distance values of pairwise calculations and aggregate them into the final matrix.
SBL::Molecular_cradle_utils_dendogram::Dendogram_processor : computes the hierarchical clustering and the associated dendrogram, and possibly checks the coherence with the reference clustering.

Step 2 : Identifying subdomains compatible with the states of the reference clustering. The main classes are:

SBL::Molecular_cradle_step2::Cradle_step2: step 2 implementation, based on the classes below.
SBL::Molecular_cradle_utils_label_selection::Label_selector: computation of mean displacements, of RMSDcomb, and comparison of the two. Performs inter and intra comparisons, to finally label subdomains as stable, dynamic, or unstable.
SBL::Molecular_cradle_step2_selected::Cradle_step2_selected: computes pairwise lRMSDs for individual dynamic subdomains, computes clusterings using subsets of these, and finally checks their coherence with the reference clustering. These three steps use the classes SBL::Molecular_cradle_utils_distance::Distance_calculator, as well as SBL::Molecular_cradle_utils_dendogram::Matrix_processor (see above) and SBL::Molecular_cradle_utils_dendogram::Dendogram_processor (see above).

Step 3 : Characterizing the evolution of Voronoi interfaces between subdomains. The main class is:

SBL::Molecular_cradle_step3::Cradle_step3: computes Voronoi interfaces and assembles the results into the final matrix.

Dependencies

Packages from the SBL.

Computing molecular distances: package Molecular_distances_flexible; executables: $\text{\sblrmsdcombconf}$ , $\text{\sblrmsdcombprotkpax}$ , and $\text{\sblrmsdcombprotapurva}$ .

Computing Voronoi interfaces: package Space_filling_model_interface; executable: $\text{\sblintervorabw}$ .

Other packages.

Handling PDB files was done using Biopython PDB:

Biopython Structural Bioinformatics and [51].

Scipy packages:
hierarchical clustering ; spatial distance .

Installation. To use the aforementioned three scripts:

Make sure all executables used are visible from one's PATH environment variable. The executables are $\text{\sblrmsdcombconf}$ , $\text{\sblrmsdcombprotkpax}$ , $\text{\sblintervorabw}$ .
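Visibility from PATH can be checked before launching the scripts. The helper below is a minimal sketch based on shutil.which; its name is hypothetical, and the list of names passed to it must be replaced by the actual file names of the SBL executables installed on one's system (the name used in the example is a placeholder).

```python
import shutil

def missing_executables(names):
    """Return the names which cannot be found through the PATH variable."""
    return [name for name in names if shutil.which(name) is None]

# Placeholder name: substitute the SBL executables listed above.
required = ["sbl-placeholder-executable-not-installed"]
missing = missing_executables(required)
if missing:
    print("Not found in PATH:", ", ".join(missing))
```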

Jupyter demo

See the following jupyter notebook:

Jupyter notebook file
Molecular_cradle
This jupyter notebook illustrates the three steps of our workflow, which are

Classifying states of unlabeled monomers,

Identifying subdomains compatible with states ABE,

Characterizing the evolution of interfaces between subdomains along state changes.

For the sake of execution time, the notebook uses solely 5 PDB files containing a total of 18 monomers.

Performing all pairwise structural comparisons using kpax is indeed a costly operation. The notebook runs in under 10 minutes on a laptop computer.
In [1]:

# Loading required python libraries
from SBL import Molecular_cradle_step1
from Molecular_cradle_step1 import *
from SBL import Molecular_cradle_step2
from Molecular_cradle_step2 import *
from SBL import Molecular_cradle_step3
from Molecular_cradle_step3 import *
from SBL import SBL_pytools
from SBL_pytools import SBL_pytools as sblpyt

(Step 1) Classifying states of unlabeled monomers
The main input consists of:

data: the directory containing PDB and spec files

cradle-input-structures.csv: database of structures

kpax: name of the method used to perform structural alignments

cradle-chain-instance-to-state-map.csv: the chain-to-state mapping

average: average linkage used to perform the hierarchical clustering

The output consists of:

dendrogram, lRMSD based, showing the three known clusters associated with the states A, B, and E (png file)

the matrix of pairwise distances from which the dendrogram is built (csv file)
In [2]:

# python3 ./sbl-cradle-step1.py -d data/cradle_step1 -c data/cradle-input-structures.csv -x kpax -sf data/cradle-chain-instance-to-state-map.csv -w both -l average -v 1
step1_options = Cradle_step1_options("./data/cradle_step1", "./data/cradle-input-structures.csv", "kpax", "./data/cradle-chain-instance-to-state-map.csv", "both", "average", 0)
step1 = Cradle_step1(step1_options)
step1.run()
lrmsd_whole
In [3]:

# We display the dendrogram (Main output step1-1)
sblpyt.show_image("lrmsd_whole/full_matrix_lrmsd_whole_average.png")
In [4]:

# We display the pairwise distance matrix (Main output step1-2)
sblpyt.show_this_text_file("lrmsd_whole/full_matrix_lrmsd_whole.csv")
++Showing file lrmsd_whole/full_matrix_lrmsd_whole.csv
       2dr6_A 2dr6_B 2dr6_C 3aoa_A 3aoa_B 3aoa_C 3w9h_A 3w9h_B 3w9h_C 4dx5_A 4dx5_B 4dx5_C 4zit_A 4zit_B 4zit_C 4zit_D 4zit_E 4zit_F
2dr6_A  0.000  3.065  2.358  0.686  3.133  2.364  2.223  0.834  2.946  2.384  0.980  2.994  2.072  1.026  3.117  1.877  1.056  3.065
2dr6_B  3.065  0.000  3.286  3.137  0.820  3.278  3.205  3.078  0.903  3.319  3.249  0.932  3.068  3.149  1.123  3.056  3.127  1.234
2dr6_C  2.358  3.286  0.000  2.395  3.348  0.700  0.975  2.317  3.238  1.066  2.435  3.216  1.283  2.375  3.441  1.468  2.390  3.356
3aoa_A  0.686  3.137  2.395  0.000  3.174  2.397  2.237  0.807  3.025  2.387  0.940  3.070  2.064  0.941  3.174  1.842  0.965  3.106
3aoa_B  3.133  0.820  3.348  3.174  0.000  3.331  3.251  3.135  0.999  3.358  3.314  1.029  3.091  3.176  1.190  3.081  3.166  1.282
3aoa_C  2.364  3.278  0.700  2.397  3.331  0.000  0.991  2.318  3.230  1.116  2.429  3.211  1.314  2.378  3.420  1.487  2.398  3.334
3w9h_A  2.223  3.205  0.975  2.237  3.251  0.991  0.000  2.142  3.137  0.652  2.273  3.111  0.952  2.193  3.325  1.192  2.224  3.230
3w9h_B  0.834  3.078  2.317  0.807  3.135  2.318  2.142  0.000  2.949  2.284  0.726  2.998  1.942  0.770  3.116  1.730  0.830  3.059
3w9h_C  2.946  0.903  3.238  3.025  0.999  3.230  3.137  2.949  0.000  3.241  3.131  0.611  2.953  3.023  0.890  2.944  3.001  1.015
4dx5_A  2.384  3.319  1.066  2.387  3.358  1.116  0.652  2.284  3.241  0.000  2.406  3.188  1.053  2.350  3.461  1.356  2.355  3.359
4dx5_B  0.980  3.249  2.435  0.940  3.314  2.429  2.273  0.726  3.131  2.406  0.000  3.160  2.060  0.759  3.285  1.798  0.843  3.220
4dx5_C  2.994  0.932  3.216  3.070  1.029  3.211  3.111  2.998  0.611  3.188  3.160  0.000  2.947  3.069  0.820  2.949  3.037  0.952
4zit_A  2.072  3.068  1.283  2.064  3.091  1.314  0.952  1.942  2.953  1.053  2.060  2.947  0.000  1.907  3.174  0.767  2.000  3.081
4zit_B  1.026  3.149  2.375  0.941  3.176  2.378  2.193  0.770  3.023  2.350  0.759  3.069  1.907  0.000  3.186  1.683  0.601  3.124
4zit_C  3.117  1.123  3.441  3.174  1.190  3.420  3.325  3.116  0.890  3.461  3.285  0.820  3.174  3.186  0.000  3.186  3.153  0.590
4zit_D  1.877  3.056  1.468  1.842  3.081  1.487  1.192  1.730  2.944  1.356  1.798  2.949  0.767  1.683  3.186  0.000  1.787  3.095
4zit_E  1.056  3.127  2.390  0.965  3.166  2.398  2.224  0.830  3.001  2.355  0.843  3.037  2.000  0.601  3.153  1.787  0.000  3.107
4zit_F  3.065  1.234  3.356  3.106  1.282  3.334  3.230  3.059  1.015  3.359  3.220  0.952  3.081  3.124  0.590  3.095  3.107  0.000
--Done
In [5]:

# We display the alignment size matrix (Main output step1-3)
sblpyt.show_this_text_file("lrmsd_whole/alignment-size_lrmsd_whole.csv")
++Showing file lrmsd_whole/alignment-size_lrmsd_whole.csv
       2dr6_A 2dr6_B 2dr6_C 3aoa_A 3aoa_B 3aoa_C 3w9h_A 3w9h_B 3w9h_C 4dx5_A 4dx5_B 4dx5_C 4zit_A 4zit_B 4zit_C 4zit_D 4zit_E 4zit_F
2dr6_A   1022   1022   1022   1022   1022   1022   1019   1019   1019   1022   1019   1019   1021   1021   1021   1021   1021   1021
2dr6_B   1022   1022   1022   1022   1022   1022   1019   1019   1019   1022   1019   1019   1021   1021   1021   1021   1021   1021
2dr6_C   1022   1022   1022   1022   1022   1022   1019   1019   1019   1022   1019   1019   1021   1021   1021   1021   1021   1021
3aoa_A   1022   1022   1022   1022   1022   1022   1019   1019   1019   1022   1019   1019   1021   1021   1021   1021   1021   1021
3aoa_B   1022   1022   1022   1022   1022   1022   1019   1019   1019   1022   1019   1019   1021   1021   1021   1021   1021   1021
3aoa_C   1022   1022   1022   1022   1022   1022   1019   1019   1019   1022   1019   1019   1021   1021   1021   1021   1021   1021
3w9h_A   1019   1019   1019   1019   1019   1019   1033   1033   1033   1033   1033   1033   1032   1032   1032   1032   1032   1032
3w9h_B   1019   1019   1019   1019   1019   1019   1033   1033   1033   1033   1033   1033   1032   1032   1032   1032   1032   1032
3w9h_C   1019   1019   1019   1019   1019   1019   1033   1033   1033   1033   1033   1033   1032   1032   1032   1032   1032   1032
4dx5_A   1022   1022   1022   1022   1022   1022   1033   1033   1033   1044   1033   1033   1042   1043   1042   1042   1042   1042
4dx5_B   1019   1019   1019   1019   1019   1019   1033   1033   1033   1033   1033   1033   1032   1032   1032   1032   1032   1032
4dx5_C   1019   1019   1019   1019   1019   1019   1033   1033   1033   1033   1033   1033   1032   1032   1032   1032   1032   1032
4zit_A   1021   1021   1021   1021   1021   1021   1032   1032   1032   1042   1032   1032   1042   1042   1042   1042   1042   1042
4zit_B   1021   1021   1021   1021   1021   1021   1032   1032   1032   1043   1032   1032   1042   1043   1042   1042   1042   1042
4zit_C   1021   1021   1021   1021   1021   1021   1032   1032   1032   1042   1032   1032   1042   1042   1042   1042   1042   1042
4zit_D   1021   1021   1021   1021   1021   1021   1032   1032   1032   1042   1032   1032   1042   1042   1042   1042   1042   1042
4zit_E   1021   1021   1021   1021   1021   1021   1032   1032   1032   1042   1032   1032   1042   1042   1042   1042   1042   1042
4zit_F   1021   1021   1021   1021   1021   1021   1032   1032   1032   1042   1032   1032   1042   1042   1042   1042   1042   1042
--Done

(Step 2) Identifying subdomains compatible with states ABE
The main input consists of:

PDB structures and the associated pieces of information (see step 1)

the spec files

chain instance to state map (see step 1)

the linkage type (see step 1)

alignment method (see step 1)

The output consists of:

the mean displacement for each subdomain (csv file)

values obtained for the intra state and inter state comparisons (csv file)

dynamic sub-domains found (csv file)

table providing the correctness of clusterings obtained with lRMSD and RMSDcomb (csv file)

dendrogram of RMSDcomb with dynamic subdomains (png file)
In [8]:

# python3 ./sbl-cradle-step2.py -d data/cradle_step1/PDB -s data/cradle_step2/spec_files -c data/cradle-input-structures.csv -x kpax -sf data/cradle-chain-instance-to-state-map.csv -l average -v 1
step2_options = Cradle_step2_options("./data/cradle_step1/PDB", "./data/cradle_step2/spec_files", "./data/cradle-input-structures.csv", "kpax", "./data/cradle-chain-instance-to-state-map.csv", "average", 0)
step2 = Cradle_step2(step2_options)
step2.run()
rmsdc_subdomains lrmsd_TM lrmsd_aHelix lrmsd_Loop2 lrmsd_Loop8 lrmsd_Loop9 lrmsd_Loop11 rmsdc_TMLoop2 rmsdc_TMLoop8 rmsdc_TMLoop11 rmsdc_Loop2Loop8 rmsdc_Loop2Loop11 rmsdc_Loop8Loop11 rmsdc_TMLoop2Loop8 rmsdc_TMLoop2Loop11 rmsdc_TMLoop8Loop11 rmsdc_Loop2Loop8Loop11 rmsdc_TMLoop2Loop8Loop11
In [9]:

# We display the mean displacement for each subdomain (Main output step2-1)
sblpyt.show_this_text_file("mean_displacement.csv")
++Showing file mean_displacement.csv
DC      1.0127
DN      1.0230
Loop1   1.0325
Loop10  1.0189
Loop11  1.1413
Loop2   1.0246
Loop3   0.9940
Loop4   0.9955
Loop5   1.0403
Loop6   1.1378
Loop7   1.0903
Loop8   1.1868
Loop9   1.0529
PC1     1.0855
PC2     1.0877
PN1     0.9510
PN2     1.0374
TM      1.1000
aHelix  1.1870
--Done
In [10]:

# We display the values for intra and inter state comparisons of each subdomain. (Main output step2-2)
# u : mean displacement, AvsA: lRMSD of A versus A, ...
sblpyt.show_this_text_file("subdomains_lrmsd_u.csv")
++Showing file subdomains_lrmsd_u.csv
Subdomain  u     AvsA   AvsB   AvsE   BvsB   BvsE   EvsE
DC         1.01  0.56   0.571  0.545  0.624  0.649  0.549
DN         1.02  0.658  0.778  0.716  0.776  0.822  0.699
Loop1      1.03  0.637  0.884  0.477  0.695  0.892  0.342
Loop10     1.01  0.122  0.144  0.19   0.317  0.211  0.346
Loop11     1.14  2.004  1.54   0.797  2.112  2.535  0.44
Loop2      1.02  0.287  1.746  0.601  1.126  0.919  0.38
Loop3      0.99  0.179  0.346  0.5    0.348  0.232  0.206
Loop4      0.99  0.322  0.477  0.361  0.486  0.348  0.354
Loop5      1.04  0.231  0.217  0.245  0.201  0.249  0.173
Loop6      1.13  0.354  0.361  0.374  0.428  0.545  0.649
Loop7      1.09  0.455  0.761  0.561  0.671  0.71   0.603
Loop8      1.18  2.424  1.568  1.605  3.098  3.694  1.926
Loop9      1.05  1.229  0.366  0.341  0.408  0.407  0.357
PC1        1.08  0.88   0.938  0.741  0.803  0.84   0.748
PC2        1.08  0.807  0.627  0.562  0.682  1.025  0.969
PN1        0.95  0.471  0.693  0.509  0.75   0.657  0.708
PN2        1.03  0.55   0.975  0.683  0.985  0.721  0.672
TM         1.1   0.789  1.299  0.84   1.768  1.608  0.91
aHelix     1.18  2.29   1.217  1.416  1.463  1.265  1.45
--Done
In [11]:

# We display the previous table restricted to dynamic subdomains (Main output step2-3)
sblpyt.show_this_text_file("selected_subdomains_lrmsd_u.csv")
++Showing file selected_subdomains_lrmsd_u.csv
Subdomain  u     AvsA   AvsB   AvsE   BvsB   BvsE   EvsE
Loop11     1.14  2.004  1.54   0.797  2.112  2.535  0.44
Loop2      1.02  0.287  1.746  0.601  1.126  0.919  0.38
Loop8      1.18  2.424  1.568  1.605  3.098  3.694  1.926
Loop9      1.05  1.229  0.366  0.341  0.408  0.407  0.357
TM         1.1   0.789  1.299  0.84   1.768  1.608  0.91
aHelix     1.18  2.29   1.217  1.416  1.463  1.265  1.45
--Done
In [13]:

# We display the correctness for single subdomains (lRMSD) (Main output step2-4)
sblpyt.show_this_text_file("table_rmsdc_average_summary.csv")
++Showing file table_rmsdc_average_summary.csv
domain;correctness
rmsdc_TMLoop2;C
rmsdc_TMLoop8;C
rmsdc_TMLoop11;C
rmsdc_Loop2Loop8;C
rmsdc_Loop2Loop11;C
rmsdc_Loop8Loop11;C(B) C(E)
rmsdc_TMLoop2Loop8;C
rmsdc_TMLoop2Loop11;C
rmsdc_TMLoop8Loop11;C
rmsdc_Loop2Loop8Loop11;C(B) C(E)
rmsdc_TMLoop2Loop8Loop11;C
--Done
In [14]:

# We display the correctness for combined subdomains (RMSDcomb) (Main output step2-5)
sblpyt.show_this_text_file("table_rmsdc_average_summary.csv")
++Showing file table_rmsdc_average_summary.csv
domain;correctness
rmsdc_TMLoop2;C
rmsdc_TMLoop8;C
rmsdc_TMLoop11;C
rmsdc_Loop2Loop8;C
rmsdc_Loop2Loop11;C
rmsdc_Loop8Loop11;C(B) C(E)
rmsdc_TMLoop2Loop8;C
rmsdc_TMLoop2Loop11;C
rmsdc_TMLoop8Loop11;C
rmsdc_Loop2Loop8Loop11;C(B) C(E)
rmsdc_TMLoop2Loop8Loop11;C
--Done
In [15]:

# We display an illustrative dendrogram with RMSDcomb
sblpyt.show_image("rmsdc_TMLoop8Loop11/full_matrix_rmsdc_TMLoop8Loop11_average.png")

(Step 3) Characterizing the evolution of interfaces between subdomains along state changes.
The main input: identical to step 2

The output consists of:

table listing all intra interfaces (csv file)
In [6]:

# python3 ./sbl-cradle-step3.py -d data/cradle_step1/PDB -s data/cradle_step2/spec_files -c data/cradle-input-structures.csv -sf data/cradle-chain-instance-to-state-map.csv -v 1
step3_options = Cradle_step3_options("./data/cradle_step1/PDB", "./data/cradle_step2/spec_files", "./data/cradle-input-structures.csv", "./data/cradle-chain-instance-to-state-map.csv", 0)
step3 = Cradle_step3(step3_options)
step3.run()
In [7]:

# We display the table with all interfaces and their size for each step (see paper for notations) (Main output step3-1)
sblpyt.show_this_text_file("median_result_matrix.csv")
++Showing file median_result_matrix.csv
;TM;aHelix;DC;DN;PC1;PC2;PN1;PN2;Loop1;Loop2;Loop3;Loop4;Loop5;Loop6;Loop7;Loop8;Loop9;Loop10;Loop11
TM;o;211 / 210 / 209;o;o;39 / 36 / 31;0 / 0 / 9;0 / 6 / 12;15 / 22 / 12;145 / 151 / 147;o;o;o;35 / 34 / 35;141 / 138 / 135;154 / 153 / 159;9 / 12 / 0;o;o;70 / 69 / 89
aHelix;;o;o;o;o;o;o;o;o;o;o;o;o;o;o;o;o;o;o
DC;;;o;331 / 325 / 325;o;o;50 / 50 / 12;26 / 25 / 24;o;o;30 / 29 / 30;24 / 22 / 23;o;o;o;o;37 / 37 / 30;18 / 17 / 18;o
DN;;;;o;o;o;o;0 / 5 / 4;o;o;25 / 26 / 25;33 / 35 / 33;o;o;o;o;o;o;o
PC1;;;;;o;0 / 0 / 45;27 / 25 / 21;108 / 78 / 98;o;21 / 0 / 18;24 / 5 / 18;61 / 58 / 52;44 / 28 / 44;o;128 / 124 / 122;17 / 15 / 47;21 / 35 / 60;14 / 9 / 14;o
PC2;;;;;;o;106 / 108 / 104;o;o;o;o;o;o;o;0 / 0 / 39;37 / 39 / 45;67 / 65 / 61;26 / 29 / 30;81 / 79 / 76
PN1;;;;;;;o;34 / 7 / 19;67 / 61 / 63;14 / 19 / 20;o;6 / 7 / 0;o;o;o;0 / 0 / 5;o;32 / 34 / 36;35 / 35 / 34
PN2;;;;;;;;o;45 / 35 / 36;90 / 74 / 82;92 / 88 / 92;29 / 33 / 34;66 / 67 / 63;o;o;0 / 0 / 5;o;o;o
Loop1;;;;;;;;;o;7 / 10 / 11;o;o;0 / 10 / 0;o;14 / 13 / 13;35 / 33 / 35;o;o;6 / 5 / 6
Loop2;;;;;;;;;;o;5 / 0 / 4;o;21 / 21 / 23;o;o;12 / 20 / 33;o;o;o
Loop3;;;;;;;;;;;o;48 / 31 / 45;o;o;o;o;o;o;o
Loop4;;;;;;;;;;;;o;o;o;o;o;o;o;o
Loop5;;;;;;;;;;;;;o;o;25 / 15 / 27;0 / 0 / 6;o;o;o
Loop6;;;;;;;;;;;;;;o;o;o;o;o;o
Loop7;;;;;;;;;;;;;;;o;42 / 58 / 35;o;o;0 / 4 / 33
Loop8;;;;;;;;;;;;;;;;o;o;o;37 / 23 / 42
Loop9;;;;;;;;;;;;;;;;;o;49 / 46 / 49;o
Loop10;;;;;;;;;;;;;;;;;;o;o
Loop11;;;;;;;;;;;;;;;;;;;o
--Done