Correlated_motions_by_domain
Authors: F. Cazals and T. Dreyfus
Studying correlated motions between protein domains
Consider two conformations of a given molecule, denoted 1 and 2. Also assume that this molecule features two domains A and B. Further denote:
- A1: domain A in conformation 1; A2: domain A in conformation 2;
- B1: domain B in conformation 1; B2: domain B in conformation 2;
When switching from conformation 1 to conformation 2, we wish to assess the coherence of the conformational changes undergone by domains A and B.
Assume that the conformational change of domain A (and likewise for B) is modeled by a rigid motion, that is, the composition of a translation and of a rotation. Denote:
- $R_A$: the rotation associated with the best rigid motion superimposing A1 onto A2.
- $R_B$: the rotation associated with the best rigid motion superimposing B1 onto B2.
- $t_A$: the translation associated with the best rigid motion superimposing A1 onto A2.
- $t_B$: the translation associated with the best rigid motion superimposing B1 onto B2.
In this application, we study the distances between these rotations and translations.
In the general setting, we consider a molecule with n domains, in which case we perform n*(n-1)/2 pairwise comparisons – one comparison for each pair of domains.
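The enumeration of domain pairs can be sketched as follows. The domain labels are hypothetical placeholders, and the snippet only illustrates the n*(n-1)/2 count; it is not the program's internal loop.

```python
from itertools import combinations


def domain_pairs(domains):
    """Return all unordered pairs of distinct domains."""
    return list(combinations(domains, 2))


# Hypothetical domain labels, for illustration only.
domains = ["D1", "D2", "D3", "D4"]
pairs = domain_pairs(domains)

n = len(domains)
assert len(pairs) == n * (n - 1) // 2  # one comparison per pair of domains

for first, second in pairs:
    print(first, second)
```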
Using Correlated_motions_by_domain
In the following, we specify the input/output of the program sbl-comodo-domainsW-atomic.exe, as well as the Python script sbl-comodo.py.
Pre-requisites
Domains under the identity alignment.
The default specification of domains is as follows:
- A domain is specified using a list of ranges of a.a. – as a domain is not necessarily connected along the sequence.
- It is assumed that the i-th a.a. of the first structure corresponds to the i-th a.a. of the second structure, and this a.a. is considered if and only if it is present in both structures. In other words, we consider the two structures under the identity alignment, and consider the common a.a. as specified by the domains.
- Because of the previous design choice, a single domain specification file is used for both structures.
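As an illustration of the identity alignment, the sketch below selects the a.a. of a domain that are present in both structures. It assumes that ranges are inclusive and that residues are identified by their sequence numbers; the data and function names are hypothetical and do not reflect the format of the domain specification file.

```python
def expand_ranges(ranges):
    """Expand a list of inclusive (first, last) residue ranges into a set."""
    residues = set()
    for first, last in ranges:
        residues.update(range(first, last + 1))
    return residues


def common_domain_residues(domain_ranges, residues_1, residues_2):
    """Residues of the domain present in both structures (identity alignment)."""
    return sorted(expand_ranges(domain_ranges) & set(residues_1) & set(residues_2))


# Hypothetical data: a domain made of two ranges, and the residue
# sequence numbers resolved in each of the two structures.
domain = [(10, 14), (40, 42)]
structure_1 = range(8, 45)                          # residues 8..44
structure_2 = [r for r in range(8, 45) if r != 12]  # residue 12 is missing

print(common_domain_residues(domain, structure_1, structure_2))
```

Residue 12 belongs to the domain but is absent from the second structure, so it is discarded from the comparison.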
We note that this design choice aims to carefully control the alignments used before comparing rigid motions. For cases where this constraint can be relaxed, one can proceed as follows.
Domains under a computed alignment.
An alignment can naturally be computed on the fly, in which case this alignment replaces the identity alignment. If so:
- We align the two structures. The following alignment methods are provided: a sequence-based alignment using seqan (package Alignment_engines), or a structure-based alignment using Apurva (package Apurva).
- A single domain specification file is used: it directly defines the a.a. of the domains on the first structure, and also on the second structure via the computed alignment.
Metrics.
- Several metrics to compare two rotations exist, see [98]. In the sequel, we use $\Phi_5$. This distance is a number in $[0, 2\sqrt{2}]$. For two rotation matrices $R_1$ and $R_2$, it is defined from the matrix Frobenius norm as follows: $\Phi_5(R_1, R_2) = \| I - R_1 R_2^{T} \|_F$
- For a molecule with n domains, one expects n*(n-1)/2 pairwise comparisons between domains. However, we omit comparisons involving a domain with fewer than four a.a., as well as those for which the calculation of the optimal rigid motion faces a degeneracy.
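The distance between two rotations can be sketched in plain Python. The sketch assumes that the distance is the Frobenius norm of I - R1 R2^T, which is the standard definition of the metric usually denoted Phi_5 in the rotation-metrics literature. The helper names are hypothetical, and the program itself relies on Eigen rather than on this code.

```python
import math


def mat_mul_transpose(r1, r2):
    """Return r1 * r2^T for two 3x3 matrices given as nested lists."""
    return [[sum(r1[i][k] * r2[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_distance(r1, r2):
    """Frobenius norm of I - r1 * r2^T (assumed definition of the distance)."""
    m = mat_mul_transpose(r1, r2)
    return math.sqrt(sum(((1.0 if i == j else 0.0) - m[i][j]) ** 2
                         for i in range(3) for j in range(3)))


def rotation_z(theta):
    """Rotation matrix of angle theta (radians) about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


identity = rotation_z(0.0)
# Identical rotations are at distance 0.
print(rotation_distance(identity, identity))
# A half turn reaches the maximum value, 2*sqrt(2), up to rounding.
print(rotation_distance(identity, rotation_z(math.pi)))
```

For two rotations differing by an angle theta about a common axis, this quantity equals 2*sqrt(2)*|sin(theta/2)|, which explains the upper bound 2*sqrt(2).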
Input and output: specifications
Input. An individual calculation requires 5 pieces of information, namely:
- pieces 1 and 2: PDB file and associated chain for the first conformation,
- pieces 3 and 4: PDB file and associated chain for the second conformation,
- piece 5: file providing the specification of domains for chains of the first PDB file – see remark above.
Main options. The main options are:
--pdb-file
string: PDB file, passed twice, i.e. once for each structure
--load-chains
string: Chain(s) to load, passed twice, i.e. once for each structure
--domain-labels
string: The domain specification file. Domains are defined as ranges of residue sequence numbers.
As an illustration, the following example compares domains of the AcrB protein:
sbl-comodo-domainsW-atomic.exe -f data/4dx5-AcrB.pdb --load-chains A -f data/4dx7-AcrB.pdb --load-chains B --domain data/AcrB_rmsdw_monomer_subdomain_withCoils_ABCchains.spect --alignment seqan -v --output-prefix --log -p 4
Similarly, the following example compares regions of an antibody (CDR, coil regions):
sbl-comodo-domainsW-atomic.exe -f data/IMGT-1A2Y.pdb --load-chains A -f data/IMGT-1A2Y.pdb --load-chains B --domain data/1a2y-imgt-cdrs.txt --alignment seqan -v --output-prefix --log -p 4
Main output. The main output file is an XML file containing the information for all comparisons performed. This file is best parsed with PALSE.
In this XML file, an individual comparison for two domains provides the following pieces of information:
- Domain1: the first domain
- Domain2: the second domain
- lRMSD1: the lRMSD for the pair (A1, A2)
- lRMSD2: the lRMSD for the pair (B1, B2)
- Distance-rotation: the distance between the rotations of the two domains, defined from the Frobenius norm (see Metrics)
- Distance-translation: the norm of the difference between the translation vectors of the two domains
An example tuple generated is the following one:
<item class_id="3" tracking_level="0" version="0">
<Domain1>A_CDR3</Domain1>
<Domain2>A_CDR1</Domain2>
<lRMSD1>3.81154538800104392e+00</lRMSD1>
<lRMSD2>2.26966325341404618e+00</lRMSD2>
<Distance-rotation>1.83275250807724421e+00</Distance-rotation>
<Distance-translation>1.89912193084327114e+01</Distance-translation>
</item>
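While PALSE is the recommended parser, an individual item can also be read with the Python standard library. The sketch below embeds the item shown above as a string and converts it into a dictionary; the function name is hypothetical, and the layout of the enclosing XML file is not assumed.

```python
import xml.etree.ElementTree as ET

# The item shown above, embedded as a string for the sake of the example.
ITEM = """<item class_id="3" tracking_level="0" version="0">
<Domain1>A_CDR3</Domain1>
<Domain2>A_CDR1</Domain2>
<lRMSD1>3.81154538800104392e+00</lRMSD1>
<lRMSD2>2.26966325341404618e+00</lRMSD2>
<Distance-rotation>1.83275250807724421e+00</Distance-rotation>
<Distance-translation>1.89912193084327114e+01</Distance-translation>
</item>"""


def parse_item(item):
    """Convert one <item> element into a dictionary.

    Domain names are kept as strings; all other fields are parsed as floats.
    """
    record = {}
    for child in item:
        text = child.text.strip()
        record[child.tag] = text if child.tag.startswith("Domain") else float(text)
    return record


record = parse_item(ET.fromstring(ITEM))
print(record["Domain1"], record["Domain2"], record["Distance-rotation"])
```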
Python script
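We further provide the script sbl-comodo.py, which performs a set of individual calculations. The input file is a text file such that each line specifies the five pieces of information of an individual calculation: the PDB file and chain of the first conformation, the PDB file and chain of the second conformation, and the domain specification file.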
As an example:
data/IMGT-1A2Y.pdb A data/IMGT-1A2Y.pdb B data/1a2y-imgt-cdrs.txt
data/4dx5-AcrB.pdb A data/4dx7-AcrB.pdb B data/AcrB_rmsdw_monomer_subdomain_withCoils_ABCchains.spect
Assuming such a file is named tuples.txt, a calculation is launched as follows:
sbl-comodo.py -f tuples.txt
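To make the format of the input file concrete, the following sketch parses such lines and prints, for each of them, a command line mirroring the examples given earlier. It is not the implementation of sbl-comodo.py: the function names are hypothetical, only the flags appearing in the examples above are used, and nothing is executed.

```python
import shlex


def parse_tuples(text):
    """Parse lines 'pdb1 chain1 pdb2 chain2 domain_spec', skipping blank lines."""
    jobs = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 5:
            raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
        jobs.append(tuple(fields))
    return jobs


def build_command(job, executable="sbl-comodo-domainsW-atomic.exe"):
    """Build the argument list of one individual calculation."""
    pdb1, chain1, pdb2, chain2, spec = job
    return [executable, "-f", pdb1, "--load-chains", chain1,
            "-f", pdb2, "--load-chains", chain2, "--domain", spec]


TUPLES = """data/IMGT-1A2Y.pdb A data/IMGT-1A2Y.pdb B data/1a2y-imgt-cdrs.txt
data/4dx5-AcrB.pdb A data/4dx7-AcrB.pdb B data/AcrB_rmsdw_monomer_subdomain_withCoils_ABCchains.spect
"""

for job in parse_tuples(TUPLES):
    print(shlex.join(build_command(job)))
```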
Algorithms and Methods
The analysis carried out involves the following elementary steps:
- Computing distances between the rotations: implementation of the formulae from [98].
- Retrieving the statistics for all pairwise comparisons: using PALSE.
Programmer's Workflow
Correlated motions workflow:
External dependencies
Linear algebra is carried out using Eigen.