Correlated_motions_by_domain

Authors: F. Cazals and T. Dreyfus

Studying correlated motions between protein domains

Consider two conformations of a given molecule, denoted 1 and 2. Also assume that this molecule features two domains A and B. Further denote:

A1: domain A in conformation 1; A2: domain A in conformation 2;
B1: domain B in conformation 1; B2: domain B in conformation 2;

When switching from conformation 1 to conformation 2, we wish to assess the coherence of the conformational changes undergone by domains A and B.

Assume that the conformational change of domain A (and likewise for B) is modeled by a rigid motion, that is, the composition of a translation and of a rotation. More precisely, denote:

$R_{A1-A2}$ : the rotation associated with the best rigid motion superimposing A1 onto A2.

$R_{B1-B2}$ : the rotation associated with the best rigid motion superimposing B1 onto B2.

$T_{A1-A2}$ : the translation associated with the best rigid motion superimposing A1 onto A2.

$T_{B1-B2}$ : the translation associated with the best rigid motion superimposing B1 onto B2.

In this application, we study the distances between these rotations and translations.

In the general setting, we consider a molecule with n domains, in which case we perform n*(n-1)/2 pairwise comparisons – one comparison for each pair of domains.
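The enumeration of domain pairs can be sketched as follows. This is a minimal illustration of the count n*(n-1)/2, not the program's implementation; the function name and the domain labels are hypothetical.

```python
from itertools import combinations


def domain_pairs(domains):
    """Return all unordered pairs of distinct domains."""
    return list(combinations(domains, 2))


# Hypothetical domain labels, for illustration only.
domains = ["CDR1", "CDR2", "CDR3", "coil"]
pairs = domain_pairs(domains)

# One comparison per pair of domains: n*(n-1)/2 in total.
n = len(domains)
assert len(pairs) == n * (n - 1) // 2
print(pairs)
```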

Using Correlated_motions_by_domain

In the following, we specify the input/output of the program $\text{sbl-comodo-domainsW-atomic.exe}$ , as well as the python script $\text{sbl-comodo.py}$ .

Pre-requisites

Domains under the identity alignment.

The default specification of domains is as follows:

A domain is specified using a list of ranges of a.a. – as a domain is not necessarily connected along the sequence.

It is assumed that the i-th a.a. of the first structure corresponds to the i-th a.a. of the second structure, and this a.a. is considered if and only if it is present in both structures. In other words, we consider the two structures under the identity alignment, and consider the common a.a. as specified by the domains.

Because of the previous design choice, a single domain specification file is used for both structures.
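The selection of amino acids under the identity alignment can be sketched as follows. This is a hedged illustration only: the function names and the in-memory representation (inclusive ranges of residue numbers, plus the residue numbers present in each structure) are assumptions, and do not reflect the format of the domain specification file or the code of the program.

```python
def expand_ranges(ranges):
    """Expand inclusive (first, last) residue ranges into a set of residue numbers."""
    residues = set()
    for first, last in ranges:
        residues.update(range(first, last + 1))
    return residues


def common_domain_residues(ranges, present_1, present_2):
    """Residues of a domain kept under the identity alignment:
    those listed in the domain ranges and present in both structures."""
    return sorted(expand_ranges(ranges) & set(present_1) & set(present_2))


# Hypothetical domain made of two ranges that are not connected along the sequence.
domain_ranges = [(10, 14), (30, 32)]
# Hypothetical structures: residues 1..40, with residues 12 and 31 missing in the second one.
present_1 = range(1, 41)
present_2 = [r for r in range(1, 41) if r not in (12, 31)]

print(common_domain_residues(domain_ranges, present_1, present_2))
# prints [10, 11, 13, 14, 30, 32]
```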

We note that this design choice aims at carefully controlling the alignments used before comparing rigid motions. For cases where this constraint can be relaxed, an alignment can be computed instead, as described in the paragraph on domains under a computed alignment.

As an illustration of the previous design choice, consider the following case: one wishes to compare chain A from pdb1 against chain D from pdb2. We assume that the domain specification file defines domains for chain A from pdb1. In fact, there is no need to specify anything for chain D. Indeed, as noticed above, the identity mapping between chains A and D allows transferring the definition of domains from chain A to chain D.

Domains under a computed alignment.

An alignment can naturally be computed on the fly, in which case this alignment replaces the identity alignment. If so:

We align the two structures. The following alignment methods are provided: a sequence-based alignment using seqan (package Alignment_engines), or a structure-based alignment using Apurva (package Apurva) or Kpax (package Iterative_alignment).

A single domain specification file is used: it directly defines the a.a. of the domains on the first structure, and also on the second structure via the computed alignment.

As noticed above, aligning domains on the fly should be done with care. As an example, consider the heavy chain of an antibody (IgG) for which 3 domains have been specified, namely the 3 complementarity-determining regions (CDR). Upon aligning two IgG heavy chains, the alignment (sequence or structure) typically shifts the sequences, so that the CDR as specified on the first chain do not correspond to the a.a. matched on the second structure. Therefore, a better practice consists of consistently numbering the a.a. of the two chains in the first place, so as to use a common specification of domains.

Metrics.

Several metrics to compare two rotations exist, see [98]. In the sequel, we use $\Phi_5$ . This distance, which is a number in $[0, 2\sqrt{2}]$ , is defined from the matrix Frobenius norm as follows:

$\Phi_5(R_{A1-A2}, R_{B1-B2}) = \mid\mid I_3 - R_{A1-A2}*R_{B1-B2}^t\mid\mid_F.$

For a molecule with n domains, one expects n*(n-1)/2 pairwise comparisons between domains. However, we omit comparisons involving a domain with fewer than four a.a., as well as those for which the calculation of the optimal rigid motion faces a degeneracy.

As shown in [98], various other metrics $\Phi_2,\dots,\Phi_4$ are functions of $\Phi_5$ . Thus, we only report $\Phi_5$ .
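As an illustration, the distance $\Phi_5$ between two rotation matrices R and S, namely the Frobenius norm of $I_3 - R*S^t$ , can be sketched in plain Python as follows. This is a minimal sketch and not the implementation of the package (which relies on Eigen); the function names are hypothetical, and matrices are assumed to be given as 3x3 nested lists. For two rotations about the same axis whose angles differ by $\theta$ , the value is $2\sqrt{2}\mid\sin(\theta/2)\mid$ , so the maximum $2\sqrt{2}$ is reached for a half turn.

```python
import math


def matmul_transpose(r, s):
    """Return the 3x3 product r * s^t, for matrices given as nested lists."""
    return [[sum(r[i][k] * s[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]


def phi5(r, s):
    """Frobenius norm of I_3 - r * s^t."""
    m = matmul_transpose(r, s)
    return math.sqrt(sum(((1.0 if i == j else 0.0) - m[i][j]) ** 2
                         for i in range(3) for j in range(3)))


def rotation_z(theta):
    """Rotation matrix of angle theta (radians) about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


identity = rotation_z(0.0)
print(phi5(identity, identity))             # identical rotations: distance 0
print(phi5(rotation_z(math.pi), identity))  # half turn: close to 2*sqrt(2)
```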

Input and output: specifications

Input. An individual calculation requires 5 pieces of information, namely:

pieces 1 and 2: PDB file and associated chain for the first conformation,
pieces 3 and 4: PDB file and associated chain for the second conformation,
piece 5: file providing the specification of domains for chains of the first PDB file – see remark above.

Main options. The main options are:

--pdb-file string: PDB file, passed twice, i.e. once for each structure
--load-chains string: Chain(s) to load, passed twice, i.e. once for each structure
--domain-labels: The domain specification file. Domains are defined as ranges of residue sequence numbers.

As an illustration, the following example compares domains of the AcrB protein:

sbl-comodo-domainsW-atomic.exe -f data/4dx5-AcrB.pdb --load-chains A -f data/4dx7-AcrB.pdb --load-chains B --domain data/AcrB_rmsdw_monomer_subdomain_withCoils_ABCchains.spect --alignment seqan -v --output-prefix --log -p 4

Similarly, the following example compares regions of an antibody (CDR, coil regions):

sbl-comodo-domainsW-atomic.exe -f data/IMGT-1A2Y.pdb --load-chains A -f data/IMGT-1A2Y.pdb --load-chains B --domain data/1a2y-imgt-cdrs.txt --alignment seqan -v --output-prefix --log -p 4

Main output. The main output file is an XML file containing the information for all comparisons performed. This file is best parsed with PALSE.

In this XML file, an individual comparison for two domains provides the following pieces of information:

Domain1: the first domain
Domain2: the second domain
lRMSD1: the lRMSD for the pair (A1, A2)
lRMSD2: the lRMSD for the pair (B1, B2)
Distance-rotation: the distance $\Phi_5$ between the rotations $R_{A1-A2}$ and $R_{B1-B2}$, as defined in the Metrics section
Distance-translation: the norm of the vector $T_{A1-A2} - T_{B1-B2}$

An example tuple generated is the following one:

                        <item class_id="3" tracking_level="0" version="0">
                                <Domain1>A_CDR3</Domain1>
                                <Domain2>A_CDR1</Domain2>
                                <lRMSD1>3.81154538800104392e+00</lRMSD1>
                                <lRMSD2>2.26966325341404618e+00</lRMSD2>
                                <Distance-rotation>1.83275250807724421e+00</Distance-rotation>
                                <Distance-translation>1.89912193084327114e+01</Distance-translation>
                        </item>
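For a quick inspection without PALSE, such items can also be read with the Python standard library. The following is a minimal sketch under stated assumptions: the enclosing `<comparisons>` element is hypothetical (the actual layout of the file around the items is not described here), and the function name is an illustration.

```python
import xml.etree.ElementTree as ET

# The <item> is the example tuple shown above; the <comparisons> wrapper is hypothetical.
SAMPLE = """<comparisons>
  <item class_id="3" tracking_level="0" version="0">
    <Domain1>A_CDR3</Domain1>
    <Domain2>A_CDR1</Domain2>
    <lRMSD1>3.81154538800104392e+00</lRMSD1>
    <lRMSD2>2.26966325341404618e+00</lRMSD2>
    <Distance-rotation>1.83275250807724421e+00</Distance-rotation>
    <Distance-translation>1.89912193084327114e+01</Distance-translation>
  </item>
</comparisons>"""

NUMERIC_TAGS = ("lRMSD1", "lRMSD2", "Distance-rotation", "Distance-translation")


def read_comparisons(xml_text):
    """Collect one dictionary per <item> element carrying a rotation distance."""
    records = []
    for item in ET.fromstring(xml_text).iter("item"):
        if item.find("Distance-rotation") is None:
            continue  # skip items that are not domain comparisons
        record = {"Domain1": item.findtext("Domain1"),
                  "Domain2": item.findtext("Domain2")}
        for tag in NUMERIC_TAGS:
            record[tag] = float(item.findtext(tag))
        records.append(record)
    return records


for rec in read_comparisons(SAMPLE):
    print(rec["Domain1"], rec["Domain2"], rec["Distance-rotation"])
```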

Python script

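We further provide the script $\text{sbl-comodo.py}$ , which performs a set of individual calculations. The input file is a text file such that each line specifies the five pieces of information of an individual calculation: the PDB file and chain of the first conformation, the PDB file and chain of the second conformation, and the domain specification file.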

As an example:

data/IMGT-1A2Y.pdb A data/IMGT-1A2Y.pdb B data/1a2y-imgt-cdrs.txt

data/4dx5-AcrB.pdb A data/4dx7-AcrB.pdb B data/AcrB_rmsdw_monomer_subdomain_withCoils_ABCchains.spect

Assuming such a file is named tuples.txt, a calculation is launched as follows:

sbl-comodo.py -f tuples.txt
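The processing of such a file can be sketched as follows. This is not the code of $\text{sbl-comodo.py}$ : the function names are hypothetical, and the command line built for each tuple only reuses the options appearing in the examples above (the actual script may pass further options, e.g. for the alignment or the output).

```python
def parse_tuples(lines):
    """Parse lines of the form: pdb1 chain1 pdb2 chain2 domain_spec."""
    jobs = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 5:
            raise ValueError("expected 5 fields, got %d: %r" % (len(fields), line))
        jobs.append(tuple(fields))
    return jobs


def build_command(job, exe="sbl-comodo-domainsW-atomic.exe"):
    """Build the argument list of one individual calculation."""
    pdb1, chain1, pdb2, chain2, spec = job
    return [exe,
            "-f", pdb1, "--load-chains", chain1,
            "-f", pdb2, "--load-chains", chain2,
            "--domain", spec]


# The lines of the example file given above.
lines = [
    "data/IMGT-1A2Y.pdb A data/IMGT-1A2Y.pdb B data/1a2y-imgt-cdrs.txt",
    "",
    "data/4dx5-AcrB.pdb A data/4dx7-AcrB.pdb B "
    "data/AcrB_rmsdw_monomer_subdomain_withCoils_ABCchains.spect",
]
for job in parse_tuples(lines):
    print(" ".join(build_command(job)))
```

Each argument list could then be handed to `subprocess.run` to launch the corresponding calculation.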

Algorithms and Methods

The analysis carried out involves the following elementary steps:

Loading PDB files: using the package MolecularGeometryLoader.

Alignments: performed with the package Alignment_engines.

Specifying the domains: using the package MolecularSystemLabelsTraits.

Computing optimal rigid motions to superimpose two sets: using the package Point_cloud_rigid_registration_3.

Computing distances between the rotations: implementation of the formulae from [98].

Retrieving the statistics for all pairwise comparisons: using PALSE.

Programmer's Workflow

Correlated motions workflow:

External dependencies

Linear algebra is carried out using Eigen.
