Template C++ / Python API for developing structural bioinformatics applications.
User Manual
Correlated_motions_by_domain
Authors: F. Cazals and T. Dreyfus
Studying correlated motions between protein domains
Consider two conformations of a given molecule, denoted 1 and 2. Also assume that this molecule features two domains A and B. Further denote:
A1: domain A in conformation 1; A2: domain A in conformation 2;
B1: domain B in conformation 1; B2: domain B in conformation 2;
When switching from conformation 1 to conformation 2, we wish to assess the coherence of the conformational changes undergone by domains A and B.
Assume that the conformational change of domain A (and likewise for B) is modeled by a rigid motion, that is, the composition of a translation and of a rotation. Denote:
R_A: the rotation associated with the best rigid motion superimposing A1 onto A2.
R_B: the rotation associated with the best rigid motion superimposing B1 onto B2.
t_A: the translation associated with the best rigid motion superimposing A1 onto A2.
t_B: the translation associated with the best rigid motion superimposing B1 onto B2.
In this application, we study the distances between these rotations and translations.
In the general setting, we consider a molecule with n domains, in which case we perform n*(n-1)/2 pairwise comparisons – one comparison for each pair of domains.
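The count of pairwise comparisons can be checked with a minimal Python sketch. The domain labels and the helper name domain_pairs are hypothetical and are not part of the package.

```python
from itertools import combinations


def domain_pairs(domain_names):
    """Return every unordered pair of distinct domains."""
    return list(combinations(domain_names, 2))


# Hypothetical domain labels, for illustration only.
domains = ["A", "B", "C", "D"]
pairs = domain_pairs(domains)

# One comparison per pair of domains: n*(n-1)/2 in total.
n = len(domains)
assert len(pairs) == n * (n - 1) // 2
print(pairs)
```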
Using Correlated_motions_by_domain
In the following, we specify the input/output of the program sbl-comodo-domainsW-atomic.exe, as well as of the accompanying Python script.
Pre-requisites
Domains under the identity alignment.
The default specification of domains is as follows:
A domain is specified using a list of ranges of a.a. – as a domain is not necessarily connected along the sequence.
It is assumed that the i-th a.a. of the first structure corresponds to the i-th a.a. of the second structure, and this a.a. is considered if and only if it is present in both structures. In other words, we consider the two structures under the identity alignment, and consider the common a.a. as specified by the domains.
Because of the previous design choice, a single domain specification file is used for both structures.
We note that this design choice aims at carefully controlling the alignments used before comparing rigid motions. For cases where this constraint can be relaxed, an alignment can be computed instead of using the identity alignment.
As an illustration of the previous design choice, consider the following case: one wishes to compare chain A from pdb1 against chain D from pdb2. We assume that the domain specification file defines domains for chain A from pdb1. In fact, there is no need to specify anything for chain D. Indeed, as noticed above, the identity mapping between chains A and D allows transferring the definition of domains from chain A to chain D.
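The selection of a.a. under the identity alignment can be sketched as follows. This is an illustration only: the function names, the representation of a domain as a list of (first, last) pairs, and the assumption that ranges are inclusive are ours, not the format of the domain specification file.

```python
def expand_ranges(ranges):
    """Expand inclusive (first, last) residue-number ranges into a set."""
    residues = set()
    for first, last in ranges:
        residues.update(range(first, last + 1))
    return residues


def common_domain_residues(domain_ranges, residues_1, residues_2):
    """Residues of the domain that are present in both structures.

    Under the identity alignment, residue i of the first structure is
    matched with residue i of the second one.
    """
    return sorted(expand_ranges(domain_ranges) & set(residues_1) & set(residues_2))


# Hypothetical domain made of two stretches along the sequence.
domain = [(10, 14), (30, 32)]
structure_1 = range(1, 41)  # residues 1..40 all present
structure_2 = [r for r in range(1, 41) if r not in (12, 31)]  # 12 and 31 missing

common = common_domain_residues(domain, structure_1, structure_2)
print(common)
```

Only the residues found in both structures are kept, so a single domain specification serves both conformations.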
Domains under a computed alignment.
An alignment can naturally be computed on the fly, in which case this alignment replaces the identity alignment. If so:
We align the two structures. The following alignment methods are provided: a sequence based alignment using seqan (package Alignment_engines), or a structure based alignment using Apurva (package Apurva).
A single domain specification file is used: it directly defines the a.a. of the domains on the first structure, and also on the second structure via the computed alignment.
As noticed above, aligning domains on the fly should be done with care. As an example, consider the heavy chain of an antibody (IgG) for which 3 domains have been specified, namely the 3 complementarity-determining regions (CDRs). Upon aligning two IgG heavy chains, the alignment (sequence or structure) typically shifts the sequences, so that the CDRs as specified on the first chain do not correspond to the a.a. matched on the second structure. Therefore, a better practice consists of consistently numbering the a.a. of the two chains in the first place, so as to use a common specification of domains.
Metrics.
Several metrics to compare two rotations exist, see [108]. In the sequel, we use a single distance between rotations, defined from the matrix Frobenius norm.
For a molecule with n domains, one expects n*(n-1)/2 pairwise comparisons between domains. However, we omit comparisons involving a domain with fewer than four a.a., as well as those for which the calculation of the optimal rigid motion faces a degeneracy.
As shown in [108], various other metrics are functions of this distance. Thus, we only report this distance.
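As an illustration, a standard Frobenius-norm distance between two rotation matrices r1 and r2 is the norm of I - r1 * r2^T, which takes values in [0, 2*sqrt(2)]. The sketch below assumes this is the distance meant here; the function names are hypothetical and the code is not taken from the package.

```python
import math


def matmul_transpose(r1, r2):
    """Return r1 * r2^T for 3x3 matrices given as nested lists."""
    return [[sum(r1[i][k] * r2[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]


def frobenius_rotation_distance(r1, r2):
    """Frobenius norm of (I - r1 * r2^T); assumed choice of metric."""
    m = matmul_transpose(r1, r2)
    return math.sqrt(sum(((1.0 if i == j else 0.0) - m[i][j]) ** 2
                         for i in range(3) for j in range(3)))


def rotation_z(theta):
    """Rotation of angle theta (radians) about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


identity = rotation_z(0.0)
quarter_turn = rotation_z(math.pi / 2)
half_turn = rotation_z(math.pi)

d_same = frobenius_rotation_distance(identity, identity)
d_quarter = frobenius_rotation_distance(identity, quarter_turn)
d_half = frobenius_rotation_distance(identity, half_turn)
print(d_same, d_quarter, d_half)
```

Two identical rotations are at distance 0, and two rotations differing by a half turn reach the maximum 2*sqrt(2). More generally, if theta is the angle of the relative rotation, this distance equals 2*sqrt(2)*|sin(theta/2)|.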
Input and output: specifications
Input. An individual calculation requires 5 pieces of information, namely:
pieces 1 and 2: PDB file and associated chain for the first conformation,
pieces 3 and 4: PDB file and associated chain for the second conformation,
piece 5: file providing the specification of domains for chains of the first PDB file – see remark above.
Main options. The main options are:
--pdb-file string: PDB file, passed twice, i.e. once for each structure
--load-chains string: Chain(s) to load, passed twice, i.e. once for each structure
--domain-labels: The domains spec file. Domains are defined as residue sequence number ranges.
As an illustration, the following example compares domains of the AcrB protein:
sbl-comodo-domainsW-atomic.exe -f data/4dx5-AcrB.pdb --load-chains A -f data/4dx7-AcrB.pdb --load-chains B --domain data/AcrB_rmsdw_monomer_subdomain_withCoils_ABCchains.spect --alignment seqan -v --output-prefix --log -p 4
Similarly, the following example compares regions of an antibody (CDR, coil regions):
sbl-comodo-domainsW-atomic.exe -f data/IMGT-1A2Y.pdb --load-chains A -f data/IMGT-1A2Y.pdb --load-chains B --domain data/1a2y-imgt-cdrs.txt --alignment seqan -v --output-prefix --log -p 4
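When many pairs of conformations are processed, the command line can be assembled from a script. The following Python sketch rebuilds the AcrB command shown above; the helper build_command is hypothetical, and the trailing options are copied verbatim from the example rather than interpreted.

```python
import shlex


def build_command(pdb_1, chains_1, pdb_2, chains_2, domain_spec,
                  alignment="seqan",
                  extra=("-v", "--output-prefix", "--log", "-p", "4")):
    """Assemble the argument list; each structure gets its own -f / --load-chains."""
    return ["sbl-comodo-domainsW-atomic.exe",
            "-f", pdb_1, "--load-chains", chains_1,
            "-f", pdb_2, "--load-chains", chains_2,
            "--domain", domain_spec,
            "--alignment", alignment,
            *extra]


command = build_command(
    "data/4dx5-AcrB.pdb", "A",
    "data/4dx7-AcrB.pdb", "B",
    "data/AcrB_rmsdw_monomer_subdomain_withCoils_ABCchains.spect",
)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command, check=True), provided the executable is on the PATH and the data files exist.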
Main output. The main output file is an XML file containing the information for all comparisons performed. This file is best parsed with PALSE.
In this XML file, an individual comparison for two domains provides the following pieces of information: