
PDB_utilities
Authors: F. Cazals
Goals
This package provides two sets of scripts:
- A first set to perform simple operations on PDB files: fetching, analyzing chains, aligning sequences, etc.
- A second set implementing the analysis of AlphaFold predictions, as explained in [167] and [42].
Fetching PDB files
Script
sbl-pdb-fetch.py. A script that connects to the PDB and fetches one PDB file or a list of PDB files. Each entry is specified by its PDB id; the format, namely pdb or mmCif, is selected with the option --format. The option --chains lists the chains present in each file.
Implementation
A direct use of PDBList from Bio.PDB.
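As an illustration of what such a fetch involves, here is a minimal sketch built on PDBList.retrieve_pdb_file from Bio.PDB. The helper names parse_id_spec and fetch_structures, and the comma/space-separated id string, are assumptions made for this example; they are not taken from sbl-pdb-fetch.py.

```python
def parse_id_spec(spec):
    """Split a comma/space-separated string of PDB ids into a clean list.

    The input format is hypothetical; classic 4-character PDB ids are assumed.
    """
    ids = [tok.strip().lower() for tok in spec.replace(",", " ").split()]
    bad = [i for i in ids if len(i) != 4 or not i.isalnum()]
    if bad:
        raise ValueError(f"not 4-character PDB ids: {bad}")
    return ids


def fetch_structures(ids, file_format="pdb", outdir="."):
    """Download each entry with Bio.PDB.PDBList and return the local paths.

    file_format is passed through to Biopython ("pdb" or "mmCif").
    """
    # Imported lazily: this step requires Biopython and network access.
    from Bio.PDB import PDBList

    pdbl = PDBList()
    return [
        pdbl.retrieve_pdb_file(pdb_id, pdir=outdir, file_format=file_format)
        for pdb_id in ids
    ]
```

A call such as fetch_structures(parse_id_spec("1vfb"), file_format="mmCif") would then download the entry into the current directory.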
Parsing PDB files
Script
sbl-pdb-parse.py. A script to investigate the chains present in one or more PDB files. For each chain, it lists in particular the number of amino acids in the primary sequence (retrieved from the PDB header, or reconstructed from the amino acid sequence if there is no PDB header), and the number of amino acids in the structure (with the min and max resids).
Example: querying the three chains present in 1vfb.pdb:
sbl-pdb-parse.py -f 1vfb.pdb
PDB_utils processing file 1vfb.pdb with num chains 3
pdbid:1vfb.pdb model idx:0 chain_id:A aa_in_primary_seq:107 aa_in_struct:107 resid_min:1 resid_max:107 resid_span:107
107 ; DIVLTQSPASLSASVGETVTITCRASGNIHNYLAWYQQKQGKSPQLLVYYTTTLADGVPSRFSGSGSGTQYSLKINSLQPEDFGSYYCQHFWSTPRTFGGGTKLEIK
Implementation
A direct use of the modules and PDB from Bio, and MMCIFParser + PDBIO from Bio.PDB.
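To make the per-chain quantities reported above concrete (aa_in_struct, resid_min, resid_max, resid_span), the following standard-library sketch reads the fixed-column ATOM records of a PDB file directly. It is not the script's implementation, which relies on Bio.PDB; the function name chain_summary is hypothetical, and the definition resid_span = resid_max - resid_min + 1 is an assumption that happens to agree with the 1vfb example.

```python
def chain_summary(pdb_lines):
    """Per chain: number of residues with coordinates, and min/max residue ids.

    Only ATOM records of the first model are read, so residues stored as
    HETATM (e.g. modified amino acids) are ignored in this sketch.
    """
    residues = {}  # chain id -> set of (resSeq, insertion code)
    for line in pdb_lines:
        if line.startswith("ENDMDL"):
            break  # stop after the first model
        if not line.startswith("ATOM"):
            continue
        chain_id = line[21]                      # column 22
        res_seq = int(line[22:26])               # columns 23-26
        ins_code = line[26:27].strip()           # column 27
        residues.setdefault(chain_id, set()).add((res_seq, ins_code))

    summary = {}
    for chain_id, keys in residues.items():
        resids = [res_seq for res_seq, _ in keys]
        lo, hi = min(resids), max(resids)
        summary[chain_id] = {
            "aa_in_struct": len(keys),
            "resid_min": lo,
            "resid_max": hi,
            "resid_span": hi - lo + 1,  # assumed definition of the span
        }
    return summary
```

With a file on disk, the function can be called as chain_summary(open("1vfb.pdb")). A span larger than aa_in_struct indicates residues missing from the structure.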
Rendering PDB files
Script
sbl-pdb-render.py. A script calling pymol to render a set of structures. The main options are:
- --idir to render all files in a directory.
- -f/--fnames to pass a list of files.
- -F/--fname to pass a file containing a list of files.
- -A to pass a text file containing filenames of AlphaFold reconstructions, i.e. files prefixed with AF-*.
- --AFcol to exploit the temperature factors as pLDDT values and color code the residues accordingly.
- --odir to provide the directory containing the png files generated.
- -R/--range to pass a range of pLDDT values and plot only the amino acids in that range.
Example: the following dumps a picture for each structure in the input directory, with AlphaFold's pLDDT coloring scheme:
sbl-pdb-render.py --idir ~/render-structures -o ~/render-structures --AFcol
Figures automatically rendered, using the AlphaFold coloring scheme. AlphaFold2 models: Homo sapiens: AF-P15121-F1-model_v4; Drosophila melanogaster: AF-Q9VQS4-F1-model_v4; Homo sapiens: AF-Q02817-F15-model_v4
Implementation
A direct use of pymol via the python module pymol_cmd.
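The pLDDT-based coloring and range selection described in the options above can be sketched without pymol. AlphaFold models store the per-residue pLDDT in the temperature factor column of each ATOM record (columns 61-66). The bands and hex colors below are those of the AlphaFold Database legend; whether sbl-pdb-render.py uses exactly these colors is an assumption, and all function names are hypothetical.

```python
# (lower bound, label, hex color), from highest to lowest confidence.
PLDDT_BANDS = [
    (90.0, "very high", "#0053D6"),
    (70.0, "confident", "#65CBF3"),
    (50.0, "low", "#FFDB13"),
    (0.0, "very low", "#FF7D45"),
]


def plddt_band(plddt):
    """Return (label, hex color) of the confidence band containing plddt."""
    for lower_bound, label, color in PLDDT_BANDS:
        if plddt >= lower_bound:
            return label, color
    raise ValueError(f"pLDDT must be non-negative, got {plddt}")


def ca_plddt(pdb_lines):
    """Collect (chain id, residue id, pLDDT) from the CA atom of each residue."""
    records = []
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            records.append((line[21], int(line[22:26]), float(line[60:66])))
    return records


def select_range(records, lo, hi):
    """Keep the residues whose pLDDT lies in [lo, hi]."""
    return [rec for rec in records if lo <= rec[2] <= hi]
```

The selected residues and their colors could then be handed to a renderer, one color command per band rather than one per residue.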
Aligning models
Script
sbl-pdb-align.py. A simple script providing pairwise sequence and structure alignments:
- Two sequences: uses PairwiseAligner from Bio.Align to align two sequences.
- Two structures, that is, subsets of residues of two chains: rotates/translates one set of atoms on top of the other to minimize the RMSD. Also rewrites the second structure in a PDB file to ease visualization. NB: the atoms used are the backbone atoms N, CA, and C (no side-chain atoms).
See all options with --help.
Implementation
The sequence alignment relies on the PairwiseAligner module.
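For readers unfamiliar with what a pairwise aligner computes, here is a small global alignment (Needleman-Wunsch) written with the standard library only. It is a swapped-in illustration, not Biopython's PairwiseAligner, and the scoring values (match +1, mismatch -1, linear gap -2) are arbitrary assumptions chosen for the example.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Traceback from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For instance, aligning "DIVLTQ" with "DIVTQ" places a single gap opposite the L. In practice the Biopython aligner should be preferred, since it supports substitution matrices and affine gap penalties.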
The structural alignment relies on the Bio.PDB.Superimposer module.
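The rigid superposition behind such a structural alignment can be sketched with NumPy using the classical SVD-based (Kabsch) construction. This is an illustrative sketch under the assumption that the two coordinate sets are already paired atom by atom (for example the N, CA, and C atoms of matched residues); the function name superimpose is hypothetical and the code is not taken from sbl-pdb-align.py.

```python
import numpy as np


def superimpose(reference, mobile):
    """Least-squares rigid fit of mobile onto reference.

    Returns (rot, tran, rmsd) such that mobile @ rot + tran best matches
    reference. Both inputs are (n, 3) arrays of paired atom coordinates.
    """
    ref = np.asarray(reference, dtype=float)
    mob = np.asarray(mobile, dtype=float)
    if ref.shape != mob.shape or ref.ndim != 2 or ref.shape[1] != 3:
        raise ValueError("need two (n, 3) arrays with paired atoms")

    # Remove the translation by centering both sets on their centroids.
    ref_center = ref.mean(axis=0)
    mob_center = mob.mean(axis=0)

    # 3x3 covariance matrix between the centered coordinate sets.
    cov = (mob - mob_center).T @ (ref - ref_center)
    u, _, vt = np.linalg.svd(cov)

    # Flip the last axis if needed so that rot is a proper rotation
    # (determinant +1) rather than a reflection.
    d = np.sign(np.linalg.det(u @ vt))
    rot = u @ np.diag([1.0, 1.0, d]) @ vt
    tran = ref_center - mob_center @ rot

    fitted = mob @ rot + tran
    rmsd = float(np.sqrt(((fitted - ref) ** 2).sum() / len(ref)))
    return rot, tran, rmsd
```

Once rot and tran are known, applying them to every atom of the second structure (not only the backbone atoms used for the fit) gives the superimposed coordinates that can be written out for visualization.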