For a predicted structure, i.e. in the AlphaFold setting, pLDDT values are stored in the B-factor column.
Overview
The script sbl-alphafold-dbrun.py provides the analyses described in the papers [42] and [167]; see also [41].
The main analysis methods offered are:
mode pLDDT: elementary statistics on the pLDDT values of all amino acids of all predicted structures in a directory. (NB: pLDDT values are stored in the B-factor column.)
mode filtrations: analysis based on the pLDDT and arity filtrations, for one structure or all structures in a directory.
mode null: analysis based on the pLDDT filtration for a random/null model as specified in the paper.
mode null-pvalue: permutation p-value calculation as specified in the paper, using the null model to specify H0.
Filtrations using the arity and the pLDDT: example on ordered, disordered, and mixed proteins. For a prediction (i.e. for every column), the calculation provides:
(i) Arity based filtration: evolution of the number of connected components in the filtration
(ii) Arity based filtration: persistence diagram for connected components
(iii) pLDDT based filtration: evolution of the number of connected components in the filtration
(iv) pLDDT based filtration: persistence diagram for connected components
See [42] and [167] for details.
Analysis: filtrations
Calculation on a single file. Invoke the script sbl-alphafold-dbrun.py on the file to be processed. The call generates a variety of analyses, dumped into the subdirectory perfile-gen-arity of the directory containing the model processed.
Filtration using the arity as parameter: prefixed as –arity,
Filtration using the pLDDT as parameter: prefixed as –pLDDT,
Finally, the csv file containing all statistics for the pLDDT and arity based filtrations (format detailed below).
Calculation for the null model. The null model associated with Union-Find can be invoked as follows:
sbl-alphafold-dbrun.py -a null --nm-N 1000
Calculation on a directory. Invoke the script sbl-alphafold-dbrun.py on the directory to be processed. The call generates a summary and scatter plots for the main statistics, and stores them into the subdirectory DIRNAME-gen-arity.
The call also generates the database DIRNAME-afstats.csv, see below.
It is also possible to generate all the analyses seen above for a single file using the option --dpm.
Example:
sbl-alphafold-dbrun.py -d $afdb/MJannaschii -a filtration
Starting // calculation for /user/fcazals/home/mols-archives/AFDB/MJannaschii
PDB files found
No gz file found
...
Dumped /user/fcazals/home/mols-archives/AFDB/MJannaschii/MJannaschii-gen-arity/MJannaschii-arity.csv
Dumped /user/fcazals/home/mols-archives/AFDB/MJannaschii/MJannaschii-gen-arity/MJannaschii-arity-r10-scatter.png
Dumped /user/fcazals/home/mols-archives/AFDB/MJannaschii/MJannaschii-gen-arity/MJannaschii-arity-hmul-r10.png
Dumped /user/fcazals/home/mols-archives/AFDB/MJannaschii/MJannaschii-gen-arity/MJannaschii-arity-analysis-r10.tex
Dumped /user/fcazals/home/mols-archives/AFDB/MJannaschii/MJannaschii-gen-arity/MJannaschii-afstats.csv
Dumped /user/fcazals/home/mols-archives/AFDB/MJannaschii/MJannaschii-gen-arity/MJannaschii-afstats-fcp-x-H-hmap.png
Dumped /user/fcazals/home/mols-archives/AFDB/MJannaschii/MJannaschii-gen-arity/MJannaschii-afstats-cpf-x-mp-hmap.png
Dumped /user/fcazals/home/mols-archives/AFDB/MJannaschii/MJannaschii-gen-arity/MJannaschii-afstats-mp-x-ent-hmap.png
Database. The previous calculation on a whole directory also generates DIRNAME-afstats.csv, which contains 19 columns as follows:
Columns 0:name,1: num_aa : model name and number of amino acids.
Columns 2:q25,3:ar25,4:cdf25,5:q75,6:ar75,7:cdf75 : a sequence of q triples, with q the number of quantiles used in the arity signature. Each triple consists of (i) the quantile value, (ii) the arity value, and (iii) the value of the CDF at the arity value.
Columns 8:FPCP,9:meanP,10:persH : a triple consisting of the fraction of positive critical points, the mean persistence, and the normalized persistence entropy.
Columns 11:t0dot025,12:PLM0dot025,13:t0dot05,14:PLM0dot05,15:t0dot15,16:PLM0dot15 : a sequence of pairs consisting of the persistence threshold used and the corresponding number of persistent local maxima for the number of connected components of the pLDDT curve. In the example, three thresholds are used, namely 0.025, 0.05, and 0.15.
Columns 17:NCC_max_arity,18:NCC_max_pLDDT : the maxima of the number of connected components observed in the filtrations using the arity and the pLDDT.
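As an illustration of the quantities stored in the columns meanP and persH, the following Python sketch computes a mean persistence and a normalized persistence entropy from a list of (birth, death) pairs. It uses the usual definition of persistence entropy, normalized by the logarithm of the number of pairs; whether the script uses exactly this normalization is an assumption, and the function name is hypothetical.

```python
import math


def persistence_summary(pairs):
    """Return (mean persistence, normalized persistence entropy).

    pairs: iterable of (birth, death) values from a persistence diagram.
    Assumption: entropy is normalized by log(number of positive pairs),
    so that it lies in [0, 1]; zero-persistence pairs are discarded.
    """
    pers = [abs(b - d) for b, d in pairs]
    pers = [p for p in pers if p > 0]
    if not pers:
        return 0.0, 0.0
    total = sum(pers)
    mean_p = total / len(pers)
    if len(pers) == 1:
        # A single pair carries no entropy.
        return mean_p, 0.0
    entropy = -sum((p / total) * math.log(p / total) for p in pers)
    return mean_p, entropy / math.log(len(pers))
```

With this normalization, a diagram whose pairs all have the same persistence yields the value 1, while a diagram dominated by one long-lived pair yields a value close to 0.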
Here are the first few lines of MJannaschii-afstats.csv:
Database queries using filters. A filter is a text file such that each line contains three pieces of information: the variable name, and the min and max values (vmin and vmax respectively). Lines starting with a pound sign (#) are ignored.
Each line is used to define a filter, which is then invoked to decide whether a given entry (protein) of the genome processed is selected or not.
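The filtering logic can be sketched in Python as follows. This is a minimal illustration, not the script's own code: the function names are hypothetical, and the sketch assumes that the three fields are whitespace-separated, that the bounds are inclusive, and that an entry is selected only if it passes every filter.

```python
import csv
import io


def parse_filters(text):
    """Parse filter lines 'name vmin vmax'; lines starting with '#' are skipped.

    Assumption: the three fields are whitespace-separated.
    """
    filters = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, vmin, vmax = line.split()
        filters.append((name, float(vmin), float(vmax)))
    return filters


def select_entries(csv_text, filters):
    """Return the names of the entries passing every filter.

    Assumptions: bounds are inclusive and filters are combined with AND.
    The column 'name' is the first column of DIRNAME-afstats.csv.
    """
    selected = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if all(vmin <= float(row[name]) <= vmax for name, vmin, vmax in filters):
            selected.append(row["name"])
    return selected
```

For instance, a filter file with the two lines "num_aa 100 300" and "persH 0.0 0.5" would keep the proteins with 100 to 300 amino acids whose normalized persistence entropy is at most 0.5.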
Analysis: pLDDT
The pLDDT analysis aims at providing pLDDT statistics on all the AlphaFold reconstructions in a genome, plotting the distribution of all pLDDT values, of mean values (one mean value per protein), and median values (one median value per protein).
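The per-protein statistics can be sketched in Python as follows, assuming plain-text PDB files with fixed-column ATOM records, in which the B-factor field occupies columns 61-66. The function names are hypothetical, and keeping one value per amino acid by reading the CA atom only is an assumption; this is not the code of sbl-alphafold-dbrun.py.

```python
import statistics


def plddt_per_residue(pdb_lines):
    """Return one pLDDT value per amino acid, read from CA atoms.

    Assumption: fixed-column PDB format, where the B-factor field
    (columns 61-66, i.e. line[60:66]) holds the pLDDT.
    """
    values = []
    for line in pdb_lines:
        # The atom name is stored in columns 13-16.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    return values


def plddt_summary(values):
    """Mean and median pLDDT of one protein (hypothetical helper)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }
```

Collecting the per-residue values over all files gives the global distribution, while the per-file mean and median give the two per-protein distributions.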
Calculation on a directory. Invoke the script sbl-alphafold-dbrun.py on the directory to be processed. The call generates a summary and scatter plots for the main statistics, and stores them into the subdirectory DIRNAME-gen-pLDDT.
sbl-alphafold-dbrun.py -d $afdb/MJannaschii -a plddt
=== Processing /user/fcazals/home/mols-archives/AFDB/MJannaschii
PDB files found
No gz file found
Starting // calculation
...
Calculation done
All pLDDT values for MJannaschii 1773 497291
Dumped /user/fcazals/home/mols-archives/AFDB/MJannaschii/MJannaschii-gen-pLDDT/MJannaschii-pLDDT-all-histogram.png
Dumped /user/fcazals/home/mols-archives/AFDB/MJannaschii/MJannaschii-gen-pLDDT/MJannaschii-pLDDT-median-histogram.png
Dumped /user/fcazals/home/mols-archives/AFDB/MJannaschii/MJannaschii-gen-pLDDT/MJannaschii-pLDDT-mean-histogram.png
Written /user/fcazals/home/mols-archives/AFDB/MJannaschii/MJannaschii-gen-pLDDT/MJannaschii-pLDDT-analysis.tex
Implementation notes
The module SBL::Union_find provides two classes. The class SBL::Union_find::Union_find_DS implements the usual Union-find algorithm, and also builds the persistence diagram for connected components. The class SBL::Union_find::Union_find_wrapper wraps the latter, taking as input a list of pairs (index, associated value), with the value used to build the filtration.
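To illustrate how Union-find can track connected components along a filtration, here is a minimal Python sketch. It is not the SBL implementation, and it relies on assumptions not stated above: indices are positions along the sequence, two indices are connected when they are consecutive and both already inserted, and insertion proceeds by decreasing value. The function name is hypothetical.

```python
def filtration_components(values, decreasing=True):
    """Insert indices by sorted value, merging consecutive indices.

    Returns (counts, pairs): counts[k] is the number of connected
    components after the k-th insertion; pairs lists (birth, death)
    values of components that die with non-zero persistence. The
    component that never dies is not reported.
    """
    n = len(values)
    parent = list(range(n))
    birth = list(values)   # birth value of each root's component
    active = [False] * n

    def find(i):
        # Find the root of i, with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    order = sorted(range(n), key=lambda i: values[i], reverse=decreasing)
    counts, pairs, ncc = [], [], 0
    for i in order:
        v = values[i]
        active[i] = True
        ncc += 1
        for j in (i - 1, i + 1):
            if 0 <= j < n and active[j]:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component inserted first survives.
                first_i = birth[ri] >= birth[rj] if decreasing else birth[ri] <= birth[rj]
                old, young = (ri, rj) if first_i else (rj, ri)
                parent[young] = old
                ncc -= 1
                if birth[young] != v:
                    pairs.append((birth[young], v))
        counts.append(ncc)
    return counts, pairs
```

On the values [90, 50, 80, 40, 85], the three high values are inserted first as isolated components, and the two low values then merge them, so the number of components rises to 3 before dropping back to 1.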
The module SBL::AF_null_model provides the class SBL::AF_null_model::AF_null_model implementing the null model using a Union_find_wrapper, and the associated permutation p-value.
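As a generic illustration of a permutation p-value, the Python sketch below compares a statistic computed on the observed values with the same statistic computed on shuffled copies. The null model of the paper is not reproduced here: shuffling the values along the sequence, the one-sided test, and the function name are all assumptions made for the sake of the example.

```python
import random


def permutation_pvalue(values, statistic, n_perm=1000, seed=0):
    """Estimate P(statistic(shuffled) >= statistic(observed)).

    Uses the standard (1 + count) / (1 + n_perm) estimator, which never
    returns 0. The shuffling null and the one-sided test are assumptions.
    """
    rng = random.Random(seed)
    observed = statistic(values)
    shuffled = list(values)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if statistic(shuffled) >= observed:
            count += 1
    return (1 + count) / (1 + n_perm)
```

The statistic is passed as a function, so any scalar summary of a filtration (for example a maximum number of connected components) can be plugged in.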
The module SBL::AF_filtrations provides the following classes: SBL::AF_filtrations::AF_filtrations_finder uses Union-find to study the filtrations; SBL::AF_filtrations::AF_filtrations_directory runs the previous on a directory; SBL::AF_filtrations::AF_filtrations_parallel handles the parallel mode for files in a directory.
Visualization, Plugins, GUIs
The SBL provides VMD, PyMOL, and Web plugins for sbl-alphafold-dbrun.py. Launch the VMD plugin from the SBL catalog under the Extensions menu, start the PyMOL plugin by running sbl_alphafold_dbrun in the PyMOL command line, and open the Web plugin from the SBL web plugins launcher page (served locally via sbl-web-plugins-launcher). All three plugins expose the same GUI and capabilities.
sbl-alphafold-dbrun plugin user interface.
The user interface for the sbl-alphafold-dbrun plugin. The typical workflow is: Step 1: Provide the input file. Step 2: Specify the parameters for the analysis. Step 3: Choose an output directory (optional, a default is provided). Step 4: Launch the computation. Step 5: Review the results in the GUI.