Structural Bioinformatics Library
Template C++ / Python API for developing structural bioinformatics applications.
KpaxPostAnalyzer Class Reference

Public Member Functions

None __init__ (self, Path output_dir, Path post_dir, int gap=5, Optional[str] viewer=None, bool chains_visible_default=True)
None prepare_vmd_visualization (self, List[Path] pdb_paths)
None prepare_pymol_visualization (self, List[Path] pdb_paths)
None prepare_ngl_visualization (self, List[Path] pdb_paths)
None prepare_viewer_files (self)
int run (self)

Protected Member Functions

Optional[Path] _find_first_by_patterns (self, Iterable[str] patterns)
Tuple[bool, str] _render_dot_to_pdf (self, Path dot_file, Path pdf_out)
List[Tuple[str, int, str, int]] _parse_alignment_pairs (self, Path alignment_path)
Tuple[bool, str] _align_second_structure_with_kabsch (self, Path pdb1, Path pdb2)
None _write_pdf_titles (self, Dict[str, str] pdf_titles)
List[Path] _extract_pdb_files_from_log (self)
List[Path] _copy_structures_to_post (self, List[Path] pdb_paths)
List[str] _extract_protein_details_and_statistics (self, List[str] lines)
str _format_statistics_line (self, str line)
Tuple[List[str], List[str], List[str]] _parse_log (self)
Tuple[List[str], List[str]] _copy_alignment_file (self)
None _write_summary (self, List[str] events, List[str] issues)

Static Protected Member Functions

Dict[Tuple[str, int], Tuple[float, float, float]] _load_ca_coordinates (Path pdb_path)
None _apply_rigid_transform_to_pdb (Path src_pdb, Path dst_pdb, R, t)

Detailed Description

Locate KPAX outputs, generate a PDF graph, and emit GUI-compatible text.

Responsibilities
----------------
- Discover KPAX artifacts in ``output_dir`` (log and alignment files).
- Generate the alignment graph PDF via ``KpaxAlignmentGraph``.
- Extract protein details and statistics from the log into ``outputText1.txt``.
- Copy/format the alignment table into ``outputText2.txt`` with indices.
- Optionally prepare viewer files for VMD, PyMOL, or NGL.

Attributes
----------
output_dir : Path
    Root directory containing KPAX outputs.
post_dir : Path
    Destination directory for generated artifacts.
gap : int
    Gap threshold used by ``KpaxAlignmentGraph`` when building the graph.
viewer : str | None
    Optional viewer type: ``vmd``, ``pymol``, or ``ngl``.
chains_visible_default : bool
    For NGL assets, whether chains should be visible by default.
text1_path : Path
    Path to ``outputText1.txt``.
text2_path : Path
    Path to ``outputText2.txt``.
fig1_path : Path
    Path to ``outputPDF1.pdf``.
titles_path : Path
    Path to ``pdftitles.txt``.

Constructor & Destructor Documentation

◆ __init__()

None __init__ ( self,
Path output_dir,
Path post_dir,
int gap = 5,
Optional[str] viewer = None,
bool chains_visible_default = True )
Initialize the analyzer with locations and options.

Parameters
----------
output_dir : Path
    Directory to scan for KPAX outputs.
post_dir : Path
    Directory where post-analysis artifacts will be written.
gap : int, default=5
    Gap threshold forwarded to ``KpaxAlignmentGraph``.
viewer : str | None, default=None
    Optional viewer type to prepare assets for: ``vmd``, ``pymol``, or ``ngl``.
chains_visible_default : bool, default=True
    For the NGL viewer assets, whether chains are visible by default.

Member Function Documentation

◆ _align_second_structure_with_kabsch()

Tuple[bool, str] _align_second_structure_with_kabsch ( self,
Path pdb1,
Path pdb2 )
protected
Align ``pdb2`` onto ``pdb1`` using CA pairs (Kabsch).

Pairs are derived from ``alignment.txt`` via ``_parse_alignment_pairs``.

Parameters
----------
pdb1 : Path
    Reference structure.
pdb2 : Path
    Target structure (transformed in place on success).

Returns
-------
tuple[bool, str]
    ``(ok, message)``. If prerequisites are missing, returns ``(False, reason)``.
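
For orientation, the Kabsch step can be sketched as below. This is a minimal illustration, not the library's implementation: it assumes NumPy is available, that the CA pairs have already been collected into two equally ordered coordinate arrays, and that the transform convention is ``x' = R x + t``. The function name ``kabsch`` is hypothetical.

```python
import numpy as np


def kabsch(mobile, reference):
    """Return (R, t) minimizing sum ||R @ mobile_i + t - reference_i||^2.

    Hypothetical helper: `mobile` would hold the CA coordinates of pdb2 and
    `reference` those of pdb1, row i of each forming one aligned pair.
    """
    P = np.asarray(mobile, dtype=float)
    Q = np.asarray(reference, dtype=float)
    if P.ndim != 2 or P.shape != Q.shape or P.shape[1] != 3 or len(P) < 3:
        raise ValueError("need two matching (N, 3) arrays with N >= 3")

    # Center both point sets on their centroids.
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_mean).T @ (Q - q_mean)  # 3x3 cross-covariance matrix

    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t
```

With ``R`` and ``t`` in hand, the second structure would be rewritten by applying the transform to every atom, which is the role of ``_apply_rigid_transform_to_pdb``.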

◆ _apply_rigid_transform_to_pdb()

None _apply_rigid_transform_to_pdb ( Path src_pdb,
Path dst_pdb,
R,
t )
static protected
Apply a rigid transform to all ATOM/HETATM coordinates in a PDB.

Parameters
----------
src_pdb : Path
    Source PDB file to read.
dst_pdb : Path
    Destination PDB file to write.
R : sequence[sequence[float]]
    3x3 rotation matrix.
t : sequence[float]
    3D translation vector.
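
A minimal sketch of such a transform, assuming the fixed-column PDB layout (x, y, z in columns 31-54, written as ``%8.3f``) and the convention ``x' = R x + t``. The function name ``apply_rigid_transform`` is illustrative and the code is not taken from the library.

```python
from pathlib import Path


def apply_rigid_transform(src_pdb, dst_pdb, R, t):
    """Write dst_pdb with every ATOM/HETATM coordinate mapped to R @ xyz + t.

    Assumption: coordinates sit in the standard fixed columns of the PDB format.
    All other records are copied unchanged.
    """
    out_lines = []
    for line in Path(src_pdb).read_text().splitlines():
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 54:
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            # Row-by-row matrix-vector product, then the translation.
            new = [sum(R[i][j] * xyz[j] for j in range(3)) + t[i] for i in range(3)]
            line = f"{line[:30]}{new[0]:8.3f}{new[1]:8.3f}{new[2]:8.3f}{line[54:]}"
        out_lines.append(line)
    Path(dst_pdb).write_text("\n".join(out_lines) + "\n")
```

Passing the same path for source and destination is safe here because the whole file is read before anything is written.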

◆ _copy_alignment_file()

Tuple[List[str], List[str]] _copy_alignment_file ( self)
protected
Copy ``alignment.txt`` content to ``outputText2.txt`` with indices.

Adds a header row and right-aligns indices for clean display.

Returns
-------
tuple[list[str], list[str]]
    ``(events, warnings)``
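
The indexing and right-alignment can be illustrated as follows. This is only a sketch: the header labels ``Index`` and ``Pair``, the two-space separator, and the function name ``format_alignment_table`` are assumptions, since the exact layout of ``outputText2.txt`` is not specified here.

```python
def format_alignment_table(rows):
    """Prefix each alignment row with a right-aligned 1-based index.

    Hypothetical layout: a header line followed by one line per row.
    """
    # Width of the index column: wide enough for the header and the largest index.
    width = max(len("Index"), len(str(len(rows))))
    table = [f"{'Index':>{width}}  Pair"]
    for i, row in enumerate(rows, start=1):
        table.append(f"{i:>{width}}  {row.strip()}")
    return table
```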

◆ _copy_structures_to_post()

List[Path] _copy_structures_to_post ( self,
List[Path] pdb_paths )
protected
Copy up to two structures into ``post_dir``.

Files are named ``tmp_[pdbid].pdb``.

Parameters
----------
pdb_paths : list[Path]
    Candidate source structures.

Returns
-------
list[Path]
    Destination paths that exist in ``post_dir`` after the copy.

◆ _extract_pdb_files_from_log()

List[Path] _extract_pdb_files_from_log ( self)
protected
Heuristically extract up to two input structures from the KPAX log.

Heuristics
----------
- Inspect the first ~20 non-empty lines of the first log found
- Collect tokens ending with ``.pdb``/``.cif``/``.mmcif``
- Resolve to existing files (absolute or relative to ``output_dir``)

Returns
-------
list[Path]
    Up to two resolved structure paths.
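
The heuristics above can be sketched as follows. The function name ``extract_structure_paths``, its parameters, and the set of punctuation stripped from tokens are assumptions made for illustration.

```python
from pathlib import Path

STRUCTURE_SUFFIXES = (".pdb", ".cif", ".mmcif")


def extract_structure_paths(log_path, output_dir, max_lines=20, limit=2):
    """Return up to `limit` existing structure files named near the top of a log."""
    output_dir = Path(output_dir)
    text = Path(log_path).read_text(errors="replace")
    non_empty = [ln for ln in text.splitlines() if ln.strip()]
    found = []
    for line in non_empty[:max_lines]:
        for token in line.split():
            token = token.strip("\"',;()[]")  # assumed surrounding punctuation
            if not token.lower().endswith(STRUCTURE_SUFFIXES):
                continue
            candidate = Path(token)
            if not candidate.is_absolute():
                candidate = output_dir / candidate
            if candidate.is_file() and candidate not in found:
                found.append(candidate)
                if len(found) == limit:
                    return found
    return found
```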

◆ _extract_protein_details_and_statistics()

List[str] _extract_protein_details_and_statistics ( self,
List[str] lines )
protected
Extract protein details and statistics sections from the log.

Logic
-----
- Locate the "Details for each protein" section and keep specific fields
  (structure lines, number of chains/residues).
- Locate the "Statistics" section within the alignment module and format
  numeric values to two decimals for readability.

◆ _find_first_by_patterns()

Optional[Path] _find_first_by_patterns ( self,
Iterable[str] patterns )
protected
Find the first file matching any of the provided glob patterns.

Search is recursive under ``self.output_dir`` using ``Path.rglob``.

Parameters
----------
patterns : Iterable[str]
    One or more glob patterns to search for.

Returns
-------
Path | None
    The first matching file path, or ``None`` if none are found.
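
A minimal sketch of this lookup as a free function. Sorting the matches is an assumption added here to make the result deterministic; ``Path.rglob`` itself does not guarantee an order.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_first_by_patterns(root: Path, patterns: Iterable[str]) -> Optional[Path]:
    """Return the first file under `root` matching any pattern, tried in order."""
    for pattern in patterns:
        # rglob walks `root` recursively; sorted() is an illustrative choice.
        for candidate in sorted(Path(root).rglob(pattern)):
            if candidate.is_file():
                return candidate
    return None
```

Patterns are tried in the order given, so listing the most specific pattern first (for example ``["alignment.txt", "*.txt"]``) gives it priority.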

◆ _format_statistics_line()

str _format_statistics_line ( self,
str line )
protected
Format a statistics line to show numbers with 2 decimal places.
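
One way to obtain this formatting is a regular-expression substitution. The sketch below assumes that only numbers containing a decimal point are reformatted and that integers (such as residue counts) are left untouched; the actual rule used by the method may differ.

```python
import re

# Matches decimal numbers such as 1.23456 or -0.5; plain integers are not matched.
_DECIMAL = re.compile(r"-?\d+\.\d+")


def format_statistics_line(line: str) -> str:
    """Rewrite every decimal number in `line` with exactly two decimal places."""
    return _DECIMAL.sub(lambda m: f"{float(m.group()):.2f}", line)
```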

◆ _load_ca_coordinates()

Dict[Tuple[str, int], Tuple[float, float, float]] _load_ca_coordinates ( Path pdb_path)
static protected
Extract CA atom coordinates from a PDB file.

Parameters
----------
pdb_path : Path
    PDB file path to parse.

Returns
-------
dict[tuple[str,int], tuple[float,float,float]]
    Mapping from ``(chain_id, residue_index)`` to ``(x, y, z)``.
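
A sketch of such a parser, assuming the fixed-column PDB layout. Restricting the scan to ``ATOM`` records and keeping only the first alternate location are choices made for this example, not documented behavior.

```python
from pathlib import Path


def load_ca_coordinates(pdb_path):
    """Map (chain_id, residue_number) -> (x, y, z) for CA atoms in ATOM records."""
    coords = {}
    for line in Path(pdb_path).read_text().splitlines():
        # ATOM only: a HETATM named "CA" is typically a calcium ion.
        if not line.startswith("ATOM") or len(line) < 54:
            continue
        if line[12:16].strip() != "CA":
            continue
        key = (line[21], int(line[22:26]))  # chain ID, residue sequence number
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        coords.setdefault(key, xyz)  # keep the first alternate location seen
    return coords
```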

◆ _parse_alignment_pairs()

List[Tuple[str, int, str, int]] _parse_alignment_pairs ( self,
Path alignment_path )
protected
Parse alignment pairs into typed tuples.

Expected row format (robust to spacing): ``"A 3 A 1"``.

Parameters
----------
alignment_path : Path
    Path to ``alignment.txt``.

Returns
-------
list[tuple[str,int,str,int]]
    Tuples of ``(chain1, resid1, chain2, resid2)``. Lines that cannot
    be parsed are skipped.
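
The row format lends itself to a whitespace split. The sketch below is a standalone illustration of that idea: it treats any line that does not consist of exactly four tokens with integer residue numbers as unparseable.

```python
from pathlib import Path


def parse_alignment_pairs(alignment_path):
    """Return (chain1, resid1, chain2, resid2) tuples; skip unparseable lines."""
    pairs = []
    for line in Path(alignment_path).read_text().splitlines():
        parts = line.split()  # tolerant of tabs and repeated spaces
        if len(parts) != 4:
            continue
        chain1, resid1, chain2, resid2 = parts
        try:
            pairs.append((chain1, int(resid1), chain2, int(resid2)))
        except ValueError:
            continue  # residue numbers were not integers
    return pairs
```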

◆ _parse_log()

Tuple[List[str], List[str], List[str]] _parse_log ( self)
protected
Parse the log and extract protein details/statistics.

Returns
-------
tuple[list[str], list[str], list[str]]
    ``(events, warnings, extracted_lines)``

◆ _render_dot_to_pdf()

Tuple[bool, str] _render_dot_to_pdf ( self,
Path dot_file,
Path pdf_out )
protected
Render a DOT file to PDF using Graphviz ``dot``.

Notes
-----
This helper is provided for completeness but the current workflow
generates the PDF via ``KpaxAlignmentGraph`` from ``alignment.txt``.

Parameters
----------
dot_file : Path
    Input DOT graph file path.
pdf_out : Path
    Output PDF file path to write.

Returns
-------
tuple[bool, str]
    ``(ok, message)``
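
A sketch of how such a helper could invoke Graphviz, using the standard ``dot -Tpdf <input> -o <output>`` command line. The error messages and the function name are illustrative.

```python
import shutil
import subprocess
from pathlib import Path


def render_dot_to_pdf(dot_file, pdf_out):
    """Run Graphviz `dot` and report the outcome as (ok, message)."""
    dot_file, pdf_out = Path(dot_file), Path(pdf_out)
    if not dot_file.is_file():
        return False, f"DOT file not found: {dot_file}"
    dot_exe = shutil.which("dot")
    if dot_exe is None:
        return False, "Graphviz 'dot' executable not found on PATH"
    proc = subprocess.run(
        [dot_exe, "-Tpdf", str(dot_file), "-o", str(pdf_out)],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        return False, proc.stderr.strip() or "dot exited with an error"
    return True, f"Wrote {pdf_out}"
```

Returning a status tuple instead of raising lets the caller record the failure and carry on with the remaining steps.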

◆ _write_pdf_titles()

None _write_pdf_titles ( self,
Dict[str, str] pdf_titles )
protected
Write PDF titles mapping to ``pdftitles.txt``.

File format is ``<filename>: <Title>`` per line and is consumed by
GUI PDF viewers to set tab or pane titles.
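
As an illustration of the file format, the sketch below writes one ``<filename>: <Title>`` line per entry. The standalone function signature and the UTF-8 encoding are assumptions.

```python
from pathlib import Path
from typing import Dict


def write_pdf_titles(titles_path: Path, pdf_titles: Dict[str, str]) -> None:
    """Write one '<filename>: <Title>' line per mapping entry."""
    lines = [f"{name}: {title}" for name, title in pdf_titles.items()]
    Path(titles_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

For example, ``{"outputPDF1.pdf": "Alignment graph"}`` (a hypothetical title) produces the single line ``outputPDF1.pdf: Alignment graph``.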

◆ _write_summary()

None _write_summary ( self,
List[str] events,
List[str] issues )
protected
Write a summary file for debugging purposes.

◆ prepare_ngl_visualization()

None prepare_ngl_visualization ( self,
List[Path] pdb_paths )
Prepare assets for the NGL viewer (aligned with the Panel GUI).

- Ensure ``structure_*.pdb`` files exist in ``post/`` (copied from inputs)
- Write ``structures.json`` listing those files
- Parse ``protein_*.txt`` selections in ``post/`` and write ``selections.json``
- Ensure a ``graphics.json`` stub exists (cylinders/triangles/texts)

Also writes a combined convenience file, ``1.json``, for backward compatibility.

Parameters
----------
pdb_paths : list[Path]
    Up to two structures to include; the second is pre-aligned if possible.

Files written
-------------
- ``structure_*.pdb``
- ``structures.json``
- ``selections.json``
- ``graphics.json`` (if absent)
- ``1.json`` (combined convenience file)

◆ prepare_pymol_visualization()

None prepare_pymol_visualization ( self,
List[Path] pdb_paths )
Write a self-contained PyMOL script (``1.py``) that mirrors GUI behavior.

The script:
- Loads the provided structures as protein_1, protein_2
- Sets global cartoon/display preferences
- Parses post/protein_*.txt selection files
- Aligns proteins using 'ALL' selections (CA) if available
- Creates connected-component (CC) objects per protein with distinct colors
- Creates an ALL object per protein (hidden, lines representation)
- Orients and refreshes the view

Parameters
----------
pdb_paths : list[Path]
    Up to two structures to load.

◆ prepare_viewer_files()

None prepare_viewer_files ( self)
Prepare visualization files based on the requested viewer type.

If no input structures can be discovered from the log, falls back to
enumerating ``*.pdb`` files under ``output_dir``. Copied structures in
``post_dir`` are then used for all viewers for portability.

◆ prepare_vmd_visualization()

None prepare_vmd_visualization ( self,
List[Path] pdb_paths )
Write a self-contained VMD script (``1.vmd``) to visualize results.

The script:
- Loads provided PDB structures
- Parses selection files from the post directory (protein_*.txt)
- Builds connected component (CC) molecules per protein with distinct colors
- Optionally builds an ALL-residues molecule (hidden) per protein
- Aligns proteins using only aligned residues (CA), then resets the view

Parameters
----------
pdb_paths : list[Path]
    Up to two structures to load.

◆ run()

int run ( self)
Execute the KPAX post-analysis workflow.

Steps
-----
1. Build the alignment graph PDF via ``KpaxAlignmentGraph``.
2. Optionally prepare viewer assets (VMD/PyMOL/NGL).
3. Extract protein details/statistics to ``outputText1.txt``.
4. Copy/format alignment pairs to ``outputText2.txt``.

Returns
-------
int
    0 if any of the expected outputs were generated; otherwise 1.
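
The return-code rule can be sketched as a small check over the generated files. Which files count as "expected outputs" is an assumption here (the text and PDF artifacts listed under Attributes), and ``exit_code`` is a hypothetical helper name.

```python
from pathlib import Path


def exit_code(expected_outputs):
    """Return 0 if at least one expected output file exists, otherwise 1."""
    return 0 if any(Path(p).is_file() for p in expected_outputs) else 1
```

A caller could pass the value straight to ``sys.exit`` so that shell scripts can detect a run that produced nothing.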