Structural Bioinformatics Library
Template C++ / Python API for developing structural bioinformatics applications.
KpaxAlignmentGraph Class Reference

Public Member Functions

 __init__ (self, input_filepath, gap=5, output_filepath="alignment_graph.pdf")
 read_pairs_from_file (self)
 find_connected_components_with_gap (self, residues, gap_threshold)
 create_residue_to_cc_mapping (self, ccs_dict)
 analyze_pairs (self)
 print_analysis_summary (self)
 save_alignment_graph_to_dot (self)
 save_connected_components_to_files (self)
 run_analysis (self)

Detailed Description

A class to analyze protein structure alignment pairs and generate connectivity graphs.

This class reads alignment data, finds connected components in protein chains,
analyzes inter-chain mappings, and generates Graphviz visualizations.

Constructor & Destructor Documentation

◆ __init__()

__init__ ( self,
input_filepath,
gap = 5,
output_filepath = "alignment_graph.pdf" )
Initialize the alignment analyzer.

Args:
    input_filepath (str): Path to the alignment file containing residue pairs
    gap (int): Maximum gap between residue IDs to consider them continuous (default=5)
    output_filepath (str): Path for the output .pdf file (default="alignment_graph.pdf")

Member Function Documentation

◆ analyze_pairs()

analyze_pairs ( self)
Analyze the pairs to find connected components and inter-chain mappings.

In every pair, the first element belongs to one chain (chain1) and the second element to the other chain (chain2).
Connected components are computed within each chain separately, and a graph is then built that connects components of chain1 to components of chain2.
The mapping can be many-to-many (for example, CC1 and CC2 on chain1 may both align with CC1 on chain2).

Returns:
    dict: Analysis results containing connected components and inter-chain mappings
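
The inter-chain step described above can be sketched as follows. This is a minimal illustration, not the class's actual implementation: the helper name inter_chain_edges, its arguments, and the choice to count aligned pairs per edge are all assumptions.

```python
from collections import defaultdict


def inter_chain_edges(pairs, res_to_cc1, res_to_cc2):
    """Hypothetical helper: count aligned pairs between components of two chains.

    pairs       -- iterable of (resid1, resid2) tuples
    res_to_cc1  -- dict mapping chain1 residue ID -> component ID
    res_to_cc2  -- dict mapping chain2 residue ID -> component ID
    """
    edges = defaultdict(int)
    for resid1, resid2 in pairs:
        # Each aligned pair contributes to one (chain1 CC, chain2 CC) edge.
        edges[(res_to_cc1[resid1], res_to_cc2[resid2])] += 1
    return dict(edges)


# Two components on chain1 that both align with one component on chain2.
pairs = [(1, 101), (2, 102), (20, 103)]
res_to_cc1 = {1: 0, 2: 0, 20: 1}
res_to_cc2 = {101: 0, 102: 0, 103: 0}
edges = inter_chain_edges(pairs, res_to_cc1, res_to_cc2)
```

Keying the edges by a (component, component) tuple lets the same structure represent one-to-one and many-to-many mappings.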

◆ create_residue_to_cc_mapping()

create_residue_to_cc_mapping ( self,
ccs_dict )
Create a mapping from residue ID to connected component ID.

Args:
    ccs_dict (dict): Connected components from Union-Find (defaultdict format)
    
Returns:
    tuple: (residue_to_cc_mapping, cc_list)
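
One way such a mapping could be built is sketched below. The function name, the ordering of components by their smallest residue ID, and the representation of cc_list as a list of sorted residue lists are assumptions made for illustration.

```python
def residue_to_cc_mapping(ccs_dict):
    """Hypothetical helper: turn {root: [residues]} into (residue -> CC index, CC list)."""
    residue_to_cc = {}
    cc_list = []
    # Assumption: components are numbered in order of their smallest residue ID.
    roots = sorted(ccs_dict, key=lambda root: min(ccs_dict[root]))
    for cc_id, root in enumerate(roots):
        members = sorted(ccs_dict[root])
        cc_list.append(members)
        for residue in members:
            residue_to_cc[residue] = cc_id
    return residue_to_cc, cc_list


mapping, cc_list = residue_to_cc_mapping({10: [10, 12], 1: [3, 1, 2]})
```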

◆ find_connected_components_with_gap()

find_connected_components_with_gap ( self,
residues,
gap_threshold )
Find connected components, where two residues are connected if their IDs differ by at most gap_threshold.

Args:
    residues (set): Set of residue IDs
    gap_threshold (int): Maximum gap to consider residues connected
    
Returns:
    dict: Connected components from Union-Find data structure
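
A minimal Union-Find sketch of this grouping is given below. It is an illustration under assumptions: the function name and the exact shape of the returned dictionary (root residue ID mapped to a sorted list of members) are not taken from the class. Only neighbours in sorted order are compared, which is sufficient because any two IDs within the threshold have all intermediate sorted neighbours within the threshold as well.

```python
from collections import defaultdict


def find_components_with_gap(residues, gap_threshold):
    """Hypothetical helper: group residue IDs whose neighbours differ by <= gap_threshold."""
    parent = {r: r for r in residues}

    def find(x):
        # Find the root of x, shortening the path as we go (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Attach the larger root to the smaller one, so the root is the smallest ID.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    ordered = sorted(residues)
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev <= gap_threshold:
            union(prev, cur)

    components = defaultdict(list)
    for r in ordered:
        components[find(r)].append(r)
    return components


ccs = find_components_with_gap({1, 2, 3, 10, 12, 30}, gap_threshold=5)
```

With a threshold of 5, residues 3 and 10 differ by 7 and therefore fall into different components, while 10 and 12 are merged.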

◆ print_analysis_summary()

print_analysis_summary ( self)
Print a formatted summary of the analysis results.

◆ read_pairs_from_file()

read_pairs_from_file ( self)
Read pairs from the input file, where each line contains: chain1 resid1 chain2 resid2

Returns:
    list: Tuples (resid1, resid2)
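
The line format above could be parsed along these lines. This sketch assumes whitespace-separated fields and integer residue IDs, and it skips lines that do not have exactly four fields; the function name parse_pairs is hypothetical and the real method may behave differently.

```python
def parse_pairs(lines):
    """Hypothetical helper: parse 'chain1 resid1 chain2 resid2' lines into (resid1, resid2)."""
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            # Assumption: blank or malformed lines are ignored.
            continue
        _chain1, resid1, _chain2, resid2 = fields
        pairs.append((int(resid1), int(resid2)))
    return pairs


pairs = parse_pairs(["A 10 B 12", "", "A 11 B 13"])
```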

◆ run_analysis()

run_analysis ( self)
Run the complete analysis pipeline: read pairs, analyze, and generate graph.

Returns:
    dict: Analysis results

◆ save_alignment_graph_to_dot()

save_alignment_graph_to_dot ( self)
Save inter-chain alignment mappings to a .dot file for Graphviz visualization.
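
As an illustration of what such a file might contain, the sketch below serialises component-to-component edges as an undirected Graphviz graph. The node naming scheme, the edge labels, and the function name edges_to_dot are assumptions; the layout actually written by the class is not documented here.

```python
def edges_to_dot(edges):
    """Hypothetical helper: render {(cc1, cc2): n_pairs} as Graphviz DOT text."""
    lines = ["graph alignment {"]
    for (cc1, cc2), n_pairs in sorted(edges.items()):
        # One undirected edge per pair of aligned components, labelled with the pair count.
        lines.append(f'    "chain1_cc{cc1}" -- "chain2_cc{cc2}" [label="{n_pairs}"];')
    lines.append("}")
    return "\n".join(lines)


dot_text = edges_to_dot({(0, 0): 2, (1, 0): 1})
```

Text of this form can be rendered with the Graphviz command line tool, for example `dot -Tpdf alignment_graph.dot -o alignment_graph.pdf`.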

◆ save_connected_components_to_files()

save_connected_components_to_files ( self)
Save connected component residue indices to separate files for VMD/PyMOL selection.
Creates files named: protein_X_cc_Y_chain_Z.txt and protein_X_chain_Z.txt

Files created:
- protein_1_cc_0_chain_A.txt: Residues for Chain 1, Connected Component 0
- protein_1_cc_1_chain_A.txt: Residues for Chain 1, Connected Component 1
- protein_2_cc_0_chain_A.txt: Residues for Chain 2, Connected Component 0
- protein_1_chain_A.txt: All residues for Chain 1
- protein_2_chain_A.txt: All residues for Chain 2
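
The naming scheme listed above can be reproduced with a short sketch. The file contents (residue IDs separated by spaces on a single line), the function name write_cc_files, and its parameters are assumptions; only the file name pattern comes from the description.

```python
import os
import tempfile


def write_cc_files(cc_list, protein_idx, chain_id, out_dir):
    """Hypothetical helper: write one file per component plus one file for the whole chain."""
    written = []
    for cc_id, members in enumerate(cc_list):
        name = f"protein_{protein_idx}_cc_{cc_id}_chain_{chain_id}.txt"
        path = os.path.join(out_dir, name)
        with open(path, "w") as handle:
            # Assumption: residue IDs are written space-separated on one line.
            handle.write(" ".join(str(r) for r in members) + "\n")
        written.append(path)

    all_path = os.path.join(out_dir, f"protein_{protein_idx}_chain_{chain_id}.txt")
    with open(all_path, "w") as handle:
        handle.write(" ".join(str(r) for cc in cc_list for r in cc) + "\n")
    written.append(all_path)
    return written


with tempfile.TemporaryDirectory() as tmp_dir:
    paths = write_cc_files([[1, 2, 3], [10, 12]], 1, "A", tmp_dir)
    names = [os.path.basename(p) for p in paths]
    with open(paths[-1]) as handle:
        all_residues = handle.read().split()
```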