Structural Bioinformatics Library
Template C++ / Python API for developping structural bioinformatics applications.
SBL.MinimumConnectivityInference Namespace Reference

Functions

def get_vertices_from_complexes (complexes)
 
def get_dict_of_complexes_per_vertex (complexes)
 
def __check__entries__ (complexes, G=None, weight=None, max_degree=None)
 
def is_valid_solution (complexes, solution, V=None)
 
def is_known_solution (solutions, new_solution)
 
def compare_solution_sets (A, B)
 
def __clean_string__ (st)
 
def read_complexes_from_file (filename)
 
def __list_of_edges_to_string__ (edges)
 
def pretty_print_one_sol (complexes, cost, edges, sol_number=None, trees=True)
 
def pretty_print_solutions (complexes, solutions, in_filename='Unnamed instance', out_filename=None, G=None, algorithm=None)
 
def MCI_MILP (complexes, G=None, weight=None, max_degree=None, algorithm='flow', optimality_gap=0, known_solutions=None, solver=None, verbose=False)
 
def MCI_MILP_all_with_decomposition (complexes, solver=None, verbose=True)
 
def exp_MILP (input_filename, output_filename=None, verbose=True)
 

Function Documentation

◆ __check__entries__()

def SBL.MinimumConnectivityInference.__check__entries__ (   complexes,
  G = None,
  weight = None,
  max_degree = None 
)
This private method is to be used by the MCI_MILP and MCI_Greedy methods.
It ensures that the graph G and the weight dictionary contains all necessary
values.

◆ __clean_string__()

def SBL.MinimumConnectivityInference.__clean_string__ (   st)
Put the string in a parsable format

◆ __list_of_edges_to_string__()

def SBL.MinimumConnectivityInference.__list_of_edges_to_string__ (   edges)

◆ compare_solution_sets()

def SBL.MinimumConnectivityInference.compare_solution_sets (   A,
  B 
)
Given two lists of pairs (cost, list of edges) or triples (cost, list of
edges, time), returns True if the two lists contains the same solutions.

◆ exp_MILP()

def SBL.MinimumConnectivityInference.exp_MILP (   input_filename,
  output_filename = None,
  verbose = True 
)

◆ get_dict_of_complexes_per_vertex()

def SBL.MinimumConnectivityInference.get_dict_of_complexes_per_vertex (   complexes)
Returns a dictionary keyed by vertices associating to each vertex the set of
complexes it belongs to.

◆ get_vertices_from_complexes()

def SBL.MinimumConnectivityInference.get_vertices_from_complexes (   complexes)
Returns the set of vertices involved in the complexes.

◆ is_known_solution()

def SBL.MinimumConnectivityInference.is_known_solution (   solutions,
  new_solution 
)
Return True if the new_solution is already in the list of solutions.

◆ is_valid_solution()

def SBL.MinimumConnectivityInference.is_valid_solution (   complexes,
  solution,
  V = None 
)
Return True if the solution is valid.

A solution is a pair (cost, list of edges) or a triple (cost, list of edges,
time). A solution is valid if for each complexe, the subgraph induced by
that complexe over the list of edges is connected.

◆ MCI_MILP()

def SBL.MinimumConnectivityInference.MCI_MILP (   complexes,
  G = None,
  weight = None,
  max_degree = None,
  algorithm = 'flow',
  optimality_gap = 0,
  known_solutions = None,
  solver = None,
  verbose = False 
)
Mixed Integer Linear Programming formulation for the MCI problem.

By default, we use the flow formulation to ensure connectivity of the
complexes (i.e., the subgraph induced by a complexe must be connected). When
the algorithm parameter starts with 'mad', we use the maximum average degree
(mad) formulation instead of the flow formulation to ensure the connectivity
of the complexes. We use a special complexe covering all vertices to ensure
global connectivity of the solutions.

This method can be used either to compute an optimal solution, or to
enumerate all optimal solutions, or to enumerate all solutions with cost at
most OPT+optimality_gap.

It is possible to bound the number of incident edges per vertex using the
max_degree parameter. This is independent of the chosen formulation (mad or
flow).

INPUTS:

- complexes -- dictionary keyed by the label of the complexes. Each item is
  a subset of proteins forming a complexe.

- G -- (default: None) undirected graph containing all potential edges. By
  default, G is the complete undirected graph without loops or multiple
  edges.

- weight -- (default: None) dictionary associating to edge e a weigth. We
  must have weight[e]>=0. By default, a unit weight is assumed.

- max_degree -- (default: None) integral value upper bounding the degree of
  the vertices in the solution (number of edges incident to a single
  vertex).

- algorithm -- (default: 'flow') name of the algorithm to be executed among:

    - 'flow': returns an optimal solution using the flow formulation.

    - 'mad': returns an optimal solution using the mad formulation.

    - 'flow_all' and 'mad_all': return all optimal solutions using either
      the flow or the mad formulation.

    - 'flow_gap' and 'mad_gap': return all solutions whose cost differ by at
      most optimality_gap of the optimal one, using either the flow or the
      mad formulation.

- optimality_gap -- (default: 0) this parameter is used only when the
  specified algorithm ends with '_gap' to return all solutions whose cost
  differ by at most optimality_gap of the optimal one.

- known_solutions -- (default: []) computed solutions must be different from
  already known solutions.

- solver -- (default: None) Allows for specifying the MILP solver (GLPK,
  Cplex, etc.)

- verbose -- (default: False) display some informations when sets to True.

OUTPUT:

A list of triples (cost, list of edges, computation time).

◆ MCI_MILP_all_with_decomposition()

def SBL.MinimumConnectivityInference.MCI_MILP_all_with_decomposition (   complexes,
  solver = None,
  verbose = True 
)
This function is a front-end to MCI_MILP that speeds up the computation of
all optimal solutions if some vertices are in all complexes.


EXAMPLES:

Assuming we have a file 'blop.txt' containing::

    BEGIN-OLIGOMERS
    3
    a b
    a b d e
    a b d e g h
    END-OLIGOMERS

    sage: complexes = read_complexes_from_file('blop.txt')
    sage: complexes
    {0: set(['a', 'b']),
     1: set(['a', 'b', 'e', 'd']),
     2: set(['a', 'b', 'e', 'd', 'g', 'h'])}

    sage: R1 = MCI_MILP_all_with_decomposition(complexes, verbose=False)
    sage: len(R1)
    192
    sage: R2 = MCI_MILP(complexes, algorithm='flow_all', verbose=False)
    sage: len(R1)==len(R2)
    True
    sage: compare_solution_sets(R1,R2)
    True

Using a hard real instance::

    sage: complexes = read_complexes_from_file('../data/taverner_hernandez_carolinson_ACS_2008_msms_mcci.txt')
    sage: res = MCI_MILP_all_with_decomposition(complexes, verbose=True)
    Identical vertices: ['Dis3', 'Mtr3', 'Rrp41', 'Rrp42', 'Rrp45']
    125 and 3 blocks for building solutions.
    Current count: 546875 solutions and 0 rejected.
    3 and 1296 blocks for building solutions.
    Current count: 546875 solutions and 263538 rejected.
    546875 solutions have been constructed, and 263538 have been rejected.
    0 duplicated solutions.
    Final number of solutions: 546875

◆ pretty_print_one_sol()

def SBL.MinimumConnectivityInference.pretty_print_one_sol (   complexes,
  cost,
  edges,
  sol_number = None,
  trees = True 
)

◆ pretty_print_solutions()

def SBL.MinimumConnectivityInference.pretty_print_solutions (   complexes,
  solutions,
  in_filename = 'Unnamed instance',
  out_filename = None,
  G = None,
  algorithm = None 
)
Print the results either in the standard output (default) or in the
specified file (out_filename).

◆ read_complexes_from_file()

def SBL.MinimumConnectivityInference.read_complexes_from_file (   filename)
Read complexes from input file assuming that the sets are given between a
'BEGIN-OLIGOMERS' tag and a 'END-OLIGOMERS' tag.

BEGIN-OLIGOMERS
15               # some number indicating the number of complexe sets
Csn1 Csn2 Csn3 Csn4 Csn6 Csn7 Csn8
Csn1 Csn3 Csn4 Csn5 Csn6 Csn7 Csn8
...
Csn4 Csn6 Csn7
Csn3 Csn8
END-OLIGOMERS