Structural Bioinformatics Library
Template C++ / Python API for developping structural bioinformatics applications.
|
Functions | |
def | get_vertices_from_complexes (complexes) |
def | get_dict_of_complexes_per_vertex (complexes) |
def | __check__entries__ (complexes, G=None, weight=None, max_degree=None) |
def | is_valid_solution (complexes, solution, V=None) |
def | is_known_solution (solutions, new_solution) |
def | compare_solution_sets (A, B) |
def | __clean_string__ (st) |
def | read_complexes_from_file (filename) |
def | __list_of_edges_to_string__ (edges) |
def | pretty_print_one_sol (complexes, cost, edges, sol_number=None, trees=True) |
def | pretty_print_solutions (complexes, solutions, in_filename='Unnamed instance', out_filename=None, G=None, algorithm=None) |
def | MCI_MILP (complexes, G=None, weight=None, max_degree=None, algorithm='flow', optimality_gap=0, known_solutions=None, solver=None, verbose=False) |
def | MCI_MILP_all_with_decomposition (complexes, solver=None, verbose=True) |
def | exp_MILP (input_filename, output_filename=None, verbose=True) |
def SBL.MinimumConnectivityInference.__check__entries__ | ( | complexes, | |
G = None , |
|||
weight = None , |
|||
max_degree = None |
|||
) |
This private method is to be used by the MCI_MILP and MCI_Greedy methods. It ensures that the graph G and the weight dictionary contains all necessary values.
def SBL.MinimumConnectivityInference.__clean_string__ | ( | st | ) |
Put the string in a parsable format
def SBL.MinimumConnectivityInference.__list_of_edges_to_string__ | ( | edges | ) |
def SBL.MinimumConnectivityInference.compare_solution_sets | ( | A, | |
B | |||
) |
Given two lists of pairs (cost, list of edges) or triples (cost, list of edges, time), returns True if the two lists contains the same solutions.
def SBL.MinimumConnectivityInference.exp_MILP | ( | input_filename, | |
output_filename = None , |
|||
verbose = True |
|||
) |
def SBL.MinimumConnectivityInference.get_dict_of_complexes_per_vertex | ( | complexes | ) |
Returns a dictionary keyed by vertices associating to each vertex the set of complexes it belongs to.
def SBL.MinimumConnectivityInference.get_vertices_from_complexes | ( | complexes | ) |
Returns the set of vertices involved in the complexes.
def SBL.MinimumConnectivityInference.is_known_solution | ( | solutions, | |
new_solution | |||
) |
Return True if the new_solution is already in the list of solutions.
def SBL.MinimumConnectivityInference.is_valid_solution | ( | complexes, | |
solution, | |||
V = None |
|||
) |
Return True if the solution is valid. A solution is a pair (cost, list of edges) or a triple (cost, list of edges, time). A solution is valid if for each complexe, the subgraph induced by that complexe over the list of edges is connected.
def SBL.MinimumConnectivityInference.MCI_MILP | ( | complexes, | |
G = None , |
|||
weight = None , |
|||
max_degree = None , |
|||
algorithm = 'flow' , |
|||
optimality_gap = 0 , |
|||
known_solutions = None , |
|||
solver = None , |
|||
verbose = False |
|||
) |
Mixed Integer Linear Programming formulation for the MCI problem. By default, we use the flow formulation to ensure connectivity of the complexes (i.e., the subgraph induced by a complexe must be connected). When the algorithm parameter starts with 'mad', we use the maximum average degree (mad) formulation instead of the flow formulation to ensure the connectivity of the complexes. We use a special complexe covering all vertices to ensure global connectivity of the solutions. This method can be used either to compute an optimal solution, or to enumerate all optimal solutions, or to enumerate all solutions with cost at most OPT+optimality_gap. It is possible to bound the number of incident edges per vertex using the max_degree parameter. This is independent of the chosen formulation (mad or flow). INPUTS: - complexes -- dictionary keyed by the label of the complexes. Each item is a subset of proteins forming a complexe. - G -- (default: None) undirected graph containing all potential edges. By default, G is the complete undirected graph without loops or multiple edges. - weight -- (default: None) dictionary associating to edge e a weigth. We must have weight[e]>=0. By default, a unit weight is assumed. - max_degree -- (default: None) integral value upper bounding the degree of the vertices in the solution (number of edges incident to a single vertex). - algorithm -- (default: 'flow') name of the algorithm to be executed among: - 'flow': returns an optimal solution using the flow formulation. - 'mad': returns an optimal solution using the mad formulation. - 'flow_all' and 'mad_all': return all optimal solutions using either the flow or the mad formulation. - 'flow_gap' and 'mad_gap': return all solutions whose cost differ by at most optimality_gap of the optimal one, using either the flow or the mad formulation. - optimality_gap -- (default: 0) this parameter is used only when the specified algorithm ends with '_gap' to return all solutions whose cost differ by at most optimality_gap of the optimal one. - known_solutions -- (default: []) computed solutions must be different from already known solutions. - solver -- (default: None) Allows for specifying the MILP solver (GLPK, Cplex, etc.) - verbose -- (default: False) display some informations when sets to True. OUTPUT: A list of triples (cost, list of edges, computation time).
def SBL.MinimumConnectivityInference.MCI_MILP_all_with_decomposition | ( | complexes, | |
solver = None , |
|||
verbose = True |
|||
) |
This function is a front-end to MCI_MILP that speeds up the computation of all optimal solutions if some vertices are in all complexes. EXAMPLES: Assuming we have a file 'blop.txt' containing:: BEGIN-OLIGOMERS 3 a b a b d e a b d e g h END-OLIGOMERS sage: complexes = read_complexes_from_file('blop.txt') sage: complexes {0: set(['a', 'b']), 1: set(['a', 'b', 'e', 'd']), 2: set(['a', 'b', 'e', 'd', 'g', 'h'])} sage: R1 = MCI_MILP_all_with_decomposition(complexes, verbose=False) sage: len(R1) 192 sage: R2 = MCI_MILP(complexes, algorithm='flow_all', verbose=False) sage: len(R1)==len(R2) True sage: compare_solution_sets(R1,R2) True Using a hard real instance:: sage: complexes = read_complexes_from_file('../data/taverner_hernandez_carolinson_ACS_2008_msms_mcci.txt') sage: res = MCI_MILP_all_with_decomposition(complexes, verbose=True) Identical vertices: ['Dis3', 'Mtr3', 'Rrp41', 'Rrp42', 'Rrp45'] 125 and 3 blocks for building solutions. Current count: 546875 solutions and 0 rejected. 3 and 1296 blocks for building solutions. Current count: 546875 solutions and 263538 rejected. 546875 solutions have been constructed, and 263538 have been rejected. 0 duplicated solutions. Final number of solutions: 546875
def SBL.MinimumConnectivityInference.pretty_print_one_sol | ( | complexes, | |
cost, | |||
edges, | |||
sol_number = None , |
|||
trees = True |
|||
) |
def SBL.MinimumConnectivityInference.pretty_print_solutions | ( | complexes, | |
solutions, | |||
in_filename = 'Unnamed instance' , |
|||
out_filename = None , |
|||
G = None , |
|||
algorithm = None |
|||
) |
Print the results either in the standard output (default) or in the specified file (out_filename).
def SBL.MinimumConnectivityInference.read_complexes_from_file | ( | filename | ) |
Read complexes from input file assuming that the sets are given between a 'BEGIN-OLIGOMERS' tag and a 'END-OLIGOMERS' tag. BEGIN-OLIGOMERS 15 # some number indicating the number of complexe sets Csn1 Csn2 Csn3 Csn4 Csn6 Csn7 Csn8 Csn1 Csn3 Csn4 Csn5 Csn6 Csn7 Csn8 ... Csn4 Csn6 Csn7 Csn3 Csn8 END-OLIGOMERS