# Batch_manager: Managing multiple runs

## Specifying the Runs of a Batch

This first example shows how to specify a single run of an application that computes the volume of a molecular structure, and then how to run the batch. In this example, all options are passed directly to the batch manager.

In [1]:
from SBL.Batch_manager import *
print("Marker : Calculation Started")
batch = BM_Batch()
batch.add_run_specification(BM_Run_specification_tuple("sbl-vorlume-pdb.exe"). \
                            add_option("output-prefix").add_option("log").add_file_option("f", "data/1vfb.pdb"))
batch.run_specifications.run("results", 1)    
print("Marker : Calculation Ended")


Marker : Calculation Started
Running : /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-vorlume-pdb.exe --output-prefix --log -f/home/fcazals/projects/proj-soft/sbl/Applications/Batch_manager/demos/data/1vfb.pdb
Marker : Calculation Ended
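
The `Running :` line above suggests how options end up on the command line: a single-letter option is written `-f<value>`, a longer one `--name` (and `--name=<value>` in the logs of the later examples). The following sketch reproduces that convention in plain Python. It is inferred from the logged commands only and is not the Batch_manager implementation; `render_option` and `render_command` are hypothetical helper names.

```python
def render_option(name, value=None):
    """Render one option in the style of the 'Running :' log lines."""
    if len(name) == 1:
        # Short option: the value is glued to the flag, e.g. -p4.
        return f"-{name}" if value is None else f"-{name}{value}"
    # Long option: --name, or --name=value when a value is given.
    return f"--{name}" if value is None else f"--{name}={value}"


def render_command(executable, options):
    """Join an executable and a list of (name, value) pairs into one command line."""
    return " ".join([executable] + [render_option(n, v) for n, v in options])


command = render_command(
    "sbl-vorlume-pdb.exe",
    [("output-prefix", None), ("log", None), ("f", "data/1vfb.pdb")],
)
print(command)
```

The logged command uses absolute paths for the executable and for the input file; the sketch keeps the relative names as they were given.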


## Specifying a Batch over a Dataset using one Specification File

In general, it is more convenient to store all options in a specification file.
This example shows how to specify a batch using a run specification file and a dataset, how to split the batch, and how to run the resulting batches. The specification file corresponds to the computation of the volume of a molecular structure. The content of the specification file
<font color='green'>batch-vorlume-pdb.spec</font> is shown below:

   
```
#The executable.
EXECUTABLE sbl-vorlume-pdb.exe
#Any .pdb file from the dataset is eligible for the option -f, as specified by the Python regex.
#NB: the dataset name is specified when constructing the batch in the python script.
IFO f "\.pdb$"
#There is a unique IFO, with one execution per value i.e. file.
IFO-ASSOC-RULE unary
#These are the Non File Options for the executable.
NFO log
NFO output-prefix
NFO verbose
NFO no-water
NFO radius 0 1.4
NFO p 4
```
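
The regex attached to the `IFO` line decides which files of the dataset are passed to option `-f`. The sketch below applies the same pattern to a list of file names. It assumes `re.search` semantics, since the pattern is anchored at the end of the name only; `notes.txt` and `1vfb.pdb.bak` are hypothetical names added to show rejected files.

```python
import re

# Pattern copied from the IFO line of batch-vorlume-pdb.spec.
pattern = re.compile(r"\.pdb$")

candidates = ["1vfb.pdb", "1igt.pdb", "1urz.pdb", "notes.txt", "1vfb.pdb.bak"]

# Keep the names in which the pattern is found; "$" rejects "1vfb.pdb.bak".
selected = [name for name in candidates if pattern.search(name)]
print(selected)
```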


## This specification file is used as follows from Python

In [2]:
from SBL.Batch_manager import *
print("Marker : Calculation Started")
batch = BM_Batch()
batch.load_dataset("data", ".*", True)
if batch.load_run_specification("batch-vorlume-pdb.spec"):
    batches = batch.split_per_NFO()
    for b in batches:
        b.run()
print("Marker : Calculation Ended")


Marker : Calculation Started
Loading the run specification from batch-vorlume-pdb.spec
Building one batch per NFO tuple...
Building the batch for sbl-vorlume-pdb.exe...
Building the batch for sbl-vorlume-pdb.exe...
Running : /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-vorlume-pdb.exe -f/home/fcazals/projects/proj-soft/sbl/Applications/Batch_manager/demos/data/1vfb.pdb --log --output-prefix --verbose --no-water --radius=0 -p4
Running : /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-vorlume-pdb.exe -f/home/fcazals/projects/proj-soft/sbl/Applications/Batch_manager/demos/data/1igt.pdb --log --output-prefix --verbose --no-water --radius=0 -p4
Running : /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-vorlume-pdb.exe -f/home/fcazals/projects/proj-soft/sbl/Applications/Batch_manager/demos/data/1urz.pdb --log --output-prefix --verbose --no-water --radius=0 -p4
Running : /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-vorlume-pdb.exe -f/home/fcazals/p
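
The log reports one batch per NFO tuple and prints two `Building the batch` lines, which is consistent with `radius` being the only NFO with two values (0 and 1.4). A plausible reading, assumed here rather than taken from the Batch_manager sources, is that the NFO tuples are the Cartesian product of the values listed for each NFO. The sketch below enumerates them; flags without a value are represented by `None`.

```python
from itertools import product

# NFO lines of batch-vorlume-pdb.spec; None stands for a flag without value.
nfo_values = {
    "log": [None],
    "output-prefix": [None],
    "verbose": [None],
    "no-water": [None],
    "radius": [0, 1.4],
    "p": [4],
}

# One dictionary per combination of values, in the order of the spec file.
names = list(nfo_values)
nfo_tuples = [dict(zip(names, combo)) for combo in product(*nfo_values.values())]

for nfo in nfo_tuples:
    print(nfo)
```

With a second value on another line, say a hypothetical `NFO p 4 8`, the same product would yield four tuples instead of two.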

## Coupling Data Generation and Processing with two Specification Files
In this example, Batch_manager is first used to generate data using one program (dumping a collection of balls into a file), and then to process these data using a second program (computing the volume of the union of balls generated).

This pipeline requires two specification files, namely one to generate the input balls, and one to compute the volumes.
The specification file 
<font color='green'>batch-generate-balls-3.spec</font>
to generate balls goes as follows: 

```
#The executable.
EXECUTABLE generate-random-balls-3.py
#There is no Input File Option.
IFO-ASSOC-RULE none
#The number and max-radius parameters are specified by pairs of values: five pairs here.
NFO (number, max-radius) (100, 1) (200, 2) (500, 3) (1000, 4) (10000, 5)
#The centers of the 3D balls are generated in [0,10] x [0,10] x [0,10].
NFO box-size 10

```
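
The two `NFO` forms used above (a single name followed by its values, and a parenthesised tuple of names followed by tuples of values) can be told apart with a few lines of Python. This is only a sketch of how such lines could be read, not the parser used by Batch_manager; `parse_nfo` is a hypothetical helper and values are kept as strings.

```python
import re

# Matches the content of one parenthesised group, e.g. "100, 1".
GROUP = re.compile(r"\(([^)]*)\)")


def parse_nfo(line):
    """Parse one NFO line into (option names, list of value tuples)."""
    body = line.strip()
    if not body.startswith("NFO "):
        raise ValueError(f"not an NFO line: {line!r}")
    body = body[len("NFO "):].strip()
    if body.startswith("("):
        # Paired form: the first group names the options,
        # the following groups give their values, tuple by tuple.
        groups = [tuple(x.strip() for x in g.split(",")) for g in GROUP.findall(body)]
        return groups[0], groups[1:]
    # Plain form: one name followed by zero or more alternative values.
    name, *values = body.split()
    return (name,), [(v,) for v in values]


names, values = parse_nfo(
    "NFO (number, max-radius) (100, 1) (200, 2) (500, 3) (1000, 4) (10000, 5)"
)
print(names, len(values))
print(parse_nfo("NFO box-size 10"))
```

In the paired form the values are not combined with each other: the five pairs give five runs, not twenty-five.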

The computation of the volumes is specified as follows 
in <font color='green'>batch-vorlume-txt.spec</font>:

```
#The executable.
EXECUTABLE sbl-vorlume-txt.exe
#Any .txt file from the dataset is eligible for the option -f, as specified by the Python regex.
#NB: the dataset name is specified when constructing the batch in the python script.
IFO f "\.txt$"
#There is a unique IFO, with one execution per value i.e. file.
IFO-ASSOC-RULE unary
#These are the Non File Options for the executable.
NFO log
NFO output-prefix
NFO verbose
```

## As before, these two spec files are used by the batch manager:

In [3]:
from SBL.Batch_manager import *
print("Marker : Calculation Started")
# Stage 1: generate the balls, with one batch per NFO tuple.
batch_data = BM_Batch()
batch_data.load_run_specification("batch-generate-balls-3.spec")
output_dirs = []

for b in batch_data.split_per_NFO():
    # Repeat each batch 10 times and record where its output was written.
    b.repeat(10)
    output_dirs.append(b.get_output_directory())

# Stage 2: compute the volumes, using each output directory as a dataset.
batches = []
for directory in output_dirs:
    batches.append(BM_Batch())
    batches[-1].load_dataset(directory)
    batches[-1].load_run_specification("batch-vorlume-txt.spec")
    batches[-1].run()
print("Marker : Calculation Ended")


Marker : Calculation Started
Loading the run specification from batch-generate-balls-3.spec
Building one batch per NFO tuple...
Building the batch for generate-random-balls-3.py...
Building the batch for generate-random-balls-3.py...
Building the batch for generate-random-balls-3.py...
Building the batch for generate-random-balls-3.py...
Building the batch for generate-random-balls-3.py...
Running : /home/fcazals/projects/proj-soft/sbl/Applications/Batch_manager/demos/generate-random-balls-3.py --box-size=10 --number=100 --max-radius=1
Running : /home/fcazals/projects/proj-soft/sbl/Applications/Batch_manager/demos/generate-random-balls-3.py --box-size=10 --number=200 --max-radius=2
Running : /home/fcazals/projects/proj-soft/sbl/Applications/Batch_manager/demos/generate-random-balls-3.py --box-size=10 --number=500 --max-radius=3
Running : /home/fcazals/projects/proj-soft/sbl/Applications/Batch_manager/demos/generate-random-balls-3.py --box-size=10 --number=1000 --max-radius=4
Running : 

Running : /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-vorlume-txt.exe -f/home/fcazals/projects/proj-soft/sbl/Applications/Batch_manager/demos/generate-random-balls-3-number500-max-radius3/balls-3_instance_8.txt --log --output-prefix --verbose
Running : /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-vorlume-txt.exe -f/home/fcazals/projects/proj-soft/sbl/Applications/Batch_manager/demos/generate-random-balls-3-number500-max-radius3/balls-3_instance_3.txt --log --output-prefix --verbose
Loading the run specification from batch-vorlume-txt.spec
Building the batch for sbl-vorlume-txt.exe...
Running : /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-vorlume-txt.exe -f/home/fcazals/projects/proj-soft/sbl/Applications/Batch_manager/demos/generate-random-balls-3-number1000-max-radius4/balls-3_instance_9.txt --log --output-prefix --verbose
Running : /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-vorlume-txt.exe -f/home/fcazals/projects/proj-soft/sbl/A
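
The coupling pattern used in this example (the output directory of one stage becomes the dataset of the next) can be tried without the SBL executables. The sketch below is a stand-in under stated assumptions: `generate_balls` and `count_balls` are hypothetical functions, the file format (one `x y z r` line per ball) is assumed, and the second stage only counts balls instead of computing the volume of their union.

```python
import random
import tempfile
from pathlib import Path


def generate_balls(out_dir, number, max_radius, box_size=10, repeats=3, seed=0):
    """Stage 1 stand-in: write `repeats` files of random balls."""
    rng = random.Random(seed)
    out_dir.mkdir(parents=True, exist_ok=True)
    for i in range(repeats):
        lines = []
        for _ in range(number):
            x, y, z = (rng.uniform(0, box_size) for _ in range(3))
            r = rng.uniform(0, max_radius)
            lines.append(f"{x} {y} {z} {r}")
        (out_dir / f"balls-3_instance_{i}.txt").write_text("\n".join(lines) + "\n")
    return out_dir


def count_balls(path):
    """Stage 2 stand-in: count the balls stored in one file."""
    return sum(1 for line in path.read_text().splitlines() if line.strip())


with tempfile.TemporaryDirectory() as root:
    # Stage 1: one output directory per (number, max-radius) pair.
    output_dirs = [
        generate_balls(Path(root) / f"balls-number{n}-max-radius{r}", n, r)
        for n, r in [(100, 1), (200, 2)]
    ]
    # Stage 2: each output directory is the dataset of a new processing step.
    results = {
        d.name: [count_balls(f) for f in sorted(d.glob("*.txt"))]
        for d in output_dirs
    }

print(results)
```

Only the list of output directories is passed from one stage to the next, which is what keeps the two specification files independent of each other.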

## Using more Complex Association Rules

In this example, Batch_manager is first used to generate data using one program (dumping a collection of 2D points into a file), and then to process these data using a second program (clustering those points following their proximity in space). Finally, a data comparison is performed (the clusters of points are compared two by two).

If the clustering is run several times with different parameter values, we obtain several clusters files and centers files, so we need to specify how these files are associated. Moreover, if several files of original points are generated, each of them must be associated with a set of tuples of clusters and centers files.

The comparison program takes two graphs as input, each graph being represented by three files:

  - the points file, which specifies the points to be compared; these are the vertices of a graph.

  - the weights file, which assigns a weight to each point (the size of the corresponding cluster).

  - the edges file, which links points (i.e. clusters).

To compare two clusterings, we therefore need to compare all possible combinations of those tuples.

This pipeline requires three specification files:

  - specification of the execution of the random 2D points generator
 <font color='green'> batch-generate-points-2.spec</font>
  
  ```
#The executable.
EXECUTABLE generate-random-points-2.py
#There is no Input File Option.
IFO-ASSOC-RULE none
#N and d are specified separately here; the commented line below shows the paired form.
#NFO (N,d) (10,2) (10,3)
NFO N 1000
NFO d 2 4 6  
  ```
  
  - specification of the execution of the cluster machine
 <font color='green'> batch-cluster.spec</font>
  
  ```
#The executable.
EXECUTABLE sbl-cluster-MTB-euclid.exe
#There is a unique IFO, with one execution per value i.e. file.
IFO-ASSOC-RULE unary
IFO points-file "\.txt$"
NFO num-neighbors 4
NFO persistence-threshold 0.1
# output files with proper prefix
NFO o  
  ```
  
  - specification of the execution of the comparison of the clusters
  <font color='green'>batch-compare-clusters.spec</font>
  
  ```
#The executable.
EXECUTABLE sbl-emd-graph-euclid.exe
#We wish to make all combinations of pairs of the next IFO
IFO-ASSOC-RULE combinatorial 2
IFO (vertex-points-file, vertex-weights-file, edges-file) "N\d+-d\d+.*persistence_0dot\d+" ("points\.txt", "weights\.txt", "edges\.txt")
NFO v
NFO u
NFO with-connectivity-constraints 1 0
  
  ```
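
The `combinatorial 2` rule together with the tuple `IFO` can be pictured as two steps: group the files into (points, weights, edges) tuples sharing a common prefix, then enumerate pairs of tuples. The sketch below follows that reading. It is an interpretation of the spec file, not the Batch_manager code: the file names are hypothetical, `group_tuples` is a hypothetical helper, and whether the tool enumerates unordered pairs exactly as `itertools.combinations` does is an assumption.

```python
import re
from itertools import combinations

# Regexes copied from the IFO line of batch-compare-clusters.spec.
PREFIX = re.compile(r"N\d+-d\d+.*persistence_0dot\d+")
SUFFIXES = {
    "vertex-points-file": re.compile(r"points\.txt"),
    "vertex-weights-file": re.compile(r"weights\.txt"),
    "edges-file": re.compile(r"edges\.txt"),
}

# Hypothetical file names, chosen only so that they match the regexes.
files = [
    f"N1000-d{d}__persistence_0dot1__{kind}.txt"
    for d in (2, 4, 6)
    for kind in ("points", "weights", "edges")
]


def group_tuples(names):
    """Group file names sharing the same prefix into complete option tuples."""
    groups = {}
    for name in names:
        match = PREFIX.match(name)
        if match is None:
            continue
        rest = name[match.end():]
        for option, suffix in SUFFIXES.items():
            if suffix.search(rest):
                groups.setdefault(match.group(0), {})[option] = name
    # Keep only the prefixes for which all three files were found.
    return {p: t for p, t in groups.items() if len(t) == len(SUFFIXES)}


tuples = group_tuples(files)
# "combinatorial 2": each pair of tuples gives one run of the comparison.
pairs = list(combinations(sorted(tuples), 2))
for first, second in pairs:
    print(first, "vs", second)
```

With three tuples this gives three comparisons; each comparison receives six files, three per graph.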

In [4]:
from SBL.Batch_manager import *
print("Marker : Calculation Started")
# Stage 1: generate the random points, splitting the batch on option N.
batch = BM_Batch()
batch.load_run_specification("batch-generate-points-2.spec")
batches_d = batch.split_per_selected_option("N")
odirs_data = []
for b in batches_d:
    b.run()
    odirs_data.append(b.get_output_directory())
# Stage 2: cluster the points found in each generation output directory.
odirs_clust = []
for directory in odirs_data:
    batch = BM_Batch()
    batch.load_dataset(directory, ".*.txt", False)
    batch.load_run_specification("batch-cluster.spec")
    batch.run()
    odirs_clust.append(batch.get_output_directory())
# Stage 3: compare the clusterings found in each clustering output directory.
for directory in odirs_clust:
    batch = BM_Batch()
    batch.load_dataset(directory, ".*.txt", False)
    batch.load_run_specification("batch-compare-clusters.spec")
    batch.run()
print("Marker : Calculation Ended")


Marker : Calculation Started
Loading the run specification from batch-generate-points-2.spec
Building one batch per input option...
Building the batch for generate-random-points-2.py...
Running : /home/fcazals/projects/proj-soft/sbl/Applications/Batch_manager/demos/generate-random-points-2.py -N1000 -d2
Running : /home/fcazals/projects/proj-soft/sbl/Applications/Batch_manager/demos/generate-random-points-2.py -N1000 -d4
Running : /home/fcazals/projects/proj-soft/sbl/Applications/Batch_manager/demos/generate-random-points-2.py -N1000 -d6
Loading the run specification from batch-cluster.spec
Building the batch for sbl-cluster-MTB-euclid.exe...
Running : /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-cluster-MTB-euclid.exe --num-neighbors=4 --persistence-threshold=0.1 -o
Loading the run specification from batch-compare-clusters.spec
Building the batch for sbl-emd-graph-euclid.exe...
Running : /user/fcazals/home/projects/proj-soft/sbl-install/bin/sbl-emd-graph-euclid.exe -v -u -
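
In the log above, splitting on `N` yields a single `Building the batch` line followed by three runs (`-d2`, `-d4`, `-d6`), which is what one would expect if runs are grouped by the value of the selected option. The sketch below illustrates that grouping in plain Python. It is an assumption about the behaviour, not the code of `split_per_selected_option`, and `split_per_option` is a hypothetical helper.

```python
from collections import defaultdict
from itertools import product

# NFO lines of batch-generate-points-2.spec.
nfo_values = {"N": [1000], "d": [2, 4, 6]}

# All runs, as one dictionary of option values per run.
runs = [dict(zip(nfo_values, combo)) for combo in product(*nfo_values.values())]


def split_per_option(all_runs, option):
    """Group runs into batches that share the same value of `option`."""
    batches = defaultdict(list)
    for run in all_runs:
        batches[run[option]].append(run)
    return dict(batches)


by_N = split_per_option(runs, "N")
by_d = split_per_option(runs, "d")
print(len(by_N), "batch when splitting on N")
print(len(by_d), "batches when splitting on d")
```

Splitting on `d` instead would give three batches of one run each, and hence three separate output directories for the next stage.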