Structural Bioinformatics Library
Template C++ / Python API for developing structural bioinformatics applications.
User Manual

Authors: F. Cazals and L. Goldenberg

Random_generation

Goals

This package provides elementary functions to generate mixtures of points in a Euclidean space of fixed dimension.

It provides individual generators that return numpy ndarrays, which can be combined (concatenated) to define mixtures.

Individual generators


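Class Randgen_mixture_of_gaussian_generic_d. Consider a general d-dimensional Gaussian distribution $X\sim \normalD{\mu, \Sigma}$, with symmetric positive definite covariance matrix $\Sigma$. Matrix $\Sigma$ can be decomposed as $\Sigma = U\Lambda \transpose{U}$, with $U$ an orthogonal matrix whose columns are eigenvectors of $\Sigma$, and $\Lambda$ the diagonal matrix of the corresponding eigenvalues.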

Let $Z\sim \normalD{0, I_d}$ be an isotropic normal distribution centered at the origin.

One has $X = B Z+\mu$, with $B = U \Lambda^{1/2}$. Indeed, $B Z$ is Gaussian with covariance $B \transpose{B} = U\Lambda \transpose{U} = \Sigma$.

The previous equation is used to draw samples from such a distribution $X$, given $\mu$ and $\Sigma$.

This class samples from d-dimensional Gaussian distributions in this manner.
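The construction $X = BZ + \mu$ can be illustrated with a minimal numpy sketch. The function name sample_gaussian_generic and its signature are hypothetical and chosen for illustration only; they are not the interface of the class above.

```python
import numpy as np

def sample_gaussian_generic(mu, sigma, n, rng=None):
    """Draw n samples from N(mu, sigma) using X = B Z + mu, with B = U Lambda^{1/2}.

    Hypothetical helper, not part of the SBL API.
    """
    rng = np.random.default_rng() if rng is None else rng
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # Eigendecomposition of the symmetric matrix: sigma = U diag(lam) U^T
    lam, U = np.linalg.eigh(sigma)
    # Guard against tiny negative eigenvalues caused by round-off
    lam = np.clip(lam, 0.0, None)
    # Broadcasting scales column j of U by sqrt(lam[j]), i.e. B = U Lambda^{1/2}
    B = U * np.sqrt(lam)
    # Each row of Z is one sample of the isotropic normal N(0, I_d)
    Z = rng.standard_normal((n, mu.size))
    # Row-wise version of x = B z + mu
    return Z @ B.T + mu

mu = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
X = sample_gaussian_generic(mu, sigma, 50000, np.random.default_rng(0))
```

With a large sample, the empirical mean and covariance of X should be close to mu and sigma, which gives a quick sanity check of the decomposition.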


Class Randgen_mixture_of_gaussian_axis_aligned_d. This class samples from d-dimensional Gaussian distributions, defined from their centers and their vectors of variances. (NB: no covariances in this version.)

Based on the np.random.multivariate_normal numpy function.
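As an illustration, an axis-aligned Gaussian corresponds to a diagonal covariance matrix built from the vector of variances. The sketch below uses the multivariate_normal method of a numpy Generator, the seeded counterpart of np.random.multivariate_normal; the helper name sample_gaussian_axis_aligned is hypothetical.

```python
import numpy as np

def sample_gaussian_axis_aligned(center, variances, n, rng=None):
    """Draw n samples from a Gaussian with diagonal covariance diag(variances).

    Hypothetical helper, not part of the SBL API.
    """
    rng = np.random.default_rng() if rng is None else rng
    center = np.asarray(center, dtype=float)
    variances = np.asarray(variances, dtype=float)
    # No covariances: off-diagonal entries of the covariance matrix are zero
    cov = np.diag(variances)
    return rng.multivariate_normal(center, cov, size=n)

points = sample_gaussian_axis_aligned(
    [5.0, 5.0, 5.0], [2.0, 2.0, 20.0], 20000, np.random.default_rng(0)
)
```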


Class Randgen_mixture_of_uniform_in_box_d. A d-dimensional box is defined by its center and its width along each direction. For example, the following defines a 3D box that reduces to a square centered on the origin, since the width along the third direction is zero:

center=[0,0,0],  width=[1,1,0]

The class SBL::Randgen_mixture_of_uniform_in_box_d generates points uniformly at random in such boxes, based on the np.random.uniform numpy function.

Note that the interface is via two np.ndarrays (one for centers, one for widths) in the same Euclidean space; that is, the dimension of these arrays must be the same.

To generate mixtures when the dimensions vary, see Randgen_affine_mixture_staircase and Randgen_mixture.

The main parameters of the constructor are the centers and the associated widths.
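A minimal sketch of uniform sampling in one such box follows. It assumes that the width is the full side length along each direction, so the box spans center - width/2 to center + width/2; the text above does not state this convention explicitly. The function name sample_uniform_in_box is hypothetical.

```python
import numpy as np

def sample_uniform_in_box(center, width, n, rng=None):
    """Draw n points uniformly in the box defined by center and width.

    Hypothetical helper; assumes width is the full extent along each axis.
    """
    rng = np.random.default_rng() if rng is None else rng
    center = np.asarray(center, dtype=float)
    width = np.asarray(width, dtype=float)
    if center.shape != width.shape:
        raise ValueError("center and width must have the same dimension")
    low = center - width / 2.0
    high = center + width / 2.0
    # low and high broadcast over the n rows; a zero width yields a constant coordinate
    return rng.uniform(low, high, size=(n, center.size))

# The degenerate 3D box of the example: a square in the plane z = 0
square = sample_uniform_in_box([0, 0, 0], [1, 1, 0], 1000, np.random.default_rng(0))
```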


Class Randgen_cross_d. In the d-dimensional Euclidean space, let $I_i(c, diam)$ be the line segment of length $2*diam$ centered at point $c$, and aligned on the i-th coordinate axis. This class generates points at random on the union of segments $  \cup_{i=1,\dots,d} I_i(c, diam)$.

It relies on Randgen_mixture_of_uniform_in_box_d, each segment being represented as a box of null thickness except in one direction.

The main parameters of the constructor are the center and the span (equal along each axis).
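The geometry of the cross can be sketched directly in numpy. Note that this sketch does not go through degenerate boxes as the class does: it picks one of the $d$ segments per point and then draws an offset in $[-diam, diam]$ along that axis. Choosing the segments with equal probability is an assumption, and the name sample_cross is hypothetical.

```python
import numpy as np

def sample_cross(center, diam, n, rng=None):
    """Draw n points on the union of d axis-aligned segments of length 2*diam.

    Hypothetical helper; assumes each segment is chosen with equal probability.
    """
    rng = np.random.default_rng() if rng is None else rng
    center = np.asarray(center, dtype=float)
    d = center.size
    # For each point, pick the coordinate axis carrying its segment
    axes = rng.integers(0, d, size=n)
    # Signed offset from the center along the chosen axis
    offsets = rng.uniform(-diam, diam, size=n)
    points = np.tile(center, (n, 1))
    points[np.arange(n), axes] += offsets
    return points

cross_center = np.array([-3.0, -3.0, -3.0])
cross = sample_cross(cross_center, 10, 100, np.random.default_rng(0))
```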


Class Randgen_affine_mixture_staircase. Consider an affine mixture with $k$ components, each defined by a box of a given dimension (see above). Denote $D=d_1+\dots+d_k$ the sum of the dimensions of all components.

This class samples each component in its proper dimension $d_i$, but embeds all points in dimension $D$.
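One possible reading of this embedding, suggested by the name staircase, is that component $i$ occupies its own block of $d_i$ consecutive coordinates, all other coordinates being zero. The sketch below implements that reading; the layout is an assumption, not a description of the class internals, and embed_staircase is a hypothetical name.

```python
import numpy as np

def embed_staircase(components):
    """Stack samples of shapes (n_i, d_i) into an array of shape (sum n_i, sum d_i).

    Hypothetical helper; assumes component i fills coordinates
    d_1 + ... + d_{i-1} up to d_1 + ... + d_i - 1, zeros elsewhere.
    """
    D = sum(c.shape[1] for c in components)
    N = sum(c.shape[0] for c in components)
    out = np.zeros((N, D))
    row = col = 0
    for c in components:
        n_i, d_i = c.shape
        out[row:row + n_i, col:col + d_i] = c
        row += n_i
        col += d_i
    return out

a = np.ones((2, 1))        # two points of a 1D component
b = 2.0 * np.ones((3, 2))  # three points of a 2D component
stair = embed_staircase([a, b])
```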

General mixtures


Class Randgen_mixture_component. A class storing the main parameters of a mixture component (type, center, width, noise level, etc).


Class Randgen_mixture. A class to generate data points using an unweighted mixture of components. It reads a file to load the parameters of these individual components.

The following keywords are sought to generate points accordingly (see example below):

boxd, crossd, gaussiand-aa, gaussiand

If the dimension of the individual components is not the same, let $D = \max_i d_i$ be the largest value amongst all components. In that case, for a component whose dimension is $d_i$, the samples are embedded into $\Rd{D}$ by padding with zeros the missing $D-d_i$ directions.
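The zero-padding rule can be sketched as follows. The sketch assumes that the missing directions are the trailing coordinates; pad_and_stack is a hypothetical name.

```python
import numpy as np

def pad_and_stack(components):
    """Pad samples of shapes (n_i, d_i) with trailing zeros up to D = max d_i, then stack.

    Hypothetical helper, not part of the SBL API.
    """
    D = max(c.shape[1] for c in components)
    # np.pad defaults to constant padding with zeros
    padded = [np.pad(c, ((0, 0), (0, D - c.shape[1]))) for c in components]
    return np.vstack(padded)

planar = np.ones((4, 2))      # a 2D component
spatial = np.full((3, 3), 7.0)  # a 3D component
cloud = pad_and_stack([planar, spatial])
```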

Example mixture specification in 3D:

# segment
model=boxd center=0,3,0 width=0,0,1  n=100
# square
model=boxd center=0,0 width=1,1  n=1000
# 3D box
model=boxd center=3,0,0 width=1,1,2  n=100
# cross
model=crossd center=-3,-3,-3 diam=10 n=100
# 2D Gaussian
model=gaussiand-aa center=-5,-5 variances=1,1 n=500
# 3D Gaussian
model=gaussiand-aa center=5,5,5 variances=2,2,20 n=500
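To make the file format concrete, here is a minimal sketch of how such lines could be parsed into dictionaries. It is not the parser used by the package: the function parse_spec and the resulting dictionary layout are hypothetical, and the sketch only assumes what the example shows, namely whitespace-separated key=value tokens, comma-separated numeric lists, and comment lines starting with #.

```python
def parse_spec(text):
    """Parse mixture specification lines into a list of dictionaries.

    Hypothetical helper, not the parser shipped with the package.
    """
    components = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments
        if not line or line.startswith("#"):
            continue
        comp = {}
        for token in line.split():
            key, value = token.split("=", 1)
            if key == "model":
                comp[key] = value
            elif key == "n":
                comp[key] = int(value)
            else:
                # center, width, variances, diam: comma-separated floats
                comp[key] = [float(v) for v in value.split(",")]
        components.append(comp)
    return components

spec = """# segment
model=boxd center=0,3,0 width=0,0,1  n=100
# cross
model=crossd center=-3,-3,-3 diam=10 n=100
"""
parsed = parse_spec(spec)
```

In this sketch a scalar such as diam is stored as a one-element list, which keeps all numeric fields uniform.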

To generate the point cloud, run

sbl-random-generators.py -s mixture-spec-file.txt

An example mixture with 6 components. Nature of components: boxes of dimension 1, 2, 3; one cross of three segments; one 2D Gaussian; one 3D Gaussian.

External dependencies

This package is developed using numpy.

Suggestions to visualize the point clouds generated are: