Structural Bioinformatics Library
Template C++ / Python API for developing structural bioinformatics applications.
Authors: R. Tetley and F. Cazals
When processing large numbers of protein sequences, it is useful to have indicators on certain sites of interest, such as transmembrane segments or binding sites. For example, one may want to find proteins which contain a transmembrane region that is at least 30 amino acids long. This is the purpose of sequence annotations, which allow users to search for given characteristics in annotated sequences. We provide a set of two Python modules: one which defines annotated sequences and provides some standard annotators, and one which filters a set of protein sequences using properties of these annotations.
As an example, the EFF1 protein (http://www.uniprot.org/uniprot/G5ECA1) contains:
The Python module SBL/Sequence_annotators.py provides the SBL::Sequence_annotators::Annotated_sequence class. Such an object should be initialized with a name as well as a fasta sequence. An annotator should then be used to add sequence annotations.
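To make the idea of an annotated sequence concrete, here is a minimal sketch of such an object: a name, a fasta sequence, and a list of features added afterwards by an annotator. The classes `AnnotatedSequence` and `Feature`, their fields, and the 1-based inclusive range convention are hypothetical stand-ins chosen for illustration; they are not the actual interface of SBL::Sequence_annotators::Annotated_sequence.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Feature:
    """One annotation: a feature name over a residue range (1-based, inclusive; assumed convention)."""
    name: str
    start: int
    end: int

    def length(self) -> int:
        # Inclusive range, so both endpoints count.
        return self.end - self.start + 1


@dataclass
class AnnotatedSequence:
    """Hypothetical stand-in: a name, a fasta sequence, and the features attached to it."""
    name: str
    fasta: str
    features: List[Feature] = field(default_factory=list)

    def add_feature(self, name: str, start: int, end: int) -> None:
        # Reject ranges that do not fit inside the sequence.
        if not (1 <= start <= end <= len(self.fasta)):
            raise ValueError(f"invalid range {start}-{end} for {self.name}")
        self.features.append(Feature(name, start, end))

    def features_named(self, name: str) -> List[Feature]:
        # All features carrying the given name, in insertion order.
        return [f for f in self.features if f.name == name]


# Usage: an annotator would call add_feature once per predicted region.
seq = AnnotatedSequence("toy", "MKTLLVLAVLLLAVAQA")
seq.add_feature("Signal_peptide", 1, 16)
print(seq.features_named("Signal_peptide"))
```

Keeping the features in a plain list keeps the object independent of any particular annotator, so several annotators can contribute features to the same sequence.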
Phobius is a combined transmembrane topology and signal peptide prediction method [104], based on profile hidden Markov models.
The Python module SBL/Sequence_annotators.py provides the class SBL::Sequence_annotators::Phobius_annotator, which uses the Phobius executable to annotate a sequence.
An annotator has a single function, annotate, which takes an annotated sequence object as argument and produces annotations by exploiting its fasta sequence.
For example, the Phobius annotator, when given an annotated sequence object, writes its fasta sequence to a file, runs the executable, and parses the results, which are then used to annotate the sequence.
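The write / run / parse cycle described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the SBL Phobius_annotator: the class `ExternalToolAnnotator`, the helper `parse_ft_lines`, and the command it is constructed with are hypothetical, and the parser assumes the tool prints UniProt-style feature lines of the form `FT  KEY  start  end ...` (check the actual output of your Phobius installation before relying on this).

```python
import os
import subprocess
import tempfile
from typing import List, Sequence, Tuple

# (feature key, start, end) triples; plain tuples keep the sketch independent of any class.
Annotation = Tuple[str, int, int]


def parse_ft_lines(text: str) -> List[Annotation]:
    """Parse 'FT  KEY  start  end ...' lines (assumed output format) into annotations."""
    annotations = []
    for line in text.splitlines():
        fields = line.split()
        # Keep only well-formed feature lines: 'FT', a key, and two integer positions.
        if (len(fields) >= 4 and fields[0] == "FT"
                and fields[2].isdigit() and fields[3].isdigit()):
            annotations.append((fields[1], int(fields[2]), int(fields[3])))
    return annotations


class ExternalToolAnnotator:
    """Hypothetical annotator: write the sequence to a FASTA file, run a tool, parse its output."""

    def __init__(self, command: Sequence[str]):
        # The executable name and any flags are assumptions; the FASTA path is appended last.
        self.command = list(command)

    def annotate(self, seq) -> None:
        """Extend seq.annotations; seq is assumed to expose .name, .fasta and an .annotations list."""
        fd, path = tempfile.mkstemp(suffix=".fasta")
        try:
            with os.fdopen(fd, "w") as handle:
                handle.write(f">{seq.name}\n{seq.fasta}\n")
            result = subprocess.run(self.command + [path],
                                    capture_output=True, text=True, check=True)
            seq.annotations.extend(parse_ft_lines(result.stdout))
        finally:
            # Always clean up the temporary file, even if the tool fails.
            os.remove(path)
```

Separating parsing from execution means the parser can be checked on captured output without the executable being installed.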
Through the SBL/Sequence_filters.py module, we provide a set of filters which select, from a set of annotated sequences, those whose annotations satisfy given criteria.
A sequence filter object is a functor providing the filter member function, which takes an annotated sequence as argument and returns True if the given sequence satisfies the filter's criteria.
For example, the SBL::Sequence_filters::Transmembrane_filter class will simply look for a "Transmembrane" feature in the annotated sequence.
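As a sketch of the functor idea, the hypothetical class below extends the transmembrane check with a minimum length, which covers the query mentioned at the beginning of this section (a transmembrane region at least 30 amino acids long). The class name, its parameters, and the 1-based inclusive ranges are assumptions; for brevity, `filter` takes the list of `(feature, start, end)` annotations directly rather than an annotated sequence object.

```python
from typing import Iterable, Tuple

# (feature name, start, end), 1-based inclusive ranges (assumed convention).
Annotation = Tuple[str, int, int]


class MinLengthFeatureFilter:
    """Hypothetical filter functor: accept sequences carrying a feature of at least min_length residues."""

    def __init__(self, feature: str = "Transmembrane", min_length: int = 1):
        self.feature = feature
        self.min_length = min_length

    def filter(self, annotations: Iterable[Annotation]) -> bool:
        # True as soon as one matching feature is long enough.
        return any(
            name == self.feature and end - start + 1 >= self.min_length
            for name, start, end in annotations
        )

    def __call__(self, annotations: Iterable[Annotation]) -> bool:
        # Calling the object is equivalent to calling its filter member function.
        return self.filter(annotations)


# Usage: keep sequences with a transmembrane region of at least 30 residues.
long_tm = MinLengthFeatureFilter("Transmembrane", 30)
print(long_tm.filter([("Transmembrane", 10, 39)]))
```

With min_length left at 1, the filter reduces to the plain presence test described for the Transmembrane filter.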
In this example, we parse a fasta file and annotate each sequence it lists using the Phobius annotator.
We then use the SBL::Sequence_filters::Class_II_filter class to assess whether each sequence is a viable class II fusion protein candidate.
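The overall parse / annotate / filter loop can be sketched as follows. The functions `read_fasta` and `select_candidates` are hypothetical helpers written for illustration; `annotate` and `keep` are placeholder callables standing in for an annotator and a filter. In particular, the criteria used by SBL::Sequence_filters::Class_II_filter are not described here, so they are not reproduced.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Annotation = Tuple[str, int, int]  # (feature name, start, end), as in the earlier sketches


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs; the name is the first word after '>'."""
    name, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)  # sequences may span several lines
    if name is not None:
        yield name, "".join(chunks)


def select_candidates(lines: Iterable[str],
                      annotate: Callable[[str, str], List[Annotation]],
                      keep: Callable[[List[Annotation]], bool]) -> List[str]:
    """Annotate every record and return the names of those accepted by keep."""
    selected = []
    for name, sequence in read_fasta(lines):
        if keep(annotate(name, sequence)):
            selected.append(name)
    return selected
```

A typical call would pass an open file handle, e.g. `with open("proteins.fasta") as handle: select_candidates(handle, my_annotate, my_filter)`, where the file name and both callables are hypothetical.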