![](//upload.wikimedia.org/wikipedia/commons/thumb/7/7e/Essay.svg/50px-Essay.svg.png) | Finished writing a draft article? Are you ready to request review of it by an experienced editor for possible inclusion in Wikipedia? Submit your draft for review! |
Probalign is a sequence alignment tool that calculates a maximum expected accuracy alignment using partition function posterior probabilities.[1] Base pair probabilities are estimated using an estimate similar to Boltzmann distribution. The partition function is calculated using a dynamic programming approach.
Algorithm[edit]
The following describes the algorithm used by probalign to determine the base pair probabilities.[2]
Alignment score[edit]
To score an alignment of two sequences two things are needed:
- a similarity function
(e.g. PAM, BLOSUM,...)
- affine gap penalty:
![{\displaystyle g(k)=\alpha +\beta k}](https://wikimedia.org/api/rest_v1/media/math/render/svg/28908bc6f7a06a7f851ced05be76f3a6ebc7b562)
The score
of an alignment a is defined as:
Now the boltzmann weighted score of an alignment a is:
Where
is a scaling factor.
The probability of an alignment assuming boltzmann distribution is given by
Where
is the partition function, i.e. the sum of the boltzmann weights of all alignments.
Dynamic Programming[edit]
Let
denote the partition function of the prefixes
and
. Three different cases are considered:
the partition function of all alignments of the two prefixes that end in a match.
the partition function of all alignments of the two prefixes that end in an insertion
.
the partition function of all alignments of the two prefixes that end in a deletion
.
Then we have:
Initialization[edit]
The matrixes are initialized as follows:
![{\displaystyle Z_{0,j}^{M}=Z_{i,0}^{M}=0}](https://wikimedia.org/api/rest_v1/media/math/render/svg/8f3c321550e83f42ba62ff136d6382326cd0f451)
![{\displaystyle Z_{0,0}^{M}=1}](https://wikimedia.org/api/rest_v1/media/math/render/svg/2bfd47b443a0fc928f65cf05d8459b421044a08a)
![{\displaystyle Z_{0,j}^{D}=0}](https://wikimedia.org/api/rest_v1/media/math/render/svg/e4d5e861888485e6b74be711077001913b4dc664)
![{\displaystyle Z_{i,0}^{I}=0}](https://wikimedia.org/api/rest_v1/media/math/render/svg/82b4c39591c2004933fdaf23aa271128a5b87e5b)
Recursion[edit]
The partition function for the alignments of two sequences
and
is given by
, which can be recursively computed:
![{\displaystyle Z_{i,j}^{M}=Z_{i-1,j-1}+\sigma (x_{i},y_{j})}](https://wikimedia.org/api/rest_v1/media/math/render/svg/911a99fb6dea33d8b8f3a034fd53575acad415f9)
![{\displaystyle Z_{i,j}^{D}=Z_{i-1,j}^{D}\cdot e^{\frac {\beta }{T}}+Z_{i-1,j}^{M}\cdot e^{\frac {g(1)}{T}}+Z_{i-1,j}^{I}+e^{\frac {g(1)}{T}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/f68e7740cdd150cd8406c369d97133a5658eb593)
analogously
Base pair probability[edit]
Finally the probability that positions
and
form a base pair is given by:
References[edit]
- ^ U. Roshan and D. R. Livesay, Probalign: multiple sequence alignment using partition function posterior probabilities, Bioinformatics, 22(22):2715-21, 2006 (PDF)
- ^ Lecture "Bioinformatics II" at University of Freiburg [1]
See also[edit]
External Links[edit]
Probalign Webservice