Multiple alignment

12/26/2023

There is a need for an objective approach to evaluating the alignments produced by alignment programs. Two popular measures for scoring entire multiple alignments are the sum of pairs (SP) score and the column score (CS). The SP score calculates the proportion of identically aligned residue pairs in the test and the reference alignments, whereas the CS score measures the fraction of identically aligned positions. Several modifications have been made to the SP score. These scores can, however, only be used if a reference alignment of the same sequences is available. The APDB (Analyze alignments with PDB) quality measure evaluates the quality of an alignment by using available tertiary structures of the sequences in the alignment. The recently introduced multiple overlap score (MOS) is a promising approach that does not need a reference alignment. The MOS searches for identically aligned regions in many alignments and presumes that the alignment with the highest number of such residues also has the highest quality. We introduce a statistical alignment quality score that first quantifies the degree of conservation at each alignment position and then counts the number of significantly conserved positions over the alignment. To measure the degree of conservation, we use a type of Z-score that is based on profile analysis.
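The two reference-based measures described above can be illustrated with a short sketch. It assumes that an alignment is a list of equal-length strings with "-" as the gap character, and that both scores are normalized by the reference alignment, which is a common convention but is not stated in the text. The function names are illustrative and do not come from any alignment package.

```python
from itertools import combinations

GAP = "-"


def column_signatures(alignment):
    """Describe each column by the residue index of every sequence (None for a gap)."""
    counters = [0] * len(alignment)
    columns = []
    for column in zip(*alignment):
        signature = []
        for seq_idx, char in enumerate(column):
            if char == GAP:
                signature.append(None)
            else:
                signature.append(counters[seq_idx])
                counters[seq_idx] += 1
        columns.append(tuple(signature))
    return columns


def aligned_pairs(columns):
    """Collect every residue pair that shares a column."""
    pairs = set()
    for signature in columns:
        for (i, a), (j, b) in combinations(enumerate(signature), 2):
            if a is not None and b is not None:
                pairs.add((i, a, j, b))
    return pairs


def sp_score(test, reference):
    """Fraction of reference residue pairs that are also aligned in the test alignment."""
    ref_pairs = aligned_pairs(column_signatures(reference))
    test_pairs = aligned_pairs(column_signatures(test))
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & test_pairs) / len(ref_pairs)


def column_score(test, reference):
    """Fraction of reference columns reproduced exactly in the test alignment."""
    # Columns consisting only of gaps carry no information and are skipped.
    ref_cols = [c for c in column_signatures(reference) if any(x is not None for x in c)]
    test_cols = set(column_signatures(test))
    if not ref_cols:
        return 0.0
    return sum(col in test_cols for col in ref_cols) / len(ref_cols)


# Toy example: the test alignment misplaces the last residue of the first sequence.
reference = ["AC-D", "ACED"]
test = ["ACD-", "ACED"]
print(sp_score(test, reference))      # 2 of 3 reference pairs are recovered
print(column_score(test, reference))  # 2 of 4 reference columns are recovered
```

Both functions need the reference alignment as input, which is exactly the limitation that motivates reference-free measures such as the MOS and the statistical score introduced here.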

To address the need for an objective evaluation framework, we introduce a statistical score that assesses the quality of a given multiple sequence alignment. The quality assessment is based on counting the number of significantly conserved positions in the alignment, using an importance sampling method in conjunction with a statistical profile analysis framework. We first evaluate a novel objective function used in the alignment quality score for measuring the positional conservation. The results for the Src homology 2 (SH2) domain, Ras-like proteins, peptidase M13, subtilase and β-lactamase families demonstrate that the score can distinguish sequence patterns with different degrees of conservation. Secondly, we evaluate the quality of the alignments produced by several widely used multiple sequence alignment programs using the novel alignment quality score and the commonly used sum of pairs method. According to these results, the MAFFT strategy L-INS-i outperforms the other methods, although the differences among ProbCons, T-Coffee and MUSCLE are mostly insignificant. The novel alignment quality score gives results similar to those of the sum of pairs method. The results indicate that the proposed statistical score is useful in assessing the quality of multiple sequence alignments.

A wealth of molecular data concerning the linear structure of proteins and nucleic acids is available in the form of DNA, RNA and protein sequences. Multiple sequence alignment has become an essential and widely used tool for understanding the structure and function of these molecules. The results of annotation of gene/protein sequences, prediction of protein structures or building of phylogenetic trees, for instance, are critically dependent on the quality of the given alignment. It has been recognized that the automatic construction of a multiple sequence alignment for a set of remotely related sequences can be a very demanding task.
Although the automatic construction of a multiple sequence alignment for a set of remotely related sequences is a very challenging and error-prone task, many downstream analyses still rely heavily on the accuracy of the alignments.

Multiple sequence alignment is the foundation of many important applications in bioinformatics that aim at detecting functionally important regions, predicting protein structures, building phylogenetic trees, etc.
