Revisiting the relationship between compositional sequence complexity and periodicity

Paper ID: 15528
Volume ID: 1421
Publish Year: 2008
Pages: 12
File Format: PDF
Full-Text: Available
Title
Revisiting the relationship between compositional sequence complexity and periodicity
Abstract

Background
Given a large sequence fragment or a set of functionally related sequences, we consider two problems of sequence analysis. The first is to measure sequence complexity (repetitiveness, compactness) in order to estimate how informative the set is as a whole. Usually the measure obtained is compared with an appropriate random background calculated by permuting the given sequences. We propose a novel and effective approach to measuring background information that replaces the usual sequence reshuffling. The second problem is to detect a periodic bias and determine whether it is a feature of the set. Sequence periodicity, including so-called hidden periodicity, is a very basic genomic property. A period of 3, which is considered characteristic of coding sequences, and a period of 10–11, which may be due to the alternation of hydrophobic and hydrophilic amino acids, DNA curvature, and bendability, have been discovered and described. The search for periodic biases has yielded significant results in the study of sequence-dependent nucleosome positioning: nucleosomal sites carry a hidden period of about 10.4 bases.

Results
The calculated differences between genomic sequences and the background showed the high biological relevance of the method proposed in this study. We applied our algorithm to several natural and artificial datasets. We constructed a simple “periodic” dataset by replacing every tenth dinucleotide in each sequence of a trial set with the same dinucleotide, “CC”. We showed that the method reveals the introduced periodicity and that this periodic pattern carries more information than uninterrupted subsequences do. Applied to the nucleosomal dataset, the method revealed a weak pseudo-periodicity of 10.4 nucleotides, confirming previous knowledge. Applied to Escherichia coli datasets, it revealed the well-known 3 bp periodicity as a genic attribute, a secondary genic period slightly larger than 11 bp, and an intergenic period slightly smaller than 11 bp.

Conclusions
We report a novel method for sequence analysis based on compositional complexity. We found that the difference between the sequence complexity of a natural sequence and that of the background is especially high for a set consisting exclusively of coding sequences. Hidden periodicities were found without any preliminary assumptions about the composition of the periodic elements. We illustrated the power of the method by studying sets with known weak periodic properties: a nucleosomal database and sets of different regions of E. coli. We showed that the method conveniently indicated all kinds of periodicity and related features in these sets of DNA sequences.
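
The artificial "periodic" dataset described in the Results can be illustrated with a small sketch. This is not the authors' compositional-complexity method, which the abstract does not specify. It swaps in a much simpler detector: for each distance d, it counts how often the same dinucleotide recurs d positions apart and compares that count with a reshuffled background, which is the conventional baseline the abstract says the paper replaces. The function names, the sequence length (200), the set size (50), and the reading of "every tenth dinucleotide" as positions 0, 10, 20, ... are all assumptions made for illustration.

```python
import random
from collections import Counter


def implant_periodic_dinucleotide(seq, period=10, motif="CC"):
    """Overwrite the dinucleotide starting at every `period`-th position.

    Assumption: "every tenth dinucleotide" means start positions 0, 10, 20, ...
    """
    chars = list(seq)
    for start in range(0, len(chars) - len(motif) + 1, period):
        chars[start:start + len(motif)] = motif
    return "".join(chars)


def same_dinucleotide_distance_counts(seqs, max_distance=15):
    """For each distance d, count positions i where the dinucleotide at i
    is identical to the dinucleotide at i + d."""
    counts = Counter()
    for seq in seqs:
        last_start = len(seq) - 2  # last index where a dinucleotide begins
        for i in range(last_start + 1):
            for d in range(1, max_distance + 1):
                j = i + d
                if j > last_start:
                    break
                if seq[i:i + 2] == seq[j:j + 2]:
                    counts[d] += 1
    return counts


def shuffled(seq, rng):
    """Return a base-level permutation of `seq` (conventional background)."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical trial set: 50 random sequences of 200 bases each.
trial = ["".join(rng.choice("ACGT") for _ in range(200)) for _ in range(50)]
periodic = [implant_periodic_dinucleotide(s) for s in trial]

observed = same_dinucleotide_distance_counts(periodic)
background = same_dinucleotide_distance_counts([shuffled(s, rng) for s in periodic])

# Excess of observed repeats over the shuffled background, per distance.
excess = {d: observed[d] - background[d] for d in range(1, 16)}
best = max(excess, key=excess.get)
print("distance with the largest excess:", best)
```

Only distances up to 15 are scanned here, so that multiples of the implanted period (20, 30, ...) do not compete with the fundamental period. A real analysis would need to treat such harmonics explicitly.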

Keywords
Information; Hidden periodicity; Nucleosome positioning; Entropy; E. coli
Publisher
Database: Elsevier - ScienceDirect
Journal: Computational Biology and Chemistry - Volume 32, Issue 1, February 2008, Pages 17–28
Subjects
Physical Sciences and Engineering; Chemical Engineering; Bioengineering