Comparative analysis of core promoter region: Information content from mono and dinucleotide substitution matrices
We have studied the core promoter region in five sets of promoter sequences by calculating the average mutual information content H (relative entropy). We have used specially constructed substitution matrices to calculate mono and dinucleotide replacements in a given block of aligned sequences. These substitution matrices use log-odds form of scores, which are in bits of information. Here, we constructed and applied nucleotide substitution matrices for the core promoter region to calculate the information content to study the Transcription Start Site (TSS), TATA-box and downstream regions. As expected, the information content decreases with increasing block size. This clearly implies that the TSS region is likely to be 5–10 bases in size (length). We also notice that both in the case of mouse and humans, both TATA-boxes and TSS regions are likely to play important roles in proper transcriptional initiation.
Journal: Computational Biology and Chemistry - Volume 30, Issue 1, February 2006, Pages 58–62