Identification of specific sequence motifs in the upstream region of 242 human miRNA genes
We have identified novel over-represented and conserved motifs in the upstream regions of human and mouse miRNA stem-loop sequences using a new bioinformatic processing regimen. In 189 human and mouse miRNAs, we observed sequence conservation in the region up to −500 bp upstream, declining with increasing distance from the putative miRNA stem-loop origin. We also found relatively GC-rich regions, with a guanine + cytosine (G + C) content of more than 50%, at about −30 and −170 bp relative to the human miRNA stem-loop sequence origin. To further identify specific sequence motifs that might be involved in the transcriptional regulation of miRNA precursors, we first searched the 500 bp upstream sequences of 194 non-redundant human miRNA stem-loop sequences for frequently occurring motifs 5–15 bp long. We then found comparable occurrences of the 20 most frequent motifs in the 2000 bp upstream regions of 242 human and 290 mouse miRNAs. All 20 motifs occurred at a significantly reduced frequency in the 2000 bp upstream regions of 23,570 human RefSeq genes, demonstrating that these motifs are specific to the upstream miRNA sequences. The most frequently observed motif, M1 (GTGCTTMTAGTGCAG), with a MEME E-value of 3.8e−57, was distributed within 500 bp upstream of the stem-loop sequences and was also miRNA-specific. We suggest that these over-represented motif sites are good candidates for experimental testing of miRNA expression and of possible interactions with regulatory factors.
Journal: Computational Biology and Chemistry - Volume 31, Issue 3, June 2007, Pages 207–214
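
The two measurements described in the abstract, windowed G + C content and the occurrence count of a degenerate motif such as M1, can be illustrated with a minimal Python sketch. This is not the authors' pipeline (the motifs themselves were reported with MEME E-values); the function names, window size and step are hypothetical choices made for illustration, and only the given strand is scanned. The one fact relied on beyond the abstract is the standard IUPAC nucleotide code, in which M stands for A or C.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Compile a degenerate motif into a regex that also finds overlapping hits."""
    body = "".join(IUPAC[base] for base in motif.upper())
    # The lookahead is zero-width, so matches may overlap one another.
    return re.compile("(?=(" + body + "))")


def count_motif(seq, motif):
    """Count occurrences of the motif on the given strand only (hypothetical helper)."""
    return sum(1 for _ in motif_to_regex(motif).finditer(seq.upper()))


def gc_fraction(seq):
    """Fraction of G and C bases in seq; 0.0 for an empty sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq, window=20, step=10):
    """Return (start_index, gc_fraction) for each sliding window.

    The window and step defaults are illustrative assumptions,
    not values taken from the paper.
    """
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


M1 = "GTGCTTMTAGTGCAG"  # most frequent motif reported in the abstract
```

Applied to an upstream sequence held as a plain string, `count_motif(upstream, M1)` gives the number of M1 sites, and windows from `gc_windows(upstream)` whose fraction exceeds 0.5 correspond to the "more than 50% G + C" criterion. Comparing counts between miRNA upstream regions and a background set such as RefSeq upstream regions would additionally require a statistical test, which this sketch does not attempt.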