Scientists at Harvard have identified a previously unknown embryonic signal, dubbed Toddler, that instructs cells to move and reorganize themselves, through a process known as gastrulation, into three layers.
Over the last decade, high-throughput sequencing has revealed a plethora of RNA transcripts that do not seem to encode proteins. These long non-coding RNAs (lncRNAs) have emerged as key players in gene regulation. In the past, annotation of lncRNAs has largely relied on the computational identification of transcripts that lack coding features. The recent development of ribosome profiling has provided a high-throughput method of assessing ribosome engagement over the whole transcriptome at near-nucleotide resolution.
Harvard scientists have now integrated ribosome profiling and RNA-Seq data from a zebrafish developmental time course, using a random forest machine learning approach to distinguish different modes of translation. They first showed that ribosome engagement over transcript leaders and transcript trailers can be discriminated from engagement over protein-coding open reading frames (ORFs). Using this result, they assigned each lncRNA to the category it most resembled (protein-coding, leader-like, or trailer-like). Surprisingly, this revealed that many lncRNAs are engaged by ribosomes in a manner similar to upstream ORFs (uORFs) in transcript leaders. Unlike canonical protein-coding transcripts, these lncRNAs do not seem to have a single dominant translated ORF; instead, they often contain multiple short ORFs with more dispersed translation.
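To make the idea of "multiple short ORFs" concrete, the following is a minimal sketch of how short ORFs could be enumerated in a transcript sequence. It is not the researchers' pipeline: the function name `find_orfs`, the `min_codons` parameter, and the toy sequence are assumptions made for illustration, and the sketch only scans the forward strand for ATG-to-stop stretches using the standard stop codons.

```python
# Illustrative sketch only; names and the toy sequence are hypothetical,
# not taken from the study described above.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=1):
    """Return sorted (start, end, n_codons) tuples for forward-strand ORFs.

    start is 0-based; end is exclusive and includes the stop codon.
    n_codons counts the start codon and every codon up to, but not
    including, the stop. For each stop codon in a frame, only the most
    upstream in-frame ATG is reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the frame codon by codon.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, n_codons))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


# Toy transcript with two short ORFs in different frames, the first
# encoding a single amino acid (ATG followed directly by a stop).
toy = "ATGTAACATGGCCTGA"
for start, end, n_codons in find_orfs(toy):
    print(start, end, n_codons, toy[start:end])
```

Raising `min_codons` filters out the very shortest ORFs, which is one simple way to see how much of a transcript's apparent coding potential comes from tiny ORFs of the kind described above.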
Several observations suggest that the peptides generated from these translated lncRNA ORFs are likely to be non-functional. First, their diminutive size (sometimes just a single amino acid) is often incompatible with a functional peptide. Second, they show a distinct lack of conservation, particularly at the amino acid level. Instead, ribosomal engagement of this subclass of lncRNAs may have a regulatory role. Translation can affect the stability or localization of transcripts through pathways such as nonsense-mediated RNA decay (NMD), which the cell could use to modulate lncRNA expression levels. Alternatively, spurious translation could prove functional over evolutionary time as a source of new proteins. The data also show that computational classification should be used with care, as some pipelines can yield up to 45% false positives (transcripts that in fact encode proteins). The researchers show that combining homology and evolutionary measures gives a good classification and should be adopted as the standard in the field.