The human cytomegalovirus (HCMV) genome was sequenced 20 years ago. However, as with other complex viruses, our understanding of its protein-coding potential remains far from complete. A group led by Jonathan Weissman at the University of California, San Francisco, used ribosome profiling and transcript analysis to experimentally define the HCMV translation products and follow their temporal expression. They identified hundreds of previously unannotated open reading frames (ORFs) and confirmed a fraction of them by mass spectrometry. They also found that regulated use of alternative transcript start sites plays a broad role in enabling tight temporal control of HCMV protein expression and in allowing multiple distinct polypeptides to be generated from a single genomic locus. These results reveal an unanticipated complexity in the HCMV coding capacity and illustrate the role of regulated changes in transcript start sites in generating this complexity.
Human cytomegalovirus infects nearly every human and can cause disease in immunocompromised adults as well as birth defects in newborns. Its genome is about 240 kb long, and previous estimates of the number of ORFs it contains ranged from 165 to 252. "The genome of a virus is just a starting point," said Jonathan Weissman in a statement. "Understanding what proteins are encoded by that genome allows us to start thinking about what the virus does and how we can interfere with it."
To study ORFs in HCMV over time, Weissman and his colleagues infected human foreskin fibroblasts with the virus and took samples of the infected cells after five hours, 24 hours, and 72 hours.
In addition, the researchers treated the cells with different drugs to gauge ribosome positioning. Cycloheximide, a translation elongation inhibitor, was applied to examine the overall distribution of ribosomes, while harringtonine or lactimidomycin was applied to make ribosomes accumulate at translation start sites rather than along the length of the message. Other cells were left untreated.
This approach allowed the researchers to determine how genes are arranged in HCMV. For example, for the UL25 ORF, the researchers found a single transcription start site upstream of the ORF. In the harringtonine- and lactimidomycin-treated cells, ribosomes marked a single translation initiation site at the first start codon downstream of that transcription start site, and in cycloheximide-treated and untreated cells, ribosome density accumulated near the first in-frame stop codon.
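The reasoning in the UL25 example, reading an ORF from a mapped initiation site to the first in-frame stop codon, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `orf_from_start` and the toy transcript are hypothetical, and the sketch is not the analysis pipeline the researchers used.

```python
# Stop codons of the standard genetic code, written in DNA letters.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def orf_from_start(seq, start):
    """Return the ORF that begins at index `start` and ends at the first
    in-frame stop codon (stop included), or None if no stop is found."""
    seq = seq.upper()
    # Step through the sequence one codon at a time, staying in frame.
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return seq[start:i + 3]
    return None

# Hypothetical toy transcript: short leader, ATG start codon, TAA stop codon.
transcript = "GGCCATGAAACCCTAAGG"
start_site = transcript.find("ATG")  # stands in for a mapped initiation site
print(orf_from_start(transcript, start_site))
```

Note that the toy transcript also contains an out-of-frame TGA overlapping the start codon; because the scan moves in steps of three from the initiation site, that triplet is ignored.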
Using such ribosomal footprints as a guide, the researchers identified hundreds of new ORFs: ORFs within known ORFs, out-of-frame ORFs, upstream ORFs, and ORFs starting at near-cognate start codons, such as CUG rather than AUG. The researchers also annotated splice junctions and used data from harringtonine-treated cells, where ribosomes accumulate at translation start sites, to develop a machine learning strategy based on a support vector machine to uncover even more ORFs. In doing so, they identified an additional 53 possible ORFs.
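To make these ORF categories concrete, here is a minimal Python sketch. It assumes the usual definition of a near-cognate start codon as one that differs from AUG at exactly one position, and it labels a candidate ORF relative to a single annotated ORF on the same strand. The function names, the coordinate convention, and the category labels are hypothetical choices for illustration; this is not the researchers' support vector machine approach.

```python
def is_near_cognate(codon):
    """True if the codon differs from AUG at exactly one position."""
    codon = codon.upper().replace("T", "U")  # accept DNA or RNA letters
    if len(codon) != 3:
        return False
    return sum(a != b for a, b in zip(codon, "AUG")) == 1

def classify_orf(start, end, known_start, known_end):
    """Label a candidate ORF relative to one annotated ORF.

    Assumes 0-based, half-open nucleotide coordinates on the same strand.
    """
    if end <= known_start:
        return "upstream"
    if start >= known_start and end <= known_end:
        # Same reading frame only if the start offset is a multiple of three.
        in_frame = (start - known_start) % 3 == 0
        return "internal in-frame" if in_frame else "internal out-of-frame"
    return "other"

print(is_near_cognate("CUG"), is_near_cognate("AUG"))
print(classify_orf(10, 40, 100, 400))
print(classify_orf(131, 200, 100, 400))
```

A fuller treatment would also need to handle upstream ORFs that overlap the annotated ORF, which this sketch simply labels "other".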
All in all, Weissman and his colleagues reported that they found 751 translated ORFs in HCMV. Of those, 147 were previously thought to be coding. Many of the newly uncovered ORFs are very short (often less than 100 nucleotides in length, with many even shorter than 20 nucleotides) and are usually found upstream of larger ORFs.
Using high-resolution tandem mass spectrometry, the researchers were able to confirm a number of proteins that originated from the newly identified ORFs. The researchers also noted that viral genes, including the newly found ORFs, are tightly controlled over time, and that the use of 5' ends is "critical to the tight temporal regulation of viral genes expression and production of alternate protein products during infection," the study authors wrote.
"Our work yields a framework for studying HCMV by establishing the viral proteome and its temporal regulation, providing a context for mutational studies, and revealing the full range of HCMV functional and antigenic potential," they added.