Second-generation sequencing (SGS) has become the preferred method for RNA transcriptome profiling of organisms and single cells. However, SGS analysis of transcriptome diversity (including protein-coding transcripts and regulatory...
Second-generation sequencing (SGS) has revolutionized whole genome sequencing and transcriptome analysis (1-5). In particular, sequencing of cDNA synthesized from intracellular total RNA (RNA-seq) enables RNA expression profiling with high dynamic range and genome coverage. RNA-seq has led to discoveries of novel alternative RNA splicing in various eukaryotic cell types and expanded our knowledge of regulatory non-coding RNA transcripts (6-8). The primary component of both eukaryotic and prokaryotic total RNA is ribosomal RNA (rRNA), with all other coding, non-coding, and small RNAs representing less than 15% of the total RNA population (9). The abundance of rRNA-derived sequences in cDNA libraries diminishes the utility of RNA-seq for functional genomics studies because only a small fraction of reads come from sequences of interest. In this context, RNA-seq library preparation techniques that efficiently remove highly abundant rRNA-derived sequence populations and enrich for non-ribosomal RNAs prior to SGS are highly desirable.
A common method for excluding rRNA is to capture RNA species that contain polyadenylated tails. This approach is highly effective in removing rRNA but also depletes all non-polyadenylated host transcripts, including non-coding RNAs that regulate eukaryotic cellular function, as well as both viral and prokaryotic microbial sequences present in many complex sample types (10). Another common method is to selectively remove the rRNA prior to generating a cDNA library for SGS. These rRNA depletion protocols use antisense rRNA probes specifically designed to capture human/mouse/rat or Gram-positive/Gram-negative bacterial rRNA transcripts from high-quality total RNA samples. This technique is a multi-step procedure that requires large amounts of starting material (250 ng to 10 µg of total RNA) and has been shown to be less effective on degraded RNA samples. Commercially available rRNA depletion kits such as RiboMinus and Ribo-Zero are effective in removing highly abundant rRNA species from eukaryotic and prokaryotic total RNA, but they are costly and their rRNA capture probes are species-specific (11-14).
An alternative to depleting rRNA sequences prior to cDNA library synthesis is to apply cDNA normalization (also called Cot filtration) approaches that remove highly abundant sequences from cDNA libraries (15, 16). In normalization, double-stranded DNA (dsDNA) populations are first denatured and then allowed to re-anneal at an elevated temperature. Highly abundant sequences hybridize at higher rates (proportional to the square of their concentration) and, if the re-annealing reaction is stopped at a suitable time point (e.g., 4–24 h), these will comprise the majority of double-stranded species (17). If double-stranded cDNA (ds-cDNA) and single-stranded cDNA (ss-cDNA) can then be separated, representation of the highest abundance species in the resulting single-stranded fraction can be significantly reduced. Two common approaches for separating ss-cDNA and ds-cDNA populations are enzymatic digestion of ds-cDNA using a duplex-specific nuclease (DSN) (18, 19) and physical separation of ds-cDNA from ss-cDNA through methods such as hydroxyapatite chromatography (HAC) (20-24).
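The effect of concentration-dependent re-annealing on library composition can be illustrated with a minimal sketch. It assumes ideal second-order renaturation, in which the fraction of a species that remains single-stranded after time t is 1 / (1 + k·C0·t), and it treats each species as re-annealing only with its own complement. The rate constant `k`, the time `t`, the species names, and the starting abundances are hypothetical values chosen for illustration and are not taken from this study.

```python
def single_stranded_fraction(c0, k, t):
    """Fraction of a species still single-stranded after time t.

    Ideal second-order renaturation: C / C0 = 1 / (1 + k * C0 * t).
    c0 is the starting concentration, k the rate constant, t the time
    (arbitrary but mutually consistent units).
    """
    return 1.0 / (1.0 + k * c0 * t)


def normalized_composition(abundances, k, t):
    """Relative composition of the single-stranded fraction after re-annealing.

    abundances maps a species name to its starting relative concentration.
    Each species is assumed to re-anneal independently of the others.
    """
    remaining = {
        name: c0 * single_stranded_fraction(c0, k, t)
        for name, c0 in abundances.items()
    }
    total = sum(remaining.values())
    return {name: amount / total for name, amount in remaining.items()}


# Hypothetical library dominated by rRNA-derived cDNA (illustrative values only).
library = {"rRNA": 0.85, "transcript_a": 0.10, "transcript_b": 0.05}
after = normalized_composition(library, k=1.0, t=100.0)

for name, share in after.items():
    print(f"{name}: {library[name]:.2f} -> {share:.2f}")
```

Under these assumptions the most abundant species loses the largest share of its single-stranded molecules, so the composition of the single-stranded fraction moves toward equal representation as the re-annealing time increases.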
Here we describe a microcolumn-based HAC approach for normalization using convenient re-packable cartridges that is rapid, reproducible, and amenable to future automated sample preparation platforms (25-27). We compare our microcolumn HAC-based method with a commercial rRNA-depletion kit, Ribo-Zero, and a DSN normalization kit for normalizing SGS libraries prepared from Escherichia coli K-12 or human peripheral blood mononuclear cell (PBMC) total RNA, respectively. Sequencing of RNA-seq cDNA libraries followed by alignment to either the E. coli K-12 or human (hg19) genome was used to measure rRNA abundance, non-rRNA transcript enrichment, and, in the case of E. coli K-12, coverage across the entire bacterial transcriptome. Microcolumn HAC-based normalization proved to be an effective, cost-saving alternative to commercial Ribo-Zero and DSN normalization kits, and a first step toward a fully automated system incorporating HAC normalization into RNA-seq cDNA library preparation workflows.
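As a rough illustration of how rRNA abundance and non-rRNA enrichment might be summarized from alignment results, the sketch below computes both from read counts. The function names, the definition of enrichment as the fold change in the non-rRNA read fraction, and the read counts are assumptions made for this example; they are placeholders and not results or definitions from this study.

```python
def rrna_fraction(rrna_reads, aligned_reads):
    """Fraction of aligned reads that map to rRNA loci."""
    if aligned_reads <= 0:
        raise ValueError("aligned_reads must be positive")
    if not 0 <= rrna_reads <= aligned_reads:
        raise ValueError("rrna_reads must lie between 0 and aligned_reads")
    return rrna_reads / aligned_reads


def non_rrna_enrichment(fraction_before, fraction_after):
    """Fold change in the non-rRNA read fraction after treatment.

    Assumed definition: (1 - rRNA fraction after) / (1 - rRNA fraction before).
    """
    if fraction_before >= 1.0:
        raise ValueError("untreated library contains no non-rRNA reads")
    return (1.0 - fraction_after) / (1.0 - fraction_before)


# Hypothetical read counts (placeholders, not data from this study).
untreated = rrna_fraction(rrna_reads=900_000, aligned_reads=1_000_000)
normalized = rrna_fraction(rrna_reads=200_000, aligned_reads=1_000_000)
fold = non_rrna_enrichment(untreated, normalized)

print(f"rRNA fraction: {untreated:.2f} -> {normalized:.2f}")
print(f"non-rRNA enrichment: {fold:.1f}-fold")
```

Reporting the rRNA fraction alongside the fold change keeps the two quantities distinct, since a large drop in rRNA reads produces only a modest fold enrichment when the untreated library already contains many non-rRNA reads.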