Blocking Abundant RNA Transcripts by High-Affinity Oligonucleotides during Transcriptome Library Preparation
Biological Procedures Online volume 25, Article number: 7 (2023)
RNA sequencing has become the gold standard for transcriptome analysis but has an inherent limitation: low-abundance transcripts are difficult to quantify. In contrast to microarray technology, RNA sequencing reads are distributed across transcripts in proportion to their abundance. Therefore, low-abundance RNAs compete for reads with highly abundant, and sometimes non-informative, RNA species.
We developed an easy-to-use strategy based on high-affinity RNA-binding oligonucleotides to block reverse transcription and PCR amplification of specific RNA transcripts, thereby substantially reducing their abundance in the final sequencing library. To demonstrate the broad application potential of our method, we applied it to different transcripts and library preparation strategies, including YRNAs in small RNA sequencing of human blood plasma, mitochondrial rRNAs in both 3′ end sequencing and long-read sequencing, and MALAT1 in single-cell 3′ end sequencing. We demonstrate that the blocking strategy is highly efficient, reproducible, specific, and generally results in better transcriptome coverage and complexity.
Our method does not require modifications of the library preparation procedure apart from simply adding blocking oligonucleotides to the RT reaction and can thus be easily integrated into virtually any RNA sequencing library preparation protocol.
RNA sequencing has become the gold standard for transcriptome characterization. Numerous RNA sequencing library preparation procedures have been developed to quantify various RNA biotypes, including, amongst others, polyA+ RNA sequencing, total RNA sequencing, 3′ end RNA sequencing and small RNA sequencing. Regardless of the library preparation method, RNA sequencing reads are distributed across RNA transcripts in proportion to their abundance. Consequently, highly abundant RNA species, often deemed non-informative, can dominate the RNA sequencing library and hamper the detection of lower-abundance transcripts. A well-known example is ribosomal RNA (rRNA), which typically accounts for more than 80% of all RNA transcripts in cellular or tissue RNA. Another well-documented example is RNY4 fragments. YRNAs are non-coding, evolutionarily conserved RNA species with a length of 80–110 nucleotides. Four human YRNAs are known: hY1, hY3, hY4, and hY5. YRNAs are readily fragmented in cells undergoing apoptosis in a caspase-dependent, Dicer-independent manner [3, 4]. The resulting fragments are found in cultured cells, solid tumors, and multiple biofluids [6,7,8]. More specifically, a 30–33 nucleotide 5′-end hY4 fragment is abundantly present in human blood plasma, serum, and saliva, where it potentially serves a physiological function. In small RNA sequencing libraries of serum or plasma RNA, this fragment can account for more than 30% of all reads [10, 11] and even up to 70% in platelet-rich blood plasma. Such excessive amounts of hY4 fragments reduce library complexity, requiring deeper sequencing to retrieve information about the other small RNA species in the library.
Removing sequence fragments derived from these excessively abundant transcripts from a sequencing library is instrumental in obtaining sufficient coverage of the informative fraction of the transcriptome without having to sequence libraries to extreme depth with diminishing returns. Several workarounds have been proposed to tackle this problem. The concentration of rRNA can be reduced using many different strategies: subtractive pull-down [1, 12,13,14] (as in the old Ribo-Zero Gold Kit, the Ambion MICROBExpress Bacterial mRNA Enrichment kit and the Life Technologies RiboMinus Transcriptome Isolation Kit), gel excision, probe-directed RNase H digestion [16,17,18] (as in the new Ribo-Zero Gold Kit), Cas9-directed cDNA digestion (also named DASH) [19, 20], not-so-random primers [21, 22], duplex-specific nuclease (DSN) depletion [23, 24], Probe-Directed Degradation (PDD) [25, 26], rRNA poly(A) clipping and EMBR-seq. In principle, these methods can be applied to any other unwanted sequence. Instead of removing abundant transcripts, specific transcripts of interest can also be enriched using biotinylated probes, magnetic bead-linked probes, or capture arrays [29,30,31,32,33,34]. Alternatively, methods like 3′ end sequencing use poly(A) priming to convert polyadenylated RNAs to cDNA for further library preparation. Several studies have compared the performance of some of these depletion methods and reported differences in efficiency and specificity [35,36,37,38].
Methods developed for hybridization-based small RNA depletion are often labor-intensive and result in loss of material during washing steps. CRISPR-based technologies generally include PCR and multiple washes, making the protocol significantly longer and more prone to material loss. Likewise, pull-down methods require several washing steps and tend to perform inconsistently. Additionally, their efficiency drops significantly when applied to fragmented RNA in, e.g., biofluids or formalin-fixed tissues. All current technologies require multiple additional steps, which substantially increase hands-on time and compromise the repeatability of library preparation.
Researchers frequently use oligonucleotides containing modified nucleic acids because of their increased melting temperature, high binding specificity, or stability. One example is locked nucleic acid (LNA), which contains an oxymethylene bridge between the 2′ oxygen and 4′ carbon atoms of the ribose. This “locked” structure provides higher affinity and mismatch discrimination. Because of this, LNAs have been used in multiple applications [42,43,44]. Interestingly, LNA oligonucleotides have been used to block the PCR amplification of unspliced transcripts (by targeting the intronic sequence) or of wild-type transcripts when the mutated version is of interest [46,47,48]. A patent describing the use of LNA oligonucleotides to block reverse transcription and amplification of hemoglobin mRNA from whole blood during RT-qPCR further exemplifies their potential for depletion purposes. However, this idea has never been extensively evaluated or applied to massively parallel sequencing techniques, such as Oxford Nanopore sequencing or single-cell RNA sequencing. Importantly, implementing an LNA-based reverse transcription blocking step would require only one extra pipetting step during library preparation.
Here, we describe an easy-to-implement method using LNA-modified oligonucleotides that bind unwanted RNA transcripts and block their reverse transcription and PCR amplification during RNA sequencing library preparation. We applied our method to different abundant RNA species and RNA sequencing library preparation strategies, including small RNA sequencing, 3′ end sequencing, long-read sequencing, and single-cell 3′ end sequencing. We demonstrate that the applied method, which requires only one additional step in the library prep procedure, is highly efficient and does not affect quantification of untargeted genes.
Material and Methods
YRNA Blocking in Human Blood Plasma Samples
Samples and Sample Collection
For the healthy donor experiments, we drew venous blood from an elbow vein of two healthy donors into three EDTA tubes (BD Vacutainer Hemogard Closure Plastic K2-Edta Tube, 10 ml, #367525) using the BD Vacutainer Push blood collection set (21G needle). We collected the blood samples according to the Ethical Committee of Ghent University Hospital approval EC/2017/1207, following the ICH Good Clinical Practice rules, and obtained written informed consent from all donors. We inverted the tubes 5 times and centrifuged them within 15 minutes after blood draw (400 g, 20 minutes, room temperature, without brake). Per donor, we pipetted off the upper plasma fraction (leaving approximately 0.5 cm of plasma above the buffy coat) and pooled it in a 15 ml tube. After gentle inversion, five aliquots of 220 μl platelet-rich plasma (PRP) were snap-frozen in liquid nitrogen in 1.5 ml LoBind tubes (Eppendorf Protein LoBind microcentrifuge tubes Z666548 - DNA/RNA) and stored at − 80 °C. We centrifuged the remaining plasma (800 g, 10 minutes, room temperature, without brake) and transferred it to a new 15 ml tube, leaving approximately 0.5 cm of plasma above the separation. Next, we centrifuged this plasma a third time (2500 g, 15 minutes, room temperature, without brake) and transferred it to a 15 ml tube, again leaving approximately 0.5 cm above the separation. The resulting platelet-free plasma (PFP) was gently inverted, snap-frozen in five aliquots of 220 μl and stored at − 80 °C. The entire plasma preparation protocol took less than 2 h. We isolated RNA from 200 μl of PRP or PFP. For the spike-in RNA titration experiment, the protocol was identical, except that four 10 ml EDTA tubes were used and the second centrifugation step was different (1500 g, 15 minutes, room temperature, without brake).
For the cancer patient experiment, plasma samples were acquired from ProteoGenex (Inglewood, United States of America) under EC/2017/1515 from Ghent University Hospital. Blood was collected in EDTA vacutainer tubes. After inversion (10 times), the vacutainer tubes were centrifuged at 4 °C for 10 minutes at 1500 g without brakes. The plasma was then transferred into a 15 mL centrifuge tube and centrifuged a second time for 10 minutes at 1500 g. Finally, the plasma was transferred into cryovials and stored at − 80 °C until shipment. The included cancer types were colorectal cancer (CRC), lung adenocarcinoma (LUAD), and prostate cancer (PRAD).
RNA Isolation and Spike-in Controls
Total RNA was isolated from platelet-free (PFP) and platelet-rich plasma (PRP) using the miRNeasy Serum/Plasma Kit (Qiagen, Hilden, Germany, 217184), with 200 μl of plasma as input. For the cancer patient experiment, 2 μl of 1x RC PFP spikes were added to the plasma during isolation. The elution volume was 14 μl, to which we added 2 μl of 1x LP PFP spikes (Thermo Fisher). Detailed descriptions of the spike-in controls can be found in the exRNAQC study. From this total volume, we used 5 μl for the library preparation. For the healthy donor experiment, the eluates of multiple parallel extractions were pooled according to the original biofluid type (PRP or PFP) and split into six aliquots of 5 μl to minimize extraction bias. We did not include a gDNA removal step after RNA isolation. The input is volume-based because the RNA concentrations of PFP and PRP are below the limit of quantification.
YRNA LNA Design
The RNY4 fragment (32 nucleotides) was tiled with 16-nucleotide complementary oligonucleotides, resulting in 17 possible designs. We mapped the full set of antisense oligonucleotides to the human transcriptome (Ensembl v84) and miRBase and retained the oligonucleotides without off-targets when allowing up to 3 mismatches. Of the retained oligonucleotides, we chose the one with the highest melting temperature (Tm). The resulting fully LNA-modified oligonucleotide (ACCCACTACCATCGGA, targeting TCCGATGGTAGTGGGT) has a Tm of 89.9 °C. In addition to the fully LNA-modified oligonucleotide, we ordered oligonucleotides with the same sequence from Integrated DNA Technologies carrying 2′-O-methyl or 2′-methoxy-ethoxy modified nucleotides, as well as half-modified versions (alternating modified and non-modified nucleotides). Sequences are available in Supplemental Table 1.
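The tiling step can be sketched in a few lines of Python. This is a minimal illustration, assuming a DNA-alphabet target and a step size of one nucleotide; the 32-nt `target_fragment` below is a hypothetical placeholder built around the reported target site, not the real RNY4 fragment, and the off-target mapping and Tm ranking steps are not reproduced.

```python
# Minimal sketch of antisense oligonucleotide tiling (step size 1 nt).
# `target_fragment` is a HYPOTHETICAL 32-nt placeholder that embeds the
# reported LNA target site; it is not the actual RNY4 fragment sequence.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def tile_antisense(target: str, length: int = 16) -> list:
    """Slide a window of `length` nt along `target` and return
    (start, target_window, antisense_oligo) for every start position."""
    designs = []
    for start in range(len(target) - length + 1):
        window = target[start:start + length]
        designs.append((start, window, reverse_complement(window)))
    return designs


target_fragment = "AAAAAA" + "TCCGATGGTAGTGGGT" + "CCCCCCCCCC"  # 32 nt
designs = tile_antisense(target_fragment, 16)
print(len(designs))  # 32 - 16 + 1 windows
```

With a 32-nt fragment and 16-nt oligonucleotides, 32 − 16 + 1 = 17 windows result, matching the number of designs in the text; in this placeholder, the window starting at position 6 yields ACCCACTACCATCGGA, the reverse complement of TCCGATGGTAGTGGGT.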
TruSeq Small RNA Library Prep
We used the TruSeq small RNA library prep sequencing kit (Illumina, San Diego, CA, USA) for library preparation according to the manufacturer's instructions, except for the changes listed below. After adaptor ligation and before the reverse transcription step, 2 μl of LNA at a concentration of 0.25 μM (LNA1x) or 2.5 μM (LNA10x) was added to 14 μl of the adaptor-ligated RNA. As a negative control for LNA blocking (LNA0x), 2 μl of water was added to 14 μl of RNA instead. In the experiments with the cancer patient samples and alternative modifications, only the 0.25 μM (LNA1x) concentration was used, as we had shown that the 10-fold higher concentration had no added value. Next, we used 6 μl of each sample to start the reverse transcription and continued the library prep. Because the input amounts are low, the number of PCR cycles in the final PCR step was set to 16 (the manufacturer recommends 11).
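The amount of blocking oligonucleotide carried into the RT reaction follows from simple C1V1 = C2V2 arithmetic. The sketch below works through the volumes stated above (2 μl of LNA stock into 14 μl of adaptor-ligated RNA, then 6 μl into RT); it assumes ideal mixing and does not account for any further dilution by the RT reagents.

```python
# Minimal sketch of the LNA dilution arithmetic for the TruSeq RT step.
# Assumes ideal mixing and ignores later dilution by the RT master mix.


def final_concentration_um(stock_um: float, added_ul: float, other_ul: float) -> float:
    """Concentration (µM) after adding `added_ul` of stock to `other_ul` of sample."""
    return stock_um * added_ul / (added_ul + other_ul)


def picomoles(conc_um: float, volume_ul: float) -> float:
    """Amount in pmol: 1 µM x 1 µl = 1 pmol."""
    return conc_um * volume_ul


for label, stock in [("LNA1x", 0.25), ("LNA10x", 2.5)]:
    conc = final_concentration_um(stock, added_ul=2, other_ul=14)
    amount = picomoles(conc, volume_ul=6)  # volume taken into the RT reaction
    print(f"{label}: {conc * 1000:.2f} nM in the mix, {amount:.4f} pmol per 6 µl")
```

Under these assumptions, LNA1x gives 31.25 nM (0.1875 pmol in the 6 μl used for RT) and LNA10x gives 312.5 nM (1.875 pmol).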
Pippin Prep and Sequencing
We performed a size selection for 125–163 bp on all libraries using 3% agarose dye-free marker H cassettes on a Pippin Prep (Sage Science, Beverly, MA, USA). Next, the libraries were purified by ethanol precipitation and resuspended in 10 mM Tris-HCl buffer (pH 8.0) with Tween 20. After dilution, the libraries were quantified using the KAPA Library Quantification Kit (Roche Diagnostics, Diegem, Belgium, KK4854). The healthy donor samples were sequenced on a NextSeq 500 instrument using the NextSeq 500 High Output Kit v2.5 (75 cycles) (Illumina, San Diego, CA, USA). We loaded the library at a concentration of 2.0 pM with 10% PhiX and obtained a total of 268 M reads. We loaded the cancer patient samples on one lane of a NovaSeq 6000 (Illumina, San Diego, CA, USA) instrument at a concentration of 300 pM with 10% PhiX using the NovaSeq 6000 SP Reagent Kit v1.5 (100 cycles) (Illumina, San Diego, CA, USA) (paired-end, 2 × 50 cycles; only the first read was used for subsequent analysis), resulting in 267 M reads. For the chemical modification comparison experiment, we used one lane of a NovaSeq 6000 SP Reagent Kit v1.5 (100 cycles) (Illumina, San Diego, CA, USA, 20028401) (1 × 100 bp), loading 300 pM with 10% PhiX, resulting in a total of 548 M reads.
We used a dedicated in-house small RNA-seq pipeline for the quantification of small RNAs. This pipeline starts with adaptor trimming using Cutadapt (v1.8.1), which discards reads shorter than 15 nt and reads in which no adaptor was found. Low-quality reads are discarded using the FASTX-Toolkit (v0.0.14), requiring a minimum quality score of 20 in at least 80% of nucleotides. Next, we counted the reads belonging to our spike-in controls (both RC and LP) and removed them from the FASTA files. For this comparison, the spike-in controls were not used for normalization because the library preparation methods (with or without LNA) differ. The spike-ins are, however, needed to correct for variation in input concentration when all other parameters are equal, as the pooling is performed based on volume. Subsequently, we mapped the reads with Bowtie (v1.1.2), allowing one mismatch. At the end of the pipeline, the mapped reads are annotated by matching the genomic coordinates of each read with the genomic locations of miRNAs (obtained from miRBase, v20) and other small RNAs (obtained from UCSC GRCh37/hg19 and Ensembl v84). We submitted the original FASTQ files and the count tables to EGA (EGAS00001006023). Within each experiment, the samples were downsampled to the sequencing depth of the sample with the fewest reads, i.e., 13 M reads (concentration experiment), 6.5 M reads (modification experiment), and 7 M reads (cancer experiment).
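The read-level quality rule stated above (Phred ≥ 20 in at least 80% of nucleotides) and the 15-nt length cut-off can be sketched in plain Python. This is an illustration only, assuming Phred+33 quality encoding and already adaptor-trimmed, well-formed four-line FASTQ records; it does not reproduce the FASTX-Toolkit or Cutadapt.

```python
# Minimal sketch of the read-level quality rule described above:
# keep a read if at least 80% of its bases have Phred quality >= 20.
# Assumes Phred+33 encoded FASTQ and well-formed 4-line records.


def passes_quality(qual: str, min_q: int = 20, min_fraction: float = 0.8,
                   offset: int = 33) -> bool:
    """True if the fraction of bases with quality >= min_q is >= min_fraction."""
    if not qual:
        return False
    good = sum(1 for ch in qual if ord(ch) - offset >= min_q)
    return good / len(qual) >= min_fraction


def filter_fastq(lines: list, min_len: int = 15) -> list:
    """Return (header, seq, qual) tuples for reads that are long enough
    (mirroring the 15 nt cut-off applied after adaptor trimming) and pass
    the quality rule."""
    kept = []
    for i in range(0, len(lines) - 3, 4):
        header, seq, _, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        if len(seq) >= min_len and passes_quality(qual):
            kept.append((header, seq, qual))
    return kept
```

In Phred+33 encoding, "I" corresponds to Q40 and "#" to Q2, so a 10-nt read with eight "I" and two "#" characters sits exactly at the 80% threshold and is kept.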
We used R (v3.6.0) for further data processing with the following packages: tidyverse (v1.2.1), biomaRt (v2.40.4) [56, 57], and broom (v0.5.2). For differential expression analysis, we used limma-voom (v3.40.6) on a filtered matrix retaining miRNAs with at least 10 reads per million (RPM) over all samples.
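The RPM filter can be illustrated with a short Python sketch (the analysis itself was done in R, and limma-voom is not reproduced here). Two points are assumptions because the text does not specify them: RPM is computed against each sample's total count in the table, and a miRNA must reach the threshold in every sample to be kept.

```python
# Minimal sketch of an RPM-based expression filter.
# ASSUMPTIONS (not stated in the text): RPM is computed against the total
# counts of each sample, and a miRNA must reach the threshold in every
# sample to be kept.


def to_rpm(counts: dict) -> dict:
    """Convert raw counts to reads per million for one sample."""
    total = sum(counts.values())
    return {name: n * 1e6 / total for name, n in counts.items()}


def filter_by_rpm(samples: dict, min_rpm: float = 10.0) -> list:
    """Return miRNAs whose RPM is >= min_rpm in all samples."""
    rpm = {sample: to_rpm(counts) for sample, counts in samples.items()}
    names = set().union(*(counts.keys() for counts in samples.values()))
    return sorted(
        name for name in names
        if all(values.get(name, 0.0) >= min_rpm for values in rpm.values())
    )
```

A miRNA absent from a sample is treated as 0 RPM in that sample. A more permissive variant would threshold the mean or summed RPM instead.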
Mitochondrial Ribosomal RNA Blocking in Cell Lysates
Cell Culture and RNA Extraction
We used HEK293T cells that were grown in RPMI 1640 medium with GlutaMAX supplement (Thermo Fisher, Waltham, MA, USA) supplemented with 10% fetal calf serum (Merck, Germany) and were lysed with SingleShot lysis buffer (Bio-Rad, United States of America).
MtRNA LNA Design
From previous experiments, we identified three transcripts without a poly(A) tail that are abundant (0.1–2% of all counts) in 3′ end sequencing data of HEK293T cells: MT-RNR1, MT-RNR2, and RNA45S. We visually inspected the RNA sequencing data using IGV_2.7.2 and confirmed the presence of an adenosine-rich region flanking the abundant fragments observed in the sequencing library. For MT-RNR2, two different fragments were associated with an internal poly(A) stretch, contributing to the high number of gene counts. We investigated a design region of about 50 bases overlapping the abundant fragments and used Bowtie (v1.2.3) to map several putative 16-base LNA sequences. We retained the oligonucleotides with the fewest off-target hits and then checked their binding capacities and biochemical characteristics. Sequences are available in Supplemental Table 1.
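The adenosine-rich regions that make non-polyadenylated transcripts appear in 3′ end libraries can also be located programmatically rather than only by visual inspection. The sketch below is a hypothetical helper, not part of the described workflow: it scans a sense-strand sequence for windows containing many adenosines (potential internal oligo(dT) priming sites). The window size (10 nt) and threshold (at least 7 A) are illustrative assumptions.

```python
# Minimal sketch: flag adenosine-rich windows that could act as internal
# oligo(dT) priming sites. The window size (10 nt) and threshold (>= 7 A)
# are HYPOTHETICAL defaults, not values from the article.


def a_rich_regions(seq: str, window: int = 10, min_a: int = 7) -> list:
    """Return merged 0-based, half-open (start, end) intervals covered by
    windows containing at least `min_a` adenosines."""
    seq = seq.upper()
    regions = []
    for start in range(len(seq) - window + 1):
        if seq[start:start + window].count("A") >= min_a:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Overlapping or adjacent qualifying window: extend the region.
                regions[-1] = (regions[-1][0], max(regions[-1][1], end))
            else:
                regions.append((start, end))
    return regions
```

The sequence must be given on the transcript's sense strand (reverse-complement it first for minus-strand genes). Reported intervals span whole qualifying windows, so they can extend a few bases beyond the A stretch itself.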
We combined four different LNAs (MT-RNR2_1, MT-RNR2_2, MT-RNR1, and RNA24S) into a final solution containing each LNA at 25 μM (100x). We mixed 2 μl of this LNA solution with 3 μl of RNA sample. From this mixture, we used 2.5 μl as input for the library preparation.
For the library preparation, we used the QuantSeq 3′ mRNA-Seq Library Prep Kit FWD for Illumina (Lexogen, Austria). We performed the ‘low input’ version of the protocol.
We sequenced the libraries using the NovaSeq 6000 SP Reagent Kit v1.5 (100 cycles) (Illumina, San Diego, CA, USA) on a NovaSeq 6000 (Illumina, San Diego, CA, USA) instrument at a concentration of 300 pM with 10% PhiX.
We used BBMap (v38.26) to trim off the poly(A) tails and adapter sequences and to perform quality trimming. Next, all FASTQ files were subsampled to 2,000,000 reads with Seqtk (v1.3) and mapped to the hg38 genome using STAR (v2.6.0). We used SAMtools (v1.9) to count the reads mapping to the LNA-targeted genomic regions and htseq-count (v0.11.0) to generate the overall counts. We used FastQC (v0.11.9) to assess read quality before initial trimming, before quality trimming, and after quality trimming.
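Counting reads in an LNA-targeted region comes down to testing whether each alignment's reference span overlaps the region. The stdlib-only sketch below illustrates that logic on SAM text; it is not SAMtools itself. Skipping secondary and supplementary records is a choice made here, and any chromosome name and coordinates passed to it are hypothetical.

```python
import re

# Minimal sketch: count alignments overlapping a genomic region from SAM text.
# Not SAMtools itself; coordinates are 1-based and inclusive, as in SAM.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that consume reference bases
UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800


def reference_end(pos: int, cigar: str) -> int:
    """Last reference position (1-based, inclusive) covered by an alignment."""
    span = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return pos + span - 1


def count_overlapping(sam_lines, chrom: str, start: int, end: int,
                      skip_flags: int = UNMAPPED | SECONDARY | SUPPLEMENTARY) -> int:
    """Count mapped, primary alignments on `chrom` overlapping [start, end]."""
    n = 0
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # header or blank line
        fields = line.rstrip("\n").split("\t")
        flag, rname, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & skip_flags or rname != chrom or cigar == "*":
            continue
        if pos <= end and reference_end(pos, cigar) >= start:
            n += 1
    return n
```

Soft clips (S) and insertions (I) do not extend the reference span, whereas deletions (D) and spliced gaps (N) do. This matters for long STAR alignments that cross introns.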
Mitochondrial Ribosomal RNA Blocking in Direct-cDNA Long-Read Sequencing
Cell Culture and Harvesting
We cultured HEK293T cells in RPMI medium supplemented with 10% fetal calf serum to 80% confluence in a T75 flask. The cells were washed with 2 ml of versene and incubated with 2 ml of trypsin for 3 minutes at 37 °C. We neutralized the mixture with 8 ml of fresh medium, centrifuged for 5 minutes at 2000 rcf at 4 °C, and removed the supernatant. We resuspended the cells in 1 ml of QIAzol and flash-froze the mixture in liquid nitrogen.
RNA Extraction and Quality Control
We extracted RNA using the RNeasy Micro Kit (Qiagen, Hilden, Germany, 217184) according to the manufacturer’s protocol. We checked the quality of the RNA (RQN = 10) using a Fragment Analyzer (Agilent, United States of America).
We combined four different LNAs (MT-RNR2_1, MT-RNR2_2, MT-RNR1, and RNA24S) into a final solution containing each LNA at 25 μM. We then made a 10-fold dilution series to obtain three LNA solutions: LNA1x (0.25 μM), LNA10x (2.5 μM), and LNA100x (25 μM). For each library preparation, 2 μg of total RNA was mixed with 2 μl of the corresponding LNA dilution. As a non-treated control (LNA0x), 1 μl of RNase-free water was added to 2 μg of total RNA. The samples were placed on ice for 5 minutes.
We prepared direct-cDNA libraries using the SQK-DCS109 Kit (Oxford Nanopore Technologies, United Kingdom). We followed the manufacturer's protocol except for the following changes: the RNA-bead binding steps were performed for 5 minutes on a Hula Mixer and 5 minutes on the bench at room temperature; the RNA elution steps were performed for 5 minutes at 37 °C and 5 minutes on a Hula Mixer at room temperature; and 300 μl of 80% ethanol was used for the bead wash steps.
Oxford Nanopore Sequencing
We sequenced each library using two Flongle Flow Cells (Oxford Nanopore Technologies, United Kingdom) with a MinION Sequencer (Oxford Nanopore Technologies, United Kingdom). Sequencing was either stopped after 24 hours or when no more pores were available.
We basecalled the raw fast5 files using Guppy (v3.5.2) on a GPU. We grouped reads per sample and used Pychopper (v2.3.1) to identify full-length transcripts containing both primer sequences. We mapped the reads with Minimap2 (v2.11) and extracted reads mapping to the target fragment location using SAMtools (v1.11). We then used NanoComp (v1.12.0) to check the read length and quality of each sample.
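NanoComp reports per-sample read length and quality summaries. The sketch below illustrates two such metrics in plain Python. The first is a mean read quality computed by averaging per-base error probabilities and then converting back to the Phred scale; this is one common convention and is assumed here, since NanoComp's exact implementation is not reproduced. The second is the read-length N50. Phred+33 encoding is assumed.

```python
import math

# Minimal sketch of two read-level summary metrics (not NanoComp itself).
# Assumes Phred+33 encoded quality strings.


def mean_read_quality(qual: str, offset: int = 33) -> float:
    """Average per-base error probabilities, then convert back to Phred."""
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def n50(lengths: list) -> int:
    """Length L such that reads of length >= L contain half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("n50 of an empty list is undefined")
```

Averaging in probability space means a few low-quality bases pull the read quality down more than a plain arithmetic mean of Phred scores would. For example, one Q10 and one Q30 base give about Q13 rather than Q20.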
MALAT1 Blocking in Single-Cell 3′ End Sequencing for Peripheral Blood Mononuclear Cells (PBMCs)
We collected whole blood in EDTA tubes. The blood was transferred to Leucosep filter tubes (Greiner Bio-One) containing 15 ml of Ficoll Paque Plus (Cytiva, Washington, D.C., USA, 17144002) and diluted (1:2) with an equal volume of 1X DPBS (Thermo Fisher, Waltham, MA, USA, 14190144). We centrifuged the samples at room temperature for 18 minutes at 800 rcf and collected the PBMCs from the resulting buffy coat. The PBMCs were centrifuged and washed twice with 1X DPBS (Thermo Fisher, Waltham, MA, USA, 14190144). We took a sample for counting and assessed cell viability and concentration with a Neubauer chamber, counting at least two different squares. PBMCs were then resuspended in freezing mix (complete medium (RPMI + 1% pen/strep + 10% FCS) + 10% DMSO) in cryovials, with no more than 10 million cells per vial. The vials were first stored at − 80 °C inside a freezing container for 24 h and then at − 150 °C. We thawed the vials just before live/dead sorting.
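The Neubauer counts translate into a concentration, a viability, and a number of cryovials through standard haemocytometer arithmetic. The sketch below assumes counts from the large 1 mm × 1 mm squares (0.1 μl each) and a 1:1 trypan blue dilution. Neither assumption is stated in the text, and the example counts and volume are hypothetical.

```python
import math

# Minimal sketch of haemocytometer arithmetic. ASSUMPTIONS (not stated in the
# text): counts come from the large 1 mm x 1 mm squares (0.1 µl each) and
# cells were mixed 1:1 with trypan blue (dilution factor 2).


def cells_per_ml(counts_per_square: list, dilution_factor: float = 2.0) -> float:
    """Mean count per large square x dilution factor x 10^4 (squares per ml)."""
    mean_count = sum(counts_per_square) / len(counts_per_square)
    return mean_count * dilution_factor * 1e4


def viability(live_counts: list, dead_counts: list) -> float:
    """Fraction of counted cells that are live."""
    live, dead = sum(live_counts), sum(dead_counts)
    return live / (live + dead)


def vials_needed(live_per_ml: float, volume_ml: float,
                 max_cells_per_vial: float = 10e6) -> int:
    """Cryovials required to keep each vial at or below `max_cells_per_vial`."""
    return math.ceil(live_per_ml * volume_ml / max_cells_per_vial)


live, dead = [50, 54], [2, 2]  # hypothetical counts in two squares
conc = cells_per_ml(live)
print(conc, round(viability(live, dead), 3), vials_needed(conc, volume_ml=20))
```

With these hypothetical counts, the suspension holds 1.04 × 10^6 live cells/ml at about 96% viability. A 20 ml suspension would then need three vials to respect the 10-million-cell limit.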
MALAT1 LNA Design
After visually inspecting 3′ end sequencing data from PBMCs using IGV_2.7.2, we identified the optimal design space (Supplemental Fig. 8). We identified two internal poly(A) sequences contributing to the high number of counts. Next, we designed and characterized the best LNA sequences following steps similar to those described above (see ‘MtRNA LNA Design’), but with a length of 18 nucleotides. The sequences are available in Supplemental Table 1.
We diluted the LNAs to 125 μM, of which 2 μl was used. This concentration is higher than in the YRNA experiment, as we expected the total RNA concentration to be higher in this experiment. For the pre-RT blocking, 2 μl of the oligonucleotide mix was added to the master mix (including the RT reagent, template switching oligo, Reducing Agent B, and RT Enzyme C). The master mix was then combined with the cell suspension to a total volume of 80 μl. For the pre-cDNA amplification blocking, we added 2 μl of the oligonucleotide mix to the cDNA amplification mix (including Amp Mix and cDNA primers).
Sorted single-cell suspensions were resuspended in PBS + 0.04% BSA at an estimated final concentration of 1000 cells/μl and loaded on a Chromium GemCode Single Cell Instrument (10x Genomics, Pleasanton, CA, USA, 1000204) with Chip G (10x Genomics, Pleasanton, CA, USA, #2000177) to generate single-cell gel beads-in-emulsion (GEMs). We prepared the scRNA-seq libraries using the GemCode Single Cell 3′ Gel Bead and Library kit, version NextGEM 3.1 (10x Genomics, Pleasanton, CA, USA, PN-1000121) according to the manufacturer’s instructions.
The Chromium libraries were pooled equimolarly and loaded on a NovaSeq 6000 (Illumina, San Diego, CA, USA) instrument in standard mode at a final loading concentration of 340 pM with 2% PhiX. We obtained a total of 952 M reads with a Q30 of 91.32% using a NovaSeq 6000 SP Reagent Kit (100 cycles) (Illumina, San Diego, CA, USA, 20028401). The numbers of (pre-filtered) cells per experiment were highly comparable: 13,841 cells for the noLNA sample, 13,279 cells for pre-RT, and 13,893 cells for post-RT. The FASTQ files were subsampled based on the number of cells to obtain a comparable number of reads per cell across all samples.
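Subsampling to a comparable number of reads per cell amounts to scaling every sample down to the reads-per-cell of the shallowest one. The sketch below takes the cell numbers from the text. The per-sample read totals are hypothetical, since only the combined 952 M reads are reported.

```python
# Minimal sketch: per-sample subsampling fractions that equalise reads per cell.
# Cell numbers are taken from the text; the read totals are HYPOTHETICAL.


def subsampling_fractions(reads: dict, cells: dict) -> dict:
    """Fraction of reads to keep per sample so that every sample ends up with
    the reads-per-cell of the shallowest sample."""
    reads_per_cell = {s: reads[s] / cells[s] for s in reads}
    target = min(reads_per_cell.values())
    return {s: target / rpc for s, rpc in reads_per_cell.items()}


cells = {"noLNA": 13_841, "preRT": 13_279, "postRT": 13_893}
reads = {"noLNA": 300e6, "preRT": 320e6, "postRT": 310e6}  # hypothetical
fractions = subsampling_fractions(reads, cells)
for sample, fraction in fractions.items():
    print(f"{sample}: keep {fraction:.3f} of reads")
```

A fraction like this can then be handed to a read subsampler such as seqtk. For paired R1/R2 files, the same random seed must be used on both files so that mates stay together.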
Demultiplexing of the bcl files was performed with cellranger mkfastq (v6.0.1), after which gene counts per cell were obtained with cellranger count (v6.0.1).
The count matrices were loaded into R (v4.1.0) and further processed, including integration and annotation, with Seurat (v4.0.3). We did not filter the cells. We analyzed and visualized the data using tidyverse (v1.2.1).
LNA Blocking Simulation in Whole Blood 3′-End Sequencing
We downloaded one of the whole blood 3′-end RNA sequencing (QuantSeq) samples generated by Uellendahl-Werth et al. (SRR11028518). This sample had a sequencing depth of 18,043,131 reads.
We used BBMap (v38.26) to trim off the poly(A) tails and adapter sequences and to perform quality trimming. Next, we mapped the trimmed reads to the hg38 genome using STAR (v2.6.0). We used htseq-count (v0.11.0) to quantify the uniquely mapped reads. We used FastQC (v0.11.9) to assess read quality before and after quality trimming.
All simulations were run in R (v4.1.0). First, we generated the sampling distribution by removing the ENSG00000244734 (HBB) reads and calculating the fraction of reads assigned to each gene relative to the total number of reads. We used this distribution to guide the subsampling. We then subsampled the count tables for varying total numbers of counts (0.5 M, 1 M, 2 M, 4 M, 8 M), initial HBB abundances (0–90%, in 10% increments), and percentages of depletion (0–100%, in 2% increments). Next, we calculated the number of genes with at least 10 counts. Finally, we analyzed and visualized the results using tidyverse (v1.2.1).
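The simulations were run in R. The sketch below re-expresses the idea in Python under stated assumptions. Reads are drawn from a multinomial distribution over HBB plus the HBB-free background fractions. "Depletion" removes that proportion of HBB reads before renormalization. A non-HBB gene counts as detected at 10 or more counts. The toy background in the usage example is hypothetical, not the SRR11028518 count table.

```python
import random
from collections import Counter

# Minimal sketch of the HBB depletion simulation (a Python re-expression, not
# the original R code). `background` holds HBB-free counts per gene and must
# not contain an "HBB" key.


def hbb_share_after_depletion(hbb_fraction: float, depletion: float) -> float:
    """Fraction of reads still assigned to HBB after removing `depletion`
    (0-1) of the HBB reads and renormalising."""
    if not 0 <= hbb_fraction < 1:
        raise ValueError("hbb_fraction must be in [0, 1)")
    remaining = hbb_fraction * (1 - depletion)
    return remaining / (remaining + (1 - hbb_fraction))


def detected_genes(background: dict, total_counts: int, hbb_fraction: float,
                   depletion: float, min_count: int = 10, seed: int = 0) -> int:
    """Draw `total_counts` reads (multinomial) and return the number of
    non-HBB genes with at least `min_count` counts."""
    rng = random.Random(seed)
    hbb_share = hbb_share_after_depletion(hbb_fraction, depletion)
    bg_total = sum(background.values())
    labels = ["HBB"] + list(background)
    weights = [hbb_share] + [(1 - hbb_share) * n / bg_total
                             for n in background.values()]
    counts = Counter(rng.choices(labels, weights=weights, k=total_counts))
    return sum(1 for gene in background if counts[gene] >= min_count)


if __name__ == "__main__":
    # Hypothetical toy background: 2,000 genes with uneven counts.
    toy_background = {f"gene{i}": (i % 50) + 1 for i in range(2000)}
    for hbb in (0.0, 0.5, 0.9):
        for dep in (0.0, 0.5, 1.0):
            n = detected_genes(toy_background, 100_000, hbb, dep)
            print(f"HBB {hbb:.0%}, depletion {dep:.0%}: {n} genes >= 10 counts")
```

Looping over the full grid in the text (five depths × ten HBB levels × 51 depletion levels) works the same way. For millions of reads per draw, a vectorised multinomial sampler would be considerably faster than `random.choices`.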
1X DPBS (Thermo Fisher, Waltham, MA, USA, 14190144)
RPMI 1640 medium with GlutaMAX supplement (Thermo Fisher, Waltham, MA, USA, 61870010)
10% Fetal calf serum (Merck, Germany, F0804-500ML)
Chromium GemCode Single Cell Instrument (10x Genomics, Pleasanton, CA, USA, 1000204)
Direct cDNA sequencing kit (Oxford Nanopore Technologies, UK, SQK-DCS109)
Eppendorf Protein LoBind microcentrifuge tubes (Eppendorf, Hamburg, Germany, Z666548)
Ficoll Paque Plus (Cytiva, Washington, D.C., USA, 17144002)
Flongle Flow cell (Oxford Nanopore Technologies, UK, FLO-FLG001)
Fragment Analyzer RNA Kit (Agilent, USA, DNF-471-0500)
GemCode Chip G (10x Genomics, Pleasanton, CA, USA, 2000177)
GemCode Single Cell 3′ Gel Bead and Library kit, version NextGEM 3.1 (10x Genomics, Pleasanton, CA, USA, PN-1000121)
KAPA Library Quantification Kit (Roche Diagnostics, Diegem, Belgium, KK4854)
MinION sequencer (Oxford Nanopore Technologies, UK, MIN-101B)
miRNeasy Serum/Plasma Kit (Qiagen, Hilden, Germany, 217184)
NextSeq 500 High Output Kit v2.5 (75 cycles) (Illumina, San Diego, CA, USA, 20024906)
NextSeq 500 Sequencing System (Illumina, San Diego, CA, USA, SY-415-1001)
NovaSeq 6000 Sequencing System (Illumina, San Diego, CA, USA, 20012850)
NovaSeq 6000 SP Reagent Kit v1.5 (100 cycles) (Illumina, San Diego, CA, USA, 20028401)
Pippin Prep (Sage Science, Beverly, MA, USA, PIP0001)
QuantSeq 3′ mRNA-Seq Library Prep Kit FWD for Illumina (Lexogen, Austria, 139.96)
RNeasy Micro Kit (Qiagen, Hilden, Germany, 217184)
SingleShot lysis buffer (Bio-Rad, United States of America, 1725080)
TruSeq small RNA library prep sequencing kit (Illumina, San Diego, CA, USA, RS-200)
Vacutainer Hemogard Closure Plastic K2-Edta Tube, 10 ml, (BD, Franklin Lakes, NJ, USA, 367525)
Vacutainer Push blood collection set (BD, Franklin Lakes, NJ, USA, 368657)
HEK293T (ATCC, Manassas, VA, USA)
To prevent the incorporation of unwanted (fragments of) transcripts into RNA sequencing libraries, we reasoned that LNA-modified oligonucleotides bound downstream of the priming site would block reverse transcription and PCR amplification because of their extremely high affinity for RNA and cDNA. The design of blocking LNA oligonucleotides depends on the characteristics of the unwanted RNA sequence and the library prep procedure. We selected four library prep procedures and defined highly abundant, mostly unwanted target RNA sequences for LNA oligonucleotide design (Fig. 1A). These targets include YRNA in small RNA sequencing libraries from human blood plasma, mitochondrial rRNA in 3′ end sequencing libraries and long-read sequencing libraries of HEK293T cells, and MALAT1 in single-cell 3′ end sequencing libraries of PBMCs. To block RT and PCR of short fragments like YRNA in small RNA-seq libraries, we designed a 16-nucleotide LNA oligonucleotide complementary to the 3′ end of the 32-nucleotide YRNA fragment (Fig. 1B). For longer fragments, like mitochondrial rRNA and MALAT1, the LNA oligonucleotide was designed to bind directly downstream of the poly(A) RT-priming site (Fig. 1B-F). As the LNA oligonucleotides are added directly to the RT reaction (see Material and Methods for details on each protocol), no additional steps are required in the RNA library prep protocol. Because the LNA remains present during the PCR steps, it also inhibits amplification of any remaining fragments (Fig. 1G).
YRNA Blocking in Plasma Samples for TruSeq Small RNA Sequencing
Efficient Blocking of RNY4 in PRP and PFP
We first focused on blocking RT and amplification of RNY4 fragments in human blood plasma small RNA-seq libraries. We tested the blocking efficiency on platelet-rich plasma (PRP) and platelet-free plasma (PFP) from healthy donors, with PRP having the highest fraction of RNY4 fragments (PFP: 29.74%, PRP: 73.16%). We then spiked two different concentrations of a blocking LNA oligonucleotide (0.25 μM and 2.5 μM, referred to as LNA1x and LNA10x, respectively) into the RT reaction of the TruSeq small RNA library prep and compared the results to the standard workflow. When adding LNA1x to the RT reaction, we observed only 0.09% RNY4 reads in PFP and up to 0.16% in PRP (Fig. 2A), corresponding to a 477- and 468-fold reduction, respectively, compared to the standard protocol. Increasing the LNA concentration 10-fold (LNA10x) provided no benefit over LNA1x, with a 228-fold and 262-fold reduction of RNY4. The strong reduction in RNY4 fragments was accompanied by a strong increase in the fraction of microRNA reads, from 49.55% to 79.67% for PFP and from 17.24% to 74.61% for PRP. Since the LNA1x condition resulted in a sufficient reduction in YRNA and an increase in the microRNA read fraction, we used the 1x concentration (0.25 μM) for the subsequent experiments unless specified otherwise.
RNY4 Blocking Increases microRNA Coverage and Preserves Fold Changes
We then evaluated the reproducibility of our RNY4 blocking protocol by comparing miRNA abundance between technical library preparation replicates. Reproducibility upon RNY4 blocking was similar to that of the standard workflow, as evidenced by similar Pearson (0.999–1) and Spearman (0.70–0.78 for PFP and 0.81–0.82 for PRP) correlation coefficients (Fig. 2B). To investigate whether the abundance of miRNAs is affected by RNY4 blocking (due to off-target effects), we compared miRNA abundance (reads per million) between the RNY4 blocking procedure and the standard workflow. Only one miRNA showed a high standardized residual (residual divided by the standard deviation of the residuals; > 2) in all samples and replicates. This miRNA (miR-106b-3p) showed a consistently lower abundance in the LNA1x libraries compared to the control. Of note, we did not observe any sequence similarity between the RNY4 fragment and miR-106b-3p, suggesting that non-specific binding of the LNA is unlikely. In general, miRNA expression correlations between the standard protocol and the LNA1x spike protocol (Fig. 2C) were comparable to those of technical replicates (Pearson correlation = 1.00, Spearman correlation = 0.67–0.72 for PFP and 0.81 for PRP). In PFP, the impact of RNY4 blocking on the number of detected miRNAs was limited, with only nine additional miRNAs detected after subsampling for library size correction (Fig. 2D). However, the coverage of the detected miRNAs increased (Supplemental Fig. 1B). As expected, the uniquely detected miRNAs were of low abundance (Supplemental Fig. 1A&B). In PRP, we detected 183 additional miRNAs with the LNA1x spike protocol. All but three miRNAs detected with the standard protocol were also detected with the LNA1x spike protocol (Fig. 2E, Supplemental Fig. 1C). RNY4 blocking not only increases the number of detected miRNAs but also increases miRNA coverage (a 3-fold median RPM increase) (Supplemental Fig. 1D).
Taken together, RNY4 blocking in blood plasma small RNA sequencing libraries improves miRNA library complexity and coverage.
We then assessed the impact of RNY4 blocking on differential miRNA abundance between samples. To this end, we examined miRNA fold changes between PFP samples from patients with diverse tumor types (colorectal cancer, CRC (n = 4); prostate adenocarcinoma, PRAD (n = 4); and lung adenocarcinoma, LUAD (n = 4)) that were processed with both the standard and the LNA1x spike protocol (Supplemental Fig. 2). Differences in miRNA abundance between cancer types were highly concordant between both methods (Fig. 3A). To further assess the impact of RNY4 depletion on the robustness of differential expression analysis, we repeatedly subsampled reads to various sequencing depths and determined the concordance of differential miRNAs (i.e., the number of shared differential miRNAs) between subsamples. At high sequencing depth (7 M reads), the concordance was 100% for both methods. At lower sequencing depths, however, the concordance was higher with RNY4 blocking (Fig. 3B): at 0.7 M reads, the average concordance of four random subsamples was 64.9% with RNY4 blocking and 60.0% without. This observation suggests that, at shallow sequencing depth, RNY4 blocking increases the robustness of differential miRNA analysis.
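The subsampling-based concordance analysis can be sketched as below. The text does not specify how concordance was normalized; this sketch assumes it is the mean pairwise ratio of shared to combined differential miRNAs (a Jaccard index) across subsamples, and it leaves the differential expression test itself out of scope. All names are hypothetical.

```python
import bisect
import itertools
import random

def subsample(counts, depth, rng):
    """Draw `depth` reads without replacement from a library given as
    {feature: read_count}; returns the subsampled read counts."""
    names = list(counts)
    cumulative = list(itertools.accumulate(counts[n] for n in names))
    sub = dict.fromkeys(names, 0)
    # Each read is an integer index; feature i owns the indices in
    # [cumulative[i-1], cumulative[i]).
    for read in rng.sample(range(cumulative[-1]), depth):
        sub[names[bisect.bisect_right(cumulative, read)]] += 1
    return sub

def concordance(de_sets):
    """Mean pairwise overlap (shared / union) between the sets of
    differential features called in independent subsamples."""
    pairs = list(itertools.combinations(de_sets, 2))
    scores = [len(a & b) / len(a | b) if (a | b) else 1.0 for a, b in pairs]
    return sum(scores) / len(scores)

rng = random.Random(42)
library = {"miR-a": 5000, "miR-b": 3000, "RNY4": 20000}
print(subsample(library, 1000, rng))
print(concordance([{"miR-1", "miR-2", "miR-3"}, {"miR-1", "miR-2"}, {"miR-1", "miR-4"}]))
```

In a full analysis, each subsample would be passed through the same differential expression pipeline, and `concordance` would be applied to the resulting sets at each depth.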
LNA Is the Most Efficient Modification to Block Library Incorporation
As fully modified LNA oligonucleotides are relatively expensive, we evaluated in PRP the RNY4 blocking potency of cheaper sugar modifications known to improve oligonucleotide binding affinity, such as 2′-O-methyl (2′OME) and 2′-O-methoxyethyl (2′MOE). The LNA-modified RNY4 oligonucleotide (median reduction of 6.45-fold) was as efficient as a 2′MOE-modified oligonucleotide (median reduction of 6.11-fold; p = 0.14). The 2′OME modification, however, was less efficient, reducing RNY4 by only 1.22-fold (p = 0.005 compared to the LNA modification). In addition, we investigated the potency of partially modified (i.e., every other nucleotide) LNA, 2′OME, and 2′MOE oligonucleotides. The partially modified LNA RNY4 oligonucleotide was as potent as the fully modified LNA RNY4 oligonucleotide (RNY4 fold reduction of 6.90, p = 0.383) and still outperformed the fully modified 2′OME RNY4 oligonucleotide (p = 0.0005) (Fig. 3C). The partially modified 2′OME and 2′MOE oligonucleotides performed worst (Fig. 3C).
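To make the difference between fully and partially (every other nucleotide) modified oligonucleotides concrete, the helper below annotates a sequence with a per-base modification prefix. The `+` prefix mimics the shorthand some oligonucleotide vendors use for LNA bases, but notation differs between suppliers, so treat it as an assumption to check against the supplier's ordering guidelines. The sequence is an arbitrary placeholder, not the RNY4 blocker, and which position carries the first modification is not specified in the text, hence the hypothetical `offset` parameter.

```python
def modification_pattern(sequence, prefix, every=1, offset=0):
    """Annotate an oligonucleotide with a per-base modification prefix.

    every=1 modifies all bases ("fully modified"); every=2 modifies every
    other base ("partially modified"), starting at position `offset`.
    """
    tokens = []
    for i, base in enumerate(sequence.upper()):
        modified = i >= offset and (i - offset) % every == 0
        tokens.append(prefix + base if modified else base)
    return "".join(tokens)

placeholder = "ACGTACGTAC"  # hypothetical sequence, not an actual blocker
print(modification_pattern(placeholder, "+"))           # fully modified
print(modification_pattern(placeholder, "+", every=2))  # → +AC+GT+AC+GT+AC
```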
rRNA Blocking in 3′ End Sequencing Data
During reverse transcription, oligo(dT) primers can bind internal poly(A) sequences of mitochondrial and nuclear ribosomal RNA species, which are then incorporated into the RNA sequencing library (up to 2% of all reads in previous sequencing data; Fig. 1C-E). Although the abundance of MT-rRNA in these data is not necessarily problematic, it provides a good test case for eliminating multiple transcript fragments in a poly(A)-primed library preparation procedure. Therefore, we designed fully modified LNA oligonucleotides to inhibit the reverse transcription of three mitochondrial rRNA fragments (MT-RNR1 and two fragments from MT-RNR2) and one fragment from the nuclear rRNA RNA45S (Fig. 1). We added all four oligonucleotides to the RT reaction of a 3′ end library preparation on eight cell lysates and compared the data to those of a standard 3′ end library preparation workflow. Adding LNA oligonucleotides reduced counts per million (CPM) on average 16.2-fold, 19.2-fold, 8.6-fold, and 3.2-fold for RNA45S, MT-RNR1, MT-RNR2 fragment 1, and MT-RNR2 fragment 2, respectively (Fig. 4A). To evaluate the reproducibility of the method, we compared the abundance of all detected genes between biological replicates for both the standard and the LNA spike protocols. The Pearson (0.982–0.997) and Spearman (0.852–0.879) correlation coefficients were high for every comparison (Supplemental Fig. 3), and there was no significant difference in reproducibility between the standard and blocking protocols. To investigate whether blocking RNA45S and the MT-RNR1/2 fragments is beneficial for gene detection, we evaluated the number of detected genes at different sequencing depths (Fig. 4B). At shallow sequencing depth (1–2 million reads), more genes were detected with the blocking protocol than with the standard protocol.
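The fold reductions above compare the counts per million (CPM) of each targeted fragment between matched standard and blocked libraries. The sketch below is a minimal version that assumes the reported average is the mean of per-sample fold reductions; the text does not state whether ratios or CPM values were averaged. Gene names and counts are toy values.

```python
from statistics import mean

def cpm(counts):
    """Counts per million for one library ({gene: count})."""
    total = sum(counts.values())
    return {g: c * 1e6 / total for g, c in counts.items()}

def fold_reduction(standard, blocked, target):
    """CPM of `target` in the standard library divided by its CPM in the
    blocked library prepared from the same sample."""
    return cpm(standard)[target] / cpm(blocked)[target]

def mean_fold_reduction(pairs, target):
    """Average per-sample fold reduction over (standard, blocked) pairs."""
    return mean(fold_reduction(s, b, target) for s, b in pairs)

sample1 = ({"MT-RNR1": 2000, "GAPDH": 98000}, {"MT-RNR1": 100, "GAPDH": 99900})
sample2 = ({"MT-RNR1": 2000, "GAPDH": 98000}, {"MT-RNR1": 200, "GAPDH": 99800})
print(mean_fold_reduction([sample1, sample2], "MT-RNR1"))  # → 15.0
```

A target with zero counts in a blocked library would make the ratio undefined, so real data would need a pseudocount or an explicit rule for that case.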
Finally, we investigated the potential off-target effects of the blocking oligonucleotides by comparing gene expression values between the control and blocking protocol. Out of 12,077 detected genes, we identified two genes that showed divergent gene expression values for all biological replicates: MT-ATP8 and H4C3 (Fig. 4C). We did not observe significant sequence complementarity between the LNA oligonucleotides and these presumed off-targets. In conclusion, LNA oligonucleotides can efficiently and specifically block the incorporation of a variety of transcript fragments in 3′ end RNA-seq libraries.
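A simple way to screen for the kind of sequence complementarity discussed above is an ungapped scan for the target window with the fewest mismatches to the oligonucleotide's binding site (its reverse complement). This is an illustrative sketch, not the procedure used in the study: it ignores G·U wobble pairs, bulges, and the different duplex stabilities of LNA and DNA bases, and the sequences are made up.

```python
def reverse_complement(seq):
    """Reverse complement in the DNA alphabet (U is treated as T)."""
    return seq.upper().replace("U", "T").translate(str.maketrans("ACGT", "TGCA"))[::-1]

def best_binding_site(oligo, target):
    """Return (mismatches, position) of the ungapped target window most
    complementary to `oligo`, or None if the target is shorter than the oligo."""
    site = reverse_complement(oligo)
    target = target.upper().replace("U", "T")
    k = len(site)
    best = None
    for pos in range(len(target) - k + 1):
        mismatches = sum(a != b for a, b in zip(site, target[pos:pos + k]))
        if best is None or mismatches < best[0]:
            best = (mismatches, pos)
    return best

print(best_binding_site("AACGT", "GGGACGTTTT"))  # → (0, 3)
print(best_binding_site("AACGT", "GGGAUGUUUU"))  # → (1, 3)
```

Scanning a presumed off-target transcript this way and finding only windows with many mismatches would be consistent with the absence of significant complementarity reported above.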
rRNA Blocking for Long-Read polyA+ Transcript Sequencing
Additionally, we explored whether the rRNA (RNA45S, MT-RNR1, MT-RNR2) blocking strategy described above can also be applied to Oxford Nanopore Technologies (ONT) sequencing of poly(A)-primed cDNA libraries. More specifically, we performed direct-cDNA sequencing to investigate the blocking effect on the reverse transcription step alone. We added the rRNA LNA oligonucleotides used in the 3′ end library preparation at three concentrations (0.25 μM, 2.5 μM, and 25 μM, referred to as LNA1x, LNA10x, and LNA100x) to the reverse transcription reaction of four different samples. For all targeted fragments, counts per million decreased substantially with increasing LNA oligonucleotide concentration, except for 45S pre-ribosomal RNA in the LNA100x condition (Fig. 5). Unexpectedly, we also observed a mild but consistent shift towards shorter reads with increasing LNA oligonucleotide concentration (Supplemental Fig. 4), whereas read quality scores did not vary (Supplemental Fig. 5). These results show the potential of LNA oligonucleotides to prevent reverse transcription (and thus sequencing) of specific RNA molecules in ONT long-read sequencing experiments.
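The three concentrations form a 10-fold series, so the volume of oligonucleotide stock scales accordingly (C1·V1 = C2·V2). The stock concentration (100 μM) and reaction volume (20 μL) below are hypothetical values chosen for illustration, not the protocol's. The text also does not say whether the concentrations apply to each oligonucleotide or to the pooled mix.

```python
# Hypothetical values for illustration only: the stock concentration and RT
# reaction volume are NOT taken from the article.
STOCK_UM = 100.0     # assumed oligonucleotide stock concentration (uM)
REACTION_UL = 20.0   # assumed reverse transcription reaction volume (uL)

# Final concentrations reported in the text for LNA1x, LNA10x and LNA100x.
CONDITIONS_UM = {"LNA1x": 0.25, "LNA10x": 2.5, "LNA100x": 25.0}

def stock_volume_ul(final_um, reaction_ul=REACTION_UL, stock_um=STOCK_UM):
    """Volume of stock (uL) that brings the reaction to `final_um` (C1*V1 = C2*V2)."""
    if final_um > stock_um:
        raise ValueError("final concentration cannot exceed the stock concentration")
    return final_um * reaction_ul / stock_um

for name, conc in CONDITIONS_UM.items():
    print(f"{name}: {conc} uM -> {stock_volume_ul(conc):.2f} uL of stock")
```

With these assumed values, LNA1x would need only 0.05 μL of stock, which is too little to pipette accurately, so an intermediate working dilution would typically be prepared first.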
MALAT1 Blocking in Single-Cell 3′ End Sequencing of PBMCs
We finally evaluated whether our method is also applicable to single-cell RNA sequencing. More specifically, we designed two half-modified LNA oligonucleotides to block MALAT1 in single-cell 3′ end sequencing libraries of PBMCs. In PBMCs, MALAT1 can consume more than 40% of reads through priming of internal poly(A) stretches (Supplemental Fig. 6). The LNA oligonucleotides were added either before reverse transcription, which occurs in the gel beads-in-emulsion (GEMs) (pre-RT), or before cDNA amplification, when the GEMs are pooled (pre-PCR). Both protocols decreased the number of MALAT1 reads (6-fold for the pre-RT and 4-fold for the pre-PCR blocked libraries) (Fig. 6A). For some cell types, e.g., erythrocytes and regulatory T-cells, the initial MALAT1 proportions were higher, resulting in a more drastic reduction (Supplemental Fig. 7). We observed a higher fraction of mitochondrial-derived RNA in the pre-RT sample, which may indicate cell death (Fig. 6A); in this protocol, the LNAs are incubated with living cells for 18 min. We therefore focused our analysis on the pre-PCR protocol. A UMAP representation of cells based on single-cell RNA-seq data from both the pre-PCR blocking and standard protocols revealed tight clustering of cell types independent of protocol (Fig. 6B), implying that the MALAT1 LNA oligonucleotide in the pre-PCR protocol has minimal impact on gene expression. This was further supported by a perfect correlation (Spearman and Pearson correlation = 1.00) of gene expression values between the pre-PCR blocking and standard protocols (Supplemental Fig. 8). The mean number of detected genes (with at least two counts) per cell increased significantly, albeit with a small effect size, from 1173 in the control sample to 1192 in the pre-PCR blocking sample (t-test p = 5.751 × 10⁻⁵) (Fig. 6A).
The gain in gene detection sensitivity may depend on the initial MALAT1 read fraction, the number of genes detected in the cells, and the fraction of other highly abundant genes. The largest impact was seen in B memory cells, where the mean number of detected genes increased from 1131 to 1235 (9.2% increase, adjusted t-test p = 0.026).
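Per-cell gene detection and the t-test comparison can be sketched as follows. The text only reports "a t-test", so this sketch assumes Welch's unequal-variance variant and computes the statistic and Welch–Satterthwaite degrees of freedom with the standard library. A p-value would then come from the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`. The two-count detection threshold follows the text; the function names are hypothetical.

```python
import math
from statistics import mean, variance

def genes_detected(cell_counts, min_count=2):
    """Number of genes with at least `min_count` counts in one cell."""
    return sum(1 for c in cell_counts.values() if c >= min_count)

def welch_t(a, b):
    """Welch's t statistic for mean(b) - mean(a) and its approximate
    (Welch-Satterthwaite) degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(b) - mean(a)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Toy per-cell count tables ({gene: count}) for two hypothetical cells.
cells = [{"CD3E": 4, "MS4A1": 0, "LYZ": 2}, {"CD3E": 1, "MS4A1": 7, "LYZ": 3}]
print([genes_detected(c) for c in cells])
print(welch_t([1, 2, 3], [4, 5, 6]))
```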
LNA Blocking Simulation in Whole Blood 3′-End Sequencing
Although our wet-lab experiments indicate that blocking with high-affinity oligonucleotides can efficiently deplete transcripts of interest, it remains to be determined how the initial abundance and the level of depletion relate to the benefit obtained (such as increased library complexity). Some of our applications had a larger impact than others on the number of additionally detected genes and on coverage. We assume that this impact depends strongly on the initial fraction of targeted reads, the depletion efficiency, and the sequencing saturation. To investigate this in detail, we simulated different abundances and depletion efficiencies of beta-globin (HBB) using publicly available whole blood 3′-end sequencing data, in which HBB accounted for 20.8% of all reads. Figure 7 shows that the number of detected genes (with at least 10 counts) increases linearly with increasing depletion efficiency at shallow sequencing depth but exponentially at higher sequencing depths. The linear relation at low sequencing depths probably results from unsaturated sequencing. At higher sequencing depths, the relation becomes more linear as the initial unwanted fraction decreases. We conclude that even inefficient depletion of highly abundant transcripts provides a substantial gain in the number of detected genes.
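The depletion simulation can be approximated in a few lines. The text does not describe its simulation code; this sketch assumes that depletion removes a fraction of the target's reads before uniform subsampling without replacement, and it counts only non-target genes that reach the 10-count threshold. The toy library and all names are hypothetical.

```python
import bisect
import itertools
import random

def simulate_detection(counts, target, depletion, depth, min_count=10, seed=0):
    """Remove a fraction `depletion` (0..1) of the target's reads, subsample
    the library to `depth` reads without replacement, and return the number
    of non-target genes with at least `min_count` reads."""
    adjusted = dict(counts)
    adjusted[target] = round(counts[target] * (1 - depletion))
    names = list(adjusted)
    cumulative = list(itertools.accumulate(adjusted[n] for n in names))
    if depth > cumulative[-1]:
        raise ValueError("depth exceeds the number of reads left after depletion")
    rng = random.Random(seed)
    sub = dict.fromkeys(names, 0)
    for read in rng.sample(range(cumulative[-1]), depth):
        sub[names[bisect.bisect_right(cumulative, read)]] += 1
    return sum(1 for n, c in sub.items() if n != target and c >= min_count)

# Toy library: one dominant transcript plus 100 genes with 20 reads each.
library = {"HBB": 5000, **{f"gene{i}": 20 for i in range(100)}}
for efficiency in (0.0, 0.5, 0.9, 1.0):
    print(efficiency, simulate_detection(library, "HBB", efficiency, depth=1000))
```

Sweeping `depletion` and `depth` over a grid on real count data would give curves comparable in spirit to Fig. 7.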
We demonstrate that high-affinity binding oligonucleotides can be applied to block reverse transcription and PCR amplification of various RNA transcripts in different RNA-seq library preparation protocols. We present a flexible and robust method that can drastically increase the detection and coverage of (low-abundance) genes in the library. While LNA oligonucleotides have been used before to block PCR amplification, we provide evidence that such oligonucleotides can block both reverse transcription and PCR amplification, as indicated by clear fragment depletion in the PCR-independent Oxford Nanopore Technologies protocol and in the post-RT (pre-PCR) single-cell sample, respectively. Moreover, we demonstrate that the impact of blocking specific RNA fragments on gene detection and coverage strongly depends on the abundance of the blocked fragments, the transcriptome complexity, and the sequencing depth. For example, the impact of RNY4 blocking on gene (i.e., miRNA) detection and coverage was much larger for PRP than for PFP, most likely because of the higher RNY4 abundance in PRP. We expect a more pronounced impact in PFP samples at lower sequencing depths, a hypothesis supported by our simulated depletion experiments.
Our method has several advantages compared to existing protocols. First, the method only requires a single additional step that can be implemented in any RNA-seq library preparation workflow. Second, no nucleic acid sample or library material is lost because of enrichment or washing steps, which we believe has a positive impact on detection sensitivity, especially for low-input samples.
While we generally observe potent blocking of targeted transcripts, we also observe a few minor unwanted effects. First, in single-cell RNA sequencing, adding LNA oligonucleotides to the GEMs during 3′ end library preparation resulted in a higher fraction of mtRNA reads. As living cells are incubated with LNA oligonucleotides for 18 min, the oligonucleotides may enter the cells and induce cell death. A large fraction of mtRNA co-occurred with few detected genes. Since adding the LNA oligonucleotides after reverse transcription and before PCR (pre-PCR) also results in potent target blocking, we propose using this approach instead. Second, the optimal concentration of LNA oligonucleotide may be application- and target-dependent, so a dedicated optimization step is warranted for optimal performance. This necessity is reflected in the single-cell RNA sequencing experiment, where the benefit (in terms of the number of detected genes) depends on the cell type. Factors to consider are the original fraction of the targeted RNA transcript and the input RNA concentration of the library preparation protocol. We advise combining samples in a single library preparation to exclude batch effects, as is generally advised for RNA-seq experiments. Third, we observed a limited number of off-target effects upon adding specific LNA oligonucleotides (for instance, MT-ATP8 and H4C3 in the 3′ end sequencing experiment), although we did not observe significant sequence complementarity between the LNA oligonucleotides and these presumed off-targets. Nevertheless, off-target effects are not entirely unexpected given the relatively short length of the LNAs, their high RNA-binding affinity, and the small design space. The latter limits the number of possible oligonucleotides and thereby the chances of designing one without off-target effects. Increasing oligonucleotide length or reducing the number of LNA nucleotides to lower binding affinity may improve specificity.
A fourth limitation is that the method may only be applicable to small RNA sequencing or to RNA sequencing library preparation methods that employ an oligo(dT) or gene-specific RT primer. With random priming, a single LNA oligonucleotide cannot block reverse transcription of the whole fragment. One option would be to design multiple LNA oligonucleotides spanning the entire transcript, but this could become prohibitively expensive, depending on the length of the fragment. Fifth, LNA synthesis is costly. Nevertheless, the amount of oligonucleotide required for efficient blocking is limited: even at low synthesis scale, several hundred reactions can be performed, resulting in a limited per-sample cost. As fully and partially modified LNA oligonucleotides are equally efficient for YRNA depletion, partially modified LNA oligonucleotides could be used to further reduce synthesis cost (although additional validation would be required, as we only demonstrated this for a single RNA target sequence). Notably, blocking unwanted transcripts may also help reduce sequencing cost. Finally, the shortening of reads with increasing LNA concentration in the Oxford Nanopore Technologies experiment is problematic, as it suggests off-target binding of the LNAs. Although the LNA oligonucleotides are expected to preferentially bind sequences with fewer mismatches, the steady decline in coverage towards the 5′ end of RNA transcripts points to nearly non-specific binding of the LNA oligonucleotides when supplied at high concentration. The possibility that the LNA oligonucleotides inhibit sequencing by binding to the final library can be ruled out by examining the adaptor-to-adaptor reads (which signify complete sequencing of the read).
We believe the method presented here is versatile and can be used for applications not investigated here, including hemoglobin mRNA blocking in whole blood samples (up to 70% of all mRNA in whole blood) or trypsin mRNA blocking in pancreatic RNA samples. As we have shown, samples dominated by a few fragments are more likely to benefit from LNA oligonucleotide-based transcript blocking. We suggest that users perform an initial computational analysis to estimate the expected benefit before implementing and optimizing the method. While we only investigated mixtures of up to four different LNA oligonucleotides, more could be combined to block multiple fragments in one sample. Such mixtures can be designed specifically for unique and challenging sample types that contain several highly expressed, uninformative fragments.
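Such an initial analysis can start from simple arithmetic. Under the simplifying assumption that blocking only removes reads from the targeted transcripts and leaves the relative proportions of all other transcripts unchanged, the gain in informative reads at a fixed sequencing depth is as follows (f, k, N, and G are notation introduced here for illustration):

```latex
% f : fraction of reads from the targeted transcript(s) before blocking
% k : fold reduction of those reads achieved by blocking (k >= 1)
% N : fixed sequencing depth
\begin{aligned}
\text{informative reads before} &= (1-f)\,N, \qquad
\text{informative reads after} = \frac{(1-f)\,N}{(1-f) + f/k},\\
G(f,k) &= \frac{1}{(1-f) + f/k}, \qquad \lim_{k\to\infty} G(f,k) = \frac{1}{1-f},\\
G(0.70,\,10) &= \frac{1}{0.30 + 0.07} \approx 2.70, \qquad
G(0.208,\,10) = \frac{1}{0.792 + 0.0208} \approx 1.23.
\end{aligned}
```

The 70% and 20.8% values come from the text (treating the former as a read fraction is an approximation), while the 10-fold reduction is an illustrative choice. A gain in informative reads does not translate proportionally into detected genes, because that relation also depends on sequencing saturation, as the simulated HBB depletion (Fig. 7) shows.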
In conclusion, we present a novel and broadly applicable method to specifically block unwanted RNA transcripts during RNA sequencing library preparations by simply adding a target-specific high-affinity oligonucleotide to the RT or PCR reaction.
Availability of Data and Materials
The datasets generated and/or analysed during the current study are available in EGA, EGAS00001006023.
O’Neil D, Glowatz H, Schlumpberger M. Ribosomal RNA Depletion for Efficient Use of RNA-Seq Capacity. Curr Protoc Mol Biol. 2013:4.19.1–8 Available from: http://doi.wiley.com/10.1002/0471142727.mb0419s103. Cited 2020 Apr 22. Hoboken, NJ, USA: John Wiley & Sons, Inc.
Hendrick JP, Wolin SL, Rinke J, Lerner MR, Steitz JA. Ro small cytoplasmic ribonucleoproteins are a subclass of La ribonucleoproteins: further characterization of the Ro and La small ribonucleoproteins from uninfected mammalian cells. Mol Cell Biol. American Society for Microbiology. 1981;1:1138–49.
Rutjes SA, Van Der Heijden A, Utz PJ, Van Venrooij WJ, Pruijn GJM. Rapid nucleolytic degradation of the small cytoplasmic Y RNAs during apoptosis. J Biol Chem. American Society for Biochemistry and Molecular Biology. 1999;274:24799–807.
Nicolas FE, Hall AE, Csorba T, Turnbull C, Dalmay T. Biogenesis of Y RNA-derived small RNAs is independent of the microRNA pathway. FEBS Lett. 2012;586:1226–30 Available from: http://doi.wiley.com/10.1016/j.febslet.2012.03.026. Cited 2020 Mar 30. John Wiley & Sons, Ltd.
Meiri E, Levy A, Benjamin H, Ben-David M, Cohen L, Dov A, et al. Discovery of microRNAs and other small RNAs in solid tumors. Nucleic Acids Res. 2010;38:6234–46 Available from: http://www.agilent.com. Cited 2020 Apr 24.
Ishikawa T, Haino A, Seki M, Terada H, Nashimoto M. The Y4-RNA fragment, a potential diagnostic marker, exists in saliva. Noncoding RNA Res. 2017;2:122–8 KeAi Communications Co.
Dhahbi JM, Spindler SR, Atamna H, Boffelli D, Mote P, Martin DIK. 5′-YRNA fragments derived by processing of transcripts from specific YRNA genes and pseudogenes are abundant in human serum and plasma. Physiol Genomics. 2013;45:990–8 Available from: http://www.physiology.org/doi/10.1152/physiolgenomics.00129.2013. Cited 2019 May 28. American Physiological Society Bethesda, MD.
Ninomiya S, Kawano M, Abe T, Ishikawa T, Takahashi M, Tamura M, et al. Potential small guide RNAs for tRNase ZL from human plasma, peripheral blood mononuclear cells, and cultured cell lines. Costa-Rodrigues J, editor. PLoS One. 2015;10:e0118631 Available from: http://dx.plos.org/10.1371/journal.pone.0118631. Cited 2020 Apr 23. Public Library of Science.
Ninomiya S, Ishikawa T, Takahashi M, Seki M, Nashimoto M. Potential physiological roles of the 31/32-nucleotide Y4-RNA fragment in human plasma. Noncoding RNA Res. 2019;4:135–40 Available from: https://linkinghub.elsevier.com/retrieve/pii/S2468054019300447. Cited 2020 Mar 30. KeAi Communications Co.
Dhahbi JM, Spindler SR, Atamna H, Boffelli D, Martin DIK. Deep sequencing of serum small RNAs identifies patterns of 5′ tRNA half and YRNA fragment expression associated with breast cancer. Biomark Cancer. 2014;6:BIC.S20764 Available from: http://journals.sagepub.com/doi/10.4137/BIC.S20764. Cited 2019 Jun 14. SAGE Publications, Sage UK: London, England.
Yan Y, Wang X, Venø MT, Bakholdt V, Sørensen JA, Krogdahl A, et al. Circulating miRNAs as biomarkers for oral squamous cell carcinoma recurrence in operated patients. Oncotarget. 2017;8:8206–14 Impact Journals LLC.
Pang X, Zhou D, Song Y, Pei D, Wang J, Guo Z, et al. Bacterial mRNA Purification by Magnetic Capture-Hybridization Method. Microbiol Immunol. 2004;48:91–6 Available from: https://onlinelibrary.wiley.com/doi/full/10.1111/j.1348-0421.2004.tb03493.x. Cited 2022 Apr 21. John Wiley & Sons, Ltd.
Su C, Sordillo LM. A simple method to enrich mRNA from total prokaryotic RNA. Mol Biotechnol. 1998;10:83–5 Available from: https://link.springer.com/article/10.1007/BF02745865. Cited 2022 Apr 19. Springer.
Stewart FJ, Ottesen EA, Delong EF. Development and quantitative analyses of a universal rRNA-subtraction protocol for microbial metatranscriptomics. ISME J. 2010;4:896–907 Available from: https://www.nature.com/articles/ismej201018. Cited 2022 Apr 19. Nature Publishing Group.
McGrath KC, Thomas-Hall SR, Cheng CT, Leo L, Alexa A, Schmidt S, et al. Isolation and analysis of mRNA from environmental microbial communities. J Microbiol Methods. 2008;75:172–6 Elsevier.
Morlan JD, Qu K, Sinicropi DV. Selective depletion of rRNA enables whole transcriptome profiling of archival fixed tissue. PLoS One. 2012;7:e42882 Available from: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0042882. Cited 2022 Apr 19. Public Library of Science.
Benes V, Blake J, Doyle K. Ribo-Zero Gold Kit: improved RNA-seq results after removal of cytoplasmic and mitochondrial ribosomal RNA. Nat Methods. 2011;8:iii–v Available from: https://www.nature.com/articles/nmeth.f.352. Cited 2022 Apr 21. Nature Publishing Group.
Huang Y, Sheth RU, Kaufman A, Wang HH. Scalable and cost-effective ribonuclease-based rRNA depletion for transcriptomics. Nucleic Acids Res. 2020;48:E20 Available from: https://pubmed.ncbi.nlm.nih.gov/31879761/. Cited 2022 Apr 21.
Prezza G, Heckel T, Dietrich S, Homberger C, Westermann AJ, Vogel J. Improved bacterial RNA-seq by Cas9-based depletion of ribosomal RNA reads. RNA. 2020;26:1069–78 Available from: http://rnajournal.cshlp.org/content/26/8/1069.full. Cited 2022 Apr 15. Cold Spring Harbor Laboratory Press.
Gu W, Crawford ED, O’Donovan BD, Wilson MR, Chow ED, Retallack H, et al. Depletion of Abundant Sequences by Hybridization (DASH): Using Cas9 to remove unwanted high-abundance species in sequencing libraries and molecular counting applications. Genome Biol. 2016;17:1–13 Available from: https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0904-5. Cited 2022 Apr 15. BioMed Central Ltd.
Arnaud O, Kato S, Poulain S, Plessy C. Targeted reduction of highly abundant transcripts using pseudo-random primers. Biotechniques. 2016;60:169–74 Available from: https://www.future-science.com/doi/full/10.2144/000114400. Cited 2022 Apr 19. Eaton Publishing Company.
Armour CD, Castle JC, Chen R, Babak T, Loerch P, Jackson S, et al. Digital transcriptome profiling using selective hexamer priming for cDNA synthesis. Nat Methods. 2009;6:647–9 Available from: https://www.nature.com/articles/nmeth.1360. Cited 2022 Apr 21. Nature Publishing Group.
Bogdanova EA, Shagina IA, Mudrik E, Ivanov I, Amon P, Vagner LL, et al. DSN depletion is a simple method to remove selected transcripts from cDNA populations. Mol Biotechnol. 2009;41:247–53 Available from: https://link.springer.com/article/10.1007/s12033-008-9131-y. Cited 2022 Apr 19. Springer.
Yi H, Cho YJ, Won S, Lee JE, Jin YH, Kim S, et al. Duplex-specific nuclease efficiently removes rRNA for prokaryotic RNA-seq. Nucleic Acids Res. 2011;39 Available from: https://pubmed.ncbi.nlm.nih.gov/21880599/. Cited 2022 Apr 21.
Archer SK, Shirokikh NE, Preiss T. Selective and flexible depletion of problematic sequences from RNA-seq libraries at the cDNA stage. BMC Genomics. 2014;15:1–9 Available from: https://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-15-401. Cited 2022 Apr 21. BioMed Central Ltd.
Archer SK, Shirokikh NE, Preiss T. Probe-Directed Degradation (PDD) for Flexible Removal of Unwanted cDNA Sequences from RNA-Seq Libraries. Curr Protoc Hum Genet. 2015;85:11.15.1–11.15.36 Available from: https://onlinelibrary.wiley.com/doi/full/10.1002/0471142905.hg1115s85. Cited 2022 Apr 19. John Wiley & Sons, Ltd.
Naarmann-de Vries IS, Eschenbach J, Dieterich C. Improved nanopore direct RNA sequencing of cardiac myocyte samples by selective mt-RNA depletion. J Mol Cell Cardiol. 2022;163:175–86 Available from: http://www.jmcc-online.com/article/S0022282821002091/fulltext. Cited 2022 Apr 19. Academic Press.
Wangsanuwat C, Heom KA, Liu E, O’Malley MA, Dey SS. Efficient and cost-effective bacterial mRNA sequencing from low input samples through ribosomal RNA depletion. BMC Genomics. 2020;21:1–12 Available from: https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-020-07134-4. Cited 2022 Apr 19. BioMed Central Ltd.
Levin JZ, Berger MF, Adiconis X, Rogov P, Melnikov A, Fennell T, et al. Targeted next-generation sequencing of a cancer transcriptome enhances detection of sequence variants and novel fusion transcripts. Genome Biol. 2009;10:1–8 Available from: https://genomebiology.biomedcentral.com/articles/10.1186/gb-2009-10-10-r115. Cited 2022 Apr 19. BioMed Central.
Mercer TR, Clark MB, Crawford J, Brunck ME, Gerhardt DJ, Taft RJ, et al. Targeted sequencing for gene discovery and quantification using RNA CaptureSeq. Nat Protoc. 2014;9:989–1009 Available from: https://www.nature.com/articles/nprot.2014.058. Cited 2022 Apr 19. Nature Publishing Group.
Clark MB, Mercer TR, Bussotti G, Leonardi T, Haynes KR, Crawford J, et al. Quantitative gene profiling of long noncoding RNAs with targeted RNA sequencing. Nat Methods. 2015;12:339–42 Available from: https://www.nature.com/articles/nmeth.3321. Cited 2022 Apr 19. Nature Publishing Group.
Morlion A, Everaert C, Nuytens J, Hulstaert E, Vandesompele J, Mestdagh P. Custom long non-coding RNA capture enhances detection sensitivity in different human sample types. RNA Biol. 2021;18:215–22 Available from: https://pubmed.ncbi.nlm.nih.gov/34470578/. Cited 2022 Mar 3.
Mercer TR, Gerhardt DJ, Dinger ME, Crawford J, Trapnell C, Jeddeloh JA, et al. Targeted RNA sequencing reveals the deep complexity of the human transcriptome. Nat Biotechnol. 2011;30:99–104 Available from: https://www.nature.com/articles/nbt.2024. Cited 2022 Apr 21. Nature Publishing Group.
Briese T, Kapoor A, Mishra N, Jain K, Kumar A, Jabado OJ, et al. Virome capture sequencing enables sensitive viral diagnosis and comprehensive virome analysis. MBio. 2015;6 Available from: https://journals.asm.org/doi/full/10.1128/mBio.01491-15. Cited 2022 Apr 21. American Society for Microbiology.
Petrova OE, Garcia-Alcalde F, Zampaloni C, Sauer K. Comparative evaluation of rRNA depletion procedures for the improved analysis of bacterial biofilm and mixed pathogen culture transcriptomes. Sci Rep. 2017;7:1–15 Available from: https://www.nature.com/articles/srep41114. Cited 2022 Apr 21. Nature Publishing Group.
Bhagwat AA, Ying ZI, Smith A. Evaluation of Ribosomal RNA Removal Protocols for Salmonella RNA-Seq Projects. Adv Microbiol. 2014;4:25–32 Available from: http://www.scirp.org/Html/6-2270232_42072.htm. Cited 2022 Apr 21. Scientific Research Publishing.
Zhao W, He X, Hoadley KA, Parker JS, Hayes DN, Perou CM. Comparison of RNA-Seq by poly (a) capture, ribosomal RNA depletion, and DNA microarray for expression profiling. BMC Genomics. 2014;15:1–11 Available from: https://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-15-419. Cited 2022 Apr 19. BioMed Central Ltd.
Herbert ZT, Kershner JP, Butty VL, Thimmapuram J, Choudhari S, Alekseyev YO, et al. Cross-site comparison of ribosomal depletion kits for Illumina RNAseq library construction. BMC Genomics. 2018;19:1–10 Available from: https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-018-4585-1. Cited 2022 Apr 21. BioMed Central Ltd.
Van Goethem A, Yigit N, Everaert C, Moreno-Smith M, Mus LM, Barbieri E, et al. Depletion of tRNA-halves enables effective small RNA sequencing of low-input murine serum samples. Sci Rep. 2016;6:37876.
Hardigan AA, Roberts BS, Moore DE, Ramaker RC, Jones AL, Myers RM. CRISPR/Cas9-targeted removal of unwanted sequences from small-RNA sequencing libraries. Nucleic Acids Res. 2019;47(14):e84.
Duffy K, Arangundy-Franklin S, Holliger P. Modified nucleic acids: Replication, evolution, and next-generation therapeutics. BMC Biol. 2020;18:1–14 Available from: https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-020-00803-6. Cited 2022 Apr 21. BioMed Central Ltd.
Breitenbuecher F, Hoffarth S, Worm K, Cortes-Incio D, Gauler TC, Köhler J, et al. Development of a highly sensitive and specific method for detection of circulating tumor cells harboring somatic mutations in non-small-cell lung cancer patients. PLoS One. 2014;9:e85350 Available from: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0085350. Cited 2022 Mar 10. Public Library of Science.
Singh SK, Nielsen P, Koshkin AA, Wengel J. LNA (locked nucleic acids): Synthesis and high-affinity nucleic acid recognition. Chem Commun. 1998;4:455–6. Royal Society of Chemistry.
Zhang Y, Roccaro AM, Rombaoa C, Flores L, Obad S, Fernandes SM, et al. LNA-mediated anti-miR-155 silencing in low-grade B-cell lymphomas. Blood. 2012;120:1678–86 American Society of Hematology.
Hummelshoj L, Ryder LP, Madsen HO, Poulsen LK. Locked nucleic acid inhibits amplification of contaminating DNA in real-time PCR. Biotechniques. 2005;38:605–10 Available from: https://www.future-science.com/doi/abs/10.2144/05384RR01. Cited 2022 Mar 10. Eaton Publishing Company.
Dominguez PL, Kolodney MS. Wild-type blocking polymerase chain reaction for detection of single nucleotide minority mutations from clinical specimens. Oncogene. 2005;24:6830–4 Available from: https://www.nature.com/articles/1208832. Cited 2022 Apr 19. Nature Publishing Group.
Oldenburg RP, Liu MS, Kolodney MS. Selective amplification of rare mutations using locked nucleic acid oligonucleotides that competitively inhibit primer binding to wild-type DNA. J Invest Dermatol. 2008;128:398–402 Elsevier.
Vliegen L, Dooms C, De Kelver W, Verbeken E, Vansteenkiste J, Vandenberghe P. Validation of a locked nucleic acid based wild-type blocking PCR for the detection of EGFR exon 18/19 mutations. Diagn Pathol. 2015;10 Available from: /pmc/articles/PMC4448309/. Cited 2022 Apr 19. BioMed Central.
Russell C, Kerkof K, Timour M. US20060234277A1 - Method for selectively blocking hemoglobin RNA amplification - Google Patents. 2006. Available from: https://patents.google.com/patent/US20060234277?oq=2006%2F0234277. Cited 2022 Mar 1
Consortium exRNAQC, Anckaert J, Cobos FA, Decock A, Deleu J, De WO, et al. Performance of RNA purification kits and blood collection tubes in the Extracellular RNA Quality Control (exRNAQC) study. bioRxiv. 2021;2021.05.11.442610 Available from: https://www.biorxiv.org/content/10.1101/2021.05.11.442610v1. Cited 2021 Aug 27. Cold Spring Harbor Laboratory.
Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 2011;17:10 EMBnet Stichting.
FASTX-Toolkit. Available from: http://hannonlab.cshl.edu/fastx_toolkit/index.html. Cited 2022 Jan 19.
Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:1–10 Available from: https://genomebiology.biomedcentral.com/articles/10.1186/gb-2009-10-3-r25. Cited 2021 Aug 24. BioMed Central.
R Core Team. R: A language and environment for statistical computing. Vienna: R Found Stat Comput; 2021. Available from: https://www.r-project.org/. Cited 2022 Jan 19
Wickham H, Averick M, Bryan J, Chang W, McGowan LD, et al. Welcome to the Tidyverse. J Open Source Softw. 2019;4:1686 Available from: https://joss.theoj.org/papers/10.21105/joss.01686. Cited 2022 Jan 19. The Open Journal.
Durinck S, Moreau Y, Kasprzyk A, Davis S, De Moor B, Brazma A, et al. BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Bioinformatics. 2005;21:3439–40. Available from: https://academic.oup.com/bioinformatics/article/21/16/3439/215235. Cited 2022 Jan 19.
Durinck S, Spellman PT, Birney E, Huber W. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat Protoc. 2009;4:1184–91. Available from: https://www.nature.com/articles/nprot.2009.97. Cited 2022 Jan 19.
Robinson D. broom: An R Package for Converting Statistical Analysis Objects Into Tidy Data Frames. arXiv. 2014. Available from: https://arxiv.org/abs/1412.3565v2. Cited 2022 Jan 19.
Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43:e47. Available from: https://academic.oup.com/nar/article/43/7/e47/2414268. Cited 2022 Jan 19.
Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, et al. Integrative genomics viewer. Nat Biotechnol. 2011;29:24–6. Available from: https://www.nature.com/articles/nbt.1754. Cited 2021 Aug 24.
Anders S, Pyl PT, Huber W. HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015;31:166. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4287950/. Cited 2022 Jan 19.
Wick RR, Judd LM, Holt KE. Performance of neural network basecalling tools for Oxford Nanopore sequencing. Genome Biol. 2019;20:1–10.
nanoporetech/pychopper: A tool to identify, orient, trim and rescue full length cDNA reads. Available from: https://github.com/nanoporetech/pychopper. Cited 2022 Jan 19.
Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34:3094–100. Available from: https://academic.oup.com/bioinformatics/article/34/18/3094/4994778. Cited 2022 Jan 19.
Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, et al. Twelve years of SAMtools and BCFtools. Gigascience. 2021;10:1–4. Available from: https://academic.oup.com/gigascience/article/10/2/giab008/6137722. Cited 2022 Jan 19.
De Coster W, D’Hert S, Schultz DT, Cruts M, Van Broeckhoven C. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics. 2018;34:2666–9. Available from: https://academic.oup.com/bioinformatics/article/34/15/2666/4934939. Cited 2021 Aug 24.
Hao Y, Hao S, Andersen-Nissen E, Mauck WM, Zheng S, Butler A, et al. Integrated analysis of multimodal single-cell data. Cell. 2021;184:3573–3587.e29.
Uellendahl-Werth F, Wolfien M, Franke A, Wolkenhauer O, Ellinghaus D. A benchmark of hemoglobin blocking during library preparation for mRNA-Sequencing of human blood samples. Sci Rep. 2020;10:1–10. Available from: https://www.nature.com/articles/s41598-020-62637-0. Cited 2022 Jul 27.
Field LA, Jordan RM, Hadix JA, Dunn MA, Shriver CD, Ellsworth RE, et al. Functional identity of genes detectable in expression profiling assays following globin mRNA reduction of peripheral blood samples. Clin Biochem. 2007;40:499–502.
Hulstaert E, Morlion A, Avila Cobos F, Verniers K, Nuytens J, Vanden Eynde E, et al. Charting extracellular transcriptomes in the human biofluid RNA atlas. Cell Rep. 2020;33:108552.
Acknowledgements
We are thankful to the VIB Single Cell Core, VIB Flow Core Ghent and VIB Nucleomics for support and access to the instrument park (vib.be/core-facilities).
Funding
This work was funded by ‘Fonds Wetenschappelijk Onderzoek’ Flanders; Ghent University; Kom op tegen Kanker (Stand up to Cancer), and Stichting Tegen Kanker.
Ethics Approval and Consent to Participate
Our analyses have been approved under EC/2017/1207 by the Ghent University Hospital ethical committee.
Consent for Publication
Competing Interests
The authors declare no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Supplemental Figure 1. Count distributions of noLNA, LNA1x, and LNA10x in PFP and PRP.
Supplemental Figure 2. Read distribution for each sample and treatment.
Supplemental Figure 3. Correlation plots comparing all biological replicates.
Supplemental Figure 4. Length distribution of full-length transcripts after Oxford Nanopore direct-RNA sequencing.
Supplemental Figure 5. Quality score distribution of full-length transcripts after Oxford Nanopore direct-RNA sequencing.
Supplemental Figure 6. Illustration of problematic MALAT1 fragments and design space.
Supplemental Figure 7. MALAT1 transcription in each cell type.
Supplemental Figure 8. Correlation plot between prePCR and noLNA samples.
Supplemental Table 1. Sequences of the designed synthetic oligonucleotides. For each modified oligonucleotide, the identifier, gene target, and sequence are provided. Modifications are indicated in the sequences using generally accepted standard notation.
About this article
Cite this article
Everaert, C., Verwilt, J., Verniers, K. et al. Blocking Abundant RNA Transcripts by High-Affinity Oligonucleotides during Transcriptome Library Preparation. Biol Proced Online 25, 7 (2023). https://doi.org/10.1186/s12575-023-00193-3
Keywords
- RNA sequencing
- Oxford Nanopore Technologies
- Single-cell RNA sequencing