The characterization of RNA and RNA interactions is closely related to the study of transcription; for example, gene expression levels are investigated within a biological context. Over the last decades, a variety of methods have been developed for the study of RNA-DNA, RNA-RNA, and RNA-protein interactions, including RNA complexes with ligand molecules.
RNA molecules are functionally diverse and involved in many cellular processes such as catalysis, ligand binding, and protein recognition. RNA molecules are structurally flexible and can adopt different structures. The combination of different biochemical methods with computational modelling allows scientists to gain insight into molecular processes in which RNA is involved. RNA is a long polymer of ribonucleoside monophosphate units (building blocks) joined together by phosphodiester linkages.
RNAs are generally single-stranded molecules, and the unlinked monomer building blocks are known as nucleotides. Each nucleotide is made up of three key components: a pentose (a five-carbon sugar), at least one phosphate group, and a nitrogenous base. RNA molecules are generally folded into compact and defined tertiary structures. Specific tertiary structure types are observed for transfer RNA (tRNA), ribosomal RNA (rRNA), small nuclear RNA (snRNA), certain introns, and ribozymes. mRNAs can also adopt complex tertiary structures, especially in the untranslated regions (UTRs).
Bioinformatic algorithms that predict biomolecular folding for proteins, peptides, and RNAs, although sometimes successful, all have limitations. The reasons are as follows.
• RNA molecules in solution may adopt secondary structures that are only partially
determined by thermodynamics, since RNA molecules can undergo conformational
changes during interaction with other RNAs, RNA-binding proteins, or RNA-binding
peptides. These interactions are very complex and difficult to model.
• Our knowledge of the thermodynamic rules and parameters that govern the folding
patterns of RNAs is far from complete.
• Most folding algorithms use approximations for scanning the landscape of possible
secondary structures. Therefore, the predicted structures are often just an
estimate or approximation, and chemical methods are needed to verify them.
• Predicting pseudoknots and long-range and tertiary-structure interactions
accurately is also quite difficult.
• Using experimental methods to accumulate additional experimental data
should allow improvements in algorithms used in computational methods.
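To make the idea of a simplified folding approximation concrete, here is a minimal sketch of Nussinov-style base-pair maximization. It is not one of the thermodynamic algorithms discussed above: it ignores energies, pseudoknots, and tertiary contacts and simply maximizes the number of nested base pairs. The function names and the minimum hairpin loop length of 3 are assumptions chosen for illustration.

```python
def can_pair(a: str, b: str) -> bool:
    """Watson-Crick pairs plus the G-U wobble pair."""
    return (a, b) in {
        ("A", "U"), ("U", "A"),
        ("G", "C"), ("C", "G"),
        ("G", "U"), ("U", "G"),
    }


def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs (Nussinov-style dynamic programming).

    min_loop is the assumed minimum number of unpaired bases enclosed by a pair.
    Pseudoknots cannot be represented by this recursion.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] holds the maximum pair count for the subsequence seq[i..j].
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: base j stays unpaired.
            score = best[i][j - 1]
            # Case 2: base j pairs with some base k, splitting the problem in two.
            for k in range(i, j - min_loop):
                if can_pair(seq[k], seq[j]):
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]
```

For the short hairpin GGGAAACCC this returns 3 (three G-C pairs closing an AAA loop). Real folding programs replace the pair count with experimentally derived energy parameters, which is where the incomplete thermodynamic knowledge mentioned above enters.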
The mapping of RNA-protein or RNA-RNA interactions by protein pull-down or affinity pull-down methods allows the study of RNA structures as well as RNA-protein and RNA-RNA interactions. Since RNA-binding proteins (RBPs) are key players in the post-transcriptional regulation of gene expression, precise knowledge of their binding sites is critical for determining their molecular function and for understanding their roles in cell development and disease.
Nuclear magnetic resonance (NMR) spectroscopy is a powerful tool for studying RNA structures in detail. NMR allows RNA molecules to be studied in solution, which is closer to their natural state. The drawback is that a large amount of highly purified, homogeneous RNA is needed, and the method is limited to solving small structures.
RNA immunoprecipitation (RIP) uses antibodies to pull down RNA bound to a targeted protein. RIP allows detection of individual proteins associated with specific regions of RNA molecules. Live cells are treated with formaldehyde to generate protein-RNA cross-links between molecules that are in close proximity. Cross-linked RNA bound to the targeted protein is isolated by immunoprecipitation of the protein. The formaldehyde cross-links are then reversed, which permits recovery of the immunoprecipitated RNAs and their quantitative analysis by reverse transcription followed by PCR (RT-PCR). RIP is similar to chromatin immunoprecipitation (ChIP). Unfortunately, the technique cannot differentiate between directly and indirectly bound RNA. A further drawback is that interactions occurring after cell lysis can generate false positives.
Cross-linking and immunoprecipitation (CLIP) allows mapping of transcriptome-wide binding sites of RNA-binding proteins. RNA-protein complexes are covalently cross-linked in intact cells or tissues and then purified. CLIP improves on the specificity of RIP because stringent washing steps remove weakly bound RNA. The RNAs that remain can be reverse transcribed, and PCR amplification allows analysis by next-generation sequencing. For CLIP to work, reverse transcription needs to proceed from a universal 3′ ligated adapter to a universal 5′ ligated adapter, since both adapters are required for PCR amplification.
All CLIP protocols use RNA-protein cross-linking and immunoprecipitation targeting a protein of interest. Ultraviolet light creates the cross-links between RNA and proteins in vivo. The RNA is then isolated and reverse transcribed into cDNA. The cDNA can be used on several platforms to identify and quantify interacting RNAs. Several refinements and specializations of this central CLIP principle exist: CLIP-seq, PAR-CLIP, and iCLIP are three of the most common.
The individual-nucleotide resolution CLIP (iCLIP) protocol was developed to allow recovery of truncated cDNAs that are lost in CLIP. iCLIP enables PCR amplification of truncated cDNAs and identifies protein-RNA cross-link sites with single-nucleotide resolution on a genome-wide scale. An intramolecular cDNA circularization step enables analysis of cDNAs truncated at the protein-RNA cross-link sites with high resolution and specificity. A 3′ exonuclease degrades protein-bound RNA. The enzyme digests the isolated RNA but stops at the cross-linked protein. An adapter is then ligated to the remaining RNA.
For iCLIP, cells are irradiated with UV-C light on ice, which forms covalent bonds between proteins and RNA. The cross-linking reaction is followed by partial RNase digestion and immunoprecipitation with protein-specific antibodies. For library preparation and visualization, the RNA is first dephosphorylated. Next, a 3′ end adapter is ligated to the RNA, and the 5′ end is radioactively labeled. Complexes are separated by SDS–PAGE, transferred to a nitrocellulose membrane, and isolated from the membrane. Proteins are digested by proteinase K, and reverse transcription (RT) is performed, truncating at the remaining polypeptide. An RT primer introduces two cleavable adapter regions and barcode sequences. Free RT primers are removed by size selection, and the cDNA is circularized. Linearization generates suitable templates for PCR amplification. Finally, high-throughput sequencing generates sequencing reads in which a barcode sequence is immediately followed by the last nucleotide of the cDNA.
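The read layout described above (barcode first, then the last nucleotide of the cDNA) suggests how a cross-link site could be derived from a mapped read. The sketch below is illustrative only: the 9-nucleotide barcode length, the function names, and the 0-based half-open coordinate convention are assumptions, and the rule that the cross-linked nucleotide lies one position upstream of the cDNA start follows from reverse transcription truncating at the remaining polypeptide.

```python
from typing import NamedTuple


class ParsedRead(NamedTuple):
    barcode: str  # leading barcode introduced by the RT primer
    insert: str   # remainder of the read, starting at the last nucleotide of the cDNA


def split_barcode(read: str, barcode_len: int = 9) -> ParsedRead:
    """Split a raw read into barcode and insert.

    barcode_len=9 is an assumed value; use the length defined by your RT primer.
    """
    if len(read) <= barcode_len:
        raise ValueError("read is not longer than the barcode")
    return ParsedRead(read[:barcode_len], read[barcode_len:])


def crosslink_position(aligned_start: int, aligned_end: int, strand: str) -> int:
    """Return the 0-based position immediately upstream of the mapped insert.

    The alignment is given as a 0-based, half-open interval [aligned_start, aligned_end)
    on the reference. On the minus strand, "upstream" means the next higher coordinate.
    """
    if strand == "+":
        return aligned_start - 1
    if strand == "-":
        return aligned_end
    raise ValueError("strand must be '+' or '-'")
```

For example, an insert aligned to [100, 130) on the plus strand would place the putative cross-link at position 99, and the same interval on the minus strand would place it at position 130.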
Enhanced CLIP (eCLIP) improves the library preparation and circularization steps of iCLIP. eCLIP simplifies the generation of paired IgG and size-matched input controls and improves specificity in the discovery of authentic binding sites. After dephosphorylation of RNA fragments, an “inline barcoded” RNA adapter is ligated to the 3′ end. Following protein gel electrophoresis and transfer to a nitrocellulose membrane, a region extending 75 kDa (~220 nt of RNA) above the protein size is excised and treated with proteinase K to isolate the RNA. The RNA is then prepared into paired-end high-throughput sequencing libraries and sequenced.
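The correspondence between 75 kDa and roughly 220 nt can be checked with simple arithmetic. The sketch below assumes an average mass of about 340 Da per RNA nucleotide, a common rule-of-thumb value that is not stated in the text above; the function names are hypothetical.

```python
AVG_NT_MASS_DA = 340.0  # assumed average mass of one RNA nucleotide, in daltons


def kda_to_nt(mass_kda: float, nt_mass_da: float = AVG_NT_MASS_DA) -> float:
    """Approximate number of RNA nucleotides that add up to the given mass."""
    return mass_kda * 1000.0 / nt_mass_da


def excision_window_kda(protein_kda: float, margin_kda: float = 75.0) -> tuple:
    """Membrane region to excise: from the protein size up to protein size + margin."""
    return (protein_kda, protein_kda + margin_kda)
```

With these assumptions, kda_to_nt(75) evaluates to about 220.6 nucleotides, in line with the ~220 nt quoted above, and a 50 kDa protein would give an excision window of 50 to 125 kDa.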
Photoactivatable ribonucleoside-enhanced cross-linking and immunoprecipitation (PAR-CLIP) is a method for identifying the binding sites of RNA-binding proteins (RBPs) across the transcriptome. For the method to work, photoreactive ribonucleoside analogs are incorporated into nascent RNA transcripts in living cells. Ultraviolet (UV) light of 365 nm then cross-links the photoreactive nucleoside-labeled cellular RNAs to RNA-binding proteins.
First, photoreactive thioribonucleosides are incorporated into nascent transcripts. Cross-linking is achieved by irradiating the cells with long-wavelength ultraviolet light (greater than 310 nm; 365 nm is usually used). Immunoprecipitation is used to purify the cross-linked RNA–RBP complexes, which are further purified by SDS-PAGE. The recovered RNA is converted into a cDNA library and sequenced using next-generation sequencing. Multiple sequencing platforms can be used.
Reverse transcription of cross-linked RNA containing incorporated photoactivatable thioribonucleosides, followed by PCR amplification, results in a characteristic mutation (T-to-C when using 4-thiouridine, 4SU, and G-to-A when using 6-thioguanosine, 6SG) that is used to identify the RNA recognition elements.
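The characteristic conversions described above can be tallied per reference position once reads are aligned. The following is a minimal sketch that assumes gapless alignments given as (offset, sequence) pairs; real pipelines work from aligner output and handle indels, base qualities, and background mutation rates. The function name and input format are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def conversion_sites(
    reference: str,
    aligned_reads: Iterable[Tuple[int, str]],
    ref_base: str = "T",
    read_base: str = "C",
) -> Counter:
    """Count ref_base-to-read_base conversions per 0-based reference position.

    aligned_reads holds (offset, sequence) pairs, where offset is the reference
    position of the first read base and the alignment contains no gaps.
    Defaults describe the T-to-C signature of 4SU; use ref_base="G" and
    read_base="A" for the 6SG signature.
    """
    reference = reference.upper()
    counts = Counter()
    for offset, read in aligned_reads:
        for i, base in enumerate(read.upper()):
            pos = offset + i
            if pos < 0 or pos >= len(reference):
                continue  # ignore read bases that fall outside the reference
            if reference[pos] == ref_base and base == read_base:
                counts[pos] += 1
    return counts
```

Positions that accumulate conversions across many independent reads are candidate cross-link sites, whereas isolated conversions are more likely sequencing errors or polymorphisms.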
Dimethyl sulfate (DMS, C2H6O4S, 126.13 g/mol) treatment modifies RNA by adding a methyl group to unpaired or loosely structured adenosine (A) and cytidine (C) residues. DMS mapping is one of the oldest chemical RNA mapping methods. After methylation, the bases can no longer form base pairs, so reverse transcription terminates early at these sites, and the truncated cDNAs reveal the presence of unpaired bases. The subsequent use of next-generation sequencing (DMS-seq/Structure-seq) increases the power of the technique and allows quantitative mapping of base modifications. Targeted Structure-seq further improves the specificity of the technique by using primers that target a specific RNA of interest, without having to sequence the whole transcriptome.
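Early termination of reverse transcription can be turned into a per-base signal by counting where cDNAs end. The sketch below assumes that each stop is reported as the 0-based RNA position of the last nucleotide copied, and that the modified base lies one position 5′ of that stop (at index stop - 1), which is the usual interpretation of RT-stop data. Both the input convention and the function name are assumptions for illustration.

```python
from typing import Iterable, List


def dms_stop_counts(rna: str, stop_positions: Iterable[int]) -> List[int]:
    """Count RT stops attributed to each A or C in the RNA.

    stop_positions: 0-based RNA positions of the last nucleotide copied by RT.
    The modified base is assumed to sit at stop - 1 (one nucleotide 5' in the RNA).
    Stops that would map to G or U are ignored, since DMS is described here as
    modifying A and C.
    """
    rna = rna.upper()
    counts = [0] * len(rna)
    for stop in stop_positions:
        modified = stop - 1
        if 0 <= modified < len(rna) and rna[modified] in "AC":
            counts[modified] += 1
    return counts
```

In practice the counts from a DMS-treated sample would be compared with an untreated control to separate modification-induced stops from natural RT pausing.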
Parallel analysis of RNA structure (PARS) combines classical RNA footprinting methods with next-generation sequencing. Here, poly(A)-selected RNA is folded in vitro and incubated with either RNase V1 or S1 nuclease to probe for double- and single-stranded regions, respectively. RNase V1 and S1 nuclease cleavage results in a 5′P leaving group. Next, the RNA is fragmented. Enzymatic cleavage products contain 5′P, but fragmentation and degradation products have 5′OH groups. Only true structure-probing sites can be ligated to adaptors and reverse transcribed. Sequencing the resulting cDNA library using high-throughput sequencing and mapping the resulting reads to the genome allow identification of double- or single-stranded regions in the transcriptome. The sign of the PARS score indicates whether a base is double-stranded (positive score) or single-stranded (negative score).
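The sign convention of the PARS score follows from defining it as a log ratio of V1 to S1 read counts at a base. The sketch below assumes that definition, with a pseudocount added to avoid division by zero; the pseudocount value and function names are illustrative assumptions, and read counts would normally be normalized for sequencing depth first.

```python
import math


def pars_score(v1_reads: float, s1_reads: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of RNase V1 (double-stranded) to S1 (single-stranded) read counts."""
    return math.log2((v1_reads + pseudocount) / (s1_reads + pseudocount))


def classify_base(score: float) -> str:
    """Interpret the sign of a PARS score."""
    if score > 0:
        return "double-stranded"
    if score < 0:
        return "single-stranded"
    return "ambiguous"
```

For example, a base with 7 V1 reads and 1 S1 read scores log2(8/2) = 2 and is classified as double-stranded, while the reverse counts give -2 and a single-stranded call.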
Psoralen analysis of RNA interactions and structure (PARIS) is a psoralen-based cross-linking method that fixes base pairs of double-stranded RNA (dsRNA) in living cells using the specific cross-linker 4′-aminomethyl-trioxsalen (AMT). PARIS directly determines transcriptome-wide base-pairing interactions. It combines in vivo cross-linking, 2D gel purification, proximity ligation, and high-throughput sequencing to determine RNA structures and interactions in living cells at high throughput and near-base-pair resolution. After cross-linking, samples are treated with proteinase and the RNA is partially digested. The resulting sequencing reads represent all native dsRNAs in the organism. These can be mapped to infer their structure.
Psoralen conjugated oligonucleotides
Psoralen conjugated oligonucleotides can also be used as antisense oligonucleotides (ASOs). Triple helix-forming oligonucleotides linked to psoralen (pso-TFO) allow introduction of DNA interstrand cross-links at specific sites in the genome of living mammalian cells. Co-introduction of duplex DNA with target region homology results in precise knock-in of the donor at frequencies of 2 to 3 orders.