Overcoming CRISPR’s shortcomings: optimising Prime Editing with Small Data AI

The advent of CRISPR/Cas9 (also referred to simply as CRISPR) revolutionised the field of gene editing. Although it is now a widely adopted gene-editing tool in biomedical research, its promise of delivering breakthrough gene therapies in the clinic has largely gone unfulfilled because of several bottlenecks. In this article we discuss the development of Prime Editing, a gene-editing technique that bypasses several important limitations currently holding back CRISPR’s therapeutic applicability, and show how the DeepMirror Bio platform can be used to further optimise the technique.

What is CRISPR?

In 2012, the CRISPR revolution swept biology when this naturally occurring bacterial system was repurposed to target human genomic sequences [1]. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are sequences that occur naturally in prokaryotic genomes (bacteria and archaea) and contain remnants of previous viral infections. When combined with Cas9 (CRISPR-associated protein 9), CRISPR sequences direct sequence-specific double-strand DNA breaks (DSBs) in invading viral genomes, in effect acting as a form of acquired immune defence in prokaryotes. Researchers can direct DNA cleavage to specific target sequences by designing modified guide RNAs (gRNAs) to use in combination with the CRISPR-Cas9 complex [2]. This technology has been successfully adapted to a wide range of applications, including the production of transgenic cell and animal models, gene silencing and activation, genome-wide screens to understand the underpinnings of biological processes, and plant genetic modifications [3].

CRISPR bottlenecks

Despite its wide uptake in the research setting, CRISPR’s potential as a therapeutic gene-editing tool has lost traction, largely owing to unresolved bottlenecks. Traditional gene-editing techniques, including CRISPR, mostly rely on DSBs recruiting endogenous DNA repair mechanisms to carry out a desired repair or sequence change. The DNA repair mechanisms deployed after a DSB are of two types: non-homologous end joining (NHEJ) and homology-directed repair (HDR) (Fig.1)[4,5]. NHEJ is notoriously error-prone, leading to undesirable genome modifications such as indels (insertions-deletions) and off-target effects [5]. These hinder the therapeutic adoption of CRISPR, a setting in which the accuracy and safety of gene-editing tools are fundamental [4]. HDR is a precise DNA repair pathway, but it comes with its own hurdles: it requires donor DNA for repair and is restricted to dividing cells. The latter restriction has proven to be the second major bottleneck in adopting CRISPR for therapeutic applications, especially in neurological diseases, which largely involve post-mitotic neurons [4,5].

Figure 1: CRISPR and Prime Editing (PE) compared. In CRISPR gene editing, Cas9 molecules carry guide RNAs (gRNAs - purple) that target specific DNA sequences (dark blue) found next to a protospacer adjacent motif (PAM - orange). Cas9 (grey) binding to target sequences causes double-strand breaks (DSBs), which in turn trigger DNA repair pathways. In non-dividing, post-mitotic cells, the main DNA repair pathway triggered after a DSB is non-homologous end joining (NHEJ). NHEJ is error-prone and causes high frequencies of insertions and deletions (indels). In dividing cells, homology-directed repair (HDR) can be deployed, a more stringent repair pathway that requires donor DNA. Prime Editing uses a Cas9 variant that ‘nicks’ DNA, causing single-strand breaks. The added Prime Editing Guide RNA (pegRNA - purple) guides the complex to the cleavage site and serves as the editing template for an engineered reverse transcriptase (RT - pink).
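The targeting rule in Figure 1 (a guide matching a DNA sequence that sits next to a PAM) can be illustrated with a short Python sketch. It is not part of any published design tool. The article does not specify the PAM or guide length, so the `NGG` motif and the 20-nucleotide protospacer are assumptions based on the commonly used SpCas9 enzyme; `find_target_sites` is a hypothetical helper, and only the strand passed in is scanned.

```python
import re


def find_target_sites(dna: str, protospacer_len: int = 20):
    """Return (start, protospacer, pam) for every candidate site on this strand.

    Assumption: an SpCas9-style 'NGG' PAM immediately 3' of the protospacer.
    """
    dna = dna.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


if __name__ == "__main__":
    # Toy sequence, invented for illustration only.
    example = "ACGTACGTACGTACGTACGTAGGTT"
    for start, protospacer, pam in find_target_sites(example):
        print(start, protospacer, pam)
```

A real design workflow would also scan the reverse strand and score each candidate for specificity; both steps are left out here to keep the sketch short.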

Enter Prime Editing

Prime Editing (PE) was developed in 2019 by making two modifications to CRISPR-Cas9: an RNA template, the prime editing guide RNA (pegRNA), directs both DNA cleavage and editing, and a reverse transcriptase (RT) is fused to a Cas9 nickase (Fig.1)[6]. Together, these modifications create a large shift in functionality. Prime Editing moves the gene-editing paradigm, firstly, from DSBs to “nicking” (introducing single-strand breaks, SSBs) and, secondly, from endogenous DNA repair mechanisms to directed reverse transcription, bypassing two major hurdles to therapeutic application [4]. By moving from DSBs to SSBs, PE does not depend on NHEJ or HDR and therefore bypasses their fidelity and cell-state restrictions. By replacing these repair pathways with directed reverse transcription, PE dramatically increases editing fidelity. Directed reverse transcription is a highly accurate mechanism that requires a template carried within the pegRNA but no donor DNA.

PE has been shown to have an indel frequency of under 0.5% in injected mouse embryos and under 0.1% in mice [7]. When PE was compared with HDR-mediated CRISPR, the ratio of correct edits to indels was 30-fold higher for PE in stem cells and cell lines [6,8]. Off-target effects of PE were undetectable by whole-genome sequencing in mice [7] and organoids [8]. Importantly, gene editing with PE is possible and highly precise in post-mitotic neurons (<1% indels), where CRISPR’s reliance on NHEJ results in indel rates of >25% [6]. Indel frequency for HDR is also far greater than for PE: in cells, it is 270-fold higher for CRISPR-mediated HDR than for PE3 [6]. Editing efficiency, however, is variable in PE and, in most cases, reported to be lower than that of CRISPR [4]: PE efficiency was 30–50% in human organoids [8], whereas CRISPR can reach >60% [9]. Together, these results underscore the safety and precision of PE, and leave editing efficiency as the main area for improvement.
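The "ratio of correct editing to indel formation" used in these comparisons is simple arithmetic, sketched below in Python. The function names are hypothetical, and the percentages in the usage example are invented purely so that the fold difference comes out at 30; they are not measurements from the cited studies.

```python
def edit_to_indel_ratio(edit_pct: float, indel_pct: float) -> float:
    """Correct edits per indel, from two percentages measured in the same sample."""
    if indel_pct <= 0:
        raise ValueError("indel_pct must be positive to form a ratio")
    return edit_pct / indel_pct


def fold_difference(ratio_a: float, ratio_b: float) -> float:
    """How many times larger ratio_a is than ratio_b."""
    if ratio_b <= 0:
        raise ValueError("ratio_b must be positive")
    return ratio_a / ratio_b


if __name__ == "__main__":
    # Hypothetical inputs, NOT data from the references.
    pe_ratio = edit_to_indel_ratio(edit_pct=30.0, indel_pct=0.5)
    hdr_ratio = edit_to_indel_ratio(edit_pct=20.0, indel_pct=10.0)
    print(pe_ratio, hdr_ratio, fold_difference(pe_ratio, hdr_ratio))
```

The example shows why a method with lower raw editing efficiency can still score far better on this metric: the ratio rewards a low indel rate as much as a high edit rate.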

If its efficiency can be increased, PE holds the promise of bypassing the major CRISPR hurdles (Table 1). Current work has focused on optimising the two PE components: the enzyme (Cas9 fused to RT) and the pegRNA. Several iterations of the enzymatic component have recently been reported (PE1, PE2, PE3)[6], which should help improve PE activity across most applications. However, pegRNA sequences are target-specific and will require case-by-case optimisation [10,11]. Our Small Data AI technology can help researchers identify optimal pegRNA sequences.



Characteristic                                      CRISPR    PE
High efficiency                                     Yes       Variable
Non-endogenous DNA repair mechanisms for editing    No        Yes
Single-strand breaks (SSB)                          No        Yes
Low error frequency (indels)                        No        Yes
Low off-target frequency                            No        Yes

Table 1: Comparison of CRISPR and PE characteristics

DeepMirror Bio to predict pegRNA efficiency

The DeepMirror Bio platform can read pegRNA sequences, predict their structure, and learn molecular graph motifs associated with high efficiency using Small Data AI, a technology that learns from small datasets (i.e., fewer than 10k datapoints). Given the need for pegRNA sequence optimisation, we tested whether DeepMirror Bio can be used to find high-efficiency pegRNA motifs. We used a public dataset containing 2,600 pegRNA sequences with associated insertion frequencies [10] to build molecular graph representations and estimate editing efficiency (Fig. 2). Here we present a sneak peek of the results.

Figure 2: The DeepMirror Bio platform. Our technology, Small Data AI, can create molecular graph representations for small datasets (<10k datapoints) with known and unknown endpoints, in this case for pegRNA efficiency. It then uses semi-supervised learning to predict an unknown endpoint (pegRNA efficiency). Finally, it can also extract structural information to identify motifs in the datasets that correlate with high pegRNA efficiency.
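To make the idea of a molecular graph representation more concrete, here is a minimal Python sketch of one common way to turn an RNA sequence and its predicted secondary structure into a graph. It is an illustration under stated assumptions, not the DeepMirror Bio implementation: the article does not describe the platform's internal encoding, so the dot-bracket structure input, the `rna_to_graph` name, and the "backbone" and "pair" edge labels are all hypothetical choices.

```python
def rna_to_graph(sequence: str, dot_bracket: str):
    """Build (nodes, edges) from an RNA sequence and a dot-bracket structure.

    Nodes are nucleotides, indexed by position. Edges are either 'backbone'
    (neighbouring positions) or 'pair' (positions joined by matching brackets).
    """
    if len(sequence) != len(dot_bracket):
        raise ValueError("sequence and structure must have the same length")

    nodes = list(sequence.upper())
    edges = [(i, i + 1, "backbone") for i in range(len(nodes) - 1)]

    open_positions = []  # stack of unmatched '(' positions
    for i, symbol in enumerate(dot_bracket):
        if symbol == "(":
            open_positions.append(i)
        elif symbol == ")":
            if not open_positions:
                raise ValueError("unbalanced structure: unmatched ')'")
            edges.append((open_positions.pop(), i, "pair"))
        elif symbol != ".":
            raise ValueError(f"unsupported structure symbol: {symbol!r}")
    if open_positions:
        raise ValueError("unbalanced structure: unmatched '('")
    return nodes, edges


if __name__ == "__main__":
    # Toy hairpin, invented for illustration only.
    nodes, edges = rna_to_graph("GGGAAACCC", "(((...)))")
    print(nodes)
    print([edge for edge in edges if edge[2] == "pair"])
```

A graph in this form can be passed to a graph-learning library as node features plus an edge list; which features and which model to use is a separate design decision that this sketch does not address.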

To compare DeepMirror Bio with conventional approaches, we classified the pegRNAs as “successful” (>0.5 normalised percentage inclusion) or “unsuccessful” (<0.5 normalised percentage inclusion). We then compared the performance of DeepMirror Bio and a state-of-the-art XGBoost model on a test dataset of 800 pegRNAs, training each model on either 180 pegRNA sequences (10% of the training data) or 1,800 pegRNA sequences (100% of the training data). DeepMirror Bio outperformed the XGBoost model at both training-set sizes, with a top AUC (Area Under Curve) score of 0.77, indicating that insertion outcomes can be predicted well from a small, validated dataset (Fig. 3a).

Figure 3: Using Small Data AI to predict pegRNA insertion frequency and structural motifs. a. Small Data AI performance versus state-of-the-art XGBoost, using Area Under Curve (AUC) as a benchmark for this classification task. b. An illustrative example of a pegRNA motif that associates with >10% insertion frequency.
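The evaluation described above has two simple ingredients: turning efficiencies into binary labels at a 0.5 threshold, and scoring a model's predictions with AUC. The Python sketch below shows both using only the standard library. It is not the code behind Figure 3: `label_success` and `roc_auc` are hypothetical helper names, the toy numbers are invented, and treating a value of exactly 0.5 as "unsuccessful" is an assumption, since the text leaves that case open.

```python
def label_success(efficiencies, threshold=0.5):
    """1 = 'successful' (> threshold), 0 = 'unsuccessful' (assumed to include == threshold)."""
    return [1 if value > threshold else 0 for value in efficiencies]


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n^2), which is fine for
    test sets of a few hundred examples.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy values, invented for illustration only.
    measured = [0.9, 0.2, 0.7, 0.4]        # normalised percentage inclusion
    predicted = [0.8, 0.3, 0.35, 0.6]      # model scores for the same pegRNAs
    print(roc_auc(label_success(measured), predicted))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the 0.77 reported above should be read.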

Having successfully estimated insertion frequency, we asked DeepMirror Bio which RNA motif(s) are associated with successful insertions. Small Data AI uses native RNA graph models, in which the input data can be visualised as an RNA graph with the relevant motif highlighted, leading to a biologically interpretable prediction. As an example, we identified a motif that our algorithm determined to be relevant for high insertion efficiency (Fig. 3b). Running the motif detection algorithm over the whole dataset should pinpoint further motifs relevant to pegRNA design, which could in turn improve pegRNA efficacy.
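For readers who want a feel for what "motifs associated with successful insertions" means in practice, the Python sketch below uses a much simpler stand-in technique: it compares how often each short subsequence (k-mer) appears in successful versus unsuccessful sequences. This is deliberately not the graph-based motif detection described above, whose details the article does not give. The function names are hypothetical and the toy sequences are invented.

```python
from collections import Counter


def kmer_presence(sequences, k):
    """Count how many sequences contain each k-mer at least once."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # A set is used so a k-mer repeated within one sequence counts once.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts


def enriched_kmers(successful, unsuccessful, k=3):
    """Rank k-mers by (fraction of successful) minus (fraction of unsuccessful)."""
    if not successful or not unsuccessful:
        raise ValueError("both groups must contain at least one sequence")
    pos = kmer_presence(successful, k)
    neg = kmer_presence(unsuccessful, k)
    ranked = [
        (kmer, pos[kmer] / len(successful) - neg[kmer] / len(unsuccessful))
        for kmer in set(pos) | set(neg)
    ]
    # Most enriched first; ties broken alphabetically for a stable order.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    ranking = enriched_kmers(["GGCA", "TGGC"], ["AAAA", "ATAT"], k=3)
    print(ranking[:3])
```

A linear k-mer count ignores secondary structure entirely, which is exactly the information a graph representation is meant to capture, so this should be read as a baseline intuition rather than a substitute.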

This is a clear example of how our interpretable Small Data AI technology can be used to identify RNA motifs that associate with successful insertion, boosting the discovery of new pegRNA structures and further refining PE efficiency, one of the remaining barriers for a wider adoption of this new gene editing technology.

DeepMirror Bio can also be used for other oligonucleotide sequences (DNA), small molecules, and proteins (e.g., antibodies). Reach out now at enquiries@deepmirror.ai for more information on how to use our technology for your research!


1. Lander, E. S. The Heroes of CRISPR. Cell 164, 18–28 (2016).

2. Jinek, M. et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337, 816–821 (2012).

3. Barrangou, R. & Doudna, J. A. Applications of CRISPR technologies in research and beyond. Nat Biotechnol 34, 933–941 (2016).

4. Scholefield, J. & Harrison, P. T. Prime editing - an update on the field. Gene Ther 28, 396–401 (2021).

5. Bothmer, A. et al. Characterization of the interplay between DNA repair and CRISPR/Cas9-induced DNA lesions at an endogenous locus. Nat Commun (2017) doi:10.1038/ncomms13905.

6. Anzalone, A. V. et al. Search-and-replace genome editing without double-strand breaks or donor DNA. Nature 576, 149–157 (2019).

7. Liu, Y. et al. Efficient generation of mouse models with the prime editing system. doi:10.1038/s41421-020-0165-z.

8. Schene, I. F. et al. Prime editing for functional repair in patient-derived disease models. doi:10.1038/s41467-020-19136-7.

9. Kleinstiver, B. P. et al. High-fidelity CRISPR–Cas9 nucleases with no detectable genome-wide off-target effects. Nature 529, 490–495 (2016).

10. Koeppel, J. et al. Predicting efficiency of writing short sequences into the genome using prime editing. bioRxiv (2021) doi:10.1101/2021.11.10.468024.

11. Kim, H. K. et al. Predicting the efficiency of prime editing guide RNAs in human cells. Nat Biotechnol 9,.

