Design of sgRNA

This subpage constitutes the third part of the theory for Biotech Academy’s material on CRISPR-Cas9.

To modify a gene in a given organism with Cas9, one must first design a gRNA that can recognize the target sequence. Single chimeric guide RNA (sgRNA) is a type of artificial gRNA that you can design yourself, depending on which DNA sequence is to be modified.
The most commonly used type of Cas9 derives from the immune system of the bacterium Streptococcus pyogenes (S. pyogenes). There, Cas9 uses a type of gRNA that consists of a complex of two RNA fragments. The artificially produced sgRNA is a fusion of these two RNA fragments. At the 5′ end of the sgRNA is the 20-nucleotide segment used to identify DNA sequences. This segment is simply chosen to be complementary to the DNA sequence you want to modify. The other end of the sgRNA has a fixed structure that allows Cas9 to bind to it.

Figure 9. The structure of sgRNA, where the identification sequence of 20 nucleotides at the 5′ end can be varied to your liking, so that different DNA sequences can be searched for and cleaved by Cas9. ‘N’ means that any nucleotide can be at that position. The fixed structure towards the 3′ end is used by Cas9 to bind the sgRNA.
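The two-part layout described above (a variable 20-nucleotide segment at the 5′ end followed by a fixed part towards the 3′ end) can be sketched in a few lines of Python. This is only an illustration: the function name `assemble_sgrna` is hypothetical, and the short `toy_scaffold` string is a stand-in that is NOT a validated Cas9-binding sequence. A real design would use the full fixed sequence supplied with the chosen Cas9 system.

```python
SPACER_LENGTH = 20          # length of the variable identification segment
RNA_LETTERS = set("ACGU")   # the four RNA nucleotides


def assemble_sgrna(spacer: str, scaffold: str) -> str:
    """Join the variable 5' segment and the fixed 3' part into one RNA string (5'->3')."""
    spacer = spacer.upper()
    scaffold = scaffold.upper()
    if len(spacer) != SPACER_LENGTH:
        raise ValueError(f"identification segment must be {SPACER_LENGTH} nucleotides long")
    if not set(spacer) <= RNA_LETTERS or not set(scaffold) <= RNA_LETTERS:
        raise ValueError("sequences may only contain A, C, G and U")
    # The identification segment sits at the 5' end, the fixed part follows it.
    return spacer + scaffold


if __name__ == "__main__":
    # Stand-in for the fixed part; an assumption for illustration only.
    toy_scaffold = "GUUUUAGAGC"
    print(assemble_sgrna("UGACCGAUCAUGACGUUACG", toy_scaffold))
```

Keeping the fixed part as a parameter mirrors the figure: only the first 20 nucleotides change from one design to the next.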

A strategy for designing sgRNA could be:

  1. Identification of a gene to be modified in a given organism
    The DNA sequence of the gene must be known in order to design sgRNA for the sequence. If the DNA sequence of the gene is not known, the gene can first be sequenced.

 

  2. Searching for PAM sequences (5′-NGG-3′)
    It is important to remember that Cas9 must still recognize a PAM sequence, so this must be taken into account when designing sgRNA. PAM sequences should be abundant, since the requirement of an arbitrary nucleotide followed by two guanine nucleotides is not strict. It is in fact very nonspecific for DNA sequences, which can be many thousands of base pairs long. Assuming that all nucleotides occur randomly with equal probability, a given position starts a 5′-NGG-3′ with probability 1/4 × 1/4 = 1/16. One can therefore expect a PAM about every 16 nucleotides on a single strand, or about every 8 base pairs when both strands are counted.

 

  3. Design of the 20-nucleotide identification sequence
    Now the PAM sequence and its location must be taken into account. Here it is important to remember the recognition mechanism of Cas9, as a correctly designed sgRNA is crucial for the success of the gene modification. First, the DNA sequence that Cas9 must bind, starting from the PAM sequence, is located. Then that sequence is translated into the corresponding sgRNA strand.
    The important information for identification with the 20 nucleotides is:

    The recognized DNA sequence extends directly from the PAM sequence in the 5′ direction.

    The recognized DNA sequence lies on the strand opposite the PAM sequence.

    The location of the 20 recognized nucleotides can be seen in Figure 4.

    Once the sequence has been identified, it simply needs to be translated using Watson-Crick base pairing. Recall that adenine (A) pairs with thymine (T), and that cytosine (C) pairs with guanine (G). In RNA, thymine is replaced by uracil (U), which also pairs with adenine. The two strands run in opposite directions, so a DNA sequence read 3′ to 5′ gives an sgRNA sequence read 5′ to 3′.

    For example, suppose the DNA sequence 3′-ACTGGCTAGTACTGCAATGC-5′ has been found. It would be translated into the 20 nucleotides of the sgRNA like this:

     

    DNA sequence:   3′-ACTGGCTAGTACTGCAATGC-5′
                       ||||||||||||||||||||
    sgRNA sequence: 5′-UGACCGAUCAUGACGUUACG-3′

     

    This sequence can then be inserted as the 20 nucleotides of the sgRNA, indicated by ‘N’ in Figure 9. This completes the design of the sgRNA, which can now recognize the DNA sequence.
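The three steps above can be sketched in Python as a rough illustration, not a complete design tool. All function names are hypothetical, the scan only looks for 5′-NGG-3′ and ignores practical concerns such as off-target matches, and the `AGG` PAM appended in the usage example is an assumption, since the example above does not state which PAM follows the sequence.

```python
from __future__ import annotations

PAM_LENGTH = 3
SPACER_LENGTH = 20

_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_DNA_TO_RNA_COMPLEMENT = str.maketrans("ACGT", "UGCA")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand of a DNA string, both written 5'->3'."""
    return dna.upper().translate(_DNA_COMPLEMENT)[::-1]


def spacer_from_target_strand(target_3_to_5: str) -> str:
    """Translate a recognized strand written 3'->5' into RNA written 5'->3'."""
    return target_3_to_5.upper().translate(_DNA_TO_RNA_COMPLEMENT)


def find_spacers(strand: str) -> list[tuple[int, str, str]]:
    """Scan one strand (5'->3') for NGG; return (pam_start, pam, rna_spacer).

    The 20 nucleotides directly 5' of the PAM on this strand read the same
    as the sgRNA segment (with U instead of T), because that segment is
    complementary to the opposite strand.
    """
    strand = strand.upper()
    hits = []
    for i in range(SPACER_LENGTH, len(strand) - PAM_LENGTH + 1):
        pam = strand[i:i + PAM_LENGTH]
        if pam[1:] == "GG":  # first PAM position may be any nucleotide
            protospacer = strand[i - SPACER_LENGTH:i]
            hits.append((i, pam, protospacer.replace("T", "U")))
    return hits


def design_spacers(top_strand: str) -> dict[str, list[tuple[int, str, str]]]:
    """Search both strands, since a PAM may lie on either one."""
    return {
        "top": find_spacers(top_strand),
        "bottom": find_spacers(reverse_complement(top_strand)),
    }


if __name__ == "__main__":
    # Direct translation of the recognized strand from the example above.
    print(spacer_from_target_strand("ACTGGCTAGTACTGCAATGC"))

    # PAM-containing strand: 20 nucleotides plus an assumed AGG PAM.
    pam_strand = "TGACCGATCATGACGTTACG" + "AGG"
    for name, hits in design_spacers(pam_strand).items():
        print(name, hits)
```

Positions reported for the bottom strand are counted along the reverse complement, so they would need to be converted before being compared with positions on the top strand.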

 

The designed sgRNA can now be manufactured, most easily by submitting the sgRNA sequence to a company that specializes in making synthetic RNA. Several versions of gRNA exist in which the lengths of the different segments have been changed in an attempt to optimize the efficiency and specificity of Cas9 identification. The natural form of gRNA, a complex of two RNA fragments, has also been used for gene modification in the past, but it is not as effective. Fortunately, it is also more practical to make one whole piece of synthetic RNA that is ready for use than to produce several pieces of RNA that must first form a complex with each other before use.