This recent note in an "obscure but high-profile journal" (what?!) is entirely a...

aargh_aargh · on June 6, 2017

“If it squirms, it's biology. If it stinks, it's chemistry. If it doesn't work, it's physics. And if you can't understand it, it's mathematics.”

― Magnus Pyke

Anyway, thinking about your example, it occurred to me that someone using Cas9 should be able to look at the sequenced genome and identify all sequences sufficiently similar to the target sequence that could be unintentionally affected by Cas9. There's one idea for an experiment. As for mitigation, they could try to find another specimen (fat chance) or species which does not have such similar sequences. Of course, mutations will still happen for a myriad of reasons other than Cas9, as they always do.
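A minimal sketch of that kind of search, assuming the genome is available as a plain string and that "sufficiently similar" means "at most a few mismatched bases" (both are simplifications; the function names and the `max_mismatches` threshold are hypothetical, and this ignores the reverse strand and anything else a real off-target tool would consider):

```python
def hamming(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def find_near_matches(genome, target, max_mismatches=3):
    """Return (position, window, mismatches) for every window of the genome
    that differs from the target by at most max_mismatches bases."""
    k = len(target)
    hits = []
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        d = hamming(window, target)
        if d <= max_mismatches:
            hits.append((i, window, d))
    return hits


if __name__ == "__main__":
    # Toy example: one exact match and one single-mismatch near match.
    for hit in find_near_matches("GGGGACGTGGGGACCTGGGG", "ACGT", max_mismatches=1):
        print(hit)
```

This brute-force scan is linear in genome length times target length, which is fine for illustration but far too slow for a real 3-billion-base genome without indexing.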

jfarlow · on June 6, 2017

Sure - those are good ideas, and they do (did) such searches prior to determining which sequence to look for in the genome. Just a few caveats off the top of my head though:

- When you are sequencing a genome, how do you know whether or not the windowed sequence you have is found only once in the genome? A solvable problem for most sequences, but still not a trivial caveat.

- Organisms utilize genomic duplication in order to create backups/redundancies/variants of their most critical systems. These backups eventually 'drift' from their parent, but how much drift is required before you know whether or not a copy will be accidentally 'found' by Cas9?

- Cas9, like squirmy biology, is time dependent. If you give it 100 years it will likely cut most any sequence at some point during the period. So how do you measure/account for the literal number of Cas9 proteins, much less the amount of time they remain in a given nucleus?

- DNA is relatively hardy as far as biological molecules go, but there are all sorts of different kinds of chemical errors (much less intentional modifications) associated with DNA that could make a given sequence more/less likely to be misread by Cas9.

- Are all your gRNAs actually the exact sequence you think they are? RNA is much less stable than the hardy DNA mentioned above. What if some of your RNA is being modified before it even gets to go homing around for its complementary sequence?

etc. etc. etc.

Angostura · on June 6, 2017

Something that's puzzled me for a while, and since you are knowledgeable:

AIUI, Cas9 matches 20-base-pair sequences. In something the size of the human genome, wouldn't you expect more than one such exact match anyway?

jfarlow · on June 6, 2017

If you calculate information density, it turns out that a 20-base-pair sequence is VERY unique. The human genome is ~3 billion base pairs. Each base pair is one of four different bases, so there are 4^15 ≈ 1 billion possible 15-base-pair sequences (about 30 bits of information). So long as the information is evenly distributed (which it is not), any given 15-base-pair sequence would be expected to occur ~3 times per genome. At 20 base pairs there are 4^20 ≈ 1 trillion possible sequences, so a given one would be expected only ~0.003 times per genome, which is an obscenely localized search space. But again, biology is mushy, so it's more of a time-dependent lossy grep than a true exclusive search. There are of course a lot of caveats to this kind of back-of-the-envelope calculation, but on the whole it's mostly accurate. 20 base pairs contains a LOT of information, and is generally sufficient to be unique in a particular genome.
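For anyone who wants to check the arithmetic, here is a rough sketch. It assumes a uniformly random genome of ~3 billion bases and counts only one strand (both are simplifications, and `expected_hits` is just an illustrative name):

```python
GENOME_SIZE = 3_000_000_000  # ~3 billion base pairs, per the estimate above


def expected_hits(k, genome_size=GENOME_SIZE):
    """Expected number of exact occurrences of one specific k-base sequence
    in a uniformly random genome (one strand, edge effects ignored)."""
    return genome_size / 4 ** k


if __name__ == "__main__":
    for k in (15, 20):
        print(f"k={k}: {4 ** k:,} possible sequences, "
              f"~{expected_hits(k):.4f} expected hits per genome")
```

Real genomes are full of repeats and skewed base composition, so this is a lower bound on how often a given sequence actually shows up, not a prediction.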

Angostura · on June 7, 2017

Thank you for doing the maths, so I didn't have to. Much appreciated.