
This recent note in an "obscure but high-profile journal" (what?!) is entirely a technical observation. I fear that the CRISPR phenomenon has attracted a lot of people, even scientists, who know just enough about the topic to want to affect stock prices. And this too seems to be the Wired article's interpretation, even after its own semi-clickbaity headline: "But most scientists, while skeptical of the results, were more disappointed in the way the paper was blown out of proportion."

The promise of CRISPR was not that it was to be the singular tool that would change how genetics and biology behaved. Rather the Cas9 protein could be used as a cheap and fast technique to colocalize with specific DNA sequences. And that is fantastic, useful, and worth all of the praise the tool has gotten. But that's it.

That Cas9 happened to also actually allow a modification at specific sites using its inherent DNA cleavage capability was a great little bonus - and that it could be cleverly controlled to use that capability to actually, today, cure some diseases, was amazing.

There is very little in biology that is binary in nature - it takes a lot of energy to maintain such an entropic dam between two states. So a protein that cuts at ATTGCTTGTA with 80%/hr/molecule efficiency will also cut ATTGGTTGTA with some nonzero efficiency. Every scientist who works with Cas9 already knows (or should know) this. I don't think Cas9 is fantastic for its ability to cleave DNA - we've had restriction enzymes for a long time - but rather for its ability to colocalize with arbitrary DNA sequences. And with that colocalization we can now bring to bear all the rest of the fantastic tools we already have in synthetic molecular biology. And that is the interesting part of our future - in which Cas9 is but a single (very useful) tool in our kit.




“If it squirms, it's biology. If it stinks, it's chemistry. If it doesn't work, it's physics. And if you can't understand it, it's mathematics.”

― Magnus Pyke

Anyway, thinking about your example, it occurred to me that someone using Cas9 should be able to look at the sequenced genome and identify all sequences sufficiently similar to the target sequence that could be unintentionally affected by Cas9. There's one idea for an experiment. As for mitigation, they could try to find another specimen (fat chance) or species which does not have such similar sequences. Of course, mutations will still happen for a myriad of reasons other than Cas9, as they always do.
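To make the search idea concrete, here is a minimal sketch in Python. It is only an illustration under loud assumptions: the genome is a plain string, "sufficiently similar" is taken to mean "within a fixed number of mismatches" (Hamming distance), and the names `hamming`, `find_near_matches`, and `max_mismatches` are hypothetical. It ignores the reverse strand, the PAM requirement, insertions/deletions, and anything a real off-target tool would model.

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def find_near_matches(genome: str, target: str, max_mismatches: int = 2):
    """Return (position, window, mismatches) for every window of the genome
    that is within max_mismatches of the target.

    Brute-force sliding window: fine for a toy string, far too slow for a
    real genome, where an index-based aligner would be used instead.
    """
    k = len(target)
    hits = []
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        d = hamming(window, target)
        if d <= max_mismatches:
            hits.append((i, window, d))
    return hits


# Toy example using the two sequences from the comment above.
toy_genome = "CC" + "ATTGCTTGTA" + "CC" + "ATTGGTTGTA" + "CC"
for pos, window, d in find_near_matches(toy_genome, "ATTGCTTGTA", max_mismatches=1):
    print(pos, window, d)
```

Sites reported with one or more mismatches are the candidates for unintended cutting; whether any of them is actually cut is an experimental question, not something this kind of scan can answer.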


Sure - those are good ideas, and they do (did) such searches prior to determining which sequence to look for in the genome. Just a few caveats off the top of my head though:

- When you are sequencing a genome how do you know whether or not the windowed sequence you have is found only once in the genome? A solvable problem for most sequences, but still not a trivial caveat.

- Organisms utilize genomic duplication in order to create backups/redundancies/variants of their most critical systems. These backups eventually 'drift' from their parent, but how much drift is required before a copy will no longer be accidentally 'found' by Cas9?

- Cas9, like squirmy biology, is time dependent. If you give it 100 years it will likely cut most any sequence at some point during the period. So how do you measure/account for the literal number of Cas9 proteins, much less the amount of time they remain in a given nucleus?

- DNA is relatively hardy as far as biological molecules go, but there are all sorts of different kinds of chemical errors (much less intentional modifications) associated with DNA that could make a given sequence more/less likely to be misread by Cas9.

- Are all your gRNAs actually the exact sequence you think they are? RNA is much less stable than the hardy DNA mentioned above. What if some of your RNA is being modified before it even gets to go homing around for its complementary sequence?

etc. etc. etc.


Something that's puzzled me for a while, and since you are knowledgeable:

AIUI Cas9 matches 20-base-pair sequences. In something the size of the human genome, wouldn't you expect more than one such exact match anyway?


If you calculate information density, it turns out that 20 base pairs is VERY unique. The human genome is ~3 billion base pairs. Each base pair is one of four different bases, so it carries 2 bits. So 15 base pairs have 4^15, or about 1 billion, possible sequences (30 bits of information). So long as the information is evenly distributed (which it is not), any given 15-base-pair sequence would be expected to occur ~3 times per genome. So 20 base pairs, at 4^20 (about 1 trillion possibilities), is an obscenely localized search space: a given 20-base-pair sequence would be expected to turn up by chance only about 0.003 times per genome. But again, biology is mushy, so it's more of a time-dependent lossy grep than a true exclusive search. There are of course a lot of caveats to this kind of back-of-the-envelope calculation, but on the whole it's mostly accurate. 20 base pairs contains a LOT of information, and is generally sufficient to be unique in a particular genome.
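For anyone who wants to check the arithmetic, here is a small Python sketch. It assumes a genome of 3 billion positions with all four bases equally likely and independent (which, as noted, real genomes are not), and the names `GENOME_SIZE` and `expected_occurrences` are just illustrative.

```python
GENOME_SIZE = 3_000_000_000  # ~3 billion base pairs, rounded


def expected_occurrences(k: int, genome_size: int = GENOME_SIZE) -> float:
    """Expected number of chance occurrences of one specific k-mer in a
    uniformly random genome: roughly genome_size / 4**k."""
    return genome_size / 4 ** k


for k in (15, 20):
    # Each base carries 2 bits, so a k-mer carries 2*k bits.
    print(f"k={k}: {4 ** k:,} possible sequences, {2 * k} bits, "
          f"~{expected_occurrences(k):.4f} expected chance occurrences")
```

With these assumptions, k=15 gives a bit under 3 expected occurrences and k=20 gives a bit under 0.003, which matches the back-of-the-envelope numbers above.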


Thank you for doing the maths, so I didn't have to. Much appreciated.



