This whole section relates to that...maybe you can suggest an addition to the author?
There are lots of possible explanations for the massive amount of non-coding DNA - one of the most appealing (to a coder) has to do with ‘folding propensity’. DNA needs to be stored in a highly coiled form, but not all DNA codes lend themselves well to this.
This may remind you of RLL or MFM coding. On a hard disk, a bit is encoded by a polarity transition or the lack thereof. A naive encoding would encode a 0 as ‘no transition’ and 1 as ‘a transition’.
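As a rough illustration of the naive scheme described above (a sketch only; the function names and the `start_level` parameter are made up for this example and do not come from any real disk format), each 1 flips the polarity and each 0 leaves it alone:

```python
def encode_naive(bits, start_level=0):
    """Map a bit string to polarity levels: 1 flips the polarity, 0 keeps it."""
    levels = []
    level = start_level
    for bit in bits:
        if bit == "1":
            level ^= 1  # a transition
        levels.append(level)
    return levels


def decode_naive(levels, start_level=0):
    """Recover the bit string by comparing each level with the previous one."""
    bits = []
    previous = start_level
    for level in levels:
        bits.append("1" if level != previous else "0")
        previous = level
    return "".join(bits)


print(encode_naive("0011"))    # [0, 0, 1, 0]
print(encode_naive("000000"))  # [0, 0, 0, 0, 0, 0]
```

Note that this idealized decoder is handed exactly one level per bit, so it never miscounts. A real read head only sees a stretch of unchanged polarity and has to infer from timing how many bits it covers.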
Encoding 000000 is easy - just keep the magnetic polarity unchanged for a few micrometers. However, when decoding, uncertainty creeps in - how many micrometers did we read? Does this correspond to 6 zeroes or 5? To prevent this problem, data is encoded such that these long stretches without transitions do not occur.
If we see ‘no transition, no transition, transition, transition’ on disk, we can be sure that this corresponds to ‘0011’ - it is exceedingly unlikely that our reading process is so imprecise that this might correspond to ‘00011’ or ‘00111’. So we need to insert spacers to make sure there are never too few transitions. This is called ‘Run Length Limiting’ on magnetic media.
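To make the spacer idea concrete, here is a minimal sketch. It uses plain bit stuffing (insert a 1 after every `max_run` consecutive zeroes), which is a simpler stand-in for the idea and not the actual RLL or MFM code tables used on disks. The names `stuff`, `unstuff`, `longest_zero_run` and the `max_run` parameter are hypothetical.

```python
def stuff(bits, max_run=3):
    """Insert a spacer '1' after every max_run consecutive '0' bits."""
    out = []
    run = 0
    for bit in bits:
        out.append(bit)
        if bit == "0":
            run += 1
            if run == max_run:
                out.append("1")  # spacer transition, carries no data
                run = 0
        else:
            run = 0
    return "".join(out)


def unstuff(bits, max_run=3):
    """Drop the spacer that follows every max_run consecutive '0' bits."""
    out = []
    run = 0
    i = 0
    while i < len(bits):
        bit = bits[i]
        out.append(bit)
        if bit == "0":
            run += 1
            if run == max_run:
                i += 1  # skip the spacer that the encoder inserted here
                run = 0
        else:
            run = 0
        i += 1
    return "".join(out)


def longest_zero_run(bits):
    """Length of the longest stretch without a transition."""
    return max((len(r) for r in bits.split("1")), default=0)


stored = stuff("000000")
print(stored)                    # 00010001
print(longest_zero_run(stored))  # 3
print(unstuff(stored))           # 000000
```

The stored form is longer than the data, which is the point of the analogy: some of what is on the medium is there only to keep the rest readable.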
The thing to note is that sometimes, transitions need to be inserted to make sure that the data can be stored reliably. Introns may do much the same thing by making sure that the resulting code can be coiled properly.
However, this area of molecular biology is a minefield! Huge diatribes rage about variants with exciting names like ‘introns early’ or ‘introns late’, and massive words like ‘folding propensity’ and ‘stem-loop potential’. I think it best to let this discussion rage on a bit.
2013 Update: ten years on, the debate still hasn’t settled! It is very clear that ‘Junk DNA’ is a misnomer, but as to its immediate function, there is no consensus. Check out Fighting about ENCODE and junk for a discussion of where we stand.
2021 Update: eighteen years on, the debate is nowhere close to being settled. There is now something of a consensus that ‘Junk DNA’ has important and diverse functions, but new discoveries are being made on a daily basis. https://www.advancedsciencenews.com/that-junk-dna-is-full-of...
I think you're grossly underestimating how large a hole this really is.
The author spends entire sections on things that are completely unimportant (e.g. DNA error correction) while leaving most of epigenetics and regulatory genomics completely out of the picture.