The mouse genome is only 160 megabytes, and contains the instructions for buildi...

dunk010 · on July 6, 2017

Absolutely dead wrong. It's 2.5Gb, where Gb = Gigabases. Learn you a genomics, son.

https://www.nature.com/nature/journal/v420/n6915/full/nature...

Asooka · on July 6, 2017

There are four bases, so one base encodes two bits of information. Eight bits are one byte, so four bases are one byte. 2500 megabases = 625 megabytes. So yeah, Parent was off by a factor of about 4 :). But still, that fits on one CD.
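
The arithmetic above can be checked with a short sketch. It assumes a plain 2-bits-per-base packing with no compression or metadata, and the function name `packed_megabytes` is made up for illustration.

```python
BITS_PER_BASE = 2   # four bases (A, C, G, T) -> log2(4) = 2 bits each
BITS_PER_BYTE = 8

def packed_megabytes(megabases: float) -> float:
    """Size in megabytes of a genome stored at 2 bits per base (hypothetical helper)."""
    return megabases * BITS_PER_BASE / BITS_PER_BYTE

mouse_mb = packed_megabytes(2500)  # 2.5 gigabases = 2500 megabases
print(mouse_mb)        # 625.0
print(mouse_mb / 160)  # 3.90625, i.e. the 160 MB figure is low by roughly 4x
```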

dunk010 · on July 6, 2017

Except that genomics currently requires even more information to be encoded (quality scores, allele frequencies, phase information, and so on), so, depending on the format, this estimate is still off by one or two orders of magnitude.
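
As a rough illustration of that overhead, the sketch below compares the 2-bit packing with raw sequencing reads stored as text, with one quality character per base (as in FASTQ). The 30x coverage figure is an assumption picked for illustration, read headers and compression are ignored, and the helper names are hypothetical.

```python
GENOME_BASES = 2.5e9          # mouse genome size quoted earlier in the thread
COVERAGE = 30                 # ASSUMPTION: each position is sequenced ~30 times
BYTES_PER_SEQUENCED_BASE = 2  # one ASCII base letter + one ASCII quality character

def packed_bytes(bases: float) -> float:
    """Bytes needed at 2 bits per base, with no extra information."""
    return bases * 2 / 8

def raw_read_bytes(bases: float, coverage: int = COVERAGE) -> float:
    """Bytes for uncompressed reads that carry a quality score per base."""
    return bases * coverage * BYTES_PER_SEQUENCED_BASE

ratio = raw_read_bytes(GENOME_BASES) / packed_bytes(GENOME_BASES)
print(ratio)  # 240.0, a bit over two orders of magnitude
```

Under different assumptions (lower coverage, compressed formats) the ratio shrinks, which fits the "one or two orders of magnitude" range.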

chridal · on July 6, 2017

No need for being rude.

mmerlin · on July 6, 2017

I read that last line as funny / tongue-in-cheek, as if every normal person learns their ATGC's as easily as their ABC's :-D

chongli · on July 6, 2017

> only 160 megabytes

Take a bunch of source code. Compile it, obfuscate it, compress it, and encrypt it with AES such that the result is a 160MB blob. Now see how long it takes to figure out what it does, given a computer that costs a lot of money just to load your program and a long time to give you a result. The upper bound on the complexity of DNA as it relates to the complete expression of an organism phenotype is insanely high.

dunk010 · on July 6, 2017

Most developers think of the genome like a big load of source code, and if only we could work out where the if and for statements were we could read it. This is an extremely naive and overconfident point of view; the analogy between source code and genomes is very poor. The genome is coding for proteins (by way of RNA). Those proteins are subject to all of physics (think: electrostatics, hydrophobics, ...), whereas your code is an abstract entity designed to run on a rather simple analogue of a Turing machine. Life is far more complex, I am afraid. Though that never seems to stop developers from assuming that they can create a crude analogy which explains it. Also, the size is totally wrong; see previous comment.

vilhelm_s · on July 6, 2017

It's true that genes and proteins are nothing like code, but in the context of understanding the brain, I think that should be cause for optimism, because it means that nature has its hands tied behind its back. The genes can't just contain a description of how the brain should be wired together, because the description also has to be "self-executing"; the entire object must robustly self-assemble just from proteins physically interacting. So although 700 megabytes of mouse genes could potentially contain a lot of stuff, it might be possible to do the same thing much more simply if we can program a digital computer instead.

Like, the connectome for C. elegans has been mapped out; it can be written down as a 2 megabyte ASCII text file. Connectivity alone is not enough to actually reproduce the behavior of the worm, since you would also need data about the weight of each connection, but it's still a lot less data than the worm genome (about 25 megabytes; I hope I got the number right this time!). The worm genes also need to contain a lot of additional stuff to build functioning cell internals, etc., stuff which hopefully is irrelevant to the actual cognition.

josefx · on July 6, 2017

> whereas your code is an abstract entity designed to run on a rather simple analogue of a Turing machine.

I cannot adequately put the insane laugh required as a response to that into text form. So I will only write this and be just as right: going by physics, the brain of a mouse can be adequately approximated by a perfect sphere.

dunk010 · on July 6, 2017

I guess we'll all have to just imagine you're right, then. Mwahahaha!

josefx · on July 6, 2017

The definition of a Turing machine is mathematically perfect. No threading, no IO, no error correction, no errors, no asynchronous events, no processes fighting over shared resources, no resources that might or might not disappear at the blink of an eye, in short no nothing. In that it is equivalent to a spherical brain, with any complexity relevant to the problem at hand removed.

dunk010 · on July 7, 2017

You're making things far too complex, and confusing the issue, and yourself, as a result. Let's turn to the first sentence from Wikipedia:

"A Turing machine ... manipulates symbols on a strip of tape according to a table of rules"

Whichever programming language you are fond of ultimately reduces to this mode of computation. However, with DNA, RNA, and Proteins, that is not the case. The way that we compute is simplistic compared with the way that biology computes. Thus: the crude analogy in fact hinders understanding, and should be discarded.
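
For readers who have not seen the model in the Wikipedia quote, here is a minimal sketch of "symbols on a tape plus a table of rules". The function name, the rule format, and the binary-increment example are all made up for illustration; this is one common way to write a single-tape machine, not a canonical definition, and it says nothing about how biology computes.

```python
from collections import defaultdict

def run_turing_machine(rules, tape, state="start", blank="_", max_steps=1000):
    """Run a single-tape Turing machine (hypothetical helper).

    rules maps (state, symbol) -> (symbol_to_write, move, next_state),
    where move is -1 (left) or +1 (right). The machine halts when no
    rule matches the current (state, symbol) pair, or after max_steps.
    """
    cells = defaultdict(lambda: blank, enumerate(tape))  # sparse, unbounded tape
    head = 0
    for _ in range(max_steps):
        key = (state, cells[head])
        if key not in rules:
            break  # no applicable rule: halt
        write, move, state = rules[key]
        cells[head] = write
        head += move
    # Visited cells are contiguous, so joining them in order rebuilds the tape.
    return "".join(cells[i] for i in sorted(cells)).strip(blank)

# Rule table that adds 1 to a binary number written on the tape.
INCREMENT_RULES = {
    ("start", "0"): ("0", +1, "start"),  # scan right to the end of the number
    ("start", "1"): ("1", +1, "start"),
    ("start", "_"): ("_", -1, "carry"),  # hit the blank: turn around and carry
    ("carry", "1"): ("0", -1, "carry"),  # 1 + carry = 0, keep carrying
    ("carry", "0"): ("1", -1, "done"),   # 0 + carry = 1, finished
    ("carry", "_"): ("1", -1, "done"),   # carried past the left edge
}

print(run_turing_machine(INCREMENT_RULES, "1011"))  # 1100
print(run_turing_machine(INCREMENT_RULES, "111"))   # 1000
```

The whole "machine" is the six-entry table; everything else is bookkeeping for the tape and head.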

nikki93 · on July 6, 2017

the "learning" can come from things like the compounds in various food and so on! "learning" is, in a generalized sense, any non-genetically-bootstrapped environment->body information transference...

taneq · on July 6, 2017

The busy beaver of 1.6x10^8 can produce an awful lot of stuff.