The Omnigenic Model as Metaphor for Life (slatestarcodex.com)
94 points by infruset on Sept 14, 2018 | 35 comments



What changed between the few-gene era and the polygene era is that we got sequencers that could read the genomes of an entire experimental cohort at reasonable cost. Likewise, when we have the capability to do single-cell profiling of the epigenetic signature of an array of relevant cell types, for every member of a cohort, there are decent odds biologists will think it ludicrous that we hoped to get meaningful results out of lone, unannotated, out-of-context genomes.

EDIT: I would specify that I'm not saying that experimental biology "doesn't work" or doesn't get meaningful results before the bigger new technology arrives. I think the Slate Star Codex article overstates the helplessness of single-gene theories, which did explain a bunch of diseases and simple attributes, and had significant medical impacts in understanding things like cancer and chemotherapy effectiveness. It just failed to accomplish a set of other things that people wanted, like explaining intelligence or height. Each new advance (cheap genome sequencing, epigenetic readouts, hugely longitudinal metabolomics, molecule-level microscopy, and the hundred-plus advances in this direction we haven't even conceived of yet) expands the territory of what we can address through molecular biology.

Scientists are often really optimistic about whether topics of interest lie inside or without this territory at a given moment, which I think has to do more with the incentives of grant writing than anything else.


Yeah, the hidden caveat is that in a lot of cases, science "suddenly" progresses when scientists get access to better equipment or get trained in new methods, and can suddenly admit to our funders that the things we were doing before sucked.


I dunno. There's a lot to be said for squeezing as much as you can out of whatever technology you currently possess. The current iteration of sequencing will surely be looked back on as woefully inadequate, but we're learning a ton with it now.


The problem is I don't think that possibility can be invoked for situations where one discovers more and more things are part of a single, vast, interrelated system.

At least until we get an "AI Complete" approach, we can't expect to suddenly find a means to "debug" a system crudely analogous to a "million lines of spaghetti code" (and yes, only crudely analogous, but the ways that analogy breaks down only make the whole thing harder to understand).


I learned about this in bio back in the 2000s. As I studied how genomes actually worked it quickly became apparent that genetic systems are absolutely nothing like human engineered systems. Every part simultaneously interacts with every other part in real time and determines everything in parallel.

Take the genome for example. I really think the notion of gene sequences hobbles our understanding. The genome is not a sequence. It can be sequenced, but that's not what it actually is. A genome is a molecule. Every part of the surface of that molecule is constantly interacting with other molecules. In programming terminology it's like a program where every instruction executes in parallel, always, and in real time. That means that trying to read the genome sequentially like a program or a book is missing the point entirely. We're holding it wrong.

I truly think we are a long way from being able to really "grok" these systems. It took us thousands of years to develop the math, logic, and science to understand complex systems with discrete components and discrete logic. In many ways the digital computer is the apex product of our understanding of discrete systems and discrete logic.

Now we get to figure out concurrent systems and concurrent n^n combinatorial "logic." It might take a lot less than thousands of years because there are more of us and we have a lot more knowledge to work with, but it's not going to be overnight.


It's not going to be the sort of thing we "grok", at least in the traditional sense. Human minds can fully understand relatively simple discrete logical steps in a system but massively parallel interactions like the genome are fundamentally beyond our ability to "grok".

In order to make sense of and manipulate things like genetics we will need to develop machines that can do those things for us. While that's unsatisfying because we generally like the feeling of fully understanding things, such machines will still yield progress and results, which is all we can really hope for here.


I don't think a human in 500 BC had the intellectual tools to understand a modern CPU or computer program. An electronic circuit would have been absolutely inscrutable to them. A time traveler might be able to talk them through it, but a time traveler would have the benefit of future intellectual tools not available at the time. It's one thing to teach what is already understood (to you) and another to comprehend for the very first time in history.

I've thought for many years that there are new intellectual tools waiting to be discovered here that will be as big as arithmetic, calculus, or logic. There was a time when humanity had no idea what mathematics -- the whole field -- was, and today there are probably whole analogues to mathematics waiting to be discovered.

Unfortunately we are still in the phase of trying to attack this problem with old ways of thinking. We probably won't even try until we finally come to terms with the fact that the tools we have at our disposal right now do not work to truly understand the genome. This will take a while as humans become emotionally attached to their tools and cling to them. Try debating a programmer on OSes, languages, or editors to see a simple example. :)

Bonus is that once we understand the genome we'll probably understand a lot of other unknown unknowns we didn't even realize we didn't understand. Maybe this is why physics seems stuck. Maybe the cognitive tools we have right now are simply not up to the task of understanding the whole thing.

Edit:

I actually think Stephen Wolfram's doorstop A New Kind of Science was groping in this direction. The book was problematic because of Wolfram's almost comical narcissism (Wolfram sort of tries to take credit for a lot of things he didn't invent), and the techniques it discusses don't seem to have delivered much fruit in and of themselves. Nevertheless at the "meta" level the notion of trying to invent fundamentally new intellectual tools is absolutely what we should be doing. We will of course fail a lot, but that's what happens when you try to do something new.


> I don't think a human in 500 BC had the intellectual tools to understand a modern CPU or computer program. An electronic circuit would have been absolutely inscrutable to them.

I'm pretty sure they could learn to write programs. They had algorithms.


They could if it were explained to them. I doubt they could figure out a more complex sort of algorithm if they were given the artifact with no explanation. Doubly so if the algorithm involved things like calculus and modern number theory.

We do not have aliens or time travelers to walk us through genomes and fill in the missing pieces of our understanding.


If you are referring to the physical CPU itself, then you are correct as they would lack the technology to even see the circuit paths (much less measure current).

But, if you are suggesting a smart human from 500 B.C. couldn't grasp 'The Art of Programming,' I'd respectfully disagree. Logic and reasoning haven't changed in recorded history. The ancients were no more or less intelligent than the moderns. Whenever I'm tempted to think otherwise, I sit down with my Euclid's Elements and see how far into it I can get before I reach the "WTF... How did he figure THAT out!?" An even better cure -- although more recent -- is to see how far you can get through Newton's Principia.


I see your point but I still don't agree.

Everything seems obvious and easy in hindsight because we are viewing it with those intellectual tools deeply embedded into our understanding. They are all over our culture and we pick up bits of them as children through osmosis even before we study them formally.

I think getting an ancient Greek or Roman intellectual to understand a large integer factoring algorithm, a proof of work block chain, or an OS kernel would be pretty painful. It would take a lot of tutoring to first teach a lot of things that were not understood in that time at all.

You can sometimes see this today when you see older people in rapidly developing nations trying to learn advanced concepts. They can do it but it takes a while.

My point is that all this assumes a tutor who knows and can explain. For levels of understanding not yet reached by any human, there is no tutor to teach us how to think about the problem.


Two of your three examples consist of algorithms that take a large number of steps. Without an ability to perform a large number of steps in a short amount of time, there wouldn't be the need to think about such things. That is, your tutoring of the ancient would consist of explaining an algorithm that might take 100,000 calculations. He would grasp it, but shrug and say "so what? It would take years to perform that algorithm. Let me show you a different way that results in a damn good approximation that requires nothing more than a compass, straightedge, stylus and a piece of string."

In short, you are confusing reasoning ability with algorithms designed for a particular form of technology.

Take neural nets. Would someone from the 1970s understand the benefits of a convolutional net vs. a simpler form? Perhaps. But, without the technology to perform millions of training calculations, the lack of comprehension would come from your pupil wondering what the point would be of trying to understand an algorithm that, with his technology, could never be demonstrated, used, or tested.


There's no way that anybody could possibly understand a modern CPU all at once. What allows us to create them is that we've inserted various abstraction barriers into their design to break them up into comprehensible pieces that can be understood one by one. Evolution doesn't have the same need to insert abstraction barriers for comprehensibility, so its designs may be truly incomprehensible - at least to beings with a working memory as small as a human's.


I totally agree! It seems to me that the current scientific paradigm is still way too reductionistically or mechanically oriented.

While cybernetics and the theories of emergence in systems theory are a step in the right direction, I think the reality of how "something" "works" is way weirder than we expect.

It's much less about how different components superficially "interact" with each other and much more about mind-numbingly complex emergent properties that somehow "happen" between objects and emerge through complex space-time interactions, probably with even weirder quantum factors or "fields" that are not apparent from n-dimensional modelling.

When you really begin to dig deep into these problems it becomes an increasingly weird mixture of philosophy, physics and mathematics, revealing endless paradoxes as we increase the resolution.

Basically it becomes like a chicken and egg problem: can the complexity and life-force of a cell or its components be "deworlded" in any way, or can the cell and other complex organisations simply not be understood without the context in which they exist, in other words, the entire universe?

We won't be able to "understand" by looking at the "thing" because no thing is actually at the center, it's not "doing" anything and doesn't even exist outside of our own simple taxonomical concepts. The genome can't be sequenced because it's simply a way to label, divide or reduce a "thing".

Feed a neural net enough extremely high resolution data from a large enough space and I could imagine extremely strange patterns emerging. How we would be able to capture such data from reality is the big problem because we simply don't know how many factors are at play. Protein folding is already an incredibly demanding calculation and it's very simple compared to even the tiniest mechanisms of a cell.


> The most recent estimate for how many genes are involved in complex traits like height or intelligence is approximately “all of them” – by the latest count, about twenty thousand. From this side of the veil, it all seems so obvious. It’s hard to remember back a mere twenty or thirty years ago, when people earnestly awaited “the gene for depression”.

I wonder how much that's just a technicality, in the same way you could say the inverse square law for gravitation is wrong because really every massive particle has some influence on every other particle, etc.

So maybe it's the case that every gene is involved in every trait, but maybe there's a handful that account for 99% of what we care about in that trait? (Then again, I can imagine for something like intelligence that most of the genome is really involved—height though?)

EDIT: I had some remarks about the term 'gene' here that were incorrect and turned into a useless diversion.


It's more like, early on we found some traits like "blood type" that really did correspond closely to a small number of genes. So we theorized they all worked like that. Now, we know that however height works, it doesn't work like blue eyes do.


There is apparently a single gene variant that some short Peruvians have that (with two copies) can make a 4 cm difference. [1] But that's the extreme case. The next 700 genes explain 7% of their height.

Generally speaking, most genes individually make such a small difference that it takes a huge dataset of genomes to find them.

[1] http://www.sciencemag.org/news/2018/05/study-short-peruvians...


They are probably only applying the 700 genes as single factor interactions. I doubt all genes affect height. It only takes about 33 gene variants to have more genetic possibilities than there are people on earth. If they interact non-linearly, they are almost impossible to find by genetic surveys.
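A quick check of the "about 33" figure, as a minimal sketch. The world population constant is an assumed rough 2018 estimate, not a number from the thread, and `variants_needed` is a hypothetical helper that treats variants as independent with a fixed number of states each.

```python
# Assumption: rough 2018 world population, not taken from the thread.
WORLD_POPULATION = 7_600_000_000


def variants_needed(population, states_per_variant=2):
    """Smallest number of independent variants whose combinations exceed `population`."""
    count = 0
    combinations = 1
    while combinations <= population:
        combinations *= states_per_variant
        count += 1
    return count


# Two states per variant (present / absent), as the comment implies.
binary_count = variants_needed(WORLD_POPULATION)

# Three states per variant (0, 1 or 2 copies), a common diploid simplification.
diploid_count = variants_needed(WORLD_POPULATION, states_per_variant=3)

print(binary_count, diploid_count)
```

With present/absent variants the count comes out at 33; counting 0, 1 or 2 copies per variant it drops further, so the search space outgrows any realistic cohort even sooner.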

A fundamental problem with genetic analysis is that gene variants are categorical variables that often have non-linear effects, but predictive power goes down drastically as soon as you start looking for multi-factor effects. You have to narrow down the multi-factor search space, possibly by narrowing it down to genes that have a linear effect, but that could still be too large and you could easily miss genes that on average have no effect.

I wrote a critique of a paper using this method. I proposed studying the distribution of variant data's residual distance from a single-factor linear fit. Consider a variant that has a positive effect on height when combined with another variant, but a negative effect otherwise. The single-factor linear fit will assign it 0 effect, but residuals involving the variant will be unusually large, positive when the second variant is present and negative when it is not. My critique found several variants fitting this description (after a Bonferroni correction), but I didn't have the compute power or time to rerun the 2nd order interactions with my findings.

But hey, I got a B on that paper, so maybe it's a terrible idea.
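A toy simulation of the residual idea, as a sketch only. Everything here is assumed for illustration (two binary variants, a 2 cm effect, Gaussian noise, a 170 cm baseline); it is not the method or data from the critique. For a single binary predictor the least-squares fit reduces to the two group means, which is what the code uses.

```python
import random
import statistics

random.seed(0)  # fixed seed so the toy data is reproducible

SAMPLE_SIZE = 20000
EFFECT_CM = 2.0   # hypothetical effect size
NOISE_SD = 1.0    # hypothetical measurement and background noise

people = []
for _ in range(SAMPLE_SIZE):
    has_a = random.random() < 0.5
    has_b = random.random() < 0.5
    shift = 0.0
    if has_a:
        # Variant A helps when B is present and hurts when B is absent.
        shift = EFFECT_CM if has_b else -EFFECT_CM
    height = 170.0 + shift + random.gauss(0.0, NOISE_SD)
    people.append((has_a, has_b, height))

carriers = [h for a, _, h in people if a]
non_carriers = [h for a, _, h in people if not a]

# Single-factor fit on A: the estimated effect is the difference of group means.
marginal_effect = statistics.mean(carriers) - statistics.mean(non_carriers)

# Residual spread within each group after that single-factor fit.
carrier_variance = statistics.variance(carriers)
non_carrier_variance = statistics.variance(non_carriers)

# Residuals of A carriers, split by whether the second variant is present.
carrier_mean = statistics.mean(carriers)
mean_residual_with_b = statistics.mean(h - carrier_mean for a, b, h in people if a and b)
mean_residual_without_b = statistics.mean(h - carrier_mean for a, b, h in people if a and not b)

print(round(marginal_effect, 3))
print(round(carrier_variance, 2), round(non_carrier_variance, 2))
print(round(mean_residual_with_b, 2), round(mean_residual_without_b, 2))
```

In this toy setup the single-factor effect of A is close to zero, yet carriers of A show a much wider residual spread than non-carriers, and the sign of the residual tracks the second variant. That is the signature the comment proposes screening for before running second-order fits.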


This is indeed a massive problem with GWA (genome-wide association), since the studies only see additive variation. Lot of interesting techniques around to look at non-additive variation. Had acquaintances working on drug-drug interactions and the genetics of skin colour; had to find some cool workarounds.

On the other hand, given that height is closely linked to metabolic conditions, and both the transcription/translation rate and functioning of every cellular component relies on metabolism, it doesn't take long for changes to propagate to simple traits like height.


> I'm using 'gene' as it's used in the quote, but from my understanding of the term it seems like it becomes meaningless if the author's statement were true

It's not; the author's statement seems to use what I understand to be the standard definition used in the field, where a gene is specifically a sequence which codes for the synthesis of a particular protein or nucleic acid sequence, which is essentially the lowest level “simple” trait. This is perfectly consistent with large and overlapping sets of genes being involved in determining each of the complex traits that humans mostly care about, and does not make the term “gene” meaningless.


I see. That makes sense. I had thought we just used it to refer to genome sections related to 'macro traits' (not sure of a better term) we're interested in.


That is a common use, which shouldn't be surprising because it's how we built a theory around genetics even before we had any idea about DNA. It just turns out in practice that the idea of a single distinct unit of heredity, while it works as a close enough approximation for the traits that were observed to establish genetic theory, doesn't work for lots of traits of interest.


Research into the question, "which programming languages are more productive?" certainly suffers from looking at single causes in a massively polycausal system.

If you assume the effects of individual causes combine linearly, you can still look at one cause at a time. But programming languages interact with the problem domain, library availability, team preference and experience in non-linear ways.


Seems premature to claim "it works" based on predictions. That was what got us in trouble last time.


> This side of the veil, instead of looking for the “gene for intelligence”, we try to find “polygenic scores”. Given a person’s entire genome, what function best predicts their intelligence? The most recent such effort [citation in the original article] uses over a thousand genes and is able to predict 10% of variability in educational attainment. This isn’t much, but it’s a heck of a lot better than anyone was able to do under the old “dozen genes” model, and it’s getting better every year in the way healthy paradigms are supposed to.

I think that's a much more measured claim than you're giving credit for; "it works" is claiming to explain 10% of variance. Do you have specific complaints about the cited article? Or are you just arguing by analogy that since people were wrong in the past, they are probably wrong now?


What do you mean by "it works"? Polygenic scores?


I'm doubtful that polygenic scores work, in the sense that we all acknowledge that complex traits like intelligence are interactions between genetics and environment.

You can always decompose a function of two variables into three parts (with some hand-waving for notation):

f(a, b) = f(a) + f(b) + interaction term

where E(interaction term) = 0 in some statistical sense.

If the interaction term is zero, then great, your function is trivially decomposable. For our problem, take a = genetics and b = environment, that means you can precisely talk about someone's 'genetic score for intelligence' and someone's 'environmental score for intelligence', and never have to consider them together.

But I strongly suspect that the interaction between genes and environment is very very high; the latest effort to map genes to intelligence only accounts for 10% of variance not because the environment determines the other 90%, but probably the interaction term, which can be complicated and highly nonlinear.

Another thing: variance of your inputs can only be considered in a statistical sense so the relative importance of genes vs. environment won't be stable. If somehow the world became completely uniform (every single child in the world received the exact same education and upbringing), you'd expect genetic variation to account for everything just by definition.
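The last point can be illustrated with a toy calculation. This is a sketch under stated assumptions: the trait function `g * e`, the gene and environment values, and the equal weighting of every (gene, environment) pair are all made up for illustration, and `genetic_share` is a hypothetical helper, not a standard heritability estimator.

```python
from itertools import product
from statistics import mean, pvariance


def genetic_share(genes, environments, trait):
    """Fraction of trait variance explained by genes alone.

    Assumes every (gene, environment) pair is equally likely.
    """
    values = [trait(g, e) for g, e in product(genes, environments)]
    total_variance = pvariance(values)
    # Average the trait over environments for each genotype.
    per_gene_means = [mean(trait(g, e) for e in environments) for g in genes]
    return pvariance(per_gene_means) / total_variance


def trait(g, e):
    # Hypothetical trait that is pure gene-environment interaction.
    return g * e


genes = [1.0, 2.0, 3.0]

# Same genes, same trait function, two different populations of environments.
varied_share = genetic_share(genes, [0.5, 1.0, 1.5], trait)
uniform_share = genetic_share(genes, [1.0, 1.0, 1.0], trait)

print(varied_share, uniform_share)
```

With varied environments the genes-only share is 6/13 (about 0.46); with a uniform environment it is 1, even though the trait function itself never changed. The "importance" of genes is a property of the population sampled, not of the mechanism.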


You can't do that with things that interact, e.g.:

f(a,b) = b/a


f(a,b) = b/a = 0 + 0 + b/a = f(a) + f(b) + interaction term

In this case the system is contained entirely within the interaction term, and since you don't know the distributions of a and b there's not much motivation to go further. If you had the distributions of a and b you might be able to do something a little less trivial by skewing the interaction term to have an expected value of zero, potentially like:

b/a = a + b + (b/a - a - b) = f(a) + f(b) + (interaction term with E[term] = 0)
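One standard way to make the zero-mean interaction term concrete is the ANOVA-style decomposition: subtract the grand mean, take each main effect as a conditional mean, and call whatever is left the interaction. The sketch below applies it to the b/a example from this subthread. The value grids and the uniform weighting are assumptions, and the constant `mu` is the piece the hand-waved notation leaves out.

```python
from fractions import Fraction
from itertools import product


def decompose(f, a_values, b_values):
    """Split f(a, b) into mu + f_a(a) + f_b(b) + interaction(a, b).

    Assumes a and b are independent and uniform over the given values.
    """
    pairs = list(product(a_values, b_values))
    mu = sum(f(a, b) for a, b in pairs) / len(pairs)
    f_a = {a: sum(f(a, b) for b in b_values) / len(b_values) - mu for a in a_values}
    f_b = {b: sum(f(a, b) for a in a_values) / len(a_values) - mu for b in b_values}
    interaction = {(a, b): f(a, b) - mu - f_a[a] - f_b[b] for a, b in pairs}
    return mu, f_a, f_b, interaction


def ratio(a, b):
    return b / a


# Exact rational arithmetic so the zero mean is exact, not approximate.
a_values = [Fraction(1), Fraction(2), Fraction(4)]
b_values = [Fraction(1), Fraction(2), Fraction(3)]

mu, f_a, f_b, interaction = decompose(ratio, a_values, b_values)
mean_interaction = sum(interaction.values()) / len(interaction)

print(mu, mean_interaction, interaction[(Fraction(1), Fraction(1))])
```

The interaction term averages to exactly zero, but it is not zero pointwise: for b/a it works out to (b - E[b]) * (1/a - E[1/a]). So the decomposition always exists, and the real question is how much of the variance ends up in that last term.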


If by "interaction term" it is meant f(a,b), then I congratulate the author of that tautology.


Hidden in the parent's comment is the additional condition that f(a,b) have an expected value of zero. It's still pretty easy to prove, though.


Height is measurable. Intelligence really isn’t.

I’m naively confident that we’ll find gene patterns for height soon.


Bear with me against the impression of pandering, but I see an analogy between the author's prescriptions and "product-market fit". The power of that term is that it discards the notion of an optimal product or perfect audience - the two can only be right for each other. In the same way, problems with polycausal phenomena might admit specific solutions (Prozac for depression) without being fully understandable.

The corollary is that successful products can litter the world with unintended consequences - as can isolated discoveries.


I would think a big part of the problem is we shouldn’t look at genes but rather how genes are regulated.


I remember thinking how utterly stupid it was to think it was a single gene that determined these.



