Butterfly wing patterns emerge from ancient ‘junk’ DNA (cornell.edu)
114 points by hhs on Oct 21, 2022 | 76 comments



This sounds really neat, from the layman’s perspective. Are they saying that the neighboring non-coding genes are acting as a kind of entropy source for the coding genes? If so, are there other known examples?


(From another layman) this sounds a bit like Turing Patterns, which are found all over nature.


You and Oarch both get the idea: it's not a pattern for butterfly wings, it's a pattern. The pattern is conserved across species and reused to direct morphogenesis.


Don't we have similar terminology for some human organs? The appendix and some other organs are assumed useless. Personally I feel it's insane to call DNA or organs useless instead of acknowledging that we don't yet understand their true purpose.


IIRC the appendix is now known to harbour gut microflora/bacteria when you have diarrhea, keeping them safe while your body clears whatever it deems unwanted.


Not everything serves a purpose. Onions have more DNA than humans.


Has it been demonstrated that that extra DNA has no function or are you just assuming "onions are less complex than us"?


Ok now I'm afraid to look in that bottom drawer in the kitchen, who knows what those devious bloody alliums are up to in there.


Plants can double their genome (polyploidy) and become stronger.


This term was used almost twenty years ago.


I'm not a biologist but the whole term "junk DNA" always felt a bit off to me. Millions of years of evolution and having a significant portion of DNA in all living organisms being "junk" doesn't make sense at all.


It's all 'junk' from a scientific materialist perspective.

It's a chemical reaction just like any other - it's random noise.

There's no (scientific materialist) reason to attribute 'junk' or 'not junk' to a sequence of random bits of information. There can be no 'purpose' to random noise.

Now, it is a bit odd that the most ardent secular scientific materialists do, truly in their own living experiences, tend strongly to attribute such anthropological attributes to such things, by calling some of these processes 'life', using terms such as 'evolution' instead of what is just 'random changes some of which persist in time', isn't it? It's almost like they 'believe' in some metaphysical orientation or principle which is a bit different than their stated 'belief' of materialism and the odd bunch of equations we posit to govern 'everything'?

Perhaps we can avoid going all the way down the intellectual rabbit hole by shifting these kinds of discussions into the evolving (pun intended) field of 'emergence' ...


Why? If it can copy, junk just sits there for eons. Think of freed sectors on your old HDD that were never overwritten because of how the fs driver allocates them.


Perhaps. Still, my gut feeling tells me that there is much more to it, and that as humanity we haven't even scratched the surface.


Pretty ingenious. Reminds me a bit of how Elite double-used its own program code as a lookup table for building a procedurally generated universe in order to conserve memory.

I figured it was ingenious back then but I had no clue it's biomimetic :)

https://en.m.wikipedia.org/wiki/Elite_(video_game)


I feel like at this point I need a browser plugin which changes the phrase "junk DNA" to literally any other phrase

One of the worst names I've come across for a concept


True enlightenment is when you realize that junk DNA is actually a great term, because evolution is great at building things out of junk and repurposing them.

The "junk" areas of repetitive DNA were actually identified as cauldrons of evolution in the early days of research, so we have known since the beginning that the junk was actually crucial.

Barbara McClintock's Nobel for this was awarded way back in 1983

https://en.wikipedia.org/wiki/Barbara_McClintock


This is a pet peeve of mine - the common misconception that "junk DNA" is just DNA with an unknown function.

By far most junk DNA really is junk. We know this because we know its nonfunctional origin (transposons, pseudogenes and similar), because we can do studies of conservation between species, and because there is a huge range of junk DNA content in otherwise similar species.

The misconception comes from media bias: You'll never hear a story about "junk DNA really is junk, researchers find".


Transposons and pseudogenes may be nonfunctional for an individual organism, but aren't they useful for evolution to create new genes? Wikipedia suggests some usefulness:

"While some TEs confer benefits on their hosts, most are regarded as selfish DNA parasites" (https://en.wikipedia.org/wiki/Transposable_element#Evolution)

"Pseudogene sequences may be transcribed into RNA at low levels, due to promoter elements inherited from the ancestral gene or arising by new mutations. Although most of these transcripts will have no more functional significance than chance transcripts from other parts of the genome, some have given rise to beneficial regulatory RNAs and new proteins." (https://en.wikipedia.org/wiki/Pseudogene)

I would define truly junk DNA as any DNA that if removed would be beneficial or not harmful to the majority of organisms in a species, and be beneficial or not harmful to the ability of a species to adapt and evolve its genome.


> Transposons and pseudogenes may be nonfunctional for an individual organism, but aren't they useful for evolution to create new genes?

Ok, but by that definition, nothing is really junk: Even our own junkyards are still home to all sorts of animals and microorganisms which feed on the junk.

Not even starting with the recycling/upcycling movement.

Doesn't change the fact that it's junk in the sense of its original purpose and immediate utility to us.

In the same way, if junk DNA really has no effect on the body it's currently part of (like freed but uncleared memory in RAM), then I think the name is appropriate.

What I'm really missing in popular reporting about genetics though is more focus on the "regulatory elements". The understanding of DNA as given by pop-science articles is mostly that DNA is either instructions for building proteins or "junk", nothing else. This completely leaves out the question of how the cell decides when/if a particular protein should be built - which is of course integral to understanding how a number of protein-building instructions can result in a complex organism.

It's like describing a programming language as consisting only of instructions and comments but completely leaving out branches and loops.

(even though gene regulation does not happen through branches and loops)


Does it mean that mutations in the junk DNA are evenly distributed among the population because supposedly they don’t affect anything?


Of course, nobody who does biological sciences actually believes it is junk.


How about "legacy DNA" or "retired DNA" or just "leftovers"?


"deprecated DNA" seems a good fit ;-)


How about "cloud computing"?


Not knowing much about DNA, it seems weird to me that it's assumed the primary purpose of DNA is to encode for proteins. Sure, that's one of its functions, but it would be just as misguided as assuming that bits only exist to encode text, and when they encode anything else, it's seen as some extraordinary exception to the norm.


People like to forget DNA has a 3D structure. A lot of the DNA that doesn't encode proteins might be involved, for example, in the association of 3D topological domains or conformational switches that impact chromatin accessibility. Also interesting is when sufficient factors bind to a local region of DNA to change the local chemistry and initiate phase-separated domains where regulatory factors might preferentially bind and thus drive the transcription of the few coding regions, and that's pretty cool too. Just to add some context to your bits analogy.


Why don't we give genes names like functions?

Seems like the whole thing might be easier to understand if we called a gene "butterflyStripeWidth" rather than "WntA".

It's like we're writing a new core library but only using obfuscated function names.


We can only guess at its functions. Giving a concrete name like that is not a very good idea.


But I can't get anything at all from the name now. BRCA1 isn't descriptive at all, but suppressBreastTumor is more helpful even if it's not 100% accurate. I'd rather it be our best-guess description than some random letters.


What if we gain more knowledge, and this isn't suppressBreastTumor. What do you do now?

As I see it, the answer is still a cryptic name and a lookup table.
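
A minimal Python sketch of that "cryptic name plus lookup table" idea. Everything here is a hypothetical illustration: `GeneRecord`, `describe`, and the description strings are made up for this example and are not an existing nomenclature database or API. The only point is that the symbol stays fixed while the plain-language description can be revised.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GeneRecord:
    """A stable symbol plus a revisable history of plain-language descriptions."""
    symbol: str
    descriptions: List[str] = field(default_factory=list)

    def revise(self, description: str) -> None:
        # New knowledge appends a description; the symbol itself never changes.
        self.descriptions.append(description)

    @property
    def current(self) -> str:
        # The most recent description is treated as the current best guess.
        return self.descriptions[-1] if self.descriptions else "function unknown"


# Hypothetical lookup table keyed by the stable (cryptic) symbol.
genes: Dict[str, GeneRecord] = {}


def describe(symbol: str) -> str:
    """Return the current best-guess description for a symbol."""
    record = genes.get(symbol)
    return record.current if record else "not in table"


# Illustrative entries only; the wording is an assumption, not official naming.
genes["BRCA1"] = GeneRecord("BRCA1", ["linked to breast cancer risk"])
genes["BRCA1"].revise("involved in DNA repair; linked to breast cancer risk")

print(describe("BRCA1"))
print(describe("WntA"))
```

Code that refers to `"BRCA1"` keeps working after the description is revised, which is roughly the trade-off being argued here.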


Sure, you'd expect that to happen. There are lots of interesting things you could do there, all of which are better than random letters and numbers that never change. Committees and proposals, objects and references. Things change their names all the time and society is perfectly capable of handling it. Pluto isn't even a planet anymore and we all survived that.

I just think we can do better than the first system we came up with, and improving the system would open the whole thing up to more people and possibility.


Isn’t it convenient that you could talk about Pluto without saying “PlutoTheOrbitingObject, which was known as PlutoThePlanet from 1930-2006, isn’t even a planet…”


I mean, we do almost the exact same thing with most celestial objects (ex., HUDF-JD2 aka UDF 033238.7 -274839.8 aka BBG 3179) - but the difference is they don't _do_ anything. I'm never going to write a "program" with planets, but I am interested in writing "programs" using genes. I know it's not a 1-1 example, but you know what I'm getting at.


Are those replacement names or additional names that point to the same underlying object? The former is much more problematic IMO.


BRCA is an acronym for Breast Cancer, so we are already doing that to a large degree.


A fair point, but I think that really just lends credence to what I'm suggesting, that we expand upon that, since it isn't standardized or all that useful. If you were encountering it for the first time, what would you think WntA did?


I would look it up. Genes can encode multiple things, so it doesn’t make sense to name them like variables. Genetics is far, far more complex than computer programs. Case in point: the Wnt gene codes for multiple developmental characteristics across insect groups. You can’t simply say it codes for this wing protein; it does far more.


We kind of do that to an extent, but that confuses things.

For example, tumor suppressor genes actually produce a product that, among many other things, leads to suppression of tumor activity or viability.

The gene function [depending on the gene] would really be something hypothetically like -immuno signaling factor modulator gene number 9-

This makes things harder to mentally catalog for a lot of people, so we use names like SONIC; KRUPPEL; HEDGEHOG; SONIC HEDGEHOG.

These are developmental pattern genes, and the names are subjective, based on phenotype appearance, or what a researcher was doing when they observed it, etc.


You want Wnt to be named after butterfly stripes? The last sentence of the excerpt below gives away how Wnt, or the lack of it, leads to colon polyps/cancer. It does a million other things though. Might as well call it signaling molecule 145626.

The activation of the pathway occurs at the cellular membrane, where Wnt ligands bind to the seven transmembrane-domain protein receptors Frizzled (Fzd) and/or to the low-density lipoprotein receptor-related protein (LRP) 5/6. This interaction leads to the inhibition of the axin degradasome destruction complex, which is a multiprotein complex that controls the cytoplasmic amount of β-catenin via phosphorylation, and, thereby, triggers β-catenin degradation by the proteasome in the absence of Wnt [10]. The destruction complex comprises the tumor suppressor adenomatous polyposis coli (APC), the axin scaffold protein, and two Ser/Thr kinases: glycogen synthase kinase 3 (GSK3) β and casein kinase 1 (CK1). In the absence of Wnt ligands, CK1 phosphorylates β-catenin at Ser45 residue and GSK3β at Ser33/Ser37/Thr41 residues. Then, the β-transducin repeats-containing protein (β-TrCP), an E3-ubiquitin ligase, ubiquitinates phosphorylated β-catenin, which becomes a target for proteasomal degradation [10]. When Wnt binds to Fzd and/or LRP5/6 receptors, the Wnt/β-catenin pathway is activated and the axin degradasome is inhibited [9]. As a consequence, Dishevelled (Dvl) is activated and recruits the degradasome complex to the plasma membrane, and, thereby, promotes the interaction between LRP5/6 and axin [11,12]. Consequently, LRP5/6 is phosphorylated at specific amino acidic residues (Ser1490, Thr1530, Thr1572, Ser1590, Ser1607) [13], acting as a direct competitive inhibitor of GSK3β [14]. Moreover, inactivation of GSK3β through Akt-dependent Ser9 phosphorylation prevents the phosphorylation of β-catenin, which allows its stabilization and accumulation in the cytoplasm. Stabilized β-catenin translocates to the nucleus where it binds to transcription factors, notably T-cell factor (TCF) and lymphoid enhancing factor (LEF), TCF/LEF. This interaction displaces the co-TCF/LEF repressor Groucho, whose function under basal conditions is to compact chromatin [15]. 
Groucho and TCF/LEF form a multiprotein complex, which is also termed Wnt enhanceosome, that recruits transcriptional co-activators and histone modifiers such as the ATP-dependent helicase Brahma-related gene 1 (BRG1, also known as SMARCA4), cyclic adenosine mono phosphate response element (CREB)-binding protein (CBP), p300, B-cell lymphoma 9 (BCL9), and pygo [15,16]. The Wnt enhanceosome regulates chromatin remodeling and activates the transcription of β-catenin-dependent genes involved in cell growth and survival, including C-MYC, CCND1, BIRC5, and CDKN1a [9]. C-myc is a proto-oncogene that activates cyclin D1 and simultaneously inhibits p21 and p27, which leads to uncontrolled cell proliferation


The name of the gene, not the protein, but yeah you're right that butterflyStripeWidth is far too abstract. But something like, I don't know, decarboxylizeOrnithine would make a genome more "readable".


Thinking of DNA as code, with the particularity that outdated versions of the code are kept in "binaries", it's a creative use to recycle them as a random pattern generator!


We need to stop using the term “junk” DNA - it’s an outmoded term based on the wrongheaded assumption that DNA directly encoding proteins was the be all end all.


They even use the better term throughout the article (non-coding regulatory DNA) but someone thought they should stick 'junk' in the headline.

The whole saga of 'junk' DNA is pretty interesting, and serves as a cautionary tale for those who want to use science to prop up their metaphysical ideologies. These non-coding regions of the genome were long thought to be the detritus of evolution, nothing but extra baggage carried around by the genome. Richard Dawkins and others famously settled on this as a means of discrediting the ideology of 'intelligent design' because, as they saw it, an intelligent designer wouldn't leave all this junk sitting in the genome. The later realization that this junk was actually playing all kinds of roles in large-scale regulation, cell differentiation, and the three-dimensional structure of the genome, and was often being actively transcribed (to regulatory RNA), tossed that whole notion out the window, and then it became (for a few years) something the intellectual creationists tried to use to discredit Dawkins and co. There was something called the 'ENCODE controversy' over a decade ago in which this featured centrally, and which is well described here:

https://www.science20.com/adaptivecomplexity/our_genomes_enc...

Most people seem to have forgotten all that and accepted that large non-protein-gene-coding regions of eukaryotic genomes have various essential functional roles.


One man's junk is another's treasure. The truth is, a large chunk of it is somewhat "junky". I have a huge problem with using the word "junk" itself, as the word comes with so much baggage. When you really think about junk in the real world, it is not truly always junk - things can be recycled in some ways, and the matter itself is usually not destroyed... earth + energy gets used in some way, and then gets tossed aside..

Moving on to biology, we have ~2500 smell receptor genes in various states of decay. We also have a lot of deactivated smell receptor genes, with a lot of polymorphism across humans.. as it's in free drift. Perhaps a long time ago when we shared a common ancestor with rats, many of these were far more useful, with strong selection pressure to preserve working copies, but not so much anymore.. Perhaps the trash man didn't come yet, but it's definitely stored away in our basements.

A more concrete example would be the roughly 40% of our genome that consists of repetitive elements, transposons, and retrotransposons. One of them, the Alu sequence, is 300 base pairs long, and yet makes up about 10% of our genome. There are about 1 million copies of this sequence across our genome. Its spread has slowed down recently. Out of about a million copies, most of them are "inert", with no further ability to copy themselves. Sure, given just how much of our genome is this one thing being repeated over and over again, it has some function in some places - causes various splicing events here and there, helps shuffle genes (which is a bit of a meta function across evolutionary timescales), but overall, if you had children with these parts mutated, 99.9% of the time nothing of significance would happen. 10% is huge, given the protein coding region is about 1%, and the regulatory regions influencing transcription & RNA expression with some sort of identifiable action are about 10% (if we're being very, very generous). DNA is promiscuously transcribed, but this doesn't mean much of that has any particular function. A lot of it is transcribed at very low copy numbers, and degraded as fast as possible. A lot of things are stochastic, so there is a "long tail" of what gets transcribed, and at very low numbers at that.
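
As a rough sanity check of those numbers, here is a back-of-the-envelope Python sketch. The 300 bp length and the ~1 million copy count are the figures quoted above; the ~3.1 billion base-pair haploid genome size is an assumption added for the calculation, not something stated in this comment.

```python
# Back-of-the-envelope check of the Alu figures quoted above.
ALU_LENGTH_BP = 300               # length of one Alu element (from the comment)
ALU_COPIES = 1_000_000            # approximate copy number (from the comment)
GENOME_SIZE_BP = 3_100_000_000    # ASSUMPTION: ~3.1 Gbp haploid human genome


def genome_fraction(element_bp: int, copies: int, genome_bp: int) -> float:
    """Return the fraction of the genome occupied by `copies` elements."""
    return element_bp * copies / genome_bp


alu_fraction = genome_fraction(ALU_LENGTH_BP, ALU_COPIES, GENOME_SIZE_BP)
print(f"Alu share of genome: {alu_fraction:.1%}")
```

Under that assumed genome size, 300 bp times a million copies comes to 300 Mbp, which lands just under 10% and so is consistent with the "about 10%" figure.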

Another good chunk of that 40% of repetitive sequence is old retroviruses we got infected with (not dissimilar to HIV) that totally raided our DNA and became endogenous - in fact they're called HERVs - human endogenous retroviruses. These mostly stopped spreading in our DNA as well. Also, large stretches of our DNA are the same couple of bases repeated ad nauseam for no particular reason.. let's say AT... there are these massive stretches that go ATATATATATATA and you find people are highly polymorphic and tolerate a lot of mutations in these sequences. Some people have it deleted with no ill effect. There are rare examples of this occurring inside a gene, where having the repeat go on for too long is bad for you ("Huntingtin" (no it's not a mis-spelling) is the paradigmatic example). This is the exception rather than the rule, considering we have more repeats than the entire % of coding regions in the genome.

Junk is a bad word, but it's not like there is a "clean rewrite", or "clean refactor" either... so things accumulate, get shuffled around, get forgotten.. randomly get deleted in snippets... most mutations are either neutral or detrimental, with happy accidents happening here and there.

Overall, it is nonsensical to force things into this binary of "junk", "not junk", but beware of thinking of the genome as one would a design schematic for a microprocessor, or an aircraft for that matter. There is a lot of stuff in it that's like meh... doesn't really matter if it's there or not.


Thank you for this excellent explanation.


Thank you!


On the assumption that there are big areas that can be skipped or deleted without any effect on the phenotype though, I've often wondered why there doesn't seem to be evolutionary pressure to delete them. Is it that the energy or other costs of duplication of those regions are not significant, or are there other factors that oppose it?


DNA, like many other biological polymers, is structure as well as storage. Some of these regions are about structure and modification of structure, to facilitate modality of function. They act as recognition sites, consensus sequences, torsion limiters, process interference selectors, user-defined variables.

Picture, if you will, a big mass of magtape, no spool, just a disaster, and you have to scan across this mess until you find an accessible loop of it with a particular flag or type readable. It's like how magtape works, except you have the mechanical challenge of reading the tape in the form of a quivering, dynamic mass of spaghetti that is being read at multiple locations and has to remain undamaged.

Some of these regions make the biggest contribution by simply allowing mechanical slack and positioning of contact sites.


You can't have an evolutionary pressure to delete stuff in general, because then you'll oops out stuff that needs to be there. Short of that, building the mechanisms to figure out what to delete would be far more short-term expensive than copying everything around for one more generation.


It's exactly as you say it - the energy costs are negligible. Most mutation is neutral drift.


Right. It's like someone reverse-engineered an executable file and discovered the .data section while calling the .text junk.


Working Effectively with Legacy DNA


"DNA design patterns for maintainable organisms."

On your digital bookshelves in 2074.


O’Reilly 2042


Pearson Education, 2004. O’Reilly 2042 is an auto parts store in Wilson NC.


Maybe they are opening and closing parentheses of a lambda program.


Maybe God is still trying to exit vim.


The Ghost in the Machine is beautiful.


To add to the stash of dangerously simplifying computer analogies, this almost sounds like separate "code" and "data" sections: The coding DNA sort of interpreting the non-coding part as "data".

The advantage is the same: The coding parts can evolve only slowly, because a wrong mutation can mess up the entire functioning of the wings and would seriously reduce the individual's chance of survival. Meanwhile, a mutation in the non-coding part can only change the color pattern and nothing else. Therefore, the non-coding part can evolve a lot faster.


If some part of DNA is classified as code DNA, why not classify this as data DNA instead of junk?


Journalists don’t understand data but they understand junk.


I’m a junk engineer to these journalists.


The problem isn't classification, as much as undoing the effect of a generation of academic and pop-sci publishing, which popularized the term "junk DNA" - and groups that perpetuate this term today, e.g. creationist/anti-evolution groups that use it to argue biologists are full of hubris (and thus wrong about evolution), and inadvertently keep the term itself alive.


Creationists only weaponize the term because it really is used by journalists and pop-scientists to argue against design.


Some journalists and pop-science writers are creationists themselves; the whole debate is still a self-sustaining feedback loop, and one of its side effects is that the term "junk DNA" remains in use.


Do you mean “Junk DNA” is a junk term?

At least until it gets repurposed.

Apologies for getting too meta.


> Do you mean “Junk DNA” is a junk term?

Junk gene is a junk meme? Yeah, that's about right.

> At least until it gets repurposed.

Like all memes do. And genes, for that matter.


Junk DNA is like dead code your IDE greys out for you in a large code base. It's still there even though nothing uses it anymore.


False. Read the article and learn what they found out in this case.


It would be closer to say this is legacy code lock-in.

The system is legacy-dependent for early boot configuration, then changes mode to runtime with recently versioned modules and libraries.


It would've been nice if the article could have even shown a single image of how knocking out a certain sequence caused some wing pattern to disappear. But I guess that's hidden behind a paywall.


The "materials and methods" attachment is free and includes some images: https://www.science.org/doi/suppl/10.1126/science.abi9407/su...

(paper is not yet available on scihub)


The first image, before the text of the article, has this caption: "Wings of the painted lady butterfly – Vanessa cardui, modified by deletion of non-coding DNA sequence."

(I'm definitely not paying for this content, and it is available in a fresh incognito browser session, so I suppose it should be available to all readers.)


Sooo... Not "junk". :) <ducks>


Man, butterflies relying on all this old shit smh.... [ indiscriminate murmuring.... ] Yo hey [hallway scuffling] Yo PM, can't we just deprecate butterflies already? Ah he- I know I kno- yes I understand the boss likes butterflies, pretty and what no- no stop. Butterflies are a cost center, can we just deprecate and phase them out? Deprecation Plan? Cost Center Optimization? Ah shit ah- yeah, we keep the butterflies....



