It is an introduction to what it is like to do genomics in a scientific environment. The content at the link the OP posted appears to be an oversimplified, high-level and naive overview.
The opening paragraph of this resource states it's absolutely not about being a comprehensive introduction to genomics. I strongly disagree with the sentiment that it's naive or oversimplified. It's trying to give someone with no knowledge a working mental model to begin to dig into building a comprehensive view. A framework of analogy is, for many people, an extremely helpful device for learning, and one frequently left out by comprehensive scientific or engineering texts.
> Explore the secret of life through the basics of biochemistry, genetics, molecular biology, recombinant DNA, genomics and rational medicine.
It's really well done and genomics is the focus. I took many dozens of edX and Coursera courses over the years, and I would say this one is in the top 5% of the courses there.
I don't understand the phrase "from a programmer's perspective", or "for Engineers" in the title on top.
I'm a programmer who studied CS but also took numerous life science courses throughout my life. If you want to learn biology, you study biology; what does a "programmer's view", or an engineer's, have to do with it? You use the correct tool for the job, and having a background in both, I don't see this working out well, more like the opposite actually.
The point of looking at biology for an engineer or programmer should be to broaden one's horizons, not to use one's internal models built for a completely different field in another one that really is not like that at all. IMO it's best to forget all computer metaphors here.
I have absolutely loved working in genomics. I am a huge believer that genomics will be a huge part of healthcare in the future, and I have two examples to motivate that point that I think may be interesting to the reader.
1) The Moderna vaccine was made with the help of Illumina genome sequencing. They were able to sequence the virus and send that sequence of nucleotides over to Moderna for them to develop the vaccine, turning a classic biology problem into a software problem and reducing the need for them to bring the virus in house.
2) Illumina has a cancer screening test called Galleri that can identify a bunch of cancers from a blood test. It identifies mutated DNA released by cancer cells. This is huge: if we can identify cancer before someone even starts to show symptoms, the chances of having a useful treatment go up dramatically.
Disclaimer: I work for Illumina, views my own.
I wrote some more about why genomics is cool from a technical point of view here (truly big data, hardware accelerated bioinformatics) : https://dddiaz.com/post/genomics-is-cool/
The thing I'm most excited about long term is biocomputing.
Having Turing complete programmatic control over biological systems has an absolutely endless list of transformative applications.
Imagine being able to program bacteria that can "infect" the patient and attack tumor cells, or act as fodder to keep autoimmune disease in check.
Or let's say we could program stem cells into "liver repair mode" to go and differentiate into new liver cells.
Then there are the implications for things like drug synthesis: the ability to programmatically control enzyme levels and compile more or less arbitrary biosynthetic pathways into fast-growing photosynthetic algae, turning CO2, water and sunlight into medicine.
It's still a long way off being at that level of applicability, but man oh man it's gonna change everything.
Sounds great until natural selection kicks in, and because DNA replication is largely a lossy process, suddenly the thing you programmed the organism to do mutates to do something else a whole lot more problematic.
Imagine a software heisenbug, but instead it's a life form that you can't kill -9.
The idea of tailor-made medicines in a vat is awesome, but creating a bacteria to "specially target" certain cells seems like a disaster waiting to happen.
Those are certainly real problems, and I'm not a cell biologist, but I'm not convinced these problems are insurmountable.
For instance, it might be possible to use ECC (error-correcting codes) to get around transcription errors. It could also perhaps be ensured that any rogue "clinical biocomputer" could be easily treated with antibiotics or a specifically engineered bacteriophage.
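To make the ECC idea a bit more concrete, here is a toy sketch of the simplest error-correcting code there is, a triple-repetition code with majority voting, applied to a DNA string. This is purely an analogy in software: the function names are made up for illustration, and nothing here claims a real cell could implement this scheme.

```python
from collections import Counter

def encode(seq: str, copies: int = 3) -> str:
    """Repeat every base `copies` times (a simple repetition code)."""
    return "".join(base * copies for base in seq)

def decode(encoded: str, copies: int = 3) -> str:
    """Recover each base by majority vote over its block of copies."""
    blocks = (encoded[i:i + copies] for i in range(0, len(encoded), copies))
    return "".join(Counter(block).most_common(1)[0][0] for block in blocks)

original = "ATGGCA"
stored = encode(original)  # each base is stored three times

# Simulate a single point mutation in the stored copy (index 4: T -> C).
mutated = stored[:4] + "C" + stored[5:]

# Majority voting inside each block of three outvotes the mutated base.
recovered = decode(mutated)
print(recovered == original)
```

The catch is the same as in conventional ECC: you pay for robustness with redundancy (three times the sequence here), and two errors in the same block still get through.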
Like I said, the technology is very far off from having real world applications like this. At the moment it feels like we're in the analogue of the 40s and 50s for conventional computing. The field is still just inventing the very basic building blocks. It's going to be very limited in use, wildly dangerous (look up mercury delay lines) and unreliable for decades to come.
Considering the current treatment in the worst cases (where more targeted treatments don't exist) is to blast the patient with radiation and poison (chemotherapy) and hope it doesn't kill them, I'll take those odds.
Except a rogue bacteria won't just kill you, it could escape beyond you and kill millions.
Chemo/radiation only kills the patients it was given to (not entirely true: if the treatment caused mutations in the germline and the patient subsequently had children, the effects of the treatment might be passed on, but that is still very limited).
Bacterial infections are generally very treatable though. Even when the bacteria aren't engineered. And especially when they are, because why would you leave any antibiotic resistance in an engineered bacterium?
Bacteria are the scariest when they've had the time to develop resistance to multiple different antibiotics.
Additionally, a bacterium that's engineered to be almost completely harmless evolving into a deadly strain in vivo is fairly unlikely in itself, especially if transcriptional errors can be reduced several orders of magnitude like GGP suggested.
Adding to that the option of hospitalisation or even home isolation to reduce risk of transmission, the risk of this resulting in some huge lethal epidemic must be pretty minuscule.
It's hubris to think we are at a stage where human scientists are so disciplined and knowledgeable that we can start patching existing life-forms in a way safe enough to target certain types of cells reliably over time and not others.
Software is essentially a cleanroom in the sense that the environment tends to be deterministic and man-made, and that is still riddled with unexpected accidents. Fortunately we can turn it off, fix the bug, and redeploy and the people involved in that tend to survive.
> Additionally, a bacterium that's engineered to be almost completely harmless evolving into a deadly strain in vivo is fairly unlikely in itself, especially if transcriptional errors can be reduced several orders of magnitude like GGP suggested.
The proposition was to engineer a bacteria that targets and infects a particular type of human cell to kill it. Creating medicines in a vat (like insulin) is different from releasing infectious agents in the wild. I was under the impression that this was obvious, but apparently not.
>It's hubris to think we are at a stage where human scientists are so disciplined and knowledgeable that we can start patching existing life-forms in a way safe enough to target certain types of cells reliably over time and not others.
I never said we're at this stage now or even close to it, in fact I explicitly said the opposite:
>>>>Like I said, the technology is very far off from having real world applications like this. At the moment it feels like we're in the analogue of the 40s and 50s for conventional computing. The field is still just inventing the very basic building blocks. It's going to be very limited in use, wildly dangerous (look up mercury delay lines) and unreliable for decades to come.
As for comparing creating medicines in a vat to using bacteria as an active treatment, you're the only one making that comparison. The paragraph you responded to wasn't about in vitro drug synthesis at all, so I'm not sure what your point is here. Yes, it's obviously different. I never said otherwise. It's perfectly possible to target bacteria to specific tissues; wild bacteria already do this.
My point was that a bacteria engineered to target a malignant tumour, to be very treatable with antibiotics or bacteriophage, and to have a strongly reduced rate of mutation, is extremely unlikely to evolve into a pandemic, and is likely to be much safer than chemotherapy and radiation.
> My point was that a bacteria engineered to target a malignant tumour, to be very treatable with antibiotics or bacteriophage, and to have a strongly reduced rate of mutation, is extremely unlikely to evolve into a pandemic, and is likely to be much safer than chemotherapy and radiation.
Did you know bacteria have horizontal gene transfer? I.e. antibiotic resistance isn't just evolved and passed to children (vertical), it can be passed to peers horizontally.
It also happens outside bacteria, but bacteria have active mechanisms to enable this. That's how antibiotic resistance spreads: not just from parent to child, but peer to peer like a meme :-).
Safety is a complex topic, and you'd need to consider it on a case-by-case basis. PhD students engineer bacteria every day (something that had a self-imposed ban in the 1970s, I think); however, that's within the context of standard platforms, and each and every one should have a risk assessment.
Don't get me wrong, I think it could be done, but there is a Genie and a bottle here and it's best to think twice.
I'd like to see both a kill switch (beyond antibiotics) and some sort of external growth dependency, i.e. they need something you provide to survive as well.
There are two main flavors of jobs. For one you’ll want to be a PhD in something like physics or math. For the other, any standard software engineering background will do.
TBH I'm surprised how hard Illumina is already pushing Galleri as a product. Current ctDNA/cfDNA tests are imperfect even for advanced cancers, which should have a lot of shedding to begin with. Additionally, CHIP (clonal hematopoiesis of indeterminate potential) is an outstanding issue. DNA methylation sequencing has promise, but I feel more data would be needed to truly make diagnostic findings. So to see Illumina market it as a ready-to-go product is quite worrying. It may burn a lot of people.
Really glad to see this, but it reminds me of the earlier HN post that said engineers don't go into genomics because it doesn't pay and requires a lot of investment in learning biology.
I worked in genomics and left this year because, in 95% of cases, you’re underpaid and often disregarded as “IT help” assisting the wildly over-educated and underpaid people driving the actual research.
That's why you stay though: the people are interesting, the work is meaningful, and you directly see the fruits of your labors whilst contributing to a codebase that is by default open source.
People aren’t any more interesting than anywhere else.
Work is no more meaningful than anywhere else. It’s a big “selling point” for the industry, but it’s just a way to get people to get paid less (yay you’re making the world better than all those garbage people serving coffee or healing sick people or keeping your lights on or optimizing the routes of the goods you have delivered). If you want sustainable systems, trying to be a martyr and work for less only screws this up long term.
Code-base is not open-source. It’s biotech R&D, there is zero culture of sharing outside your organization within industry. You can present high-level things at conferences and such, but you’ll have to rip the raw data out of their dead hands…not happening.
I’ve been in too many conversations about building software to serve larger groups in this industry. It can happen, but it can’t currently and nobody wants it. Confident someone will find a solution, but everyone wants their own home-grown solutions in their own walled-gardens that no one has access to.
Data and the things it can/can’t tell you are held tightly in these companies. I was at a pharma a couple years ago where researchers were explicitly told they COULD NOT test certain compounds in a certain way because they did not want a trace of this data to exist while they were trying to push compounds through the FDA.
Oh, I assumed we were talking about being a software eng in academia. It's a spiritually rewarding experience, with none of the black holes you've described at pharma.
If you want some personal motivation to get into genomics, you can get your whole genome sequenced for a few hundred bucks and play around with the raw files yourself. I used Dante Labs[1] and they are great. You can even ask them to delete your data and samples!
and you will learn almost nothing from sequencing and studying your own genome
at best you waste your time, at worst you will find all kinds of things that are not there
it is the Silicon Valley hacker mentality that thinks life is some sort of computer where you can fiddle with parameters
learn some biology first, then you can marvel at it and realize just how absurdly simplistic it is to think you can read anything out of some random letter
I wanted to play around with BAM files and it is much more engaging to play around with my own BAM file versus downloading one from a website.
It is also about data ownership. The value of a fully sequenced genome is limited, sure, but I still want that value without having to give my genomic data to 23andMe.
Working with genomics technology is too far away from the money to become rich from. There are too many middlemen in-between technology and application.
But it's a fun subject, and as the technology develops, middle layers will disappear and then the money from expertise will become better.
The number of people who are both capable software developers and have a good understanding of cellular biology is quite small and will probably remain so for the foreseeable future.
In biotech, the end goal is a physical product or a service performed by a doctor or another highly paid professional. Those don't scale as well as software. The ratio of users to developers is also low. You are likely developing software for many niche tasks, which does not scale either.
And if you are considering roles in academia, your productivity is not going to be high enough to justify a competitive salary. Productivity, in monetary terms, is defined by the amount of money you can bring in, either directly or indirectly. In academia, that usually means grants. You may be able to argue successfully to a funding agency that one software engineer is worth two postdocs, but not four.
The people studying yeast metabolism in grad school were always the ones with the best beer (especially the ones that created mutant strains). I think the two might be related.
One of my gateways into Making was during grad school when my housemate brewed beer. He wanted to make a counterflow wort chiller, went to Home Depot, bought some parts, assembled the whole thing, and it worked perfectly. I was completely blown away you could make "scientific" apparatus from Home Depot. And immensely jealous.
Heh. That reminds me of the time in grad school where I needed a liquid disposal system for an arraying robot. We had this great ($$$) robot, but the liquid waste needed to be manually emptied every few hours. It was non-toxic and just went down a sink.
A quick trip to Lowe’s to get PVC fittings and pipes, and I suddenly made my own scientific equipment and saved me some time!
IIRC we passed the wort through PVC, and the cold water through copper tubing wrapped around the PVC. PVC is generally food safe and we cleaned it, IIRC, with IPA (isopropyl alcohol, not india pale ale).
I got a real imposter syndrome from the whole project and my housemate went on to be a famous microscopist (but later rage-quit to leave for industry). I build microscopes using 3d printers and other easily sourced bits. But Home Depot is a terrible place to source materials. They are the lowest quality.
There are a lot of starry-eyed individuals who are ready to “sacrifice” a stable, well-paid career to “make a difference” by working in fields like biology.
It's not quite the same because the helpdesk person is probably paying a sizeable chunk of their salary on tuition. Presumably working in science would scratch that itch for you, so the number to compare against is whatever you'd make elsewhere, minus what you spend on tuition, minus however much it's worth to you to be able to focus on the science and not have to balance your time with some unrelated job.
One of my favorite books in this space is “BioInformatics Data Skills.” It’s just nice concise coverage of a lot of basic tech skills like git, bash, tmux etc. and then coverage of basic bioinformatics skills.
For me coming from a SWE background the computational skills are very easy to pick up especially if you work with bioinformaticians you can ask questions. It’s the genomics knowledge that is very difficult for an engineer to acquire.
Starts with “This Guide is written specifically by and for computer scientists and engineers”
And yeah it shows - contrived example after another, and honestly not a great description of anything.
If you want to truly understand genomics you have to understand how biology works. And honestly it’s great info for anyone even if you’re not getting into genomics or whatever.. why would you not want a working model of how life is put together? In that case I’d just recommend dusting off a biochem or cell bio text book and reading just the first 5-8 chapters. Typically they lay it out very simply from basic principles and the authors have far more experience and understanding and writing help than this weird tutorial course thing.
Do you have an example of a contrived example and explanation of why it is contrived, for the non-biologist to see why it is contrived?
I once tried reading a few chapters of a bioinformatics book explaining DNA, RNA, protein creation, etc. The basic idea seems very simple but to my mind they explained it non-systematically with too many words. There seems to be an internal information structure in these RNA- and DNA- related processes that was not being concisely presented and it seemed that if the writers presented the material in terms of computer-science concepts, so much time could be saved.
You can't present it as computer science concepts because it's not computer science.
For example, the central dogma of DNA transcribed to RNA translated to protein seems simple, but it's not.
In almost every instance, there are vague 'rules' and many many exceptions to these rules. For example, often coding regions in genes start with an ATG, but sometimes they don't. Sometimes splice sites (where the non-coding parts of transcripts called introns are chopped out) can be predicted, but a portion of the transcripts are not spliced at predicted sites for no obvious reason. Sometimes the predictions are just wrong. Sometimes the generated proteins are modified at specific locations which impacts their function, but again, sometimes not. Even whether the gene itself is 'switched on' (i.e. able to be transcribed) is impacted by many many things, such as unidentified transcription factors, or whether the chromosomal location itself is accessible or not. There are many many other things that impact the process.
There is no simple underlying concept as the system is not designed, it evolved and is quite different among different organisms, and even in different tissues or timepoints in the same organism. As long as it works and provides enough benefit to avoid negative selection, that's enough.
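For the programmers reading along, the ATG point can be shown with a deliberately naive sketch. The scanner below encodes the textbook "rule" (a coding region starts at ATG and runs in frame to a stop codon) and nothing else. The function name and the example sequence are made up for illustration. It ignores the reverse strand, non-ATG starts, splicing and regulation, which are exactly the exceptions described above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def naive_orfs(dna: str, min_codons: int = 2):
    """Yield (start, end) for every ATG...stop stretch on the forward strand.

    Coordinates are 0-based and half-open. `min_codons` counts the codons
    before the stop codon, including the ATG itself.
    """
    dna = dna.upper()
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != "ATG":
            continue
        # Walk codon by codon, staying in the frame set by this ATG.
        for pos in range(start + 3, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    yield (start, pos + 3)
                break

seq = "CCATGAAATGGTAACC"  # hypothetical toy sequence
orfs = list(naive_orfs(seq))
for start, end in orfs:
    print(start, end, seq[start:end])
```

Note that the toy sequence contains a second ATG that never reaches an in-frame stop codon, so the scanner silently drops it. Real gene prediction has to deal with far messier cases than that.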
It starts by defining a cell as a bakery. First of all, what exactly is more systematic in comparing a cell to a bakery? Other than the fact that both things produce crap the analogy has no real substance. And there are so many wrong facts in that one paragraph (many of our genes are present as more than two copies in our genome, for one).
You are absolutely correct, there’s an information theoretical underpinning of genomics and systems biology that’s rarely if ever tackled in textbooks, but (a) neither does this course tackle it, and (b) you can’t just skip the biochem basics and jump to that. That’s like trying to become a physicist without learning math.
There's nothing about sequencing by synthesis, how blocking nucleotides are added one after another, how pictures of the fluorescent nucleotides on the flow cells are image-analysed, etc.
I think perhaps this (learngenomics.dev) resource is a little too shallow on some levels, but has interesting depth in odd places. I think there is a need to get users up to speed with things like the SAM format, which is very fundamental to 99% of "DNA sequencing" projects, but it's an odd format in some ways because it's quite low level, so trying to get people to understand how the basics of biology interact with it is worthwhile. I did my own attempt in this sometimes-updated blog post https://cmdcolin.github.io/posts/2022-02-06-sv-sam
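To give a feel for how low level SAM is, here is a minimal standard-library sketch that parses one alignment line and works out which reference positions the read covers from its CIGAR string. The read below is invented for illustration, the helper names are my own, and optional tags and header lines are ignored. In real work you would reach for an established library rather than hand-rolling this.

```python
import re
from typing import NamedTuple

class SamRecord(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise flags
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment operations
    seq: str     # read sequence

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference

def parse_sam_line(line: str) -> SamRecord:
    """Parse the mandatory tab-separated fields of one alignment line."""
    f = line.rstrip("\n").split("\t")
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]), f[5], f[9])

def reference_span(rec: SamRecord) -> tuple[int, int]:
    """Return the 1-based inclusive (start, end) covered on the reference."""
    length = sum(int(n) for n, op in CIGAR_RE.findall(rec.cigar)
                 if op in REF_CONSUMING)
    return rec.pos, rec.pos + length - 1

def is_reverse(rec: SamRecord) -> bool:
    """Flag bit 0x10 means the read aligned to the reverse strand."""
    return bool(rec.flag & 0x10)

# Hypothetical read: 5 matches, 2 inserted bases, 3 matches, 1 deletion, 4 matches.
line = "\t".join(["read1", "16", "chr1", "100", "60", "5M2I3M1D4M",
                  "*", "0", "0", "ACGTACGTACGTAC", "*"])
rec = parse_sam_line(line)
span = reference_span(rec)
print(rec.rname, span, "reverse" if is_reverse(rec) else "forward")
```

The insertion adds bases to the read without consuming reference, and the deletion does the opposite, which is why a 14-base read ends up spanning 13 reference positions here.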
I didn't get this from skimming the first page - but what will this let me do? If I take this course will I be able to mess with a cell or will I just learn some stuff about biology.
I saw a recent Lex Fridman podcast where the guest talks about "bioelectric patterns" and somehow getting a worm to grow a second head by messing with those patterns. I would absolutely start on this course now if it was a realistic pathway to doing something like that.
This is the worst outcome of regulation of the life sciences.
There is no REPL for the cell. No tinkering allowed.
When Marvin Minsky was growing up in New York, neighborhood pharmacists owned fluoroscopes. He said those fluoroscopes were like “great black boxes” to him and that “those kinds of black boxes don't exist for kids anymore.”
Many modern bio experiments are almost exactly a REPL. You build a system and then repeatedly interrogate it, inputting some data using a Read (i.e., you pass in some DNA), which is then Eval'd by the cell (warning: there will be side effects), "printed" in the form of some signal like fluorescence, and then you loop back to the beginning. This is often called a "closed-loop laboratory."
Unfortunately, each step ends up being extremely challenging, there's tons of noise, and the cost of each Read, Eval and Print is far higher than in a programming language. Further, the "system" is running 38,000 other "threads", all of which have direct read and write access to your data; some of those threads consider your data to be the enemy and cut it up, while others are just randomly spamming your console with unnecessary debug log messages.
We have actually reached the point where some scientists have synthetically created a novel chromosome, and used a preexisting cell to bootstrap the new genome so that the cell eventually contains only protein from the new genome. To me, that represents a step beyond tinkering: it means we can create synthetic lifeforms with exactly and only the details we want, which makes studying them and engineering them far easier.
Interestingly, even though this tech exists, nobody has found any interesting use for it and it's not even really used to probe biology.
A better example would be gene therapy, which has been developing slowly over decades. A single person died in a trial in the 90s, and that stopped development (that's the regulation part you're referring to) for decades. In other trials that don't include gene therapy, patients routinely die and they're just a statistic.
What is the minimum it would take to run such a REPL at home? What hardware, life form, and knowledge would you need, at a minimum, to start making changes and seeing results?
It's difficult to get into this field if you don't have a graduate degree. I was a double major, Computer Science and Biochemistry, with a minor in Biotechnology. I sent my resume to many biotech and pharma companies, but could not even get an interview. A lot of the jobs said you need 0 years of experience if you have a PhD, but 10 years of experience if you have a Bachelor's. Now that I have 10 years of experience as a developer, I've forgotten almost everything I learned in my science education, and I've lost interest.
Don't want to be too disparaging, but this to me doesn't seem to be an 'Introduction to Genomics', but more an introduction to read mapping and variant detection in human (or more broadly diploid) genomes.
Genomics stretches vastly beyond this - assembly and annotation to start with.
I'd argue the most interesting problem space for software engineers is outside of what is covered in the document.
The number of startups cashing in on genomics by making shiny web apps, where software engineers need to understand something about human diploid genetic variation, is far higher though. That's where the money and engineers are, not in fundamental algo development for slime mould assembly.
CS person with biology PhD here. The mix of biology and computation is huge, and with the right skill set, interdisciplinary unicorns make tons of money. If you want to see how computation and biology mix, first dive into a standard university Intro Biology course, and then with that foundation, look into computational biology & bioinformatics (they're distinct). You'll find that genomics is only one piece of a much bigger and absolutely fascinating story.
To get that basic biology foundation, another post mentioned an edX Intro Biology course. That would be a terrific start, or just get a recent university-level intro biology textbook. It's not terribly difficult material and you'll be in far better shape than after reading a biology-for-laypersons pamphlet.
Genomics is where I started learning how to program. Having worked as a bench scientist in a genetics lab, I understood nothing about my lab mates' research when they were showing me Python scripts of their analysis, which initially got me curious.
Now, having been in the industry developing APIs for large companies for the past 8 years, I’d be keen to get back into it. Any ideas where to start or find jobs in the space?
I find the field extremely interesting, but I wish the pay in genomics was better. Compared to FAANG/unicorn type companies, their pay is way below market and it's really hard to justify the massive pay cut.
The pay is exactly where the market is. There are a ton of wet-lab people wanting to get into “data”. And the industry is less lucrative than showing ads like Google does.
Just about any “bioinformatics” job requires a PhD in a biology-related field. And there are plenty of graduates. That’s the thing, there’s plenty of supply.
Nice, but I feel like really the only thing you need to know as an eng is DNA -> RNA -> Protein. Sometimes RNA -> DNA via reverse transcriptase.
Everything else is just normal Python scripting.
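For reference, that textbook DNA -> RNA -> Protein picture fits in a few lines of Python. This is a minimal sketch assuming the standard genetic code and a coding-strand input that is already in frame. The function names and the example sequence are made up for illustration, and the sketch deliberately leaves out everything the replies below bring up.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def transcribe(coding_dna: str) -> str:
    """DNA -> RNA: the mRNA matches the coding strand with T replaced by U."""
    return coding_dna.upper().replace("T", "U")

def reverse_transcribe(rna: str) -> str:
    """RNA -> DNA, the direction reverse transcriptase works in."""
    return rna.upper().replace("U", "T")

def translate(mrna: str) -> str:
    """RNA -> protein: read codons from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[reverse_transcribe(mrna[i:i + 3])]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

dna = "ATGGCCAAATAA"  # hypothetical toy gene: Met, Ala, Lys, stop
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna, protein)
```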
Oh no. This is a major flaw that kills projects: to run a valid statistical test you need to understand the underlying reality of the data. Otherwise you just run tests until you find “something”.
How do you handle one genomic variant affecting dozens of different RNA transcripts and isoforms? How do you handle tissue-specific expression? LD haplotype blocks? Frequency across populations and reference choice? Sample handling affecting read depth? Mixed direction of effects in phenotype-genotype? The critical (and, IMO, beautiful) feature of bioinfo is that it requires an understanding of how your dataset can rarely be considered clean and as simple as _observation name_ and _observation value_. To succeed it is usually critical to know a lot about the observation metadata, which is not collected in the dataset. Hopefully in the future it will be better curated and less esoteric.
- the dna that doesn’t code for proteins but makes up the vast majority of human dna
- the intron regions of genes that are transcribed into RNA but then spliced out of the RNA and not translated into protein, and are 5x larger than the coding parts
Those two things alone are absolutely critical to understand to interpret a genome sequence. Of course there is much more.
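The intron point can be sketched in a few lines as well. The toy transcript and exon coordinates below are invented, and real splicing is decided by the cell, not by a tidy coordinate list, but it shows why the sequence you read off the genome is not the sequence that gets translated.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon intervals (0-based, half-open) into a mature mRNA."""
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))

# Hypothetical pre-mRNA: exon 1, one intron (starting GU, ending AG), exon 2.
exon1, intron, exon2 = "AUGGCC", "GUAAGUUUUCAG", "AAAUAA"
pre_mrna = exon1 + intron + exon2
exons = [(0, 6), (18, 24)]

mature = splice(pre_mrna, exons)
removed = len(pre_mrna) - len(mature)
print(mature, f"({removed} of {len(pre_mrna)} bases spliced out)")
```

Even in this tiny example half of the transcript never reaches the ribosome, and in real genes the proportion is much larger, as noted above.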
You do know that there are things like epigenetics, DNA repair (using specialized proteins), RNAi, post-translational modifications, metabolites (just to name a few)?
Sooner or later you'll have to learn all the other stuff in the linked page: file formats used only in genomics, structural variants, NGS, evolution, regulation, polygenics, etc.
https://www.biostarhandbook.com/
I have learned so much from it.