Exome sequencing and analysis of 450k UK Biobank participants (nature.com)
76 points by ahurmazda 3 months ago | 61 comments

This is absolutely world changing. I've known that this was coming and was only a matter of time, but it did get here sooner than I expected. I have hypermobile Ehlers-Danlos Syndrome (hEDS) from a TNXB mutation. I was incorrectly told by doctors for decades that I was perfectly healthy and my issues were psychosomatic. It wasn't until I did a careful large-scale behavioral analysis that I was able to identify similar people and from there the root cause. A $500 (30x) DNA test confirmed it. hEDS is massively underdiagnosed; the vast majority never get a diagnosis.

I strongly suspect that most mental problems are physical in nature and most physical problems are DNA related. And even when they're not DNA related, treatment ideas could be gleaned from 'nature's large-scale experiment'. The number of issues that can be identified from a $100 (1x) test and then subsequently treated is mind-boggling. For me, that cost is less than a single doctor's visit. This sidesteps the medical establishment, which is slow and in many ways archaic. This will lead to a massive leap forward in medicine.

In other news, I'm also of the opinion that IQ is largely determined by DNA, nature as opposed to nurture, and once that is properly figured out I'm sure designer babies are next. I don't think that is a door that can be kept closed. There is already a black market for it. I'll be watching from the sidelines; I think this is going to get interesting.

I figure DNA works a bit like makefiles. A particular target is only built if it is required somewhere. (And requirements can chain and branch, of course).
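To make the makefile analogy concrete, here is a minimal sketch of "only build what something requires". The target names are hypothetical and chosen purely for illustration; they do not map to real genes or pathways, and real gene regulation is far messier than a dependency graph.

```python
# Toy dependency graph: each target lists the targets it requires.
# All names are hypothetical; nothing here maps to real genes.
RULES = {
    "organism": ["muscle", "eye"],
    "muscle": ["actin"],
    "eye": ["opsin"],
    "actin": [],
    "opsin": [],
    "gills": ["gill_protein"],  # never requested below, so never built
    "gill_protein": [],
}


def build(target, rules, built=None):
    """Return targets in build order, building each prerequisite first."""
    if built is None:
        built = []
    if target in built:
        return built
    for dep in rules[target]:
        build(dep, rules, built)
    built.append(target)
    return built


order = build("organism", RULES)
print(order)
```

Requesting "organism" pulls in its chain of prerequisites, while "gills" and its dependency stay unbuilt because nothing asks for them.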

So most people realize that height is correlated to diet [given a functioning gene set, your height is then determined by diet], strength is correlated to exercise [given a fairly broad spectrum that your genes support, exercise will update your strength], and visual accommodation ability to how much you need to accommodate.

But people somehow assume that something as complex and intertwined as an IQ score magically appears, stays constant throughout life, cannot be practiced, and isn't linked to some sort of exercise.

You’re talking about phenotypes. With height people with entirely the same diets will end up different heights based on their genes. I used to think that intelligence was earned due to exercise and that others just needed to try harder. I’ve since seen enough people try really hard and fail that I no longer believe that. As with strength, some mutations give some people a big advantage. I don’t think it’s an assumption, there is a lot of evidence. Even the low studies have IQ at 50% DNA and that’s without the greater understandings we have today, I would put it at 80%. Plus likelihood of IQ supporting genes would be much much higher at the extremes.

On the one hand, it is certainly very suggestive when you see that children's phenotypes very closely match those of their parents (though you do have to keep an eye out for confounders such as culture or environment even then).

On the other hand, if you try to actually predict things like height or IQ directly from some random genetic sequence someone puts in front of you ... well ... that turns out to be rather tricky.

Have you read any papers that give it a try?

I think most anyone should have a CYP2D6 test if they are taking, or considering taking, medication; it affects so many drugs out there. Yet here in the UK, the GPs I have encountered are not familiar with gene<>drug interactions, and I had to request it after working it out myself. It's a complex area, but if you happen to be a poor or ultra metaboliser it's a good thing to know. https://www.ncbi.nlm.nih.gov/books/NBK99699/

100%, it’s cheap enough that everyone should do it. The TNXB mutation has a ton of weird drug interactions, anesthesia resistance being just one.

Have you got a reference for that? I did search but could do with some help!

Does anyone remember the days when some geneticists were saying that 98% of DNA is "junk" DNA? When I heard it I knew it couldn't be true but it's probably going to take another 50 years to figure out how much really does get used.

I can't help but suspect that a lot of the genome is a part of the boot sequence that helps you go from one cell up to all the differentiated organs and tissues and systems.

FWIW the study focuses on the coding region of the human genome, i.e., the other 2%.

It is also important to point out that the fraction of non-coding DNA in a genome depends on the organism and is not correlated to complexity. There are multicellular organisms with less than 5% of it as well as unicellular organisms with amounts of DNA orders of magnitude higher than humans.

The marbled lungfish has the largest recorded genome of any vertebrate. One haploid copy of this fish's genome is composed of a whopping 132.8 billion base pairs, while one copy of a human haploid genome has only about 3.1 billion.


DNA developers these days just use Electron and don’t care about efficiency. That lungfish is ripe for a refactoring.

Bet I could implement a new lungfish with only a few million base pairs in a weekend.

640 kbp should be enough for anybody.

Is electron some sort of synthetic genome compiler?

To be specific, it is a JavaScript-based desktop application platform. Each finished app bundles its own copy of Chromium, which affects the size and arguably adds quite a lot of overhead. It also makes developing cross-platform desktop applications much more accessible, thus making it easier for more developers to make slow, unoptimized applications. Overall it has become a meme for bulky, slow desktop apps, with Visual Studio Code being the notable exception.

It’s a (not well executed) joke about Electron’s memory consumption and how developers refuse to use anything better.

> not well executed

Aw! :(

For the record I like Electron a lot and am usually the one to defend it on HN, but it indeed was a comment on binary size, needless duplication and copy-paste culture ;)

I think the joke may be about binary size and not memory, which would make sense when talking about DNA.

Bio-electrics are starting to look at biology the way you describe.

Michael Levin and team at Tufts are instigating regenerative healing by “bootstrapping” the process through manipulation of electric fields surrounding cells.

Understanding the extent of network effects will be a big idea in the future. Proving our statistics are probable causes versus mathematical object identification and social debate over the effects.

We’re moving beyond mere taxonomy and catalog of reality into seriously weird science.

Tumor treating fields are a thing that apparently extend survival for folks with glioblastoma. Amazing.

Citation? Open minded, but this sounds like hogwash.

> I can't help but suspect that a lot of the genome is a part of the boot sequence that helps you go from one cell up to all the differentiated organs and tissues and systems.

Boot sequence. Amazing. I've always been interested in how DNA transcription looks like a Turing machine, with RNAP as the head and one DNA strand as the tape. Is there any research into that kind of computational analogy, or is it just a coincidence?

I mean, it could be, except that the tape in this case has 3D structure and can change shape (and thereby expression) depending on histone modification. So it's close to a good analogy, but in some ways DNA is more interesting and complex than a reel-to-reel tape.

Turing did actually make a significant contribution to biology, but unfortunately he died right around the time the structure of DNA was discovered. Can you imagine what might have been?


There are many similarities between information processing in biology and information processing in computing, but the analogies only stretch so far. It's worth reading the basic textbooks in this area to get an idea of what mainstream science currently thinks; speculating too far outside the mainstream is a guarantee you will never be successful.

That is not bad intuition. Another analogy is that coding DNA is like all the function calls - the parts of the code that change data. Non-coding DNA is like all the flow control, conditionals, constants, etc. They don’t directly operate on the data, but have a huge impact on how the program behaves given some input.

Had the exact thought when I read the headline.

What arrogance. "I don't understand what these genes do. Must be junk"

Well, it was the 70s. The "start" and "stop" codons were known. They could work out that transcription proteins would seek out those codons, produce a strip of mRNA of the DNA bases between those codons, zip it on over to the ribosome, and crank out a protein.

Then there's the rest of the genome. Vast stretches of DNA that don't have the signals needed to transcribe proteins. Why?

They didn't know. They had no idea. It would be decades before they even had a complete copy of the genome. It was years of grinding effort, of trying to work out the big picture by staring through a straw. DNA methylation, gene expression, histone stuff, the entire field of epigenetics-- non-coding DNA playing an active role in cellular operation without directly producing proteins-- was still in the future.

Hah, they would kill for a straw.

Like the old particle physics joke: trying to figure out how a mechanical watch works by pouring a hundred of them into a bucket, smashing them up with a hammer, sorting the fragments by size, and speculating how they fit together when intact.

Now they call it "non-coding DNA": https://en.wikipedia.org/wiki/Non-coding_DNA

That's not what your reference says. Briefly, "non-coding" means it doesn't code for proteins, while "junk" means it serves no function at all.

When the term "junk DNA" was first used it was used for everything that didn't map to proteins, because the other functions weren't yet known.

Actually no, that's not the intellectual level at which any science is performed, let alone population genetics and molecular genetics. Do you perhaps see the irony in making such a grotesquely arrogant suggestion yourself? People working on population genetics and molecular genetics, who came to the conclusion that much DNA has no phenotypic relevance, were doing so based on several decades of literature and 10-20+ years of their own education and scientific training. Would you care to give your own qualifications in this field?

> Would you care to give your own qualifications in this field?

I am a programmer, and I trust my ability to call out junk code. Except if I am reading code from someone like Carmack or Linus. In those cases, I am gonna assume whatever I don't understand is my fault. What hubris would it be for me to call Linus' code junk, even if I really can't make sense of it despite my best effort?

Same here. It's fine to say "we did our best to understand this and as far as we can tell, these genes are not utilized" to go like "yeah it's junk" is quite different. You're a mere mortal, and DNA has been the foundation of all life for millennia. You don't get to judge so easily.

Have you considered that there might be methods that you are not aware of for inferring whether a region of DNA has phenotypic consequences? There’s a huge literature on this. I can’t believe you’re so arrogant as to imagine that you can just intuit the contents of that literature in a few seconds thought before writing a comment on HN.

If a section of DNA has no phenotypic consequences then that means that when we look at a sample of genomes from a population, then the stochastic process underlying the evolution of that region of the genome features random genetic drift, but natural selection is only involved via statistical associations with nearby functional regions due to limited recombination. In contrast, non-junk regions of DNA have natural selection involved directly in the stochastic process underlying their evolution. That difference gives rise to a research program where we seek to infer whether or not a region is “junk” by developing statistical models of DNA sequence evolution and fitting them to data sets comprising samples of DNA sequences from multiple individuals in a population.
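A toy simulation can illustrate the difference between drift alone and drift plus direct selection. This is a minimal haploid Wright-Fisher sketch under assumed parameters (100 gene copies, a variant starting at 10 copies, selection coefficient 0.2). It is not one of the inference models used in the literature, only an illustration of why the two regimes leave different statistical footprints.

```python
import random


def next_generation(count, n_copies, s, rng):
    """One Wright-Fisher generation for a haploid population of n_copies.

    count is the number of copies of the variant; s is its selection
    coefficient (s = 0 means pure drift).
    """
    p = count / n_copies
    # Selection shifts the sampling probability; with s = 0 it is just p.
    p_sel = p * (1 + s) / (1 + p * s)
    return sum(rng.random() < p_sel for _ in range(n_copies))


def fixes(start, n_copies, s, rng):
    """Run until the variant is lost (0 copies) or fixed (n_copies)."""
    count = start
    while 0 < count < n_copies:
        count = next_generation(count, n_copies, s, rng)
    return count == n_copies


def fixation_rate(start, n_copies, s, replicates, seed):
    """Fraction of replicate populations in which the variant fixes."""
    rng = random.Random(seed)
    fixed = sum(fixes(start, n_copies, s, rng) for _ in range(replicates))
    return fixed / replicates


# Assumed toy parameters, not taken from any study.
neutral = fixation_rate(start=10, n_copies=100, s=0.0, replicates=200, seed=1)
selected = fixation_rate(start=10, n_copies=100, s=0.2, replicates=200, seed=1)
print(f"neutral: {neutral:.2f}  selected: {selected:.2f}")
```

Under pure drift the variant should fix in roughly a fraction of runs equal to its starting frequency, while a favoured variant fixes far more often. Real methods fit much richer models to sequence samples, but the contrast they exploit is of this kind.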

That’s just one example of how the question of junk vs. non-junk is studied. There’s also comparative genomics which compares genomes of related species, taking the phylogeny into account in the analysis.

You’re not expected to know any of this; it’s evidently not your field. What is expected however, as a reader of an intelligent website such as this, is for you to understand that there might actually be an entire research field lying behind a question, and not to think that everything is so simple that you can understand it without any study at all on your part.

> It's fine to say "we did our best to understand this and as far as we can tell, these genes are not utilized" to go like "yeah it's junk" is quite different.

That’s not how junk DNA was defined. Junk DNA regions have no coding regions. No genes. There’s no easily recognized feature or pattern that would allow you to derive or even propose a function, despite decades of advances in the area. In this particular example the analogy with computer code won’t take you far.

peak Hacker News.

Not exactly; many of those regions that we knew were junk weren't genes, but long regions of repetition of archaic junk that was inserted by viruses and replicated unnecessarily tens of thousands to millions of years ago. Generally, most scientists don't think that (for example) Alu sequences really have strong function that you could measure in an experiment.

For example junk DNA was described before we really understood that RNA genes were common and so large regions of the genome that are RNA genes were just treated as totally non-functional.

Sean Eddy proposed an interesting experiment called the Random Genome to address these questions but I don't think anybody is seriously considering running the experiment.

That hypothesis is a little dated, but it should be noted that the exome (the thing being discussed in this article) is 1% of the overall human genome.

Twitter thread from one of the corresponding authors: https://twitter.com/gabecasis/status/1450289834543697925

Great explainer of the significance of this study by Goncalo Abecasis. Basically human gene knock-outs, courtesy of mother nature.

This study yields tons of gene function leads. Just awesome.

Edit: quote

The data is so rich that we had no more than a single paragraph to summarize association signals relevant to cancer and brain function. Ouch. A tweet won’t do them justice, but @uk_biobank will be sharing all the data with researchers for follow-up.

Not too related, but I got my genome sequenced for fun the other day (< $500 for 30x whole genome), and after a few scrolls through the results (200+ interpretations of my genome in the context of studies like the one linked here), I'm already losing interest.

Ok, so I have a high genetic predisposition for thinness, walking fast [0], and bipolar disorder? So now what? I never noticed except for the thinness. They even scared me with 'you're genetically in the 99th percentile for critical covid disease progression,' but reading the results of the paper in question, it turned out the heritability of the effect under study was only 6%. Thanks for letting me know.

These studies are interesting from a general genomics perspective, but not so much yet from a personal one (with the exception of a few traits that are determined by one or a few well-understood genetic variants. Hopefully this space will expand over the next few decades).

[0] https://www.nature.com/articles/s42003-020-01357-7

This is precisely right, and for most people it's unlikely that their genome will be particularly useful any time soon.

For an individual, the questions are usually much more focused: for example, if I have a baby with my partner, what are the chances that our child will have a recessive genetic disorder? This is answered right now with a carrier screen, where the only genes tested are the genes with variants that are known to cause disease. Much cheaper, much more highly curated, and put together and run entirely by experts in the field, rather than general technologists that are just running samples through a big machine.
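The arithmetic behind a carrier screen result can be sketched with a Punnett square. This assumes the simplest textbook case: a single autosomal gene, a fully recessive disease allele written as 'a', and full penetrance. Real counselling accounts for much more than this.

```python
from fractions import Fraction
from itertools import product


def child_genotype_odds(parent_a, parent_b):
    """Enumerate the Punnett square for one gene with two alleles per parent.

    'A' is the working allele, 'a' the recessive disease allele.
    Each of the four allele pairings is equally likely.
    """
    outcomes = {}
    for allele_a, allele_b in product(parent_a, parent_b):
        genotype = "".join(sorted(allele_a + allele_b))
        outcomes[genotype] = outcomes.get(genotype, 0) + Fraction(1, 4)
    return outcomes


both_carriers = child_genotype_odds("Aa", "Aa")
one_carrier = child_genotype_odds("Aa", "AA")
print(both_carriers)
print(one_carrier.get("aa", Fraction(0)))
```

When both parents are carriers, the affected genotype "aa" comes out at 1 in 4; when only one parent carries the variant, it does not appear at all under these assumptions.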

Interpretation of the genome will take millions upon millions more individual genomes sequenced, but those individuals will only rarely, if ever, see personal benefit.

You just had the "hey, most of personalized medicine is actually pseudoscience" moment. Welcome to the club.

When I had my whole genome sequenced and analyzed by professionals, they were astonished because I didn't have a single risk factor in any known gene. They thought I'd live forever.

In most cases, your sequence info will be useless to you, but in some situations it can have a huge impact, such as predicting how you might respond to epidural anesthesia if you are a woman in labor.

Also, things like personalized immunity therapies for cancers and such are not far off. Unless they completely fail - which is always a possibility in therapeutic development.

I think DNA data should be considered public data. 99% of our DNA is shared, and the DNA degrees of separation between you and me is ~3. Practically, we are all already identifiable from the small sample that exists in genealogy companies, and it's been repeatedly done by law enforcement. I think it's bonkers that, after 10 years, companies like 23andme are not allowed to give us health reports because ... what reasons really? Especially in Europe it makes no sense to have such draconian laws about something that's important to everyone's health. Full sequencing costs like $200 today and I'm sure millions would be willing to give their data to a public database that would take the field meaningfully forward. Instead what we get is we allow these companies to sell data which should be public domain. The US has its problems with their broken social security - here's a suggestion: fix it. Europe could use its public healthcare advantage to get ahead in genomic research while simultaneously improving the quality of its health systems. Instead we are stuck with outdated privacy laws (yet somehow making tax information public is OK). China could be using their own advantage and databases to get ahead in genomics while we're still playing hide and seek.

From abstract: “… We discover several risk-increasing associations with traits related to liver disease, eye disease and cancer, among others, as well as novel risk-lowering associations for hypertension (SLC9A3R2), diabetes (MAP3K15, FAM234A) and asthma (SLC27A3)…”

Wow, what a time to be alive! For those who are not into bio, this type of information is invaluable for untangling the molecular mechanisms of disease and can provide important clues about where we should intervene with a small molecule or antibody drug to disrupt disease processes.

To give a coding analogy, you can think of this as trying to debug a program written in a language we can’t read. We have no idea what lines are doing what, but there are many different versions / forks of this program floating around. By looking at all the diffs and comparing the outputs, we can start to figure out what some of the code is doing, and which lines we might want to comment out to correct certain types of bugs.
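The diff analogy can be sketched in a few lines. The data below are made up for illustration: each "fork" is the set of lines (genes) it still has, plus an observed output. The sketch only compares raw rates; an actual association study uses proper statistical tests and far more samples.

```python
# Each fork: (set of lines still present, observed output). Made-up data.
forks = [
    ({"g1", "g2", "g3"}, "bug"),
    ({"g1", "g2", "g3"}, "bug"),
    ({"g1", "g3"}, "ok"),    # g2 knocked out
    ({"g1", "g3"}, "ok"),    # g2 knocked out
    ({"g2", "g3"}, "bug"),   # g1 knocked out
    ({"g1", "g2"}, "bug"),   # g3 knocked out
]


def bug_rate(group):
    """Fraction of forks in the group showing the bug (None if empty)."""
    return sum(out == "bug" for _, out in group) / len(group) if group else None


def knockout_effects(forks):
    """For each line, compare the bug rate with the line present vs absent."""
    all_lines = set().union(*(lines for lines, _ in forks))
    effects = {}
    for line in sorted(all_lines):
        with_line = [f for f in forks if line in f[0]]
        without_line = [f for f in forks if line not in f[0]]
        effects[line] = (bug_rate(with_line), bug_rate(without_line))
    return effects


effects = knockout_effects(forks)
for line, (present, absent) in effects.items():
    print(line, present, absent)
```

In this toy data the bug disappears only in forks missing g2, which is the kind of lead that suggests "commenting out" that line, i.e. a candidate drug target.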

I think I missed the novel technique they applied.

Surely others try to match particular genes to diseases. Surely that’s one of the pillars of the entire field.

The novelty lies in the size of the data set. ~450,000 exomes is a lot.

This is neat and will lead to a lot of interesting discoveries. Golden age for genomics is ahead.

The golden age for genomics has been “ahead” since 1990.

It’s still ahead. We just don’t know which decade will yield the eureka.

Voyages of discovery are like that.

For people with health problems, identifying the genetic stuff contributing to their health issues can be game changing.

An awful lot of resources were spent to find very little of deep value. Genomics is just marginal gains for each huge new amount of capital invested.

Why the pessimism? Lots of really interesting leads are coming out of this study.

Well, I worked in the field for several decades and got really tired of the constant overhyping of the results. These results really aren't useful (they're not predictive, or useful for genetic screening) and the cost is really extraordinary (hundreds of millions of dollars). There are other areas of biology that provide much more value per dollar spent.

A reminder that these sorts of genome-wide studies are not particularly reliable due to the problem of multiple comparisons: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3270946/

A reminder that these sorts of genome-wide studies haven't used 0.05 but genome-wide significance (5e-8 usually, and on the order of 1e-11 in OP) for a decade now for precisely that reason, and have since replicated just fine, thank you very much.
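For anyone wondering where the conventional 5e-8 genome-wide threshold comes from, it is essentially a Bonferroni correction. A rough sketch, assuming about one million independent tests (the usual back-of-envelope figure; the real number depends on the variant set and population):

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate <= alpha."""
    return alpha / n_tests


def expected_false_positives(per_test_cutoff, n_tests):
    """Expected number of hits if every null hypothesis were true."""
    return per_test_cutoff * n_tests


# Assumption: roughly one million independent common-variant tests.
N_TESTS = 1_000_000

naive = expected_false_positives(0.05, N_TESTS)
cutoff = bonferroni_threshold(0.05, N_TESTS)
corrected = expected_false_positives(cutoff, N_TESTS)
print(naive, cutoff, corrected)
```

At a naive 0.05 cutoff you would expect tens of thousands of false hits across a million tests; at the corrected cutoff the expected number across the whole scan drops back to 0.05. Studies that test more variants, such as rare coding variants across many traits, need a correspondingly stricter threshold.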
