All that glitters is not gold…

The start of the genomic era is widely taken to be April 14, 2003, when the sequencing of the human genome was declared complete. In the two decades since this incredible achievement, the general public has been constantly exposed to news and perspective articles about the wonders of genomics. Hardly a week goes by without an article hailing the discovery of “the cancer gene” or “the obesity gene”, describing a GWAS (Genome-Wide Association Study) in which a particular genetic variation is correlated with a specific condition. These articles often end with a vague statement about how the discovery could transform the diagnosis and management of that condition, heralding the golden era of personalized/precision healthcare. However, the promised golden era has not yet materialized. Frustratingly, even identifying a precise genetic variant is not enough to guarantee a treatment or disease management plan. For example, drugs targeting the V600E mutation in the BRAF gene are effective against melanoma but show little activity on their own in colorectal tumors that carry exactly the same mutation. There is growing consensus that attempting to treat, or even diagnose, diseases based on genetic information alone is likely to be overly simplistic.

Don’t get me wrong: I am in no way asserting that genomics isn’t useful. Genomic tools and the associated insights have undeniably improved our fundamental understanding of biology and helped develop several life-saving clinical interventions and diagnostic assays. More recently, genetic tools have also been instrumental in understanding COVID-19 and in developing vaccines against it. However, IMHO, the actionable insights garnered from genomic information do not, in most cases, match the hype that accompanies its discovery, especially as we get closer to the hospital bed or the clinic.
This is fundamentally because genes are merely the “blueprints”: they represent the way things “should be”, not the way “they are”. Making definitive statements about the health of an individual or population from genetic information alone is in many ways analogous to predicting the quality of a building merely by looking at its plans, without paying equal (or more) attention to the quality of the construction materials or the builders. Within an individual, these “blueprints” first need to be transcribed into RNA and then translated into various protein molecules (the real “actors” of the cell) in order to have any effect on the state of that individual.

Within all living systems, proteins interact with each other as well as with RNA, DNA and small molecules to directly shape the observable reality. So, for all intents and purposes, the proteome is the workhorse of biology, even though the genome captures most of the popular headlines. In clinical settings, proteins make up the majority of biomarkers used in disease diagnosis, risk assessment, and evaluation of treatment effectiveness. In fact, the majority of blood-based molecular tests run in clinical laboratories measure proteins. For example, troponin I is used to diagnose myocardial infarction (heart attack), a relatively common test performed whenever an individual arrives at an emergency room with symptoms of a heart attack. PSA and CA-125 are two other well-accepted blood serum protein biomarkers, used for the diagnosis and treatment management of prostate and ovarian cancer respectively. Diagnostic methods also rely heavily on protein detection in tissues and in non-blood fluids. For example, protein identification in tissue biopsies using immunohistochemistry (IHC) is heavily used to support the diagnosis of cancer and muscular disorders. Cell-based immunoassays measured by flow cytometry are also used regularly for diagnosing and staging hematological neoplasms and HIV infection, and for monitoring chemotherapy.

Outside the clinical setting, proteins are most relevant to the pharmaceutical industry, because developing drugs primarily involves targeting proteins. This application needs even more sophisticated protein analysis tools than clinical settings do, because finding low-toxicity, high-efficacy drugs is much more demanding than biomarker discovery.
Protein analysis is used throughout the drug discovery pipeline for target identification, building a mechanistic understanding of signaling pathways, testing drug efficacy and toxicity, patient stratification, drug repositioning, and identifying diagnostic biomarkers.

While these existing applications clearly indicate the importance of protein analysis in the clinical and research domains, I believe this is merely the beginning of a whole new approach to biological research and healthcare. The reason is that current protein measurement techniques are arduous and usually force end users to choose between sensitivity and multiplexing. Over the last decade, however, various techniques have matured independently, and some have even been commercialized, pushing the limits of both sensitivity and multiplexing. On the sensitivity front, Quanterix’s Simoa platform can demonstrably achieve attomolar sensitivity, which is at least 3 orders of magnitude better than its closest rival. The enhanced sensitivity of Simoa also enables the discovery and validation of new biomarkers, as well as the monitoring of biomarker concentrations well below the typical detection limit. Improvements in sensitivity are also being pursued by ProteinSimple, IsoPlexis and MSD, though their capabilities are not at the same level as Quanterix’s.

On the multiplexing front, SomaLogic and Olink offer products that quantify ~7,000 and ~1,500 proteins, respectively, from a few microliters of sample. This demonstrates an entirely new approach to proteomics, one that relies not on single biomarkers but on identifying patterns in protein expression that are associated with specific phenotypes. A few recent peer-reviewed proof-of-principle studies using these highly multiplexed approaches show how the assay, combined with pattern recognition algorithms, allows accurate prediction of complex biological phenotypes such as VO2 max, body fat percentage and cardiovascular function, among others. It is worth noting that none of these phenotypes could be associated with a single unique biomarker, since they are system-level properties.
These results are also particularly interesting because the general approach bears a striking resemblance to studies from the early days of genotyping-enabled GWAS. I strongly believe that such studies are merely the tip of the iceberg, and that combining this approach with genomics, as the upcoming Olink studies are likely to do, will further increase the available insights.

In the next few posts (over the next week or two), I will go into (a) the breadth and depth of the proteome as well as the complexity introduced by the various proteoforms, (b) the technical details of the prevailing approaches to proteome analysis, (c) a couple of unique use cases of next-gen proteomics beyond direct healthcare or biology and (d) the vision of hypothesis-free healthcare.

All that is gold does not glitter,
Not all those who wander are lost;
The old that is strong does not wither,
Deep roots are not reached by the frost.

From the ashes a fire shall be woken,
A light from the shadows shall spring;
Renewed shall be blade that was broken,
The crownless again shall be king.

— J.R.R. Tolkien

Disclaimer: AG is part of the junior faculty at MIT. He is also a co-founder of Palamedrix and is involved in a few other stealth startups, all of which are actively working in the field of next-gen proteomics.

Note 1: I have provided links to some academic papers in this post that might not be accessible to certain readers due to a paywall. If you run into this issue, I would like to point you to Sci-Hub, a website that can help you circumvent the paywall. Please do not take this as me condoning or supporting the approach taken by that website; however, I strongly believe that all scientific research that is funded by the public (directly or indirectly) should be open-access.

Note 2: I have avoided talking about how protein detection is used within an academic setting because that use case is too fragmented to be described concisely.
