Cracking the code behind the code: how a unique collaboration between Oncode Investigators reveals the rules of gene regulation

Some scientific breakthroughs don’t belong to a single lab, discipline, or idea. They only become possible when collaboration is the strategy. Oncode Institute brought together seven Oncode Investigators spanning biology, AI, and clinical research to create PARM, a new method that breaks the limits of traditional genomic computing. With it, they revealed the language of the genetic instructions that tell genes when to turn on and off.

2026-02-04

Some scientific questions are so complex that no single discipline can answer them alone. Understanding how cells know when to switch genes on or off is one of those questions. Today, the scientists report in Nature on the relentless back-and-forth between lab experiments and computation that enabled them to build a lightweight AI model. Scientists around the world can now use this tool to read these genetic instructions, creating leads for new cancer diagnostics, patient stratification, and future therapies.

For decades, scientists have known how DNA in our genes encodes proteins. But a deeper mystery remained: how the same DNA behaves differently in different cell types. How does a cell decide which genes to activate, and which ones to keep silent?

“The classical genetic code explains how genes in our DNA encode proteins,” explains Bas van Steensel from the Netherlands Cancer Institute (NKI). “But for most genes, we honestly didn’t understand how they are regulated. We know that the DNA between our genes contains regulatory elements such as promoters. But the language of this control system, which decides whether a gene turns on or off, in which cell, and how strongly, was largely unknown.”

Within the Oncode Institute, a bold idea emerged: what if this code could be learned, not by studying one gene at a time, but by combining millions of measurements with artificial intelligence? This ambition gave rise to the PERICODE project, a mission to decode the genome’s ‘operating system’ and answer the question: why do some changes in non-coding DNA, such as promoters, have devastating consequences, while others have no effect at all?

Teaching the AI model 

PERICODE brought together seven Oncode Institute Investigators who shared the same mission but spanned genomics, AI, biochemistry, and clinical research.

In Bas van Steensel’s lab at the NKI, researchers developed a technology that made it possible to measure gene regulation at an unprecedented scale. Millions of specially designed measurements captured how short DNA sequences influence gene activity. But data alone is not insight. That is where Jeroen de Ridder and his team entered the picture. Because this large volume of data was generated specifically to study gene regulation, it made it possible to train AI models that truly captured the biological rules underlying gene activation.

“Most AI models learn from whatever data happens to exist,” de Ridder explains. “Here, the measurements and the AI were designed together. This allowed us to make super-efficient models for specific cell types that could be applied at a scale previously unthinkable.”

The moment the code became readable

The PARM model enabled the team to study how gene regulation differs between cell types and how it changes when cells are exposed to stimuli such as drug treatments. Moreover, the model revealed, in extreme detail, the architecture of the ‘on and off buttons’ of each gene. Crucially, the team did not stop at prediction. Every model output was rigorously tested in the lab to confirm that the predictions were indeed correct.

This relentless back-and-forth between experiment and computation revealed something remarkable: gene regulation is far more predictable than previously believed. “We’ve known the letters of the genetic alphabet for decades,” says Van Steensel. “What we lacked was the grammar, or language, of the gene control system. Now, we can understand the system better. PARM allowed us to uncover those rules at scale, so we can now understand, and even predict, how regulatory DNA controls gene activity.”

Why this matters: especially for cancer

Most mutations in cancer genomes do not alter proteins at all. Instead, they disrupt regulatory DNA: the switches that control gene activity. Until now, interpreting such mutations has been extremely difficult. Despite notable progress in the field, existing AI models were either too computationally heavy to apply to the vast number of mutations that exist, or too generic to adequately capture differences between cell types. PARM changes that. It allows researchers to predict the functional impact of regulatory mutations in specific cell types and under specific conditions, opening new paths for cancer diagnostics, patient stratification, and future therapies.

From fundamental discovery to real-world impact

Because the regulatory code uncovered in PERICODE is general and predictive, it opens the door to multiple applications, ranging from new diagnostic tests to discovery of new drug targets.

Opportunities to patent and valorize the underlying technology are being actively explored. The goal is not only scientific impact, but translation: creating pathways toward diagnostics, therapeutics, and potentially new ventures emerging from this discovery.

Key facts at a glance

  • Project: PERICODE (Oncode Institute initiative)
  • Collaboration: 7 Oncode labs:
    • Prof. Bas van Steensel, NKI
    • Prof. Jeroen de Ridder, UMC Utrecht
    • Prof. Emile Voest, NKI
    • Prof. Michiel Vermeulen, NKI
    • Prof. Lude Franke, UMC Groningen
    • Dr. Sarah Derks, UMC Amsterdam
    • Prof. Wilbert Zwart, NKI