Background

Bioproduction of vaccines and biotherapeutics plays an essential role in disease treatment and prevention; however, achieving robust, predictable, and sustainable expression is challenging, especially in mammalian cell lines. A recent work (Zrimec et al., 2022) addresses this problem with a generative adversarial network (GAN) that modifies the regulatory elements of a DNA sequence to achieve a desired transcription outcome.

The success of diffusion models in image generation has opened up new opportunities, inspiring researchers to apply them to the generation of biological entities, such as RFdiffusion for protein design. In this project, you will explore whether diffusion models can generate stable and diverse DNA sequences, with a focus on one type of regulatory element in DNA, the promoter.
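As a rough illustration of how a DNA sequence could be fed to a continuous diffusion model, the sketch below one-hot encodes a sequence and applies the closed-form forward (noising) step of a standard Gaussian diffusion process. This is one possible formulation, not something the project prescribes. The function names, the toy sequence, and the linear noise schedule (1e-4 to 0.02 over 1000 steps) are all illustrative assumptions, and a real implementation would use an array library rather than plain lists.

```python
import math
import random

ALPHABET = "ACGT"  # assumed channel order for the one-hot encoding


def one_hot(seq):
    """Encode a DNA string as a list of 4-element one-hot rows (A, C, G, T)."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET]
            for base in seq.upper()]


def decode(matrix):
    """Map each row back to the base with the largest value (argmax)."""
    return "".join(ALPHABET[max(range(4), key=row.__getitem__)] for row in matrix)


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    alpha_bar, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def forward_noise(x0, alpha_bar_t, rng):
    """Sample x_t from N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I)."""
    scale = math.sqrt(alpha_bar_t)
    sigma = math.sqrt(1.0 - alpha_bar_t)
    return [[scale * v + sigma * rng.gauss(0.0, 1.0) for v in row] for row in x0]


rng = random.Random(0)
promoter = "TATAAAAGGC"  # toy sequence for illustration, not real data
x0 = one_hot(promoter)
alpha_bar = linear_alpha_bar(1000)

x_early = forward_noise(x0, alpha_bar[0], rng)   # almost no noise
x_late = forward_noise(x0, alpha_bar[-1], rng)   # close to pure Gaussian noise

print(decode(x_early))  # still decodes to the original sequence
print(decode(x_late))   # essentially a random sequence
```

A generative model would then be trained to reverse this noising process step by step; taking the argmax over the four channels at the end is one simple way to turn the continuous output back into a discrete sequence.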

Broadly applicable and accurate protein design by integrating structure prediction networks and diffusion generative models

Controlling gene expression with deep generative design of regulatory DNA - Nature Communications

Effective gene expression prediction from sequence by integrating long-range interactions - Nature Methods

Automated model-predictive design of synthetic promoters to control transcriptional profiles in bacteria - Nature Communications

Project Aim

The first stage of the project focuses on using a diffusion model to generate diverse DNA sequences for a fixed transcription profile.
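Since diversity is part of the aim, it helps to have a simple way to quantify it. The sketch below computes the mean pairwise normalized Hamming distance over a set of equal-length sequences. This metric is only one possible choice and is an assumption here, not a measure specified by the project; the function names and toy sequences are hypothetical.

```python
from itertools import combinations


def hamming_fraction(a, b):
    """Fraction of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_pairwise_diversity(seqs):
    """Average normalized Hamming distance over all unordered pairs."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(hamming_fraction(a, b) for a, b in pairs) / len(pairs)


generated = ["ACGTAC", "ACGTAA", "TCGTAC"]  # toy examples, not model output
print(mean_pairwise_diversity(generated))
```

A value near 0 means the generated sequences are almost identical, while higher values indicate a more varied set. Hamming distance ignores insertions and deletions, so an alignment-based or k-mer-based measure may be more appropriate if generated sequences vary in length.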

GitHub - JanZrimec/ExpressionGAN

GitHub - calico/basenji: Sequential regulatory activity predictions with deep convolutional neural networks.

EPD The Eukaryotic Promoter Database

Extension