Hello everyone! Long time no see! Today I am going to share my reflections on an article published three months ago by Rob Phillips's group, in which they revised our understanding of how the binding energy between RNA polymerases and promoters affects the level of gene expression.
Figure source: Figure 4 of Einav, T. and Phillips, R. (2019). How the avidity of polymerase binding to the –35/–10 promoter sites affects gene expression. Proceedings of the National Academy of Sciences, 116(27), pp.13340-13345.
Bioengineers always seek to manipulate genes precisely, just as engineers tune a machine with knobs and buttons. To attain this goal, we need to predict gene expression levels from DNA sequences. However, this effort is hampered by our inability to fully interpret promoter sequences. A promoter is a DNA sequence where RNA polymerases bind and initiate gene transcription; it is usually composed of a background element, an upstream element, a -35 site, a -10 site, and a spacer between the -35 and the -10 sites. Traditionally, we have believed that a strong binding energy between promoters and RNA polymerases prevents nonselective binding and thus produces more mRNAs. To estimate the binding energy, we use the energy matrix model, which assumes that the total energy can be calculated by independently adding the contribution from each element. Nonetheless, Guillaume Urtecho and his colleagues recently published a set of gene expression data from 12,288 artificial promoters that cannot be explained by the energy matrix model. In response to this contradiction, Tal Einav and Rob Phillips proposed a new model in “How the avidity of polymerase binding to the -35/-10 promoter sites affects gene expression.” Although the authors failed to clearly define their cutoff value for determining the significance of interactions, which limits the practicality of the new method, their theory revolutionized our assumptions about the interactions between promoters and RNA polymerases.
Einav and Phillips first analyzed the possible binding patterns, and their corresponding energies, among an RNA polymerase, the -35 site, and the -10 site, the two strongest binding elements within a promoter. An RNA polymerase can bind to nothing, to the -35 site alone, to the -10 site alone, or to both the -35 and -10 sites. According to the energy matrix model, if an RNA polymerase binds to both sites, the total energy equals the sum of the binding energies of the two sites. The authors confirmed that the energy matrix model fails to explain Urtecho’s results. They therefore proposed a multivalent model, which hypothesizes an avidity effect between the -35 and the -10 sites: binding to one site facilitates binding to the other. To quantify this effect, they added an interaction energy term to the total energy when an RNA polymerase binds to both sites. With this model, they estimated the total binding energies of different promoters, reconciled the conflict between theory and Urtecho’s results, and discovered that multivalent binding makes promoters more resistant to mutation. Furthermore, they noticed that the promoters with the strongest binding energies did not produce the most mRNAs: when the binding energy exceeds a certain value, the gene expression level paradoxically decreases. They reasoned that this paradox arises because RNA polymerases must dissociate from the promoter to initiate transcription. If RNA polymerases bind too strongly, the promoter becomes a trap that stops them from reading the DNA sequence. They then developed an equation for this effect and predicted that the energy required to initiate transcription equals $6.2 k_BT$.
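To make the four binding states concrete, here is a minimal sketch of how one might weigh them with Boltzmann factors. This is my own toy illustration, not the authors' equation: the function name `state_probabilities`, the parameter names, and the example energies are all assumptions, and I ignore concentration prefactors and the transcription initiation step entirely. Energies are in units of $k_BT$, and setting the interaction energy to zero gives the purely additive picture.

```python
import math

def state_probabilities(e35, e10, e_int=0.0):
    """Boltzmann probabilities of the four RNA polymerase binding states.

    All energies are in units of k_B T; more negative means tighter binding.
    e_int is a hypothetical interaction (avidity) energy that applies only
    when both sites are bound. e_int = 0 recovers the purely additive case.
    """
    weights = {
        "unbound": 1.0,
        "-35 only": math.exp(-e35),
        "-10 only": math.exp(-e10),
        "both": math.exp(-(e35 + e10 + e_int)),
    }
    partition = sum(weights.values())
    return {state: w / partition for state, w in weights.items()}

# Illustrative energies only (made up, not fitted values from the paper).
additive = state_probabilities(e35=-1.0, e10=-1.5)
multivalent = state_probabilities(e35=-1.0, e10=-1.5, e_int=-2.0)

for state in additive:
    print(f"{state:>9}: additive {additive[state]:.3f}, "
          f"multivalent {multivalent[state]:.3f}")
```

With a favorable (negative) interaction energy, the doubly bound state takes up a larger share of the probability than the additive picture would give for the same site energies, which is the avidity effect in miniature.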
In this study, by amending the energy matrix model, Einav and Phillips successfully explained the gene expression data in terms of measurable physical quantities. The concept of the energy matrix model was first proposed in 1981; it not only explained existing data but also helped us discover unknown gene regulatory sites. The failure of the energy matrix model is therefore quite shocking. In Urtecho’s study, the authors resorted to machine learning after the energy matrix model failed. However, machine learning alone cannot generate measurable physical quantities such as binding energy. The timely success of the multivalent model thus revived the practice of using physical principles to understand gene regulation.
Einav and Phillips also challenged the enduring assumption that stronger binding energy translates to more gene expression. This finding is revolutionary because it violates the assumptions of almost every biophysical model of gene regulation, forcing us to reexamine the validity of all previous models. In my opinion, one reason we never noticed this before is probably that the energy matrix model underestimates the binding energy, concealing the trapping effect of strong promoters on RNA polymerases. If we reanalyze previous experiments with the multivalent model, we might discover how ubiquitous this effect is. In addition, the authors predicted the transcription initiation energy to be $6.2 k_BT$. It will be very illuminating if future studies give a similar prediction, pointing to a common mechanism.
However, a small blemish in this research is that the authors fail to provide a persuasive explanation for the cutoff they use to recognize interactions. (The authors tried to explain their reasoning in the supplementary information, but it seems a little subjective to me.) To detect interactions among different promoter elements, they proposed a mathematical formula, which can also be applied to other regulatory mechanisms. Their formula produces a correlation coefficient between the predicted and measured gene expression levels. In this framework, a poor correlation suggests the existence of interactions. Although the correlation coefficients for several pairs of promoter elements approach 0, which definitely implies poor correlation, they regarded only a negative correlation coefficient as significant. Their work would have been more instructive if they had built a more systematic method to determine when interactions among promoter elements are non-negligible.
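To illustrate the general idea of using a correlation coefficient to flag interactions, here is a rough sketch. It is not the authors' formula: I simply assume that, if two promoter elements act independently, the log expression of each combination is a sum of one contribution per element, predict each combination from row and column averages, and compute the Pearson correlation between predicted and measured values. The function names and both toy tables are hypothetical and made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def additive_prediction(table):
    """Predict each entry as row mean + column mean - grand mean."""
    n_rows, n_cols = len(table), len(table[0])
    row_means = [sum(row) / n_cols for row in table]
    col_means = [sum(table[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    grand = sum(row_means) / n_rows
    return [[row_means[i] + col_means[j] - grand for j in range(n_cols)]
            for i in range(n_rows)]

def additivity_correlation(table):
    """Correlation between measured values and the additive prediction."""
    predicted = additive_prediction(table)
    measured_flat = [v for row in table for v in row]
    predicted_flat = [v for row in predicted for v in row]
    return pearson(measured_flat, predicted_flat)

# Toy log-expression tables (made-up numbers, not data from either paper).
# Rows: hypothetical -35 variants; columns: hypothetical -10 variants.
a = [0.0, 1.0, 2.0]
b = [0.0, 0.5, 3.0]
independent = [[ai + bj for bj in b] for ai in a]
coupled = [[ai + bj + (2.0 if i == j else 0.0) for j, bj in enumerate(b)]
           for i, ai in enumerate(a)]

print("independent elements:", additivity_correlation(independent))
print("coupled elements:    ", additivity_correlation(coupled))
```

Even in this toy version the question I raised above remains: the correlation drops when the elements are coupled, but nothing in the calculation itself tells us how low it must fall before we call the interaction significant.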
In conclusion, although Einav and Phillips fail to give clear instructions on when to use the multivalent model and how to interpret the correlation coefficient generated by their formula, they provided a new paradigm for modeling the interactions between promoters and RNA polymerases and revised our understanding of how the binding energy of promoters affects gene expression. Moving forward, it will be exciting to see whether the multivalent model can help dissect other gene regulatory systems. This article will definitely spur related research and further unveil the mechanisms behind the regulation of gene transcription.
Urtecho, G., Tripp, A., Insigne, K., Kim, H. and Kosuri, S. (2018). Systematic Dissection of Sequence Elements Controlling σ70 Promoters Using a Genomically Encoded Multiplexed Reporter Assay in Escherichia coli. Biochemistry, 58(11), pp.1539-1551.
Jencks, W. (1981). On the attribution and additivity of binding energies. Proceedings of the National Academy of Sciences, 78(7), pp.4046-4050.
Belliveau, N., Barnes, S., Ireland, W., Beeler, S., Kinney, J. and Phillips, R. (2018). A Systematic and Scalable Approach for Dissecting the Molecular Mechanisms of Transcriptional Regulation in Bacteria. Biophysical Journal, 114(3), p.151a.