keju: powerful and accurate inference in Massively Parallel Reporter Assays
Published in bioRxiv, 2026
Massively Parallel Reporter Assays (MPRAs) interrogate the regulatory function of thousands of designed genetic elements in parallel through linked DNA and RNA readouts, using an engineered construct and an attached minimal reporter. Because MPRA experimental designs are complex, several distinct sources of uncertainty complicate inference. We show that previous methods do not account for substantial differences in uncertainty between the DNA and RNA counts and between batches. Accordingly, we present keju, a hierarchical statistical model that estimates candidate transcription rates, differential activity between conditions, and the effects of promoter composition from MPRA data. To maximize statistical power and improve false positive rate control, keju conditions on the DNA counts to model batch-specific and modality-specific uncertainty in the RNA counts. In simulations, keju shows much higher sensitivity (59%) than previous methods (31% for MPRAnalyze and 9% for BCalm). It also has lower, more robust false positive rates, calling only 6.8% of unlabeled negative controls significant in real data (compared with 34% for MPRAnalyze and 12% for BCalm).
Recommended citation: Albert Xue, Adam M Zahm, Justin English, Sriram Sankararaman, and Harold Pimentel (2026). "keju: powerful and accurate inference in Massively Parallel Reporter Assays." bioRxiv.
