CMU-CS-24-154
Computer Science Department
School of Computer Science, Carnegie Mellon University



Learning genome-wide interactions of intrinsically
disordered proteins with DNA using U-DisCo

Hongwei Tu

M.S. Thesis

December 2024


Keywords: Intrinsically disordered proteins, protein-DNA interactions, deep learning

Proteins are essential regulators of cellular processes. Intrinsically disordered proteins (IDPs), despite lacking stable tertiary structures under physiological conditions, play crucial yet often underexplored roles in biological processes. With recent experimental advances like DisP-seq for probing IDP-DNA binding, there is a pressing need for efficient, interpretable computational methods to identify sequence determinants of IDP-DNA interactions and analyze their cooperative effects on gene regulation. To address this, we develop U-DisCo, a novel deep learning model that predicts base-resolution IDP-DNA binding profiles directly from DNA sequences. Leveraging a U-Net architecture, U-DisCo captures both local base-level interactions and long-range dependencies up to 20 kilobases with high accuracy and computational efficiency, outperforming the baseline BPNet. By incorporating ATAC-seq data, U-DisCo enables robust cross-cell type predictions as a multimodal framework. U-DisCo identified key IDP-binding motifs, revealing distinct interaction patterns and cooperative behaviors across different IDPs. Interestingly, we observed short-range interactions for motifs like AP-2 and EWS-FLI1 (single GGAA motif), while others exhibited independent, enhancer-like functions. Further analysis revealed that some IDPs favored certain strand orientations, suggesting their involvement in specific regulatory mechanisms. Overall, U-DisCo is the first computational approach to explore multiple IDPs within a single cell type, offering a versatile framework for studying IDP-mediated gene regulation and genome-wide regulatory elements.
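As a rough illustration of the multimodal input described in the abstract, the sketch below one-hot encodes a DNA sequence and appends a per-base ATAC-seq signal as an extra channel. It is not taken from the thesis: the function names, the five-channel layout, and the handling of unknown bases are all assumptions made for illustration, and the actual U-DisCo input format may differ.

```python
# Hypothetical illustration only; names and channel layout are assumptions,
# not the thesis's actual preprocessing.
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of 4 floats (A, C, G, T).

    Bases outside ACGT (for example N) become all-zero rows.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def with_accessibility(seq, atac):
    """Append a per-base ATAC-seq signal as a fifth input channel."""
    if len(seq) != len(atac):
        raise ValueError("sequence and ATAC-seq track must have equal length")
    return [row + [float(a)] for row, a in zip(one_hot(seq), atac)]

# A GGAA motif followed by an unknown base, with a made-up accessibility track.
features = with_accessibility("GGAAN", [0.0, 1.5, 2.0, 0.5, 0.0])
```

A sequence-to-profile model of the kind described would take a matrix like `features` (one row per base) as input and emit one binding value per base; the accessibility channel is what would allow the same sequence to yield different predictions in different cell types.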

42 pages

Thesis Committee:
Jian Ma (Chair)
Lei Li

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

