CMU-CS-05-163
Computer Science Department
School of Computer Science, Carnegie Mellon University
Structure Based Chemical Shift Prediction
using Random Forests Non-linear Regression
K. Arun*, Christopher James Langmead
July 2005
Keywords: Computational biology, structural biology,
Nuclear Magnetic Resonance, NMR, chemical shift, regression,
Random Forests
Protein nuclear magnetic resonance (NMR) chemical shifts
are among the most accurately measurable spectroscopic parameters
and are closely correlated with protein structure because of their
dependence on the local electronic environment. The precise nature
of this correlation remains largely unknown. Accurate prediction of
chemical shifts from the atomic coordinates of existing structures will
permit close study of this relationship. This paper presents a novel
approach to chemical shift prediction from protein structure, based on
non-linear regression. The regression model combines
quantum, classical and empirical variables and provides a
statistically significant improvement in prediction accuracy over existing
chemical shift predictors across protein backbone atom types. The
results presented here were obtained using the Random Forest
regression algorithm on a protein entry data set derived from the
RefDB re-referenced chemical shift database.
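
As a rough illustration of the kind of regression named above, the following is a minimal pure-Python sketch of Random Forest regression: each tree is grown on a bootstrap sample, each split examines a random subset of features, and the forest prediction is the average over trees. It is not the report's implementation. The data are synthetic, the three feature names (phi, psi, ring) are hypothetical placeholders for structure-derived variables, and the tree depth, leaf size, and number of trees are arbitrary assumptions.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def grow_tree(X, y, rng, depth=0, max_depth=5, min_leaf=3):
    """Grow one regression tree; a leaf stores the mean target value."""
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return mean(y)
    n_features = len(X[0])
    n_try = max(1, n_features // 3)  # features examined at this split
    best = None
    for f in rng.sample(range(n_features), n_try):
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            # Sum of squared errors around each side's mean
            sse = 0.0
            for side in (left, right):
                m = mean([y[i] for i in side])
                sse += sum((y[i] - m) ** 2 for i in side)
            if best is None or sse < best[0]:
                best = (sse, f, t, left, right)
    if best is None:
        return mean(y)
    _, f, t, left, right = best
    return (
        f,
        t,
        grow_tree([X[i] for i in left], [y[i] for i in left],
                  rng, depth + 1, max_depth, min_leaf),
        grow_tree([X[i] for i in right], [y[i] for i in right],
                  rng, depth + 1, max_depth, min_leaf),
    )


def predict_tree(node, x):
    # Internal nodes are (feature, threshold, left, right) tuples
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(X, y, n_trees=15, seed=0):
    """Fit each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(y)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest


def predict_forest(forest, x):
    # The forest prediction is the average of the tree predictions
    return mean([predict_tree(tree, x) for tree in forest])


# Synthetic data only: hypothetical features (phi, psi, ring) and a made-up
# non-linear target standing in for a chemical shift.
data_rng = random.Random(0)
X = [[data_rng.uniform(-math.pi, math.pi),
      data_rng.uniform(-math.pi, math.pi),
      data_rng.uniform(0.0, 1.0)] for _ in range(120)]
y = [math.cos(phi) + 0.5 * math.sin(psi) + ring ** 2 + data_rng.gauss(0, 0.05)
     for phi, psi, ring in X]

forest = fit_forest(X, y, n_trees=15, seed=1)
print("predicted:", predict_forest(forest, X[0]), "target:", y[0])
```

Averaging many trees grown on resampled data is what lets the ensemble fit a non-linear relationship without a hand-specified functional form. In practice a library implementation would normally be used instead of a hand-written one.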
14 pages
*Department of Biological Sciences, Carnegie Mellon University