CMU-CS-05-164
Computer Science Department
School of Computer Science, Carnegie Mellon University
A Randomized Algorithm for Learning Mahalanobis Metrics:
Application to Classification and Regression of Biological Data
Christopher James Langmead
July 2005
Keywords: Computational biology, metric learning, classification, regression
We present a randomized algorithm for semi-supervised learning
of Mahalanobis metrics over R^n.
The inputs to the algorithm are a set, U, of unlabeled points in
R^n, a set of pairs of points,
S = {(x, y)_i}; x, y ∈ U,
that are known to be similar, and a set
of pairs of points,
D = {(x, y)_i}; x, y ∈ U,
that are known to be dissimilar. The algorithm randomly samples
S, D, and m-dimensional subspaces of
R^n and learns a metric
for each subspace. The metric over R^n is a linear
combination of the subspace metrics. The randomization addresses
issues of efficiency and overfitting. Extensions of the algorithm
to learning non-linear metrics via kernels, and its use as a
pre-processing step for dimensionality reduction, are discussed.
The new method is demonstrated on a regression problem
(structure-based chemical shift prediction) and a classification
problem (predicting clinical outcomes for immunomodulatory
strategies for treating severe sepsis).
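
The sampling-and-combination scheme described above can be sketched roughly as follows. This is an illustrative sketch only, not the report's implementation: the abstract does not say how each subspace metric is learned or how the linear combination is weighted, so the per-subspace learner (the inverse of a regularized scatter matrix of similar-pair differences), the weighting rule (ratio of mean dissimilar to mean similar squared distance), and all names and parameters (randomized_metric, n_rounds, sample_frac, reg) are assumptions made here for illustration. Pairs in S and D are assumed to be given as index pairs into the rows of U.

```python
import numpy as np

def pair_diffs(U, pairs):
    """Return x - y for each index pair (i, j) into the rows of U."""
    idx = np.asarray(pairs)
    return U[idx[:, 0]] - U[idx[:, 1]]

def learn_subspace_metric(diff_S, reg=1e-3):
    # Stand-in learner (assumption, not the report's method): invert the
    # regularized scatter of similar-pair differences in the subspace.
    m = diff_S.shape[1]
    scatter = diff_S.T @ diff_S / len(diff_S)
    return np.linalg.inv(scatter + reg * np.eye(m))

def randomized_metric(U, S, D, m, n_rounds=20, sample_frac=0.5, seed=0):
    """Combine metrics learned on random m-dimensional subspaces of R^n."""
    rng = np.random.default_rng(seed)
    n = U.shape[1]
    S, D = np.asarray(S), np.asarray(D)
    M = np.zeros((n, n))
    total_weight = 0.0
    for _ in range(n_rounds):
        # Randomly sample subsets of the similar and dissimilar pairs.
        s_idx = rng.choice(len(S), size=max(1, int(sample_frac * len(S))), replace=False)
        d_idx = rng.choice(len(D), size=max(1, int(sample_frac * len(D))), replace=False)
        # Random m-dimensional subspace: orthonormal basis P of shape (n, m).
        P, _ = np.linalg.qr(rng.standard_normal((n, m)))
        dS = pair_diffs(U, S[s_idx]) @ P
        dD = pair_diffs(U, D[d_idx]) @ P
        A = learn_subspace_metric(dS)
        # Hypothetical weighting: favor subspaces that separate D from S.
        sim = np.einsum("ij,jk,ik->i", dS, A, dS).mean()
        dis = np.einsum("ij,jk,ik->i", dD, A, dD).mean()
        w = dis / (sim + 1e-12)
        # Lift the subspace metric back to R^n and accumulate.
        M += w * (P @ A @ P.T)
        total_weight += w
    M /= total_weight
    return (M + M.T) / 2  # symmetrize against round-off

def mahalanobis(M, x, y):
    """Distance sqrt((x - y)^T M (x - y)) under the learned matrix M."""
    d = x - y
    return float(np.sqrt(max(d @ M @ d, 0.0)))
```

Because each lifted term P A P^T is positive semidefinite and the weights are non-negative, the combined matrix M is positive semidefinite as well, so mahalanobis(M, x, y) is a valid (pseudo-)metric over R^n.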
15 pages