CMU-CS-02-126
Computer Science Department
School of Computer Science, Carnegie Mellon University
Using Asymmetric Distributions to Improve Classifier Probabilities:
A Comparison of New and Standard Parametric Methods
Paul N. Bennett
April 2002
An abbreviated version of this report will also appear in the
Proceedings of the 26th Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval,
Toronto, Canada, July 28 - August 1, 2003.
Keywords: Calibration, well-calibrated, reliability, posterior,
text classification, cost-sensitive learning, active learning,
post-processing, probability estimates
For many discriminative classifiers, it is desirable to convert an
unnormalized confidence score output from the classifier to a
normalized probability estimate. Such a method can also be used for
creating better estimates from a probabilistic classifier that
outputs poor estimates. Typical parametric methods have an underlying
assumption that the score distribution for a class is symmetric; we
explain why this assumption is undesirable, especially when the
scores are output by a classifier. Two asymmetric families,
asymmetric generalizations of the Gaussian and the Laplace
distributions, are presented, and a method for fitting them in
expected linear time is
described. Finally, an experimental analysis of parametric fits to
the outputs of two text classifiers, naive Bayes (which is known to
produce poor probability estimates) and a linear SVM, is conducted. The analysis
shows that one of these asymmetric families is theoretically
attractive (introducing few new parameters while increasing
flexibility), computationally efficient, and empirically preferable.
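As a concrete illustration of the kind of post-processing described above,
the sketch below fits a two-piece ("asymmetric") Laplace density to each
class's scores and converts a raw score into a posterior probability by
Bayes' rule. It is only a sketch under stated assumptions: the
parameterization (mode theta, inverse scales beta and gamma), the
closed-form scale estimates for a fixed mode, and the brute-force search
over candidate modes are illustrative choices, not the report's
expected-linear-time fitting procedure.

import numpy as np


def asym_laplace_logpdf(x, theta, beta, gamma):
    # Two-piece Laplace log density: mode theta, inverse scale beta to the
    # left of the mode, gamma to the right; normalizer beta*gamma/(beta+gamma).
    norm = np.log(beta) + np.log(gamma) - np.log(beta + gamma)
    return norm + np.where(x <= theta, -beta * (theta - x), -gamma * (x - theta))


def fit_asym_laplace(scores):
    # Illustrative maximum-likelihood fit: try each observed score as the mode.
    # For a fixed mode, the likelihood equations give closed-form inverse
    # scales in terms of the one-sided absolute deviations d_l and d_r.
    scores = np.asarray(scores, dtype=float)
    n = scores.size
    best = None
    for theta in np.unique(scores):
        d_l = np.sum(theta - scores[scores <= theta])
        d_r = np.sum(scores[scores > theta] - theta)
        if d_l <= 0 or d_r <= 0:
            continue  # mode at a sample boundary; skip the degenerate fit
        root = np.sqrt(d_l * d_r)
        beta, gamma = n / (d_l + root), n / (d_r + root)
        ll = asym_laplace_logpdf(scores, theta, beta, gamma).sum()
        if best is None or ll > best[0]:
            best = (ll, theta, beta, gamma)
    if best is None:
        raise ValueError("need scores on both sides of some candidate mode")
    return best[1:]  # (theta, beta, gamma)


def calibrated_posterior(score, pos_params, neg_params, prior_pos):
    # P(positive | score) by Bayes' rule over the two class-conditional fits.
    log_pos = asym_laplace_logpdf(score, *pos_params) + np.log(prior_pos)
    log_neg = asym_laplace_logpdf(score, *neg_params) + np.log(1.0 - prior_pos)
    return 1.0 / (1.0 + np.exp(np.clip(log_neg - log_pos, -700, 700)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for the positive- and negative-class score samples.
    pos_fit = fit_asym_laplace(rng.laplace(1.0, 0.5, 500))
    neg_fit = fit_asym_laplace(rng.laplace(-1.0, 1.0, 500))
    print("P(+ | score = 0.3) =", calibrated_posterior(0.3, pos_fit, neg_fit, 0.5))

The brute-force mode search here is quadratic in the worst case and is used
only to keep the sketch short; the report's contribution includes a fitting
method that runs in expected linear time.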
24 pages