CMU-CS-16-123
Computer Science Department
School of Computer Science, Carnegie Mellon University
Tinkering Under The Hood:
Vivek R. Krishnan
August 2016
M.S. Thesis
We consider the task of visual zero-shot learning, in which a system must learn to recognize concepts omitted from the training set. While most prior work makes use of linguistic cues to do this, we instead use a pictorial language representation of the training set, implicitly learned by a CNN, to generalize to new classes. We first demonstrate the robustness of pictorial language classifiers (PLCs) by applying them in a weakly supervised manner: labeling unlabeled concepts for visual classes present in the training data. Specifically, we show that a PLC built on top of a CNN trained for ImageNet classification can localize humans in Graz-02 and determine the pose of birds in PASCAL-VOC without extra labeled data or additional training. We then apply PLCs in an interactive zero-shot manner, demonstrating that pictorial languages are expressive enough to detect a set of visual classes in MSCOCO that never appear in the ImageNet training set.
30 pages
Frank Pfenning, Head, Computer Science Department