CMU-CS-20-111
Computer Science Department, School of Computer Science, Carnegie Mellon University

Injecting output constraints into neural NLP models
Jay Yoon Lee
Ph.D. Thesis, July 2020
The goal of this thesis is to inject prior knowledge and constraints into neural models, primarily for natural language processing (NLP) tasks. While neural models have set new state-of-the-art performance on many tasks from computer vision to NLP, they often fail to learn to consistently produce well-formed structures unless an immense amount of training data is available. This thesis argues that not every aspect of a model has to be learned from the data itself, and shows that injecting simple knowledge and constraints into neural models helps in low-resource and out-of-domain settings, and can also improve state-of-the-art models.

The thesis focuses on structural knowledge of the output space and injects knowledge of correct or preferred structures as an objective, in a model-agnostic way that requires no modification to the model architecture. The first benefit of focusing on knowledge of the output space is that it is intuitive: we can directly enforce outputs to satisfy logical and linguistic constraints. Another advantage of structural knowledge is that it often does not require a labeled dataset.

Focusing on deterministic constraints on output values, this thesis first applies output constraints at inference time via the gradient-based inference (GBI) method. In the spirit of gradient-based training, GBI enforces constraints for each input at test time by optimizing continuous model weights until the network's inference procedure generates an output that satisfies the constraints.

The thesis then shows that constraint injection at inference time can be extended to training time: from instance-based optimization at test time to generalization across multiple instances at training time. For training with structural constraints, this thesis presents (1) a structural constraint loss, (2) a joint objective combining the structural loss with a supervised loss on a training set, and (3) a joint objective in a semi-supervised setting. All the loss functions yield improvements; among them, the semi-supervised approach shows the largest improvement and is particularly effective in low-resource settings. The analysis shows that the efforts at training time and at inference time are complementary rather than exclusive: performance is best when train-time and inference-time methods are combined.

Finally, this thesis presents an agreement constraint for multi-view learning that can use the semi-supervised approach with the constraint. The presented agreement constraint is general in that it can be applied to any sequence-labeling problem with multiple views, whereas the other constraints in this thesis encode prior knowledge about specific tasks. This semi-supervised approach again shows large gains in low-resource settings and is effective in high-resource settings as well.
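The test-time procedure described above can be illustrated with a minimal sketch. The snippet below is an assumed PyTorch-style rendering of the general GBI idea, not the thesis's actual code: names such as model, decode, constraint_violation, and the specific loss weighting and proximity term are illustrative assumptions. It shows the core loop of adjusting a copy of the trained weights for a single input until the decoded output stops violating the constraints.

```python
# Hypothetical sketch of gradient-based inference (GBI); names and the exact
# loss form are assumptions for illustration, not the thesis's implementation.
import copy
import torch


def weight_distance(adjusted, original):
    """L2 distance keeping the adjusted weights close to the trained ones."""
    return sum(((p1 - p2) ** 2).sum()
               for p1, p2 in zip(adjusted.parameters(), original.parameters()))


def gradient_based_inference(model, x, decode, constraint_violation,
                             lr=0.05, alpha=1.0, max_steps=30):
    """For one test input x, optimize a copy of the model's weights until the
    decoded output satisfies the output constraints (or a step budget is hit)."""
    gbi_model = copy.deepcopy(model)                 # never modify the original model
    optimizer = torch.optim.SGD(gbi_model.parameters(), lr=lr)

    for _ in range(max_steps):
        scores = gbi_model(x)                        # per-position label scores
        y_hat = decode(scores)                       # e.g. argmax / beam search
        violation = constraint_violation(y_hat)      # 0 when constraints hold
        if violation == 0:
            return y_hat

        # Push down the (log-)score the adjusted model assigns to its own
        # constraint-violating prediction, weighted by the violation degree,
        # while a proximity term keeps the weights near the trained solution.
        log_probs = scores.log_softmax(dim=-1)
        pred_log_prob = log_probs.gather(-1, y_hat.unsqueeze(-1)).sum()
        loss = alpha * violation * pred_log_prob + weight_distance(gbi_model, model)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return decode(gbi_model(x))                      # best effort if never satisfied
```

In this sketch the adjustment is discarded after each input, which is what distinguishes instance-based test-time optimization from the train-time extension: there, the same kind of constraint loss (alone, jointly with a supervised loss, or semi-supervised on unlabeled data) updates the shared weights across many instances.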
126 pages