Text representations learned via language modeling have been incredibly effective, surpassing the prior state of the art across a variety of downstream tasks. I’ll give an overview of one such model: BERT (Bidirectional Encoder Representations from Transformers; Devlin et al., 2018), developed by the Google AI Language group. BERT produces pre-trained representations that can be fine-tuned, allowing it to be applied effectively to a variety of tasks with minimal architecture modifications.
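The pre-training objective behind BERT can be pictured with a toy sketch of masked language modeling: a fraction of tokens is hidden, and the model must recover them from both left and right context. This is an illustrative simplification, not Devlin et al.'s implementation; the 15% masking rate matches the paper, but the whitespace tokenization and the `mask_tokens` helper are my own.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, rate=0.15, rng=None):
    """Replace roughly `rate` of the tokens with [MASK].
    The pre-training task is to predict the original token at each
    masked position using both left and right context (hence
    "bidirectional")."""
    rng = rng or random.Random(0)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < rate:
            masked.append(MASK)
            targets[i] = tok  # label the model must predict here
        else:
            masked.append(tok)
    return masked, targets

tokens = "the cat sat on the mat".split()
masked, targets = mask_tokens(tokens, rate=0.15, rng=random.Random(1))
print(masked, targets)
```

Fine-tuning then replaces this pre-training head with a small task-specific layer and continues training on labeled data, which is why so little architecture change is needed per task.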
However, despite large gains on standard benchmarks, neural network models still make mistakes caused by mismatches between the training and test inputs and the inputs an NLP system is asked to handle “in the wild”. I’ll discuss two case studies from our group showing that making model training sensitive to linguistic properties of expected real-world inputs can greatly improve accuracy.
The first (Zhang et al., 2018) addresses text containing multiple languages, which is prevalent online. A feed-forward network with a simple globally constrained decoder outperforms previously published multilingual approaches in both accuracy and speed, yielding an 800x speed-up and a 19.5% average absolute gain on three codemixed datasets.
The second (Elkahky et al., 2018) addresses noun-verb ambiguity in English part-of-speech tagging, a frequent source of egregious errors. We create a new dataset of over 30,000 naturally occurring, non-trivial examples of noun-verb ambiguity. Enhancing the strongest existing tagger with contextual word embeddings and targeted training data gives a 52% relative improvement. Downstream, using just this enhanced tagger yields a 28% reduction in error over the prior best system.