https://arxiv.org/pdf/1910.12840.pdf

Summarization is still an unsolved problem in NLP, and arguably the one where the least progress has been made even with all the recent advances in language modeling. One of the things that makes it such a tough problem is that there hasn't been a great way to measure the accuracy of a summary.

In this paper the authors propose a new synthetic dataset. It is generated by applying a series of rule-based transformations to sentences from source documents (i.e. the thing you're trying to summarize), producing claims that are either consistent or inconsistent with the source. The goal is to make it possible to train a classifier that detects whether a claim is factually consistent with the document it came from. The image below from the paper shows the transformations they applied, and a rough sketch of the idea follows it.

[Figure from the paper: examples of the rule-based transformations applied to source sentences.]
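To make that more concrete, here is a minimal sketch (mine, not the paper's actual pipeline) of how rule-based transformations could turn source sentences into labeled claims for training a consistency classifier. The function names, the specific rules, and the entity list are all placeholders I made up for illustration.

```python
import random

def swap_entities(sentence, entities):
    """Corrupt a sentence by replacing one entity with another from the same document."""
    present = [e for e in entities if e in sentence]
    if not present or len(entities) < 2:
        return sentence
    old = random.choice(present)
    new = random.choice([e for e in entities if e != old])
    return sentence.replace(old, new)

def negate(sentence):
    """Very crude negation rule; the paper's transformations are more careful."""
    return sentence.replace(" is ", " is not ", 1)

def make_example(source_sentence, entities):
    """Return a (claim, label) pair: unchanged sentences are consistent,
    transformed sentences are inconsistent with the source."""
    if random.random() < 0.5:
        return source_sentence, "CONSISTENT"
    transform = random.choice([lambda s: swap_entities(s, entities), negate])
    return transform(source_sentence), "INCONSISTENT"

# Hypothetical usage
doc_entities = ["Alice", "Bob"]
claim, label = make_example("Alice is the CEO of the company.", doc_entities)
print(claim, label)
```

A classifier trained on pairs like these can then be pointed at real (document, summary) pairs to flag summary sentences that look factually inconsistent.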

This paper is also interesting because the authors frame summarization as a QA problem: a summary can be thought of as an answer to questions about the source text.
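As a toy illustration of that framing (my own example, not anything from the paper), one could ask the same question of both the source document and the summary and check whether an off-the-shelf extractive QA model returns the same answer. The documents, question, and agreement check below are all invented for the sketch.

```python
from transformers import pipeline

# Illustrative sketch: a faithful summary should let a QA model recover the
# same answer it would get from reading the full source document.
qa = pipeline("question-answering")  # downloads a default extractive QA model

source = ("The city council approved the new transit budget on Tuesday, "
          "allocating $12 million to bus service upgrades.")
summary = "The council approved $12 million for bus upgrades."
question = "How much money was allocated to bus service upgrades?"

answer_from_source = qa(question=question, context=source)["answer"]
answer_from_summary = qa(question=question, context=summary)["answer"]

# A (very rough) consistency signal: do the two answers agree?
print(answer_from_source, "|", answer_from_summary)
print("consistent?", answer_from_source.strip() == answer_from_summary.strip())
```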

I'm excited to see more work building on this dataset to tackle summarization or QA.