Summarizing text is a task at which machine learning algorithms are improving, as evidenced by a recent paper published by Microsoft. That’s good news — automatic summarization systems promise to cut down on the amount of message-reading enterprise workers do, which one survey estimates amounts to 2.6 hours each day.
Not to be outdone, a Google Brain and Imperial College London team built a system called Pegasus (Pre-training with Extracted Gap-sentences for Abstractive SUmmarization Sequence-to-sequence) that pairs Google's Transformer architecture with pretraining objectives tailored for abstractive text generation. They say it achieves state-of-the-art results on 12 summarization tasks spanning news, science, stories, instructions, emails, patents, and legislative bills, and that it shows "surprising" performance on low-resource summarization, surpassing previous top results on six data sets with only 1,000 examples.
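The "gap-sentence" idea behind that pretraining objective is straightforward to sketch: whole sentences judged important are removed from a document, and the model is trained to generate them from the remaining text, which makes pretraining resemble the downstream summarization task. Below is a minimal illustrative Python sketch; the `gap_sentence_example` helper, the `[MASK1]` token string, and the length-based importance heuristic are assumptions made here for illustration (the paper scores sentence importance with ROUGE against the rest of the document), not Pegasus' actual code.

```python
import re

def gap_sentence_example(document: str, gap_ratio: float = 0.3):
    """Illustrative sketch of Pegasus-style gap-sentence generation (GSG):
    selected sentences are removed from the encoder input and become the
    decoder target the model must generate."""
    # Naive sentence split on end punctuation; real pipelines use a
    # proper sentence tokenizer.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document)
                 if s.strip()]
    n_gaps = max(1, int(len(sentences) * gap_ratio))
    # Stand-in importance score: prefer the longest sentences. The paper
    # instead picks sentences by ROUGE overlap with the rest of the document.
    chosen = set(sorted(range(len(sentences)),
                        key=lambda i: len(sentences[i]),
                        reverse=True)[:n_gaps])
    # Encoder input: each chosen sentence replaced by a mask token.
    source = " ".join("[MASK1]" if i in chosen else s
                      for i, s in enumerate(sentences))
    # Decoder target: the removed sentences, in document order.
    target = " ".join(sentences[i] for i in sorted(chosen))
    return source, target

doc = ("Pegasus is pretrained on large text corpora. Entire sentences are "
       "masked out of each document. The model learns to generate the "
       "missing sentences from what remains.")
src, tgt = gap_sentence_example(doc)
print("source:", src)
print("target:", tgt)
```

Because the pretraining target is a full, coherent sentence rather than scattered masked words, the model practices exactly the kind of sentence-level generation that abstractive summarization demands, which helps explain the strong low-resource results the team reports.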