Here is a new series of interesting articles:
PEGASUS: A State-of-the-Art Model for Abstractive Text Summarization from Google.
Their hypothesis is that a pre-training task closer to the final task yields a better model. To test this, their pre-training consists of predicting sentences that have been masked out of a document (the blog shows an example with 2 or 3 words). They also found that the masked sentences have to be important ones (they use ROUGE to determine which sentences are important). After this pre-training, they need only 1k examples for fine-tuning. Here is a link to their code.
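To make the sentence-selection idea concrete, here is a minimal sketch, not the authors' code. It assumes a simplified ROUGE-1 (plain unigram-overlap F1 on whitespace tokens) and scores each sentence against the rest of the document; the function names and the `[MASK]` placeholder are made up for illustration.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between two strings (a simplified ROUGE-1)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def mask_important_sentences(sentences, num_masked=1, mask_token="[MASK]"):
    """Mask the sentences that overlap most with the rest of the document.

    Returns (source, target): the document with masked sentences replaced
    by mask_token, and the masked sentences the model should generate.
    mask_token is a hypothetical placeholder, not the real model's token.
    """
    scores = []
    for i, sentence in enumerate(sentences):
        rest = " ".join(s for j, s in enumerate(sentences) if j != i)
        scores.append(rouge1_f1(sentence, rest))
    # Indices of the highest-scoring ("most important") sentences.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    masked = set(ranked[:num_masked])
    source = [mask_token if i in masked else s for i, s in enumerate(sentences)]
    target = [sentences[i] for i in sorted(masked)]
    return " ".join(source), " ".join(target)


document = [
    "the model masks sentences",
    "it rained today",
    "the model predicts masked sentences",
    "sentences are masked",
]
source, target = mask_important_sentences(document, num_masked=1)
print(source)
print(target)
```

The pair (`source`, `target`) looks like a summarization example: a document as input, a short text to generate as output. That is presumably why this pre-training transfers well to summarization.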
RAG Model for open domain QA
State-of-the-art approaches such as generative seq2seq transformers leverage a large amount of unlabelled text to build a general model of language understanding before being fine-tuned on specific NLP tasks such as sentiment analysis or question answering (QA). While such models are packed with potential, they also have three major downsides:
- they cannot easily expand or revise their memory
- they can’t straightforwardly provide insight into their predictions
- they may produce occasional “hallucinations.”
Like standard seq2seq models, RAG takes a sequence as input and outputs a corresponding sequence. But rather than passing the input directly to the generator, RAG instead uses the input to retrieve a set of relevant documents, such as articles from the Wikipedia corpus.
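The retrieve-then-generate flow can be sketched in a few lines. This is a toy illustration under loud assumptions: it swaps in a bag-of-words cosine similarity for the neural retriever, leaves out the generator entirely, and the ` // ` separator and all function names are hypothetical.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector as a token -> count mapping."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k=2):
    """Return the k documents most similar to the query (toy retriever)."""
    q = bow(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, bow(doc)), reverse=True)
    return ranked[:k]


def build_generator_inputs(query, corpus, k=2):
    """Pair each retrieved document with the query.

    A real seq2seq generator would consume these strings; it is omitted here.
    """
    return [f"{doc} // {query}" for doc in retrieve(query, corpus, k)]


corpus = [
    "paris is the capital of france",
    "the eiffel tower is in paris",
    "bananas are yellow",
]
for text in build_generator_inputs("what is the capital of france", corpus):
    print(text)
```

Because the knowledge lives in `corpus` rather than in the model weights, updating what the system knows means editing the corpus, and the retrieved documents give some insight into where an answer came from.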
Here is the paper and the transformer implementation.
REALM: Integrating Retrieval into Language Representation Models
It’s Google’s novel paradigm for language model pre-training, which augments a language representation model with a knowledge retriever, allowing REALM models to retrieve textual world knowledge explicitly from raw text documents, instead of memorizing all the knowledge in the model parameters. A unique aspect of REALM is the way the retrieval mechanism is trained. Instead of relying on a pre-existing document retrieval system, the authors train a neural document retriever using an unsupervised fill-in-the-blank training objective. Code is here.
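Why can a fill-in-the-blank objective train a retriever at all? A rough sketch, assuming the retriever scores documents by an inner product of embeddings and that the probability of the correct blank is averaged over retrieved documents, weighted by the retrieval probabilities. The vectors and probabilities below are made-up numbers and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw relevance scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def marginal_likelihood(query_vec, doc_vecs, answer_prob_given_doc):
    """p(answer | query) = sum over docs of p(doc | query) * p(answer | query, doc)."""
    retrieval_probs = softmax([dot(query_vec, d) for d in doc_vecs])
    return sum(p_z * p_y for p_z, p_y in zip(retrieval_probs, answer_prob_given_doc))


query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.0, 1.0]]   # toy document embeddings
answer_probs = [0.9, 0.1]         # the first document helps fill the blank

baseline = marginal_likelihood(query, docs, answer_probs)
# Moving the helpful document closer to the query raises the likelihood.
improved = marginal_likelihood(query, [[2.0, 0.0], [0.0, 1.0]], answer_probs)
print(baseline, improved)
```

Maximizing this likelihood pushes the retriever to score helpful documents higher, so the retriever gets a training signal without any labeled query-document pairs.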
Wav2Vec 2.0: Automatic Speech Recognition From 10 Minute Sample
The paper Wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations claims to “show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler.”
Facebook AI researchers believe learning good representations of speech is the key to success. “Learning purely from labeled examples does not resemble language acquisition in humans: infants learn language by listening to adults around them – a process that requires learning good representations of speech.”
Here is the paper and the code.
Advancing Instance-Level Recognition Research
Instance-level recognition (ILR) is the computer vision task of recognizing a specific instance of an object, rather than simply the category to which it belongs. For example, instead of labeling an image as “post-impressionist painting”, we’re interested in instance-level labels like “Starry Night Over the Rhone by Vincent van Gogh”. Likewise, instead of simply “arch”, we want “Arc de Triomphe de l’Étoile, Paris, France”. The code is here.