Large Language Models (LLMs) are good at understanding context. Drawing on their vast training data, they can answer questions coherently and relevantly, whether the topic is astronomy, history, or physics. However, because they cannot reliably connect facts or recall every detail, LLMs, especially smaller models like llama2-13b-chat, can hallucinate even when the requested knowledge is in the training data.

A newer technique, Retrieval Augmented Generation (RAG), fills these knowledge gaps and reduces hallucinations by augmenting prompts with external data. Combined with a vector database (like MyScale), RAG substantially improves extractive question-answering systems, even when an exhaustive knowledge base like Wikipedia is already in the training set.
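The core RAG loop can be sketched in a few lines. The snippet below is a minimal, self-contained illustration, not MyScale's actual API: the tiny document list, the bag-of-words `embed` function, and the questions are all hypothetical stand-ins for a real embedding model and a Wikipedia-backed vector store.

```python
import numpy as np

# Hypothetical mini-corpus standing in for a Wikipedia-backed vector store.
documents = [
    "The Moon is Earth's only natural satellite.",
    "The Great Wall of China was built over many centuries.",
    "Newton's second law states that force equals mass times acceleration.",
]

# Toy vocabulary built from the corpus; a real system would use a learned
# embedding model instead of bag-of-words counts.
vocab = sorted({w for d in documents for w in d.lower().split()})

def embed(text: str) -> np.ndarray:
    """Hypothetical embedding: normalized bag-of-words counts over `vocab`."""
    words = text.lower().split()
    vec = np.array([words.count(w) for w in vocab], dtype=float)
    return vec / (np.linalg.norm(vec) + 1e-9)

doc_vectors = np.stack([embed(d) for d in documents])

def retrieve(query: str, top_k: int = 1) -> list[str]:
    """Nearest-neighbor search by cosine similarity; this is the step a
    vector database performs at scale."""
    scores = doc_vectors @ embed(query)
    return [documents[i] for i in np.argsort(scores)[::-1][:top_k]]

def augment_prompt(question: str) -> str:
    """Prepend retrieved passages so the LLM answers from evidence
    rather than from memory alone."""
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

print(augment_prompt("What force law did Newton state?"))
```

The augmented prompt grounds the model's answer in retrieved text, which is what reduces hallucination: the relevant fact no longer has to be recalled from the model's weights.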

To this end, this article focuses on measuring the performance gain from RAG on the widely-used MMLU dataset. We find that the performance of both commercial and open-source LLMs can be significantly improved when knowledge is retrieved from Wikipedia using a vector database. More interestingly, this holds even though Wikipedia is already in the training set of these models.
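To make the evaluation setup concrete, here is a hedged sketch of how an MMLU-style multiple-choice question can be turned into a prompt, with and without retrieved context, and how accuracy is scored. The `record` dict and its field names are illustrative, not the actual MMLU file format.

```python
# Hypothetical MMLU-style record: a question, four choices, a gold answer index.
record = {
    "question": "Which planet is known as the Red Planet?",
    "choices": ["Venus", "Mars", "Jupiter", "Saturn"],
    "answer": 1,  # index of the correct choice ("Mars")
}

def format_prompt(record: dict, context: str = "") -> str:
    """Build a multiple-choice prompt. `context` carries retrieved passages
    when RAG is enabled and stays empty for the no-retrieval baseline."""
    letters = "ABCD"
    choice_block = "\n".join(
        f"{letters[i]}. {c}" for i, c in enumerate(record["choices"])
    )
    prefix = f"Context:\n{context}\n\n" if context else ""
    return f"{prefix}Question: {record['question']}\n{choice_block}\nAnswer:"

def accuracy(predictions: list[int], answers: list[int]) -> float:
    """MMLU is scored as plain accuracy: fraction answered correctly."""
    return sum(p == a for p, a in zip(predictions, answers)) / len(answers)

# Comparing the two conditions is then just a matter of which prompt is sent.
baseline_prompt = format_prompt(record)
rag_prompt = format_prompt(record, context="Mars is often called the Red Planet.")
print(rag_prompt)
```

Running the same question set through both prompt variants and comparing the two accuracy numbers is the experiment described above; the gain from the `context`-bearing prompts is the RAG improvement being measured.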
