https://twitter.com/doesdatmaksense

Sept 05, 2024

<aside> 🏹

We will be dissecting the paper: [Arxiv] Writing in the Margins: Better Inference Pattern for Long Context Retrieval

</aside>

Imagine you’re reading a dense, complex novel. As the pages turn, the plot thickens, and crucial details are scattered across the text, each paragraph holding a piece of the puzzle. But here’s the catch: when you try to recall those key points later, they’ve become a tangled mess, lost somewhere in the middle of the story. Frustrating, right?

Now, consider this same challenge, but instead of you, it’s a large language model (LLM) tasked with understanding and synthesizing those details. LLMs, like GPT-4, are incredibly powerful, but they face their own struggles when it comes to processing long texts. They can easily get "lost in the middle," missing crucial information that’s buried deep within long pieces of text.

This is where the Writing in the Margins (WiM) approach comes into play—an interesting technique designed to address exactly this problem. All credit to the Writer team.

In this blog, we will take a deep dive into what WiM is, how it works, and why we should care.

The Problem: Lost in the Middle

LLMs are typically designed to process text up to a certain length, known as the context window. When the text exceeds this length, the model either truncates the input (potentially missing crucial information) or struggles to maintain coherence and relevance across the entire context. This issue becomes even more pronounced when dealing with texts where relevant details are not just at the beginning or end but scattered throughout, a scenario often referred to as the "lost in the middle" problem.

The Solution: Writing in the Margins (WiM)

The Writing in the Margins (WiM) approach provides a novel solution to the "lost in the middle" problem.

<aside> 💡

But what the heck is “Margin”?

</aside>

Just as we jot down notes in the margins of a book—highlighting the important parts without having to reread the whole thing—WiM enables language models to do the same. It breaks the long context into smaller chunks/segments, and for each chunk, WiM extracts its own "margin" (aka an intermediate summary). These margins capture the most important points of each segment, focusing on the information most relevant to the task at hand.

Since it’s inspired by the margins of a book, hence the term “margin”.
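To make the idea concrete, here is a minimal sketch of that chunk-then-margin flow in Python. This is not the Writer team’s implementation—`llm_generate` is a hypothetical stand-in for any text-generation call, and the prompts and chunk size are illustrative assumptions:

```python
# Sketch of the WiM idea: split a long context into segments, extract a
# query-relevant "margin" per segment, then answer from the margins.
# `llm_generate` is a hypothetical callable: prompt (str) -> completion (str).

def chunk_text(text: str, chunk_size: int = 4096) -> list[str]:
    """Split a long context into fixed-size character chunks (illustrative)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def write_in_the_margins(context: str, query: str, llm_generate) -> str:
    margins = []
    for segment in chunk_text(context):
        # Ask the model for a query-relevant note ("margin") on this segment.
        note = llm_generate(
            f"Passage:\n{segment}\n\n"
            f"Extract the information relevant to: {query}\n"
            "If nothing is relevant, reply 'IRRELEVANT'."
        )
        if "IRRELEVANT" not in note:
            margins.append(note)
    # Final answer is conditioned on the collected margins, not the raw text,
    # so relevant details from the middle of the context are kept in view.
    notes = "\n".join(margins)
    return llm_generate(f"Notes:\n{notes}\n\nAnswer the question: {query}")
```

The key design point is that irrelevant segments are filtered out before the final answer is generated, so the model never has to attend over the full raw context at once.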

Key Features of WiM

(Figure: key features of WiM)

How WiM Works: Step-by-Step Breakdown