In "You Exist In The Long Context," Steven Johnson explores the advancements in large language models (LLMs), particularly the significant impact of long context windows. Johnson illustrates this progress by creating an interactive game based on his book, showcasing the LLM's ability to handle complex narratives and maintain factual accuracy. He draws a parallel between LLMs' short-term memory improvements and the case of Henry Molaison, a patient with severe memory impairment, highlighting how expanded context windows have overcome previous limitations. He ultimately argues that this enhanced contextual understanding allows for more sophisticated applications, including personalised learning and collaborative decision-making. Johnson concludes by discussing the potential for LLMs to become invaluable tools for accessing and integrating expert knowledge.
Limitations of Early Language Models like GPT-3
Early language models like GPT-3, while impressive for their time, had a significant limitation: a small context window. In effect, they had a restricted short-term memory, analogous to the condition of patient H.M. (Henry Molaison), who was unable to form new memories after surgery to treat his epilepsy.
GPT-3, introduced in 2020, had a context window of 2,048 “tokens”, equivalent to about 1,500 words. This was the maximum amount of new information that could be shared with the model; exceeding the limit caused it to "forget" information presented earlier in the conversation. The model could follow short instructions by drawing on its vast long-term memory (its parametric memory, learned during training), but it struggled with extended narratives or explanations that required retaining information over a longer stretch of text. Interacting with GPT-3 was essentially like conversing with someone who constantly had to be reintroduced to the topic because they could not retain more than a few sentences.
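To make the token budget concrete, here is a minimal sketch of a pre-flight check, assuming the open-source tiktoken tokenizer ("r50k_base" is the encoding historically associated with GPT-3-era models) and GPT-3's 2,048-token limit:

```python
# Check whether a prompt fits GPT-3's original 2,048-token window,
# leaving headroom for the model's reply.
import tiktoken

CONTEXT_WINDOW = 2048  # GPT-3's original limit, in tokens

enc = tiktoken.get_encoding("r50k_base")  # GPT-3-era BPE encoding

def fits_in_context(prompt: str, reserved_for_output: int = 256) -> bool:
    """Return True if the prompt leaves room for the model's reply."""
    n_tokens = len(enc.encode(prompt))
    return n_tokens + reserved_for_output <= CONTEXT_WINDOW

print(fits_in_context("Summarize the following chapter: ..."))  # True
```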
This limited context window resulted in several shortcomings:
- Conversational Incoherence: The inability to remember previous turns in a conversation made interactions with GPT-3 feel disjointed and repetitive. Users had to repeatedly re-supply context, leading to an unnatural flow (a common workaround is sketched after this list).
- Increased Hallucinations: While GPT-3 possessed a vast knowledge base, its limited short-term memory made it prone to fabricating information, especially when the required information was not part of the immediate context.
- Inability to Handle Complex Narratives or Arguments: GPT-3 struggled to follow narratives or arguments that spanned beyond its limited context window. Understanding relationships between events and concepts spread across a large text was impossible, limiting its analytical capabilities.
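The first of these shortcomings had a standard workaround worth sketching: keep a sliding window of recent turns so the conversation fits the token budget. The `count_tokens` helper below is a deliberately crude stand-in for a real tokenizer:

```python
# Sliding-window workaround for a small context window: keep only as
# many recent turns as fit the token budget; everything older is lost.
def count_tokens(text: str) -> int:
    return len(text.split())  # crude approximation: one token per word

def trim_history(turns: list[str], budget: int = 2048) -> list[str]:
    """Keep the most recent turns whose combined size fits the budget."""
    kept, used = [], 0
    for turn in reversed(turns):      # walk from newest to oldest
        cost = count_tokens(turn)
        if used + cost > budget:
            break                     # older turns are simply forgotten
        kept.append(turn)
        used += cost
    return list(reversed(kept))       # restore chronological order
```

Whatever falls outside the window is gone for good, which is exactly why early chatbot sessions felt like perpetually reintroducing the topic.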
The subsequent expansion of context windows, as in GPT-4 (which launched with an 8K-token window, four times larger than GPT-3's), marked a significant advancement in AI capabilities. These larger context windows facilitated more coherent conversations, reduced hallucinations, and allowed for a deeper understanding of complex narratives. Even so, these advancements do not give AI models human-like consciousness or sentience.
Impacts of Expanding AI Context Windows
The expansion of AI context windows has been a pivotal factor in the advancements of AI capabilities, going beyond simply increasing the size of training data or model parameters. This expansion has led to significant improvements across various aspects of AI functionality:
- Document Summarization and Processing: One prominent application is the processing of extensive documents or text corpora. With larger context windows, LLMs can maintain the coherence and relevance of a generated summary across longer texts. This is particularly beneficial for legal documents, research papers, and books, where context from the entire document is crucial for generating accurate summaries (a sketch follows this list).
- Improved Conversational Agents: In the realm of chatbots and conversational agents, long context windows enable the model to maintain the context of the conversation over extended interactions. This means the AI can refer back to previous parts of the dialogue, providing more coherent and contextually relevant responses, leading to more sophisticated and human-like interactions.
- Code Generation and Understanding: For developers using LLMs to assist in code generation, debugging, or understanding, larger context windows allow the model to consider more lines of code at once. This can improve the quality of the generated code and the accuracy of suggestions, as the model can better understand the overall structure and dependencies within the code.
- Historical Data Analysis: In applications involving historical data, such as financial market analysis or historical research, long context windows enable the model to consider larger sequences of events. This can lead to more accurate predictions and insights, as the model can identify patterns and trends over more extended periods (Source [4]).
- Complex Query Processing: When dealing with complex queries that require understanding multiple pieces of information from different parts of a large dataset, extended context windows can significantly enhance the model’s ability to retrieve and synthesize relevant information, providing more accurate and comprehensive responses (Source [9]).
- Creative Writing and Content Generation: For tasks like story writing or content creation, where maintaining narrative coherence and consistency is vital, long context windows allow the model to track character development, plot points, and thematic elements over longer passages of text. This results in more cohesive and engaging content.
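To make the first item above concrete, here is a minimal single-pass summarization sketch. It assumes the OpenAI Python client (openai>=1.0) and a long-context model; the model name is only an example:

```python
# Single-pass summarization enabled by a long context window: the whole
# document goes into one request instead of being chunked and merged.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def summarize(document: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # example long-context model; swap in your own
        messages=[
            {"role": "system", "content": "Summarize the document faithfully."},
            {"role": "user", "content": document},
        ],
    )
    return resp.choices[0].message.content
```

With a 2K-token window this task would require chunking the document and merging partial summaries; with a 128K-token window, an entire book fits in a single call.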
Long Context Windows vs. RAG
The advancements in long context windows have sparked a debate on the necessity of techniques like Retrieval Augmented Generation (RAG). While long context windows allow models to process and utilize vast amounts of context directly, RAG combines the retrieval of relevant information from external sources with the generative capabilities of LLMs. Here are some key applications and advantages of RAG:
- Real-Time Information Retrieval: One of the primary advantages of RAG is its ability to retrieve up-to-date information from external databases or documents, ensuring that the generated content is current and accurate. Traditional language models, even with large context windows, rely heavily on their pre-existing training data, which can become outdated over time. RAG addresses this by accessing real-world data as needed, enhancing the model’s ability to answer complex and timely questions effectively.
- Enhanced Enterprise AI Capabilities: RAG's ability to access specific, relevant external data enhances the model’s precision and utility. This combination is crucial for various enterprise applications, such as legal document analysis, financial reporting, and customer support, where accuracy and relevancy are paramount.
- Augmented Retrieval and Agent Capabilities: RAG is particularly useful in applications where detailed and context-specific information retrieval is necessary. For example, in customer support systems, RAG can retrieve specific answers from a company’s knowledge base, providing more precise and contextually appropriate responses to user queries (a minimal retrieval sketch follows this list). This contrasts with long context window models that might struggle to identify the most relevant information in a vast pool of data.
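Here is a minimal, self-contained sketch of the retrieval step described above. Real systems use learned embeddings and a vector store; the bag-of-words scoring here is a toy stand-in:

```python
# Toy RAG retrieval: score documents against the query and prepend only
# the best matches to the prompt, instead of stuffing everything in.
from collections import Counter
import math

def embed(text: str) -> Counter:
    return Counter(text.lower().split())  # toy "embedding": word counts

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values()))
    norm *= math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "Reset your password from the account settings page.",
]
question = "How long do refunds take?"
context = "\n".join(retrieve(question, docs))
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
```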
The choice between long context windows and RAG significantly influences how language models perform in real-world applications. RAG can be substantially more scalable and cost-effective than long context windows because it retrieves and processes only the most relevant pieces of information, reducing the number of tokens that must be processed per query. This minimizes computational cost and latency, making it suitable for high-volume queries and real-time applications.
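A back-of-the-envelope comparison shows why; every number below is an illustrative assumption, not any provider's actual pricing:

```python
# Illustrative cost comparison: stuffing a whole corpus into the context
# window versus retrieving only the relevant passages with RAG.
price_per_1k = 0.01        # hypothetical $ per 1,000 input tokens
corpus_tokens = 100_000    # long context: the entire corpus, every query
retrieved_tokens = 2_000   # RAG: only the top-k relevant passages

long_context_cost = corpus_tokens / 1000 * price_per_1k   # $1.00 / query
rag_cost = retrieved_tokens / 1000 * price_per_1k         # $0.02 / query

print(f"long context: ${long_context_cost:.2f}/query, RAG: ${rag_cost:.2f}/query")
```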
Summary
In summary, long context windows improve LLM performance by allowing the model to process and retain more internal context without external retrieval. In contrast, RAG is an algorithmic retrieval technique that enhances LLMs by fetching relevant information from external sources. While long context windows cannot replicate the exact functionality of RAG, they can be used in conjunction with RAG to create a more powerful system. This combination allows the model to leverage the strengths of both approaches: the ability to process extensive internal context and the efficiency of selective external information retrieval.