ReAct pattern: why LLM agents fail after 10 steps, and how to fix context overflow

ReAct (Reasoning + Acting) lets LLMs loop through thoughts and tool calls until they reach an answer. The pattern works until context windows fill up, attention dilutes, and agents start hallucinating. Here's what breaks and what works.

What ReAct Actually Does

ReAct agents loop: the LLM generates a thought, calls a tool (search, calculator, API), observes the result, then thinks again. This interleaving of reasoning and action, introduced in a 2022 paper by Yao et al., outperformed pure chain-of-thought on the HotpotQA and FEVER benchmarks by grounding answers in external data instead of pure model prediction.

The mechanics are straightforward. The system watches for formatted output like Action: search["weather Singapore"], intercepts it, runs the actual tool, and injects the result as Observation: 32°C, sunny. The LLM never executes anything. It just writes text that looks like a tool call. The scaffolding around it does the rest.
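
Here's a minimal sketch of that interception layer, assuming the Action: tool["argument"] format shown above. The tool table, regex, and dispatch helper are illustrative, not any particular framework's API:

    import re
    from typing import Callable, Optional

    # Stand-in tools; swap in real search, calculator, or API calls.
    TOOLS: dict[str, Callable[[str], str]] = {
        "search": lambda query: "32°C, sunny",
    }

    ACTION_RE = re.compile(r'Action:\s*(\w+)\["(.*?)"\]')

    def dispatch(llm_output: str) -> Optional[str]:
        """If the model wrote an Action line, run the tool and return an Observation line."""
        match = ACTION_RE.search(llm_output)
        if match is None:
            return None                          # no tool call in this turn
        tool_name, argument = match.groups()
        result = TOOLS[tool_name](argument)      # the scaffolding, not the LLM, executes the tool
        return f"Observation: {result}"

Feeding dispatch('Action: search["weather Singapore"]') returns 'Observation: 32°C, sunny', which gets appended to the transcript before the next model call.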

Frameworks like LangGraph and LangChain have standardized implementations. Enterprise teams use ReAct for agentic automation because the explicit reasoning traces are debuggable, a significant improvement over silent failures.
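
For comparison, a hedged example using LangGraph's prebuilt helper; exact signatures and model identifiers vary by version, so treat this as a sketch and check the current docs:

    from langchain_core.tools import tool
    from langgraph.prebuilt import create_react_agent

    @tool
    def search(query: str) -> str:
        """Look up current information for a query."""
        return "32°C, sunny"   # placeholder; wire a real search backend in here

    # The model string is illustrative; pass whatever chat model your stack uses.
    agent = create_react_agent("anthropic:claude-sonnet-4-5", tools=[search])

    result = agent.invoke(
        {"messages": [{"role": "user", "content": "What's the weather in Singapore?"}]}
    )
    print(result["messages"][-1].content)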

Where It Breaks

Two failure modes matter. First: infinite loops. The agent searches, rephrases, searches again without converging. The fix is simple: an iteration cap, sketched below.
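
A sketch of that cap, reusing the dispatch() helper from the earlier snippet and a hypothetical llm callable that wraps your model client:

    from typing import Callable

    def run_agent(llm: Callable[[str], str], prompt: str, max_iterations: int = 15) -> str:
        """Run the thought/action/observation loop, but stop after max_iterations."""
        transcript = prompt
        for _ in range(max_iterations):
            output = llm(transcript)           # hypothetical model call: transcript in, next text out
            transcript += "\n" + output
            if "Final Answer:" in output:      # the model signals it is done
                return transcript
            observation = dispatch(output)     # run the tool, if the model asked for one
            if observation:
                transcript += "\n" + observation
        return transcript + "\nStopped: hit max_iterations without a final answer."

The loop ends either when the model emits a final answer or when the iteration budget runs out, whichever comes first.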

Second: context overflow. Every thought, action, and observation appends to the conversation. After 10-15 steps, token counts explode. But performance degrades before you hit the limit.

The attention mechanism spreads weight across every token in the context, so the more tokens there are, the thinner each one's slice of attention gets. Add the "lost in the middle" effect (models disproportionately attend to the start and end of the context) and you get agents that hallucinate because they can no longer focus on the reasoning from step 5 of a 15-step chain.
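
One cheap diagnostic is to measure the transcript after every step. A sketch using tiktoken as an approximate tokenizer (an assumption here; use your model's own tokenizer if it exposes one):

    import tiktoken

    # cl100k_base is a rough proxy; token counts differ across model families.
    enc = tiktoken.get_encoding("cl100k_base")

    def log_context_size(step: int, transcript: str) -> int:
        """Log how many tokens the growing transcript occupies after each ReAct step."""
        n_tokens = len(enc.encode(transcript))
        print(f"step {step}: {n_tokens} tokens in context")
        return n_tokens

Watching that number per step makes the blow-up visible long before the hard context limit, which is where the compression strategies below come in.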

What Works

Production systems compress context between steps. Recent exchanges stay verbatim; older ones get summarized. LangChain's ConversationSummaryBufferMemory does exactly this: a sliding window of recent messages plus a running summary of everything older.
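
The idea behind that memory class, as a standalone sketch; summarize is a hypothetical LLM call that condenses older turns, and the message format is simplified:

    from typing import Callable

    def compress_context(messages: list[dict], summarize: Callable[[list[dict]], str],
                         keep_last: int = 6) -> list[dict]:
        """Keep the most recent exchanges verbatim; fold everything older into one summary."""
        if len(messages) <= keep_last:
            return messages
        older, recent = messages[:-keep_last], messages[-keep_last:]
        summary = summarize(older)   # hypothetical: an LLM call that summarizes the dropped turns
        return [{"role": "system", "content": f"Summary of earlier steps: {summary}"}] + recent

ConversationSummaryBufferMemory works the same way, except it triggers on a max_token_limit rather than a message count.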

The tradeoff: lossy compression. The summarizer might drop a detail that becomes critical ten steps later. There's no perfect solution. You're trading context freshness against information retention.

The explicit thought step matters because chain-of-thought buys extra computation. Without it, the model has to jump straight from observation to action in effectively a single pass. With it, every generated reasoning token becomes new context the next token can attend to. This is why ReAct uses more tokens than pure CoT but enables verifiable, grounded reasoning.

Worth noting: ReAct increases costs versus pure CoT for tasks that don't need tools. Some teams favor hybrids that add a reflection step for harder reasoning. The pattern is a starting point, not optimal for every scenario. The token budget matters: context engineering is the difference between an agent that works and one that loops until something stops it.