The brownie recipe problem
Instacart CTO Anirban Kundu has a name for his company's LLM challenge: the brownie recipe problem. It's not enough for a model to understand "I want to make brownies." The system needs to know what's in stock at the user's local market, whether they prefer organic eggs, what substitutes work if the first choice is unavailable, and whether ice cream will melt before delivery arrives.
All of this context must be processed in under a second. Any slower and users bail.
Chunking beats context overload
Instacart's solution: split processing into stages. A large foundation model handles intent and product categorization. Then specialized small language models tackle catalog context (which products work together, what substitutes make sense) and semantic understanding (what counts as a healthy snack for an 8-year-old).
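A minimal sketch of that staged split, with the model calls stubbed out. The function names and toy catalog are illustrative, not Instacart's internals:

```python
# Illustrative staging: one big model for intent, small models downstream.

FAKE_CATALOG = {
    "eggs": [
        {"name": "organic eggs", "in_stock": True, "organic": True},
        {"name": "standard eggs", "in_stock": True, "organic": False},
    ],
    "cocoa": [{"name": "cocoa powder", "in_stock": False, "organic": False}],
}

def classify_intent(query: str) -> list[str]:
    """Stage 1: a large foundation model parses intent into needed items."""
    return ["eggs", "cocoa"]  # stubbed result for the brownie query

def catalog_context(items: list[str]) -> dict[str, list[dict]]:
    """Stage 2a: a small catalog model finds in-stock options per item."""
    return {i: [p for p in FAKE_CATALOG.get(i, []) if p["in_stock"]]
            for i in items}

def semantic_rank(options: dict[str, list[dict]], prefs: dict) -> dict:
    """Stage 2b: a small semantic model applies user preferences."""
    def score(p: dict) -> bool:
        return p["organic"] == prefs.get("organic", False)
    return {i: max(opts, key=score) if opts else None
            for i, opts in options.items()}

items = classify_intent("I want to make brownies")
picks = semantic_rank(catalog_context(items), {"organic": True})
print(picks)  # eggs -> organic eggs; cocoa -> None (needs a substitute)
```

Each stage sees only the context it needs, which is what keeps the small models fast.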
This matters because loading a user's entire purchase history into a single reasoning model creates unmanageable bloat. The chunking approach keeps each model focused and fast.
The catalog context layer handles the "over double digit" percentage of cases where a requested product isn't available locally. The system must understand substitutions at multiple levels of detail, then factor in logistics like delivery time for items that spoil quickly.
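A sketch of that substitution logic; the fields, weights, and the transit-time cutoff are invented for illustration:

```python
# Hypothetical substitution scoring: match at several levels of detail,
# then veto anything that can't survive the delivery window.

def substitute_score(candidate: dict, requested: dict, delivery_min: int) -> float:
    score = 0.0
    # Multiple detail levels: brand > product line > category.
    if candidate["brand"] == requested["brand"]:
        score += 3.0
    if candidate["product_line"] == requested["product_line"]:
        score += 2.0
    if candidate["category"] == requested["category"]:
        score += 1.0
    # Logistics: a long delivery window rules out quick-spoiling items.
    if candidate["perishable"] and delivery_min > candidate["max_transit_min"]:
        return float("-inf")  # ice cream can't survive the trip
    return score

def best_substitute(requested: dict, candidates: list[dict], delivery_min: int):
    if not candidates:
        return None
    score, best = max(
        (substitute_score(c, requested, delivery_min), c) for c in candidates
    )
    return best if score > float("-inf") else None
```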
Microagents over monoliths
Instacart is experimenting with AI agents but found that a single agent handling multiple tasks becomes unwieldy. Instead, they're deploying microagents, each focused on specific tasks like payment systems or integrations with third-party point-of-sale platforms.
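What that looks like in rough terms, assuming a simple task-keyed router; the agent names and dispatch scheme are illustrative:

```python
# Task-scoped microagents behind a minimal router. Each agent owns exactly
# one job, so no single agent accumulates unrelated responsibilities.

from typing import Callable

AGENTS: dict[str, Callable[[dict], dict]] = {}

def microagent(task: str):
    """Register a handler that owns exactly one task."""
    def register(fn: Callable[[dict], dict]):
        AGENTS[task] = fn
        return fn
    return register

@microagent("payment")
def payment_agent(request: dict) -> dict:
    # Only payment logic lives here; no catalog or POS concerns.
    return {"status": "authorized", "amount": request["amount"]}

@microagent("pos_sync")
def pos_agent(request: dict) -> dict:
    # Only third-party point-of-sale integration lives here.
    return {"status": "synced", "store": request["store_id"]}

def dispatch(task: str, request: dict) -> dict:
    return AGENTS[task](request)  # unknown tasks fail loudly, by design
```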
The company has integrated Anthropic's Model Context Protocol and Google's Universal Commerce Protocol to standardize connections between AI models and merchant systems. The Unix philosophy applies: smaller, focused tools beat monolithic systems.
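The point of these protocols is that a merchant describes a capability once, in a machine-readable manifest, and any model can consume it. A simplified, MCP-flavored tool descriptor gives the flavor; this is not the exact wire format, and the capability shown is hypothetical:

```python
# Simplified tool descriptor in the spirit of MCP (see the spec for the
# real schema). "lookup_inventory" is an invented merchant capability.

lookup_inventory = {
    "name": "lookup_inventory",
    "description": "Check stock for a product at one store.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "store_id": {"type": "string"},
            "product_id": {"type": "string"},
        },
        "required": ["store_id", "product_id"],
    },
}
```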
The real work isn't integration. It's handling failure modes and latency. Different merchant systems behave differently, update at different intervals, and have varying reliability. Kundu's team spends two-thirds of their time fixing error cases, not building features.
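A sketch of the defensive wrapper that work implies, with invented timeout and backoff numbers:

```python
import time

def call_merchant(fetch, timeout_s: float, retries: int = 2):
    """Call an unreliable merchant endpoint, retrying with capped backoff."""
    for attempt in range(retries + 1):
        try:
            return fetch(timeout=timeout_s)
        except (TimeoutError, ConnectionError):
            time.sleep(min(0.05 * 2 ** attempt, 0.2))  # capped backoff
    return None  # caller falls back to cached data: stale beats late
```

The interesting choice is the last line: in a sub-second system, returning stale cached data on time beats returning fresh data late.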
What this means in practice
For enterprise teams building real-time AI systems, Instacart's architecture offers a blueprint: use foundation models for intent, specialized SLMs for domain context, and microagents for integration. The trade-off between context richness and response time is real. Loading more context improves accuracy but kills latency.
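One way to picture that trade-off, with invented numbers: rank context by relevance and trim until it fits the budget, so the model only sees what pays its way.

```python
# Invented example: purchase-history chunks scored against the current
# query, trimmed to a budget (token counts approximated by word counts).

def trim_context(chunks: list[tuple[float, str]], max_tokens: int) -> list[str]:
    """Keep the most relevant chunks that fit the budget; drop the rest."""
    kept, used = [], 0
    for score, text in sorted(chunks, reverse=True):
        cost = len(text.split())
        if used + cost > max_tokens:
            continue
        kept.append(text)
        used += cost
    return kept

history = [(0.9, "bought organic eggs weekly"),
           (0.2, "bought charcoal in June"),
           (0.7, "prefers dark chocolate")]
print(trim_context(history, max_tokens=8))  # charcoal doesn't make the cut
```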
The brownie recipe problem isn't unique to grocery delivery. Any system juggling personalization, real-time inventory, and sub-second responses faces similar constraints. History suggests the answer isn't bigger models with infinite context windows. It's better chunking strategies and focused agents that know their lanes.