Andrej Karpathy · YC AI Startup School · Jun 2025
Agentic Engineering
Agentic engineering designs the agentic loops where models plan, act, observe, and refine — and engineers the context that steers every turn of the loop.
Andrej Karpathy
“In industrial-strength LLM apps, filling the context window is a delicate art and science.”
LLM = CPU
Context Window = RAM
LLMs are a new kind of operating system. The context window is the model's working memory, where every token must be carefully placed.
- Science: Task descriptions, few-shot examples, RAG, multimodal data, tools, state, and history
- Art: Guiding intuition around LLM psychology and understanding human spirits
- Balance: Too little context = poor performance; too much = high cost and performance degradation
Why agentic engineering, now
Pioneered by Tobi Lütke and Andrej Karpathy, the work moves from one-shot prompts to engineering the loops that let agents run real, industrial-strength tasks end to end.
Previously: Short task descriptions → Thick software layer: complex systems coordinating LLM calls
Previously: One-shot instructions → Closed agentic loops: plan → act → observe → refine on every step
Previously: Static, unchanging information → System prompt learning: LLMs learning by taking their own notes
On the “ChatGPT wrapper” framing, Karpathy is blunt: “This term is tired and really, really wrong.”
Software in the age of AI
The new software paradigm Karpathy defined at YC AI Startup School, 2025.
Classical programming
- Explicit instructions
- Deterministic behavior
- Human-written code
Neural network era
- Data-driven
- Learned behaviors
- Weight optimization
Agentic engineering era
- Agentic loops
- Multi-agent systems
- Dynamic adaptation
The “jagged intelligence” aside
LLMs can solve complex math problems yet fail at simple tasks. They are strong at complex reasoning, creative problem solving, and language understanding, but weak at simple arithmetic, prone to context drift, and often inconsistent. That profile is exactly why the loop matters: observe, verify, and correct before the next step.
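The observe, verify, and correct cycle can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `run_loop` and its `plan`, `act`, and `verify` callables are hypothetical names, and the "model" here is a toy that answers wrongly once before getting it right.

```python
from typing import Callable, Optional

def run_loop(plan: Callable[[str, list[str]], str],
             act: Callable[[str], str],
             verify: Callable[[str], bool],
             goal: str,
             max_steps: int = 5) -> tuple[Optional[str], list[str]]:
    """Plan -> act -> observe -> refine until verify() accepts or steps run out."""
    observations: list[str] = []
    for _ in range(max_steps):
        step = plan(goal, observations)   # plan with everything observed so far
        result = act(step)                # act
        observations.append(result)       # observe
        if verify(result):                # verify before the next step
            return result, observations
    return None, observations             # give up rather than return unverified output

# Toy stand-ins (assumptions): the "model" guesses, the verifier checks exactly.
guesses = iter(["41", "42"])
result, seen = run_loop(
    plan=lambda goal, obs: goal if not obs else f"{goal} (rejected so far: {obs})",
    act=lambda step: next(guesses),
    verify=lambda out: int(out) == 6 * 7,
    goal="What is 6 * 7?",
)
```

The verifier is deliberately not the model: an exact check catches the kind of simple-arithmetic slip the model itself may not notice.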
Filling the context window effectively. Distribute the token budget, then assemble it deliberately for every request.
Context budget
- System prompt: 10–20%
- Examples: 20–30%
- RAG content: 30–40%
- History: 10–20%
- Buffer: 10%
Context Window Planning
Strategically distribute your token budget
System prompt (10–20%), Examples (20–30%), RAG content (30–40%), History (10–20%), Buffer (10%)
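As a rough sketch, the split can be turned into concrete token counts. The percentages below are the midpoints of the ranges above (an assumption for illustration; they sum to 100), and `allocate_budget` is a hypothetical helper, not part of any library.

```python
# Midpoints of the ranges above, in whole percent (assumption for this sketch).
BUDGET_PERCENT = {
    "system_prompt": 15,
    "examples": 25,
    "rag_content": 35,
    "history": 15,
    "buffer": 10,
}

def allocate_budget(context_window: int) -> dict[str, int]:
    """Split a token budget across sections; rounding leftovers go to the buffer."""
    allocation = {name: context_window * pct // 100
                  for name, pct in BUDGET_PERCENT.items()}
    allocation["buffer"] += context_window - sum(allocation.values())
    return allocation

budget = allocate_budget(8000)
```

Integer arithmetic keeps the totals exact, so the sections always add up to the full window.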
Dynamic Context Assembly
Create custom context for each request
Task analysis → Relevant retrieval → Priority sorting → Token optimization → Context injection
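The five stages can be sketched as one small function. Everything here is a simplification under stated assumptions: keyword overlap stands in for real task analysis and retrieval, whitespace word counts stand in for a tokenizer, and `assemble_context` is a hypothetical name.

```python
def count_tokens(text: str) -> int:
    # Crude proxy: whitespace-separated words. A real system would use the model's tokenizer.
    return len(text.split())

def words(text: str) -> set[str]:
    return {w.lower().strip("?.,!") for w in text.split()}

def assemble_context(request: str, documents: list[str], token_limit: int) -> str:
    keywords = words(request)                               # task analysis
    scored = []
    for doc in documents:                                   # relevant retrieval
        overlap = len(keywords & words(doc))
        if overlap:
            scored.append((overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)     # priority sorting
    selected, used = [], 0
    for _, doc in scored:                                   # token optimization
        cost = count_tokens(doc)
        if used + cost <= token_limit:
            selected.append(doc)
            used += cost
    return "\n".join(selected + [f"Request: {request}"])    # context injection

docs = [
    "Refund policy: refunds are issued within 14 days.",
    "Shipping takes 3 to 5 business days.",
    "Our office dog is named Biscuit.",
]
context = assemble_context("How long do refunds take?", docs, token_limit=20)
```

Only the refund document shares a keyword with the request, so it is the only one injected.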
Cascading Context Strategy
Break down and chain complex tasks
Decompose large tasks into subtasks, use optimized context for each, merge results
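A minimal sketch of decompose, solve with scoped context, then merge. The `cascade` function and its four callables are hypothetical; in practice `solve` would be a model call and `decompose` might be one too.

```python
from typing import Callable

def cascade(task: str,
            decompose: Callable[[str], list[str]],
            context_for: Callable[[str, list[str]], str],
            solve: Callable[[str, str], str],
            merge: Callable[[list[str]], str]) -> str:
    results: list[str] = []
    for subtask in decompose(task):
        # Each subtask gets its own small context; earlier results are passed
        # in so later steps can chain on them if they need to.
        context = context_for(subtask, results)
        results.append(solve(subtask, context))
    return merge(results)

# Toy stand-ins (assumptions) so the sketch runs without a model.
facts = {"summarize sales": "sales=120", "summarize support": "tickets=8"}
report = cascade(
    task="summarize sales; summarize support",
    decompose=lambda t: [part.strip() for part in t.split(";")],
    context_for=lambda sub, prior: facts[sub],
    solve=lambda sub, ctx: f"{sub}: {ctx}",
    merge=lambda parts: " | ".join(parts),
)
```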
Context Decay & Refresh
Clean old information, add new
Temporal relevance scoring, sliding window approach, importance-based retention
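One way to combine the three ideas, as a sketch: score each item by importance with exponential decay over age, keep everything inside a sliding window, and retain older items only if they still score above a threshold. The half-life, the threshold, and the `importance` field are assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class MemoryItem:
    text: str
    importance: float  # 0..1, assigned by the application (assumption)
    age_turns: int     # turns since the item entered the context

def relevance(item: MemoryItem, half_life: float = 4.0) -> float:
    """Temporal relevance: importance halves every `half_life` turns."""
    return item.importance * 0.5 ** (item.age_turns / half_life)

def refresh(items: list[MemoryItem], window: int, keep_above: float) -> list[MemoryItem]:
    """Keep recent items (sliding window) plus older ones that still score high."""
    return [i for i in items
            if i.age_turns < window or relevance(i) >= keep_above]

items = [
    MemoryItem("user is allergic to peanuts", importance=1.0, age_turns=8),
    MemoryItem("small talk about the weather", importance=0.2, age_turns=8),
    MemoryItem("current question", importance=0.5, age_turns=0),
]
kept = refresh(items, window=3, keep_above=0.2)
```

The allergy note is old but important enough to survive; the small talk of the same age is dropped.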
Multi-Agent Orchestration
Specialized agents with different contexts
Each agent has its own context, coordinator agent management, shared memory systems
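A bare-bones sketch of the pattern: each agent holds a private context, and a coordinator routes tasks and collects results in shared memory. The `Agent` and `Coordinator` classes are hypothetical, and the handlers are plain functions standing in for model calls.

```python
from typing import Callable

class Agent:
    def __init__(self, name: str, context: str, handler: Callable[[str, str], str]):
        self.name, self.context, self.handler = name, context, handler

    def run(self, task: str) -> str:
        # The agent sees only its own context plus the task it was handed.
        return self.handler(self.context, task)

class Coordinator:
    def __init__(self, agents: dict[str, Agent]):
        self.agents = agents
        self.shared_memory: dict[str, str] = {}  # latest result per agent

    def dispatch(self, assignments: list[tuple[str, str]]) -> dict[str, str]:
        for agent_name, task in assignments:
            self.shared_memory[agent_name] = self.agents[agent_name].run(task)
        return self.shared_memory

coordinator = Coordinator({
    "researcher": Agent("researcher", "docs index",
                        lambda ctx, task: f"[{ctx}] notes on {task}"),
    "writer": Agent("writer", "style guide",
                    lambda ctx, task: f"[{ctx}] draft of {task}"),
})
memory = coordinator.dispatch([("researcher", "pricing"), ("writer", "pricing summary")])
```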
The building blocks the loop reaches for at each step — retrieval, memory, tools, and compaction.
RAG (Retrieval-Augmented Generation)
Dynamic information retrieval enables LLMs to access current and accurate information through vector databases and semantic search.
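The retrieve-then-prompt flow can be sketched without any external service. Note the swap: bag-of-words counts and cosine similarity stand in here for a real embedding model and vector database, and all function names are hypothetical.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for an embedding model: word counts instead of dense vectors.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    q = embed(query)
    return sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    context = "\n".join(retrieve(query, corpus))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

corpus = [
    "store hours are 9 to 5 on weekdays",
    "returns are accepted within 30 days",
    "gift cards never expire",
]
top = retrieve("when are returns accepted", corpus)
```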
State & History Management
Intelligent management of conversation history, user preferences, and application state. Critical for efficient context window usage.
Few-Shot Examples
Carefully selected examples for the task. Ensures LLMs produce output in the desired format and quality.
Tool Use & Function Calling
LLM interaction with external systems. Required for API calls, database queries, and computations.
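A sketch of the dispatch side of function calling. The JSON shape (`{"tool": ..., "arguments": ...}`) is an assumption made for this example, not any vendor's wire format, and `tool` / `execute_tool_call` are hypothetical helpers.

```python
import json
from typing import Any, Callable

TOOLS: dict[str, Callable[..., Any]] = {}

def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so the loop can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def add(a: float, b: float) -> float:
    return a + b

def execute_tool_call(raw: str) -> str:
    """Parse a JSON tool call, run it, and return a JSON result for the next turn."""
    try:
        call = json.loads(raw)
        result = TOOLS[call["tool"]](**call["arguments"])
        return json.dumps({"ok": True, "result": result})
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        # Errors go back into context too, so the model can correct its call.
        return json.dumps({"ok": False, "error": repr(exc)})

reply = execute_tool_call('{"tool": "add", "arguments": {"a": 2, "b": 3}}')
```

Returning failures as data rather than raising keeps the loop running: the error message becomes an observation the model can act on.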
Multimodal Context
Combining text, images, audio, and other data types. Critical for rich context creation.
Context Compaction
Maximum information density without exceeding token limits. Summarization, filtering, and prioritization techniques.
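A sketch of the filtering and prioritization half of compaction; summarization would need a model call and is left out. The `(priority, text)` input shape, the word-count token proxy, and the sample chunks are all assumptions for illustration.

```python
def compact(chunks: list[tuple[int, str]], token_limit: int) -> list[str]:
    """chunks are (priority, text); higher priority wins. Output keeps original order."""
    seen: set[str] = set()
    unique = []
    for index, (priority, text) in enumerate(chunks):        # filtering
        key = " ".join(text.lower().split())                 # ignore case and spacing
        if key not in seen:
            seen.add(key)
            unique.append((priority, index, text))
    kept, used = [], 0
    for priority, index, text in sorted(unique, key=lambda c: -c[0]):  # prioritization
        cost = len(text.split())                             # crude token proxy
        if used + cost <= token_limit:
            kept.append((index, text))
            used += cost
    return [text for _, text in sorted(kept)]                # restore original order

chunks = [
    (1, "User said hello."),
    (3, "Order 1182 is delayed."),
    (3, "order 1182 is delayed."),
    (2, "User prefers email contact."),
]
compacted = compact(chunks, token_limit=8)
```

The duplicate is removed first, then the low-priority greeting is dropped because the budget is already spent.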
Watch what the step produced, name the failure mode, and apply the engineering response before the next turn.
The loop's output: systems that improve as they run, with autonomy you can dial up over time.
Code Generation Systems
Systems like GitHub Copilot and Cursor use agentic engineering to understand entire codebases and generate consistent code.
- Understanding project structure
- Maintaining code style
- Import and dependency management
Enterprise AI Assistants
Corporate AI assistants use agentic engineering to understand organizational knowledge and processes.
Autonomous Agents
Systems like AutoGPT run long-horizon agentic loops to execute multi-step tasks independently.
Educational Systems
Personalized learning platforms use student context to provide adaptive learning experiences.
Autonomy
A continuous spectrum, not a switch. Users dial the autonomy level of a system from full manual control to fully autonomous operation.
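One possible way to model the dial, as a sketch: treat autonomy as a number from 0.0 (full manual control) to 1.0 (fully autonomous) and ask for human sign-off whenever an action's risk reaches the current setting. The risk score and the `requires_human` function are hypothetical; how risk is estimated is left open.

```python
def requires_human(autonomy: float, risk: float) -> bool:
    """Return True if a human must approve an action at this autonomy setting.

    autonomy: 0.0 = full manual control, 1.0 = fully autonomous.
    risk:     0.0 = harmless, 1.0 = highest risk (scoring is an assumption).
    """
    if not (0.0 <= autonomy <= 1.0 and 0.0 <= risk <= 1.0):
        raise ValueError("autonomy and risk must both be in [0, 1]")
    if autonomy == 1.0:
        return False            # fully autonomous: never ask
    return risk >= autonomy     # at 0.0 every action needs approval

checks = [requires_human(0.5, 0.2), requires_human(0.5, 0.8)]
```

Raising the setting widens the band of actions the agent may take alone, without changing any other code.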
Self-improving systems
System prompt learning lets LLMs learn from their own experience. Each interaction becomes a data point that improves the system's context strategy, much as humans take notes and learn from them.
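A minimal sketch of the note-taking idea: lessons from past interactions are stored as plain text and appended to the system prompt on the next run. The `NotebookPrompt` class is hypothetical, and deciding what counts as a lesson (a model call or a human review) is out of scope here.

```python
class NotebookPrompt:
    def __init__(self, base_prompt: str, max_notes: int = 20):
        self.base_prompt = base_prompt
        self.max_notes = max_notes
        self.notes: list[str] = []

    def record(self, lesson: str) -> None:
        """Store a lesson, skipping duplicates and keeping only the newest notes."""
        if lesson not in self.notes:
            self.notes.append(lesson)
            self.notes = self.notes[-self.max_notes:]

    def render(self) -> str:
        """Build the system prompt for the next interaction."""
        if not self.notes:
            return self.base_prompt
        lines = "\n".join(f"- {note}" for note in self.notes)
        return f"{self.base_prompt}\n\nNotes from earlier sessions:\n{lines}"

prompt = NotebookPrompt("You are a support agent.")
prompt.record("Count letters one at a time before answering.")
prompt.record("Count letters one at a time before answering.")  # duplicate, ignored
rendered = prompt.render()
```

The cap on notes matters: an unbounded notebook would eventually crowd out the rest of the context budget.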
AI-native architecture
Systems designed from the ground up for agents. Human interfaces become secondary as API- and context-first approaches take priority: build for agents, adapt for humans.
What the loop changes
No vanity metrics — just the mechanisms that move the work.
- Grounding
- RAG and validation gates reduce hallucination.
- Compaction
- Summarization and prioritization tighten token budgets.
- Closed loops
- Observe-and-correct steps enable autonomous, long-horizon throughput.
In practice: Shopify
At Shopify, led by CEO Tobi Lütke, Shopify Magic and Sidekick apply agentic engineering principles to support millions of merchants: they ground responses in store context and product catalogs, analyze merchant behavior history, and inject e-commerce best practices into the loop.