The First Karpathy Loop for Production Coding Agents

Source: DEV Community
Karpathy showed what happens when you let an AI agent run 700 experiments overnight. The model proposes hypotheses, runs them, scores results, keeps what works, throws away what doesn't. Repeat. The part nobody talks about: how do you know which experiments actually mattered?

I've been building with AI coding agents for months: Claude Code, Codex, Gemini CLI. The pattern is always the same. You give an agent a task, it runs, it produces output. Sometimes the output is good. Sometimes it's not. You squint at logs, compare diffs, make a judgment call. Move on. That loop works fine for single tasks. It breaks completely when you want the agent to iterate on its own work.

The Problem

Say you want an agent to optimize a function. Or fix a flaky test. Or refactor a module until it passes a quality gate. Without loops, you're doing this manually: run the agent, check the output, run it again with different instructions, check again, copy-paste the good parts. This is not what "autonomous" means.
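The propose-run-score-keep cycle described above can be sketched in a few lines. This is a minimal illustration, not anything from Karpathy's actual setup: the `propose` and `score` functions here are hypothetical stand-ins (a random perturbation and a toy fitness function) for whatever the agent would really do.

```python
import random

def propose(best_params):
    """Hypothetical mutation step: perturb the current best parameters."""
    return {k: v + random.uniform(-0.1, 0.1) for k, v in best_params.items()}

def score(params):
    """Hypothetical fitness function: higher is better (closer to zero)."""
    return -sum(v * v for v in params.values())

def experiment_loop(n_experiments=700):
    best = {"x": 1.0, "y": -1.0}
    best_score = score(best)
    for _ in range(n_experiments):
        candidate = propose(best)   # propose a hypothesis
        s = score(candidate)        # run it and score the result
        if s > best_score:          # keep what works
            best, best_score = candidate, s
        # everything else is thrown away
    return best, best_score
```

The structure is the whole point: a scoring function turns "squint at logs and make a judgment call" into a mechanical comparison, which is what lets the loop run 700 times overnight without a human in it.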