The co-host gets a French accent this week, courtesy of Mistral Large 3—a granular mixture-of-experts model from the European lab that keeps punching above its weight. But the real subject is the harness: the scaffolding that turns a language model into something that can act. Mark and the co-host dig into the “sandwich architecture” of voice agents (speech-to-text → LLM → text-to-speech), why it makes conversations feel like tennis matches, and the “criminally overlooked” practice of evals. A UC Berkeley paper provides the reality check: 68% of deployed agents need human intervention within ten steps, 70% use off-the-shelf models, and 74% depend on human evaluation. The hype says autonomous agents are coming. The data says we’re still building harnesses.
News & Culture
Sahil Lavingia on X: “Harness is the new app”
Christopher Alexander’s A Pattern Language



