/wunsch/log
Human in the Loop
S01E03: Harness

What turns a model into something useful

The co-host gets a French accent this week, courtesy of Mistral Large 3—a granular mixture-of-experts model from the European lab that keeps punching above its weight. But the real subject is the harness: the scaffolding that turns a language model into something that can act. Mark and the co-host dig into the “sandwich architecture” of voice agents (speech-to-text → LLM → text-to-speech), why it makes conversations feel like tennis matches, and the “criminally overlooked” practice of evals. A UC Berkeley paper provides the reality check: 68% of deployed agents need human intervention within ten steps, 70% use off-the-shelf models, and 74% depend on human evaluation. The hype says autonomous agents are coming. The data says we’re still building harnesses.
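For readers who want to see the shape of the "sandwich architecture," here is a minimal Python sketch. It is an illustration rather than the show's actual setup: the `VoiceAgent` class and the `fake_stt`, `fake_llm`, and `fake_tts` stubs are hypothetical names, and real speech and model services would stand in for them. The sketch assumes each stage waits for the previous one to finish, which is one plausible reason the conversation ends up feeling like a tennis match.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stage signatures; real STT, LLM, and TTS services would replace the stubs below.
SpeechToText = Callable[[bytes], str]
LanguageModel = Callable[[List[str]], str]
TextToSpeech = Callable[[str], bytes]


@dataclass
class VoiceAgent:
    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech
    history: List[str] = field(default_factory=list)

    def take_turn(self, audio_in: bytes) -> bytes:
        """Run one full turn: each stage blocks until the previous one is done."""
        user_text = self.stt(audio_in)           # bread: speech-to-text
        self.history.append(f"user: {user_text}")
        reply_text = self.llm(self.history)      # filling: the language model
        self.history.append(f"agent: {reply_text}")
        return self.tts(reply_text)              # bread: text-to-speech


# Stub stages so the sketch runs without any external service.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(history: List[str]) -> str:
    return f"You said {len(history)} thing(s) so far."


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


agent = VoiceAgent(stt=fake_stt, llm=fake_llm, tts=fake_tts)
first = agent.take_turn(b"bonjour")
second = agent.take_turn(b"what is a harness?")
```

Because `take_turn` returns only after all three stages have completed, the agent cannot start replying while the speaker is still talking. The ball crosses the net once per turn.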

News & Culture

Models, Tools, & Platforms

Concepts & Research

Human in the Loop is a weekly conversation about AI with an AI co-host.
