Agent Orchestration: When One Model, When a Crew

Summary

Multi-agent systems are seductive and often unnecessary. This piece lays out pragmatic rules for choosing between a single model with tools and an orchestrated crew.

Context

Crews are not a default

The agent-of-agents pattern is having a moment. Conference talks demo crews of specialized agents collaborating to handle complex workflows, and the demos are genuinely impressive. The pattern is also responsible for a growing number of brittle, expensive systems that a single model with well-designed tools would have solved better.

The choice between one model and a crew is not stylistic. It is a function of task structure, latency budget, and the cost of a wrong answer. Teams that default to crews because crews look modern end up paying a substantial complexity tax with no offsetting capability gain. The right starting point is one model with a tool belt. Add agents only when a named failure justifies the cost.

When a crew is the right call

Four signals argue for a crew. The work involves long-horizon planning, where branching plans need to be evaluated mid-flight rather than chosen up front. The sub-tasks require heterogeneous reasoning that benefits from genuinely different model capabilities, such as code generation alongside vision alongside structured math. The output needs independent verification by a second agent applying criteria the first agent cannot self-evaluate. The work runs long enough that shared state and explicit memory matter more than what fits in a single prompt window.

When two or more of these signals are present, the cost of a crew is justified. When only one is present, the team should look hard at whether a single model with structured prompting can close the gap.
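The two-or-more rule above can be written down as a small checklist. The following is a minimal sketch, not a prescribed tool: the class name, field names, and returned strings are illustrative assumptions, and only the four signals and the threshold come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSignals:
    """The four pro-crew signals. Field names are hypothetical labels."""
    long_horizon_planning: bool = False
    heterogeneous_reasoning: bool = False
    needs_independent_verification: bool = False
    exceeds_single_context: bool = False

    def count(self) -> int:
        # Booleans sum as 0/1, so this is the number of signals present.
        return sum((
            self.long_horizon_planning,
            self.heterogeneous_reasoning,
            self.needs_independent_verification,
            self.exceeds_single_context,
        ))


def recommend(signals: TaskSignals) -> str:
    """Apply the rule: two or more signals justify a crew."""
    n = signals.count()
    if n >= 2:
        return "crew"
    if n == 1:
        return "single model; try structured prompting first"
    return "single model"


if __name__ == "__main__":
    task = TaskSignals(long_horizon_planning=True,
                       needs_independent_verification=True)
    print(recommend(task))  # prints "crew"
```

Writing the signals down as named booleans forces the team to say which failure justifies the crew, which is the point of the rule.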

When a crew is the wrong call

Four signals argue against a crew. Sub-second latency is required, and every agent hop is a round-trip the user will feel. The workflow is linear, in which case a plain pipeline is the right architecture and a crew adds coordination the work does not need. The budget is token-sensitive, and three agents at 20,000 tokens each (60,000 in total) cost twice as much as one model at 30,000. Operational ownership is unclear, because crews multiply failure modes and an on-call owner per agent is rarely sustainable.
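The token and latency arguments above are simple arithmetic, and it helps to run the numbers before committing to a design. The sketch below uses the token figures from the text; the 400 ms per-hop latency and the 1,000 ms budget are made-up values for illustration, and the function names are hypothetical. It also assumes hops run sequentially, which is the worst case for latency.

```python
def crew_tokens(agents: int, tokens_per_agent: int) -> int:
    """Total tokens when every agent consumes roughly the same amount."""
    return agents * tokens_per_agent


def sequential_latency_ms(hops: int, per_hop_ms: float) -> float:
    """End-to-end latency when each hop waits for the previous one."""
    return hops * per_hop_ms


# Token comparison, using the figures from the text.
crew_total = crew_tokens(agents=3, tokens_per_agent=20_000)   # 60_000
single_total = 30_000
cost_ratio = crew_total / single_total                        # 2.0

# Latency comparison, using assumed (hypothetical) numbers.
LATENCY_BUDGET_MS = 1_000          # "sub-second" requirement
crew_latency = sequential_latency_ms(hops=3, per_hop_ms=400)  # 1200.0
fits_budget = crew_latency < LATENCY_BUDGET_MS                # False

if __name__ == "__main__":
    print(f"crew uses {cost_ratio:.1f}x the tokens of the single model")
    print(f"crew latency {crew_latency:.0f} ms, within budget: {fits_budget}")
```

Real per-hop latency and token counts vary by model and prompt, so these constants should be replaced with measured values.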

How to build a crew that does not drift

Crews work when role boundaries are sharp, communication is structured, and one agent owns the final answer. The failure mode is the loose collaborate-freely pattern, where agents talk to each other in natural language and the trace becomes uninterpretable after twenty steps. Use named roles, schema-typed messages between agents, and an explicit terminating agent that produces the user-facing output. Instrument every hop with token count, latency, and override rate. Production crews look more like a small distributed system than a conversation.
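The structure described above (named roles, schema-typed messages, one terminating agent, per-hop instrumentation) can be sketched in plain Python. This is an illustration under stated assumptions, not a reference implementation: every name here (Role, Message, HopRecord, run_crew) is hypothetical, the agents are stubs standing in for model calls, and token counts are self-reported by the stubs. Override rate is left out because measuring it depends on how overrides are defined in a given system.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Role(Enum):
    USER = "user"
    PLANNER = "planner"
    WORKER = "worker"
    FINALIZER = "finalizer"  # the only role that produces user-facing output


@dataclass(frozen=True)
class Message:
    """Schema-typed message: agents exchange these, not free-form text."""
    sender: Role
    recipient: Role
    kind: str
    payload: dict
    tokens: int  # tokens the sending agent reports having used


@dataclass
class HopRecord:
    agent: Role
    tokens: int
    latency_ms: float


@dataclass
class Trace:
    hops: list = field(default_factory=list)

    def total_tokens(self) -> int:
        return sum(h.tokens for h in self.hops)


Agent = Callable[[Message], Message]


def run_crew(agents: dict, first: Message, max_hops: int = 20):
    """Route messages until the finalizer replies; record every hop."""
    trace = Trace()
    msg = first
    for _ in range(max_hops):
        start = time.perf_counter()
        reply = agents[msg.recipient](msg)
        latency_ms = (time.perf_counter() - start) * 1000
        if reply.sender is not msg.recipient:
            raise ValueError("agent replied under the wrong role")
        trace.hops.append(HopRecord(reply.sender, reply.tokens, latency_ms))
        if reply.sender is Role.FINALIZER:
            return reply.payload, trace  # explicit termination point
        msg = reply
    raise RuntimeError("crew did not terminate within max_hops")


# Stub agents standing in for real model calls.
def planner(msg: Message) -> Message:
    return Message(Role.PLANNER, Role.WORKER, "plan", {"steps": ["lookup"]}, 120)


def worker(msg: Message) -> Message:
    return Message(Role.WORKER, Role.FINALIZER, "result", {"found": 42}, 300)


def finalizer(msg: Message) -> Message:
    text = f"found {msg.payload['found']}"
    return Message(Role.FINALIZER, Role.USER, "answer", {"text": text}, 80)


CREW = {Role.PLANNER: planner, Role.WORKER: worker, Role.FINALIZER: finalizer}
REQUEST = Message(Role.USER, Role.PLANNER, "request", {"q": "example"}, 0)

if __name__ == "__main__":
    answer, trace = run_crew(CREW, REQUEST)
    print(answer, len(trace.hops), trace.total_tokens())
```

The hop cap and the role check are the two guards that keep a trace readable: the run either ends at the finalizer or fails loudly, and no agent can speak as another.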

Closing

The right question is not whether to build a crew. The right question is whether the marginal value of one more agent exceeds the marginal cost of one more round-trip. For most enterprise workflows, it does not. Boring single-model architectures with tool belts do the work, and they do it cheaper, faster, and with audit trails the team can actually read.