Medley vs. Conductor: Which AI Agent Orchestrator Is Right for You?
An Honest Comparison for Builders Running Parallel Coding Agents
If you’re running multiple AI coding agents on a Mac and wondering whether Conductor or Medley is the better fit, you’re asking the right question — and the honest answer is: it depends on what you’re trying to do.
Conductor, built by Melty Labs, is a genuinely excellent tool for a specific job. Medley is designed for a different (and broader) job. This article breaks down both tools fairly so you can make the right call for your workflow.
What Conductor Does Well
Conductor’s core insight is simple and powerful: parallel coding agents need isolated workspaces. Its worktree-based architecture spins up separate Git worktrees for each agent session, so agents don’t step on each other’s changes. The review and merge UX is polished — you can see diffs, compare outputs, and merge the best result without leaving the app.
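To make the worktree idea concrete, here is a minimal sketch of per-agent isolation using the standard `git worktree add` command. It is not Conductor's actual implementation: the `agent/<name>` branch naming, the directory layout, and both function names are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def worktree_command(agent: str, base_dir: Path, base_ref: str = "HEAD") -> list[str]:
    """Build the `git worktree add` command for one agent session."""
    branch = f"agent/{agent}"  # hypothetical branch naming scheme
    path = base_dir / agent    # one checkout directory per agent
    return ["git", "worktree", "add", "-b", branch, str(path), base_ref]


def create_worktrees(repo: Path, agents: list[str], base_dir: Path) -> None:
    """Create one isolated worktree per agent inside an existing repository."""
    for agent in agents:
        # Each worktree is a separate checkout on its own branch, so edits
        # made by one agent never touch another agent's files.
        subprocess.run(worktree_command(agent, base_dir), cwd=repo, check=True)
```

Calling `create_worktrees(Path("my-repo"), ["agent-1", "agent-2"], Path("/tmp/worktrees"))` would give each agent its own branch and directory, which is what makes a side-by-side diff and merge possible afterwards.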
For a solo developer or small team running pure coding tasks — “write this feature,” “fix this bug,” “refactor this module” — Conductor delivers a clean, Mac-native experience that gets out of your way.
Where Conductor shines:
- Parallel coding agents with true worktree isolation
- Clean diff/review/merge UX for comparing agent outputs
- Mac-native polish and performance
- Low setup friction for pure coding workflows
If your entire workflow lives inside a single codebase and you want to run several coding agents in parallel and pick the best result, Conductor is a strong choice.
Where Conductor Stops
Conductor’s strength is also its boundary. The tool is built around the assumption that you are the planner. You decide what tasks to run, which agents to assign them to, and how to sequence the work. Conductor executes in parallel — it doesn’t decompose.
That’s fine when your work is a list of discrete coding tasks. It becomes a bottleneck when:
- Your mission spans more than code: you need research, a go-to-market (GTM) doc, a changelog, and a deployment, not just a diff
- You’re running agents across multiple projects simultaneously and need one place to see what needs your attention
- You want the system to learn from your past approvals so you’re not reviewing the same class of decision every week
- You’re spending real money on model calls and have no visibility into cost per outcome
Conductor also doesn’t route across models. If you want to use Claude Code for one sub-task and Codex for another based on cost or capability, you’re doing that manually. There’s no memory layer that compounds your approvals into autonomy over time.
What Medley Does Differently
Medley’s starting point is the mission, not the task list. You describe what you’re trying to accomplish — “launch the v2 feature, update the docs, and draft the announcement post” — and Medley decomposes that into a visible, editable DAG (directed acyclic graph) of sub-tasks. Each sub-task gets routed to the right agent (Claude Code, Codex, others) with the right model tier, context, and tools.
This decomposition-first approach changes the nature of the work you’re doing. Instead of managing sessions, you’re managing outcomes.
Mission Decomposition: The Core Difference
The biggest architectural difference between Medley and Conductor is who does the decomposition.
In Conductor, you do it. You decide what to run in parallel, what to sequence, what context each agent needs. That’s fine for experienced developers with a clear task list — but it’s cognitive overhead that compounds across projects.
In Medley, the system does it. You describe the mission; Medley produces the DAG. You can edit it, but you don’t have to build it from scratch. For complex, multi-domain missions — the kind that span code, content, and coordination — this is a meaningful difference.
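For readers who have not worked with DAGs, the sketch below shows what a decomposed mission might look like as data, and how an orchestrator could derive which sub-tasks can run in parallel. It uses Python's standard `graphlib` module. The sub-task names and the `execution_waves` helper are hypothetical, and nothing here reflects Medley's internal format.

```python
from graphlib import TopologicalSorter

# Hypothetical decomposition of the example mission. Each key maps to the
# set of sub-tasks that must finish before it can start.
mission = {
    "implement_v2_feature": set(),
    "update_docs": {"implement_v2_feature"},
    "draft_announcement": {"implement_v2_feature"},
}


def execution_waves(dag: dict[str, set[str]]) -> list[list[str]]:
    """Group sub-tasks into waves; tasks in the same wave can run in parallel."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(execution_waves(mission))
# [['implement_v2_feature'], ['draft_announcement', 'update_docs']]
```

Editing the DAG amounts to editing the dependency sets: making the announcement wait for the docs, for example, would turn the second wave into two sequential ones.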
The Attention Queue
One of Medley’s most underrated features is its cross-project attention queue. Instead of context-switching between four terminal tabs or three Conductor sessions to check what needs your input, Medley surfaces one prioritized queue of decisions that actually require a human. Everything else keeps moving.
For solo founders and small teams running multiple projects in parallel, this is the difference between feeling in control and feeling like you’re always catching up.
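One way to picture a cross-project attention queue is as a single priority queue that every project pushes its pending decisions into. The sketch below is an illustration under that assumption: the `Decision` fields, the urgency scale, and the `AttentionQueue` class are all hypothetical, not Medley's API.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    project: str
    question: str
    urgency: int  # hypothetical scale: lower number means more urgent


class AttentionQueue:
    """One queue of pending human decisions shared by every project."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, Decision]] = []
        self._order = itertools.count()  # tie-breaker keeps equal urgencies FIFO

    def push(self, decision: Decision) -> None:
        heapq.heappush(self._heap, (decision.urgency, next(self._order), decision))

    def pop(self) -> Decision:
        """Return the most urgent decision, regardless of which project raised it."""
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

The point of the design is that the human pops from one place. Which project a decision came from is just a field on the item, not a separate tab to check.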
Earned Autonomy and Decision Memory
Every time you approve or reject an agent decision in Medley, the system records that judgment with its context and learns from it. In week one, you’re approving most things. By week six, the system has internalized your preferences and is auto-applying them to routine decisions, surfacing only the genuinely novel ones.
Conductor has no equivalent. Each session starts fresh. Your accumulated judgment doesn’t compound into anything.
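The article does not describe how Medley learns from approvals, so the following is only a toy model of the earned-autonomy idea: a decision category stops requiring review after a run of consecutive approvals, and a single rejection resets it. The `DecisionMemory` class, the category strings, and the threshold are all assumptions.

```python
from collections import defaultdict


class DecisionMemory:
    """Track consecutive approvals per decision category (toy model)."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold  # approvals in a row needed to earn autonomy
        self._streak: defaultdict[str, int] = defaultdict(int)

    def record(self, category: str, approved: bool) -> None:
        # A rejection resets the streak, so autonomy has to be earned again.
        self._streak[category] = self._streak[category] + 1 if approved else 0

    def needs_human(self, category: str) -> bool:
        """True while the category has not yet earned auto-approval."""
        return self._streak.get(category, 0) < self.threshold
```

A real system would presumably weigh context rather than count streaks, but the shape is the same: past judgments are stored and consulted before a new decision is put in front of a human.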
Model-Agnostic Routing
Medley routes sub-tasks across Claude Code, Codex, and whatever models come next, optimizing for cost or latency, or sending a sub-task to more than one model when multiple perspectives help. You bring your own keys (BYOK), and there’s no vendor lock-in baked into the architecture.
Conductor doesn’t route across models. It runs whatever agent you point it at.
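As a rough illustration of cost-aware routing, the sketch below picks the cheapest option that can handle each sub-task and adds up the mission's total cost. The agent names, skill labels, and cost figures are placeholders invented for the example. They are not real prices or capabilities of Claude Code, Codex, or any other product, and this is not Medley's routing logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentOption:
    name: str
    skills: frozenset[str]
    cost_per_task: float  # placeholder units, not real pricing


def route(task_kind: str, options: list[AgentOption]) -> AgentOption:
    """Pick the cheapest option whose skills cover the sub-task."""
    capable = [o for o in options if task_kind in o.skills]
    if not capable:
        raise ValueError(f"no agent can handle {task_kind!r}")
    return min(capable, key=lambda o: o.cost_per_task)


def mission_cost(task_kinds: list[str], options: list[AgentOption]) -> float:
    """Sum the routed cost of every sub-task, giving a per-mission total."""
    return sum(route(kind, options).cost_per_task for kind in task_kinds)


# Hypothetical options for illustration only.
OPTIONS = [
    AgentOption("agent-small", frozenset({"docs", "changelog"}), 1.0),
    AgentOption("agent-large", frozenset({"code", "docs"}), 5.0),
]
```

With these placeholder options, `route("docs", OPTIONS)` picks `agent-small` and `route("code", OPTIONS)` falls through to `agent-large`. Recording the routed cost per sub-task is also what would make cost per outcome visible.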
Head-to-Head: Medley vs. Conductor
| Capability | Conductor | Medley |
|---|---|---|
| Parallel agent execution | ✅ Worktree isolation | ✅ DAG-based sub-tasks |
| Mission decomposition | ❌ You do it | ✅ System produces DAG |
| Cross-domain missions (code + content + GTM) | ❌ Code-only | ✅ Multi-domain |
| Model-agnostic routing | ❌ Manual | ✅ Automatic |
| Cross-project attention queue | ❌ | ✅ |
| Decision memory / earned autonomy | ❌ | ✅ |
| Diff/review/merge UX | ✅ Polished | ✅ Project-native output |
| Mac-native polish | ✅ | ✅ |
| BYOK / local | ✅ | ✅ Free, local download |
| Cost visibility | ❌ | ✅ Per-sub-task routing |
Which One Is Right for You?
Choose Conductor if:
- Your work is primarily or exclusively coding tasks
- You want worktree isolation and a polished diff/merge UX
- You have a clear task list and want to parallelize execution
- You prefer a minimal, focused tool with low setup overhead
Choose Medley if:
- Your missions span code, content, research, and coordination
- You’re running multiple projects simultaneously and need one attention queue
- You want the system to learn from your decisions and compound into autonomy
- You want model-agnostic routing without vendor lock-in
- You’re spending real money on agent calls and want visibility into cost per outcome
The Honest Take
Conductor is not a lesser tool — it’s a different tool. If you’re a developer who wants to run three coding agents in parallel, compare their outputs, and merge the best one, Conductor is excellent at that job. It’s focused, polished, and fast.
Medley is for the builder who has outgrown the session model entirely. When your work is too complex to fit in a task list, when you’re running agents across code and non-code domains, when you want the system to get smarter about your preferences over time — that’s when Medley’s architecture starts to pay off.
The Bigger Picture
The shift happening in developer tooling right now isn’t just about running more agents in parallel. It’s about moving from session management to work management. The question isn’t “how do I run four Claude Code sessions at once?” — it’s “how do I describe what I’m trying to accomplish and trust the system to figure out the rest?”
Conductor answers the first question well. Medley is built around the second.
If you’re at the point where managing your agents has become its own job, it might be worth taking a look at Medley — it’s a free local download, BYOK, and the decomposition is the product, not your problem.