Sakana Fugu vs. Medley: Two Approaches to Multi-Model Orchestration
A fair, detailed comparison of black-box API routing vs. transparent local orchestration
Two Tools, One Problem, Very Different Philosophies
Multi-model orchestration has become one of the most contested categories in AI tooling. The core insight — that routing tasks to the right model for each role produces better results than defaulting to a single frontier model — is now widely accepted. The question is how you implement it.
Sakana Fugu and Medley both answer that question, but they answer it in fundamentally different ways. Fugu is a managed cloud API: you send a request, the system routes it across a pool of frontier models using its TRINITY architecture, and you get a result. Medley is a local desktop application: you describe a goal, the system decomposes it into a visible execution plan, assigns sub-tasks to agents, and learns from every decision you make along the way.
This is not a case where one tool is obviously better. They reflect different priorities, different trust models, and different assumptions about what developers actually need. This comparison is designed to help you figure out which one fits your situation.
What Fugu Does — and Does Well
Fugu is Sakana AI's multi-model orchestration product. It exposes a single OpenAI-compatible API endpoint that dynamically routes requests across a pool of frontier models using two core mechanisms:
TRINITY assigns roles to models: a Thinker reasons about the problem, a Worker executes the task, and a Verifier checks the output. Different models can fill different roles depending on the task.
Conductor is a reinforcement-learning-based routing layer that decides which model handles which role, optimizing for task performance.
The benchmark results on coding tasks are strong. Fugu Ultra scores 73.7 on SWE-Bench Pro — ahead of GPT-5.5 at 58.6 and competitive with Claude Opus 4.8 at 69.2. On LiveCodeBench it reaches 93.2. On TerminalBench 2.1 it scores 82.1. These are real numbers on real benchmarks, and they reflect a genuine architectural advantage for software engineering workloads.
For teams that want a drop-in replacement for their existing OpenAI-compatible API calls — with automatic model selection and no changes to their integration layer — Fugu's design is clean and coherent. You get better results without changing your code.
Where Fugu has real limitations: it is blocked in the EU/EEA with no confirmed access timeline, access outside that region is still application-gated as of June 2026, it is a fully managed cloud API with no local option and no BYOK, and the routing is a black box — you can opt out of specific providers, but you cannot see which model handled which sub-task, why, or what it cost per task.
What Medley Does — and Does Well
Medley is a desktop application for managing long-running AI agent work. You describe a goal — a "mission" — and Medley decomposes it into a directed acyclic graph (DAG) of sub-tasks, assigns each sub-task to the right agent (Claude Code, Codex, others) with the right model tier, context, and tools, and executes the plan.
The key architectural difference from Fugu is visibility. Every sub-task in the DAG is visible, labeled, and editable before and during execution. You can see which agent is handling which step, what context it has, and what it costs. You can interrupt the plan, redirect a sub-task, or change the agent assignment mid-run.
Medley also introduces a trust-compounding loop that has no equivalent in Fugu. Every approval or rejection you make is recorded with context and learned from. Over time, Medley builds a model of your preferences and applies them automatically — surfacing only the decisions that are genuinely novel. In week one you approve most things; by week six the system handles routine decisions on its own.
Medley runs locally on your machine. You bring your own API keys. Your data does not pass through a managed cloud. There is no waitlist, no regional restriction, and no application process — it is a free download available globally.
On benchmarks, Medley is positioned shoulder-to-shoulder with leading models like Fable and Mythos across engineering, scientific, and reasoning benchmarks — because it routes each sub-task to the right-sized model rather than defaulting to the most expensive frontier model for everything.
The Core Architectural Difference
The fundamental divide between Fugu and Medley is not about benchmark scores or feature lists. It is about where the intelligence lives and who controls it.
Fugu's model: Intelligence lives in the cloud. The Conductor routing layer makes decisions you cannot see, using a model pool you cannot inspect, at a cost you cannot attribute per sub-task. You get results. You do not get visibility.
Medley's model: Intelligence is shared between the system and the user. The DAG is visible and editable. Every decision is logged. The system learns from your input and applies it forward. You get results and you get the reasoning behind them.
Neither model is wrong. But they are suited to very different users and very different workloads.
Five-Dimension Comparison
1. Access and Availability
Fugu is blocked in the EU/EEA with no confirmed timeline for access. Outside that region, access is application-gated — you apply, wait for approval, and receive an API key. Self-serve general availability has not been clearly announced as of June 2026. The rollout is Japan-first, which means non-Japanese developers face slower access and less community support.
Medley is available globally, immediately, as a free download. No application, no waitlist, no regional restriction.
2. Routing Transparency
Fugu routes requests through a black-box API. You can opt out of specific model providers, but you cannot see which model handled which sub-task, why the Conductor made that routing decision, or what each step cost. When something goes wrong, you have a result and no execution trace.
Medley shows you the full execution plan as a DAG before and during execution. Every sub-task is labeled with its agent assignment, context, and cost. You can inspect, edit, and interrupt at any point. When something goes wrong, you have a complete audit trail.
3. Local vs. Cloud
Fugu is a fully managed cloud API. There is no local option, no self-hosted deployment, and no way to run it on your own infrastructure. Your requests and data pass through Sakana AI's managed environment.
Medley runs on your machine. Your data stays local. You control the infrastructure. For teams with data residency requirements, compliance constraints, or simply a preference for owning their stack, this is a meaningful difference.
4. Cost Model
Fugu bills per token. When multiple agents run in parallel, billing follows multi-agent stacking rules that are not clearly documented for all scenarios. You receive a result and a token count; you do not receive a per-sub-task cost breakdown.
Medley shows cost per completed task and per sub-task. Because you bring your own API keys, you pay your negotiated rates directly to the model providers. There is no markup, no stacking ambiguity, and no opaque billing layer.
5. Autonomy Over Time
Fugu has no mechanism for learning from your decisions. Every request is treated independently. The system does not build a model of your preferences, your team's standards, or your project's constraints.
Medley's decision memory records every approval and rejection with context and applies it forward. The system gets better at predicting your preferences over time, reducing the number of decisions you need to make manually. This trust-compounding loop is one of Medley's most distinctive architectural features.
Full Feature Comparison
| Feature | Sakana Fugu | Medley |
|---|---|---|
| EU/EEA access | Blocked | Available |
| Access model | Application-gated beta | Free download |
| Local execution | No | Yes |
| BYOK | No | Yes |
| Visible execution plan | No | Yes (editable DAG) |
| Per-sub-task cost | No | Yes |
| Decision memory | No | Yes |
| Interrupt / redirect mid-run | No | Yes |
| OpenAI-compatible API | Yes | Routes to compatible models |
| Coding benchmark strength | Very strong (SWE-Bench 73.7) | Competitive across domains |
| Cross-domain missions | Limited | Yes |
| Japan-first rollout | Yes | No |
| Pricing transparency | Per-token, stacking rules unclear | Per-task, direct to provider |
Who Should Use Fugu vs. Who Should Use Medley
Fugu is the better fit if:
- You are outside the EU/EEA and can get beta access
- You want a drop-in OpenAI-compatible endpoint with no integration changes
- Your workload is primarily software engineering tasks where Fugu's benchmarks are strongest
- You want automatic model selection without managing the routing yourself
- You are comfortable with black-box routing and do not need per-sub-task visibility
Medley is the better fit if:
- You are in the EU/EEA or cannot get Fugu beta access
- You want to see and control the execution plan
- You need BYOK or local execution for compliance, cost, or infrastructure reasons
- Your workload spans multiple domains — research, writing, analysis, code — not just coding
- You want a system that learns from your decisions and gets better over time
- You need per-sub-task cost attribution for budgeting or billing
The Bottom Line
Fugu and Medley are not competing for the same user. Fugu is for teams that want a clean, powerful API endpoint and are willing to trade visibility for simplicity. Medley is for builders who want to see inside the machine — who need to control costs, audit decisions, and build a system that compounds their judgment over time.
If you want to start today, globally, with full visibility and your own keys, medley.sh is where to go.