Sakana Fugu vs. Medley: Two Approaches to Multi-Model Orchestration

A fair, detailed comparison of black-box API routing vs. transparent local orchestration

Two Tools, One Problem, Very Different Philosophies

Multi-model orchestration has become one of the most contested categories in AI tooling. The core insight — that routing tasks to the right model for each role produces better results than defaulting to a single frontier model — is now widely accepted. The question is how you implement it.

Sakana Fugu and Medley both answer that question, but they answer it in fundamentally different ways. Fugu is a managed cloud API: you send a request, the system routes it across a pool of frontier models using its TRINITY architecture, and you get a result. Medley is a local desktop application: you describe a goal, the system decomposes it into a visible execution plan, assigns sub-tasks to agents, and learns from every decision you make along the way.

This is not a case where one tool is obviously better. They reflect different priorities, different trust models, and different assumptions about what developers actually need. This comparison is designed to help you figure out which one fits your situation.

What Fugu Does — and Does Well

Fugu is Sakana AI's multi-model orchestration product. It exposes a single OpenAI-compatible API endpoint that dynamically routes requests across a pool of frontier models using two core mechanisms:

TRINITY assigns roles to models: a Thinker reasons about the problem, a Worker executes the task, and a Verifier checks the output. Different models can fill different roles depending on the task.

Conductor is a reinforcement-learning-based routing layer that decides which model handles which role, optimizing for task performance.

The benchmark results on coding tasks are strong. Fugu Ultra scores 73.7 on SWE-Bench Pro — ahead of GPT-5.5 at 58.6 and competitive with Claude Opus 4.8 at 69.2. On LiveCodeBench it reaches 93.2. On TerminalBench 2.1 it scores 82.1. These are real numbers on real benchmarks, and they reflect a genuine architectural advantage for software engineering workloads.

For teams that want a drop-in replacement for their existing OpenAI-compatible API calls — with automatic model selection and no changes to their integration layer — Fugu's design is clean and coherent. You get better results without changing your code.

Where Fugu has real limitations: it is blocked in the EU/EEA with no confirmed access timeline, access outside that region is still application-gated as of June 2026, it is a fully managed cloud API with no local option and no BYOK, and the routing is a black box — you can opt out of specific providers, but you cannot see which model handled which sub-task, why, or what it cost per task.

What Medley Does — and Does Well

Medley is a desktop application for managing long-running AI agent work. You describe a goal — a "mission" — and Medley decomposes it into a directed acyclic graph (DAG) of sub-tasks, assigns each sub-task to the right agent (Claude Code, Codex, others) with the right model tier, context, and tools, and executes the plan.

The key architectural difference from Fugu is visibility. Every sub-task in the DAG is visible, labeled, and editable before and during execution. You can see which agent is handling which step, what context it has, and what it costs. You can interrupt the plan, redirect a sub-task, or change the agent assignment mid-run.

Medley also introduces a trust-compounding loop that has no equivalent in Fugu. Every approval or rejection you make is recorded with context and learned from. Over time, Medley builds a model of your preferences and applies them automatically — surfacing only the decisions that are genuinely novel. In week one you approve most things; by week six the system handles routine decisions on its own.

Medley runs locally on your machine. You bring your own API keys. Your data does not pass through a managed cloud. There is no waitlist, no regional restriction, and no application process — it is a free download available globally.

On benchmarks, Medley is positioned shoulder-to-shoulder with leading models like Fable and Mythos across engineering, scientific, and reasoning benchmarks — because it routes each sub-task to the right-sized model rather than defaulting to the most expensive frontier model for everything.

The Core Architectural Difference

The fundamental divide between Fugu and Medley is not about benchmark scores or feature lists. It is about where the intelligence lives and who controls it.

Fugu's model: Intelligence lives in the cloud. The Conductor routing layer makes decisions you cannot see, using a model pool you cannot inspect, at a cost you cannot attribute per sub-task. You get results. You do not get visibility.

Medley's model: Intelligence is shared between the system and the user. The DAG is visible and editable. Every decision is logged. The system learns from your input and applies it forward. You get results and you get the reasoning behind them.

Neither model is wrong. But they are suited to very different users and very different workloads.

Five-Dimension Comparison

1. Access and Availability

Fugu is blocked in the EU/EEA with no confirmed timeline for access. Outside that region, access is application-gated — you apply, wait for approval, and receive an API key. Self-serve general availability has not been clearly announced as of June 2026. The rollout is Japan-first, which means non-Japanese developers face slower access and less community support.

Medley is available globally, immediately, as a free download. No application, no waitlist, no regional restriction.

2. Routing Transparency

Fugu routes requests through a black-box API. You can opt out of specific model providers, but you cannot see which model handled which sub-task, why the Conductor made that routing decision, or what each step cost. When something goes wrong, you have a result and no execution trace.

Medley shows you the full execution plan as a DAG before and during execution. Every sub-task is labeled with its agent assignment, context, and cost. You can inspect, edit, and interrupt at any point. When something goes wrong, you have a complete audit trail.

3. Local vs. Cloud

Fugu is a fully managed cloud API. There is no local option, no self-hosted deployment, and no way to run it on your own infrastructure. Your requests and data pass through Sakana AI's managed environment.

Medley runs on your machine. Your data stays local. You control the infrastructure. For teams with data residency requirements, compliance constraints, or simply a preference for owning their stack, this is a meaningful difference.

4. Cost Model

Fugu bills per token. When multiple agents run in parallel, billing follows multi-agent stacking rules that are not clearly documented for all scenarios. You receive a result and a token count; you do not receive a per-sub-task cost breakdown.

Medley shows cost per completed task and per sub-task. Because you bring your own API keys, you pay your negotiated rates directly to the model providers. There is no markup, no stacking ambiguity, and no opaque billing layer.

5. Autonomy Over Time

Fugu has no mechanism for learning from your decisions. Every request is treated independently. The system does not build a model of your preferences, your team's standards, or your project's constraints.

Medley's decision memory records every approval and rejection with context and applies it forward. The system gets better at predicting your preferences over time, reducing the number of decisions you need to make manually. This trust-compounding loop is one of Medley's most distinctive architectural features.

Full Feature Comparison

Feature Sakana Fugu Medley
EU/EEA access Blocked Available
Access model Application-gated beta Free download
Local execution No Yes
BYOK No Yes
Visible execution plan No Yes (editable DAG)
Per-sub-task cost No Yes
Decision memory No Yes
Interrupt / redirect mid-run No Yes
OpenAI-compatible API Yes Routes to compatible models
Coding benchmark strength Very strong (SWE-Bench 73.7) Competitive across domains
Cross-domain missions Limited Yes
Japan-first rollout Yes No
Pricing transparency Per-token, stacking rules unclear Per-task, direct to provider

Who Should Use Fugu vs. Who Should Use Medley

Fugu is the better fit if:

  • You are outside the EU/EEA and can get beta access
  • You want a drop-in OpenAI-compatible endpoint with no integration changes
  • Your workload is primarily software engineering tasks where Fugu's benchmarks are strongest
  • You want automatic model selection without managing the routing yourself
  • You are comfortable with black-box routing and do not need per-sub-task visibility

Medley is the better fit if:

  • You are in the EU/EEA or cannot get Fugu beta access
  • You want to see and control the execution plan
  • You need BYOK or local execution for compliance, cost, or infrastructure reasons
  • Your workload spans multiple domains — research, writing, analysis, code — not just coding
  • You want a system that learns from your decisions and gets better over time
  • You need per-sub-task cost attribution for budgeting or billing

The Bottom Line

Fugu and Medley are not competing for the same user. Fugu is for teams that want a clean, powerful API endpoint and are willing to trade visibility for simplicity. Medley is for builders who want to see inside the machine — who need to control costs, audit decisions, and build a system that compounds their judgment over time.

If you want to start today, globally, with full visibility and your own keys, medley.sh is where to go.