Building Custom ML Agent Orchestration: From Theory to Pi Coding Agent

I wanted to build something like oh-my-opencode, but fully customizable. My own agent orchestration system where I control the routing, the models, the skills—everything.

So I researched. I dug through LangGraph's architecture. I traced through oh-my-opencode's source code line by line. Then I implemented it all in pi-coding-agent.

What I found changed how I think about agent systems. The intelligence isn't in the code. It's in the prompts. And the real power? Model routing.

The Research

I started by launching parallel research agents—my usual approach for unfamiliar territory. One searched framework patterns (LangGraph, CrewAI, AutoGen). One traced through oh-my-opencode's TypeScript source.

The findings converged on a surprising insight: the most successful agent implementations use simple, composable patterns—not complex frameworks.

From my research:

"Start simple, add complexity only when measurable improvement is demonstrated. Frameworks create abstraction layers that obscure control flow and hinder debugging."

The frameworks I initially considered—LangGraph, CrewAI, AutoGen—are useful for their primitives. But the orchestration logic? That's just configuration and prompts.

The Core Pattern: Agents as Configuration

Here's the key insight from oh-my-opencode: an agent is just a configuration object.

interface AgentConfig {
  description: string; // Short description for routing
  model: string; // e.g., "openai/gpt-4o"
  temperature?: number;
  tools?: { write?: boolean }; // Tool restrictions
  prompt: string; // The agent's "brain"
}

That's it. The "Oracle" agent isn't special code—it's a read-only model with a specific system prompt. The "Explore" agent isn't magic—it's a fast model restricted to search tools with instructions to fire 3+ tools in parallel.

The intelligence comes from the prompt, not the implementation.
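
To make that concrete, here is a minimal sketch of the two agents described above written as plain objects. The model identifiers, the `search` tool flag, the prompt text, and the `canWrite` helper are placeholders made up for illustration; none of them are taken from oh-my-opencode's source.

```typescript
// Same shape as the AgentConfig interface above, repeated so the sketch
// stands alone. The `search` flag is a hypothetical addition.
interface AgentConfig {
  description: string;
  model: string;
  temperature?: number;
  tools?: { write?: boolean; search?: boolean };
  prompt: string;
}

// Read-only advisor: a strong model that is not allowed to write files.
const oracle: AgentConfig = {
  description: "Read-only advisor for hard design and debugging questions",
  model: "example/strong-model", // placeholder identifier
  temperature: 0.1,
  tools: { write: false },
  prompt: "You are a senior engineer. Analyze and advise. Never modify files.",
};

// Fast searcher: a cheap model restricted to search tools.
const explore: AgentConfig = {
  description: "Fast codebase search",
  model: "example/fast-model", // placeholder identifier
  tools: { write: false, search: true },
  prompt: "Locate relevant code quickly. Fire 3+ search tools in parallel.",
};

// The "agent registry" is just a map from name to config.
const agents: Record<string, AgentConfig> = { oracle, explore };

// Hypothetical helper: write access counts as allowed unless a config
// explicitly turns it off.
function canWrite(name: string): boolean {
  const agent = agents[name];
  if (!agent) throw new Error(`Unknown agent: ${name}`);
  return agent.tools?.write !== false;
}
```

Nothing here is executable behavior beyond a lookup. Swapping the prompt or the model string changes what the agent is.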

Model Selection as Architecture Decision

This is the insight that changed everything for me: different tasks need different models.

Not "use the best model for everything." Not "use the cheapest model for everything." Use the right model for the task at hand.

Here's my routing table:

Task Complexity	Model	Why
Trivial (typos, simple fixes)	Gemini 3 Flash	Speed matters, quality doesn't
Medium (standard implementation)	Claude Sonnet 4.6	Good balance of speed and quality
Complex (multi-file, tricky logic)	Claude Opus 4.6	Need best reasoning
Expert (novel problems, research)	GPT-5.4	Maximum capability

Why This Matters

1. Cost control without quality sacrifice

Running everything through Opus 4.6 would cost 30x more than routing trivial tasks to Flash. For a typo fix, Flash is better—it's faster, and the quality difference is irrelevant.

2. Speed where it matters

Exploration agents fire constantly during my workflow. If each one took 30 seconds (Opus) instead of 3 seconds (Flash), I'd stop using them. Fast models enable high-frequency delegation.

3. Quality where it matters

When I'm debugging a race condition after two failed attempts, I don't want Flash. I want the best reasoning model available, cost be damned.
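
The cost argument can be checked with back-of-the-envelope arithmetic. The sketch below uses relative cost units, not real prices: only the 30:1 ratio between the top model and Flash comes from the text above, while the medium-tier cost and the task mix are invented for illustration.

```typescript
type CostTier = "trivial" | "medium" | "complex";

// Relative cost per task in arbitrary units. The value for "medium" is an
// assumption; only the 30:1 spread between complex and trivial is from the post.
const relativeCost: Record<CostTier, number> = { trivial: 1, medium: 6, complex: 30 };

// Hypothetical share of tasks that fall into each tier (sums to 1).
const taskMix: Record<CostTier, number> = { trivial: 0.5, medium: 0.3, complex: 0.2 };

// Average cost per task when each tier is routed to its own model.
function routedCost(mix: Record<CostTier, number>, cost: Record<CostTier, number>): number {
  return (Object.keys(mix) as CostTier[]).reduce(
    (sum, tier) => sum + mix[tier] * cost[tier],
    0,
  );
}

// Average cost per task when everything goes to the most expensive model.
function flatCost(cost: Record<CostTier, number>): number {
  return cost.complex;
}

const routed = routedCost(taskMix, relativeCost); // 0.5*1 + 0.3*6 + 0.2*30
const flat = flatCost(relativeCost);
console.log(
  `routed: ${routed.toFixed(1)}, everything on top model: ${flat}, ratio: ${(flat / routed).toFixed(1)}x`,
);
```

With these made-up numbers the routed average is about 8.3 units per task against 30 for the flat setup. The real saving depends entirely on your own task mix and actual prices.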

Implementing It in Pi Coding Agent

Armed with this understanding, I configured pi-coding-agent (v0.52.7) to implement these patterns. Everything lives in ~/.pi/agent/:

~/.pi/agent/
├── settings.json          # Main config
├── orchestrator.json      # Model routing & sub-agents
├── keybindings.json       # Custom shortcuts
├── AGENTS.md              # Global context
├── extensions/
│   ├── orchestrator.ts    # Orchestration logic
│   └── permission-gates.ts
├── skills/
│   ├── handoff/SKILL.md
│   ├── code-review/SKILL.md
│   └── ...
└── prompts/
    ├── commit-message.md
    └── pr-description.md

The Model Routing Config

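In orchestrator.json, I implemented the routing table. This version splits the lowest tier into trivial and simple, and its expert tier points at gemini-3-pro-high rather than GPT-5.4: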

{
  "modelRouting": {
    "trivial": {
      "provider": "google-antigravity",
      "model": "gemini-3-flash"
    },
    "simple": {
      "provider": "google-antigravity",
      "model": "gemini-3-flash"
    },
    "medium": {
      "provider": "github-copilot",
      "model": "claude-sonnet-4.6"
    },
    "complex": {
      "provider": "github-copilot",
      "model": "claude-opus-4.6"
    },
    "expert": {
      "provider": "google-antigravity",
      "model": "gemini-3-pro-high"
    }
  },
  "complexitySignals": {
    "trivial": ["^what is", "^where is", "^show me", "typo"],
    "simple": ["rename", "format", "lint", "add comment"],
    "medium": ["add function", "fix bug", "refactor"],
    "complex": ["architect", "migrate", "security review"],
    "expert": ["race condition", "memory leak", "distributed"]
  }
}

The orchestrator extension classifies each prompt by matching it against these signals (entries starting with ^ only match at the beginning of the prompt), then routes it to the model configured for that tier.
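
A classifier over that config can be very small. The sketch below is not the actual orchestrator.ts. It assumes the signals are case-insensitive regular expressions, that the most demanding tier wins when several match, and that unmatched prompts fall back to medium. All three choices are assumptions.

```typescript
type Tier = "trivial" | "simple" | "medium" | "complex" | "expert";

interface ModelRef {
  provider: string;
  model: string;
}

// Values copied from the orchestrator.json shown above.
const modelRouting: Record<Tier, ModelRef> = {
  trivial: { provider: "google-antigravity", model: "gemini-3-flash" },
  simple: { provider: "google-antigravity", model: "gemini-3-flash" },
  medium: { provider: "github-copilot", model: "claude-sonnet-4.6" },
  complex: { provider: "github-copilot", model: "claude-opus-4.6" },
  expert: { provider: "google-antigravity", model: "gemini-3-pro-high" },
};

const complexitySignals: Record<Tier, string[]> = {
  trivial: ["^what is", "^where is", "^show me", "typo"],
  simple: ["rename", "format", "lint", "add comment"],
  medium: ["add function", "fix bug", "refactor"],
  complex: ["architect", "migrate", "security review"],
  expert: ["race condition", "memory leak", "distributed"],
};

// Assumption: check the hardest tier first, so "what is causing this
// race condition" lands on expert instead of trivial.
const PRECEDENCE: Tier[] = ["expert", "complex", "medium", "simple", "trivial"];

// Assumption: prompts that match nothing go to the middle tier.
const DEFAULT_TIER: Tier = "medium";

function classify(prompt: string): Tier {
  const text = prompt.trim();
  for (const tier of PRECEDENCE) {
    const hit = complexitySignals[tier].some((signal) => new RegExp(signal, "i").test(text));
    if (hit) return tier;
  }
  return DEFAULT_TIER;
}

// Routing is then a plain table lookup.
function route(prompt: string): ModelRef {
  return modelRouting[classify(prompt)];
}
```

Substring signals are blunt: "format" also matches "information", for example. Tightening them with word boundaries is a cheap improvement if misroutes show up.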

Sub-Agents

I configured specialized sub-agents that trigger on specific patterns:

{
  "subAgents": {
    "reviewer": {
      "enabled": true,
      "thinkingLevel": "medium",
      "triggers": ["review", "check.*code", "pr\\s+review"],
      "systemPrompt": "You are a code review specialist. Focus on logic correctness, security vulnerabilities, performance issues, type safety, maintainability..."
    },
    "debugger": {
      "enabled": true,
      "thinkingLevel": "high",
      "triggers": ["debug", "fix.*bug", "not\\s+working"],
      "systemPrompt": "You are a debugging expert. Follow: Reproduce → Isolate → Hypothesize → Test → Fix..."
    },
    "explainer": {
      "enabled": true,
      "provider": "google-antigravity",
      "model": "gemini-3-flash",
      "triggers": ["explain", "how.*work", "what.*does"]
    }
  }
}

The explainer uses cheap Gemini Flash—perfect for "what does this do?" questions that don't need heavy reasoning.
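
Trigger matching works the same way as complexity routing. The following sketch assumes triggers are case-insensitive regular expressions and that the first enabled sub-agent with a matching trigger wins. The `pickSubAgent` name and the first-match rule are illustrative choices, not the extension's documented behavior.

```typescript
interface SubAgent {
  enabled: boolean;
  triggers: string[];
  thinkingLevel?: string;
  provider?: string;
  model?: string;
  systemPrompt?: string;
}

// Triggers copied from the config above; system prompts omitted for brevity.
const subAgents: Record<string, SubAgent> = {
  reviewer: {
    enabled: true,
    thinkingLevel: "medium",
    triggers: ["review", "check.*code", "pr\\s+review"],
  },
  debugger: {
    enabled: true,
    thinkingLevel: "high",
    triggers: ["debug", "fix.*bug", "not\\s+working"],
  },
  explainer: {
    enabled: true,
    provider: "google-antigravity",
    model: "gemini-3-flash",
    triggers: ["explain", "how.*work", "what.*does"],
  },
};

// Returns the name of the first enabled sub-agent whose trigger matches,
// or undefined when the prompt should stay with the main agent.
function pickSubAgent(prompt: string): string | undefined {
  for (const [name, agent] of Object.entries(subAgents)) {
    if (!agent.enabled) continue;
    if (agent.triggers.some((t) => new RegExp(t, "i").test(prompt))) return name;
  }
  return undefined;
}
```

Because declaration order decides ties, a prompt like "explain how to debug this" goes to the debugger here, not the explainer. If that matters, an explicit priority field is a small addition.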

Four Orchestration Modes

The extension registers keyboard shortcuts for switching between the four modes, plus one that shows the current status:

Shortcut	Mode	Behavior
`Ctrl+Shift+1`	Direct	No orchestration, use default model
`Ctrl+Shift+2`	Delegate	Route to sub-agents based on triggers
`Ctrl+Shift+3`	Plan	Step-by-step execution with templates
`Ctrl+Shift+4`	Auto	Smart routing based on complexity signals
`Ctrl+Shift+0`	Status	Show current mode and routing
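
The mode switch itself is a lookup table plus one piece of state. The sketch below leaves out how shortcuts get registered with pi and only models the state. The `ModeState` class, the `direct` starting mode, and the status string format are assumptions for illustration.

```typescript
type Mode = "direct" | "delegate" | "plan" | "auto";

// Shortcut-to-mode mapping from the table above, keyed in lowercase.
const shortcutToMode: Record<string, Mode> = {
  "ctrl+shift+1": "direct",
  "ctrl+shift+2": "delegate",
  "ctrl+shift+3": "plan",
  "ctrl+shift+4": "auto",
};

class ModeState {
  // Assumption: start without orchestration.
  private mode: Mode = "direct";

  // Applies a shortcut and returns a status line.
  // Ctrl+Shift+0 and unknown shortcuts leave the mode unchanged.
  handleShortcut(shortcut: string): string {
    const next = shortcutToMode[shortcut.toLowerCase()];
    if (next) this.mode = next;
    return this.status();
  }

  status(): string {
    return `mode: ${this.mode}`;
  }

  get current(): Mode {
    return this.mode;
  }
}

const state = new ModeState();
console.log(state.handleShortcut("Ctrl+Shift+4"));
console.log(state.handleShortcut("Ctrl+Shift+0"));
```

Each prompt handler can then branch on `state.current` to decide whether to skip routing, match sub-agent triggers, or classify by complexity.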

Skills as Prompt Injection

Skills are markdown files that get injected into an agent's system prompt. They need YAML front-matter:

---
name: handoff
description: Generate a structured handoff summary to continue work in a new session
---

# Handoff

Generate a structured handoff summary...

When the agent's task matches a skill's description, the content gets appended to the system prompt. The agent now "knows" that expertise.

The power is in composition. In oh-my-opencode's delegation call, for example, you can load multiple skills for a single task:

delegate_task({
  category: "visual-engineering",
  load_skills: ["frontend-ui-ux", "playwright"],
  prompt: "Build the signup form and test it",
});
// → model with design expertise AND browser testing knowledge
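
Mechanically, skill injection is string concatenation. Here is a minimal sketch that parses the front-matter format shown above and appends one or more skill bodies to a base prompt. The function names, the deliberately naive `key: value` parser, and the "## Skill:" section heading are assumptions, not what pi or oh-my-opencode actually do internally.

```typescript
interface Skill {
  name: string;
  description: string;
  body: string;
}

// Splits a SKILL.md string into front-matter fields and body.
// Only flat `key: value` lines are handled; this is not a full YAML parser.
function parseSkill(markdown: string): Skill {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(markdown);
  if (!match) throw new Error("Skill file is missing YAML front-matter");

  const meta: Record<string, string> = {};
  for (const line of (match[1] ?? "").split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }

  const name = meta["name"];
  const description = meta["description"];
  if (!name || !description) throw new Error("Skill needs both name and description");
  return { name, description, body: (match[2] ?? "").trim() };
}

// Appends each skill body to the base system prompt under its own heading.
function composePrompt(basePrompt: string, skills: Skill[]): string {
  const sections = skills.map((s) => `## Skill: ${s.name}\n\n${s.body}`);
  return [basePrompt, ...sections].join("\n\n");
}

const handoff = parseSkill(
  [
    "---",
    "name: handoff",
    "description: Generate a structured handoff summary to continue work in a new session",
    "---",
    "",
    "# Handoff",
    "",
    "Generate a structured handoff summary...",
  ].join("\n"),
);

console.log(composePrompt("You are a coding agent.", [handoff]));
```

Passing two or three parsed skills to `composePrompt` gives the composition effect: one model, several blocks of expertise in its system prompt.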

Permission Gates

The permission-gates.ts extension intercepts dangerous bash commands and asks for confirmation before they run:

const DANGEROUS_PATTERNS = [
  /\brm\s+-rf?\b/,
  /\bgit\s+reset\s+--hard\b/,
  /\bgit\s+push\s+.*--force\b/,
  /\bDROP\s+(TABLE|DATABASE)\b/i,
];

pi.on("tool_call", async (event, ctx) => {
  if (isToolCallEventType("bash", event)) {
    const command = event.input.command;
    if (DANGEROUS_PATTERNS.some((p) => p.test(command))) {
      const ok = await ctx.ui.confirm(
        "Destructive command",
        `Allow: ${command}?`,
      );
      if (!ok) return { block: true, reason: "Blocked by user" };
    }
  }
});
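
Regex gates like this are worth testing as a pure function, because the gaps are easy to miss. The sketch below copies the patterns above and adds a stricter variant list of my own invention to show two commands the originals let through (`rm -fr` and `git push -f`). It does not touch the pi API, and even the stricter list is nowhere near complete.

```typescript
// Patterns copied from the extension above.
const DANGEROUS_PATTERNS: RegExp[] = [
  /\brm\s+-rf?\b/,
  /\bgit\s+reset\s+--hard\b/,
  /\bgit\s+push\s+.*--force\b/,
  /\bDROP\s+(TABLE|DATABASE)\b/i,
];

// Hypothetical stricter variants: any rm flag group containing r or R,
// and both the short and long force flags for git push.
const STRICTER_PATTERNS: RegExp[] = [
  /\brm\s+-[a-zA-Z]*[rR]/,
  /\bgit\s+reset\s+--hard\b/,
  /\bgit\s+push\b.*\s(-f|--force)\b/,
  /\bDROP\s+(TABLE|DATABASE)\b/i,
];

// True when any pattern in the list matches the command.
function isDangerous(command: string, patterns: RegExp[] = DANGEROUS_PATTERNS): boolean {
  return patterns.some((p) => p.test(command));
}

const samples = [
  "rm -rf build",
  "rm -fr build",
  "git push -f origin main",
  "git push origin feature-fix",
  "drop table users",
];

for (const cmd of samples) {
  console.log(`${cmd} | original: ${isDangerous(cmd)} | stricter: ${isDangerous(cmd, STRICTER_PATTERNS)}`);
}
```

A gate built on pattern matching is a seatbelt, not a sandbox. Commands hidden inside scripts or shell aliases will still pass.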

The Result

Now when I start pi:

[Skills]
  user
    ~/.pi/agent/skills/handoff/SKILL.md
    ~/.pi/agent/skills/code-review/SKILL.md
    ~/.pi/agent/skills/debug/SKILL.md
    ...

[Extensions]
  user
    ~/.pi/agent/extensions/orchestrator.ts
    ~/.pi/agent/extensions/permission-gates.ts

Simple questions route to Gemini Flash. Complex architecture questions go to Opus. Destructive commands require confirmation. And I can switch modes with keyboard shortcuts.

The Meta-Lesson

The hardest part of this research wasn't understanding the code. It was accepting that agents are simpler than they look.

I expected complex state machines. I found configuration objects.

I expected sophisticated routing algorithms. I found lookup tables.

I expected magic. I found prompts.

The sophistication is in two places:

1. Prompt engineering: what the agent knows and how it behaves

2. Model routing: matching task complexity to model capability

Once you accept that, building your own orchestrator becomes tractable.

Add a new agent? Write a markdown prompt. Add a new category? Map a string to a model. Add a new skill? Write expertise as markdown. Route to the right model? Check a lookup table.

The code just wires it together.

Resources

pi-coding-agent — The tool I configured
oh-my-opencode — Real-world orchestrator that inspired this

The whole thing took a few hours of trial and error, mostly fighting with the keybindings format and finding the right model names. Worth it for the multi-model routing alone.

pi-coding-agent is by @badlogic. The orchestration pattern is inspired by how I think about delegation—cheap and fast for simple things, powerful and slow for hard things—and partly by oh-my-opencode's approach to agent orchestration and sub-agent delegation.