Most developers treat Claude Code as an interactive coding assistant. But underneath the conversational surface sits a fully programmable agent runtime — one that supports six distinct orchestration architectures, each suited to a different class of problem.
After spending significant time integrating Claude Code into enterprise pipelines — spanning CI/CD automation, multi-domain service reviews, and regulated deployment gates — I have mapped these patterns against real production constraints. This article is the distillation.
We will cover the six patterns, when to reach for each, and — critically for architects working in regulated environments — a direct comparison of running Claude Code as a headless CLI versus using the Agent SDK for fine-grained programmatic control.
Why Orchestration Pattern Matters
The most common failure mode I see is not a model capability problem — it is a structural mismatch. Developers reach for the most powerful pattern (agent teams, full autonomy) when the problem is linear. Or they use a single sequential agent for a parallelisable workload and wonder why it is slow and context-bloated.
Claude Code exposes six structural patterns. Think of them as a spectrum from lowest autonomy / lowest token cost to highest autonomy / highest token cost. Choosing correctly saves 40–60% on token spend and dramatically improves reliability.
The Six Orchestration Patterns
Sequential Pipeline
One agent, ordered steps, each dependent on the prior output. The default mental model. Minimal overhead — no agent spawning, no context forking.
Key primitives: --continue · multi-turn sessions · CI reviews

Headless / Batch
Non-interactive, print-and-exit via -p. Combines with --bare for deterministic, ambient-config-free runs. The backbone of CI/CD integration.
Operator / Orchestrator
One Opus/Sonnet orchestrator coordinates Haiku subagents for execution. Hierarchical planning-execution split. Typical 40–50% cost saving versus all-Sonnet.
Key primitives: --model opus · subagents · Task delegation

Split-and-Merge
Independent subtasks run concurrently across worktrees or spawned agents; results merge at end. Linear speedup on parallelisable work. Higher token cost.
Key primitives: --worktree · parallel fan-out

Agent Teams
Peer Claude instances — researcher, writer, reviewer — each with defined roles, tools, and model assignments. Collaborative, not hierarchical. Highest overhead.
Key primitives: --agents JSON · .claude/agents/ · multi-role

Worktree Isolation
Each agent or task gets its own git worktree — full branch isolation, zero merge conflicts. Auto-cleaned if unchanged. Used internally by /batch.
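The selection logic above can be sketched as a small decision function. The predicates are my own simplification of these six patterns, not anything Claude Code itself exposes:

```python
def choose_pattern(
    parallelisable: bool,
    needs_hierarchy: bool,
    needs_peer_roles: bool,
    interactive: bool,
    branch_isolation: bool = False,
) -> str:
    """Map the shape of a task to one of the six orchestration patterns.

    Checks run roughly from highest to lowest autonomy/cost, so the most
    expensive structure wins only when the task genuinely demands it.
    """
    if needs_peer_roles:
        return "Agent Teams"            # collaborative peers, highest overhead
    if needs_hierarchy:
        return "Operator / Orchestrator"  # planning/execution split
    if parallelisable:
        # Independent subtasks; worktrees when branch isolation matters
        return "Worktree Isolation" if branch_isolation else "Split-and-Merge"
    # Linear work: interactive session vs. print-and-exit automation
    return "Sequential Pipeline" if interactive else "Headless / Batch"
```

Reading the function top to bottom reproduces the cost spectrum: you fall through to the cheapest pattern that fits.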
Deep Dive: The Operator Pattern
The Operator pattern is the workhorse for complex, multi-domain tasks. It separates strategic reasoning (orchestrator) from tactical execution (subagents), mirroring how senior engineers actually delegate — the lead architect defines the approach; specialists implement it.
(Diagram: an orchestrator coordinating three subagents: haiku · Read, Grep; sonnet · Bash, Read; haiku · Read, Grep.)
The key architectural insight: exploration tokens should never pollute the orchestrator's context window. When you instruct the orchestrator to "use an explore agent to find all authentication-related files," those file reads happen in a subagent context that is discarded after use. The orchestrator only receives the distilled result.
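A toy illustration of that isolation boundary, in plain Python that simulates the runtime rather than calling it: the bulky file reads live in the subagent's scratch context and are discarded on return, while only the distilled summary enters the orchestrator's context.

```python
def explore_in_subagent(files: dict[str, str], needle: str) -> str:
    """Simulate a subagent: read everything into its own scratch context,
    return only a distilled one-line summary to the caller."""
    scratch: list[str] = []          # subagent-local context, dropped on return
    hits: list[str] = []
    for path, content in files.items():
        scratch.append(content)      # the expensive raw reads stay here
        if needle in content:
            hits.append(path)
    return f"{len(hits)} file(s) reference '{needle}': {', '.join(sorted(hits))}"

orchestrator_context: list[str] = []
repo = {
    "auth/login.py": "def login(): check_password()",
    "auth/token.py": "def issue_token(): sign()",
    "ui/home.py": "render()",
}
# Only the distilled result ever enters the orchestrator's context window.
orchestrator_context.append(explore_in_subagent(repo, "token"))
```

The real runtime enforces this boundary for you; the point of the sketch is simply that the orchestrator's "memory" grows by one summary line, not by the full contents of every file the explorer touched.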
Defining subagents
Subagents live in .claude/agents/ (project-scoped, team-shared) or ~/.claude/agents/ (personal, all projects). Frontmatter controls model, tools, permission mode, and invocation behaviour:
```markdown
---
name: security-reviewer
description: Expert security code reviewer. Use PROACTIVELY after any changes to authentication, authorisation, or data handling.
model: claude-haiku-4-5
tools: Read, Grep, Glob, Bash
permissionMode: plan
---

You are a senior security engineer. When invoked:
1. Identify files recently changed (git diff)
2. Analyse for OWASP Top 10 vulnerabilities
3. Check for secrets, SQL injection, improper auth
4. Return structured findings with severity ratings
```
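Because the format is plain YAML frontmatter over Markdown, agent definitions are easy to lint in CI before they ship to a team. A minimal parser sketch, assuming the layout shown above; the real loader is Claude Code's own, and `parse_agent_definition` is a hypothetical helper name:

```python
def parse_agent_definition(text: str) -> tuple[dict[str, str], str]:
    """Split a .claude/agents/*.md file into (frontmatter dict, system prompt).

    Assumes the file starts with a '---' delimited frontmatter block and
    that each frontmatter line is a simple 'key: value' pair.
    """
    _, frontmatter, body = text.split("---", 2)
    meta: dict[str, str] = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")   # split on the first colon only
        meta[key.strip()] = value.strip()
    return meta, body.strip()
```

A CI gate can then assert that every agent declares a `model` and stays within an approved tool set, before any of it reaches a runtime.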
Headless CLI vs Agent SDK: A Regulated Environment Comparison
This is where the architecture decision becomes consequential for enterprise teams. Claude Code's --bare -p mode (headless CLI) and the Python/TypeScript Agent SDK both enable programmatic execution — but they represent fundamentally different control postures.
The headless CLI is optimised for speed and simplicity. The Agent SDK is optimised for auditability, governance, and integration. In regulated environments — financial services, healthcare, government — the tradeoffs below are not academic.
| Dimension | Headless CLI · `--bare -p` | Agent SDK · Python / TypeScript |
|---|---|---|
| Invocation model | Shell subprocess. Prompt as CLI arg or stdin. Fire-and-forget or capture stdout. | Native library call. Full async control, callbacks, and structured message objects. |
| Tool approval | `--allowedTools` pattern list declared at launch. Static per invocation. **Coarse-grained** | Per-tool-call approval callbacks. Approve, deny, or modify tool inputs programmatically. **Fine-grained** |
| Audit trail | JSON output with session metadata. Tool calls visible in `stream-json` mode. No native structured event log. **Basic** | Full message event stream with typed objects. Every tool use, tool result, and model turn is a first-class event. Emit directly to SIEM / audit log. **Full event log** |
| Secrets management | `ANTHROPIC_API_KEY` env var or `--settings` JSON. Exposed as process env in container. **Env-level** | Injected at runtime via SDK config. Integrates directly with AWS Secrets Manager, Vault, or KMS; no env var required. **Runtime injection** |
| Human-in-the-loop | No native mid-run pause. Workaround: permission modes or multi-invocation chains with human-gated shell scripts. **Limited** | First-class approval gates. Pause execution, surface context to a human reviewer, resume or abort, all within one session. **Native** |
| Error handling | Retry events emitted as `system/api_retry` in `stream-json`. Catch in shell; implement custom backoff externally. **External** | Typed exceptions, retry hooks, and custom backoff strategies in-process. Structured error taxonomy. **In-process** |
| Session continuity | Session IDs via `--resume`. Stateless between invocations; must capture and thread IDs externally. **Manual** | Session object persisted in-process. History, context, and tool state managed natively across multi-turn flows. **Managed** |
| Structured output | `--output-format json` + `--json-schema`. Schema validation post-hoc via `jq` or shell parsing. **Schema-constrained** | Native typed response objects. Pydantic / Zod validation in-process. No shell parsing required. **Typed native** |
| MCP integration | `--mcp-config <file>` at launch. Static per invocation. Cannot swap servers mid-run. | MCP servers added/removed programmatically at runtime. Dynamic tool surfaces per workflow stage. |
| Context window control | `--bare` strips ambient context. `/compact` in-session only. No programmatic trim. **Coarse** | Full programmatic control over message history. Inject, truncate, or rewrite context between turns. **Precise** |
| Compliance controls | Permission modes, `--disallowedTools`. Governance via wrapper scripts and container policy. **Perimeter-based** | Inline policy enforcement. Block tool calls matching PII patterns, enforce dry-run mode for production, log all model decisions with full context. **Inline governance** |
| Setup complexity | `npm install -g @anthropic-ai/claude-code`. Works immediately. No code required. **Minimal** | SDK package + agent loop implementation + callback wiring. Meaningful initial investment. **Higher** |
| Best fit | CI code review, automated refactors, nightly test runs, developer tooling | BFSI workflows, regulated deploys, audit-required tasks, human-in-the-loop flows |
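The tool-approval row is the one that matters most in regulated settings. The sketch below shows the shape of such a policy as a plain function; the exact callback signature varies by SDK version, so treat the wiring (a per-tool-call approval hook) as an assumption and the policy logic as the transferable part:

```python
import re

# Illustrative PII detector: US SSN-shaped strings. A real deployment would
# use the organisation's own DLP patterns.
PII_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
ALLOWED_TOOLS = {"Read", "Grep", "Glob"}

def approve_tool_call(tool_name: str, tool_input: dict) -> tuple[bool, str]:
    """Inline policy gate: return (allowed, reason) for a single tool call.

    In the Agent SDK this logic would sit inside a per-tool-call approval
    callback; here it is a plain function so the policy itself is auditable
    and unit-testable in isolation.
    """
    if tool_name not in ALLOWED_TOOLS:
        return False, f"tool '{tool_name}' is not on the allow-list"
    if PII_PATTERN.search(str(tool_input)):
        return False, "tool input matches a PII pattern"
    return True, "ok"
```

Every `(allowed, reason)` pair can be logged alongside the full tool input, which is exactly the audit trail the headless CLI cannot produce natively.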
When to Use Which
A practical decision guide based on the shape of your task and your compliance posture:
- CLI: PR review automation, nightly code quality gates, automated documentation generation. Any workflow that is well-defined, stateless, and repeatable without mid-run decisions.
- CLI: Developer inner loop. Interactive `claude` plus `--worktree` for exploratory parallel spikes. Low ceremony, high velocity.
- SDK: Any workflow that touches production data, financial records, or PII, where every tool invocation must be logged with full context and potentially human-approved.
- SDK: Multi-stage deployment pipelines in regulated environments where each stage requires a policy check. The SDK's per-call approval callback is the right primitive.
- BOTH: Complex enterprise workflows can layer both. Headless CLI for the non-sensitive planning and analysis phase, the SDK for the execution phase that touches production systems. The session ID bridges them.
Token Discipline in Agent Loops
Token cost in agentic workflows is not linear with task complexity — it compounds. A 100-iteration debug loop with unrestricted Bash tool use adds approximately 24,500 extra input tokens in tool-call overhead alone. Four practical controls make the largest difference:
```bash
# 1. --bare strips ambient context (CLAUDE.md, skills, MCP, auto-memory)
# 2. --disallowedTools removes tool schemas from context
# 3. --effort low reduces thinking budget on well-scoped tasks
# 4. Haiku for the subagent; Sonnet only for final synthesis
claude --bare \
  -p "Review the auth module for security issues" \
  --model claude-haiku-4-5 \
  --disallowedTools "Bash,Edit" \
  --allowedTools "Read,Grep,Glob" \
  --effort low \
  --output-format json \
  --json-schema '{"type":"object","properties":{"findings":{"type":"array"}}}'
```
The escalation ladder is the single most impactful heuristic: Haiku → Sonnet → Opus. Start with Haiku. If the output feels shallow or the reasoning is wrong, escalate. Reserve Opus for architectural decisions, complex synthesis, and orchestrator-level planning. Never use Opus for file reads or grep operations.
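The ladder is mechanical enough to encode. A sketch with the model runner and the quality check injected by the caller; the tier names are placeholders for your deployment's concrete model IDs, and the shallow-output detector is whatever heuristic or rubric your team trusts:

```python
from typing import Callable

# Substitute concrete model IDs for your deployment; the tiers are what matter.
LADDER = ["haiku", "sonnet", "opus"]

def run_with_escalation(
    task: str,
    run: Callable[[str, str], str],      # (model_tier, task) -> output
    good_enough: Callable[[str], bool],  # detects shallow or wrong output
) -> tuple[str, str]:
    """Walk the Haiku -> Sonnet -> Opus ladder, stopping at the first tier
    whose output passes the quality check. Opus is only ever paid for
    when both cheaper tiers fall short."""
    output = ""
    for model in LADDER:
        output = run(model, task)
        if good_enough(output):
            return model, output
    return LADDER[-1], output  # top of the ladder is the best available effort
```

The injected `good_enough` check is the whole trick: with even a crude heuristic (output length, presence of required sections, a schema validation), most tasks terminate at the Haiku or Sonnet rung.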
Closing Thoughts
Claude Code is often framed as a developer productivity tool. That framing undersells it. What Anthropic has built is a programmable agent runtime with a conversational interface as its most visible surface.
The orchestration patterns are the architecture. The CLI flags are the configuration. The Agent SDK is the governance layer. Used deliberately — with the right pattern for the right task and the right tool for the right environment — this is a genuine step change in how we build and operate software systems.
Concretely: the Operator pattern with explicit subagent definitions, --bare in CI, and the Agent SDK at production execution boundaries is the architecture I would recommend to any enterprise team adopting agentic AI today.
What patterns are you using? Happy to discuss implementation specifics in the comments — particularly around MCP integration, hooks for governance, and CLAUDE.md design for multi-domain platforms.