Optimal Coding Agent

February 26, 2026

My blueprint for the optimal coding agent (team).
The core idea: keep LLMs in the smart zone with short, relevant context.

Usage patterns:

quick (small) edit
- skip to instructor agent
major edit
- full coding agent orchestra
new project
- deep research for best tech stack
- use proven project template

UX:

Provide idea -> discuss
Refine plan with designer -> submit to general
General creates the task tree -> review and submit
Watch the executors work -> review finished idea

IDEATOR / MUSE

Agent helps you crystallize your vision and define a hard success criterion (agent has web access to research ideas).

DESIGNER

Uses a knowledge base: a tool to query previously solved problems, plus web access (query tools like deepwiki.com). The designer also needs relevant context to create a good architecture (blog posts, papers, books, etc.).

translates idea into objects / modules and interactions
defines interfaces / boundaries for modules
produces a design with specifications
discusses the best tools (tech stack) for the design specs

GENERAL

Creates the battle plan and briefs the troops.

ORCHESTRATOR

Generate the task tree from design + tech stack.

create modular plan with task execution dependency tree
- every task has:
  - ID
  - Problem: typed input / output interface + internal behavior / structure
  - Criterion: success metric / test
  - Dependencies: task IDs
  - Complexity: low, mid, high
    - maps to small / big LLM
    - can assign multiple models to work on same task with different approaches
  - Context: added by INSTRUCTOR
  - State: defined, open, busy, trial, done, fail
  - StateHistory: List<(state, date, reason, author_agent)>
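
The task fields above could be captured in a small record type. This is only a sketch under my own assumptions: the class and method names (Task, StateChange, set_state, model_tier) are hypothetical, and sending "mid" tasks to the small model is a guess, since the list only says complexity maps to a small / big LLM.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class State(Enum):
    DEFINED = "defined"
    OPEN = "open"
    BUSY = "busy"
    TRIAL = "trial"
    DONE = "done"
    FAIL = "fail"


class Complexity(Enum):
    LOW = "low"
    MID = "mid"
    HIGH = "high"


@dataclass
class StateChange:
    """One StateHistory entry: (state, date, reason, author_agent)."""
    state: State
    date: datetime
    reason: str
    author_agent: str


@dataclass
class Task:
    id: str
    problem: str      # typed input / output interface + internal behavior
    criterion: str    # success metric / test
    dependencies: list[str] = field(default_factory=list)  # task IDs
    complexity: Complexity = Complexity.LOW
    context: str = ""  # added later by INSTRUCTOR
    state: State = State.DEFINED
    state_history: list[StateChange] = field(default_factory=list)

    def set_state(self, state: State, reason: str, author_agent: str) -> None:
        """Change the state and record who changed it, when, and why."""
        self.state = state
        self.state_history.append(
            StateChange(state, datetime.now(timezone.utc), reason, author_agent)
        )

    def model_tier(self) -> str:
        """Assumption: only high-complexity tasks go to the big LLM."""
        return "big" if self.complexity is Complexity.HIGH else "small"
```

Keeping the history as an append-only list makes it easy to see later why a task bounced between states.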

Refs:

https://github.com/steveyegge/beads
https://github.com/joshuadavidthomas/opencode-beads
https://github.com/Dicklesworthstone/beads_viewer
Why decentral task orchestration wins
Claude Code released tasks feature (similar to beads)

INSTRUCTOR

Provides every task with perfect (relevant) Context.

NEW: code templates for fresh projects
adds only relevant tools (MCP / CLI) to task context (prevents context bloat)
relevant, version-matched documentation for coding tasks
- Coding guidelines based on programming languages / code files involved for the task
- General use: official matching version docs / blog posts
- Existing code: analyse package dependencies
  - exact (version) matching code documentation
  - source code -> extract public function interfaces with doc strings
    - Python: .venv/lib/python3.11/site-packages/$package_name
    - JS: node_modules/
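
For the Python case, extracting public function interfaces with doc strings can be sketched with the standard library ast module. The function name public_interfaces is hypothetical, and this version only looks at top-level functions (no classes or methods), which is a simplification on my part.

```python
import ast


def public_interfaces(source: str) -> list[tuple[str, str]]:
    """Return (signature, docstring) pairs for public top-level functions."""
    tree = ast.parse(source)
    result = []
    for node in tree.body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if node.name.startswith("_"):
            continue  # skip private helpers
        prefix = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
        args = ast.unparse(node.args)  # needs Python 3.9+
        returns = f" -> {ast.unparse(node.returns)}" if node.returns else ""
        signature = f"{prefix} {node.name}({args}){returns}"
        result.append((signature, ast.get_docstring(node) or ""))
    return result


example = '''
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

def _hidden():
    pass
'''
interfaces = public_interfaces(example)
```

Running this over the files of an installed package gives a compact, version-exact summary that fits into a task context far better than the full source.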

EXECUTORS

Coding agents work on tasks in parallel, working through the task tree backwards and recursively (breadth first, from the leaf tasks towards the root task, the IDEA).
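
One way to get this order is to group tasks into waves, where every task in a wave has all its dependencies done and can run in parallel. A minimal sketch using graphlib from the Python standard library; the function name execution_waves and the example tree are made up for illustration.

```python
from graphlib import TopologicalSorter


def execution_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group task IDs into waves; tasks within a wave can run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all dependencies are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical tree: task "0" is the root (the IDEA), "3" is a leaf.
tree = {"0": {"1", "2"}, "1": {"3"}, "2": {"3"}, "3": set()}
waves = execution_waves(tree)
print(waves)  # [['3'], ['1', '2'], ['0']]
```

A real executor pool would call done() per task as each one finishes instead of waiting for the whole wave, so new tasks open up as early as possible.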

CREATOR

use LSP server for precise edits based on TASK problem and context
- optimal code exploration
- optimal code editing

VALIDATOR

validates whether CREATOR met the task's Criterion (test / metric)
- yes: task done
- no: retry task with modified context (add failures / learnings)
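
The retry branch could look roughly like the loop below. The create and validate callables stand in for CREATOR and VALIDATOR, and max_attempts is my assumption for when the VALIDATOR gives up; none of these names come from an existing tool.

```python
from typing import Callable


def run_with_retries(
    create: Callable[[str], str],
    validate: Callable[[str], tuple[bool, str]],
    context: str,
    max_attempts: int = 3,
) -> tuple[bool, str]:
    """Run create/validate until the criterion passes or attempts run out.

    Returns (passed, final_context).
    """
    for attempt in range(1, max_attempts + 1):
        result = create(context)
        passed, feedback = validate(result)
        if passed:
            return True, context
        # Add the failure to the context so the next attempt can learn from it.
        context += f"\n[attempt {attempt} failed] {feedback}"
    return False, context
```

The context only grows by one short line per failure, which keeps the retry cheap while still passing the learnings on.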

REFLECTOR

uses REPL / MCP to inspect live vars of code produced by CREATOR
fixes any issues spotted by VALIDATOR
persists learnings in the knowledge base for future reference

TASK Life Cycle

State: defined, open, busy, trial, done, fail

defined: ID, Problem, Criterion, Complexity
open: added Context by INSTRUCTOR
busy: CREATOR is coding
trial: CREATOR finished
VALIDATOR is happy -> done
VALIDATOR is unhappy -> open
VALIDATOR gives up -> fail
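
The life cycle can be written down as a transition table so that illegal jumps get caught early. A small sketch; treating done and fail as terminal states, and allowing busy to lead only to trial, are my assumptions.

```python
# Allowed moves between task states, following the life cycle above.
ALLOWED = {
    "defined": {"open"},                # INSTRUCTOR added Context
    "open": {"busy"},                   # CREATOR starts coding
    "busy": {"trial"},                  # CREATOR finished
    "trial": {"done", "open", "fail"},  # VALIDATOR verdict
    "done": set(),                      # assumed terminal
    "fail": set(),                      # assumed terminal
}


def transition(current: str, target: str) -> str:
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target
```

For example, transition("trial", "open") is the unhappy-VALIDATOR path, while transition("defined", "done") raises because the task never got its context.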

Typical Project:

task 0 (root task) is the IDEA itself and is validated last, once all subtasks are done
the first task to tackle is the last task we planned out:
set up the coding environment for the new project

User Interface:

saves chats as text files (easy to inspect whole context)
displays token usage / context limit for the current session

new:
when the agent sees multiple solutions, explore them in parallel

Minimal additions that would move it closer to “optimal”:
Add a GATEKEEPER (cheap, fast) before GENERAL
Input: idea + rough scope
Output: route → instructor-only | partial orchestra | full orchestra
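
As a rough illustration of the routing interface only: the thresholds and the estimated_files scope measure below are entirely made up, and in practice the decision would probably come from a cheap, fast model rather than fixed numbers.

```python
from enum import Enum


class Route(Enum):
    INSTRUCTOR_ONLY = "instructor-only"
    PARTIAL_ORCHESTRA = "partial orchestra"
    FULL_ORCHESTRA = "full orchestra"


def gatekeeper(idea: str, estimated_files: int) -> Route:
    """Pick a route from a rough scope estimate.

    `idea` is unused in this sketch; a real gatekeeper would likely
    hand it to a cheap model. The thresholds are hypothetical.
    """
    if estimated_files <= 1:
        return Route.INSTRUCTOR_ONLY
    if estimated_files <= 10:
        return Route.PARTIAL_ORCHESTRA
    return Route.FULL_ORCHESTRA
```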

Add post‑mortem synthesis (lightweight)
After task 0 validation:

what task boundaries were wrong?
what context was unnecessary?
what should become a template?
These don't add conceptual weight; they reduce entropy over time.

References

Existing Agents:

https://www.mihaileric.com/The-Emperor-Has-No-Clothes/
- https://news.ycombinator.com/item?id=46545620
https://pi.dev/
https://github.com/openai/codex
https://github.com/jacobsparts/agentlib - really cool project - drops the agent directly into a Python REPL
https://agent-flywheel.com/flywheel
Verified Spec-Driven Development
replace claude code with cheaper + faster alternatives

Evaluate extending existing open-source coding agent TUI or GUI apps:

https://github.com/wandb/catnip
https://github.com/block/goose seems to be really close to my vision - check it out
https://thebob.dev/ai/tools/productivity/2025/10/31/why-we-built-claude-os-and-what-it-actually-is/
https://anandchowdhary.com/blog/2025/running-claude-code-in-a-loop
https://ampcode.com/

Fast LLMs:
groq.com, https://www.inceptionlabs.ai/, https://chat.z.ai/, https://huggingface.co/chat/

#coding #AI