
June under the hood: the board becomes a pult (a control panel), prompts evolve behind a holdout gate, logs shrink 99.5%

Approve a gate → an agent spawns and streams live. Plus: a self-improvement loop with anti-overfit gating, $0 context compression, scope-pinned task briefs, and Fable 5 support.

The last two posts were about the pivot — autopilots, live connectors, the operator console. This one is about the engine room: four upgrades that shipped in the same June sprint and that you'd otherwise only discover by reading the changelog. Users keep telling us they don't read the changelog. Fair.


1. The board is now a pult, not a mirror

Until v2.64 the dev board showed you the pipeline: tasks, gates, costs. To act on anything you went back to the terminal.

Now approving a gate (or pressing Run) spawns a Claude Code agent headlessly in the project and streams its output into the board — assistant text, tool calls, result, parsed from stream-json and pushed over SSE. There's a Run-agent panel with a prompt field and a live stream, and an Approve + ▶run button right on the gate card. Approve the plan, watch the implementation start, without touching a terminal.
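To make the streaming step concrete, here is a minimal sketch of the relay: newline-delimited JSON events in, Server-Sent Events frames out. It is not GreatCTO's actual code. The function name `to_sse_frames` is hypothetical, and it assumes only that each event is a JSON object with a `type` field.

```python
import json
from typing import Iterable, Iterator


def to_sse_frames(ndjson_lines: Iterable[str]) -> Iterator[str]:
    """Turn newline-delimited JSON events into Server-Sent Events frames.

    Assumption: each event is a JSON object with a "type" field
    (for example "assistant" or "result"). Anything else is forwarded as "raw".
    """
    for raw in ndjson_lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines between events
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            event = None
        if not isinstance(event, dict):
            # Forward unparseable output instead of silently dropping it.
            event = {"type": "raw", "text": raw}
        kind = str(event.get("type", "message"))
        # json.dumps escapes newlines, so the payload always fits one data line.
        payload = json.dumps(event, separators=(",", ":"))
        # SSE wire format: "event:" and "data:" fields, blank line ends the frame.
        yield f"event: {kind}\ndata: {payload}\n\n"
```

In a real server the input would be the agent process's stdout and each frame would be written to a response served as `text/event-stream`; both ends are left out here to keep the sketch self-contained.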

Running an autonomous agent that edits files from a web page is exactly as dangerous as it sounds, so the guardrails came first:

Verified end-to-end with a stub binary (all four guardrails, Stop button) and a real claude run.


2. Prompts now have to prove they got better

Every agent in GreatCTO learns from lessons. The uncomfortable question: when the system rewrites an agent's prompt based on a lesson, who checks the rewrite didn't make it worse?

v2.37 closed the loop, porting the generate→evaluate→gate cycle from hexo-ai/sia:

A learned improvement can no longer ship until it's re-proven on cases it never saw. The same loop later gated the compression layer below — turtles all the way down, but each turtle is tested.
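The gate itself is a small idea, and a sketch shows how small. This is an illustration under stated assumptions, not the code ported from hexo-ai/sia: the `holdout_gate` name, the `Scorer` signature, and the `min_gain` margin are all hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, Sequence

Case = Dict[str, str]                  # hypothetical case shape
Scorer = Callable[[str, Case], float]  # (prompt, case) -> score in [0, 1]


def holdout_gate(
    baseline: str,
    candidate: str,
    holdout: Sequence[Case],
    score: Scorer,
    min_gain: float = 0.0,
) -> bool:
    """Return True only if the candidate prompt beats the baseline on held-out cases.

    The holdout cases must not have been used to generate the candidate,
    otherwise the gate only measures overfitting.
    """
    if not holdout:
        return False  # nothing to prove the rewrite on, so it does not ship
    base = mean(score(baseline, case) for case in holdout)
    cand = mean(score(candidate, case) for case in holdout)
    return cand > base + min_gain
```

A toy run, with a scorer that only checks whether the prompt mentions the expected keyword:

```python
def keyword_score(prompt: str, case: Case) -> float:
    return 1.0 if case["expected"] in prompt else 0.0

cases = [{"expected": "retry"}, {"expected": "timeout"}]
holdout_gate("handle retry", "handle retry and timeout", cases, keyword_score)  # True
```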


3. Context compression: 31,475 chars of CI log → 155

Agents read logs, test output, JSON dumps. Most of it is repetition. v2.38 added a compression layer — deterministic, $0, no LLM, no native deps, concepts borrowed from chopratejas/headroom:

| Input | Result |
| --- | --- |
| CI log | 31,475 → 155 chars (−99.5%), FATAL/ERROR/stacks kept verbatim |
| JSON | −43% minified, −98% with array crush |
| Noisy test run | −86%, the FAIL preserved |
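For a feel of how a deterministic, LLM-free pass can get numbers like these, here is a deliberately simple stand-in: keep lines that look like errors or stack frames verbatim, and collapse every other run into a one-line marker. The real compressor is not shown in this post, so the `KEEP` pattern and the `compress_log` name are assumptions made for illustration.

```python
import re

# Assumption: lines worth keeping verbatim are FATAL/ERROR lines and stack frames.
KEEP = re.compile(r"\b(FATAL|ERROR)\b|^Traceback|^\s+at |^\s+File ")


def compress_log(text: str) -> str:
    """Keep error and stack lines verbatim; collapse other runs into a marker."""
    out = []
    dropped = 0
    for line in text.splitlines():
        if KEEP.search(line):
            if dropped:
                out.append(f"[{dropped} line(s) omitted]")
                dropped = 0
            out.append(line)
        else:
            dropped += 1
    if dropped:
        out.append(f"[{dropped} line(s) omitted]")
    return "\n".join(out)
```

Same input, same output, every time, which is what makes a pass like this cheap to test and safe to put in front of an agent.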

The part that makes aggressive compression safe: CCR — Compressed Context with Retrieval. Anything dropped is stored locally, content-addressed, and recoverable on demand; the memory filter appends a recall footer listing what it filtered. Lossless-on-demand. And a fidelity eval (through the v2.37 holdout gate, naturally) ensures a compressor only ships if the key fact survives.
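The content-addressed part can be sketched in a few lines. This is an assumption-laden illustration, not the shipped CCR: `RecallStore` and `with_recall_footer` are hypothetical names, the footer format is invented, and the store lives in memory here where a real one would persist to local disk.

```python
import hashlib
from typing import Dict


class RecallStore:
    """In-memory content-addressed store: the key is a hash of the content."""

    def __init__(self) -> None:
        self._blobs: Dict[str, str] = {}

    def put(self, text: str) -> str:
        # Identical content always maps to the same key, so duplicates cost nothing.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
        self._blobs[key] = text
        return key

    def get(self, key: str) -> str:
        return self._blobs[key]


def with_recall_footer(compressed: str, original: str, store: RecallStore) -> str:
    """Store the original and append a footer telling the reader how to recover it."""
    key = store.put(original)
    saved = len(original) - len(compressed)
    return f"{compressed}\n[filtered {saved} chars; recall id {key}]"
```

An agent that needs the dropped detail reads the id from the footer and calls `store.get(key)` to get the full original back.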

l3-support compresses logs and qa-engineer compresses test output before reasoning — fewer tokens spent re-reading the same stack trace twelve times.


4. Scope creep is now caught mechanically

The classic agent failure: asked to fix the webhook, also "improved" the auth module. v2.39 added governance inspired by NaCl, all machine-checkable at $0:
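One plausible shape for such a check, assuming a task brief pins the paths an agent may touch: compare the changed files against the allowed patterns and flag everything else. The brief format, the `out_of_scope` name, and the glob syntax are all assumptions here, not the v2.39 implementation.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def out_of_scope(changed: Iterable[str], allowed: Iterable[str]) -> List[str]:
    """Return the changed paths that match none of the allowed glob patterns.

    Note: in fnmatch-style globs, "*" also matches "/" characters,
    so "src/webhooks/*" covers nested directories too.
    """
    patterns = list(allowed)
    return [
        path
        for path in changed
        if not any(fnmatchcase(path, pattern) for pattern in patterns)
    ]
```

For the example above, a brief scoped to the webhook would flag the auth edit:

```python
out_of_scope(
    ["src/webhooks/handler.py", "src/auth/session.py"],
    ["src/webhooks/*", "tests/webhooks/*"],
)
# ['src/auth/session.py']
```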


Also in June

All of it: open source, MIT, zero telemetry, github.com/avelikiy/great_cto. The full gory detail lives in the CHANGELOG — but now you don't have to read it.

Also published on dev.to