📜 proof · one real run

What one real GreatCTO run actually looks like.

Feature: ship a voice-AI compliance pack (TCPA, STIR/SHAKEN, state recording-consent). Every stage below is real. Every artifact links to public GitHub source. No screenshots. No staged demos.

The run · 2026-05-14

From prompt to shipped in 1h 26m.

One feature: add a domain-pack overlay so voice-AI startups (Sierra, Cresta, Phonely, …) get the right gates automatically. Below: every stage, every agent, every artifact. PR #22 merged at 17:17:48 +0200.

wall-clock: 1h 26m
LLM cost: ~$3.40
human gates: 2 (plan + ship)
lines shipped: 414 (+ test infra)

15:51 · T+0 operator (human)
Prompt: "Voice-AI startups (Sierra, Cresta, Phonely) keep tripping on TCPA + state recording-consent + STIR/SHAKEN. Ship a domain pack so the gates fire automatically when the archetype is detected."
15:54 · T+3m architect ~$0.32
Read existing pack examples (clinical-pack, hr-ai-pack). Identified the contract: a pack ships 1 pack-spec + 1 reviewer agent + ≥3 EVAL fixtures + 1 CLI signal in packs.ts. Drafted ARCH.note listing the 5 OWASP-LLM threats specific to voice (PII leakage in transcripts, prompt injection via caller speech, synth-voice disclosure, call-handoff identity drift, recording-consent state-by-state matrix).
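The contract the architect identified can be written down as a small check. This is a minimal sketch, not GreatCTO code: `PackManifest`, `checkPackContract`, and the EVAL file names are hypothetical. Only the four required pieces and the two file paths named in this run come from the page.

```typescript
// Hypothetical manifest describing what a pack ships (field names are assumptions).
interface PackManifest {
  name: string;
  specPath: string | null;     // the pack-spec
  reviewerPath: string | null; // the reviewer agent prompt
  evalPaths: string[];         // EVAL fixtures
  hasCliSignal: boolean;       // detection signal present in packs.ts
}

// The contract asks for at least 3 EVAL fixtures.
const MIN_EVALS = 3;

// Returns a list of contract violations; an empty list means the pack conforms.
function checkPackContract(pack: PackManifest): string[] {
  const problems: string[] = [];
  if (pack.specPath === null) problems.push("missing pack-spec");
  if (pack.reviewerPath === null) problems.push("missing reviewer agent");
  if (pack.evalPaths.length < MIN_EVALS) {
    problems.push(`needs >= ${MIN_EVALS} EVAL fixtures, found ${pack.evalPaths.length}`);
  }
  if (!pack.hasCliSignal) problems.push("missing CLI signal in packs.ts");
  return problems;
}

// Example manifest; the EVAL file names are placeholders, not the real ones.
const voicePack: PackManifest = {
  name: "voice-pack",
  specPath: "skills/great_cto/packs/voice-pack.md",
  reviewerPath: "agents/voice-ai-reviewer.md",
  evalPaths: ["EVAL-a.md", "EVAL-b.md", "EVAL-c.md", "EVAL-d.md"],
  hasCliSignal: true,
};

// Empty when all four pieces are present.
const contractProblems = checkPackContract(voicePack);
```

Returning a list of problems rather than a boolean lets a reviewer see every missing piece in one pass.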
15:58 · T+7m ⚐ GATE: PLAN · operator (human) ~30 s
Operator reviewed the ARCH.note. Approved scope: voice-pack, 4 evals (handoff, PII, injection, synth-disclosure), TCPA + STIR/SHAKEN + state-recording-consent in the threat model. Rejected scope creep: no IVR-specific overlay yet (defer to v2.9).
APPROVED · proceed with 4-eval pack · skip IVR overlay
15:59 · T+8m pm ~$0.08
Decomposed into 4 independent tasks (no dependency between them, so parallelize): voice-pack.md spec, voice-ai-reviewer.md agent, EVAL fixtures × 4, CLI signal in packs.ts. Filed as beads tasks.
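The "no dependency, so parallelize" decision can be illustrated with a small planner that groups tasks into waves. This is a sketch under assumptions: the `Task` shape and `planWaves` are hypothetical, and the page does not show how beads tasks record dependencies.

```typescript
// Hypothetical task record; ids and the dependsOn field are assumptions.
interface Task {
  id: string;
  dependsOn: string[];
}

// Groups tasks into waves: every task in a wave depends only on earlier waves,
// so all tasks within one wave can run in parallel.
function planWaves(tasks: Task[]): string[][] {
  const done: string[] = [];
  const waves: string[][] = [];
  let remaining = tasks.slice();
  while (remaining.length > 0) {
    const ready = remaining.filter(t =>
      t.dependsOn.every(d => done.indexOf(d) !== -1)
    );
    if (ready.length === 0) {
      throw new Error("dependency cycle or unknown dependency");
    }
    const ids = ready.map(t => t.id);
    waves.push(ids);
    ids.forEach(id => done.push(id));
    remaining = remaining.filter(t => ids.indexOf(t.id) === -1);
  }
  return waves;
}

// The four tasks from this run, none of which depends on another.
const voicePackTasks: Task[] = [
  { id: "voice-pack.md spec", dependsOn: [] },
  { id: "voice-ai-reviewer.md agent", dependsOn: [] },
  { id: "EVAL fixtures x 4", dependsOn: [] },
  { id: "CLI signal in packs.ts", dependsOn: [] },
];

// With no dependencies, everything lands in a single wave.
const taskWaves = planWaves(voicePackTasks);
```

Because all four tasks have empty dependency lists, the planner yields one wave, which matches the four parallel senior-dev stages that follow.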
16:04 · T+13m senior-dev #1 (parallel) ~$0.42
Authored skills/great_cto/packs/voice-pack.md — pack spec: detection signals (Twilio/Vonage/Retell SDKs, keywords "voice agent / IVR / phone tree"), gates added when pack overlays an archetype, references to laws and standards.
16:11 · T+20m senior-dev #2 (parallel) ~$0.68
Authored agents/voice-ai-reviewer.md — reviewer agent prompt: when to fire, gates it owns (gate:voice-compliance), threat model, pre-implementation checklist, sign-off criteria. 200 lines.
16:19 · T+28m senior-dev #3 (parallel) ~$0.74
Authored 4 EVAL fixtures, one per threat approved at the plan gate (handoff, PII, injection, synth-disclosure). Each fixture has a scenario, an expected verdict, a refusal pattern, and red-team probes. Used the existing EVAL-*.md template.
16:25 · T+34m senior-dev #4 (parallel) ~$0.18
Added voice-pack detection signals to packages/cli/src/packs.ts: exact-match keywords ("twilio", "vonage", "retell", "voice agent", "IVR"), README hints, dep tree probes. Wrote unit tests covering all signals.
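A detector along these lines might look like the sketch below. The keyword list is quoted from the step above; everything else is an assumption, including the function names, the whole-word matching rule, and the way dependency names are tokenized. The actual logic in packages/cli/src/packs.ts is not shown on this page.

```typescript
// Keywords quoted in the text; how packs.ts actually matches them is not shown,
// so the matching rules below are assumptions.
const VOICE_KEYWORDS = ["twilio", "vonage", "retell", "voice agent", "IVR"];
const VENDOR_TOKENS = ["twilio", "vonage", "retell"];

// Escapes regex metacharacters so a keyword is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// README hint: case-insensitive whole-word match,
// so "retelling" does not count as a hit for "retell".
function keywordHits(text: string): string[] {
  return VOICE_KEYWORDS.filter(kw =>
    new RegExp("\\b" + escapeRegExp(kw) + "\\b", "i").test(text)
  );
}

// Dependency probe: split each package name into tokens and look for a vendor token.
function dependencyHits(deps: string[]): string[] {
  return deps.filter(dep => {
    const tokens = dep.toLowerCase().split(/[^a-z0-9]+/);
    return VENDOR_TOKENS.some(v => tokens.indexOf(v) !== -1);
  });
}

// Suggest the pack when either kind of signal fires.
function suggestsVoicePack(readme: string, deps: string[]): boolean {
  return keywordHits(readme).length > 0 || dependencyHits(deps).length > 0;
}
```

For example, `suggestsVoicePack("Our voice agent books appointments", ["express"])` fires on the README keyword alone, while `suggestsVoicePack("A retelling of the docs", ["express"])` does not fire, because the match is whole-word.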
16:35 · T+44m ai-eval-engineer (review) ~$0.24
Verified each EVAL fixture covers a distinct threat (no overlap, no gaps). Checked refusal patterns are testable and not just narrative.
APPROVED · 4 evals cover the 4 threats with non-overlapping scenarios
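The "no overlap, no gaps" check has a mechanical part that can be sketched as follows. It only compares declared threat labels; judging whether two scenarios truly differ still needs a reviewer. `EvalFixture`, `coverageReport`, and the file names are hypothetical.

```typescript
// Hypothetical fixture record: which threat each EVAL file declares as its target.
interface EvalFixture {
  file: string;
  threat: string;
}

// The four threats approved at the plan gate.
const APPROVED_THREATS = ["handoff", "PII", "injection", "synth-disclosure"];

// Gaps are threats with no fixture; overlaps are threats with more than one.
function coverageReport(
  threats: string[],
  evals: EvalFixture[]
): { gaps: string[]; overlaps: string[] } {
  const countFor = (t: string): number =>
    evals.filter(e => e.threat === t).length;
  return {
    gaps: threats.filter(t => countFor(t) === 0),
    overlaps: threats.filter(t => countFor(t) > 1),
  };
}

// File names are placeholders that follow the EVAL-*.md pattern.
const voiceFixtures: EvalFixture[] = [
  { file: "EVAL-1.md", threat: "handoff" },
  { file: "EVAL-2.md", threat: "PII" },
  { file: "EVAL-3.md", threat: "injection" },
  { file: "EVAL-4.md", threat: "synth-disclosure" },
];

// Both lists are empty when each threat has exactly one fixture.
const coverage = coverageReport(APPROVED_THREATS, voiceFixtures);
```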
16:42 · T+51m ai-security-reviewer (review) ~$0.31
Mapped each EVAL to OWASP LLM Top-10. Flagged one gap: missing eval for LLM-08 (excessive agency via tool use). Asked: "Does the pack's reviewer agent enforce tool-allowlisting in the agent loop?"
PARTIAL · add tool-allowlist clause to voice-ai-reviewer.md OR ship eval for LLM-08 in v2.8.1
16:48 · T+57m senior-dev #2 (re-claim) ~$0.16
Added tool-allowlist clause to voice-ai-reviewer.md per the security reviewer's request. Re-submitted.
16:58 · T+1h7m tests/run-packs-e2e.mjs CI · $0
Ran the full pack-chain validator: voice-pack detection signals → reviewer agent → 4 EVAL files → CLI suggester. 47 assertions passed across the voice-pack fixture; 456 across all 10 packs.
PASS · 47/47 voice-pack assertions · CLI suggests voice-pack on Twilio fixture
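A chain validator of this kind can be pictured as ordered stages, each holding a list of checks. The sketch below is not the contents of tests/run-packs-e2e.mjs; the `Stage` shape, `runChain`, and the placeholder checks are all assumptions used to show how an "N/N assertions" tally could be produced.

```typescript
// A stage is a named list of boolean checks; names and checks here are placeholders.
interface Stage {
  name: string;
  checks: Array<() => boolean>;
}

interface ChainResult {
  passed: number;
  total: number;
  failedStage: string | null;
}

// Runs stages in order and tallies assertions. Stops after the first stage
// that contains a failure, because later stages depend on earlier ones.
function runChain(stages: Stage[]): ChainResult {
  let passed = 0;
  let total = 0;
  for (const stage of stages) {
    let stageOk = true;
    for (const check of stage.checks) {
      total += 1;
      if (check()) {
        passed += 1;
      } else {
        stageOk = false;
      }
    }
    if (!stageOk) {
      return { passed, total, failedStage: stage.name };
    }
  }
  return { passed, total, failedStage: null };
}

// Placeholder chain mirroring the four links named in the run.
const demoChain: Stage[] = [
  { name: "detection signals", checks: [() => true] },
  { name: "reviewer agent", checks: [() => true] },
  { name: "EVAL files", checks: [() => true, () => true] },
  { name: "CLI suggester", checks: [() => true] },
];

const chainResult = runChain(demoChain);
```

Stopping at the first failing stage keeps the report focused on the root cause instead of the downstream failures it triggers.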
17:10 · T+1h19m ⚐ GATE: SHIP · operator (human) ~30 s
Operator reviewed: all artifacts, the security-reviewer's PARTIAL → fixed chain, the e2e output. Approved merge to main.
APPROVED · ship as v2.8.0 · merged in PR #22
17:17 · T+1h26m devops ~$0.07
PR #22 merged at 17:17:48 +0200. CI green. npm publish v2.8.0 triggered.
17:20 · T+1h29m continuous-learner ~$0.05
Extracted lesson: "pack-rollout pattern" — every new pack ships exactly pack-spec + reviewer + ≥3 EVALs + CLI signal. Wrote to .great_cto/lessons.md. Promoted to ~/.great_cto/decisions.md after the 3rd pack hit (clinical-pack, hr-ai-pack, voice-pack).
LESSON SAVED · next pack rollout will skip 80% of the exploration time
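The promotion rule described here (project-level lesson first, global decision after the third pack) can be sketched as a simple counter. The threshold of 3 comes from the step above; the `Lesson` shape and `recordHit` are hypothetical, and the sketch does not read or write the actual lessons.md or decisions.md files.

```typescript
// Assumed rule: a lesson is promoted once it has been observed in this many packs.
const PROMOTION_THRESHOLD = 3;

interface Lesson {
  name: string;
  seenIn: string[]; // packs where the pattern was observed
}

// Records an observation (ignoring duplicates) and reports whether the lesson
// has now reached the promotion threshold.
function recordHit(lesson: Lesson, pack: string): boolean {
  if (lesson.seenIn.indexOf(pack) === -1) {
    lesson.seenIn.push(pack);
  }
  return lesson.seenIn.length >= PROMOTION_THRESHOLD;
}

const rollout: Lesson = { name: "pack-rollout pattern", seenIn: [] };
const afterClinical = recordHit(rollout, "clinical-pack"); // false: 1 pack so far
const afterHr = recordHit(rollout, "hr-ai-pack");          // false: 2 packs so far
const afterVoice = recordHit(rollout, "voice-pack");       // true: third pack, promote
```

Ignoring duplicate pack names means re-running the same pack does not count as new evidence for the pattern.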

⚠ Honest caveats

Want more proof?

9 more packs shipped the same way.

Each one followed the pack-rollout pattern. Each pack page on this site links back to its agent + EVALs + GitHub commit. Pick one to inspect.

All 10 packs ↗ · Architecture diagram ↗ · MTTR methodology ↗