💰 COST 6 min read

Real cost breakdown: 10 packs, $0.60 LLM bill, $42K saved per regulated feature

Per-feature, per-MVP, per-quarter numbers. Hardware ratios, runway math, and the honest places where the savings stop.

This is the numbers post. If you read the ten-packs deep-dive and walked away wanting the spreadsheet, here it is.

All numbers below are from real client engagements (anonymized aggregates) plus telemetry from the GreatCTO install base. Not projections. Not vendor-pitch math.

Per-feature: $42K → $0.60 in LLM spend plus ~30 hours of human time

A single regulated feature in a single industry. Pre-pipeline:

Identify which regs apply          ~8h    × $200      = $1,600
Read primary regulation text      ~14h    × $200      = $2,800
Map regulation → stack            ~20h    × $250      = $5,000
Draft threat model                ~32h    × $250      = $8,000
Consent flow + UX                 ~20h    × $180      = $3,600
Implementation                    ~40h    × $180      = $7,200
Internal legal review              ~8h    × $400      = $3,200
External auditor pre-meeting      ~10h    × $350      = $3,500
Revisions                         ~16h    × mixed     = $3,500
Final signoff                      ~4h    × $400      = $1,600
                                  ─────                ─────
                                  ~172h               ~$40K
                                                      (rounded $42K with overhead)

With pipeline:

LLM compute (architect+reviewers)  ~$0.60-$1.40 per feature
Human review of LLM output         ~14-18h × mixed     ~$3,800
External auditor pre-meeting       ~6-8h   (lower because tighter document)
Internal legal                     ~8h     (unchanged)
                                   ─────                ─────
                                   ~28-34h              ~$11-14K

Net saved per feature: ~$28-30K and ~140 hours of human time. LLM bill is rounding error.
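
If you want to check the arithmetic, here is a minimal Python sketch of the per-feature math. The hours and rates are the ones in the breakdown above; the $42K loaded figure and the $11-14K with-pipeline range are the post's stated numbers, not re-derived.

```python
pre_pipeline = [
    # (task, hours, hourly rate in $) -- figures from the breakdown above
    ("identify applicable regs",       8, 200),
    ("read primary regulation text",  14, 200),
    ("map regulation to stack",       20, 250),
    ("draft threat model",            32, 250),
    ("consent flow + UX",             20, 180),
    ("implementation",                40, 180),
    ("internal legal review",          8, 400),
    ("external auditor pre-meeting",  10, 350),
    ("revisions (mixed rates)",       16, 3500 / 16),
    ("final signoff",                  4, 400),
]

pre_hours = sum(h for _, h, _ in pre_pipeline)       # ~172 h
pre_cost = sum(h * r for _, h, r in pre_pipeline)    # ~$40,000 nominal
pre_cost_loaded = 42_000                             # post's rounded figure incl. overhead

post_hours = (28, 34)                                # stated with-pipeline range
post_cost = (11_000, 14_000)

print(f"pre-pipeline: {pre_hours} h, ${pre_cost:,.0f} (~${pre_cost_loaded:,.0f} loaded)")
print(f"saved: ~${pre_cost_loaded - post_cost[1]:,.0f}-${pre_cost_loaded - post_cost[0]:,.0f}, "
      f"~{pre_hours - post_hours[1]}-{pre_hours - post_hours[0]} h per feature")
```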

The $0.60 number is per feature, not per MVP; some readers conflated the two. A small fintech feature on Claude Sonnet costs ~$0.60-$1.40 in LLM calls. A full MVP run with all 10 packs activated and ~30 features runs ~$500-$1,500 in LLM compute. Both numbers are honest; they describe different scopes.

Per-MVP: $287K → $128K (~55% reduction)

A voice-AI MVP, three months of work, traditional team composition: ~$438K nominal, ~$287K after overlap & efficient teaming.


With pipeline + agentic SDLC, same MVP, 6-8 weeks:

- 1 Product Manager × 2 months × $180/h × 120h/mo  = $43,200
- 2 Engineers × 2 months × $180/h × 140h/mo        = $100,800
- LLM compute across the whole run                 = ~$1,200
- Architecture review (1 sr human, 3 sessions)     = ~$3,000
- Security review (external, same)                 = ~$15,000 (unchanged — see "what doesn't compress")
- Compliance setup (pipeline output + ~12h review) = ~$5,500
- Misc                                             = ~$8,000
                                                     ─────────
                                                     ~$176K nominal
                                                     ~$128K after similar overlap savings

Net: ~$159K saved per MVP, ~45% time saved. Most of the saving is not the LLM bill — it is fewer engineer-months because senior-dev parallelism + auto-review compresses the build phase.
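
Same exercise for the per-MVP numbers, a small sketch built from the line items above; the overlap-adjusted figures ($128K pipeline, $287K traditional) are taken as stated rather than re-derived.

```python
# Per-MVP arithmetic for the with-pipeline run; line items from the list above.
pipeline_mvp = {
    "product manager (2 mo x 120 h x $180)": 1 * 2 * 120 * 180,   # $43,200
    "engineers (2 x 2 mo x 140 h x $180)":   2 * 2 * 140 * 180,   # $100,800
    "LLM compute (whole run)":               1_200,
    "architecture review (3 sessions)":      3_000,
    "security review (external)":            15_000,
    "compliance setup (+ ~12 h review)":     5_500,
    "misc":                                  8_000,
}

nominal = sum(pipeline_mvp.values())     # ~$176,700
after_overlap = 128_000                  # post's figure after overlap savings
traditional = 287_000                    # traditional MVP, after overlap savings

print(f"nominal ${nominal:,.0f}, after overlap ~${after_overlap:,.0f}")
print(f"saved   ~${traditional - after_overlap:,.0f} "
      f"({(traditional - after_overlap) / traditional:.0%} of the traditional MVP)")
```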

Per-quarter / per-runway: the bet that changes

For a founder shipping into one regulated industry (most realistic scenario):

|                                        | Traditional | Pipeline  | Saved                |
|----------------------------------------|-------------|-----------|----------------------|
| MVP time                               | 3 months    | 6-8 weeks | ~1.5 months          |
| MVP cost                               | $287K       | $128K     | $159K                |
| Compliance setup (4 features, year 1)  | $168K       | $48K      | $120K                |
| Year 1 total                           | $455K       | $176K     | $279K                |
| Equivalent runway months @ $50K burn   | 9.1 mo      | 3.5 mo    | 5.6 months recovered |
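
The runway row is just dollars saved divided by burn; a tiny sketch:

```python
# Runway math behind the last row: months of runway = dollars / monthly burn.
monthly_burn = 50_000

year1_traditional = 455_000   # $287K MVP + $168K compliance setup
year1_pipeline = 176_000      # $128K MVP + $48K compliance setup

def runway_months(cost, burn=monthly_burn):
    return cost / burn

print(f"traditional: {runway_months(year1_traditional):.1f} mo")                      # 9.1
print(f"pipeline:    {runway_months(year1_pipeline):.1f} mo")                         # 3.5
print(f"recovered:   {runway_months(year1_traditional - year1_pipeline):.1f} mo")     # 5.6
```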

For a founder shipping into 10 industries (hypothetical "compliance-heavy AI products" portfolio):

|                              | Traditional | Pipeline  | Saved     |
|------------------------------|-------------|-----------|-----------|
| Year 1 (10 MVPs × overlap)   | $1.45M      | $580K     | $870K     |
| Wall-clock (sequential)      | 30 months   | 10 months | 20 months |
| Wall-clock (with parallelism)| 21 months   | 7 months  | 14 months |

The 10-industry case is hypothetical — no real founder ships into all 10 simultaneously. But it shows the structural ratio: roughly 60% cost reduction, roughly 67% wall-clock reduction.

LLM compute: where the money goes

Per-MVP LLM compute, ~$500-$1,500 total, breaks down roughly:

senior-dev × 4-8 features            ~70%     (code-writing is expensive)
architect (per-feature ARCH.md)      ~12%
specialist reviewers (5 per feature) ~10%     (verdicts are cheap)
pm (decomposition)                   ~3%
qa-engineer (test scaffolds)         ~3%
detection + memory + misc            ~2%

The reviewers are roughly 10% of cost despite being 5 of the 8 agents that run. They output verdicts, not code. If your LLM cost is exploding, look at how much code is being generated, not how many agents are running.
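
Applying that split to the $500-$1,500 per-MVP range gives rough per-role dollars, which makes the point concrete: even at the top of the range the five reviewers together cost on the order of $150.

```python
# Rough split of per-MVP LLM spend by agent role, using the percentages above.
# The $500-$1,500 total is the post's range; the per-role dollars just apply the split.
total_range = (500, 1_500)

split = {
    "senior-dev (code-writing)":       0.70,
    "architect (per-feature ARCH.md)": 0.12,
    "specialist reviewers (verdicts)": 0.10,
    "pm (decomposition)":              0.03,
    "qa-engineer (test scaffolds)":    0.03,
    "detection + memory + misc":       0.02,
}

for role, share in split.items():
    lo, hi = (share * t for t in total_range)
    print(f"{role:34s} ${lo:6.0f} - ${hi:7.0f}")
```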

Hardware / model-choice ratios

We tested Sonnet 4.6 vs Haiku 4.5 vs Opus 4.5 on the same 23-feature batch:

| Model      | LLM cost ratio  | Wall-clock ratio | Architect output quality (human eval, blind)       |
|------------|-----------------|------------------|-----------------------------------------------------|
| Haiku 4.5  | 0.31×           | 0.74×            | "noticeably worse"; 4 of 23 ARCH docs unusable       |
| Sonnet 4.6 | 1.0× (baseline) | 1.0×             | acceptable, default                                  |
| Opus 4.5   | 5.1×            | 1.27×            | "marginally better"; 1 ARCH doc clearly superior     |

Conclusion: Sonnet is the default. Reserve Opus for the architect on greenfield features in unfamiliar territory, where deep-reasoning architecture decisions pay for the premium. Haiku works for high-volume worker agents (pair programming, code generation) where the ARCH note is not on the critical path.
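
In routing terms, that recommendation looks roughly like the sketch below. This is a hypothetical illustration, not GreatCTO's actual configuration format; the model labels are just the ones from the table above.

```python
# Hypothetical model-routing sketch reflecting the recommendation above.
def pick_model(role: str, greenfield: bool = False, off_critical_path: bool = False) -> str:
    if role == "architect" and greenfield:
        return "opus-4.5"    # deep-reasoning architecture in unfamiliar territory
    if role == "senior-dev" and off_critical_path:
        return "haiku-4.5"   # high-volume code generation, ARCH note not gating
    return "sonnet-4.6"      # default for everything else

assert pick_model("architect", greenfield=True) == "opus-4.5"
assert pick_model("senior-dev", off_critical_path=True) == "haiku-4.5"
assert pick_model("reviewer") == "sonnet-4.6"
```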

What does NOT compress

I have called this out before, but in numbers terms:

| Item                                                  | Compressible?                                                                  |
|-------------------------------------------------------|--------------------------------------------------------------------------------|
| External audit cycle (NYC bias auditor, 2-4 weeks)    | No                                                                             |
| FDA pre-submission meeting (60-90 days)               | No                                                                             |
| IRB approval (clinical trials, 8-12 weeks)            | No                                                                             |
| Wet-lab validation (drug discovery)                   | No                                                                             |
| HARA signature (functional safety, 1 calendar moment) | No                                                                             |
| Lawyer reading the threat model                       | Compresses (LLM-written threat model is faster to read than human long-form)  |
| Regulator phone calls                                 | No                                                                             |

Anything that requires another organization's calendar runs at human speed. Internal work compresses 5-25×. External-dependency work does not move.
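
A toy model of why the external wait becomes the floor: treat a compliance cycle as internal prep plus an external wait, and compress only the internal share. The 3-week external wait is an assumed midpoint of the 2-4 week audit row above, purely for illustration.

```python
# Compress only the internal share of a compliance cycle; the external wait is fixed.
internal_weeks = 6.0
external_weeks = 3.0   # assumed midpoint of the 2-4 week external audit row

for compression in (1, 5, 25):
    cycle = internal_weeks / compression + external_weeks
    print(f"{compression:>2}x internal compression -> ~{cycle:.1f} week cycle")
# 1x  -> ~9.0 weeks
# 5x  -> ~4.2 weeks
# 25x -> ~3.2 weeks   (the external wait is now most of the cycle)
```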

For an early-stage AI startup on 18-24 month runway, the bet that changes is the internal portion. You can now run 3 external compliance cycles per year instead of 1.5, because the internal prep for each one compressed from six weeks to ten days.

The thing I underbet

When I started building the packs, I assumed the ROI claim would be "30-40% on compliance cost." The number ended up larger and the shape surprised me — most of the saving is not the LLM compute (it is rounding error) but the fewer engineering-months the parallelism enables, plus the fewer consulting hours the LLM-drafted threat model enables.

If you take one number from this post: the LLM compute is not the moat. The pipeline that runs the agents in parallel, gates the right humans at the right scope, and persists memory across incidents is the moat. The LLM is the substrate.


About: I build GreatCTO — a multi-agent SDLC plugin for Claude Code with 10 compliance packs. MIT, runs locally. Pay your own LLM API. Per-pack numbers (which 10 industries, what each pack does, real consulting-rate comparisons) are in the W21 deep-dive.