Every time someone runs npx great-cto init, the CLI has to decide:
- What kind of project is this? (one of ~25 archetypes)
- Which compliance packs apply on top? (voice / clinical / fintech / lending / 6 more)
- Are any of those guesses wrong enough that the user will get a useless threat model and abandon the tool?
That last question is what makes the detection logic interesting. Get it wrong and the first impression is "this is producing nonsense about regulations I don't care about." Get it too conservative and the user has to manually configure packs that should have auto-attached, defeating the point.
After four months in production, here is what works.
What I tried first: LLM-based detection
Original design (rejected after 2 weeks): pipe the repo's README, package.json, and top-level directory listing into Claude and ask it to classify.
Problems, in order of severity:
- Latency. First run of `init` now takes 12-18 seconds instead of <1s. Users perceive this as broken.
- Cost. Roughly $0.04 per `init`. Negligible per user, real money at scale.
- Hallucinations. Claude classified a Helm chart for an internal Kubernetes operator as "fintech, because the README mentions billing in the Operator's logging section." It does not. The word "billing" appeared once, describing log volume.
- Variance. Same repo, same prompt, two runs: voice-AI then mlops. Probably temperature noise. Not acceptable for a decision that shapes the rest of the pipeline.
Killed it. Went to a regex-based detector. Latency dropped from 15s to 180ms. Cost dropped to $0. Variance dropped to zero.
The trade-off: regex cannot read intent. It reads tokens. A repo that says it does voice AI in its README but actually contains a music-recommender model will get the voice pack. That is a false positive I accept because the alternative (LLM in the loop) had its own false positives and was 80× slower.
The current detector
Three signal layers:
Layer 1: package.json dependencies. twilio / livekit / deepgram / elevenlabs → voice pack. stripe / plaid / dwolla → fintech. tensorflow / pytorch + transformers → ml-pack (different from voice-pack). And so on for ~80 strong-signal tokens.
Layer 2: file paths. clinical/, fda/, phi/, hipaa/ in directory names → clinical pack. webhook/ + signature-related code → api-platform-pack.
Layer 3: README + top-level docs grep. Exact-match keywords only, not fuzzy. "AEDT", "automated employment decision", "NYC Local Law 144" → hr-ai pack. "21 CFR Part 11", "SaMD", "FDA pre-submission" → clinical pack.
Each pack has a minimum signal count. voice-pack needs ≥2 of its 11 tokens. fintech needs ≥3 of 14. This threshold is what cut false positives roughly in half.
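For concreteness, here is a minimal sketch of that scoring in the shape detect.ts takes. The interfaces, the detectPacks name, and the truncated token lists are mine for illustration; the shipped tables hold ~80 tokens.

```typescript
// Minimal sketch of the three-layer scoring. Token lists are illustrative,
// not the shipped tables in packages/cli/src/detect.ts.
interface RepoSignals {
  deps: string[];      // layer 1: dependency names from package.json
  paths: string[];     // layer 2: repo-relative directory paths
  docTokens: string[]; // layer 3: exact-match hits from README + top-level docs
}

interface PackRule {
  pack: string;
  deps: string[];
  paths: string[];
  keywords: string[];
  minSignals: number;  // the per-pack threshold that cut false positives in half
}

const RULES: PackRule[] = [
  {
    pack: "voice-pack",
    deps: ["twilio", "livekit", "deepgram", "elevenlabs"],
    paths: [],
    keywords: [],
    minSignals: 2, // ≥2 of its 11 tokens
  },
  {
    pack: "fintech",
    deps: ["stripe", "plaid", "dwolla"],
    paths: [],
    keywords: ["billing"], // illustrative; a solo hit never clears the bar
    minSignals: 3, // ≥3 of 14
  },
];

export function detectPacks(s: RepoSignals): string[] {
  return RULES.filter((rule) => {
    const hits =
      rule.deps.filter((d) => s.deps.includes(d)).length +
      rule.paths.filter((p) => s.paths.some((dir) => dir.includes(p))).length +
      rule.keywords.filter((k) => s.docTokens.includes(k)).length;
    return hits >= rule.minSignals; // all-or-nothing: no confidence score
  }).map((rule) => rule.pack);
}
```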
The false positives I have logged
Across 4 months and ~340 init runs (counted via telemetry), 12 confirmed false positives:
| repo type | wrongly attached pack | trigger | fix |
|---|---|---|---|
| static-site generator | voice-pack | README explicitly disclaiming Twilio | exact-match keywords only |
| music-recommender ML | voice-pack | "audio" in package description | removed "audio" as solo trigger |
| internal Helm chart | fintech | "billing" in operator log section | minimum 3 signals |
| docs-only repo | clinical | "patient" in user-research subfolder | excluded docs/ from path scan |
| game-server prototype | mlops | torch in optional dev-dep | only scan dependencies, not devDependencies |
| 7 others | various | various | each addressed via test case in tests/detection.test.mjs |
The 12 cases are committed as regression tests. If the detector ever re-introduces one of these false positives, CI fails.
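The actual suite is tests/detection.test.mjs; one such case might look roughly like this, written against the hypothetical detectPacks from the sketch above (the node:test harness is an assumption):

```typescript
// One regression case, sketched. detectPacks is the illustrative function
// above; the shipped suite lives in tests/detection.test.mjs.
import { test } from "node:test";
import assert from "node:assert/strict";

test("helm chart: one 'billing' mention must not attach fintech", () => {
  const packs = detectPacks({
    deps: [],                    // no stripe / plaid / dwolla
    paths: ["charts/operator/"],
    docTokens: ["billing"],      // appeared once, describing log volume
  });
  assert.ok(!packs.includes("fintech")); // 1 signal < minimum of 3
});
```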
The case I worry about: silent false negatives
Easier to log a false positive (user complains "why is this thing telling me about TCPA"). Harder to catch a false negative (user runs init on a repo that should have hr-ai pack attached, doesn't, ships with no bias audit, gets fined two years later).
Mitigations:
- `/migrate` command. Rerun detection with updated rules. New packs (or new keywords for existing packs) get a second chance to attach.
- PROJECT.md is editable. The `packs:` list is plain YAML. Users can add a pack manually if detection missed it (see the example after this list).
- Public catalogue. greatcto.systems/companies.html lists 200+ companies and the packs that would auto-attach to each. If a user's similar competitor is in the catalogue, they get a sanity check on whether their detection is correct.
- Telemetry on no-pack runs. When init detects zero packs, we log it (anon, opt-in). If a class of project keeps coming through with no pack and the cost-of-miss is high (regulated industry), I add detection rules.
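For the manual escape hatch, a hypothetical PROJECT.md fragment; only the packs: list is described in this post, so the surrounding shape is an assumption:

```yaml
# PROJECT.md (illustrative fragment; only packs: is documented above)
packs:
  - voice-pack
  - hr-ai    # added by hand after detection missed it
```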
I have not had a confirmed regulatory false negative yet. That is partly because the user population is small (~500 active installs as of writing) and partly because the high-stakes archetypes (clinical, fintech, lending) have strong-signal vocabulary that is hard to miss.
What I will not add
People keep asking for two features I have rejected:
- "Pack confidence scores." The detector should output 0-1 confidence per pack so the user can sort. I rejected this: it implies a precision the regex layer does not actually have, and users will treat a 0.6 score as "halfway right" when really it means "one signal matched, probably noise."
- "Auto-update detection from telemetry." If we see 10 users with
xyzin their repo overriding our detection, automatically addxyzas a fintech signal. Rejected: too easy to poison. One determined attacker registers 10 fakexyz/random-namerepos with manual fintech tags and the global detector starts attaching fintech to everyone usingxyz.
Both of these are textbook examples of "the obvious feature that becomes a backdoor."
What I might add
- LLM in the loop, but only for ambiguous cases. If 2+ packs have signal but below threshold for any one, pipe the README into Claude with a strict "pick one or 'unclear'" prompt. Latency penalty only on the 5-10% of repos that are ambiguous, not all of them. A sketch of that gate follows this list.
- Per-language detection. Right now everything assumes Node/Python/JVM-ish patterns. Rust and Go projects sometimes have weak signal even when they are clearly fintech or healthcare. Not urgent: those communities are smaller in the user base.
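The ambiguity gate would be cheap to express. A sketch under my assumptions: the score shape, the prompt wording, and callClaude are all placeholders, not a shipped API.

```typescript
// Sketch of the possible fallback: fires only when 2+ packs have some
// signal but none clears its threshold. callClaude is a stand-in for
// whatever client this would use; nothing here ships today.
async function resolveAmbiguous(
  scores: Map<string, { hits: number; min: number }>,
  readme: string,
  callClaude: (prompt: string) => Promise<string>
): Promise<string | null> {
  const entries = [...scores.entries()];
  const anyCleared = entries.some(([, s]) => s.hits >= s.min);
  const partial = entries.filter(([, s]) => s.hits > 0 && s.hits < s.min);
  if (anyCleared || partial.length < 2) return null; // regex result stands

  const candidates = partial.map(([pack]) => pack);
  const prompt =
    `Classify this README as exactly one of: ${candidates.join(", ")}, ` +
    `or the single word "unclear". Reply with one token only.\n\n${readme}`;

  const answer = (await callClaude(prompt)).trim();
  return candidates.includes(answer) ? answer : null; // "unclear" → no pack
}
```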
The detection logic is small, boring, and one of the parts of the system I am most protective of. It is the first thing every user sees, and a wrong first guess loses them.
About: I build GreatCTO β a multi-agent SDLC plugin for Claude Code. MIT, runs locally. The detector source is in packages/cli/src/detect.ts β read or fork.