
How GreatCTO chooses which compliance pack to attach

Regex vs LLM-based archetype detection, the false-positive count, and why I keep rejecting the obvious fix.

Every time someone runs npx great-cto init, the CLI has to decide:

  1. What archetype is this repo?
  2. Which compliance packs map to that archetype?
  3. Is the signal strong enough to attach them automatically?

That last question is what makes the detection logic interesting. Get it wrong and the first impression is "this is producing nonsense about regulations I don't care about." Get it too conservative and the user has to manually configure packs that should have auto-attached, defeating the point.

After four months in production, here is what works.

What I tried first: LLM-based detection

Original design (rejected after 2 weeks): pipe the repo's README, package.json, and top-level directory listing into Claude and ask it to classify.
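For concreteness, here is a minimal sketch of what that rejected design looked like. The model name, prompt, and the classifyRepo shape are illustrative reconstructions, not the actual v1 code:

```ts
import Anthropic from "@anthropic-ai/sdk";
import { readFile, readdir } from "node:fs/promises";

// Illustrative reconstruction of the rejected v1, not the shipped code.
async function classifyRepo(repoRoot: string): Promise<string> {
  const [readme, pkg, entries] = await Promise.all([
    readFile(`${repoRoot}/README.md`, "utf8").catch(() => ""),
    readFile(`${repoRoot}/package.json`, "utf8").catch(() => "{}"),
    readdir(repoRoot),
  ]);

  const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the env
  const msg = await client.messages.create({
    model: "claude-3-5-sonnet-latest",
    max_tokens: 16,
    messages: [{
      role: "user",
      content:
        "Classify this repo as one archetype: voice-ai, fintech, clinical, " +
        "hr-ai, mlops, or none. Reply with the label only.\n\n" +
        `README:\n${readme.slice(0, 4000)}\n\npackage.json:\n${pkg}\n\n` +
        `Top-level entries:\n${entries.join("\n")}`,
    }],
  });

  const block = msg.content[0];
  return block.type === "text" ? block.text.trim() : "none";
}
```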

Problems, in order of severity:

  1. Latency. First run of init went from <1s to 12–18 seconds. Users perceive this as broken.
  2. Cost. Roughly $0.04 per init. Negligible per user, real money at scale.
  3. Hallucinations. Claude classified a Helm chart for an internal Kubernetes operator as "fintech, because the README mentions billing in the operator's logging section." The repo has nothing to do with payments; the word "billing" appeared once, describing log volume.
  4. Variance. Same repo, same prompt, two runs: voice-AI one time, mlops the next. Probably temperature noise. Not acceptable for a decision that shapes the rest of the pipeline.

Killed it. Went to a regex-based detector. Latency dropped from 15s to 180ms. Cost dropped to $0. Variance dropped to zero.

The trade-off: regex cannot read intent. It reads tokens. A repo that says it does voice AI in its README but actually contains a music-recommender model will get the voice pack. That is a false positive I accept because the alternative (LLM in the loop) had its own false positives and was 80Γ— slower.

The current detector

Three signal layers:

Layer 1 β€” package.json dependencies. twilio / livekit / deepgram / elevenlabs β†’ voice-pack. stripe / plaid / dwolla β†’ fintech. tensorflow / pytorch + transformers β†’ ml-pack (distinct from voice-pack). And so on, for ~80 strong-signal tokens.
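A sketch of how this layer can be expressed. The table is abbreviated to the examples in this post, and the names (DEP_SIGNALS, depSignals) are mine, not necessarily what detect.ts uses:

```ts
// Abbreviated token table; the real detector has ~80 strong-signal tokens.
const DEP_SIGNALS: Record<string, string[]> = {
  "voice-pack": ["twilio", "livekit", "deepgram", "elevenlabs"],
  fintech: ["stripe", "plaid", "dwolla"],
  "ml-pack": ["tensorflow", "pytorch", "transformers"],
};

// Layer 1: count strong-signal hits per pack. Only `dependencies` is
// scanned; devDependencies was dropped after the game-server/torch
// false positive logged below.
function depSignals(pkg: { dependencies?: Record<string, string> }): Map<string, number> {
  const deps = Object.keys(pkg.dependencies ?? {});
  const hits = new Map<string, number>();
  for (const [pack, tokens] of Object.entries(DEP_SIGNALS)) {
    const n = tokens.filter((t) => deps.includes(t)).length;
    if (n > 0) hits.set(pack, n);
  }
  return hits;
}
```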

Layer 2 β€” file paths. clinical/, fda/, phi/, hipaa/ in directory names β†’ clinical pack. webhook/ + signature-related code β†’ api-platform-pack.

Layer 3 β€” README + top-level docs grep. Exact-match keywords only, not fuzzy. "AEDT", "automated employment decision", "NYC Local Law 144" β†’ hr-ai pack. "21 CFR Part 11", "SaMD", "FDA pre-submission" β†’ clinical pack.

Each pack has a minimum signal count. voice-pack needs β‰₯2 of its 11 tokens. fintech needs β‰₯3 of 14. This is what cut false positives roughly in half.
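Putting the three layers together, the attach decision reduces to per-pack counters and a per-pack floor. A minimal sketch, assuming hypothetical names (MIN_SIGNALS, DOC_KEYWORDS, detectPacks) and the Layer-1 helper above; only the voice-pack and fintech floors are the real values quoted in this post:

```ts
// Per-pack signal floors. voice-pack and fintech values are from this
// post; the ?? 2 fallback below is a placeholder for the rest.
const MIN_SIGNALS: Record<string, number> = {
  "voice-pack": 2,
  fintech: 3,
};

// Layer 3: exact-match keywords only, no fuzzy matching.
const DOC_KEYWORDS: Record<string, string[]> = {
  "hr-ai": ["AEDT", "automated employment decision", "NYC Local Law 144"],
  clinical: ["21 CFR Part 11", "SaMD", "FDA pre-submission"],
};

function detectPacks(
  depHits: Map<string, number>,   // Layer 1 (see depSignals above)
  pathHits: Map<string, number>,  // Layer 2: clinical/, fda/, phi/, webhook/, ...
  readme: string,                 // Layer 3 input: README + top-level docs
): string[] {
  const total = new Map(depHits);
  for (const [pack, n] of pathHits) total.set(pack, (total.get(pack) ?? 0) + n);
  for (const [pack, kws] of Object.entries(DOC_KEYWORDS)) {
    const n = kws.filter((kw) => readme.includes(kw)).length;
    if (n > 0) total.set(pack, (total.get(pack) ?? 0) + n);
  }
  // A pack attaches only if it clears its minimum signal count.
  return [...total.entries()]
    .filter(([pack, n]) => n >= (MIN_SIGNALS[pack] ?? 2))
    .map(([pack]) => pack);
}
```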

The false positives I have logged

Across 4 months and ~340 init runs (counted via opt-in telemetry), 12 confirmed false positives:

| repo type | wrongly attached pack | trigger | fix |
|---|---|---|---|
| static-site generator | voice-pack | README explicitly disclaiming Twilio | exact-match keywords only |
| music-recommender ML | voice-pack | "audio" in package description | removed "audio" as solo trigger |
| internal Helm chart | fintech | "billing" in operator log section | minimum 3 signals |
| docs-only repo | clinical | "patient" in user-research subfolder | excluded docs/ from path scan |
| game-server prototype | mlops | torch in optional dev-dep | only scan dependencies, not devDependencies |
| 7 others | various | various | each addressed via test case in tests/detection.test.mjs |

The 12 cases are committed as regression tests. If the detector ever re-introduces one of these false positives, CI fails.
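The real suite lives in tests/detection.test.mjs; a sketch of what one of those regression cases might look like, assuming the import path and the detectPacks export from the sketches above:

```ts
import test from "node:test";
import assert from "node:assert/strict";
// Assumed export; the real detector lives in packages/cli/src/detect.ts.
import { detectPacks } from "../packages/cli/src/detect.js";

// Regression for the internal-Helm-chart case: a single "billing" mention
// in the operator's logging docs must not clear fintech's 3-signal floor.
test("helm chart with 'billing' in logging docs does not attach fintech", () => {
  const packs = detectPacks(
    new Map(),  // no Layer-1 dependency signals
    new Map(),  // no Layer-2 path signals
    "The operator emits billing-volume log lines per namespace.",
  );
  assert.ok(!packs.includes("fintech"));
});
```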

The case I worry about: silent false negatives

Easier to log a false positive (user complains "why is this thing telling me about TCPA"). Harder to catch a false negative (user runs init on a repo that should have hr-ai pack attached, doesn't, ships with no bias audit, gets fined two years later).

Mitigations:

  1. /migrate command. Rerun detection with updated rules. New packs (or new keywords for existing packs) get a second chance to attach.
  2. PROJECT.md is editable. The packs: list is plain YAML; if detection missed a pack, the user can add it by hand (see the snippet after this list).
  3. Public catalogue. greatcto.systems/companies.html lists 200+ companies and the packs that would auto-attach to each. If a user's similar competitor is in the catalogue, they get a sanity check on whether their detection is correct.
  4. Telemetry on no-pack runs. When init detects zero packs, we log it (anon, opt-in). If a class of project keeps coming through with no pack and the cost-of-miss is high (regulated industry), I add detection rules.
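For mitigation 2, the manual escape hatch is a one-line edit. The pack names below are from this post; the surrounding PROJECT.md structure is illustrative, not the exact schema:

```yaml
packs:
  - voice-pack   # auto-attached by detection
  - hr-ai        # added by hand because detection missed it
```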

I have not had a confirmed regulatory false negative yet. That is partly because the user population is small (~500 active installs as of writing) and partly because the high-stakes archetypes (clinical, fintech, lending) have strong-signal vocabulary that is hard to miss.

What I will not add

People keep asking for two features I have rejected:

Both of these are textbook examples of "the obvious feature that becomes a backdoor."

What I might add

The detection logic is small, boring, and one of the parts of the system I am most protective of. It is the first thing every user sees, and a wrong first guess loses them.


About: I build GreatCTO β€” a multi-agent SDLC plugin for Claude Code. MIT, runs locally. The detector source is in packages/cli/src/detect.ts β€” read or fork.