📊 archetype: data-platform

Build a data platform without the GDPR scramble.

Building with Snowflake, BigQuery, dbt, Spark, or Databricks? GreatCTO auto-detects the data-platform archetype and ships GDPR retention, PII detection in logs, lineage tracking, and SLO budget gates from day one.

What you avoid

The 5 data-platform bugs that haunt audits.

Without GreatCTO

  • PII logged in Spark driver logs — kept 90 days
  • GDPR retention not enforced — 5-year-old user data
  • Lineage broken — can't answer "where does this come from"
  • Broken DAG silently fills nulls — bad dashboards
  • PII in BigQuery exports without masking
  • SAR fails · DPIA fails · ICO knocks.

With GreatCTO

  • PII detection on driver logs · masked at sink
  • Retention policies codified · auto-enforced
  • Lineage required: every column has a source
  • DAG failures alert · null-thresholds at gate:ship
  • Export classification + masking auto-checked
  • SAR-ready · DPIA-clean · auditor-friendly.
Auto-applied gates

Detected: dbt_project.yml + snowflake
data-platform archetype.

Compliance auto-suggested: gdpr · data-residency · lineage. Specialist agents activated:

01 · security-officer

GDPR + data residency

Article 5 minimization · Article 17 erasure · Article 32 security of processing · cross-border transfer (SCCs) · DPIA generation.

02 · db-migration-reviewer

Schema migration safety

Zero-downtime ALTER patterns · index lock duration · backfill safety · rollback strategy · PII column handling.

03 · performance-engineer

Query budget

p95 query latency · partition / cluster strategy · cost-per-query · slot consumption · BI dashboard SLO.

04 · code-reviewer

dbt model review

Idempotent models · incremental safety · test coverage (not_null, unique, accepted_values) · lineage docs · contract enforcement.

Domain pack overlays

Likely to overlay on data-platform.

Packs auto-attach when CLI detects pack-specific signals (e.g. twilio in deps → voice-pack). Each pack adds its own reviewer agents + human gates on top of the base archetype pipeline.

+ Climate MRV
GHG Protocol, Verra, SBTi, CSRD, CBAM + biosecurity (DURC, IGSC HSP v2)
+ Clinical Trials
ICH-GCP E6(R3), 21 CFR Part 11, CDISC, FHIR R5, OMOP, DICOM, de-id
Real-world examples

Companies operating as data-platform.

13 startups in this space. Click for full pack mapping.

IQVIA
Human data science for healthcare
publicUS
Tempus AI
Precision medicine via AI + clinical data
publicUS
Flatiron Health
Oncology EHR + real-world evidence
subsidiaryUS
Benchling
R&D cloud for life sciences
growthUS
Castor
Modern clinical-trial data platform
series-cNL
CarbonChain
Supply-chain carbon accounting
series-bGB
Pachama
AI-verified forest carbon credits
series-bUS
Sylvera
Carbon credit ratings + research
series-bGB
Climate X
Physical climate-risk analytics
series-aGB
Isometric
Standards + verification for carbon removal
series-aGB
Lamin Labs
Open-source data infrastructure for biology
seedDE
Nest Genomics
Software for genetics integration into clinical care
seedUS
Perse
AI for sustainability data
seedGB

Listed companies operate in this space. Inclusion is based on publicly available product descriptions and does not imply endorsement of or by GreatCTO.

30 seconds

Drop into any dbt / Spark / Snowflake / BigQuery repo.

$ npx great-cto init
no signup·runs locally·pay your own API