An empty test database doesn’t catch real bugs. A database full of copied production data is a compliance violation. Flock offers a third option: realistic synthetic populations seeded directly into your schema.

Prerequisites

  • Pidgeon CLI installed (dotnet tool install -g pidgeon)
  • A database with an existing schema (PostgreSQL, MySQL, or SQL Server)
  • Database credentials with read/write access

Step 1: Connect to your database

pidgeon flock connect \
  --provider postgres \
  --connection-string "Host=localhost;Port=5432;Database=ehr_dev;Username=dev;Password=secret"
Flock analyzes the schema and reports:
  • Tables classified as patient, encounter, clinical, financial, or reference
  • Foreign key relationships mapped
  • Column types and constraints detected
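
The command in Step 1 puts the password directly on the command line, where it ends up in shell history. A minimal sketch of one alternative is to assemble the connection string from environment variables. The variable names (DB_HOST, DB_PORT, DB_NAME, DB_USER, DB_PASSWORD) and the helper build_connection_string are hypothetical and not part of Pidgeon; only the --provider and --connection-string flags come from the step above.

```sh
# Hypothetical helper: build the connection string from environment
# variables so the password is never typed inline.
# Defaults mirror the example values used in Step 1.
build_connection_string() {
  if [ -z "${DB_PASSWORD:-}" ]; then
    echo "DB_PASSWORD is not set" >&2
    return 1
  fi
  printf 'Host=%s;Port=%s;Database=%s;Username=%s;Password=%s\n' \
    "${DB_HOST:-localhost}" "${DB_PORT:-5432}" \
    "${DB_NAME:-ehr_dev}" "${DB_USER:-dev}" "$DB_PASSWORD"
}

# Call the CLI only when it is installed and a password is available.
if command -v pidgeon >/dev/null 2>&1 && [ -n "${DB_PASSWORD:-}" ]; then
  pidgeon flock connect \
    --provider postgres \
    --connection-string "$(build_connection_string)"
fi
```

Export DB_PASSWORD from a secrets manager or an untracked env file before running the snippet. The values still must not contain semicolons, since the sketch does no escaping.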

Step 2: Learn patterns from existing data (optional)

If your database already has sample data, Flock can learn its statistical distributions:
pidgeon flock learn --tables patients,encounters,diagnoses --sample-size 500
This creates a profile that captures:
  • Column value distributions (age ranges, gender ratios)
  • Referential patterns (which diagnosis codes appear together)
  • Temporal patterns (encounter durations, admission-to-discharge intervals)

Step 3: Generate a synthetic population

Generate 1,000 patients with related records:
pidgeon flock generate \
  --count 1000 \
  --format sql \
  --geographic-focus us \
  --seed 42
Flock generates:
  • Demographics: Age, gender, race, and geography distributions matching US Census data
  • Comorbidities: Realistic disease correlations (diabetes with hypertension, obesity with sleep apnea)
  • Temporal coherence: Admissions before discharges, lab orders before results
  • Family linkage: Realistic household structures and family relationships

Step 4: Preview with dry-run

Before writing anything, preview the generated SQL:
pidgeon flock seed --dry-run
This outputs the SQL INSERT statements in foreign-key order without executing them. Review the output to confirm the data looks correct before you seed.
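
Reading thousands of INSERT statements by eye is slow, so a quick per-table count can help with the review. The sketch below assumes the dry-run output is plain SQL on standard output with one "INSERT INTO <table> ..." statement per line. That layout is an assumption, as are the helper names and the sample columns (birth_year, patient_id), which stand in for real output.

```sh
# Hypothetical helper: count INSERT statements per table from SQL on stdin.
# Assumes each statement starts a line with "INSERT INTO <table> ...".
count_inserts() {
  awk 'toupper($1) == "INSERT" && toupper($2) == "INTO" { n[$3]++ }
       END { for (t in n) print n[t], t }' | sort -k2
}

# Stand-in for real dry-run output; table and column names are illustrative.
sample_preview() {
  cat <<'SQL'
INSERT INTO patients (id, birth_year) VALUES (1, 1980);
INSERT INTO patients (id, birth_year) VALUES (2, 1975);
INSERT INTO encounters (id, patient_id) VALUES (10, 1);
SQL
}

sample_preview | count_inserts
# prints:
# 1 encounters
# 2 patients
```

With the CLI installed, the same helper could be fed from the real preview, for example pidgeon flock seed --dry-run | count_inserts, provided the output matches the assumed layout.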

Step 5: Seed the database

pidgeon flock seed
Flock inserts records in FK-dependency order so referential integrity is maintained. All synthetic records are tagged for easy identification and cleanup.
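
To make "FK-dependency order" concrete: it is a topological sort of the tables, with every parent table placed before the tables that reference it. The sketch below illustrates the idea with the standard tsort utility. It is not Flock's implementation, and the foreign-key relationships between the three tables are assumed for the example.

```sh
# Each line is "parent child": the parent table must be seeded first.
# The relationships below are assumptions for illustration only.
fk_edges() {
  cat <<'EOF'
patients encounters
encounters diagnoses
EOF
}

# tsort prints a topological order: parents before the tables that reference them.
fk_edges | tsort
# prints:
# patients
# encounters
# diagnoses
```

Inserting in this order means a row in encounters never references a patient that does not exist yet, which is why referential integrity holds throughout the seed.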

Step 6: Verify the results

Check that data was seeded correctly:
# Check job analytics
pidgeon flock status
Or via the API, replacing {jobId} with the ID of your generation job:
curl https://api.pidgeon.health/api/flock/generate/{jobId}/analytics

Alternative output formats

Flock can generate data in formats other than SQL. Set --format to the format you need and --output to the destination path. The example below writes SQL:
pidgeon flock generate --count 1000 --format sql --output ./seed-data/

Clean up synthetic data

Remove all Flock-generated records when you’re done:
pidgeon flock seed --cleanup
Or via the API:
curl -X DELETE "https://api.pidgeon.health/api/flock/seed/cleanup?connectionString=Host%3Dlocalhost%3B..."
Cleanup removes all records tagged as synthetic. This cannot be undone.
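
The API call above passes the connection string as a percent-encoded query parameter (Host%3Dlocalhost%3B...). A minimal sketch of producing that encoding in POSIX shell follows. The helpers urlencode and cleanup_url are hypothetical, the sketch handles ASCII input only, and it prints the URL rather than sending the irreversible DELETE request.

```sh
# Hypothetical helper: percent-encode every character outside the
# RFC 3986 unreserved set (letters, digits, ".", "~", "_", "-").
# ASCII input only; runs in a subshell so no variables leak.
urlencode() (
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}        # everything after the first character
    c=${s%"$rest"}     # the first character itself
    case $c in
      [a-zA-Z0-9.~_-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)

# Build the cleanup URL shown above from a raw connection string.
cleanup_url() {
  printf 'https://api.pidgeon.health/api/flock/seed/cleanup?connectionString=%s\n' \
    "$(urlencode "$1")"
}

cleanup_url "Host=localhost;Port=5432;Database=ehr_dev"
# prints:
# https://api.pidgeon.health/api/flock/seed/cleanup?connectionString=Host%3Dlocalhost%3BPort%3D5432%3BDatabase%3Dehr_dev
```

Once the printed URL looks right, pass it to curl -X DELETE as in the example above.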

Next steps