An empty test database doesn’t catch real bugs. A database full of copied production data is a compliance violation. Flock gives you a third option: realistic synthetic populations seeded directly into your schema.
Prerequisites
Pidgeon CLI installed (dotnet tool install -g pidgeon)
A database with an existing schema (PostgreSQL, MySQL, or SQL Server)
Database credentials with read/write access
Step 1: Connect to your database
pidgeon flock connect \
--provider postgres \
--connection-string "Host=localhost;Port=5432;Database=ehr_dev;Username=dev;Password=secret"
Flock analyzes the schema and reports:
Tables classified as patient, encounter, clinical, financial, or reference
Foreign key relationships mapped
Column types and constraints detected
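The connect command above puts the password directly on the command line, where it ends up in shell history. A minimal sketch of one way around that is to assemble the connection string from parts and read the password from the environment. The `conn_string` helper and the `DB_PASSWORD` variable name are assumptions made for this illustration, not part of Pidgeon, and the sketch prints the command rather than running it.

```shell
# Assemble a connection string in the key=value format shown in Step 1.
# conn_string is a hypothetical helper, not a Pidgeon feature.
conn_string() {
  printf 'Host=%s;Port=%s;Database=%s;Username=%s;Password=%s' \
    "$1" "$2" "$3" "$4" "$5"
}

# DB_PASSWORD is an assumed variable name. Export it from a secrets manager
# or an untracked env file; the fallback only keeps this sketch runnable.
conn=$(conn_string localhost 5432 ehr_dev dev "${DB_PASSWORD:-secret}")

# Print the command instead of executing it, so the sketch is safe to paste.
printf 'pidgeon flock connect --provider postgres --connection-string "%s"\n' "$conn"
```

Once the printed command looks right, it can be run as-is in the same shell session.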
Step 2: Learn patterns from existing data (optional)
If your database already has sample data, Flock can learn its statistical distributions:
pidgeon flock learn --tables patients,encounters,diagnoses --sample-size 500
This creates a profile that captures:
Column value distributions (age ranges, gender ratios)
Referential patterns (which diagnosis codes appear together)
Temporal patterns (encounter durations, admission-to-discharge intervals)
Step 3: Generate a synthetic population
Generate 1,000 patients with related records:
pidgeon flock generate \
--count 1000 \
--format sql \
--geographic-focus us \
--seed 42
Flock generates:
Demographics: Age, gender, race, and geography distributions matching US Census data
Comorbidities: Realistic disease correlations (diabetes with hypertension, obesity with sleep apnea)
Temporal coherence: Admissions before discharges, lab orders before results
Family linkage: Realistic household structures and family relationships
Step 4: Preview with dry-run
Before writing anything, preview the generated SQL:
pidgeon flock seed --dry-run
This outputs the SQL INSERT statements in foreign-key order without executing them. Review the output to confirm the data looks correct before seeding.
Step 5: Seed the database
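Once the preview looks right, run the seed command without the --dry-run flag to write the records. Flock inserts records in FK-dependency order so referential integrity is maintained. All synthetic records are tagged for easy identification and cleanup.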
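To make "FK-dependency order" concrete, the sketch below derives an insert order with the standard `tsort` utility, which performs a topological sort. It illustrates the idea only and does not show how Flock computes the order internally. The foreign-key edges between `patients`, `encounters`, and `diagnoses` are assumed for the example.

```shell
# Hypothetical FK edges, one per line: "<referenced table> <referencing table>".
# A row in the second table points at a row in the first.
fk_edges() {
  cat <<'EOF'
patients encounters
encounters diagnoses
patients diagnoses
EOF
}

# tsort prints every table after all of the tables it depends on,
# which is a safe order for INSERT statements.
insert_order() {
  fk_edges | tsort
}

insert_order
```

Deleting rows safely needs the reverse of this order, since child rows must be removed before the parents they reference.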
Step 6: Verify the results
Check that data was seeded correctly:
# Check job analytics
pidgeon flock status
Or via the API:
curl https://api.pidgeon.health/api/flock/generate/{jobId}/analytics
Flock can generate data in the following formats:
SQL INSERT
CSV
HL7 Streams
FHIR Bundles
pidgeon flock generate --count 1000 --format sql --output ./seed-data/
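If you export to CSV, the temporal-coherence property (admissions before discharges) can be spot-checked with a few lines of `awk`. This is a sketch under stated assumptions: the column layout (`encounter_id,admitted_at,discharged_at`), the ISO 8601 timestamp format, and the sample rows are all made up for illustration and may not match the files Flock writes.

```shell
# Count rows whose discharge timestamp sorts before the admission timestamp.
# Assumes a header row, no quoted commas, and ISO 8601 timestamps, which
# compare correctly as plain strings (the "" forces string comparison).
count_incoherent() {
  awk -F, 'NR > 1 && ($3 "") < ($2 "") { bad++ } END { print bad + 0 }' "$1"
}

# Hand-written sample data with one deliberately broken row (encounter 2).
sample=$(mktemp)
cat > "$sample" <<'EOF'
encounter_id,admitted_at,discharged_at
1,2024-01-05T08:00:00,2024-01-07T12:30:00
2,2024-02-10T14:00:00,2024-02-09T09:00:00
3,2024-03-01T00:00:00,2024-03-01T06:00:00
EOF

count_incoherent "$sample"
```

A result of 0 on a real export means no row in that file has a discharge before its admission.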
Clean up synthetic data
Remove all Flock-generated records when you’re done:
pidgeon flock seed --cleanup
Or via the API:
curl -X DELETE "https://api.pidgeon.health/api/flock/seed/cleanup?connectionString=Host%3Dlocalhost%3B..."
Cleanup removes all records tagged as synthetic. This cannot be undone.
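The API call above passes the connection string as a query parameter, so characters such as `=` and `;` must be percent-encoded (`%3D`, `%3B`). A minimal POSIX shell sketch of that encoding follows. The `urlencode` helper is written for this example and is not part of Pidgeon, and the `curl` call is left commented out because cleanup is irreversible.

```shell
# Percent-encode every character outside the RFC 3986 unreserved set.
# Runs in a subshell (parenthesized body) so its variables stay local.
# Intended for ASCII input such as a typical connection string.
urlencode() (
  LC_ALL=C
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    case $c in
      [A-Za-z0-9.~_-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)

conn='Host=localhost;Port=5432;Database=ehr_dev;Username=dev;Password=secret'
url="https://api.pidgeon.health/api/flock/seed/cleanup?connectionString=$(urlencode "$conn")"
printf '%s\n' "$url"

# Uncomment only when you are sure; cleanup cannot be undone.
# curl -X DELETE "$url"
```

Printing the URL first gives you a chance to confirm it targets the intended database before the DELETE request is sent.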
Next steps