Prerequisites
- Pidgeon CLI installed (`dotnet tool install -g pidgeon`)
- A directory of real HL7 messages to de-identify
Basic de-identification
Processing an entire directory of messages:
- Reads every message file in `./real-messages`
- Replaces patient names, MRNs, SSNs, addresses, and phone numbers
- Shifts all dates forward by 30 days (preserving relative intervals)
- Writes clean messages to `./safe-messages`
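
To illustrate why a fixed offset preserves relative intervals, here is a minimal Python sketch. It is not Pidgeon's implementation: the `shift_date` and `days_between` helpers are hypothetical, and the sketch assumes dates in the `YYYYMMDD` form commonly used in HL7 v2 messages.

```python
from datetime import datetime, timedelta

HL7_DATE = "%Y%m%d"  # assumption: dates appear as YYYYMMDD


def shift_date(value: str, days: int = 30) -> str:
    """Shift one YYYYMMDD date string by a fixed number of days."""
    shifted = datetime.strptime(value, HL7_DATE) + timedelta(days=days)
    return shifted.strftime(HL7_DATE)


def days_between(start: str, end: str) -> int:
    """Return the number of days from start to end."""
    delta = datetime.strptime(end, HL7_DATE) - datetime.strptime(start, HL7_DATE)
    return delta.days


# Hypothetical admit and discharge dates, four days apart.
admit, discharge = "20240110", "20240114"
new_admit = shift_date(admit)          # "20240209"
new_discharge = shift_date(discharge)  # "20240213"

# Every date moves by the same offset, so the stay is still four days long.
assert days_between(admit, discharge) == days_between(new_admit, new_discharge)
```

Because the same offset is applied to every date in a message, the order of events and the gaps between them are unchanged; only the absolute calendar dates differ.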
Date shifting
Date shifting moves all dates by a fixed offset while preserving the temporal relationships between events.
Consistent hashing
For scenarios where you need the same input to produce the same output (e.g., matching patients across de-identified files), use a salt.
Preserve identifiers
If you need to keep certain identifiers intact (e.g., for matching across systems), they can be kept with `--keep-ids`.
What gets de-identified
| Field Type | Action | Example |
|---|---|---|
| Patient name | Replaced with synthetic name | Smith, John → Martinez, Elena |
| MRN / Patient ID | Replaced (or kept with `--keep-ids`) | 12345 → 98761 |
| SSN | Masked (all digits removed) | 123-45-6789 → XXX-XX-XXXX |
| Date of birth | Date-shifted | 1985-03-15 → 1985-04-14 |
| Address | Replaced with synthetic address | 123 Main St → 456 Oak Ave |
| Phone number | Replaced | 555-0100 → 555-0742 |
| All dates | Date-shifted | Consistent offset applied |
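
The consistent, salt-based replacement described earlier can be pictured with a keyed hash. The Python sketch below is an illustration only, under the assumption that a salted HMAC is an acceptable stand-in; it does not reflect how Pidgeon derives replacement values, and `pseudonymize_id` is a hypothetical helper.

```python
import hashlib
import hmac


def pseudonymize_id(identifier: str, salt: str, digits: int = 8) -> str:
    """Map an identifier to a stable numeric pseudonym using a keyed hash.

    The same identifier and salt always give the same result, so records
    for one patient still line up across separately processed files.
    """
    digest = hmac.new(
        salt.encode("utf-8"),
        identifier.encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()
    # Reduce the digest to a fixed-width decimal string that looks like an ID.
    return str(int(digest, 16) % 10**digits).zfill(digits)


# Hypothetical salt and MRN, for illustration only.
first_run = pseudonymize_id("12345", "project-salt")
second_run = pseudonymize_id("12345", "project-salt")
assert first_run == second_run  # repeatable across runs and files
```

Keeping the salt secret matters in a scheme like this: anyone who knows it can recompute the mapping for a guessed identifier. A different salt will generally produce a different mapping, which is useful when two data sets must not be linkable.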
Workflow: real messages to test data
Collect real messages
Export messages from your integration engine (Mirth, Rhapsody, etc.) into a directory.
Next steps
- Analyze de-identified messages to create vendor profiles
- Generate additional synthetic data matching the same patterns
- Build test scenarios using de-identified messages as templates

