De-identification Methods
Safe Harbor
Removes all 18 HIPAA Safe Harbor identifiers. The simplest and most conservative approach.
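As a rough illustration of removal at the field level, the sketch below empties selected fields in a pipe-delimited HL7 v2 segment. The helper `blank_fields` and the `PID_PHI_FIELDS` set are hypothetical, not part of this tool. The sketch assumes the default `|` separator, rejects MSH (whose field numbering is offset by one), and covers only a few PID fields, so it is not a complete Safe Harbor implementation.

```python
def blank_fields(segment: str, field_numbers: set[int]) -> str:
    """Return `segment` with the given field positions emptied.

    Assumes the default "|" field separator. MSH is rejected because its
    numbering is offset by one (MSH-1 is the separator character itself).
    """
    parts = segment.split("|")
    if parts[0] == "MSH":
        raise ValueError("MSH field numbering is not handled by this sketch")
    for n in field_numbers:
        # parts[0] holds the segment ID, so parts[n] is field n.
        if 0 < n < len(parts):
            parts[n] = ""
    return "|".join(parts)


# Hypothetical, partial list of PID fields holding Safe Harbor identifiers:
# patient ID, name, birth date, address, phones, account number, SSN.
PID_PHI_FIELDS = {3, 5, 7, 11, 13, 14, 18, 19}

pid = "PID|1||12345^^^HOSP^MR||DOE^JANE||19800101|F|||123 MAIN ST^^BOSTON^MA^02115||6175550123"
print(blank_fields(pid, PID_PHI_FIELDS))
```

Fields that are absent from a short segment (here PID.14, PID.18, PID.19) are simply skipped, and the field count of the segment is unchanged.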
Safe Harbor Plus
Removes identifiers and replaces them with realistic synthetic values. The result looks like a real message — useful for testing downstream systems that reject empty fields.
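A minimal sketch of the replace-rather-than-remove idea: instead of emptying PID.5 and PID.13, it writes plausible synthetic values so the segment keeps its shape. The name pools and the helpers `synthetic_xpn`, `synthetic_phone`, and `replace_fields` are illustrative assumptions, not this tool's generator.

```python
import random

# Tiny illustrative pools; a real generator would draw from much larger lists.
FAMILY_NAMES = ["ALVAREZ", "BROOKS", "CHEN", "DIAZ", "EVANS", "FOSTER"]
GIVEN_NAMES = ["ALEX", "JORDAN", "MORGAN", "RILEY", "SAM", "TAYLOR"]


def synthetic_xpn(rng: random.Random) -> str:
    """Synthetic HL7 person name in FAMILY^GIVEN form."""
    return f"{rng.choice(FAMILY_NAMES)}^{rng.choice(GIVEN_NAMES)}"


def synthetic_phone(rng: random.Random) -> str:
    """Number in the 555-0100..555-0199 range reserved for fictional use."""
    return f"555-01{rng.randint(0, 99):02d}"


def replace_fields(segment: str, replacements: dict[int, str]) -> str:
    """Overwrite only fields that already hold a value, keeping the segment's shape."""
    parts = segment.split("|")
    for n, value in replacements.items():
        if 0 < n < len(parts) and parts[n]:
            parts[n] = value
    return "|".join(parts)


rng = random.Random(42)  # fixed seed so the example is repeatable
pid = "PID|1||12345^^^HOSP^MR||DOE^JANE||19800101|F|||||6175550123"
print(replace_fields(pid, {5: synthetic_xpn(rng), 13: synthetic_phone(rng)}))
```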
Expert Determination
Statistical approach using k-anonymity and l-diversity analysis. Configurable risk thresholds let you balance data utility against re-identification risk. Produces equivalence class analysis and risk scoring reports.
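To make the risk threshold concrete, here is a sketch of one common scoring scheme, assuming quasi-identifiers have already been generalized: each record's risk is 1 divided by the size of its equivalence class, and the dataset passes when the worst-case risk is at or below the threshold. The `risk_report` function and the sample records are hypothetical; the tool's own scoring may differ.

```python
from collections import Counter


def risk_report(rows: list[tuple], threshold: float) -> dict:
    """Score re-identification risk from equivalence-class sizes.

    Records with identical quasi-identifier tuples form one equivalence
    class. A record's risk is 1 / (size of its class); the dataset passes
    when the worst-case (maximum) risk is at or below `threshold`.
    """
    class_sizes = Counter(rows)
    k = min(class_sizes.values())
    return {
        "k": k,
        "max_risk": 1 / k,
        # Mean of 1/size over records = number of classes / number of records.
        "avg_risk": len(class_sizes) / len(rows),
        "passes": 1 / k <= threshold,
    }


# Hypothetical generalized quasi-identifiers: (3-digit ZIP, birth year, sex).
records = [
    ("021", 1980, "F"), ("021", 1980, "F"), ("021", 1980, "F"),
    ("021", 1975, "M"), ("021", 1975, "M"),
    ("946", 1990, "F"),
]
print(risk_report(records, threshold=0.2))  # 0.2 is equivalent to requiring k >= 5
```

The single record in the "946" class drives the maximum risk to 1.0, so the dataset fails; further generalizing or suppressing that record is the usual way to trade utility for lower risk.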
Full Synthetic
Replaces the entire message with a synthetic equivalent that preserves clinical structure but shares no values with the original.
Usage
What Gets Replaced
80+ PHI fields across 10 HL7 segment types are mapped and handled:

| Identifier type | Example fields | Action |
|---|---|---|
| Patient name | PID.5 | Replaced with synthetic name |
| MRN / Patient ID | PID.3 | Replaced (or kept with --keep-ids) |
| SSN | PID.19 | Removed entirely |
| Date of birth | PID.7 | Date-shifted |
| Address | PID.11, NK1.4, GT1.5 | Replaced with synthetic address |
| Phone / email | PID.13, PID.14, NK1.5 | Replaced with synthetic values |
| Provider name / NPI | OBR.16, PV1.7, PV1.8, PV1.9 | Replaced |
| Account number | PID.18 | Replaced, preserving format |
| Insurance ID | IN1.36, IN2 fields | Replaced |
| Device / biometric IDs | Various | Removed or replaced |
| All date/datetime fields | Across all segments | Shifted by consistent offset |
| Free text fields | OBX.5, NTE.3 | Scanned for embedded PHI patterns |
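Consistent date shifting can be sketched as moving every date for a patient by the same number of days, which hides real dates while keeping intervals such as length of stay. `shift_hl7_date` is a hypothetical helper. It assumes only the YYYYMMDD, YYYYMMDDHHMM, and YYYYMMDDHHMMSS forms; HL7 also allows other precisions, fractional seconds, and timezone offsets, which this sketch rejects.

```python
from datetime import datetime, timedelta

# HL7 DT/DTM layouts this sketch accepts, keyed by string length.
_FORMATS = {8: "%Y%m%d", 12: "%Y%m%d%H%M", 14: "%Y%m%d%H%M%S"}


def shift_hl7_date(value: str, offset_days: int) -> str:
    """Shift an HL7 date/datetime by whole days, keeping its original precision.

    Other precisions (for example YYYY or YYYYMM), fractional seconds and
    timezone suffixes are rejected rather than guessed at.
    """
    fmt = _FORMATS.get(len(value))
    if fmt is None or not value.isdigit():
        raise ValueError(f"unsupported HL7 date/time: {value!r}")
    shifted = datetime.strptime(value, fmt) + timedelta(days=offset_days)
    return shifted.strftime(fmt)


offset = -37  # the same offset is applied to every date for this patient
admit, discharge = "20240301", "20240305"
print(shift_hl7_date(admit, offset), shift_hl7_date(discharge, offset))
```

Both dates move back 37 days (to 20240124 and 20240128), so the 4-day stay survives the shift.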
Segments covered include MSH, PID, NK1, PV1, PV2, OBR, OBX, GT1, IN1, and IN2. Custom field mappings can be added for organization-specific PHI locations.
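For free-text fields such as OBX.5 and NTE.3, pattern scanning might look like the following sketch. The three regular expressions are illustrative assumptions covering only US-style SSNs, phone numbers, and email addresses; they are not this tool's rule set, and names, dates, or addresses written in narrative text need dictionaries or NLP rather than regexes.

```python
import re

# Illustrative US-centric patterns only. Names, dates and street addresses in
# narrative text cannot be caught this way and need dictionaries or NLP.
PHI_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"(?<!\w)\(?\d{3}\)?[-. ]?\d{3}[-.]\d{4}(?!\w)"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def redact_free_text(text: str) -> str:
    """Replace each pattern match with a [LABEL] placeholder, SSNs first."""
    for label, pattern in PHI_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


note = "Pt called from (617) 555-0142, SSN 123-45-6789, email jdoe@example.org."
print(redact_free_text(note))
```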
Risk Assessment
Post can assess re-identification risk for your de-identified output:

- k-anonymity scoring: measures whether individuals can be singled out, i.e. whether every record shares its quasi-identifier values with at least k-1 other records
- l-diversity analysis: checks that each equivalence class contains at least l well-represented values of a sensitive attribute
- Compliance reporting: HTML and JSON reports suitable for audit documentation
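k-anonymity and l-diversity can both be computed from the same grouping of records, as in this sketch: k is the size of the smallest equivalence class, and l (here the simple "distinct" variant of l-diversity) is the fewest distinct sensitive values in any class. The `k_and_l` helper and the field names `zip3`, `age_band`, and `dx` are hypothetical.

```python
from collections import defaultdict


def k_and_l(records: list[dict], quasi_ids: list[str], sensitive: str) -> tuple[int, int]:
    """Return (k, l) for a list of dict records.

    k: size of the smallest equivalence class (records sharing all
       quasi-identifier values).
    l: fewest distinct sensitive values found in any class
       (the "distinct" variant of l-diversity).
    """
    classes = defaultdict(list)
    for rec in records:
        key = tuple(rec[q] for q in quasi_ids)
        classes[key].append(rec[sensitive])
    k = min(len(values) for values in classes.values())
    l_div = min(len(set(values)) for values in classes.values())
    return k, l_div


rows = [
    {"zip3": "021", "age_band": "40-49", "dx": "E11.9"},
    {"zip3": "021", "age_band": "40-49", "dx": "I10"},
    {"zip3": "021", "age_band": "40-49", "dx": "E11.9"},
    {"zip3": "946", "age_band": "30-39", "dx": "J45.909"},
    {"zip3": "946", "age_band": "30-39", "dx": "J45.909"},
]
print(k_and_l(rows, ["zip3", "age_band"], "dx"))
```

In the sample, the "946" class is 2-anonymous, yet both records share one diagnosis, so l = 1: anyone who knows a person falls in that class learns the diagnosis. That attribute disclosure is the gap l-diversity is designed to catch.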
Consistency Across Batches
When de-identifying multiple messages from the same patient, relationships are preserved:

- The same input MRN always produces the same synthetic MRN (within a salt context)
- ID mappings persist across runs when using --salt
- Temporal relationships between messages are maintained through consistent date shifting
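One common way to get salt-scoped consistency is a keyed hash: HMAC the MRN with the salt, so identical inputs map identically under one salt and produce unrelated outputs under another. This sketch assumes that approach and uses hypothetical helpers `pseudonymize_mrn` and `date_offset_days`; it does not describe how the tool's --salt option is actually implemented.

```python
import hashlib
import hmac


def pseudonymize_mrn(mrn: str, salt: bytes, digits: int = 8) -> str:
    """Map an MRN to a synthetic numeric MRN, deterministically per salt.

    Same (salt, MRN) always gives the same output; a different salt gives an
    unrelated output. Truncation to `digits` makes collisions possible.
    """
    mac = hmac.new(salt, b"mrn:" + mrn.encode("utf-8"), hashlib.sha256).digest()
    return str(int.from_bytes(mac[:8], "big") % 10**digits).zfill(digits)


def date_offset_days(mrn: str, salt: bytes, max_days: int = 365) -> int:
    """Per-patient offset in [-max_days, -1] so all of a patient's dates move together."""
    mac = hmac.new(salt, b"date:" + mrn.encode("utf-8"), hashlib.sha256).digest()
    return -(int.from_bytes(mac[:4], "big") % max_days + 1)


salt = b"study-2024"  # hypothetical project salt, kept secret and reused across runs
print(pseudonymize_mrn("12345", salt), date_offset_days("12345", salt))
```

Deriving the date offset from the same salt and MRN means every message for a patient is shifted by the same amount, even across separate runs. Truncating to 8 digits makes collisions possible, so a production mapping would need to detect and resolve them.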

