Outputs and Contract
Published outputs
empi schema
| Table | Description |
|---|---|
| source_person | Standardized input records combining claims and clinical demographics. One row per source system + source ID. |
| pair_decisions | All scored pairs with final decision (match, non_match, clerical_review) after applying thresholds, hard rules, and manual overrides. |
| person_crosswalk | Core mapping table: person_id ← source_system + source_id. Includes active_bool, cluster metadata, and timestamps. |
| person_counts | Record count per resolved person_id. |
| person_attrs | Survivorship golden record: one row per person_id with consolidated attributes (name, DOB, SSN, address, phone, email, etc.). |
| person_attrs_provenance | Audit trail: tall table with person_id, attribute_name, attribute_value, and the source record that contributed it. |
| person_id_map | Intermediate crosswalk used by input-layer remapping models. |
| work_queue | Candidate pairs in the clerical-review band, ready for analyst review. Includes work_item_id, pair_key, match_probability, and the source keys for both sides. |
| work_item_crosswalk | Stable mapping from work_item_id to pair_key across runs. |
| cluster_id_crosswalk | Tracks cluster merge/split history across runs. |
| overrides | Manual review decisions written by the review app. Incremental with full_refresh: false. |
| rematch_requests | Records flagged for re-scoring (from split/carve operations). Incremental with full_refresh: false. |
core schema
| Table | Description |
|---|---|
| person | Person dimension: person_id, cluster version, record count, and timestamps. |
input_layer schema (remapped)
For each enabled domain, the package publishes a remapped version of the upstream empi_pre__* table with person_id overwritten to the EMPI-resolved value.
Claims domain:
| Table | Source |
|---|---|
| eligibility | empi_pre__eligibility |
| medical_claim | empi_pre__medical_claim |
| pharmacy_claim | empi_pre__pharmacy_claim |
| provider_attribution | empi_pre__provider_attribution (when enabled) |
Clinical domain:
| Table | Source |
|---|---|
| patient | empi_pre__patient |
| appointment | empi_pre__appointment |
| condition | empi_pre__condition |
| encounter | empi_pre__encounter |
| immunization | empi_pre__immunization |
| lab_result | empi_pre__lab_result |
| medication | empi_pre__medication |
| observation | empi_pre__observation |
| procedure | empi_pre__procedure |
All original columns are preserved. Only person_id is replaced with the EMPI-resolved value.
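The remapping behavior can be illustrated with a minimal Python sketch. It is not the package's implementation (the package publishes these tables as models); `remap_person_id` and the dictionary-shaped `id_map` are hypothetical, and the key columns of person_id_map are not specified in this section, so the sketch assumes a simple mapping from the upstream person_id to the EMPI-resolved one. Keeping the original value when no mapping exists is also an assumption.

```python
from typing import Dict, List


def remap_person_id(rows: List[dict], id_map: Dict[str, str]) -> List[dict]:
    """Return copies of rows with person_id swapped for the resolved value.

    id_map is a hypothetical stand-in for person_id_map:
    upstream person_id -> EMPI-resolved person_id.
    """
    remapped = []
    for row in rows:
        new_row = dict(row)  # copy so every original column is preserved
        # Assumption: rows without a mapping keep their upstream person_id.
        new_row["person_id"] = id_map.get(row["person_id"], row["person_id"])
        remapped.append(new_row)
    return remapped


claims = [
    {"person_id": "src-001", "claim_id": "c1", "paid_amount": 120.0},
    {"person_id": "src-002", "claim_id": "c2", "paid_amount": 75.5},
]
resolved = remap_person_id(claims, {"src-001": "P-1", "src-002": "P-1"})
```

In this example both upstream identifiers resolve to the same person, so the two claims end up under one person_id while claim_id and paid_amount are untouched.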
Input contract
Required columns for linkage
The empi_pre__eligibility and empi_pre__patient models feed the identity resolution scorer, which reads the following columns:
Identity core:
| Column | Type | Used for |
|---|---|---|
| source_system | string | Record provenance, pair generation |
| source_id | string | Unique record identifier within source |
| first_name | string | Fuzzy name comparison |
| middle_name | string | Fuzzy name comparison |
| last_name | string | Fuzzy name comparison |
| name_suffix | string | Exact comparison |
| birth_date | date | Date comparison, blocking |
| death_date | date | Date comparison |
| sex | string | Exact comparison |
| social_security_number | string | Exact comparison, blocking |
Contact:
| Column | Type | Used for |
|---|---|---|
| address | string | Fuzzy address comparison |
| city | string | Fuzzy comparison |
| state | string | Exact comparison |
| zip_code | string | Exact comparison |
| phone | string | Exact digit comparison |
| email | string | Exact comparison, blocking |
Demographic:
| Column | Type | Used for |
|---|---|---|
| race | string | Survivorship only |
| ethnicity | string | Survivorship only |
Provenance:
| Column | Type | Used for |
|---|---|---|
| data_source | string | Source tracking |
| ingest_datetime | timestamp | Survivorship timestamp |
| file_date | date | Survivorship timestamp |
| file_name | string | Audit trail |
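A quick pre-flight check against this contract might look like the sketch below. The column names come from the tables above; the helper `missing_linkage_columns` is hypothetical and not part of the package, and comparing names case-insensitively is an assumption (warehouses differ in how they fold identifier case). The sketch checks column presence only, not types.

```python
from typing import Iterable, List

# Column names taken from the input contract tables above.
REQUIRED_LINKAGE_COLUMNS = [
    # Identity core
    "source_system", "source_id", "first_name", "middle_name", "last_name",
    "name_suffix", "birth_date", "death_date", "sex", "social_security_number",
    # Contact
    "address", "city", "state", "zip_code", "phone", "email",
    # Demographic
    "race", "ethnicity",
    # Provenance
    "data_source", "ingest_datetime", "file_date", "file_name",
]


def missing_linkage_columns(model_columns: Iterable[str]) -> List[str]:
    """List contract columns absent from a model, in contract order."""
    # Assumption: column names are compared case-insensitively.
    present = {name.lower() for name in model_columns}
    return [col for col in REQUIRED_LINKAGE_COLUMNS if col not in present]


# Example: a model that only exposes the two key columns.
gaps = missing_linkage_columns(["SOURCE_SYSTEM", "source_id"])
```

Here `gaps` lists every remaining contract column, starting with first_name, which makes it easy to see what an upstream empi_pre__eligibility or empi_pre__patient model still needs to supply.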
Domain enablement
Which empi_pre__* models are required depends on project vars:
| Var | When true | Required models |
|---|---|---|
| claims_enabled | Claims sources included in linkage | empi_pre__eligibility, empi_pre__medical_claim, empi_pre__pharmacy_claim |
| clinical_enabled | Clinical sources included in linkage | empi_pre__patient + all clinical entity models |
| provider_attribution_enabled | Provider attribution remapped | empi_pre__provider_attribution |
When a domain is disabled, its corresponding upstream refs are not required for package execution.
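The enablement rules can be summarized in a small Python sketch. The function `required_models` is hypothetical and only mirrors the table above; it assumes that "all clinical entity models" means the empi_pre__* sources listed in the Clinical domain table, and that provider_attribution_enabled is evaluated independently of claims_enabled, neither of which this section states explicitly.

```python
from typing import List

CLAIMS_MODELS = [
    "empi_pre__eligibility",
    "empi_pre__medical_claim",
    "empi_pre__pharmacy_claim",
]

# Assumption: the clinical entity models are those in the Clinical domain table.
CLINICAL_MODELS = ["empi_pre__patient"] + [
    "empi_pre__" + entity
    for entity in (
        "appointment", "condition", "encounter", "immunization",
        "lab_result", "medication", "observation", "procedure",
    )
]


def required_models(
    claims_enabled: bool,
    clinical_enabled: bool,
    provider_attribution_enabled: bool,
) -> List[str]:
    """Return the upstream empi_pre__* models implied by the three vars."""
    models: List[str] = []
    if claims_enabled:
        models.extend(CLAIMS_MODELS)
    if clinical_enabled:
        models.extend(CLINICAL_MODELS)
    # Assumption: this var is checked on its own, not gated by claims_enabled.
    if provider_attribution_enabled:
        models.append("empi_pre__provider_attribution")
    return models


claims_only = required_models(True, False, False)
```

For a claims-only project, `claims_only` contains just the three claims models, so none of the clinical empi_pre__* models would need to exist.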