
Outputs and Contract

Published outputs

empi schema

| Table | Description |
|---|---|
| source_person | Standardized input records combining claims and clinical demographics. One row per source_system + source_id. |
| pair_decisions | All scored pairs with their final decision (match, non_match, or clerical_review) after thresholds, hard rules, and manual overrides are applied. |
| person_crosswalk | Core mapping table from each source_system + source_id to its person_id. Includes active_bool, cluster metadata, and timestamps. |
| person_counts | Record count per resolved person_id. |
| person_attrs | Survivorship golden record: one row per person_id with consolidated attributes (name, DOB, SSN, address, phone, email, etc.). |
| person_attrs_provenance | Audit trail: tall table with person_id, attribute_name, attribute_value, and the source record that contributed the value. |
| person_id_map | Intermediate crosswalk used by the input-layer remapping models. |
| work_queue | Candidate pairs in the clerical-review band, ready for analyst review. Includes work_item_id, pair_key, match_probability, and the source keys for both sides. |
| work_item_crosswalk | Stable mapping from work_item_id to pair_key across runs. |
| cluster_id_crosswalk | Tracks cluster merge/split history across runs. |
| overrides | Manual review decisions written by the review app. Incremental with full_refresh: false. |
| rematch_requests | Records flagged for re-scoring (from split/carve operations). Incremental with full_refresh: false. |
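
Each row of pair_decisions combines a scored match_probability with thresholds, hard rules, and manual overrides. The Python sketch below shows one plausible way to resolve those inputs into a single decision. It is an illustration, not the package's logic: the threshold values, the conflicting-SSN hard rule, the ScoredPair type, and the precedence order (override, then hard rule, then probability band) are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds; the package's real values are configured per project.
MATCH_THRESHOLD = 0.9
REVIEW_THRESHOLD = 0.5


@dataclass
class ScoredPair:
    """Hypothetical shape of one scored candidate pair."""
    pair_key: str
    match_probability: float
    ssn_left: Optional[str] = None
    ssn_right: Optional[str] = None


def hard_rule(pair: ScoredPair) -> Optional[str]:
    """Hypothetical hard rule: two different, non-null SSNs force a non-match."""
    if pair.ssn_left and pair.ssn_right and pair.ssn_left != pair.ssn_right:
        return "non_match"
    return None


def decide(pair: ScoredPair, overrides: dict[str, str]) -> str:
    """Assumed precedence: manual override, then hard rule, then probability band."""
    if pair.pair_key in overrides:
        return overrides[pair.pair_key]
    ruled = hard_rule(pair)
    if ruled is not None:
        return ruled
    if pair.match_probability >= MATCH_THRESHOLD:
        return "match"
    if pair.match_probability >= REVIEW_THRESHOLD:
        return "clerical_review"
    return "non_match"


# Example: an analyst override wins over a mid-band probability.
print(decide(ScoredPair("a|b", 0.6), {"a|b": "match"}))
```

Under these assumptions, pairs that land in clerical_review are the ones work_queue surfaces to analysts, and an analyst decision recorded in overrides then takes precedence over the score.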

core schema

| Table | Description |
|---|---|
| person | Person dimension: person_id, cluster version, record count, and timestamps. |

input_layer schema (remapped)

For each enabled domain, the package publishes a remapped version of the upstream empi_pre__* table in which person_id is overwritten with the EMPI-resolved value.

Claims domain:

| Table | Source |
|---|---|
| eligibility | empi_pre__eligibility |
| medical_claim | empi_pre__medical_claim |
| pharmacy_claim | empi_pre__pharmacy_claim |
| provider_attribution | empi_pre__provider_attribution (when enabled) |

Clinical domain:

| Table | Source |
|---|---|
| patient | empi_pre__patient |
| appointment | empi_pre__appointment |
| condition | empi_pre__condition |
| encounter | empi_pre__encounter |
| immunization | empi_pre__immunization |
| lab_result | empi_pre__lab_result |
| medication | empi_pre__medication |
| observation | empi_pre__observation |
| procedure | empi_pre__procedure |

All original columns are preserved. Only person_id is replaced with the EMPI-resolved value.
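
The remapping rule (copy every column, replace only person_id) can be sketched in Python as follows. The join key is an assumption: this sketch looks rows up by a hypothetical (data_source, person_id) key in a plain dict standing in for person_id_map, and it leaves person_id unchanged when no mapping exists. The package may key its lookup and handle unmatched rows differently.

```python
from typing import Any, Sequence


def remap_person_id(
    rows: list[dict[str, Any]],
    id_map: dict[tuple, str],
    key_columns: Sequence[str] = ("data_source", "person_id"),
) -> list[dict[str, Any]]:
    """Return copies of rows with person_id replaced by the EMPI-resolved value.

    key_columns is a hypothetical join key into id_map (a stand-in for
    person_id_map); rows without an entry keep their original person_id.
    """
    remapped = []
    for row in rows:
        key = tuple(row.get(col) for col in key_columns)
        new_row = dict(row)  # shallow copy: every other column is preserved unchanged
        new_row["person_id"] = id_map.get(key, row.get("person_id"))
        remapped.append(new_row)
    return remapped


rows = [{"data_source": "payer_a", "person_id": "m1", "claim_id": "c1"}]
print(remap_person_id(rows, {("payer_a", "m1"): "P001"}))
```

Copying each row rather than mutating it keeps the upstream records intact, which mirrors how the remapped tables sit alongside their empi_pre__* sources.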

Input contract

Required columns for linkage

The empi_pre__eligibility and empi_pre__patient models feed the identity resolution scorer. These columns are used:

Identity core:

| Column | Type | Used for |
|---|---|---|
| source_system | string | Record provenance, pair generation |
| source_id | string | Unique record identifier within source |
| first_name | string | Fuzzy name comparison |
| middle_name | string | Fuzzy name comparison |
| last_name | string | Fuzzy name comparison |
| name_suffix | string | Exact comparison |
| birth_date | date | Date comparison, blocking |
| death_date | date | Date comparison |
| sex | string | Exact comparison |
| social_security_number | string | Exact comparison, blocking |
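
birth_date and social_security_number are marked as blocking columns: only records that share a value on a blocking column become candidate pairs for scoring, which avoids comparing every record with every other. The sketch below assumes simple exact-value blocking, takes the union of pairs across keys, and identifies records by (source_system, source_id). The function name and these rules are illustrative, not the package's configuration.

```python
from collections import defaultdict
from itertools import combinations
from typing import Any

RecordKey = tuple[str, str]  # (source_system, source_id)


def candidate_pairs(
    records: list[dict[str, Any]],
    blocking_keys: tuple[str, ...] = ("birth_date", "social_security_number"),
) -> set[tuple[RecordKey, RecordKey]]:
    """Return unordered record pairs that share at least one non-null blocking value."""
    pairs: set[tuple[RecordKey, RecordKey]] = set()
    for key in blocking_keys:
        blocks: dict[Any, list[RecordKey]] = defaultdict(list)
        for rec in records:
            value = rec.get(key)
            if value:  # null or empty values never form a block
                blocks[value].append((rec["source_system"], rec["source_id"]))
        for members in blocks.values():
            # Sorting gives each pair one canonical order, so the set deduplicates
            # pairs that share more than one blocking value.
            for a, b in combinations(sorted(members), 2):
                pairs.add((a, b))
    return pairs
```

Passing blocking_keys=("birth_date", "social_security_number", "email") would also block on email, which the contact columns list as a blocking key.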

Contact:

| Column | Type | Used for |
|---|---|---|
| address | string | Fuzzy address comparison |
| city | string | Fuzzy comparison |
| state | string | Exact comparison |
| zip_code | string | Exact comparison |
| phone | string | Exact digit comparison |
| email | string | Exact comparison, blocking |
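
For phone, exact digit comparison means formatting is ignored and only the digits must agree. A minimal sketch, assuming non-digit characters are stripped and a side with no digits counts as missing rather than as a disagreement; the helper names are hypothetical.

```python
import re
from typing import Optional


def phone_digits(phone: Optional[str]) -> Optional[str]:
    """Keep only the digits; an empty result is treated as missing."""
    if phone is None:
        return None
    digits = re.sub(r"\D", "", phone)
    return digits or None


def phones_match(a: Optional[str], b: Optional[str]) -> Optional[bool]:
    """Exact comparison on digits; None means not comparable (a side is missing)."""
    da, db = phone_digits(a), phone_digits(b)
    if da is None or db is None:
        return None
    return da == db


print(phones_match("(555) 123-4567", "555.123.4567"))
```

This sketch does not strip a leading country code, so "+1 555 123 4567" and "555-123-4567" would not match under it.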

Demographic:

| Column | Type | Used for |
|---|---|---|
| race | string | Survivorship only |
| ethnicity | string | Survivorship only |

Provenance:

| Column | Type | Used for |
|---|---|---|
| data_source | string | Source tracking |
| ingest_datetime | timestamp | Survivorship timestamp |
| file_date | date | Survivorship timestamp |
| file_name | string | Audit trail |
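
ingest_datetime and file_date drive survivorship when person_attrs picks one value per attribute, and person_attrs_provenance records which source record supplied each value. The sketch below assumes a "most recent non-empty value wins" rule, ordering by ingest_datetime and breaking ties with file_date. The rule, the sort order, and the ATTRIBUTES subset are assumptions for illustration, not the package's survivorship logic.

```python
from datetime import date, datetime
from typing import Any, Optional

# Hypothetical subset of the attributes person_attrs consolidates.
ATTRIBUTES = ("first_name", "last_name", "birth_date", "race", "ethnicity")


def recency(rec: dict[str, Any]) -> tuple[datetime, date]:
    """Assumed sort key: ingest_datetime first, then file_date; missing values sort oldest."""
    return (rec.get("ingest_datetime") or datetime.min, rec.get("file_date") or date.min)


def survive(
    records: list[dict[str, Any]],
) -> tuple[dict[str, dict[str, Any]], list[dict[str, Any]]]:
    """Return (golden records keyed by person_id, tall provenance rows)."""
    by_person: dict[str, list[dict[str, Any]]] = {}
    for rec in records:
        by_person.setdefault(rec["person_id"], []).append(rec)

    golden: dict[str, dict[str, Any]] = {}
    provenance: list[dict[str, Any]] = []
    for person_id, recs in by_person.items():
        newest_first = sorted(recs, key=recency, reverse=True)
        row: dict[str, Any] = {"person_id": person_id}
        for attr in ATTRIBUTES:
            # The most recent record with a non-empty value for this attribute wins.
            winner: Optional[dict[str, Any]] = next(
                (r for r in newest_first if r.get(attr) not in (None, "")), None
            )
            row[attr] = winner[attr] if winner is not None else None
            if winner is not None:
                provenance.append({
                    "person_id": person_id,
                    "attribute_name": attr,
                    "attribute_value": winner[attr],
                    "source_system": winner["source_system"],
                    "source_id": winner["source_id"],
                })
        golden[person_id] = row
    return golden, provenance
```

Falling back to older records attribute by attribute, rather than taking the newest record wholesale, means a blank field in a recent feed does not erase a value known from an earlier one.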

Domain enablement

Which empi_pre__* models are required depends on project vars:

| Var | When true | Required models |
|---|---|---|
| claims_enabled | Claims sources included in linkage | empi_pre__eligibility, empi_pre__medical_claim, empi_pre__pharmacy_claim |
| clinical_enabled | Clinical sources included in linkage | empi_pre__patient + all clinical entity models |
| provider_attribution_enabled | Provider attribution remapped | empi_pre__provider_attribution |

When a domain is disabled, its corresponding upstream refs are not required for package execution.
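
The enablement table reduces to a lookup from enabled vars to required upstream models. A small sketch, assuming any var missing from the project config is treated as false (the package's actual defaults are not stated here); the clinical list is taken from the clinical domain table, and the function name is hypothetical.

```python
# Upstream models each var makes required, taken from the tables in this section.
CLINICAL_MODELS = [
    "empi_pre__patient",
    "empi_pre__appointment",
    "empi_pre__condition",
    "empi_pre__encounter",
    "empi_pre__immunization",
    "empi_pre__lab_result",
    "empi_pre__medication",
    "empi_pre__observation",
    "empi_pre__procedure",
]

REQUIRED_MODELS = {
    "claims_enabled": [
        "empi_pre__eligibility",
        "empi_pre__medical_claim",
        "empi_pre__pharmacy_claim",
    ],
    "clinical_enabled": CLINICAL_MODELS,
    "provider_attribution_enabled": ["empi_pre__provider_attribution"],
}


def required_models(project_vars: dict[str, bool]) -> list[str]:
    """Return the sorted upstream refs required for the given vars (missing vars = false)."""
    models: list[str] = []
    for var, var_models in REQUIRED_MODELS.items():
        if project_vars.get(var, False):
            models.extend(var_models)
    return sorted(models)


print(required_models({"claims_enabled": True}))
```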