Official Runs and History

Spotlight DQ supports persisting DQ and reconciliation metrics across multiple dbt runs, building a time series you can use for trend analysis and run-to-run comparison.

How it works

By default (dq_persist_results: false), only current-run summary tables are built. History models emit zero rows.

When you set dq_persist_results: true, the package upserts current-run results into three incremental history tables:

History table                            Source
reconciliation_history__monthly_summary  reconciliation__monthly_summary
dq_history__input_layer_field_summary    dq_results__input_layer_field_summary
dq_history__terminology_field_summary    dq_results__terminology_field_summary

Each row is stamped with run_id and run_ts so you can distinguish results from different runs.

Run metadata

Variable            Default            Purpose
dq_persist_results  false              Enable history persistence
dq_run_id           dbt invocation_id  Identifier stamped on each row
tuva_last_run       run_started_at     Timestamp stamped on each row
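If you prefer not to pass these variables on every invocation, they can also be set at the project level. A minimal sketch, assuming standard dbt_project.yml vars syntax (this is illustrative, not the package's shipped configuration):

```yaml
# dbt_project.yml -- illustrative sketch only
vars:
  dq_persist_results: true        # enable history persistence
  dq_run_id: "official_2026_03"   # stamped on each history row
  # tuva_last_run defaults to run_started_at and is usually left unset
```

Command-line --vars still override project-level values, so a one-off rerun can reuse a previous dq_run_id without editing the project file.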

For scheduled production runs, use a meaningful dq_run_id so history is easy to query:

dbt run -s tag:spotlight --vars '{dq_persist_results: true, dq_run_id: "official_2026_03"}'

Upsert behavior

History tables use adapter-aware incremental strategies:

Adapter                                           Strategy
Snowflake, BigQuery, Databricks, Redshift, Spark  merge
Fabric, others                                    delete+insert

If you re-run with the same dq_run_id, existing rows for that run are updated rather than duplicated. This makes reruns safe.

Schema evolution is handled with on_schema_change: append_new_columns. If you add new DQ checks in a package upgrade, new columns appear in history without breaking existing data.
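The behavior described above matches a standard dbt incremental model. As a hedged illustration (the package's actual model config may differ; the unique_key shown is an assumption inferred from the rerun semantics above, and the ref'd source table is the one named in the mapping table earlier):

```sql
-- Illustrative dbt incremental config consistent with the upsert
-- behavior described above; the package's real config may differ.
{{
    config(
        materialized='incremental',
        incremental_strategy='merge',       -- delete+insert on Fabric and others
        unique_key=['run_id', 'model_name', 'field_name'],  -- assumed grain
        on_schema_change='append_new_columns'
    )
}}

select
    '{{ var("dq_run_id", invocation_id) }}' as run_id,
    s.*
from {{ ref('dq_results__terminology_field_summary') }} s
```

With a merge strategy and a stable unique_key, rerunning the same run_id rewrites that run's rows in place, which is why reruns are safe.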

Latest vs previous run views

Each history table has two companion views:

  • *__latest_official_run: rows from the most recent run_ts / run_id
  • *__previous_official_run: rows from the second most recent run_ts / run_id
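Conceptually, the latest view simply filters history to the most recent run. A sketch of the idea (not necessarily the package's actual view definition):

```sql
-- Sketch: rows from the most recent run only
select *
from spotlight.dq_history__terminology_field_summary
where run_ts = (
    select max(run_ts)
    from spotlight.dq_history__terminology_field_summary
)
```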

These let you compare consecutive runs without writing window functions:

-- Which terminology fields degraded between the last two runs?
select
    l.model_name,
    l.field_name,
    p.valid_rows_applicable_pct as previous_pct,
    l.valid_rows_applicable_pct as latest_pct,
    l.valid_rows_applicable_pct - p.valid_rows_applicable_pct as delta
from spotlight.dq_history__terminology_field_summary__latest_official_run l
join spotlight.dq_history__terminology_field_summary__previous_official_run p
    on l.data_source = p.data_source
    and l.payer = p.payer
    and l.plan = p.plan
    and l.model_name = p.model_name
    and l.field_name = p.field_name
where l.valid_rows_applicable_pct < p.valid_rows_applicable_pct
order by delta asc;

Typical workflow

  1. Dev runs: leave dq_persist_results at false. Iterate on upstream data and models without polluting history.
  2. Official runs: set dq_persist_results: true with a meaningful dq_run_id. Run on a schedule (weekly, monthly, or after each data refresh).
  3. Monitoring: query latest/previous views or the full history table to track trends, set alerts on degradation, and report to stakeholders.
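The dev vs. official split above maps to two invocation styles (the tag:spotlight selector is taken from the earlier example; adjust it to your project):

```shell
# Dev run: persistence is off by default, so nothing is appended to history
dbt run -s tag:spotlight

# Official run: persist results under a meaningful run id
dbt run -s tag:spotlight --vars '{dq_persist_results: true, dq_run_id: "official_2026_03"}'
```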

Analysis queries

The package includes analysis SQL files in the analyses/ directory for common diagnostic scenarios:

  • dq_terminology_latest_vs_previous_deltas.sql: compares terminology validation metrics between the latest and previous official runs, showing row counts and percentage deltas.
  • reconciliation_enrollment_diagnostics_*.sql: a series of queries for investigating enrollment-to-claims mismatches, including monthly dropoff analysis, mismatch reason bucketing, and unmatched member investigation.

Run these with dbt compile to generate executable SQL, then execute the compiled output directly in your warehouse.
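For example (the compiled path follows dbt's standard target/compiled layout; the exact project directory name depends on your project):

```shell
# Compile the project; analyses are compiled alongside models
dbt compile

# The executable SQL lands under target/compiled/<project>/analyses/
cat target/compiled/*/analyses/dq_terminology_latest_vs_previous_deltas.sql
```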