Quickstart

This is the shortest path to generate your first set of predictive outputs in-warehouse using illuminate_predictive_models.

The package supports Snowflake, Databricks, and Microsoft Fabric.

Compatibility

illuminate_predictive_models uses its own product versioning. Tuva version compatibility is documented separately and normalized internally where possible.

| illuminate_predictive_models release line | Tested Tuva versions | Notes |
| --- | --- | --- |
| Current release line | v0.15.3, v0.16.x | Supports both person-grain HCC outputs and newer person_id + payer HCC outputs by collapsing HCC assignments back to person-level features. Tuva v0.17.0 is excluded because of a known upstream claims-only compilation regression. |

1. Add package dependencies

In your project packages.yml:

packages:
  - git: "https://github.com/tuva-health/the_tuva_project.git"
    revision: v0.15.3
  - git: "https://github.com/illuminatehealth/illuminate_predictive_models.git"
    revision: v1.0.0

For production installs, pin illuminate_predictive_models to a published release tag rather than main. Avoid pinning to Tuva v0.17.0; use the latest v0.16.x release or the patched v0.17.1+ line once available.

Then install packages:

dbt deps

2. Set minimal vars

In your project dbt_project.yml:

vars:
  ml_enabled: true
  ml_prediction_anchor_month: "2018-05-01"  # defaults to the latest member month in the anchor population if not provided

3. Run package models

dbt run --select package:illuminate_predictive_models

What happens:

  1. Config and contract models build.
  2. Training matrix builds.
  3. train_model_registry either trains a new bundle or reuses a matching prior bundle from train_model_registry_history.
  4. Prediction matrix builds.
  5. predict_values, predict_probabilities_long, and train_metrics_long build.

4. Validate outputs

Check these tables in your predictive model schema (default schema name is ml):

  • train_model_registry
  • predict_values
  • predict_probabilities_long for both encounter threshold probabilities and spend top-percent probabilities
  • train_metrics_long
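A quick validation pass can be a few row-count queries against the output tables. This is a sketch: the table names come from the list above and the `ml` schema is the documented default, but adapt the schema name if you overrode it.

```sql
-- Sanity-check that each output table built and is non-empty.
-- Replace "ml" with your configured predictive model schema if you changed the default.
select 'train_model_registry' as table_name, count(*) as row_count from ml.train_model_registry
union all
select 'predict_values', count(*) from ml.predict_values
union all
select 'predict_probabilities_long', count(*) from ml.predict_probabilities_long
union all
select 'train_metrics_long', count(*) from ml.train_metrics_long;
```

A zero row count in any of these tables usually points at an empty anchor population or a misconfigured var rather than a build failure.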

For the initial build, start with a small sample run to limit compute while still exercising the full pipeline:

dbt run --select package:illuminate_predictive_models --vars '{ml_dev_sample_enabled: true, ml_dev_sample_rows: 10000, ml_dev_sample_seed: 42}'
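Equivalently, the sampling vars can live in your dbt_project.yml so every dev run picks them up without the --vars flag (the var names below are taken from the command above):

```yaml
vars:
  ml_enabled: true
  ml_dev_sample_enabled: true  # remove or set to false before a full production run
  ml_dev_sample_rows: 10000
  ml_dev_sample_seed: 42       # fixed seed keeps the sample stable across runs
```

Command-line --vars take precedence over dbt_project.yml, so you can keep sampling on in the project file and override it for occasional full runs.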

The first run after enabling a new project or changing the training signature will train and seed the history table. The next unchanged run should reuse the same model_version with status = skipped_existing_model.
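One way to confirm the reuse behavior is to inspect the registry after the second, unchanged run. A sketch: `model_version` and `status` are the columns named above; any other columns in the table are not assumed here.

```sql
-- After an unchanged re-run, expect the same model_version as the first run,
-- with status = 'skipped_existing_model' rather than a fresh training entry.
select model_version, status
from ml.train_model_registry;
```

If you instead see a new model_version, something in the training signature changed between runs (for example, a var or the anchor month).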