# Quickstart

This is the shortest path to generating your first set of predictive outputs in-warehouse with illuminate_predictive_models. The package supports Snowflake, Databricks, and Microsoft Fabric.
## Compatibility

illuminate_predictive_models uses its own product versioning. Tuva version compatibility is documented separately and normalized internally where possible.

| illuminate_predictive_models release line | Tested Tuva versions | Notes |
|---|---|---|
| Current release line | v0.15.3, v0.16.x | Supports both person-grain HCC outputs and newer person_id + payer HCC outputs by collapsing HCC assignments back to person-level features. Tuva v0.17.0 is excluded because of a known upstream claims-only compilation regression. |
## 1. Add package dependencies

In your project `packages.yml`:

```yaml
packages:
  - git: "https://github.com/tuva-health/the_tuva_project.git"
    revision: v0.15.3
  - git: "https://github.com/illuminatehealth/illuminate_predictive_models.git"
    revision: v1.0.0
```

For production installs, pin illuminate_predictive_models to a published release tag rather than `main`. Avoid pinning to Tuva v0.17.0; use the latest v0.16.x release, or the patched v0.17.1+ line once it is available.

Then install the packages:

```shell
dbt deps
```
## 2. Set minimal vars

In your project `dbt_project.yml`:

```yaml
vars:
  ml_enabled: true
  ml_prediction_anchor_month: "2018-05-01" # defaults to the latest member month in the anchor population if not provided
```
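If you prefer project-level configuration over CLI flags, the dev-sampling variables used later in step 4 can also be set in `dbt_project.yml`. This is a sketch; the variable names and values are the same ones passed via `--vars` in step 4, and the production guidance in the comment is an assumption:

```yaml
vars:
  ml_enabled: true
  ml_prediction_anchor_month: "2018-05-01"
  # Dev sampling (same vars as the --vars example in step 4).
  # Assumed best practice: keep sampling disabled in production runs.
  ml_dev_sample_enabled: true
  ml_dev_sample_rows: 10000
  ml_dev_sample_seed: 42
```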
## 3. Run package models

```shell
dbt run --select package:illuminate_predictive_models
```
What happens:
- Config and contract models build.
- Training matrix builds.
- `train_model_registry` either trains a new bundle or reuses a matching prior bundle from `train_model_registry_history`.
- Prediction matrix builds.
- `predict_values`, `predict_probabilities_long`, and `train_metrics_long` build.
## 4. Validate outputs

Check these tables in your predictive model schema (the default schema name is `ml`):

- `train_model_registry`
- `predict_values`
- `predict_probabilities_long`, for both encounter threshold probabilities and spend top-percent probabilities
- `train_metrics_long`
For the initial build, start with a small sample run to limit compute while still exercising the full pipeline:

```shell
dbt run --select package:illuminate_predictive_models --vars '{ml_dev_sample_enabled: true, ml_dev_sample_rows: 10000, ml_dev_sample_seed: 42}'
```
The first run after enabling a new project, or after changing the training signature, trains a model and seeds the history table. The next run with an unchanged signature should reuse the same `model_version` with `status = skipped_existing_model`.
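The reuse behavior described above can be sketched as follows. This is an illustrative model, not the package's actual implementation: the signature hashing, the history row shape, and the `trained_new_model` status label are assumptions; only `model_version`, `train_model_registry_history`, and the `skipped_existing_model` status come from this quickstart.

```python
import hashlib
import json


def training_signature(config: dict) -> str:
    """Hash a training configuration deterministically.

    Illustrative only: the package's real signature inputs are not
    documented here, so an arbitrary config dict stands in for them.
    """
    return hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()


def resolve_model_version(config: dict, history: list) -> dict:
    """Reuse a prior bundle whose signature matches; otherwise record a new one.

    `history` is a stand-in for rows of train_model_registry_history.
    """
    sig = training_signature(config)
    for row in history:
        if row["signature"] == sig:
            # Unchanged signature: reuse the existing bundle.
            return {"model_version": row["model_version"],
                    "status": "skipped_existing_model"}
    # New or changed signature: "train" and seed the history table.
    new_row = {"signature": sig, "model_version": f"m_{len(history) + 1}"}
    history.append(new_row)
    return {"model_version": new_row["model_version"],
            "status": "trained_new_model"}  # status label is an assumption
```

Running the resolver twice with the same config reuses the first `model_version`; changing any signature input produces a new one.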