Quickstart

This is the shortest path to generate your first set of predictive outputs in-warehouse using illuminate_predictive_models.

The package supports Snowflake, Databricks, and Microsoft Fabric.

Compatibility

illuminate_predictive_models uses its own product versioning. Tuva version compatibility is documented separately and normalized internally where possible.

| illuminate_predictive_models release line | Tested Tuva versions | Notes |
| --- | --- | --- |
| Current release line | v0.15.3, v0.16.x | Supports both person-grain HCC outputs and newer person_id + payer HCC outputs by collapsing HCC assignments back to person-level features. Tuva v0.17.0 is excluded because of a known upstream claims-only compilation regression. |

1. Add package dependencies

In your project packages.yml:

packages:
  - git: "https://github.com/tuva-health/the_tuva_project.git"
    revision: v0.15.3
  - git: "https://github.com/illuminatehealth/illuminate_predictive_models.git"
    revision: v1.0.0

For production installs, pin illuminate_predictive_models to a published release tag rather than main. Avoid pinning to Tuva v0.17.0; use the latest v0.16.x release or the patched v0.17.1+ line once available.

Then install packages:

dbt deps

2. Set minimal vars

In your project dbt_project.yml:

vars:
  ml_enabled: true
  ml_prediction_anchor_month: "2018-05-01"  # defaults to the latest member month in the anchor population if not provided

3. Run package models

dbt run --select package:illuminate_predictive_models

What happens:

  1. Config and contract models build.
  2. Training matrix builds.
  3. train_model_registry either trains a new bundle or reuses a matching prior bundle from train_model_registry_history.
  4. Prediction matrix builds.
  5. predict_values, predict_probabilities_long, and train_metrics_long build.

4. Validate outputs

Check these tables in your predictive model schema (default schema name is ml):

  • train_model_registry
  • predict_values
  • predict_probabilities_long for both encounter threshold probabilities and spend top-percent probabilities
  • train_metrics_long
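A quick validation pass can be a few row-count queries against the output tables. This is a sketch: the table names come from the list above and the `ml` schema is the documented default, but adapt the schema name if you overrode it.

```sql
-- Sanity-check that each output table built and is non-empty.
-- Replace "ml" with your configured predictive model schema if you changed the default.
select 'train_model_registry' as table_name, count(*) as row_count from ml.train_model_registry
union all
select 'predict_values', count(*) from ml.predict_values
union all
select 'predict_probabilities_long', count(*) from ml.predict_probabilities_long
union all
select 'train_metrics_long', count(*) from ml.train_metrics_long;
```

A zero row count in any of these tables usually points at an empty anchor population or a misconfigured var rather than a build failure.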

For the initial build, start with a small sample run to limit compute while still exercising the full pipeline:

dbt run --select package:illuminate_predictive_models --vars '{ml_dev_sample_enabled: true, ml_dev_sample_rows: 10000, ml_dev_sample_seed: 42}'
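Equivalently, the sampling vars can live in your dbt_project.yml so every dev run picks them up without the --vars flag (the var names below are taken from the command above):

```yaml
vars:
  ml_enabled: true
  ml_dev_sample_enabled: true  # remove or set to false before a full production run
  ml_dev_sample_rows: 10000
  ml_dev_sample_seed: 42       # fixed seed keeps the sample stable across runs
```

Command-line --vars take precedence over dbt_project.yml, so you can keep sampling on in the project file and override it for occasional full runs.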

The first run after enabling a new project or changing the training signature will train and seed the history table. The next unchanged run should reuse the same model_version with status = skipped_existing_model.
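One way to confirm the reuse behavior is to inspect the registry after the second, unchanged run. A sketch: `model_version` and `status` are the columns named above; any other columns in the table are not assumed here.

```sql
-- After an unchanged re-run, expect the same model_version as the first run,
-- with status = 'skipped_existing_model' rather than a fresh training entry.
select model_version, status
from ml.train_model_registry;
```

If you instead see a new model_version, something in the training signature changed between runs (for example, a var or the anchor month).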