
Troubleshooting

Training matrix returned no rows

Error: int_model_matrix_train returned no rows

  1. Check anchor months in int_anchor_population: confirm data exists for the configured training window.
  2. Check label_complete_flag = 1 rows in int_labels_long: if the outcome window isn't complete, rows are excluded from training.
  3. Try loosening the training anchor window vars (ml_train_anchor_start_month / ml_train_anchor_end_month).
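If the configured window is too narrow, step 3 can be applied as a one-off override from the command line. The var names come from this page; the month values below are placeholders, so substitute months that actually exist in int_anchor_population:

```shell
# Widen the training anchor window for a single run.
# The dates here are illustrative placeholders, not defaults.
dbt run --select train_model_registry \
  --vars '{ml_train_anchor_start_month: "2023-01-01", ml_train_anchor_end_month: "2024-12-01"}'
```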

Registry is empty

Error: train_model_registry is empty; run training flow first

Training must complete before running prediction or metrics models:

dbt run --select train_model_registry

Invalid artifact stage path

Error: Unsupported artifact_uri format

Check that ml_artifact_stage is valid for your platform's artifact storage configuration. In Snowflake, it must be a stage path that starts with @. Also verify the configured role has read/write access to that location.
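On Snowflake, a valid override might look like the following sketch. The stage name is hypothetical; substitute your own database, schema, and stage, and confirm the role running dbt can read and write to it:

```shell
# Snowflake: ml_artifact_stage must begin with '@'.
# '@analytics.ml.model_artifacts' is a made-up stage name for illustration.
dbt run --select train_model_registry \
  --vars '{ml_artifact_stage: "@analytics.ml.model_artifacts"}'
```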

No predictions generated

Error: No predictions were generated for any trained target/horizon

  1. Confirm the prediction anchor month has rows in the prediction matrix.
  2. Confirm the target policy matches what was used during training.
  3. Rebuild the sparse coordinate artifacts and rerun prediction.
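Assuming the artifact models sit upstream of prediction in the dbt DAG, step 3 can be done in one pass with dbt's graph selector, which selects a model together with all of its ancestors:

```shell
# '+predict_values' selects predict_values plus everything upstream of it,
# so the sparse coordinate artifacts are rebuilt before prediction runs.
dbt run --select +predict_values
```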

Registry rows not found for model version

Error: No train_model_registry rows found for model_version=...

  1. Rebuild train_model_registry and predict_values in the same schema and environment.
  2. Check for cross-environment reference mix-ups (e.g., production registry referenced from a dev environment).
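One way to rule out a cross-environment mix-up is to rebuild both models in a single invocation against one explicit target, so they land in the same schema:

```shell
# Rebuild registry and predictions together in one environment.
# 'dev' is a placeholder target name from your profiles.yml.
dbt run --select train_model_registry predict_values --target dev
```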

FAQ

Do I need config seed files?

No. All configuration is driven by dbt vars with sensible built-in defaults. Define vars in dbt_project.yml for project-level settings, and use --vars only for one-off run overrides.

How do I force retraining?

Set ml_force_train: true when running train_model_registry.
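For example, as a one-off command-line override:

```shell
# Force a fresh training run, bypassing signature-based reuse.
dbt run --select train_model_registry --vars '{ml_force_train: true}'
```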

What does ml_force_train: false do?

It computes a training signature from your current runtime settings plus the current target and feature configuration, then checks train_model_registry_history for a matching prior bundle. If one is found, training is skipped and the existing model is reused. If not, a new model is trained and recorded there for future reuse.

Reuse is bundle-level, not per target. Adding, removing, or changing any target invalidates the prior bundle and retrains all targets together.

If a single policy row contains multiple target_values, that list is shorthand for multiple separate targets. For example, target_values: [emergency department, ambulatory surgery center] creates two prediction outputs, not one combined output.

Can I generate predictions for a specific month?

Yes. Set ml_prediction_anchor_month to the desired month (e.g., "2026-02-01").
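For a one-off run, this can be passed on the command line, for example:

```shell
# Score a specific anchor month instead of the latest available one.
dbt run --select predict_values --vars '{ml_prediction_anchor_month: "2026-02-01"}'
```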

What happens if I don't set a prediction month?

The package defaults to the latest available anchor month in the data.

Are models trained separately per data source?

Yes. Each bundle contains entries scoped by data_source and target/horizon key, so models are trained and applied independently per source.

How are train/test splits done?

Splits are performed at the person level using a grouped shuffle split on person_id. This prevents anchors from the same person appearing in both train and test sets, reducing within-person temporal leakage.

Can I disable specific feature groups?

Yes. Use the ml_feature_policy var to enable or disable any combination of demographics, utilization, conditions, and HCC feature groups.
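The exact schema of ml_feature_policy is not documented on this page, so the shape below is an assumption: a map from the feature-group names listed above to booleans.

```shell
# Assumed shape: one boolean per feature group named above
# (demographics, utilization, conditions, hcc). Check the package's
# var reference for the authoritative schema.
dbt run --select train_model_registry \
  --vars '{ml_feature_policy: {demographics: true, utilization: true, conditions: true, hcc: false}}'
```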

Which platforms are supported?

illuminate_predictive_models supports Snowflake, Databricks, and Microsoft Fabric. Some examples in the docs use Snowflake-style stage syntax when describing artifact storage, but the package itself is not Snowflake-specific.