Operations
Troubleshooting
Training matrix returned no rows
Error: int_model_matrix_train returned no rows
- Check anchor months in int_anchor_population: confirm data exists for the configured training window.
- Check label_complete_flag = 1 rows in int_labels_long: if the outcome window isn't complete, rows are excluded from training.
- Try loosening the training anchor window vars (ml_train_anchor_start_month / ml_train_anchor_end_month).
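For example, a one-off run with a widened training window might look like this (the month values and the `+` selector are illustrative; use dates that exist in your data):

```shell
# Widen the training anchor window for a single run.
dbt run --select int_model_matrix_train+ \
  --vars '{"ml_train_anchor_start_month": "2024-01-01", "ml_train_anchor_end_month": "2025-06-01"}'
```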
Registry is empty
Error: train_model_registry is empty; run training flow first
Training must complete before running prediction or metrics models:
dbt run --select train_model_registry
Invalid artifact stage path
Error: Unsupported artifact_uri format
Check that ml_artifact_stage is valid for your platform's artifact storage configuration. In Snowflake, it must be a stage path that starts with @. Also verify the configured role has read/write access to that location.
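On Snowflake, for instance, the var might look like this in dbt_project.yml (the database, schema, and stage names are placeholders):

```yaml
vars:
  # Must begin with @ on Snowflake; the role running dbt needs
  # read/write privileges on this stage.
  ml_artifact_stage: "@analytics.ml_artifacts.model_stage"
```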
No predictions generated
Error: No predictions were generated for any trained target/horizon
- Confirm the prediction anchor month has rows in the prediction matrix.
- Confirm the target policy matches what was used during training.
- Rebuild the sparse coordinate artifacts and rerun prediction.
Registry rows not found for model version
Error: No train_model_registry rows found for model_version=...
- Rebuild train_model_registry and predict_values in the same schema and environment.
- Check for cross-environment reference mix-ups (e.g., a production registry referenced from a dev environment).
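One way to rule out a mix-up, assuming standard dbt targets, is to rebuild both models against the same target in a single invocation so dbt orders them via the DAG (the target name is illustrative):

```shell
# Rebuild registry and predictions together, in the same environment.
dbt run --select train_model_registry predict_values --target dev
```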
FAQ
Do I need config seed files?
No. All configuration is driven by dbt vars with sensible built-in defaults. Define vars in dbt_project.yml for project-level settings, and use --vars only for one-off run overrides.
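For example, project-level defaults might live in dbt_project.yml like this (the values are illustrative):

```yaml
# dbt_project.yml
vars:
  ml_train_anchor_start_month: "2024-01-01"
  ml_train_anchor_end_month: "2025-06-01"
  ml_prediction_anchor_month: "2026-02-01"
```

Anything set here applies to every run; reserve `--vars` on the command line for temporary overrides.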
How do I force retraining?
Set ml_force_train: true when running train_model_registry.
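For a one-off forced retrain, that can be passed on the command line:

```shell
# Force a fresh training run, ignoring any matching prior bundle.
dbt run --select train_model_registry --vars '{"ml_force_train": true}'
```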
What does ml_force_train: false do?
It computes a training signature from your current runtime settings plus the current target and feature configuration, then checks train_model_registry_history for a matching prior bundle. If one is found, training is skipped and the existing model is reused. If not, a new model is trained and recorded there for future reuse.
Reuse is bundle-level, not per target. Adding, removing, or changing any target invalidates the prior bundle and retrains all targets together.
If a single policy row contains multiple target_values, that list is shorthand for multiple separate targets. For example, target_values: [emergency department, ambulatory surgery center] creates two prediction outputs, not one combined output.
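As a sketch (the surrounding policy structure shown here is assumed, not the package's exact schema), a row with two target_values behaves like two separate rows:

```yaml
# One policy row with two target_values...
- target_column: place_of_service   # field name is hypothetical
  target_values: [emergency department, ambulatory surgery center]

# ...is shorthand for two separate targets:
- target_column: place_of_service
  target_values: [emergency department]
- target_column: place_of_service
  target_values: [ambulatory surgery center]
```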
Can I generate predictions for a specific month?
Yes. Set ml_prediction_anchor_month to the desired month (e.g., "2026-02-01").
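For example, to score February 2026 instead of the latest available anchor month:

```shell
dbt run --select predict_values --vars '{"ml_prediction_anchor_month": "2026-02-01"}'
```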
What happens if I don't set a prediction month?
The package defaults to the latest available anchor month in the data.
Are models trained separately per data source?
Yes. Each bundle contains entries scoped by data_source and target/horizon key, so models are trained and applied independently per source.
How are train/test splits done?
Splits are performed at the person level using a grouped shuffle split on person_id. This prevents anchors from the same person appearing in both train and test sets, reducing within-person temporal leakage.
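The idea can be illustrated with a minimal pure-Python sketch (this is not the package's implementation; it only demonstrates why grouping on person_id keeps each person on one side of the split):

```python
import random

def grouped_shuffle_split(rows, group_key, test_frac=0.2, seed=0):
    """Split rows into train/test so that all rows sharing a group
    (e.g. the same person_id) land on the same side of the split."""
    groups = sorted({group_key(r) for r in rows})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = max(1, int(len(groups) * test_frac))
    test_groups = set(groups[:n_test])
    train = [r for r in rows if group_key(r) not in test_groups]
    test = [r for r in rows if group_key(r) in test_groups]
    return train, test

# Two anchor months per person; every person must stay on one side,
# so no person contributes rows to both train and test.
rows = [{"person_id": p, "anchor_month": m}
        for p in range(10) for m in ("2025-01-01", "2025-02-01")]
train, test = grouped_shuffle_split(rows, lambda r: r["person_id"])
assert not ({r["person_id"] for r in train} & {r["person_id"] for r in test})
```

A plain row-level shuffle would instead scatter a person's anchors across both sets, letting the model "peek" at a person it will later be tested on.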
Can I disable specific feature groups?
Yes. Use the ml_feature_policy var to enable or disable any combination of demographics, utilization, conditions, and HCC feature groups.
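As a sketch (the exact shape of ml_feature_policy is assumed here; check the package's configuration reference for the real schema), disabling the condition and HCC groups might look like:

```yaml
vars:
  ml_feature_policy:
    demographics: true
    utilization: true
    conditions: false
    hcc: false
```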
Which platforms are supported?
illuminate_predictive_models supports Snowflake, Databricks, and Microsoft Fabric. It is not limited to Snowflake, although some examples in the docs use Snowflake-style stage syntax when describing artifact storage.