KDIGO-Based Phenotyping & Predictive Modeling using MIMIC-IV
This project builds an end-to-end clinical machine learning pipeline to study and predict drug-associated acute kidney injury (AKI) in critically ill adults using MIMIC-IV v3.1. The goal is early AKI risk detection following exposure to nephrotoxic medications, with a strong emphasis on clinical validity and interpretability.
Using gold-standard KDIGO 2012 AKI definitions (serum creatinine + urine output), I construct a reproducible ICU cohort, identify drug exposure windows, engineer temporal features from pre-drug data, and train CatBoost models to predict AKI risk over a 48-hour horizon.
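As an illustration of what KDIGO 2012 phenotyping involves, the sketch below stages AKI from serum creatinine (SCr) and urine output (UO) and takes the worse of the two. It is a minimal, hypothetical example rather than this project's pipeline code: the `KdigoInputs` fields are assumed to be precomputed per timepoint, the baseline is assumed to be a reference SCr from the prior 7 days, and SCr >= 4.0 mg/dL is treated as stage 3 only when an acute rise is also present (one common reading of the guideline).

```python
from dataclasses import dataclass


@dataclass
class KdigoInputs:
    """Hypothetical per-timepoint inputs; names and units are illustrative."""
    baseline_scr: float            # reference SCr (mg/dL), e.g. lowest value in prior 7 days
    current_scr: float             # SCr at the time being staged (mg/dL)
    min_scr_prior_48h: float       # lowest SCr in the preceding 48 h (mg/dL)
    hours_uo_lt_0_5: float = 0.0   # consecutive hours with UO < 0.5 mL/kg/h
    hours_uo_lt_0_3: float = 0.0   # consecutive hours with UO < 0.3 mL/kg/h
    hours_anuria: float = 0.0      # consecutive hours with no urine output
    on_rrt: bool = False           # renal replacement therapy initiated


def scr_stage(x: KdigoInputs) -> int:
    """KDIGO stage from serum creatinine (and RRT) alone; 0 means no AKI."""
    if x.baseline_scr <= 0:
        raise ValueError("baseline_scr must be positive")
    ratio = x.current_scr / x.baseline_scr
    rise_48h = x.current_scr - x.min_scr_prior_48h
    meets_aki = ratio >= 1.5 or rise_48h >= 0.3
    # Assumption: SCr >= 4.0 mg/dL counts as stage 3 only alongside an acute rise.
    if x.on_rrt or ratio >= 3.0 or (x.current_scr >= 4.0 and meets_aki):
        return 3
    if ratio >= 2.0:
        return 2
    return 1 if meets_aki else 0


def uo_stage(x: KdigoInputs) -> int:
    """KDIGO stage from urine output alone; 0 means no AKI."""
    if x.hours_anuria >= 12 or x.hours_uo_lt_0_3 >= 24:
        return 3
    if x.hours_uo_lt_0_5 >= 12:
        return 2
    if x.hours_uo_lt_0_5 >= 6:
        return 1
    return 0


def kdigo_stage(x: KdigoInputs) -> int:
    """Overall stage is the worse of the SCr and urine-output criteria."""
    return max(scr_stage(x), uo_stage(x))


# A 0.35 mg/dL rise within 48 h meets stage 1 even though the ratio is below 1.5.
print(kdigo_stage(KdigoInputs(baseline_scr=1.0, current_scr=1.35, min_scr_prior_48h=1.0)))  # 1
```

Combining both criteria matters because a creatinine-only definition can miss oliguric AKI that appears before SCr rises.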
Key Highlights
- Analyzed 65,000+ ICU stays from MIMIC-IV
- Implemented full KDIGO 2012 AKI phenotyping (SCr + urine output)
- Modeled AKI risk after nephrotoxic drugs (e.g., vancomycin, NSAIDs, ACE inhibitors/ARBs)
- Engineered temporal features from labs, vitals, urine output, and severity scores
- Improved early AKI detection: ROC-AUC rose from ~0.82 to ~0.85 and recall at a 0.50 threshold roughly doubled (~0.37 to ~0.77)
- Used SHAP explainability to validate clinical plausibility and risk drivers
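The exposure-window and temporal-feature steps can be sketched in plain Python: pick the first nephrotoxin administration as the index time, summarize each lab or vital series using only values charted before that time, and label the stay positive if a KDIGO AKI onset falls within the following 48 hours. Everything here is hypothetical and illustrative: the `NEPHROTOXINS` set is a small example subset, the 24-hour lookback is an assumed default, and the trend feature is a simple first-to-last change per hour rather than a fitted slope.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Illustrative subset only: vancomycin, two NSAIDs, an ACE inhibitor, and an ARB.
NEPHROTOXINS = {"vancomycin", "ibuprofen", "ketorolac", "lisinopril", "losartan"}

Measurement = Tuple[datetime, float]  # (charttime, value)


def first_nephrotoxin_start(orders: List[Tuple[str, datetime]]) -> Optional[datetime]:
    """Index time = earliest start of any listed nephrotoxin, or None if unexposed."""
    starts = [t for drug, t in orders if drug.lower() in NEPHROTOXINS]
    return min(starts) if starts else None


def pre_drug_features(series: List[Measurement], index_time: datetime,
                      lookback: timedelta = timedelta(hours=24)) -> Dict[str, Optional[float]]:
    """Summarize one lab/vital series using only values charted before the index time."""
    window = sorted((t, v) for t, v in series if index_time - lookback <= t < index_time)
    feats: Dict[str, Optional[float]] = {
        "last": None, "min": None, "max": None, "mean": None, "delta_per_h": None}
    if not window:
        return feats
    values = [v for _, v in window]
    feats.update(last=values[-1], min=min(values), max=max(values), mean=mean(values))
    hours = (window[-1][0] - window[0][0]).total_seconds() / 3600
    if hours > 0:
        # First-to-last change per hour: a simple trend proxy, not a regression slope.
        feats["delta_per_h"] = (values[-1] - values[0]) / hours
    return feats


def aki_label(aki_onsets: List[datetime], index_time: datetime,
              horizon: timedelta = timedelta(hours=48)) -> int:
    """1 if any KDIGO AKI onset falls in (index_time, index_time + horizon], else 0."""
    return int(any(index_time < t <= index_time + horizon for t in aki_onsets))


t0 = datetime(2024, 1, 1, 12)
creatinine = [(t0 - timedelta(hours=20), 0.9), (t0 - timedelta(hours=2), 1.1),
              (t0 + timedelta(hours=1), 1.4)]  # the post-drug value is ignored
print(pre_drug_features(creatinine, t0)["last"])  # 1.1
```

Restricting features to the pre-drug window keeps post-exposure information (including the creatinine rise that defines the label) out of the model inputs.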
Why This Project Matters
- Addresses a high-impact patient safety problem in the ICU
- Uses guideline-based labels, not weak proxies
- Balances performance, recall, and interpretability
- Designed for decision support, not just benchmark accuracy
Key Results (High-Level)
| Model | ROC-AUC | PR-AUC | Recall (threshold 0.50) | Notes |
|---|---|---|---|---|
| Baseline | ~0.82 | ~0.15 | ~0.37 | Static features only |
| Enriched | ~0.85 | ~0.19 | ~0.77 | Temporal + severity features |
Takeaway:
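Adding temporal lab, vital-sign, and urine-output features plus SOFA/SAPS-II severity scores roughly doubles recall at the 0.50 threshold while maintaining precision and raising PR-AUC from ~0.15 to ~0.19, a crucial improvement for early nephrotoxin risk prediction in ICU settings.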
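To make the table's metrics concrete, here is a dependency-free sketch of recall and precision at a fixed 0.50 cut-off, plus ROC-AUC computed as the probability that a randomly chosen positive is scored above a randomly chosen negative (ties count as half). The labels and scores are made-up toy values, and flagging scores at or above the threshold is an assumed convention; in practice a library routine such as scikit-learn's `roc_auc_score` would be used instead of the quadratic pairwise loop.

```python
from typing import Sequence, Tuple


def recall_precision_at(y_true: Sequence[int], y_score: Sequence[float],
                        threshold: float = 0.5) -> Tuple[float, float]:
    """Recall and precision when scores >= threshold are flagged as AKI."""
    pairs = list(zip(y_true, y_score))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """P(random positive scores above random negative), counting ties as 1/2."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
print(recall_precision_at(y, scores))  # (0.6666666666666666, 0.6666666666666666)
print(roc_auc(y, scores))              # 0.8
```

ROC-AUC and PR-AUC summarize ranking quality across all thresholds, whereas recall at 0.50 describes a single operating point, so a model can gain substantially in recall at that cut-off while its AUC moves only modestly.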