Add SepsisPredictionEICU and SepsisPredictionMIMIC4 tasks — pre-ICU window, Sepsis-3 proxy labeling #911

@SHA888

Summary

PyHealth currently has no sepsis prediction task. This issue proposes adding two
new task definitions:

  • SepsisPredictionEICU — eICU Collaborative Research Database
  • SepsisPredictionMIMIC4 — MIMIC-IV

Both tasks target pre-ICU and early-ICU detection windows. This fills a gap
identified in a PROSPERO-registered systematic review (CRD420251164609) covering
14 studies (53,795 participants): nearly all existing sepsis ML models are trained
on in-ICU data, and there is no standardized, reproducible benchmark.

Clinical Motivation

Sepsis is defined by the Sepsis-3 consensus (Singer et al., JAMA 2016) as
life-threatening organ dysfunction caused by a dysregulated host response to
infection. Early detection, before ICU admission or within the first hours of the
ICU stay, is where clinical impact is highest and where published ML models show
the widest performance variance (AUROC 0.72–0.94 across studies), largely due to
inconsistent cohort definitions and labeling strategies.

A reproducible PyHealth task would enforce one consistent definition of each of the following:

  • Observation window (configurable: default 6h from first available data)
  • Prediction gap (configurable: 0–12h before onset)
  • Sepsis label derivation (Sepsis-3 proxy: ICD codes + SOFA ≥ 2 or
    apacheadmissiondx for eICU)
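
To make the window and gap parameters concrete, here is a minimal sketch of how they might interact. The function name `observation_window`, its parameters, and the policy of dropping a sample whose window runs into the prediction gap are all assumptions for illustration; truncating the window instead would be an equally valid choice, and nothing here is existing PyHealth API.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def observation_window(
    first_data_time: datetime,
    onset_time: Optional[datetime],
    window_hours: float = 6.0,
    gap_hours: float = 0.0,
) -> Optional[Tuple[datetime, datetime]]:
    """Return (start, end) of the observation window, or None if unusable.

    Hypothetical helper. The window starts at the first available data
    point and lasts `window_hours`. For septic stays (onset_time is set),
    the window must end at least `gap_hours` before onset; otherwise the
    sample is dropped (assumed policy).
    """
    if window_hours <= 0 or gap_hours < 0:
        raise ValueError("window_hours must be > 0 and gap_hours must be >= 0")
    start = first_data_time
    end = start + timedelta(hours=window_hours)
    if onset_time is not None:
        latest_allowed_end = onset_time - timedelta(hours=gap_hours)
        if end > latest_allowed_end:
            return None  # window would leak into the prediction gap
    return start, end
```

Non-septic stays pass `onset_time=None` and always keep the full window, so only positive samples can be dropped by the gap constraint. That asymmetry would need to be handled (or at least reported) in the cohort statistics.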

Proposed Implementation

SepsisPredictionEICU

input_schema = {
    "vitals": "time_series",     # HR, RR, MAP, SpO2, Temp from vitalPeriodic
    "labs": "sequence",          # WBC, lactate, creatinine from lab table
    "conditions": "sequence",    # apacheadmissiondx / pasthistory
    "demographics": "static",    # age, gender, admissionweight
}
output_schema = {
    "sepsis_label": "binary",
}

Label derivation from eICU:

  • Positive: apacheadmissiondx containing sepsis/septic shock keywords, or
    sepsis codes in the diagnosis table (ICD-9 995.91, 995.92; ICD-10 A41.x)
  • Negative: non-infectious admission diagnoses
  • Excluded: ambiguous/missing diagnosis records
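
The three-way rule above could be sketched roughly as follows. The function name `eicu_sepsis_label`, the keyword pattern, and the assumption that ICD codes arrive as an already-split list of strings are all illustrative. The sketch also simplifies the negative class: it treats every non-sepsis diagnosis as negative and does not try to separate infectious from non-infectious admissions, which the real task would need to do.

```python
import re
from typing import Iterable, Optional

# Assumed keyword pattern for apacheadmissiondx free text.
SEPSIS_KEYWORDS = re.compile(r"sepsis|septic", re.IGNORECASE)
# ICD-9 995.91 / 995.92 and ICD-10 A41.x, as listed in the rule above.
SEPSIS_ICD = re.compile(r"^(995\.9[12]|A41(\.\w+)?)$", re.IGNORECASE)


def eicu_sepsis_label(
    apache_dx: Optional[str],
    icd_codes: Iterable[str],
) -> Optional[int]:
    """Return 1 (positive), 0 (negative), or None (excluded).

    Hypothetical helper, not part of PyHealth.
    """
    codes = [c.strip() for c in icd_codes if c and c.strip()]
    # Positive: sepsis keyword in the APACHE admission diagnosis.
    if apache_dx and SEPSIS_KEYWORDS.search(apache_dx):
        return 1
    # Positive: a sepsis ICD code in the diagnosis table.
    if any(SEPSIS_ICD.match(c) for c in codes):
        return 1
    # Excluded: no usable admission diagnosis and no sepsis code.
    if not apache_dx or not apache_dx.strip():
        return None
    # Negative (simplified): any other admission diagnosis.
    return 0
```

Returning `None` for excluded stays lets the caller drop them explicitly instead of silently folding them into the negative class.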

SepsisPredictionMIMIC4

input_schema = {
    "vitals": "time_series",     # chartevents: HR, RR, MAP, SpO2, GCS
    "labs": "sequence",          # labevents: WBC, creatinine, bilirubin, lactate
    "conditions": "sequence",    # diagnoses_icd (ICD-9 and ICD-10; see icd_version)
    "demographics": "static",    # age, gender, admission_type
}
output_schema = {
    "sepsis_label": "binary",
}

Label derivation from MIMIC-IV:

  • Use sepsis3 flag from mimiciv_derived.sepsis3 (already in MIMIC-IV derived
    tables) where available
  • Fallback: ICD-10 codes A40.x, A41.x in diagnoses_icd
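
A minimal sketch of the flag-then-fallback logic, under a few assumptions: the function name `mimic4_sepsis_label` is hypothetical, `sepsis3_flag` is `None` when the derived table is unavailable for a stay, and diagnoses are passed as `(icd_code, icd_version)` pairs. MIMIC-IV stores ICD codes without the decimal point (for example `A419`), so the sketch normalizes codes before matching.

```python
from typing import Iterable, Optional, Tuple


def mimic4_sepsis_label(
    sepsis3_flag: Optional[bool],
    diagnoses: Iterable[Tuple[str, int]],
) -> int:
    """Return 1 for sepsis, 0 otherwise.

    Hypothetical helper, not part of PyHealth. Prefers the derived
    Sepsis-3 flag; falls back to ICD-10 A40.x / A41.x codes only when
    the flag is unavailable (None).
    """
    if sepsis3_flag is not None:
        return int(sepsis3_flag)
    for code, version in diagnoses:
        # Strip the dot so "A41.9" and "A419" are treated alike.
        normalized = code.replace(".", "").strip().upper()
        if version == 10 and normalized.startswith(("A40", "A41")):
            return 1
    return 0
```

Because the derived flag takes precedence, a stay with `sepsis3_flag=False` stays negative even if a sepsis ICD code is present. Whether that precedence is desirable is a design question for the task.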

Deliverables

  • pyhealth/tasks/sepsis_prediction_eicu.py
  • pyhealth/tasks/sepsis_prediction_mimic4.py
  • Export in pyhealth/tasks/__init__.py
  • Unit tests in tests/tasks/
  • Example notebook in examples/sepsis_prediction/
  • Docstring with clinical references

References

  • Singer M et al. The Third International Consensus Definitions for Sepsis and
    Septic Shock (Sepsis-3). JAMA. 2016;315(8):801–810.
  • Johnson AEW et al. MIMIC-IV, a freely accessible electronic health record
    dataset. Sci Data. 2023.
  • Pollard TJ et al. The eICU Collaborative Research Database. Sci Data. 2018.
  • Sucandra et al. Time Advantage and Diagnostic Accuracy of Biomarker-Enhanced
    ML/DL for Sepsis Detection in Pre-ICU Settings. PROSPERO CRD420251164609.

Notes

Happy to implement this. Will open a PR once approach is confirmed.
