v1.0.0 - ACL 2025 reproduction snapshot
Companion release for Douglas et al. (2025), "Less is More: Explainable and Efficient ICD Code Prediction with Clinical Entities".
- Paper: https://aclanthology.org/2025.acl-long.1489/
- DOI: https://doi.org/10.18653/v1/2025.acl-long.1489
This tag is the first public snapshot of the repository and the stable reference point for paper-equivalent reproduction. The main branch will continue to receive fixes and improvements after this release, and some of those changes may alter observable outputs.
For manuscript reproduction, check out this tag explicitly:
git clone https://github.com/jd4501/entity-coding.git
cd entity-coding
git checkout v1.0.0
Start here
- Quick start and repo tour: https://github.com/jd4501/entity-coding/blob/v1.0.0/README.md
- End-to-end inference guide: https://github.com/jd4501/entity-coding/blob/v1.0.0/docs/inference.md
- Per-table reproduction recipes: https://github.com/jd4501/entity-coding/blob/v1.0.0/docs/reproduce.md
- Training and evidence reproduction: https://github.com/jd4501/entity-coding/blob/v1.0.0/docs/training.md and https://github.com/jd4501/entity-coding/blob/v1.0.0/docs/evidence.md
- Licenses and secondary citations: https://github.com/jd4501/entity-coding/blob/v1.0.0/docs/licenses.md
What is included
The repository includes the pipeline code, three synthetic discharge summaries for testing, a committed synthetic sample run under results/sample_results/, and paper-supplementary result artefacts under results/.
Model checkpoints, the entity-aware tokenizer, and Feather-format prediction files for the released checkpoints are fetched separately by running python data_download.py; see the README and inference guide above. Source clinical data is not redistributed here. MIMIC-III, MIMIC-IV, MIMIC-IV-Note, and the MIMIC-IV-Ext-EntityCoding annotation release require credentialed PhysioNet access; the annotation release was still in PhysioNet review at the time of this tag.
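Because main may drift from the paper snapshot, it can be worth confirming that the working tree sits exactly on v1.0.0 before fetching artefacts or running a reproduction recipe. The following is a minimal sketch and not part of the repository: the helper names tag_matches and current_exact_tag are hypothetical, and the only external call assumed is the standard git describe --tags --exact-match command.

```python
import subprocess
from typing import Optional

EXPECTED_TAG = "v1.0.0"  # the release tag named in these notes


def tag_matches(describe_output: str, expected: str = EXPECTED_TAG) -> bool:
    """Return True if the `git describe` output names exactly the expected tag."""
    return describe_output.strip() == expected


def current_exact_tag(repo_dir: str = ".") -> Optional[str]:
    """Return a tag that points at HEAD, or None.

    None is returned when HEAD is not exactly on a tag, when repo_dir is not
    a git repository or does not exist, or when git is not installed.
    """
    try:
        result = subprocess.run(
            ["git", "describe", "--tags", "--exact-match"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    tag = current_exact_tag()
    if tag is not None and tag_matches(tag):
        print(f"OK: working tree is at {EXPECTED_TAG}")
    else:
        print(f"Warning: HEAD is not at {EXPECTED_TAG}; run: git checkout {EXPECTED_TAG}")
```

Run it from the root of the clone. If more than one tag ever points at the same commit, git describe reports only one of them, so treat a warning as a prompt to check by hand and not as proof of a wrong checkout.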
Use and citation
The repository code is MIT-licensed. Trained model weights inherit non-commercial restrictions from their MIMIC and i2b2/n2c2 training data and are not released for clinical decision support, billing automation, or commercial deployment.
Please cite the ACL 2025 paper when using this repository. Work that reuses the vendored PLM-CA fork under external/plm_ca/ should also cite Edin et al. (2024); secondary citations are listed in docs/licenses.md.