This tool generates synthetic openEHR compositions with variation that does not break the original archetype or template constraints. You can upload operational templates (OPT) or your existing canonical compositions, and it produces as many flat or canonical compositions as you want (runs of more than 10,000 files can be packaged as a single tar.gz). It is useful for testing, demonstrations, or training environments without using real patient data.
- Mutation is driven by webtemplate (WT) constraints per rmType, not random guessing
- Flat format (FLAT) used for generation; canonical format supported for duplication
- Targets ehrbase via openEHR REST API v1; any openEHR REST API v1 spec-compliant CDR should work
- Entry point: `gen-openehr.py`
If you don't have access to an openEHR CDR, check the `/ehrbase` folder for a Docker setup (`.env.ehrbase` and `docker-compose.yml`), which I improved from the original ehrbase distribution (e.g. persistent DB, container health checks, and more).
- Go into the `/ehrbase` folder
- Run `docker compose up -d`

In most cases it should then be up and running at: http://localhost:8080/ehrbase/rest/openehr/v1
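As a rough illustration of how the base URL above combines with the standard openEHR REST API v1 resource paths, here is a minimal sketch. The `endpoint` helper and the variable names are hypothetical and not taken from `gen-openehr.py`; only the resource paths (`definition/template/adl1.4`, `query/aql`, `ehr/{ehr_id}/composition`) come from the openEHR REST specification.

```python
from urllib.parse import quote

# Default base URL of the Docker setup described above.
BASE_URL = "http://localhost:8080/ehrbase/rest/openehr/v1"


def endpoint(base_url: str, *segments: str) -> str:
    """Join path segments onto the API base URL, percent-encoding each one."""
    parts = [base_url.rstrip("/")] + [quote(s, safe="") for s in segments]
    return "/".join(parts)


# Standard openEHR REST API v1 resources (helper and names are illustrative).
template_url = endpoint(BASE_URL, "definition", "template", "adl1.4")
aql_url = endpoint(BASE_URL, "query", "aql")


def composition_url(ehr_id: str) -> str:
    """URL that compositions for one EHR are posted to."""
    return endpoint(BASE_URL, "ehr", ehr_id, "composition")
```

Any other spec-compliant CDR should only need a different `BASE_URL`.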
Mode 1 reads canonical JSON compositions from `source_models/user_compositions/`, strips UIDs (if any),
and posts each one N times to the CDR or saves it to `dist/compositions/`.
Use this when you have known-good canonical compositions and want to replicate them. The matching OPT must already be in the CDR (you can use Mode 3 to upload it).
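The UID stripping and replication described above might look roughly like this. It is a sketch under assumptions: it removes only the top-level `uid` of the composition, and the function names are hypothetical. The actual tool may handle nested UIDs differently.

```python
import copy


def strip_uid(composition: dict) -> dict:
    """Return a copy of a canonical composition without its top-level uid.

    Assumption: only the composition-level "uid" needs removing so that
    the CDR assigns a fresh one on each POST.
    """
    clean = copy.deepcopy(composition)
    clean.pop("uid", None)  # no-op when the composition has no uid
    return clean


def replicate(composition: dict, n: int) -> list:
    """Produce n independent uid-free copies of one composition."""
    return [strip_uid(composition) for _ in range(n)]
```

Each copy is a separate deep copy, so posting or editing one does not affect the others.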
When saving locally (a), if the total composition count exceeds 10,000 the tool asks:
e.g. 12,000 compositions to save: (a) Individual files / (b) Zip [default]:
- Default (Enter or `b`) → single `dist/compositions/compositions.tar.gz` (gzip compressed)
- `a` → individual `.json` files as before
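One way to implement the two packaging options is shown below, as a sketch using the standard-library `tarfile` module. The function name, the `as_archive` parameter, and the per-file naming scheme are assumptions for illustration. The real tool asks interactively rather than taking a flag.

```python
import io
import json
import tarfile
from pathlib import Path

TAR_THRESHOLD = 10_000  # composition count above which the tool offers tar.gz


def save_compositions(compositions, out_dir, as_archive=None):
    """Save compositions as individual .json files or one compositions.tar.gz."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    if as_archive is None:
        # Mirror the documented default: archive only above the threshold.
        as_archive = len(compositions) > TAR_THRESHOLD
    if not as_archive:
        for i, comp in enumerate(compositions, start=1):
            path = out_dir / f"composition_{i:06d}.json"  # hypothetical naming
            path.write_text(json.dumps(comp), encoding="utf-8")
        return out_dir
    archive = out_dir / "compositions.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for i, comp in enumerate(compositions, start=1):
            payload = json.dumps(comp).encode("utf-8")
            info = tarfile.TarInfo(name=f"composition_{i:06d}.json")
            info.size = len(payload)
            # Stream from memory so no temporary files are written.
            tar.addfile(info, io.BytesIO(payload))
    return archive
```

Writing members straight from memory avoids creating thousands of small files on disk first, which is the main reason to prefer a single archive for large runs.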
Mode 2 reads flat composition skeletons from `source_models/flat_composition_skeletons/`,
applies WT-driven mutation per rmType, and posts or saves the result.
It requires Mode 3 to have been run first to populate skeletons and webtemplates.
When saving locally (a), the tool first asks for the output format:
Format: (a) Flat [default] / (b) Canonical (via AQL):
- Flat (default, `a`): saves the mutated flat JSON directly; no CDR connection needed.
- Canonical (via AQL) (`b`): posts each flat composition to the CDR, then fetches the canonical JSON back using paginated AQL (`SELECT c FROM EHR ... CONTAINS COMPOSITION c LIMIT 10 OFFSET n`) and saves the CDR-returned canonical representation. Requires a live CDR connection.
The same tar.gz threshold applies: if total compositions exceed 10,000, a packaging prompt appears (same wording as Mode 1).
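The LIMIT/OFFSET paging used by the canonical option can be sketched independently of the CDR. In this sketch `fetch_page` is a hypothetical callable standing in for the AQL request; it receives the limit and offset and returns one page of rows. The loop stops at the first short page, which is an assumed termination rule.

```python
from typing import Callable, Iterator, List


def fetch_all(fetch_page: Callable[[int, int], List[dict]],
              page_size: int = 10) -> Iterator[dict]:
    """Yield every row by requesting pages until a short page is returned."""
    offset = 0
    while True:
        rows = fetch_page(page_size, offset)
        yield from rows
        if len(rows) < page_size:  # last (possibly empty) page reached
            break
        offset += page_size


# Stand-in for the CDR: 23 fake result rows served in slices.
_FAKE_ROWS = [{"n": i} for i in range(23)]


def fake_fetch(limit: int, offset: int) -> List[dict]:
    return _FAKE_ROWS[offset:offset + limit]


results = list(fetch_all(fake_fetch))
```

With 23 rows and a page size of 10, this issues three requests (offsets 0, 10, 20).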
Mode 3 prepares the full environment in one step:
- Clears `opt_webtemplates/` and `flat_composition_skeletons/`
- Prompts for ehrbase URL and credentials (saved to `ehrbase_config.json`)
- Uploads all `.opt` files from `source_models/opts/` to the CDR
  - 200/201: extracts `template_id` from `Location` header
  - 409 (already exists): extracts `template_id` from OPT XML body
- Fetches and saves webtemplates to `source_models/opt_webtemplates/`
- Fetches flat example compositions per WT and saves envelopes to `source_models/flat_composition_skeletons/`
Re-running Mode 3 wipes and regenerates all artefacts. Credentials can be updated at this point.
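The 200/201 versus 409 handling in the upload step could be approximated as follows. This is a sketch, not the tool's code: it assumes the `Location` header ends with the template id as its last path segment, and that the OPT is a standard OPT 1.4 document in the `http://schemas.openehr.org/v1` namespace with a `template_id/value` element. All function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import unquote, urlparse

OPT_NS = {"opt": "http://schemas.openehr.org/v1"}  # OPT 1.4 XML namespace


def template_id_from_location(location: str) -> str:
    """Assumption: the template id is the last path segment of Location."""
    path = urlparse(location).path.rstrip("/")
    return unquote(path.rsplit("/", 1)[-1])


def template_id_from_opt(opt_xml: str) -> str:
    """Read <template_id><value>...</value></template_id> from the OPT body."""
    root = ET.fromstring(opt_xml)
    node = root.find("opt:template_id/opt:value", OPT_NS)
    if node is None or not node.text:
        raise ValueError("no template_id/value element found in OPT")
    return node.text.strip()


def resolve_template_id(status: int, location: Optional[str], opt_xml: str) -> str:
    """Pick the extraction strategy from the upload response status."""
    if status in (200, 201) and location:
        return template_id_from_location(location)
    if status == 409:  # template already exists in the CDR
        return template_id_from_opt(opt_xml)
    raise RuntimeError(f"unexpected upload status {status}")
```

Falling back to the OPT body on 409 means an already-uploaded template still yields an id for the later webtemplate and skeleton steps.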
Mutation is applied per WT node rmType. Keys matching protected path segments are always skipped.
| rmType | Behaviour |
|---|---|
| `DV_QUANTITY` | ±10% jitter on \|magnitude; clamped to WT min/max range; \|unit untouched |
| `DV_CODED_TEXT` (local) | Random pick from WT input code list |
| `DV_CODED_TEXT` (openehr) | Untouched |
| `DV_TEXT` | Shuffle words (multi-word); append random hex suffix (single word) |
| `DV_DATE_TIME` / `DV_DATE` / `DV_TIME` | Shift by up to ±15% of one day (86 400 s) |
| `DV_DURATION` | Untouched |
| `DV_ORDINAL` | Random pick from WT list; sets \|ordinal, \|value, \|code |
| `DV_COUNT` | Random integer within WT validation range |
| `null_flavour` (mandatory) | Injected via WT id path (e.g. element/coded_text_value\|code); value keys kept |
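To make a few rows of the table concrete, here is a minimal sketch of three of the mutations (`DV_QUANTITY`, `DV_TEXT`, `DV_DATE_TIME`). Function names, the 4-digit hex suffix format, and rounding the time shift to whole seconds are all assumptions. Only the ±10% jitter, the min/max clamp, the word shuffle, and the ±15% of a day window come from the table.

```python
import random
from datetime import datetime, timedelta


def jitter_quantity(magnitude, wt_min=None, wt_max=None, rng=random):
    """Apply ±10% jitter, then clamp to the webtemplate range if one is given."""
    value = magnitude * (1 + rng.uniform(-0.10, 0.10))
    if wt_min is not None:
        value = max(value, wt_min)
    if wt_max is not None:
        value = min(value, wt_max)
    return value


def mutate_text(text, rng=random):
    """Shuffle the words of multi-word text; add a hex suffix to a single word."""
    words = text.split()
    if len(words) > 1:
        rng.shuffle(words)
        return " ".join(words)
    return f"{text}_{rng.getrandbits(16):04x}"  # suffix format is an assumption


def shift_datetime(iso_value, rng=random):
    """Shift an ISO 8601 timestamp by up to ±15% of one day (86 400 s)."""
    max_shift = 0.15 * 86_400
    shift = round(rng.uniform(-max_shift, max_shift))
    return (datetime.fromisoformat(iso_value) + timedelta(seconds=shift)).isoformat()
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible. Note that `datetime.fromisoformat` on Python 3.10 does not accept a trailing `Z`, so timestamps in that form would need normalising first.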
Protected path segments (any key containing these is skipped entirely):
category, context, language, territory, composer,
_work_flow_id, _guideline_id, _instruction_details, ism_transition, annotations
ism_transition is fully protected because careflow_step, current_state, and transition
are tightly coupled — mutating one without the others produces invalid ISM state machine transitions.
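The protection rule above amounts to a substring check on each flat key. A minimal sketch follows, with hypothetical function names and made-up example keys.

```python
PROTECTED_SEGMENTS = (
    "category", "context", "language", "territory", "composer",
    "_work_flow_id", "_guideline_id", "_instruction_details",
    "ism_transition", "annotations",
)


def is_protected(flat_key: str) -> bool:
    """True if the key contains any protected segment (plain substring match)."""
    return any(segment in flat_key for segment in PROTECTED_SEGMENTS)


def mutable_keys(flat_comp: dict) -> list:
    """Keys of a flat composition that mutation is allowed to touch."""
    return [key for key in flat_comp if not is_protected(key)]


# Illustrative flat keys only; real paths depend on the template.
example = {
    "vitals/context/start_time": "2024-01-15T10:30:00+01:00",
    "vitals/language|code": "en",
    "vitals/body_temperature/any_event:0/temperature|magnitude": 37.2,
}
```

Here only the temperature magnitude would be eligible for mutation; the `context` and `language` keys are skipped.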
source_models/
    opts/                        # Input: OPT files to upload
    opt_webtemplates/            # Generated by Mode 3: webtemplate JSONs
    flat_composition_skeletons/  # Generated by Mode 3: flat example envelopes
    user_compositions/           # Input: canonical JSONs for Mode 1
dist/
    compositions/                # Output: generated compositions
ehrbase_config.json              # Saved API credentials (gitignored)
ehrbase/                         # Docker setup (.env.ehrbase, docker-compose.yml)
Flat skeleton files are wrapped in an envelope:
{ "template_id": "...", "flat_comp": { ... } }

- Python 3.10+
- Running ehrbase (or any openEHR REST API v1 compliant CDR)
- `source_models/opts/` populated with your OPT files before running Mode 3
Windows
py -3.12 -m venv venv
.\venv\Scripts\Activate.ps1
Linux / macOS
python3 -m venv venv
source venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
python3 gen-openehr.py
- Place `.opt` files in `source_models/opts/`
- Run Mode 3: enter ehrbase URL and credentials once; credentials are saved to `ehrbase_config.json`
- Run Mode 2: choose count per skeleton and destination (local disk or CDR)
- Optionally place canonical JSONs in `source_models/user_compositions/` and use Mode 1
- `ehrbase_config.json` is gitignored. Re-run Mode 3 to update credentials or URL.
- Mode 3 wipes `opt_webtemplates/` and `flat_composition_skeletons/` on every run, so any manual edits to skeletons will be lost.
- `dist/compositions/` is wiped at the start of every Mode 1 or Mode 2 local-save run.
- Concurrency is capped at 10 parallel requests (asyncio semaphore) for all CDR calls.
- Total elapsed time is always printed on exit: `[*] Total time: Xm Ys`.
- Project must be on a local drive; do not store the venv in synced folders (OneDrive, Google Drive).
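The concurrency cap and the elapsed-time line mentioned in the notes above can be illustrated with a small self-contained sketch. `run_capped`, `format_elapsed`, and the fake request are hypothetical; the real tool wraps its actual CDR calls, and only the limit of 10 and the output format come from this document.

```python
import asyncio

MAX_PARALLEL = 10  # cap stated in the notes above


async def run_capped(jobs, limit=MAX_PARALLEL):
    """Run zero-argument coroutine functions with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job):
        async with semaphore:  # waits here once `limit` jobs are running
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))


def format_elapsed(seconds: float) -> str:
    """Render elapsed seconds in the '[*] Total time: Xm Ys' style."""
    minutes, secs = divmod(int(seconds), 60)
    return f"[*] Total time: {minutes}m {secs}s"


async def demo():
    active = 0
    peak = 0

    async def fake_post():  # stand-in for one CDR request
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)
        active -= 1
        return 201

    statuses = await run_capped([fake_post for _ in range(25)])
    return statuses, peak


statuses, peak = asyncio.run(demo())
```

The `peak` counter records how many fake requests were in flight at once, which should never exceed `MAX_PARALLEL`.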