Skip to content

fix(#10209): harden xsltproc against XML external entity attacks#10949

Open
officialasishkumar wants to merge 2 commits intomedic:masterfrom
officialasishkumar:10209-xxe-protection
Open

fix(#10209): harden xsltproc against XML external entity attacks#10949
officialasishkumar wants to merge 2 commits intomedic:masterfrom
officialasishkumar:10209-xxe-protection

Conversation

@officialasishkumar
Copy link
Copy Markdown
Contributor

Description

CHT api forwards admin-uploaded XForm XML to xsltproc via stdin to render the form HTML and the XForm model (api/src/services/generate-xform.js). xsltproc was invoked without any restrictions on external resource resolution, so an admin who could upload a form was able to embed an XXE payload that exfiltrated arbitrary files from the api container into the resulting form HTML. See OWASP XXE / CWE-611.

The repository's own threat assessment in #10209 is that this is a defence-in-depth fix because the only attacker who reaches this code path is already an admin with the medic password and access to the (open-source) container contents. That said, removing the primitive entirely closes the door on chained attacks and any future code path that runs the same xsltproc invocation against less-privileged input.

Approach

Two complementary defences in api/src/services/generate-xform.js:

  1. Reject XForms that declare a DOCTYPE or external entity. CHT XForms have no legitimate reason to declare either construct, so a strict allowlist of safe input shapes is the cheapest and most reliable fix. The check strips XML comments first so that benign comments mentioning the literal text <!DOCTYPE / <!ENTITY are not flagged.

  2. Pass --nonet to xsltproc. This causes libxml2 to refuse to resolve any external resource (DTDs, entities, stylesheets) over the network, providing defence in depth in case any future code path manages to slip a DTD past the input check.

The validation runs before childProcess.spawn, so tainted input never reaches xsltproc.

Verification

npm run unit-api for the affected suite (api/tests/mocha/services/generate-xform.spec.js):

  • Existing 46 tests still pass unchanged.
  • 4 new tests cover the new behaviour:
    • --nonet is the first argument passed to every xsltproc spawn.
    • XForms with <!DOCTYPE> are rejected with a clear error message and xsltproc is never spawned.
    • XForms with <!ENTITY> are rejected with the same error.
    • XForms whose comments mention the literal text <!DOCTYPE / <!ENTITY are still accepted (no false positives on benign content).

npm run lint is clean for both modified files.

Fixes #10209

Code review checklist

  • Readable: Concise, well named, follows the style guide
  • Tested: New unit tests cover the rejection paths, the --nonet flag wiring, and the comment false-positive case
  • Backwards compatible: Real CHT XForms do not declare a DOCTYPE or external entities, so existing forms continue to render without change. New forms that legitimately need xsltproc keep working with --nonet in place.

License

The software is provided under AGPL-3.0. Contributions to this project are accepted under the same license.

@officialasishkumar
Copy link
Copy Markdown
Contributor Author

The failing SonarCloud Quality Gate is a pre-existing javascript:S4036 Security Hotspot ("Make sure the PATH variable only contains fixed, unwriteable directories.") on the childProcess.spawn(XSLTPROC_CMD, ...) call.

It is an existing hotspot on master (AYpIVtpaQPYCEFsu0mXg, opened on 2021-09-22, on the same line in the same file) and SonarCloud has re-raised it as a new hotspot here (AZ3bKSCvOCQBCO-DKvAK) because adding assertNoExternalEntities and the XSLTPROC_FLAGS constant shifted the spawn call from line 37 to line 69.

The spawn call itself is unchanged in shape (still resolves xsltproc through PATH); the only edit is to its argv. Happy to address it separately if the maintainers would like the hotspot resolved (e.g. resolving an absolute path at startup), but it felt out of scope for the XXE fix.

@witash witash self-requested a review April 30, 2026 06:04
Copy link
Copy Markdown
Contributor

@witash witash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea to just refuse to upload forms containing a DOCTYPE or ENTITY element.
I can't think of any reason a cht form would need these.

tested locally and confirm it works, not able to upload a malicious form

Comment thread api/src/services/generate-xform.js Fixed
CHT api forwards admin-uploaded XForm XML to xsltproc via stdin to render
form HTML and the XForm model. xsltproc was invoked without any restrictions
on external resource resolution, so an admin who could upload a form was able
to embed an XXE payload that exfiltrated arbitrary files from the api
container into the resulting form HTML. See OWASP CWE-611.

This change adds two complementary defences in api/src/services/generate-xform.js:

  * `assertNoExternalEntities` rejects any XForm that declares a DOCTYPE or
    an <!ENTITY> before xsltproc is ever spawned. CHT XForms have no
    legitimate reason to use either construct, so this is a strict allowlist
    of safe input shapes. XML comments are stripped before scanning so that
    benign comments mentioning the literal text "<!DOCTYPE" or "<!ENTITY"
    are not flagged.

  * The xsltproc invocation now passes `--nonet`, which causes libxml2 to
    refuse to resolve any external resource (DTDs, entities, stylesheets)
    over the network. This is defence in depth for the case where a future
    code path manages to slip a DTD past the input check.

New unit tests cover all four cases: `--nonet` is forwarded, DOCTYPE input
is rejected, <!ENTITY> input is rejected, and XForms whose comments contain
the literal text "<!DOCTYPE"/"<!ENTITY" are still accepted.
@officialasishkumar
Copy link
Copy Markdown
Contributor Author

Addressed the CodeQL incomplete-sanitization finding in 89e15e9 by scanning the raw XForm XML for XML declarations before xsltproc runs. The focused Mocha test file now passes with COUCH_URL=http://admin:pass@localhost:5984/medic UNIT_TEST_ENV=1, and eslint passes for the changed files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Protect against XXE attack

3 participants