As a PDS data engineer, I want label validation to run in parallel within a single JVM so that I can fully utilize available CPU cores without spawning separate processes #1566

@jordanpadams

Description

Checked for duplicates

Yes - I've already checked

🧑‍🔬 User Persona(s)

Data Engineer, Node Operator

💪 Motivation

...so that I can validate large bundles in a fraction of the current time by utilizing multiple CPU cores within one JVM, eliminating the need for the heavy-handed validate-bundle shell script that spawns separate JVM processes via GNU Parallel.

📖 Additional Details

Impact: Very High

Relevant files:

  • src/main/java/gov/nasa/pds/tools/label/LabelValidator.java (line 456)

Problem:
LabelValidator.parseAndValidate() is declared synchronized, so only one thread at a time can validate a label on a given LabelValidator instance. When that instance is shared across worker threads, label validation is effectively serialized and extra cores in the same JVM go unused. The synchronized keyword exists because the method mutates shared instance state (cachedParser, cachedValidatorHandler, cachedSchematron, etc.).

Recommendation:
Refactor LabelValidator to be thread-safe by making the mutable state (parser, validator handler, schematron cache) either thread-local or passed as method parameters.

Note: depends on resolving the pervasive static mutable state first (see companion issue on LabelUtil, ReferentialIntegrityUtil, etc.).
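
To make the thread-local option concrete, here is a minimal sketch of the idea. It is not the actual LabelValidator code: the class name ThreadLocalParserSketch and its methods are hypothetical, and a plain JAXP DocumentBuilder stands in for the cached parser, validator handler, and Schematron state. The only point is that each thread lazily gets its own non-thread-safe parser, so the method no longer needs synchronized.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical stand-in for LabelValidator; illustrates per-thread parser state only. */
class ThreadLocalParserSketch {
  // One DocumentBuilder per thread, created on first use by that thread.
  // DocumentBuilder is not thread-safe, so it must not be shared.
  private static final ThreadLocal<DocumentBuilder> PARSER =
      ThreadLocal.withInitial(() -> {
        try {
          return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
          throw new IllegalStateException("cannot create XML parser", e);
        }
      });

  // No 'synchronized': the only mutable state touched here belongs to the calling thread.
  static String rootElementOf(String xml) {
    try {
      DocumentBuilder builder = PARSER.get();
      return builder.parse(new InputSource(new StringReader(xml)))
          .getDocumentElement()
          .getTagName();
    } catch (SAXException | IOException e) {
      throw new IllegalArgumentException("label is not well-formed XML", e);
    }
  }

  // Parses many labels on the common fork-join pool; result order matches input order.
  static List<String> rootsInParallel(List<String> labels) {
    return labels.parallelStream()
        .map(ThreadLocalParserSketch::rootElementOf)
        .collect(Collectors.toList());
  }
}
```

The other option named in the recommendation, passing the parser and handler into the method as parameters, avoids ThreadLocal entirely at the cost of changing method signatures. Either way, as the note above says, the static mutable state in the utility classes would presumably have to be addressed before the lock can be dropped safely.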

For Internal Dev Team To Complete

Acceptance Criteria

Given a bundle with many product labels
When I perform validation on a multi-core machine
Then I expect labels to be validated concurrently within a single JVM process
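
One possible way to check this criterion in an automated test is sketched below. It assumes a hook can be invoked while a thread is inside the validation step; no such hook is known to exist in validate today. ConcurrentEntryProbe, LockedValidator, and UnlockedValidator are hypothetical names used only for illustration. The probe uses a CyclicBarrier that trips only if all worker threads are inside the step at the same time, so a synchronized method fails the check and an unsynchronized one passes.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

class ConcurrentEntryProbe {
  /**
   * Runs body on 'parties' threads. body receives a hook that it must call
   * while inside the validation step. Returns true only if every thread was
   * inside that step at the same time.
   */
  static boolean entersConcurrently(int parties, Consumer<Runnable> body) {
    CyclicBarrier barrier = new CyclicBarrier(parties);
    AtomicInteger met = new AtomicInteger();
    Runnable hook = () -> {
      try {
        // Trips only when all parties are waiting here simultaneously.
        barrier.await(1, TimeUnit.SECONDS);
        met.incrementAndGet();
      } catch (Exception e) {
        // Timeout or broken barrier: the callers were serialized.
        if (e instanceof InterruptedException) {
          Thread.currentThread().interrupt();
        }
      }
    };
    ExecutorService pool = Executors.newFixedThreadPool(parties);
    for (int i = 0; i < parties; i++) {
      pool.submit(() -> body.accept(hook));
    }
    pool.shutdown();
    try {
      pool.awaitTermination(30, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return met.get() == parties;
  }
}

/** Hypothetical validator with the current locking behaviour. */
class LockedValidator {
  synchronized void parseAndValidate(Runnable insideHook) {
    insideHook.run();
  }
}

/** Hypothetical validator after the lock has been removed. */
class UnlockedValidator {
  void parseAndValidate(Runnable insideHook) {
    insideHook.run();
  }
}
```

With four threads, `ConcurrentEntryProbe.entersConcurrently(4, new UnlockedValidator()::parseAndValidate)` should report concurrent entry, while the same call with `new LockedValidator()::parseAndValidate` should not, because only one thread can hold the monitor while waiting at the barrier.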

⚙️ Engineering Details

🎉 I&T

Metadata

Assignees

Type

No type

Projects

Status

ToDo

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
