
As a PDS developer, I want inline regex patterns in FieldValueValidator pre-compiled as static constants so that per-field Pattern compilation overhead is eliminated #1570

@jordanpadams

Description

Checked for duplicates

Yes - I've already checked

🧑‍🔬 User Persona(s)

Data Engineer, PDS Developer

💪 Motivation

...so that I can validate large table products faster by eliminating redundant Pattern compilation that occurs on every field of every record.

📖 Additional Details

Impact: Medium

Relevant files:

  • src/main/java/gov/nasa/pds/tools/validate/content/table/FieldValueValidator.java (lines 76-100, 843-856)

Problem:
Inside FieldValueValidator.checkFormat(), calls such as specifier.matches("[eE]") and value.trim().matches(p) use String.matches(), which compiles a new Pattern object on every invocation. Because checkFormat() runs for every field of every record in every table, the same regexes are recompiled over and over. The top-level patterns (e.g., asciiIntegerPattern) are already pre-compiled as static final Pattern constants, but the format-checking code does not follow that convention.

Recommendation:
Pre-compile the inline patterns used inside checkFormat() as static final Pattern constants, consistent with the approach already used elsewhere in the class. Replace the String.matches(p) calls with Pattern.matcher(...).matches() on cached Pattern instances, so each pattern string is compiled only once.
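
A minimal sketch of what this could look like, not the actual FieldValueValidator code. The class name, the constant name EXPONENT_SPECIFIER, and the helper matchesCached are hypothetical; only the "[eE]" regex and the value.trim() call come from the description above. The cache is one possible way to handle a regex that is only known at runtime (assumed here to be the case for p).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.regex.Pattern;

// Illustrative only: names below are hypothetical, not taken from FieldValueValidator.
final class FormatPatternSketch {

  // Fixed regex: compiled once when the class is loaded.
  private static final Pattern EXPONENT_SPECIFIER = Pattern.compile("[eE]");

  // Regexes only known at runtime: compiled once per distinct pattern string.
  private static final ConcurrentMap<String, Pattern> PATTERN_CACHE =
      new ConcurrentHashMap<>();

  static boolean isExponentSpecifier(String specifier) {
    // Matcher.matches() requires the whole input to match, like String.matches().
    return EXPONENT_SPECIFIER.matcher(specifier).matches();
  }

  static boolean matchesCached(String regex, String value) {
    // computeIfAbsent compiles the regex only if it is not already cached.
    Pattern pattern = PATTERN_CACHE.computeIfAbsent(regex, Pattern::compile);
    return pattern.matcher(value.trim()).matches();
  }

  public static void main(String[] args) {
    System.out.println(isExponentSpecifier("e"));               // true
    System.out.println(isExponentSpecifier("x"));               // false
    System.out.println(matchesCached("[+-]?\\d+", " -17 "));    // true
  }
}
```

Pattern instances are immutable and safe to share across threads, so static constants are fine here. Matcher objects are not thread-safe, which is why a new one is created per call.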

For Internal Dev Team To Complete

Acceptance Criteria

Given a table product with millions of records
When I perform field format validation
Then I expect no regex Pattern objects are compiled more than once per pattern string
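
One way this criterion could be checked in a unit test is to count compilations inside the cache. The sketch below is an assumption about how such a check might be structured; CompileCountSketch, COMPILE_COUNT, cached, and validate are hypothetical names, and the integer regex is an example rather than a pattern taken from the validator.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Pattern;

// Illustrative only: a counting cache to observe how often Pattern.compile runs.
final class CompileCountSketch {

  static final AtomicInteger COMPILE_COUNT = new AtomicInteger();
  private static final Map<String, Pattern> CACHE = new ConcurrentHashMap<>();

  static Pattern cached(String regex) {
    // The mapping function runs at most once per key, so the counter
    // increases only on the first request for each pattern string.
    return CACHE.computeIfAbsent(regex, r -> {
      COMPILE_COUNT.incrementAndGet();
      return Pattern.compile(r);
    });
  }

  // Returns how many values match the regex after trimming.
  static int validate(String regex, String[] values) {
    int matched = 0;
    for (String value : values) {
      if (cached(regex).matcher(value.trim()).matches()) {
        matched++;
      }
    }
    return matched;
  }

  public static void main(String[] args) {
    String[] values = new String[100_000];
    Arrays.fill(values, " 42 ");
    validate("[+-]?\\d+", values);
    validate("[+-]?\\d+", values);
    System.out.println(COMPILE_COUNT.get()); // prints 1
  }
}
```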

⚙️ Engineering Details

🎉 I&T

Metadata

Type

No type

Projects

Status

🏁 Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
