Skip to content

[HUDI-9624] Warn when event-time field is set without watermark tracking#18721

Open
1fanwang wants to merge 1 commit into
apache:masterfrom
1fanwang:fix/HUDI-9624-warn-event-time-watermark
Open

[HUDI-9624] Warn when event-time field is set without watermark tracking#18721
1fanwang wants to merge 1 commit into
apache:masterfrom
1fanwang:fix/HUDI-9624-warn-event-time-watermark

Conversation

@1fanwang
Copy link
Copy Markdown

@1fanwang 1fanwang commented May 12, 2026

Describe the issue this Pull Request addresses

Closes #17105 / HUDI-9624.

When a user sets hoodie.payload.event.time.field to opt into event-time semantics but leaves hoodie.write.track.event.time.watermark at its default false, event-time watermark metadata is silently absent from commit metadata. The user gets no signal that their intent to track event time was dropped.

Summary and Changelog

  • HoodieWriteConfig.Builder.validate(): warn when the user has configured an event-time field but not enabled watermark tracking. The warning names both keys and tells the user how to opt in.
  • TestHoodieWriteConfig: three new tests cover (a) warning fires when only the event-time field is set, (b) no warning when both keys are set, (c) no warning when neither is set.

Impact

Strictly informational — no behavior change to commit metadata, no new exceptions thrown. Fires once per HoodieWriteConfig build, alongside the existing advisory warnings in validate().

Risk Level

Low. The check reads only write-config properties (no meta-client access).

Documentation Update

Not required — the warning itself is self-documenting and points the user at the existing config key.

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided
  • Adequate tests were added

@github-actions github-actions Bot added the size:S PR with lines of changes in (10, 100] label May 12, 2026
Copy link
Copy Markdown
Contributor

@hudi-agent hudi-agent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for the contribution! This PR adds an informational warning when hoodie.payload.event.time.field is set but hoodie.write.track.event.time.watermark is left at its default false, helping users notice that their event-time tracking intent was silently dropped. No issues flagged from this automated pass — a Hudi committer or PMC member can take it from here for a final review.

cc @yihua

Copy link
Copy Markdown
Collaborator

@lokeshj1703 lokeshj1703 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@1fanwang Thanks for working on this! I have added a few minor comments.

Comment on lines +3866 to +3878

// If the user has configured an event-time field but not enabled watermark tracking,
// event-time metadata will be silently absent from commit metadata. Surface a hint
// so the user can opt into tracking via TRACK_EVENT_TIME_WATERMARK.
String eventTimeFieldName = writeConfig.getString(HoodiePayloadProps.PAYLOAD_EVENT_TIME_FIELD_PROP_KEY);
if (!StringUtils.isNullOrEmpty(eventTimeFieldName)
&& !writeConfig.getBooleanOrDefault(TRACK_EVENT_TIME_WATERMARK)) {
log.warn("{}={} is configured but {}={}; event-time watermark metadata will not be tracked. "
+ "Set {}=true to record event-time watermark in commit metadata.",
HoodiePayloadProps.PAYLOAD_EVENT_TIME_FIELD_PROP_KEY, eventTimeFieldName,
TRACK_EVENT_TIME_WATERMARK.key(), writeConfig.getBooleanOrDefault(TRACK_EVENT_TIME_WATERMARK),
TRACK_EVENT_TIME_WATERMARK.key());
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can extract this logic to a separate method like validateEventTimeConfigs within this function.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in c3d3f45 — extracted to validateEventTimeConfigs() called at the end of validate().

Comment on lines +936 to +963
@Test
public void testNoWarnWhenEventTimeFieldSetAndWatermarkTrackingEnabled() {
Properties props = new Properties();
props.setProperty(HoodiePayloadProps.PAYLOAD_EVENT_TIME_FIELD_PROP_KEY, "ts");
props.setProperty(HoodieWriteConfig.TRACK_EVENT_TIME_WATERMARK.key(), "true");
List<LogEvent> events = captureWarnLogsFromBuild(props);
boolean foundHint = events.stream()
.filter(e -> e.getLevel() == Level.WARN)
.map(e -> e.getMessage().getFormattedMessage())
.anyMatch(msg -> msg.contains(HoodieWriteConfig.TRACK_EVENT_TIME_WATERMARK.key())
&& msg.contains("event-time watermark metadata will not be tracked"));
assertFalse(foundHint,
"Expected no warning when both event-time field and watermark tracking are configured");
}

@Test
public void testNoWarnWhenEventTimeFieldNotSet() {
Properties props = new Properties();
// No event-time field configured.
List<LogEvent> events = captureWarnLogsFromBuild(props);
boolean foundHint = events.stream()
.filter(e -> e.getLevel() == Level.WARN)
.map(e -> e.getMessage().getFormattedMessage())
.anyMatch(msg -> msg.contains(HoodieWriteConfig.TRACK_EVENT_TIME_WATERMARK.key())
&& msg.contains("event-time watermark metadata will not be tracked"));
assertFalse(foundHint,
"Expected no warning when event-time field is not configured");
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These functions can be removed I think. The major functionality is covered by other test case.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in c3d3f45 — dropped the two negative-case tests (testNoWarnWhenEventTimeFieldSetAndWatermarkTrackingEnabled, testNoWarnWhenEventTimeFieldNotSet) and inlined the captureWarnLogsFromBuild helper into the one remaining test. Net: 88 added lines down to 30.

When a user sets `hoodie.payload.event.time.field` but leaves
`hoodie.write.track.event.time.watermark` at its default `false`,
event-time metadata is silently absent from commit metadata —
the user's intent to use event-time semantics is dropped without
any signal.

Add a warning at write-config build time that names both the
configured event-time field and the disabled tracking flag, so the
user can opt in explicitly. The check is purely on write-config
properties (no meta-client access needed) and runs once per write
client construction, matching the existing pattern of advisory
warnings in `Builder.validate()`.
@1fanwang 1fanwang force-pushed the fix/HUDI-9624-warn-event-time-watermark branch from 27c6c25 to c3d3f45 Compare May 14, 2026 17:21
Copy link
Copy Markdown
Contributor

@hudi-agent hudi-agent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.

Thanks for the contribution! This PR adds a one-time advisory warning during HoodieWriteConfig validation when hoodie.payload.event.time.field is configured but hoodie.write.track.event.time.watermark is left at its default false, so users get a signal that their event-time intent would otherwise be silently dropped. No correctness issues found. A few style/readability suggestions in the inline comments. Please take a look, and this should be ready for a Hudi committer or PMC member to take it from here. A couple of small naming and code-clarity suggestions below.

cc @yihua

validateEventTimeConfigs();
}

private void validateEventTimeConfigs() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 nit: validateEventTimeConfigs reads like it enforces a constraint and throws on failure — which is what other validate* methods in this builder typically do. Since this one only logs a warning, a name like warnIfEventTimeWatermarkNotTracked would more accurately signal the intent to future readers.

- AI-generated; verify before applying. React 👍/👎 to flag quality.

log.warn("{}={} is configured but {}={}; event-time watermark metadata will not be tracked. "
+ "Set {}=true to record event-time watermark in commit metadata.",
HoodiePayloadProps.PAYLOAD_EVENT_TIME_FIELD_PROP_KEY, eventTimeFieldName,
TRACK_EVENT_TIME_WATERMARK.key(), writeConfig.getBooleanOrDefault(TRACK_EVENT_TIME_WATERMARK),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 nit: getBooleanOrDefault(TRACK_EVENT_TIME_WATERMARK) is called again here, but inside this branch it's always false (that's what the if condition guards on). Extracting it to a local boolean trackWatermark = ... before the if and reusing it in both places would avoid the redundant config lookup and make the relationship explicit.

- AI-generated; verify before applying. React 👍/👎 to flag quality.

@hudi-bot
Copy link
Copy Markdown
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:S PR with lines of changes in (10, 100]

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Inform User of setting event_time metadata

4 participants