[HUDI-9624] Warn when event-time field is set without watermark tracking#18721
[HUDI-9624] Warn when event-time field is set without watermark tracking#187211fanwang wants to merge 1 commit into
Conversation
hudi-agent
left a comment
There was a problem hiding this comment.
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Thanks for the contribution! This PR adds an informational warning when hoodie.payload.event.time.field is set but hoodie.write.track.event.time.watermark is left at its default false, helping users notice that their event-time tracking intent was silently dropped. No issues flagged from this automated pass — a Hudi committer or PMC member can take it from here for a final review.
cc @yihua
lokeshj1703
left a comment
There was a problem hiding this comment.
@1fanwang Thanks for working on this! I have added a few minor comments.
|
|
||
| // If the user has configured an event-time field but not enabled watermark tracking, | ||
| // event-time metadata will be silently absent from commit metadata. Surface a hint | ||
| // so the user can opt into tracking via TRACK_EVENT_TIME_WATERMARK. | ||
| String eventTimeFieldName = writeConfig.getString(HoodiePayloadProps.PAYLOAD_EVENT_TIME_FIELD_PROP_KEY); | ||
| if (!StringUtils.isNullOrEmpty(eventTimeFieldName) | ||
| && !writeConfig.getBooleanOrDefault(TRACK_EVENT_TIME_WATERMARK)) { | ||
| log.warn("{}={} is configured but {}={}; event-time watermark metadata will not be tracked. " | ||
| + "Set {}=true to record event-time watermark in commit metadata.", | ||
| HoodiePayloadProps.PAYLOAD_EVENT_TIME_FIELD_PROP_KEY, eventTimeFieldName, | ||
| TRACK_EVENT_TIME_WATERMARK.key(), writeConfig.getBooleanOrDefault(TRACK_EVENT_TIME_WATERMARK), | ||
| TRACK_EVENT_TIME_WATERMARK.key()); | ||
| } |
There was a problem hiding this comment.
We can extract this logic to a separate method like validateEventTimeConfigs within this function.
There was a problem hiding this comment.
Done in c3d3f45 — extracted to validateEventTimeConfigs() called at the end of validate().
| @Test | ||
| public void testNoWarnWhenEventTimeFieldSetAndWatermarkTrackingEnabled() { | ||
| Properties props = new Properties(); | ||
| props.setProperty(HoodiePayloadProps.PAYLOAD_EVENT_TIME_FIELD_PROP_KEY, "ts"); | ||
| props.setProperty(HoodieWriteConfig.TRACK_EVENT_TIME_WATERMARK.key(), "true"); | ||
| List<LogEvent> events = captureWarnLogsFromBuild(props); | ||
| boolean foundHint = events.stream() | ||
| .filter(e -> e.getLevel() == Level.WARN) | ||
| .map(e -> e.getMessage().getFormattedMessage()) | ||
| .anyMatch(msg -> msg.contains(HoodieWriteConfig.TRACK_EVENT_TIME_WATERMARK.key()) | ||
| && msg.contains("event-time watermark metadata will not be tracked")); | ||
| assertFalse(foundHint, | ||
| "Expected no warning when both event-time field and watermark tracking are configured"); | ||
| } | ||
|
|
||
| @Test | ||
| public void testNoWarnWhenEventTimeFieldNotSet() { | ||
| Properties props = new Properties(); | ||
| // No event-time field configured. | ||
| List<LogEvent> events = captureWarnLogsFromBuild(props); | ||
| boolean foundHint = events.stream() | ||
| .filter(e -> e.getLevel() == Level.WARN) | ||
| .map(e -> e.getMessage().getFormattedMessage()) | ||
| .anyMatch(msg -> msg.contains(HoodieWriteConfig.TRACK_EVENT_TIME_WATERMARK.key()) | ||
| && msg.contains("event-time watermark metadata will not be tracked")); | ||
| assertFalse(foundHint, | ||
| "Expected no warning when event-time field is not configured"); | ||
| } |
There was a problem hiding this comment.
These functions can be removed I think. The major functionality is covered by other test case.
There was a problem hiding this comment.
Done in c3d3f45 — dropped the two negative-case tests (testNoWarnWhenEventTimeFieldSetAndWatermarkTrackingEnabled, testNoWarnWhenEventTimeFieldNotSet) and inlined the captureWarnLogsFromBuild helper into the one remaining test. Net: 88 added lines down to 30.
When a user sets `hoodie.payload.event.time.field` but leaves `hoodie.write.track.event.time.watermark` at its default `false`, event-time metadata is silently absent from commit metadata — the user's intent to use event-time semantics is dropped without any signal. Add a warning at write-config build time that names both the configured event-time field and the disabled tracking flag, so the user can opt in explicitly. The check is purely on write-config properties (no meta-client access needed) and runs once per write client construction, matching the existing pattern of advisory warnings in `Builder.validate()`.
27c6c25 to
c3d3f45
Compare
hudi-agent
left a comment
There was a problem hiding this comment.
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Thanks for the contribution! This PR adds a one-time advisory warning during HoodieWriteConfig validation when hoodie.payload.event.time.field is configured but hoodie.write.track.event.time.watermark is left at its default false, so users get a signal that their event-time intent would otherwise be silently dropped. No correctness issues found. A few style/readability suggestions in the inline comments. Please take a look, and this should be ready for a Hudi committer or PMC member to take it from here. A couple of small naming and code-clarity suggestions below.
cc @yihua
| validateEventTimeConfigs(); | ||
| } | ||
|
|
||
| private void validateEventTimeConfigs() { |
There was a problem hiding this comment.
🤖 nit: validateEventTimeConfigs reads like it enforces a constraint and throws on failure — which is what other validate* methods in this builder typically do. Since this one only logs a warning, a name like warnIfEventTimeWatermarkNotTracked would more accurately signal the intent to future readers.
- AI-generated; verify before applying. React 👍/👎 to flag quality.
| log.warn("{}={} is configured but {}={}; event-time watermark metadata will not be tracked. " | ||
| + "Set {}=true to record event-time watermark in commit metadata.", | ||
| HoodiePayloadProps.PAYLOAD_EVENT_TIME_FIELD_PROP_KEY, eventTimeFieldName, | ||
| TRACK_EVENT_TIME_WATERMARK.key(), writeConfig.getBooleanOrDefault(TRACK_EVENT_TIME_WATERMARK), |
There was a problem hiding this comment.
🤖 nit: getBooleanOrDefault(TRACK_EVENT_TIME_WATERMARK) is called again here, but inside this branch it's always false (that's what the if condition guards on). Extracting it to a local boolean trackWatermark = ... before the if and reusing it in both places would avoid the redundant config lookup and make the relationship explicit.
- AI-generated; verify before applying. React 👍/👎 to flag quality.
Describe the issue this Pull Request addresses
Closes #17105 / HUDI-9624.
When a user sets
hoodie.payload.event.time.fieldto opt into event-time semantics but leaveshoodie.write.track.event.time.watermarkat its defaultfalse, event-time watermark metadata is silently absent from commit metadata. The user gets no signal that their intent to track event time was dropped.Summary and Changelog
HoodieWriteConfig.Builder.validate(): warn when the user has configured an event-time field but not enabled watermark tracking. The warning names both keys and tells the user how to opt in.TestHoodieWriteConfig: three new tests cover (a) warning fires when only the event-time field is set, (b) no warning when both keys are set, (c) no warning when neither is set.Impact
Strictly informational — no behavior change to commit metadata, no new exceptions thrown. Fires once per
HoodieWriteConfigbuild, alongside the existing advisory warnings invalidate().Risk Level
Low. The check reads only write-config properties (no meta-client access).
Documentation Update
Not required — the warning itself is self-documenting and points the user at the existing config key.
Contributor's checklist