
[PLUGIN-1950] Log warning when a table has no records to read #73

Open
psainics wants to merge 2 commits into data-integrations:develop from cloudsufi:feat/plugin-1950


psainics (Contributor) commented Apr 4, 2026

Log warning when a table has no records to read

Jira: PLUGIN-1950

Description

When the multi-table source reads from tables that contain zero rows, the lack of output can be confusing to debug. This adds a WARN-level log line with the table name in both DBTableRecordReader and SQLStatementRecordReader so operators can quickly identify empty tables.
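A minimal sketch of the "warn on an empty first read" pattern, runnable without a database. It is hypothetical: an Iterator stands in for the JDBC ResultSet, a Consumer<String> stands in for the SLF4J LOG.warn call, and the class name EmptyAwareReader is made up for illustration. The pos counter mirrors the pos check in the reader code under review.

```java
import java.util.Iterator;
import java.util.function.Consumer;

/**
 * Hypothetical sketch (not the plugin's code): an Iterator replaces the JDBC
 * ResultSet and a Consumer<String> replaces LOG.warn so the logic runs standalone.
 */
class EmptyAwareReader<T> {
  private final String tableName;
  private final Iterator<T> rows;
  private final Consumer<String> warn;
  private long pos = 0;           // rows returned so far by this reader
  private boolean warned = false; // avoid repeating the warning on extra calls
  private T current;

  EmptyAwareReader(String tableName, Iterator<T> rows, Consumer<String> warn) {
    this.tableName = tableName;
    this.rows = rows;
    this.warn = warn;
  }

  /** Mirrors RecordReader.nextKeyValue(): returns false once no rows remain. */
  boolean nextKeyValue() {
    if (!rows.hasNext()) {
      // Nothing was ever returned, so this reader saw an empty result.
      if (pos == 0 && !warned) {
        warn.accept("Table '" + tableName + "' had no records to read.");
        warned = true;
      }
      return false;
    }
    current = rows.next();
    pos++;
    return true;
  }

  T getCurrentValue() {
    return current;
  }
}
```

Reading table_r0 through this sketch produces exactly one warning, while table_r1 and table_r3 produce none.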

Test setup

  • Three tables, table_r0, table_r1, and table_r3; the suffix is the number of records in each table.

Case 1) Custom SQL Statements

[3 screenshots]

Case 2) Table Allow List / Table Block List

[3 screenshots]

Case 3) Table Block List, with the empty table blocked (no empty-table warning expected)

[3 screenshots]
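The expected outcome of the three cases can be modelled in a few lines. This is a hypothetical sketch, not the plugin's filtering code: it assumes an empty allow list means "read every table" and that block-listed tables are never read, so they cannot log the warning. Only table_r0 should ever warn, and only when it is not blocked.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of the test setup: the table-name suffix is its row count. */
class TestSetupModel {
  static Map<String, Integer> rowCounts() {
    Map<String, Integer> t = new LinkedHashMap<>(); // keeps insertion order
    t.put("table_r0", 0);
    t.put("table_r1", 1);
    t.put("table_r3", 3);
    return t;
  }

  /** Assumption: an empty allow list means all tables; blocked tables are skipped. */
  static List<String> expectedWarnings(Set<String> allowList, Set<String> blockList) {
    List<String> warned = new ArrayList<>();
    for (Map.Entry<String, Integer> e : rowCounts().entrySet()) {
      String table = e.getKey();
      boolean allowed = allowList.isEmpty() || allowList.contains(table);
      if (!allowed || blockList.contains(table)) {
        continue; // never read, so no warning is possible
      }
      if (e.getValue() == 0) {
        warned.add(table);
      }
    }
    return warned;
  }
}
```

With no lists, the model predicts a warning for table_r0 only; blocking table_r0 (Case 3) predicts no warning at all.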

psainics self-assigned this Apr 6, 2026
psainics added the build label Apr 6, 2026
psainics (Contributor, Author) commented Apr 6, 2026

[screenshot]

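// ... (preceding lines not shown)
if (!results.next()) {
  if (pos == 0) {
    // The first read on this split found no rows.
    LOG.warn("Table '{}' had no records to read.", tableName.getTable());
  }
  // ... (rest of the block not shown)
}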

This is only fully safe when there is a single split.

Imagine a table with 2,000 records where the primary key ID has a large gap: records exist for ID = 1 to 1000 and ID = 3001 to 4000.

If your job calculates splits in chunks of 1000 IDs, you might get the following splits:

Split 1 (ID 1 - 1000): Has 1000 records.

Split 2 (ID 1001 - 2000): Has 0 records (Empty split).

Split 3 (ID 2001 - 3000): Has 0 records (Empty split).

Split 4 (ID 3001 - 4000): Has 1000 records.

When the RecordReader for Split 2 runs, its query returns an empty ResultSet. Because this is the first read attempt for that split (pos == 0), it logs "Table 'MyTable' had no records to read." even though the table actually contains 2,000 records. Split 3 produces the same false warning.

That is the corner case.
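The false positive can be reproduced with a small simulation. This sketch is hypothetical and does not use the plugin's split logic: it assumes IDs 1-1000 and 3001-4000 exist (exactly 2,000 rows), cuts the ID range 1..4000 into four splits of 1000 IDs, and treats "this split's range query returned zero rows" as the pos == 0 condition.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simulation of per-split empty checks on a table with an ID gap. */
class SplitFalsePositiveDemo {
  /** IDs present in the example table: 1-1000 and 3001-4000 (2,000 rows). */
  static boolean idExists(long id) {
    return (id >= 1 && id <= 1000) || (id >= 3001 && id <= 4000);
  }

  /** Counts rows with ID in [lo, hi], as one split's range query would. */
  static long countInRange(long lo, long hi) {
    long n = 0;
    for (long id = lo; id <= hi; id++) {
      if (idExists(id)) {
        n++;
      }
    }
    return n;
  }

  /** Returns the 1-based indices of splits that would log the per-split warning. */
  static List<Integer> splitsThatWarn() {
    List<Integer> warned = new ArrayList<>();
    for (int i = 0; i < 4; i++) {
      long lo = i * 1000L + 1;
      long hi = (i + 1) * 1000L;
      // An empty range result means pos is still 0 on the first read.
      if (countInRange(lo, hi) == 0) {
        warned.add(i + 1);
      }
    }
    return warned;
  }
}
```

Here splitsThatWarn() returns [2, 3] although the table holds 2,000 rows, which is the corner case described above.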

I am not sure we can solve this at the split level unless we guarantee a single split, or run a SELECT 1 FROM ... check at a higher level.
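One way the "SELECT 1 at a higher level" idea could look is a single probe per table, run once (for example, while splits are planned) instead of once per split. This is a hypothetical sketch: the class and method names are made up, and the table name is assumed to be already validated and quoted for the target database. It uses the standard JDBC Statement.setMaxRows(1) as a portable row cap instead of a dialect-specific LIMIT or TOP.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical table-level emptiness probe (one query per table, not per split). */
class TableProbe {
  /**
   * Returns true if the table has at least one row. Assumes quotedTable is
   * already validated and quoted; it is concatenated directly into the SQL.
   */
  static boolean tableHasRows(Connection conn, String quotedTable) throws SQLException {
    try (Statement st = conn.createStatement()) {
      st.setMaxRows(1); // portable row cap instead of LIMIT/TOP
      try (ResultSet rs = st.executeQuery("SELECT 1 FROM " + quotedTable)) {
        return rs.next();
      }
    }
  }
}
```

The trade-off is one extra round trip per table, in exchange for a warning that no longer depends on how the ID range was split.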

psainics (Contributor, Author):

A SQL statement always runs as a single split. For tables, I am logging once per split.

- DBTableRecordReader: log table name and split query so users can
  identify which table and range returned no records
- SQLStatementRecordReader: log the SQL statement instead of the
  resolved table name for clarity
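The second commit's approach can be sketched as two message builders: the table reader names both the table and the split query, so an empty split (such as Split 2 in the example above) is identifiable as a range rather than misread as an empty table, and the SQL-statement reader names the statement itself. This is hypothetical: the class name and message wording are illustrative, not the PR's exact log text. With SLF4J the same idea would use placeholders, e.g. LOG.warn("... {} ... {}", table, splitQuery).

```java
/** Hypothetical message builders; wording is illustrative, not the PR's exact text. */
class EmptyReadMessages {
  /** DBTableRecordReader variant: include the table and the split's bounding query. */
  static String forTableSplit(String table, String splitQuery) {
    return String.format("Table '%s' returned no records for split query: %s", table, splitQuery);
  }

  /** SQLStatementRecordReader variant: include the SQL statement (always one split). */
  static String forSqlStatement(String sql) {
    return String.format("SQL statement returned no records: %s", sql);
  }
}
```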
psainics requested a review from sahusanket April 7, 2026 08:14

Projects: None yet

2 participants