fix: InfluxDB line protocol silent data loss and missing gzip support#3833
Conversation
…#3821) Return 400 instead of silent 204 when line protocol samples reference unknown timeseries types, and decompress Content-Encoding: gzip bodies in the write handler. Adds negative grammar tests for undocumented CREATE TIMESERIES TYPE syntax (PRECISION, COMPACTION INTERVAL). Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Code Review
This pull request fixes critical bugs in the TimeSeries InfluxDB Line Protocol, including silent data loss on unknown measurement types and missing gzip decompression support. It also adds grammar validation tests for the CREATE TIMESERIES TYPE statement. Review feedback identifies a race condition in the asynchronous payload handling and suggests using JSONObject for safer error response generation.
```java
final AtomicReference<byte[]> bytesRef = new AtomicReference<>();
e.getRequestReceiver().receiveFullBytes(
    (exchange, data) -> bytesRef.set(data),
    (exchange, err) -> {
      LogManager.instance().log(this, Level.SEVERE, "receiveFullBytes completed with an error: %s", err, err.getMessage());
      exchange.setStatusCode(StatusCodes.INTERNAL_SERVER_ERROR);
      exchange.getResponseSender().send("Invalid Request");
    });

final byte[] rawBytes = bytesRef.get();
if (rawBytes == null) {
  rawPayload = null;
  return null;
}
```
The use of receiveFullBytes is asynchronous and does not block the calling thread. Since bytesRef.get() is called immediately after, it will likely return null because the request body has not been fully read yet, leading to an incorrect "Request body is empty" error. Since this handler runs on a worker thread and has called e.startBlocking(), you should use the blocking getInputStream() to read the payload.
```java
final byte[] rawBytes;
try {
  rawBytes = e.getInputStream().readAllBytes();
} catch (final IOException ex) {
  LogManager.instance().log(this, Level.SEVERE, "Error reading request payload", ex);
  return null;
}
if (rawBytes.length == 0) {
  rawPayload = null;
  return null;
}
```
Pushback: AbstractServerHttpHandler.parseRequestPayload (lines 63-73) uses this exact same pattern - AtomicReference + receiveFullBytes + .get(). When blocking mode is started and the handler runs on a worker thread, receiveFullBytes completes synchronously before returning. This is the established pattern across all handlers in the codebase.
```java
return new ExecutionResponse(400,
    "{ \"error\" : \"Unknown timeseries type(s): " + String.join(", ", unknownTypes)
    + ". Create the type first with CREATE TIMESERIES TYPE.\"}");
```
Manually constructing JSON strings is error-prone and can lead to invalid JSON if any measurement names contain characters that require escaping (like double quotes). It is safer to use the JSONObject class to build the response body.
Suggested change:

```java
return new ExecutionResponse(400,
    new JSONObject().put("error", "Unknown timeseries type(s): " + String.join(", ", unknownTypes)
        + ". Create the type first with CREATE TIMESERIES TYPE.").toString());
```
Fixed in f02ef8b. Switched to new JSONObject().put("error", ...).toString() to prevent JSON injection from measurement names containing quotes.
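For illustration, a minimal standalone sketch of the injection the review describes. The class and helper names below are hypothetical, not ArcadeDB code; `escape` is a simplified version of what a JSON library does for string values:

```java
// Hypothetical demo class: shows how a measurement name containing quote
// characters injects an extra field into naively concatenated JSON, and how
// escaping (as JSONObject does internally) prevents it.
public class JsonEscapeDemo {
  // Naive concatenation, as in the original handler code.
  static String naive(final String name) {
    return "{ \"error\" : \"Unknown timeseries type(s): " + name + "\"}";
  }

  // Minimal JSON string escaping (simplified; a real library covers more cases).
  static String escape(final String s) {
    final StringBuilder sb = new StringBuilder();
    for (final char c : s.toCharArray()) {
      switch (c) {
        case '"':  sb.append("\\\""); break;
        case '\\': sb.append("\\\\"); break;
        case '\n': sb.append("\\n");  break;
        default:   sb.append(c);
      }
    }
    return sb.toString();
  }

  static String safe(final String name) {
    return "{ \"error\" : \"Unknown timeseries type(s): " + escape(name) + "\"}";
  }

  public static void main(final String[] args) {
    final String hostile = "cpu\",\"admin\":\"true";
    System.out.println(naive(hostile)); // carries an injected "admin" field
    System.out.println(safe(hostile));  // quotes escaped, single "error" field
  }
}
```

With the hostile name above, `naive` actually produces *valid* JSON containing an attacker-chosen `"admin"` field, which is worse than a parse error: downstream consumers would accept it silently.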
Up to standards ✅

🟢 Issues

| Metric | Results |
|---|---|
| Complexity | 10 |

🟢 Coverage: 67.57% diff coverage · -7.84% coverage variation

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch (`<coverage of head commit> - <coverage of common ancestor commit>`):

| Commit | Coverable lines | Covered lines | Coverage |
|---|---|---|---|
| Common ancestor (598496b) | 116361 | 85639 | 73.60% |
| Head (f02ef8b) | 147613 (+31252) | 97073 (+11434) | 65.76% (-7.84%) |

Diff coverage is the percentage of lines covered by tests out of the coverable lines the pull request added or modified (`<covered lines added or modified> / <coverable lines added or modified> * 100%`):

| | Coverable lines | Covered lines | Diff coverage |
|---|---|---|---|
| Pull request (#3833) | 37 | 25 | 67.57% |
Code Review

Good fixes overall - the silent data loss and gzip issues are real bugs worth addressing. A few concerns to address before merging.

**Critical: JSON injection in error response**

`PostTimeSeriesWriteHandler.java` line 181 builds JSON via string concatenation using measurement names that come directly from the InfluxDB line protocol request body. A name containing a quote character or backslash will produce malformed or injected JSON. Use `JSONObject` as the project convention requires (CLAUDE.md: use `com.arcadedb.serializer.json.JSONObject`) when building the `ExecutionResponse(400, …)` body.

**Medium: gzip decompression failure returns 500 not 400**

The `IllegalArgumentException` thrown on line 98 propagates to the framework as a 500 Internal Server Error, even though the client sent a malformed gzip body (a 400-level error). Catch it in `execute()` or return `ExecutionResponse(400, ...)` instead.

**Medium: Planning document committed to repo root**

`3821-timeseries-line-protocol-bugs.md` at the repo root is a work-notes file and should not be committed to the repository. Remove it before merging.

**Minor: Silent skip when type exists but is not a LocalTimeSeriesType**

When `inserted == 0` and `unknownTypes.isEmpty()`, the code returns 204 silently. This happens when samples match a schema type that is not a `LocalTimeSeriesType` (the `continue` on line 146). That branch also hides data loss - worth logging a warning there too, consistent with the unknown-type handling added in this PR.

**Pre-existing issue (not introduced by this PR)**

`rawPayload` is an instance field on a shared handler. Since Undertow handlers are typically singleton instances, concurrent requests could overwrite each other's `rawPayload`. Not introduced here but worth tracking.

Positive observations:
…on-timeseries types, remove planning doc

- Use `JSONObject` to build the error response body instead of string concatenation, preventing JSON injection from measurement names
- Log a warning when samples match types that exist but are not TimeSeries types (consistent with unknown-type handling)
- Remove work-tracking doc from repo root

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Addressed in f02ef8b:

- **JSON injection:** Fixed. Error response now uses `JSONObject.put().toString()` instead of string concatenation.
- **gzip 500 vs 400:** Pushback. `AbstractServerHttpHandler.handleRequest()` lines 202-206 catch `IllegalArgumentException` and map it to `sendErrorResponse(exchange, 400, ...)`. The `parseRequestPayload` call at line 155 is inside that same try block, so the `IllegalArgumentException` from a gzip decompression failure is already returned as 400.
- **Planning doc:** Removed from the repo.
- **Non-timeseries type silent skip:** Fixed. Added a `nonTimeSeriesTypes` tracking set with a `Level.WARNING` log, consistent with the unknown-type handling.
- **rawPayload thread safety:** Acknowledged as pre-existing. The parent class has the same pattern (`rawPayload` is an instance field on a singleton handler). Out of scope for this PR.
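The decompression step under discussion can be sketched with the JDK's `GZIPInputStream`. The class and method names below are hypothetical, not the handler's actual code; the point is the translation of an `IOException` from a malformed body into `IllegalArgumentException`, which the framework's existing catch block then maps to a 400:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class illustrating the gzip-body handling pattern.
public class GzipBodyDemo {
  // Decompress a request body; a malformed stream becomes
  // IllegalArgumentException (a client error), not a raw IOException.
  static byte[] decompress(final byte[] body) {
    try (final GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(body))) {
      final ByteArrayOutputStream out = new ByteArrayOutputStream();
      final byte[] buffer = new byte[8192];
      int read;
      while ((read = in.read(buffer)) > 0)
        out.write(buffer, 0, read);
      return out.toByteArray();
    } catch (final IOException e) {
      throw new IllegalArgumentException("Invalid gzip payload", e);
    }
  }

  // Helper for round-trip testing: compress a payload as a client would.
  static byte[] compress(final byte[] data) {
    final ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (final GZIPOutputStream gz = new GZIPOutputStream(out)) {
      gz.write(data);
    } catch (final IOException e) {
      throw new IllegalStateException(e);
    }
    return out.toByteArray();
  }
}
```

Note that `GZIPInputStream`'s constructor already throws on a bad magic number, so a body that is not gzip at all fails fast inside the same try block.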
Good fixes overall - the gzip support and silent-data-loss prevention are both real improvements. A few issues worth addressing before merging:

**Bug: Silent data loss still possible for non-timeseries types**

The 400 error is only returned when `unknownTypes` is non-empty. The check

```java
if (inserted == 0 && !unknownTypes.isEmpty())
```

should cover both skip reasons:

```java
if (inserted == 0 && (!unknownTypes.isEmpty() || !nonTimeSeriesTypes.isEmpty()))
```

Otherwise a payload whose only measurement is a known-but-wrong type still returns 204 with no data written - the exact silent-data-loss problem this PR is fixing for unknown types. A test covering that path is also missing.

**Empty transaction committed before returning 400**

When all samples are skipped and a 400 is about to be returned, …

**Thread-safety of …**
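The combined condition the review asks for can be sketched in isolation. The class and variable names below mirror the review's wording but are hypothetical, not the handler's actual code:

```java
import java.util.Set;

// Hypothetical demo: decide the HTTP status from the insert count and the two
// skip-reason sets, so a payload whose only measurements are known but
// non-timeseries types also gets a 400 instead of a silent 204.
public class WriteStatusDemo {
  static int statusFor(final int inserted, final Set<String> unknownTypes,
      final Set<String> nonTimeSeriesTypes) {
    if (inserted == 0 && (!unknownTypes.isEmpty() || !nonTimeSeriesTypes.isEmpty()))
      return 400; // nothing written and at least one skip reason was recorded
    return 204;   // something was written, or the payload was simply empty
  }
}
```

An empty payload (no inserts, no skip reasons) still yields 204 here, matching the pre-existing "Request body is empty" handling upstream of this check.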
Codecov Report

❌ Patch coverage is …

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #3833      +/-   ##
==========================================
+ Coverage   64.73%   65.30%   +0.56%
==========================================
  Files        1579     1579
  Lines      116361   116399     +38
  Branches    24675    24679      +4
==========================================
+ Hits        75328    76011    +683
+ Misses      30751    30063    -688
- Partials    10282    10325     +43
```

☔ View full report in Codecov by Sentry.
Summary

Fixes #3821 (reported in discussion #3819)

- `PostTimeSeriesWriteHandler` now returns 400 with a descriptive error when all line protocol samples reference unknown measurement types, instead of silently returning 204. Partial success (some known, some unknown) still returns 204 but logs a warning.
- `Content-Encoding: gzip` request bodies are now decompressed before parsing. Telegraf's `[[outputs.influxdb]]` plugin sends gzip by default, which previously produced garbled "malformed line protocol" errors.
- Undocumented `CREATE TIMESERIES TYPE` syntax (`PRECISION NANOSECOND`, `COMPACTION INTERVAL`) is properly rejected with `CommandParsingException`.

Test plan

- `CreateTimeSeriesTypeStatementTest` - 8 tests (6 existing + 2 new negative grammar tests)
- `LineProtocolParserTest` - 15 existing tests (no regressions)
- `PostTimeSeriesWriteHandlerIT` - 5 tests (3 existing + 2 new: `unknownMeasurementTypeReturnsError`, `gzipCompressedBodyIsAccepted`)

🤖 Generated with Claude Code