Enhance data descriptor handling by adding length check #233

Open
johan-tribus wants to merge 1 commit into 101arrowz:master from johan-tribus:deflate-patch

@johan-tribus

Previously, a chunk could coincidentally contain a data descriptor signature in an unexpected position, causing the decoder to halt and throw an unexpected EOF error when reading a streamed .zip archive. This update introduces a length check on the compressed size, allowing the decoder to continue processing if there are still compressed bytes left to be inflated.
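
To illustrate the failure mode described above, here is a minimal sketch of a signature scan with a length check. It is not the code from this PR: `findSignature` and `findDescriptor` are hypothetical names, and the sketch assumes the expected compressed size is already known and that the buffer starts at the first byte of the entry's compressed data.

```typescript
// Data descriptor signature 0x08074b50 as it appears in the byte stream
// (little-endian): "PK\x07\x08".
const DD_SIG = [0x50, 0x4b, 0x07, 0x08];

// Return the offset of the first signature match at or after `from`, or -1.
function findSignature(buf: Uint8Array, from = 0): number {
  for (let i = from; i + DD_SIG.length <= buf.length; i++) {
    if (DD_SIG.every((b, k) => buf[i + k] === b)) return i;
  }
  return -1;
}

// Hypothetical length check: skip any match that occurs before
// `compressedSize` bytes of compressed data have been seen, because it
// must be part of the payload rather than a real descriptor.
function findDescriptor(buf: Uint8Array, compressedSize: number): number {
  let i = findSignature(buf);
  while (i !== -1 && i < compressedSize) i = findSignature(buf, i + 1);
  return i;
}
```

A naive scan stops at the first match even when it lies inside the compressed payload; the length check skips such matches as long as compressed bytes remain.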

@101arrowz
Owner

The reason I didn't do this to begin with is that ZIPs that are compressed in a streaming way have the length in the local header set to 0 AFAIK. Does this still work with such files? If you aren't able to test this I will do so in a few days. Thanks for your patience on this!
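
For context, the concern above can be checked by reading the local file header directly. The sketch below is illustrative only and is not part of the library: `readLocalHeader` and `LocalHeaderInfo` are hypothetical names, and the field offsets follow the standard ZIP local file header layout (flags at byte 6, compressed size at byte 18, uncompressed size at byte 22). When bit 3 of the general-purpose flags is set, the sizes are deferred to a data descriptor and the header fields are typically zero.

```typescript
interface LocalHeaderInfo {
  usesDataDescriptor: boolean;
  compressedSize: number;
  uncompressedSize: number;
}

// Parse the fixed 30-byte portion of a ZIP local file header.
function readLocalHeader(buf: Uint8Array): LocalHeaderInfo {
  if (buf.length < 30) throw new Error("local file header is truncated");
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  if (view.getUint32(0, true) !== 0x04034b50) {
    throw new Error("not a local file header");
  }
  const flags = view.getUint16(6, true);
  return {
    // Bit 3: CRC and sizes are stored in a trailing data descriptor.
    usesDataDescriptor: (flags & 0x0008) !== 0,
    compressedSize: view.getUint32(18, true),
    uncompressedSize: view.getUint32(22, true),
  };
}
```

If `usesDataDescriptor` is true and `compressedSize` is 0, a check against the local-header size has nothing to compare against, which is the case raised here.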

@cyfung1031

cyfung1031 commented May 31, 2026

This should be superseded by #275


This is the right direction, but #275 is a much more comprehensive solution for avoiding false detection of the ZIP signature.

> ZIPs that are compressed in a streaming way have the length in the local header set to 0 AFAIK.

#275 does not rely on the compressed size from the local header, which can be zero/unknown for streaming ZIPs. It only validates candidate boundaries using the compressed size stored in the data descriptor, and this.b accounts for bytes already flushed across previous chunks.

So streaming ZIPs with zero local-header sizes should still work, provided their data descriptor contains the actual compressed size as expected.
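
The boundary validation described above could look roughly like the following. This is a sketch under stated assumptions, not the code from #275: `isDescriptorAt` is a hypothetical name, `flushed` stands in for the role the comment attributes to `this.b`, the chunk is assumed to contain only compressed data up to the candidate offset, and the descriptor is assumed to be the non-ZIP64 form with the optional signature (signature, CRC-32, compressed size, uncompressed size, 4 bytes each).

```typescript
// Accept a candidate descriptor at `offset` only if its compressed-size
// field equals the number of compressed bytes seen so far:
// `flushed` (bytes from earlier chunks) plus `offset` (bytes in this chunk).
function isDescriptorAt(
  chunk: Uint8Array,
  offset: number,
  flushed: number
): boolean {
  // A full 16-byte descriptor must be present; a descriptor split across
  // chunks is out of scope for this sketch.
  if (offset < 0 || offset + 16 > chunk.length) return false;
  const view = new DataView(chunk.buffer, chunk.byteOffset, chunk.byteLength);
  if (view.getUint32(offset, true) !== 0x08074b50) return false;
  const compressedSize = view.getUint32(offset + 8, true);
  return compressedSize === flushed + offset;
}
```

Because the comparison uses the size stored in the descriptor itself, it does not depend on the local header, so zero sizes there do not affect the result.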
