Skip to content

fix fd leak in GenerateDataFromFile on send error#4577

Open
cyyever wants to merge 2 commits into
NVIDIA:mainfrom
cyyever:fix/binary-proto-fd-leak
Open

fix fd leak in GenerateDataFromFile on send error#4577
cyyever wants to merge 2 commits into
NVIDIA:mainfrom
cyyever:fix/binary-proto-fd-leak

Conversation

@cyyever
Copy link
Copy Markdown
Contributor

@cyyever cyyever commented May 12, 2026

Description

This PR fixes fd leak in GenerateDataFromFile on send error.

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Quick tests passed locally by running ./runtest.sh.
  • In-line docstrings updated.
  • Documentation updated.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 12, 2026

Greptile Summary

This PR fixes a file-descriptor leak in GenerateDataFromFile where the fd was only closed inside generate() on EOF, leaving it open whenever sender.sendall() raised mid-stream. The fix wraps send_binary_data in closing(generator), moves cleanup to a new close() hook on DataGenerator, and also replaces os.stat(file_name) with os.fstat(self.file.fileno()) to eliminate a TOCTOU window.

  • DataGenerator gains an optional, no-op close() override; GenerateDataFromFile implements it as self.file.close(), which is idempotent and safe for repeated calls.
  • Three new unit tests verify the fd is closed on success, on a sender exception, and on a body-size mismatch — covering all exit paths of send_binary_data.

Confidence Score: 5/5

Safe to merge — targeted resource-cleanup fix with no behavioural side effects on the happy path.

The refactor only adds a guaranteed-close guarantee via closing(generator) and replaces the in-generate() close with a dedicated close() hook; all existing send/receive logic is preserved. The three new tests exercise every exit path of send_binary_data and confirm the fd is closed in each case.

No files require special attention.

Important Files Changed

Filename Overview
nvflare/fuel/hci/binary_proto.py Adds close() hook to DataGenerator base class, wraps send_binary_data body in closing(generator), and refactors GenerateDataFromFile to close the file via close() instead of inside generate(); also swaps os.stat for os.fstat to avoid a TOCTOU race.
tests/unit_test/fuel/hci/binary_proto_test.py Adds three targeted tests covering fd-close behaviour on success, on mid-stream sender failure, and on body-size mismatch; uses FailingSender helper to inject failures at a configurable call index.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant send_binary_data
    participant closing ctx
    participant GenerateDataFromFile
    participant Sender

    Caller->>send_binary_data: send_binary_data(sender, generator, meta)
    send_binary_data->>closing ctx: __enter__(generator)
    send_binary_data->>GenerateDataFromFile: data_size()
    send_binary_data->>Sender: sendall(BINARY_MARKER)
    send_binary_data->>Sender: sendall(header_bytes)
    send_binary_data->>Sender: sendall(meta_bytes)
    loop until generate() returns empty
        send_binary_data->>GenerateDataFromFile: generate()
        GenerateDataFromFile-->>send_binary_data: chunk
        send_binary_data->>Sender: sendall(chunk)
        note over Sender: raises IOError on failure
    end
    send_binary_data->>Sender: sendall(footer_bytes)
    closing ctx->>GenerateDataFromFile: close() [always, even on exception]
    GenerateDataFromFile->>GenerateDataFromFile: self.file.close()
    send_binary_data-->>Caller: sent_body_size
Loading

Reviews (4): Last reviewed commit: "Merge branch 'main' into fix/binary-prot..." | Re-trigger Greptile

@pcnudde
Copy link
Copy Markdown
Collaborator

pcnudde commented May 12, 2026

if commits are not signed we can not merge them.

Signed-off-by: cyy <cyyever@outlook.com>
@cyyever cyyever force-pushed the fix/binary-proto-fd-leak branch from 16e0c12 to e3ff894 Compare May 14, 2026 00:38
@cyyever
Copy link
Copy Markdown
Contributor Author

cyyever commented May 14, 2026

@pcnudde Signed it.

@chesterxgchen chesterxgchen requested a review from nvidianz May 19, 2026 20:33
@chesterxgchen
Copy link
Copy Markdown
Collaborator

  1. One observation: test_fd_closed_on_body_size_mismatch mutates gen.size = 999_999 after construction to
    fake a mismatch. This works and is acceptable for this kind of unit test, but it's relying on
    implementation internals. A comment noting why would help future readers (# force size mismatch to
    exercise the post-loop RuntimeError path).

The suggestion is a one-line comment in test_fd_closed_on_body_size_mismatch to explain the gen.size
mutation.

  1. nvflare/fuel/hci/binary_proto.py:394: with closing(generator) makes send_binary_data() require every generator object to implement close(). Before this PR, the runtime contract was effectively duck-
    typed: anything with data_size() and generate() worked. A custom generator that does not subclass DataGenerator can now successfully send the full payload, then fail on exit with AttributeError: ... has no attribute 'close'. To preserve compatibility, use a try/finally and call close() only when present:

    close = getattr(generator, "close", None)
    try:
    ...
    finally:
    if close:
    close()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants