fix(client): prevent resource exhaustion from long-lived DNS sessions#2724
fix(client): prevent resource exhaustion from long-lived DNS sessions#2724
Conversation
There was a problem hiding this comment.
Pull request overview
This PR fixes transport session buildup caused by DNS-over-UDP interception by ensuring DNS-only traffic does not keep long-lived underlying transport sessions open, preventing resource exhaustion and VPN connectivity failures (see #2679).
Changes:
- Forward mode: close the underlying UDP transport session immediately after the first DNS response is delivered.
- Truncate mode: lazily create the base transport session only when the first non-DNS packet is sent (DNS-only flows never create it).
- Add targeted regression tests and package-level documentation explaining the DNS interception model.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
client/go/outline/dnsintercept/forward.go |
Closes the underlying session after first resolver response (adds sync.Once-guarded close). |
client/go/outline/dnsintercept/forward_test.go |
Adds assertions/tests for early-close behavior and non-DNS response behavior. |
client/go/outline/dnsintercept/truncate.go |
Defers base session creation until first non-DNS packet; adds locking to manage lazy init. |
client/go/outline/dnsintercept/truncate_test.go |
Adds DNS-only regression test ensuring no base session allocation occurs. |
client/go/outline/dnsintercept/README.md |
New documentation describing abstractions, modes, and dynamic switching. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
There was a problem hiding this comment.
Pull request overview
This PR addresses transport session accumulation caused by UDP DNS interception by ensuring DNS-related sessions don’t stay open longer than necessary, reducing the risk of resource exhaustion under sustained DNS load in the Outline client.
Changes:
- Forward mode: close the underlying transport session immediately after delivering the first DNS response.
- Truncate mode: lazily create the base transport session only when the first non-DNS packet is observed (DNS-only flows never allocate it).
- Add/extend tests and add a package README documenting the DNS interception architecture and modes.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| client/go/outline/dnsintercept/forward.go | Adds early-close logic for forwarded DNS responses using synchronization to avoid multiple close attempts from the resp path. |
| client/go/outline/dnsintercept/forward_test.go | Adds assertions/tests for early-close behavior and ensuring non-DNS responses don’t trigger early close. |
| client/go/outline/dnsintercept/truncate.go | Defers base session creation until first non-DNS packet; DNS-only sessions stay local. |
| client/go/outline/dnsintercept/truncate_test.go | Adds a DNS-only test ensuring no base session is created and Close remains safe. |
| client/go/outline/dnsintercept/README.md | Documents how interception works, including forward/truncate modes and switching behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
jyyi1
left a comment
There was a problem hiding this comment.
The code LGTM, with a small performance related comment.
| return req.trunc.WriteTo(p, destination) | ||
| } | ||
| return req.PacketRequestSender.WriteTo(p, destination) | ||
| req.mu.Lock() |
There was a problem hiding this comment.
Will the lock cause any performance degradation? Probably use a double-lock pattern?
9496877 to
2baea52
Compare
|
rebased off of master to pull in #2735 |
…ssions DNS is one-shot (one query, one response), but each forwarded DNS query was holding a transport session open until the 30-second write-idle timeout. Under sustained DNS load this causes session accumulation and eventual resource exhaustion, breaking VPN connectivity. Two fixes: - forward: close the transport session immediately after delivering the first DNS response, instead of waiting for the idle timeout. - truncate: create the base transport session lazily, only on the first non-DNS packet. DNS-only flows never open a base session at all. Fixes #2679 Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Includes explanations of the PacketProxy/PacketRequestSender/ PacketResponseReceiver abstractions, forward and truncate modes, dynamic switching, and Mermaid diagrams for each. Also adds field comments to forwardPacketRespReceiver and truncatePacketReqSender per code review feedback. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Use <br/> for line breaks (not \n) and replace subgraph with a hexagon node to avoid label overlap. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
- forward: guard sender field with a mutex to satisfy the Go memory model. The reader goroutine started by base.NewSession is launched before sender is assigned, so without synchronization the write is not guaranteed to be visible to WriteFrom. - truncate: use errors.Join to propagate errors from both base.Close and trunc.Close instead of silently dropping the trunc error. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
If a DNS response arrives before NewSession sets wrapper.sender, once.Do would fire with sender == nil, skip the close, and never run again — leaking the session until the idle timeout. Fix: record pendingClose = true inside once.Do when sender is nil. NewSession checks this flag immediately after setting sender and closes it if needed. The mutex guarantees the two critical sections are ordered: either sender is visible to once.Do, or pendingClose is visible to NewSession. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
WriteFrom can only fire after the caller has sent a packet via WriteTo, which requires having the PacketRequestSender, which is only returned after NewSession sets sender. So sender is always non-nil when WriteFrom runs, and pendingClose is unnecessary. The mutex is still required: without it, the Go memory model does not guarantee the reader goroutine (started inside base.NewSession) sees the write to sender, since the goroutine start happens before the assignment. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
2baea52 to
3746475
Compare
|
rebased again to get #2738 |
There was a problem hiding this comment.
Pull request overview
This PR updates the Outline client’s DNS interception layer to avoid transport session buildup under sustained DNS load by (1) closing forwarded DNS UDP sessions immediately after the first response, and (2) avoiding creation of a base transport session entirely for DNS-only truncate-mode flows.
Changes:
- Forward mode: close the underlying UDP transport session right after delivering the first DNS response.
- Truncate mode: lazily create the base UDP session only when the first non-DNS packet is sent.
- Rename DNS intercept wrappers to
NewDNSRedirect*/NewDNSTruncate*and update call sites + tests.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| client/go/outline/dnsintercept/forward.go | Renames redirect wrappers and adds “close-after-first-DNS-response” behavior to prevent long-lived sessions. |
| client/go/outline/dnsintercept/forward_test.go | Updates tests for new APIs and adds assertions around early-close behavior. |
| client/go/outline/dnsintercept/truncate.go | Renames truncate wrapper and defers base session creation until non-DNS traffic appears. |
| client/go/outline/dnsintercept/truncate_test.go | Adds DNS-only test to ensure no base session is created in truncate mode. |
| client/go/outline/dnsintercept/README.md | New documentation describing DNS intercept behavior and modes. |
| client/go/outline/configregistry/outline_dns_intercept.go | Switches wiring to the new DNS intercept constructors and documents forward vs truncate behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| func (resp *dnsRedirectPacketRespReceiver) WriteFrom(p []byte, source net.Addr) (int, error) { | ||
| if addr, ok := source.(*net.UDPAddr); ok && isEquivalentAddrPort(addr.AddrPort(), resp.fpp.resolverRemoteAddr) { | ||
| source = net.UDPAddrFromAddrPort(resp.fpp.resolverLinkLocalAddr) | ||
| n, err := resp.PacketResponseReceiver.WriteFrom(p, source) | ||
| resp.once.Do(func() { | ||
| resp.mu.Lock() | ||
| s := resp.sender | ||
| resp.mu.Unlock() | ||
| s.Close() | ||
| }) |
There was a problem hiding this comment.
In dnsRedirectPacketRespReceiver.WriteFrom, resp.sender can still be nil (e.g., if the underlying PacketProxy starts delivering packets before NewSession assigns wrapper.sender). In that case s.Close() will panic. Consider ensuring sender is set before any callbacks can run (or make the close path tolerate nil / defer closing until sender becomes available).
| func (req *dnsTruncatePacketReqSender) Close() (err error) { | ||
| req.mu.Lock() | ||
| defer req.mu.Unlock() | ||
| if req.base != nil { | ||
| err = req.base.Close() | ||
| } | ||
| return errors.Join(err, req.trunc.Close()) |
There was a problem hiding this comment.
dnsTruncatePacketReqSender.Close holds req.mu while calling into req.base.Close() and req.trunc.Close(). Calling external Close methods under a mutex can block other callers and risks lock-order issues if those Close implementations call back into this object. Prefer copying the senders to locals, clearing req.base under the lock, then unlocking before invoking Close().
| func (req *dnsTruncatePacketReqSender) Close() (err error) { | |
| req.mu.Lock() | |
| defer req.mu.Unlock() | |
| if req.base != nil { | |
| err = req.base.Close() | |
| } | |
| return errors.Join(err, req.trunc.Close()) | |
| func (req *dnsTruncatePacketReqSender) Close() error { | |
| req.mu.Lock() | |
| base := req.base | |
| trunc := req.trunc | |
| req.base = nil | |
| req.mu.Unlock() | |
| var err error | |
| if base != nil { | |
| err = base.Close() | |
| } | |
| if trunc != nil { | |
| err = errors.Join(err, trunc.Close()) | |
| } | |
| return err |
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
| base, err := fpp.base.NewSession(&forwardPacketRespReceiver{resp, fpp}) | ||
| func (fpp *dnsRedirectPacketProxy) NewSession(resp network.PacketResponseReceiver) (_ network.PacketRequestSender, err error) { | ||
| wrapper := &dnsRedirectPacketRespReceiver{PacketResponseReceiver: resp, fpp: fpp} | ||
| baseSender, err := fpp.baseProxy.NewSession(wrapper) |
There was a problem hiding this comment.
I'm skeptical of this NewSession. I think we need to do it lazily too, like we are doing for truncate.
But this also made me realize that the layering is probably wrong. We probably want a top level Packet Proxy that dispatches to two PPs: a DNS PP and a "Default" PP. That way we can share the dispatching and NAT logic.
We may want to borrow the logic from the server: https://github.com/OutlineFoundation/tunnel-server/blob/4d09f750827738d21432095a46e455d24e172109/service/udp.go#L438
There was a problem hiding this comment.
I would also like to get this merged in (we can release in beta to test viability on apple platforms) and then do the further refactoring in a follow up.
DNS is one-shot (one query, one response), but each forwarded DNS query was holding a transport session open until the 30-second write-idle timeout. Under sustained DNS load this causes session accumulation and eventual resource exhaustion, breaking VPN connectivity.
See dnsintercept/README.md for the context to help review this PR.
Two fixes:
Should fix #2679