Skip to content

fix(client): prevent resource exhaustion from long-lived DNS sessions#2724

Open
fortuna wants to merge 14 commits intomasterfrom
fix/dns-session-resource-leak
Open

fix(client): prevent resource exhaustion from long-lived DNS sessions#2724
fortuna wants to merge 14 commits intomasterfrom
fix/dns-session-resource-leak

Conversation

@fortuna
Copy link
Copy Markdown
Collaborator

@fortuna fortuna commented Mar 14, 2026

DNS is one-shot (one query, one response), but each forwarded DNS query was holding a transport session open until the 30-second write-idle timeout. Under sustained DNS load this causes session accumulation and eventual resource exhaustion, breaking VPN connectivity.

See dnsintercept/README.md for the context to help review this PR.

Two fixes:

  • forward: close the transport session immediately after delivering the first DNS response, instead of waiting for the idle timeout.
  • truncate: create the base transport session lazily, only on the first non-DNS packet. DNS-only flows never open a base session at all.

Should fix #2679

@fortuna fortuna changed the title fix(dnsintercept): prevent resource exhaustion from long-lived DNS se… fix(client): prevent resource exhaustion from long-lived DNS sessions Mar 14, 2026
@github-actions github-actions bot added size/L and removed size/M labels Mar 16, 2026
@fortuna fortuna requested review from Copilot, jyyi1 and ohnorobo March 16, 2026 03:59
@fortuna fortuna marked this pull request as ready for review March 16, 2026 03:59
@fortuna fortuna requested a review from a team as a code owner March 16, 2026 03:59
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes transport session buildup caused by DNS-over-UDP interception by ensuring DNS-only traffic does not keep long-lived underlying transport sessions open, preventing resource exhaustion and VPN connectivity failures (see #2679).

Changes:

  • Forward mode: close the underlying UDP transport session immediately after the first DNS response is delivered.
  • Truncate mode: lazily create the base transport session only when the first non-DNS packet is sent (DNS-only flows never create it).
  • Add targeted regression tests and package-level documentation explaining the DNS interception model.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
client/go/outline/dnsintercept/forward.go Closes the underlying session after first resolver response (adds sync.Once-guarded close).
client/go/outline/dnsintercept/forward_test.go Adds assertions/tests for early-close behavior and non-DNS response behavior.
client/go/outline/dnsintercept/truncate.go Defers base session creation until first non-DNS packet; adds locking to manage lazy init.
client/go/outline/dnsintercept/truncate_test.go Adds DNS-only regression test ensuring no base session allocation occurs.
client/go/outline/dnsintercept/README.md New documentation describing abstractions, modes, and dynamic switching.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses transport session accumulation caused by UDP DNS interception by ensuring DNS-related sessions don’t stay open longer than necessary, reducing the risk of resource exhaustion under sustained DNS load in the Outline client.

Changes:

  • Forward mode: close the underlying transport session immediately after delivering the first DNS response.
  • Truncate mode: lazily create the base transport session only when the first non-DNS packet is observed (DNS-only flows never allocate it).
  • Add/extend tests and add a package README documenting the DNS interception architecture and modes.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
client/go/outline/dnsintercept/forward.go Adds early-close logic for forwarded DNS responses using synchronization to avoid multiple close attempts from the resp path.
client/go/outline/dnsintercept/forward_test.go Adds assertions/tests for early-close behavior and ensuring non-DNS responses don’t trigger early close.
client/go/outline/dnsintercept/truncate.go Defers base session creation until first non-DNS packet; DNS-only sessions stay local.
client/go/outline/dnsintercept/truncate_test.go Adds a DNS-only test ensuring no base session is created and Close remains safe.
client/go/outline/dnsintercept/README.md Documents how interception works, including forward/truncate modes and switching behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Copy link
Copy Markdown
Contributor

@jyyi1 jyyi1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code LGTM, with a small performance related comment.

return req.trunc.WriteTo(p, destination)
}
return req.PacketRequestSender.WriteTo(p, destination)
req.mu.Lock()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the lock cause any performance degradation? Probably use a double-lock pattern?

@ohnorobo ohnorobo force-pushed the fix/dns-session-resource-leak branch from 9496877 to 2baea52 Compare April 2, 2026 09:34
@ohnorobo
Copy link
Copy Markdown
Contributor

ohnorobo commented Apr 2, 2026

rebased off of master to pull in #2735

fortuna and others added 6 commits April 2, 2026 17:08
…ssions

DNS is one-shot (one query, one response), but each forwarded DNS query
was holding a transport session open until the 30-second write-idle
timeout.  Under sustained DNS load this causes session accumulation and
eventual resource exhaustion, breaking VPN connectivity.

Two fixes:
- forward: close the transport session immediately after delivering the
  first DNS response, instead of waiting for the idle timeout.
- truncate: create the base transport session lazily, only on the first
  non-DNS packet.  DNS-only flows never open a base session at all.

Fixes #2679

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Includes explanations of the PacketProxy/PacketRequestSender/
PacketResponseReceiver abstractions, forward and truncate modes,
dynamic switching, and Mermaid diagrams for each.

Also adds field comments to forwardPacketRespReceiver and
truncatePacketReqSender per code review feedback.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Use <br/> for line breaks (not \n) and replace subgraph with a
hexagon node to avoid label overlap.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
- forward: guard sender field with a mutex to satisfy the Go memory
  model. The reader goroutine started by base.NewSession is launched
  before sender is assigned, so without synchronization the write is
  not guaranteed to be visible to WriteFrom.
- truncate: use errors.Join to propagate errors from both base.Close
  and trunc.Close instead of silently dropping the trunc error.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
If a DNS response arrives before NewSession sets wrapper.sender, once.Do
would fire with sender == nil, skip the close, and never run again —
leaking the session until the idle timeout.

Fix: record pendingClose = true inside once.Do when sender is nil.
NewSession checks this flag immediately after setting sender and closes
it if needed. The mutex guarantees the two critical sections are ordered:
either sender is visible to once.Do, or pendingClose is visible to
NewSession.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
WriteFrom can only fire after the caller has sent a packet via WriteTo,
which requires having the PacketRequestSender, which is only returned
after NewSession sets sender. So sender is always non-nil when WriteFrom
runs, and pendingClose is unnecessary.

The mutex is still required: without it, the Go memory model does not
guarantee the reader goroutine (started inside base.NewSession) sees the
write to sender, since the goroutine start happens before the assignment.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
@ohnorobo ohnorobo force-pushed the fix/dns-session-resource-leak branch from 2baea52 to 3746475 Compare April 2, 2026 15:08
@ohnorobo
Copy link
Copy Markdown
Contributor

ohnorobo commented Apr 2, 2026

rebased again to get #2738

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the Outline client’s DNS interception layer to avoid transport session buildup under sustained DNS load by (1) closing forwarded DNS UDP sessions immediately after the first response, and (2) avoiding creation of a base transport session entirely for DNS-only truncate-mode flows.

Changes:

  • Forward mode: close the underlying UDP transport session right after delivering the first DNS response.
  • Truncate mode: lazily create the base UDP session only when the first non-DNS packet is sent.
  • Rename DNS intercept wrappers to NewDNSRedirect* / NewDNSTruncate* and update call sites + tests.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
client/go/outline/dnsintercept/forward.go Renames redirect wrappers and adds “close-after-first-DNS-response” behavior to prevent long-lived sessions.
client/go/outline/dnsintercept/forward_test.go Updates tests for new APIs and adds assertions around early-close behavior.
client/go/outline/dnsintercept/truncate.go Renames truncate wrapper and defers base session creation until non-DNS traffic appears.
client/go/outline/dnsintercept/truncate_test.go Adds DNS-only test to ensure no base session is created in truncate mode.
client/go/outline/dnsintercept/README.md New documentation describing DNS intercept behavior and modes.
client/go/outline/configregistry/outline_dns_intercept.go Switches wiring to the new DNS intercept constructors and documents forward vs truncate behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +107 to +116
func (resp *dnsRedirectPacketRespReceiver) WriteFrom(p []byte, source net.Addr) (int, error) {
if addr, ok := source.(*net.UDPAddr); ok && isEquivalentAddrPort(addr.AddrPort(), resp.fpp.resolverRemoteAddr) {
source = net.UDPAddrFromAddrPort(resp.fpp.resolverLinkLocalAddr)
n, err := resp.PacketResponseReceiver.WriteFrom(p, source)
resp.once.Do(func() {
resp.mu.Lock()
s := resp.sender
resp.mu.Unlock()
s.Close()
})
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In dnsRedirectPacketRespReceiver.WriteFrom, resp.sender can still be nil (e.g., if the underlying PacketProxy starts delivering packets before NewSession assigns wrapper.sender). In that case s.Close() will panic. Consider ensuring sender is set before any callbacks can run (or make the close path tolerate nil / defer closing until sender becomes available).

Copilot uses AI. Check for mistakes.
Comment on lines +108 to +114
func (req *dnsTruncatePacketReqSender) Close() (err error) {
req.mu.Lock()
defer req.mu.Unlock()
if req.base != nil {
err = req.base.Close()
}
return errors.Join(err, req.trunc.Close())
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dnsTruncatePacketReqSender.Close holds req.mu while calling into req.base.Close() and req.trunc.Close(). Calling external Close methods under a mutex can block other callers and risks lock-order issues if those Close implementations call back into this object. Prefer copying the senders to locals, clearing req.base under the lock, then unlocking before invoking Close().

Suggested change
func (req *dnsTruncatePacketReqSender) Close() (err error) {
req.mu.Lock()
defer req.mu.Unlock()
if req.base != nil {
err = req.base.Close()
}
return errors.Join(err, req.trunc.Close())
func (req *dnsTruncatePacketReqSender) Close() error {
req.mu.Lock()
base := req.base
trunc := req.trunc
req.base = nil
req.mu.Unlock()
var err error
if base != nil {
err = base.Close()
}
if trunc != nil {
err = errors.Join(err, trunc.Close())
}
return err

Copilot uses AI. Check for mistakes.
base, err := fpp.base.NewSession(&forwardPacketRespReceiver{resp, fpp})
func (fpp *dnsRedirectPacketProxy) NewSession(resp network.PacketResponseReceiver) (_ network.PacketRequestSender, err error) {
wrapper := &dnsRedirectPacketRespReceiver{PacketResponseReceiver: resp, fpp: fpp}
baseSender, err := fpp.baseProxy.NewSession(wrapper)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm skeptical of this NewSession. I think we need to do it lazily too, like we are doing for truncate.

But this also made me realize that the layering is probably wrong. We probably want a top level Packet Proxy that dispatches to two PPs: a DNS PP and a "Default" PP. That way we can share the dispatching and NAT logic.

We may want to borrow the logic from the server: https://github.com/OutlineFoundation/tunnel-server/blob/4d09f750827738d21432095a46e455d24e172109/service/udp.go#L438

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm inclined to merge it as is (it's a strict improvement over master), and work on the further fix in a follow up PR.

@ohnorobo @jyyi1 thoughts?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also like to get this merged in (we can release in beta to test viability on apple platforms) and then do the further refactoring in a follow up.

@fortuna fortuna requested a review from jyyi1 April 2, 2026 20:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

V 1.19 brake connections

4 participants