fix(pam): fix SCP returning exit code 1 despite successful transfer #170

Open
devin-ai-integration[bot] wants to merge 2 commits into main from devin/1775171234-fix-scp-proxy-exit-code

@devin-ai-integration devin-ai-integration bot commented Apr 2, 2026

Description 📣

Fixes an issue where SCP (and potentially SFTP/rsync) through the PAM SSH proxy returns exit code 1 even when the file transfer succeeds.

Root cause: In handleChannel, the proxy only waited for one direction of the bidirectional data copy to finish before starting channel teardown. For SCP uploads, the client→server copy finishes first (all file data sent), but the remote scp process hasn't exited yet — it still needs to send the exit-status channel request. The old code then waited only 500ms for serverReqDone before closing both channels. If the exit-status hadn't been forwarded yet, the client never received it and defaulted to exit code 1.

Fix (3 parts):

  1. Wait for both data copy directions to complete before teardown, so the server's data EOF (which follows exit-status) is observed.
  2. Call serverChannel.CloseWrite() after the client→server copy finishes, signaling EOF to the remote process so it can exit and deliver exit-status.
  3. Increase the server-requests safety timeout from 500ms → 3s (should rarely fire now).
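
The half-close step can be illustrated with a minimal, standard-library-only sketch. It does not use the real SSH channel types: fakeRemote, upload, and the io.Pipe plumbing are hypothetical stand-ins, with closing the pipe writer playing the role of serverChannel.CloseWrite() and the timeout playing the role of the serverReqDone fallback. It assumes the remote process only reports its exit status after it has seen EOF on stdin.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"time"
)

// fakeRemote stands in for the remote scp process (hypothetical): it reads
// stdin until EOF and only then reports its exit status.
func fakeRemote(stdin io.Reader, exitStatus chan<- int) {
	_, _ = io.Copy(io.Discard, stdin)
	exitStatus <- 0
}

// upload copies payload to the fake remote and waits up to timeout for the
// exit status. With halfClose set, the write side is closed right after the
// copy (the analogue of CloseWrite); otherwise it stays open until teardown.
// It returns the exit code a client would see, defaulting to 1.
func upload(payload string, halfClose bool, timeout time.Duration) int {
	stdinR, stdinW := io.Pipe()
	exitStatus := make(chan int, 1) // buffered so fakeRemote never blocks on send
	go fakeRemote(stdinR, exitStatus)

	_, _ = io.Copy(stdinW, strings.NewReader(payload))
	if halfClose {
		_ = stdinW.Close() // EOF reaches the remote, which can now exit
	}
	defer stdinW.Close() // full teardown; a second Close on an io.Pipe writer is harmless

	select {
	case code := <-exitStatus:
		return code
	case <-time.After(timeout):
		return 1 // exit status did not arrive before teardown
	}
}

func main() {
	fmt.Println("with half-close:", upload("file data", true, time.Second))
	fmt.Println("without half-close:", upload("file data", false, 50*time.Millisecond))
}
```

In this sketch the first call returns 0 because the remote sees EOF and reports its status, while the second call times out and falls back to 1, which mirrors the symptom described in the root cause.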

Updates since last revision:

  • CloseWrite() error is now logged at Debug level instead of being silently dropped.
  • Added a cancelled flag to avoid duplicate "Channel cancelled by context" log entries when ctx fires on both loop iterations.

Type ✨

  • Bug fix
  • New feature
  • Improvement
  • Breaking change
  • Documentation

Tests 🛠️

  • Verified the project builds cleanly (go build ./...)
  • Needs manual testing: Run infisical pam ssh proxy and perform SCP through the proxy; verify exit code is 0 on success.

Human review checklist

  • Does CloseWrite() on the server channel have any adverse effect on interactive SSH sessions or SFTP subsystems? (It's called after proxyData returns, meaning the client already sent EOF, so it should be safe — but worth verifying.)
  • Verify that waiting for both data directions doesn't cause hangs for long-lived interactive SSH sessions (the loop should still exit promptly when the client disconnects).
  • Is 3s an appropriate fallback timeout for serverReqDone?

Link to Devin session: https://app.devin.ai/sessions/e7ba74ee0e7546a6a6855a7b5eed6de1


Previously, handleChannel only waited for one direction of the data
copy to finish before starting channel teardown. For SCP operations,
the client→server copy finishes first (file data sent), but the server
has not yet delivered exit-status. The 500ms timeout would fire,
channels were closed, and exit-status was lost — causing SCP to report
exit code 1 despite success.

Changes:
- Wait for both data copy directions to complete before teardown
- Call CloseWrite on server channel after client→server copy finishes
  so the remote SCP process receives EOF and can exit cleanly
- Increase the server-requests timeout from 500ms to 3s as a safety net

Co-Authored-By: Andrey Lyubavin <andrey@infisical.com>

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

greptile-apps bot commented Apr 2, 2026

Greptile Summary

This PR fixes a long-standing SCP exit-code-1 bug with three changes. Channel teardown now waits for both data-copy directions instead of just one. serverChannel.CloseWrite() is called after the client→server copy so the remote scp process receives EOF and can send its exit-status. The exit-status forwarding timeout is raised from 500 ms to 3 s. The root-cause analysis is accurate and the three-part fix is correct.

Confidence Score: 5/5

Safe to merge — the fix is logically sound and all remaining findings are minor P2 style suggestions.

No P0 or P1 issues found. The two comments cover a silently-dropped CloseWrite error (easy to add logging) and a duplicate log line on context cancellation. Neither affects correctness or reliability of the fix.

No files require special attention beyond the two P2 nits in packages/pam/handlers/ssh/proxy.go.

Important Files Changed

Filename: packages/pam/handlers/ssh/proxy.go
Overview: Fixes premature SCP teardown by waiting for both data-copy directions, calling CloseWrite to signal EOF to the remote process, and increasing the exit-status timeout from 500 ms to 3 s. Two minor P2 style issues: CloseWrite error silently dropped, and duplicate log message on context cancellation.

Reviews (1): Last reviewed commit: "fix(pam): wait for both data directions ..."

err := p.proxyData(clientChannel, serverChannel, "client→server", sessionID, true, chState)
// Signal the server that the client is done writing so the remote process
// receives EOF and can exit, which triggers exit-status delivery.
serverChannel.CloseWrite()

P2 CloseWrite() error silently dropped

The error returned by serverChannel.CloseWrite() is discarded. If the channel is already half-closed or broken when this fires, the failure will be invisible in logs, making it harder to diagnose future issues.

Suggested change
-serverChannel.CloseWrite()
+if err := serverChannel.CloseWrite(); err != nil {
+    log.Debug().Err(err).Str("sessionID", sessionID).Msg("Failed to CloseWrite on server channel")
+}

Contributor Author

Good catch — added error logging for CloseWrite() in c619d69.

Comment on lines +267 to 276
+    for i := 0; i < 2; i++ {
+        select {
+        case err := <-errChan:
+            if err != nil && err != io.EOF {
+                log.Debug().Err(err).Str("sessionID", sessionID).Msg("Channel proxy error")
+            }
+        case <-ctx.Done():
+            log.Info().Str("sessionID", sessionID).Msg("Channel cancelled by context")
+        }
-    case <-ctx.Done():
-        log.Info().Str("sessionID", sessionID).Msg("Channel cancelled by context")
     }

P2 Duplicate log entry on context cancellation

If ctx is already cancelled when the loop starts (or is cancelled during the first iteration), ctx.Done() fires on both iterations and "Channel cancelled by context" is logged twice. Since errChan has capacity 2 the goroutines won't leak, but the duplicate Info-level log line can be confusing. Consider tracking whether cancellation was already observed:

Suggested change
-for i := 0; i < 2; i++ {
-    select {
-    case err := <-errChan:
-        if err != nil && err != io.EOF {
-            log.Debug().Err(err).Str("sessionID", sessionID).Msg("Channel proxy error")
-        }
-    case <-ctx.Done():
-        log.Info().Str("sessionID", sessionID).Msg("Channel cancelled by context")
-    }
-}
+cancelled := false
+for i := 0; i < 2; i++ {
+    select {
+    case err := <-errChan:
+        if err != nil && err != io.EOF {
+            log.Debug().Err(err).Str("sessionID", sessionID).Msg("Channel proxy error")
+        }
+    case <-ctx.Done():
+        if !cancelled {
+            log.Info().Str("sessionID", sessionID).Msg("Channel cancelled by context")
+            cancelled = true
+        }
+    }
+}

Contributor Author


Good point — added a cancelled flag to deduplicate the log in c619d69.
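
The effect of the cancelled flag can be checked in isolation with a small standard-library sketch. This is not the code in proxy.go: waitForCopies and cancelLogCount are hypothetical names, log lines are collected in a slice instead of going through the project's logger, and errors.Is is used where the snippet above compares with != directly.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
)

// waitForCopies waits for two results on errChan, or for ctx cancellation on
// each iteration, and returns the messages it would have logged. With dedupe
// set, the cancellation message is recorded at most once.
func waitForCopies(ctx context.Context, errChan <-chan error, dedupe bool) []string {
	var logs []string
	cancelled := false
	for i := 0; i < 2; i++ {
		select {
		case err := <-errChan:
			if err != nil && !errors.Is(err, io.EOF) {
				logs = append(logs, "Channel proxy error: "+err.Error())
			}
		case <-ctx.Done():
			if !dedupe || !cancelled {
				logs = append(logs, "Channel cancelled by context")
				cancelled = true
			}
		}
	}
	return logs
}

// cancelLogCount runs the loop with an already-cancelled context and an empty
// errChan, so both iterations take the ctx.Done() branch.
func cancelLogCount(dedupe bool) int {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	errChan := make(chan error, 2) // capacity 2, as in the review comment
	return len(waitForCopies(ctx, errChan, dedupe))
}

func main() {
	fmt.Println("without flag:", cancelLogCount(false)) // 2 log lines
	fmt.Println("with flag:", cancelLogCount(true))     // 1 log line
}
```

Because the context is cancelled before the loop starts and nothing is ever sent on errChan, both iterations observe ctx.Done(), which is the case the review comment describes.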

…ancel log

Co-Authored-By: Andrey Lyubavin <andrey@infisical.com>
@devin-ai-integration devin-ai-integration bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 1 additional finding.
