
Fix yproxy error messages #165

Merged
reshke merged 11 commits into master from DecLogLevel
May 8, 2026

@leborchuk (Contributor) commented May 7, 2026

This is generated code.

We see two issues in yproxy logs:

  1. Repeated cat errors together with write unix /tmp/yproxy.sock->@: write: broken pipe
  2. Intermittent upload errors

The generated fixes below address both issues.

Summary of changes across 3 files:


pkg/storage/s3storage.go — Fixes for upload errors

Fix 1: Fail loudly on missing credentials — Added getCredentials() helper that replaces all 8 silent s.credentialMap[bucket] map accesses. It returns an explicit error if the bucket has no credentials configured, and warns when keys are empty (which causes silent fallback to ambient credentials — the root cause of the 403 SignatureDoesNotMatch).
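A minimal sketch of what such a lookup helper could look like. The Credentials and Storage types, their field names, and the error text are assumptions made for illustration; only the getCredentials name and the credentialMap field come from the description above.

```go
package main

import (
	"fmt"
	"log"
)

// Credentials is a hypothetical stand-in for the per-bucket static keys.
type Credentials struct {
	AccessKeyID     string
	SecretAccessKey string
}

// Storage is a hypothetical stand-in for the S3 storage handler.
type Storage struct {
	credentialMap map[string]Credentials
}

// getCredentials returns an explicit error for an unconfigured bucket
// instead of the zero value that a bare map access would yield.
func (s *Storage) getCredentials(bucket string) (Credentials, error) {
	creds, ok := s.credentialMap[bucket]
	if !ok {
		return Credentials{}, fmt.Errorf("no credentials configured for bucket %q", bucket)
	}
	if creds.AccessKeyID == "" || creds.SecretAccessKey == "" {
		// Empty keys are not treated as fatal in this sketch; they are
		// only surfaced in the log so the fallback is no longer silent.
		log.Printf("warning: empty access key or secret for bucket %q", bucket)
	}
	return creds, nil
}

func main() {
	s := &Storage{credentialMap: map[string]Credentials{
		"backups": {AccessKeyID: "example-key", SecretAccessKey: "example-secret"},
	}}
	if _, err := s.getCredentials("missing"); err != nil {
		fmt.Println("error:", err)
	}
}
```

The two-value map lookup is what distinguishes "bucket not configured" from "bucket configured with empty keys"; a bare s.credentialMap[bucket] access cannot tell these apart.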

Fix 2: Retry on NoSuchUpload — Added isNoSuchUploadError() helper and a retry loop (up to 3 attempts) in PutFileToDest() that detects NoSuchUpload (404) and restarts the entire multipart upload. This addresses the case where the S3 provider aborts the upload ID server-side.

Fix 7: Rate-limit S3 listings to no more than 2 requests per second.

Bonus: Fixed PatchFile() return value — was returning nil on session error instead of the error.


pkg/proc/yio/yrreader.go — Fixes for cat errors

Fix 3: Downgraded retry log — Changed YRestartReader.Restart() from Error to Warn with a clearer message: "cat object: restarting read from offset after transient error".


pkg/proc/interaction.go — Fixes for cat errors + decrypt offset bug

Fix 4: Broken pipe detection — In ProcessCatExtended(), EPIPE/ErrClosedPipe errors (client disconnected) are now logged at Warn instead of Error.


Copilot AI review requested due to automatic review settings May 7, 2026 10:54
@leborchuk (Contributor, Author)

Root Cause Analysis

Issue 1: "cat object with offset after possible error"

Trace through the code:

  1. NewYRetryReader creates a YproxyRetryReader with needReacquire=true, so the first Read() call triggers Restart(0) — a fresh S3 range GET.
  2. Mid-stream, the S3 TCP connection is reset (connection reset by peer / unexpected EOF). YproxyRetryReader.Read catches the error, records offsetReached, sets needReacquire=true, and loops.
  3. On the next iteration, YRestartReader.Restart(offsetReached) is called — this logs "cat object with offset after possible error" at ERROR level and re-issues a range GET from the byte offset. This is working as designed — the retries are succeeding.
  4. The terminal error is "write unix /tmp/yproxy.sock->@: write: broken pipe" in ProcessCatExtended — the downstream client (PostgreSQL/wal-g) closed its end of the unix socket before the cat finished.

Most likely root causes:

  • Primary: The client disconnected (query cancellation, statement timeout, or client crash). The broken pipe on the unix socket is the actual failure; the S3 retries are noise from the retry mechanism working correctly.
  • Secondary (latent bug): When decrypt=true AND startOffset != 0, io.CopyN(io.Discard, contentReader, int64(startOffset)) operates on decrypted bytes, but YproxyRetryReader.offsetReached tracks encrypted bytes (since the retry reader is below the decryptor). If a read error occurs during the discard phase, the restart will seek to the wrong encrypted offset, corrupting the stream.

Misleading log: The "cat object with offset after possible error" message fires at ERROR level even on the very first retry (retry count=0), making it look like a fatal error when it is actually a normal retry. It should be WARN or INFO.


Issue 2: "failed to upload" — Two distinct sub-cases

Sub-case A — SignatureDoesNotMatch (HTTP 403, 2026-05-05):

In S3SessionPool.createSession, credentials are taken from config.StorageCredentials (static key/secret). The 403 means the HMAC-SHA256 signature computed by the SDK does not match what the S3 endpoint computed. Two causes:

  • Most likely: Expired/rotated credentials. The access_key_id or secret_access_key in the yproxy config no longer matches what the S3 endpoint expects. Requires ops action: rotate credentials and update config.
  • Also possible: Clock skew — AWS Signature V4 embeds a timestamp; if the yproxy host clock drifts >15 min from the S3 server, every request will 403.

Sub-case B — NoSuchUpload (HTTP 404, 2026-03-15):

In PutFileToDest, up.Upload(...) performs a multipart upload. The upload ID issued by CreateMultipartUpload no longer exists on the S3 side when the parts are being committed. Cause: the S3 provider aborted the multipart upload due to inactivity timeout or an explicit abort by another process. There is no retry logic for this case — the entire upload fails.

Cascade errors (io: read/write on closed pipe) in both sub-cases are consequences, not causes: when PutFileToDest returns an error, the io.Pipe reader r is closed, and the writer goroutine in ProcessPutExtended gets the closed-pipe error on its next Write.


Proposed Fixes

| # | Issue | Fix |
|---|-------|-----|
| 1 | S3 TCP resets logged as ERROR on every retry | In YRestartReader.Restart, change the log level from Error to Warn when offsetStart > 0 (it is a retry, not a fatal error). |
| 2 | Client disconnect causes noisy error | In ProcessCatExtended, check if the io.Copy error is broken pipe / io.ErrClosedPipe and log at WARN instead of ERROR: the client disconnected, yproxy did nothing wrong. |
| 3 | Offset tracking bug (decrypt + startOffset) | When decrypt=true and startOffset != 0, the discard must happen on the raw (pre-decrypt) reader, not the decrypted reader, so offsetReached stays in sync with the restart offset. Or: track a separate decrypted-byte counter and do not use the retry reader's offset for restart in this path. |
| 4 | 403 SignatureDoesNotMatch | Ops fix: verify and rotate access_key_id/secret_access_key in the yproxy config. Check clock sync (chronyc tracking or timedatectl) on the yproxy host. |
| 5 | 404 NoSuchUpload | Add retry logic in PutFileToDest: catch NoSuchUpload from up.Upload and restart the entire multipart upload (re-create upload ID and re-upload all parts). Also increase the multipart upload TTL on the S3 provider side. |
| 6 | Semaphore not covering actual I/O | In S3SessionPool.GetSession, the semaphore is released immediately after session creation via defer, so it does not limit concurrent transfers at all. The semaphore acquire/release should wrap the actual GetObject/Upload call at the call site, not inside GetSession. |
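For item 6, a sketch of a semaphore that is held for the whole transfer rather than only during session creation. The transferLimiter type and maxConcurrent harness are hypothetical; the sleep stands in for a GetObject or Upload call.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// transferLimiter bounds concurrent transfers with a buffered channel.
// Hypothetical type, not S3SessionPool itself.
type transferLimiter struct{ slots chan struct{} }

func newTransferLimiter(n int) *transferLimiter {
	return &transferLimiter{slots: make(chan struct{}, n)}
}

// Do holds a slot for the full duration of transfer, not just for setup.
func (l *transferLimiter) Do(transfer func() error) error {
	l.slots <- struct{}{}
	defer func() { <-l.slots }()
	return transfer()
}

// maxConcurrent runs jobs transfers through a limiter of size limit and
// returns the highest number observed in flight at once.
func maxConcurrent(limit, jobs int) int32 {
	lim := newTransferLimiter(limit)
	var inFlight, peak int32
	var wg sync.WaitGroup
	for i := 0; i < jobs; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = lim.Do(func() error {
				cur := atomic.AddInt32(&inFlight, 1)
				for {
					old := atomic.LoadInt32(&peak)
					if cur <= old || atomic.CompareAndSwapInt32(&peak, old, cur) {
						break
					}
				}
				time.Sleep(10 * time.Millisecond) // stands in for the actual I/O
				atomic.AddInt32(&inFlight, -1)
				return nil
			})
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&peak)
}

func main() {
	fmt.Println("peak concurrent transfers:", maxConcurrent(2, 8))
}
```

The deferred release sits inside Do, so it runs after transfer returns; placing the same defer inside a function that only creates the session is what lets the limit lapse before any bytes move.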

Diagnosis confirmation needed before implementing fixes 3, 5, 6: fixes 1, 2, 4 are straightforward; fixes 3, 5, 6 require code changes. Please confirm which issues to address in code.

Copilot AI left a comment

Pull request overview

This PR updates yproxy’s S3 upload and “cat” read paths to produce clearer, more actionable errors and to adjust retry behavior—especially when decryption is enabled (where mid-stream restarts are unsafe).

Changes:

  • Added credential lookup helper to fail fast when bucket credentials are missing and warn on empty keys.
  • Added multipart-upload retry logic for NoSuchUpload and fixed an error-return bug in PatchFile.
  • Split ProcessCatExtended into decrypt vs non-decrypt paths, adding full-pipeline retries for decrypt and downgrading/disambiguating certain log severities/messages.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
|------|-------------|
| pkg/storage/s3storage.go | Adds credential validation and multipart upload retry handling; fixes an incorrect nil error return in PatchFile. |
| pkg/proc/yio/yrreader.go | Adjusts restart log severity/message and introduces a “no-retry” retry-reader constructor for decrypt use-cases. |
| pkg/proc/interaction.go | Refactors cat handling into decrypt/non-decrypt flows; adds broken-pipe detection and full-pipeline retry loop for decrypt. |
Comments suppressed due to low confidence (1)

pkg/proc/interaction.go:152

  • Same uint64-to-int64 overflow risk for startOffset when calling io.CopyN in the decrypted path. Validate that startOffset fits into int64 before casting and return an error if it does not.
	if startOffset != 0 {
		if _, err := io.CopyN(io.Discard, contentReader, int64(startOffset)); err != nil {
			return err
		}

Copilot AI commented May 7, 2026

Root Cause Analysis

Issue 1: "cat object with offset after possible error"

...

Confirmed and implemented the code-side upload retry correction in 09e3ce1 (issue #5 path): retries now require a seekable source and rewind before retry; non-seekable streams are not retried on NoSuchUpload.


github-actions Bot commented May 8, 2026

@check-spelling-bot Report

🔴 Please review

See the 📂 files view, the 📜action log, 👼 SARIF report, or 📝 job summary for details.

Unrecognized words (28)
alloc
Burstable
CGO
COPYV
cpath
EPIPE
fds
gogc
gomaxprocs
gomemlimit
goroutines
inuse
LISTV
mallocs
mcache
memstats
mspan
mws
promauto
promhttp
PUTV
qfkbl
sched
setport
Sourced
TSTo
Vec
ydt
These words are not needed and should be removed: persisnt

To accept these unrecognized words as correct and remove the previously acknowledged and now absent words, you could run the following commands

... in a clone of the git@github.com:open-gpdb/yproxy.git repository
on the DecLogLevel branch (ℹ️ how do I use this?):

curl -s -S -L 'https://raw.githubusercontent.com/check-spelling/check-spelling/main/apply.pl' |
perl - 'https://github.com/open-gpdb/yproxy/actions/runs/25549190293/attempts/1' &&
git commit -m 'Update check-spelling metadata'

OR

To have the bot accept them for you, comment in the PR quoting the following line:
@check-spelling-bot apply updates.

Available 📚 dictionaries could cover words (expected and unrecognized) not in the 📘 dictionary

This includes both expected items (238) from .github/actions/spelling/expect.txt and unrecognized words (28)

| Dictionary | Entries | Covers | Uniquely |
|------------|---------|--------|----------|
| cspell:golang/dict/go.txt | 2099 | 15 | 5 |
| cspell:node/dict/node.txt | 891 | 15 | 2 |
| cspell:php/dict/php.txt | 1689 | 14 | 2 |
| cspell:python/src/python/python-lib.txt | 2417 | 14 | 2 |
| cspell:dotnet/dict/dotnet.txt | 405 | 7 | |
| cspell:filetypes/filetypes.txt | 264 | 5 | 1 |
| cspell:fullstack/dict/fullstack.txt | 419 | 5 | 1 |
| cspell:java/src/java.txt | 2464 | 5 | 2 |
| cspell:python/src/python/python.txt | 392 | 5 | |
| cspell:typescript/dict/typescript.txt | 1098 | 4 | 1 |
| cspell:k8s/dict/k8s.txt | 153 | 4 | |
| cspell:django/dict/django.txt | 393 | 3 | 2 |
| cspell:aws/aws.txt | 218 | 3 | 1 |
| cspell:cpp/src/stdlib-c.txt | 278 | 3 | 1 |
| cspell:rust/dict/rust.txt | 30 | 3 | |
| cspell:scala/dict/scala.txt | 153 | 3 | |
| cspell:cpp/src/stdlib-cpp.txt | 252 | 3 | |
| cspell:npm/dict/npm.txt | 302 | 3 | |
| cspell:r/src/r.txt | 543 | 3 | |
| cspell:lua/dict/lua.txt | 190 | 2 | 1 |

Consider adding them (in .github/workflows/spelling.yaml) in jobs:/spelling: for uses: check-spelling/check-spelling@main in its with to extra_dictionaries:

            cspell:golang/dict/go.txt
            cspell:node/dict/node.txt
            cspell:php/dict/php.txt
            cspell:python/src/python/python-lib.txt
            cspell:dotnet/dict/dotnet.txt
            cspell:filetypes/filetypes.txt
            cspell:fullstack/dict/fullstack.txt
            cspell:java/src/java.txt
            cspell:python/src/python/python.txt
            cspell:typescript/dict/typescript.txt
            cspell:k8s/dict/k8s.txt
            cspell:django/dict/django.txt
            cspell:aws/aws.txt
            cspell:cpp/src/stdlib-c.txt
            cspell:rust/dict/rust.txt
            cspell:scala/dict/scala.txt
            cspell:cpp/src/stdlib-cpp.txt
            cspell:npm/dict/npm.txt
            cspell:r/src/r.txt
            cspell:lua/dict/lua.txt

To stop checking additional dictionaries, add (in .github/workflows/spelling.yaml) for uses: check-spelling/check-spelling@main in its with:

check_extra_dictionaries: ""
Pattern suggestions ✂️ (9)

You could add these patterns to .github/actions/spelling/patterns.txt:

# Automatically suggested patterns

# hit-count: 13 file-count: 8
# https/http/file urls
(?:\b(?:https?|ftp|file)://)[-A-Za-z0-9+&@#/*%?=~_|!:,.;]+[-A-Za-z0-9+&@#/*%=~_|]

# hit-count: 8 file-count: 2
# version suffix <word>v#
(?:(?<=[A-Z]{2})V|(?<=[a-z]{2}|[A-Z]{2})v)\d+(?:\b|(?=[a-zA-Z_]))

# hit-count: 4 file-count: 4
# uuencoded
[!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_]{40,}

# hit-count: 4 file-count: 2
# C network byte conversions
(?:\d|\bh)to(?!ken)(?=[a-z])|to(?=[adhiklpun]\()

# hit-count: 4 file-count: 1
# container images
image: [-\w./:@]+

# hit-count: 1 file-count: 1
# scala imports
^import (?:[\w.]|\{\w*?(?:,\s*(?:\w*|\*))+\})+

# hit-count: 1 file-count: 1
# Debian changelog severity
[-\w]+ \(.*\) (?:\w+|baseline|unstable|experimental); urgency=(?:low|medium|high|emergency|critical)\b

# hit-count: 1 file-count: 1
# Compiler flags (Unix, Java/Scala)
# Use if you have things like `-Pdocker` and want to treat them as `docker`
(?:^|[\t ,>"'`=(])-(?:(?:J-|)[DPWXY]|[Llf])(?=[A-Z]{2,}|[A-Z][a-z]|[a-z]{2,})

# hit-count: 1 file-count: 1
# Compiler flags (Windows / PowerShell)
# This is a subset of the more general compiler flags pattern.
# It avoids matching `-Path` to prevent it from being treated as `ath`
(?:^|[\t ,"'`=(])-(?:[DPL](?=[A-Z]{2,})|[WXYlf](?=[A-Z]{2,}|[A-Z][a-z]|[a-z]{2,}))

Alternatively, if a pattern suggestion doesn't make sense for this project, add a # to the beginning of the line in the candidates file with the pattern to stop suggesting it.

Notices ℹ️ (2)

See the 📂 files view, the 📜action log, 👼 SARIF report, or 📝 job summary for details.

| ℹ️ Notices | Count |
|------------|-------|
| ℹ️ candidate-pattern | 15 |
| ℹ️ unused-config-file | 1 |

See ℹ️ Event descriptions for more information.

If the flagged items are 🤯 false positives

If items relate to a ...

  • binary file (or some other file you wouldn't want to check at all).

    Please add a file path to the excludes.txt file matching the containing file.

    File paths are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your files.

    ^ refers to the file's path from the root of the repository, so ^README\.md$ would exclude README.md (on whichever branch you're using).

  • well-formed pattern.

    If you can write a pattern that would match it,
    try adding it to the patterns.txt file.

    Patterns are Perl 5 Regular Expressions - you can test yours before committing to verify it will match your lines.

    Note that patterns can't match multiline strings.

🚂 If you're seeing this message and your PR is from a branch that doesn't have check-spelling,
please merge to your PR's base branch to get the version configured for your repository.

@reshke merged commit 5758f9e into master on May 8, 2026
9 checks passed
5 participants