From 397fcbeb230e19843aae83a25be0e316b9bc603f Mon Sep 17 00:00:00 2001 From: A Ibrahim Date: Mon, 11 May 2026 14:45:21 +0200 Subject: [PATCH 1/2] fix(storage): encode keys per-segment to avoid SigV4 path double-escape MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit `copy.ts`, `move.ts` (via `copyOrMove`), and `updateObject` built request paths and `X-Amz-Copy-Source` headers with plain `encodeURIComponent(key)`. When the custom HTTP client signs with SigV4 (access-key auth), the signer URI-escapes the path again during canonical-request construction. A `/` in the key became `%2F` and then `%252F`. The gateway decodes `%2F` back to `/` before computing its own canonical path, so the two canonical paths diverge and the server returns `403 SignatureDoesNotMatch`. OAuth/session-token callers were unaffected — the custom HTTP client skips SigV4 in that branch — which is why the bug only surfaced when the CLI integration suite started exercising `copy`/`move` with access keys. Adds `encodeObjectKey` in `@shared/utils` that splits on `/`, per-segment encodes, and rejoins. All four call sites in `copy.ts` and `update.ts` switch to it. Adds five unit tests covering flat keys, nested keys (regression-tagged), special chars within a segment, trailing slashes (folder markers), and empty input. Documents the SigV4 encoding rule and scope (path required; header recommended; query strings and AWS-SDK paths unaffected) in AGENTS.md so the gotcha is discoverable next time it almost gets re-introduced. Assisted-by: Claude Opus 4.7 via Claude Code --- .changeset/sigv4-path-encoding.md | 7 +++ AGENTS.md | 68 +++++++++++++++++++++++ packages/storage/src/lib/object/copy.ts | 6 +- packages/storage/src/lib/object/update.ts | 6 +- shared/utils.test.ts | 31 ++++++++++- shared/utils.ts | 17 ++++++ 6 files changed, 128 insertions(+), 7 deletions(-) create mode 100644 .changeset/sigv4-path-encoding.md diff --git a/.changeset/sigv4-path-encoding.md b/.changeset/sigv4-path-encoding.md new file mode 100644 index 0000000..cc59955 --- /dev/null +++ b/.changeset/sigv4-path-encoding.md @@ -0,0 +1,7 @@ +--- +'@tigrisdata/storage': patch +--- + +Fix `403 SignatureDoesNotMatch` from `copy`, `move`, and `updateObject` when the object key contains `/` and the request is signed with access-key SigV4. + +Object keys in the request path and `X-Amz-Copy-Source` header are now URL-encoded per-segment via the new `encodeObjectKey` helper, preserving `/` as a separator. Previously, plain `encodeURIComponent(key)` turned `/` into `%2F`, which the signer then double-escaped to `%252F` during canonical-request construction — diverging from the gateway's canonical path and producing `SignatureDoesNotMatch`. OAuth/session-token callers were unaffected because that auth path skips SigV4 signing entirely. diff --git a/AGENTS.md b/AGENTS.md index f9bcad0..2840633 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -108,3 +108,71 @@ Root-level scripts: - The project uses Husky for Git hooks (commitlint via `commit-msg`, Biome `check` via `pre-commit`). - Commitizen is configured for Conventional Commits (`pnpm commit`). - Biome handles linting and formatting (replaced ESLint and Prettier). + +## Known pitfalls + +### SigV4 path encoding for object keys + +Scope: this only applies to code that talks to the **custom HTTP +client** in `shared/http-client.ts` (used by `copy`, `move`, +`updateObject`, and the `setBucket*` family). Anything routed through +the AWS SDK (`ListObjectsV2Command`, `PutObjectCommand`, +`HeadObjectCommand`, `GetObjectCommand`, `RemoveObjectCommand`, etc.) +handles its own SigV4 correctly — don't touch encoding there. + +**Rule for the custom client**: whenever you build a **request path** +that embeds an object key, do **not** encode the whole key with +`encodeURIComponent`. Use `encodeObjectKey` from `@shared/utils` +instead. + +Why the path specifically: when the request is signed with SigV4 (the +access-key auth branch), `SignatureV4.sign()` URI-escapes the request +path *again* during canonical-request construction. If the key was +already escaped with plain `encodeURIComponent`, every `/` becomes +`%2F` and then `%252F`. The Tigris gateway decodes `%2F` back to `/` +before computing its own canonical path, so the two canonical paths +diverge and the server returns `403 SignatureDoesNotMatch`. + +**Other surfaces — not affected by the double-encode bug but worth +knowing**: + +- **Query strings**: the HTTP client extracts query params via + `URL.searchParams.forEach`, which returns the **decoded** value + before handing it to the signer. The signer encodes once for the + canonical query string, the server canonicalizes the same way — + single encoding round-trip. Writing `?prefix=${encodeURIComponent(p)}` + is safe; so is writing the value raw and letting `URL` percent-encode + it. +- **Header values**: SigV4's canonical-header construction does not + URL-escape header values (only trims and collapses whitespace), so + there's no double-encode mismatch. However, S3's + `X-Amz-Copy-Source` spec **requires** the value be URL-encoded; the + server URL-decodes it before treating it as a key. Use + `encodeObjectKey` there too — it gives you the spec-correct encoding + for free (special chars escaped, `/` separators preserved). + +Symptoms that point at this bug: + +- `403 Forbidden` with body `SignatureDoesNotMatch` for + requests that include a nested object key (key contains `/`). +- Works fine in tests that mock the network or run with OAuth/session + tokens (those skip the SigV4 branch entirely — see + `shared/http-client.ts`), then fails the first time a real consumer + exercises the call with access keys. +- A previously-fine flat key like `file.txt` succeeds, but + `folder/file.txt` 403s. + +Mitigation for new code: + +```ts +import { encodeObjectKey } from '@shared/utils'; + +path: `/${bucket}/${encodeObjectKey(key)}?x-id=...`, +headers: { + [TigrisHeaders.COPY_SOURCE]: `${srcBucket}/${encodeObjectKey(src)}`, +} +``` + +When adding tests for any function that hits the custom HTTP client, +include at least one case with a key that contains `/`. It's the only +way to catch this against the real gateway with access-key auth. diff --git a/packages/storage/src/lib/object/copy.ts b/packages/storage/src/lib/object/copy.ts index 97529c8..e6709ea 100644 --- a/packages/storage/src/lib/object/copy.ts +++ b/packages/storage/src/lib/object/copy.ts @@ -1,5 +1,5 @@ import { TigrisHeaders } from '@shared/headers'; -import { handleError } from '@shared/utils'; +import { encodeObjectKey, handleError } from '@shared/utils'; import { config, missingConfigError } from '../config'; import { createStorageClient } from '../http-client'; import type { TigrisStorageConfig, TigrisStorageResponse } from '../types'; @@ -56,7 +56,7 @@ export async function copyOrMove( } const headers: Record = { - [TigrisHeaders.COPY_SOURCE]: `${srcBucket}/${encodeURIComponent(src)}`, + [TigrisHeaders.COPY_SOURCE]: `${srcBucket}/${encodeObjectKey(src)}`, }; if (rename) { @@ -66,7 +66,7 @@ export async function copyOrMove( try { const response = await storageHttpClient.request({ method: 'PUT', - path: `/${destBucket}/${encodeURIComponent(dest)}?x-id=CopyObject`, + path: `/${destBucket}/${encodeObjectKey(dest)}?x-id=CopyObject`, headers, }); diff --git a/packages/storage/src/lib/object/update.ts b/packages/storage/src/lib/object/update.ts index e239f6c..74a2f1c 100644 --- a/packages/storage/src/lib/object/update.ts +++ b/packages/storage/src/lib/object/update.ts @@ -1,6 +1,6 @@ import { PutObjectAclCommand } from '@aws-sdk/client-s3'; import { TigrisHeaders } from '@shared/headers'; -import { handleError } from '@shared/utils'; +import { encodeObjectKey, handleError } from '@shared/utils'; import { config, missingConfigError } from '../config'; import { createStorageClient } from '../http-client'; import { createTigrisClient } from '../tigris-client'; @@ -50,9 +50,9 @@ export async function updateObject( try { const response = await storageHttpClient.request({ method: 'PUT', - path: `/${bucket}/${encodeURIComponent(key)}?x-id=CopyObject`, + path: `/${bucket}/${encodeObjectKey(key)}?x-id=CopyObject`, headers: { - [TigrisHeaders.COPY_SOURCE]: `${bucket}/${encodeURIComponent(path)}`, + [TigrisHeaders.COPY_SOURCE]: `${bucket}/${encodeObjectKey(path)}`, [TigrisHeaders.RENAME]: 'true', }, }); diff --git a/shared/utils.test.ts b/shared/utils.test.ts index 066f2cd..fcd4033 100644 --- a/shared/utils.test.ts +++ b/shared/utils.test.ts @@ -1,5 +1,5 @@ import { describe, expect, it } from 'vitest'; -import { executeWithConcurrency } from './utils'; +import { encodeObjectKey, executeWithConcurrency } from './utils'; describe('executeWithConcurrency', () => { it('should execute all tasks and return results in order', async () => { @@ -136,3 +136,32 @@ describe('executeWithConcurrency', () => { expect(executionOrder).toEqual([1, 2, 3]); }); }); + +describe('encodeObjectKey', () => { + it('encodes a flat key with no slashes', () => { + expect(encodeObjectKey('file.txt')).toBe('file.txt'); + }); + + it('preserves slashes as path separators', () => { + // Regression: a plain `encodeURIComponent` turns `/` into `%2F`, + // which the SigV4 signer then double-encodes to `%252F` during + // canonical-request construction, producing SignatureDoesNotMatch. + expect(encodeObjectKey('folder/file.txt')).toBe('folder/file.txt'); + expect(encodeObjectKey('a/b/c/d.txt')).toBe('a/b/c/d.txt'); + }); + + it('encodes special characters within segments', () => { + expect(encodeObjectKey('folder/my file.txt')).toBe('folder/my%20file.txt'); + expect(encodeObjectKey('folder/?weird=name.txt')).toBe( + 'folder/%3Fweird%3Dname.txt' + ); + }); + + it('preserves trailing slash (folder markers)', () => { + expect(encodeObjectKey('folder/')).toBe('folder/'); + }); + + it('handles empty input', () => { + expect(encodeObjectKey('')).toBe(''); + }); +}); diff --git a/shared/utils.ts b/shared/utils.ts index 1e53325..b73757d 100644 --- a/shared/utils.ts +++ b/shared/utils.ts @@ -31,6 +31,23 @@ export async function executeWithConcurrency( return results; } +/** + * Encode an object key for use in an S3-style request path or + * `X-Amz-Copy-Source` header. Encodes each path segment with + * `encodeURIComponent` and rejoins with `/`, so the slash separators + * survive verbatim. + * + * Why this matters: when the request is signed with SigV4 (access-key + * auth path), the signer URI-escapes the path again during canonical + * request construction. If the key was already escaped with a plain + * `encodeURIComponent`, the `/` becomes `%2F` and then `%252F` — + * mismatch against the server's canonical (which decodes `%2F` + * back to `/` first). Result: `SignatureDoesNotMatch`. + */ +export function encodeObjectKey(key: string): string { + return key.split('/').map(encodeURIComponent).join('/'); +} + export function toError(value: unknown): Error { if (value instanceof Error) return value; if (typeof value === 'string') return new Error(value); From 98d77cc7c97ab2339f7935354d8a86ccbbc542bc Mon Sep 17 00:00:00 2001 From: A Ibrahim Date: Mon, 11 May 2026 15:22:15 +0200 Subject: [PATCH 2/2] fix(storage): set uriEscapePath: false on SigV4 for single-encoded paths MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit `@smithy/signature-v4`'s `SignatureV4` defaults `uriEscapePath: true`, which is the AWS-standard double-encoding scheme. S3 uses single-encoding, and Tigris gateway follows S3 semantics. With the default, the signer re-percent-encoded any path sequence during canonicalization (e.g. `%20` → `%2520`) while the gateway treated the wire path as single-encoded — producing `SignatureDoesNotMatch` for every key with a character that requires percent-encoding (space, `?`, `=`, `&`, etc.). The per-segment `encodeObjectKey` fix from the previous commit was necessary but not sufficient: it kept `/` intact for valid URLs but left every other special char exposed to the signer's re-canonicalization pass. Adds `uriEscapePath: false` to the `SignatureV4` constructor in the custom HTTP client. Verified end-to-end against access-key auth: `cp src.txt 'folder/my file.txt'` now succeeds (was 403 Forbidden SignatureDoesNotMatch). AGENTS.md updated to describe both encoding traps (the signer setting and per-segment pre-encoding) and to require integration tests with both a nested key and a special-char key. Changeset updated to reflect both fixes. Addresses greptile review on #101. Assisted-by: Claude Opus 4.7 via Claude Code --- .changeset/sigv4-path-encoding.md | 9 ++++++-- AGENTS.md | 34 +++++++++++++++++++++++-------- shared/http-client.ts | 5 +++++ 3 files changed, 37 insertions(+), 11 deletions(-) diff --git a/.changeset/sigv4-path-encoding.md b/.changeset/sigv4-path-encoding.md index cc59955..38ab155 100644 --- a/.changeset/sigv4-path-encoding.md +++ b/.changeset/sigv4-path-encoding.md @@ -2,6 +2,11 @@ '@tigrisdata/storage': patch --- -Fix `403 SignatureDoesNotMatch` from `copy`, `move`, and `updateObject` when the object key contains `/` and the request is signed with access-key SigV4. +Fix `403 SignatureDoesNotMatch` from `copy`, `move`, and `updateObject` when the object key contains `/` or any other character that requires percent-encoding (space, `?`, `=`, etc.) and the request is signed with access-key SigV4. -Object keys in the request path and `X-Amz-Copy-Source` header are now URL-encoded per-segment via the new `encodeObjectKey` helper, preserving `/` as a separator. Previously, plain `encodeURIComponent(key)` turned `/` into `%2F`, which the signer then double-escaped to `%252F` during canonical-request construction — diverging from the gateway's canonical path and producing `SignatureDoesNotMatch`. OAuth/session-token callers were unaffected because that auth path skips SigV4 signing entirely. +Two related fixes: + +- Custom HTTP client now constructs `SignatureV4` with `uriEscapePath: false`, matching S3's single-encoding canonical-path scheme. The default `true` (AWS-standard double-encoding) caused the signer to re-escape any percent sequence in the path during canonicalization, while Tigris gateway uses S3 single-encoding — producing the mismatch for every key with a special char. +- Object keys in the request path and `X-Amz-Copy-Source` header are URL-encoded per-segment via the new `encodeObjectKey` helper, preserving `/` as a separator so the wire URL stays valid. + +OAuth/session-token callers were unaffected because that auth path skips SigV4 signing entirely. diff --git a/AGENTS.md b/AGENTS.md index 2840633..07cbf04 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -125,13 +125,22 @@ that embeds an object key, do **not** encode the whole key with `encodeURIComponent`. Use `encodeObjectKey` from `@shared/utils` instead. -Why the path specifically: when the request is signed with SigV4 (the -access-key auth branch), `SignatureV4.sign()` URI-escapes the request -path *again* during canonical-request construction. If the key was -already escaped with plain `encodeURIComponent`, every `/` becomes -`%2F` and then `%252F`. The Tigris gateway decodes `%2F` back to `/` -before computing its own canonical path, so the two canonical paths -diverge and the server returns `403 SignatureDoesNotMatch`. +Why the path specifically: there are two encoding traps that compound. + +1. **Signer setting**: `@smithy/signature-v4`'s `SignatureV4` + constructor defaults `uriEscapePath: true` — the AWS-standard + double-encoding scheme. S3 uses single-encoding, so our custom HTTP + client passes `uriEscapePath: false`. If you ever build a second + signer instance, set the same flag. With the default, the signer + re-percent-encodes any sequence in the path during canonicalization + (`%20` → `%2520`), and Tigris gateway — which uses S3 single-encoding + — diverges from the client canonical and returns + `403 SignatureDoesNotMatch` for every key with a special char. +2. **Pre-encoding**: even with `uriEscapePath: false`, the wire URL + must still be a valid URL — keys with `/`, spaces, `?`, etc. need + percent-encoding when serialized. Use `encodeObjectKey` (single + encode, per-segment, preserves `/`) so the on-wire path is valid + and matches what the signer signs. **Other surfaces — not affected by the double-encode bug but worth knowing**: @@ -174,5 +183,12 @@ headers: { ``` When adding tests for any function that hits the custom HTTP client, -include at least one case with a key that contains `/`. It's the only -way to catch this against the real gateway with access-key auth. +exercise at least these two key shapes against the real gateway with +access-key auth: + +- A key with `/` (e.g. `folder/file.txt`) — catches the per-segment + encoding regression. +- A key with a character that requires percent-encoding (space, `?`, + `=`, `&`) — catches the `uriEscapePath` regression. Unit tests on + `encodeObjectKey` alone aren't enough; the bug only shows up when + the signer's canonicalization pass runs end-to-end. diff --git a/shared/http-client.ts b/shared/http-client.ts index 30b4bf0..7325d1b 100644 --- a/shared/http-client.ts +++ b/shared/http-client.ts @@ -64,6 +64,11 @@ async function generateSignatureHeaders( region: 'auto', service: 's3', sha256: Sha256, + // S3 uses single-encoding for the canonical path; the default + // (`true`) is the AWS-standard double-encoding scheme and produces + // `SignatureDoesNotMatch` whenever a path segment contains + // characters that need percent-encoding (space, `?`, `=`, etc.). + uriEscapePath: false, }); const query: Record = {};