fix(backend): handle Gitea ERR_STREAM_PREMATURE_CLOSE during sync #1405
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -12,6 +12,22 @@ import { measure } from './utils.js'; | |
| const logger = createLogger('gitea'); | ||
| const GITEA_CLOUD_HOSTNAME = "gitea.com"; | ||
|
|
||
| + // Some Gitea instances (particularly when behind certain reverse proxies or with | ||
| + // response compression enabled) cause `cross-fetch` to fail while reading the | ||
| + // response body with ERR_STREAM_PREMATURE_CLOSE. Forcing identity encoding and | ||
| + // closing the connection avoids the premature close. | ||
| + // @see https://github.com/sourcebot-dev/sourcebot/issues/1404 | ||
| + const customFetch: typeof fetch = (url, options = {}) => { | ||
| + return fetch(url, { | ||
| + ...options, | ||
| + headers: { | ||
| + ...(options.headers ?? {}), | ||
| + 'Accept-Encoding': 'identity', | ||
| + 'Connection': 'close', | ||
| + }, | ||
| + }); | ||
| + }; | ||
|
|
||
| export const getGiteaReposFromConfig = async (config: GiteaConnectionConfig) => { | ||
| const hostname = config.url ? | ||
| new URL(config.url).hostname : | ||
|
|
@@ -25,7 +41,7 @@ export const getGiteaReposFromConfig = async (config: GiteaConnectionConfig) => | |
|
|
||
| const api = giteaApi(config.url ?? 'https://gitea.com', { | ||
| token: token, | ||
| - customFetch: fetch, | ||
| + customFetch, | ||
| }); | ||
|
|
||
| let allRepos: GiteaRepository[] = []; | ||
|
|
@@ -49,8 +65,11 @@ export const getGiteaReposFromConfig = async (config: GiteaConnectionConfig) => | |
| allWarnings = allWarnings.concat(warnings); | ||
| } | ||
|
|
||
| - allRepos = allRepos.filter(repo => repo.full_name !== undefined); | ||
| allRepos = allRepos.filter(repo => { | ||
| + if (repo === null || repo === undefined) { | ||
| + logger.warn(`Skipping null/undefined repository returned by the Gitea API`); | ||
| + return false; | ||
| + } | ||
| if (repo.full_name === undefined) { | ||
| logger.warn(`Repository with undefined full_name found: repoId=${repo.id}`); | ||
| return false; | ||
|
|
@@ -208,6 +227,10 @@ const getRepos = async <T>(repoList: string[], api: Api<T>) => { | |
| api.repos.repoGet(owner, repoName), | ||
| ); | ||
|
|
||
| + if (response.error || !response.data) { | ||
| + throw response.error ?? new Error(`Received empty response body while fetching repository ${repo}`); | ||
| + } | ||
|
|
||
|
Comment on lines +230 to +233
Contributor
🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Inspect getReposForOrgs/getReposOwnedByUsers for equivalent response validation
fd gitea.ts packages/backend/src --exec sed -n '96,219p' {}
Repository: sourcebot-dev/sourcebot
Length of output: 3484
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate paginate and the repo listing callers/validators
rg -n "const paginate|function paginate|paginate\(" packages/backend/src -g '*.ts' -g '*.tsx'
# Show the relevant sections around paginate and repoGet in gitea.ts
file="packages/backend/src/gitea.ts"
start=$(rg -n "const paginate|function paginate" "$file" | head -n1 | cut -d: -f1)
if [ -n "${start:-}" ]; then
sed -n "$((start-20)),$((start+120))p" "$file"
fi
echo "---- repoGet section ----"
rg -n "repoGet\(|Received empty response body|response\.error|!response\.data" "$file" -n -A4 -B4
echo "---- repoCompileUtils non-null assertions ----"
rg -n "full_name!|clone_url!|id!" packages/backend/src/repoCompileUtils.ts -n -A3 -B3
Repository: sourcebot-dev/sourcebot
Length of output: 5303
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect measure() and any existing handling of HttpResponse errors/empty payloads
file="packages/backend/src/gitea.ts"
rg -n "const measure|function measure|measure\(" packages/backend/src -g '*.ts'
echo "---- measure and surrounding helpers ----"
start=$(rg -n "const measure|function measure" "$file" | head -n1 | cut -d: -f1)
if [ -n "${start:-}" ]; then
sed -n "$((start-40)),$((start+80))p" "$file"
fi
echo "---- HttpResponse usage in gitea.ts ----"
rg -n "HttpResponse|response\.error|response\.data|data: response" "$file" -n -A3 -B3
echo "---- API client type definitions if present ----"
fd -a "types.ts" packages/backend/src node_modules 2>/dev/null | head -n 20
Repository: sourcebot-dev/sourcebot
Length of output: 2613
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Read measure() and the gitea pagination helper
sed -n '1,80p' packages/backend/src/utils.ts
echo "----"
sed -n '120,320p' packages/backend/src/gitea.ts
# Find where HttpResponse comes from if it's referenced in the repo
rg -n "interface HttpResponse|type HttpResponse|HttpResponse<" packages/backend/src -g '*.ts'
Repository: sourcebot-dev/sourcebot
Length of output: 7603
Guard the paginated org/user repo-listing responses
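The guard added at lines +230 to +233 could be factored into a small helper and reused for the paginated org/user listing calls the reviewer points at. The following is only a sketch: `ResponseLike` and `unwrapOrThrow` are hypothetical names that do not appear in the PR or the Gitea client, and the response shape is assumed from the `response.error` and `response.data` checks in the diff.

```typescript
// Hypothetical minimal shape, assumed from the `response.error` / `response.data`
// checks in the diff; the real client type may carry more fields.
interface ResponseLike<T> {
    data?: T | null;
    error?: unknown;
}

// Returns the payload, or throws the reported error (or a descriptive one when
// the body is simply missing). Note that `!response.data` also rejects falsy
// payloads such as 0 or "", which is fine for repository objects and arrays.
function unwrapOrThrow<T>(response: ResponseLike<T>, context: string): T {
    if (response.error || !response.data) {
        throw response.error ?? new Error(`Received empty response body while ${context}`);
    }
    return response.data;
}
```

A caller might then write `const repo = unwrapOrThrow(response, "fetching repository " + name)` in both the single-repo and the paginated paths, so that every path fails the same way on an empty body.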
||
| logger.debug(`Found repo ${repo} in ${durationMs}ms.`); | ||
| return { | ||
| type: 'valid' as const, | ||
|
|
||
🚀 Performance & Scalability | 🟡 Minor | ⚡ Quick win
🧩 Analysis chain
🌐 Web query:
Does forcing a Connection: close header on every fetch request in Node.js (via cross-fetch / node-fetch) cause performance regressions or socket errors with HTTP keep-alive agents?
💡 Result:
Forcing a Connection: close header on every fetch request in Node.js can cause performance regressions and, in certain environments, trigger socket errors or connectivity issues [1][2]. The impact of this practice is primarily negative due to the following reasons:
1. Performance Regression: HTTP keep-alive is designed to improve performance by reusing existing TCP connections for multiple requests [3][4]. By forcing Connection: close, you instruct the client to terminate the TCP connection after each request [3]. This forces the client to perform a full TCP (and potentially TLS) handshake for every single request, significantly increasing latency and overhead, especially for applications making frequent calls to the same host [4].
2. Socket Errors and Compatibility: Modern Node.js versions and HTTP agents are increasingly optimized for keep-alive by default [2][5]. Forcing Connection: close can conflict with the internal management logic of the Node.js http.Agent [1][2]. Specifically, sending a Connection: close header on a connection that the underlying agent intends to keep open can lead to protocol mismatches, resulting in socket hang-ups or ECONNRESET errors [2].
3. Library-Specific Context: In older versions of libraries like node-fetch, the library automatically appended a Connection: close header when no agent was provided [6]. This behavior was later identified as a source of performance degradation and instability in newer Node.js releases, leading to its removal in subsequent updates [1][2]. Modern implementations rely on the Node.js http.Agent to manage connection lifecycles appropriately [1][7].
To manage connection behavior correctly, you should:
- Use an http.Agent with the keepAlive: true option enabled if you wish to reuse connections [5][8].
- Avoid manually setting the Connection header; allow the Node.js Agent to manage the header based on the agent's configuration and the specific request/response lifecycle [3][1].
Top results: [3][1][2][5]
Citations:
🏁 Script executed:
Repository: sourcebot-dev/sourcebot
Length of output: 10256
🏁 Script executed:
Repository: sourcebot-dev/sourcebot
Length of output: 10207
🌐 Web query:
cross-fetch Node.js backend uses node-fetch and how it handles the Connection header and keep-alive agent
💡 Result:
Cross-fetch does not implement its own logic for connection handling or HTTP agents; instead, it acts as a proxy [1][2][3]. When running in a Node.js environment, cross-fetch imports and exports node-fetch [1][3]. Consequently, its behavior regarding the Connection header and keep-alive agents is identical to that of node-fetch [1][4]. Regarding connection handling in node-fetch:
1. Connection Header: Historically, node-fetch included a default Connection: close header, which could interfere with connection pooling and keep-alive functionality [5][6]. This default behavior was removed in later versions to allow the underlying Node.js http/https implementation to manage the Connection header according to standard defaults (which typically support keep-alive when agents are used) [6].
2. Keep-Alive Agents: Node-fetch supports the use of custom http(s).Agent instances via the agent option in the fetch request configuration [4]. When an agent is provided, node-fetch delegates the responsibility of connection management, including keep-alive settings, to that agent [4][6]. If no agent is explicitly provided, node-fetch relies on the default Node.js global agent [4]. Since Node.js 19, keepAlive is true by default in the global agent, but in earlier versions, users needed to manually configure and pass an agent with keepAlive: true to utilize connection pooling effectively [4].
In summary, to enable keep-alive when using cross-fetch in a Node.js backend, you should follow standard node-fetch practices: pass an http(s).Agent configured with { keepAlive: true } in the options object of your fetch call [4][5].
Citations:
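The keep-alive approach described in the result above might look roughly like the sketch below. It assumes the node-fetch v2 behaviour in which the `agent` option accepts either an agent instance or a function of the parsed URL; `selectAgent`, `withKeepAlive`, and `AgentFetchOptions` are hypothetical names, and nothing in this thread establishes whether pooled connections avoid ERR_STREAM_PREMATURE_CLOSE on the affected instances.

```typescript
import * as http from "node:http";
import * as https from "node:https";

// One shared agent per protocol so sockets are pooled and reused across requests.
const httpAgent = new http.Agent({ keepAlive: true });
const httpsAgent = new https.Agent({ keepAlive: true });

// Function form of the `agent` option: pick the agent that matches the
// protocol of the URL being requested (useful across HTTP/HTTPS redirects).
function selectAgent(parsedUrl: URL): http.Agent {
    return parsedUrl.protocol === "http:" ? httpAgent : httpsAgent;
}

// Hypothetical option bag: only the fields this sketch touches.
type AgentFetchOptions = {
    headers?: Record<string, string>;
    agent?: typeof selectAgent;
};

// Returns a copy of the caller's options with the keep-alive agent attached,
// leaving headers and any other fields untouched.
function withKeepAlive(options: AgentFetchOptions = {}): AgentFetchOptions {
    return { ...options, agent: selectAgent };
}
```

The result of `withKeepAlive(options)` would be passed as the second argument to the underlying fetch inside a `customFetch` wrapper, in place of the forced `Connection: close` header.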
🏁 Script executed:
Repository: sourcebot-dev/sourcebot
Length of output: 6159
🏁 Script executed:
Repository: sourcebot-dev/sourcebot
Length of output: 1447
Avoid `Connection: close` on every Gitea request. This path paginates org/repo/user listings, so a sync can make many requests; forcing a fresh connection each time drops keep-alive reuse and adds avoidable TCP/TLS overhead. Gate the workaround to affected hosts or use a keep-alive agent instead.
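One way to gate the workaround to affected hosts is sketched below. The allowlist, its example hostname, and the names `workaroundHeaders`, `gateWorkaround`, `HeaderMap`, and `FetchLike` are all hypothetical; the sketch also assumes headers arrive as a plain object, which is the same assumption the spread in the PR's `customFetch` makes.

```typescript
type HeaderMap = Record<string, string>;

// Hypothetical allowlist; in practice this might come from the connection config.
const AFFECTED_HOSTS: ReadonlySet<string> = new Set(["gitea.internal.example"]);

// Adds the identity-encoding / connection-close headers only when the request
// targets a host known to trigger ERR_STREAM_PREMATURE_CLOSE.
function workaroundHeaders(
    url: string,
    headers: HeaderMap,
    affectedHosts: ReadonlySet<string> = AFFECTED_HOSTS,
): HeaderMap {
    if (!affectedHosts.has(new URL(url).hostname)) {
        return { ...headers };
    }
    return { ...headers, "Accept-Encoding": "identity", "Connection": "close" };
}

// Minimal fetch signature for the sketch; the real `fetch` type is wider.
type FetchLike<R> = (url: string, options?: { headers?: HeaderMap }) => Promise<R>;

// Wraps a fetch implementation so the workaround is applied per request.
function gateWorkaround<R>(
    baseFetch: FetchLike<R>,
    affectedHosts: ReadonlySet<string> = AFFECTED_HOSTS,
): FetchLike<R> {
    return (url, options = {}) =>
        baseFetch(url, {
            ...options,
            headers: workaroundHeaders(url, options.headers ?? {}, affectedHosts),
        });
}
```

With this shape, requests to hosts outside the allowlist keep whatever connection reuse the underlying agent provides, while affected instances still get the two forced headers.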