1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -25,6 +25,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [EE] Fixed Ask Sourcebot mermaid diagrams overflowing their container by contain-fitting them to both width and height, and made revealing a diagram from the answer jump it into view instantly to avoid over/undershooting. [#1373](https://github.com/sourcebot-dev/sourcebot/pull/1373)
- Verified GitHub review webhook deliveries before processing them. [#1378](https://github.com/sourcebot-dev/sourcebot/pull/1378)
- Passed Zoekt index parameters via argv to preserve revision names with punctuation. [#1376](https://github.com/sourcebot-dev/sourcebot/pull/1376)
- Fixed Gitea sync failing with `ERR_STREAM_PREMATURE_CLOSE` by forcing identity encoding on the Gitea API fetch and guarding against null repository responses. [#1405](https://github.com/sourcebot-dev/sourcebot/pull/1405)

## [5.0.4] - 2026-06-18

27 changes: 25 additions & 2 deletions packages/backend/src/gitea.ts
@@ -12,6 +12,22 @@ import { measure } from './utils.js';
const logger = createLogger('gitea');
const GITEA_CLOUD_HOSTNAME = "gitea.com";

// Some Gitea instances (particularly when behind certain reverse proxies or with
// response compression enabled) cause `cross-fetch` to fail while reading the
// response body with ERR_STREAM_PREMATURE_CLOSE. Forcing identity encoding and
// closing the connection avoids the premature close.
// @see https://github.com/sourcebot-dev/sourcebot/issues/1404
const customFetch: typeof fetch = (url, options = {}) => {
return fetch(url, {
...options,
headers: {
...(options.headers ?? {}),
'Accept-Encoding': 'identity',
'Connection': 'close',
},
});
};
Comment on lines +15 to +29

🚀 Performance & Scalability | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

Does forcing a Connection: close header on every fetch request in Node.js (via cross-fetch / node-fetch) cause performance regressions or socket errors with HTTP keep-alive agents?

💡 Result:

Forcing a Connection: close header on every fetch request in Node.js can cause performance regressions and, in certain environments, trigger socket errors or connectivity issues [1][2]. The impact of this practice is primarily negative for the following reasons:

1. Performance Regression: HTTP keep-alive is designed to improve performance by reusing existing TCP connections for multiple requests [3][4]. By forcing Connection: close, you instruct the client to terminate the TCP connection after each request [3]. This forces the client to perform a full TCP (and potentially TLS) handshake for every single request, significantly increasing latency and overhead, especially for applications making frequent calls to the same host [4].
2. Socket Errors and Compatibility: Modern Node.js versions and HTTP agents are increasingly optimized for keep-alive by default [2][5]. Forcing Connection: close can conflict with the internal management logic of the Node.js http.Agent [1][2]. Specifically, sending a Connection: close header on a connection that the underlying agent intends to keep open can lead to protocol mismatches, resulting in socket hang-ups or ECONNRESET errors [2].
3. Library-Specific Context: In older versions of libraries like node-fetch, the library automatically appended a Connection: close header when no agent was provided [6]. This behavior was later identified as a source of performance degradation and instability in newer Node.js releases, leading to its removal in subsequent updates [1][2]. Modern implementations rely on the Node.js http.Agent to manage connection lifecycles appropriately [1][7].

To manage connection behavior correctly, you should:

- Use an http.Agent with the keepAlive: true option enabled if you wish to reuse connections [5][8].
- Avoid manually setting the Connection header; allow the Node.js Agent to manage the header based on the agent's configuration and the specific request/response lifecycle [3][1].

Top results: [3][1][2][5]


🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '\n== gitea.ts ==\n'
sed -n '1,220p' packages/backend/src/gitea.ts

printf '\n== package references ==\n'
rg -n --hidden --glob '!**/node_modules/**' --glob '!**/dist/**' --glob '!**/build/**' \
  'cross-fetch|node-fetch|keepAlive|Connection: close|Accept-Encoding|fetch\(' \
  packages/backend package.json pnpm-lock.yaml yarn.lock package-lock.json 2>/dev/null || true

printf '\n== file list ==\n'
git ls-files 'packages/backend/*' | sed -n '1,120p'

Repository: sourcebot-dev/sourcebot

Length of output: 10256


🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '1,220p' packages/backend/src/gitea.ts
printf '\n---\n'
rg -n --hidden --glob '!**/node_modules/**' --glob '!**/dist/**' --glob '!**/build/**' \
  'cross-fetch|node-fetch|keepAlive|Connection: close|Accept-Encoding|fetch\(' \
  packages/backend package.json pnpm-lock.yaml yarn.lock package-lock.json 2>/dev/null || true
printf '\n---\n'
git ls-files 'packages/backend/*' | sed -n '1,120p'

Repository: sourcebot-dev/sourcebot

Length of output: 10207


🌐 Web query:

cross-fetch Node.js backend uses node-fetch and how it handles the Connection header and keep-alive agent

💡 Result:

Cross-fetch does not implement its own logic for connection handling or HTTP agents; instead, it acts as a proxy [1][2][3]. When running in a Node.js environment, cross-fetch imports and exports node-fetch [1][3]. Consequently, its behavior regarding the Connection header and keep-alive agents is identical to that of node-fetch [1][4].

Regarding connection handling in node-fetch:

1. Connection Header: Historically, node-fetch included a default Connection: close header, which could interfere with connection pooling and keep-alive functionality [5][6]. This default behavior was removed in later versions to allow the underlying Node.js http/https implementation to manage the Connection header according to standard defaults (which typically support keep-alive when agents are used) [6].
2. Keep-Alive Agents: Node-fetch supports the use of custom http(s).Agent instances via the agent option in the fetch request configuration [4]. When an agent is provided, node-fetch delegates the responsibility of connection management, including keep-alive settings, to that agent [4][6]. If no agent is explicitly provided, node-fetch relies on the default Node.js global agent [4]. Since Node.js 19, keepAlive is true by default in the global agent, but in earlier versions, users needed to manually configure and pass an agent with keepAlive: true to utilize connection pooling effectively [4].

In summary, to enable keep-alive when using cross-fetch in a Node.js backend, you should follow standard node-fetch practices: pass an http(s).Agent configured with { keepAlive: true } in the options object of your fetch call [4][5].
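
The keep-alive agent approach described in this result might look roughly like the sketch below. It assumes node-fetch v2's `agent` option, which accepts either an agent instance or a function of the parsed URL. The names `httpAgent`, `httpsAgent`, and `selectAgent` are hypothetical and do not appear in the PR.

```typescript
import * as http from 'http';
import * as https from 'https';

// Hypothetical shared agents (not part of the PR). Reusing one agent per
// protocol lets paginated requests share sockets instead of reconnecting.
const httpAgent = new http.Agent({ keepAlive: true });
const httpsAgent = new https.Agent({ keepAlive: true });

// Picks the agent that matches the request protocol. node-fetch v2 can take
// a function like this as its `agent` option.
const selectAgent = (parsedUrl: URL): http.Agent =>
    parsedUrl.protocol === 'http:' ? httpAgent : httpsAgent;
```

`selectAgent` could then be passed as `agent` in the fetch options next to the `Accept-Encoding: identity` header. Whether the cross-fetch typings used in this repository expose that option would need to be checked.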


🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '\n== root package.json ==\n'
sed -n '1,220p' package.json

printf '\n== backend package.json ==\n'
sed -n '1,220p' packages/backend/package.json

printf '\n== cross-fetch/node-fetch package metadata in lockfile ==\n'
awk '
  $0 ~ /^"cross-fetch@npm:\^4\.0\.0":$/ {show=1}
  show {print}
  show && NF==0 {exit}
' yarn.lock

printf '\n== node-fetch 2.7.0 metadata in lockfile ==\n'
awk '
  $0 ~ /^"node-fetch@npm:\^2\.6\.7, node-fetch@npm:\^2\.7\.0":$/ {show=1}
  show {print}
  show && NF==0 {exit}
' yarn.lock

Repository: sourcebot-dev/sourcebot

Length of output: 6159


🏁 Script executed:

#!/bin/bash
set -euo pipefail

printf '\n== paginate usage in gitea.ts ==\n'
rg -n 'paginate\(' packages/backend/src/gitea.ts

printf '\n== connectionUtils.ts ==\n'
sed -n '1,220p' packages/backend/src/connectionUtils.ts

Repository: sourcebot-dev/sourcebot

Length of output: 1447


Avoid Connection: close on every Gitea request. This path paginates org/repo/user listings, so a sync can make many requests; forcing a fresh connection each time drops keep-alive reuse and adds avoidable TCP/TLS overhead. Gate the workaround to affected hosts or use a keep-alive agent instead.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/backend/src/gitea.ts` around lines 15 - 29, The custom fetch wrapper
in gitea.ts is forcing Connection: close for every request, which breaks
keep-alive reuse during paginated org/repo/user syncs. Update customFetch to
apply this workaround only for affected Gitea hosts or error-prone cases, or
switch to a keep-alive agent while still preserving the Accept-Encoding identity
workaround. Keep the change localized to customFetch so the rest of the Gitea
request flow continues using normal connection reuse.
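
One way to gate the workaround per host is sketched below, under stated assumptions. `withGiteaWorkaroundHeaders` and the `HOSTS_NEEDING_CLOSE` set are hypothetical names, the example hostname is made up, and headers are modelled as a plain string map rather than a `Headers` instance.

```typescript
// Hypothetical helper, not part of the PR: computes the request headers so
// that only hosts known to misbehave give up connection reuse.
type HeaderMap = Record<string, string>;

// Assumption: affected hosts would come from operator config; this entry is
// purely illustrative.
const HOSTS_NEEDING_CLOSE: ReadonlySet<string> = new Set<string>(['gitea.internal.example']);

const withGiteaWorkaroundHeaders = (
    url: string,
    headers: HeaderMap = {},
    hostsNeedingClose: ReadonlySet<string> = HOSTS_NEEDING_CLOSE,
): HeaderMap => {
    const merged: HeaderMap = {
        ...headers,
        // Always keep the identity-encoding workaround from the PR.
        'Accept-Encoding': 'identity',
    };
    // Only disable keep-alive for hosts listed as affected.
    if (hostsNeedingClose.has(new URL(url).hostname)) {
        merged['Connection'] = 'close';
    }
    return merged;
};
```

`customFetch` could call this helper to build its `headers` value, so requests to unaffected hosts keep normal connection reuse during paginated syncs.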


export const getGiteaReposFromConfig = async (config: GiteaConnectionConfig) => {
const hostname = config.url ?
new URL(config.url).hostname :
@@ -25,7 +41,7 @@ export const getGiteaReposFromConfig = async (config: GiteaConnectionConfig) =>

const api = giteaApi(config.url ?? 'https://gitea.com', {
token: token,
customFetch: fetch,
customFetch,
});

let allRepos: GiteaRepository[] = [];
@@ -49,8 +65,11 @@ export const getGiteaReposFromConfig = async (config: GiteaConnectionConfig) =>
allWarnings = allWarnings.concat(warnings);
}

allRepos = allRepos.filter(repo => repo.full_name !== undefined);
allRepos = allRepos.filter(repo => {
if (repo === null || repo === undefined) {
logger.warn(`Skipping null/undefined repository returned by the Gitea API`);
return false;
}
if (repo.full_name === undefined) {
logger.warn(`Repository with undefined full_name found: repoId=${repo.id}`);
return false;
@@ -208,6 +227,10 @@ const getRepos = async <T>(repoList: string[], api: Api<T>) => {
api.repos.repoGet(owner, repoName),
);

if (response.error || !response.data) {
throw response.error ?? new Error(`Received empty response body while fetching repository ${repo}`);
}

Comment on lines +230 to +233

🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Inspect getReposForOrgs/getReposOwnedByUsers for equivalent response validation
fd gitea.ts packages/backend/src --exec sed -n '96,219p' {}

Repository: sourcebot-dev/sourcebot

Length of output: 3484


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate paginate and the repo listing callers/validators
rg -n "const paginate|function paginate|paginate\(" packages/backend/src -g '*.ts' -g '*.tsx'

# Show the relevant sections around paginate and repoGet in gitea.ts
file="packages/backend/src/gitea.ts"
start=$(rg -n "const paginate|function paginate" "$file" | head -n1 | cut -d: -f1)
if [ -n "${start:-}" ]; then
  sed -n "$((start-20)),$((start+120))p" "$file"
fi

echo "---- repoGet section ----"
rg -n "repoGet\(|Received empty response body|response\.error|!response\.data" "$file" -n -A4 -B4

echo "---- repoCompileUtils non-null assertions ----"
rg -n "full_name!|clone_url!|id!" packages/backend/src/repoCompileUtils.ts -n -A3 -B3

Repository: sourcebot-dev/sourcebot

Length of output: 5303


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect measure() and any existing handling of HttpResponse errors/empty payloads
file="packages/backend/src/gitea.ts"

rg -n "const measure|function measure|measure\(" packages/backend/src -g '*.ts'

echo "---- measure and surrounding helpers ----"
start=$(rg -n "const measure|function measure" "$file" | head -n1 | cut -d: -f1)
if [ -n "${start:-}" ]; then
  sed -n "$((start-40)),$((start+80))p" "$file"
fi

echo "---- HttpResponse usage in gitea.ts ----"
rg -n "HttpResponse|response\.error|response\.data|data: response" "$file" -n -A3 -B3

echo "---- API client type definitions if present ----"
fd -a "types.ts" packages/backend/src node_modules 2>/dev/null | head -n 20

Repository: sourcebot-dev/sourcebot

Length of output: 2613


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Read measure() and the gitea pagination helper
sed -n '1,80p' packages/backend/src/utils.ts
echo "----"
sed -n '120,320p' packages/backend/src/gitea.ts

# Find where HttpResponse comes from if it's referenced in the repo
rg -n "interface HttpResponse|type HttpResponse|HttpResponse<" packages/backend/src -g '*.ts'

Repository: sourcebot-dev/sourcebot

Length of output: 7603


Guard the paginated org/user repo-listing responses
In packages/backend/src/gitea.ts (lines 145 and 185), getReposOwnedByUsers() and getReposForOrgs() still rely on paginate() without the empty/error-response guard that this PR adds around repoGet, so a premature-close response can surface as an opaque runtime failure in these sync paths.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/backend/src/gitea.ts` around lines 230 - 233, The paginated repo
listing paths in getReposOwnedByUsers() and getReposForOrgs() are missing the
same empty/error-response guard used elsewhere, so a premature-close from
paginate() can bubble up as an opaque failure. Update the pagination handling in
these functions in gitea.ts to explicitly check for response.error or missing
response.data after each page fetch, and throw a clear error with the relevant
repo/org context before mapping or accumulating results.
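
A minimal sketch of such a per-page guard follows. The `PageResponse` shape is an assumption modelled on the `response.error` / `response.data` fields used in the diff, and `unwrapPage` is a hypothetical helper, not an existing function in gitea.ts.

```typescript
// Assumed minimal shape of one page's response, mirroring the fields the
// diff checks on the repoGet response.
interface PageResponse<T> {
    data?: Array<T | null> | null;
    error?: unknown;
}

// Hypothetical helper: validates one page before its items are accumulated.
// `context` names the org or user being listed so the error is traceable.
const unwrapPage = <T>(response: PageResponse<T>, context: string): T[] => {
    if (response.error || !response.data) {
        throw response.error ?? new Error(
            `Received empty response body while listing repositories for ${context}`,
        );
    }
    // Drop null entries here, matching the null-repository guard in the PR.
    return response.data.filter((item): item is T => item !== null);
};
```

Each page fetched inside the paginate() callback could be passed through `unwrapPage` with the org or user name as `context`, so a truncated response fails with a clear message instead of a later property access on `undefined`.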

logger.debug(`Found repo ${repo} in ${durationMs}ms.`);
return {
type: 'valid' as const,