Skip to content

Easy optimisations#141

Open
samcunliffe wants to merge 10 commits into
mainfrom
ai/easy-optimisations
Open

Easy optimisations#141
samcunliffe wants to merge 10 commits into
mainfrom
ai/easy-optimisations

Conversation

@samcunliffe
Copy link
Copy Markdown
Member

@samcunliffe samcunliffe commented Apr 10, 2026

Caution

Before you even bother reviewing this: how do we feel about AI-assisted code in this repo?


The core commit here, 28e347f, was heavily AI-assisted. I've reviewed and added some human comments (you have to believe me when I say the human wrote them).

Solves

What

Instead of running the data collection concurrently in series, run 4 threads. This doesn't hit rate limits (or at least not any more than the original) and speeds the data collection up from > 1 hour to ~10 minutes (❗❗❗).

A factor 6 code increase for 600 100 new lines of code.

I (human Sam) also moved the token generation to just before we call the data collection (so the cache and dependencies etc all happen before). This is a very marginal saving.

Why

Because GitHub app tokens have a hard limit of 1 hour life. There doesn't appear to be a way to extend the validity. Also, the speedup is probably a bit better for developer quality of life when handling Dependabot updates.

samcunliffe and others added 4 commits March 31, 2026 07:38
... to speed up data collection.

AI-generated: GPT-5.3-Codex (via GitHub Copilot)

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@samcunliffe samcunliffe force-pushed the ai/easy-optimisations branch from 4c7a7a5 to 28e347f Compare April 13, 2026 10:37
Copy link
Copy Markdown
Member Author

@samcunliffe samcunliffe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the main saving is concurrently fetching issues data (because many issues for many repos).

Comment on lines +6 to +27
export const mapWithConcurrency = async <T, R>(
items: T[],
concurrency: number,
mapper: (item: T) => Promise<R>,
): Promise<R[]> => {
const results: R[] = new Array(items.length);
let nextIndex = 0;

const worker = async () => {
while (nextIndex < items.length) {
const currentIndex = nextIndex;
nextIndex += 1;
results[currentIndex] = await mapper(items[currentIndex]);
}
};

await Promise.all(
Array.from({ length: Math.min(concurrency, items.length) }, () => worker()),
);

return results;
};
Copy link
Copy Markdown
Member Author

@samcunliffe samcunliffe Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Human Sam wrote this:

This function effectively does map(f, list), but because of JavaScript and nicer lambda functions, the order is first: the items, then: the number of concurrent function mappings to run at once, then the function to map.

Later, this is used in the fetchers so we can fetch ~4 repos and issues concurrently at once.

const {
issuesAverageAge: openIssuesAverageAge,
issuesMedianAge: openIssuesMedianAge,
} = await calculateIssueMetricsPerRepo(repoName, 'open', octokit, config);
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we're using mapWithConcurrency to map calculateIssueMetricsPerRepo over the list of repository names (for example).

@samcunliffe samcunliffe marked this pull request as ready for review April 13, 2026 10:59
@samcunliffe samcunliffe requested a review from a team as a code owner April 13, 2026 10:59
@samcunliffe samcunliffe requested a review from Copilot April 13, 2026 11:00
@samcunliffe samcunliffe self-assigned this Apr 13, 2026
@samcunliffe samcunliffe added enhancement New feature or request javascript Pull requests that update javascript code labels Apr 13, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Speeds up metrics collection by introducing a small concurrency utility and using it to parallelize backend fetchers; also tweaks the CI workflow and updates the npm lockfile.

Changes:

  • Add mapWithConcurrency worker-pool helper and use it to parallelize repository contributor stats fetching.
  • Parallelize per-repository issue metric collection.
  • Adjust GitHub Actions workflow to cache npm deps, move token generation later, and run backend via npm workspaces (plus resulting package-lock.json updates).

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
backend/src/fetchers/fetcher_utils.ts Introduces mapWithConcurrency helper for bounded parallel mapping.
backend/src/fetchers/repository.ts Uses bounded concurrency + retry loop to fetch contributor stats more quickly.
backend/src/fetchers/issues.ts Runs per-repo issue metrics collection with bounded concurrency and per-repo error logging.
.github/workflows/nextjs.yml Adds npm caching, moves app-token creation later, and uses workspace-based backend install/run.
package-lock.json Updates lockfile (notably includes large dependency bumps).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread backend/src/fetchers/repository.ts
Comment thread backend/src/fetchers/issues.ts
Comment thread backend/src/fetchers/fetcher_utils.ts
Comment on lines -27 to -31
- uses: actions/create-github-app-token@29824e69f54612133e76f7eaac726eef6c875baf # v2.2.1
id: generate_token
with:
app-id: ${{ secrets.APP_ID }}
private-key: ${{ secrets.APP_PRIVATE_KEY }}
Copy link
Copy Markdown
Member

@giordano giordano Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the move to try and keep the key alive as long as possible (even if it's just a bunch of extra seconds)? (Edit: I just saw the human comment in the fist post) Here is a suggestion for automatically refreshing the key, for longer-running tasks, but it'd require some substantial changes here. Having an overall quicker workflow would be nicer though

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, exactly.

Though if the dependencies change it can actually take ones of minutes to do the npm install.

This saving is essentially negligible compared to the parallelsation and non blocking concurrency.

Comment thread package-lock.json
samcunliffe and others added 3 commits April 13, 2026 14:16
Package updates can come in other Dependabot PRs.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Comment thread backend/src/fetchers/repository.ts Outdated
Comment on lines +10 to +13
const CONTRIBUTORS_RETRY_DELAY_MS = 15_000;
// GitHub's stats/contributors endpoint often returns 202 for several minutes
// while repository statistics are being generated, so allow a longer retry window.
const CONTRIBUTORS_MAX_RETRIES = 12;
Copy link
Copy Markdown
Member Author

@samcunliffe samcunliffe Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const CONTRIBUTORS_RETRY_DELAY_MS = 15_000;
// GitHub's stats/contributors endpoint often returns 202 for several minutes
// while repository statistics are being generated, so allow a longer retry window.
const CONTRIBUTORS_MAX_RETRIES = 12;
// GitHub's stats/contributors endpoint often returns 202 for several minutes
// while repository statistics are being generated, so allow a longer retry window.
const CONTRIBUTORS_RETRY_DELAY_MS = 30_000;
const CONTRIBUTORS_MAX_RETRIES = 4;

Copy link
Copy Markdown
Member Author

@samcunliffe samcunliffe Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@UCL/open-source-impact-seed

So, this business will likely need a bit of tweaking.

It might be nice to run with a lot more retries in the nightly cron but only a few when we're testing dependencies. I.e. have these numbers passed in as configuration parameters to the job.

For now though, this works. ~Some repos still return 202 for the contributor count. But these are not the "stellar" repos that are at the top of the showcase. So my feeling is: leave as is.

@samcunliffe samcunliffe requested review from a team and giordano April 13, 2026 13:48
@samcunliffe samcunliffe enabled auto-merge (squash) April 13, 2026 13:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request javascript Pull requests that update javascript code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants