Easy optimisations by samcunliffe · Pull Request #141 · UCL/open-source-dashboard

samcunliffe · 2026-04-10T16:28:20Z

Caution

Before you even bother reviewing this: how do we feel about AI-assisted code in this repo?

The core commit here, 28e347f, was heavily AI-assisted. I've reviewed and added some human comments (you have to believe me when I say the human wrote them).

Solves

Failing: hitting resource limit #123

What

Instead of running the data collection concurrently in series, run 4 threads. This doesn't hit rate limits (or at least not any more than the original) and speeds the data collection up from > 1 hour to ~10 minutes (❗❗❗).

A factor 6 code increase for ~~600~~ 100 new lines of code.

I (human Sam) also moved the token generation to just before we call the data collection (so the cache and dependencies etc all happen before). This is a very marginal saving.

Why

Because GitHub app tokens have a hard limit of 1 hour life. There doesn't appear to be a way to extend the validity. Also, the speedup is probably a bit better for developer quality of life when handling Dependabot updates.

... to speed up data collection. AI-generated: GPT-5.3-Codex (via GitHub Copilot) Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

This reverts commit 4bc8326.

samcunliffe

I believe the main saving is concurrently fetching issues data (because many issues for many repos).

samcunliffe · 2026-04-13T10:47:58Z

+export const mapWithConcurrency = async <T, R>(
+  items: T[],
+  concurrency: number,
+  mapper: (item: T) => Promise<R>,
+): Promise<R[]> => {
+  const results: R[] = new Array(items.length);
+  let nextIndex = 0;
+
+  const worker = async () => {
+    while (nextIndex < items.length) {
+      const currentIndex = nextIndex;
+      nextIndex += 1;
+      results[currentIndex] = await mapper(items[currentIndex]);
+    }
+  };
+
+  await Promise.all(
+    Array.from({ length: Math.min(concurrency, items.length) }, () => worker()),
+  );
+
+  return results;
+};


Human Sam wrote this:

This function effectively does map(f, list), but because of JavaScript and nicer lambda functions, the order is first: the items, then: the number of concurrent function mappings to run at once, then the function to map.

Later, this is used in the fetchers so we can fetch ~4 repos and issues concurrently at once.

samcunliffe · 2026-04-13T10:49:40Z

+      const {
+        issuesAverageAge: openIssuesAverageAge,
+        issuesMedianAge: openIssuesMedianAge,
+      } = await calculateIssueMetricsPerRepo(repoName, 'open', octokit, config);


Here we're using mapWithConcurrency to map calculateIssueMetricsPerRepo over the list of repository names (for example).

Copilot

Pull request overview

Speeds up metrics collection by introducing a small concurrency utility and using it to parallelize backend fetchers; also tweaks the CI workflow and updates the npm lockfile.

Changes:

Add mapWithConcurrency worker-pool helper and use it to parallelize repository contributor stats fetching.
Parallelize per-repository issue metric collection.
Adjust GitHub Actions workflow to cache npm deps, move token generation later, and run backend via npm workspaces (plus resulting package-lock.json updates).

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
`backend/src/fetchers/fetcher_utils.ts`	Introduces `mapWithConcurrency` helper for bounded parallel mapping.
`backend/src/fetchers/repository.ts`	Uses bounded concurrency + retry loop to fetch contributor stats more quickly.
`backend/src/fetchers/issues.ts`	Runs per-repo issue metrics collection with bounded concurrency and per-repo error logging.
`.github/workflows/nextjs.yml`	Adds npm caching, moves app-token creation later, and uses workspace-based backend install/run.
`package-lock.json`	Updates lockfile (notably includes large dependency bumps).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

giordano · 2026-04-13T11:03:51Z

-      - uses: actions/create-github-app-token@29824e69f54612133e76f7eaac726eef6c875baf # v2.2.1
-        id: generate_token
-        with:
-          app-id: ${{ secrets.APP_ID }}
-          private-key: ${{ secrets.APP_PRIVATE_KEY }}


Is the move to try and keep the key alive as long as possible (even if it's just a bunch of extra seconds)? (Edit: I just saw the human comment in the fist post) Here is a suggestion for automatically refreshing the key, for longer-running tasks, but it'd require some substantial changes here. Having an overall quicker workflow would be nicer though

Yeah, exactly.

Though if the dependencies change it can actually take ones of minutes to do the npm install.

This saving is essentially negligible compared to the parallelsation and non blocking concurrency.

Package updates can come in other Dependabot PRs.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

samcunliffe · 2026-04-13T13:37:16Z

+const CONTRIBUTORS_RETRY_DELAY_MS = 15_000;
+// GitHub's stats/contributors endpoint often returns 202 for several minutes
+// while repository statistics are being generated, so allow a longer retry window.
+const CONTRIBUTORS_MAX_RETRIES = 12;


Suggested change

const CONTRIBUTORS_RETRY_DELAY_MS = 15_000;

// GitHub's stats/contributors endpoint often returns 202 for several minutes

// while repository statistics are being generated, so allow a longer retry window.

const CONTRIBUTORS_MAX_RETRIES = 12;

// GitHub's stats/contributors endpoint often returns 202 for several minutes

// while repository statistics are being generated, so allow a longer retry window.

const CONTRIBUTORS_RETRY_DELAY_MS = 30_000;

const CONTRIBUTORS_MAX_RETRIES = 4;

@UCL/open-source-impact-seed

So, this business will likely need a bit of tweaking.

It might be nice to run with a lot more retries in the nightly cron but only a few when we're testing dependencies. I.e. have these numbers passed in as configuration parameters to the job.

For now though, this works. ~Some repos still return 202 for the contributor count. But these are not the "stellar" repos that are at the top of the showcase. So my feeling is: leave as is.

samcunliffe and others added 4 commits March 31, 2026 07:38

Run npm audit fix.

e89847b

Try adding a limit to the age range of e.g. issues.

4bc8326

Generate token right before collecting metrics.

dc5a3a0

Improve concurrency and use bounded parallelisation

28e347f

... to speed up data collection. AI-generated: GPT-5.3-Codex (via GitHub Copilot) Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

samcunliffe force-pushed the ai/easy-optimisations branch from 4c7a7a5 to 28e347f Compare April 13, 2026 10:37

samcunliffe added 2 commits April 13, 2026 11:37

Revert "Try adding a limit to the age range of e.g. issues."

67a3d1c

This reverts commit 4bc8326.

Add documentation to the worker pool concurrency getter.

8bca65f

samcunliffe commented Apr 13, 2026

View reviewed changes

samcunliffe marked this pull request as ready for review April 13, 2026 10:59

samcunliffe requested a review from a team as a code owner April 13, 2026 10:59

samcunliffe requested a review from Copilot April 13, 2026 11:00

samcunliffe self-assigned this Apr 13, 2026

samcunliffe added enhancement New feature or request javascript Pull requests that update javascript code labels Apr 13, 2026

Copilot started reviewing on behalf of samcunliffe April 13, 2026 11:00 View session

Copilot AI reviewed Apr 13, 2026

View reviewed changes

Comment thread backend/src/fetchers/repository.ts

Comment thread backend/src/fetchers/issues.ts

Comment thread backend/src/fetchers/fetcher_utils.ts

giordano reviewed Apr 13, 2026

View reviewed changes

samcunliffe commented Apr 13, 2026

View reviewed changes

Comment thread package-lock.json

samcunliffe and others added 3 commits April 13, 2026 14:16

Revert changes to packages for this PR.

d6aeeaf

Package updates can come in other Dependabot PRs.

Try 12 retries.

22e1f59

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Make sure the number of concurrent threads >=1 and is integer.

5c40a38

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

samcunliffe commented Apr 13, 2026

View reviewed changes

Revert 12 retries but wait a bit longer.

6696527

samcunliffe requested review from a team and giordano April 13, 2026 13:48

samcunliffe enabled auto-merge (squash) April 13, 2026 13:53

samcunliffe mentioned this pull request Apr 13, 2026

Run npm audit fix. #132

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Easy optimisations#141

Easy optimisations#141
samcunliffe wants to merge 10 commits into
mainfrom
ai/easy-optimisations

samcunliffe commented Apr 10, 2026 •

edited

Loading

Uh oh!

samcunliffe left a comment

Uh oh!

samcunliffe Apr 13, 2026 •

edited

Loading

Uh oh!

samcunliffe Apr 13, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

giordano Apr 13, 2026 •

edited

Loading

Uh oh!

samcunliffe Apr 13, 2026

Uh oh!

Uh oh!

samcunliffe Apr 13, 2026 •

edited

Loading

Uh oh!

samcunliffe Apr 13, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

samcunliffe commented Apr 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Solves

What

Why

Uh oh!

samcunliffe left a comment

Choose a reason for hiding this comment

Uh oh!

samcunliffe Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

samcunliffe Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

giordano Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

samcunliffe Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

samcunliffe Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

samcunliffe Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

samcunliffe commented Apr 10, 2026 •

edited

Loading

samcunliffe Apr 13, 2026 •

edited

Loading

giordano Apr 13, 2026 •

edited

Loading

samcunliffe Apr 13, 2026 •

edited

Loading

samcunliffe Apr 13, 2026 •

edited

Loading