Fix/ingestion flow#159
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Warning Review limit reached
More reviews will be available in 27 minutes and 59 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThis PR migrates HMRC sponsor location data from HMRC-provided town/city fields to Companies House-derived location. The schema, ingestion, search API, matching adapters, and UI components are updated to source location from Companies House profiles via query-time joins while capturing sponsor licence numbers. ChangesHMRC Location Data Migration
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
apps/web/src/api/hmrc.ts (1)
220-228:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDon't cap stale-slug fallback to 10 raw licence rows.
The comment here talks about a handful of
(rating, route)groups, but this query still returns one row per licence. Any slug with more than 10 licences can drop the requested hash from the loader's containment scan and incorrectly 301 to a different record even though that hash exists again.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/web/src/api/hmrc.ts` around lines 220 - 228, The query against hmrcSkilledWorkers currently uses .limit(10) which caps raw licence rows and can drop the requested hash; remove the .limit(10) and instead return group-level results (or at minimum all matching rows) so the loader's containment scan won't miss the requested hash. Modify the db.select/from/where block that builds rows (the call using hmrcSkilledWorkers, eq(hmrcSkilledWorkers.nameSlug, slug), orderBy(asc(hmrcSkilledWorkers.hash))) to either remove the limit or replace it with a GROUP BY on the intended grouping columns (e.g., hmrcSkilledWorkers.rating and hmrcSkilledWorkers.route) and select a single representative per group (e.g., min/max hash) so you get "a handful" of groups rather than up to 10 arbitrary licence rows.
🧹 Nitpick comments (1)
docs/hmrc-csv-format-change.md (1)
96-96: ⚡ Quick winClarify or remove the "CLAUDE.md" reference.
The documentation references a "
CLAUDE.mdinvariant" but no such file appears in the provided context, coding guidelines, or review stack. This could confuse readers.If this refers to an internal document, consider either:
- Linking to it explicitly
- Replacing it with "coding guidelines" or "useCardMetrics sync requirement"
- Removing the reference and keeping just the technical constraint description
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/hmrc-csv-format-change.md` at line 96, The doc line references a mysterious "CLAUDE.md invariant" alongside `fixedHeight:62` and `useCardMetrics`; update this to avoid confusion by either (a) adding an explicit link to the CLAUDE.md file if it exists, (b) replacing "CLAUDE.md invariant" with a clearer phrase like "coding guidelines" or "useCardMetrics sync requirement", or (c) remove the file reference entirely and leave only the technical constraint (e.g., "`fixedHeight:62` — `useCardMetrics` must stay in sync"), ensuring the terms `fixedHeight:62` and `useCardMetrics` remain unchanged so the technical requirement is preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@apps/web/scripts/ingest-hmrc-csv.ts`:
- Around line 117-125: The ingestion currently dedupes solely by the hash
produced in computeHash(licence,typeRating,route), which can hide conflicting
values in other columns (e.g., organisation_name, sponsor_status); before
collapsing rows by hmrc_skilled_workers.hash, group incoming CSV rows by the
computed hash and compare all non-hash, persistent fields (at least
organisation_name and sponsor_status) for consistency; if any group contains
differing values, surface the conflict (fail the import or log and skip
according to policy) so you don’t silently retain arbitrary
organisation_name/sponsor_status, and only then dedupe/insert the canonical row
into the table or abort.
In `@apps/web/src/api/hmrc.ts`:
- Around line 54-58: The array_agg(distinct ...) expression for
sponsorLicenceNumbers is non-deterministic because it doesn't guarantee order;
change it to produce a deterministic ordered array by using ORDER BY inside the
aggregate (or an ordered subselect). Update the field that references
hmrcSkilledWorkers.sponsorLicenceNumber (the sponsorLicenceNumbers projection)
to use coalesce(array_agg(distinct ${hmrcSkilledWorkers.sponsorLicenceNumber}
ORDER BY ${hmrcSkilledWorkers.sponsorLicenceNumber}) FILTER (WHERE
${hmrcSkilledWorkers.sponsorLicenceNumber} IS NOT NULL), '{}') so the array
elements are consistently ordered, and make the same change for the other
occurrence mentioned (the block at lines ~154-157).
---
Outside diff comments:
In `@apps/web/src/api/hmrc.ts`:
- Around line 220-228: The query against hmrcSkilledWorkers currently uses
.limit(10) which caps raw licence rows and can drop the requested hash; remove
the .limit(10) and instead return group-level results (or at minimum all
matching rows) so the loader's containment scan won't miss the requested hash.
Modify the db.select/from/where block that builds rows (the call using
hmrcSkilledWorkers, eq(hmrcSkilledWorkers.nameSlug, slug),
orderBy(asc(hmrcSkilledWorkers.hash))) to either remove the limit or replace it
with a GROUP BY on the intended grouping columns (e.g.,
hmrcSkilledWorkers.rating and hmrcSkilledWorkers.route) and select a single
representative per group (e.g., min/max hash) so you get "a handful" of groups
rather than up to 10 arbitrary licence rows.
---
Nitpick comments:
In `@docs/hmrc-csv-format-change.md`:
- Line 96: The doc line references a mysterious "CLAUDE.md invariant" alongside
`fixedHeight:62` and `useCardMetrics`; update this to avoid confusion by either
(a) adding an explicit link to the CLAUDE.md file if it exists, (b) replacing
"CLAUDE.md invariant" with a clearer phrase like "coding guidelines" or
"useCardMetrics sync requirement", or (c) remove the file reference entirely and
leave only the technical constraint (e.g., "`fixedHeight:62` — `useCardMetrics`
must stay in sync"), ensuring the terms `fixedHeight:62` and `useCardMetrics`
remain unchanged so the technical requirement is preserved.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: ac48351c-94c0-45ec-b24a-db04fc6a6fab
📒 Files selected for processing (21)
apps/web/scripts/drain-review-queue.tsapps/web/scripts/generate-sitemap.tsapps/web/scripts/ingest-hmrc-csv.tsapps/web/scripts/seed-companies-house.tsapps/web/src/api/cache-headers.tsapps/web/src/api/companiesHouse.tsapps/web/src/api/hmrc.tsapps/web/src/components/HmrcCard.tsxapps/web/src/components/HmrcResults.tsxapps/web/src/components/McpTools.tsxapps/web/src/lib/phase5/sql.tsapps/web/src/routes/company.$id.$slug.tsxdocs/hmrc-csv-format-change.mdpackages/db/migrations/0025_add-sponsor-licence.sqlpackages/db/migrations/0026_drop-town-county.sqlpackages/db/migrations/0027_widen-sponsor-licence.sqlpackages/db/migrations/meta/0025_snapshot.jsonpackages/db/migrations/meta/0026_snapshot.jsonpackages/db/migrations/meta/0027_snapshot.jsonpackages/db/migrations/meta/_journal.jsonpackages/db/src/schema.ts
Summary by CodeRabbit
New Features
Improvements
Documentation