
fix: classify merged gitlab MRs as MERGE_REQUEST_MERGED (CM-1298)#4271

Merged
joanagmaia merged 5 commits into main from fix/CM-1298-gitlab-merged-mr-type
Jun 26, 2026


@joanagmaia joanagmaia commented Jun 26, 2026


Summary

  • Fix: GitLab API ingestion was emitting MERGE_REQUEST_CLOSED for merge requests with a merged_at timestamp, so MERGE_REQUEST_MERGED activities never landed in the database. Switches the API path to emit MERGE_REQUEST_MERGED for merged MRs; MERGE_REQUEST_CLOSED continues to cover MRs closed without merging. Webhook path was already correct.
  • Cleanup tooling: Generalises the gerrit one-off cleanup script (renamed cleanup-gerrit-activities.ts → cleanup-activities-by-platform-and-type.ts) so it accepts --platform, --types, and optional --before CLI args. We'll use it to purge the mislabeled gitlab merge_request-closed rows in Postgres and Tinybird so re-ingestion can recreate them with the correct type.
  • Cleanup safety hardening (ports improvements from #4080, "chore: expand gerrit delete activities script", which this PR supersedes):
    • Pre-flight Tinybird count() on both datasources before any destructive action — shows blast radius.
    • Interactive confirmation prompt with --yes / -y bypass for non-interactive runs.
    • Split filter builders: Postgres (pg-promise params, "updatedAt" double-quoted) vs Tinybird (unquoted ClickHouse). PG cleanup now uses a direct chunked DELETE driven by the PG filter itself, decoupled from Tinybird query throughput.
    • Tinybird wait timeout extended from 1h → 6h for large bulk deletes.
    • Result JSON persisted immediately after triggering Tinybird jobs so job IDs survive a wait timeout.
    • Docstring note that derived materialized views are not cascaded by raw datasource deletes.
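A minimal sketch of the split filter builders described above. It assumes column names `platform`, `type`, and `segmentId` (only `"updatedAt"` is named in this PR) and pg-promise named-parameter syntax (`$(name)`, with `:csv` expanding an array). The Tinybird side interpolates values into the SQL text, so it escapes them as ClickHouse string literals. `buildPgFilter`, `buildTinybirdFilter`, and `chString` are hypothetical helper names, not the script's actual functions.

```typescript
// Hypothetical filter inputs; column names other than "updatedAt" are assumed.
interface ActivityFilter {
  platform: string
  types: string[]
  segmentId?: string
  before?: Date
}

// Postgres: WHERE clause with pg-promise named parameters, plus the values
// object passed alongside it. Camel-case identifiers must be double-quoted.
function buildPgFilter(f: ActivityFilter): { where: string; params: Record<string, unknown> } {
  const clauses = ['platform = $(platform)', 'type IN ($(types:csv))']
  const params: Record<string, unknown> = { platform: f.platform, types: f.types }
  if (f.segmentId) {
    clauses.push('"segmentId" = $(segmentId)')
    params.segmentId = f.segmentId
  }
  if (f.before) {
    clauses.push('"updatedAt" < $(before)')
    params.before = f.before
  }
  return { where: clauses.join(' AND '), params }
}

// ClickHouse string literal: escape backslashes first, then single quotes.
function chString(value: string): string {
  return `'${value.replace(/\\/g, '\\\\').replace(/'/g, "\\'")}'`
}

// Tinybird (ClickHouse): values interpolated into the SQL, identifiers unquoted.
function buildTinybirdFilter(f: ActivityFilter): string {
  const clauses = [
    `platform = ${chString(f.platform)}`,
    `type IN (${f.types.map(chString).join(', ')})`,
  ]
  if (f.segmentId) clauses.push(`segmentId = ${chString(f.segmentId)}`)
  if (f.before) {
    clauses.push(`updatedAt < parseDateTime64BestEffort(${chString(f.before.toISOString())})`)
  }
  return clauses.join(' AND ')
}
```

The same Tinybird clause could back the pre-flight check (for example `SELECT count() FROM activities WHERE <clause>`) so the operator sees the blast radius before any delete is issued.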

Fixes CM-1298.
Supersedes #4080.

Test plan

  • Lint + tsc pass for services/libs/integrations and services/apps/script_executor_worker.
  • Run the cleanup script in --dry-run mode against staging and confirm the Tinybird row counts match expectations.
  • Run cleanup against staging (interactive confirmation), then trigger a GitLab integration run and verify MERGE_REQUEST_MERGED activities appear with timestamp matching merged_at.
  • Confirm MRs closed without merging still produce MERGE_REQUEST_CLOSED.

🤖 Generated with Claude Code


Note

Medium Risk
The GitLab type fix is low risk; the new operational script performs irreversible bulk deletes in Postgres and Tinybird, so mis-specified filters or skipped confirmation could cause significant data loss.

Overview
GitLab ingestion fix: API polling in processStream.ts now emits MERGE_REQUEST_MERGED when a merge request has merged_at, instead of mislabeling those rows as MERGE_REQUEST_CLOSED. MRs with only closed_at still use MERGE_REQUEST_CLOSED, aligning the API path with existing webhook handling and processData parsers.
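A minimal TypeScript sketch of the classification rule described above. `terminalActivity` and `MrActivityType` are hypothetical names, not the code in processStream.ts, and the `'merge_request-merged'` string is assumed (only `'merge_request-closed'` appears in this PR). The else-if gating reflects the double-emission concern raised in Copilot's suppressed comment on this PR.

```typescript
// Hypothetical stand-in for the relevant GitlabActivityType members.
// 'merge_request-closed' appears in this PR; 'merge_request-merged' is assumed.
const MrActivityType = {
  MERGE_REQUEST_CLOSED: 'merge_request-closed',
  MERGE_REQUEST_MERGED: 'merge_request-merged',
} as const
type MrActivityType = (typeof MrActivityType)[keyof typeof MrActivityType]

// Subset of a GitLab merge request payload: merged_at / closed_at are
// ISO-8601 timestamps, or null/absent when the event has not happened.
interface MergeRequestData {
  merged_at?: string | null
  closed_at?: string | null
}

interface TerminalActivity {
  type: MrActivityType
  timestamp: string
}

// Picks the single terminal activity for an MR. merged_at takes precedence,
// and the else-if keeps CLOSED for MRs closed without merging, so a merged MR
// that also carries closed_at does not emit both types.
function terminalActivity(mr: MergeRequestData): TerminalActivity | null {
  if (mr.merged_at) {
    return { type: MrActivityType.MERGE_REQUEST_MERGED, timestamp: mr.merged_at }
  } else if (mr.closed_at) {
    return { type: MrActivityType.MERGE_REQUEST_CLOSED, timestamp: mr.closed_at }
  }
  return null // MR is still open
}
```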

Cleanup tooling: The hard-coded Gerrit cleanup script is removed and replaced by cleanup-activities-by-platform-and-type.ts, wired via a new pnpm script. Operators pass --platform, --types, and optional --segment-id, --before, --dry-run, --yes, and --tb-token to purge matching rows from Postgres activityRelations and Tinybird activities / activityRelations.
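A sketch of how the listed flags might be parsed and validated. Only the flag names come from this PR; `parseCleanupArgs`, the `--flag value` syntax, the comma-separated `--types` format, and the validation rules are assumptions for illustration.

```typescript
// Hypothetical parsed options; flag names from this PR, rules assumed.
interface CleanupOptions {
  platform: string
  types: string[]
  segmentId?: string
  before?: Date
  dryRun: boolean
  yes: boolean
  tbToken?: string
}

function parseCleanupArgs(argv: string[]): CleanupOptions {
  const opts: CleanupOptions = { platform: '', types: [], dryRun: false, yes: false }

  // Returns the value after the flag at index i, rejecting a missing or flag-like value.
  const valueAt = (i: number): string => {
    const v = argv[i + 1]
    if (v === undefined || v.startsWith('-')) throw new Error(`${argv[i]} requires a value`)
    return v
  }

  for (let i = 0; i < argv.length; i++) {
    switch (argv[i]) {
      case '--platform':
        opts.platform = valueAt(i++) // i++ skips past the consumed value
        break
      case '--types':
        // Comma-separated list, e.g. "merge_request-closed,merge_request-opened"
        opts.types = valueAt(i++).split(',').map((t) => t.trim()).filter(Boolean)
        break
      case '--segment-id':
        opts.segmentId = valueAt(i++)
        break
      case '--tb-token':
        opts.tbToken = valueAt(i++)
        break
      case '--before': {
        const raw = valueAt(i++)
        const d = new Date(raw)
        if (Number.isNaN(d.getTime())) throw new Error(`invalid --before date: ${raw}`)
        opts.before = d
        break
      }
      case '--dry-run':
        opts.dryRun = true
        break
      case '--yes':
      case '-y':
        opts.yes = true
        break
      default:
        throw new Error(`unknown argument: ${argv[i]}`)
    }
  }

  if (!opts.platform) throw new Error('--platform is required')
  if (opts.types.length === 0) throw new Error('--types must name at least one activity type')
  return opts
}
```

For example, `parseCleanupArgs(['--platform', 'gitlab', '--types', 'merge_request-closed', '--dry-run'])` describes a dry run that targets only GitLab `merge_request-closed` rows.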

Safety and behavior changes in the new script: Pre-flight Tinybird count() on both datasources, interactive confirmation (skippable with --yes), validated CLI inputs, separate Postgres (parameterized) vs Tinybird (interpolated) filter builders, Postgres deletes via chunked ID fetch/delete on the PG filter (not Tinybird-driven batches), 6h Tinybird job wait with results JSON written before the wait so job IDs survive timeouts, and documentation that raw datasource deletes do not cascade to materialized views.
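The chunked ID fetch/delete driven by the PG filter can be sketched as follows. This assumes a pg-promise-style handle exposing `any` and `result`, a table named `"activityRelations"` with an `id` primary key, and a `where`/`params` pair like the one a PG filter builder would produce. `chunkedDelete` and `PgLike` are hypothetical names, not the script's actual code.

```typescript
// Minimal subset of a pg-promise-style handle (assumed interface): `any`
// returns rows, `result` returns an object carrying rowCount.
interface PgLike {
  any<T>(query: string, params: Record<string, unknown>): Promise<T[]>
  result(query: string, params: Record<string, unknown>): Promise<{ rowCount: number }>
}

// Deletes rows matching the PG filter in chunks: select a page of primary
// keys, delete exactly those keys, repeat until the filter matches nothing.
// The table name and the "id" key column are assumptions for illustration.
async function chunkedDelete(
  db: PgLike,
  where: string,
  params: Record<string, unknown>,
  chunkSize = 5000,
): Promise<number> {
  let total = 0
  for (;;) {
    const rows = await db.any<{ id: string }>(
      `SELECT id FROM "activityRelations" WHERE ${where} LIMIT $(chunkSize)`,
      { ...params, chunkSize },
    )
    if (rows.length === 0) return total
    const { rowCount } = await db.result(
      'DELETE FROM "activityRelations" WHERE id IN ($(ids:csv))',
      { ids: rows.map((r) => r.id) },
    )
    // Guard against looping forever if selected rows cannot be deleted.
    if (rowCount === 0) throw new Error('selected rows were not deleted; aborting')
    total += rowCount
  }
}
```

Because each chunk is re-selected from Postgres, throughput depends only on Postgres, which matches the stated goal of decoupling PG cleanup from Tinybird query speed.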

Reviewed by Cursor Bugbot for commit 24e4b7e. Bugbot is set up for automated code reviews on this repo.

API ingestion was emitting MERGE_REQUEST_CLOSED for merge requests with a
merged_at timestamp, so MERGE_REQUEST_MERGED activities never landed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
Copilot AI review requested due to automatic review settings June 26, 2026 12:03

Copilot AI left a comment


Pull request overview

This PR fixes GitLab API stream ingestion classification so merge requests that have a merged_at timestamp emit MERGE_REQUEST_MERGED (instead of incorrectly emitting MERGE_REQUEST_CLOSED), aligning API-stream behavior with the existing merged-handling in processData and webhook ingestion.

Changes:

  • Update handleMergeRequestsStream to emit GitlabActivityType.MERGE_REQUEST_MERGED when item.data.merged_at is present.
Comments suppressed due to low confidence (1)

services/libs/integrations/src/integrations/gitlab/processStream.ts:199

  • Merged merge requests typically have both merged_at and closed_at set. With this change, merged MRs will emit MERGE_REQUEST_MERGED and then also fall through to the closed_at branch, producing an extra MERGE_REQUEST_CLOSED activity. If MERGE_REQUEST_CLOSED is meant to represent “closed without merging”, the closed branch should be else if (or otherwise gated) so it doesn’t run for merged MRs.
        type: GitlabActivityType.MERGE_REQUEST_MERGED,
        projectId: data.projectId,
        pathWithNamespace: data.pathWithNamespace,
      })
    }


Copilot AI review requested due to automatic review settings June 26, 2026 12:06

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

Comment thread services/libs/integrations/src/integrations/gitlab/processStream.ts
…M-1298)

Generalize the gerrit one-off cleanup so it can target any platform and any
set of activity types via --platform / --types / --before CLI args. Used to
purge mislabeled gitlab merge_request-closed rows so re-ingestion can recreate
them with the correct type.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
@joanagmaia joanagmaia force-pushed the fix/CM-1298-gitlab-merged-mr-type branch from b2a7c4a to f408ec7 June 26, 2026 12:22
@joanagmaia joanagmaia requested a review from mbani01 June 26, 2026 12:27
…ect PG delete (CM-1298)

Ports safety improvements proposed in #4080 onto the parameterized
script:

- Split into separate Tinybird (unquoted) and Postgres (pg-promise params,
  "updatedAt" double-quoted) filter builders so each store gets its own
  dialect.
- Add cheap pre-flight Tinybird count() on both datasources to show blast
  radius before any destructive action.
- Interactive confirmation prompt with --yes / -y bypass for
  non-interactive runs.
- Switch PG cleanup to direct chunked DELETE (fetch matching IDs from
  PG, delete by PK) rather than streaming IDs from Tinybird. Decouples PG
  cleanup throughput from Tinybird.
- Extend Tinybird job wait timeout from 1h to 6h for large bulk deletes.
- Persist result JSON immediately after triggering TB jobs so the job
  IDs survive a wait timeout.
- Docstring note that derived MVs are not cascaded by raw datasource
  deletes.

Supersedes #4080.

Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
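The two timing-related bullets above (6h wait, result JSON persisted before waiting) can be illustrated with a small sketch. `persistThenWait`, `save`, `isJobDone`, and the `CleanupResult` shape are hypothetical; the real script's Tinybird job polling and file writing are not shown on this page. The clock and sleep are injectable so the logic can be exercised without actually waiting.

```typescript
// Hypothetical result record written to the results JSON.
interface CleanupResult {
  platform: string
  types: string[]
  tinybirdJobIds: string[]
  status: 'triggered' | 'completed' | 'timed_out'
}

const SIX_HOURS_MS = 6 * 60 * 60 * 1000

// Saves the result before waiting so job IDs survive a timeout, then polls
// each job until all are done or the deadline passes.
async function persistThenWait(
  result: CleanupResult,
  save: (r: CleanupResult) => Promise<void>,
  isJobDone: (jobId: string) => Promise<boolean>,
  opts: { timeoutMs?: number; pollMs?: number; now?: () => number; sleep?: (ms: number) => Promise<void> } = {},
): Promise<CleanupResult> {
  const timeoutMs = opts.timeoutMs ?? SIX_HOURS_MS
  const pollMs = opts.pollMs ?? 30_000
  const now = opts.now ?? Date.now
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)))

  await save(result) // persisted first: a timeout below cannot lose the job IDs
  const deadline = now() + timeoutMs
  const pending = new Set(result.tinybirdJobIds)

  while (pending.size > 0) {
    for (const id of [...pending]) {
      if (await isJobDone(id)) pending.delete(id)
    }
    if (pending.size === 0) break
    if (now() >= deadline) {
      const timedOut: CleanupResult = { ...result, status: 'timed_out' }
      await save(timedOut)
      return timedOut
    }
    await sleep(pollMs)
  }

  const done: CleanupResult = { ...result, status: 'completed' }
  await save(done)
  return done
}
```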
Copilot AI review requested due to automatic review settings June 26, 2026 12:41

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 6c07a9c.

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 4 changed files in this pull request and generated no new comments.

mbani01 previously approved these changes Jun 26, 2026
Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
@joanagmaia joanagmaia merged commit bb97951 into main Jun 26, 2026
18 checks passed
@joanagmaia joanagmaia deleted the fix/CM-1298-gitlab-merged-mr-type branch June 26, 2026 15:01
