Skip to content

jira: add input component for streaming issues, comments, and changelog#4484

Open
squiidz wants to merge 7 commits into
mainfrom
jira-input
Open

jira: add input component for streaming issues, comments, and changelog#4484
squiidz wants to merge 7 commits into
mainfrom
jira-input

Conversation

@squiidz

@squiidz squiidz commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Adds a jira input that streams Jira REST API events into Connect pipelines via JQL with cursor-based incremental polling. The cursor (max issue.updated timestamp) is persisted to a cache resource so progress survives restarts.

Available in both the Cloud and Self-Hosted distributions. Authenticates via API token (email + token). Supports resource={issues,comments,changelog}. The existing jira processor (enrichment lookups) is unchanged.

Shared HTTP/auth setup is extracted into a new internal/impl/jira/jiraauth sub-package consumed by both the processor and the input.

Limitations (v1): no OAuth, no worklogs resource, per-issue comment and changelog pagination is limited to the first response page (truncation logged and counted via the jira_input_child_truncated_total metric).

Adds a `jira` input that streams Jira REST API events into Connect
pipelines via JQL with cursor-based incremental polling. The cursor
(max issue.updated timestamp) is persisted to a cache resource so
progress survives restarts.

Available in both the Cloud and Self-Hosted distributions. Authenticates
via API token (email + token). Supports resource={issues,comments,changelog}.
The existing `jira` processor (enrichment lookups) is unchanged.

Shared HTTP/auth setup is extracted into a new internal/impl/jira/jiraauth
sub-package consumed by both the processor and the input.

Limitations (v1): no OAuth, no worklogs resource, per-issue comment and
changelog pagination is limited to the first response page (truncation
logged and counted via the jira_input_child_truncated_total metric).
Comment on lines +564 to +568
if r.page.pageHasNack.Load() {
// don't advance cursor; reset for the refetch on next Read
r.page.reset()
return
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nacked messages can be permanently lost during multi-page pagination.

On nack, onPageDrained calls r.page.reset() and returns without advancing the cursor. The comment states the intent is to "reset for the refetch on next Read", but r.nextToken is not reset here. In fetchNextPage, r.nextToken was already set to the current page's NextPageToken (pointing at the next page). So the subsequent ReadfetchNextPage fetches the next page rather than re-fetching the nacked page.

Because issues are ordered updated ASC, later pages have higher updated values. Once a subsequent page is acked, the cursor advances past the nacked page's range, and on the next poll cycle the updated >= cursor predicate excludes those records entirely — the nacked messages are never re-read.

Consider resetting r.nextToken (and restarting pagination from the cursor) on the nack path, or otherwise ensuring the nacked page is actually re-fetched. See onPageDrained.

Comment thread internal/impl/jira/input_jira.go Outdated
Comment on lines +740 to +752
q.Set("jql", r.buildJQL())
q.Set("fields", strings.Join(r.cfg.fields, ","))
expand := r.cfg.expand
if r.cfg.resource == resourceChangelog {
expand = appendUnique(expand, "changelog")
}
if len(expand) > 0 {
q.Set("expand", strings.Join(expand, ","))
}
q.Set("maxResults", strconv.Itoa(r.cfg.pageSize))
if r.nextToken != "" {
q.Set("nextPageToken", r.nextToken)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JQL filter changes between paginated requests while nextPageToken is reused.

onPageDrained runs after every page's acks settle, so the cursor advances mid-pagination: page 1 is fetched with the old cursor, then after it is acked the cursor moves to page 1's maxUpdated, and fetchNextPage for page 2 rebuilds the query here with a new updated >= cursor predicate (via buildJQL) while still passing the previous page's nextPageToken.

Jira's /rest/api/3/search/jql cursor pagination expects the JQL to remain stable across the page sequence — the opaque nextPageToken encodes a position in the original result set. Changing the JQL while reusing the token is unsound and can fail or return inconsistent results for any query exceeding page_size. The existing test passes only because the mock server doesn't validate JQL/token consistency.

Consider advancing the cursor only after the full pagination run completes (when nextPageToken is empty), rather than after each individual page. See buildSearchURL.

// See the License for the specific language governing permissions and
// limitations under the License.

//go:build integration

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Integration tests in this project should not use build tags — the test patterns state: "Do not use build tags. Use integration.CheckSkip(t) at the start of every integration test function." This file already calls integration.CheckSkip(t), so the //go:build integration tag is both redundant and against convention. Consider removing it so the test is gated solely by CheckSkip.

@claude

claude Bot commented Jun 3, 2026

Copy link
Copy Markdown

Commits
LGTM — single commit jira: add input component for streaming issues, comments, and changelog. Message follows the system: message format (lowercase, imperative), is accurate, and the change is one self-contained logical unit (new component + its tests, docs, and shared jiraauth extraction).

Review
New jira input component. The implementation, config spec, registration, info.csv entry, public wrapper coverage, and license headers all look correct. Two correctness concerns in the cursor/pagination interaction and one test-convention note:

  1. Data loss on nack during multi-page paginationonPageDrained does not reset r.nextToken on the nack path, so the nacked page is skipped (the next fetch advances to the following page) and, as later pages advance the cursor, the nacked records fall below the updated >= cursor predicate and are never re-read. (input_jira.go#L563-L578)
  2. JQL changes mid-pagination while reusing nextPageToken — the cursor advances after each acked page, so page 2+ requests rebuild the JQL with a new updated >= predicate while reusing the prior page's opaque continuation token, which Jira's cursor pagination does not support. Advance the cursor only after the full pagination run completes. (input_jira.go#L734-L755)
  3. Integration test uses a build tag//go:build integration is redundant (the test already calls integration.CheckSkip(t)) and against the project test conventions. (input_jira_integration_test.go#L15)

onPageDrained was advancing the cursor after each acked page and not
resetting r.nextToken on nack. This caused two correctness bugs:

- Nacked records on a multi-page response were never re-read: the next
  fetch reused the next-page token, skipping past the nacked page, and
  a subsequent ack advanced the cursor past the nacked range entirely.

- The JQL predicate changed between paginated requests while the opaque
  nextPageToken was reused, violating Jira's cursor-pagination contract
  (JQL must remain stable across the token sequence).

Accumulate max issue.updated across the run and only advance the
persisted cursor when nextPageToken is empty. On nack, reset
r.nextToken and the run accumulator so the next fetch restarts the
pagination run from the current cursor.

Also drop the `//go:build integration` tag from the integration test
per project convention (gating via integration.CheckSkip is enough).
Comment on lines +415 to +428
func newReader(conf *service.ParsedConfig, mgr *service.Resources) (*reader, error) {
cfg, err := parseInputConfig(conf)
if err != nil {
return nil, err
}
return &reader{
cfg: cfg,
mgr: mgr,
log: mgr.Logger(),
page: &pageState{},
cacheSetErrors: mgr.Metrics().NewCounter("jira_input_cache_set_errors_total"),
childTruncated: mgr.Metrics().NewCounter("jira_input_child_truncated_total", "resource"),
}, nil
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new jira input constructor does not enforce an enterprise license, but its sibling jira processor in this same package gates on one as its very first step:

func newJiraProcessor(conf *service.ParsedConfig, mgr *service.Resources) (*jiraProcessor, error) {
	if err := license.CheckRunningEnterprise(mgr); err != nil {
		return nil, err
	}
	...

(see processor_jira.go#L109-L112)

Both components are classified identically in info.csv (certified, cloud=y) and live in the same package, yet newReader has no license.CheckRunningEnterprise(mgr) call. In the full self-hosted redpanda-connect binary this means the jira input runs without an enterprise license while the jira processor refuses to. If the Jira connector family is meant to be enterprise-gated, the input is missing this check; please confirm the intended licensing and add the check here if it should mirror the processor.

@claude

claude Bot commented Jun 4, 2026

Copy link
Copy Markdown

Commits
LGTM

Review
Reviewed the new jira input component (incremental JQL polling with cursor checkpointing), the shared jiraauth extraction, info.csv/docs/changelog, and tests. The cursor-advancement and ack/nack pagination logic (advance only when nextPageToken is empty, reset token+accumulator on nack) is carefully constructed and matches the fix described in the second commit. Registration is covered automatically via the existing public/components/jira wrapper (already in both all and cloud bundles).

  1. Missing enterprise license check in the input constructornewReader does not call license.CheckRunningEnterprise(mgr), while the sibling jira processor in the same package gates on it as its first step. Both are classified identically (certified, cloud=y), so in the full self-hosted binary the input would run without a license while the processor would not. Flagged inline on internal/impl/jira/input_jira.go — please confirm intended licensing.

@claude

claude Bot commented Jun 4, 2026

Copy link
Copy Markdown

Commits

LGTM

Review

Reviewed the new jira input: the page/ack draining concurrency model (close-once done channel, pre-loaded outstandingAcks, cursor advancement gated on end-of-run nextPageToken=="", and the nack path resetting nextToken/runMaxUpdated), cursor persistence + restart resumption, the /myself auth check, enterprise-license gating, metric label arity, distribution classification (info.csv cloud=y plus the runtime enterprise gate matches the sibling processor), and the JQL/secret/injection surfaces. The implementation is correct, follows the package's existing conventions, and is well covered by unit + integration tests.

LGTM

squiidz added 2 commits June 12, 2026 08:08
The `updated >=` cursor predicate re-matches boundary issues on every
poll, so an idle project re-emitted them once per poll interval forever.
The cursor now carries a pruned seen-set of emitted issue versions and
suppresses duplicates while still emitting genuinely newer updates.

Progress is persisted after every fully-acked page (the in-flight run's
JQL is frozen so Jira's token/JQL pairing stays stable), making large
backfills resumable mid-run. The jira processor is marked deprecated in
favour of the input, and the input's version stamp is fixed to 4.96.0.
@claude

claude Bot commented Jun 15, 2026

Copy link
Copy Markdown

Commits

Message format and quality are good across all commits (system: imperative form, lowercase, descriptive bodies). One granularity issue:

  1. a8dda18 ("jira: dedup boundary issues, checkpoint per page, deprecate processor") bundles unrelated work. The input cursor changes (dedup boundary issues, per-page checkpoint) are mixed with deprecating the separate jira processor (processor_jira.go + processors/jira.adoc). Per the granularity policy a change to a different component belongs in its own commit.

(The iterative fix commits 29ef5e6/91c2496 touch code introduced earlier in the same PR; ideally squashed, but each is self-contained with a clear message, so not blocking.)

Review

New jira input: JQL-based incremental polling with cursor checkpointing to a cache resource, plus a shared jiraauth sub-package extracted from the now-deprecated processor.

LGTM — no high-signal issues found.

  • The cursor / pagination / ack concurrency model is carefully synchronized: outstandingAcks is preloaded in load(), done is closed exactly once, onPageDrained is runMu-guarded and idempotent across the Read- and Close-side calls, and the allDispatched() guard prevents Close hanging on acks that will never fire. Nack-restart-from-cursor, boundary dedup, mid-run checkpointing, and v1-cursor compatibility are all covered by tests.
  • License headers (Apache 2.0, matching the sibling processor_jira.go), bundle wiring (the input rides along the existing public/components/jira wrapper already present in the all and cloud bundles), and info.csv classification (input deprecated=n, processor flipped to deprecated=y, both cloud=y) are consistent.
  • Validation moved from the processor into jiraauth.BuildClient preserves the prior empty-username/empty-token error behavior.

@claude

claude Bot commented Jun 16, 2026

Copy link
Copy Markdown

Commits

  1. Commit d71f97ftest fix is a vague, non-conforming message. It does not match any allowed format (no system: message prefix, not sentence-case, imperative-vague) and gives no signal about what changed. It reads like an unsquashed fixup and should be squashed into the relevant commit (e.g. the cursor/dedup work) with a descriptive jira: ... message.

The remaining jira: ... commits follow the convention and have clear, accurate messages.

Review

Reviewed the new jira input, the jiraauth extraction shared with the now-deprecated jira processor, docs/info.csv/config changes, and the test suite.

Checked in particular: the ack/done-channel page lifecycle and runMu/curMu lock ordering (no deadlock, single-close invariant holds), nack-restart resetting nextToken, frozen runJQL keeping Jira cursor pagination valid across mid-run checkpoints, seen-set dedup + prune for boundary issues, the validation moved into jiraauth.BuildClient (empty username/api_token still rejected, behavior preserved), Apache-2.0 license headers consistent with sibling files, enterprise gating matching the processor, registration via the existing public/components/jira wrapper, and info.csv column correctness (processor deprecated=y, input cloud=y). Coverage is thorough.

LGTM

@Jeffail Jeffail left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quite impressed with my claude, looks like it caught a certified doozy, theres also two minor nits that should probably be looked at

🤖 ─────── my bot's two cents ───────

Thank you for this — it is a genuinely impressive piece of work, and the concurrency model and test coverage in particular are excellent. One correctness concern that I believe warrants a change before merge, plus two minor points:

1. (Blocking) The incremental cursor predicate is rendered in UTC, but JQL interprets date literals in the requesting account's timezone.

In buildJQL, the threshold is formatted as a UTC wall-clock string:

threshold := cur.Updated.Add(-r.cfg.cursorOverlap)
parts = append(parts, fmt.Sprintf(`updated >= "%s"`, threshold.UTC().Format("2006-01-02 15:04")))

Jira evaluates JQL date/time literals relative to the profile timezone of the account making the request, not UTC, and JQL has no syntax to attach an explicit offset to a date literal. The effective threshold instant is therefore shifted by the account's timezone offset:

  • For an account behind UTC (e.g. any of the Americas), the effective threshold lands later than intended. Issues whose updated falls within that gap drop below the predicate on the subsequent poll and are never re-matched — a silent data-loss window that the default 60s overlap cannot absorb.
  • For an account ahead of UTC, the threshold lands earlier, producing a wider window that the seen-set dedup absorbs harmlessly (at the cost of some redundant fetching).

Conveniently, the fix is already within reach: Connect calls /rest/api/3/myself for auth validation and currently discards the body. That response includes a timeZone field, so the threshold could be formatted in the account's location (threshold.In(accountLoc).Format(...)) to match Jira's interpretation exactly. If reading the account timezone is undesirable for any reason, the minimum bar would be to document prominently that the token's account must be configured to UTC — but deriving it from the /myself response already in hand seems both more robust and only marginally more code.

References for the JQL timezone behaviour:

2. (Minor) base_url with a trailing slash produces a double slash.

Every request URL is constructed as BaseURL + "/rest/api/3/...". A base_url configured with a trailing / yields //rest/.... Jira most likely tolerates this, but trimming a trailing slash at parse time would be a tidy, defensive guard.

3. (Minor) The empty-page-with-continuation-token path has no backoff.

In Read, an empty page that still carries a nextPageToken falls through to continue with no pause. It is network-bound rather than a CPU spin (each iteration issues an HTTP request), so it is not dangerous, but a server that persistently returns empty pages alongside a continuation token would be polled in a tight request loop. A small guard there would be prudent.

Everything else looks great — happy to re-review once the timezone handling is addressed.

Jira evaluates JQL date literals in the requesting account's timezone, not
UTC, and there is no syntax to attach an offset to a literal. Rendering the
`updated >=` threshold in UTC therefore shifted the effective instant by the
account's offset; for accounts behind UTC this opened a silent data-loss
window the default overlap could not absorb. Connect already calls /myself for
auth validation, so read its `timeZone` field and render the threshold in that
location (falling back to UTC with a warning when absent/unknown).

Also addresses two minor review nits:
- trim a trailing slash from base_url so request URLs don't get a double slash.
- back off briefly when the server returns an empty page that still carries a
  continuation token, so a misbehaving server cannot drive a tight request
  loop; fully-deduped issue-bearing pages still chain immediately.
@claude

claude Bot commented Jun 19, 2026

Copy link
Copy Markdown

Commits

  1. Commit test fix (d71f97f) violates the message policy: it is both vague (gives no indication of what was fixed) and does not match the required format. It should be scoped and imperative, e.g. jira: fix flaky input test.
  2. Minor (granularity): jira: dedup boundary issues, checkpoint per page, deprecate processor (a8dda18) bundles the processor deprecation — a distinct logical change — with the input dedup/checkpoint work. jira: render JQL cursor predicate in the account timezone (6392dee) similarly folds two unrelated review nits (trailing-slash trim, empty-page backoff) into a timezone feature commit. Consider splitting in future PRs. The remaining commit messages are well-formed.

Review

Reviewed the new jira input, the extracted jiraauth package, the processor deprecation, docs, and info.csv. I traced the ack/pagination/cursor-advancement concurrency, the dedup seen-set + prune-window math, license headers, enterprise gating, and registration.

LGTM — no high-signal correctness, security, or pattern issues found. The cursor handling (freeze JQL per pagination run, advance only on fully-acked pages, reset token on nack, prune the seen-set with a minute-of-slack to match JQL truncation) is carefully reasoned and well covered by unit + integration tests. License headers (Apache 2.0) and string-literal field names are consistent with the existing jira package, enterprise gating matches the sibling processor, and info.csv is updated correctly (input added, processor marked deprecated).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants