Skip to content

Conversation

@devin-ai-integration
Copy link
Contributor

@devin-ai-integration devin-ai-integration bot commented Dec 3, 2025

Summary

Fixes a bug where ConcurrentCursor._get_concurrent_state() was using slices[0] (the earliest slice) instead of slices[-1] (the most recent slice) when extracting the cursor value from partitioned state.

Since merge_intervals() sorts slices in ascending order by (START_KEY, END_KEY), using the first slice caused should_be_synced() to use the earliest timestamp as the lower bound, admitting ALL records instead of only new records. This made incremental syncs behave like full refreshes.

Impact: This bug affected all low-code connectors using DatetimeBasedCursor with is_client_side_incremental: true, including Notion's pages, databases, comments, and blocks streams.

Fixes: airbytehq/oncall#8854

Review & Testing Checklist for Human

  • Verify slice ordering assumption: Confirm that merge_intervals() always produces slices sorted in ascending order by (START_KEY, END_KEY) - the fix relies on this invariant
  • Check for side effects on slice generation: The _get_concurrent_state() return value is also used in stream_slices() for gap detection - verify this change doesn't break slice generation logic
  • End-to-end test with Notion connector: Run an incremental sync with the Notion connector (pages stream) with existing state to verify only new records are emitted on subsequent syncs

Recommended test plan:

  1. Set up a Notion connection with pages stream in incremental mode
  2. Run initial sync, note record count
  3. Run second sync without modifying any Notion pages
  4. Verify second sync emits 0 records (or only records modified since first sync)

Updates since last revision

  • Added unit test test_given_partitioned_state_with_multiple_slices_when_should_be_synced_then_use_upper_boundary_of_last_slice_to_filter to verify the fix
  • All 42 cursor tests pass locally

Notes

The _get_concurrent_state() method was using slices[0] (the earliest slice)
instead of slices[-1] (the most recent slice) when extracting the cursor
value from partitioned state. Since merge_intervals() sorts slices in
ascending order by (START_KEY, END_KEY), using the first slice caused
should_be_synced() to use the earliest timestamp as the lower bound,
admitting ALL records instead of only new records.

This bug affected all low-code connectors using DatetimeBasedCursor with
is_client_side_incremental: true, including Notion's pages, databases,
comments, and blocks streams.

Fixes: airbytehq/oncall#8854
Co-Authored-By: unknown <>
@devin-ai-integration
Copy link
Contributor Author

Original prompt from API User
Comment from @agarctfi: /ai-fix\n\nIMPORTANT: The user will expect a response posted back to the PR. You should post exactly one comment back to the respective issue PR. If the user requested a code change or PR, your comment should contain a link to the PR. Assume the user has no access to your session or conversation thread unless/until you respond back to them.\n\nIssue #8854 by @jnr0790: Python L3: Notion - `pages` stream ignoring connection state\n\nIssue URL: https://github.com/airbytehq/oncall/issues/8854\n\nPlease use playbook macro: !issue_fix

PLAYBOOK_md:
# `/ai-fix` Slash Command Playbook

You are AI Fix Devin, an expert at reproducing and fixing Airbyte-related issues.

## Context

You are working on the issue linked above in context. You will also need to pull issue comments for full context.

## Rule: Immediate Issue Comment After PR Creation

**MANDATORY REQUIREMENT**: If you create a PR during an AI Fix workflow, your **first action** after creating the PR must be to create a comment on the originating issue. If you cannot create a PR, likewise, your action should be to comment back to the issue.

## Properly note breaking changes

Types of breaking changes:

- spec change
  - a spec field is removed or renamed.
  - a new required spec field is added.
- schema change
  - a field is removed or renamed, or, the datatype is changed.
- stream or property removal
  - a stream is removed.
- state changes
  - the format of the state is changed.

Consult development guides for how to document in the metadata that a change is breaking (if so), and try to avoid breaking changes where necessary. Appropriate updates will also need to be made to the docs changelog and migration guide. Refer to the [Managing Breaking Changes in Connectors](https://docs.airbyte.com/platform/next/connector-development/connector-breaking-changes) documentation to understand what types of changes are considered "breaking" and how to handle these types of changes via connector version... (3878 chars truncated...)

@devin-ai-integration
Copy link
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@github-actions github-actions bot added the bug Something isn't working label Dec 3, 2025
@github-actions
Copy link

github-actions bot commented Dec 3, 2025

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1764803847-fix-client-side-incremental-state#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1764803847-fix-client-side-incremental-state

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /prerelease - Triggers a prerelease publish with default arguments
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment

📝 Edit this welcome message.

…ental filtering

Updated test_given_partitioned_state_with_multiple_slices_when_should_be_synced
to verify that the LAST slice's end value is used for filtering, not the first.
This reflects the correct behavior for client-side incremental filtering.

Co-Authored-By: unknown <>
@devin-ai-integration devin-ai-integration bot changed the title fix(concurrent): use last slice for client-side incremental filtering (do not merge) fix(concurrent): use last slice for client-side incremental filtering Dec 3, 2025
@github-actions
Copy link

github-actions bot commented Dec 3, 2025

PyTest Results (Fast)

3 818 tests  ±0   3 806 ✅ ±0   6m 30s ⏱️ +15s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit ba87ec2. ± Comparison against base commit daf7d48.

This pull request removes 1 and adds 1 tests. Note that renamed tests count towards both.
unit_tests.sources.streams.concurrent.test_cursor ‑ test_given_partitioned_state_with_multiple_slices_when_should_be_synced_then_use_upper_boundary_of_first_slice_to_filter
unit_tests.sources.streams.concurrent.test_cursor ‑ test_given_partitioned_state_with_multiple_slices_when_should_be_synced_then_use_upper_boundary_of_last_slice_to_filter

@github-actions
Copy link

github-actions bot commented Dec 3, 2025

PyTest Results (Full)

3 821 tests  ±0   3 808 ✅  - 1   10m 59s ⏱️ +6s
    1 suites ±0      12 💤 ±0 
    1 files   ±0       1 ❌ +1 

For more details on these failures, see this check.

Results for commit ba87ec2. ± Comparison against base commit daf7d48.

This pull request removes 1 and adds 1 tests. Note that renamed tests count towards both.
unit_tests.sources.streams.concurrent.test_cursor ‑ test_given_partitioned_state_with_multiple_slices_when_should_be_synced_then_use_upper_boundary_of_first_slice_to_filter
unit_tests.sources.streams.concurrent.test_cursor ‑ test_given_partitioned_state_with_multiple_slices_when_should_be_synced_then_use_upper_boundary_of_last_slice_to_filter

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a critical bug in the concurrent cursor's client-side incremental filtering logic. The bug caused incremental syncs to behave like full refreshes by using the earliest slice instead of the most recent slice when determining which records to sync.

Key Changes:

  • Fixed ConcurrentCursor._get_concurrent_state() to use slices[-1] (most recent) instead of slices[0] (earliest) for extracting cursor values
  • Updated test to correctly validate that only records at or after the last slice's end are synced
  • Added clarifying comments about slice ordering guarantees from merge_intervals()

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
airbyte_cdk/sources/streams/concurrent/cursor.py Fixed cursor value extraction to use last slice instead of first slice, with added documentation about slice ordering
unit_tests/sources/streams/concurrent/test_cursor.py Updated test name, logic, and assertions to correctly validate filtering behavior with multiple slices

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants