Skip to content

Python: fix: prevent colliding message_id values in incremental compaction#5290

Open
LEDazzio01 wants to merge 2 commits intomicrosoft:mainfrom
LEDazzio01:fix/compaction-colliding-message-ids
Open

Python: fix: prevent colliding message_id values in incremental compaction#5290
LEDazzio01 wants to merge 2 commits intomicrosoft:mainfrom
LEDazzio01:fix/compaction-colliding-message-ids

Conversation

@LEDazzio01
Copy link
Copy Markdown
Contributor

Fixes

#5237_ensure_message_ids assigns colliding IDs when called incrementally via annotate_message_groups, breaking SummarizationStrategy and other group-ID-dependent consumers.

Root Cause

_ensure_message_ids assigns f"msg_{index}" where index is the position within the list it receives. However, annotate_message_groups calls group_messages(messages[start_index:]) on a slice, so the slice always starts at index 0. Successive incremental calls produce msg_0, msg_1, etc., colliding with IDs already assigned by earlier calls.

The issue reporter's reproduction shows:

  0: role=user      id='msg_0'
  1: role=assistant id='msg_1'
  2: role=user      id='msg_1'  ← collision!
  3: role=assistant id='msg_2'
  4: role=user      id='msg_1'  ← collision!

Fix

Move the _ensure_message_ids(messages) call from group_messages() (which operates on the slice) to annotate_message_groups() (which has access to the full list). By running it on the full list before slicing, positional indices are globally unique.

# annotate_message_groups():
+    _ensure_message_ids(messages)  # full list — globally unique indices
     spans = group_messages(messages[start_index:])

# group_messages():
-    _ensure_message_ids(messages)  # operated on slice — colliding indices
     spans: list[dict[str, Any]] = []

Test

Added test_incremental_annotation_produces_unique_message_ids which simulates a 4-turn conversation with incremental annotation (matching the CompactionProvider usage pattern) and asserts all message_id values are unique.

Move _ensure_message_ids() from group_messages() (which operates on a
slice) to annotate_message_groups() (which has access to the full list).

Previously, group_messages received messages[start_index:] — a slice
that always starts at index 0 — so _ensure_message_ids would assign
msg_0, msg_1, etc. on every incremental call, colliding with IDs from
earlier calls. This caused SummarizationStrategy to collapse distinct
groups into one because _group_id_for derives group_id from message_id.

By running _ensure_message_ids on the full list first, positional
indices are globally unique across incremental annotation calls.

Fixes microsoft#5237
Copilot AI review requested due to automatic review settings April 15, 2026 23:05
@github-actions github-actions bot changed the title fix: prevent colliding message_id values in incremental compaction Python: fix: prevent colliding message_id values in incremental compaction Apr 15, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a bug in Python compaction grouping where incremental (slice-based) annotation could assign colliding message_id values, breaking downstream group-ID-dependent logic (e.g., SummarizationStrategy).

Changes:

  • Moves _ensure_message_ids() invocation from group_messages() (slice scope) to annotate_message_groups() (full-list scope) to prevent ID collisions across incremental calls.
  • Updates group_messages() documentation to reflect new caller responsibility for ensuring message_id is set.
  • Adds a regression test that simulates incremental annotation over multiple turns and asserts message_id uniqueness.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
python/packages/core/agent_framework/_compaction.py Adjusts where message IDs are assigned to avoid collisions during incremental grouping/annotation.
python/packages/core/tests/core/test_compaction.py Adds a regression test ensuring incremental annotation produces unique message_id values.

Comment on lines +114 to +117
Note:
Callers must ensure ``message_id`` is set on each message before
calling this function. ``annotate_message_groups`` handles this
automatically via ``_ensure_message_ids``.
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

group_messages() no longer calls _ensure_message_ids(), which changes behavior for any callers that previously relied on group_messages() to populate missing message_id values. Since group_messages is exported from this module, consider preserving backward compatibility (e.g., keep ensuring IDs here with an offset-aware approach) or clearly documenting this as a breaking precondition change (and/or making group_messages private if it’s not intended for external use).

Copilot uses AI. Check for mistakes.
Comment on lines +446 to +448
# Assign message IDs over the *full* list before slicing to avoid
# index-based collisions across incremental calls (fixes #5237).
_ensure_message_ids(messages)
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling _ensure_message_ids(messages) on the full list on every incremental annotate_message_groups() invocation makes ID assignment O(n) per call and can become O(n²) over a growing history. Consider making _ensure_message_ids accept an index offset (or otherwise assign IDs based on absolute position) so you can run it only on the suffix being re-annotated (messages[start_index:]) while still guaranteeing globally unique IDs across incremental calls.

Suggested change
# Assign message IDs over the *full* list before slicing to avoid
# index-based collisions across incremental calls (fixes #5237).
_ensure_message_ids(messages)
# Preserve globally unique, index-based message IDs without re-scanning
# the entire history on every incremental suffix re-annotation.
if start_index == 0:
_ensure_message_ids(messages)
else:
for absolute_index, message in enumerate(messages[start_index:], start=start_index):
if _message_id(message) is None:
_write_message_id(message, str(absolute_index))

Copilot uses AI. Check for mistakes.
… ID assignment

1. Restore _ensure_message_ids() in group_messages() so external
   callers that relied on it keep working. Update docstring.

2. In annotate_message_groups(), replace full-list _ensure_message_ids()
   with offset-aware inline loop that only assigns IDs to the new
   suffix (messages[start_index:]) using enumerate(..., start=start_index).
   This keeps IDs globally unique while avoiding O(n²) re-scanning.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants