Fix #13882: Decrement vocab.length when memory_zone clears transient …#13931
Open
Dzhud wants to merge 1 commit into explosion:master
Fix #13882: Decrement vocab.length when memory_zone clears transient lexemes
Description
This PR fixes issue #13882, where the `Vocab.length` counter was incremented when adding lexemes but never decremented when `memory_zone` cleared transient lexemes. As a result, `len(vocab)` grew continuously even though the lexemes themselves were properly removed from the internal hash map, which made it unreliable for monitoring `memory_zone` effectiveness in production environments.

Changes:
- `spacy/vocab.pyx`: Enhanced `_clear_transient_orths()` to track and decrement `self.length` by the number of cleared lexemes, with a NULL check for edge cases
- `spacy/tests/vocab_vectors/test_memory_zone.py`:
  - `test_memory_zone_vocab_length_decremented`: Verifies a single `memory_zone` cycle
  - `test_memory_zone_multiple_cycles`: Verifies multiple cycles
  - `@pytest.mark.issue(13882)`

Testing:
All tests pass successfully:
- `len(vocab)` now correctly decrements
- Formatted with `black` and passes `flake8` linting

The fix ensures `len(vocab)` correctly reflects the actual lexeme count and matches the iteration count over the vocab.
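For reviewers who want the counter logic in isolation, here is a minimal pure-Python sketch of the behavior described above. It is not the Cython code in `spacy/vocab.pyx`: `ToyVocab`, its attributes, and `run_cycles` are hypothetical stand-ins, and a plain dict plays the role of the internal hash map. It only illustrates the idea of counting the entries that were actually removed and subtracting that count from the length.

```python
from contextlib import contextmanager


class ToyVocab:
    """Hypothetical pure-Python model of a vocab with transient entries."""

    def __init__(self):
        self._by_orth = {}          # stands in for the internal hash map
        self._transient_orths = []  # keys first added inside a memory zone
        self._in_zone = False
        self.length = 0             # the counter that len() reports

    def __len__(self):
        return self.length

    def __iter__(self):
        return iter(self._by_orth.values())

    def add(self, word):
        # Only new entries bump the counter; new entries added inside a
        # zone are also remembered as transient.
        if word not in self._by_orth:
            self._by_orth[word] = word
            self.length += 1
            if self._in_zone:
                self._transient_orths.append(word)

    def _clear_transient_orths(self):
        cleared = 0
        for key in self._transient_orths:
            # pop() returns None for a missing key, loosely mirroring
            # the NULL check mentioned in the change list.
            if self._by_orth.pop(key, None) is not None:
                cleared += 1
        self._transient_orths.clear()
        # Subtract only what was really removed (the step the issue
        # reports as missing).
        self.length -= cleared

    @contextmanager
    def memory_zone(self):
        self._in_zone = True
        try:
            yield self
        finally:
            self._in_zone = False
            self._clear_transient_orths()


def run_cycles(vocab, batches):
    """Add each batch inside its own zone; record len(vocab) after each."""
    sizes = []
    for batch in batches:
        with vocab.memory_zone():
            for word in batch:
                vocab.add(word)
        sizes.append(len(vocab))
    return sizes


vocab = ToyVocab()
vocab.add("permanent")  # added outside any zone, so it survives
sizes = run_cycles(vocab, [["a", "b"], ["c", "d", "e"]])
# After both cycles the length is back to the single permanent entry.
```

In this model, the same invariant the PR describes holds after every cycle: `len(vocab)` equals the number of items produced by iterating over the vocab.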
Bug fix - fixes issue #13882 where vocab.length counter was not properly maintained when memory_zone cleared transient lexemes.
Checklist