@eyalleshem (Contributor) commented Dec 1, 2025

This PR adds a lifetime parameter to the Token enum to enable zero-copy tokenization in the future.

Backward Compatibility:
To maintain backward compatibility, we renamed Token to BorrowedToken<'a> and introduced Token as a type alias for BorrowedToken<'static>. This allows existing consumers of the library to continue using Token without needing to handle lifetimes throughout their code.
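
A minimal sketch of the aliasing scheme, assuming a cut-down enum with only a few variants (the real BorrowedToken has many more). Because no variant borrows from the input yet, the _Phantom(Cow<'a, str>) variant is what keeps 'a in use; without it rustc rejects the enum with E0392 (unused lifetime parameter). The describe function is a hypothetical stand-in for pre-existing consumer code.

```rust
use std::borrow::Cow;

// Simplified stand-in for the real enum: only a few illustrative variants.
#[derive(Debug, Clone, PartialEq)]
enum BorrowedToken<'a> {
    // Still an owned String at this stage; no behavior change yet.
    Word(String),
    Comma,
    EOF,
    // Placeholder variant so that 'a is actually used (otherwise E0392).
    _Phantom(Cow<'a, str>),
}

// Existing code keeps writing `Token` and never spells out a lifetime.
type Token = BorrowedToken<'static>;

// Hypothetical pre-existing consumer: compiles unchanged against the alias.
fn describe(token: &Token) -> String {
    match token {
        Token::Word(w) => format!("word {w}"),
        Token::Comma => "comma".to_string(),
        Token::EOF => "eof".to_string(),
        Token::_Phantom(s) => format!("phantom {s}"),
    }
}

fn main() {
    let tokens: Vec<Token> = vec![Token::Word("SELECT".to_string()), Token::Comma, Token::EOF];
    for t in &tokens {
        println!("{}", describe(t));
    }
}
```

Enum variants can be named through the alias (Token::Word), so match arms and constructors in downstream code do not need to change.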

New API:
This PR also adds a tokenized_owned() method for use cases where consumers prefer to pay the cost of copying in exchange for owned tokens.
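
The signature of tokenized_owned() is not shown here, so the following is only a guess at the underlying idea: a hypothetical into_owned method and to_owned_tokens helper copy each Cow::Borrowed into a Cow::Owned, producing Token (that is, BorrowedToken<'static>) values that outlive the SQL string they came from.

```rust
use std::borrow::Cow;

#[derive(Debug, Clone, PartialEq)]
enum BorrowedToken<'a> {
    Word(Cow<'a, str>),
    Comma,
}

type Token = BorrowedToken<'static>;

impl<'a> BorrowedToken<'a> {
    // Hypothetical: copies any borrowed text so the token no longer
    // depends on the input buffer.
    fn into_owned(self) -> Token {
        match self {
            BorrowedToken::Word(w) => BorrowedToken::Word(Cow::Owned(w.into_owned())),
            BorrowedToken::Comma => BorrowedToken::Comma,
        }
    }
}

// Hypothetical helper in the spirit of tokenized_owned(): pay for one copy
// per token in exchange for a Vec<Token> that is not tied to the input.
fn to_owned_tokens(borrowed: Vec<BorrowedToken<'_>>) -> Vec<Token> {
    borrowed.into_iter().map(BorrowedToken::into_owned).collect()
}

fn main() {
    let owned: Vec<Token> = {
        let sql = String::from("a, b");
        let borrowed = vec![
            BorrowedToken::Word(Cow::Borrowed(&sql[0..1])),
            BorrowedToken::Comma,
            BorrowedToken::Word(Cow::Borrowed(&sql[3..4])),
        ];
        to_owned_tokens(borrowed)
    }; // `sql` is dropped here, but `owned` remains valid.
    println!("{owned:?}");
}
```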

Current State:
This commit does not yet change the tokenizer's behavior: all string allocations remain in place. Follow-up commits will replace String with Cow<'a, str> in as many places as possible, using the Borrowed variant to achieve zero-copy tokenization where feasible.
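
To illustrate where Cow<'a, str> pays off, the hypothetical words function below returns slices of the input (Cow::Borrowed) for ordinary words and allocates (Cow::Owned) only when unescaping a double-quoted identifier makes the text differ from the input. Splitting on whitespace is a deliberate simplification; a real SQL tokenizer scans character by character.

```rust
use std::borrow::Cow;

// Hypothetical: returns word-like tokens as slices of `sql` where possible.
// A token wrapped in double quotes has its quotes stripped; if it contains
// `""`, that escape is rewritten to `"`, which forces an owned copy because
// the result is no longer a contiguous slice of `sql`.
fn words(sql: &str) -> Vec<Cow<'_, str>> {
    sql.split_whitespace()
        .map(|w| {
            if w.len() >= 2 && w.starts_with('"') && w.ends_with('"') {
                let inner = &w[1..w.len() - 1];
                if inner.contains("\"\"") {
                    Cow::Owned(inner.replace("\"\"", "\""))
                } else {
                    Cow::Borrowed(inner)
                }
            } else {
                Cow::Borrowed(w)
            }
        })
        .collect()
}

fn main() {
    let sql = r#"SELECT "a""b" FROM "t""#;
    for w in words(sql) {
        let kind = if matches!(w, Cow::Borrowed(_)) { "borrowed" } else { "owned" };
        println!("{w} ({kind})");
    }
}
```

Only the escaped identifier allocates; every other token points straight into the original string.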

eyalsatori and others added 2 commits December 7, 2025 21:19
Changed all parsing methods to take '&self' instead of '&mut self'.
Mutable parser state (the token index and parser state) now uses
interior mutability.

This refactoring prepares for the borrowed tokenizer work. While
holding borrowed tokens from the parser (with lifetimes tied to '&self'),
we cannot call methods requiring '&mut self' due to Rust's borrowing
rules. Interior mutability resolves this conflict by allowing
state mutations through shared references.
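
A reduced sketch of the '&self' refactoring, assuming Cell<usize> for the token index (the commit message does not name the type used). The Parser, peek_token and next_token below are simplified stand-ins, not the crate's code. Because next_token takes &self, first and second can be alive at the same time; with &mut self, the second call would be rejected with E0499 while first is still in use.

```rust
use std::cell::Cell;

#[derive(Debug, PartialEq)]
enum Token {
    Word(String),
    Comma,
}

// Hypothetical, heavily simplified parser: only the token index is modeled.
struct Parser {
    tokens: Vec<Token>,
    // Interior mutability: the index can advance through a shared `&self`.
    index: Cell<usize>,
}

impl Parser {
    fn new(tokens: Vec<Token>) -> Self {
        Parser { tokens, index: Cell::new(0) }
    }

    // The returned reference is tied to a shared borrow of the parser.
    fn peek_token(&self) -> Option<&Token> {
        self.tokens.get(self.index.get())
    }

    // Advances using only `&self`, so previously returned references stay usable.
    fn next_token(&self) -> Option<&Token> {
        let tok = self.peek_token();
        self.index.set(self.index.get() + 1);
        tok
    }
}

fn main() {
    let parser = Parser::new(vec![Token::Word("a".into()), Token::Comma, Token::Word("b".into())]);
    let first = parser.next_token(); // shared borrow of `parser`
    let second = parser.next_token(); // allowed while `first` is still alive
    println!("{first:?} {second:?}");
}
```
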

  This change introduces a lifetime parameter 'a to the BorrowedToken enum
  to prepare for zero-copy tokenization support. This is a foundational
  step toward reducing memory allocations during SQL parsing.

  Changes:
  - Added lifetime parameter to BorrowedToken<'a> enum
  - Added _Phantom(Cow<'a, str>) variant to carry the lifetime
  - Implemented Visit and VisitMut traits for Cow<'a, str> to support
    the visitor pattern with the new lifetime parameter
  - Fixed lifetime issues in visitor tests by using tokenized_owned()
    instead of tokenize() where owned tokens are required
  - Type alias Token = BorrowedToken<'static> maintains backward
    compatibility
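
The Visit/VisitMut change can be pictured with simplified, hypothetical stand-ins for the visitor traits (the crate's real traits differ in detail). Presumably the derived visitor for BorrowedToken<'a> requires every field type to implement Visit, including the Cow<'a, str> inside _Phantom; treating Cow<'a, str> as a leaf that visits nothing, as this sketch also does for String, satisfies that bound.

```rust
use std::borrow::Cow;
use std::ops::ControlFlow;

// Simplified stand-ins for a visitor framework; not the crate's definitions.
trait Visitor {
    type Break;
}

trait Visit {
    fn visit<V: Visitor>(&self, visitor: &mut V) -> ControlFlow<V::Break>;
}

// Leaf types have no children, so visiting them does nothing.
impl Visit for String {
    fn visit<V: Visitor>(&self, _visitor: &mut V) -> ControlFlow<V::Break> {
        ControlFlow::Continue(())
    }
}

// Cow<'a, str> is treated as a leaf, just like String.
impl<'a> Visit for Cow<'a, str> {
    fn visit<V: Visitor>(&self, _visitor: &mut V) -> ControlFlow<V::Break> {
        ControlFlow::Continue(())
    }
}

// A container that visits each child, stopping early on Break.
impl<T: Visit> Visit for Vec<T> {
    fn visit<V: Visitor>(&self, visitor: &mut V) -> ControlFlow<V::Break> {
        for item in self {
            item.visit(visitor)?;
        }
        ControlFlow::Continue(())
    }
}

// A visitor that never breaks, used only to drive the traversal.
struct NoopVisitor;

impl Visitor for NoopVisitor {
    type Break = ();
}

fn main() {
    let words: Vec<Cow<str>> = vec![Cow::Borrowed("a"), Cow::Owned("b".to_string())];
    let result = words.visit(&mut NoopVisitor);
    println!("{result:?}");
}
```
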
@eyalleshem force-pushed the add_lifetime_to_token branch from 9d5f00b to 26f485d on December 7, 2025 19:33