A PHP extension that provides advanced string iteration capabilities for UTF-8 strings with support for grapheme clusters, Unicode codepoints, and byte-level iteration.
Features:
- Grapheme Cluster Iteration: Iterate over grapheme clusters (user-perceived characters) using PCRE2's `\X` pattern
- Unicode Codepoint Iteration: Iterate over individual Unicode codepoints
- Byte-level Iteration: Iterate over individual bytes for low-level string processing
- UTF-8 Safe: Correct handling of multibyte UTF-8 characters
- Standard PHP Interfaces: Implements Iterator, IteratorAggregate, and Countable, so iterators work directly with `foreach` and `count()`
Requirements:
- PHP 8.1 or higher
- PCRE2 library and development headers (`libpcre2-dev` on Debian/Ubuntu)
PIE (PHP Installer for Extensions) is the recommended way to install this extension.
```bash
# Install PIE if you haven't already
composer global require php/pie
# Install the extension
pie install masakielastic/striter
```

PIE automatically handles building and enabling the extension.
To build the extension manually instead:

```bash
# Install dependencies (Ubuntu/Debian)
sudo apt-get install libpcre2-dev
# Build the extension
cd ext
phpize
./configure --enable-striter
make
sudo make install
```

Then add the following line to your php.ini:

```ini
extension=striter.so
```

Basic usage:

```php
<?php
// Create a string iterator
$iterator = str_iter("Hello World");
// Iterate using foreach
foreach ($iterator as $index => $char) {
    echo "[$index] => '$char'\n";
}
```

Grapheme mode ("grapheme") iterates over grapheme clusters (user-perceived characters):
```php
<?php
$text = "Hello🌍";
$iterator = str_iter($text, "grapheme");
foreach ($iterator as $index => $char) {
    echo "[$index] => '$char'\n";
}
// Output:
// [0] => 'H'
// [1] => 'e'
// [2] => 'l'
// [3] => 'l'
// [4] => 'o'
// [5] => '🌍'
```

Codepoint mode ("codepoint") iterates over individual Unicode codepoints:
```php
<?php
$text = "Hello🌍";
$iterator = str_iter($text, "codepoint");
foreach ($iterator as $index => $char) {
    echo "[$index] => '$char'\n";
}
```

Byte mode ("byte") iterates over individual bytes:
```php
<?php
$text = "Hello";
$iterator = str_iter($text, "byte");
foreach ($iterator as $index => $byte) {
    // Print the numeric value of each byte
    echo "[$index] => " . ord($byte) . "\n";
}
```

Counting elements with `count()`:

```php
<?php
$text = "Hello🌍";
$iterator = str_iter($text, "grapheme");
echo "Total characters: " . count($iterator) . "\n"; // Output: Total characters: 6
```

Getting the inner iterator:

```php
<?php
$text = "ABC";
$iterator = str_iter($text);
// Get the inner iterator for advanced operations
$innerIterator = $iterator->getIterator();
foreach ($innerIterator as $key => $value) {
    echo "[$key] => '$value'\n";
}
```

`str_iter()` creates a new string iterator.
Parameters:
- $str (string): The string to iterate over
- $mode (string, optional): The iteration mode, one of "grapheme", "codepoint", or "byte"

Returns: a _StrIterIterator object
The returned iterator implements PHP's IteratorAggregate and Countable interfaces:
IteratorAggregate methods:
- getIterator(): Returns the iterator itself for nested iteration

Countable methods:
- count(): Returns the total number of elements in the iterator
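To prototype against this API on a PHP build without the extension, a rough userland stand-in can be assembled from functions that ship with PHP. This is a sketch under stated assumptions, not the extension's implementation: the class name `StrIterSketch` is hypothetical, `"grapheme"` is assumed as the default mode, `getIterator()` returns an `ArrayIterator` rather than the object itself, and invalid UTF-8 yields an empty sequence in the grapheme and codepoint modes instead of falling back to individual bytes as the extension does.

```php
<?php
/**
 * Hypothetical userland approximation of str_iter() for prototyping.
 * Not the extension's implementation; see the assumptions above.
 */
final class StrIterSketch implements IteratorAggregate, Countable
{
    /** @var list<string> Segments produced by the chosen mode. */
    private array $units;

    public function __construct(string $str, string $mode = "grapheme")
    {
        $this->units = match ($mode) {
            // \X matches one extended grapheme cluster; /u enables UTF-8 mode.
            "grapheme" => preg_match_all('/\X/u', $str, $m) ? $m[0] : [],
            // Splitting on the empty pattern in UTF-8 mode yields codepoints.
            "codepoint" => preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY) ?: [],
            // One element per byte; guard "" because PHP 8.1's str_split("") returns [""].
            "byte" => $str === "" ? [] : str_split($str),
            default => throw new ValueError("Unknown mode: $mode"),
        };
    }

    public function getIterator(): ArrayIterator
    {
        return new ArrayIterator($this->units);
    }

    public function count(): int
    {
        return count($this->units);
    }
}

$iterator = new StrIterSketch("Hello\u{1F30D}", "grapheme");
foreach ($iterator as $index => $char) {
    echo "[$index] => '$char'\n";
}
echo "Total: " . count($iterator) . "\n"; // prints "Total: 6"
```

Materializing the segments up front keeps `count()` trivial; the extension presumably iterates more lazily, so treat this only as a behavioral stand-in.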
Emoji sequences:

```php
<?php
// An emoji ZWJ sequence (family) followed by an emoji with a skin tone modifier
$text = "👨‍👩‍👧‍👦👍🏽";
$iterator = str_iter($text, "grapheme");
foreach ($iterator as $index => $char) {
    echo "Grapheme $index: '$char'\n";
}
```

Japanese text:

```php
<?php
$text = "こんにちは世界";
$iterator = str_iter($text, "grapheme");
foreach ($iterator as $index => $char) {
    echo "Character $index: '$char'\n";
}
```

Binary data:

```php
<?php
$data = "\x48\x65\x6C\x6C\x6F"; // "Hello" written as hex escapes
$iterator = str_iter($data, "byte");
foreach ($iterator as $index => $byte) {
    printf("Byte %d: 0x%02X\n", $index, ord($byte));
}
```

Grapheme mode uses PCRE2's `\X` pattern, which matches one extended grapheme cluster, so it correctly handles:
- Base characters with combining marks
- Emoji sequences
- Regional indicator sequences
- Hangul syllable sequences
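To see what `\X` does with each case in the list above, the sketch below runs the same construct through PHP's own bundled PCRE (`preg_match_all()` with the `u` modifier) rather than through the extension, and compares the number of grapheme clusters with the number of codepoints. The helper names `grapheme_count()` and `codepoint_count()` are hypothetical, and exact cluster boundaries depend on the Unicode version of the PCRE2 library in use.

```php
<?php
// Hypothetical helpers built on PHP's bundled PCRE, not on the extension.
function grapheme_count(string $s): int
{
    // \X matches one extended grapheme cluster in UTF-8 (/u) mode.
    return (int) preg_match_all('/\X/u', $s);
}

function codepoint_count(string $s): int
{
    // With /s, "." matches any single codepoint, including newlines.
    return (int) preg_match_all('/./su', $s);
}

$samples = [
    "combining mark"      => "e\u{0301}",                  // e + COMBINING ACUTE ACCENT
    "emoji ZWJ sequence"  => "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}",
    "regional indicators" => "\u{1F1EF}\u{1F1F5}",         // J + P, rendered as a flag
    "Hangul jamo"         => "\u{1100}\u{1161}\u{11A8}",   // leading + vowel + trailing jamo
];

foreach ($samples as $label => $s) {
    printf("%-20s graphemes=%d codepoints=%d\n", $label, grapheme_count($s), codepoint_count($s));
}
```

Each sample spans several codepoints but should form a single grapheme cluster, which is exactly the difference between the "codepoint" and "grapheme" modes.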
The extension validates UTF-8 input and handles invalid sequences without failing: each offending byte is returned as a separate element.
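The byte fallback described above can be illustrated in plain PHP. This is a sketch under assumptions, not the extension's code: the function name `codepoints_with_byte_fallback()` is hypothetical, and it assumes that any byte which does not begin a well-formed UTF-8 sequence (per the byte ranges in RFC 3629) is emitted on its own, with decoding resuming at the next byte.

```php
<?php
/**
 * Hypothetical helper: split $s into UTF-8 codepoints, emitting each byte
 * that does not begin a well-formed sequence as a single-byte element.
 */
function codepoints_with_byte_fallback(string $s): array
{
    $out = [];
    $len = strlen($s);
    $i = 0;
    while ($i < $len) {
        $b = ord($s[$i]);
        // Allowed range for the second byte; later bytes are always 0x80-0xBF.
        $lo = 0x80;
        $hi = 0xBF;
        if ($b < 0x80) {
            $n = 1;                          // ASCII
        } elseif ($b >= 0xC2 && $b <= 0xDF) {
            $n = 2;
        } elseif ($b >= 0xE0 && $b <= 0xEF) {
            $n = 3;
            if ($b === 0xE0) { $lo = 0xA0; } // reject overlong forms
            if ($b === 0xED) { $hi = 0x9F; } // reject UTF-16 surrogates
        } elseif ($b >= 0xF0 && $b <= 0xF4) {
            $n = 4;
            if ($b === 0xF0) { $lo = 0x90; } // reject overlong forms
            if ($b === 0xF4) { $hi = 0x8F; } // stay at or below U+10FFFF
        } else {
            $n = 0;                          // can never start a sequence
        }

        // Check that the sequence fits and every continuation byte is in range.
        $valid = $n > 0 && $i + $n <= $len;
        for ($k = 1; $valid && $k < $n; $k++) {
            $c = ord($s[$i + $k]);
            $valid = $k === 1 ? ($c >= $lo && $c <= $hi) : ($c >= 0x80 && $c <= 0xBF);
        }

        if ($valid) {
            $out[] = substr($s, $i, $n);     // one complete codepoint
            $i += $n;
        } else {
            $out[] = $s[$i];                 // fall back to a single raw byte
            $i++;
        }
    }
    return $out;
}

// "A", a stray 0xFF byte, "é", then a truncated 3-byte sequence.
foreach (codepoints_with_byte_fallback("A\xFF\u{E9}\xE3\x81") as $index => $unit) {
    echo "[$index] => " . bin2hex($unit) . "\n";
}
// Output: [0] => 41, [1] => ff, [2] => c3a9, [3] => e3, [4] => 81 (one per line)
```

Resuming at the very next byte means a truncated sequence costs only its own bytes; valid characters that follow it are still decoded normally.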
The extension properly manages memory for string copies and PCRE2 objects, preventing memory leaks.
Run the included test files:

```bash
php tests/test_basic.php
php tests/test_grapheme.php
php tests/test_byte_mode.php
php tests/test_emoji_bug.php
php tests/test_invalid_utf8.php
```

Contributing:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
This project is open source. Please refer to the project's license file for details.
Changelog:
- Initial release
- Support for grapheme, codepoint, and byte iteration modes
- PCRE2 integration for proper grapheme cluster detection
- Full Iterator interface implementation