Hello, and thanks for this useful library.
In most cases, when building a parser/tokenizer on top of strscan, I don't need a method to peek at a character at some offset without moving the position. But while building https://github.com/le0pard/json_mend to repair broken JSON, I found that in many cases I need to look ahead in the string to understand which broken part of the JSON I am dealing with. For now I have this method:
```ruby
# Peeks the next character without advancing the scanner
def peek_char(offset = 0)
  # Handle the common 0-offset case
  if offset.zero?
    # peek(1) returns the next BYTE, not character
    byte_str = @scanner.peek(1)
    return nil if byte_str.empty?

    # Fast path: If it's a standard ASCII char (0-127), return it directly.
    # This avoids the regex overhead for standard JSON characters ({, [, ", etc).
    return byte_str if byte_str.getbyte(0) < 128

    # Slow path: If it's a multibyte char (e.g. “), use regex to match the full character.
    return @scanner.check(/./m)
  end

  # For offsets > 0, we must scan to skip correctly (as characters can be variable width)
  saved_pos = @scanner.pos
  res = nil
  (offset + 1).times do
    res = @scanner.getch
    break if res.nil?
  end
  @scanner.pos = saved_pos
  res
end
```
As you can see, I can use `check(/./m)` to get the first character without advancing, but the regex is not very fast (which is why the `byte_str.getbyte(0) < 128` optimization exists at all). To read a character at some offset, I have to loop with `getch` and then restore the original position.
It would be good if the library had a method similar to `peek`, but one that works with characters (it could be named `peekch`).
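A minimal standalone check of the byte-versus-character behavior the fast path relies on (the sample string is arbitrary): on input that starts with a multibyte character, `peek(1)` returns only its first byte, `check(/./m)` returns the whole character, and neither call moves `pos`.

```ruby
require "strscan"

demo = StringScanner.new("“x”")

byte = demo.peek(1)      # next BYTE: 0xE2, the first of the 3 bytes of "“"
char = demo.check(/./m)  # next CHARACTER: the regex matches the whole "“"

p byte.bytes                     # => [226]
p [char.length, char.bytesize]   # => [1, 3]
p demo.pos                       # => 0 (neither call consumed input)
```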
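Until something like `peekch` exists in strscan, one possible userland sketch avoids both the regex and the save/restore loop by slicing a bounded byte window starting at the current position. This is only an illustration under stated assumptions: the class name `CharPeekScanner` is made up, `peekch` is just the name suggested above (it is not a strscan method), and the input is assumed to be UTF-8, where one character is at most 4 bytes.

```ruby
require "strscan"

# Hypothetical sketch: `CharPeekScanner` and `peekch` are illustrative names,
# not part of strscan. Assumes the scanned string is UTF-8, where a single
# character is at most 4 bytes long.
class CharPeekScanner < StringScanner
  MAX_CHAR_BYTES = 4

  # Returns the character `offset` characters after the scan pointer,
  # or nil if the string ends first. The scan pointer is never moved,
  # so there is no position to save and restore.
  def peekch(offset = 0)
    # The first (offset + 1) characters always fit in this many bytes.
    window = string.byteslice(pos, (offset + 1) * MAX_CHAR_BYTES)
    return nil if window.nil? || window.empty?

    # Fast path: an ASCII byte is a whole character on its own.
    return window.byteslice(0, 1) if offset.zero? && window.getbyte(0) < 128

    # Slow path: split into characters. A character cut off at the end of the
    # window can only appear after index `offset`, so it does not affect the result.
    window.chars[offset]
  end
end

scanner = CharPeekScanner.new('{"a":“b”}')
scanner.pos = 5          # byte offset of "“"
p scanner.peekch(1)      # => "b"
p scanner.peekch(3)      # => "}"
p scanner.peekch(9)      # => nil
p scanner.pos            # => 5
```

The design choice here is to read from a fixed-size byte window instead of advancing and rewinding the scanner. Whether it is actually faster than `check(/./m)` or the `getch` loop would need benchmarking, and for non-UTF-8 input the 4-byte bound would have to be adjusted.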
Thanks