Description
Hi! I would have reached out privately, but I did not have another way to do so. I like your parser! Nice and simple implementation.
There is, however, a security issue: the parser accepts some illegal UTF-8 byte sequences. Depending on where and how the parser is used, this could allow input to bypass validation or filtering.
An example of an illegal input that makes it through:
char data[] = {'\xC0', '\xA6', '\x27', '\x27', '\x27', '\x27','\x00'};
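The lead byte 0xC0 is accepted as the start of a 2-byte sequence, but 0xC0 is never legal in UTF-8: it can only begin an overlong encoding of an ASCII character. A decoder that does not check for this would turn 0xC0 0xA6 into U+0026 ('&'), which must only ever be encoded as the single byte 0x26. The resulting impact depends on where the parser is used.
Potential security issues and spec limitations are documented in RFC 3629 at the following spots: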
https://datatracker.ietf.org/doc/html/rfc3629#section-4
https://datatracker.ietf.org/doc/html/rfc3629#section-10
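To make the check concrete, here is a minimal sketch of a strict validator that follows the byte ranges in the RFC 3629 section 4 grammar. The names `utf8_is_valid` and `is_tail` are hypothetical and are not part of this library's API; this is only an illustration of the rule, not a proposed patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true for a continuation byte 10xxxxxx (0x80-0xBF). */
static bool is_tail(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Hypothetical strict validator following the RFC 3629 section 4 grammar.
 * Returns true only if s[0..len) is entirely well-formed UTF-8. */
bool utf8_is_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char b0 = s[i];
        size_t need;                         /* continuation bytes expected */
        unsigned char lo = 0x80, hi = 0xBF;  /* allowed range for 2nd byte  */

        if (b0 <= 0x7F) {                    /* ASCII */
            i++;
            continue;
        } else if (b0 >= 0xC2 && b0 <= 0xDF) {
            need = 1;                        /* 0xC0 and 0xC1 are excluded  */
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            need = 2;
            if (b0 == 0xE0) lo = 0xA0;       /* reject overlong 3-byte forms */
            if (b0 == 0xED) hi = 0x9F;       /* reject UTF-16 surrogates     */
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            need = 3;
            if (b0 == 0xF0) lo = 0x90;       /* reject overlong 4-byte forms */
            if (b0 == 0xF4) hi = 0x8F;       /* reject values above U+10FFFF */
        } else {
            return false;                    /* 0x80-0xBF, 0xC0, 0xC1, 0xF5-0xFF */
        }

        if (len - i <= need)                 /* sequence is truncated */
            return false;
        if (s[i + 1] < lo || s[i + 1] > hi)
            return false;
        for (size_t k = 2; k <= need; k++) {
            if (!is_tail(s[i + k]))
                return false;
        }
        i += need + 1;
    }
    return true;
}
```

With a check like this, the example input above is rejected at its first byte, because 0xC0 falls outside every permitted lead-byte range.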
For your information, this is very similar to CVE-2025-1094 (a PostgreSQL vulnerability published last week).
I found this library as I was looking at various UTF-8 parsers for similar issues.
In the case of the utf8 library, this would map to the following CWE:
https://cwe.mitre.org/data/definitions/791.html
If you feel like it, you could create a security advisory for this issue right here on GitHub: https://docs.github.com/en/code-security/security-advisories/working-with-repository-security-advisories/creating-a-repository-security-advisory
Thank you for your time and for sharing code with the world!