feat(env): add environment variable APIs#3227
Conversation
Pull Request Test Coverage Report for Build 2872Details
💛 - Coveralls |
eddbbf5 to
4d2f8ab
Compare
4d2f8ab to
e717a2d
Compare
| fn string_to_extern(s : String) -> XExternString { | ||
| let handle = begin_create_string() | ||
| for i = 0; i < s.length(); i = i + 1 { | ||
| string_append_char(handle, s.code_unit_at(i).unsafe_to_char()) | ||
| } | ||
| finish_create_string(handle) | ||
| } |
There was a problem hiding this comment.
🔴 string_to_extern passes surrogate halves instead of full codepoints for non-BMP characters
The string_to_extern function in env_wasm.mbt iterates over UTF-16 code units and passes each one individually to string_append_char. For characters outside the Basic Multilingual Plane (e.g., emoji like 😀), MoonBit strings use surrogate pairs (two UTF-16 code units). This function sends each surrogate half (e.g., 0xD83D and 0xDE00) as a separate Char to string_append_char, which is incorrect — surrogates are not valid Unicode scalar values.
This is asymmetric with string_from_extern (env_wasm.mbt:57-70) which reads full Unicode codepoints via string_read_char (documented as returning "the unicode codepoint of the character"). It is also inconsistent with the native backend's mbt_string_to_utf8_bytes (env_native.mbt:81-85) which properly detects and combines surrogate pairs into full codepoints.
Any env var key or value containing non-BMP characters (emoji, some CJK extensions, mathematical symbols, etc.) will be corrupted on the WASM backend.
| fn string_to_extern(s : String) -> XExternString { | |
| let handle = begin_create_string() | |
| for i = 0; i < s.length(); i = i + 1 { | |
| string_append_char(handle, s.code_unit_at(i).unsafe_to_char()) | |
| } | |
| finish_create_string(handle) | |
| } | |
| fn string_to_extern(s : String) -> XExternString { | |
| let handle = begin_create_string() | |
| let len = s.length() | |
| let mut i = 0 | |
| while i < len { | |
| let mut c = s.code_unit_at(i).to_int() | |
| if 0xD800 <= c && c <= 0xDBFF && i + 1 < len { | |
| let lo = s.code_unit_at(i + 1).to_int() | |
| if 0xDC00 <= lo && lo <= 0xDFFF { | |
| c = ((c - 0xD800) << 10) + (lo - 0xDC00) + 0x10000 | |
| i = i + 1 | |
| } | |
| } | |
| string_append_char(handle, c.unsafe_to_char()) | |
| i = i + 1 | |
| } | |
| finish_create_string(handle) | |
| } |
Was this helpful? React with 👍 or 👎 to provide feedback.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e717a2d956
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| #| return []; | ||
| #| } | ||
| #| const result = []; | ||
| #| for (const key in process.env) { |
There was a problem hiding this comment.
Filter inherited properties when enumerating env vars
get_env_vars_array_internal uses for (const key in process.env), which iterates enumerable properties from the prototype chain in addition to actual environment entries. If an enumerable property is added on Object.prototype (prototype pollution or a buggy dependency), get_env_vars() will return non-environment keys and can corrupt callers that trust this map; iterate own keys (Object.keys) or add a hasOwnProperty guard in this loop.
Useful? React with 👍 / 👎.
| if 0xD800 <= c && c <= 0xDBFF { | ||
| c -= 0xD800 | ||
| i = i + 1 | ||
| let l = str.code_unit_at(i).to_int() - 0xDC00 |
There was a problem hiding this comment.
Guard surrogate encoding from out-of-bounds access
In mbt_string_to_utf8_bytes, when a leading surrogate is seen, i is incremented and str.code_unit_at(i) is read without verifying that another code unit exists. A string that ends with an unmatched leading surrogate (possible in MoonBit) will trigger an out-of-range access and panic, so native get_env_var/set_env_var/unset_env_var can crash on malformed keys or values.
Useful? React with 👍 / 👎.
Notice that this requires nightly for the native backend.