Description
UTILITY::Clean_name() in SOFIE_common.cxx erases dot characters (.) from tensor names instead of replacing them with a safe character (e.g., _). This causes tensor name collisions when parsing models whose intermediate tensor names contain dots.
The Collision
ONNX graphs exported through frameworks such as TorchScript name their intermediate tensors sequentially, and the sequence can mix dotted and undotted names:
input.1 → input0 → input1 → input2 → ...
When Clean_name() processes these:
| Original Name | After Clean_name() | Intended Role |
|---|---|---|
| input.1 | input1 | Model input tensor |
| input1 | input1 | Intermediate tensor (output of first op) |
Both map to the same C++ variable name input1, causing the generated inference code to contain duplicate declarations or silently overwrite tensor data.
Root Cause
std::string UTILITY::Clean_name(std::string input_tensor_name){
   std::string s (input_tensor_name);
   std::replace( s.begin(), s.end(), '-', '_');
   s.erase(std::remove_if(s.begin(), s.end(),
      []( char const& c ) -> bool { return !std::isalnum(c) && c != '_'; } ), s.end());
   return s;
}
The function replaces - → _ explicitly, but every other character that is neither alphanumeric nor _ (including .) is erased rather than replaced. This is inconsistent: - gets a safe replacement while . is deleted.
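The collision can be reproduced outside SOFIE with a standalone copy of the same logic. This is a minimal sketch: clean_name_current is a hypothetical free function written for illustration, not the SOFIE symbol, and the lambda takes unsigned char (an assumption of this sketch) so that std::isalnum is never called with a negative value.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical standalone copy of the current Clean_name() logic,
// for illustration only; it is not the function shipped in SOFIE.
std::string clean_name_current(std::string s) {
    // '-' is explicitly mapped to '_'.
    std::replace(s.begin(), s.end(), '-', '_');
    // Every remaining character that is neither alphanumeric nor '_'
    // is removed outright, which includes '.'.
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](unsigned char c) {
                               return !std::isalnum(c) && c != '_';
                           }),
            s.end());
    return s;
}
```

Calling clean_name_current("input.1") and clean_name_current("input1") returns the same string for both, which is the collision described above.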
Expected behavior
input.1 → input_1 (distinct from input1)
Current behavior
input.1 → input1 (collides with input1)
Fix
Replace dots with underscores before erasing, consistent with the existing - → _ treatment:
std::string UTILITY::Clean_name(std::string input_tensor_name){
   std::string s (input_tensor_name);
   std::replace( s.begin(), s.end(), '-', '_');
   std::replace( s.begin(), s.end(), '.', '_'); // <-- add this line
   s.erase(std::remove_if(s.begin(), s.end(),
      []( char const& c ) -> bool { return !std::isalnum(c) && c != '_'; } ), s.end());
   return s;
}
This makes input.1 → input_1 (distinct from input1), eliminating the collision.
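One way to check the proposed change is to clean a set of names and verify that none of them collapse onto each other. The sketch below assumes nothing about SOFIE beyond the logic quoted in this report: clean_name_fixed and all_distinct_after_cleaning are hypothetical names introduced for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Hypothetical standalone copy of the proposed Clean_name() logic.
std::string clean_name_fixed(std::string s) {
    std::replace(s.begin(), s.end(), '-', '_');
    std::replace(s.begin(), s.end(), '.', '_');  // the proposed extra step
    // Remaining characters that are neither alphanumeric nor '_' are erased.
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](unsigned char c) {
                               return !std::isalnum(c) && c != '_';
                           }),
            s.end());
    return s;
}

// Hypothetical helper: returns true only if every input name maps to a
// distinct cleaned name.
bool all_distinct_after_cleaning(const std::vector<std::string>& names) {
    std::set<std::string> seen;
    for (const auto& name : names) {
        // insert() reports false in .second when the cleaned name already exists.
        if (!seen.insert(clean_name_fixed(name)).second) {
            return false;
        }
    }
    return true;
}
```

With this sketch, all_distinct_after_cleaning({"input.1", "input1"}) is true. The replacement is still not injective in general: a graph that contains both input.1 and input_1 would map both to input_1, so the check above can also serve as a guard for that rarer case.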
Related