@TunaCici
Contributor

@TunaCici TunaCici commented Jul 5, 2023

The function decodeString(...) does not handle the UnicodeDecodeError exception very well. Retrying with errors ignored might work in theory, but in practice it does NOT. Some special characters (e.g. ü, ş, ç, ö) raise another exception, which fails the whole process. Instead of trying to ignore the error, we should try to fix it.

I was getting errors similar to those mentioned in #81 and #35. The fix by @faisal-hameed in #81 uses a regex to "filter out" any non-ASCII characters. The idea is good, but regex is heavy (CPU time and memory). When I tried it with Python 2.7.18 on my Windows 10 VM (Intel Xeon E-2236, 16 GiB of memory), the program ran for a few seconds and then crashed. I believe this is due to how the re regex library in Python 2.7.x works.

The ClearCase view I was trying is relatively old (3-5 years), so the encodedstr is pretty long, and Python 2.7.x's regex just can't keep up with it on my host machine.

A relatively "better" solution is to use Python's native join(...) operation and keep only the ASCII characters in encodedstr. It works by checking each character's decimal value using ord(...). If the value is less than 128 (ASCII covers 0 to 127), the character is kept. If not, we just drop it.
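The filtering described above can be sketched roughly as follows. This is a minimal illustration, not the merged patch itself: the function name to_ascii and the sample string are hypothetical, and only join(...) and ord(...) come from the description.

```python
def to_ascii(text):
    """Return text with every character whose code point is 128 or above dropped.

    Hypothetical helper for illustration; the real patch applies the same
    idea to encodedstr inside decodeString(...).
    """
    # ord(ch) < 128 is true exactly for ASCII characters (code points 0..127).
    return "".join(ch for ch in text if ord(ch) < 128)


# ASCII-only input passes through unchanged.
print(to_ascii(u"plain ascii"))

# Characters such as ü and ş are dropped rather than raising an exception.
print(to_ascii(u"g\u00fcne\u015f"))
```

Note that this is lossy: non-ASCII characters are removed, not transliterated, so "güneş" becomes "gne".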

This way we are manually converting ANY Unicode string to an ASCII string. I'm sure there are better ways to handle the UnicodeDecodeError exception, but this one seemed the simplest solution and it just Works™.
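For comparison, one of those other ways is to let the ASCII codec drop the characters through its "ignore" error handler. The sketch below is an assumption about what such an alternative could look like, not code from this repository, and the name to_ascii_via_codec is hypothetical. It works on text strings (str in Python 3, unicode in Python 2). On a Python 2 byte str that contains non-ASCII bytes, calling encode(...) first triggers an implicit ASCII decode, which raises UnicodeDecodeError regardless of the error handler, and that may be why ignoring errors does not help in practice.

```python
def to_ascii_via_codec(text):
    """Drop non-ASCII characters using the codec's "ignore" error handler.

    Hypothetical alternative for illustration; expects a text string,
    not a Python 2 byte str.
    """
    # encode(..., "ignore") silently skips characters ASCII cannot represent;
    # decode("ascii") turns the resulting bytes back into a text string.
    return text.encode("ascii", "ignore").decode("ascii")


print(to_ascii_via_codec(u"g\u00fcne\u015f"))
```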

If anyone is experiencing the same error mentioned in #35 and #81, try using this patch.

Hope that this helps <3

@charleso charleso merged commit 4ff7f90 into charleso:master Jul 7, 2023
@charleso
Owner

charleso commented Jul 7, 2023

Thanks @TunaCici! I'm sorry my Python knowledge and decoding were so poor!
