Describe the bug
Looking up some words that contain umlauts raises a UnicodeEncodeError when a Wiktionary dump is used as the source.
To Reproduce
Steps to reproduce the behavior:
- Configure an English Wiktionary dump as the primary source.
- Try to download the definition for the word "sihinää".
- See an error dialog with a UnicodeEncodeError exception.
Expected behavior
The lookup succeeds, or at least fails gracefully without a modal dialog showing a traceback. The same word can be found in the online Wiktionary without any problem.
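As an illustration of what "failing gracefully" could mean here, the sketch below writes a message containing non-ASCII characters to an ASCII-only stream without raising. It assumes the failure comes from writing such text to a stream whose encoding is ASCII; the report does not show the code at fault, and `safe_write` is a hypothetical helper, not part of VocabSieve.

```python
import io


def safe_write(stream, text):
    """Write text to stream, escaping characters its encoding cannot represent.

    Hypothetical helper for illustration only; not taken from VocabSieve.
    """
    encoding = getattr(stream, "encoding", None) or "utf-8"
    # backslashreplace turns unencodable characters into escapes such as \xe4
    # instead of raising UnicodeEncodeError.
    safe = text.encode(encoding, errors="backslashreplace").decode(encoding)
    stream.write(safe + "\n")


# Simulate an ASCII-only output stream (assumed to resemble the failing environment).
ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii", newline="\n")
safe_write(ascii_stream, "Word sihin\u00e4\u00e4 not found")  # "sihinää"
ascii_stream.flush()
written = ascii_stream.buffer.getvalue()
print(written)
```

The message loses some readability, but the lookup can then report "not found" instead of surfacing a traceback dialog.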
Screenshots

Logs
2025-01-05 13:53:32.119 | DEBUG | vocabsieve.main:getKnownDataOnThread:426 - Some data sources aren't available, not getting known data now
2025-01-05 13:53:38.577 | DEBUG | vocabsieve.ui.searchable_boldable_text_edit:bold:11 - bolding sihinää
2025-01-05 13:53:38.579 | DEBUG | vocabsieve.ui.multi_definition_widget:lookup:138 - Looking up sihinää in [<vocabsieve.sources.local_dictionary_source.LocalDictionarySource object at 0x335bab590>]
2025-01-05 13:53:38.580 | ERROR | vocabsieve.uncaught_hook:make_error_box:17 - Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/vocabsieve/sources/local_dictionary_source.py", line 14, in _lookup
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/vocabsieve/local_dictionary.py", line 97, in define
KeyError: 'Word sihinää not found in raw-wiktextract-data'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/vocabsieve/main.py", line 827, in lookup
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/vocabsieve/ui/multi_definition_widget.py", line 140, in lookup
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/vocabsieve/ui/multi_definition_widget.py", line 160, in _lookup_in_source
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/vocabsieve/models.py", line 325, in define
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/vocabsieve/models.py", line 336, in _fmt_lookup
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/vocabsieve/sources/local_dictionary_source.py", line 17, in _lookup
UnicodeEncodeError: 'ascii' codec can't encode characters in position 20-21: ordinal not in range(128)
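The reported positions 20-21 are exactly where the two "ä" characters fall in the `repr` of the KeyError from the first traceback. That suggests, but does not confirm, that line 17 of local_dictionary_source.py prints or logs the caught exception to a stream that uses ASCII encoding. A minimal sketch that reproduces the same error message outside VocabSieve, under that assumption:

```python
# Rebuild the KeyError from the log; "\u00e4" is the precomposed character "ä".
err = KeyError("Word sihin\u00e4\u00e4 not found in raw-wiktextract-data")
text = repr(err)  # KeyError('Word sihinää not found in raw-wiktextract-data')

failure = None
try:
    # Encoding to ASCII stands in for writing to an ASCII-only stream (assumption).
    text.encode("ascii")
except UnicodeEncodeError as exc:
    failure = exc
    print(exc)
```

If the assumption holds, any word with non-ASCII characters that is missing from the dump would trigger the same error, not only "sihinää".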
Desktop (please complete the following information):
- OS: macOS 14.6.1
- Vocabsieve version (if nightly, must be latest): 0.12.4