Add and Enhance Finnish Language Support with cp1257 Charset and Improved Input Handling #86
Merged
Syed-Shahrukh-OSSRevival merged 11 commits into Project-OSS-Revival:master on Aug 1, 2025
- Introduced Finnish language support in the language detection library.
- Added encoding data for ISO 8859-4 (ISO 88594) specific to Finnish.
- Created necessary files for Finnish language handling, including `finnish.h`, `lang_fi.c`, and associated data files.
- Updated language list to include Finnish and its corresponding character sets.
- Modified locale detection logic to accommodate Finnish language detection.
- Ensured proper memory management and data structures for Finnish language support.
…rrectly process space character counts
…d updating related files. Modify doit.sh to include cp1257, expand finnish.h to define RAW_CP1257, and update iso88594.base and rawcounts.iso88594 with new data. Implement hooks in lang_fi.c for charset selection between iso8859-4 and cp1257.
…le_detect.c, and simtable.c for improved readability and consistency.
…files in cp1257, iso88594, and utf8 encodings
…and utf8 encodings
…and utf8 encodings
…g proper formatting for cp1257, iso88594, and utf8 encodings.
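One of the commit notes above mentions hooks in `lang_fi.c` that choose between iso8859-4 and cp1257. The PR's actual hook code is not shown here, so the following is only a minimal sketch of how such a choice could be made from byte evidence. The function name `guess_finnish_charset`, the enum, and the scoring rule are all hypothetical. It relies on two facts: the letters š/ž/Š/Ž sit at different byte values in the two charsets, and cp1257 assigns printable characters in the 0x80..0x9F range where ISO 8859-4 has only control codes.

```c
#include <stddef.h>

/* Hypothetical result type; not taken from the PR. */
typedef enum { CS_UNKNOWN, CS_ISO88594, CS_CP1257 } charset_guess;

/* Guess which of the two charsets a byte buffer is more likely to use.
 * Shared letters such as 0xE4 (a with diaeresis) and 0xF6 (o with
 * diaeresis) have the same value in both charsets, so they carry no
 * evidence and are ignored. */
static charset_guess guess_finnish_charset(const unsigned char *buf, size_t len)
{
    size_t iso = 0, cp = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = buf[i];

        if (c >= 0x80 && c <= 0x9F) {
            cp++;   /* printable in cp1257, control codes in ISO 8859-4 */
        } else if (c == 0xA9 || c == 0xB9 || c == 0xAE || c == 0xBE) {
            iso++;  /* S/s/Z/z with caron in ISO 8859-4 */
        } else if (c == 0xD0 || c == 0xF0 || c == 0xDE || c == 0xFE) {
            cp++;   /* S/s/Z/z with caron in cp1257 */
        }
    }

    if (iso > cp) return CS_ISO88594;
    if (cp > iso) return CS_CP1257;
    return CS_UNKNOWN;  /* no distinguishing bytes, or a tie */
}
```

A real detector would weigh these bytes against the language's frequency tables rather than simple counts; the sketch only shows where the two charsets diverge.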
Syed-Shahrukh-OSSRevival approved these changes on Aug 1, 2025
Merged as 17ea238 into Project-OSS-Revival:master
3 of 4 checks passed
This PR expands Finnish language support by adding cp1257 charset handling, updating configuration and data files, and enhancing build scripts for full integration. It also improves `normalize.pl` to skip malformed lines and correctly count space characters for more robust processing.

Key Changes

Finnish language support:
- Updated `finnish.h`, `iso88594.base`, and `doit.sh`.
- Updated `Makefile` to include Finnish support.

Input handling improvements:
- Improved `normalize.pl` to skip malformed lines.

Impact

These changes ensure Finnish language files are correctly processed and built, while improving the reliability of data normalization and charset handling across the project.
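The two input-handling behaviors described for `normalize.pl` (skipping malformed lines and counting the space character correctly) can be illustrated with a short sketch. This is not the script's code, which is Perl and is not shown in this PR page. The input format is an assumption made for illustration: each line holds one literal character, then whitespace, then a decimal count. The helper names `parse_count_line` and `accumulate_counts` are hypothetical. The point is that a line whose character is itself a space breaks a naive whitespace split, so the first byte is taken verbatim before any separators are skipped.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Parse one "<char><whitespace><count>" line (assumed format).
 * Returns 1 on success, 0 if the line is malformed. */
static int parse_count_line(const char *line, unsigned char *ch,
                            unsigned long *count)
{
    if (strlen(line) < 3)
        return 0;                       /* too short to hold char, gap, digit */

    const char *p = line + 1;           /* first byte is the character itself,
                                           even when it is a space */
    if (*p != ' ' && *p != '\t')
        return 0;                       /* no separator after the character */
    while (*p == ' ' || *p == '\t')
        p++;
    if (!isdigit((unsigned char)*p))
        return 0;                       /* count must start with a digit */

    char *end;
    unsigned long n = strtoul(p, &end, 10);
    while (*end == ' ' || *end == '\t' || *end == '\r' || *end == '\n')
        end++;
    if (*end != '\0')
        return 0;                       /* trailing garbage after the count */

    *ch = (unsigned char)line[0];
    *count = n;
    return 1;
}

/* Add every well-formed line into counts[]; return how many were skipped. */
static size_t accumulate_counts(const char *const *lines, size_t nlines,
                                unsigned long counts[256])
{
    size_t skipped = 0;

    for (size_t i = 0; i < nlines; i++) {
        unsigned char ch;
        unsigned long n;

        if (parse_count_line(lines[i], &ch, &n))
            counts[ch] += n;
        else
            skipped++;
    }
    return skipped;
}
```

Returning the number of skipped lines, rather than aborting on the first bad one, lets a caller log how much input was discarded while still producing counts from the rest.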