
Add and Enhance Finnish Language Support with cp1257 Charset and Improved Input Handling #86

Merged
Syed-Shahrukh-OSSRevival merged 11 commits into Project-OSS-Revival:master from Syed-Shahrukh-OSSRevival:t1 on Aug 1, 2025

Egor-OSSRevival commented Jul 26, 2025

This PR expands Finnish language support by adding cp1257 charset handling, updating configuration and data files, and enhancing build scripts for full integration. It also improves normalize.pl to skip malformed lines and correctly count space characters for more robust processing.

Key Changes

  • Finnish language support:
    • Added encoding data and configurations.
    • Integrated the cp1257 charset into finnish.h, iso88594.base, and doit.sh.
    • Updated the Makefile to include Finnish support.
  • Input handling improvements:
    • Enhanced normalize.pl to skip malformed lines.
    • Fixed handling of space character counts.
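
The two input handling changes above can be illustrated with a minimal C sketch. It is not the normalize.pl script itself (which is Perl) and only shows one plausible reading of the behavior. The input line format `<hex byte> <count>` and the names `parse_count_line` and `accumulate_counts` are assumptions made for this example. Malformed lines are skipped rather than aborting the run, and the space byte (0x20) is accumulated like any other byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed (hypothetical) line format: "<hex byte> <count>", e.g. "0x20 1500".
 * Returns 1 and fills *byte and *count on success, 0 for a malformed line. */
static int parse_count_line(const char *line, unsigned *byte,
                            unsigned long *count)
{
    unsigned b;
    unsigned long c;
    char extra;

    /* Exactly two fields must convert; a third non-space token is junk. */
    if (sscanf(line, " %x %lu %c", &b, &c, &extra) != 2)
        return 0;
    if (b > 0xFF)
        return 0; /* not a single byte value */
    *byte = b;
    *count = c;
    return 1;
}

/* Adds the counts of all well-formed lines into counts[256].
 * Returns the number of lines that were skipped as malformed. */
static size_t accumulate_counts(const char *const *lines, size_t n,
                                unsigned long counts[256])
{
    size_t skipped = 0;

    assert(lines != NULL && counts != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned byte;
        unsigned long count;

        if (!parse_count_line(lines[i], &byte, &count)) {
            skipped++; /* skip the bad line, keep processing the rest */
            continue;
        }
        counts[byte] += count; /* 0x20 (space) is counted like any byte */
    }
    return skipped;
}
```

A caller would zero-initialize a `counts[256]` array, pass the input lines, and could report the returned number of skipped lines as a warning.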

Impact

These changes ensure Finnish language files are correctly processed and built, while improving the reliability of data normalization and charset handling across the project.

- Introduced Finnish language support in the language detection library.
- Added ISO 8859-4 (`iso88594`) encoding data for Finnish.
- Created necessary files for Finnish language handling, including `finnish.h`, `lang_fi.c`, and associated data files.
- Updated language list to include Finnish and its corresponding character sets.
- Modified locale detection logic to accommodate Finnish language detection.
- Ensured proper memory management and data structures for Finnish language support.
…d updating related files. Modify doit.sh to include cp1257, expand finnish.h to define RAW_CP1257, and update iso88594.base and rawcounts.iso88594 with new data. Implement hooks in lang_fi.c for charset selection between iso8859-4 and cp1257.
…le_detect.c, and simtable.c for improved readability and consistency.
…files in cp1257, iso88594, and utf8 encodings
…g proper formatting for cp1257, iso88594, and utf8 encodings.
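
The commit notes above mention hooks in lang_fi.c that select between iso8859-4 and cp1257. The sketch below shows one way such a choice could work. It is an illustration under stated assumptions, not the code from this PR and not the library's hook API. The type `charset_guess` and the function `guess_charset` are hypothetical names. The heuristic relies on the two charsets placing ä, ö, and å at the same byte values while placing Š, Ž, š, and ž at different ones, so only the latter can separate them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for this sketch. */
typedef enum {
    CHARSET_UNDECIDED,
    CHARSET_ISO8859_4,
    CHARSET_CP1257
} charset_guess;

/* Returns 1 if byte b occurs in set[0..n-1], otherwise 0. */
static int is_in(unsigned char b, const unsigned char *set, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (set[i] == b)
            return 1;
    return 0;
}

/* Compares how often the buffer uses the byte positions of S/Z with caron
 * in each charset and picks the more frequent one. Ties stay undecided. */
static charset_guess guess_charset(const unsigned char *buf, size_t len)
{
    /* Š, Ž, š, ž in ISO 8859-4 */
    static const unsigned char iso_sz[] = {0xA9, 0xAE, 0xB9, 0xBE};
    /* Š, Ž, š, ž in cp1257 */
    static const unsigned char cp_sz[] = {0xD0, 0xDE, 0xF0, 0xFE};
    size_t iso = 0, cp = 0;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (is_in(buf[i], iso_sz, sizeof iso_sz))
            iso++;
        else if (is_in(buf[i], cp_sz, sizeof cp_sz))
            cp++;
    }
    if (iso > cp)
        return CHARSET_ISO8859_4;
    if (cp > iso)
        return CHARSET_CP1257;
    return CHARSET_UNDECIDED;
}
```

Text that contains only ä, ö, and å stays undecided here, because those letters share the same byte values in both charsets. A real detector would need further evidence, such as the frequency data the PR adds.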
Syed-Shahrukh-OSSRevival merged commit 17ea238 into Project-OSS-Revival:master on Aug 1, 2025
3 of 4 checks passed