Wrong non-Latin string encoding when saving result to output file on Windows #123

@xxxxxxbox

Hi there.
Thanks for a great app.

I'm trying to minify a simple piece of code that contains both Latin and non-Latin (Cyrillic) characters in a string. All non-Latin characters in the output file are converted incorrectly. Please see the attached screenshot.

print("Hello world, Привет мир, Hello world")

[Screenshot]

Setting PYTHONUTF8=1, as suggested in #113 (comment), has no effect.

If I minify the same string and send the output to the console (stdout) instead of a file, the result contains no errors.

I use Windows 10 and python-minifier v2.11.3. The source file is UTF-8 encoded.
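In case it helps to narrow this down, here is a minimal diagnostic sketch that prints the encodings Python reports on a given machine. It only uses the standard library. The function name `encoding_report` is my own, and I have not confirmed which of these values (if any) the tool relies on when it writes the output file.

```python
import locale
import sys


def encoding_report() -> dict:
    """Collect the encodings Python reports for this interpreter and console."""
    return {
        # True when Python runs in UTF-8 mode (for example with PYTHONUTF8=1).
        "utf8_mode": bool(sys.flags.utf8_mode),
        # Encoding that open() falls back to when no encoding is passed.
        "preferred_encoding": locale.getpreferredencoding(False),
        # Encoding used when printing to the console.
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        # Encoding used for file names, not file contents.
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


report = encoding_report()
for name, value in report.items():
    print(f"{name}: {value}")
```

If `preferred_encoding` and `stdout_encoding` differ, that could be one reason why console output and file output behave differently, but this is only a guess.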

I followed your advice in #100 (comment) and fixed my problem.
Now the output code is saved to the file like this:

print('Hello world, \u041f\u0440\u0438\u0432\u0435\u0442 \u043c\u0438\u0440, Hello world')

This approach has two advantages at once: the file contains no errors and can be executed, and the non-Latin strings have become even harder to read.
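To illustrate why the escaped form shown above is still the same program, here is a small standard-library sketch. It assumes the non-ASCII characters appear only inside ordinary (non-raw) string literals. In identifiers or raw strings the escapes would change the meaning of the code.

```python
import ast

original = "Hello world, Привет мир, Hello world"

# repr() gives the string literal as it would appear in source code.
literal = repr(original)

# backslashreplace turns every character that ASCII cannot encode
# into an escape sequence such as \u041f.
escaped = literal.encode("ascii", errors="backslashreplace").decode("ascii")

print(escaped)

# Evaluating the escaped literal gives back the original text.
assert escaped.isascii()
assert ast.literal_eval(escaped) == original
```

So the file on disk is pure ASCII, while the string seen by the running program is unchanged.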

My first suggestion: could you add this option to the main version so that it can be used from the command line? For example, you could add a --convert-strings-to-ascii parameter, or use any other name for it.

Second suggestion: if I replace encoding="ascii" with encoding="utf8" in the following example, the non-Latin strings are saved to the output file correctly. They are exactly the same as in source.py. Could you fix the package so that it forces encoding="utf8" by default when writing the result to a file?

Here is my helper code:

from python_minifier import minify
from pathlib import Path


FILENAMES_TO_MINIFY: list[str] = [
    "source.py",
    "some_folder/source2.py",
]

INPUT_DIR = Path.cwd()
OUTPUT_DIR = INPUT_DIR / ".minified"


def xminify(
    input_filename: str | Path,
    output_filename: str | Path,
):
    # Create the output directory for both str and Path arguments.
    Path(output_filename).parent.mkdir(parents=True, exist_ok=True)

    with open(input_filename, "rb") as f:
        source = f.read()

    minified = minify(
        source=source,
        remove_literal_statements=True,
        rename_globals=True,
        remove_asserts=True,
        remove_debug=True,
    )

    print(f"<<< {str(input_filename)}\n>>> {str(output_filename)}\n")

    with open(
        file=output_filename,
        mode="w",
        # encoding="utf8",  # correct result v1
        encoding="ascii",  # correct result v2 with escape sequence
        errors="backslashreplace",
    ) as f:
        f.write(minified)


def batch_xminify(
    filenames_to_minify: list[str],
    input_dir: Path,
    output_dir: Path | None = None,
):
    if not output_dir:
        output_dir = input_dir / ".minified"
    for filename in filenames_to_minify:
        xminify(
            input_filename=input_dir / filename,
            output_filename=output_dir / filename,
        )


if __name__ == "__main__":
    batch_xminify(
        filenames_to_minify=FILENAMES_TO_MINIFY,
        input_dir=INPUT_DIR,
    )
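Both suggestions could be combined into one small writer function. The following is only a sketch of the idea, not the package's actual code. `write_minified` and its `ascii_only` flag are hypothetical names of mine, and the minified text is hard-coded so that the example runs without python-minifier installed.

```python
import tempfile
from pathlib import Path


def write_minified(text: str, output_path: Path, ascii_only: bool = False) -> None:
    """Hypothetical helper: write minified source with an explicit encoding."""
    output_path.parent.mkdir(parents=True, exist_ok=True)
    if ascii_only:
        # Escape every non-ASCII character as \xNN, \uNNNN or \UNNNNNNNN.
        data = text.encode("ascii", errors="backslashreplace")
    else:
        # Keep the characters as they are, always encoded as UTF-8.
        data = text.encode("utf-8")
    # Writing bytes bypasses the locale default encoding and newline translation.
    output_path.write_bytes(data)


minified = "print('Hello world, Привет мир, Hello world')"

with tempfile.TemporaryDirectory() as tmp:
    utf8_path = Path(tmp) / "utf8" / "source.py"
    ascii_path = Path(tmp) / "ascii" / "source.py"
    write_minified(minified, utf8_path)
    write_minified(minified, ascii_path, ascii_only=True)
    utf8_text = utf8_path.read_text(encoding="utf-8")
    ascii_text = ascii_path.read_text(encoding="ascii")

print(utf8_text)
print(ascii_text)
```

The ASCII mode is only safe when the non-ASCII characters sit inside ordinary string literals, so I would expect it to be opt-in, with UTF-8 as the default.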
