Commit 40af3b3
Fix UnicodeDecodeError when reading packed-refs with non-UTF8 characters
Fixes #2064
The packed-refs file can contain ref names that are not valid UTF-8
(e.g., Latin-1 encoded tag names created by older Git versions or
non-UTF8 systems). Previously, opening the file with encoding='UTF-8'
would raise UnicodeDecodeError.
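
The failure described above can be reproduced with a small sketch. The file contents, the SHA, and the tag name are made up for illustration; only the `encoding='UTF-8'` open call mirrors what the commit message describes.

```python
import os
import tempfile

# Hypothetical packed-refs content. The tag name ends in byte 0xE9
# ("é" in Latin-1), which is not a valid UTF-8 sequence on its own.
content = (
    b"# pack-refs with: peeled fully-peeled sorted\n"
    b"0123456789abcdef0123456789abcdef01234567 refs/tags/caf\xe9\n"
)

strict_error = None
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "packed-refs")
    with open(path, "wb") as fp:
        fp.write(content)

    try:
        # Strict UTF-8 decoding, as in the code before this commit.
        with open(path, "rt", encoding="UTF-8") as fp:
            fp.read()
    except UnicodeDecodeError as exc:
        strict_error = exc  # raised on the 0xE9 byte
```
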
Changes:
- Add errors='surrogateescape' to the open() call in _iter_packed_refs()
- This allows reading files with arbitrary byte sequences while still
  treating valid UTF-8 as text
- Add test that verifies non-UTF8 packed-refs can be read successfully
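
A minimal sketch of the change listed above, under stated assumptions: `read_packed_refs` is a hypothetical stand-in written for this example, not the project's `_iter_packed_refs()`, and the file contents are invented. It shows that valid UTF-8 still decodes normally while an invalid byte no longer raises.

```python
import os
import tempfile


def read_packed_refs(path):
    """Yield (sha, ref_name) pairs from a packed-refs style file.

    Hypothetical helper for illustration only.
    """
    with open(path, "rt", encoding="UTF-8", errors="surrogateescape") as fp:
        for line in fp:
            line = line.strip()
            # Skip the header comment, blank lines, and peeled lines ("^...").
            if not line or line.startswith(("#", "^")):
                continue
            sha, name = line.split(" ", 1)
            yield sha, name


content = (
    b"# pack-refs with: peeled fully-peeled sorted\n"
    + b"a" * 40 + b" refs/tags/caf\xc3\xa9\n"  # valid UTF-8 "é"
    + b"b" * 40 + b" refs/tags/caf\xe9\n"      # Latin-1 "é", invalid as UTF-8
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "packed-refs")
    with open(path, "wb") as fp:
        fp.write(content)
    refs = list(read_packed_refs(path))

# refs[0][1] is "refs/tags/café"; refs[1][1] is "refs/tags/caf\udce9",
# where the lone surrogate U+DCE9 stands in for the raw byte 0xE9.
```
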
The 'surrogateescape' error handler is the standard Python approach for
handling potentially non-UTF8 data in filesystem operations, as it
preserves the original bytes in a reversible way.
1 parent eecc28d commit 40af3b3
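
The reversibility that the commit message attributes to 'surrogateescape' can be checked with a short round trip. This is a sketch with an invented ref name. It also shows a caveat: the decoded string contains a lone surrogate, so encoding it back with the default strict handler fails.

```python
# Raw ref name with a Latin-1 "é" (0xE9), which is invalid as UTF-8.
raw = b"refs/tags/caf\xe9"

# Decoding maps the undecodable byte 0xE9 to the lone surrogate U+DCE9.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text.encode("utf-8", errors="surrogateescape")

# Strict encoding rejects the lone surrogate, so callers that write the
# name back out need to pass the same error handler.
try:
    text.encode("utf-8")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True
```
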
2 files changed, 40 additions and 1 deletion.

First file (path not preserved): 1 line replaced at line 126.
Second file (path not preserved): 39 lines added as new lines 632-670, after original line 631.