Port DArc to MicroHs: pipeline fixes + LZMA compression/decompression #28
Merged
…pression
Root cause: MicroHs's ffe_eval/evali uses longjmp for green-thread blocking, which causes re-entrancy when a Haskell callback called from C blocks on a takeMVar. In the LZMA path, the read callback inside ffe_eval blocks waiting for storingProcess to provide data; MHS then schedules storingProcess, which sends NoMoreData, but instead of resuming the original blocked callback it re-enters ffe_eval with a fresh callback invocation that sees an empty pipe and returns 0 bytes, so LZMA compresses nothing.
Fix: for any compression method that goes through C (not storing/fake), use compressMem/decompressMem (buffer-to-buffer, no callbacks) instead of the streaming compress/decompress (which requires Haskell callbacks from C).
- deCompressProcess (#ifdef __MHS__): collect all uncompressed input via the pipe reader in Haskell (safe blocking, no ffe_eval), then call compressMem, then send the compressed output forward via sendP/receive_backP.
- decompressBlock (#ifdef __MHS__): collect all compressed bytes via reader (archive reads, safe), then call decompressMem, then feed decompressed data directly to writer (decompressStep handles the file dispatch via the outer pipe in normal Haskell threads).
- collectInputMHS: new helper that drains a reader into a contiguous malloc'd buffer by reading 65536-byte chunks until EOF; handles files of any size.
- Add Foreign.Marshal.Alloc import and compressMem to CompressionLib imports.
Verified: arc a -mlzma and arc x round-trip correctly for single files, multi-file solid blocks, and files >65536 bytes (multi-chunk collection).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
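The chunked drain that collectInputMHS is described as doing can be sketched in C. This is only an illustration of the technique (read 65536-byte chunks into one growing buffer until EOF), not the Haskell helper from the commit; `read_fn`, `collect_input`, `mem_source` and `mem_read` are hypothetical names invented for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_SIZE 65536  /* chunk size named in the commit message */

/* Hypothetical reader callback: fills up to `cap` bytes, returns the
   number of bytes produced, 0 at EOF, negative on error. */
typedef long (*read_fn)(void *ctx, unsigned char *dst, size_t cap);

/* Drain `rd` into one malloc'd buffer. On success returns the buffer
   (the caller frees it) and stores the total length in *out_len.
   Returns NULL on a read error or allocation failure. */
unsigned char *collect_input(read_fn rd, void *ctx, size_t *out_len)
{
    assert(rd != NULL && out_len != NULL);
    size_t cap = CHUNK_SIZE, len = 0;
    unsigned char *buf = malloc(cap);
    if (buf == NULL) return NULL;

    for (;;) {
        if (cap - len < CHUNK_SIZE) {      /* keep room for a full chunk */
            size_t new_cap = cap * 2;
            unsigned char *grown = realloc(buf, new_cap);
            if (grown == NULL) { free(buf); return NULL; }
            buf = grown;
            cap = new_cap;
        }
        long n = rd(ctx, buf + len, CHUNK_SIZE);
        if (n < 0) { free(buf); return NULL; }
        if (n == 0) break;                 /* EOF: everything is collected */
        len += (size_t)n;
    }
    *out_len = len;
    return buf;
}

/* In-memory reader used only to exercise collect_input. */
struct mem_source { const unsigned char *data; size_t size, pos; };

long mem_read(void *ctx, unsigned char *dst, size_t cap)
{
    struct mem_source *s = ctx;
    size_t left = s->size - s->pos;
    size_t n = left < cap ? left : cap;
    memcpy(dst, s->data + s->pos, n);
    s->pos += n;
    return (long)n;
}
```

Doubling the capacity keeps the number of reallocations logarithmic in the input size, which matters once inputs span many chunks.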
- Add MHS buffer-to-buffer compression path in ArcvProcessCompress: collects pipe DataChunks into a buffer (acking each to unblock the producer), then applies compressMem chain in forward order
- Fix MHS type inference: annotate times MVar with explicit element type to resolve Show constraint at uiFinishDeCompression call site
- Fix MHS type inference: annotate numeric literals (0 :: Integer/Int) in decompressBlock (startPos, writer NoMoreData, result ref)
- Remove all debug hPutStrLn stderr traces from ArcvProcessExtract, ArcvProcessCompress, and Compression/CompressionLib
- Restore aDEFAULT_DIR_COMPRESSION = "lzma:bt4:1m" (was "storing")
- Set aDEFAULT_COMPRESSOR = "lzma" under __MHS__, because the numeric level "4" expands to dict+lzp+ppmd for text files and PPMD has a C-level bug in its in-memory CompressMem interface that causes SIGSEGV
- Add CPP pragma and MHS-specific imports to ArcvProcessCompress
- Multi-method compression chains (e.g. bcj+lzma) now work in MHS
Tested: arc a / arc e round-trips with lzma, bcj+lzma, -mstoring all pass. Extracted files byte-for-byte identical to originals.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… shims
Fix PPMD compression/decompression crash on 64-bit Unix.
Root cause: PPMdType.h typedef'd DWORD as unsigned long (8 bytes on LP64), but the PPMd algorithm assumes 32-bit throughout: PPM_CONTEXT (must be 12 bytes = UNIT_SIZE), arithmetic coder variables (must wrap at 32 bits), and BLK_NODE free-list management (sizeof(MEM_BLK) must equal UNIT_SIZE).
Changes:
- PPMdType.h: DWORD = unsigned int (32-bit on both LP64 and LLP64)
- SubAlloc.hpp: CTX_REF/STATE_REF as unsigned int for 12-byte contexts; BLK_NODE.next as 4-byte heap ref (BLKREF) so BLK_NODE=8, MEM_BLK=12
- Model.cpp: 3-DWORD context copy uses unsigned int; pointer fields replaced with 1-based heap refs + PPCTX/RPCTX/PPSTAT/RPSTAT helpers
- Coder.hpp: range coder variables as unsigned int for 32-bit overflow
- Encryption.hs: use darc_urandom_read C helper under MHS (bypasses broken hGetBuf)
- Environment.cpp/h: add darc_urandom_read helper
- compat-ghc/: MicroHs compatibility shims for GHC-only modules
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
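The size invariants behind this fix can be pinned down at compile time. The sketch below is not the PPMd source: `DWORD32`, `ctx12` and the field names are made up for illustration, and only the 12-byte UNIT_SIZE and the 32-bit wrap requirement come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* On LP64 Unix `unsigned long` is 8 bytes; uint32_t is 4 everywhere. */
typedef uint32_t DWORD32;          /* illustrative name, not PPMdType.h's */

/* Illustrative 12-byte record: three 32-bit fields, with heap
   references stored as 32-bit offsets instead of raw pointers. */
struct ctx12 {
    DWORD32 counts_and_flags;
    DWORD32 stats_ref;             /* offset into the allocator heap */
    DWORD32 suffix_ref;            /* offset into the allocator heap */
};

enum { UNIT_SIZE = 12 };

/* Fail the build, not the archive, if the layout ever drifts. */
static_assert(sizeof(DWORD32) == 4, "DWORD must be 32-bit");
static_assert(sizeof(struct ctx12) == UNIT_SIZE, "context must fill one unit");

/* Range-coder style arithmetic that must wrap modulo 2^32. */
DWORD32 wrap_add(DWORD32 low, DWORD32 range)
{
    return (DWORD32)(low + range);
}
```

With an 8-byte DWORD the same struct would be 24 bytes and the addition would stop wrapping at 32 bits, which are the two failure modes the commit describes.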
Port DArc to MicroHs: full compression pipeline
… -m4 dict_decompress used checked_read for the block header, which treated EOF (0 bytes read) as an I/O error. This broke all numeric presets -m3 through -m9 (which use dict+lzp+ppmd chains) when decompressing via buffer-to-buffer DecompressMem. Replace with explicit EOF check so 0-byte read at block boundary returns success.
With dict working, restore MicroHs default compressor from "lzma" to "4" (dict+lzp+ppmd), matching the GHC build default.
Tested: all presets -m0..-m9, files up to 7MB, arc a/e/l/t/d/m/j.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
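The distinction this fix relies on is between "no bytes left at a block boundary" (normal end of stream) and "some bytes, but fewer than a header" (corrupt input). A minimal sketch, with hypothetical names (`cursor`, `read_status`, `read_block_header`) that do not come from the dict sources:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum read_status { READ_OK = 0, READ_EOF = 1, READ_TRUNCATED = -1 };

/* In-memory input cursor; stands in for a buffer-to-buffer source. */
struct cursor { const unsigned char *data; size_t size, pos; };

/* Read exactly `n` header bytes.
   - nothing left at a block boundary -> clean end of stream
   - some but fewer than n bytes      -> truncated input (error)
   - n bytes available                -> header copied into hdr */
enum read_status read_block_header(struct cursor *c,
                                   unsigned char *hdr, size_t n)
{
    assert(c != NULL && hdr != NULL && c->pos <= c->size);
    size_t left = c->size - c->pos;
    if (left == 0) return READ_EOF;
    if (left < n)  return READ_TRUNCATED;
    memcpy(hdr, c->data + c->pos, n);
    c->pos += n;
    return READ_OK;
}
```

A decoder loop would stop successfully on READ_EOF and report an error only on READ_TRUNCATED.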
MicroHs truncates all FFI return values to 32 bits, even for long and long long on LP64 (where they are 8 bytes). This causes file sizes >2GB to overflow, breaking compression of large files.
Fix: add _w wrapper functions in Environment.cpp that write 64-bit results via pointer parameter instead of returning them. Update all Haskell FFI call sites to use alloca/peek pattern.
Affected functions: darc_bfile_tell, darc_bfile_size, darc_bfile_read, darc_bfile_write, darc_st_size, darc_st_mtime, darc_time, darc_mktime_tz, darc_urandom_read.
Tested: file size now correctly displays 2,290,194,432 for a 2.2GB file (was showing -2,004,772,864 before this fix).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…fer tuning
Replace MHS compressMem/decompressMem buffer-to-buffer approach with
streaming C-side pipeline using Compress()/Decompress() via callbacks.
This handles >2GB data (streaming vs int-sized CompressMem), reduces
Haskell pipe iteration overhead for large files (8MB buffers instead
of 64KB), and moves the compression loop entirely into native C.
Key changes:
- Environment.cpp: add darc_pipeline_{init,append,compress_step_w,
decompress_step_w,get_buf_w,free} with growing buffer + streaming
callback
- ArcvProcessCompress.hs: 3-phase MHS compress (collect→C compress→write)
- ArcvProcessExtract.hs: 3-phase MHS decompress for both decompressBlock
and deCompressProcess
- Files.hs: increase aBUFFER_SIZE to 8MB and aLARGE_BUFFER_SIZE to 64MB
under MHS to minimize pipe iterations
- CUI.hs: lengthen the indicator thread's update interval from 0.5s to 10s under MHS to avoid
expensive green thread context switches
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Port FreeArc 0.67 4x4 block-MT compressor (Compression/4x4/C_4x4.{h,cpp})
using pthread with thread-safe per-job MemCB; wire format int32 version +
per-block int32 orig_size (-1=raw) | int32 comp_size | payload
- CRC-32 slice-by-8 in Environment.cpp (was silently byte-by-byte because
PRESENT_UINT32 never defined) — ~5x CRC throughput
- Threaded pipeline (reader/main/writer) in darc_compress_solid_block_w
- Skip block_crc computation for DATA_BLOCK (unused by Haskell reader)
- Wrap -m1..-mx in 4x4 with tuned per-level block sizes, ratio preserved:
1xb=4x4:tor:3, 2xb=4x4:b16m:tor:16m:h64m, 3-9binary=4x4:bN:lzma:N:...
- Beats FreeArc 0.67 at every level on 100MB real data (1.3x to 4.2x)
- Restructure Environment.cpp guards so darc_bfile/pipeline/4x4 paths
compile under FREEARC_WIN; add Windows compat for sysconf/realpath/utime/
ftruncate/urandom via CryptGenRandom/_fullpath/_utime/_chsize_s
- All .cpp now cross-compile clean with x86_64-w64-mingw32-g++-posix
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
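The per-block framing listed for the 4x4 port (int32 orig_size, with -1 meaning raw, followed by int32 comp_size and the payload) could be encoded and decoded roughly as follows. The little-endian byte order and all function and type names here are assumptions made for the sketch; the commit message does not state them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value in little-endian order (assumed byte order). */
void put_i32le(unsigned char *p, int32_t v)
{
    uint32_t u = (uint32_t)v;
    p[0] = (unsigned char)(u);
    p[1] = (unsigned char)(u >> 8);
    p[2] = (unsigned char)(u >> 16);
    p[3] = (unsigned char)(u >> 24);
}

/* Load a 32-bit little-endian value. */
int32_t get_i32le(const unsigned char *p)
{
    uint32_t u = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return (int32_t)u;
}

struct block_header {
    int32_t orig_size;   /* -1 marks a block stored raw */
    int32_t comp_size;   /* number of payload bytes that follow */
};

enum { BLOCK_HEADER_BYTES = 8 };

/* Serialize the two header fields in wire order. */
void write_block_header(unsigned char *dst, struct block_header h)
{
    assert(dst != NULL);
    put_i32le(dst, h.orig_size);
    put_i32le(dst + 4, h.comp_size);
}

/* Parse a header from the first BLOCK_HEADER_BYTES bytes of src. */
struct block_header parse_block_header(const unsigned char *src)
{
    assert(src != NULL);
    struct block_header h = { get_i32le(src), get_i32le(src + 4) };
    return h;
}

int block_is_raw(struct block_header h) { return h.orig_size == -1; }
```

Because every block carries its own sizes, worker threads can compress blocks independently and a writer can emit them in order.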
Raw pokeByteOff of a bare Int writes sizeof(Int) bytes (4 on Win32, 8 elsewhere) while the position always advances by 8. On 32-bit systems this left 4 uninitialized bytes in every Int slot and read back only 4 bytes (ignoring the upper 32 bits). Breaks archive format compat between 32- and 64-bit builds. Force Int64 serialization explicitly. Same applied to CTime, whose inner representation varies. Wire format is now identical across Linux x64, Windows x64, and Windows x86.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
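The same idea in C terms: always emit eight bytes per integer field, regardless of the native integer width. This is a sketch of the technique rather than the ByteStream code; `put_i64` and `get_i64` are hypothetical names, and little-endian order is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { SLOT_BYTES = 8 };

/* Write `v` as exactly 8 little-endian bytes at buf[pos..pos+7] and
   return the next write position. Every byte of the slot is defined,
   whatever the native width of `int` or `long` is. */
size_t put_i64(unsigned char *buf, size_t pos, int64_t v)
{
    assert(buf != NULL);
    uint64_t u = (uint64_t)v;
    for (int i = 0; i < SLOT_BYTES; i++)
        buf[pos + i] = (unsigned char)(u >> (8 * i));
    return pos + SLOT_BYTES;
}

/* Read back one 8-byte slot and return the next read position. */
size_t get_i64(const unsigned char *buf, size_t pos, int64_t *out)
{
    assert(buf != NULL && out != NULL);
    uint64_t u = 0;
    for (int i = 0; i < SLOT_BYTES; i++)
        u |= (uint64_t)buf[pos + i] << (8 * i);
    *out = (int64_t)u;
    return pos + SLOT_BYTES;
}
```

Writing byte by byte also removes any dependence on host endianness, which goes slightly beyond what the commit claims.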
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
DavidLee18
reviewed
Apr 12, 2026
DavidLee18
left a comment
Owner
seems good, but the tests are not run yet, right?
- compile-ghc: GHC 9.4.7 Linux x64 build (analog to MHS compile)
- compile-win64-c: cross-mingw-w64 C++ objects for Win64
- compile-ghc-win64: Wine+GHC 9.4.8 Windows bindist cross-compile
- Win32Files.c: external FFI wrappers for HsBase INLINE helpers,
__hscore_seek_* accessors, and UCRT CRT aliases
- compat-win/System/{Time,Locale}.hs: shims to avoid old-time/old-locale
packages on the Windows GHC install
- Fix MVar Int->Integer annotations in ArcvProcess{Compress,Extract}
Three Win64-specific bugs fixed in Win32Files.c:
1. _wstati64/_fstati64 forward to _wstat32i64/_fstat32i64 (not
_wstat64/_fstat64 which use 64-bit time_t with a different layout).
2. _wfindfirsti64/_wfindnexti64 use _wfindfirst64/_wfindnext64 with
struct conversion, since Wine stubs the 32i64 variants.
3. Wrappers return HsInt (not int) so Haskell FFI sees sign-extended
-1; without this, throwErrnoIfMinus1 never fires and fileExist
reports nonexistent files as existing.
Verified: 100% archive cross-compat between Linux MHS, Linux GHC, and
Win64 builds (md5 identical in both directions, storing + LZMA).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
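Bug 3 above comes down to how a 32-bit -1 looks when the caller reads 64 bits. The sketch below models that with plain casts; `int64_t` stands in for HsInt (assumed 64 bits wide on the Win64 target), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(int32_t) < sizeof(int64_t), "widths must differ");

/* A C wrapper declared to return a 32-bit int, reporting failure as -1. */
int32_t status_as_int(void) { return -1; }

/* If the caller reads a 64-bit value whose upper half was not
   sign-extended, it sees the zero-extended bit pattern instead. */
int64_t seen_zero_extended(int32_t r) { return (int64_t)(uint32_t)r; }

/* Returning a 64-bit integer type makes the callee do the sign
   extension, so a comparison against -1 works on the caller's side. */
int64_t status_as_hsint(void) { return (int64_t)status_as_int(); }
```

The zero-extended value is 4294967295, which never equals -1, so a check in the style of throwErrnoIfMinus1 would not fire.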
Remove -DFREEARC_NOURL and link -lwininet. URL.cpp already had a WinInet-based implementation for #ifdef FREEARC_WIN — no libcurl needed on Windows since WinInet ships with the OS. WININET.dll confirmed in arc-win64.exe import table. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor
Author
It's still in progress, so please don't confirm anything yet :D
- ArhiveStructure renames to match 0.67: aARCHIVE_SIGNATURE,
aSCAN_MAX, aTAG_END, block_name (+ call sites in ArhiveDirectory).
- Compression.hs: add isNonSolidMethod, isMemoryBarrier_{Compression,
Decompression} as CompressionLib.compressionIs wrappers.
- Options.hs + Cmdline.hs: add --nodates CLI flag.
- ArhiveDirectory.hs: nodates_ref IORef substitutes fiTime with
aMINIMAL_POSSIBLE_DATETIME in directory block when --nodates is set.
- ArcCreate.hs: propagates opt_nodates to nodates_ref at archive start.
Roundtrip verified: mtime 2010-01-01 stored without flag, epoch with.
Vendored from github.com/Intensity/srep under srep/ (self-contained, not sharing headers with DArc's Compression/). Built via srep/compile to Tests/srep; registered in Installer/bin/arc.ini as [External compressor:srep] with single-quoted arcdatafile template (shell PID expansion guard). Usage: arc a -msrep file.arc input (3.93a: huge-dictionary LZ77 preprocessor for long-range dedup).
--shutdown/-ioff: power off the computer after the operation completes. Uses ExitWindowsEx(EWX_POWEROFF) on Windows and `shutdown -h now` on Unix, wired through a perform_shutdown IORef set by uiStartArchive.
--arc-32bit-legacy: read archives produced by 32-bit FreeArc/Arc.exe. FreeArc's generic `instance (Storable a) => FastBufferData a` writes Int and CTime with native sizeOf stride, so 32-bit builds emit 4-byte slots where DArc x64 expects 8. Without compensation, directory decode desyncs and eventually hits `Enum.Bool.toEnum: bad arg` on dir_flags. The flag toggles a reader that consumes 4 bytes (stride 4) for Int/CTime. Write path is unchanged; native DArc roundtrip is unaffected.
Verified against Arc.exe 0.67 -m0 single-file and multi-file archives (byte-exact).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
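A reader with a switchable stride, as described for --arc-32bit-legacy, might look like this in C. It is an illustration only: `read_int_field` and its `legacy32` parameter are hypothetical, and little-endian fields are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one integer field at *pos and advance *pos by the stride:
   4 bytes when legacy32 is nonzero, 8 bytes otherwise.
   Legacy 4-byte values are sign-extended to 64 bits. */
int64_t read_int_field(const unsigned char *buf, size_t *pos, int legacy32)
{
    assert(buf != NULL && pos != NULL);
    int width = legacy32 ? 4 : 8;
    uint64_t u = 0;
    for (int i = 0; i < width; i++)
        u |= (uint64_t)buf[*pos + i] << (8 * i);
    *pos += (size_t)width;
    if (legacy32)
        return (int64_t)(int32_t)(uint32_t)u;   /* sign-extend 32 -> 64 */
    return (int64_t)u;
}
```

Reading a legacy archive with the 8-byte stride merges two adjacent 4-byte fields into one value and shifts every later field, which is the desync the commit describes.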
New streaming wrapper Compression/BSC/C_BSC.{cpp,h} around the vendored
libbsc 3.3.12 sources. Registers bsc:BLOCKSIZE:b<sort>:l<minlen>:h<hash>:c<coder>
via AddCompressionMethod. Defaults to BWT + QLFC_STATIC + FASTMODE + MT,
25 MB blocks.
Build: -fno-rtti/-fno-exceptions to match the rest of DArc; -fopenmp so
libsais/libbsc can use OpenMP parallelism. Top-level compile picks up
C_BSC.o and -lgomp.
Fixes to make libbsc vendorable in a single TU: include platform.h early
so INLINE is defined before rangecoder.h is pulled in, and re-define
INLINE after libsais.c (which does #undef INLINE internally) so later
libbsc TUs still see it.
Smoke-tested: multi-file text roundtrip (100 KiB -> 18.8 KiB, cmp OK) and
970 KiB source blob compressed to 170 KiB in 0.7s. GRZip remains
available under -mgrzip.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Mirror upstream FreeArc 0.67 layout. Still used as an external compressor (no COMPRESSION_METHOD wrapper), but the sources now live where a future native integration would land. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
DisPack is an executable-file preprocessor that recognises x86 instruction patterns and reorders them so the downstream entropy coder compresses better. Useful prepended to LZMA for .exe/.dll/.so payloads.
Sourced from upstream FreeArc 0.67. Adjustments for the DArc tree:
- Filled in GetDictionary/GetBlockSize/SetDecompressionMem/SetDictionary/SetBlockSize: these are pure virtual in DArc's COMPRESSION_METHOD but don't exist in 0.67.
- Dropped the `bool purify` parameter from ShowCompressionMethod to match DArc's signature.
- Local compat shims in C_DisPack.cpp for BIGALLOC, READ_LEN, BigFreeAndNil and the big-endian value16b/value32b helpers: they live in 0.67's Compression.h/Common.h but not in DArc's.
Smoke-tested: /usr/bin/ls (142 KiB ELF) -> 51.4 KiB under `-mdispack+lzma` vs 54.7 KiB under `-mlzma` alone (6% gain); roundtrip cmp-exact.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Integrates Meta's Zstandard as -mzstd, coexisting with the existing codecs.
Syntax:
  -mzstd:N           compression level 1..22 (default 3)
  -mzstd:N:long[W]   enable long-range mode, window log W (default 27)
  -mzstd:N:w<K>      K compression worker threads
Uses zstd's native streaming API (ZSTD_compressStream2 / ZSTD_decompressStream), which maps directly onto FreeArc's CALLBACK_FUNC* I/O, with no block framing added on top.
Build: vendor lib/common, lib/compress, lib/decompress (no legacy, no deprecated, no dll). Each TU is compiled as C99 and merged into a single C_Zstd.o via `ld -r`, so the top-level link stays one object. -DZSTD_DISABLE_ASM keeps the build portable (no .S file dependency).
On a 100 KiB Haskell source corpus zstd:19 reaches 20.4 KiB, beating lzma's 20.7 KiB; zstd:3 gets 24.7 KiB at ~500 MB/s class speed. Fills the gap between LZ4 (fast, weak ratio) and LZMA (slow, max ratio). Roundtrip cmp-exact across levels 1/3/9/19.
-mN presets are untouched; zstd is available only via explicit -mzstd until we decide whether to thread it into the level shortcuts.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- ByteStream: 32-bit archive read compatibility (--arc-32bit-legacy)
- ArhiveDirectory: remove_unsafe_dirs + make_OS_native_path on read, unixifyPath on write (data-transform gap for FA 0.67.1 compat)
- Multi-volume virtual file reader (darc_volfile_*), no disk duplication
- New options: -ac/--ClearArchiveBit, -ao/--SelectArchiveBit (Win32), --nodates, --shutdown
- LZMA 24.09 upgrade + LZMA2 codec, LZ4 1.10.0, libbsc 3.3.12, zstd 1.5.6, DisPack
- Native 7z read support (SDK 26.00) via Arc7z.hs
- GHC 9.4.7 alternative build (compile-ghc) + cross-compile to Win64 (compile-ghc-win64)
- Banner: URL → github.com/DavidLee18/DArc, credit line added
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- ByteStream: move legacy32bitRead IORef out of __MHS__-only block so GHC builds can see the symbol from UI.hs
- C_7z.c: add darc_mkdir() shim (POSIX mkdir(path,mode) vs mingw _mkdir(path))
- compile-ghc-win86: add LZMA 7z24 SDK + 7z SDK objects to link list
Build produces PE32 i386 executable. Runtime crashes (c0000005) at startup under Wine+GHC 8.6.5 i386; likely Wine/RTS incompatibility, not a bug in the build itself. Needs testing on real Win32 to confirm.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Owner
I just wanted this program for my personal use, so anything is fine!
GHC 8.6.5 i386 incremental link uses `ld -r` per Haskell module, which bundles all -optl C objects (with their .idata sections) into modules like CompressionLib.o and Scripting/Lua.o. The merged .idata sections later confuse the final PE merge into producing a 13th bogus import descriptor labelled ADVAPI32.dll but containing KERNEL32/WININET/USER32 thunks. Win7 loader rejects it with 0xC0000139. Two-stage build: --make -c -no-link, then objcopy --remove-section '.idata$*' on every Haskell .o that absorbed them, then re-invoke GHC to link only. Result: 12 clean import descriptors, arc86.exe loads on Win7 x64 and round-trips storing + lzma archives. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove compile-ghc-win86 and compile-win86-c (moved to DArc86 repo)
- Remove [mingw32] target from mhs-targets.conf (64-bit only now)
- Reword Win32Files.c comment to reference msvcrt MinGW toolchains generically instead of DArc86
- Add a single compat line in README linking to the DArc86 fork
--arc-32bit-legacy / legacy32bitRead are unrelated (they read FreeArc 0.67 32-bit archives, not DArc86 archives) and stay untouched.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
compile-ghc-win64 was only linking the C_*.cpp wrappers but missed the underlying SDK sources (LZMA 7z24, 7z native reader, libzstd), which caused undefined-symbol errors for LzmaEnc_*, ZSTD_*, darc_7z_*.
- compile-win64-c: compile LZMA/7z24 C sources, 7z SDK with sdk_* prefix, and libzstd common/compress/decompress *.c files
- compile-ghc-win64: link the new objects, add --allow-multiple-definition for the overlapping LZMA/Alloc symbols between 7z24 and 7z/sdk
Verified: arc-win64.exe roundtrip passes for -m0, -m1..9, -mlzma2, -mppmd, -mzstd, -mbsc, -mx, encryption. Linux<->Win64 interop works for storing, LZMA, LZP, BSC, zstd, LZMA2. PPMD cross-arch remains incompatible (pre-existing); Windows and Linux users can roundtrip on their own platform.
Win64 cross-compile used generic -O2 -std=c++17 for all PPMD sources, while Compression/PPMD/makefile (Linux) uses -O1 -fstrict-aliasing -fno-exceptions -fno-rtti -fomit-frame-pointer -funroll-loops. The flag mismatch produced divergent codegen for the (WORD&) SWAP pattern in Model.cpp, making PPMD archives byte-incompatible between Linux and Win64 builds (decoder hung at 0% on cross-extract). Add a per-directory override in build() so */PPMD/* sources get the exact Linux flags. Verified: PPMD archives produced by arc (Linux) and arc-win64.exe now byte-identical; bidirectional extraction works. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… smaller than GHC)
Add `compile-mhs-win64` build path that produces `Tests/arc-mhs-win64.exe`
linked statically against mingw runtime. End-to-end roundtrip verified in
Wine for storing/lzma/ppmd/default/encrypted, plus cross-format interop
with the Linux MHS build.
Critical runtime fix: MHS's `openBinaryFileM` calls `fopen("r"/"w")` with
no `b` flag, so Windows opens in text mode and corrupts archive bytes.
New `compat-ghc/MhsBinaryOpen.hs` reimplements `openBinaryFile` using
`System.IO.Internal` primitives and `fopen("rb"/"wb"/"ab"/"w+b")`.
`Files.hs` routes `fOpen`/`fCreate`/`fCreateRW` through it under MHS+WIN.
Charsets defaults: gate the FREEARC_WIN block with `&& !defined(__MHS__)`
so MHS-Win uses UTF-8 defaults, matching the CString filesystem API the
Linux-MHS path already exposes.
Module-level CPP gates restructured across Errors/Files/FileInfo/CUI/
Charsets so MHS-Win falls through portable code paths instead of pulling
in GHC-only Win32/Posix bindings (System.Win32.Types, Win32Files,
GHC.ConsoleHandler, CWString, stdcall imports).
`compile-win64-c` accepts `MHS=1` to add `-D__MHS__` to the C objects so
they ABI-match the MHS-emitted Haskell.
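The text-mode problem described in this commit is easy to reproduce at the C level: on Windows, opening without `b` translates line endings and treats 0x1A as end-of-file, while on POSIX systems the flag is accepted and has no effect. The sketch below shows only the binary-mode calls; `write_binary` and `read_binary` are hypothetical helpers, not part of MhsBinaryOpen.hs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write `n` bytes to `path` using an explicit binary mode.
   Returns 0 on success, -1 on failure. */
int write_binary(const char *path, const unsigned char *data, size_t n)
{
    assert(path != NULL && data != NULL);
    FILE *f = fopen(path, "wb");           /* "w" alone is text mode on Windows */
    if (f == NULL) return -1;
    size_t written = fwrite(data, 1, n, f);
    return (fclose(f) == 0 && written == n) ? 0 : -1;
}

/* Read up to `cap` bytes from `path` in binary mode.
   Returns the number of bytes read, or -1 on failure. */
long read_binary(const char *path, unsigned char *dst, size_t cap)
{
    assert(path != NULL && dst != NULL);
    FILE *f = fopen(path, "rb");           /* "r" alone is text mode on Windows */
    if (f == NULL) return -1;
    size_t got = fread(dst, 1, cap, f);
    int failed = ferror(f);
    fclose(f);
    return failed ? -1 : (long)got;
}
```

A round trip of bytes such as 0x0D 0x0A and 0x1A through these helpers should come back unchanged on every platform, which is the property archive files depend on.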
Contributor
Author
All set!
Owner
great, thanks!
Summary
Full port of DArc to MicroHs + modernized codec stack + FreeArc 0.67 wire-format compatibility.

MicroHs runtime fixes
- Replaced `Chan` (broken in MicroHs due to `readMVar`/`put_mvar` semantics) with a custom `OurChan` using `takeMVar` on holes, applied to both the forward channel and `inner_back` back-channel in `createP`
- `newChan` → `newEmptyMVar` for the per-block backdoor channel (one write per block, no buffering needed)
- `deCompressProcess` closure fix: moved `copyData`/`processNextInstruction` out of a `where` into a shared `let` with explicit parameters (MicroHs closure-capture bug in recursive let-bound functions)
- … (`#ifdef __MHS__`): MicroHs cannot safely re-enter `ffe_eval` from C callbacks; under `__MHS__`, real C-based compressors (LZMA, PPMD, ...) collect input with `collectInputMHS` then call `compressMem`/`decompressMem` (no callbacks). Storing and fake methods still use the streaming path
- … `g_volfile_slots[64]`) and pointer out-params for 64-bit returns
- `hGetBuf`/`hPutBuf` bypassed via `darc_bfile_read`/`darc_bfile_write` (MHS `evalint` crash)

C hot path (main performance win)
- `updateCRC` byte-loop was the extraction bottleneck; rewritten in C

Codecs
- … `-mbsc`, coexists with GRZip)
- … `-mzstd`)
- … `DWORD = unsigned long = 8 bytes` on Linux x64, PR Translate Russian comments to English #2)
- … `Compression/PPMD/makefile` flags exactly (`-O1 -fstrict-aliasing -fno-exceptions -fno-rtti -fomit-frame-pointer -funroll-loops`). Any mismatch on the `(WORD&)` SWAP pattern in `Model.cpp` makes encoder output diverge; Linux↔Win64 PPMD archives now byte-identical
- … `Arc7z.hs`: list / extract / test. Create/update via `7zz` fork.
- … `-msrep`, vendored under `srep/`)

FreeArc 0.67 compatibility
- … `ArhiveDirectory.hs` (DArc vs FA 0.67.1) confirmed no extended directory tags; the wire gap was just 3 data transforms, now ported:
  - `remove_unsafe_dirs` + `make_OS_native_path` on `readDir` (also closes a path-traversal hole)
  - `unixifyPath` on `writeDir` (cross-OS interop)
- `aARCHIVE_VERSION` stays at `0.51` intentionally: format is already identical, bumping would break 0.51 readers
- `--arc-32bit-legacy` flag: reads archives produced by 32-bit FreeArc/Arc.exe 0.67 (4-byte `Int`/`CTime` stride, `Storable` stride quirk on i386)
- … `aARCHIVE_SIGNATURE`, `aSCAN_MAX`, `aTAG_END`, `block_name`, `isNonSolidMethod`, `isMemoryBarrier_*`, `getMin{Compression,Decompression}Mem`
- … `--nodates`, `--shutdown`/`-ioff`, `-ao`/`--SelectArchiveBit` (Win32 functional, Linux stub), `-ac`/`--ClearArchiveBit`
- … `UNSUPPORTED_METHOD`, `DATA_ERROR`, `DATA_ERROR_ENCRYPTED`, `BAD_CRC_ENCRYPTED`, `UNKNOWN_ERROR`

Large-archive support
- … `darc_volfile_*`): opens `archive.001...NNN` as one logical stream, no disk duplication. Critical for 10–100 GB archives.

Cross-platform builds
- … `./compile`)
- … `./compile-ghc`): archives 100% compat with MHS build
- … `./compile-ghc-win64`): full PE32+ x86-64 binary (`Tests/arc-win64.exe`, 15M stripped). LZMA 7z SDK 24.09, libzstd 1.5.6, native 7z reader SDK all cross-built; runtime DLLs (`libc++`, `libunwind`, `libwinpthread`) shipped alongside. Cross-arch archive compatibility verified end-to-end (storing/LZMA/LZMA2/PPMD/BSC/zstd/encryption).
- … `[Word8]`, not `peekCStringLen`: locale encoding was corrupting keys
- … https://github.com/DavidLee18/DArc, credit line added

Benchmark (500 MB test-files.tar)

Test plan
- `arc a -m0 archive.arc files/`: storing round-trip
- `arc a -mlzma archive.arc files/`: LZMA round-trip
- `arc a -mppmd archive.arc files/`: PPMD round-trip
- `arc a -mbsc / -mzstd / -mlzma2 archive.arc files/`: new codecs
- `arc a archive.arc -p<key> files/` + `arc x -p<key>`: encryption
- `arc x archive.arc`: extraction
- … `mhs` on Linux x86-64 (`./compile`)
- … `./compile-ghc`)
- … `./compile-ghc-win64`): `Tests/arc-win64.exe` PE32+ x86-64
- … `arc a -m9` → `arc x` → md5 match
- … `--arc-32bit-legacy`
- … `darc_volfile_*`
- … `remove_unsafe_dirs`)
- … `--arc-32bit-legacy`)
- … `./compile-mhs-win64` produces `Tests/arc-mhs-win64.exe` (4.08 MB PE32+ x86-64, 5.5× smaller than the 22 MB GHC build, deps: only system DLLs). Wine roundtrip OK for storing/LZMA/PPMD/default/encrypted; cross-format interop with Linux MHS verified. Strategy: gate GHC-only Win32 paths with `&& !defined(__MHS__)` so MHS-Win falls through to the portable Linux-MHS path (Handle/BFILE + CString FFI) instead of porting `System.Win32` manually. Touched: `Errors.hs`, `Files.hs`, `Charsets.hs`, `FileInfo.hs`, `CUI.hs`. New: `compat-ghc/MhsBinaryOpen.hs` works around an MHS runtime bug: `openBinaryFileM` calls `fopen` with `"r"`/`"w"` (no `b`), corrupting binaries on Windows; the shim uses `"rb"`/`"wb"`/`"ab"`/`"w+b"` via `System.IO.Internal` primitives.

🤖 Generated with Claude Code