
Port DArc to MicroHs: pipeline fixes + LZMA compression/decompression #28

Merged
DavidLee18 merged 26 commits into DavidLee18:main from YadeWira:main on Apr 26, 2026

YadeWira commented Apr 11, 2026

Summary

Full port of DArc to MicroHs + modernized codec stack + FreeArc 0.67 wire-format compatibility.

MicroHs runtime fixes

  • OurChan in Process.hs: replaced Chan (broken in MicroHs due to readMVar/put_mvar semantics) with a custom OurChan using takeMVar on holes, applied to both the forward channel and inner_back back-channel in createP
  • Backdoor channel in ArcCreate.hs: newChan → newEmptyMVar for the per-block backdoor channel (one write per block, so no buffering is needed)
  • deCompressProcess closure fix: moved copyData / processNextInstruction out of a where into a shared let with explicit parameters (MicroHs closure-capture bug in recursive let-bound functions)
  • Buffer-to-buffer compression/decompression (#ifdef __MHS__): MicroHs cannot safely re-enter ffe_eval from C callbacks; under __MHS__, real C-based compressors (LZMA, PPMD, ...) collect input with collectInputMHS then call compressMem / decompressMem (no callbacks). Storing and fake methods still use the streaming path
  • FFI truncation workaround: MicroHs FFI truncates both int and pointer return values to 32 bits — fixed via int-slot handle pattern (g_volfile_slots[64]) and pointer out-params for 64-bit returns
  • I/O primitives: hGetBuf/hPutBuf bypassed via darc_bfile_read / darc_bfile_write (MHS evalint crash)
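The int-slot handle pattern from the FFI bullet can be pictured as a small C table. This is a sketch under assumptions, not DArc's code: `slot_put`, `slot_get` and `slot_free` are hypothetical names (the PR only names `g_volfile_slots[64]`). C keeps the real pointer, and the Haskell side only ever receives a small `int` index, which a 32-bit-truncated FFI return cannot corrupt.

```c
#include <stddef.h>

/* Hypothetical int-slot handle table: real pointers stay in C,
   callers only see a small index in [0, SLOT_COUNT). */
#define SLOT_COUNT 64
static void *g_slots[SLOT_COUNT];

/* Store p in the first free slot; return its index, or -1 if the table is full. */
int slot_put(void *p) {
    for (int i = 0; i < SLOT_COUNT; i++) {
        if (g_slots[i] == NULL) {
            g_slots[i] = p;
            return i;
        }
    }
    return -1;
}

/* Look up a slot; returns NULL for out-of-range or empty slots. */
void *slot_get(int i) {
    if (i < 0 || i >= SLOT_COUNT) return NULL;
    return g_slots[i];
}

/* Release a slot for reuse. The caller still owns and frees the object. */
void slot_free(int i) {
    if (i >= 0 && i < SLOT_COUNT) g_slots[i] = NULL;
}
```

Because every handle fits in a few bits, the index survives any return-value truncation, and out-of-range indices are rejected rather than dereferenced.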

C hot path (main performance win)

  • Compression + extraction both moved out of per-chunk Haskell loops into single C functions
  • 230–650× compression speedup and ~300× extraction speedup, measured against the per-chunk Haskell loops they replaced
  • updateCRC byte-loop was the extraction bottleneck — rewritten in C
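The updateCRC rewrite can be illustrated with a table-driven CRC-32. This sketch assumes the common CRC-32 convention (reflected polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF, check value 0xCBF43926 for "123456789"); DArc's updateCRC may differ in those details, and all function names here are hypothetical. It pairs the byte-at-a-time loop with a slice-by-8 variant, the technique a later commit in this PR names for `Environment.cpp`.

```c
#include <stddef.h>
#include <stdint.h>

/* crc_tab[k][i]: CRC register contribution of byte i followed by k zero bytes. */
static uint32_t crc_tab[8][256];
static int crc_ready = 0;

static void crc32_init(void) {
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        crc_tab[0][i] = c;
    }
    for (int k = 1; k < 8; k++)
        for (int i = 0; i < 256; i++)
            crc_tab[k][i] = (crc_tab[k - 1][i] >> 8) ^ crc_tab[0][crc_tab[k - 1][i] & 0xFF];
    crc_ready = 1;
}

/* Byte-at-a-time update: one table lookup per input byte. */
uint32_t crc32_bytewise(uint32_t crc, const uint8_t *p, size_t n) {
    if (!crc_ready) crc32_init();
    while (n--) crc = (crc >> 8) ^ crc_tab[0][(crc ^ *p++) & 0xFF];
    return crc;
}

/* Endian-independent little-endian 32-bit load (no alignment assumptions). */
static uint32_t load_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Slice-by-8: consume 8 bytes per iteration with 8 independent lookups. */
uint32_t crc32_slice8(uint32_t crc, const uint8_t *p, size_t n) {
    if (!crc_ready) crc32_init();
    while (n >= 8) {
        uint32_t one = load_le32(p) ^ crc;
        uint32_t two = load_le32(p + 4);
        crc = crc_tab[7][one & 0xFF] ^ crc_tab[6][(one >> 8) & 0xFF] ^
              crc_tab[5][(one >> 16) & 0xFF] ^ crc_tab[4][one >> 24] ^
              crc_tab[3][two & 0xFF] ^ crc_tab[2][(two >> 8) & 0xFF] ^
              crc_tab[1][(two >> 16) & 0xFF] ^ crc_tab[0][two >> 24];
        p += 8;
        n -= 8;
    }
    return crc32_bytewise(crc, p, n);  /* remaining 0..7 bytes */
}

/* Standard CRC-32 of a whole buffer. */
uint32_t crc32_buf(const void *data, size_t n) {
    return crc32_slice8(0xFFFFFFFFu, (const uint8_t *)data, n) ^ 0xFFFFFFFFu;
}
```

Both functions compute the same register value; slice-by-8 just breaks the serial dependency on `crc` so the eight lookups per iteration can overlap in the CPU pipeline.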

Codecs

  • LZMA 24.09 (current 7-Zip) + LZMA2
  • LZ4 1.10.0
  • libbsc 3.3.12 (-mbsc, coexists with GRZip)
  • zstd 1.5.6 (-mzstd)
  • DisPack (ported from FA 0.67)
  • PPMD 64-bit fix (root cause: DWORD = unsigned long = 8 bytes on Linux x64; see PR #2)
  • PPMD cross-arch flags fix: Win64 cross-compile must match Linux Compression/PPMD/makefile flags exactly (-O1 -fstrict-aliasing -fno-exceptions -fno-rtti -fomit-frame-pointer -funroll-loops). Any mismatch on the (WORD&) SWAP pattern in Model.cpp makes encoder output diverge — Linux↔Win64 PPMD archives now byte-identical
  • Native .7z read (SDK 26.00) via Arc7z.hs — list / extract / test. Create/update via 7zz fork.
  • SREP 3.93a as external compressor (-msrep, vendored under srep/)
  • 4×4 multi-threaded compressor

FreeArc 0.67 compatibility

  • Wire format bit-compatible with FA 0.67 (read + write). Source diff of ArhiveDirectory.hs (DArc vs FA 0.67.1) confirmed no extended directory tags — the wire gap was just 3 data transforms, now ported:
    • remove_unsafe_dirs + make_OS_native_path on readDir (also closes a path-traversal hole)
    • unixifyPath on writeDir (cross-OS interop)
  • aARCHIVE_VERSION stays at 0.51 intentionally — format is already identical, bumping would break 0.51 readers
  • --arc-32bit-legacy flag: reads archives produced by 32-bit FreeArc/Arc.exe 0.67 (4-byte Int/CTime stride, Storable stride quirk on i386)
  • Renames + helpers: aARCHIVE_SIGNATURE, aSCAN_MAX, aTAG_END, block_name, isNonSolidMethod, isMemoryBarrier_*, getMin{Compression,Decompression}Mem
  • New options: --nodates, --shutdown/-ioff, -ao/--SelectArchiveBit (Win32 functional, Linux stub), -ac/--ClearArchiveBit
  • New error types: UNSUPPORTED_METHOD, DATA_ERROR, DATA_ERROR_ENCRYPTED, BAD_CRC_ENCRYPTED, UNKNOWN_ERROR
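To show what the path-traversal fix guards against, here is a hypothetical C sanitizer in the spirit of `remove_unsafe_dirs` + `unixifyPath`. The exact rules of those Haskell functions are not stated in this PR. This sketch assumes that backslashes become `/`, a drive prefix such as `C:` is stripped, and empty, `.` and `..` components are dropped, so an entry like `../../etc/passwd` can only land inside the extraction directory.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sanitizer: writes a relative, '/'-separated path to `out`.
   `out` must have room for strlen(in) + 1 bytes (the result is never longer). */
void sanitize_path(const char *in, char *out) {
    size_t o = 0;
    const char *p = in;
    /* Skip a drive-letter prefix like "C:". */
    if (((p[0] >= 'A' && p[0] <= 'Z') || (p[0] >= 'a' && p[0] <= 'z')) && p[1] == ':')
        p += 2;
    while (*p) {
        /* Find the next component, treating '/' and '\\' alike. */
        const char *start = p;
        while (*p && *p != '/' && *p != '\\') p++;
        size_t len = (size_t)(p - start);
        int unsafe = len == 0 ||
                     (len == 1 && start[0] == '.') ||
                     (len == 2 && start[0] == '.' && start[1] == '.');
        if (!unsafe) {
            if (o > 0) out[o++] = '/';
            memcpy(out + o, start, len);
            o += len;
        }
        if (*p) p++;  /* skip the separator */
    }
    out[o] = '\0';
}
```

Dropping `..` instead of resolving it (so `a/../b` becomes `a/b`) is the conservative choice: the sanitizer never needs to know the current directory and the result can never climb above the extraction root.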

Large-archive support

  • Multi-volume virtual reader (darc_volfile_*): opens archive.001...NNN as one logical stream, no disk duplication. Critical for 10–100 GB archives.
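The core of a virtual multi-volume reader is mapping one logical offset onto a (volume, local offset) pair. The sketch below is hypothetical: `volume_locate` is not one of the `darc_volfile_*` functions, and it assumes the volume sizes are known up front (for example from a stat of each `.NNN` file).

```c
#include <stddef.h>
#include <stdint.h>

/* Translate a logical offset into (volume index, offset inside that volume).
   sizes[i] is the byte size of volume i. Returns 0 on success, -1 past EOF. */
int volume_locate(const int64_t *sizes, size_t nvol, int64_t pos,
                  size_t *vol, int64_t *local) {
    if (pos < 0) return -1;
    for (size_t i = 0; i < nvol; i++) {
        if (pos < sizes[i]) {
            *vol = i;
            *local = pos;
            return 0;
        }
        pos -= sizes[i];  /* skip over this whole volume */
    }
    return -1;
}
```

A read that crosses a boundary is then split: read up to `sizes[v] - local` bytes from volume `v`, and continue at offset 0 of volume `v + 1`. No volume data is ever copied to a temporary file.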

Cross-platform builds

  • Primary: MicroHs on Linux x86-64 (./compile)
  • Alternative: GHC 9.4.7 on Linux x86-64 (./compile-ghc) — archives 100% compat with MHS build
  • Cross-compile GHC → Win64 (./compile-ghc-win64) — full PE32+ x86-64 binary (Tests/arc-win64.exe, 15M stripped). LZMA 7z SDK 24.09, libzstd 1.5.6, native 7z reader SDK all cross-built; runtime DLLs (libc++, libunwind, libwinpthread) shipped alongside. Cross-arch archive compatibility verified end-to-end (storing/LZMA/LZMA2/PPMD/BSC/zstd/encryption).
  • GHC encryption fix (Linux): marshal binary strings via [Word8], not peekCStringLen — locale encoding was corrupting keys
  • Wire format 100% compat Linux ↔ Windows (Int/CTime fixed-width 64-bit)
  • Banner: URL → https://github.com/DavidLee18/DArc, credit line added

Benchmark (500 MB test-files.tar)

| Level | DArc size (bytes) | DArc time | FreeArc size (bytes) | FreeArc time | Notes |
|-------|------------------:|----------:|---------------------:|-------------:|-------|
| -m0 | 500,000,211 | 1.50 s | 500,000,231 | 1.95 s | tie (storing) |
| -m1 | 381,658,235 | 2.06 s | 384,610,498 | 2.98 s | DArc wins (−0.77% size, 1.45× faster) |
| -m2 | 357,149,769 | 11.23 s | 364,527,059 | 6.04 s | DArc −2.03% size; FA 1.86× faster |
| -m3 | 343,673,207 | 10.25 s | 364,944,660 | 30.23 s | DArc wins (−5.83% size, 2.95× faster) |
| -m4 | 332,754,879 | 21.65 s | 355,938,121 | 35.02 s | DArc wins (−6.51% size, 1.62× faster) |
| -m5 | 332,699,299 | 21.49 s | 354,629,982 | 35.67 s | DArc wins (−6.19% size, 1.66× faster) |
| -m6 | 332,253,450 | 25.09 s | 329,512,687 | 137.95 s | FA −0.83% size; DArc 5.50× faster |
| -m7 | 331,875,456 | 45.33 s | 329,308,596 | 142.74 s | FA −0.78% size; DArc 3.15× faster |
| -m8 | 331,568,963 | 82.87 s | 328,871,588 | 151.86 s | FA −0.82% size; DArc 1.83× faster |
| -m9 | 331,346,258 | 132.48 s | 328,686,046 | 149.89 s | FA −0.80% size; DArc 1.13× faster |
| -mx | 331,346,258 | 134.93 s | 328,674,172 | 149.72 s | alias of max preset; FA −0.81% size; DArc 1.11× faster |

Test plan

  • arc a -m0 archive.arc files/ — storing round-trip
  • arc a -mlzma archive.arc files/ — LZMA round-trip
  • arc a -mppmd archive.arc files/ — PPMD round-trip
  • arc a -mbsc / -mzstd / -mlzma2 archive.arc files/ — new codecs
  • arc a archive.arc -p<key> files/ + arc x -p<key> — encryption
  • arc x archive.arc — extraction
  • Multi-file solid blocks
  • Build with mhs on Linux x86-64 (./compile)
  • Build with GHC 9.4.7 on Linux x86-64 (./compile-ghc)
  • Cross-compile GHC → Win64 (./compile-ghc-win64) — Tests/arc-win64.exe PE32+ x86-64
  • Cross-arch archive interop Linux↔Win64 byte-identical (storing/LZMA/LZMA2/PPMD/BSC/zstd/encryption)
  • Roundtrip arc a -m9 → arc x → md5 match
  • Read FreeArc/Arc.exe 0.67 32-bit archives with --arc-32bit-legacy
  • Multi-volume read via darc_volfile_*
  • Path-traversal sanitization (remove_unsafe_dirs)
  • DArc86 (Win32) with isolated GHC 8.6.5 — verified on real Win7 SP1 x64 (roundtrip + Linux↔arc86 PPMD interop via --arc-32bit-legacy)
  • MHS → Win64 port — ./compile-mhs-win64 produces Tests/arc-mhs-win64.exe (4.08 MB PE32+ x86-64, 5.5× smaller than the 22 MB GHC build, deps: only system DLLs). Wine roundtrip OK for storing/LZMA/PPMD/default/encrypted; cross-format interop with Linux MHS verified. Strategy: gate GHC-only Win32 paths with && !defined(__MHS__) so MHS-Win falls through to the portable Linux-MHS path (Handle/BFILE + CString FFI) instead of porting System.Win32 manually. Touched: Errors.hs, Files.hs, Charsets.hs, FileInfo.hs, CUI.hs. New: compat-ghc/MhsBinaryOpen.hs works around an MHS runtime bug — openBinaryFileM calls fopen with "r"/"w" (no b), corrupting binaries on Windows; the shim uses "rb"/"wb"/"ab"/"w+b" via System.IO.Internal primitives.

🤖 Generated with Claude Code

YadeWira and others added 10 commits April 11, 2026 09:56
…pression

Root cause: MicroHs's ffe_eval/evali uses longjmp for green-thread blocking,
which causes re-entrancy when a Haskell callback called from C blocks on a
takeMVar.  In the LZMA path, the read callback inside ffe_eval blocks waiting
for storingProcess to provide data; MHS then schedules storingProcess, which
sends NoMoreData, but instead of resuming the original blocked callback it
re-enters ffe_eval with a fresh callback invocation that sees an empty pipe
and returns 0 bytes — so LZMA compresses nothing.

Fix: for any compression method that goes through C (not storing/fake),
use compressMem/decompressMem (buffer-to-buffer, no callbacks) instead of
the streaming compress/decompress (which requires Haskell callbacks from C).

- deCompressProcess (#ifdef __MHS__): collect all uncompressed input via the
  pipe reader in Haskell (safe blocking, no ffe_eval), then call compressMem,
  then send the compressed output forward via sendP/receive_backP.
- decompressBlock (#ifdef __MHS__): collect all compressed bytes via reader
  (archive reads, safe), then call decompressMem, then feed decompressed data
  directly to writer (decompressStep handles the file dispatch via the outer
  pipe in normal Haskell threads).
- collectInputMHS: new helper that drains a reader into a contiguous malloc'd
  buffer by reading 65536-byte chunks until EOF; handles files of any size.
- Add Foreign.Marshal.Alloc import and compressMem to CompressionLib imports.

Verified: arc a -mlzma and arc x round-trip correctly for single files,
multi-file solid blocks, and files >65536 bytes (multi-chunk collection).
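`collectInputMHS` is Haskell, but its logic is easy to restate in C. The sketch below is illustrative only: `reader_fn`, `collect_input`, `mem_src` and `mem_read` are hypothetical names. It keeps the 65536-byte chunk size from the commit message and doubles the buffer whenever less than one chunk of free space remains, so appending stays amortized O(1) for inputs of any size.

```c
#include <stdlib.h>
#include <string.h>

/* A reader fills buf with up to `size` bytes; returns bytes read, 0 at EOF, <0 on error. */
typedef int (*reader_fn)(void *ctx, char *buf, int size);

#define CHUNK 65536

/* Drain `rd` into one contiguous malloc'd buffer until EOF.
   Returns the buffer (caller frees) and its length in *out_len; NULL on error. */
char *collect_input(reader_fn rd, void *ctx, size_t *out_len) {
    size_t cap = CHUNK, len = 0;
    char *buf = malloc(cap);
    if (!buf) return NULL;
    for (;;) {
        if (cap - len < CHUNK) {            /* ensure room for one more chunk */
            char *bigger = realloc(buf, cap * 2);
            if (!bigger) { free(buf); return NULL; }
            buf = bigger;
            cap *= 2;
        }
        int n = rd(ctx, buf + len, CHUNK);
        if (n < 0) { free(buf); return NULL; }
        if (n == 0) break;                  /* EOF */
        len += (size_t)n;
    }
    *out_len = len;
    return buf;
}

/* Example reader over an in-memory source, standing in for the pipe reader. */
struct mem_src { const char *data; size_t size, pos; };

int mem_read(void *ctx, char *buf, int size) {
    struct mem_src *s = ctx;
    size_t left = s->size - s->pos;
    size_t n = left < (size_t)size ? left : (size_t)size;
    memcpy(buf, s->data + s->pos, n);
    s->pos += n;
    return (int)n;
}
```

Once the whole input sits in one buffer, a single buffer-to-buffer call such as `compressMem` can run without any callback into the Haskell runtime, which is the point of the MHS workaround described above.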

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add MHS buffer-to-buffer compression path in ArcvProcessCompress:
  collects pipe DataChunks into a buffer (acking each to unblock the
  producer), then applies compressMem chain in forward order
- Fix MHS type inference: annotate times MVar with explicit element type
  to resolve Show constraint at uiFinishDeCompression call site
- Fix MHS type inference: annotate numeric literals (0 :: Integer/Int)
  in decompressBlock (startPos, writer NoMoreData, result ref)
- Remove all debug hPutStrLn stderr traces from ArcvProcessExtract,
  ArcvProcessCompress, and Compression/CompressionLib
- Restore aDEFAULT_DIR_COMPRESSION = "lzma:bt4:1m" (was "storing")
- Set aDEFAULT_COMPRESSOR = "lzma" under __MHS__; the numeric level "4"
  expands to dict+lzp+ppmd for text files; PPMD has a C-level bug in its
  in-memory CompressMem interface that causes SIGSEGV
- Add CPP pragma and MHS-specific imports to ArcvProcessCompress
- Multi-method compression chains (e.g. bcj+lzma) now work in MHS

Tested: arc a / arc e round-trips with lzma, bcj+lzma, -mstoring all
pass. Extracted files byte-for-byte identical to originals.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… shims

Fix PPMD compression/decompression crash on 64-bit Unix. Root cause:
PPMdType.h typedef'd DWORD as unsigned long (8 bytes on LP64), but the
PPMd algorithm assumes 32-bit throughout: PPM_CONTEXT (must be 12 bytes
= UNIT_SIZE), arithmetic coder variables (must wrap at 32 bits), and
BLK_NODE free-list management (sizeof(MEM_BLK) must equal UNIT_SIZE).

Changes:
- PPMdType.h: DWORD = unsigned int (guaranteed 32-bit)
- SubAlloc.hpp: CTX_REF/STATE_REF as unsigned int for 12-byte contexts;
  BLK_NODE.next as 4-byte heap ref (BLKREF) so BLK_NODE=8, MEM_BLK=12
- Model.cpp: 3-DWORD context copy uses unsigned int; pointer fields
  replaced with 1-based heap refs + PPCTX/RPCTX/PPSTAT/RPSTAT helpers
- Coder.hpp: range coder variables as unsigned int for 32-bit overflow
- Encryption.hs: use darc_urandom_read C helper under MHS (bypasses
  broken hGetBuf)
- Environment.cpp/h: add darc_urandom_read helper
- compat-ghc/: MicroHs compatibility shims for GHC-only modules
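The constraints listed in this commit can be encoded as compile-time checks. This is a simplified sketch, not the real `PPMdType.h` or `SubAlloc.hpp`: `ctx12`, `add_wrap`, `to_ref` and `from_ref` are hypothetical. It shows the three ideas: a fixed-width 32-bit `DWORD`, a 12-byte unit built from three such fields, and 1-based 32-bit heap references standing in for 8-byte pointers.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed-width DWORD: `unsigned long` is 8 bytes on LP64 Linux. */
typedef uint32_t DWORD;

/* Simplified stand-in for a 12-byte context: three 32-bit fields. */
typedef struct {
    DWORD a, b, c;
} ctx12;

_Static_assert(sizeof(DWORD) == 4, "PPMd needs a 32-bit DWORD");
_Static_assert(sizeof(ctx12) == 12, "contexts must be exactly UNIT_SIZE == 12 bytes");

/* Range-coder style arithmetic relies on wrapping modulo 2^32. */
DWORD add_wrap(DWORD x, DWORD y) { return x + y; }

/* 1-based heap reference: 0 means "null", otherwise offset from heap base + 1.
   Fits in a DWORD, unlike a raw pointer on x86-64. */
DWORD to_ref(const unsigned char *heap, const void *p) {
    return p ? (DWORD)((const unsigned char *)p - heap) + 1 : 0;
}

const void *from_ref(const unsigned char *heap, DWORD ref) {
    return ref ? (const void *)(heap + (ref - 1)) : NULL;
}
```

The `_Static_assert` lines make the original bug (an 8-byte `DWORD` silently inflating every context) a build failure instead of a runtime crash.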

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Port DArc to MicroHs: full compression pipeline
… -m4

dict_decompress used checked_read for the block header, which treated
EOF (0 bytes read) as an I/O error. This broke all numeric presets -m3
through -m9 (which use dict+lzp+ppmd chains) when decompressing via
buffer-to-buffer DecompressMem. Replace with explicit EOF check so
0-byte read at block boundary returns success.
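The distinction this fix relies on (no bytes at all versus a partial header) can be sketched as a three-way result. The in-memory `stream` type and `read_block_header` are hypothetical; the real `dict_decompress` reads through FreeArc's callback I/O.

```c
#include <stddef.h>
#include <string.h>

/* Minimal in-memory input stream for the sketch. */
struct stream { const unsigned char *data; size_t size, pos; };

/* Read up to n bytes; returns how many were actually read (0 at EOF). */
static size_t stream_read(struct stream *s, void *dst, size_t n) {
    size_t left = s->size - s->pos;
    if (n > left) n = left;
    memcpy(dst, s->data + s->pos, n);
    s->pos += n;
    return n;
}

/* Returns 1 for a full header, 0 for a clean EOF exactly at a block
   boundary, and -1 for a truncated header. A "checked read" that demands
   all hdr_size bytes would report an error in both of the last two cases. */
int read_block_header(struct stream *s, unsigned char *hdr, size_t hdr_size) {
    size_t got = stream_read(s, hdr, hdr_size);
    if (got == 0) return 0;          /* EOF at a block boundary: success */
    if (got < hdr_size) return -1;   /* partial header: real error */
    return 1;
}
```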

With dict working, restore MicroHs default compressor from "lzma" to
"4" (dict+lzp+ppmd), matching the GHC build default.

Tested: all presets -m0..-m9, files up to 7MB, arc a/e/l/t/d/m/j.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
MicroHs truncates all FFI return values to 32 bits, even for long and
long long on LP64 (where they are 8 bytes). This causes file sizes >2GB
to overflow, breaking compression of large files.

Fix: add _w wrapper functions in Environment.cpp that write 64-bit
results via pointer parameter instead of returning them. Update all
Haskell FFI call sites to use alloca/peek pattern.

Affected functions: darc_bfile_tell, darc_bfile_size, darc_bfile_read,
darc_bfile_write, darc_st_size, darc_st_mtime, darc_time,
darc_mktime_tz, darc_urandom_read.

Tested: file size now correctly displays 2,290,194,432 for a 2.2GB file
(was showing -2,004,772,864 before this fix).
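The figures above are exactly what 32-bit truncation predicts: 2,290,194,432 − 2^32 = −2,004,772,864. The sketch below reproduces that arithmetic and shows the shape of an out-parameter wrapper. `truncate_to_int32` and `file_size_w` are hypothetical stand-ins for the `_w` functions listed (for example `darc_bfile_size`), not their real signatures.

```c
#include <stdint.h>
#include <stdio.h>

/* Model a 32-bit-truncating FFI return: keep the low 32 bits and read them
   as a signed int, without relying on implementation-defined casts. */
int64_t truncate_to_int32(int64_t v) {
    uint32_t low = (uint32_t)v;   /* well-defined: reduction modulo 2^32 */
    return low >= 0x80000000u ? (int64_t)low - 4294967296LL : (int64_t)low;
}

/* "_w" style wrapper: the 64-bit result goes through a pointer that the
   caller allocated (alloca on the Haskell side, then peek); the int return
   value only carries a small status code, which survives truncation. */
int file_size_w(FILE *f, int64_t *out) {
    long cur = ftell(f);
    if (cur < 0 || fseek(f, 0, SEEK_END) != 0) return -1;
    long end = ftell(f);  /* 64-bit on LP64 Linux; LLP64 Windows would need _ftelli64 */
    if (end < 0 || fseek(f, cur, SEEK_SET) != 0) return -1;
    *out = (int64_t)end;
    return 0;
}
```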

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…fer tuning

Replace MHS compressMem/decompressMem buffer-to-buffer approach with
streaming C-side pipeline using Compress()/Decompress() via callbacks.
This handles >2GB data (streaming vs int-sized CompressMem), reduces
Haskell pipe iteration overhead for large files (8MB buffers instead
of 64KB), and moves the compression loop entirely into native C.

Key changes:
- Environment.cpp: add darc_pipeline_{init,append,compress_step_w,
  decompress_step_w,get_buf_w,free} with growing buffer + streaming
  callback
- ArcvProcessCompress.hs: 3-phase MHS compress (collect→C compress→write)
- ArcvProcessExtract.hs: 3-phase MHS decompress for both decompressBlock
  and deCompressProcess
- Files.hs: increase aBUFFER_SIZE to 8MB and aLARGE_BUFFER_SIZE to 64MB
  under MHS to minimize pipe iterations
- CUI.hs: under MHS, wake the indicator thread every 10s instead of every
  0.5s to avoid expensive green thread context switches

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Port FreeArc 0.67 4x4 block-MT compressor (Compression/4x4/C_4x4.{h,cpp})
  using pthread with thread-safe per-job MemCB; wire format int32 version +
  per-block int32 orig_size (-1=raw) | int32 comp_size | payload
- CRC-32 slice-by-8 in Environment.cpp (was silently byte-by-byte because
  PRESENT_UINT32 never defined) — ~5x CRC throughput
- Threaded pipeline (reader/main/writer) in darc_compress_solid_block_w
- Skip block_crc computation for DATA_BLOCK (unused by Haskell reader)
- Wrap -m1..-mx in 4x4 with tuned per-level block sizes, ratio preserved:
  1xb=4x4:tor:3, 2xb=4x4:b16m:tor:16m:h64m, 3-9binary=4x4:bN:lzma:N:...
- Beats FreeArc 0.67 at every level on 100MB real data (1.3x to 4.2x)
- Restructure Environment.cpp guards so darc_bfile/pipeline/4x4 paths
  compile under FREEARC_WIN; add Windows compat for sysconf/realpath/utime/
  ftruncate/urandom via CryptGenRandom/_fullpath/_utime/_chsize_s
- All .cpp now cross-compile clean with x86_64-w64-mingw32-g++-posix
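The 4x4 container layout in the first bullet is simple enough to walk without decompressing anything. The sketch below is hypothetical (`walk_4x4` and `get_i32le` are illustrative names). It takes the field order from this commit message, reads `orig_size == -1` as "payload stored raw", and assumes little-endian `int32` fields, which the message does not state.

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian int32 decode without implementation-defined casts. */
static int32_t get_i32le(const unsigned char *p) {
    uint32_t u = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return u >= 0x80000000u ? (int32_t)((int64_t)u - 4294967296LL) : (int32_t)u;
}

/* Layout: int32 version, then per block: int32 orig_size (-1 = raw),
   int32 comp_size, comp_size payload bytes.
   Returns the block count and sums decompressed sizes (a raw block's size
   is its comp_size), or returns -1 if the data is malformed. */
long walk_4x4(const unsigned char *buf, size_t len, int32_t *version,
              int64_t *total_orig) {
    if (len < 4) return -1;
    *version = get_i32le(buf);
    size_t pos = 4;
    long blocks = 0;
    *total_orig = 0;
    while (pos < len) {
        if (len - pos < 8) return -1;                           /* truncated block header */
        int32_t orig = get_i32le(buf + pos);
        int32_t comp = get_i32le(buf + pos + 4);
        pos += 8;
        if (comp < 0 || (size_t)comp > len - pos) return -1;   /* bad payload size */
        if (orig < -1) return -1;
        *total_orig += (orig == -1) ? comp : orig;
        pos += (size_t)comp;                                    /* skip payload */
        blocks++;
    }
    return blocks;
}
```

Because each block records its own sizes, blocks can be located up front and handed to worker threads independently, which is what makes a per-block multi-threaded layout like this practical.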

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Raw pokeByteOff of a bare Int writes sizeof(Int) bytes — 4 on Win32, 8
elsewhere — while the position always advances by 8. On 32-bit systems
this wrote 4 bytes of uninitialized memory per Int and read only 4 bytes
(ignoring the upper 32 bits). Breaks archive format compat between 32- and
64-bit builds.

Force Int64 serialization explicitly. Same applied to CTime, whose inner
representation varies. Wire format is now identical across Linux x64,
Windows x64, and Windows x86.
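The commit fixes this on the Haskell side by serializing `Int64` explicitly. The same idea in C is to emit exactly eight bytes with shifts instead of copying a native-width integer. The sketch assumes little-endian order (the native order of the x86 builds mentioned); `put_i64le` and `get_i64le` are illustrative names.

```c
#include <stdint.h>

/* Always write exactly 8 bytes, little-endian, whatever sizeof(long) or
   sizeof(time_t) is on the build platform. */
void put_i64le(unsigned char *p, int64_t v) {
    uint64_t u = (uint64_t)v;  /* well-defined two's-complement bit pattern */
    for (int i = 0; i < 8; i++) p[i] = (unsigned char)(u >> (8 * i));
}

/* Read 8 little-endian bytes back into a signed 64-bit value. */
int64_t get_i64le(const unsigned char *p) {
    uint64_t u = 0;
    for (int i = 0; i < 8; i++) u |= (uint64_t)p[i] << (8 * i);
    /* Map back to signed without implementation-defined conversions. */
    return (u >> 63) ? -(int64_t)(~u) - 1 : (int64_t)u;
}
```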

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

DavidLee18 left a comment

seems good, but the tests are not run yet, right?

YadeWira and others added 2 commits April 12, 2026 01:59
- compile-ghc: GHC 9.4.7 Linux x64 build (analog to MHS compile)
- compile-win64-c: cross-mingw-w64 C++ objects for Win64
- compile-ghc-win64: Wine+GHC 9.4.8 Windows bindist cross-compile
- Win32Files.c: external FFI wrappers for HsBase INLINE helpers,
  __hscore_seek_* accessors, and UCRT CRT aliases
- compat-win/System/{Time,Locale}.hs: shims to avoid old-time/old-locale
  packages on the Windows GHC install
- Fix MVar Int->Integer annotations in ArcvProcess{Compress,Extract}

Three Win64-specific bugs fixed in Win32Files.c:
1. _wstati64/_fstati64 forward to _wstat32i64/_fstat32i64 (not
   _wstat64/_fstat64 which use 64-bit time_t with a different layout).
2. _wfindfirsti64/_wfindnexti64 use _wfindfirst64/_wfindnext64 with
   struct conversion, since Wine stubs the 32i64 variants.
3. Wrappers return HsInt (not int) so Haskell FFI sees sign-extended
   -1; without this, throwErrnoIfMinus1 never fires and fileExist
   reports nonexistent files as existing.

Verified: 100% archive cross-compat between Linux MHS, Linux GHC, and
Win64 builds (md5 identical in both directions, storing + LZMA).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove -DFREEARC_NOURL and link -lwininet. URL.cpp already had a
WinInet-based implementation for #ifdef FREEARC_WIN — no libcurl
needed on Windows since WinInet ships with the OS.

WININET.dll confirmed in arc-win64.exe import table.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
YadeWira commented

> seems good, but the tests are not run yet, right?

It's still in progress, so please don't confirm anything yet :D

- ArhiveStructure renames to match 0.67: aARCHIVE_SIGNATURE,
  aSCAN_MAX, aTAG_END, block_name (+ call sites in ArhiveDirectory).
- Compression.hs: add isNonSolidMethod, isMemoryBarrier_{Compression,
  Decompression} as CompressionLib.compressionIs wrappers.
- Options.hs + Cmdline.hs: add --nodates CLI flag.
- ArhiveDirectory.hs: nodates_ref IORef substitutes fiTime with
  aMINIMAL_POSSIBLE_DATETIME in directory block when --nodates is set.
- ArcCreate.hs: propagates opt_nodates to nodates_ref at archive start.

Roundtrip verified: an mtime of 2010-01-01 is stored as-is without the flag, and as the epoch with --nodates.
Vendored from github.com/Intensity/srep under srep/ (self-contained,
not sharing headers with DArc's Compression/). Built via srep/compile
to Tests/srep; registered in Installer/bin/arc.ini as
[External compressor:srep] with single-quoted arcdatafile template
(shell PID expansion guard).

Usage: arc a -msrep file.arc input   (3.93a: huge-dictionary LZ77
preprocessor for long-range dedup).
YadeWira commented Apr 12, 2026

It's still in progress, so please don't confirm anything yet :D
Do you want the new file format to be ".darc" or to remain ".arc"?

YadeWira and others added 7 commits April 12, 2026 13:34
--shutdown/-ioff: power off the computer after the operation completes.
Uses ExitWindowsEx(EWX_POWEROFF) on Windows and `shutdown -h now` on Unix,
wired through a perform_shutdown IORef set by uiStartArchive.

--arc-32bit-legacy: read archives produced by 32-bit FreeArc/Arc.exe.
FreeArc's generic `instance (Storable a) => FastBufferData a` writes Int
and CTime with native sizeOf stride, so 32-bit builds emit 4-byte slots
where DArc x64 expects 8. Without compensation, directory decode desyncs
and eventually hits `Enum.Bool.toEnum: bad arg` on dir_flags. The flag
toggles a reader that consumes 4 bytes (stride 4) for Int/CTime. Write
path is unchanged; native DArc roundtrip is unaffected. Verified against
Arc.exe 0.67 -m0 single-file and multi-file archives (byte-exact).
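A stride mismatch desyncs the whole directory because two consecutive 4-byte fields read with an 8-byte stride collapse into one wrong value, and every later field is shifted. This sketch (hypothetical `read_int_stride`, little-endian storage and sign-extension of 32-bit fields assumed) reads the same bytes both ways.

```c
#include <stddef.h>
#include <stdint.h>

/* Read one Int-like field with the given stride (4 for 32-bit FreeArc
   archives, 8 otherwise), little-endian, and advance *pos by `stride`. */
int64_t read_int_stride(const unsigned char *buf, size_t *pos, int stride) {
    const unsigned char *p = buf + *pos;
    uint64_t u = 0;
    for (int i = 0; i < stride; i++) u |= (uint64_t)p[i] << (8 * i);
    *pos += (size_t)stride;
    if (stride == 4)  /* sign-extend a 32-bit field */
        return (u & 0x80000000u) ? (int64_t)u - 4294967296LL : (int64_t)u;
    return (u >> 63) ? -(int64_t)(~u) - 1 : (int64_t)u;
}
```

Reading `10 00 00 00 20 00 00 00` with stride 4 yields 16 and 32; with stride 8 it yields a single value, 137,438,953,488 (0x2000000010), after which every subsequent field is misaligned. A bogus value like this landing in a boolean flag is one way to reach the `toEnum: bad arg` error described above.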

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New streaming wrapper Compression/BSC/C_BSC.{cpp,h} around the vendored
libbsc 3.3.12 sources. Registers bsc:BLOCKSIZE:b<sort>:l<minlen>:h<hash>:c<coder>
via AddCompressionMethod. Defaults to BWT + QLFC_STATIC + FASTMODE + MT,
25 MB blocks.

Build: -fno-rtti/-fno-exceptions to match the rest of DArc; -fopenmp so
libsais/libbsc can use OpenMP parallelism. Top-level compile picks up
C_BSC.o and -lgomp.

Fixes to make libbsc vendorable in a single TU: include platform.h early
so INLINE is defined before rangecoder.h is pulled in, and re-define
INLINE after libsais.c (which does #undef INLINE internally) so later
libbsc TUs still see it.

Smoke-tested: multi-file text roundtrip (100 KiB -> 18.8 KiB, cmp OK) and
970 KiB source blob compressed to 170 KiB in 0.7s. GRZip remains
available under -mgrzip.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Mirror upstream FreeArc 0.67 layout. Still used as an external
compressor (no COMPRESSION_METHOD wrapper), but the sources now live
where a future native integration would land.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
DisPack is an executable-file preprocessor that recognises x86
instruction patterns and reorders them so the downstream entropy
coder compresses better. Useful prepended to LZMA for .exe/.dll/.so
payloads.

Sourced from upstream FreeArc 0.67. Adjustments for the DArc tree:

- Filled in GetDictionary/GetBlockSize/SetDecompressionMem/SetDictionary
  /SetBlockSize: these are pure virtual in DArc's COMPRESSION_METHOD
  but don't exist in 0.67.
- Dropped the `bool purify` parameter from ShowCompressionMethod to
  match DArc's signature.
- Local compat shims in C_DisPack.cpp for BIGALLOC, READ_LEN,
  BigFreeAndNil and the big-endian value16b/value32b helpers: they
  live in 0.67's Compression.h/Common.h but not in DArc's.

Smoke-tested: /usr/bin/ls (142 KiB ELF) -> 51.4 KiB under
`-mdispack+lzma` vs 54.7 KiB under `-mlzma` alone (6% gain); roundtrip
cmp-exact.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Integrates Meta's Zstandard as -mzstd, coexisting with the existing
codecs. Syntax:

  -mzstd:N            compression level 1..22 (default 3)
  -mzstd:N:long[W]    enable long-range mode, window log W (default 27)
  -mzstd:N:w<K>       K compression worker threads
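The syntax above is compact enough to parse by hand. The sketch below is a hypothetical parser (`parse_zstd` and `struct zstd_opts` are illustrative names, not the code in `C_Zstd`). It applies only the defaults stated here (level 3, window log 27 for a bare `long`) and range-checks the level to 1..22. The comments name zstd advanced-API parameters these fields would plausibly feed; how DArc actually wires them is not shown in this PR.

```c
#include <stdlib.h>
#include <string.h>

/* level      -> ZSTD_c_compressionLevel
   long_mode  -> ZSTD_c_enableLongDistanceMatching, window_log -> ZSTD_c_windowLog
   workers    -> ZSTD_c_nbWorkers */
struct zstd_opts { int level, long_mode, window_log, workers; };

/* Parse "zstd[:N][:long[W]][:w<K>]". Returns 0 on success, -1 on bad syntax. */
int parse_zstd(const char *spec, struct zstd_opts *o) {
    o->level = 3; o->long_mode = 0; o->window_log = 0; o->workers = 0;
    if (strncmp(spec, "zstd", 4) != 0) return -1;
    const char *p = spec + 4;
    int first = 1;
    while (*p == ':') {
        p++;
        char *end;
        if (first && *p >= '0' && *p <= '9') {                 /* level N */
            o->level = (int)strtol(p, &end, 10);
            if (o->level < 1 || o->level > 22) return -1;
        } else if (strncmp(p, "long", 4) == 0) {               /* long[W] */
            o->long_mode = 1;
            o->window_log = 27;
            end = (char *)p + 4;
            if (*end >= '0' && *end <= '9') o->window_log = (int)strtol(end, &end, 10);
        } else if (*p == 'w' && p[1] >= '0' && p[1] <= '9') {  /* w<K> */
            o->workers = (int)strtol(p + 1, &end, 10);
        } else {
            return -1;
        }
        p = end;
        first = 0;
    }
    return *p == '\0' ? 0 : -1;
}
```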

Uses zstd's native streaming API (ZSTD_compressStream2 /
ZSTD_decompressStream), which maps directly onto FreeArc's
CALLBACK_FUNC* I/O — no block framing added on top.

Build: vendor lib/common, lib/compress, lib/decompress (no legacy,
no deprecated, no dll). Each TU is compiled as C99 and merged into a
single C_Zstd.o via `ld -r`, so the top-level link stays one object.
-DZSTD_DISABLE_ASM keeps the build portable (no .S file dependency).

On a 100 KiB Haskell source corpus zstd:19 reaches 20.4 KiB, beating
lzma's 20.7 KiB; zstd:3 gets 24.7 KiB at ~500 MB/s class speed. Fills
the gap between LZ4 (fast, weak ratio) and LZMA (slow, max ratio).
Roundtrip cmp-exact across levels 1/3/9/19.

-mN presets are untouched — zstd is available only via explicit -mzstd
until we decide whether to thread it into the level shortcuts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- ByteStream: 32-bit archive read compatibility (--arc-32bit-legacy)
- ArhiveDirectory: remove_unsafe_dirs + make_OS_native_path on read, unixifyPath on write (data-transform gap for FA 0.67.1 compat)
- Multi-volume virtual file reader (darc_volfile_*) — no disk duplication
- New options: -ac/--ClearArchiveBit, -ao/--SelectArchiveBit (Win32), --nodates, --shutdown
- LZMA 24.09 upgrade + LZMA2 codec, LZ4 1.10.0, libbsc 3.3.12, zstd 1.5.6, DisPack
- Native 7z read support (SDK 26.00) via Arc7z.hs
- GHC 9.4.7 alternative build (compile-ghc) + cross-compile to Win64 (compile-ghc-win64)
- Banner: URL → github.com/DavidLee18/DArc, credit line added

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- ByteStream: move legacy32bitRead IORef out of __MHS__-only block so GHC builds can see the symbol from UI.hs
- C_7z.c: add darc_mkdir() shim (POSIX mkdir(path,mode) vs mingw _mkdir(path))
- compile-ghc-win86: add LZMA 7z24 SDK + 7z SDK objects to link list

Build produces PE32 i386 executable. Runtime crashes (c0000005) at startup
under Wine+GHC 8.6.5 i386 — likely Wine/RTS incompatibility, not a bug in
the build itself. Needs testing on real Win32 to confirm.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
DavidLee18 commented

> It's still in progress, so please don't confirm anything yet :D Do you want the new file format to be ".darc" or to remain ".arc"?

I just wanted this program for my personal use, so anything is fine!

YadeWira and others added 5 commits April 12, 2026 23:48
GHC 8.6.5 i386 incremental link uses `ld -r` per Haskell module, which
bundles all -optl C objects (with their .idata sections) into modules
like CompressionLib.o and Scripting/Lua.o. The merged .idata sections
later confuse the final PE merge into producing a 13th bogus import
descriptor labelled ADVAPI32.dll but containing KERNEL32/WININET/USER32
thunks. Win7 loader rejects it with 0xC0000139.

Two-stage build: --make -c -no-link, then objcopy --remove-section
'.idata$*' on every Haskell .o that absorbed them, then re-invoke GHC
to link only. Result: 12 clean import descriptors, arc86.exe loads on
Win7 x64 and round-trips storing + lzma archives.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove compile-ghc-win86 and compile-win86-c (moved to DArc86 repo)
- Remove [mingw32] target from mhs-targets.conf (64-bit only now)
- Reword Win32Files.c comment to reference msvcrt MinGW toolchains
  generically instead of DArc86
- Add a single compat line in README linking to the DArc86 fork

--arc-32bit-legacy / legacy32bitRead are unrelated (they read FreeArc 0.67
32-bit archives, not DArc86 archives) and stay untouched.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
compile-ghc-win64 was only linking the C_*.cpp wrappers but missed the
underlying SDK sources (LZMA 7z24, 7z native reader, libzstd), which
caused undefined-symbol errors for LzmaEnc_*, ZSTD_*, darc_7z_*.

- compile-win64-c: compile LZMA/7z24 C sources, 7z SDK with sdk_* prefix,
  and libzstd common/compress/decompress *.c files
- compile-ghc-win64: link the new objects, add --allow-multiple-definition
  for the overlapping LZMA/Alloc symbols between 7z24 and 7z/sdk

Verified: arc-win64.exe roundtrip passes for -m0, -m1..9, -mlzma2,
-mppmd, -mzstd, -mbsc, -mx, encryption. Linux<->Win64 interop works
for storing, LZMA, LZP, BSC, zstd, LZMA2. PPMD cross-arch remains
incompatible (pre-existing) — Windows and Linux users can roundtrip
on their own platform.
Win64 cross-compile used generic -O2 -std=c++17 for all PPMD sources, while
Compression/PPMD/makefile (Linux) uses -O1 -fstrict-aliasing -fno-exceptions
-fno-rtti -fomit-frame-pointer -funroll-loops. The flag mismatch produced
divergent codegen for the (WORD&) SWAP pattern in Model.cpp, making PPMD
archives byte-incompatible between Linux and Win64 builds (decoder hung at
0% on cross-extract).

Add a per-directory override in build() so */PPMD/* sources get the exact
Linux flags. Verified: PPMD archives produced by arc (Linux) and
arc-win64.exe now byte-identical; bidirectional extraction works.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… smaller than GHC)

Add `compile-mhs-win64` build path that produces `Tests/arc-mhs-win64.exe`
linked statically against mingw runtime. End-to-end roundtrip verified in
Wine for storing/lzma/ppmd/default/encrypted, plus cross-format interop
with the Linux MHS build.

Critical runtime fix: MHS's `openBinaryFileM` calls `fopen("r"/"w")` with
no `b` flag, so Windows opens in text mode and corrupts archive bytes.
New `compat-ghc/MhsBinaryOpen.hs` reimplements `openBinaryFile` using
`System.IO.Internal` primitives and `fopen("rb"/"wb"/"ab"/"w+b")`.
`Files.hs` routes `fOpen`/`fCreate`/`fCreateRW` through it under MHS+WIN.

Charsets defaults: gate the FREEARC_WIN block with `&& !defined(__MHS__)`
so MHS-Win uses UTF-8 defaults, matching the CString filesystem API the
Linux-MHS path already exposes.

Module-level CPP gates restructured across Errors/Files/FileInfo/CUI/
Charsets so MHS-Win falls through portable code paths instead of pulling
in GHC-only Win32/Posix bindings (System.Win32.Types, Win32Files,
GHC.ConsoleHandler, CWString, stdcall imports).

`compile-win64-c` accepts `MHS=1` to add `-D__MHS__` to the C objects so
they ABI-match the MHS-emitted Haskell.
YadeWira commented

All set!

DavidLee18 commented

great, thanks!

DavidLee18 merged commit 61fd06b into DavidLee18:main on Apr 26, 2026
4 checks passed