143 changes: 128 additions & 15 deletions README.md
@@ -15,10 +15,12 @@ A Ruby gem providing FFI bindings to the LZMA SDK (7-Zip) for archive operations

## Features

- ✅ **Create and extract archives** - Full read/write support for 7z archives
- ✅ **Password-protected archives** - Full support for encrypted 7z files
- ✅ **Memory buffer operations** - Work with archives in memory (StringIO/String)
- ✅ **Archive verification** - Validate archive integrity before extraction
- ✅ **Configurable compression** - Multiple compression levels and methods
- ✅ **Fast operations** - Direct FFI bindings to native 7-Zip SDK
- ✅ **Type hints** - RBS type signatures included
- ✅ **Thread-safe** - No global state

@@ -53,9 +55,36 @@ rake compile

## Quick Start

### Creating Archives

```ruby
require 'ruby_lzma'

# Create a new archive
RubyLzma::Archive::Writer.open('archive.7z') do |writer|
  # Add files from disk
  writer.add_file('/path/to/file.txt', 'file.txt')

  # Add data from memory
  writer.add_data('Hello, World!', 'greeting.txt')

  # Create directories
  writer.mkdir('empty_folder')
end

# Create encrypted archive with compression options
secret_data = File.binread('/path/to/secrets.bin') # any String of bytes works
RubyLzma::Archive::Writer.open(
  'secure.7z',
  password: 'secret123',
  level: RubyLzma::Archive::Writer::LEVEL_ULTRA
) do |writer|
  writer.add_file('/path/to/sensitive.pdf', 'documents/sensitive.pdf')
  writer.add_data(secret_data, 'data/secrets.bin')
end
```

### Extracting Archives

```ruby
# Extract all files from an archive
RubyLzma::Archive::Reader.open('archive.7z') do |reader|
  reader.extract_all('/output/directory')
@@ -84,8 +113,9 @@ end

For detailed usage examples, see [EXAMPLES.md](EXAMPLES.md), which covers:

- **Creating Archives**: Add files, add data from memory, compression options
- **Extracting Archives**: List contents, extract files, pattern matching
- **Password-Protected Archives**: Working with encrypted archives (read/write)
- **Memory Operations**: HTTP downloads, database storage, stream processing
- **Advanced Usage**: Metadata inspection, verification, concurrent processing
- **Error Handling**: Comprehensive error recovery patterns
@@ -118,6 +148,31 @@ RubyLzma.sdk_info # => Full SDK info with copyright
| `close` | Close archive | void |
| `closed?` | Check if closed | Boolean |

### Writer Class

| Method | Description | Returns |
|--------|-------------|---------|
| `new(path, format:, password:, level:)` | Create writer for file | Writer |
| `open(path, ...)` | Open file with auto-close | Writer (or block result) |
| `add_file(source, archive_path)` | Add file from disk | void |
| `add_data(data, archive_path)` | Add data from memory | void |
| `add_directory(source, archive_path)` | Add directory recursively | void |
| `mkdir(archive_path)` | Create empty directory | void |
| `size` / `count` / `length` | Entry count | Integer |
| `close` | Finalize and write archive | void |
| `closed?` | Check if closed | Boolean |
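
The `open` row above describes two calling styles: without a block it returns a Writer, and with a block it auto-closes and returns the block's result. The sketch below models that contract with a hypothetical in-memory `SketchWriter` class (not part of ruby_lzma, and it never touches the filesystem or the SDK). That the real `Writer.open` also closes the writer when the block raises is an assumption, based on the common Ruby idiom that `File.open` follows.

```ruby
# Hypothetical in-memory stand-in for the documented Writer contract.
# It never touches the filesystem or the LZMA SDK.
class SketchWriter
  attr_reader :path, :entries

  # No block: return the open writer; the caller must call #close.
  # With a block: yield the writer, close it even if the block raises,
  # and return whatever the block returned.
  def self.open(path, **options)
    writer = new(path, **options)
    return writer unless block_given?

    begin
      yield writer
    ensure
      writer.close
    end
  end

  def initialize(path, password: nil, level: 5)
    @path = path
    @password = password
    @level = level
    @entries = []
    @closed = false
  end

  def add_data(data, archive_path)
    ensure_open!
    @entries << [archive_path, data.to_s]
    nil
  end

  def mkdir(archive_path)
    ensure_open!
    @entries << ["#{archive_path.chomp('/')}/", nil]
    nil
  end

  def size
    @entries.size
  end
  alias count size
  alias length size

  # A real writer would finalize and write the archive at this point.
  def close
    @closed = true
    nil
  end

  def closed?
    @closed
  end

  private

  def ensure_open!
    raise IOError, 'writer is closed' if @closed
  end
end

writer_ref = nil
entry_count = SketchWriter.open('demo.7z') do |w|
  writer_ref = w
  w.add_data('Hello, World!', 'greeting.txt')
  w.mkdir('empty_folder')
  w.size
end
puts entry_count        # prints 2
puts writer_ref.closed? # prints true
```

The `ensure` clause is what makes the block form safe: the archive is finalized on both normal exit and exceptions, so callers never leak an unclosed writer.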

#### Compression Levels

```ruby
RubyLzma::Archive::Writer::LEVEL_STORE # No compression (fastest)
RubyLzma::Archive::Writer::LEVEL_FASTEST # Minimal compression
RubyLzma::Archive::Writer::LEVEL_FAST # Fast compression
RubyLzma::Archive::Writer::LEVEL_NORMAL # Balanced (default)
RubyLzma::Archive::Writer::LEVEL_MAXIMUM # High compression
RubyLzma::Archive::Writer::LEVEL_ULTRA # Maximum compression (slowest)
```

### Entry Class

| Attribute | Type | Description |
@@ -141,13 +196,13 @@ RubyLzma::FFI::Constants::SZ_FORMAT_7Z # => 0
RubyLzma::FFI::Constants::SZ_FORMAT_ZIP # => 1
RubyLzma::FFI::Constants::SZ_FORMAT_TAR # => 2

# Compression levels (used with Writer)
RubyLzma::FFI::Constants::SZ_LEVEL_STORE # => 0 (no compression)
RubyLzma::FFI::Constants::SZ_LEVEL_FASTEST # => 1
RubyLzma::FFI::Constants::SZ_LEVEL_FAST # => 3
RubyLzma::FFI::Constants::SZ_LEVEL_NORMAL # => 5 (default)
RubyLzma::FFI::Constants::SZ_LEVEL_MAXIMUM # => 7
RubyLzma::FFI::Constants::SZ_LEVEL_ULTRA # => 9 (best compression)
```
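
The `SZ_LEVEL_*` values expose six points on 7-Zip's 0 to 9 level scale (0, 1, 3, 5, 7 and 9). As a minimal sketch, a hypothetical `resolve_level` helper can map level names to those integers and reject anything else. Whether `Writer::LEVEL_*` hold the same integers, and whether the Writer's `level:` option also accepts the even-numbered levels, are assumptions not confirmed here.

```ruby
# Integer values copied from the SZ_LEVEL_* constants listed above.
COMPRESSION_LEVELS = {
  store: 0,
  fastest: 1,
  fast: 3,
  normal: 5,   # default
  maximum: 7,
  ultra: 9     # best compression
}.freeze

# Hypothetical helper: accept a level name (:ultra) or an integer (9) and
# return the integer, rejecting anything that is not one of the six levels.
def resolve_level(level = :normal)
  value = level.is_a?(Symbol) ? COMPRESSION_LEVELS[level] : level
  unless COMPRESSION_LEVELS.value?(value)
    raise ArgumentError, "unsupported compression level: #{level.inspect}"
  end

  value
end

puts resolve_level         # prints 5
puts resolve_level(:ultra) # prints 9
puts resolve_level(3)      # prints 3
```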

## Type Hints
@@ -186,24 +241,82 @@ See [SDK_UPDATE.md](SDK_UPDATE.md) for detailed instructions on updating the emb

The gem consists of three layers:

1. **C API** (`lzma_sdk_wrapper.h/c`) - Clean C interface for archive operations
2. **C++ Bridge** (`cpp_bridge.cpp`) - LZMA SDK COM interface wrapper (IInArchive/IOutArchive)
3. **Ruby FFI** (`lib/ffi/`) - Ruby bindings via FFI

Key classes:
- `RubyLzma::Archive::Reader` - Read and extract from archives
- `RubyLzma::Archive::Writer` - Create new archives
- `RubyLzma::Entry` - Archive entry metadata

This architecture provides:
- Type safety (C API contract)
- Performance (direct FFI calls, native LZMA2 compression)
- Maintainability (clear separation of concerns)
- Memory safety (no leaks in native code)

## Security

This library includes protection against common archive-related security vulnerabilities.

### Zip Slip Protection (Path Traversal)

The `Writer` class validates every archive path and rejects unsafe ones, preventing Zip Slip attacks:

```ruby
# These will raise RubyLzma::Archive::Writer::PathTraversalError
writer.add_data("data", "../../../etc/passwd") # Parent traversal
writer.add_data("data", "/etc/passwd") # Absolute path
writer.add_data("data", "foo\x00bar.txt") # Null bytes
```
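
A minimal sketch of the three Writer-side checks listed above (null bytes, absolute paths, `..` components), written as a standalone method. `validate_archive_path!` and the local `PathTraversalError` class are hypothetical stand-ins, not the gem's code; treating backslashes and Windows drive letters as unsafe is an extra assumption.

```ruby
# Local stand-in for RubyLzma::Archive::Writer::PathTraversalError.
class PathTraversalError < StandardError; end

# Hypothetical validator: returns a cleaned relative path or raises.
def validate_archive_path!(path)
  raise PathTraversalError, 'null byte in path' if path.include?("\0")

  # Assumption: backslashes count as separators, so "..\x" is caught too.
  normalized = path.gsub('\\', '/')

  # Absolute paths: leading slash, or a Windows drive letter such as "C:".
  if normalized.start_with?('/') || normalized.match?(/\A[A-Za-z]:/)
    raise PathTraversalError, "absolute path: #{path.inspect}"
  end

  segments = normalized.split('/')
  if segments.include?('..')
    raise PathTraversalError, "parent traversal: #{path.inspect}"
  end

  # Drop empty and "." segments: "docs//./a.txt" becomes "docs/a.txt".
  cleaned = segments.reject { |s| s.empty? || s == '.' }.join('/')
  raise ArgumentError, 'empty archive path' if cleaned.empty?

  cleaned
end

puts validate_archive_path!('docs/./readme.txt') # prints docs/readme.txt

['../../../etc/passwd', '/etc/passwd', "foo\x00bar.txt"].each do |bad|
  begin
    validate_archive_path!(bad)
  rescue PathTraversalError => e
    puts "rejected: #{e.message}"
  end
end
```

Checking whole segments rather than searching for the substring `..` keeps legitimate names such as `foo..bar.txt` valid.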

The `Reader` class validates extraction paths during `extract_all`:

```ruby
# Paths are validated to stay within the target directory
reader.extract_all("/safe/output/dir")
```
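
On the Reader side, one common way to keep extraction inside a target directory is to resolve each entry's destination and confirm it still starts with the target directory plus a separator. This sketch assumes that approach; `destination_for` and `UnsafeEntryPathError` are hypothetical names. It compares paths lexically and does not follow symlinks, and because the entry name is joined onto the base before expansion, a leading `/` in an entry name is treated as relative rather than rejected.

```ruby
# Local stand-in for whatever error the real Reader raises.
class UnsafeEntryPathError < StandardError; end

# Hypothetical helper: map an entry name to an absolute destination and
# refuse any name that would land outside target_dir.
def destination_for(target_dir, entry_name)
  base = File.expand_path(target_dir)
  # Join before expanding so a leading "~" in entry_name is not treated as a
  # home directory and a leading "/" does not replace the base.
  dest = File.expand_path(File.join(base, entry_name))

  # Compare against base plus a separator so "/safe/out2" cannot pass as
  # being inside "/safe/out".
  prefix = base.end_with?(File::SEPARATOR) ? base : base + File::SEPARATOR
  unless dest.start_with?(prefix)
    raise UnsafeEntryPathError, "#{entry_name.inspect} escapes #{base}"
  end

  dest
end

puts destination_for('/safe/output/dir', 'docs/readme.txt')
# prints /safe/output/dir/docs/readme.txt

begin
  destination_for('/safe/output/dir', '../../../etc/passwd')
rescue UnsafeEntryPathError => e
  puts "rejected: #{e.message}"
end
```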

### Archive Bomb Protection

The library protects against archive bomb (zip bomb) attacks with configurable limits:

```ruby
# Default limits (class-level configuration)
RubyLzma::Archive::Reader.max_entry_size # 1 GB per entry
RubyLzma::Archive::Reader.max_total_size # 10 GB total
RubyLzma::Archive::Reader.max_compression_ratio # 1000:1 ratio

# Customize limits
RubyLzma::Archive::Reader.max_entry_size = 100 * 1024 * 1024 # 100 MB
RubyLzma::Archive::Reader.max_compression_ratio = 500 # 500:1

# Disable specific limits (use with caution)
RubyLzma::Archive::Reader.max_entry_size = nil
```
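
For a single entry, the three limits can be combined as below, where the compression ratio is uncompressed size divided by compressed size (10 KB expanding to 20 MB is 2048:1). This sketch uses a hypothetical `exceeded_limit` method and a plain hash in place of the Reader's class-level settings; reading "1 GB" as `1024**3` bytes and the running-total parameter are assumptions.

```ruby
# Defaults from the section above; reading "1 GB" as 1024**3 bytes is an assumption.
DEFAULT_LIMITS = {
  max_entry_size: 1024**3,        # 1 GB per entry
  max_total_size: 10 * 1024**3,   # 10 GB for the whole extraction
  max_compression_ratio: 1000     # uncompressed : compressed
}.freeze

# Hypothetical check for one entry. A nil limit is skipped (disabled).
# Returns nil if the entry passes, otherwise the name of the first limit hit.
def exceeded_limit(compressed_size, uncompressed_size, total_so_far, limits = DEFAULT_LIMITS)
  max_entry = limits[:max_entry_size]
  return :max_entry_size if max_entry && uncompressed_size > max_entry

  max_total = limits[:max_total_size]
  return :max_total_size if max_total && total_so_far + uncompressed_size > max_total

  max_ratio = limits[:max_compression_ratio]
  # The ratio is undefined when the packed size is 0 (it may be reported that
  # way for entries inside a solid block), so the sketch skips it then.
  if max_ratio && compressed_size.positive? &&
     uncompressed_size.fdiv(compressed_size) > max_ratio
    return :max_compression_ratio
  end

  nil
end

# 10 KB expanding to 20 MB is 2048:1, above the 1000:1 default.
p exceeded_limit(10 * 1024, 20 * 1024 * 1024, 0)           # prints :max_compression_ratio
relaxed = DEFAULT_LIMITS.merge(max_compression_ratio: nil)  # disable one limit
p exceeded_limit(10 * 1024, 20 * 1024 * 1024, 0, relaxed)  # prints nil
```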

When limits are exceeded, `ExtractionLimitError` is raised with a helpful message:

```ruby
begin
  reader.extract_data(entry)
rescue RubyLzma::Archive::Reader::ExtractionLimitError => e
  puts e.message # Explains how to disable the limit if needed
end
```
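
The exact wording of the gem's message is not shown here. As a hypothetical sketch, an error class can carry the limit name and the measured and allowed values as attributes, so callers can branch on `e.limit` instead of parsing `e.message`. The class below is a local stand-in, not `RubyLzma::Archive::Reader::ExtractionLimitError` itself, and its message text is invented for illustration.

```ruby
# Local stand-in; the real class is RubyLzma::Archive::Reader::ExtractionLimitError.
class ExtractionLimitError < StandardError
  attr_reader :limit, :actual, :maximum

  def initialize(limit, actual, maximum)
    @limit = limit
    @actual = actual
    @maximum = maximum
    # Hypothetical wording: name the setting so the caller knows what to change.
    super("#{limit} exceeded (#{actual} > #{maximum}); " \
          "set Reader.#{limit} = nil to disable this check")
  end
end

begin
  raise ExtractionLimitError.new(:max_compression_ratio, 2048, 1000)
rescue ExtractionLimitError => e
  # Branch on the structured attribute rather than parsing the message.
  warn "ratio too high: #{e.actual}:1" if e.limit == :max_compression_ratio
  puts e.message
end
```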

## Limitations

Current version supports **reading and writing** 7z archives:
- ✅ Create archives
- ✅ Extract archives
- ✅ Password-protected archives (read/write)
- ✅ Memory operations (add data from memory, extract to memory)
- ✅ Configurable compression levels
- ❌ Modify existing archives (append/delete entries)
- ❌ Multi-volume archives
- ❌ ZIP/TAR formats (7z only)


## Contributing
3 changes: 2 additions & 1 deletion benchmark/.gitignore
@@ -1,3 +1,4 @@
/fixtures/
/output/
results.txt
valgrind_report.txt
92 changes: 89 additions & 3 deletions benchmark/check_c_leaks.sh
@@ -36,18 +36,23 @@ fi

# Create the test script content
TEST_SCRIPT_CONTENT='
require "fileutils"
require_relative "/workspace/lib/ruby_lzma"

archive = ARGV[0]
output_dir = "/tmp/valgrind_test_output"
FileUtils.mkdir_p(output_dir)

puts "Running memory leak tests..."

# ============= READER TESTS =============

# Test 1: Open and close
10.times do
  reader = RubyLzma::Archive::Reader.new(archive)
  reader.close
end
puts " Test 1: Reader Open/Close - done"

# Test 2: Extract to memory
RubyLzma::Archive::Reader.open(archive) do |reader|
@@ -56,14 +61,95 @@ RubyLzma::Archive::Reader.open(archive) do |reader|
data = reader.extract_data(i)
end
end
puts " Test 2: Reader Extract to memory - done"

# Test 3: Verify
RubyLzma::Archive::Reader.open(archive) do |reader|
  reader.test
end
puts " Test 3: Reader Verify - done"

# ============= WRITER TESTS =============

# Test 4: Create archive with add_data
10.times do |n|
  output_archive = "#{output_dir}/test_write_#{n}.7z"
  RubyLzma::Archive::Writer.open(output_archive) do |writer|
    writer.add_data("Test data content #{n}", "file.txt")
  end
  FileUtils.rm_f(output_archive)
end
puts " Test 4: Writer Create with add_data - done"

# Test 5: Create archive with multiple entries
5.times do |n|
  output_archive = "#{output_dir}/test_multi_#{n}.7z"
  RubyLzma::Archive::Writer.open(output_archive) do |writer|
    5.times do |i|
      writer.add_data("Content for file #{i}" * 100, "file_#{i}.txt")
    end
    writer.mkdir("empty_dir")
  end
  FileUtils.rm_f(output_archive)
end
puts " Test 5: Writer Create multiple entries - done"

# Test 6: Create encrypted archive
5.times do |n|
  output_archive = "#{output_dir}/test_encrypted_#{n}.7z"
  RubyLzma::Archive::Writer.open(output_archive, password: "TestPassword123") do |writer|
    writer.add_data("Secret data #{n}" * 50, "secret.txt")
  end
  FileUtils.rm_f(output_archive)
end
puts " Test 6: Writer Create encrypted - done"

# Test 7: Create large archive
3.times do |n|
  output_archive = "#{output_dir}/test_large_#{n}.7z"
  large_data = "x" * 100_000 # 100KB
  RubyLzma::Archive::Writer.open(output_archive) do |writer|
    3.times do |i|
      writer.add_data(large_data, "large_#{i}.txt")
    end
  end
  FileUtils.rm_f(output_archive)
end
puts " Test 7: Writer Create large archive - done"

# Test 8: Writer open and close explicitly
5.times do |n|
  output_archive = "#{output_dir}/test_explicit_#{n}.7z"
  writer = RubyLzma::Archive::Writer.new(output_archive)
  writer.add_data("Test content", "test.txt")
  writer.close
  FileUtils.rm_f(output_archive)
end
puts " Test 8: Writer explicit open/close - done"

# ============= ROUND-TRIP TEST =============

# Test 9: Write and then read back (round-trip)
3.times do |n|
  output_archive = "#{output_dir}/test_roundtrip_#{n}.7z"
  test_content = "Round trip test data #{n}" * 100

  # Write
  RubyLzma::Archive::Writer.open(output_archive) do |writer|
    writer.add_data(test_content, "roundtrip.txt")
  end

  # Read back
  RubyLzma::Archive::Reader.open(output_archive) do |reader|
    data = reader.extract_data(0)
    raise "Data mismatch!" unless data == test_content
  end

  FileUtils.rm_f(output_archive)
end
puts " Test 9: Round-trip write/read - done"

FileUtils.rm_rf(output_dir)
puts "All tests completed successfully"
'

36 changes: 35 additions & 1 deletion benchmark/check_memory_leaks.rb
@@ -195,7 +195,41 @@ def run_all_tests
entries = reader.entries
end
end


  # Test 7: Create archive (Writer)
  output_archive = File.join(FIXTURES_DIR, '..', 'output', 'test_write.7z')
  FileUtils.mkdir_p(File.dirname(output_archive))
  test_operation("Create archive (Writer)", archive_path, nil) do |path, pwd|
    FileUtils.rm_f(output_archive)
    RubyLzma::Archive::Writer.open(output_archive) do |writer|
      writer.add_data("Test data content", "test.txt")
      writer.add_data("More content here", "data.txt")
    end
  end
  FileUtils.rm_f(output_archive)

  # Test 8: Create encrypted archive
  test_operation("Create encrypted archive", archive_path, nil) do |path, pwd|
    FileUtils.rm_f(output_archive)
    RubyLzma::Archive::Writer.open(output_archive, password: 'test123') do |writer|
      writer.add_data("Secret data", "secret.txt")
    end
  end
  FileUtils.rm_f(output_archive)

  # Test 9: Create large archive
  large_data = "x" * 100_000
  test_operation("Create large archive", archive_path, nil) do |path, pwd|
    FileUtils.rm_f(output_archive)
    RubyLzma::Archive::Writer.open(output_archive) do |writer|
      5.times do |i|
        writer.add_data(large_data, "large_#{i}.txt")
      end
    end
  end
  FileUtils.rm_f(output_archive)
  FileUtils.rm_rf(File.dirname(output_archive))

  # Summary
  print_summary
end