@zbowling
This commit modernizes warctools for Python 3.10+ with comprehensive improvements to code quality, testing, and tooling:

Project Structure:

  • Migrate to src/ layout for proper package structure
  • Move hanzo package to src/hanzo/
  • Add src/warctools/ for backward compatibility re-exports
  • Update build system to uv_build backend

Code Modernization:

  • Remove all __future__ imports (Python 3.10+ only)
  • Add comprehensive type hints throughout codebase
  • Migrate all CLI tools from optparse to click (100% argument compatible)
  • Update f-string usage and modernize string formatting
  • Remove unnecessary object inheritance (UP004)
  • Fix all linting issues (ruff, mypy) systematically
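
As a rough illustration of the kind of change the bullets above describe, here is a minimal before/after sketch. `RecordSummary` is a hypothetical class invented for this example, not a class from warctools; the "before" form is shown only in comments.

```python
# Before (2012-era style), kept as comments for comparison:
#
#   from __future__ import print_function
#
#   class RecordSummary(object):
#       def __init__(self, url, length):
#           self.url = url
#           self.length = length
#
#       def describe(self):
#           return "%s (%d bytes)" % (self.url, self.length)


# After: no __future__ import, no explicit `object` base (ruff UP004),
# type hints on the public surface, and an f-string for formatting.
class RecordSummary:  # hypothetical class, for illustration only
    def __init__(self, url: str, length: int) -> None:
        self.url = url
        self.length = length

    def describe(self) -> str:
        return f"{self.url} ({self.length} bytes)"


summary = RecordSummary("http://example.com/", 1024)
print(summary.describe())
```

Behavior is unchanged between the two forms; only the syntax and annotations differ.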

Dependencies & Build:

  • Increment version to 6.0.0
  • Update requires-python to >=3.10
  • Add click>=8.0.0 dependency
  • Switch from setuptools to uv_build
  • Add dev dependencies: pytest, ruff, mypy

Testing:

  • Add comprehensive integration test suite (15 tests)
  • Add CLI help tests for all tools
  • Fix legacy unittest offset calculation bugs
  • All 33 tests passing (integration + unit + CLI)
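
A CLI help test can be as small as the sketch below. This is not the test code from this PR: `help_exits_cleanly` is a hypothetical helper, and the command it checks is a stand-in argparse script rather than one of the warctools entry points.

```python
import subprocess
import sys


def help_exits_cleanly(command: list[str]) -> bool:
    """Run `command --help` and report whether it exits 0 and prints usage text."""
    result = subprocess.run(
        [*command, "--help"],
        capture_output=True,
        text=True,
        timeout=30,
    )
    # click prints "Usage:" and argparse prints "usage:", so compare case-insensitively.
    return result.returncode == 0 and "usage" in result.stdout.lower()


# Stand-in command (assumption): a tiny argparse program run via `python -c`.
# In a real suite this would be the installed console script for each tool.
stand_in = [
    sys.executable,
    "-c",
    "import argparse; argparse.ArgumentParser(prog='demo').parse_args()",
]
ok = help_exits_cleanly(stand_in)
```

Under pytest, the same helper could be driven by `pytest.mark.parametrize` over the list of tool names.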

CI/CD:

  • Add GitHub Actions workflow for automated testing
  • Update Travis CI configuration for modern Python versions
  • Run ruff format, ruff check, mypy, and pytest in CI

Bug Fixes:

  • Fix gzip member offset tracking in GzipRecordStream
  • Fix RecordStream offset calculation for accurate record positioning
  • Fix exception handling and error messages
  • Fix variable naming issues (B007, N806, E741)
  • Fix import ordering and unused imports
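
For readers unfamiliar with the offset problem: compressed WARC files are typically a concatenation of gzip members, one per record, and a record's position is the byte offset of its member in the compressed file. The sketch below shows one way to recover those offsets with the standard library. It is not the `GzipRecordStream` implementation from this PR; `gzip_member_offsets` is a hypothetical helper, and it reads the whole input into memory for simplicity.

```python
import gzip
import zlib


def gzip_member_offsets(data: bytes) -> list[int]:
    """Return the byte offset at which each gzip member starts in `data`."""
    offsets: list[int] = []
    pos = 0
    while pos < len(data):
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
        decompressor.decompress(data[pos:])
        if not decompressor.eof:
            raise ValueError(f"truncated gzip member at offset {pos}")
        offsets.append(pos)
        # unused_data holds whatever followed this member's trailer,
        # so the member's compressed length is the remainder minus that.
        pos += len(data) - pos - len(decompressor.unused_data)
    return offsets


# Two small stand-in "records", each compressed as its own gzip member.
records = [b"WARC/1.0 record one\r\n", b"WARC/1.0 record two\r\n"]
members = [gzip.compress(record) for record in records]
archive = b"".join(members)
offsets = gzip_member_offsets(archive)
```

Each offset can then be used to seek straight to a record: decompressing from `offsets[1]` yields the second record without reading the first.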

Documentation:

  • Add AGENTS.md for future AI agent guidance
  • Document project layout, build process, and tool preferences

All tools tested and verified working on real-world WARC archives.

@zbowling
Author

I hit a few bugs, and when I went to debug them I noticed this project didn't use many modern Python features. I went nuts here. You don't have to accept this as is, because I know it's a lot, but I added significant tests, modernized the code, switched the argument parser, etc. This brings the code from 2012-era Python to 2025 Python.

@zbowling force-pushed the modernize-python-3.10 branch 2 times, most recently from 67d3557 to 8668e22, November 12, 2025 07:57
@zbowling force-pushed the modernize-python-3.10 branch from 8668e22 to 5506102, November 12, 2025 08:03