Refactor hardware access with dependency injection#137

Draft
mmccarty wants to merge 3 commits into rapidsai:main from mmccarty:refactor-di

@mmccarty
Contributor

Summary

  • Add rapids_cli/hardware.py with provider abstractions: DeviceInfo dataclass, GpuInfoProvider/SystemInfoProvider protocols, NvmlGpuInfo/DefaultSystemInfo real implementations (lazy-loading, cached), and FakeGpuInfo/FakeSystemInfo/FailingGpuInfo/FailingSystemInfo test fakes
  • Refactor gpu.py, cuda_driver.py, memory.py, nvlink.py, and debug.py to accept optional provider parameters instead of calling pynvml/psutil/cuda.pathfinder directly
  • Update doctor.py to create a shared NvmlGpuInfo instance and pass it to all checks
  • Rewrite check/debug tests to use plain dataclass fakes instead of mock.patch, eliminating ~51 hardware patches and ~11 MagicMock objects from those tests
  • Add test_hardware.py with 19 tests for all provider implementations
  • Fix nvlink bug: nvmlDeviceGetNvLinkState(handle, 0) hard-coded link 0 instead of passing the actual nvlink_id, so only the first link's state was ever queried
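
For reviewers who want a picture of the pattern before opening the diff, here is a minimal sketch of the provider approach described above. It is not the code in rapids_cli/hardware.py. The names DeviceInfo, GpuInfoProvider, FakeGpuInfo, and FailingGpuInfo come from the summary, but the fields (`index`, `memory_total_bytes`, `nvlink_states`), the `devices` attribute, and the helpers `check_nvlink` and `_default_provider` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Protocol, Tuple


@dataclass
class DeviceInfo:
    # Hypothetical fields; the real dataclass may carry different ones.
    index: int
    memory_total_bytes: int
    nvlink_states: List[bool] = field(default_factory=list)


class GpuInfoProvider(Protocol):
    """Anything that exposes a list of devices satisfies this protocol."""

    @property
    def devices(self) -> List[DeviceInfo]: ...


@dataclass
class FakeGpuInfo:
    """Test fake: returns whatever devices the test hands it."""

    devices: List[DeviceInfo] = field(default_factory=list)


class FailingGpuInfo:
    """Test fake: simulates a machine where the GPU query fails."""

    @property
    def devices(self) -> List[DeviceInfo]:
        raise RuntimeError("GPU driver unavailable")


def _default_provider() -> GpuInfoProvider:
    # Placeholder (assumption): the real code would build the NVML-backed
    # provider here. Left unimplemented so the sketch has no GPU dependency.
    raise NotImplementedError("construct the real provider here")


def check_nvlink(
    gpu_info: Optional[GpuInfoProvider] = None,
) -> List[Tuple[int, int]]:
    """Return (device index, link id) for every inactive NVLink."""
    if gpu_info is None:
        gpu_info = _default_provider()
    inactive = []
    for dev in gpu_info.devices:
        # Iterate over every link id rather than always inspecting link 0.
        for nvlink_id, active in enumerate(dev.nvlink_states):
            if not active:
                inactive.append((dev.index, nvlink_id))
    return inactive


# A test passes a fake directly; no mock.patch is involved.
fake = FakeGpuInfo(
    devices=[
        DeviceInfo(index=0, memory_total_bytes=1024, nvlink_states=[True, False]),
        DeviceInfo(index=1, memory_total_bytes=1024, nvlink_states=[True, True]),
    ]
)
result = check_nvlink(fake)
```

Here `result` is `[(0, 1)]`: link 1 on device 0 is reported as inactive. A check that only ever looked at link 0 would miss it, which is the kind of regression a fake with per-link state makes easy to test.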

Note: This branch includes commits from #135 (tests) and #136 (docs). The DI-specific changes are in commit 28e3957.

Test plan

  • pytest — all 72 tests pass
  • Coverage at 97.72%, above 95% threshold
  • pre-commit run --all-files passes

🤖 Generated with Claude Code
