Mission-critical audit: downloader safety, endpoint hardening, and test stabilization #33
Mission-critical audit fixes:
- Align DiffusionModelDownloaderTests compile gating with production MLX/Hub/HuggingFace availability
- Harden DiffusionModelDownloader correctness:
  - reserve disk budget per in-flight model to prevent concurrent overcommit checks
  - cancel/await in-flight task during deleteModel to prevent re-registration races
  - release reserved bytes on cancellation and cleanup
  - make deleteAllModels report file deletion failures instead of silently clearing state
  - enforce checksum verification failure when expected checksum is provided but no safetensors file is present
- Replace detached model download launcher with child Task in ModelManager for better cancellation/lifecycle semantics
- Eliminate Azure endpoint crash risk by sanitizing resource names and constructing URL via URLComponents (with safe fallback)
- Add Azure endpoint regression tests for invalid/empty resource inputs
- Remove brittle hardcoded registry/provider totals in tests and assert invariant behavior against current registry data
- Update ProviderType case-iterable test coverage for newly added providers
- Clean up strict-concurrency warning path in GenerationSchema encoding error payload type
- Update DeviceCapabilities C-string decoding to non-deprecated UTF-8 decoding

Verification:
- swift test (352 tests, 60 suites) passes
- swift build passes
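The Azure endpoint item above (sanitize the resource name, then build the URL through URLComponents with a fallback) might look roughly like the sketch below. The function name `azureBaseURL`, the allowed character set, the `/openai` path, and the `fallback` parameter are all assumptions made for illustration; this is not the code in `OpenAIEndpoint.swift`.

```swift
import Foundation

// Hypothetical helper (not Conduit's API): builds an Azure OpenAI base URL
// without force-unwrapping a URL made from unsanitized input.
func azureBaseURL(resource rawResource: String, fallback: String) -> URL? {
    // Assumption: only lowercase ASCII letters, digits, and hyphens are kept.
    let allowed = CharacterSet(charactersIn: "abcdefghijklmnopqrstuvwxyz0123456789-")

    var filtered = ""
    for scalar in rawResource.lowercased().unicodeScalars where allowed.contains(scalar) {
        filtered.unicodeScalars.append(scalar)
    }

    // An empty or fully invalid input falls back to the caller-supplied name.
    let resource = filtered.isEmpty ? fallback : filtered

    var components = URLComponents()
    components.scheme = "https"
    components.host = "\(resource).openai.azure.com"
    components.path = "/openai"
    // Returns nil instead of crashing if the components cannot form a URL.
    return components.url
}
```

Returning an optional keeps the decision about a last-resort fallback URL with the caller, rather than hiding a force unwrap inside the helper.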
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 63e604a278
    activeDownloads[modelId]?.cancel()
    activeDownloads.removeValue(forKey: modelId)
    releaseReservedDiskSpace(for: modelId)
Preserve disk reservation until cancellation completes
Releasing reserved bytes inside cancelDownload before the underlying task has finished makes the new disk-safety accounting inaccurate under cooperative cancellation. The download task only checks cancellation around the long snapshot await, so a canceled transfer can keep consuming disk for a while; if another download starts in that window, checkAvailableDiskSpace no longer includes the in-flight reservation and can admit an overcommit that this change is meant to prevent. Keep the reservation until task termination (e.g., via the existing deferred cleanup) rather than clearing it immediately on cancel.
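A minimal sketch of the suggested ordering, assuming a simplified actor in place of the real downloader: `cancelDownload` only signals cancellation, and the reservation is dropped by the task's own completion path. `ReservationLedger`, `startDownload`, `finish`, and `waitForCompletion` are hypothetical names, and the real code may use a `defer` block rather than a trailing call.

```swift
import Foundation

// Hypothetical stand-in for the downloader's bookkeeping; not the project's code.
actor ReservationLedger {
    private var reservedBytes: [String: Int64] = [:]
    private var activeDownloads: [String: Task<Void, Never>] = [:]

    // What a disk-space check would add on top of bytes already on disk.
    var totalReserved: Int64 { reservedBytes.values.reduce(0, +) }

    func startDownload(modelId: String,
                       estimatedBytes: Int64,
                       work: @escaping @Sendable () async -> Void) {
        reservedBytes[modelId] = estimatedBytes
        activeDownloads[modelId] = Task {
            await work()
            // Reached on normal completion and after cancellation alike,
            // because `work` does not throw.
            await self.finish(modelId: modelId)
        }
    }

    // Signals cancellation only. The reservation stays counted until the
    // task actually stops writing and reaches `finish`.
    func cancelDownload(modelId: String) {
        activeDownloads[modelId]?.cancel()
    }

    func waitForCompletion(of modelId: String) async {
        await activeDownloads[modelId]?.value
    }

    private func finish(modelId: String) {
        reservedBytes.removeValue(forKey: modelId)
        activeDownloads.removeValue(forKey: modelId)
    }
}
```

With this shape, a second download that starts between the cancel call and the task's termination still sees the first model's bytes in `totalReserved`.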
@claude Releasing reserved bytes inside cancelDownload before the underlying task has finished makes the new disk-safety accounting inaccurate under cooperative cancellation. The download task only checks cancellation around the long snapshot await, so a canceled transfer can keep consuming disk for a while; if another download starts in that window, checkAvailableDiskSpace no longer includes the in-flight reservation and can admit an overcommit that this change is meant to prevent. Keep the reservation until task termination (e.g., via the existing deferred cleanup) rather than clearing it immediately on cancel.
@claude fix merge conflicts with main
Summary
This PR applies mission-critical correctness and safety fixes identified during the framework audit, plus regression test stabilization to prevent false negatives.
What Changed
1) Build/Test gate correctness
- Updated the `DiffusionModelDownloaderTests` compile guard to match production symbol availability:
  - from `#if canImport(Hub)`
  - to `#if CONDUIT_TRAIT_MLX && canImport(MLX) && (canImport(Hub) || canImport(HuggingFace))`
2) Diffusion downloader correctness hardening
File: `Sources/Conduit/ImageGeneration/DiffusionModelDownloader.swift`
- `reservedDiskBytesByModel` tracks reservation per model download.
- `checkAvailableDiskSpace` now accounts for in-progress reservations, so concurrent downloads can no longer each pass the check and jointly overcommit the disk.
- `deleteModel(modelId:)` now cancels and awaits any in-flight task before deleting/removing registry state.
- `deleteAllModels()` no longer silently ignores deletion failures and then wipes the registry.
- Reports `AIError.fileError` when real deletion failures occur.
- When an expected checksum is provided but no `.safetensors` file is found, now throws `AIError.checksumMismatch` instead of silently succeeding.
- Reserved bytes are released on cancellation and cleanup (`cancelDownload`, `cancelAllDownloads`, deferred completion).
3) ModelManager task lifecycle safety
File: `Sources/Conduit/ModelManagement/ModelManager.swift`
- Replaced `Task.detached` with `Task` for background download kickoff in `downloadTask(for:)`.
4) OpenAI Azure endpoint crash hardening
File: `Sources/Conduit/Providers/OpenAI/OpenAIEndpoint.swift`
- … `URLComponents`.
5) OpenAI endpoint regression tests
File: `Tests/ConduitTests/Providers/OpenAI/OpenAIProviderTests.swift`
- … `default`
6) Registry/provider test stability improvements
Files: `Tests/ConduitTests/Core/ModelIdentifierTests.swift`, `Tests/ConduitTests/Core/ProtocolCompilationTests.swift`
- Replaced brittle hardcoded model/provider totals with invariant assertions based on current registry contents.
- Updated `ProviderType` case-iterable expectations to include newer providers (`kimi`, `minimax`) and current case count.
- Prevents unrelated provider catalog growth from causing false failing tests.
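To illustrate the difference between a hardcoded total and an invariant assertion, here is a small sketch. `SampleProvider` and `catalogInvariantsHold` are hypothetical stand-ins, and the case list is illustrative rather than the real `ProviderType` catalog.

```swift
// Hypothetical stand-in for a provider catalog; not the project's enum.
enum SampleProvider: String, CaseIterable {
    case openAI, anthropic, kimi, minimax
}

// Brittle form: fails as soon as any provider is added.
//   precondition(SampleProvider.allCases.count == 4)

// Invariant form: properties that should hold however large the catalog grows.
func catalogInvariantsHold(_ cases: [SampleProvider]) -> Bool {
    let rawValues = cases.map(\.rawValue)
    let isNonEmpty = !cases.isEmpty
    let rawValuesAreUnique = Set(rawValues).count == rawValues.count
    let includesNewerProviders = cases.contains(.kimi) && cases.contains(.minimax)
    return isNonEmpty && rawValuesAreUnique && includesNewerProviders
}
```

A check of this kind keeps passing when a new case is appended, while still catching a dropped provider or a duplicated raw value.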
7) Compiler-safety cleanup
Files: `Sources/Conduit/Core/Types/GenerationSchema.swift`, `Sources/Conduit/Core/Types/DeviceCapabilities.swift`
- `GenerationSchema`: changed `EncodingError.invalidValue(Any, Context)` to `EncodingError.invalidValue(String, Context)` to avoid strict-concurrency Sendable warnings around `Any` payloads.
- `DeviceCapabilities`: replaced deprecated C-string conversion with safe UTF-8 byte decoding path.
Verification
- `swift test` -> PASS (352 tests in 60 suites)
- `swift build` -> PASS
Risk Notes