fix: Correct race conditions #1126
Conversation
| :project_use_test_preprocessor_tests => false, | ||
| :project_use_test_preprocessor_mocks => false | ||
| } | ||
| } |
There was a problem hiding this comment.
Remark: I ran a simple formatter to clean whitespaces since I had some stray whitespace in my original work, and it deleted some pre-existing whitespace.
|
@gavin-dunlap-luminar Thank you for this. You're absolutely correct about the preprocessor cacheing problem. This is a temporary addition of some simple cacheing while delta builds have been removed (a necessary consequence of slowly removing a reliance on the problematic Rake library — Rake is Ruby make). Delta builds only rebuilt those things that changed. We need to recreate this in 1.0.0+. That small bit of YAML cacheing was indeed incomplete given the new reality of parallel builds. Totally missed that. You have independently discovered one of the reasons we are removing Rake as a core library from Ceedling. It provided a great deal of infrastructure and features. But, as time has worn on it's become an anchor around the project's neck. The handy-dandy |
This PR addresses two race conditions that caused intermittent build failures when using Ceedling's parallel test builds.
Bug 1: Preprocessor Cache Race Condition
Problem: When multiple threads simultaneously processed the same source file for the first time, they would race to read/write the YAML cache file containing #include statement listings. This caused cache corruption and intermittent build failures.
Root Cause:
The preprocess_includes() method in preprocessinator.rb performed check-then-act operations (check if cache exists -> read OR extract -> write) without synchronization. Multiple threads could interleave these operations:
Thread A checks cache doesn't exist
Thread B checks cache doesn't exist
Thread A extracts includes and writes cache
Thread B extracts includes and writes cache (corrupting A's write)
Fix:
Added per-file mutex locking around the entire cache operation sequence. Each cache file gets its own mutex (stored in a thread-safe hash with mutex protection), ensuring only one thread can check/read/write a specific cache file at a time while still allowing parallel processing of different files.
Bug 2: FileList Lazy Evaluation Race Condition
Problem: Intermittent linker errors with "undefined reference" messages occurred when source files were missing from the build. Tests would fail sporadically (roughly 1 per 1000 runs for my workload.)
Root Cause:
Ruby's Rake::FileList uses lazy evaluation - glob patterns aren't expanded until the list is accessed. In configurator_builder.rb and file_path_collection_utils.rb, FileLists were created from glob patterns but not materialized before being shared across threads. When multiple threads accessed the same lazy FileList simultaneously during the BUILD phase, they would trigger concurrent pattern expansion, causing list corruption and resulting in missing source files during linking.
Fix:
Added .resolve() call in file_path_collection_utils.rb within the revise_filelist() method to force FileList expansion before returning the result
Added .resolve() calls in configurator_builder.rb to materialize FileLists immediately after creation, before any threading occurs
This ensures all file collections are fully materialized during the configuration phase before being used in the threaded build phases.