
Data race in AdaptiveNonLocalMeansDenoisingImageFilter ThreadedGenerateData causes flaky test under parallel load #6419

@hjmjohnson


Data race in AdaptiveNonLocalMeansDenoisingImageFilter::ThreadedGenerateData (scatter write to m_RicianBiasImage)

itk::AdaptiveNonLocalMeansDenoisingImageFilter produces nondeterministic output under multi-threaded execution because ThreadedGenerateData scatter-writes a shared image at neighborhood-offset indices that cross work-unit boundaries.

Root cause

In Modules/Filtering/AdaptiveDenoising/include/itkAdaptiveNonLocalMeansDenoisingImageFilter.hxx, ThreadedGenerateData computes, for each center pixel, indices neighborhoodPatchIndex = centerIndex + neighborhoodPatchOffsetList[n] and writes the shared bias image at those neighbor locations:

IndexType neighborhoodPatchIndex = centerIndex + neighborhoodPatchOffsetList[n];
...
this->m_RicianBiasImage->SetPixel(neighborhoodPatchIndex, 0.0);
...
this->m_RicianBiasImage->SetPixel(neighborhoodPatchIndex, minimumDistance);

Because each work unit owns a contiguous output region but writes to centerIndex ± patchOffset, some of the written pixels fall inside neighboring work units' regions. Multiple work units therefore call SetPixel on the same pixels of the shared m_RicianBiasImage without synchronization, which is a data race. The corrupted bias image then feeds into the output computed in AfterThreadedGenerateData.
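The boundary effect can be illustrated with a minimal 1-D model. This is a sketch, not ITK code: WorkUnit, SplitRegion and ContestedPixels are hypothetical names, the even contiguous split is an assumption (ITK's region splitter works on N-D regions), and a single offset radius stands in for neighborhoodPatchOffsetList. It lists the pixels that more than one work unit writes, i.e. the pixels whose SetPixel calls race.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical 1-D model (not ITK code): a work unit owns the centers in
// [begin, end) but, like the filter's scatter, writes to center + o for
// every offset o in [-radius, radius].
struct WorkUnit
{
  long begin;
  long end; // exclusive
};

// Assumed even contiguous split of `size` pixels into at most `numberOfUnits` ranges.
std::vector<WorkUnit> SplitRegion(long size, long numberOfUnits)
{
  assert(numberOfUnits > 0);
  std::vector<WorkUnit> units;
  const long chunk = (size + numberOfUnits - 1) / numberOfUnits;
  for (long b = 0; b < size; b += chunk)
    units.push_back({ b, std::min(b + chunk, size) });
  return units;
}

// Pixels written by more than one work unit: exactly the pixels whose
// writes are unsynchronized when the units run concurrently.
std::vector<long> ContestedPixels(long size, long numberOfUnits, long radius)
{
  std::vector<int> writers(size, 0);
  for (const WorkUnit & unit : SplitRegion(size, numberOfUnits))
  {
    // Mark every in-bounds pixel this unit's scatter touches.
    std::vector<bool> touched(size, false);
    for (long c = unit.begin; c < unit.end; ++c)
      for (long o = -radius; o <= radius; ++o)
        if (c + o >= 0 && c + o < size)
          touched[c + o] = true;
    for (long i = 0; i < size; ++i)
      writers[i] += touched[i] ? 1 : 0;
  }
  std::vector<long> contested;
  for (long i = 0; i < size; ++i)
    if (writers[i] > 1)
      contested.push_back(i);
  return contested;
}
```

With 10 pixels, two work units and radius 1, pixels 4 and 5 are written by both units. In this model each internal boundary contributes 2 * radius contested pixels once the units are wider than the radius, and a single work unit leaves none, which is consistent with the SetNumberOfWorkUnits(1) result.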

The race rarely produces a bad interleaving at low thread contention, so the test passes when run in isolation. Under CPU oversubscription, e.g. ctest -j8, where each of several concurrently running tests spawns its own full thread pool, preemption widens the race window and the output diverges from the baseline.

Reproduction

AdaptiveNonLocalMeansDenoisingImageFilterTest1 flakes under parallel load:

| Condition | Result |
| --- | --- |
| default work units, ctest -j8 | fails in 50-100% of runs (ImageError != 0) |
| filter->SetNumberOfWorkUnits(1), ctest -j8 | passes 100% (6/6), deterministic |
| AdaptiveNonLocalMeansDenoisingImageFilterTest1 run alone | passes |

The current Baseline/r16denoised_*.nrrd baselines match the single-threaded (race-free) result, confirming the multi-threaded path is the one that is wrong.

Suggested fix (proper)

The scatter write should be made race-free in the filter itself, for example by one of the following:

  • Reformulate as a gather: for each owned output pixel, read its contributing neighbors instead of scatter-writing neighbors.
  • Accumulate into per-work-unit buffers and merge in AfterThreadedGenerateData.
  • Restrict each work unit to write only pixels within its own output region.
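The gather option can be sketched in the same kind of hypothetical 1-D model. It assumes, based on the baselines matching the single-threaded run, that the intended result is the serial scatter's last-writer-wins outcome; BiasValue and CenterWrites are made-up stand-ins for the value the filter stores and for whether a given center writes at all. Each work unit then writes only the pixels it owns, looking back for the last center (in serial order) that would have written each one, so no synchronization is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-ins (not the filter's real computation):
// the value a center stores at center + offset, and whether it writes at all.
double BiasValue(long center, long offset)
{
  return 100.0 * static_cast<double>(center) + static_cast<double>(offset);
}
bool CenterWrites(long center) { return center % 3 != 0; }

// Reference: serial scatter in scan order; later centers overwrite earlier ones.
std::vector<double> SerialScatter(long size, long radius)
{
  std::vector<double> bias(size, 0.0);
  for (long c = 0; c < size; ++c)
  {
    if (!CenterWrites(c))
      continue;
    for (long o = -radius; o <= radius; ++o)
      if (c + o >= 0 && c + o < size)
        bias[c + o] = BiasValue(c, o);
  }
  return bias;
}

// Gather for the owned range [begin, end): for each pixel p, scan candidate
// centers from the latest down; the first writer found is the one whose value
// survives in the serial scatter. Only bias[p] for owned p is written.
void GatherRange(std::vector<double> & bias, long size, long radius, long begin, long end)
{
  for (long p = begin; p < end; ++p)
  {
    for (long c = std::min(p + radius, size - 1); c >= std::max(p - radius, 0L); --c)
    {
      if (CenterWrites(c))
      {
        bias[p] = BiasValue(c, p - c);
        break;
      }
    }
  }
}

// One thread per contiguous range; threads write disjoint elements, so there
// is no data race and the result does not depend on scheduling.
std::vector<double> ParallelGather(long size, long radius, long numberOfUnits)
{
  assert(numberOfUnits > 0);
  std::vector<double> bias(size, 0.0);
  std::vector<std::thread> workers;
  const long chunk = (size + numberOfUnits - 1) / numberOfUnits;
  for (long b = 0; b < size; b += chunk)
    workers.emplace_back(GatherRange, std::ref(bias), size, radius, b, std::min(b + chunk, size));
  for (std::thread & worker : workers)
    worker.join();
  return bias;
}
```

For any split, ParallelGather returns exactly what SerialScatter does, which is the property the baselines need. In an N-D image the same idea applies, but "latest center" has to follow the image's scan order, and the per-work-unit buffer option would need an equivalent rule (keep the write from the latest center) when merging.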

Interim mitigation

A companion test-only change pins the test filter to a single work unit for determinism (SetNumberOfWorkUnits(1)), so CI stops flaking while the filter fix is pending. PR to follow; this issue tracks the underlying filter race.
