Skip to content

Add eval for replacing stream forEach mutation with result-producing collectors #42

Description

@martinfrancois

Problem

Add a Java Streams skill eval for refactoring stream pipelines that look stream-based but still mutate external collections or maps with forEach(...).

The lesson is not that every forEach(...) is wrong. The lesson is that when the operation is to build a returned list or map, the stream should normally produce that result directly with Stream.concat(...), toList(), or a collector instead of mutating a collection or map that lives outside the pipeline.

This is eval-worthy because the original code is correct, but the stream usage is misleading: the stream does transformation work and then hands accumulation back to external mutable state.

Code before the prompt was executed

The first method built a list of configured and environment-provided paths by mutating an external list from two stream terminal operations, then streaming the list again to remove duplicates.

private List<Path> additionalWritableRoots(Path workflowDirectory, Map<String, Object> codex) {
    List<Path> roots = new ArrayList<>();
    list(codex, "additional_writable_roots", List.of()).stream()
            .map(value -> path(workflowDirectory, value))
            .forEach(roots::add);
    environmentValue("SYMPHONY_CODEX_ADDITIONAL_WRITABLE_ROOTS").stream()
            .flatMap(value -> Arrays.stream(value.split(Pattern.quote(File.pathSeparator))))
            .map(String::trim)
            .filter(value -> !value.isBlank())
            .map(value -> path(workflowDirectory, value))
            .forEach(roots::add);
    return roots.stream().distinct().toList();
}

The second method counted normalized states by mutating an external map from a stream terminal operation.

private Map<String, Integer> runningCountsByState() {
    Map<String, Integer> counts = new HashMap<>();
    running.values().stream()
            .map(entry -> StateNames.normalize(entry.card.state()))
            .forEach(state -> counts.merge(state, 1, Integer::sum));
    return counts;
}

Prompt that caused the implementation

A maintainer asked to continue the existing cleanup plan after merging the skill-installation PR. The active work item was to sweep the Java sources and apply the Java Streams skill where it made code meaningfully better.

Relevant instruction:

continue with our plan

The plan item was to use the installed Java Optional and Java Streams skills to clean up production code where the skill guidance identifies a contained, behavior-preserving improvement.

Later prompt that exposed the issue

There was no later maintainer correction for these two methods. The cleanup was found by applying the Streams skill during the planned source sweep.

Prompt-produced code before maintainer correction

There was no separate rejected intermediate implementation. The code in the first section is the pre-refactor code being corrected.

Why the old code is bad

The old code is not bad because it produces the wrong result. It is bad because the stream pipeline does not own the result it is building.

  1. forEach(roots::add) and forEach(state -> counts.merge(...)) mutate external structures.
  2. The stream is doing transformation work, but accumulation happens outside the pipeline.
  3. The list method allocates a mutable intermediate list only so it can stream it again for distinct().
  4. The map method hides the counting operation behind a side-effecting merge lambda.
  5. The code makes the reader audit external mutation when a result-producing terminal operation can state the intent directly.

Maintainer-preferred code

The list-building method should let the stream produce the final list. The two ordered sources are named and then concatenated before duplicate removal.

private List<Path> additionalWritableRoots(Path workflowDirectory, Map<String, Object> codex) {
    Stream<Path> configuredRoots = list(codex, "additional_writable_roots", List.of()).stream()
            .map(value -> path(workflowDirectory, value));
    Stream<Path> environmentRoots = environmentValue("SYMPHONY_CODEX_ADDITIONAL_WRITABLE_ROOTS").stream()
            .flatMap(value -> Arrays.stream(value.split(Pattern.quote(File.pathSeparator))))
            .map(String::trim)
            .filter(value -> !value.isBlank())
            .map(value -> path(workflowDirectory, value));
    return Stream.concat(configuredRoots, environmentRoots).distinct().toList();
}

The counting method should use a collector with explicit duplicate-key merge behavior and an explicit mutable map type.

private Map<String, Integer> runningCountsByState() {
    return running.values().stream()
            .map(entry -> StateNames.normalize(entry.card.state()))
            .collect(Collectors.toMap(state -> state, state -> 1, Integer::sum, HashMap::new));
}

Why the replacement is better

The replacement makes each stream pipeline produce the requested value directly.

For the list case:

  • configuredRoots and environmentRoots name the two ordered sources.
  • Stream.concat(...) preserves the original precedence order: configured roots first, environment roots second.
  • distinct() still removes duplicates using encounter order.
  • toList() returns the same unmodifiable result shape as the old final roots.stream().distinct().toList().
  • The unnecessary mutable ArrayList disappears.

For the map case:

  • Collectors.toMap(...) states that the result is a map from normalized state to count.
  • The merge function Integer::sum states how duplicate states are counted.
  • HashMap::new preserves the previous concrete map behavior where callers received a mutable HashMap-style result.
  • The external side-effecting counts.merge(...) lambda disappears.

Desired eval behavior

  • Reward identifying stream().forEach(...) external collection or map mutation as a refactor target when the goal is a returned value.
  • Reward using Stream.concat(...) when two ordered sources should be combined before later stream operations.
  • Reward preserving encounter order and duplicate handling when replacing list mutation with a stream pipeline.
  • Reward Collectors.toMap(...) with a merge function when duplicate keys are expected.
  • Reward preserving result mutability or concrete map behavior when surrounding code may rely on it.
  • Reward explaining that the old code is functionally correct but less clear and less idiomatic.

Anti-patterns the eval should reject

  • Replacing the mutable list with two separate lists and concatenating them imperatively.
  • Keeping forEach(roots::add) or forEach(state -> counts.merge(...)) as the final answer.
  • Using Collectors.toMap(...) without a merge function for data that can contain duplicate keys.
  • Changing the list result from ordered first occurrence to an unordered set.
  • Changing the map result to an immutable map if the original method returned a mutable map and mutability was not audited.
  • Adding parallelStream() or concurrent collectors without measured CPU-heavy work and a concurrency requirement.
  • Claiming the direct collector version is guaranteed faster instead of describing it as a readability and side-effect-boundary improvement.

Suggested eval name

collector-owned-result-instead-of-foreach-mutation

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions