Conversation
**etr** left a comment:
Thanks! There is a lot of goodness here. Just one minor comment on the split of the task file.
```diff
- Iterate until user approves, then write to `specs/tasks.md`.
+ Iterate until user approves, then write task files to `specs/tasks/`.
  **Output structure:**
```
Did you notice any reduction in quality by splitting the tasks? In the past, I've noticed the agent exploring the tasks file to look for future tasks to understand whether it was the right/wrong time to do something.
Not sure whether that is less, more, or equally effective with multiple files - I don't have evidence either way, but curious if you had done some testing on it.
The next is written entirely by me, no AI involved :)
First of all: congratulations for the great plugin. It matches my workflow and quality requirements the best so far. The TDD and the Design System approaches are really great.
I have a "benchmark" setup where I evaluated stock claude code, GSD, superpowers, groundwork and my own variation named Forge with "best" feature of each (I still have on my list to evaluate: Paul, agent-os, BMAD-Method, claude-conductor, OpenSpec). So far your plugin gave me the best results but there are some things that I believe can be improved.
The main issues I am trying to solve are:
- context management for the orchestrator: growing context is probably the biggest factor affecting output quality, therefore I try to keep it as small as possible:
  - since tasks can grow unbounded, splitting tasks into separate files and keeping a status table in the `_index.md`. This way only the relevant task is loaded into context and lookup is fast.
  - another idea is to have the plan agent not dump its output back to the orchestrator, but rather work with plan files so the orchestrator just gets the file path. This is a bigger refactor since the execute-task skill validates the plan, so it would still read the file... it would need another solution: either the plan agent validates the plan itself, or maybe a plan-validator agent could do that. The goal is to keep the orchestrator context slim.
- test quality
  - this was always my biggest pain: claude code was not following TDD, cutting corners, lying that tests were passing when they were failing :D. Your plugin is amazingly good at enforcing TDD, but the test quality is still not that good. It probably resembles the training data, since not many projects have good tests :D
  - so I added more guidance regarding good and bad testing practices.
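As a sketch of the idea above (the column names, task IDs, and status values are illustrative, not taken from the plugin), the `_index.md` status table could look like:

```markdown
| Task     | Milestone    | Status      |
| -------- | ------------ | ----------- |
| TASK-001 | M1-core-auth | done        |
| TASK-002 | M1-core-auth | in-progress |
| TASK-003 | M2-upload    | pending     |
```

The orchestrator then only ever scans this one small file; the full task bodies stay on disk until a specific `TASK-NNN.md` is actually needed.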
> The next is written entirely by me, no AI involved :)
So fun to live at a time where we have to say this :)
> I have a "benchmark" setup where I evaluated stock claude code, GSD, superpowers, groundwork and my own variation named Forge with "best" feature of each (I still have on my list to evaluate: Paul, agent-os, BMAD-Method, claude-conductor, OpenSpec). So far your plugin gave me the best results but there are some things that I believe can be improved.
Thanks :). I am happy it is helping you, and yeah, there are definitely a few things that could be improved.
> since tasks can grow unbounded, splitting tasks into separate files and keeping a status table in the `_index.md`. This way only the relevant task is loaded into context and lookup is fast.
Got it - that makes sense and answers my question. The real struggle appears to constantly be "how to strike a balance between more information and context rot" - I think we will all have to fight this for a while, but your intuition of splitting data and using agents is what I believe is the right direction.
> another idea is to have the plan agent not dump its output back to the orchestrator, but rather work with plan files so the orchestrator just gets the file path. This is a bigger refactor since the execute-task skill validates the plan, so it would still read the file... it would need another solution: either the plan agent validates the plan itself, or maybe a plan-validator agent could do that. The goal is to keep the orchestrator context slim.
I think this is where I had started from and had to back away for some reason (my context, just like the agent's, has rotted, I suspect), so exploring this direction might be a smart thing to do.
> so I added more guidance regarding good and bad testing practices.
Those are very helpful. I am testing locally a new agent to be added to the validation loop that checks specifically the quality of tests so I can give it a better understanding of what good versus bad tests are.
Summary
Splits the `specs/tasks.md` file into a `specs/tasks/` directory with per-milestone subdirectories (`M1-core-auth/`, `M2-upload/`), individual `TASK-NNN.md` files, a centralized `_index.md` with a status table, and a `parking-lot.md` for deferred tasks. Updates the `tasks`, `next-task`, `execute-task`, and `task-validation-loop` skills to work with the new structure. The `build-unplanned-feature`, `execute-task`, and `product-design` skills now load existing specs (architecture, design system, PRD) before planning, so implementations follow established patterns and contradictions are caught early.
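Assuming the milestone folders named above (the `TASK-NNN` numbers are illustrative), the resulting layout would look something like:

```
specs/tasks/
├── _index.md         # status table, one row per task
├── parking-lot.md    # deferred tasks
├── M1-core-auth/
│   ├── TASK-001.md
│   └── TASK-002.md
└── M2-upload/
    └── TASK-003.md
```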
Merges the `testing-anti-patterns.md` reference into the main TDD skill with concrete code examples and decision gates. Removes the standalone file.
fixes can be applied directly.
`next-task` now reads only the `_index.md` status table instead of aggregating all task files.
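As a rough sketch of what "read only the index" means (the row format and `pending` status value are assumptions for illustration, not the plugin's actual format), the lookup could be as simple as:

```python
import re

# Match status-table rows like: | TASK-002 | M1-core-auth | pending |
ROW = re.compile(r"^\|\s*(TASK-\d+)\s*\|\s*([^|]+?)\s*\|\s*([\w-]+)\s*\|")

def next_task(index_text: str):
    """Return the first pending task ID from the _index.md status table.

    Only the index text is scanned; no per-task file is ever opened,
    which keeps the orchestrator's context small.
    """
    for line in index_text.splitlines():
        m = ROW.match(line)
        if m and m.group(3) == "pending":
            return m.group(1)
    return None
```

Header and separator rows fall through the regex automatically, so the function only ever touches real task rows.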