Conversation
**etr** left a comment:
Thanks! There is a lot of goodness here. Just one minor comment on the split of the task file.
```diff
- Iterate until user approves, then write to `specs/tasks.md`.
+ Iterate until user approves, then write task files to `specs/tasks/`.
  **Output structure:**
```
Did you notice any reduction in quality by splitting the tasks? In the past, I've noticed the agent exploring the tasks file to look for future tasks to understand whether it was the right/wrong time to do something.
Not sure whether that is less, more, or equally effective with multiple files - I don't have evidence either way, but curious if you had done some testing on it.
The next is written entirely by me, no AI involved :)
First of all: congratulations for the great plugin. It matches my workflow and quality requirements the best so far. The TDD and the Design System approaches are really great.
I have a "benchmark" setup where I evaluated stock claude code, GSD, superpowers, groundwork and my own variation named Forge with "best" feature of each (I still have on my list to evaluate: Paul, agent-os, BMAD-Method, claude-conductor, OpenSpec). So far your plugin gave me the best results but there are some things that I believe can be improved.
The main issues I am trying to solve are:
- context management for the orchestrator: growing context is probably the biggest factor affecting output quality, therefore I try to keep it as small as possible:
  - since tasks can grow unbounded, splitting tasks into separate files and keeping a status table in the `_index.md`. This way only the relevant task is loaded into context and lookup is fast.
  - another idea is to have the plan agent not dump its output back to the orchestrator, but rather work with plan files so the orchestrator just gets the file path. This is a bigger refactor since the execute-task skill validates the plan, so it would still read the file... it would need another solution: either the plan agent validates the plan itself, or maybe a plan-validator agent could do that. The goal is to keep the orchestrator context slim.
- test quality
  - this was always my biggest pain: claude code was not following TDD, cutting corners, lying that tests were passing when they were failing :D. Your plugin is amazingly good at enforcing TDD, but the test quality is still not that good. It probably resembles the training data, since not many projects have good tests :D
  - so I added more guidance regarding good and bad testing practices.
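As a sketch of the idea above (the column names, task IDs, and status values are illustrative, not taken from the plugin), the `_index.md` status table could look like:

```markdown
| Task     | Milestone    | Status      |
| -------- | ------------ | ----------- |
| TASK-001 | M1-core-auth | done        |
| TASK-002 | M1-core-auth | in-progress |
| TASK-003 | M2-upload    | pending     |
```

The orchestrator then only ever scans this one small file; the full task bodies stay on disk until a specific `TASK-NNN.md` is actually needed.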
> The next is written entirely by me, no AI involved :)
So fun to live at a time where we have to say this :)
> I have a "benchmark" setup where I evaluated stock claude code, GSD, superpowers, groundwork and my own variation named Forge with "best" feature of each (I still have on my list to evaluate: Paul, agent-os, BMAD-Method, claude-conductor, OpenSpec). So far your plugin gave me the best results but there are some things that I believe can be improved.
Thanks :). I am happy it is helping you, and yeah, there are definitely a few things that could be improved.
> since tasks can grow unbounded, splitting tasks into separate files and keeping a status table in the `_index.md`. This way only the relevant task is loaded into context and lookup is fast.
Got it - that makes sense and answers my question. The real struggle appears to constantly be "how to strike a balance between more information and context rot" - I think we will all have to fight this for a while, but your intuition of splitting data and using agents is what I believe is the right direction.
> another idea is to have the plan agent not dump its output back to the orchestrator, but rather work with plan files so the orchestrator just gets the file path. This is a bigger refactor since the execute-task skill validates the plan, so it would still read the file... it would need another solution: either the plan agent validates the plan itself, or maybe a plan-validator agent could do that. The goal is to keep the orchestrator context slim.
I think this is where I had started from and had to back away for some reason (my context, just like the agent's, has rotted, I suspect), so exploring this direction might be a smart thing to do.
> so I added more guidance regarding good and bad testing practices.
Those are very helpful. I am testing locally a new agent to be added to the validation loop that checks specifically the quality of tests so I can give it a better understanding of what good versus bad tests are.
Summary
Splits the `specs/tasks.md` file into a `specs/tasks/` directory with per-milestone subdirectories (`M1-core-auth/`, `M2-upload/`), individual `TASK-NNN.md` files, a centralized `_index.md` with a status table, and a `parking-lot.md` for deferred tasks. Updates the `tasks`, `next-task`, `execute-task`, and `task-validation-loop` skills to work with the new structure. The `build-unplanned-feature`, `execute-task`, and `product-design` skills now load existing specs (architecture, design system, PRD) before planning, so implementations follow established patterns and contradictions are caught early.
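Assuming the milestone folders named above (the `TASK-NNN` numbers are illustrative), the resulting layout would look something like:

```
specs/tasks/
├── _index.md         # status table, one row per task
├── parking-lot.md    # deferred tasks
├── M1-core-auth/
│   ├── TASK-001.md
│   └── TASK-002.md
└── M2-upload/
    └── TASK-003.md
```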
Merges the `testing-anti-patterns.md` reference into the main TDD skill with concrete code examples and decision gates. Removes the standalone file.
fixes can be applied directly.
`next-task` now reads only the `_index.md` status table instead of aggregating all task files.
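As a rough sketch of what "read only the index" means (the row format and `pending` status value are assumptions for illustration, not the plugin's actual format), the lookup could be as simple as:

```python
import re

# Match status-table rows like: | TASK-002 | M1-core-auth | pending |
ROW = re.compile(r"^\|\s*(TASK-\d+)\s*\|\s*([^|]+?)\s*\|\s*([\w-]+)\s*\|")

def next_task(index_text: str):
    """Return the first pending task ID from the _index.md status table.

    Only the index text is scanned; no per-task file is ever opened,
    which keeps the orchestrator's context small.
    """
    for line in index_text.splitlines():
        m = ROW.match(line)
        if m and m.group(3) == "pending":
            return m.group(1)
    return None
```

Header and separator rows fall through the regex automatically, so the function only ever touches real task rows.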