[Help Wanted] Design review: spec-to-epic decomposition workflow with subgraphs, custom Python tools and local LLM

Hi! I'm building a spec decomposition pipeline (SDLC → epics/tasks) using ChatDev (DevAll) and a local model (qwopus 3.5 via OpenAI-compatible API).
I've implemented the Python tools and subgraph YAMLs, but I'm struggling with the correct wiring — the workflow either fails validation or the agents don't seem to invoke tools properly. I'd appreciate a design review rather than treating this as a bug.

**What I'm trying to achieve**

```
START → Spec Scanner (subgraph) → Task Extractor (agent + tool) → Decomposer (subgraph) → Quality Gate (agent) → [REVISE ↻|APPROVED →] Summarizer (agent + tool) → SnippetWriter (agent + tool) → [Loop Gate → loop or END]
```
**What I've done**

- Custom Python tools in `functions/`:
  - `extract_tasks_from_markdown(folder_path, output_json)`
  - `generate_context_summary(tasks_json, summary_file, project_root)`
  - `write_snippets(folder_path, snippets_file)`
- Subgraphs:
  - `subgraphs/spec_scanner.yaml`: scans markdown files
  - `subgraphs/reflexion_loop.yaml`: iterative decomposition
- Main workflow (see below)

**Where I need help**

- Subgraph syntax: I used `type: subgraph` with a nested `config.type: file` + `config.path`. I suspect the correct field is `config.graph_path` at the top level of `config`. Can someone confirm the exact schema?
- Tooling / function calling: my tools are in `functions/` and seem to load, but the agent either hallucinates the arguments, or the tool result is not fed back into the agent context. Is the `tooling` block in the workflow below the right shape? Should I use `type: function` with `auto_load: true`, or explicitly reference the module path? Any gotchas with local / non-OpenAI models and tool schemas?
- Passing variables into prompts: I hardcoded `<PROJECT_ROOT>` inside the role prompt, expecting it to be substituted from the input payload. What is the correct interpolation syntax? `{{project_root}}`? `${inputs.project_root}`? Or should I use a literal / variable node to inject it into the context?
- Loop timer semantics: `Loop Gate` has two outgoing edges with `condition: 'true'` (one to `Spec Scanner`, one to `END`). My intent is: timer not expired → loop, timer expired → exit.
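
To make the variable-passing question concrete, here is a minimal sketch of the substitution the role prompts expect to happen before the text reaches the model. It is plain Python that illustrates the intended behaviour only; it is not ChatDev's templating, and `render_role` and the payload shape are hypothetical.

```python
import json

# Role prompt fragment with the same placeholder used in the workflow.
ROLE_TEMPLATE = (
    'Call write_snippets(folder_path="<PROJECT_ROOT>", '
    'snippets_file="docs/snippets.md")'
)


def render_role(template: str, payload_json: str) -> str:
    """Hypothetical helper: fill <PROJECT_ROOT> from payload["project_root"]."""
    payload = json.loads(payload_json)
    # A missing key raises KeyError, which is preferable to leaving the
    # placeholder in the prompt for the model to guess.
    project_root = payload["project_root"]
    return template.replace("<PROJECT_ROOT>", project_root)


rendered = render_role(ROLE_TEMPLATE, '{"project_root": "/srv/projects/demo"}')
print(rendered)
```

If the engine does no substitution of this kind, the model only ever sees the literal `<PROJECT_ROOT>` string, which would be consistent with the hallucinated arguments described above.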


[markdown_task_extractor.py](https://github.com/user-attachments/files/28513657/markdown_task_extractor.py)
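
For readers who do not open the attachment, the sketch below shows the general shape of a tool with the signature `extract_tasks_from_markdown(folder_path, output_json)`. It is a simplified stand-in, not the attached file: the regex, the JSON layout, and the choice to resolve `output_json` relative to `folder_path` are all assumptions made for illustration. Type hints and a docstring are included because tool schemas are commonly derived from them.

```python
import json
import re
from pathlib import Path

# Assumed pattern: open Markdown checkboxes ("- [ ] task") and "TODO: ..." lines.
TASK_PATTERN = re.compile(r"^\s*(?:[-*]\s*\[ \]\s*|TODO:?\s*)(.+)$", re.MULTILINE)


def extract_tasks_from_markdown(folder_path: str, output_json: str) -> str:
    """Extract open tasks from all Markdown files under a folder.

    Args:
        folder_path: Path of the project root to scan.
        output_json: Output file path, resolved relative to folder_path (assumption).

    Returns:
        A JSON string with the task list, so the agent receives plain text.
    """
    root = Path(folder_path)
    tasks = []
    for md_file in sorted(root.rglob("*.md")):
        text = md_file.read_text(encoding="utf-8", errors="replace")
        for match in TASK_PATTERN.finditer(text):
            tasks.append(
                {"file": str(md_file.relative_to(root)), "task": match.group(1).strip()}
            )
    payload = json.dumps({"count": len(tasks), "tasks": tasks}, indent=2)
    out_path = root / output_json
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(payload, encoding="utf-8")
    return payload
```

Returning the JSON as a string, rather than only writing it to disk, means the tool result can be inspected directly in the logs when checking whether it reaches the agent context.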

Main workflow:

```yaml
version: 0.0.0
vars:
  MODEL_NAME: qwopus
  BASE_URL: http://192.168.9.2:1113/v1
  API_KEY: dsa
  SNIPPETS_FILE: docs/snippets.md
graph:
  id: SDLCSpecs_v2
  description: SDLCSpecs v2 - scans project specs, extracts tasks, decomposes with LLM reflexion loop, generates summaries and snippets. Supports single, batch, and nightly continuous modes.
  log_level: DEBUG
  is_majority_voting: false
  nodes:
    - id: START
      type: passthrough
      config:
        only_last_message: true
      description: ''
      context_window: 0
    - id: END
      type: passthrough
      config:
        only_last_message: true
      description: ''
      context_window: 0
    - id: Spec Scanner
      type: subgraph
      config:
        type: file
        config:
          path: subgraphs/spec_scanner.yaml
      description: Scan project files and aggregate spec content
      context_window: 0
    - id: Decomposer
      type: subgraph
      config:
        type: file
        config:
          path: subgraphs/reflexion_loop.yaml
      description: Reflexion loop for iterative task decomposition
      context_window: 0
    - id: Loop Gate
      type: loop_timer
      config:
        max_duration: 3600
        duration_unit: seconds
        reset_on_emit: true
        message: Nightly run complete — starting next cycle
        passthrough: false
      description: Timer gate for nightly continuous mode
      context_window: 0
    - id: Task Extractor
      type: agent
      config:
        name: ${MODEL_NAME}
        provider: openai
        base_url: ${BASE_URL}
        api_key: ${API_KEY}
        role: |-
          You are a task extractor in the SDLCSpecs pipeline.
          You receive aggregated spec scan results (files with their content). Your job is to extract all actionable tasks, TODOs, and specifications.
          Use the `extract_tasks_from_markdown` tool to extract tasks from markdown files in the project at the provided folder path.
          The project root path is in the "project_root" field of the input.
          Steps:
          1. Call extract_tasks_from_markdown(folder_path="<PROJECT_ROOT>", output_json="docs/tasks/extracted_tasks.json")
          2. Review the extracted tasks
          3. Identify missing tasks that weren't caught by regex
          4. Output: the full extracted tasks JSON plus your additional findings
          Return ONLY the result. No extra text.
        tooling:
          - type: function
            config:
              auto_load: true
              tools:
                - name: extract_tasks_from_markdown
        thinking: null
        memories: []
        skills: null
        retry:
          enabled: true
          max_attempts: 2
          min_wait_seconds: 1
          max_wait_seconds: 5
      description: Extract tasks from scanned spec files
      context_window: 0
      log_output: true
    - id: Quality Gate
      type: agent
      config:
        name: ${MODEL_NAME}
        provider: openai
        base_url: ${BASE_URL}
        api_key: ${API_KEY}
        role: |-
          ### Role: You are a "Quality Inspector" in the SDLCSpecs pipeline.

          ### Context:
          You receive the decomposition output from the Reflexion loop.
          Your job is to evaluate the quality of the decomposition.

          ### Evaluation Criteria:
          1. Completeness: Are all extracted tasks covered?
          2. Granularity: Are tasks decomposed to actionable size?
          3. Clarity: Are task descriptions clear and unambiguous?
          4. Consistency: No conflicting or overlapping tasks?

          ### Decision:
          - If quality is acceptable (score >= 7/10):
            Output: VERDICT: APPROVED
          - If quality needs improvement (score < 7/10):
            Output: VERDICT: REVISE
            Followed by specific improvement suggestions.

          ### Output format:
          Score: <0-10>
          Issues: <list of issues if any>
          VERDICT: APPROVED|REVISE
        params:
          temperature: 0.1
          max_tokens: 500
        tooling: []
        thinking: null
        memories: []
        skills: null
        retry: null
      description: Evaluate decomposition quality
      context_window: 0
      log_output: true
    - id: SnippetWriter
      type: agent
      config:
        name: ${MODEL_NAME}
        provider: openai
        base_url: ${BASE_URL}
        api_key: ${API_KEY}
        role: |-
          You are a snippet writer in the SDLCSpecs pipeline.
          The project root path is available from the input in the "project_root" field.
          Call write_snippets(
            folder_path="<PROJECT_ROOT>",
            snippets_file="docs/snippets.md"
          ) where <PROJECT_ROOT> is the value from the input's project_root field.
          Return ONLY the function result. No extra text.
        tooling:
          - type: function
            config:
              auto_load: true
              tools:
                - name: write_snippets
        thinking: null
        memories: []
        skills: null
        retry: null
      description: Write code snippets based on analysis
      context_window: 0
      log_output: true
    - id: Summarizer
      type: agent
      config:
        name: ${MODEL_NAME}
        provider: openai
        base_url: ${BASE_URL}
        api_key: ${API_KEY}
        role: |-
          You are a summary generator in the SDLCSpecs pipeline.
          The project root path is available from the input in the "project_root" field.
          Call generate_context_summary(
            tasks_json="docs/tasks/decomposed_tasks.json",
            summary_file="docs/tasks/context_summary.md",
            project_root="<PROJECT_ROOT>"
          ) where <PROJECT_ROOT> is the value from the input's project_root field.
          Return ONLY the function result. No extra text.
        tooling:
          - type: function
            config:
              auto_load: true
              tools:
                - name: generate_context_summary
        thinking: null
        memories: []
        skills: null
        retry: null
      description: Generate context summary from decomposed tasks
      context_window: 0
      log_output: true
  edges:
    - from: START
      to: Spec Scanner
      trigger: true
      condition: 'true'
      carry_data: true
      keep_message: true
      clear_context: false
      clear_kept_context: false
    - from: Spec Scanner
      to: Task Extractor
      trigger: true
      condition: 'true'
      carry_data: true
      keep_message: true
      clear_context: false
      clear_kept_context: false
    - from: Task Extractor
      to: Decomposer
      trigger: true
      condition: 'true'
      carry_data: true
      keep_message: true
      clear_context: false
      clear_kept_context: false
    - from: Decomposer
      to: Quality Gate
      trigger: true
      condition: 'true'
      carry_data: true
      keep_message: true
      clear_context: false
      clear_kept_context: false
    - from: Quality Gate
      to: Decomposer
      trigger: true
      condition:
        type: keyword
        config:
          any:
            - REVISE
          none: []
          regex: []
      carry_data: true
      keep_message: true
      clear_context: false
      clear_kept_context: false
    - from: Quality Gate
      to: Summarizer
      trigger: true
      condition:
        type: keyword
        config:
          any:
            - APPROVED
          none: []
          regex: []
      carry_data: true
      keep_message: true
      clear_context: false
      clear_kept_context: false
    - from: Summarizer
      to: SnippetWriter
      trigger: true
      condition: 'true'
      carry_data: true
      keep_message: true
      clear_context: false
      clear_kept_context: false
    - from: SnippetWriter
      to: Loop Gate
      trigger: true
      condition: 'true'
      carry_data: true
      keep_message: true
      clear_context: false
      clear_kept_context: false
    - from: Loop Gate
      to: Spec Scanner
      trigger: true
      condition: 'true'
      carry_data: true
      keep_message: false
      clear_context: false
      clear_kept_context: false
    - from: Loop Gate
      to: END
      trigger: true
      condition: 'true'
      carry_data: true
      keep_message: false
      clear_context: false
      clear_kept_context: false
  memory: []
  initial_instruction: 'Run SDLCSpecs v2 pipeline: scan project specs, extract tasks, decompose with LLM reflexion loop, validate quality, generate summary and snippets. Supports single-run (no timer) or nightly (timer-based loop).'
  start:
    - START
  end:
    - END

```
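
One thing worth checking in the `Quality Gate` edges of the main workflow is whether a single reply can trigger both of them. The sketch below assumes the `keyword` condition is a plain case-sensitive substring test over the agent's reply; that is an unverified assumption about the engine, and `keyword_match` and `verdict` are hypothetical helpers.

```python
import re
from typing import List, Optional


def keyword_match(text: str, any_of: List[str], none_of: List[str]) -> bool:
    """Assumed semantics: some `any` keyword occurs and no `none` keyword does."""
    has_any = any(k in text for k in any_of) if any_of else True
    has_none = any(k in text for k in none_of)
    return has_any and not has_none


# Stricter alternative: only accept the verdict on its own line.
VERDICT_RE = re.compile(r"^VERDICT:\s*(APPROVED|REVISE)\s*$", re.MULTILINE)


def verdict(text: str) -> Optional[str]:
    """Return the anchored verdict, or None if the reply has no verdict line."""
    m = VERDICT_RE.search(text)
    return m.group(1) if m else None


reply = (
    "Score: 5\n"
    "Issues: task 3 cannot be APPROVED until it is split\n"
    "VERDICT: REVISE"
)

revise_edge = keyword_match(reply, ["REVISE"], [])
approved_edge = keyword_match(reply, ["APPROVED"], [])
print(revise_edge, approved_edge, verdict(reply))
```

Under that assumption both edges match this reply, because the word `APPROVED` appears in the issue list. An anchored pattern like `VERDICT_RE` avoids the overlap; the `regex` list already present in the edge config might accept such a pattern, though that is not confirmed here.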

`subgraphs/spec_scanner.yaml`:

```yaml
graph:
  id: spec_scanner
  description: Scans project files for specifications, reads them in parallel, and aggregates results.
  log_level: DEBUG
  is_majority_voting: false
  nodes:
    - id: START
      type: passthrough
      config:
        only_last_message: true
      description: ''
      context_window: 0
    - id: END
      type: passthrough
      config:
        only_last_message: true
      description: ''
      context_window: 0
    - id: File Finder
      type: agent
      config:
        name: ${MODEL_NAME}
        provider: openai
        role: |-
          You are a file discovery agent. The user provides a project root path.

          Use the `code_executor` tool to run Python code to find files. Example code:

          ```python
          import os
          root = "/path/to/project"
          skip_dirs = {"node_modules", ".git", "__pycache__", "dist", "build", ".venv"}
          for dirpath, dirnames, filenames in os.walk(root):
              dirnames[:] = [d for d in dirnames if d not in skip_dirs]
              for f in filenames:
                  if f.endswith(('.md', '.yaml', '.yml')):
                      print(os.path.join(dirpath, f))
          ```

          After running the code, collect the results and output:
          <file>: /path/to/file1.md
          <file>: /path/to/file2.yaml

          If no files found, output exactly: <no files found>

          **TERMINATION CONDITION: Stop after finding up to 50 files. If more than 50 files are found, output exactly: "Terminated: Found more than 50 files, showing first 50 only." Do not continue scanning further.**
        base_url: ${BASE_URL}
        api_key: ${API_KEY}
        params:
          max_rounds: 10
          max_tool_calls: 20
        tooling:
          - type: function
            config:
              tools:
                - name: file:All
                - name: code_executor:All
              timeout: null
            prefix: ''
        thinking: null
        memories: []
        retry:
          enabled: true
          max_attempts: 2
          min_wait_seconds: 1
          max_wait_seconds: 5
      description: Finds all spec-related files in the project
      context_window: 0
    - id: File Reader
      type: agent
      config:
        name: ${MODEL_NAME}
        provider: openai
        role: |-
          ### Role: You are a "File Content Reader" in the SDLCSpecs system.

          ### Context:
          You receive a single file path. Read its content and extract structured info.

          ### Task:
          1. Read the file using available tools
          2. Identify spec type:
             - .md: extract sections, headings, task descriptions, TODO items
             - .yaml/.yml: identify workflow structure, nodes, edges
             - .py: identify functions, classes, docstrings, TODO comments

          ### Output format:
          File: <path>
          Type: <md|yaml|py|other>
          Summary: <1-2 sentence summary>
          Key Items:
          - <key finding 1>

          Content:
          ```
          <full file content>
          ```
        base_url: ${BASE_URL}
        api_key: ${API_KEY}
        params: {}
        tooling:
          - type: function
            config:
              tools:
                - name: file:All
                - name: code_executor:All
              timeout: null
            prefix: ''
        thinking: null
        memories: []
        retry: null
      description: Reads and summarizes a single spec file
      context_window: 0
    - id: Content Aggregator
      type: agent
      config:
        name: ${MODEL_NAME}
        provider: openai
        role: |-
          ### Role: You are a "Content Aggregation Specialist" in the SDLCSpecs system.

          ### Context:
          You receive multiple file reading reports. Combine them into one overview.

          ### Task:
          1. Collect all incoming file reports
          2. Group files by type (md, yaml, py)
          3. Produce a final aggregated document

          ### Output format:

          # Scan Results

          ## Summary
          Total files scanned: <count>
          Files by type:
          - Markdown (.md): <count>
          - YAML (.yaml): <count>
          - Python (.py): <count>

          ## Files

          ### <path>
          - Type: <type>
          - Summary: <summary>
          - Key Items:
            - <item 1>

          ...
        base_url: ${BASE_URL}
        api_key: ${API_KEY}
        params: {}
        tooling: []
        thinking: null
        memories: []
        retry: null
      description: Aggregates all file reading results
      context_window: 0
  edges:
    - from: START
      to: File Finder
      trigger: true
      condition: 'true'
      carry_data: true
      keep_message: true

    - from: File Finder
      to: File Reader
      trigger: true
      condition:
        type: keyword
        config:
          any:
            - '<file>:'
          none: []
          regex: []
      carry_data: true
      keep_message: false
      dynamic:
        type: map
        split:
          type: regex
          config:
            pattern: <file>:\s*(.*)
        config:
          max_parallel: 10

    - from: File Finder
      to: END
      trigger: true
      condition:
        type: keyword
        config:
          any: []
          none:
            - '<file>:'
          regex: []
      carry_data: true
      keep_message: true

    - from: File Reader
      to: Content Aggregator
      trigger: true
      condition: 'true'
      carry_data: true
      keep_message: true

    - from: Content Aggregator
      to: END
      trigger: true
      condition: 'true'
      carry_data: true
      keep_message: true

  memory: []
  initial_instruction: ''
  start:
    - START
  end:
    - END
version: 0.0.0
vars:
  MODEL_NAME: qwopus

```
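
The `dynamic.split` pattern on the `File Finder` → `File Reader` edge can be checked in isolation. The sketch below applies the same regex with Python's `re` module to a sample in the output format the `File Finder` prompt asks for. How the engine actually consumes the matches (for example, whether it takes the first capture group of each match) is an assumption.

```python
import re

# Same pattern as in the edge's split config.
SPLIT_PATTERN = re.compile(r"<file>:\s*(.*)")

finder_output = (
    "<file>: /path/to/file1.md\n"
    "<file>: /path/to/file2.yaml\n"
    "Terminated: Found more than 50 files, showing first 50 only."
)

# One captured path per "<file>:" line; other lines are ignored.
paths = [p.strip() for p in SPLIT_PATTERN.findall(finder_output)]
print(paths)

# The "no files" sentinel yields no items at all.
no_files = SPLIT_PATTERN.findall("<no files found>")
print(no_files)
```

Because `.` does not match a newline, each capture stops at the end of its line, so the termination notice is not mistaken for a path.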

`subgraphs/reflexion_loop.yaml`:

```yaml
version: 0.4.0
graph:
  id: reflexion_loop
  description: Reflexion loop subgraph with actor/evaluator and memory storage.
  log_level: INFO
  is_majority_voting: false
  start:
    - Task
  end:
  - Final Synthesizer
  memory:
    - name: reflexion_blackboard
      type: blackboard
      config:
        max_items: 500
  nodes:
  - id: Task
    type: passthrough
    config: {}
  - id: Reflexion Actor
    type: agent
    description: Actor (πθ) generates a strategy draft based on blackboard experience and short-term context.
    config:
      provider: openai
      base_url: ${BASE_URL}
      api_key: ${API_KEY}
      name: qwopus
      input_mode: messages
      role: |
        You are the Actor. If there are relevant memories, refer to that experience and output the latest action draft; if there are no relevant memories, provide an action draft to the best of your ability.
        - Structure:
          Thought: ...
          Draft: ...
      memories:
      - name: reflexion_blackboard
        retrieve_stage:
        - gen
        top_k: 5
        read: true
        write: false
      params:
        temperature: 0.2
        max_tokens: 1200
  - id: Reflexion Evaluator
    type: agent
    description: Evaluator (Me) provides scores and improvement directions for the Actor's draft.
    config:
      provider: openai
      base_url: ${BASE_URL}
      api_key: ${API_KEY}
      name: qwopus
      input_mode: messages
      role: |
        You are the Evaluator. Receive and read the Actor's latest output and task objectives, and evaluate whether they meet the goals.
        Append `Verdict: CONTINUE` or `Verdict: STOP` at the end of the output.
        When you think the current plan is good enough, you should give `Verdict: STOP`. Other fields can be skipped.
        Output:
        - Score: <0-1>
        - Reason: <Failure reasons or highlights>
        - Next Focus: <Key points to focus on in the next round>
        - Verdict: CONTINUE|STOP
      params:
        temperature: 0.1
        max_tokens: 800
  - id: Self Reflection Writer
    type: agent
    description: Self-Reflection (Msr) converts Evaluator results into reusable experience.
    config:
      provider: openai
      base_url: ${BASE_URL}
      api_key: ${API_KEY}
      name: qwopus
      input_mode: messages
      role: |
        You are responsible for refining the Evaluator output and Actor Draft into JSON experience:
        {
          "issues": [..],
          "fix_plan": [..],
          "memory_cue": "A short reminder"
        }
        - JSON must not contain extra text.
      memories:
      - name: reflexion_blackboard
        read: false
        write: true
      params:
        temperature: 0.1
        max_tokens: 500
  - id: Final Synthesizer
    type: agent
    description: Converge the final answer, absorbing the latest Draft and Evaluator tips.
    config:
      provider: openai
      base_url: ${BASE_URL}
      api_key: ${API_KEY}
      name: qwopus
      input_mode: messages
      role: |
        Please synthesize all inputs and provide a final answer. Be comprehensive. Do not include any extra text other than the final answer.
      params:
        temperature: 0.1
        max_tokens: 1000
  edges:
  - from: Task
    to: Reflexion Actor
    keep_message: True
  - from: Task
    to: Reflexion Evaluator
    keep_message: True
    trigger: false
  - from: Reflexion Actor
    to: Reflexion Actor
    trigger: false
  - from: Reflexion Actor
    to: Reflexion Evaluator
  - from: Reflexion Evaluator
    to: Self Reflection Writer
    condition: need_reflection_loop
  - from: Self Reflection Writer
    to: Reflexion Actor
    carry_data: true
  - from: Reflexion Actor
    to: Final Synthesizer
    trigger: false
  - from: Reflexion Evaluator
    to: Final Synthesizer
    condition: should_stop_loop
    carry_data: false
```
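
The reflexion subgraph routes on two named conditions, `need_reflection_loop` and `should_stop_loop`, whose definitions are not shown in this issue. The sketch below is a hypothetical stand-in that assumes they are plain predicates over the Evaluator's text; the real implementations may differ. It shows the property the loop relies on: for any reply, exactly one of the two is true.

```python
import re

# Matches the "Verdict: CONTINUE" / "Verdict: STOP" line requested in the Evaluator prompt.
VERDICT_RE = re.compile(r"Verdict:\s*(CONTINUE|STOP)\b", re.IGNORECASE)


def _last_verdict(text: str) -> str:
    """Return the last verdict found; default to STOP so a malformed reply cannot loop forever."""
    found = VERDICT_RE.findall(text)
    return found[-1].upper() if found else "STOP"


def need_reflection_loop(text: str) -> bool:
    """Hypothetical predicate for the Evaluator -> Self Reflection Writer edge."""
    return _last_verdict(text) == "CONTINUE"


def should_stop_loop(text: str) -> bool:
    """Hypothetical predicate for the Evaluator -> Final Synthesizer edge."""
    return _last_verdict(text) == "STOP"
```

Deriving both predicates from one parsed value keeps them mutually exclusive, which is the same property the `Loop Gate` question in the main workflow is asking about.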
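Two follow-up questions on the configs above:

- Loop timer semantics: with both `Loop Gate` edges set to `condition: 'true'` (intent: timer not expired → loop, timer expired → exit), will both edges fire? What is the canonical pattern for a conditional loop break?
- Context window: I set `context_window: 0` everywhere. Does this strip all previous messages, making agents "forget" prior step outputs? What is a sensible default for a multi-step agent pipeline?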
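
The behaviour intended for `Loop Gate` can be stated as a small sketch: a gate that picks exactly one target per visit, depending on whether the timer has expired. This describes the desired semantics only and says nothing about how the `loop_timer` node behaves in the engine; `LoopTimerGate` and `next_target` are hypothetical names, and the clock is injected so the example runs without waiting.

```python
import time
from typing import Callable


class LoopTimerGate:
    """Hypothetical gate: loop while the timer is running, exit once it expires."""

    def __init__(self, max_duration_s: float, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._deadline = clock() + max_duration_s

    def next_target(self) -> str:
        # Exactly one target is returned, so the two edges can never both fire.
        if self._clock() < self._deadline:
            return "Spec Scanner"
        return "END"


# Fake clock so the 3600-second window can be crossed instantly.
fake_now = [0.0]
gate = LoopTimerGate(3600, clock=lambda: fake_now[0])

first = gate.next_target()   # timer still running
fake_now[0] = 3600.0
second = gate.next_target()  # timer expired
print(first, second)
```

With two unconditional `condition: 'true'` edges there is no equivalent of the `if` above in the YAML, which is why the question asks where that branch is supposed to live.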

[Help Wanted] Design review: spec-to-epic decomposition workflow with subgraphs, custom Python tools and local LLM #631
