26 changes: 0 additions & 26 deletions INTERVIEW.md
Original file line number Diff line number Diff line change
@@ -1,26 +0,0 @@
# TODO 7: Multiline grouping approach (excluded from LLM body, protected from patches)

> What exact on-disk syntax do you want for the multiline grouping section? Please provide a concrete before/after example (including where it sits relative to `---` front matter and the `# -- SCRATCHPAD` heading).

after the front matter, before the scratchpad heading. it should be before any other content in the file after the front matter

> How should the end of the multiline grouping section be detected (e.g., first blank line, next heading, a closing marker, end-of-file)? Can the grouping text itself contain blank lines?

use explicit opening and closing marker syntax. if markdown front matter satisfies the criteria (e.g. support for multiline content), use that. otherwise implement this in whichever way makes the most sense / is most aligned with good practice.

> For legacy documents that still use a single-line `Grouping approach: ...` prefix, should the tool leave that line as-is, or migrate it to the new multiline format when writing the file? If migration is desired, should it happen only when `--grouping` is provided / user is prompted, or always?

leave as is

> The current CLI prompt uses `input()` (single line). How do you want multiline grouping input to be entered (e.g., read until a lone `.` line, read until EOF, open $EDITOR, allow literal `\n` escapes, etc.)?

yes; figure out the simplest but still usable way to support multiline input
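The prompt text added later in this diff tells the user to "finish with a single line containing only a '.'". A minimal sketch of that convention (hypothetical helper name, not code from this PR); the function takes any iterable of lines so it can be fed from `input()` or from a test fixture:

```python
from collections.abc import Iterable


def read_until_dot(lines: Iterable[str]) -> str:
    """Collect lines until a line containing only '.' (or until exhaustion)."""
    collected = []
    for line in lines:
        if line.strip() == ".":
            break
        collected.append(line)
    return "\n".join(collected)
```

Wired to the console, this could be driven by `iter(input, None)` or a simple `while True` loop around `input()` that also treats EOF as end of input.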

> Should the grouping section be preserved verbatim (whitespace/indentation), or normalized (trim lines, collapse spaces) before inserting into the prompt’s “Maintain the grouping approach: …” line?

preserved verbatim

> Do you want the grouping section to be strictly immutable during patch application (i.e., patches only apply to the body after removing the grouping block), or should we also detect and error if a patch’s SEARCH text matches inside the grouping section?

strictly immutable. it should be as if it didn't exist in the body at all. so if a search block matches it and nothing else, it results in an error followed by a retry. there should be no special handling logic for these cases. it simply isn't part of the document body for the purposes of search/replace or substitute blocks.

8 changes: 4 additions & 4 deletions TODO.md
@@ -1,7 +1,7 @@
- ask the model to provide snippets from the integrate text for each find/replace, to clarify which text it intended to integrate
- on failure, ask the model to re-provide only the single failed block instead of all blocks. this is only possible once the model reports which integrated text each block relates to
- only ask for start and end lines of the search block instead of exact text. if there are multiple matches, ask the model to provide a sufficient number of lines at the start or end to narrow the options to a single match. ensure that the search block which the verification prompt sees is not affected by this, by populating the SEARCH section of the block given in the verification prompt with the matching text from the file, instead of only including the start and end lines provided by the model
- modify so that it uses tool calling + hierarchical markdown parsing to avoid needing to ever send the entire document to the model, and instead allow the model to find the relevant section(s) (could often be more than one section which should be modified even to integrate a single piece of info) to modify, and then once it has found the sections, it provides the search/replace diffs
- ensure that the model uses arbitrarily nested md headers, to make this approach scalable instead of only e.g. using one level of headings
- move logs outside of src/
- put group strat into front matter



- make sure prompts mention importance of
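The third item in the list above (matching a search block by only its first and last lines, erroring on ambiguity) could be sketched like this; `resolve_span` is a hypothetical helper, not code from this PR:

```python
def resolve_span(document: str, first_line: str, last_line: str) -> str:
    """Find the unique span starting with first_line and ending with last_line."""
    lines = document.splitlines()
    matches = []
    for i, line in enumerate(lines):
        if line != first_line:
            continue
        # Greedily close the span at the first matching end line.
        for j in range(i, len(lines)):
            if lines[j] == last_line:
                matches.append("\n".join(lines[i : j + 1]))
                break
    if len(matches) != 1:
        raise ValueError(
            f"Expected exactly one match, found {len(matches)}; "
            "ask the model for more anchor lines."
        )
    return matches[0]
```

The resolved span (the full matching text from the file) is what would populate the SEARCH section shown to the verification prompt, so verification is unaffected by the abbreviated anchors.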
Empty file added src/SPEC.md
Empty file.
15 changes: 11 additions & 4 deletions src/integrate_notes.py
@@ -201,7 +201,10 @@ def extract_grouping_section(body: str) -> tuple[GroupingSection | None, str]:
while grouping_index < len(lines) and not lines[grouping_index].strip():
grouping_index += 1

if grouping_index < len(lines) and lines[grouping_index].strip() == GROUPING_BLOCK_START:
if (
grouping_index < len(lines)
and lines[grouping_index].strip() == GROUPING_BLOCK_START
):
end_index = grouping_index + 1
while end_index < len(lines) and lines[end_index].strip() != GROUPING_BLOCK_END:
end_index += 1
@@ -242,7 +245,9 @@ def _format_grouping_block(grouping_text: str) -> str:


def render_grouping_section(
grouping_text: str, existing_section: GroupingSection | None, preserve_existing: bool
grouping_text: str,
existing_section: GroupingSection | None,
preserve_existing: bool,
) -> str:
if not grouping_text.strip():
raise ValueError("Grouping approach cannot be empty.")
@@ -287,7 +292,7 @@ def prompt_for_grouping() -> str:
f"{GROUPING_PREFIX} at the top of the document.\n"
"Enter multiline text and finish with a single line containing only a '.'.\n"
"Examples:\n"
"- Grouping approach: Group points according to what problem each idea/proposal/mechanism/concept addresses/are trying to solve, which you will need to figure out yourself based on context. Do not combine multiple goals/problems into one group. Keep goals/problems specific. Ensure groups are mutually exclusive and collectively exhaustive. Avoid overlap between group's goals/problems. sub-headings should be per-mechanism/per-solution i.e. according to which \"idea\"/solution each point relates to.\n"
'- Grouping approach: Group points according to what problem each idea/proposal/mechanism/concept addresses/are trying to solve, which you will need to figure out yourself based on context. Do not combine multiple goals/problems into one group. Keep goals/problems specific. Ensure groups are mutually exclusive and collectively exhaustive. Avoid overlap between group\'s goals/problems. sub-headings should be per-mechanism/per-solution i.e. according to which "idea"/solution each point relates to.\n'
"- Group points according to what you think the most useful/interesting/relevant groupings are. Ensure similar, related and contradictory points are adjacent.\n"
"Your input:\n"
)
@@ -1391,7 +1396,9 @@ def integrate_notes(
source_body, source_scratchpad = split_document_sections(source_content)
grouping_section, working_body = extract_grouping_section(source_body)

resolved_grouping = grouping or (grouping_section.text if grouping_section else None)
resolved_grouping = grouping or (
grouping_section.text if grouping_section else None
)
if not resolved_grouping:
resolved_grouping = prompt_for_grouping()
logger.info("Recorded new grouping approach from user input.")
287 changes: 287 additions & 0 deletions src/integrate_notes_spec.py
@@ -0,0 +1,287 @@
from __future__ import annotations

import argparse
import random
import sys
from pathlib import Path
from time import perf_counter

from loguru import logger

from spec_chunking import request_chunk_groups
from spec_config import (
SCRATCHPAD_HEADING,
SpecConfig,
default_log_path,
load_config,
repo_root,
)
from spec_editing import request_and_apply_edits
from spec_exploration import explore_until_checkout
from spec_llm import create_openai_client
from spec_logging import configure_logging
from spec_markdown import (
build_document,
format_duration,
normalize_paragraphs,
split_document_sections,
)
from spec_notes import NoteRepository
from spec_summary import SummaryService
from spec_verification import VerificationManager, build_verification_prompt


def parse_arguments() -> argparse.Namespace:
parser = argparse.ArgumentParser(
description="Integrate scratchpad notes into a markdown repository (SPEC flow)."
)
parser.add_argument(
"--source", required=False, help="Path to the root markdown document."
)
parser.add_argument(
"--disable-verification",
action="store_true",
help="Disable verification prompts and background verification checks.",
)
return parser.parse_args()


def resolve_source_path(provided_path: str | None) -> Path:
if provided_path:
path = Path(provided_path).expanduser().resolve()
else:
user_input = input("Enter path to the root markdown document: ").strip()
if not user_input:
raise ValueError("Document path is required to proceed.")
path = Path(user_input).expanduser().resolve()
if not path.exists():
raise FileNotFoundError(f"Source document not found at {path}.")
if not path.is_file():
raise ValueError(f"Source path {path} is not a file.")
return path


def _select_sample_filenames(
repo: NoteRepository, reachable: list[Path], config: SpecConfig
) -> list[str]:
candidates = []
for path in reachable:
if repo.is_index_note(path, config.index_filename_suffix):
continue
if repo.get_word_count(path) < config.granularity_sample_min_words:
continue
candidates.append(path.name)

if not candidates:
return []

if len(candidates) <= config.granularity_sample_size:
return candidates

return random.sample(candidates, config.granularity_sample_size)


def _ensure_scratchpad_matches(
source_path: Path, expected_paragraphs: list[str]
) -> tuple[str, list[str]]:
content = source_path.read_text(encoding="utf-8")
body, scratchpad = split_document_sections(content, SCRATCHPAD_HEADING)
paragraphs = normalize_paragraphs(scratchpad)
if paragraphs != expected_paragraphs:
raise RuntimeError(
"Scratchpad changed while integration was running; aborting to avoid data loss."
)
return body, paragraphs


def _write_updated_files(
source_path: Path,
root_body: str,
remaining_paragraphs: list[str],
updated_files: dict[Path, str],
repo: NoteRepository,
summaries: SummaryService,
) -> None:
for path, content in updated_files.items():
if path == source_path:
root_body = content
else:
path.write_text(content, encoding="utf-8")
repo.invalidate_content(path)
summaries.invalidate(path)

document = build_document(root_body, SCRATCHPAD_HEADING, remaining_paragraphs)
source_path.write_text(document, encoding="utf-8")
repo.set_root_body(root_body)


def integrate_notes_spec(source_path: Path, disable_verification: bool) -> Path:
config = load_config(repo_root() / "config.json")
source_content = source_path.read_text(encoding="utf-8")
source_body, source_scratchpad = split_document_sections(
source_content, SCRATCHPAD_HEADING
)
scratchpad_paragraphs = normalize_paragraphs(source_scratchpad)

repo = NoteRepository(source_path, source_body, source_path.parent)
client = create_openai_client()
summaries = SummaryService(repo, client, config)
verification_manager = (
None if disable_verification else VerificationManager(client, source_path)
)

try:
if not scratchpad_paragraphs:
logger.info(
"No scratchpad notes to integrate; ensuring scratchpad heading remains present."
)
source_path.write_text(
build_document(source_body, SCRATCHPAD_HEADING, []),
encoding="utf-8",
)
return source_path

reachable = repo.iter_reachable_paths()
sample_filenames = _select_sample_filenames(repo, reachable, config)
chunk_groups = request_chunk_groups(
client, scratchpad_paragraphs, sample_filenames, config
)

remaining_indices = set(range(1, len(scratchpad_paragraphs) + 1))
total_chunks = len(chunk_groups)
chunks_completed = 0
integration_start = perf_counter()
current_body = source_body

for group in chunk_groups:
if any(index not in remaining_indices for index in group):
raise RuntimeError(
"Chunk references paragraphs that were already integrated; aborting."
)
chunk_paragraphs = [scratchpad_paragraphs[index - 1] for index in group]
chunk_text = "\n\n".join(chunk_paragraphs)

expected_remaining = [
scratchpad_paragraphs[index - 1]
for index in sorted(remaining_indices)
]
file_body, _ = _ensure_scratchpad_matches(
source_path, expected_remaining
)
if file_body != current_body:
raise RuntimeError(
"Root document body changed while integration was running; aborting."
)

repo.set_root_body(current_body)
reachable = repo.iter_reachable_paths()
summary_map = summaries.get_summaries(reachable)

root_summary = summary_map[source_path]
root_headings = repo.get_headings(source_path)
root_links = repo.get_links(source_path)
root_link_summaries = [
(path, summary_map[path])
for path in root_links
if path in summary_map
]

chunk_label = f"chunk {chunks_completed + 1}/{total_chunks}"
checkout_paths = explore_until_checkout(
client,
chunk_text,
source_path,
root_summary,
root_headings,
root_links,
root_link_summaries,
summary_map,
repo,
config,
)

checked_out_contents = {
path: repo.get_note_content(path) for path in checkout_paths
}
edit_application = request_and_apply_edits(
client,
chunk_text,
checked_out_contents,
checkout_paths,
chunk_label,
)

for path, content in edit_application.updated_contents.items():
if path == source_path:
current_body = content
for path in edit_application.updated_contents:
if path != source_path:
repo.invalidate_content(path)

for index in group:
remaining_indices.remove(index)
remaining_paragraphs = [
scratchpad_paragraphs[index - 1]
for index in sorted(remaining_indices)
]

_write_updated_files(
source_path,
current_body,
remaining_paragraphs,
edit_application.updated_contents,
repo,
summaries,
)

if verification_manager is not None:
verification_prompt = build_verification_prompt(
chunk_text,
edit_application.patch_replacements,
edit_application.duplicate_texts,
)
verification_manager.enqueue_prompt(
verification_prompt,
chunk_label,
chunks_completed,
total_chunks,
)

chunks_completed += 1
remaining_chunks = total_chunks - chunks_completed
if remaining_chunks > 0:
elapsed_seconds = perf_counter() - integration_start
average_duration = elapsed_seconds / chunks_completed
estimated_seconds_remaining = average_duration * remaining_chunks
logger.info(
f"Estimated time remaining: {format_duration(estimated_seconds_remaining)}"
f" for {remaining_chunks} remaining chunk(s)."
)

logger.info("All scratchpad notes integrated; scratchpad section cleared.")
return source_path
finally:
summaries.shutdown()
if verification_manager is not None:
verification_manager.shutdown()


def main() -> None:
configure_logging(default_log_path())
try:
args = parse_arguments()
source_path = resolve_source_path(args.source)
integrated_path = integrate_notes_spec(
source_path,
args.disable_verification,
)
logger.info(
f"Integration completed. Updated document available at {integrated_path}."
)
except Exception as error:
logger.exception(f"Integration failed: {error}")
sys.exit(1)


if __name__ == "__main__":
main()