26 changes: 0 additions & 26 deletions INTERVIEW.md
Original file line number Diff line number Diff line change
@@ -1,26 +0,0 @@
# TODO 7: Multiline grouping approach (excluded from LLM body, protected from patches)

> What exact on-disk syntax do you want for the multiline grouping section? Please provide a concrete before/after example (including where it sits relative to `---` front matter and the `# -- SCRATCHPAD` heading).

after the front matter, before the scratchpad heading. it should be before any other content in the file after the front matter

> How should the end of the multiline grouping section be detected (e.g., first blank line, next heading, a closing marker, end-of-file)? Can the grouping text itself contain blank lines?

use explicit opening and closing marker syntax. if markdown front matter satisfies the criteria (e.g. support for multiline content), use that. otherwise implement this in whichever way makes the most sense / is most aligned with good practice.

> For legacy documents that still use a single-line `Grouping approach: ...` prefix, should the tool leave that line as-is, or migrate it to the new multiline format when writing the file? If migration is desired, should it happen only when `--grouping` is provided / user is prompted, or always?

leave as is

> The current CLI prompt uses `input()` (single line). How do you want multiline grouping input to be entered (e.g., read until a lone `.` line, read until EOF, open $EDITOR, allow literal `\n` escapes, etc.)?

yes; figure out the simplest but still usable way to support multiline input
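The prompt text added later in this diff tells the user to "finish with a single line containing only a '.'". A minimal sketch of that convention (hypothetical helper name, not code from this PR); the function takes any iterable of lines so it can be fed from `input()` or from a test fixture:

```python
from collections.abc import Iterable


def read_until_dot(lines: Iterable[str]) -> str:
    """Collect lines until a line containing only '.' (or until exhaustion)."""
    collected = []
    for line in lines:
        if line.strip() == ".":
            break
        collected.append(line)
    return "\n".join(collected)
```

Wired to the console, this could be driven by `iter(input, None)` or a simple `while True` loop around `input()` that also treats EOF as end of input.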

> Should the grouping section be preserved verbatim (whitespace/indentation), or normalized (trim lines, collapse spaces) before inserting into the prompt’s “Maintain the grouping approach: …” line?

preserved verbatim

> Do you want the grouping section to be strictly immutable during patch application (i.e., patches only apply to the body after removing the grouping block), or should we also detect and error if a patch’s SEARCH text matches inside the grouping section?

strictly immutable. it should be as if it didn't exist in the body at all. so if a search block matches it and nothing else, it results in an error followed by a retry. there should be no special handling logic for these cases. it simply isn't part of the document body for the purposes of search/replace or substitute blocks.

8 changes: 4 additions & 4 deletions TODO.md
@@ -1,7 +1,7 @@
- ask the model to provide snippets from the integrate text for each find/replace, to clarify which text it intended to integrate
- on failure, ask the model to re-provide only the single failed block instead of all blocks. this is only possible once the model reports which integrated text each block relates to
- only ask for start and end lines of the search block instead of exact text. if there are multiple matches, ask the model to provide a sufficient number of lines at the start or end to narrow the options to a single match. ensure that the search block which the verification prompt sees is not affected by this, by populating the SEARCH section of the block given in the verification prompt with the matching text from the file, instead of only including the start and end lines provided by the model
- modify so that it uses tool calling + hierarchical markdown parsing to avoid needing to ever send the entire document to the model, and instead allow the model to find the relevant section(s) (could often be more than one section which should be modified even to integrate a single piece of info) to modify, and then once it has found the sections, it provides the search/replace diffs
- ensure that the model uses arbitrarily nested md headers, to make this approach scalable instead of only e.g. using one level of headings
- move logs outside of src/
- put group strat into front matter



- make sure prompts mention importance of
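The third item in the list above (matching a search block by only its first and last lines, erroring on ambiguity) could be sketched like this; `resolve_span` is a hypothetical helper, not code from this PR:

```python
def resolve_span(document: str, first_line: str, last_line: str) -> str:
    """Find the unique span starting with first_line and ending with last_line."""
    lines = document.splitlines()
    matches = []
    for i, line in enumerate(lines):
        if line != first_line:
            continue
        # Greedily close the span at the first matching end line.
        for j in range(i, len(lines)):
            if lines[j] == last_line:
                matches.append("\n".join(lines[i : j + 1]))
                break
    if len(matches) != 1:
        raise ValueError(
            f"Expected exactly one match, found {len(matches)}; "
            "ask the model for more anchor lines."
        )
    return matches[0]
```

The resolved span (the full matching text from the file) is what would populate the SEARCH section shown to the verification prompt, so verification is unaffected by the abbreviated anchors.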
Empty file added src/SPEC.md
Empty file.
15 changes: 11 additions & 4 deletions src/integrate_notes.py
@@ -201,7 +201,10 @@ def extract_grouping_section(body: str) -> tuple[GroupingSection | None, str]:
while grouping_index < len(lines) and not lines[grouping_index].strip():
grouping_index += 1

if grouping_index < len(lines) and lines[grouping_index].strip() == GROUPING_BLOCK_START:
if (
grouping_index < len(lines)
and lines[grouping_index].strip() == GROUPING_BLOCK_START
):
end_index = grouping_index + 1
while end_index < len(lines) and lines[end_index].strip() != GROUPING_BLOCK_END:
end_index += 1
@@ -242,7 +245,9 @@ def _format_grouping_block(grouping_text: str) -> str:


def render_grouping_section(
grouping_text: str, existing_section: GroupingSection | None, preserve_existing: bool
grouping_text: str,
existing_section: GroupingSection | None,
preserve_existing: bool,
) -> str:
if not grouping_text.strip():
raise ValueError("Grouping approach cannot be empty.")
@@ -287,7 +292,7 @@ def prompt_for_grouping() -> str:
f"{GROUPING_PREFIX} at the top of the document.\n"
"Enter multiline text and finish with a single line containing only a '.'.\n"
"Examples:\n"
"- Grouping approach: Group points according to what problem each idea/proposal/mechanism/concept addresses/are trying to solve, which you will need to figure out yourself based on context. Do not combine multiple goals/problems into one group. Keep goals/problems specific. Ensure groups are mutually exclusive and collectively exhaustive. Avoid overlap between group's goals/problems. sub-headings should be per-mechanism/per-solution i.e. according to which \"idea\"/solution each point relates to.\n"
'- Grouping approach: Group points according to what problem each idea/proposal/mechanism/concept addresses/are trying to solve, which you will need to figure out yourself based on context. Do not combine multiple goals/problems into one group. Keep goals/problems specific. Ensure groups are mutually exclusive and collectively exhaustive. Avoid overlap between group\'s goals/problems. sub-headings should be per-mechanism/per-solution i.e. according to which "idea"/solution each point relates to.\n'
"- Group points according to what you think the most useful/interesting/relevant groupings are. Ensure similar, related and contradictory points are adjacent.\n"
"Your input:\n"
)
@@ -1391,7 +1396,9 @@ def integrate_notes(
source_body, source_scratchpad = split_document_sections(source_content)
grouping_section, working_body = extract_grouping_section(source_body)

resolved_grouping = grouping or (grouping_section.text if grouping_section else None)
resolved_grouping = grouping or (
grouping_section.text if grouping_section else None
)
if not resolved_grouping:
resolved_grouping = prompt_for_grouping()
logger.info("Recorded new grouping approach from user input.")
287 changes: 287 additions & 0 deletions src/integrate_notes_spec.py
@@ -0,0 +1,287 @@
from __future__ import annotations

import argparse
import random
import sys
from pathlib import Path
from time import perf_counter

from loguru import logger

from spec_chunking import request_chunk_groups
from spec_config import (
SCRATCHPAD_HEADING,
SpecConfig,
default_log_path,
load_config,
repo_root,
)
from spec_editing import request_and_apply_edits
from spec_exploration import explore_until_checkout
from spec_llm import create_openai_client
from spec_logging import configure_logging
from spec_markdown import (
build_document,
format_duration,
normalize_paragraphs,
split_document_sections,
)
from spec_notes import NoteRepository
from spec_summary import SummaryService
from spec_verification import VerificationManager, build_verification_prompt


def parse_arguments() -> argparse.Namespace:
parser = argparse.ArgumentParser(
description="Integrate scratchpad notes into a markdown repository (SPEC flow)."
)
parser.add_argument(
"--source", required=False, help="Path to the root markdown document."
)
parser.add_argument(
"--disable-verification",
action="store_true",
help="Disable verification prompts and background verification checks.",
)
return parser.parse_args()


def resolve_source_path(provided_path: str | None) -> Path:
if provided_path:
path = Path(provided_path).expanduser().resolve()
else:
user_input = input("Enter path to the root markdown document: ").strip()
if not user_input:
raise ValueError("Document path is required to proceed.")
path = Path(user_input).expanduser().resolve()
if not path.exists():
raise FileNotFoundError(f"Source document not found at {path}.")
if not path.is_file():
raise ValueError(f"Source path {path} is not a file.")
return path


def _select_sample_filenames(
repo: NoteRepository, reachable: list[Path], config: SpecConfig
) -> list[str]:
candidates = []
for path in reachable:
if repo.is_index_note(path, config.index_filename_suffix):
continue
if repo.get_word_count(path) < config.granularity_sample_min_words:
continue
candidates.append(path.name)

if not candidates:
return []

if len(candidates) <= config.granularity_sample_size:
return candidates

return random.sample(candidates, config.granularity_sample_size)


def _ensure_scratchpad_matches(
source_path: Path, expected_paragraphs: list[str]
) -> tuple[str, list[str]]:
content = source_path.read_text(encoding="utf-8")
body, scratchpad = split_document_sections(content, SCRATCHPAD_HEADING)
paragraphs = normalize_paragraphs(scratchpad)
if paragraphs != expected_paragraphs:
raise RuntimeError(
"Scratchpad changed while integration was running; aborting to avoid data loss."
)
return body, paragraphs


def _write_updated_files(
source_path: Path,
root_body: str,
remaining_paragraphs: list[str],
updated_files: dict[Path, str],
repo: NoteRepository,
summaries: SummaryService,
) -> None:
for path, content in updated_files.items():
if path == source_path:
root_body = content
else:
path.write_text(content, encoding="utf-8")
repo.invalidate_content(path)
summaries.invalidate(path)

document = build_document(root_body, SCRATCHPAD_HEADING, remaining_paragraphs)
source_path.write_text(document, encoding="utf-8")
repo.set_root_body(root_body)


def integrate_notes_spec(source_path: Path, disable_verification: bool) -> Path:
config = load_config(repo_root() / "config.json")
source_content = source_path.read_text(encoding="utf-8")
source_body, source_scratchpad = split_document_sections(
source_content, SCRATCHPAD_HEADING
)
scratchpad_paragraphs = normalize_paragraphs(source_scratchpad)

repo = NoteRepository(source_path, source_body, source_path.parent)
client = create_openai_client()
summaries = SummaryService(repo, client, config)
verification_manager = (
None if disable_verification else VerificationManager(client, source_path)
)

try:
if not scratchpad_paragraphs:
logger.info(
"No scratchpad notes to integrate; ensuring scratchpad heading remains present."
)
source_path.write_text(
build_document(source_body, SCRATCHPAD_HEADING, []),
encoding="utf-8",
)
return source_path

reachable = repo.iter_reachable_paths()
sample_filenames = _select_sample_filenames(repo, reachable, config)
chunk_groups = request_chunk_groups(
client, scratchpad_paragraphs, sample_filenames, config
)

remaining_indices = set(range(1, len(scratchpad_paragraphs) + 1))
total_chunks = len(chunk_groups)
chunks_completed = 0
integration_start = perf_counter()
current_body = source_body

for group in chunk_groups:
if any(index not in remaining_indices for index in group):
raise RuntimeError(
"Chunk references paragraphs that were already integrated; aborting."
)
chunk_paragraphs = [scratchpad_paragraphs[index - 1] for index in group]
chunk_text = "\n\n".join(chunk_paragraphs)

expected_remaining = [
scratchpad_paragraphs[index - 1]
for index in sorted(remaining_indices)
]
file_body, _ = _ensure_scratchpad_matches(
source_path, expected_remaining
)
if file_body != current_body:
raise RuntimeError(
"Root document body changed while integration was running; aborting."
)

repo.set_root_body(current_body)
reachable = repo.iter_reachable_paths()
summary_map = summaries.get_summaries(reachable)

root_summary = summary_map[source_path]
root_headings = repo.get_headings(source_path)
root_links = repo.get_links(source_path)
root_link_summaries = [
(path, summary_map[path])
for path in root_links
if path in summary_map
]

chunk_label = f"chunk {chunks_completed + 1}/{total_chunks}"
checkout_paths = explore_until_checkout(
client,
chunk_text,
source_path,
root_summary,
root_headings,
root_links,
root_link_summaries,
summary_map,
repo,
config,
)

checked_out_contents = {
path: repo.get_note_content(path) for path in checkout_paths
}
edit_application = request_and_apply_edits(
client,
chunk_text,
checked_out_contents,
checkout_paths,
chunk_label,
)

for path, content in edit_application.updated_contents.items():
if path == source_path:
current_body = content
for path in edit_application.updated_contents:
if path != source_path:
repo.invalidate_content(path)

for index in group:
remaining_indices.remove(index)
remaining_paragraphs = [
scratchpad_paragraphs[index - 1]
for index in sorted(remaining_indices)
]

_write_updated_files(
source_path,
current_body,
remaining_paragraphs,
edit_application.updated_contents,
repo,
summaries,
)

if verification_manager is not None:
verification_prompt = build_verification_prompt(
chunk_text,
edit_application.patch_replacements,
edit_application.duplicate_texts,
)
verification_manager.enqueue_prompt(
verification_prompt,
chunk_label,
chunks_completed,
total_chunks,
)

chunks_completed += 1
remaining_chunks = total_chunks - chunks_completed
if remaining_chunks > 0:
elapsed_seconds = perf_counter() - integration_start
average_duration = elapsed_seconds / chunks_completed
estimated_seconds_remaining = average_duration * remaining_chunks
logger.info(
f"Estimated time remaining: {format_duration(estimated_seconds_remaining)}"
f" for {remaining_chunks} remaining chunk(s)."
)

logger.info("All scratchpad notes integrated; scratchpad section cleared.")
return source_path
finally:
summaries.shutdown()
if verification_manager is not None:
verification_manager.shutdown()


def main() -> None:
configure_logging(default_log_path())
try:
args = parse_arguments()
source_path = resolve_source_path(args.source)
integrated_path = integrate_notes_spec(
source_path,
args.disable_verification,
)
logger.info(
f"Integration completed. Updated document available at {integrated_path}."
)
except Exception as error:
logger.exception(f"Integration failed: {error}")
sys.exit(1)


if __name__ == "__main__":
main()