core: add ingestion engine and markdown parser #48

Open
y-71 wants to merge 28 commits into main from map-reduce-ingestion

y-71 commented Mar 1, 2026

Summary

  • Add parseMarkdownSections(), which parses markdown into a RawSection[] tree using mdast, handling preamble content, nested headings, level skips, and headings inside fenced code blocks
  • Add flatMapSections() and mapReduceSections() ingestion strategies for processing raw sections
  • Add ComplexDocumentSource type and wire up source type exports
  • Restructure core as a standalone package with its own package.json, tsconfig.json, and bun lockfile

Test plan

  • parseMarkdownSections: 10 test cases covering empty input, preamble, nesting, level skips, and headings inside code blocks
  • flatMapSections: unit tests with mock map functions
  • mapReduceSections: unit tests with mock map/reduce functions

feuersteiner and others added 28 commits February 24, 2026 22:45
Co-authored-by: feuersteiner <18667704+feuersteiner@users.noreply.github.com>
Remove non-existent `the-plan.md` reference from repo layout
Make `kind` property readonly in `ReferenceListSource`
Add sectioned document source for structured documents with per-section
metadata. Update source barrel exports and docs.
Add flatMapSections engine that walks a RawSection tree and generates
metadata for every section independently via a user-supplied map function.
All map calls run in parallel, bounded by an optional concurrency limit.
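
For reviewers who want a concrete picture of the flat-map strategy described in this commit, here is a minimal sketch. It is not the code in this PR: the RawSection fields, the names flatMapSectionsSketch and mapWithLimit, and the positional concurrency argument are all assumptions made for illustration.

```typescript
// Hypothetical shape: the PR does not show the real RawSection fields.
interface RawSection {
  title: string;
  level: number;
  content: string;
  children: RawSection[];
}

interface SectionResult<M> {
  section: RawSection;
  metadata: M;
}

// Run `worker` over `items` with at most `limit` calls in flight at once.
async function mapWithLimit<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each runner pulls the next unclaimed index until none are left.
  async function run(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i]);
    }
  }
  const runners = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runners }, () => run()));
  return results;
}

// Collect every section in document (pre-order) order.
function flatten(sections: readonly RawSection[]): RawSection[] {
  return sections.flatMap((s) => [s, ...flatten(s.children)]);
}

// Map every section independently; parents and children are treated alike.
async function flatMapSectionsSketch<M>(
  roots: readonly RawSection[],
  map: (section: RawSection) => Promise<M>,
  concurrency: number = Infinity, // assumption: unbounded when omitted
): Promise<SectionResult<M>[]> {
  const all = flatten(roots);
  const metadata = await mapWithLimit(all, concurrency, map);
  return all.map((section, i) => ({ section, metadata: metadata[i] }));
}
```

Because no section depends on another section's result, a simple pool of runners is enough here; results come back in document order regardless of which map call finishes first.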
Bottom-up map/reduce over section trees: leaves get mapped, parents
reduce over child metadata. Large leaves are chunked on paragraph
boundaries before mapping. Concurrency-bounded semaphore prevents
deadlocks at concurrency=1.
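
The bottom-up strategy and the deadlock concern in this commit can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: the RawSection fields, the Semaphore class, chunkByParagraph, the maxChars option, and the choice to combine chunk results with the same reduce function are all hypothetical. The point it tries to show is that a permit is held only while a map or reduce call runs, never while a parent waits for its children, so a single permit is enough to make progress.

```typescript
// Hypothetical shape: the PR does not show the real RawSection fields.
interface RawSection {
  title: string;
  level: number;
  content: string;
  children: RawSection[];
}

class Semaphore {
  private waiters: Array<() => void> = [];
  constructor(private permits: number) {}

  private async acquire(): Promise<void> {
    if (this.permits > 0) {
      this.permits--;
      return;
    }
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  private release(): void {
    const next = this.waiters.shift();
    if (next) next(); // hand the permit straight to the next waiter
    else this.permits++;
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }
}

// Greedily pack paragraphs into chunks of at most maxChars characters.
// A single paragraph longer than maxChars is kept whole.
function chunkByParagraph(text: string, maxChars: number): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const para of text.split(/\n\s*\n/)) {
    if (current !== "" && current.length + 2 + para.length > maxChars) {
      chunks.push(current);
      current = para;
    } else {
      current = current === "" ? para : `${current}\n\n${para}`;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}

async function mapReduceSectionsSketch<M>(
  roots: readonly RawSection[],
  map: (text: string, section: RawSection) => Promise<M>,
  reduce: (section: RawSection, parts: M[]) => Promise<M>,
  opts: { concurrency: number; maxChars: number },
): Promise<M[]> {
  const sem = new Semaphore(Math.max(1, opts.concurrency));

  async function visit(section: RawSection): Promise<M> {
    if (section.children.length > 0) {
      // Wait for children WITHOUT holding a permit, then reduce.
      const parts = await Promise.all(section.children.map(visit));
      return sem.run(() => reduce(section, parts));
    }
    const chunks = chunkByParagraph(section.content, opts.maxChars);
    if (chunks.length <= 1) return sem.run(() => map(section.content, section));
    // Assumption: chunk results of a large leaf are combined with `reduce`.
    const parts = await Promise.all(chunks.map((c) => sem.run(() => map(c, section))));
    return sem.run(() => reduce(section, parts));
  }

  return Promise.all(roots.map(visit));
}
```

If visit instead wrapped its whole body in sem.run, a parent at concurrency=1 would hold the only permit while waiting on children that can never acquire one.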
Parse markdown into a RawSection[] tree using mdast-util-from-markdown.
Handles preamble content, nested headings, level skips, and correctly
ignores headings inside fenced code blocks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
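
The tree-building behaviour described in the parser commit above (preamble, nesting, level skips, headings inside fences) can be illustrated with a small stand-in. The PR uses mdast-util-from-markdown; to stay self-contained, this sketch replaces it with a simplified line scanner that only recognises ATX headings and fenced code blocks. The RawSection fields, the level-0 untitled preamble section, and the name parseSectionsSketch are assumptions, not the PR's API.

```typescript
// Hypothetical shape: the PR does not show the real RawSection fields.
interface RawSection {
  title: string;
  level: number; // 1-6 for headings; 0 is used here for the preamble (assumption)
  content: string;
  children: RawSection[];
}

function parseSectionsSketch(markdown: string): RawSection[] {
  const roots: RawSection[] = [];
  const stack: RawSection[] = []; // open ancestors, strictly increasing level
  let preamble: RawSection | null = null;
  let fence: string | null = null; // marker that opened the current code block

  for (const line of markdown.split("\n")) {
    // Track fenced code blocks so "# ..." lines inside them stay plain content.
    const fenceMatch = /^\s{0,3}(`{3,}|~{3,})/.exec(line);
    if (fenceMatch) {
      const marker = fenceMatch[1];
      if (fence === null) fence = marker;
      else if (marker[0] === fence[0] && marker.length >= fence.length) fence = null;
    }

    const heading =
      fence === null && !fenceMatch ? /^(#{1,6})\s+(.+?)\s*$/.exec(line) : null;

    if (heading) {
      const section: RawSection = {
        title: heading[2],
        level: heading[1].length,
        content: "",
        children: [],
      };
      // Close every open section at the same or a deeper level. A level skip
      // (h1 followed directly by h3) simply nests under the nearest shallower one.
      while (stack.length > 0 && stack[stack.length - 1].level >= section.level) {
        stack.pop();
      }
      const parent: RawSection | undefined = stack[stack.length - 1];
      (parent ? parent.children : roots).push(section);
      stack.push(section);
      continue;
    }

    // Body line: attach to the innermost open section, or to the preamble.
    let target: RawSection | undefined = stack[stack.length - 1];
    if (!target) {
      if (preamble === null) {
        if (line.trim() === "") continue; // skip leading blank lines
        preamble = { title: "", level: 0, content: "", children: [] };
        roots.push(preamble);
      }
      target = preamble;
    }
    target.content += line + "\n";
  }
  return roots;
}
```

A real markdown parser also handles setext headings, indented code, and headings nested in block quotes or lists, which is a reason to build on mdast rather than on line matching as this sketch does.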

3 participants