Reduce duplication & accuracy for RAG index collection pages #537

Merged: SmittieC merged 9 commits into main from rag-index-collections on Jul 3, 2026

lisa-tarbo (Collaborator) commented Jul 2, 2026

Background to changes

While adding search/retrieval content to the RAG developer documentation (dimagi/open-chat-studio#1716), I noticed inaccuracies and duplication in the user guides on this topic.

Improvements made

  1. versioning.md contained information about collection files for published chatbots. This is now merged into the Collections concept page, which removes the duplication.
  2. Gemini was missing from the list of LLM providers that offer local indexed collections.
  3. Clarified early in the page that document sources support only two types (GitHub and Confluence), since this affects users' decisions.

Also improved formatting, grammar, and readability.

Comment thread docs/concepts/collections/index.md Outdated
Comment thread docs/concepts/collections/indexed.md Outdated
Comment thread docs/how-to/document_sources.md Outdated
claude (Bot, Contributor) commented Jul 2, 2026

Docs review

Nice consolidation — the versioning.md → collections/index.md merge cleanly removes the duplication, the drift-detection note and existing-published-bots warning were preserved verbatim, and the Gemini addition is consistent with the rest of the docs. I verified all changed cross-references resolve (indexed.md#document-sources-for-indexed-collections, index.md#collections-and-published-chatbots, document_sources.md#monitoring-sync-status) and no lingering references to the removed anchors (#document-sources, #what-is-frozen-and-what-is-live) remain anywhere in docs/.
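
For anyone who wants to repeat this kind of cross-reference check, here is a minimal sketch of one way to do it. It is not the tooling used for this review. It assumes anchors are derived from headings by lowercasing, dropping punctuation, and hyphenating spaces, which only approximates what a real site generator does. The `slugify`, `anchors_in`, and `missing_links` helpers and the sample heading texts are hypothetical and exist only for illustration.

```python
import re

# Matches ATX headings such as "## Monitoring sync status".
HEADING_RE = re.compile(r"^#{1,6}\s+(.*?)\s*#*\s*$")


def slugify(heading: str) -> str:
    """Approximate heading-to-anchor conversion (an assumption, not the site's exact rule)."""
    text = heading.strip().lower()
    text = re.sub(r"[^\w\s-]", "", text)  # drop punctuation
    return re.sub(r"[-\s]+", "-", text).strip("-")  # hyphenate whitespace


def anchors_in(markdown_text: str) -> set[str]:
    """Collect the anchor slugs for every heading line in one Markdown document."""
    anchors = set()
    for line in markdown_text.splitlines():
        match = HEADING_RE.match(line)
        if match:
            anchors.add(slugify(match.group(1)))
    return anchors


def missing_links(docs: dict[str, str], links: list[str]) -> list[str]:
    """Return the 'file.md#anchor' links whose file or anchor cannot be found."""
    missing = []
    for link in links:
        path, _, anchor = link.partition("#")
        if path not in docs or (anchor and anchor not in anchors_in(docs[path])):
            missing.append(link)
    return missing


# Hypothetical page contents; the heading texts are inferred from the anchor names.
docs = {
    "indexed.md": "# Indexed collections\n\n## Document sources for indexed collections\n",
    "index.md": "# Collections\n\n## Collections and published chatbots\n",
}
links = [
    "indexed.md#document-sources-for-indexed-collections",
    "index.md#collections-and-published-chatbots",
    "index.md#what-is-frozen-and-what-is-live",  # removed anchor, should be reported
]
print(missing_links(docs, links))
```

With the sample data, only the link to the removed anchor is reported. The sketch does not skip headings inside fenced code blocks or handle custom anchor IDs, so treat its results as a first pass.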

A few items, left as inline comments:

  • ⚠️ collections/index.md (lines 5–8) — the new - markers on the definition-list terms will likely break def_list rendering (every other def-list on the site uses a bare term line). Please preview locally before merging — this is the one blocking item.
  • collections/indexed.md (line 65) — double space in the providers list (nit).
  • how-to/document_sources.md (line 6) — "external platform" collides with the channel = "platform" terminology used elsewhere; suggest "external system" / "external document source" for consistency.

Minor / non-blocking: indexed.md frontmatter title (singular) no longer matches the H1 (plural, now "...(for RAG applications)") — good chance to align while the line's already being touched.

@lisa-tarbo lisa-tarbo marked this pull request as ready for review July 2, 2026 15:25
Comment thread docs/concepts/collections/index.md Outdated
claude (Bot, Contributor) commented Jul 2, 2026

Documentation review

The consolidation of the versioning/collections content is well done — no information is lost, both admonitions and the 2026-06-03 date carry over intact, the heading rename is reflected everywhere, and all the updated anchors (#document-sources-for-indexed-collections, #collections-and-published-chatbots, etc.) resolve. Adding Gemini and clarifying the two document-source types up front are good, user-facing improvements.

One thing to fix before merge, plus a couple of optional nits.

🔴 Fix before merge

  • docs/concepts/collections/index.md (lines 5–6): the two collection entries lost their definition-list markup and now render as a single run-together paragraph (no blank line between the lines). Left an inline suggestion to convert them to a bulleted list.

🟡 Suggestions (optional)

  • Provider ordering is inconsistent between the two companion pages: indexed.md says "Confluence … or GitHub" while document_sources.md lists "GitHub and Confluence". Picking one canonical order would read as more deliberate.
  • index.md line 16: the colon-into-bullets works, but the media-collection bullet's trigger (adding/removing files) isn't really "via a document-source sync" — only the second bullet is. Minor clarity nit.

👍 Nice

  • Promoting the two troubleshooting items in document_sources.md to real ### headings makes them TOC- and anchor-addressable.

No code examples to verify (docs-only). Nothing broke in the cross-references.

@lisa-tarbo lisa-tarbo requested a review from SmittieC July 2, 2026 15:33
@lisa-tarbo lisa-tarbo changed the title from "Rag index collections" to "Reduce duplication & accuracy for RAG index collection pages" Jul 2, 2026
@SmittieC SmittieC merged commit c7ea53f into main Jul 3, 2026
1 check passed
@SmittieC SmittieC deleted the rag-index-collections branch July 3, 2026 06:09