Da 1153 autovec unstructured data #83

giriraj-singh-couchbase · 2025-12-10T19:22:38Z

This pull request updates the tutorial and notebook for auto-vectorization of unstructured data in S3 buckets using Couchbase Capella AI Services. The changes modernize the workflow to use the latest Capella features and LangChain Couchbase integration, clarify instructions, and update code to reflect best practices and current APIs.

Documentation and Workflow Updates:

Removed the redundant __frontmatter__.md file, consolidating documentation into the notebook.
Updated notebook section headings and instructions for clarity, including deployment steps, configuration, and model selection. [1] [2] [3] [4]

Code Modernization and API Updates:

Migrated vector search code from CouchbaseSearchVectorStore to CouchbaseQueryVectorStore with DistanceStrategy.COSINE, reflecting the move to Hyperscale Vector Search indexes and best practices for similarity search. [1] [2]
Updated installation instructions to require langchain-couchbase==1.0.1 and clarified minimum version requirements.
Improved credential and endpoint naming for Capella, and updated example code for connecting to clusters and performing similarity search. [1] [2] [3]
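`DistanceStrategy.COSINE` ranks stored embeddings by the angle between each one and the query vector, so vector magnitude does not matter. The following is a minimal, self-contained sketch of cosine-distance ranking in pure Python. It does not use langchain-couchbase or a Hyperscale Vector Search index. The helper names (`cosine_similarity`, `cosine_distance`, `top_k`), the document ids, and the vectors are all hypothetical. The sketch also assumes distance = 1 − cosine similarity.

```python
import math

# Toy illustration only: no langchain-couchbase, no Couchbase cluster.
# Distance here is assumed to be 1 - cosine similarity (smaller = more similar).

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b (independent of magnitude)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (norm_a * norm_b)


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Hypothetical distance helper: 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


def top_k(query: list[float], vectors: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Return the k ids whose vectors are closest to query, nearest first."""
    scored = [(doc_id, cosine_distance(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Made-up 2-D "embeddings"; real embedding models typically emit hundreds of dimensions.
vectors = {
    "doc-a": [1.0, 0.0],
    "doc-b": [0.7, 0.7],
    "doc-c": [0.0, 1.0],
}

ranked = top_k([1.0, 0.1], vectors, k=2)
for doc_id, distance in ranked:
    print(f"{doc_id}: distance={distance:.3f}")
```

Cosine ignores magnitude, so `[2.0, 0.0]` and `[1.0, 0.0]` have a distance of 0. This property is one reason cosine is commonly used with text embeddings.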

Semantic Search and Results Presentation:

Changed similarity search logic to use similarity_search instead of similarity_search_with_score, and updated result formatting for clarity and relevance. [1] [2]
Updated the explanation and interpretation of results to match the new workflow and APIs.
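Switching from `similarity_search_with_score` to `similarity_search` changes what the results loop receives. The first returns (document, score) pairs; the second returns documents only. The toy below mimics those two return shapes using hypothetical in-memory helpers (`search_with_score`, `search`, `format_results`) and a stand-in `Document` dataclass that mirrors LangChain's `page_content`/`metadata` fields. It does not call langchain-couchbase. This PR also does not say whether the real store reports a distance or a similarity score, so the sketch simply assumes a cosine distance.

```python
import math
from dataclasses import dataclass, field


# Hypothetical stand-in mirroring the page_content/metadata fields of a
# LangChain Document; this sketch imports neither langchain nor Couchbase.
@dataclass
class Document:
    page_content: str
    metadata: dict = field(default_factory=dict)


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; smaller means more similar (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


# Toy corpus of (document, precomputed embedding); all values are made up.
CORPUS = [
    (Document("Invoice for March", {"source": "invoice-march.pdf"}), [0.9, 0.1]),
    (Document("Team offsite agenda", {"source": "agenda.docx"}), [0.1, 0.9]),
    (Document("Invoice for April", {"source": "invoice-april.pdf"}), [0.8, 0.3]),
]


def search_with_score(query_vec: list[float], k: int = 2) -> list[tuple[Document, float]]:
    """Return (document, distance) pairs, nearest first."""
    scored = [(doc, cosine_distance(query_vec, vec)) for doc, vec in CORPUS]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


def search(query_vec: list[float], k: int = 2) -> list[Document]:
    """Return documents only, dropping the scores."""
    return [doc for doc, _ in search_with_score(query_vec, k)]


def format_results(docs: list[Document]) -> str:
    """Render one numbered line per hit, with its source metadata."""
    return "\n".join(
        f"{i}. {doc.page_content} ({doc.metadata.get('source', 'unknown')})"
        for i, doc in enumerate(docs, start=1)
    )


print(format_results(search([1.0, 0.2])))
```

Because the documents-only shape carries no score, the formatted output can only show rank order and metadata. It cannot show a relevance number.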

These updates ensure the tutorial is aligned with the latest Couchbase Capella AI Services and LangChain integration, making it easier for users to follow and implement auto-vectorization and semantic search workflows.

github-actions · 2025-12-10T19:22:52Z

Caution

Notebooks or Frontmatter Files Have Been Modified

Please ensure that a frontmatter.md file accompanies the notebook file, and that the frontmatter is up to date.
These changes will be published to the developer portal tutorials only if frontmatter.md is included.
Proofread all changes before merging, as changes to notebook and frontmatter content will update the developer tutorial.

43 Notebook Files Modified:

Notebook File	Frontmatter Included?
`autovec_unstructured/autovec_unstructured.ipynb`	✅
`awsbedrock-agents/lambda-approach/Bedrock_Agents_Lambda.ipynb`	✅
`awsbedrock/RAG_with_Couchbase_and_Bedrock.ipynb`	❌
`awsbedrock/gsi/RAG_with_Couchbase_and_Bedrock.ipynb`	✅
`azure/RAG_with_Couchbase_and_AzureOpenAI.ipynb`	❌
`azure/fts/RAG_with_Couchbase_and_AzureOpenAI.ipynb`	✅
`azure/gsi/RAG_with_Couchbase_and_AzureOpenAI.ipynb`	✅
`capella-ai/haystack/RAG_with_Couchbase_Capella.ipynb`	❌
`capella-ai/langchain/RAG_with_Couchbase_Capella.ipynb`	❌
`capella-ai/llamaindex/RAG_with_Couchbase_Capella.ipynb`	❌
`capella-model-services/langchain/search_based/RAG_with_Capella_Model_Services_and_LangChain.ipynb`	✅
`claudeai/RAG_with_Couchbase_and_Claude(by_Anthropic).ipynb`	❌
`claudeai/fts/RAG_with_Couchbase_and_Claude(by_Anthropic).ipynb`	✅
`claudeai/gsi/RAG_with_Couchbase_and_Claude(by_Anthropic).ipynb`	✅
`cohere/RAG_with_Couchbase_and_Cohere.ipynb`	❌
`cohere/fts/RAG_with_Couchbase_and_Cohere.ipynb`	✅
`cohere/gsi/RAG_with_Couchbase_and_Cohere.ipynb`	✅
`crewai-short-term-memory/CouchbaseStorage_Demo.ipynb`	❌
`crewai-short-term-memory/fts/CouchbaseStorage_Demo.ipynb`	✅
`crewai-short-term-memory/gsi/CouchbaseStorage_Demo.ipynb`	✅
`crewai/RAG_with_Couchbase_and_CrewAI.ipynb`	❌
`crewai/fts/RAG_with_Couchbase_and_CrewAI.ipynb`	✅
`crewai/gsi/RAG_with_Couchbase_and_CrewAI.ipynb`	✅
`haystack/query_based/RAG_with_Couchbase_Capella_and_OpenAI.ipynb`	✅
`haystack/search_based/RAG_with_Couchbase_Capella_and_OpenAI.ipynb`	✅
`huggingface/gsi/hugging_face.ipynb`	✅
`huggingface/hugging_face.ipynb`	❌
`jinaai/RAG_with_Couchbase_and_Jina_AI.ipynb`	❌
`jinaai/query_based/RAG_with_Couchbase_and_Jina_AI.ipynb`	✅
`jinaai/search_based/RAG_with_Couchbase_and_Jina_AI.ipynb`	✅
`llamaindex/fts/RAG_with_Couchbase_Capella_and_OpenAI.ipynb`	✅
`llamaindex/gsi/RAG_with_Couchbase_Capella_and_OpenAI.ipynb`	✅
`mistralai/gsi/mistralai.ipynb`	✅
`mistralai/mistralai.ipynb`	❌
`openrouter-deepseek/RAG_with_Couchbase_and_Openrouter_Deepseek.ipynb`	❌
`openrouter-deepseek/gsi/RAG_with_Couchbase_and_Openrouter_Deepseek.ipynb`	✅
`pydantic_ai/RAG_with_Couchbase_and_PydanticAI.ipynb`	❌
`pydantic_ai/fts/RAG_with_Couchbase_and_PydanticAI.ipynb`	✅
`pydantic_ai/gsi/RAG_with_Couchbase_and_PydanticAI.ipynb`	✅
`smolagents/RAG_with_Couchbase_and_SmolAgents.ipynb`	❌
`smolagents/fts/RAG_with_Couchbase_and_SmolAgents.ipynb`	✅
`smolagents/gsi/RAG_with_Couchbase_and_SmolAgents.ipynb`	✅
`voyage/RAG_with_Couchbase_and_Voyage.ipynb`	✅

31 Frontmatter Files Modified:

Frontmatter File
`ag2/frontmatter.md`
`autovec_unstructured/frontmatter.md`
`awsbedrock/frontmatter.md`
`awsbedrock/gsi/frontmatter.md`
`azure/frontmatter.md`
`azure/gsi/frontmatter.md`
`capella-model-services/langchain/search_based/frontmatter.md`
`claudeai/frontmatter.md`
`claudeai/gsi/frontmatter.md`
`cohere/frontmatter.md`
`cohere/fts/frontmatter.md`
`crewai-short-term-memory/frontmatter.md`
`crewai-short-term-memory/gsi/frontmatter.md`
`crewai/frontmatter.md`
`crewai/gsi/frontmatter.md`
`haystack/query_based/frontmatter.md`
`haystack/search_based/frontmatter.md`
`huggingface/fts/frontmatter.md`
`jinaai/frontmatter.md`
`jinaai/query_based/frontmatter.md`
`llamaindex/fts/frontmatter.md`
`llamaindex/gsi/frontmatter.md`
`mistralai/frontmatter.md`
`mistralai/gsi/frontmatter.md`
`openrouter-deepseek/frontmatter.md`
`openrouter-deepseek/gsi/frontmatter.md`
`pydantic_ai/frontmatter.md`
`pydantic_ai/gsi/frontmatter.md`
`smolagents/frontmatter.md`
`smolagents/gsi/frontmatter.md`
`voyage/frontmatter.md`
Note: frontmatter will be checked and tested in the `Test Frontmatter` workflow.

gemini-code-assist · 2025-12-10T19:22:59Z

Summary of Changes

Hello @giriraj-singh-couchbase, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request modernizes the tutorial and notebook for auto-vectorization of unstructured data in S3 buckets using Couchbase Capella AI Services. The changes ensure the tutorial aligns with the latest Capella features and LangChain Couchbase integration, providing users with a clearer and more up-to-date guide for implementing auto-vectorization and semantic search workflows.

Highlights

Documentation Modernization: The tutorial's structure and clarity have been significantly improved, including updated section headings and more precise instructions for Capella configuration and model deployment. The __frontmatter__.md file was effectively replaced by frontmatter.md with an updated path, and some initial tutorial text was moved from the notebook.
API and Integration Update: The core vector search implementation was migrated from CouchbaseSearchVectorStore to the newer CouchbaseQueryVectorStore, leveraging Hyperscale Vector Search indexes and DistanceStrategy.COSINE for improved similarity search.
Dependency and Credential Updates: The langchain-couchbase dependency was pinned to version 1.0.1, and credential/endpoint variable names were standardized for better clarity and alignment with Capella AI Services.
Semantic Search Refinement: The similarity search logic was updated to use similarity_search instead of similarity_search_with_score, and the output formatting for search results was enhanced for better readability.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

gemini-code-assist

Code Review

This pull request effectively updates the auto-vectorization tutorial and notebook to reflect the latest Couchbase Capella AI Services and LangChain integration. The changes include migrating from CouchbaseSearchVectorStore to CouchbaseQueryVectorStore, updating dependency versions, and clarifying instructions and code examples. The refactoring of the frontmatter documentation is also a positive improvement, enhancing the overall clarity and accuracy of the tutorial.

autovec_unstructured/frontmatter.md

giriraj-singh-couchbase added 2 commits December 11, 2025 00:49

updated tutorial with latest langchain updates

b624f63

updated vector index terminology

c487a48

gemini-code-assist bot reviewed Dec 10, 2025

autovec_unstructured/frontmatter.md (outdated, resolved)

giriraj-singh-couchbase and others added 6 commits December 11, 2025 00:56

updated frontmatter

1c27d14

Update autovec_unstructured/frontmatter.md

4c96755

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

updated frontmatter

6c1ddd5

updated frontmatter

cb73abd

updated frontmatter

d1e1346

Added vector search term at some places

e42a747
