Autovectorization Tutorial #54

giriraj-singh-couchbase · 2025-09-15T22:39:20Z

This guide is a comprehensive tutorial demonstrating how to use Couchbase Capella's AI Services auto-vectorization feature to automatically convert your data into vector embeddings and perform semantic search using LangChain.

📋 Overview

The main tutorial is contained in the Jupyter notebook autovec_langchain.ipynb, which walks you through:

Couchbase Capella Setup - Creating account, cluster, and access controls
Data Upload & Processing - Using sample data
Model Deployment - Deploying embedding models for vectorization
Auto-Vectorization Workflow - Setting up automated embedding generation
LangChain Integration - Building semantic search applications with vector similarity

gemini-code-assist

Summary of Changes

Hello @giriraj-singh-couchbase, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers a new, detailed tutorial designed to guide users through the process of leveraging Couchbase Capella's AI Services for automatic data vectorization. The tutorial provides a complete walkthrough, from initial Capella account and cluster setup to deploying embedding models and configuring auto-vectorization workflows, culminating in practical examples of semantic search using LangChain. The aim is to empower users to easily transform their data into vector embeddings and build intelligent search applications.

Highlights

New Auto-Vectorization Tutorial: Introduces a comprehensive tutorial demonstrating the use of Couchbase Capella's AI Services auto-vectorization feature to convert data into vector embeddings.
LangChain Integration: The tutorial showcases how to perform semantic search using the generated vector embeddings by integrating with LangChain.
Step-by-Step Guide: The tutorial covers essential steps including Couchbase Capella setup, data upload and processing, embedding model deployment, auto-vectorization workflow configuration, and practical LangChain integration examples.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

github-actions · 2025-09-15T22:39:34Z

Caution

Notebooks or Frontmatter Files Have Been Modified

Please ensure that a frontmatter.md file is accompanying the notebook file, and that the frontmatter is up to date.
These changes will be published to the developer portal tutorials only if frontmatter.md is included.
Proofread all changes before merging, as changes to notebook and frontmatter content will update the developer tutorial.

1 Notebook Files Modified:

Notebook File	Frontmatter Included?
`autovec-structured/autovec_langchain.ipynb`	✅

1 Frontmatter Files Modified:

Frontmatter File
`autovec-structured/frontmatter.md`
Note: frontmatter will be checked and tested in the Test Frontmatter workflow.

gemini-code-assist

Code Review

This pull request introduces a comprehensive tutorial on using Couchbase Capella's AI Services for auto-vectorization with LangChain. The tutorial is well-structured, but there are several areas for improvement to enhance clarity, correctness, and security for the end-user. My review includes feedback on the README file and the Jupyter notebook, addressing issues such as placeholder values, dependency management, broken links, inconsistent formatting, typos, and a hardcoded credential. Addressing these points will make the tutorial more polished and easier for users to follow.

autovec-tutorial/autovec_langchain.ipynb

autovec-tutorial/README.md

autovec-tutorial/autovec_langchain.ipynb

nithishr

Can you apply the same comments as in #57 to this one as well?

nithishr

Same comments as in #57 are relevant here as well.

autovec-tutorial/autovec_langchain.ipynb

autovec-structured/autovec_langchain.ipynb

autovec-tutorial/__frontmatter__.md

autovec-structured/img/Access_control.png

nithishr · 2025-12-08T15:55:55Z

autovec-structured/autovec_langchain.ipynb

Looks okay. A few things to make it more user friendly.

Can you use the hierarchy of markdown (##, ###, ####) instead of numbers to organize the tutorial?

The model has to be deployed in the same region as the database for workflows to work.

Can you link to the model & workflows documentation in the relevant places?

IMO, the combination of address, description & id is not a great field to vectorize. Address & id have no use in vector search. Only the description is meaningful.

Can you show the document structure before & after the workflow is run?

Will the tutorial work? The embedding field is different from the default.

Yes, the tutorial works.

I will update with these comments. Thank you.

Is there specific documentation for model services? I was only able to find the AI Services tab, which says very little about the workflows.

For the document structure, I have added a line. Since only one field is added at the end of the document, I don't think the full image is needed.

For the sake of ease, I have used all source fields for the vector embeddings and updated the tutorial accordingly.

autovec-structured/__frontmatter__.md

autovec-structured/frontmatter.md

nithishr · 2025-12-09T17:51:11Z

There is also a test failure

TEST FAILURE
Entry: 
   Path: tutorial/markdown/generated/vector-search-cookbook/autovec-structured-autovec_langchain.md
   Title: Auto-Vectorization of Strucutured Data with Couchbase Capella AI Services
Invalid technology: 
    Artificial Intelligence 
Note: 
   Valid technologies: 
   [connectors,kv,query,capella,server,index,mobile,fts,sync gateway,eventing,analytics,udf,vector search,react,edge-server,app-services,hyperscale vector index,composite vector index]
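
The failure above comes from a frontmatter technology value that is not in the allowed list. A minimal sketch of that kind of check, assuming the technologies are available as a plain list of strings. The real Test Frontmatter workflow is not shown in this thread, so the helper name and the case-insensitive comparison are assumptions:

```python
# Allowed values copied from the test failure output above.
VALID_TECHNOLOGIES = {
    "connectors", "kv", "query", "capella", "server", "index", "mobile",
    "fts", "sync gateway", "eventing", "analytics", "udf", "vector search",
    "react", "edge-server", "app-services", "hyperscale vector index",
    "composite vector index",
}


def invalid_technologies(technologies):
    """Return the entries that are not in the allowed list, in input order.

    Hypothetical helper: the comparison ignores case and surrounding
    whitespace, which may differ from what the real workflow does.
    """
    return [
        tech for tech in technologies
        if tech.strip().lower() not in VALID_TECHNOLOGIES
    ]


if __name__ == "__main__":
    print(invalid_technologies(["Artificial Intelligence", "vector search", "capella"]))
```

Running a check like this locally before pushing would flag `Artificial Intelligence` and leave `vector search` and `capella` alone.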

nithishr · 2025-12-09T17:48:34Z

autovec-structured/autovec_langchain.ipynb

Linking to docs (https://docs.couchbase.com/ai/build/vectorization-service/data-processing.html) in the relevant sections of Workflows would be nice.

Please add this: "The model has to be deployed in the same region as the database for workflows to work." It is important, as users could otherwise end up having to kill & redeploy clusters because of a region mismatch. It is also in the docs.

> For the sake of ease, I have used all source fields for the vector embeddings and updated the tutorial accordingly.

Our aim should be to showcase simple but realistic use cases, not just to go for simplicity. In this dataset, it makes no sense to vectorize all fields into a single vector.
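
To make the point concrete, here is a minimal sketch of choosing the text that gets embedded. The sample document and the helper name `embedding_text` are hypothetical; only the field names (`id`, `address`, `description`) come from the discussion in this thread.

```python
def embedding_text(doc, fields):
    """Join the chosen fields of a document into one string for embedding.

    Missing or empty fields are skipped.
    """
    parts = [str(doc[f]) for f in fields if doc.get(f)]
    return " ".join(parts)


# Hypothetical sample document, invented for illustration.
hotel = {
    "id": 10025,
    "address": "1 Example Street",
    "description": "Quiet family-run hotel close to the beach.",
}

# All fields: the id and street address add noise to the semantic signal.
all_fields = embedding_text(hotel, ["address", "description", "id"])

# Description only: the text a semantic query is likely to match.
description_only = embedding_text(hotel, ["description"])

if __name__ == "__main__":
    print(all_fields)
    print(description_only)
```

A query such as "hotel near the beach" has nothing to match in the id or the street address, so those tokens only dilute the vector.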

> Since only one field is added at the end of the document, I don't think the full image is needed.

It is the only thing we are getting out of the vectorization flow, so in my opinion we should add it. It can be just a markdown rendering of a sample document.
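
A sample document could be shown along the lines of the sketch below. Everything here is illustrative: the document values are made up, the vector is truncated to four numbers, and `vector_embedding` is assumed to be the name of the added field because that is the name quoted from the tutorial text elsewhere in this review.

```python
import json

# Hypothetical document before the workflow runs.
before = {
    "name": "Example Hotel",
    "description": "Quiet family-run hotel close to the beach.",
}

# After the workflow: the same document plus one added field holding the
# embedding. Field name and values are assumptions; real vectors are much
# longer than four numbers.
after = {**before, "vector_embedding": [0.0132, -0.0458, 0.0271, 0.0096]}

# The only difference between the two documents is the added field.
added_fields = sorted(set(after) - set(before))

if __name__ == "__main__":
    print(json.dumps(before, indent=2))
    print(json.dumps(after, indent=2))
    print("added:", added_fields)
```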

There are some warnings due to Pydantic not being supported on Python 3.14. Can we get rid of this either by updating the langchain packages to 1.0.x or downgrading Python to 3.13 for the tutorial?

The package versions for langchain-couchbase are not right. Version 0.5.0 & higher are the only ones that support QueryVectorStore.
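
A quick way to confirm the installed version before running the notebook is sketched below. It uses only the standard library; the helper names are hypothetical, and the minimum version is taken from the comment above rather than verified independently. Pre-release suffixes such as `rc1` are ignored by this simple parser.

```python
from importlib import metadata

# First release said to support QueryVectorStore, per the review comment.
MIN_VERSION = (0, 5, 0)


def parse_version(version):
    """Turn '0.5.1' into (0, 5, 1), dropping any non-numeric suffix."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_query_vector_store(version):
    """True if the given version string is at least MIN_VERSION."""
    return parse_version(version) >= MIN_VERSION


if __name__ == "__main__":
    try:
        installed = metadata.version("langchain-couchbase")
    except metadata.PackageNotFoundError:
        print("langchain-couchbase is not installed")
    else:
        print(installed, supports_query_vector_store(installed))
```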

In the field mapping section, it is not mentioned which approach has been selected. It has to be inferred from the screenshots. You can either include a screenshot of just one approach or explicitly mention the selected approach.

> After choosing the type of mapping, it is required to either create an index on the new vector_embedding field or the creation of a vector index can be skipped, which is not recommended as the functionality of vector searching will be lost.

This statement is not correct. Vector search will work using brute force without an index.
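
Brute force here means comparing the query vector against every stored vector, with no index involved. The sketch below shows the idea in plain Python on toy 3-dimensional vectors. It is not how Couchbase implements it, and the documents, vectors, and function names are invented for illustration. It also assumes no vector is all zeros.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_search(query, docs, k=2):
    """Score every document against the query and return the top k ids."""
    scored = [
        (cosine_similarity(query, doc["vector_embedding"]), doc["id"])
        for doc in docs
    ]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Toy documents, invented for illustration.
docs = [
    {"id": "hotel_1", "vector_embedding": [1.0, 0.0, 0.0]},
    {"id": "hotel_2", "vector_embedding": [0.0, 1.0, 0.0]},
    {"id": "hotel_3", "vector_embedding": [0.9, 0.1, 0.0]},
]

if __name__ == "__main__":
    print(brute_force_search([1.0, 0.0, 0.0], docs))
```

The cost grows linearly with the number of documents, which is why the tutorial text could recommend an index for speed without claiming that search stops working without one.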

The comments in the Cluster setup code are a bit too far to the right. Can you reduce the spacing for the comments so that they are visible without scrolling?

The page content in the results does not make much sense. You should include other relevant fields in the vector search response like reviews, name, etc. This is supported by the integration.
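
One way to read this comment: each result should carry enough fields for a reader to judge it, not only the text that was embedded. A minimal sketch of formatting results that way, using plain dictionaries rather than the LangChain result objects. The hit structure, scores, and helper name are hypothetical; `name` and `reviews` are the fields named above.

```python
def format_hit(hit, fields=("name", "reviews")):
    """Render one search hit as a single line with its score and chosen fields."""
    shown = ", ".join(
        f"{field}={hit['doc'][field]!r}" for field in fields if field in hit["doc"]
    )
    return f"{hit['score']:.2f}  {shown}"


# Hypothetical hits, invented for illustration.
hits = [
    {"score": 0.91, "doc": {"name": "Example Hotel", "reviews": ["Clean rooms"]}},
    {"score": 0.84, "doc": {"name": "Sample Inn"}},
]

if __name__ == "__main__":
    for hit in hits:
        print(format_hit(hit))
```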

autovec-structured/frontmatter.md

deniswsrosa

Minor change at the beginning

autovec-structured/autovec_langchain.ipynb

deniswsrosa · 2025-12-11T10:25:44Z

autovec-structured/autovec_langchain.ipynb

+    "   <img src=\"./img/workflow.png\" width=\"1000px\" height=\"500px\" style=\"padding: 5px; border-radius: 10px 20px 30px 40px; border: 2px solid #555;\">\n",
+    "   \n",
+    "2. Start your workflow deployment by giving it a name and selecting where your data will be provided to the auto-vectorization service. There are currently 3 options: <B>`pre-processed data (JSON format) from Capella`</B>, <B>`pre-processed data (JSON format) from external sources (S3 buckets)`</B> and <B>`unstructured data from external sources (S3 buckets)`</B>. For this tutorial, we will choose the first option, which is pre-processed data from Capella.\n",
+    "\n",


Link to a docs page (if there is one) that talks about these items.

The link is already there at the very start.

Updating AutoVec tutorial

807c392

giriraj-singh-couchbase requested a review from deniswsrosa September 15, 2025 22:39

gemini-code-assist bot reviewed Sep 15, 2025

giriraj-singh-couchbase and others added 13 commits September 16, 2025 04:34

Added frontmatter.md and updated the tutorial

68e04f3

Applied suggestions from code review

9bb439b

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Fixed frontmatter.md

dfca154

fixing minor details

e52bcd5

Merge branch 'DA-1096_autovec_tutorial' of https://github.com/couchba…

604bf29

…se-examples/vector-search-cookbook into DA-1096_autovec_tutorial

Fixed frontmatter.md

fc878a4

Updated autovec_langchain.ipynb

38d1792

fixed some screenshots

1e9bb07

updated screenshot

b85f7a4

fixed grammatical mistakes

69fe694

fixed capella free-tier issue and added some missing content

9bcf7b9

added missing content

0d870b1

Updatede doc

709cfe5

nithishr reviewed Oct 2, 2025

teetangh assigned giriraj-singh-couchbase Oct 28, 2025

giriraj-singh-couchbase added 3 commits October 30, 2025 21:40

added version of libraries, removed unnecessary files

55242d5

updated model service name

ce09c83

removed extra code

e48691b

nithishr reviewed Nov 4, 2025

giriraj-singh-couchbase added 5 commits November 13, 2025 15:15

updated screenshots

97392d9

Updated tutorial to use couchbase hyperscale vector index

8cb8715

fixed frontmatter path

7274d17

renamed folder

44ed62b

Removed title as it will be used from the frontmatter

90042d1

giriraj-singh-couchbase requested a review from nithishr December 4, 2025 21:33

nithishr requested changes Dec 8, 2025

giriraj-singh-couchbase added 2 commits December 8, 2025 23:27

updated tutorial - using all source as vector embeddings - updated fr…

404a0f5

…ontmatter - used ## & # for headings and subheadings

fixed frontmatter for the broken unstructured tuorial link

67e2199

giriraj-singh-couchbase requested a review from nithishr December 8, 2025 18:01

giriraj-singh-couchbase added 2 commits December 9, 2025 13:49

updated the result and added embedding_key against which search needs…

a4f3fd0

… to be done

updated frontmatter

2012327

nithishr requested changes Dec 9, 2025

giriraj-singh-couchbase added 5 commits December 10, 2025 15:54

updated screenshorts for custom mapping and added detailed output for…

30fe081

… queries

added vector_field image

729bacf

updated search term

0733cba

updated frontmatter

3626387

updated frontmatter

6f5ee90

deniswsrosa requested changes Dec 11, 2025

updated text for importing sample data

9f348a9
