32 commits
807c392
Updating AutoVec tutorial
giriraj-singh-couchbase Sep 15, 2025
68e04f3
Added frontmatter.md and updated the tutorial
giriraj-singh-couchbase Sep 15, 2025
9bb439b
Applied suggestions from code review
giriraj-singh-couchbase Sep 15, 2025
dfca154
Fixed frontmatter.md
giriraj-singh-couchbase Sep 15, 2025
e52bcd5
fixing minor details
giriraj-singh-couchbase Sep 15, 2025
604bf29
Merge branch 'DA-1096_autovec_tutorial' of https://github.com/couchba…
giriraj-singh-couchbase Sep 15, 2025
fc878a4
Fixed frontmatter.md
giriraj-singh-couchbase Sep 15, 2025
38d1792
Updated autovec_langchain.ipynb
giriraj-singh-couchbase Sep 18, 2025
1e9bb07
fixed some screenshots
giriraj-singh-couchbase Sep 18, 2025
b85f7a4
updated screenshot
giriraj-singh-couchbase Sep 23, 2025
69fe694
fixed grammatical mistakes
giriraj-singh-couchbase Sep 24, 2025
9bcf7b9
fixed capella free-tier issue and added some missing content
giriraj-singh-couchbase Sep 25, 2025
0d870b1
added missing content
giriraj-singh-couchbase Sep 25, 2025
709cfe5
Updated doc
giriraj-singh-couchbase Sep 26, 2025
55242d5
added version of libraries, removed unnecessary files
giriraj-singh-couchbase Oct 30, 2025
ce09c83
updated model service name
giriraj-singh-couchbase Nov 3, 2025
e48691b
removed extra code
giriraj-singh-couchbase Nov 3, 2025
97392d9
updated screenshots
giriraj-singh-couchbase Nov 13, 2025
8cb8715
Updated tutorial to use couchbase hyperscale vector index
giriraj-singh-couchbase Dec 4, 2025
7274d17
fixed frontmatter path
giriraj-singh-couchbase Dec 4, 2025
44ed62b
renamed folder
giriraj-singh-couchbase Dec 4, 2025
90042d1
Removed title as it will be used from the frontmatter
giriraj-singh-couchbase Dec 4, 2025
404a0f5
updated tutorial - using all source as vector embeddings - updated fr…
giriraj-singh-couchbase Dec 8, 2025
67e2199
fixed frontmatter for the broken unstructured tutorial link
giriraj-singh-couchbase Dec 8, 2025
a4f3fd0
updated the result and added embedding_key against which search needs…
giriraj-singh-couchbase Dec 9, 2025
2012327
updated frontmatter
giriraj-singh-couchbase Dec 9, 2025
30fe081
updated screenshots for custom mapping and added detailed output for…
giriraj-singh-couchbase Dec 10, 2025
729bacf
added vector_field image
giriraj-singh-couchbase Dec 10, 2025
0733cba
updated search term
giriraj-singh-couchbase Dec 10, 2025
3626387
updated frontmatter
giriraj-singh-couchbase Dec 10, 2025
6f5ee90
updated frontmatter
giriraj-singh-couchbase Dec 10, 2025
9f348a9
updated text for importing sample data
giriraj-singh-couchbase Dec 11, 2025
434 changes: 434 additions & 0 deletions autovec-structured/autovec_langchain.ipynb
Contributor

Looks okay. A few things to make it more user friendly.

  • Can you use the hierarchy of markdown (##, ###, ####) instead of numbers to organize the tutorial?
  • The model has to be deployed in the same region as the database for workflows to work.
  • Can you link to the model & workflows documentation in the relevant places?
  • Imo, the combination of address, description & id is not a great field to vectorize. Address & id have no use in vector search. Only the description is meaningful.
  • Can you show the document structure before & after the workflow is run?
  • Will the tutorial work? The embedding field is different from the default.
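
To make the before/after request concrete, here is a minimal sketch of how a document might change once a vectorization workflow has run. The hotel fields, the `vector_embedding` field name, and the `fake_embed` stand-in are assumptions for illustration only; the real field name and vector come from the workflow configuration and the deployed model.

```python
from copy import deepcopy
from typing import Callable, Dict, List, Sequence


def fake_embed(text: str, dims: int = 4) -> List[float]:
    """Hypothetical stand-in for the deployed embedding model."""
    # Toy deterministic vector built from character codes. A real model
    # returns a much longer vector with semantic meaning.
    vec = [0.0] * dims
    for i, ch in enumerate(text):
        vec[i % dims] += ord(ch)
    total = sum(vec) or 1.0
    return [round(v / total, 4) for v in vec]


def simulate_workflow(
    doc: Dict[str, object],
    source_fields: Sequence[str],
    embed: Callable[[str], List[float]] = fake_embed,
    target_field: str = "vector_embedding",  # assumed name, configurable
) -> Dict[str, object]:
    """Return a copy of `doc` with one extra field holding the embedding."""
    after = deepcopy(doc)
    text = " ".join(str(doc[f]) for f in source_fields if f in doc)
    after[target_field] = embed(text)
    return after


# Made-up sample document, shaped loosely like a hotel record.
before = {
    "id": 1,
    "name": "Example Hotel",
    "address": "1 Example Street",
    "description": "Quiet rooms close to the beach",
}

# Only the description is used as the source text for the embedding.
after = simulate_workflow(before, source_fields=["description"])
```

The only difference between `before` and `after` is the added vector field; every original field is left unchanged.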

Contributor Author

Yes, the tutorial works.

Contributor Author

I will update with these comments. Thank you.

Contributor Author

Is there specific documentation for model services? I was only able to find the AI-services tab, where there is very little about the workflows.

Contributor Author

For the document structure, I have added a line. Since only one field is added at the end of the document, I don't think the full image is needed.

Contributor Author

For the sake of ease, I have used all source fields for the vector embeddings and updated accordingly.

Contributor

  • Linking to docs (https://docs.couchbase.com/ai/build/vectorization-service/data-processing.html) in the relevant sections of Workflows would be nice.
  • Please add this: "The model has to be deployed in the same region as the database for workflows to work." It is important, as users could otherwise end up having to kill & redeploy clusters due to the region issue. It is also in the docs.
  • "For the sake of ease, I have used all source fields for the vector embeddings and updated accordingly."
    Our aim should be to showcase simple, realistic use cases, not to go for simplicity alone. In this dataset, it makes no sense to vectorize all fields into a single vector.

  • "Since only one field is added at the end of the document, I don't think the full image is needed."
    It is the only thing we are getting out of the vectorization flow, so in my opinion we should add it. It can be just a markdown snippet of a sample document.

  • There are some warnings due to Pydantic not being supported on Python 3.14. Can we get rid of this either by updating the langchain packages to 1.0.x or downgrading Python to 3.13 for the tutorial?
  • The package versions for langchain-couchbase are not right. Version 0.5.0 & higher are the only ones that support QueryVectorStore.
  • In the field mapping section, it is not mentioned which approach has been selected; it has to be inferred from the screenshots. You can either include a screenshot of just one approach or explicitly mention the selected approach.
  • "After choosing the type of mapping, it is required to either create an index on the new vector_embedding field or the creation of a vector index can be skipped, which is not recommended as the functionality of vector searching will be lost."
    This statement is not correct. Vector search will work using brute force without an index.

  • The comments in the cluster setup code are a bit too far off to the right. Can you reduce the spacing for the comments so that they are visible without scrolling?
  • The page content in the results does not make much sense. You should include other relevant fields in the vector search response like reviews, name, etc. This is supported by the integration.
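
The point that vector search still works without an index can be illustrated with a brute-force scan: every stored vector is compared with the query vector and the results are ranked by similarity. This is a plain-Python sketch of the idea, not a description of how Couchbase executes the query; the documents, the three-dimensional vectors, and the `vector_embedding` field name are made up for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def brute_force_search(
    query_vec: Sequence[float],
    docs: List[Dict[str, object]],
    k: int = 2,
    vector_field: str = "vector_embedding",  # assumed field name
) -> List[Tuple[float, Dict[str, object]]]:
    """Score every document against the query and return the top k."""
    scored = [
        (cosine_similarity(query_vec, d[vector_field]), d)
        for d in docs
        if vector_field in d
    ]
    # Sort on the score only, highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical documents with toy embeddings.
docs = [
    {"name": "Hotel A", "description": "Quiet seaside rooms",
     "vector_embedding": [0.9, 0.1, 0.0]},
    {"name": "Hotel B", "description": "City centre business hotel",
     "vector_embedding": [0.1, 0.9, 0.0]},
    {"name": "Hotel C", "description": "Beachfront family resort",
     "vector_embedding": [0.8, 0.2, 0.1]},
]

results = brute_force_search([1.0, 0.0, 0.0], docs, k=2)
top_names = [doc["name"] for _, doc in results]
```

Because each result carries the whole document, fields such as `name` are available alongside the matched text. The cost of this scan grows linearly with the number of documents, which is the usual reason to add an index.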

Large diffs are not rendered by default.

21 changes: 21 additions & 0 deletions autovec-structured/frontmatter.md
@@ -0,0 +1,21 @@
---
# frontmatter
path: "/tutorial-couchbase-capella-autovectorization-workflows-with-structured-data-and-langchain"
title: Auto-Vectorization of Structured Data with Couchbase Capella AI Services
short_title: Auto-Vectorization with Couchbase and Semantic Search using LangChain
description:
- Learn how to use Couchbase Capella's AI Services auto-vectorization feature to automatically convert your structured data into vector embeddings.
- To learn about the auto-vectorization of unstructured data, read the following [tutorial](tutorial-couchbase-autovectorization-workdlows-with-unstructured-data-and-langchain).
- This tutorial demonstrates how to set up automated embedding generation workflows and perform semantic search using LangChain.
content_type: tutorial
filter: sdk
technology:
- vector search
tags:
- Hyperscale Vector Index
- Artificial Intelligence
- LangChain
sdk_language:
- python
length: 20 Mins
---
Binary file added autovec-structured/img/Access_control.png
Binary file added autovec-structured/img/Create_auto_vec.png
Binary file added autovec-structured/img/cluster_cloud_config.png
Binary file added autovec-structured/img/cluster_no_nodes.png
Binary file added autovec-structured/img/create_cluster.png
Binary file added autovec-structured/img/deploying_model.png
Binary file added autovec-structured/img/import_sd.png
Binary file added autovec-structured/img/imported_data_hotel.png
Binary file added autovec-structured/img/importing_model.png
Binary file added autovec-structured/img/login.png
Binary file added autovec-structured/img/login_.png
Binary file added autovec-structured/img/model_api_key_form.png
Binary file added autovec-structured/img/model_setup_access.png
Binary file added autovec-structured/img/password_cluster.png
Binary file added autovec-structured/img/select_cluster.png
Binary file added autovec-structured/img/setup_access.png
Binary file added autovec-structured/img/start_workflow.png
Binary file added autovec-structured/img/vector_data_source.png
Binary file added autovec-structured/img/vector_field.png
Binary file added autovec-structured/img/vector_field_mapping.png
Binary file added autovec-structured/img/vector_index.png
Binary file added autovec-structured/img/vector_index_page.png
Binary file added autovec-structured/img/workflow.png
Binary file added autovec-structured/img/workflow_deployed.png
Binary file added autovec-structured/img/workflow_summary.png