diff --git a/docs/docs.json b/docs/docs.json
index 7d1519f7..87db70bc 100644
--- a/docs/docs.json
+++ b/docs/docs.json
@@ -47,7 +47,6 @@
"group": "LanceDB Enterprise",
"pages": [
"enterprise/index",
- "enterprise/quickstart",
"enterprise/architecture",
"enterprise/security",
"enterprise/benchmarks",
diff --git a/docs/enterprise/architecture.mdx b/docs/enterprise/architecture.mdx
index c42f547c..e4c9efa5 100644
--- a/docs/enterprise/architecture.mdx
+++ b/docs/enterprise/architecture.mdx
@@ -81,7 +81,7 @@ A [remote table](/tables-and-namespaces#understanding-tables) is the user-facing
This is why Enterprise feels familiar at the API level while operationally behaving differently. Your application still issues table operations and queries, but it is no longer coupled to a local storage path or a single host. Instead, the cluster takes responsibility for execution, coordination, and background upkeep. In SDK terms, `open_table(...)` returns a `RemoteTable`. Architecturally, a remote table is the bridge between the client-facing API and the storage-backed system behind it.
-This design makes LanceDB Enterprise suitable for catalog-backed layouts, see [Namespaces and the Catalog Model](/namespaces) for more details. For the basic application flow, see the [Enterprise quickstart](/enterprise/quickstart).
+This design makes LanceDB Enterprise suitable for catalog-backed layouts; see [Namespaces and the Catalog Model](/namespaces) for more details. For the basic application flow, see the shared [quickstart](/quickstart).
## Read path
diff --git a/docs/enterprise/index.mdx b/docs/enterprise/index.mdx
index c2cd88f1..69ae3269 100644
--- a/docs/enterprise/index.mdx
+++ b/docs/enterprise/index.mdx
@@ -26,7 +26,7 @@ visibility.
### 1. 100B+ row scale
-LanceDB Enterprise is built for demanding workloads that exceed the capabilities of a single machine, whether that's extremely large data volumes or a high number of concurrent queries. Instead of asking your
+LanceDB Enterprise is built for demanding workloads that exceed the capabilities of a single machine, whether from extremely large data volumes or a high number of concurrent queries. Instead of asking your
application to own caching, query scaling, and maintenance, Enterprise turns those into **platform** capabilities.
This matters when your AI application moves past a prototype and starts serving real users, larger datasets, and
@@ -134,6 +134,68 @@ monitoring. Both enterprise modes are designed for private networking, complianc
Read More: [LanceDB Enterprise Deployment](/enterprise/deployment/)
+## Usage differences between Enterprise and OSS
+
+The [quickstart](/quickstart) guide shows both local embedded connections and Enterprise `db://...`
+connections. Once connected to LanceDB, the table API is largely the same: create a table, search,
+filter, evolve the schema, and store multimodal records. However, there are some semantic differences
+worth understanding when your code is talking to LanceDB Enterprise.
+
+### 1. Connection model
+
+In LanceDB Enterprise, your app connects via a `db://...` URI and sends requests to the cluster API.
+The cluster executes table operations on your behalf. Your code is coupled to a **managed service endpoint**,
+whereas embedded LanceDB is directly coupled to local or object-storage paths.
+
+### 2. Returned table type
+
+Connecting to an Enterprise table via `open_table(...)` returns a `RemoteTable`, unlike embedded LanceDB,
+which returns a `LanceTable`. `RemoteTable` is a catalog-backed table accessed through a server/cluster,
+and does not support all the same methods as `LanceTable` (see below).
+
+### 3. Materialization APIs
+
+For Python users working with LanceDB Enterprise, `RemoteTable` does not support table-level
+materialization methods like `table.to_arrow()` or `table.to_pandas()`. This protects users from
+accidentally materializing tables that are too large to fit in memory.
+
+Instead, materialize results through query/search builders, for example
+`table.search(...).limit(...).to_pandas()` or `table.query(...).to_arrow()`. For quick previews, use
+`table.head()`.
+
+### 4. Maintenance lifecycle
+
+In Enterprise, maintenance operations like `optimize` and `compact_files` are handled by the cluster
+as background work. You can trigger them manually, but they are not required for performance or
+correctness in the same way they are in embedded LanceDB.
+
+That means maintenance is managed by platform behavior and cluster configuration, not by explicit
+per-table maintenance calls in your application code.
+
+### 5. Guardrails and limits
+
+Enterprise can enforce platform-level guardrails, such as index/table limits and safety checks around
+operations like `merge_insert` when too many rows are unindexed. Embedded LanceDB mostly exposes
+storage/format-level behavior, and you tune many lifecycle tasks yourself.
+
+This means an operation in LanceDB Enterprise can fail due to service-level policy, not just because
+of local table shape or schema mismatch.
+
+### 6. Cluster-managed background work
+
+In Enterprise, async writes and reindexing workflows are handled by cluster background systems. In
+embedded LanceDB, if you want ongoing upkeep, you usually schedule and run it yourself in your
+application or jobs.
+
+In practice, your app issues table operations, and the platform handles distributed orchestration for
+maintenance and indexing in the background.
+
+
+As a rule of thumb, all you need to remember is this: treat `db://...` as a remote service boundary,
+use query builders to fetch results, and otherwise interact with your tables as you would in embedded
+LanceDB.
+
+
## Which one should I use?
[It's very simple to get started with OSS](/quickstart/): Get started with `pip install lancedb` and begin ingesting
diff --git a/docs/enterprise/quickstart.mdx b/docs/enterprise/quickstart.mdx
deleted file mode 100644
index cd19406b..00000000
--- a/docs/enterprise/quickstart.mdx
+++ /dev/null
@@ -1,224 +0,0 @@
----
-title: "Enterprise Quickstart"
-sidebarTitle: "Quickstart"
-description: "Run the LanceDB quickstart workflow on a RemoteTable in LanceDB Enterprise."
-icon: "rocket"
----
-
-import {
- PyConnectEnterpriseQuickstart,
- TsConnectEnterpriseQuickstart,
- RsConnectEnterpriseQuickstart,
-} from '/snippets/connection.mdx';
-import {
- PyQuickstartCreateTable,
- PyQuickstartVectorSearch1,
- PyQuickstartOpenTable,
- PyQuickstartAddData,
- PyQuickstartVectorSearch2,
- TsQuickstartCreateTable,
- TsQuickstartVectorSearch1,
- TsQuickstartOpenTable,
- TsQuickstartAddData,
- TsQuickstartVectorSearch2,
- RsQuickstartDefineStruct,
- RsQuickstartCreateTable,
- RsQuickstartVectorSearch1,
- RsQuickstartOpenTable,
- RsQuickstartAddData,
- RsQuickstartVectorSearch2,
-} from '/snippets/quickstart.mdx';
-
-This quickstart follows a similar workflow as the [OSS quickstart](/quickstart), but uses a **`RemoteTable`** through a `db://...` connection.
-
-
-To get a LanceDB Enterprise cluster setup and to obtain credentials and endpoint details, [contact our team](mailto:contact@lancedb.com) to get started.
-This guide assumes your Enterprise cluster is already running.
-
-
-## 1. Install LanceDB
-
-
-```bash Python icon=Python
-pip install lancedb
-```
-
-```bash TypeScript icon=js
-npm install @lancedb/lancedb
-```
-
-```bash Rust icon=Rust
-cargo add lancedb
-```
-
-
-## 2. Connect to Enterprise (`db://...`)
-
-
-
- { "import lancedb\n\n" }
- {PyConnectEnterpriseQuickstart}
-
-
-
- { "import * as lancedb from \"@lancedb/lancedb\";\n\n" }
- {TsConnectEnterpriseQuickstart}
-
-
-
- { "use lancedb::connect;\n\n" }
- {RsConnectEnterpriseQuickstart}
-
-
-
-## 3. Create a table (same sample data as the OSS quickstart)
-
-
-
- {PyQuickstartCreateTable}
-
-
-
- {TsQuickstartCreateTable}
-
-
-
- {RsQuickstartDefineStruct}
- {RsQuickstartCreateTable}
-
-
-
-## 4. Run vector search
-
-
-
- {PyQuickstartVectorSearch1}
-
-
-
- {TsQuickstartVectorSearch1}
-
-
-
- {RsQuickstartVectorSearch1}
-
-
-
-## 5. Open table, add data, and query again
-
-
-
- {PyQuickstartOpenTable}
- {PyQuickstartAddData}
- {PyQuickstartVectorSearch2}
-
-
-
- {TsQuickstartOpenTable}
- {TsQuickstartAddData}
- {TsQuickstartVectorSearch2}
-
-
-
- { "use lancedb::table::Table;\n\n" }
- {RsQuickstartOpenTable}
- {RsQuickstartAddData}
- {RsQuickstartVectorSearch2}
-
-
-
-## Differences between Enterprise and OSS usage
-
-As can be seen, the flow for working with a `RemoteTable` in Enterprise looks more or less
-similar to the [OSS quickstart](/quickstart). However, there are some semantic differences:
-
-### 1. Connection model
-
-In LanceDB Enterprise, your app connects via a `db://...` URI and sends requests to the cluster API. The cluster executes table operations on your behalf.
-Your code is coupled to a **managed service endpoint** (whereas in OSS, your code is directly coupled to storage paths).
-
-### 2. Returned table type
-
-Connecting to an Enterprise table via `open_table(...)` returns a `RemoteTable`, unlike in OSS, which returns a `LanceTable`.
-
-### 3. Materialization APIs
-
-For Python users working with LanceDB Enterprise, `RemoteTable` does not support table-level
-materialization methods like `table.to_arrow()` or `table.to_pandas()`. This is to protect
-users from accidentally materializing tables that are too large to fit in memory.
-
-Instead, you materialize results through query/search builders, for example `table.search(...).limit(...).to_pandas()` or `table.query(...).to_arrow()`. For quick previews, you can use `table.head()`.
-
-### 4. Maintenance lifecycle
-
-In Enterprise, maintenance operations like `optimize`, `compact_files` are handled by the cluster as background work. You can trigger them manually, but they are not required for performance or correctness in the same way they are in OSS.
-
-That means maintenance is managed by platform behavior and cluster configuration, not by explicit per-table maintenance calls in your application code.
-
-### 5. Guardrails and limits
-
-Enterprise can enforce platform-level guardrails, such as index/table limits and safety checks around operations like `merge_insert` when too many rows are unindexed. OSS mostly exposes storage/format-level behavior, and you tune many lifecycle tasks yourself.
-
-This means an operation in LanceDB Enterprise can fail due to service-level policy, not just because of local table shape or schema mismatch.
-
-### 6. Cluster-managed background work
-
-In Enterprise, async writes and reindexing workflows are handled by cluster background systems. In OSS, if you want ongoing upkeep, you usually schedule and run it yourself in your application or jobs.
-
-In practice, your app issues table operations, and the platform handles distributed orchestration for maintenance and indexing in the background.
-
-
-As a rule of thumb, all you need to remember with regard to LanceDB Enterprise is this: treat `db://...` as a remote service boundary, use query builders to fetch results, and otherwise interact with your tables as you would in OSS.**
-
-
-## Advanced usage via namespace-backed connections
-
-LanceDB Enterprise also supports namespace-backed catalog connections. This allows you to resolve tables by namespace, rather than by direct URI, and is accessed via the REST connection mode of `connect_namespace(...)`. This is useful when table location resolution and credential vending are handled by an external catalog/namespace service.
-
-```py Python icon=Python
-import os
-import lancedb
-
-ns_db = lancedb.connect_namespace(
- "rest",
- {
- "uri": "https://",
- "headers.Authorization": f"Bearer {os.environ['CATALOG_TOKEN']}",
- },
-)
-
-# Namespace-scoped table resolution
-table = ns_db.open_table("adventurers", namespace=["prod", "search"])
-```
-
-This mode is useful when table location resolution and credential vending are handled by an external catalog/namespace service.
-
-If you want to stick to a common table flow, start with the `db://` RemoteTable flow shown above.
-
-## Further reading
-
-You can learn more about table operations, namespaces, and the architecture of LanceDB Enterprise in the following guides.
-
-
-
- Build on this quickstart with table creation, updates, and schema tips.
-
-
- Learn how to use namespaces in LanceDB, and connect to an Enterprise namespace via REST.
-
-
- Learn about the architecture of LanceDB Enterprise and how it achieves high performance at scale.
-
-
\ No newline at end of file
diff --git a/docs/index.mdx b/docs/index.mdx
index c9cf4f75..2dcd73d6 100644
--- a/docs/index.mdx
+++ b/docs/index.mdx
@@ -90,7 +90,7 @@ for agents. Start here:
href="/quickstart"
>
Get started with LanceDB in minutes.
-
+
- Get started with LanceDB Enterprise in minutes.
+ Get started with LanceDB in minutes, including Enterprise `db://` connections.
diff --git a/docs/quickstart.mdx b/docs/quickstart.mdx
index c2ba3d85..8bcac2c7 100644
--- a/docs/quickstart.mdx
+++ b/docs/quickstart.mdx
@@ -18,28 +18,54 @@ import {
TsConnectObjectStorage,
} from '/snippets/connection.mdx';
import {
+ PyQuickstartData,
PyQuickstartCreateTable,
PyQuickstartCreateTableAsync,
+ PyQuickstartAddFeature,
+ PyQuickstartCurateWithMetadata,
+ PyQuickstartMultimodalBytes,
+ PyQuickstartQueryFeature,
PyQuickstartVectorSearch1,
PyQuickstartVectorSearch1Async,
PyQuickstartOutputPandas,
+ RsQuickstartAddFeature,
+ RsQuickstartCurateWithMetadata,
RsQuickstartCreateTable,
+ RsQuickstartData,
RsQuickstartDefineStruct,
+ RsQuickstartMultimodalBytes,
+ RsQuickstartQueryFeature,
RsQuickstartVectorSearch1,
+ TsQuickstartAddFeature,
+ TsQuickstartCurateWithMetadata,
TsQuickstartCreateTable,
+ TsQuickstartData,
+ TsQuickstartMultimodalBytes,
+ TsQuickstartQueryFeature,
TsQuickstartVectorSearch1,
} from '/snippets/quickstart.mdx';
-The easiest way to get started with LanceDB is the open source version, which is an embedded database that
-runs in-process (like SQLite). Let's get started in just a few steps!
+As described on [the landing page](/), LanceDB provides one data layer for
+curation, feature engineering, search and retrieval, and model training. Whether you are preparing
+training data, building a RAG or agentic retrieval system, reviewing examples, or adding model-generated
+features, you'll work with the same underlying table and search primitives.
+
+Let's get started in just a few steps!
## 1. Install LanceDB
Install LanceDB in your client SDK.
-```bash Python icon=Python
-pip install lancedb # or uv add lancedb
+```bash pip icon="terminal"
+pip install lancedb
+```
+
+```bash uv icon="terminal"
+uv add lancedb
+
+# Or, in an existing virtual environment:
+uv pip install lancedb
```
```bash TypeScript icon=js
@@ -51,18 +77,43 @@ cargo add lancedb
```
+### Python pre-release builds
+
+To pick up the latest features and bug fixes
+before the next stable release, install a pre-release from LanceDB's Fury index.
+
+
+```bash pip icon="terminal"
+pip install --pre --extra-index-url https://pypi.fury.io/lancedb/ lancedb
+```
+
+```bash uv icon="terminal"
+uv venv
+uv pip install --prerelease allow --index https://pypi.fury.io/lancedb/ lancedb
+
+# To add to pyproject.toml, use:
+uv add --prerelease allow --index https://pypi.fury.io/lancedb/ lancedb
+```
+
+
+
+Pre-release builds receive the same level of testing as stable releases, but their availability is not guaranteed
+for more than 6 months after release. For real-world workloads, we recommend using the latest stable release
+wherever possible.
+
+
## 2. Connect to a LanceDB database
LanceDB supports several URI patterns to connect to a database.
- A local filesystem path (when using it as an embedded library)
- A `db://...` URI (when using LanceDB Enterprise)
-- An object storage URI: `s3://...`, `gs://...`, or `az://...` (OSS mode)
+- An object storage URI: `s3://...`, `gs://...`, or `az://...` (when connecting directly from the client SDK)
-### Connect via local path with LanceDB
+### Connect via local directory path
-The simplest way to begin is to use LanceDB OSS. Simply import LanceDB as an embedded library in your
-client SDK of choice and point to a local path.
+The simplest way to begin is to use LanceDB as an embedded library. Import LanceDB in your
+client SDK of choice and point to a local directory path.
@@ -87,7 +138,7 @@ client SDK of choice and point to a local path.
### Connect via object storage URIs
-You can also connect LanceDB OSS directly to object storage:
+You can also connect directly to object storage from the client SDK:
@@ -112,9 +163,9 @@ For credentials, endpoints, and provider-specific options, see
### Connect to LanceDB Enterprise
-If you're using LanceDB Enterprise, you can connect using a `db://` URI along
-with the API key, region, and cluster endpoint you received from the LanceDB
-team. Pass the cluster endpoint via `host_override` so the client routes
+If you're using LanceDB Enterprise, you can connect to the remote database using the
+`db://` URI along with the API key, region, and cluster endpoint you received from the
+LanceDB team. Pass the cluster endpoint via `host_override` so the client routes
requests to your deployment.
@@ -137,25 +188,57 @@ requests to your deployment.
`host_override` is the full URL of your cluster endpoint, including the scheme
(`https://`) and a port if your deployment listens on a non-default one
-(e.g. `https://your-enterprise-endpoint.com:443`). If you don't know the
+(e.g. `https://your-enterprise-endpoint.com:443`). If you don't have the
endpoint, [contact the LanceDB team](mailto:contact@lancedb.com).
-For a walkthrough on how to use LanceDB Enterprise (including `RemoteTable`
-semantics), see its [quickstart](/enterprise/quickstart). To learn
-more about LanceDB Enterprise overall, see the
-[Enterprise documentation](/enterprise).
+To learn more about `RemoteTable` semantics and how Enterprise differs operationally from
+embedded LanceDB, see the [Enterprise overview](/enterprise).
+
+## 3. Create a new table
+
+Let's create a small table of characters from the kingdom of Camelot. Each row stores source text,
+metadata, structured fields, and a vector embedding in the same LanceDB table.
+
+
+The embeddings we use in this example are synthetic and for demonstration purposes only. In a real AI
+data workflow, you would generate them from text, images, audio, or video using an embedding model of your choice.
+
+
+For example, the record for Merlin looks like this:
+
+```json
+{
+ "id": "2",
+ "name": "Merlin",
+ "role": "Wizard",
+ "description": "Advisor and prophet with deep magical knowledge.",
+ "stats": { "strength": 2, "magic": 5, "leadership": 4, "wisdom": 5 },
+ "vector": [0.2, 0.9, 0.4, 0.9]
+}
+```
+
+The full raw records are included below:
-## 3. Obtain data and ingest into LanceDB
+
+
+
+ {PyQuickstartData}
+
-Let's look at an example. We have the following records of characters in an adventure board game.
-The vector column holds 3-dimensional embeddings representing each character.
+
+ {TsQuickstartData}
+
-To ingest the data into LanceDB, obtain data of the required shape
-and pass in the data object to the `create_table` method as shown below.
-Note that LanceDB tables require a schema. If you don't provide one, LanceDB
-will infer it from the data. For the Rust snippet, you can find the helper functions in the
-[code](https://github.com/lancedb/docs/blob/main/tests/rs/quickstart.rs).
+
+ {RsQuickstartDefineStruct}
+ {RsQuickstartData}
+
+
+
+
+You can now create a LanceDB table from those records. The code below creates a LanceDB table
+with the appropriate schema and ingests the data.
@@ -171,25 +254,19 @@ will infer it from the data. For the Rust snippet, you can find the helper funct
- {RsQuickstartDefineStruct}
{RsQuickstartCreateTable}
-
-The `vector` arrays here are synthetic and for demonstration purposes only. In your real-world
-applications, you'd generate these vectors from the raw text fields using a suitable embedding model.
-
-
-## 4. Run a vector similarity search
+## 4. Semantic search
-Now, let's perform a vector similarity search. The query vector should have the same
-dimensionality as your data vectors and be generated using the same embedding model.
-The search returns the most similar vectors based on a chosen distance metric (default is L2,
-or Euclidean distance).
+Search is a useful capability for all kinds of AI data pipelines. Below, we run a vector similarity
+search for samples similar to a "_wise magical advisor_" and project only the columns needed by the
+next step. The query vector here is synthetic; in a real workflow, you would generate it by passing the
+natural-language query through the same embedding model used for the data.
-Our query is a vector that represents a "warrior". Let's find the result that's most similar
-to it!
+Search (which requires random access) is a ubiquitous access pattern that appears in many workloads:
+whether you're building a RAG or recommendation system, serving agent memory, or curating a training
+dataset.
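
To make the ranking concrete, here is a minimal pure-Python sketch (no LanceDB required) that scores the three sample records against the query vector, assuming a Euclidean (L2) style metric. The helper name `squared_l2` and the brute-force scan are illustrative assumptions, not how LanceDB executes a search, and the `_distance` values LanceDB reports may be computed differently.

```python
# Sample vectors copied from the quickstart records.
vectors = {
    "King Arthur": [0.7, 0.1, 0.9, 0.7],
    "Merlin": [0.2, 0.9, 0.4, 0.9],
    "Sir Lancelot": [0.9, 0.1, 0.5, 0.4],
}
query_vector = [0.2, 0.8, 0.4, 0.9]  # synthetic "wise magical advisor" query


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors (hypothetical helper)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Brute-force scan: rank every record by distance to the query, closest first.
ranked = sorted(vectors, key=lambda name: squared_l2(vectors[name], query_vector))
top_2 = ranked[:2]
print(top_2)
```

With these synthetic vectors, Merlin is the closest match, followed by King Arthur, which is the ordering you would expect a `limit(2)` search to return under an L2-style metric.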
@@ -220,28 +297,133 @@ to be used downstream in your application.
-
+## 5. Curation
+
+Searching for relevant results can be more useful when combined with metadata filters.
+In this tiny example, we filter to examples with high `magic` stats.
+
+
+
+ {PyQuickstartCurateWithMetadata}
+
+
+
+ {TsQuickstartCurateWithMetadata}
+
+
+
+ {RsQuickstartCurateWithMetadata}
+
+
+
+When working with large datasets, it's common to use the same pattern to filter on quality labels,
+train/eval splits, numeric fields, categorical values, timestamp windows, or generated tags and labels.
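
As a rough illustration of what a filter like `stats.magic >= 4` does to the candidate set, the pure-Python sketch below filters the sample records first and then ranks what remains. The `squared_l2` helper and the filter-then-rank ordering are assumptions made for illustration; they are not a description of how LanceDB plans or executes the query.

```python
# Sample records from the quickstart, trimmed to the fields this sketch needs.
records = [
    {"name": "King Arthur", "stats": {"magic": 1}, "vector": [0.7, 0.1, 0.9, 0.7]},
    {"name": "Merlin", "stats": {"magic": 5}, "vector": [0.2, 0.9, 0.4, 0.9]},
    {"name": "Sir Lancelot", "stats": {"magic": 1}, "vector": [0.9, 0.1, 0.5, 0.4]},
]
query_vector = [0.2, 0.8, 0.4, 0.9]


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors (hypothetical helper)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Keep only rows that pass the metadata predicate (the analogue of `stats.magic >= 4`).
candidates = [r for r in records if r["stats"]["magic"] >= 4]

# Rank the surviving rows by distance to the query and take at most two.
candidates.sort(key=lambda r: squared_l2(r["vector"], query_vector))
curated = [r["name"] for r in candidates[:2]]
print(curated)
```

Only Merlin has a `magic` stat of 4 or higher in the sample data, so the curated result contains a single row even though the limit is 2.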
+
+## 6. Add a derived feature
+
+Feature engineering is the process of cleaning up your data and creating new signals that
+help your model learn and make better predictions, or help your agent retrieve more useful information.
+In the example below, we add a `power_score` column from the structured `stats` fields.
+Lance supports data evolution, so you can add new columns without rewriting the entire table.
+
+
+
+ {PyQuickstartAddFeature}
+
+
+
+ {TsQuickstartAddFeature}
+
+
+
+ { "use lancedb::table::NewColumnTransform;\n\n" }
+ {RsQuickstartAddFeature}
+
+
+
+Next, you can query a compact view of the new feature:
+
+
+
+ {PyQuickstartQueryFeature}
+
+
+
+ {TsQuickstartQueryFeature}
+
+
+
+ {RsQuickstartQueryFeature}
+
+
+
+| name | role | power_score |
+| --- | --- | --- |
+| King Arthur | King | 3.5 |
+| Merlin | Wizard | 4.0 |
+| Sir Lancelot | Knight | 3.0 |
+
+The same workflow applies to data preparation tasks such as adding derived features, cached model signals,
+review scores, or dataset quality indicators.
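
The `power_score` values in the table above can be checked by hand. The sketch below recomputes them in plain Python as the mean of the four `stats` fields, mirroring the SQL expression passed to `add_columns`. The `power_score` function is a hypothetical helper for illustration only; in LanceDB the column is computed from the SQL expression, not from Python code like this.

```python
# Structured stats copied from the quickstart records.
characters = [
    {"name": "King Arthur", "stats": {"strength": 4, "magic": 1, "leadership": 5, "wisdom": 4}},
    {"name": "Merlin", "stats": {"strength": 2, "magic": 5, "leadership": 4, "wisdom": 5}},
    {"name": "Sir Lancelot", "stats": {"strength": 5, "magic": 1, "leadership": 3, "wisdom": 3}},
]


def power_score(stats):
    """Mean of the four stat fields (hypothetical helper mirroring the SQL expression)."""
    total = stats["strength"] + stats["magic"] + stats["leadership"] + stats["wisdom"]
    return total / 4.0


# Map each character name to its derived feature value.
scores = {c["name"]: power_score(c["stats"]) for c in characters}
print(scores)
```

The results match the table: 3.5 for King Arthur, 4.0 for Merlin, and 3.0 for Sir Lancelot.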
+
+## 7. Store multimodal data
+
+Multimodal data is a first-class citizen in LanceDB. Binary data (image, audio, video, etc.) is
+stored as blobs or inline Arrow binary types in a LanceDB column, and it benefits from the same
+table operations and data versioning semantics as other data types. All the data is governed
+in the same table, so you can search, filter, and retrieve multimodal records together with structured
+fields, metadata, and embeddings.
+
+In this example, the
+[`lancedb/magical_kingdom`](https://huggingface.co/datasets/lancedb/magical_kingdom) dataset stores
+character images, descriptions, structured stats, image embeddings, and text embeddings together.
+
+Suppose you have downloaded the image for Sir Lancelot from that dataset. You can read the image bytes
+in your client SDK and store them in a LanceDB column. The image bytes can be used for downstream tasks
+like retrieval, evaluation, or training.
+
+
+

+
+
+These snippets load the local image file and store the bytes in an `image` column:
+
+
+
+ {PyQuickstartMultimodalBytes}
+
+
+
+ {TsQuickstartMultimodalBytes}
+
+
+
+ {RsQuickstartMultimodalBytes}
+
+
+
+For more examples, see the [multimodal data](/tables/multimodal) section.
+
+## Code
+
See the full code for these examples (including helper functions) in the
`quickstart` file for the appropriate client language in the
-[files provided here](https://github.com/lancedb/docs/tree/main/tests).
-
+[files provided in the repo](https://github.com/lancedb/docs/tree/main/tests).
## What's next?
-You've learned how to install LanceDB, connect, create a table, and run a first
-vector search. In the real world, embeddings capture meaning and vector search
-allows you to find the most relevant data based on semantic similarity.
-
-Note that LanceDB is much more than "just a vector database" -- it's
-[a multimodal lakehouse](https://lancedb.com/blog/multimodal-lakehouse/).
-There's a lot more you can do with it! Continue
-to the [Table management](/tables/) guide to build on
-this example with schema options, appending data, updates, and versioning.
+You've learned how to install LanceDB, connect, create one table for AI data, retrieve related
+examples, curate with metadata, add a derived feature, and represent multimodal records. These same
+primitives apply across the AI data lifecycle, from data preparation and feature engineering to
+retrieval, evaluation, and training.
-As you explore LanceDB further, you can combine vector search with other techniques like filtering based
-on metadata fields, full-text search, hybrid search, and more. Check out the tutorials
-and guides below to continue learning.
+Continue to the table and search guides to build on this example with schema options, appends,
+updates, versioning, indexing, full-text search, hybrid search, and reranking.
Learn how to build Retrieval-Augmented Generation (RAG) applications using LanceDB.
+
+ Create vector, full-text, and scalar indexes to speed up queries on larger datasets.
+
+
+ Use LanceDB for projected, shuffled, random-access reads in training workflows.
+
diff --git a/docs/snippets/quickstart.mdx b/docs/snippets/quickstart.mdx
index bd817e66..810a0c0b 100644
--- a/docs/snippets/quickstart.mdx
+++ b/docs/snippets/quickstart.mdx
@@ -1,50 +1,76 @@
{/* Auto-generated by scripts/mdx_snippets_gen.py. Do not edit manually. */}
-export const PyQuickstartAddData = "more_data = [\n {\"id\": \"7\", \"text\": \"mage\", \"vector\": [0.6, 0.3, 0.4]},\n {\"id\": \"8\", \"text\": \"bard\", \"vector\": [0.3, 0.8, 0.4]},\n]\n\n# Add data to table\ntable.add(more_data)\n";
+export const PyQuickstartAddData = "more_data = [\n {\n \"id\": \"4\",\n \"name\": \"Morgana\",\n \"role\": \"Sorceress\",\n \"description\": \"Powerful sorceress of Avalon.\",\n \"stats\": {\"strength\": 2, \"magic\": 5, \"leadership\": 4, \"wisdom\": 4},\n \"vector\": [0.3, 0.9, 0.6, 0.8],\n \"power_score\": 3.75,\n },\n]\n\n# Add data to table\ntable.add(more_data)\n";
-export const PyQuickstartCreateTable = "data = [\n {\"id\": \"1\", \"text\": \"knight\", \"vector\": [0.9, 0.4, 0.8]},\n {\"id\": \"2\", \"text\": \"ranger\", \"vector\": [0.8, 0.4, 0.7]},\n {\"id\": \"9\", \"text\": \"priest\", \"vector\": [0.6, 0.2, 0.6]},\n {\"id\": \"4\", \"text\": \"rogue\", \"vector\": [0.7, 0.4, 0.7]},\n]\ntable = db.create_table(\"adventurers\", data=data, mode=\"overwrite\")\n";
+export const PyQuickstartAddFeature = "table.add_columns(\n {\n \"power_score\": \"cast(((stats.strength + stats.magic + stats.leadership + stats.wisdom) / 4.0) as float)\"\n }\n)\n";
-export const PyQuickstartCreateTableAsync = "async_table = await async_db.create_table(\n \"adventurers\",\n data=data,\n mode=\"overwrite\",\n)\n";
+export const PyQuickstartCreateTable = "table = db.create_table(\"characters\", data=data, mode=\"overwrite\")\n";
-export const PyQuickstartCreateTableNoOverwrite = "table = db.create_table(\"adventurers\", data=data)\n";
+export const PyQuickstartCreateTableAsync = "async_table = await async_db.create_table(\n \"characters\",\n data=data,\n mode=\"overwrite\",\n)\n";
-export const PyQuickstartOpenTable = "table = db.open_table(\"adventurers\")\n";
+export const PyQuickstartCreateTableNoOverwrite = "table = db.create_table(\"characters\", data=data)\n";
+
+export const PyQuickstartCurateWithMetadata = "curated = (\n table.search(query_vector)\n .where(\"stats.magic >= 4\")\n .select([\"name\", \"role\", \"description\", \"_distance\"])\n .limit(2)\n .to_polars()\n)\nprint(curated)\n";
+
+export const PyQuickstartData = "data = [\n {\n \"id\": \"1\",\n \"name\": \"King Arthur\",\n \"role\": \"King\",\n \"description\": \"Leader of Camelot and wielder of Excalibur.\",\n \"stats\": {\"strength\": 4, \"magic\": 1, \"leadership\": 5, \"wisdom\": 4},\n \"vector\": [0.7, 0.1, 0.9, 0.7],\n },\n {\n \"id\": \"2\",\n \"name\": \"Merlin\",\n \"role\": \"Wizard\",\n \"description\": \"Advisor and prophet with deep magical knowledge.\",\n \"stats\": {\"strength\": 2, \"magic\": 5, \"leadership\": 4, \"wisdom\": 5},\n \"vector\": [0.2, 0.9, 0.4, 0.9],\n },\n {\n \"id\": \"3\",\n \"name\": \"Sir Lancelot\",\n \"role\": \"Knight\",\n \"description\": \"Legendary knight known for courage and combat skill.\",\n \"stats\": {\"strength\": 5, \"magic\": 1, \"leadership\": 3, \"wisdom\": 3},\n \"vector\": [0.9, 0.1, 0.5, 0.4],\n },\n]\n";
+
+export const PyQuickstartDataAsync = "data = [\n {\n \"id\": \"1\",\n \"name\": \"King Arthur\",\n \"role\": \"King\",\n \"description\": \"Leader of Camelot and wielder of Excalibur.\",\n \"stats\": {\"strength\": 4, \"magic\": 1, \"leadership\": 5, \"wisdom\": 4},\n \"vector\": [0.7, 0.1, 0.9, 0.7],\n },\n {\n \"id\": \"2\",\n \"name\": \"Merlin\",\n \"role\": \"Wizard\",\n \"description\": \"Advisor and prophet with deep magical knowledge.\",\n \"stats\": {\"strength\": 2, \"magic\": 5, \"leadership\": 4, \"wisdom\": 5},\n \"vector\": [0.2, 0.9, 0.4, 0.9],\n },\n {\n \"id\": \"3\",\n \"name\": \"Sir Lancelot\",\n \"role\": \"Knight\",\n \"description\": \"Legendary knight known for courage and combat skill.\",\n \"stats\": {\"strength\": 5, \"magic\": 1, \"leadership\": 3, \"wisdom\": 3},\n \"vector\": [0.9, 0.1, 0.5, 0.4],\n },\n]\n";
+
+export const PyQuickstartMultimodalBytes = "from pathlib import Path\n\nimage_path = Path(\"docs/static/assets/images/quickstart/sir-lancelot.jpg\")\nimage_bytes = image_path.read_bytes()\n\nmultimodal_table = db.create_table(\n \"character_images\",\n data=[\n {\n \"id\": \"lancelot\",\n \"description\": \"Portrait of Sir Lancelot\",\n \"image\": image_bytes,\n \"vector\": [0.9, 0.1, 0.5, 0.4],\n }\n ],\n mode=\"overwrite\",\n)\n";
+
+export const PyQuickstartOpenTable = "table = db.open_table(\"characters\")\n";
export const PyQuickstartOutputPandas = "# Ensure you run `pip install pandas` beforehand\nresult = table.search(query_vector).limit(2).to_pandas()\nprint(result)\n";
-export const PyQuickstartVectorSearch1 = "# Let's search for vectors similar to \"warrior\"\nquery_vector = [0.8, 0.3, 0.8]\n\n# Ensure you run `pip install polars` beforehand\nresult = table.search(query_vector).limit(2).to_polars()\nprint(result)\n";
+export const PyQuickstartQueryFeature = "features = table.search().select([\"name\", \"role\", \"power_score\"]).to_polars()\nprint(features)\n";
+
+export const PyQuickstartVectorSearch1 = "# Search for examples similar to a \"wise magical advisor\"\nquery_vector = [0.2, 0.8, 0.4, 0.9]\n\n# Ensure you run `pip install polars` beforehand\nresult = (\n table.search(query_vector)\n .select([\"name\", \"role\", \"description\", \"_distance\"])\n .limit(2)\n .to_polars()\n)\nprint(result)\n";
+
+export const PyQuickstartVectorSearch1Async = "# Search for examples similar to a \"wise magical advisor\"\nquery_vector = [0.2, 0.8, 0.4, 0.9]\n\n# Ensure you run `pip install polars` beforehand\nasync_result = await (\n await async_table.search(query_vector)\n).select([\"name\", \"role\", \"description\", \"_distance\"]).limit(2).to_polars()\nprint(async_result)\n";
-export const PyQuickstartVectorSearch1Async = "# Let's search for vectors similar to \"warrior\"\nquery_vector = [0.8, 0.3, 0.8]\n\n# Ensure you run `pip install polars` beforehand\nasync_result = await (await async_table.search(query_vector)).limit(2).to_polars()\nprint(async_result)\n";
+export const PyQuickstartVectorSearch2 = "# Search for examples similar to a \"powerful sorceress\"\nquery_vector = [0.3, 0.9, 0.6, 0.8]\n\nresults = table.search(query_vector).limit(2).to_polars()\nprint(results)\n";
-export const PyQuickstartVectorSearch2 = "# Let's search for vectors similar to \"wizard\"\nquery_vector = [0.7, 0.3, 0.5]\n\nresults = table.search(query_vector).limit(2).to_polars()\nprint(results)\n";
+export const TsQuickstartAddData = "const moreData = [\n {\n id: \"4\",\n name: \"Morgana\",\n role: \"Sorceress\",\n description: \"Powerful sorceress of Avalon.\",\n stats: { strength: 2, magic: 5, leadership: 4, wisdom: 4 },\n vector: [0.3, 0.9, 0.6, 0.8],\n power_score: 3.75,\n },\n];\n\n// Add data to table\nawait table.add(moreData);\n";
-export const TsQuickstartAddData = "const moreData = [\n { id: \"7\", text: \"mage\", vector: [0.6, 0.3, 0.4] },\n { id: \"8\", text: \"bard\", vector: [0.3, 0.8, 0.4] },\n];\n\n// Add data to table\nawait table.add(moreData);\n";
+export const TsQuickstartAddFeature = "await table.addColumns([\n {\n name: \"power_score\",\n valueSql:\n \"cast(((stats.strength + stats.magic + stats.leadership + stats.wisdom) / 4.0) as float)\",\n },\n]);\n";
-export const TsQuickstartCreateTable = "const data = [\n { id: \"1\", text: \"knight\", vector: [0.9, 0.4, 0.8] },\n { id: \"2\", text: \"ranger\", vector: [0.8, 0.4, 0.7] },\n { id: \"9\", text: \"priest\", vector: [0.6, 0.2, 0.6] },\n { id: \"4\", text: \"rogue\", vector: [0.7, 0.4, 0.7] },\n];\nlet table = await db.createTable(\"adventurers\", data, { mode: \"overwrite\" });\n";
+export const TsQuickstartCreateTable = "let table = await db.createTable(\"characters\", data, { mode: \"overwrite\" });\n";
-export const TsQuickstartCreateTableNoOverwrite = "table = await db.createTable(\"adventurers\", data);\n";
+export const TsQuickstartCreateTableNoOverwrite = "table = await db.createTable(\"characters\", data);\n";
-export const TsQuickstartOpenTable = "table = await db.openTable(\"adventurers\");\n";
+export const TsQuickstartCurateWithMetadata = "const curated = await table\n .search(queryVector)\n .where(\"stats.magic >= 4\")\n .select([\"name\", \"role\", \"description\", \"_distance\"])\n .limit(2)\n .toArray();\nconsole.table(curated);\n";
+
+export const TsQuickstartData = "const data = [\n {\n id: \"1\",\n name: \"King Arthur\",\n role: \"King\",\n description: \"Leader of Camelot and wielder of Excalibur.\",\n stats: { strength: 4, magic: 1, leadership: 5, wisdom: 4 },\n vector: [0.7, 0.1, 0.9, 0.7],\n },\n {\n id: \"2\",\n name: \"Merlin\",\n role: \"Wizard\",\n description: \"Advisor and prophet with deep magical knowledge.\",\n stats: { strength: 2, magic: 5, leadership: 4, wisdom: 5 },\n vector: [0.2, 0.9, 0.4, 0.9],\n },\n {\n id: \"3\",\n name: \"Sir Lancelot\",\n role: \"Knight\",\n description: \"Legendary knight known for courage and combat skill.\",\n stats: { strength: 5, magic: 1, leadership: 3, wisdom: 3 },\n vector: [0.9, 0.1, 0.5, 0.4],\n },\n];\n";
+
+export const TsQuickstartMultimodalBytes = "const arrow = await import(\"apache-arrow\");\nconst path = await import(\"node:path\");\nconst { readFile } = await import(\"node:fs/promises\");\n\nconst imagePath = path.resolve(\n \"../../docs/static/assets/images/quickstart/sir-lancelot.jpg\",\n);\nconst imageBytes = await readFile(imagePath);\nconst imageSchema = new arrow.Schema([\n new arrow.Field(\"id\", new arrow.Utf8()),\n new arrow.Field(\"description\", new arrow.Utf8()),\n new arrow.Field(\"image\", new arrow.Binary()),\n new arrow.Field(\n \"vector\",\n new arrow.FixedSizeList(\n 4,\n new arrow.Field(\"item\", new arrow.Float32(), true),\n ),\n ),\n]);\nconst imageData = lancedb.makeArrowTable(\n [\n {\n id: \"lancelot\",\n description: \"Portrait of Sir Lancelot\",\n image: imageBytes,\n vector: [0.9, 0.1, 0.5, 0.4],\n },\n ],\n { schema: imageSchema },\n);\nconst multimodalTable = await db.createTable(\n \"character_images\",\n imageData,\n { mode: \"overwrite\" },\n);\n";
+
+export const TsQuickstartOpenTable = "table = await db.openTable(\"characters\");\n";
export const TsQuickstartOutputArray = "result = await table.search(queryVector).limit(2).toArray();\nconsole.table(result);\n";
-export const TsQuickstartVectorSearch1 = "// Let's search for vectors similar to \"warrior\"\nlet queryVector = [0.8, 0.3, 0.8];\n\nlet result = await table.search(queryVector).limit(2).toArray();\nconsole.table(result);\n";
+export const TsQuickstartQueryFeature = "const features = await table\n .query()\n .select([\"name\", \"role\", \"power_score\"])\n .toArray();\nconsole.table(features);\n";
+
+export const TsQuickstartVectorSearch1 = "// Search for examples similar to a \"wise magical advisor\"\nlet queryVector = [0.2, 0.8, 0.4, 0.9];\n\nlet result = await table\n .search(queryVector)\n .select([\"name\", \"role\", \"description\", \"_distance\"])\n .limit(2)\n .toArray();\nconsole.table(result);\n";
+
+export const TsQuickstartVectorSearch2 = "// Search for examples similar to a \"powerful sorceress\"\nqueryVector = [0.3, 0.9, 0.6, 0.8];\n\nconst results = await table.search(queryVector).limit(2).toArray();\nconsole.table(results);\n";
+
+export const RsQuickstartAddFeature = "table\n .add_columns(\n NewColumnTransform::SqlExpressions(vec![(\n \"power_score\".to_string(),\n \"cast(((stats.strength + stats.magic + stats.leadership + stats.wisdom) / 4.0) as float)\"\n .to_string(),\n )]),\n None,\n )\n .await\n .unwrap();\n";
-export const TsQuickstartVectorSearch2 = "// Let's search for vectors similar to \"wizard\"\nqueryVector = [0.7, 0.3, 0.5];\n\nconst results = await table.search(queryVector).limit(2).toArray();\nconsole.table(results);\n";
+export const RsQuickstartCreateTable = "let schema = characters_schema();\nlet table = db\n .create_table(\"characters\", characters_to_reader(schema.clone(), &data))\n .mode(CreateTableMode::Overwrite)\n .execute()\n .await\n .unwrap();\n";
-export const RsQuickstartAddData = "let more_data = vec![\n Adventurer {\n id: \"7\".to_string(),\n text: \"mage\".to_string(),\n vector: [0.6, 0.3, 0.4],\n },\n Adventurer {\n id: \"8\".to_string(),\n text: \"bard\".to_string(),\n vector: [0.3, 0.8, 0.4],\n },\n];\n\n// Add data to table\ntable\n .add(adventurers_to_reader(schema.clone(), &more_data))\n .execute()\n .await\n .unwrap();\n";
+export const RsQuickstartCreateTableNoOverwrite = "let table = db\n .create_table(\"characters\", characters_to_reader(schema.clone(), &data))\n .execute()\n .await\n .unwrap();\n";
-export const RsQuickstartCreateTable = "// Define an arrow schema named adventurers_schema beforehand (omitted here for brevity)\nlet schema = adventurers_schema();\nlet data = vec![\n Adventurer {\n id: \"1\".to_string(),\n text: \"knight\".to_string(),\n vector: [0.9, 0.4, 0.8],\n },\n Adventurer {\n id: \"2\".to_string(),\n text: \"ranger\".to_string(),\n vector: [0.8, 0.4, 0.7],\n },\n Adventurer {\n id: \"9\".to_string(),\n text: \"priest\".to_string(),\n vector: [0.6, 0.2, 0.6],\n },\n Adventurer {\n id: \"4\".to_string(),\n text: \"rogue\".to_string(),\n vector: [0.7, 0.4, 0.7],\n },\n];\n// Create a new table with the data, overwriting if it already exists\nlet mut table = db\n .create_table(\"adventurers\", adventurers_to_reader(schema.clone(), &data))\n .mode(CreateTableMode::Overwrite)\n .execute()\n .await\n .unwrap();\n";
+export const RsQuickstartCurateWithMetadata = "let curated: DataFrame = table\n .query()\n .nearest_to(&query_vector)\n .unwrap()\n .only_if(\"stats.magic >= 4\")\n .select(Select::Columns(vec![\n \"name\".to_string(),\n \"role\".to_string(),\n \"description\".to_string(),\n \"_distance\".to_string(),\n ]))\n .limit(2)\n .execute()\n .await\n .unwrap()\n .into_polars()\n .await\n .unwrap();\nprintln!(\"{curated:?}\");\n";
-export const RsQuickstartCreateTableNoOverwrite = "table = db\n .create_table(\"adventurers\", adventurers_to_reader(schema.clone(), &data))\n .execute()\n .await\n .unwrap();\n";
+export const RsQuickstartData = "let data = vec![\n Character {\n id: \"1\".to_string(),\n name: \"King Arthur\".to_string(),\n role: \"King\".to_string(),\n description: \"Leader of Camelot and wielder of Excalibur.\".to_string(),\n stats: Stats {\n strength: 4,\n magic: 1,\n leadership: 5,\n wisdom: 4,\n },\n vector: [0.7, 0.1, 0.9, 0.7],\n },\n Character {\n id: \"2\".to_string(),\n name: \"Merlin\".to_string(),\n role: \"Wizard\".to_string(),\n description: \"Advisor and prophet with deep magical knowledge.\".to_string(),\n stats: Stats {\n strength: 2,\n magic: 5,\n leadership: 4,\n wisdom: 5,\n },\n vector: [0.2, 0.9, 0.4, 0.9],\n },\n Character {\n id: \"3\".to_string(),\n name: \"Sir Lancelot\".to_string(),\n role: \"Knight\".to_string(),\n description: \"Legendary knight known for courage and combat skill.\".to_string(),\n stats: Stats {\n strength: 5,\n magic: 1,\n leadership: 3,\n wisdom: 3,\n },\n vector: [0.9, 0.1, 0.5, 0.4],\n },\n];\n";
-export const RsQuickstartDefineStruct = "// Define a struct representing the data schema\n#[derive(Debug, Clone, Serialize, Deserialize)]\nstruct Adventurer {\n id: String,\n text: String,\n vector: [f32; 3],\n}\n\nfn adventurers_schema() -> Arc<Schema> {\n Arc::new(Schema::new(vec![\n Field::new(\"id\", DataType::LargeUtf8, false),\n Field::new(\"text\", DataType::LargeUtf8, false),\n Field::new(\n \"vector\",\n DataType::FixedSizeList(Arc::new(Field::new(\"item\", DataType::Float32, true)), 3),\n false,\n ),\n ]))\n}\n";
+export const RsQuickstartDefineStruct = "// Define structs representing the data schema\n#[derive(Debug, Clone, Serialize, Deserialize)]\nstruct Stats {\n strength: i8,\n magic: i8,\n leadership: i8,\n wisdom: i8,\n}\n\n#[derive(Debug, Clone, Serialize, Deserialize)]\nstruct Character {\n id: String,\n name: String,\n role: String,\n description: String,\n stats: Stats,\n vector: [f32; 4],\n}\n\nfn characters_schema() -> Arc<Schema> {\n Arc::new(Schema::new(vec![\n Field::new(\"id\", DataType::LargeUtf8, false),\n Field::new(\"name\", DataType::LargeUtf8, false),\n Field::new(\"role\", DataType::LargeUtf8, false),\n Field::new(\"description\", DataType::LargeUtf8, false),\n Field::new(\n \"stats\",\n DataType::Struct(arrow_schema::Fields::from(vec![\n Arc::new(Field::new(\"strength\", DataType::Int8, false)),\n Arc::new(Field::new(\"magic\", DataType::Int8, false)),\n Arc::new(Field::new(\"leadership\", DataType::Int8, false)),\n Arc::new(Field::new(\"wisdom\", DataType::Int8, false)),\n ])),\n false,\n ),\n Field::new(\n \"vector\",\n DataType::FixedSizeList(Arc::new(Field::new(\"item\", DataType::Float32, true)), 4),\n false,\n ),\n ]))\n}\n";
-export const RsQuickstartOpenTable = "let table: Table = db.open_table(\"adventurers\").execute().await.unwrap();\n";
+export const RsQuickstartMultimodalBytes = "use std::sync::Arc;\n\nuse arrow_array::{\n BinaryArray, FixedSizeListArray, LargeStringArray, RecordBatch, RecordBatchIterator,\n};\nuse arrow_schema::{DataType, Field, Schema};\n\nlet image_path = std::path::Path::new(env!(\"CARGO_MANIFEST_DIR\"))\n .join(\"../../docs/static/assets/images/quickstart/sir-lancelot.jpg\");\nlet image_bytes = std::fs::read(image_path).unwrap();\n\nlet image_schema = Arc::new(Schema::new(vec![\n Field::new(\"id\", DataType::LargeUtf8, false),\n Field::new(\"description\", DataType::LargeUtf8, false),\n Field::new(\"image\", DataType::Binary, false),\n Field::new(\n \"vector\",\n DataType::FixedSizeList(Arc::new(Field::new(\"item\", DataType::Float32, true)), 4),\n false,\n ),\n]));\nlet image_vectors = [[0.9_f32, 0.1, 0.5, 0.4]];\nlet image_batch = RecordBatch::try_new(\n image_schema.clone(),\n vec![\n Arc::new(LargeStringArray::from_iter_values([\"lancelot\"])),\n Arc::new(LargeStringArray::from_iter_values([\n \"Portrait of Sir Lancelot\",\n ])),\n Arc::new(BinaryArray::from_iter_values([image_bytes.as_slice()])),\n Arc::new(\n FixedSizeListArray::from_iter_primitive::<Float32Type, _, _>(\n image_vectors\n .iter()\n .map(|vector| Some(vector.iter().copied().map(Some).collect::<Vec<_>>())),\n 4,\n ),\n ),\n ],\n)\n.unwrap();\nlet image_reader: Box<dyn arrow_array::RecordBatchReader + Send> = Box::new(\n RecordBatchIterator::new(vec![Ok(image_batch)].into_iter(), image_schema),\n);\nlet multimodal_table = db\n .create_table(\"character_images\", image_reader)\n .mode(CreateTableMode::Overwrite)\n .execute()\n .await\n .unwrap();\n";
-export const RsQuickstartOutputArray = "let result: DataFrame = table\n .query()\n .nearest_to(&query_vector)\n .unwrap()\n .limit(2)\n .select(Select::Columns(vec![\"text\".to_string()]))\n .execute()\n .await\n .unwrap()\n .into_polars()\n .await\n .unwrap();\nprintln!(\"{result:?}\");\nlet text_col = result.column(\"text\").unwrap().str().unwrap();\nlet top_two = vec![\n text_col.get(0).unwrap().to_string(),\n text_col.get(1).unwrap().to_string(),\n];\n";
+export const RsQuickstartOutputArray = "let result: DataFrame = table\n .query()\n .nearest_to(&query_vector)\n .unwrap()\n .select(Select::Columns(vec![\n \"name\".to_string(),\n \"role\".to_string(),\n \"description\".to_string(),\n \"_distance\".to_string(),\n ]))\n .limit(2)\n .execute()\n .await\n .unwrap()\n .into_polars()\n .await\n .unwrap();\nprintln!(\"{result:?}\");\n";
-export const RsQuickstartVectorSearch1 = "// Let's search for vectors similar to \"warrior\"\nlet query_vector = [0.8, 0.3, 0.8];\n\nlet result: DataFrame = table\n .query()\n .nearest_to(&query_vector)\n .unwrap()\n .limit(2)\n .select(Select::Columns(vec![\"text\".to_string()]))\n .execute()\n .await\n .unwrap()\n .into_polars()\n .await\n .unwrap();\nprintln!(\"{result:?}\");\n";
+export const RsQuickstartQueryFeature = "let features: DataFrame = table\n .query()\n .select(Select::Columns(vec![\n \"name\".to_string(),\n \"role\".to_string(),\n \"power_score\".to_string(),\n ]))\n .execute()\n .await\n .unwrap()\n .into_polars()\n .await\n .unwrap();\nprintln!(\"{features:?}\");\n";
-export const RsQuickstartVectorSearch2 = "// Let's search for vectors similar to \"wizard\"\nlet query_vector = [0.7, 0.3, 0.5];\n\nlet result: DataFrame = table\n .query()\n .nearest_to(&query_vector)\n .unwrap()\n .limit(2)\n .select(Select::Columns(vec![\"text\".to_string()]))\n .execute()\n .await\n .unwrap()\n .into_polars()\n .await\n .unwrap();\nprintln!(\"{result:?}\");\nlet text_col = result.column(\"text\").unwrap().str().unwrap();\nlet top_two = vec![\n text_col.get(0).unwrap().to_string(),\n text_col.get(1).unwrap().to_string(),\n];\n";
+export const RsQuickstartVectorSearch1 = "// Search for examples similar to a \"wise magical advisor\"\nlet query_vector = [0.2, 0.8, 0.4, 0.9];\n\nlet result: DataFrame = table\n .query()\n .nearest_to(&query_vector)\n .unwrap()\n .select(Select::Columns(vec![\n \"name\".to_string(),\n \"role\".to_string(),\n \"description\".to_string(),\n \"_distance\".to_string(),\n ]))\n .limit(2)\n .execute()\n .await\n .unwrap()\n .into_polars()\n .await\n .unwrap();\nprintln!(\"{result:?}\");\n";
diff --git a/docs/static/assets/images/quickstart/sir-lancelot.jpg b/docs/static/assets/images/quickstart/sir-lancelot.jpg
new file mode 100644
index 00000000..e987d60e
Binary files /dev/null and b/docs/static/assets/images/quickstart/sir-lancelot.jpg differ
diff --git a/docs/storage/configuration.mdx b/docs/storage/configuration.mdx
index 0a00be9f..0c9a0d8f 100644
--- a/docs/storage/configuration.mdx
+++ b/docs/storage/configuration.mdx
@@ -36,7 +36,7 @@ When using LanceDB OSS, you can choose where to store your data. The tradeoffs b
**LanceDB Enterprise storage configuration**
-In LanceDB Enterprise, you connect with `db://...` and the cluster owns the storage credentials, so `storage_options` are not passed at runtime. Cloud auth is set at deployment time. For federated databases, the namespace service vends per-request credentials automatically. See the [Enterprise quickstart](/enterprise/quickstart) and the [Azure deployment guide](/enterprise/deployment/azure) for the Enterprise flow.
+In LanceDB Enterprise, you connect with `db://...` and the cluster owns the storage credentials, so `storage_options` are not passed at runtime. Cloud auth is set at deployment time. For federated databases, the namespace service vends per-request credentials automatically. See the [quickstart](/quickstart), [Enterprise overview](/enterprise/), and [Azure deployment guide](/enterprise/deployment/azure) for the Enterprise flow.
## Object stores
diff --git a/tests/py/test_quickstart.py b/tests/py/test_quickstart.py
index 58d25c28..207e6540 100644
--- a/tests/py/test_quickstart.py
+++ b/tests/py/test_quickstart.py
@@ -4,68 +4,151 @@
import lancedb
import pytest
+
def test_quickstart(db_path_factory):
- uri = "quickstart_db"
uri = db_path_factory("quickstart_db")
db = lancedb.connect(uri)
- # --8<-- [start:quickstart_create_table]
+ # --8<-- [start:quickstart_data]
data = [
- {"id": "1", "text": "knight", "vector": [0.9, 0.4, 0.8]},
- {"id": "2", "text": "ranger", "vector": [0.8, 0.4, 0.7]},
- {"id": "9", "text": "priest", "vector": [0.6, 0.2, 0.6]},
- {"id": "4", "text": "rogue", "vector": [0.7, 0.4, 0.7]},
+ {
+ "id": "1",
+ "name": "King Arthur",
+ "role": "King",
+ "description": "Leader of Camelot and wielder of Excalibur.",
+ "stats": {"strength": 4, "magic": 1, "leadership": 5, "wisdom": 4},
+ "vector": [0.7, 0.1, 0.9, 0.7],
+ },
+ {
+ "id": "2",
+ "name": "Merlin",
+ "role": "Wizard",
+ "description": "Advisor and prophet with deep magical knowledge.",
+ "stats": {"strength": 2, "magic": 5, "leadership": 4, "wisdom": 5},
+ "vector": [0.2, 0.9, 0.4, 0.9],
+ },
+ {
+ "id": "3",
+ "name": "Sir Lancelot",
+ "role": "Knight",
+ "description": "Legendary knight known for courage and combat skill.",
+ "stats": {"strength": 5, "magic": 1, "leadership": 3, "wisdom": 3},
+ "vector": [0.9, 0.1, 0.5, 0.4],
+ },
]
- table = db.create_table("adventurers", data=data, mode="overwrite")
+ # --8<-- [end:quickstart_data]
+
+ # --8<-- [start:quickstart_create_table]
+ table = db.create_table("characters", data=data, mode="overwrite")
# --8<-- [end:quickstart_create_table]
- assert len(table) == 4
+ assert len(table) == 3
# Drop the table to test create without overwrite
- db.drop_table("adventurers")
+ db.drop_table("characters")
# --8<-- [start:quickstart_create_table_no_overwrite]
- table = db.create_table("adventurers", data=data)
+ table = db.create_table("characters", data=data)
# --8<-- [end:quickstart_create_table_no_overwrite]
- assert len(table) == 4
+ assert len(table) == 3
# --8<-- [start:quickstart_vector_search_1]
- # Let's search for vectors similar to "warrior"
- query_vector = [0.8, 0.3, 0.8]
+ # Search for examples similar to a "wise magical advisor"
+ query_vector = [0.2, 0.8, 0.4, 0.9]
# Ensure you run `pip install polars` beforehand
- result = table.search(query_vector).limit(2).to_polars()
+ result = (
+ table.search(query_vector)
+ .select(["name", "role", "description", "_distance"])
+ .limit(2)
+ .to_polars()
+ )
print(result)
# --8<-- [end:quickstart_vector_search_1]
- assert result.head(1)["text"][0] == "knight"
+ assert result.head(1)["name"][0] == "Merlin"
+
+ # --8<-- [start:quickstart_curate_with_metadata]
+ curated = (
+ table.search(query_vector)
+ .where("stats.magic >= 4")
+ .select(["name", "role", "description", "_distance"])
+ .limit(2)
+ .to_polars()
+ )
+ print(curated)
+ # --8<-- [end:quickstart_curate_with_metadata]
+ assert curated.head(1)["name"][0] == "Merlin"
# --8<-- [start:quickstart_output_pandas]
# Ensure you run `pip install pandas` beforehand
result = table.search(query_vector).limit(2).to_pandas()
print(result)
# --8<-- [end:quickstart_output_pandas]
- assert result.iloc[0]["text"] == "knight"
+ assert result.iloc[0]["name"] == "Merlin"
+
+ # --8<-- [start:quickstart_add_feature]
+ table.add_columns(
+ {
+ "power_score": "cast(((stats.strength + stats.magic + stats.leadership + stats.wisdom) / 4.0) as float)"
+ }
+ )
+ # --8<-- [end:quickstart_add_feature]
+ assert "power_score" in table.schema.names
+
+ # --8<-- [start:quickstart_query_feature]
+ features = table.search().select(["name", "role", "power_score"]).to_polars()
+ print(features)
+ # --8<-- [end:quickstart_query_feature]
+ assert "power_score" in features.columns
+
+ # --8<-- [start:quickstart_multimodal_bytes]
+ from pathlib import Path
+
+ image_path = Path("docs/static/assets/images/quickstart/sir-lancelot.jpg")
+ image_bytes = image_path.read_bytes()
+
+ multimodal_table = db.create_table(
+ "character_images",
+ data=[
+ {
+ "id": "lancelot",
+ "description": "Portrait of Sir Lancelot",
+ "image": image_bytes,
+ "vector": [0.9, 0.1, 0.5, 0.4],
+ }
+ ],
+ mode="overwrite",
+ )
+ # --8<-- [end:quickstart_multimodal_bytes]
+ assert len(multimodal_table) == 1
# --8<-- [start:quickstart_open_table]
- table = db.open_table("adventurers")
+ table = db.open_table("characters")
# --8<-- [end:quickstart_open_table]
# --8<-- [start:quickstart_add_data]
more_data = [
- {"id": "7", "text": "mage", "vector": [0.6, 0.3, 0.4]},
- {"id": "8", "text": "bard", "vector": [0.3, 0.8, 0.4]},
+ {
+ "id": "4",
+ "name": "Morgana",
+ "role": "Sorceress",
+ "description": "Powerful sorceress of Avalon.",
+ "stats": {"strength": 2, "magic": 5, "leadership": 4, "wisdom": 4},
+ "vector": [0.3, 0.9, 0.6, 0.8],
+ "power_score": 3.75,
+ },
]
# Add data to table
table.add(more_data)
# --8<-- [end:quickstart_add_data]
- assert len(table) == 6
+ assert len(table) == 4
# --8<-- [start:quickstart_vector_search_2]
- # Let's search for vectors similar to "wizard"
- query_vector = [0.7, 0.3, 0.5]
+ # Search for examples similar to a "powerful sorceress"
+ query_vector = [0.3, 0.9, 0.6, 0.8]
results = table.search(query_vector).limit(2).to_polars()
print(results)
# --8<-- [end:quickstart_vector_search_2]
- assert results.head(1)["text"][0] == "mage"
+ assert results.head(1)["name"][0] == "Morgana"
@pytest.mark.asyncio
@@ -74,28 +157,52 @@ async def test_quickstart_async_api(db_path_factory):
import lancedb
async_db = await lancedb.connect_async(db_uri)
+ # --8<-- [start:quickstart_data_async]
data = [
- {"id": "1", "text": "knight", "vector": [0.9, 0.4, 0.8]},
- {"id": "2", "text": "ranger", "vector": [0.8, 0.4, 0.7]},
- {"id": "9", "text": "priest", "vector": [0.6, 0.2, 0.6]},
- {"id": "4", "text": "rogue", "vector": [0.7, 0.4, 0.7]},
+ {
+ "id": "1",
+ "name": "King Arthur",
+ "role": "King",
+ "description": "Leader of Camelot and wielder of Excalibur.",
+ "stats": {"strength": 4, "magic": 1, "leadership": 5, "wisdom": 4},
+ "vector": [0.7, 0.1, 0.9, 0.7],
+ },
+ {
+ "id": "2",
+ "name": "Merlin",
+ "role": "Wizard",
+ "description": "Advisor and prophet with deep magical knowledge.",
+ "stats": {"strength": 2, "magic": 5, "leadership": 4, "wisdom": 5},
+ "vector": [0.2, 0.9, 0.4, 0.9],
+ },
+ {
+ "id": "3",
+ "name": "Sir Lancelot",
+ "role": "Knight",
+ "description": "Legendary knight known for courage and combat skill.",
+ "stats": {"strength": 5, "magic": 1, "leadership": 3, "wisdom": 3},
+ "vector": [0.9, 0.1, 0.5, 0.4],
+ },
]
+ # --8<-- [end:quickstart_data_async]
# --8<-- [start:quickstart_create_table_async]
async_table = await async_db.create_table(
- "adventurers",
+ "characters",
data=data,
mode="overwrite",
)
# --8<-- [end:quickstart_create_table_async]
# --8<-- [start:quickstart_vector_search_1_async]
- # Let's search for vectors similar to "warrior"
- query_vector = [0.8, 0.3, 0.8]
+ # Search for examples similar to a "wise magical advisor"
+ query_vector = [0.2, 0.8, 0.4, 0.9]
# Ensure you run `pip install polars` beforehand
- async_result = await (await async_table.search(query_vector)).limit(2).to_polars()
+ async_result = await (
+ await async_table.search(query_vector)
+ ).select(["name", "role", "description", "_distance"]).limit(2).to_polars()
print(async_result)
# --8<-- [end:quickstart_vector_search_1_async]
- assert async_result.head(1)["text"][0] == "knight"
+ assert async_result.head(1)["name"][0] == "Merlin"
diff --git a/tests/rs/quickstart.rs b/tests/rs/quickstart.rs
index ebe4dffc..bc966aab 100644
--- a/tests/rs/quickstart.rs
+++ b/tests/rs/quickstart.rs
@@ -4,31 +4,56 @@
use std::sync::Arc;
use arrow_array::types::Float32Type;
-use arrow_array::{FixedSizeListArray, LargeStringArray, RecordBatch, RecordBatchIterator};
-use arrow_schema::{DataType, Field, Schema};
+use arrow_array::{
+ FixedSizeListArray, Int8Array, LargeStringArray, RecordBatch, RecordBatchIterator, StructArray,
+};
+use arrow_schema::{DataType, Field, FieldRef, Schema};
use lancedb::arrow::IntoPolars;
use lancedb::database::CreateTableMode;
use lancedb::query::{ExecutableQuery, QueryBase, Select};
-use lancedb::{connect, table::Table};
+use lancedb::{connect, table::NewColumnTransform};
use polars::prelude::DataFrame;
use serde::{Deserialize, Serialize};
// --8<-- [start:quickstart_define_struct]
-// Define a struct representing the data schema
+// Define structs representing the data schema
#[derive(Debug, Clone, Serialize, Deserialize)]
-struct Adventurer {
+struct Stats {
+ strength: i8,
+ magic: i8,
+ leadership: i8,
+ wisdom: i8,
+}
+
+#[derive(Debug, Clone, Serialize, Deserialize)]
+struct Character {
id: String,
- text: String,
- vector: [f32; 3],
+ name: String,
+ role: String,
+ description: String,
+ stats: Stats,
+ vector: [f32; 4],
}
-fn adventurers_schema() -> Arc<Schema> {
+fn characters_schema() -> Arc<Schema> {
Arc::new(Schema::new(vec![
Field::new("id", DataType::LargeUtf8, false),
- Field::new("text", DataType::LargeUtf8, false),
+ Field::new("name", DataType::LargeUtf8, false),
+ Field::new("role", DataType::LargeUtf8, false),
+ Field::new("description", DataType::LargeUtf8, false),
+ Field::new(
+ "stats",
+ DataType::Struct(arrow_schema::Fields::from(vec![
+ Arc::new(Field::new("strength", DataType::Int8, false)),
+ Arc::new(Field::new("magic", DataType::Int8, false)),
+ Arc::new(Field::new("leadership", DataType::Int8, false)),
+ Arc::new(Field::new("wisdom", DataType::Int8, false)),
+ ])),
+ false,
+ ),
Field::new(
"vector",
- DataType::FixedSizeList(Arc::new(Field::new("item", DataType::Float32, true)), 3),
+ DataType::FixedSizeList(Arc::new(Field::new("item", DataType::Float32, true)), 4),
false,
),
]))
@@ -37,22 +62,57 @@ fn adventurers_schema() -> Arc<Schema> {
type BatchIter = Box<dyn arrow_array::RecordBatchReader + Send>;
-fn adventurers_to_reader(schema: Arc<Schema>, rows: &[Adventurer]) -> BatchIter {
+fn characters_to_reader(schema: Arc<Schema>, rows: &[Character]) -> BatchIter {
let ids = LargeStringArray::from_iter_values(rows.iter().map(|row| row.id.as_str()));
- let texts = LargeStringArray::from_iter_values(rows.iter().map(|row| row.text.as_str()));
+ let names = LargeStringArray::from_iter_values(rows.iter().map(|row| row.name.as_str()));
+ let roles = LargeStringArray::from_iter_values(rows.iter().map(|row| row.role.as_str()));
+ let descriptions =
+ LargeStringArray::from_iter_values(rows.iter().map(|row| row.description.as_str()));
+
+ let strength = Int8Array::from_iter_values(rows.iter().map(|row| row.stats.strength));
+ let magic = Int8Array::from_iter_values(rows.iter().map(|row| row.stats.magic));
+ let leadership = Int8Array::from_iter_values(rows.iter().map(|row| row.stats.leadership));
+ let wisdom = Int8Array::from_iter_values(rows.iter().map(|row| row.stats.wisdom));
+ let stats_fields: Vec<FieldRef> = vec![
+ Arc::new(Field::new("strength", DataType::Int8, false)),
+ Arc::new(Field::new("magic", DataType::Int8, false)),
+ Arc::new(Field::new("leadership", DataType::Int8, false)),
+ Arc::new(Field::new("wisdom", DataType::Int8, false)),
+ ];
+ let stats = StructArray::new(
+ stats_fields.into(),
+ vec![
+ Arc::new(strength),
+ Arc::new(magic),
+ Arc::new(leadership),
+ Arc::new(wisdom),
+ ],
+ None,
+ );
+
let vectors = FixedSizeListArray::from_iter_primitive::<Float32Type, _, _>(
rows.iter()
.map(|row| Some(row.vector.iter().copied().map(Some).collect::<Vec<_>>())),
- 3,
+ 4,
);
let batch = RecordBatch::try_new(
schema.clone(),
- vec![Arc::new(ids), Arc::new(texts), Arc::new(vectors)],
+ vec![
+ Arc::new(ids),
+ Arc::new(names),
+ Arc::new(roles),
+ Arc::new(descriptions),
+ Arc::new(stats),
+ Arc::new(vectors),
+ ],
)
.unwrap();
- Box::new(RecordBatchIterator::new(vec![Ok(batch)].into_iter(), schema))
+ Box::new(RecordBatchIterator::new(
+ vec![Ok(batch)].into_iter(),
+ schema,
+ ))
}
#[tokio::main]
@@ -61,61 +121,86 @@ async fn main() {
let uri = temp_dir.path().to_str().unwrap();
let db = connect(uri).execute().await.unwrap();
- // --8<-- [start:quickstart_create_table]
- // Define an arrow schema named adventurers_schema beforehand (omitted here for brevity)
- let schema = adventurers_schema();
+ // --8<-- [start:quickstart_data]
let data = vec![
- Adventurer {
+ Character {
id: "1".to_string(),
- text: "knight".to_string(),
- vector: [0.9, 0.4, 0.8],
+ name: "King Arthur".to_string(),
+ role: "King".to_string(),
+ description: "Leader of Camelot and wielder of Excalibur.".to_string(),
+ stats: Stats {
+ strength: 4,
+ magic: 1,
+ leadership: 5,
+ wisdom: 4,
+ },
+ vector: [0.7, 0.1, 0.9, 0.7],
},
- Adventurer {
+ Character {
id: "2".to_string(),
- text: "ranger".to_string(),
- vector: [0.8, 0.4, 0.7],
+ name: "Merlin".to_string(),
+ role: "Wizard".to_string(),
+ description: "Advisor and prophet with deep magical knowledge.".to_string(),
+ stats: Stats {
+ strength: 2,
+ magic: 5,
+ leadership: 4,
+ wisdom: 5,
+ },
+ vector: [0.2, 0.9, 0.4, 0.9],
},
- Adventurer {
- id: "9".to_string(),
- text: "priest".to_string(),
- vector: [0.6, 0.2, 0.6],
- },
- Adventurer {
- id: "4".to_string(),
- text: "rogue".to_string(),
- vector: [0.7, 0.4, 0.7],
+ Character {
+ id: "3".to_string(),
+ name: "Sir Lancelot".to_string(),
+ role: "Knight".to_string(),
+ description: "Legendary knight known for courage and combat skill.".to_string(),
+ stats: Stats {
+ strength: 5,
+ magic: 1,
+ leadership: 3,
+ wisdom: 3,
+ },
+ vector: [0.9, 0.1, 0.5, 0.4],
},
];
- // Create a new table with the data, overwriting if it already exists
- let mut table = db
- .create_table("adventurers", adventurers_to_reader(schema.clone(), &data))
+ // --8<-- [end:quickstart_data]
+
+ // --8<-- [start:quickstart_create_table]
+ let schema = characters_schema();
+ let table = db
+ .create_table("characters", characters_to_reader(schema.clone(), &data))
.mode(CreateTableMode::Overwrite)
.execute()
.await
.unwrap();
// --8<-- [end:quickstart_create_table]
- assert_eq!(table.count_rows(None).await.unwrap(), 4);
- db.drop_table("adventurers", &[]).await.unwrap();
+ assert_eq!(table.count_rows(None).await.unwrap(), 3);
+ db.drop_table("characters", &[]).await.unwrap();
// --8<-- [start:quickstart_create_table_no_overwrite]
- table = db
- .create_table("adventurers", adventurers_to_reader(schema.clone(), &data))
+ let table = db
+ .create_table("characters", characters_to_reader(schema.clone(), &data))
.execute()
.await
.unwrap();
// --8<-- [end:quickstart_create_table_no_overwrite]
- assert_eq!(table.count_rows(None).await.unwrap(), 4);
+ assert_eq!(table.count_rows(None).await.unwrap(), 3);
// --8<-- [start:quickstart_vector_search_1]
- // Let's search for vectors similar to "warrior"
- let query_vector = [0.8, 0.3, 0.8];
+ // Search for examples similar to a "wise magical advisor"
+ let query_vector = [0.2, 0.8, 0.4, 0.9];
let result: DataFrame = table
.query()
.nearest_to(&query_vector)
.unwrap()
+ .select(Select::Columns(vec![
+ "name".to_string(),
+ "role".to_string(),
+ "description".to_string(),
+ "_distance".to_string(),
+ ]))
.limit(2)
- .select(Select::Columns(vec!["text".to_string()]))
.execute()
.await
.unwrap()
@@ -124,16 +209,45 @@ async fn main() {
.unwrap();
println!("{result:?}");
// --8<-- [end:quickstart_vector_search_1]
- let text_col = result.column("text").unwrap().str().unwrap();
- assert_eq!(text_col.get(0).unwrap(), "knight");
+ let name_col = result.column("name").unwrap().str().unwrap();
+ assert_eq!(name_col.get(0).unwrap(), "Merlin");
+
+ // --8<-- [start:quickstart_curate_with_metadata]
+ let curated: DataFrame = table
+ .query()
+ .nearest_to(&query_vector)
+ .unwrap()
+ .only_if("stats.magic >= 4")
+ .select(Select::Columns(vec![
+ "name".to_string(),
+ "role".to_string(),
+ "description".to_string(),
+ "_distance".to_string(),
+ ]))
+ .limit(2)
+ .execute()
+ .await
+ .unwrap()
+ .into_polars()
+ .await
+ .unwrap();
+ println!("{curated:?}");
+ // --8<-- [end:quickstart_curate_with_metadata]
+ let curated_name_col = curated.column("name").unwrap().str().unwrap();
+ assert_eq!(curated_name_col.get(0).unwrap(), "Merlin");
// --8<-- [start:quickstart_output_array]
let result: DataFrame = table
.query()
.nearest_to(&query_vector)
.unwrap()
+ .select(Select::Columns(vec![
+ "name".to_string(),
+ "role".to_string(),
+ "description".to_string(),
+ "_distance".to_string(),
+ ]))
.limit(2)
- .select(Select::Columns(vec!["text".to_string()]))
.execute()
.await
.unwrap()
@@ -141,63 +255,93 @@ async fn main() {
.await
.unwrap();
println!("{result:?}");
- let text_col = result.column("text").unwrap().str().unwrap();
- let top_two = vec![
- text_col.get(0).unwrap().to_string(),
- text_col.get(1).unwrap().to_string(),
- ];
// --8<-- [end:quickstart_output_array]
- assert_eq!(top_two[0], "knight");
-
- // --8<-- [start:quickstart_open_table]
- let table: Table = db.open_table("adventurers").execute().await.unwrap();
- // --8<-- [end:quickstart_open_table]
-
- // --8<-- [start:quickstart_add_data]
- let more_data = vec![
- Adventurer {
- id: "7".to_string(),
- text: "mage".to_string(),
- vector: [0.6, 0.3, 0.4],
- },
- Adventurer {
- id: "8".to_string(),
- text: "bard".to_string(),
- vector: [0.3, 0.8, 0.4],
- },
- ];
+ let name_col = result.column("name").unwrap().str().unwrap();
+ assert_eq!(name_col.get(0).unwrap(), "Merlin");
- // Add data to table
+ // --8<-- [start:quickstart_add_feature]
table
- .add(adventurers_to_reader(schema.clone(), &more_data))
- .execute()
+ .add_columns(
+ NewColumnTransform::SqlExpressions(vec![(
+ "power_score".to_string(),
+ "cast(((stats.strength + stats.magic + stats.leadership + stats.wisdom) / 4.0) as float)"
+ .to_string(),
+ )]),
+ None,
+ )
.await
.unwrap();
- // --8<-- [end:quickstart_add_data]
- assert_eq!(table.count_rows(None).await.unwrap(), 6);
-
- // --8<-- [start:quickstart_vector_search_2]
- // Let's search for vectors similar to "wizard"
- let query_vector = [0.7, 0.3, 0.5];
+ // --8<-- [end:quickstart_add_feature]
- let result: DataFrame = table
+ // --8<-- [start:quickstart_query_feature]
+ let features: DataFrame = table
.query()
- .nearest_to(&query_vector)
- .unwrap()
- .limit(2)
- .select(Select::Columns(vec!["text".to_string()]))
+ .select(Select::Columns(vec![
+ "name".to_string(),
+ "role".to_string(),
+ "power_score".to_string(),
+ ]))
.execute()
.await
.unwrap()
.into_polars()
.await
.unwrap();
- println!("{result:?}");
- let text_col = result.column("text").unwrap().str().unwrap();
- let top_two = vec![
- text_col.get(0).unwrap().to_string(),
- text_col.get(1).unwrap().to_string(),
- ];
- // --8<-- [end:quickstart_vector_search_2]
- assert_eq!(top_two[0], "mage");
+ println!("{features:?}");
+ // --8<-- [end:quickstart_query_feature]
+ assert!(features.column("power_score").is_ok());
+
+ // --8<-- [start:quickstart_multimodal_bytes]
+ use std::sync::Arc;
+
+ use arrow_array::{
+ BinaryArray, FixedSizeListArray, LargeStringArray, RecordBatch, RecordBatchIterator,
+ };
+ use arrow_schema::{DataType, Field, Schema};
+
+ let image_path = std::path::Path::new(env!("CARGO_MANIFEST_DIR"))
+ .join("../../docs/static/assets/images/quickstart/sir-lancelot.jpg");
+ let image_bytes = std::fs::read(image_path).unwrap();
+
+ let image_schema = Arc::new(Schema::new(vec![
+ Field::new("id", DataType::LargeUtf8, false),
+ Field::new("description", DataType::LargeUtf8, false),
+ Field::new("image", DataType::Binary, false),
+ Field::new(
+ "vector",
+ DataType::FixedSizeList(Arc::new(Field::new("item", DataType::Float32, true)), 4),
+ false,
+ ),
+ ]));
+ let image_vectors = [[0.9_f32, 0.1, 0.5, 0.4]];
+ let image_batch = RecordBatch::try_new(
+ image_schema.clone(),
+ vec![
+ Arc::new(LargeStringArray::from_iter_values(["lancelot"])),
+ Arc::new(LargeStringArray::from_iter_values([
+ "Portrait of Sir Lancelot",
+ ])),
+ Arc::new(BinaryArray::from_iter_values([image_bytes.as_slice()])),
+ Arc::new(
+ FixedSizeListArray::from_iter_primitive::<arrow_array::types::Float32Type, _, _>(
+ image_vectors
+ .iter()
+ .map(|vector| Some(vector.iter().copied().map(Some).collect::<Vec<_>>())),
+ 4,
+ ),
+ ),
+ ],
+ )
+ .unwrap();
+ let image_reader: Box<dyn arrow_array::RecordBatchReader + Send> = Box::new(
+ RecordBatchIterator::new(vec![Ok(image_batch)].into_iter(), image_schema),
+ );
+ let multimodal_table = db
+ .create_table("character_images", image_reader)
+ .mode(CreateTableMode::Overwrite)
+ .execute()
+ .await
+ .unwrap();
+ // --8<-- [end:quickstart_multimodal_bytes]
+ assert_eq!(multimodal_table.count_rows(None).await.unwrap(), 1);
}
diff --git a/tests/ts/quickstart.test.ts b/tests/ts/quickstart.test.ts
index 655ac8c5..ecd5c099 100644
--- a/tests/ts/quickstart.test.ts
+++ b/tests/ts/quickstart.test.ts
@@ -8,60 +8,168 @@ test("quickstart example (async)", async () => {
await withTempDirectory(async (databaseDir) => {
const db = await lancedb.connect(databaseDir);
- // --8<-- [start:quickstart_create_table]
+ // --8<-- [start:quickstart_data]
const data = [
- { id: "1", text: "knight", vector: [0.9, 0.4, 0.8] },
- { id: "2", text: "ranger", vector: [0.8, 0.4, 0.7] },
- { id: "9", text: "priest", vector: [0.6, 0.2, 0.6] },
- { id: "4", text: "rogue", vector: [0.7, 0.4, 0.7] },
+ {
+ id: "1",
+ name: "King Arthur",
+ role: "King",
+ description: "Leader of Camelot and wielder of Excalibur.",
+ stats: { strength: 4, magic: 1, leadership: 5, wisdom: 4 },
+ vector: [0.7, 0.1, 0.9, 0.7],
+ },
+ {
+ id: "2",
+ name: "Merlin",
+ role: "Wizard",
+ description: "Advisor and prophet with deep magical knowledge.",
+ stats: { strength: 2, magic: 5, leadership: 4, wisdom: 5 },
+ vector: [0.2, 0.9, 0.4, 0.9],
+ },
+ {
+ id: "3",
+ name: "Sir Lancelot",
+ role: "Knight",
+ description: "Legendary knight known for courage and combat skill.",
+ stats: { strength: 5, magic: 1, leadership: 3, wisdom: 3 },
+ vector: [0.9, 0.1, 0.5, 0.4],
+ },
];
- let table = await db.createTable("adventurers", data, { mode: "overwrite" });
+ // --8<-- [end:quickstart_data]
+
+ // --8<-- [start:quickstart_create_table]
+ let table = await db.createTable("characters", data, { mode: "overwrite" });
// --8<-- [end:quickstart_create_table]
- expect(await table.countRows()).toBe(4);
- await db.dropTable("adventurers");
+ expect(await table.countRows()).toBe(3);
+ await db.dropTable("characters");
// --8<-- [start:quickstart_create_table_no_overwrite]
- table = await db.createTable("adventurers", data);
+ table = await db.createTable("characters", data);
// --8<-- [end:quickstart_create_table_no_overwrite]
- expect(await table.countRows()).toBe(4);
+ expect(await table.countRows()).toBe(3);
// --8<-- [start:quickstart_vector_search_1]
- // Let's search for vectors similar to "warrior"
- let queryVector = [0.8, 0.3, 0.8];
+ // Search for examples similar to a "wise magical advisor"
+ let queryVector = [0.2, 0.8, 0.4, 0.9];
- let result = await table.search(queryVector).limit(2).toArray();
+ let result = await table
+ .search(queryVector)
+ .select(["name", "role", "description", "_distance"])
+ .limit(2)
+ .toArray();
console.table(result);
// --8<-- [end:quickstart_vector_search_1]
- expect(result[0].text).toBe("knight");
+ expect(result[0].name).toBe("Merlin");
+
+ // --8<-- [start:quickstart_curate_with_metadata]
+ const curated = await table
+ .search(queryVector)
+ .where("stats.magic >= 4")
+ .select(["name", "role", "description", "_distance"])
+ .limit(2)
+ .toArray();
+ console.table(curated);
+ // --8<-- [end:quickstart_curate_with_metadata]
+ expect(curated[0].name).toBe("Merlin");
// --8<-- [start:quickstart_output_array]
result = await table.search(queryVector).limit(2).toArray();
console.table(result);
// --8<-- [end:quickstart_output_array]
- expect(result[0].text).toBe("knight");
+ expect(result[0].name).toBe("Merlin");
+
+ // --8<-- [start:quickstart_add_feature]
+ await table.addColumns([
+ {
+ name: "power_score",
+ valueSql:
+ "cast(((stats.strength + stats.magic + stats.leadership + stats.wisdom) / 4.0) as float)",
+ },
+ ]);
+ // --8<-- [end:quickstart_add_feature]
+ const schemaWithFeature = await table.schema();
+ expect(schemaWithFeature.fields.some((f) => f.name === "power_score")).toBe(
+ true,
+ );
+
+ // --8<-- [start:quickstart_query_feature]
+ const features = await table
+ .query()
+ .select(["name", "role", "power_score"])
+ .toArray();
+ console.table(features);
+ // --8<-- [end:quickstart_query_feature]
+ expect(features[0]).toHaveProperty("power_score");
+
+ // --8<-- [start:quickstart_multimodal_bytes]
+ const arrow = await import("apache-arrow");
+ const path = await import("node:path");
+ const { readFile } = await import("node:fs/promises");
+
+ const imagePath = path.resolve(
+ "../../docs/static/assets/images/quickstart/sir-lancelot.jpg",
+ );
+ const imageBytes = await readFile(imagePath);
+ const imageSchema = new arrow.Schema([
+ new arrow.Field("id", new arrow.Utf8()),
+ new arrow.Field("description", new arrow.Utf8()),
+ new arrow.Field("image", new arrow.Binary()),
+ new arrow.Field(
+ "vector",
+ new arrow.FixedSizeList(
+ 4,
+ new arrow.Field("item", new arrow.Float32(), true),
+ ),
+ ),
+ ]);
+ const imageData = lancedb.makeArrowTable(
+ [
+ {
+ id: "lancelot",
+ description: "Portrait of Sir Lancelot",
+ image: imageBytes,
+ vector: [0.9, 0.1, 0.5, 0.4],
+ },
+ ],
+ { schema: imageSchema },
+ );
+ const multimodalTable = await db.createTable(
+ "character_images",
+ imageData,
+ { mode: "overwrite" },
+ );
+ // --8<-- [end:quickstart_multimodal_bytes]
+ expect(await multimodalTable.countRows()).toBe(1);
// --8<-- [start:quickstart_open_table]
- table = await db.openTable("adventurers");
+ table = await db.openTable("characters");
// --8<-- [end:quickstart_open_table]
// --8<-- [start:quickstart_add_data]
const moreData = [
- { id: "7", text: "mage", vector: [0.6, 0.3, 0.4] },
- { id: "8", text: "bard", vector: [0.3, 0.8, 0.4] },
+ {
+ id: "4",
+ name: "Morgana",
+ role: "Sorceress",
+ description: "Powerful sorceress of Avalon.",
+ stats: { strength: 2, magic: 5, leadership: 4, wisdom: 4 },
+ vector: [0.3, 0.9, 0.6, 0.8],
+ power_score: 3.75,
+ },
];
// Add data to table
await table.add(moreData);
// --8<-- [end:quickstart_add_data]
- expect(await table.countRows()).toBe(6);
+ expect(await table.countRows()).toBe(4);
// --8<-- [start:quickstart_vector_search_2]
- // Let's search for vectors similar to "wizard"
- queryVector = [0.7, 0.3, 0.5];
+ // Search for examples similar to a "powerful sorceress"
+ queryVector = [0.3, 0.9, 0.6, 0.8];
const results = await table.search(queryVector).limit(2).toArray();
console.table(results);
// --8<-- [end:quickstart_vector_search_2]
- expect(results[0].text).toBe("mage");
+ expect(results[0].name).toBe("Morgana");
});
});
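
Note on the expected values in the assertions above: the following sketch recomputes them by hand in plain TypeScript, with no LanceDB dependency. It assumes squared Euclidean (L2) distance for ranking, which is an assumption about the default metric rather than something this diff states. `squaredL2`, `rankByDistance`, and `powerScore` are hypothetical helper names used only for illustration; they are not part of the LanceDB API.

```typescript
// Hand-rolled check of the quickstart expectations (illustrative only).
type Stats = { strength: number; magic: number; leadership: number; wisdom: number };
type Character = { name: string; stats: Stats; vector: number[] };

// Same three rows as the quickstart_data snippet, trimmed to the fields used here.
const characters: Character[] = [
  {
    name: "King Arthur",
    stats: { strength: 4, magic: 1, leadership: 5, wisdom: 4 },
    vector: [0.7, 0.1, 0.9, 0.7],
  },
  {
    name: "Merlin",
    stats: { strength: 2, magic: 5, leadership: 4, wisdom: 5 },
    vector: [0.2, 0.9, 0.4, 0.9],
  },
  {
    name: "Sir Lancelot",
    stats: { strength: 5, magic: 1, leadership: 3, wisdom: 3 },
    vector: [0.9, 0.1, 0.5, 0.4],
  },
];

// Assumption: rows are ranked by squared L2 distance to the query vector.
function squaredL2(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0);
}

// Returns a sorted copy, nearest first; the input array is left untouched.
function rankByDistance(rows: Character[], query: number[]): Character[] {
  return [...rows].sort(
    (l, r) => squaredL2(l.vector, query) - squaredL2(r.vector, query),
  );
}

// Mirrors the SQL expression passed to addColumns: the mean of the four stats.
function powerScore(s: Stats): number {
  return (s.strength + s.magic + s.leadership + s.wisdom) / 4.0;
}

const query = [0.2, 0.8, 0.4, 0.9];

// Plain nearest-neighbour ranking over all rows.
const ranked = rankByDistance(characters, query);
console.log(ranked.map((c) => c.name)); // Merlin, King Arthur, Sir Lancelot

// Same ranking restricted to rows matching the "stats.magic >= 4" filter.
const curated = rankByDistance(
  characters.filter((c) => c.stats.magic >= 4),
  query,
);
console.log(curated.map((c) => c.name)); // Merlin

// Stats of the Morgana row appended in quickstart_add_data.
const morgana: Stats = { strength: 2, magic: 5, leadership: 4, wisdom: 4 };
console.log(powerScore(morgana)); // 3.75
```

Under this assumption, Merlin is the top hit with and without the metadata filter, and the literal `power_score: 3.75` supplied for Morgana equals what the `addColumns` expression would yield for her stats.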