diff --git a/docs/docs.json b/docs/docs.json index 7d1519f7..87db70bc 100644 --- a/docs/docs.json +++ b/docs/docs.json @@ -47,7 +47,6 @@ "group": "LanceDB Enterprise", "pages": [ "enterprise/index", - "enterprise/quickstart", "enterprise/architecture", "enterprise/security", "enterprise/benchmarks", diff --git a/docs/enterprise/architecture.mdx b/docs/enterprise/architecture.mdx index c42f547c..e4c9efa5 100644 --- a/docs/enterprise/architecture.mdx +++ b/docs/enterprise/architecture.mdx @@ -81,7 +81,7 @@ A [remote table](/tables-and-namespaces#understanding-tables) is the user-facing This is why Enterprise feels familiar at the API level while operationally behaving differently. Your application still issues table operations and queries, but it is no longer coupled to a local storage path or a single host. Instead, the cluster takes responsibility for execution, coordination, and background upkeep. In SDK terms, `open_table(...)` returns a `RemoteTable`. Architecturally, a remote table is the bridge between the client-facing API and the storage-backed system behind it. -This design makes LanceDB Enterprise suitable for catalog-backed layouts, see [Namespaces and the Catalog Model](/namespaces) for more details. For the basic application flow, see the [Enterprise quickstart](/enterprise/quickstart). +This design makes LanceDB Enterprise suitable for catalog-backed layouts, see [Namespaces and the Catalog Model](/namespaces) for more details. For the basic application flow, see the shared [quickstart](/quickstart). ## Read path diff --git a/docs/enterprise/index.mdx b/docs/enterprise/index.mdx index c2cd88f1..69ae3269 100644 --- a/docs/enterprise/index.mdx +++ b/docs/enterprise/index.mdx @@ -26,7 +26,7 @@ visibility. ### 1. 100B+ row scale -LanceDB Enterprise is built for demanding workloads that exceed the capabilities of a single machine, whether that's extremely large data volumes or a high number of concurrent queries. 
Instead of asking your +LanceDB Enterprise is built for demanding workloads that exceed the capabilities of a single machine, whether from extremely large data volumes or a high number of concurrent queries. Instead of asking your application to own caching, query scaling, and maintenance, Enterprise turns those into **platform** capabilities. This matters when your AI application moves past a prototype and starts serving real users, larger datasets, and @@ -134,6 +134,68 @@ monitoring. Both enterprise modes are designed for private networking, complianc Read More: [LanceDB Enterprise Deployment](/enterprise/deployment/) +## Usage differences between Enterprise and OSS + +The [quickstart](/quickstart) guide shows both local embedded connections and Enterprise `db://...` +connections. Once connected to LanceDB, the table API is largely the same: create a table, search, +filter, evolve the schema, and store multimodal records. However, there are some semantic differences +worth understanding when your code is talking to LanceDB Enterprise. + +### 1. Connection model + +In LanceDB Enterprise, your app connects via a `db://...` URI and sends requests to the cluster API. +The cluster executes table operations on your behalf. Your code is coupled to a **managed service endpoint**, +whereas embedded LanceDB is directly coupled to local or object-storage paths. + +### 2. Returned table type + +Connecting to an Enterprise table via `open_table(...)` returns a `RemoteTable`, unlike embedded LanceDB, +which returns a `LanceTable`. `RemoteTable` is a catalog-backed table accessed through a server/cluster, +and does not support all the same methods as `LanceTable` (see below). + +### 3. Materialization APIs + +For Python users working with LanceDB Enterprise, `RemoteTable` does not support table-level +materialization methods like `table.to_arrow()` or `table.to_pandas()`. This protects users from +accidentally materializing tables that are too large to fit in memory. 
+ +Instead, materialize results through query/search builders, for example +`table.search(...).limit(...).to_pandas()` or `table.query(...).to_arrow()`. For quick previews, use +`table.head()`. + +### 4. Maintenance lifecycle + +In Enterprise, maintenance operations like `optimize` and `compact_files` are handled by the cluster +as background work. You can trigger them manually, but they are not required for performance or +correctness in the same way they are in embedded LanceDB. + +That means maintenance is managed by platform behavior and cluster configuration, not by explicit +per-table maintenance calls in your application code. + +### 5. Guardrails and limits + +Enterprise can enforce platform-level guardrails, such as index/table limits and safety checks around +operations like `merge_insert` when too many rows are unindexed. Embedded LanceDB mostly exposes +storage/format-level behavior, and you tune many lifecycle tasks yourself. + +This means an operation in LanceDB Enterprise can fail due to service-level policy, not just because +of local table shape or schema mismatch. + +### 6. Cluster-managed background work + +In Enterprise, async writes and reindexing workflows are handled by cluster background systems. In +embedded LanceDB, if you want ongoing upkeep, you usually schedule and run it yourself in your +application or jobs. + +In practice, your app issues table operations, and the platform handles distributed orchestration for +maintenance and indexing in the background. + + +As a rule of thumb, all you need to remember is this: treat `db://...` as a remote service boundary, +use query builders to fetch results, and otherwise interact with your tables as you would in embedded +LanceDB. + + ## Which one should I use? 
[It's very simple to get started with OSS](/quickstart/): Get started with `pip install lancedb` and begin ingesting diff --git a/docs/enterprise/quickstart.mdx b/docs/enterprise/quickstart.mdx deleted file mode 100644 index cd19406b..00000000 --- a/docs/enterprise/quickstart.mdx +++ /dev/null @@ -1,224 +0,0 @@ ---- -title: "Enterprise Quickstart" -sidebarTitle: "Quickstart" -description: "Run the LanceDB quickstart workflow on a RemoteTable in LanceDB Enterprise." -icon: "rocket" ---- - -import { - PyConnectEnterpriseQuickstart, - TsConnectEnterpriseQuickstart, - RsConnectEnterpriseQuickstart, -} from '/snippets/connection.mdx'; -import { - PyQuickstartCreateTable, - PyQuickstartVectorSearch1, - PyQuickstartOpenTable, - PyQuickstartAddData, - PyQuickstartVectorSearch2, - TsQuickstartCreateTable, - TsQuickstartVectorSearch1, - TsQuickstartOpenTable, - TsQuickstartAddData, - TsQuickstartVectorSearch2, - RsQuickstartDefineStruct, - RsQuickstartCreateTable, - RsQuickstartVectorSearch1, - RsQuickstartOpenTable, - RsQuickstartAddData, - RsQuickstartVectorSearch2, -} from '/snippets/quickstart.mdx'; - -This quickstart follows a similar workflow as the [OSS quickstart](/quickstart), but uses a **`RemoteTable`** through a `db://...` connection. - - -To get a LanceDB Enterprise cluster setup and to obtain credentials and endpoint details, [contact our team](mailto:contact@lancedb.com) to get started. -This guide assumes your Enterprise cluster is already running. - - -## 1. Install LanceDB - - -```bash Python icon=Python -pip install lancedb -``` - -```bash TypeScript icon=js -npm install @lancedb/lancedb -``` - -```bash Rust icon=Rust -cargo add lancedb -``` - - -## 2. Connect to Enterprise (`db://...`) - - - - { "import lancedb\n\n" } - {PyConnectEnterpriseQuickstart} - - - - { "import * as lancedb from \"@lancedb/lancedb\";\n\n" } - {TsConnectEnterpriseQuickstart} - - - - { "use lancedb::connect;\n\n" } - {RsConnectEnterpriseQuickstart} - - - -## 3. 
Create a table (same sample data as the OSS quickstart) - - - - {PyQuickstartCreateTable} - - - - {TsQuickstartCreateTable} - - - - {RsQuickstartDefineStruct} - {RsQuickstartCreateTable} - - - -## 4. Run vector search - - - - {PyQuickstartVectorSearch1} - - - - {TsQuickstartVectorSearch1} - - - - {RsQuickstartVectorSearch1} - - - -## 5. Open table, add data, and query again - - - - {PyQuickstartOpenTable} - {PyQuickstartAddData} - {PyQuickstartVectorSearch2} - - - - {TsQuickstartOpenTable} - {TsQuickstartAddData} - {TsQuickstartVectorSearch2} - - - - { "use lancedb::table::Table;\n\n" } - {RsQuickstartOpenTable} - {RsQuickstartAddData} - {RsQuickstartVectorSearch2} - - - -## Differences between Enterprise and OSS usage - -As can be seen, the flow for working with a `RemoteTable` in Enterprise looks more or less -similar to the [OSS quickstart](/quickstart). However, there are some semantic differences: - -### 1. Connection model - -In LanceDB Enterprise, your app connects via a `db://...` URI and sends requests to the cluster API. The cluster executes table operations on your behalf. -Your code is coupled to a **managed service endpoint** (whereas in OSS, your code is directly coupled to storage paths). - -### 2. Returned table type - -Connecting to an Enterprise table via `open_table(...)` returns a `RemoteTable`, unlike in OSS, which returns a `LanceTable`. - -### 3. Materialization APIs - -For Python users working with LanceDB Enterprise, `RemoteTable` does not support table-level -materialization methods like `table.to_arrow()` or `table.to_pandas()`. This is to protect -users from accidentally materializing tables that are too large to fit in memory. - -Instead, you materialize results through query/search builders, for example `table.search(...).limit(...).to_pandas()` or `table.query(...).to_arrow()`. For quick previews, you can use `table.head()`. - -### 4. 
Maintenance lifecycle - -In Enterprise, maintenance operations like `optimize`, `compact_files` are handled by the cluster as background work. You can trigger them manually, but they are not required for performance or correctness in the same way they are in OSS. - -That means maintenance is managed by platform behavior and cluster configuration, not by explicit per-table maintenance calls in your application code. - -### 5. Guardrails and limits - -Enterprise can enforce platform-level guardrails, such as index/table limits and safety checks around operations like `merge_insert` when too many rows are unindexed. OSS mostly exposes storage/format-level behavior, and you tune many lifecycle tasks yourself. - -This means an operation in LanceDB Enterprise can fail due to service-level policy, not just because of local table shape or schema mismatch. - -### 6. Cluster-managed background work - -In Enterprise, async writes and reindexing workflows are handled by cluster background systems. In OSS, if you want ongoing upkeep, you usually schedule and run it yourself in your application or jobs. - -In practice, your app issues table operations, and the platform handles distributed orchestration for maintenance and indexing in the background. - - -As a rule of thumb, all you need to remember with regard to LanceDB Enterprise is this: treat `db://...` as a remote service boundary, use query builders to fetch results, and otherwise interact with your tables as you would in OSS.** - - -## Advanced usage via namespace-backed connections - -LanceDB Enterprise also supports namespace-backed catalog connections. This allows you to resolve tables by namespace, rather than by direct URI, and is accessed via the REST connection mode of `connect_namespace(...)`. This is useful when table location resolution and credential vending are handled by an external catalog/namespace service. 
- -```py Python icon=Python -import os -import lancedb - -ns_db = lancedb.connect_namespace( - "rest", - { - "uri": "https://", - "headers.Authorization": f"Bearer {os.environ['CATALOG_TOKEN']}", - }, -) - -# Namespace-scoped table resolution -table = ns_db.open_table("adventurers", namespace=["prod", "search"]) -``` - -This mode is useful when table location resolution and credential vending are handled by an external catalog/namespace service. - -If you want to stick to a common table flow, start with the `db://` RemoteTable flow shown above. - -## Further reading - -You can learn more about table operations, namespaces, and the architecture of LanceDB Enterprise in the following guides. - - - - Build on this quickstart with table creation, updates, and schema tips. - - - Learn how to use namespaces in LanceDB, and connect to an Enterprise namespace via REST. - - - Learn about the architecture of LanceDB Enterprise and how it achieves high performance at scale. - - \ No newline at end of file diff --git a/docs/index.mdx b/docs/index.mdx index c9cf4f75..2dcd73d6 100644 --- a/docs/index.mdx +++ b/docs/index.mdx @@ -90,7 +90,7 @@ for agents. Start here: href="/quickstart" > Get started with LanceDB in minutes. - + - Get started with LanceDB Enterprise in minutes. + Get started with LanceDB in minutes, including Enterprise `db://` connections. 
diff --git a/docs/quickstart.mdx b/docs/quickstart.mdx index c2ba3d85..8bcac2c7 100644 --- a/docs/quickstart.mdx +++ b/docs/quickstart.mdx @@ -18,28 +18,54 @@ import { TsConnectObjectStorage, } from '/snippets/connection.mdx'; import { + PyQuickstartData, PyQuickstartCreateTable, PyQuickstartCreateTableAsync, + PyQuickstartAddFeature, + PyQuickstartCurateWithMetadata, + PyQuickstartMultimodalBytes, + PyQuickstartQueryFeature, PyQuickstartVectorSearch1, PyQuickstartVectorSearch1Async, PyQuickstartOutputPandas, + RsQuickstartAddFeature, + RsQuickstartCurateWithMetadata, RsQuickstartCreateTable, + RsQuickstartData, RsQuickstartDefineStruct, + RsQuickstartMultimodalBytes, + RsQuickstartQueryFeature, RsQuickstartVectorSearch1, + TsQuickstartAddFeature, + TsQuickstartCurateWithMetadata, TsQuickstartCreateTable, + TsQuickstartData, + TsQuickstartMultimodalBytes, + TsQuickstartQueryFeature, TsQuickstartVectorSearch1, } from '/snippets/quickstart.mdx'; -The easiest way to get started with LanceDB is the open source version, which is an embedded database that -runs in-process (like SQLite). Let's get started in just a few steps! +As described in [the landing page](/), LanceDB provides one data layer for +curation, feature engineering, search and retrieval, and model training. Whether you are preparing +training data, building a RAG or agentic retrieval system, reviewing examples, or adding model-generated +features, you'll work with the same underlying table and search primitives. + +Let's get started in just a few steps! ## 1. Install LanceDB Install LanceDB in your client SDK. 
-```bash Python icon=Python -pip install lancedb # or uv add lancedb +```bash pip icon="terminal" +pip install lancedb +``` + +```bash uv icon="terminal" +uv add lancedb + +# Or, in an existing virtual environment: +uv pip install lancedb ``` ```bash TypeScript icon=js @@ -51,18 +77,43 @@ cargo add lancedb ``` +### Python pre-release builds + +To pick up the latest features and bug fixes +before the next stable release, install a pre-release from LanceDB's Fury index. + + +```bash pip icon="terminal" +pip install --pre --extra-index-url https://pypi.fury.io/lancedb/ lancedb +``` + +```bash uv icon="terminal" +uv venv +uv pip install --prerelease allow --index https://pypi.fury.io/lancedb/ lancedb + +# To add to pyproject.toml, use: +uv add --prerelease allow --index https://pypi.fury.io/lancedb/ lancedb +``` + + + +Pre-release builds receive the same level of testing as stable releases, but their availability is not guaranteed +for more than 6 months after release. For real-world workloads, we recommend you use the latest stable release +as far as possible. + + ## 2. Connect to a LanceDB database LanceDB supports several URI patterns to connect to a database. - A local filesystem path (when using it as an embedded library) - A `db://...` URI (when using LanceDB Enterprise) -- An object storage URI: `s3://...`, `gs://...`, or `az://...` (OSS mode) +- An object storage URI: `s3://...`, `gs://...`, or `az://...` (when connecting directly from the client SDK) -### Connect via local path with LanceDB +### Connect via local directory path -The simplest way to begin is to use LanceDB OSS. Simply import LanceDB as an embedded library in your -client SDK of choice and point to a local path. +The simplest way to begin is to use LanceDB as an embedded library. Import LanceDB in your +client SDK of choice and point to a local directory path. @@ -87,7 +138,7 @@ client SDK of choice and point to a local path. 
### Connect via object storage URIs -You can also connect LanceDB OSS directly to object storage: +You can also connect directly to object storage from the client SDK: @@ -112,9 +163,9 @@ For credentials, endpoints, and provider-specific options, see ### Connect to LanceDB Enterprise -If you're using LanceDB Enterprise, you can connect using a `db://` URI along -with the API key, region, and cluster endpoint you received from the LanceDB -team. Pass the cluster endpoint via `host_override` so the client routes +If you're using LanceDB Enterprise, you can connect to the remote database using the +`db://` URI along with the API key, region, and cluster endpoint you received from the +LanceDB team. Pass the cluster endpoint via `host_override` so the client routes requests to your deployment. @@ -137,25 +188,57 @@ requests to your deployment. `host_override` is the full URL of your cluster endpoint, including the scheme (`https://`) and a port if your deployment listens on a non-default one -(e.g. `https://your-enterprise-endpoint.com:443`). If you don't know the +(e.g. `https://your-enterprise-endpoint.com:443`). If you don't have the endpoint, [contact the LanceDB team](mailto:contact@lancedb.com). -For a walkthrough on how to use LanceDB Enterprise (including `RemoteTable` -semantics), see its [quickstart](/enterprise/quickstart). To learn -more about LanceDB Enterprise overall, see the -[Enterprise documentation](/enterprise). +To learn more about `RemoteTable` semantics and how Enterprise differs operationally from +embedded LanceDB, see the [Enterprise overview](/enterprise). + +## 3. Create a new table + +Let's create a small table of characters from the kingdom of Camelot. Each row stores source text, +metadata, structured fields, and a vector embedding in the same LanceDB table. + + +The embeddings we use in this example are synthetic and for demonstration purposes only. 
In a real AI +data workflow, you would generate them from text, images, audio, or video using an embedding model of choice. + + +Each row has source text, metadata, structured fields, and a vector: + +```json +{ + "id": "2", + "name": "Merlin", + "role": "Wizard", + "description": "Advisor and prophet with deep magical knowledge.", + "stats": { "strength": 2, "magic": 5, "leadership": 4, "wisdom": 5 }, + "vector": [0.2, 0.9, 0.4, 0.9] +} +``` + +The full raw records are included below: -## 3. Obtain data and ingest into LanceDB + + + + {PyQuickstartData} + -Let's look at an example. We have the following records of characters in an adventure board game. -The vector column holds 3-dimensional embeddings representing each character. + + {TsQuickstartData} + -To ingest the data into LanceDB, obtain data of the required shape -and pass in the data object to the `create_table` method as shown below. -Note that LanceDB tables require a schema. If you don't provide one, LanceDB -will infer it from the data. For the Rust snippet, you can find the helper functions in the -[code](https://github.com/lancedb/docs/blob/main/tests/rs/quickstart.rs). + + {RsQuickstartDefineStruct} + {RsQuickstartData} + + + + +You can now create a LanceDB table from those records. The code below creates a LanceDB table +with the appropriate schema and ingests the data. @@ -171,25 +254,19 @@ will infer it from the data. For the Rust snippet, you can find the helper funct - {RsQuickstartDefineStruct} {RsQuickstartCreateTable} - -The `vector` arrays here are synthetic and for demonstration purposes only. In your real-world -applications, you'd generate these vectors from the raw text fields using a suitable embedding model. - - -## 4. Run a vector similarity search +## 4. Semantic search -Now, let's perform a vector similarity search. The query vector should have the same -dimensionality as your data vectors and be generated using the same embedding model. 
-The search returns the most similar vectors based on a chosen distance metric (default is L2, -or Euclidean distance). +Search is a useful capability for all kinds of AI data pipelines. Below, we do a vector similarity +search for samples similar to a "_wise magical advisor_" (transforming the natural language query to +an embedding), and project only the columns needed by the next step. -Our query is a vector that represents a "warrior". Let's find the result that's most similar -to it! +Search (which requires random access) is a ubiquitous access pattern that appears in many workloads: +whether you're building a RAG or recommendation system, serving agent memory, or curating a training +dataset. @@ -220,28 +297,133 @@ to be used downstream in your application. - +## 5. Curation + +Searching for relevant results can be more useful when combined with metadata filters. +In this tiny example, we filter to examples with high `magic` stats. + + + + {PyQuickstartCurateWithMetadata} + + + + {TsQuickstartCurateWithMetadata} + + + + {RsQuickstartCurateWithMetadata} + + + +When working with large datasets, it's common to use the same pattern to filter on quality labels, +train/eval splits, numeric fields, categorical values, timestamp windows, or generated tags and labels. + +## 6. Add a derived feature + +Feature engineering is the process of cleaning up your data and creating new signals that +help your model learn and make better predictions, or help your agent retrieve more useful information. +In the example below, we add a `power_score` column from the structured `stats` fields. +Lance supports data evolution, so you can add new columns without rewriting the entire table. 
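As a sanity check on the derived feature, the arithmetic behind `power_score` can be reproduced in plain Python. This is only a sketch: it makes no LanceDB calls, the `power_score` helper is a hypothetical name introduced for illustration, and it assumes the feature is the mean of the four `stats` fields, as in the `(stats.strength + stats.magic + stats.leadership + stats.wisdom) / 4.0` expression used by the quickstart snippets. The stats values are copied from the sample records in this diff.

```python
# Plain-Python check of the power_score arithmetic; no LanceDB calls are made here.
# The stats values are copied from the quickstart sample records.
stats_by_name = {
    "King Arthur": {"strength": 4, "magic": 1, "leadership": 5, "wisdom": 4},
    "Merlin": {"strength": 2, "magic": 5, "leadership": 4, "wisdom": 5},
    "Sir Lancelot": {"strength": 5, "magic": 1, "leadership": 3, "wisdom": 3},
}


def power_score(stats):
    """Hypothetical helper: mean of the four stats, mirroring the SQL expression."""
    total = stats["strength"] + stats["magic"] + stats["leadership"] + stats["wisdom"]
    return total / 4.0


# Compute the derived feature for every sample record.
scores = {name: power_score(stats) for name, stats in stats_by_name.items()}
for name, score in scores.items():
    print(f"{name}: {score}")
```

The values computed here (3.5, 4.0, and 3.0) agree with the `power_score` column in the results table that follows the query snippets in this section.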
+ + + + {PyQuickstartAddFeature} + + + + {TsQuickstartAddFeature} + + + + { "use lancedb::table::NewColumnTransform;\n\n" } + {RsQuickstartAddFeature} + + + +Next, you can query a compact view of the new feature: + + + + {PyQuickstartQueryFeature} + + + + {TsQuickstartQueryFeature} + + + + {RsQuickstartQueryFeature} + + + +| name | role | power_score | +| --- | --- | --- | +| King Arthur | King | 3.5 | +| Merlin | Wizard | 4.0 | +| Sir Lancelot | Knight | 3.0 | + +The same workflow is used for data preparation tasks when adding derived features, cached model signals, review scores, or dataset +quality indicators. + +## 7. Store multimodal data + +Multimodal data is a first-class citizen in LanceDB. Binary data (image, audio, video, etc.) is +stored as blobs or inline Arrow binary types in a LanceDB column, and they benefit from the same +table operations and data versioning semantics as other data types. All the data is governed +in the same table, so you can search, filter, and retrieve multimodal records together with structured +fields, metadata, and embeddings. + +In this example, the +[`lancedb/magical_kingdom`](https://huggingface.co/datasets/lancedb/magical_kingdom) dataset stores +character images, descriptions, structured stats, image embeddings, and text embeddings together. + +Say we downloaded the image for Sir Lancelot from that dataset locally. You can read the image bytes +in your client SDK and store them in a LanceDB column. The image bytes can be used for downstream tasks +like retrieval, evaluation, or training. + +
+ Sir Lancelot from the lancedb/magical_kingdom dataset +
+ +These snippets load the local image file and store the bytes in an `image` column: + + + + {PyQuickstartMultimodalBytes} + + + + {TsQuickstartMultimodalBytes} + + + + {RsQuickstartMultimodalBytes} + + + +For more examples, see the [multimodal data](/tables/multimodal) section. + +## Code + See the full code for these examples (including helper functions) in the `quickstart` file for the appropriate client language in the -[files provided here](https://github.com/lancedb/docs/tree/main/tests). -
+[files provided in the repo](https://github.com/lancedb/docs/tree/main/tests). ## What's next? -You've learned how to install LanceDB, connect, create a table, and run a first -vector search. In the real world, embeddings capture meaning and vector search -allows you to find the most relevant data based on semantic similarity. - -Note that LanceDB is much more than "just a vector database" -- it's -[a multimodal lakehouse](https://lancedb.com/blog/multimodal-lakehouse/). -There's a lot more you can do with it! Continue -to the [Table management](/tables/) guide to build on -this example with schema options, appending data, updates, and versioning. +You've learned how to install LanceDB, connect, create one table for AI data, retrieve related +examples, curate with metadata, add a derived feature, and represent multimodal records. These same +primitives apply across the AI data lifecycle, from data preparation and feature engineering to +retrieval, evaluation, and training. -As you explore LanceDB further, you can combine vector search with other techniques like filtering based -on metadata fields, full-text search, hybrid search, and more. Check out the tutorials -and guides below to continue learning. +Continue to the table and search guides to build on this example with schema options, appends, +updates, versioning, indexing, full-text search, hybrid search, and reranking. Learn how to build Retrieval-Augmented Generation (RAG) applications using LanceDB. + + Create vector, full-text, and scalar indexes to speed up queries on larger datasets. + + + Use LanceDB for projected, shuffled, random-access reads in training workflows. + diff --git a/docs/snippets/quickstart.mdx b/docs/snippets/quickstart.mdx index bd817e66..810a0c0b 100644 --- a/docs/snippets/quickstart.mdx +++ b/docs/snippets/quickstart.mdx @@ -1,50 +1,76 @@ {/* Auto-generated by scripts/mdx_snippets_gen.py. Do not edit manually. 
*/} -export const PyQuickstartAddData = "more_data = [\n {\"id\": \"7\", \"text\": \"mage\", \"vector\": [0.6, 0.3, 0.4]},\n {\"id\": \"8\", \"text\": \"bard\", \"vector\": [0.3, 0.8, 0.4]},\n]\n\n# Add data to table\ntable.add(more_data)\n"; +export const PyQuickstartAddData = "more_data = [\n {\n \"id\": \"4\",\n \"name\": \"Morgana\",\n \"role\": \"Sorceress\",\n \"description\": \"Powerful sorceress of Avalon.\",\n \"stats\": {\"strength\": 2, \"magic\": 5, \"leadership\": 4, \"wisdom\": 4},\n \"vector\": [0.3, 0.9, 0.6, 0.8],\n \"power_score\": 3.75,\n },\n]\n\n# Add data to table\ntable.add(more_data)\n"; -export const PyQuickstartCreateTable = "data = [\n {\"id\": \"1\", \"text\": \"knight\", \"vector\": [0.9, 0.4, 0.8]},\n {\"id\": \"2\", \"text\": \"ranger\", \"vector\": [0.8, 0.4, 0.7]},\n {\"id\": \"9\", \"text\": \"priest\", \"vector\": [0.6, 0.2, 0.6]},\n {\"id\": \"4\", \"text\": \"rogue\", \"vector\": [0.7, 0.4, 0.7]},\n]\ntable = db.create_table(\"adventurers\", data=data, mode=\"overwrite\")\n"; +export const PyQuickstartAddFeature = "table.add_columns(\n {\n \"power_score\": \"cast(((stats.strength + stats.magic + stats.leadership + stats.wisdom) / 4.0) as float)\"\n }\n)\n"; -export const PyQuickstartCreateTableAsync = "async_table = await async_db.create_table(\n \"adventurers\",\n data=data,\n mode=\"overwrite\",\n)\n"; +export const PyQuickstartCreateTable = "table = db.create_table(\"characters\", data=data, mode=\"overwrite\")\n"; -export const PyQuickstartCreateTableNoOverwrite = "table = db.create_table(\"adventurers\", data=data)\n"; +export const PyQuickstartCreateTableAsync = "async_table = await async_db.create_table(\n \"characters\",\n data=data,\n mode=\"overwrite\",\n)\n"; -export const PyQuickstartOpenTable = "table = db.open_table(\"adventurers\")\n"; +export const PyQuickstartCreateTableNoOverwrite = "table = db.create_table(\"characters\", data=data)\n"; + +export const PyQuickstartCurateWithMetadata = "curated = (\n 
table.search(query_vector)\n .where(\"stats.magic >= 4\")\n .select([\"name\", \"role\", \"description\", \"_distance\"])\n .limit(2)\n .to_polars()\n)\nprint(curated)\n"; + +export const PyQuickstartData = "data = [\n {\n \"id\": \"1\",\n \"name\": \"King Arthur\",\n \"role\": \"King\",\n \"description\": \"Leader of Camelot and wielder of Excalibur.\",\n \"stats\": {\"strength\": 4, \"magic\": 1, \"leadership\": 5, \"wisdom\": 4},\n \"vector\": [0.7, 0.1, 0.9, 0.7],\n },\n {\n \"id\": \"2\",\n \"name\": \"Merlin\",\n \"role\": \"Wizard\",\n \"description\": \"Advisor and prophet with deep magical knowledge.\",\n \"stats\": {\"strength\": 2, \"magic\": 5, \"leadership\": 4, \"wisdom\": 5},\n \"vector\": [0.2, 0.9, 0.4, 0.9],\n },\n {\n \"id\": \"3\",\n \"name\": \"Sir Lancelot\",\n \"role\": \"Knight\",\n \"description\": \"Legendary knight known for courage and combat skill.\",\n \"stats\": {\"strength\": 5, \"magic\": 1, \"leadership\": 3, \"wisdom\": 3},\n \"vector\": [0.9, 0.1, 0.5, 0.4],\n },\n]\n"; + +export const PyQuickstartDataAsync = "data = [\n {\n \"id\": \"1\",\n \"name\": \"King Arthur\",\n \"role\": \"King\",\n \"description\": \"Leader of Camelot and wielder of Excalibur.\",\n \"stats\": {\"strength\": 4, \"magic\": 1, \"leadership\": 5, \"wisdom\": 4},\n \"vector\": [0.7, 0.1, 0.9, 0.7],\n },\n {\n \"id\": \"2\",\n \"name\": \"Merlin\",\n \"role\": \"Wizard\",\n \"description\": \"Advisor and prophet with deep magical knowledge.\",\n \"stats\": {\"strength\": 2, \"magic\": 5, \"leadership\": 4, \"wisdom\": 5},\n \"vector\": [0.2, 0.9, 0.4, 0.9],\n },\n {\n \"id\": \"3\",\n \"name\": \"Sir Lancelot\",\n \"role\": \"Knight\",\n \"description\": \"Legendary knight known for courage and combat skill.\",\n \"stats\": {\"strength\": 5, \"magic\": 1, \"leadership\": 3, \"wisdom\": 3},\n \"vector\": [0.9, 0.1, 0.5, 0.4],\n },\n]\n"; + +export const PyQuickstartMultimodalBytes = "from pathlib import Path\n\nimage_path = 
Path(\"docs/static/assets/images/quickstart/sir-lancelot.jpg\")\nimage_bytes = image_path.read_bytes()\n\nmultimodal_table = db.create_table(\n \"character_images\",\n data=[\n {\n \"id\": \"lancelot\",\n \"description\": \"Portrait of Sir Lancelot\",\n \"image\": image_bytes,\n \"vector\": [0.9, 0.1, 0.5, 0.4],\n }\n ],\n mode=\"overwrite\",\n)\n"; + +export const PyQuickstartOpenTable = "table = db.open_table(\"characters\")\n"; export const PyQuickstartOutputPandas = "# Ensure you run `pip install pandas` beforehand\nresult = table.search(query_vector).limit(2).to_pandas()\nprint(result)\n"; -export const PyQuickstartVectorSearch1 = "# Let's search for vectors similar to \"warrior\"\nquery_vector = [0.8, 0.3, 0.8]\n\n# Ensure you run `pip install polars` beforehand\nresult = table.search(query_vector).limit(2).to_polars()\nprint(result)\n"; +export const PyQuickstartQueryFeature = "features = table.search().select([\"name\", \"role\", \"power_score\"]).to_polars()\nprint(features)\n"; + +export const PyQuickstartVectorSearch1 = "# Search for examples similar to a \"wise magical advisor\"\nquery_vector = [0.2, 0.8, 0.4, 0.9]\n\n# Ensure you run `pip install polars` beforehand\nresult = (\n table.search(query_vector)\n .select([\"name\", \"role\", \"description\", \"_distance\"])\n .limit(2)\n .to_polars()\n)\nprint(result)\n"; + +export const PyQuickstartVectorSearch1Async = "# Search for examples similar to a \"wise magical advisor\"\nquery_vector = [0.2, 0.8, 0.4, 0.9]\n\n# Ensure you run `pip install polars` beforehand\nasync_result = await (\n await async_table.search(query_vector)\n).select([\"name\", \"role\", \"description\", \"_distance\"]).limit(2).to_polars()\nprint(async_result)\n"; -export const PyQuickstartVectorSearch1Async = "# Let's search for vectors similar to \"warrior\"\nquery_vector = [0.8, 0.3, 0.8]\n\n# Ensure you run `pip install polars` beforehand\nasync_result = await (await 
async_table.search(query_vector)).limit(2).to_polars()\nprint(async_result)\n"; +export const PyQuickstartVectorSearch2 = "# Search for examples similar to a \"powerful sorceress\"\nquery_vector = [0.3, 0.9, 0.6, 0.8]\n\nresults = table.search(query_vector).limit(2).to_polars()\nprint(results)\n"; -export const PyQuickstartVectorSearch2 = "# Let's search for vectors similar to \"wizard\"\nquery_vector = [0.7, 0.3, 0.5]\n\nresults = table.search(query_vector).limit(2).to_polars()\nprint(results)\n"; +export const TsQuickstartAddData = "const moreData = [\n {\n id: \"4\",\n name: \"Morgana\",\n role: \"Sorceress\",\n description: \"Powerful sorceress of Avalon.\",\n stats: { strength: 2, magic: 5, leadership: 4, wisdom: 4 },\n vector: [0.3, 0.9, 0.6, 0.8],\n power_score: 3.75,\n },\n];\n\n// Add data to table\nawait table.add(moreData);\n"; -export const TsQuickstartAddData = "const moreData = [\n { id: \"7\", text: \"mage\", vector: [0.6, 0.3, 0.4] },\n { id: \"8\", text: \"bard\", vector: [0.3, 0.8, 0.4] },\n];\n\n// Add data to table\nawait table.add(moreData);\n"; +export const TsQuickstartAddFeature = "await table.addColumns([\n {\n name: \"power_score\",\n valueSql:\n \"cast(((stats.strength + stats.magic + stats.leadership + stats.wisdom) / 4.0) as float)\",\n },\n]);\n"; -export const TsQuickstartCreateTable = "const data = [\n { id: \"1\", text: \"knight\", vector: [0.9, 0.4, 0.8] },\n { id: \"2\", text: \"ranger\", vector: [0.8, 0.4, 0.7] },\n { id: \"9\", text: \"priest\", vector: [0.6, 0.2, 0.6] },\n { id: \"4\", text: \"rogue\", vector: [0.7, 0.4, 0.7] },\n];\nlet table = await db.createTable(\"adventurers\", data, { mode: \"overwrite\" });\n"; +export const TsQuickstartCreateTable = "let table = await db.createTable(\"characters\", data, { mode: \"overwrite\" });\n"; -export const TsQuickstartCreateTableNoOverwrite = "table = await db.createTable(\"adventurers\", data);\n"; +export const TsQuickstartCreateTableNoOverwrite = "table = await 
db.createTable(\"characters\", data);\n"; -export const TsQuickstartOpenTable = "table = await db.openTable(\"adventurers\");\n"; +export const TsQuickstartCurateWithMetadata = "const curated = await table\n .search(queryVector)\n .where(\"stats.magic >= 4\")\n .select([\"name\", \"role\", \"description\", \"_distance\"])\n .limit(2)\n .toArray();\nconsole.table(curated);\n"; + +export const TsQuickstartData = "const data = [\n {\n id: \"1\",\n name: \"King Arthur\",\n role: \"King\",\n description: \"Leader of Camelot and wielder of Excalibur.\",\n stats: { strength: 4, magic: 1, leadership: 5, wisdom: 4 },\n vector: [0.7, 0.1, 0.9, 0.7],\n },\n {\n id: \"2\",\n name: \"Merlin\",\n role: \"Wizard\",\n description: \"Advisor and prophet with deep magical knowledge.\",\n stats: { strength: 2, magic: 5, leadership: 4, wisdom: 5 },\n vector: [0.2, 0.9, 0.4, 0.9],\n },\n {\n id: \"3\",\n name: \"Sir Lancelot\",\n role: \"Knight\",\n description: \"Legendary knight known for courage and combat skill.\",\n stats: { strength: 5, magic: 1, leadership: 3, wisdom: 3 },\n vector: [0.9, 0.1, 0.5, 0.4],\n },\n];\n"; + +export const TsQuickstartMultimodalBytes = "const arrow = await import(\"apache-arrow\");\nconst path = await import(\"node:path\");\nconst { readFile } = await import(\"node:fs/promises\");\n\nconst imagePath = path.resolve(\n \"../../docs/static/assets/images/quickstart/sir-lancelot.jpg\",\n);\nconst imageBytes = await readFile(imagePath);\nconst imageSchema = new arrow.Schema([\n new arrow.Field(\"id\", new arrow.Utf8()),\n new arrow.Field(\"description\", new arrow.Utf8()),\n new arrow.Field(\"image\", new arrow.Binary()),\n new arrow.Field(\n \"vector\",\n new arrow.FixedSizeList(\n 4,\n new arrow.Field(\"item\", new arrow.Float32(), true),\n ),\n ),\n]);\nconst imageData = lancedb.makeArrowTable(\n [\n {\n id: \"lancelot\",\n description: \"Portrait of Sir Lancelot\",\n image: imageBytes,\n vector: [0.9, 0.1, 0.5, 0.4],\n },\n ],\n { schema: imageSchema 
},\n);\nconst multimodalTable = await db.createTable(\n \"character_images\",\n imageData,\n { mode: \"overwrite\" },\n);\n"; + +export const TsQuickstartOpenTable = "table = await db.openTable(\"characters\");\n"; export const TsQuickstartOutputArray = "result = await table.search(queryVector).limit(2).toArray();\nconsole.table(result);\n"; -export const TsQuickstartVectorSearch1 = "// Let's search for vectors similar to \"warrior\"\nlet queryVector = [0.8, 0.3, 0.8];\n\nlet result = await table.search(queryVector).limit(2).toArray();\nconsole.table(result);\n"; +export const TsQuickstartQueryFeature = "const features = await table\n .query()\n .select([\"name\", \"role\", \"power_score\"])\n .toArray();\nconsole.table(features);\n"; + +export const TsQuickstartVectorSearch1 = "// Search for examples similar to a \"wise magical advisor\"\nlet queryVector = [0.2, 0.8, 0.4, 0.9];\n\nlet result = await table\n .search(queryVector)\n .select([\"name\", \"role\", \"description\", \"_distance\"])\n .limit(2)\n .toArray();\nconsole.table(result);\n"; + +export const TsQuickstartVectorSearch2 = "// Search for examples similar to a \"powerful sorceress\"\nqueryVector = [0.3, 0.9, 0.6, 0.8];\n\nconst results = await table.search(queryVector).limit(2).toArray();\nconsole.table(results);\n"; + +export const RsQuickstartAddFeature = "table\n .add_columns(\n NewColumnTransform::SqlExpressions(vec![(\n \"power_score\".to_string(),\n \"cast(((stats.strength + stats.magic + stats.leadership + stats.wisdom) / 4.0) as float)\"\n .to_string(),\n )]),\n None,\n )\n .await\n .unwrap();\n"; -export const TsQuickstartVectorSearch2 = "// Let's search for vectors similar to \"wizard\"\nqueryVector = [0.7, 0.3, 0.5];\n\nconst results = await table.search(queryVector).limit(2).toArray();\nconsole.table(results);\n"; +export const RsQuickstartCreateTable = "let schema = characters_schema();\nlet table = db\n .create_table(\"characters\", characters_to_reader(schema.clone(), &data))\n 
.mode(CreateTableMode::Overwrite)\n .execute()\n .await\n .unwrap();\n"; -export const RsQuickstartAddData = "let more_data = vec![\n Adventurer {\n id: \"7\".to_string(),\n text: \"mage\".to_string(),\n vector: [0.6, 0.3, 0.4],\n },\n Adventurer {\n id: \"8\".to_string(),\n text: \"bard\".to_string(),\n vector: [0.3, 0.8, 0.4],\n },\n];\n\n// Add data to table\ntable\n .add(adventurers_to_reader(schema.clone(), &more_data))\n .execute()\n .await\n .unwrap();\n"; +export const RsQuickstartCreateTableNoOverwrite = "let table = db\n .create_table(\"characters\", characters_to_reader(schema.clone(), &data))\n .execute()\n .await\n .unwrap();\n"; -export const RsQuickstartCreateTable = "// Define an arrow schema named adventurers_schema beforehand (omitted here for brevity)\nlet schema = adventurers_schema();\nlet data = vec![\n Adventurer {\n id: \"1\".to_string(),\n text: \"knight\".to_string(),\n vector: [0.9, 0.4, 0.8],\n },\n Adventurer {\n id: \"2\".to_string(),\n text: \"ranger\".to_string(),\n vector: [0.8, 0.4, 0.7],\n },\n Adventurer {\n id: \"9\".to_string(),\n text: \"priest\".to_string(),\n vector: [0.6, 0.2, 0.6],\n },\n Adventurer {\n id: \"4\".to_string(),\n text: \"rogue\".to_string(),\n vector: [0.7, 0.4, 0.7],\n },\n];\n// Create a new table with the data, overwriting if it already exists\nlet mut table = db\n .create_table(\"adventurers\", adventurers_to_reader(schema.clone(), &data))\n .mode(CreateTableMode::Overwrite)\n .execute()\n .await\n .unwrap();\n"; +export const RsQuickstartCurateWithMetadata = "let curated: DataFrame = table\n .query()\n .nearest_to(&query_vector)\n .unwrap()\n .only_if(\"stats.magic >= 4\")\n .select(Select::Columns(vec![\n \"name\".to_string(),\n \"role\".to_string(),\n \"description\".to_string(),\n \"_distance\".to_string(),\n ]))\n .limit(2)\n .execute()\n .await\n .unwrap()\n .into_polars()\n .await\n .unwrap();\nprintln!(\"{curated:?}\");\n"; -export const RsQuickstartCreateTableNoOverwrite = "table = db\n 
.create_table(\"adventurers\", adventurers_to_reader(schema.clone(), &data))\n .execute()\n .await\n .unwrap();\n"; +export const RsQuickstartData = "let data = vec![\n Character {\n id: \"1\".to_string(),\n name: \"King Arthur\".to_string(),\n role: \"King\".to_string(),\n description: \"Leader of Camelot and wielder of Excalibur.\".to_string(),\n stats: Stats {\n strength: 4,\n magic: 1,\n leadership: 5,\n wisdom: 4,\n },\n vector: [0.7, 0.1, 0.9, 0.7],\n },\n Character {\n id: \"2\".to_string(),\n name: \"Merlin\".to_string(),\n role: \"Wizard\".to_string(),\n description: \"Advisor and prophet with deep magical knowledge.\".to_string(),\n stats: Stats {\n strength: 2,\n magic: 5,\n leadership: 4,\n wisdom: 5,\n },\n vector: [0.2, 0.9, 0.4, 0.9],\n },\n Character {\n id: \"3\".to_string(),\n name: \"Sir Lancelot\".to_string(),\n role: \"Knight\".to_string(),\n description: \"Legendary knight known for courage and combat skill.\".to_string(),\n stats: Stats {\n strength: 5,\n magic: 1,\n leadership: 3,\n wisdom: 3,\n },\n vector: [0.9, 0.1, 0.5, 0.4],\n },\n];\n"; -export const RsQuickstartDefineStruct = "// Define a struct representing the data schema\n#[derive(Debug, Clone, Serialize, Deserialize)]\nstruct Adventurer {\n id: String,\n text: String,\n vector: [f32; 3],\n}\n\nfn adventurers_schema() -> Arc<Schema> {\n Arc::new(Schema::new(vec![\n Field::new(\"id\", DataType::LargeUtf8, false),\n Field::new(\"text\", DataType::LargeUtf8, false),\n Field::new(\n \"vector\",\n DataType::FixedSizeList(Arc::new(Field::new(\"item\", DataType::Float32, true)), 3),\n false,\n ),\n ]))\n}\n"; +export const RsQuickstartDefineStruct = "// Define structs representing the data schema\n#[derive(Debug, Clone, Serialize, Deserialize)]\nstruct Stats {\n strength: i8,\n magic: i8,\n leadership: i8,\n wisdom: i8,\n}\n\n#[derive(Debug, Clone, Serialize, Deserialize)]\nstruct Character {\n id: String,\n name: String,\n role: String,\n description: String,\n stats: Stats,\n vector: [f32; 
4],\n}\n\nfn characters_schema() -> Arc<Schema> {\n Arc::new(Schema::new(vec![\n Field::new(\"id\", DataType::LargeUtf8, false),\n Field::new(\"name\", DataType::LargeUtf8, false),\n Field::new(\"role\", DataType::LargeUtf8, false),\n Field::new(\"description\", DataType::LargeUtf8, false),\n Field::new(\n \"stats\",\n DataType::Struct(arrow_schema::Fields::from(vec![\n Arc::new(Field::new(\"strength\", DataType::Int8, false)),\n Arc::new(Field::new(\"magic\", DataType::Int8, false)),\n Arc::new(Field::new(\"leadership\", DataType::Int8, false)),\n Arc::new(Field::new(\"wisdom\", DataType::Int8, false)),\n ])),\n false,\n ),\n Field::new(\n \"vector\",\n DataType::FixedSizeList(Arc::new(Field::new(\"item\", DataType::Float32, true)), 4),\n false,\n ),\n ]))\n}\n"; -export const RsQuickstartOpenTable = "let table: Table = db.open_table(\"adventurers\").execute().await.unwrap();\n"; +export const RsQuickstartMultimodalBytes = "use std::sync::Arc;\n\nuse arrow_array::{\n BinaryArray, FixedSizeListArray, LargeStringArray, RecordBatch, RecordBatchIterator,\n};\nuse arrow_schema::{DataType, Field, Schema};\n\nlet image_path = std::path::Path::new(env!(\"CARGO_MANIFEST_DIR\"))\n .join(\"../../docs/static/assets/images/quickstart/sir-lancelot.jpg\");\nlet image_bytes = std::fs::read(image_path).unwrap();\n\nlet image_schema = Arc::new(Schema::new(vec![\n Field::new(\"id\", DataType::LargeUtf8, false),\n Field::new(\"description\", DataType::LargeUtf8, false),\n Field::new(\"image\", DataType::Binary, false),\n Field::new(\n \"vector\",\n DataType::FixedSizeList(Arc::new(Field::new(\"item\", DataType::Float32, true)), 4),\n false,\n ),\n]));\nlet image_vectors = [[0.9_f32, 0.1, 0.5, 0.4]];\nlet image_batch = RecordBatch::try_new(\n image_schema.clone(),\n vec![\n Arc::new(LargeStringArray::from_iter_values([\"lancelot\"])),\n Arc::new(LargeStringArray::from_iter_values([\n \"Portrait of Sir Lancelot\",\n ])),\n Arc::new(BinaryArray::from_iter_values([image_bytes.as_slice()])),\n 
Arc::new(\n FixedSizeListArray::from_iter_primitive::<Float32Type, _, _>(\n image_vectors\n .iter()\n .map(|vector| Some(vector.iter().copied().map(Some).collect::<Vec<_>>())),\n 4,\n ),\n ),\n ],\n)\n.unwrap();\nlet image_reader: Box<dyn arrow_array::RecordBatchReader + Send> = Box::new(\n RecordBatchIterator::new(vec![Ok(image_batch)].into_iter(), image_schema),\n);\nlet multimodal_table = db\n .create_table(\"character_images\", image_reader)\n .mode(CreateTableMode::Overwrite)\n .execute()\n .await\n .unwrap();\n"; -export const RsQuickstartOutputArray = "let result: DataFrame = table\n .query()\n .nearest_to(&query_vector)\n .unwrap()\n .limit(2)\n .select(Select::Columns(vec![\"text\".to_string()]))\n .execute()\n .await\n .unwrap()\n .into_polars()\n .await\n .unwrap();\nprintln!(\"{result:?}\");\nlet text_col = result.column(\"text\").unwrap().str().unwrap();\nlet top_two = vec![\n text_col.get(0).unwrap().to_string(),\n text_col.get(1).unwrap().to_string(),\n];\n"; +export const RsQuickstartOutputArray = "let result: DataFrame = table\n .query()\n .nearest_to(&query_vector)\n .unwrap()\n .select(Select::Columns(vec![\n \"name\".to_string(),\n \"role\".to_string(),\n \"description\".to_string(),\n \"_distance\".to_string(),\n ]))\n .limit(2)\n .execute()\n .await\n .unwrap()\n .into_polars()\n .await\n .unwrap();\nprintln!(\"{result:?}\");\n"; -export const RsQuickstartVectorSearch1 = "// Let's search for vectors similar to \"warrior\"\nlet query_vector = [0.8, 0.3, 0.8];\n\nlet result: DataFrame = table\n .query()\n .nearest_to(&query_vector)\n .unwrap()\n .limit(2)\n .select(Select::Columns(vec![\"text\".to_string()]))\n .execute()\n .await\n .unwrap()\n .into_polars()\n .await\n .unwrap();\nprintln!(\"{result:?}\");\n"; +export const RsQuickstartQueryFeature = "let features: DataFrame = table\n .query()\n .select(Select::Columns(vec![\n \"name\".to_string(),\n \"role\".to_string(),\n \"power_score\".to_string(),\n ]))\n .execute()\n .await\n .unwrap()\n .into_polars()\n .await\n .unwrap();\nprintln!(\"{features:?}\");\n"; 
-export const RsQuickstartVectorSearch2 = "// Let's search for vectors similar to \"wizard\"\nlet query_vector = [0.7, 0.3, 0.5];\n\nlet result: DataFrame = table\n .query()\n .nearest_to(&query_vector)\n .unwrap()\n .limit(2)\n .select(Select::Columns(vec![\"text\".to_string()]))\n .execute()\n .await\n .unwrap()\n .into_polars()\n .await\n .unwrap();\nprintln!(\"{result:?}\");\nlet text_col = result.column(\"text\").unwrap().str().unwrap();\nlet top_two = vec![\n text_col.get(0).unwrap().to_string(),\n text_col.get(1).unwrap().to_string(),\n];\n"; +export const RsQuickstartVectorSearch1 = "// Search for examples similar to a \"wise magical advisor\"\nlet query_vector = [0.2, 0.8, 0.4, 0.9];\n\nlet result: DataFrame = table\n .query()\n .nearest_to(&query_vector)\n .unwrap()\n .select(Select::Columns(vec![\n \"name\".to_string(),\n \"role\".to_string(),\n \"description\".to_string(),\n \"_distance\".to_string(),\n ]))\n .limit(2)\n .execute()\n .await\n .unwrap()\n .into_polars()\n .await\n .unwrap();\nprintln!(\"{result:?}\");\n"; diff --git a/docs/static/assets/images/quickstart/sir-lancelot.jpg b/docs/static/assets/images/quickstart/sir-lancelot.jpg new file mode 100644 index 00000000..e987d60e Binary files /dev/null and b/docs/static/assets/images/quickstart/sir-lancelot.jpg differ diff --git a/docs/storage/configuration.mdx b/docs/storage/configuration.mdx index 0a00be9f..0c9a0d8f 100644 --- a/docs/storage/configuration.mdx +++ b/docs/storage/configuration.mdx @@ -36,7 +36,7 @@ When using LanceDB OSS, you can choose where to store your data. The tradeoffs b **LanceDB Enterprise storage configuration** -In LanceDB Enterprise, you connect with `db://...` and the cluster owns the storage credentials, so `storage_options` are not passed at runtime. Cloud auth is set at deployment time. For federated databases, the namespace service vends per-request credentials automatically. 
See the [Enterprise quickstart](/enterprise/quickstart) and the [Azure deployment guide](/enterprise/deployment/azure) for the Enterprise flow. +In LanceDB Enterprise, you connect with `db://...` and the cluster owns the storage credentials, so `storage_options` are not passed at runtime. Cloud auth is set at deployment time. For federated databases, the namespace service vends per-request credentials automatically. See the [quickstart](/quickstart), [Enterprise overview](/enterprise/), and [Azure deployment guide](/enterprise/deployment/azure) for the Enterprise flow. ## Object stores diff --git a/tests/py/test_quickstart.py b/tests/py/test_quickstart.py index 58d25c28..207e6540 100644 --- a/tests/py/test_quickstart.py +++ b/tests/py/test_quickstart.py @@ -4,68 +4,151 @@ import lancedb import pytest + def test_quickstart(db_path_factory): - uri = "quickstart_db" uri = db_path_factory("quickstart_db") db = lancedb.connect(uri) - # --8<-- [start:quickstart_create_table] + # --8<-- [start:quickstart_data] data = [ - {"id": "1", "text": "knight", "vector": [0.9, 0.4, 0.8]}, - {"id": "2", "text": "ranger", "vector": [0.8, 0.4, 0.7]}, - {"id": "9", "text": "priest", "vector": [0.6, 0.2, 0.6]}, - {"id": "4", "text": "rogue", "vector": [0.7, 0.4, 0.7]}, + { + "id": "1", + "name": "King Arthur", + "role": "King", + "description": "Leader of Camelot and wielder of Excalibur.", + "stats": {"strength": 4, "magic": 1, "leadership": 5, "wisdom": 4}, + "vector": [0.7, 0.1, 0.9, 0.7], + }, + { + "id": "2", + "name": "Merlin", + "role": "Wizard", + "description": "Advisor and prophet with deep magical knowledge.", + "stats": {"strength": 2, "magic": 5, "leadership": 4, "wisdom": 5}, + "vector": [0.2, 0.9, 0.4, 0.9], + }, + { + "id": "3", + "name": "Sir Lancelot", + "role": "Knight", + "description": "Legendary knight known for courage and combat skill.", + "stats": {"strength": 5, "magic": 1, "leadership": 3, "wisdom": 3}, + "vector": [0.9, 0.1, 0.5, 0.4], + }, ] - table = 
db.create_table("adventurers", data=data, mode="overwrite") + # --8<-- [end:quickstart_data] + + # --8<-- [start:quickstart_create_table] + table = db.create_table("characters", data=data, mode="overwrite") # --8<-- [end:quickstart_create_table] - assert len(table) == 4 + assert len(table) == 3 # Drop the table to test create without overwrite - db.drop_table("adventurers") + db.drop_table("characters") # --8<-- [start:quickstart_create_table_no_overwrite] - table = db.create_table("adventurers", data=data) + table = db.create_table("characters", data=data) # --8<-- [end:quickstart_create_table_no_overwrite] - assert len(table) == 4 + assert len(table) == 3 # --8<-- [start:quickstart_vector_search_1] - # Let's search for vectors similar to "warrior" - query_vector = [0.8, 0.3, 0.8] + # Search for examples similar to a "wise magical advisor" + query_vector = [0.2, 0.8, 0.4, 0.9] # Ensure you run `pip install polars` beforehand - result = table.search(query_vector).limit(2).to_polars() + result = ( + table.search(query_vector) + .select(["name", "role", "description", "_distance"]) + .limit(2) + .to_polars() + ) print(result) # --8<-- [end:quickstart_vector_search_1] - assert result.head(1)["text"][0] == "knight" + assert result.head(1)["name"][0] == "Merlin" + + # --8<-- [start:quickstart_curate_with_metadata] + curated = ( + table.search(query_vector) + .where("stats.magic >= 4") + .select(["name", "role", "description", "_distance"]) + .limit(2) + .to_polars() + ) + print(curated) + # --8<-- [end:quickstart_curate_with_metadata] + assert curated.head(1)["name"][0] == "Merlin" # --8<-- [start:quickstart_output_pandas] # Ensure you run `pip install pandas` beforehand result = table.search(query_vector).limit(2).to_pandas() print(result) # --8<-- [end:quickstart_output_pandas] - assert result.iloc[0]["text"] == "knight" + assert result.iloc[0]["name"] == "Merlin" + + # --8<-- [start:quickstart_add_feature] + table.add_columns( + { + "power_score": 
"cast(((stats.strength + stats.magic + stats.leadership + stats.wisdom) / 4.0) as float)" + } + ) + # --8<-- [end:quickstart_add_feature] + assert "power_score" in table.schema.names + + # --8<-- [start:quickstart_query_feature] + features = table.search().select(["name", "role", "power_score"]).to_polars() + print(features) + # --8<-- [end:quickstart_query_feature] + assert "power_score" in features.columns + + # --8<-- [start:quickstart_multimodal_bytes] + from pathlib import Path + + image_path = Path("docs/static/assets/images/quickstart/sir-lancelot.jpg") + image_bytes = image_path.read_bytes() + + multimodal_table = db.create_table( + "character_images", + data=[ + { + "id": "lancelot", + "description": "Portrait of Sir Lancelot", + "image": image_bytes, + "vector": [0.9, 0.1, 0.5, 0.4], + } + ], + mode="overwrite", + ) + # --8<-- [end:quickstart_multimodal_bytes] + assert len(multimodal_table) == 1 # --8<-- [start:quickstart_open_table] - table = db.open_table("adventurers") + table = db.open_table("characters") # --8<-- [end:quickstart_open_table] # --8<-- [start:quickstart_add_data] more_data = [ - {"id": "7", "text": "mage", "vector": [0.6, 0.3, 0.4]}, - {"id": "8", "text": "bard", "vector": [0.3, 0.8, 0.4]}, + { + "id": "4", + "name": "Morgana", + "role": "Sorceress", + "description": "Powerful sorceress of Avalon.", + "stats": {"strength": 2, "magic": 5, "leadership": 4, "wisdom": 4}, + "vector": [0.3, 0.9, 0.6, 0.8], + "power_score": 3.75, + }, ] # Add data to table table.add(more_data) # --8<-- [end:quickstart_add_data] - assert len(table) == 6 + assert len(table) == 4 # --8<-- [start:quickstart_vector_search_2] - # Let's search for vectors similar to "wizard" - query_vector = [0.7, 0.3, 0.5] + # Search for examples similar to a "powerful sorceress" + query_vector = [0.3, 0.9, 0.6, 0.8] results = table.search(query_vector).limit(2).to_polars() print(results) # --8<-- [end:quickstart_vector_search_2] - assert results.head(1)["text"][0] == "mage" + 
assert results.head(1)["name"][0] == "Morgana" @pytest.mark.asyncio @@ -74,28 +157,52 @@ async def test_quickstart_async_api(db_path_factory): import lancedb async_db = await lancedb.connect_async(db_uri) + # --8<-- [start:quickstart_data_async] data = [ - {"id": "1", "text": "knight", "vector": [0.9, 0.4, 0.8]}, - {"id": "2", "text": "ranger", "vector": [0.8, 0.4, 0.7]}, - {"id": "9", "text": "priest", "vector": [0.6, 0.2, 0.6]}, - {"id": "4", "text": "rogue", "vector": [0.7, 0.4, 0.7]}, + { + "id": "1", + "name": "King Arthur", + "role": "King", + "description": "Leader of Camelot and wielder of Excalibur.", + "stats": {"strength": 4, "magic": 1, "leadership": 5, "wisdom": 4}, + "vector": [0.7, 0.1, 0.9, 0.7], + }, + { + "id": "2", + "name": "Merlin", + "role": "Wizard", + "description": "Advisor and prophet with deep magical knowledge.", + "stats": {"strength": 2, "magic": 5, "leadership": 4, "wisdom": 5}, + "vector": [0.2, 0.9, 0.4, 0.9], + }, + { + "id": "3", + "name": "Sir Lancelot", + "role": "Knight", + "description": "Legendary knight known for courage and combat skill.", + "stats": {"strength": 5, "magic": 1, "leadership": 3, "wisdom": 3}, + "vector": [0.9, 0.1, 0.5, 0.4], + }, ] + # --8<-- [end:quickstart_data_async] # --8<-- [start:quickstart_create_table_async] async_table = await async_db.create_table( - "adventurers", + "characters", data=data, mode="overwrite", ) # --8<-- [end:quickstart_create_table_async] # --8<-- [start:quickstart_vector_search_1_async] - # Let's search for vectors similar to "warrior" - query_vector = [0.8, 0.3, 0.8] + # Search for examples similar to a "wise magical advisor" + query_vector = [0.2, 0.8, 0.4, 0.9] # Ensure you run `pip install polars` beforehand - async_result = await (await async_table.search(query_vector)).limit(2).to_polars() + async_result = await ( + await async_table.search(query_vector) + ).select(["name", "role", "description", "_distance"]).limit(2).to_polars() print(async_result) # --8<-- 
[end:quickstart_vector_search_1_async] - assert async_result.head(1)["text"][0] == "knight" + assert async_result.head(1)["name"][0] == "Merlin" diff --git a/tests/rs/quickstart.rs b/tests/rs/quickstart.rs index ebe4dffc..bc966aab 100644 --- a/tests/rs/quickstart.rs +++ b/tests/rs/quickstart.rs @@ -4,31 +4,56 @@ use std::sync::Arc; use arrow_array::types::Float32Type; -use arrow_array::{FixedSizeListArray, LargeStringArray, RecordBatch, RecordBatchIterator}; -use arrow_schema::{DataType, Field, Schema}; +use arrow_array::{ + FixedSizeListArray, Int8Array, LargeStringArray, RecordBatch, RecordBatchIterator, StructArray, +}; +use arrow_schema::{DataType, Field, FieldRef, Schema}; use lancedb::arrow::IntoPolars; use lancedb::database::CreateTableMode; use lancedb::query::{ExecutableQuery, QueryBase, Select}; -use lancedb::{connect, table::Table}; +use lancedb::{connect, table::NewColumnTransform}; use polars::prelude::DataFrame; use serde::{Deserialize, Serialize}; // --8<-- [start:quickstart_define_struct] -// Define a struct representing the data schema +// Define structs representing the data schema #[derive(Debug, Clone, Serialize, Deserialize)] -struct Adventurer { +struct Stats { + strength: i8, + magic: i8, + leadership: i8, + wisdom: i8, +} + +#[derive(Debug, Clone, Serialize, Deserialize)] +struct Character { id: String, - text: String, - vector: [f32; 3], + name: String, + role: String, + description: String, + stats: Stats, + vector: [f32; 4], } -fn adventurers_schema() -> Arc<Schema> { +fn characters_schema() -> Arc<Schema> { Arc::new(Schema::new(vec![ Field::new("id", DataType::LargeUtf8, false), - Field::new("text", DataType::LargeUtf8, false), + Field::new("name", DataType::LargeUtf8, false), + Field::new("role", DataType::LargeUtf8, false), + Field::new("description", DataType::LargeUtf8, false), + Field::new( + "stats", + DataType::Struct(arrow_schema::Fields::from(vec![ + Arc::new(Field::new("strength", DataType::Int8, false)), + Arc::new(Field::new("magic", 
DataType::Int8, false)), + Arc::new(Field::new("leadership", DataType::Int8, false)), + Arc::new(Field::new("wisdom", DataType::Int8, false)), + ])), + false, + ), Field::new( "vector", - DataType::FixedSizeList(Arc::new(Field::new("item", DataType::Float32, true)), 3), + DataType::FixedSizeList(Arc::new(Field::new("item", DataType::Float32, true)), 4), false, ), ])) @@ -37,22 +62,57 @@ fn adventurers_schema() -> Arc<Schema> { type BatchIter = Box<dyn arrow_array::RecordBatchReader + Send>; -fn adventurers_to_reader(schema: Arc<Schema>, rows: &[Adventurer]) -> BatchIter { +fn characters_to_reader(schema: Arc<Schema>, rows: &[Character]) -> BatchIter { let ids = LargeStringArray::from_iter_values(rows.iter().map(|row| row.id.as_str())); - let texts = LargeStringArray::from_iter_values(rows.iter().map(|row| row.text.as_str())); + let names = LargeStringArray::from_iter_values(rows.iter().map(|row| row.name.as_str())); + let roles = LargeStringArray::from_iter_values(rows.iter().map(|row| row.role.as_str())); + let descriptions = + LargeStringArray::from_iter_values(rows.iter().map(|row| row.description.as_str())); + + let strength = Int8Array::from_iter_values(rows.iter().map(|row| row.stats.strength)); + let magic = Int8Array::from_iter_values(rows.iter().map(|row| row.stats.magic)); + let leadership = Int8Array::from_iter_values(rows.iter().map(|row| row.stats.leadership)); + let wisdom = Int8Array::from_iter_values(rows.iter().map(|row| row.stats.wisdom)); + let stats_fields: Vec<FieldRef> = vec![ + Arc::new(Field::new("strength", DataType::Int8, false)), + Arc::new(Field::new("magic", DataType::Int8, false)), + Arc::new(Field::new("leadership", DataType::Int8, false)), + Arc::new(Field::new("wisdom", DataType::Int8, false)), + ]; + let stats = StructArray::new( + stats_fields.into(), + vec![ + Arc::new(strength), + Arc::new(magic), + Arc::new(leadership), + Arc::new(wisdom), + ], + None, + ); + let vectors = FixedSizeListArray::from_iter_primitive::<Float32Type, _, _>( rows.iter() .map(|row| Some(row.vector.iter().copied().map(Some).collect::<Vec<_>>())), - 3, + 
4, ); let batch = RecordBatch::try_new( schema.clone(), - vec![Arc::new(ids), Arc::new(texts), Arc::new(vectors)], + vec![ + Arc::new(ids), + Arc::new(names), + Arc::new(roles), + Arc::new(descriptions), + Arc::new(stats), + Arc::new(vectors), + ], ) .unwrap(); - Box::new(RecordBatchIterator::new(vec![Ok(batch)].into_iter(), schema)) + Box::new(RecordBatchIterator::new( + vec![Ok(batch)].into_iter(), + schema, + )) } #[tokio::main] @@ -61,61 +121,86 @@ async fn main() { let uri = temp_dir.path().to_str().unwrap(); let db = connect(uri).execute().await.unwrap(); - // --8<-- [start:quickstart_create_table] - // Define an arrow schema named adventurers_schema beforehand (omitted here for brevity) - let schema = adventurers_schema(); + // --8<-- [start:quickstart_data] let data = vec![ - Adventurer { + Character { id: "1".to_string(), - text: "knight".to_string(), - vector: [0.9, 0.4, 0.8], + name: "King Arthur".to_string(), + role: "King".to_string(), + description: "Leader of Camelot and wielder of Excalibur.".to_string(), + stats: Stats { + strength: 4, + magic: 1, + leadership: 5, + wisdom: 4, + }, + vector: [0.7, 0.1, 0.9, 0.7], }, - Adventurer { + Character { id: "2".to_string(), - text: "ranger".to_string(), - vector: [0.8, 0.4, 0.7], + name: "Merlin".to_string(), + role: "Wizard".to_string(), + description: "Advisor and prophet with deep magical knowledge.".to_string(), + stats: Stats { + strength: 2, + magic: 5, + leadership: 4, + wisdom: 5, + }, + vector: [0.2, 0.9, 0.4, 0.9], }, - Adventurer { - id: "9".to_string(), - text: "priest".to_string(), - vector: [0.6, 0.2, 0.6], - }, - Adventurer { - id: "4".to_string(), - text: "rogue".to_string(), - vector: [0.7, 0.4, 0.7], + Character { + id: "3".to_string(), + name: "Sir Lancelot".to_string(), + role: "Knight".to_string(), + description: "Legendary knight known for courage and combat skill.".to_string(), + stats: Stats { + strength: 5, + magic: 1, + leadership: 3, + wisdom: 3, + }, + vector: [0.9, 0.1, 0.5, 
0.4], }, ]; - // Create a new table with the data, overwriting if it already exists - let mut table = db - .create_table("adventurers", adventurers_to_reader(schema.clone(), &data)) + // --8<-- [end:quickstart_data] + + // --8<-- [start:quickstart_create_table] + let schema = characters_schema(); + let table = db + .create_table("characters", characters_to_reader(schema.clone(), &data)) .mode(CreateTableMode::Overwrite) .execute() .await .unwrap(); // --8<-- [end:quickstart_create_table] - assert_eq!(table.count_rows(None).await.unwrap(), 4); - db.drop_table("adventurers", &[]).await.unwrap(); + assert_eq!(table.count_rows(None).await.unwrap(), 3); + db.drop_table("characters", &[]).await.unwrap(); // --8<-- [start:quickstart_create_table_no_overwrite] - table = db - .create_table("adventurers", adventurers_to_reader(schema.clone(), &data)) + let table = db + .create_table("characters", characters_to_reader(schema.clone(), &data)) .execute() .await .unwrap(); // --8<-- [end:quickstart_create_table_no_overwrite] - assert_eq!(table.count_rows(None).await.unwrap(), 4); + assert_eq!(table.count_rows(None).await.unwrap(), 3); // --8<-- [start:quickstart_vector_search_1] - // Let's search for vectors similar to "warrior" - let query_vector = [0.8, 0.3, 0.8]; + // Search for examples similar to a "wise magical advisor" + let query_vector = [0.2, 0.8, 0.4, 0.9]; let result: DataFrame = table .query() .nearest_to(&query_vector) .unwrap() + .select(Select::Columns(vec![ + "name".to_string(), + "role".to_string(), + "description".to_string(), + "_distance".to_string(), + ])) .limit(2) - .select(Select::Columns(vec!["text".to_string()])) .execute() .await .unwrap() @@ -124,16 +209,45 @@ async fn main() { .unwrap(); println!("{result:?}"); // --8<-- [end:quickstart_vector_search_1] - let text_col = result.column("text").unwrap().str().unwrap(); - assert_eq!(text_col.get(0).unwrap(), "knight"); + let name_col = result.column("name").unwrap().str().unwrap(); + 
assert_eq!(name_col.get(0).unwrap(), "Merlin"); + + // --8<-- [start:quickstart_curate_with_metadata] + let curated: DataFrame = table + .query() + .nearest_to(&query_vector) + .unwrap() + .only_if("stats.magic >= 4") + .select(Select::Columns(vec![ + "name".to_string(), + "role".to_string(), + "description".to_string(), + "_distance".to_string(), + ])) + .limit(2) + .execute() + .await + .unwrap() + .into_polars() + .await + .unwrap(); + println!("{curated:?}"); + // --8<-- [end:quickstart_curate_with_metadata] + let curated_name_col = curated.column("name").unwrap().str().unwrap(); + assert_eq!(curated_name_col.get(0).unwrap(), "Merlin"); // --8<-- [start:quickstart_output_array] let result: DataFrame = table .query() .nearest_to(&query_vector) .unwrap() + .select(Select::Columns(vec![ + "name".to_string(), + "role".to_string(), + "description".to_string(), + "_distance".to_string(), + ])) .limit(2) - .select(Select::Columns(vec!["text".to_string()])) .execute() .await .unwrap() @@ -141,63 +255,93 @@ async fn main() { .await .unwrap(); println!("{result:?}"); - let text_col = result.column("text").unwrap().str().unwrap(); - let top_two = vec![ - text_col.get(0).unwrap().to_string(), - text_col.get(1).unwrap().to_string(), - ]; // --8<-- [end:quickstart_output_array] - assert_eq!(top_two[0], "knight"); - - // --8<-- [start:quickstart_open_table] - let table: Table = db.open_table("adventurers").execute().await.unwrap(); - // --8<-- [end:quickstart_open_table] - - // --8<-- [start:quickstart_add_data] - let more_data = vec![ - Adventurer { - id: "7".to_string(), - text: "mage".to_string(), - vector: [0.6, 0.3, 0.4], - }, - Adventurer { - id: "8".to_string(), - text: "bard".to_string(), - vector: [0.3, 0.8, 0.4], - }, - ]; + let name_col = result.column("name").unwrap().str().unwrap(); + assert_eq!(name_col.get(0).unwrap(), "Merlin"); - // Add data to table + // --8<-- [start:quickstart_add_feature] table - .add(adventurers_to_reader(schema.clone(), &more_data)) - 
.execute() + .add_columns( + NewColumnTransform::SqlExpressions(vec![( + "power_score".to_string(), + "cast(((stats.strength + stats.magic + stats.leadership + stats.wisdom) / 4.0) as float)" + .to_string(), + )]), + None, + ) .await .unwrap(); - // --8<-- [end:quickstart_add_data] - assert_eq!(table.count_rows(None).await.unwrap(), 6); - - // --8<-- [start:quickstart_vector_search_2] - // Let's search for vectors similar to "wizard" - let query_vector = [0.7, 0.3, 0.5]; + // --8<-- [end:quickstart_add_feature] - let result: DataFrame = table + // --8<-- [start:quickstart_query_feature] + let features: DataFrame = table .query() - .nearest_to(&query_vector) - .unwrap() - .limit(2) - .select(Select::Columns(vec!["text".to_string()])) + .select(Select::Columns(vec![ + "name".to_string(), + "role".to_string(), + "power_score".to_string(), + ])) .execute() .await .unwrap() .into_polars() .await .unwrap(); - println!("{result:?}"); - let text_col = result.column("text").unwrap().str().unwrap(); - let top_two = vec![ - text_col.get(0).unwrap().to_string(), - text_col.get(1).unwrap().to_string(), - ]; - // --8<-- [end:quickstart_vector_search_2] - assert_eq!(top_two[0], "mage"); + println!("{features:?}"); + // --8<-- [end:quickstart_query_feature] + assert!(features.column("power_score").is_ok()); + + // --8<-- [start:quickstart_multimodal_bytes] + use std::sync::Arc; + + use arrow_array::{ + BinaryArray, FixedSizeListArray, LargeStringArray, RecordBatch, RecordBatchIterator, + }; + use arrow_schema::{DataType, Field, Schema}; + + let image_path = std::path::Path::new(env!("CARGO_MANIFEST_DIR")) + .join("../../docs/static/assets/images/quickstart/sir-lancelot.jpg"); + let image_bytes = std::fs::read(image_path).unwrap(); + + let image_schema = Arc::new(Schema::new(vec![ + Field::new("id", DataType::LargeUtf8, false), + Field::new("description", DataType::LargeUtf8, false), + Field::new("image", DataType::Binary, false), + Field::new( + "vector", + 
DataType::FixedSizeList(Arc::new(Field::new("item", DataType::Float32, true)), 4), + false, + ), + ])); + let image_vectors = [[0.9_f32, 0.1, 0.5, 0.4]]; + let image_batch = RecordBatch::try_new( + image_schema.clone(), + vec![ + Arc::new(LargeStringArray::from_iter_values(["lancelot"])), + Arc::new(LargeStringArray::from_iter_values([ + "Portrait of Sir Lancelot", + ])), + Arc::new(BinaryArray::from_iter_values([image_bytes.as_slice()])), + Arc::new( + FixedSizeListArray::from_iter_primitive::<Float32Type, _, _>( + image_vectors + .iter() + .map(|vector| Some(vector.iter().copied().map(Some).collect::<Vec<_>>())), + 4, + ), + ), + ], + ) + .unwrap(); + let image_reader: Box<dyn RecordBatchReader + Send> = Box::new( + RecordBatchIterator::new(vec![Ok(image_batch)].into_iter(), image_schema), + ); + let multimodal_table = db + .create_table("character_images", image_reader) + .mode(CreateTableMode::Overwrite) + .execute() + .await + .unwrap(); + // --8<-- [end:quickstart_multimodal_bytes] + assert_eq!(multimodal_table.count_rows(None).await.unwrap(), 1); } diff --git a/tests/ts/quickstart.test.ts b/tests/ts/quickstart.test.ts index 655ac8c5..ecd5c099 100644 --- a/tests/ts/quickstart.test.ts +++ b/tests/ts/quickstart.test.ts @@ -8,60 +8,168 @@ test("quickstart example (async)", async () => { await withTempDirectory(async (databaseDir) => { const db = await lancedb.connect(databaseDir); - // --8<-- [start:quickstart_create_table] + // --8<-- [start:quickstart_data] const data = [ - { id: "1", text: "knight", vector: [0.9, 0.4, 0.8] }, - { id: "2", text: "ranger", vector: [0.8, 0.4, 0.7] }, - { id: "9", text: "priest", vector: [0.6, 0.2, 0.6] }, - { id: "4", text: "rogue", vector: [0.7, 0.4, 0.7] }, + { + id: "1", + name: "King Arthur", + role: "King", + description: "Leader of Camelot and wielder of Excalibur.", + stats: { strength: 4, magic: 1, leadership: 5, wisdom: 4 }, + vector: [0.7, 0.1, 0.9, 0.7], + }, + { + id: "2", + name: "Merlin", + role: "Wizard", + description: "Advisor and prophet with deep magical 
knowledge.", + stats: { strength: 2, magic: 5, leadership: 4, wisdom: 5 }, + vector: [0.2, 0.9, 0.4, 0.9], + }, + { + id: "3", + name: "Sir Lancelot", + role: "Knight", + description: "Legendary knight known for courage and combat skill.", + stats: { strength: 5, magic: 1, leadership: 3, wisdom: 3 }, + vector: [0.9, 0.1, 0.5, 0.4], + }, ]; - let table = await db.createTable("adventurers", data, { mode: "overwrite" }); + // --8<-- [end:quickstart_data] + + // --8<-- [start:quickstart_create_table] + let table = await db.createTable("characters", data, { mode: "overwrite" }); // --8<-- [end:quickstart_create_table] - expect(await table.countRows()).toBe(4); - await db.dropTable("adventurers"); + expect(await table.countRows()).toBe(3); + await db.dropTable("characters"); // --8<-- [start:quickstart_create_table_no_overwrite] - table = await db.createTable("adventurers", data); + table = await db.createTable("characters", data); // --8<-- [end:quickstart_create_table_no_overwrite] - expect(await table.countRows()).toBe(4); + expect(await table.countRows()).toBe(3); // --8<-- [start:quickstart_vector_search_1] - // Let's search for vectors similar to "warrior" - let queryVector = [0.8, 0.3, 0.8]; + // Search for examples similar to a "wise magical advisor" + let queryVector = [0.2, 0.8, 0.4, 0.9]; - let result = await table.search(queryVector).limit(2).toArray(); + let result = await table + .search(queryVector) + .select(["name", "role", "description", "_distance"]) + .limit(2) + .toArray(); console.table(result); // --8<-- [end:quickstart_vector_search_1] - expect(result[0].text).toBe("knight"); + expect(result[0].name).toBe("Merlin"); + + // --8<-- [start:quickstart_curate_with_metadata] + const curated = await table + .search(queryVector) + .where("stats.magic >= 4") + .select(["name", "role", "description", "_distance"]) + .limit(2) + .toArray(); + console.table(curated); + // --8<-- [end:quickstart_curate_with_metadata] + expect(curated[0].name).toBe("Merlin"); 
// --8<-- [start:quickstart_output_array] result = await table.search(queryVector).limit(2).toArray(); console.table(result); // --8<-- [end:quickstart_output_array] - expect(result[0].text).toBe("knight"); + expect(result[0].name).toBe("Merlin"); + + // --8<-- [start:quickstart_add_feature] + await table.addColumns([ + { + name: "power_score", + valueSql: + "cast(((stats.strength + stats.magic + stats.leadership + stats.wisdom) / 4.0) as float)", + }, + ]); + // --8<-- [end:quickstart_add_feature] + const schemaWithFeature = await table.schema(); + expect(schemaWithFeature.fields.some((f) => f.name === "power_score")).toBe( + true, + ); + + // --8<-- [start:quickstart_query_feature] + const features = await table + .query() + .select(["name", "role", "power_score"]) + .toArray(); + console.table(features); + // --8<-- [end:quickstart_query_feature] + expect(features[0]).toHaveProperty("power_score"); + + // --8<-- [start:quickstart_multimodal_bytes] + const arrow = await import("apache-arrow"); + const path = await import("node:path"); + const { readFile } = await import("node:fs/promises"); + + const imagePath = path.resolve( + "../../docs/static/assets/images/quickstart/sir-lancelot.jpg", + ); + const imageBytes = await readFile(imagePath); + const imageSchema = new arrow.Schema([ + new arrow.Field("id", new arrow.Utf8()), + new arrow.Field("description", new arrow.Utf8()), + new arrow.Field("image", new arrow.Binary()), + new arrow.Field( + "vector", + new arrow.FixedSizeList( + 4, + new arrow.Field("item", new arrow.Float32(), true), + ), + ), + ]); + const imageData = lancedb.makeArrowTable( + [ + { + id: "lancelot", + description: "Portrait of Sir Lancelot", + image: imageBytes, + vector: [0.9, 0.1, 0.5, 0.4], + }, + ], + { schema: imageSchema }, + ); + const multimodalTable = await db.createTable( + "character_images", + imageData, + { mode: "overwrite" }, + ); + // --8<-- [end:quickstart_multimodal_bytes] + expect(await 
multimodalTable.countRows()).toBe(1); // --8<-- [start:quickstart_open_table] - table = await db.openTable("adventurers"); + table = await db.openTable("characters"); // --8<-- [end:quickstart_open_table] // --8<-- [start:quickstart_add_data] const moreData = [ - { id: "7", text: "mage", vector: [0.6, 0.3, 0.4] }, - { id: "8", text: "bard", vector: [0.3, 0.8, 0.4] }, + { + id: "4", + name: "Morgana", + role: "Sorceress", + description: "Powerful sorceress of Avalon.", + stats: { strength: 2, magic: 5, leadership: 4, wisdom: 4 }, + vector: [0.3, 0.9, 0.6, 0.8], + power_score: 3.75, + }, ]; // Add data to table await table.add(moreData); // --8<-- [end:quickstart_add_data] - expect(await table.countRows()).toBe(6); + expect(await table.countRows()).toBe(4); // --8<-- [start:quickstart_vector_search_2] - // Let's search for vectors similar to "wizard" - queryVector = [0.7, 0.3, 0.5]; + // Search for examples similar to a "powerful sorceress" + queryVector = [0.3, 0.9, 0.6, 0.8]; const results = await table.search(queryVector).limit(2).toArray(); console.table(results); // --8<-- [end:quickstart_vector_search_2] - expect(results[0].text).toBe("mage"); + expect(results[0].name).toBe("Morgana"); }); });