diff --git a/docs/docs.json b/docs/docs.json
index 7f237c5b..7d1519f7 100644
--- a/docs/docs.json
+++ b/docs/docs.json
@@ -63,6 +63,16 @@
               "performance"
             ]
           },
+          {
+            "group": "Model training",
+            "pages": [
+              "training/why-lancedb",
+              "training/index",
+              "training/torch",
+              "training/object-detection",
+              "training/vlm-finetuning"
+            ]
+          },
           {
             "group": "Guides",
             "pages": [
@@ -141,15 +151,6 @@
                   "storage/index",
                   "storage/configuration"
                 ]
-              },
-              {
-                "group": "Training",
-                "pages": [
-                  "training/index",
-                  "training/torch",
-                  "training/object-detection",
-                  "training/vlm-finetuning"
-                ]
               }
             ]
           },
diff --git a/docs/index.mdx b/docs/index.mdx
index 83f4e19d..c9cf4f75 100644
--- a/docs/index.mdx
+++ b/docs/index.mdx
@@ -3,51 +3,85 @@ title: LanceDB
 sidebarTitle: "LanceDB"
 description: "Multimodal lakehouse for AI."
 icon: "/static/assets/logo/lancedb-icon-gray.svg"
-keywords: ["open source", "oss"]
+keywords: ["multimodal lakehouse", "training", "feature engineering", "search", "open source", "oss"]
 ---
 
-**LanceDB** is a [multimodal lakehouse](https://lancedb.com/blog/multimodal-lakehouse/) for
-AI, built on top of [Lance](/lance), an open-source lakehouse format. Below, we list a few
-ways LanceDB can help you build and scale your AI and ML workloads.
+**LanceDB** is a [multimodal lakehouse](https://lancedb.com/blog/multimodal-lakehouse/) for AI teams that need
+one data layer for curation, feature engineering, search and retrieval, and model training.
+It is built on top of [Lance](/lance), an open-source lakehouse format designed for multimodal AI data.
+
+Move from data exploration to model training on one unified platform, without having to manage a
+fragmented stack of storage, feature, retrieval, and training systems.
+
+## Build better models, faster
+
+Dataset preparation and experimentation slow down when raw data, metadata, embeddings, features, and governance
+artifacts live in separate systems. LanceDB keeps them together in one versioned multimodal table, so AI teams spend less
+time stitching infrastructure together and more time improving datasets, testing features, and keeping GPUs fed.
+
+![Training data lifecycle: Curation, Feature Engineering, Search and Retrieval, Training](/static/assets/images/overview/training-data-lifecycle.svg)
+
+Use the same table to curate training data, add derived features, retrieve examples, and feed training jobs that rely on expensive GPUs.
+Training workloads can sample, shuffle, and scan projected columns from local storage or object storage, then assemble
+GPU-ready batches from a tagged dataset version.
+
+For a deeper look at how this works in training pipelines, start with [Why LanceDB for training](/training/why-lancedb).
+
+## LanceDB suite
+
+The LanceDB suite includes LanceDB OSS, an open-source embedded retrieval library, and LanceDB Enterprise,
+a multimodal lakehouse platform for the full AI data lifecycle.
+OSS is easy to set up on a local machine for search and regular-scale workflows. LanceDB Enterprise is built
+for teams that need scale without building bespoke infrastructure for curation,
+feature engineering, search and retrieval, and efficient training data access.
+
+![LanceDB suite: OSS search and Enterprise multimodal lakehouse on Lance format](/static/assets/images/overview/lancedb-suite.svg)
+
+## Why teams use LanceDB
 
 <Steps>
-  <Step title="High-performance random access and data management for model training">
-    Use LanceDB to curate, explore and distribute very large multimodal datasets for training and fine-tuning models.
-    LanceDB comes with built-in table versioning, schema evolution, and fast random access, making it far more efficient to do
-    dataset slicing, sampling, filtering and shuffles on large, rapidly evolving datasets.
+  <Step title="One table for the whole AI data loop">
+    Store images, video, audio, text, annotations, embeddings, and model-generated features together in one schema-enforced table.
+    The same table can support dataset curation, feature backfills, experiment splits, retrieval, and training.
+  </Step>
+  <Step title="High-throughput data access for training">
+    Training workloads mix fast random access with high-throughput sequential scans. LanceDB is designed for both, so
+    teams can shuffle data into GPU-ready batches more efficiently, improve input throughput, and iterate on experiments faster.
   </Step>
-  <Step title="Massively scalable, fast and high-quality retrieval − without breaking the bank">
-    Use LanceDB as the data + retrieval layer for production AI workloads: RAG, agents, semantic search,
-    recommendation systems, and more.
-    Keep multimodal data, metadata, and embeddings in the same table and query them via vector search,
-    full-text search or SQL. Easily add new features (columns in your tables) as your
-    application evolves, without copying existing data.
+  <Step title="Fast, versatile search and retrieval">
+    Whether the end user is a human or an agent, LanceDB powers production retrieval workloads such as semantic search,
+    hybrid search, RAG, agent memory, and recommendation systems. Retrieval runs against the same LanceDB tables used
+    for curation, feature engineering, and training workflows.
   </Step>
 </Steps>
 
-LanceDB is designed for a variety of workloads and deployment scenarios, and supports use cases
-that are way beyond traditional vector search. The LanceDB suite includes LanceDB OSS, an open-source embedded library,
-and LanceDB Enterprise, a distributed and managed multimodal lakehouse.
-Both are built on top of the same open-source Lance format and table abstractions.
-
-![](/static/assets/images/overview/lancedb-suite.png)
+## Start with your workload
 
-## Use cases
-
-- **Search**: Build high-performance search and retrieval applications using LanceDB's optimized storage, including vector search, full-text search, and hybrid search with secondary indexes.
-- **Data Curation**: Manage and filter on petabyte-scale multimodal datasets, including video and point cloud data, to gain insights, explore data and inform model development.
-- **Feature engineering**: Add new columns (features), create embeddings, and transform your data at
-scale. LanceDB lets you extend tables both vertically and horizontally with minimal I/O overhead.
-- **Training**: Efficiently access and manage large-scale multimodal datasets for training and fine-tuning AI models.
+<CardGroup cols={2}>
+  <Card title="Train and fine-tune models" icon="fire" href="/training/why-lancedb">
+    Learn why LanceDB works well as the data layer for training workloads.
+  </Card>
+  <Card title="Load data into PyTorch" icon="boxes-stacked" href="/training/">
+    Use LanceDB tables and permutations for projected, shuffled, random-access training reads.
+  </Card>
+  <Card title="Browse ready-to-use datasets" icon="database" href="/datasets">
+    Explore Lance-formatted multimodal datasets with raw bytes, metadata, embeddings, and indices.
+  </Card>
+  <Card title="Build search and retrieval" icon="search" href="/search/">
+    Use vector search, full-text search, hybrid search, reranking, filtering, and SQL.
+  </Card>
+</CardGroup>
 
-## Choose how you run LanceDB
+## From local development to production scale
 
-Depending on your needs, you can choose one of the following ways to run LanceDB.
+LanceDB OSS and LanceDB Enterprise share the same Lance format and table model. Start locally with the embedded OSS
+library, then move to Enterprise when your team needs distributed scale, managed infrastructure, private deployment,
+or higher-throughput curation, feature engineering, search and retrieval, and training workflows.
 
 ### 1. LanceDB OSS
 The fastest way to get started is the open-source embedded library, with client SDKs in Python, TypeScript
-and Rust. Run it locally during development, then use the same data model and APIs as you scale up
-and need a managed solution. Start here:
+and Rust. Run it locally in just a few steps to explore datasets, curate data, and run search and retrieval workloads
+for agents. Start here:
 
 <Columns cols={2}>
   <Card
@@ -59,19 +93,18 @@ and need a managed solution. Start here:
 </Card>
   <Card
     title="Basic Table Operations"
-    icon="search"
+    icon="table"
     href="/tables/"
   >
-    Create tables, search vectors, and modify data in LanceDB.
+    Create tables, evolve schemas, version data, and modify rows in LanceDB.
   </Card>
 </Columns>
 
 ### 2. LanceDB Enterprise
 
-[LanceDB Enterprise](/enterprise) is a distributed and managed **multimodal lakehouse** built for
-search, curation, feature engineering, and training-oriented data access workflows
-on top of the same core table abstraction. This eliminates the need for teams to build bespoke
-infrastructure to manage petabyte-scale multimodal datasets.
+[LanceDB Enterprise](/enterprise) is a distributed **multimodal lakehouse** platform that scales to petabytes and beyond. It is built for
+search, curation, feature engineering, and high-throughput training data access workflows on top of the same core table
+abstraction. This eliminates the need for teams to build bespoke infrastructure to manage large multimodal datasets.
 To set up LanceDB Enterprise in your organization, reach out to us at
 [contact@lancedb.com](mailto:contact@lancedb.com).
 
@@ -88,4 +121,4 @@ private deployments, and can operate under strict [security requirements](/enter
   href="/enterprise/quickstart"
 >
   Get started with LanceDB Enterprise in minutes.
-</Card>
\ No newline at end of file
+</Card>
diff --git a/docs/lance.mdx b/docs/lance.mdx
index 88c26722..e7a650e5 100644
--- a/docs/lance.mdx
+++ b/docs/lance.mdx
@@ -5,15 +5,15 @@ description: "Open-source lakehouse format for multimodal AI."
 icon: "/static/assets/logo/lance-logo-gray.svg"
 ---
 
-[Lance](https://lance.org/) is an open-source lakehouse format, which provides the
-foundation for LanceDB's capabilities. It provides a file format,
-table format, and catalog spec with multimodal data at the center of its design, allowing developers
+[Lance](https://lance.org/) is an open-source, columnar lakehouse format for multimodal AI.
+It provides a file format, table format, and lightweight catalog spec, allowing developers
 to build a complete open lakehouse on top of object storage.
 
-Building on top of open foundations and optimizing the format for AI workloads brings
-high-performance vector search, full-text search, random access, and feature engineering capabilities
-to a single unified system ([LanceDB](/enterprise)), eliminating the need for bespoke ETL and data pipelines that move data
-to multiple other specialized data systems.
+Building on top of open foundations and optimizing the format for random access
+(without compromising scan performance) enables
+high-performance vector search, full-text search, indexing, and feature engineering capabilities.
+[LanceDB](/enterprise) builds on these capabilities so teams can work with one multimodal data layer
+instead of moving data across separate storage, search, feature, and training systems.
 
 <Card
   title="Lance format documentation"
@@ -23,15 +23,17 @@ to multiple other specialized data systems.
   Visit the Lance format documentation to learn more about its design, features, and how it enables the multimodal lakehouse.
 </Card>
 
-## Advantages of the Lance format
+## Capabilities of the Lance format
 
-Advantage | Description
+Capability | What it enables
 --- | ---
-Multimodal storage | Efficiently holds vectors, images, videos, audio, text, and more
-Version control | Built-in data versioning for reproducible ML experiments and data lineage
-ML-optimized | Designed for training and inference workloads with fast random access
-Query performance | Columnar storage enables blazing-fast vector search and analytics
-Cloud-native | Seamless integration with cloud object stores (S3, GCS, Azure Blob)
+Multimodal storage | Store images, video, audio, text, embeddings, annotations, metadata, features, and more, all in one table.
+First-class blob API | Store large binary objects such as images, video, audio, and model artifacts in blob columns with lazy reads and streaming byte access.
+Fast random access and scans | Sample, shuffle, and retrieve individual rows efficiently without giving up high-throughput sequential reads.
+Flexible data evolution | Add, drop, rename, or alter columns as datasets change, often without rewriting existing data files.
+Versioned tables | Reproduce experiments, restore previous states, and tie downstream artifacts to the exact table version they used.
+Hybrid search and indexing | Combine vector search, full-text search, and scalar filters on the same dataset with Lance indexes.
+Open lakehouse interoperability | Build on object storage and connect Lance tables to open engines and frameworks such as PyTorch, Ray, Spark, Trino, DuckDB, and Polars.
 
 ## Key concepts
 
diff --git a/docs/static/assets/images/overview/lancedb-suite.svg b/docs/static/assets/images/overview/lancedb-suite.svg
new file mode 100644
index 00000000..92a9335b
--- /dev/null
+++ b/docs/static/assets/images/overview/lancedb-suite.svg
@@ -0,0 +1,95 @@
+<svg width="2048" height="540" viewBox="0 0 2048 540" fill="none" xmlns="http://www.w3.org/2000/svg" role="img" aria-labelledby="title desc">
+  <title id="title">LanceDB suite</title>
+  <desc id="desc">LanceDB OSS supports search. LanceDB Enterprise is a multimodal lakehouse for curation, feature engineering, search and retrieval, and training. Both are built on the Lance open lakehouse format for multimodal AI.</desc>
+  <defs>
+    <filter id="cardShadow" x="-8%" y="-18%" width="116%" height="136%" color-interpolation-filters="sRGB">
+      <feDropShadow dx="0" dy="16" stdDeviation="18" flood-color="#4F2A1A" flood-opacity="0.16"/>
+    </filter>
+    <linearGradient id="enterpriseGradient" x1="512" y1="90" x2="2020" y2="90" gradientUnits="userSpaceOnUse">
+      <stop offset="0" stop-color="#E9C9F6"/>
+      <stop offset="0.52" stop-color="#F39A9A"/>
+      <stop offset="1" stop-color="#FF744F"/>
+    </linearGradient>
+    <linearGradient id="curationGradient" x1="512" y1="272" x2="872" y2="272" gradientUnits="userSpaceOnUse">
+      <stop offset="0" stop-color="#E8C9F4"/>
+      <stop offset="1" stop-color="#EFC5E2"/>
+    </linearGradient>
+    <linearGradient id="featureGradient" x1="894" y1="272" x2="1254" y2="272" gradientUnits="userSpaceOnUse">
+      <stop offset="0" stop-color="#F0ABB8"/>
+      <stop offset="1" stop-color="#F49A92"/>
+    </linearGradient>
+    <linearGradient id="searchGradient" x1="1276" y1="272" x2="1636" y2="272" gradientUnits="userSpaceOnUse">
+      <stop offset="0" stop-color="#F78F85"/>
+      <stop offset="1" stop-color="#FA8265"/>
+    </linearGradient>
+    <linearGradient id="trainingGradient" x1="1658" y1="272" x2="2018" y2="272" gradientUnits="userSpaceOnUse">
+      <stop offset="0" stop-color="#FF7C5D"/>
+      <stop offset="1" stop-color="#FF704E"/>
+    </linearGradient>
+  </defs>
+
+  <rect x="28" y="28" width="464" height="146" rx="8" style="fill:#FFFFFF !important;stroke:#3E3A35 !important;" stroke-width="2" filter="url(#cardShadow)"/>
+  <rect x="28" y="202" width="464" height="146" rx="8" style="fill:#FFFFFF !important;stroke:#3E3A35 !important;" stroke-width="2" filter="url(#cardShadow)"/>
+
+  <rect x="512" y="28" width="1506" height="146" rx="8" style="fill:url(#enterpriseGradient) !important;" filter="url(#cardShadow)"/>
+  <rect x="516" y="32" width="1498" height="138" rx="6" style="fill:#FFFFFF !important;"/>
+  <rect x="512" y="202" width="360" height="146" rx="8" style="fill:url(#curationGradient) !important;" filter="url(#cardShadow)"/>
+  <rect x="516" y="206" width="352" height="138" rx="6" style="fill:#FFFFFF !important;"/>
+  <rect x="894" y="202" width="360" height="146" rx="8" style="fill:url(#featureGradient) !important;" filter="url(#cardShadow)"/>
+  <rect x="898" y="206" width="352" height="138" rx="6" style="fill:#FFFFFF !important;"/>
+  <rect x="1276" y="202" width="360" height="146" rx="8" style="fill:url(#searchGradient) !important;" filter="url(#cardShadow)"/>
+  <rect x="1280" y="206" width="352" height="138" rx="6" style="fill:#FFFFFF !important;"/>
+  <rect x="1658" y="202" width="360" height="146" rx="8" style="fill:url(#trainingGradient) !important;" filter="url(#cardShadow)"/>
+  <rect x="1662" y="206" width="352" height="138" rx="6" style="fill:#FFFFFF !important;"/>
+
+  <rect x="28" y="376" width="1990" height="116" rx="8" style="fill:#FFFFFF !important;stroke:#665FFF !important;" stroke-width="3" filter="url(#cardShadow)"/>
+
+  <g style="fill:#241712 !important;">
+    <circle cx="78" cy="78" r="9"/>
+    <circle cx="94" cy="78" r="9"/>
+    <circle cx="110" cy="78" r="9"/>
+    <circle cx="78" cy="94" r="9"/>
+    <circle cx="110" cy="94" r="9"/>
+    <circle cx="126" cy="94" r="9"/>
+    <circle cx="78" cy="110" r="9"/>
+    <circle cx="94" cy="110" r="9"/>
+    <circle cx="110" cy="110" r="9"/>
+    <circle cx="94" cy="126" r="9"/>
+    <circle cx="126" cy="126" r="9"/>
+  </g>
+  <text x="162" y="117" style="fill:#241712 !important;" font-family="Inter, Arial, sans-serif" font-size="44" font-weight="760">LanceDB OSS</text>
+  <text x="260" y="291" text-anchor="middle" style="fill:#241712 !important;" font-family="Inter, Arial, sans-serif" font-size="42" font-weight="500">Search</text>
+
+  <g style="fill:#241712 !important;">
+    <circle cx="748" cy="78" r="9"/>
+    <circle cx="764" cy="78" r="9"/>
+    <circle cx="780" cy="78" r="9"/>
+    <circle cx="748" cy="94" r="9"/>
+    <circle cx="780" cy="94" r="9"/>
+    <circle cx="796" cy="94" r="9"/>
+    <circle cx="748" cy="110" r="9"/>
+    <circle cx="764" cy="110" r="9"/>
+    <circle cx="780" cy="110" r="9"/>
+    <circle cx="764" cy="126" r="9"/>
+    <circle cx="796" cy="126" r="9"/>
+  </g>
+  <text x="834" y="117" style="fill:#241712 !important;" font-family="Inter, Arial, sans-serif" font-size="44" font-weight="760">LanceDB Enterprise - Multimodal Lakehouse</text>
+
+  <text x="692" y="285" text-anchor="middle" style="fill:#241712 !important;" font-family="Inter, Arial, sans-serif" font-size="38" font-weight="500">Curation</text>
+  <text x="1074" y="265" text-anchor="middle" style="fill:#241712 !important;" font-family="Inter, Arial, sans-serif" font-size="38" font-weight="500">Feature</text>
+  <text x="1074" y="309" text-anchor="middle" style="fill:#241712 !important;" font-family="Inter, Arial, sans-serif" font-size="38" font-weight="500">Engineering</text>
+  <text x="1456" y="265" text-anchor="middle" style="fill:#241712 !important;" font-family="Inter, Arial, sans-serif" font-size="38" font-weight="500">Search &amp;</text>
+  <text x="1456" y="309" text-anchor="middle" style="fill:#241712 !important;" font-family="Inter, Arial, sans-serif" font-size="38" font-weight="500">Retrieval</text>
+  <text x="1838" y="285" text-anchor="middle" style="fill:#241712 !important;" font-family="Inter, Arial, sans-serif" font-size="38" font-weight="500">Training</text>
+
+  <rect x="456" y="414" width="42" height="42" rx="2" fill="#665FFF"/>
+  <g stroke="#FFFFFF" stroke-width="2.4" stroke-linecap="round" stroke-linejoin="round">
+    <path d="M467 446L487 426"/>
+    <path d="M467 438L479 426"/>
+    <path d="M475 446L487 434"/>
+    <path d="M482 420L493 424L493 438L482 434V420Z"/>
+  </g>
+  <text x="516" y="451" fill="#665FFF" font-family="Inter, Arial, sans-serif" font-size="44" font-weight="760">
+    <tspan font-weight="760">Lance</tspan><tspan font-weight="500"> Open Lakehouse Format for Multimodal AI</tspan>
+  </text>
+</svg>
diff --git a/docs/static/assets/images/overview/training-data-lifecycle.svg b/docs/static/assets/images/overview/training-data-lifecycle.svg
new file mode 100644
index 00000000..a533c70e
--- /dev/null
+++ b/docs/static/assets/images/overview/training-data-lifecycle.svg
@@ -0,0 +1,52 @@
+<svg width="1280" height="280" viewBox="0 0 1280 280" fill="none" xmlns="http://www.w3.org/2000/svg" role="img" aria-labelledby="title desc">
+  <title id="title">Training data lifecycle</title>
+  <desc id="desc">Four pillars of the LanceDB training data lifecycle: Curation, Feature Engineering, Search and Retrieval, and Training.</desc>
+  <defs>
+    <filter id="shadow" x="-20%" y="-20%" width="140%" height="140%" color-interpolation-filters="sRGB">
+      <feDropShadow dx="0" dy="14" stdDeviation="18" flood-color="#4f2a1a" flood-opacity="0.10"/>
+    </filter>
+    <linearGradient id="accent" x1="0" y1="0" x2="0" y2="1">
+      <stop offset="0" stop-color="#FFB08E"/>
+      <stop offset="1" stop-color="#FF7E4F"/>
+    </linearGradient>
+  </defs>
+
+  <rect x="20" y="24" width="260" height="220" rx="16" style="fill:#FFFFFF !important;stroke:#F29A75 !important;" stroke-width="1.5" filter="url(#shadow)"/>
+  <rect x="20" y="48" width="5" height="170" rx="2.5" fill="url(#accent)"/>
+  <rect x="350" y="24" width="260" height="220" rx="16" style="fill:#FFFFFF !important;stroke:#F29A75 !important;" stroke-width="1.5" filter="url(#shadow)"/>
+  <rect x="350" y="48" width="5" height="170" rx="2.5" fill="url(#accent)"/>
+  <rect x="680" y="24" width="260" height="220" rx="16" style="fill:#FFFFFF !important;stroke:#F29A75 !important;" stroke-width="1.5" filter="url(#shadow)"/>
+  <rect x="680" y="48" width="5" height="170" rx="2.5" fill="url(#accent)"/>
+  <rect x="1010" y="24" width="260" height="220" rx="16" style="fill:#FFFFFF !important;stroke:#F29A75 !important;" stroke-width="1.5" filter="url(#shadow)"/>
+  <rect x="1010" y="48" width="5" height="170" rx="2.5" fill="url(#accent)"/>
+
+  <path d="M292 134H338" stroke="#FF8A5C" stroke-width="2.5" stroke-linecap="round"/>
+  <path d="M622 134H668" stroke="#FF8A5C" stroke-width="2.5" stroke-linecap="round"/>
+  <path d="M952 134H998" stroke="#FF8A5C" stroke-width="2.5" stroke-linecap="round"/>
+
+  <g stroke="#EF7B4E" stroke-width="3" stroke-linecap="round" stroke-linejoin="round">
+    <rect x="112" y="84" width="26" height="26" rx="4"/>
+    <rect x="138" y="62" width="26" height="26" rx="4"/>
+  </g>
+  <text x="150" y="172" text-anchor="middle" style="fill:#241712 !important;" font-family="Inter, Arial, sans-serif" font-size="28" font-weight="700">Curation</text>
+
+  <g stroke="#EF7B4E" stroke-width="3" stroke-linecap="round" stroke-linejoin="round">
+    <rect x="447" y="72" width="66" height="46" rx="5"/>
+    <path d="M447 87H513M447 103H513M469 72V118M491 72V118"/>
+  </g>
+  <text x="480" y="165" text-anchor="middle" style="fill:#241712 !important;" font-family="Inter, Arial, sans-serif" font-size="28" font-weight="700">Feature</text>
+  <text x="480" y="200" text-anchor="middle" style="fill:#241712 !important;" font-family="Inter, Arial, sans-serif" font-size="28" font-weight="700">Engineering</text>
+
+  <g stroke="#EF7B4E" stroke-width="3" stroke-linecap="round" stroke-linejoin="round">
+    <circle cx="795" cy="90" r="19"/>
+    <path d="M809 104L828 123"/>
+  </g>
+  <text x="810" y="165" text-anchor="middle" style="fill:#241712 !important;" font-family="Inter, Arial, sans-serif" font-size="28" font-weight="700">Search &amp;</text>
+  <text x="810" y="200" text-anchor="middle" style="fill:#241712 !important;" font-family="Inter, Arial, sans-serif" font-size="28" font-weight="700">Retrieval</text>
+
+  <g stroke="#EF7B4E" stroke-width="3" stroke-linecap="round" stroke-linejoin="round">
+    <path d="M1106 124C1126 119 1141 107 1153 84C1160 96 1174 99 1190 92"/>
+    <circle cx="1191" cy="91" r="3" fill="#EF7B4E" stroke="none"/>
+  </g>
+  <text x="1140" y="172" text-anchor="middle" style="fill:#241712 !important;" font-family="Inter, Arial, sans-serif" font-size="28" font-weight="700">Training</text>
+</svg>
diff --git a/docs/training/why-lancedb.mdx b/docs/training/why-lancedb.mdx
new file mode 100644
index 00000000..0adfc81a
--- /dev/null
+++ b/docs/training/why-lancedb.mdx
@@ -0,0 +1,118 @@
+---
+title: "Why LanceDB for Training"
+sidebarTitle: "Why LanceDB for training"
+description: "Use LanceDB as the multimodal data layer for model training, fine-tuning, curation, and feature engineering workflows."
+icon: fire
+---
+
+LanceDB is built for AI teams that need a practical data layer between raw multimodal datasets and model training.
+Instead of moving data through separate systems for curation, feature engineering, search, manifests, and training,
+you can keep the whole workflow attached to one versioned LanceDB table.
+
+That table can hold images, video, audio, text, annotations, metadata, embeddings, tokenized fields, model outputs,
+quality signals, and training-ready tensors. As the dataset evolves, LanceDB lets you add new columns, filter rows,
+pin versions, and read batches without rewriting the original data.
+
+![Training data lifecycle: Curation, Feature Engineering, Search and Retrieval, Training](/static/assets/images/overview/training-data-lifecycle.svg)
+
+LanceDB gives these stages one platform, so curation, feature engineering, retrieval, and training stay connected.
+
+## A connected data lifecycle
+
+Training pipelines usually need more than a pile of files. They need curation, derived features, reproducible splits,
+fast random access, and a clean path into frameworks such as PyTorch. LanceDB keeps these pieces connected through
+the same table model, whether you organize a workflow as one table or several related tables.
+
+<Steps>
+  <Step title="Curate and slice the dataset">
+    Use filters, vector search, full-text search, and retrieval workflows to find the examples that matter: hard negatives,
+    long-tail failure modes, duplicate clusters, low-quality samples, or targeted fine-tuning slices.
+  </Step>
+  <Step title="Engineer features in place">
+    Add embeddings, detections, OCR output, labels, token IDs, hidden states, deduplication flags, or quality scores as
+    new columns. Lance's columnar layout and schema evolution avoid rewriting large raw media columns when you add features.
+  </Step>
+  <Step title="Create reproducible splits">
+    Build filtered splits and materialized views from the table instead of exporting CSV manifests. Data versions and tags
+    make it possible to tie a checkpoint back to the exact rows and features used for training.
+  </Step>
+  <Step title="Load batches for training">
+    Use fast random access and column projection to read only the columns a training step needs. LanceDB tables can be read
+    from local storage or object storage, and integrate with data loading patterns such as PyTorch datasets.
+  </Step>
+</Steps>
+
+## Lance as the foundation
+
+LanceDB is built on [Lance](https://lance.org/), an open-source lakehouse format designed for multimodal AI data.
+The table below highlights the Lance features that underpin the multimodal lakehouse.
+
+| Capability | Why it matters for training |
+|---|---|
+| **Multimodal columns** | Store raw bytes, annotations, metadata, embeddings, and features together. |
+| **Fast random access** | Support shuffled and sampled reads without reshuffling the dataset on disk. |
+| **Column projection** | Read only the columns a given run needs, such as images, tokens, labels, embeddings, or hidden states. |
+| **Schema evolution** | Add new feature columns without rewriting existing media columns. |
+| **Versioning** | Reproduce experiments against the same table snapshot, even as the dataset evolves. |
+| **Search and filtering** | Find and materialize useful training slices directly from the table. |
+
+## Search inside training workflows
+
+Search is not limited to QA systems, agents, or production retrieval apps. It is also a practical way to inspect,
+curate, and improve training data:
+
+- Find visually similar examples when debugging model failures.
+- Retrieve hard negatives or near-duplicates for contrastive training.
+- Combine vector search, full-text search, and metadata filters to build targeted fine-tuning slices.
+- Reuse the same table for both offline curation and production retrieval.
+
+In LanceDB, retrieval and training workflows can operate over the same multimodal tables instead of forcing teams to
+manage separate data systems for each stage.
+
+## Projects using LanceDB for training workflows
+
+<CardGroup cols={1}>
+  <Card
+    title="stable-worldmodel"
+    icon="github"
+    href="https://github.com/galilai-group/stable-worldmodel"
+  >
+    A platform for reproducible world-model research built on a LanceDB data layer, reporting faster data loading on Push-T workloads.
+  </Card>
+  <Card
+    title="le-wm"
+    icon="github"
+    href="https://github.com/lucas-maes/le-wm"
+  >
+    A joint-embedding predictive world model from pixels, trained on the stable-worldmodel platform and its LanceDB data layer.
+  </Card>
+  <Card
+    title="lerobot-lancedb"
+    icon="github"
+    href="https://github.com/lancedb/lerobot-lancedb"
+  >
+    A drop-in LanceDB backend for Hugging Face LeRobot datasets with faster loading across robotics datasets.
+  </Card>
+</CardGroup>
+
+In the world-model ecosystem, [stable-worldmodel](https://github.com/galilai-group/stable-worldmodel) reports
+3-4x faster data loading on Push-T versus HDF5 / MP4 at a fraction of the disk footprint. Across these projects,
+LanceDB and Lance provide the multimodal data layer that keeps raw observations, annotations, features, and training
+access patterns in one format instead of scattering them across task-specific stores.
+
+## Next steps
+
+<CardGroup cols={2}>
+  <Card title="Data loading and shuffles" icon="boxes-stacked" href="/training/">
+    Learn how to use LanceDB permutations to select rows, project columns, split datasets, and shuffle training reads.
+  </Card>
+  <Card title="PyTorch integration" icon="fire" href="/training/torch">
+    Use LanceDB tables and permutations with `torch.utils.data.DataLoader`.
+  </Card>
+  <Card title="Object detection example" icon="car" href="/training/object-detection">
+    Fine-tune an AV perception model on curated failure-mode slices backed by one LanceDB table.
+  </Card>
+  <Card title="VLM fine-tuning example" icon="image" href="/training/vlm-finetuning">
+    Fine-tune a VLM on TextVQA using LanceDB and Geneva to cache expensive training features.
+  </Card>
+</CardGroup>