From 7203478081605fe07260885a4d7eccc1e10b4f31 Mon Sep 17 00:00:00 2001
From: Jason O'Sullivan <jason.osullivan@cloudera.com>
Date: Wed, 4 Mar 2026 12:14:55 +0000
Subject: [PATCH 1/5] HDDS-14303. updating spark3 user guide

---
 .../04-user-guide/02-integrations/06-spark.md | 189 ++++++++++++++++-
 .../04-user-guide/03-integrations/06-spark.md | 190 +++++++++++++++++-
 2 files changed, 373 insertions(+), 6 deletions(-)
diff --git a/docs/04-user-guide/02-integrations/06-spark.md b/docs/04-user-guide/02-integrations/06-spark.md
index 10f30e6887..c55e811e30 100644
--- a/docs/04-user-guide/02-integrations/06-spark.md
+++ b/docs/04-user-guide/02-integrations/06-spark.md
@@ -1,8 +1,189 @@
 ---
-draft: true
+sidebar_label: Spark
 ---
 
-# Spark
+# Using Apache Spark with Ozone
 
-**TODO:** File a subtask under [HDDS-9858](https://issues.apache.org/jira/browse/HDDS-9858) and complete this page or section.
-**TODO:** Uncomment link to this page in src/pages/index.js
+Apache Spark is a widely used unified analytics engine for large-scale data processing. Ozone can serve as a scalable storage layer for Spark applications, allowing you to read and write data directly from/to Ozone clusters using familiar Spark APIs.
+
+:::note
+This guide covers Apache Spark 3.x. Examples were tested with Spark 3.5.x and Apache Ozone 2.1.0.
+:::
+
+## Overview
+
+Spark interacts with Ozone primarily through the OzoneFileSystem (ofs) connector, which allows access using the `ofs://` URI scheme. You can also use the older `o3fs://` scheme, though `ofs://` is generally recommended, especially in CDP environments.
+
+Key benefits include:
+
+- Storing large datasets generated or consumed by Spark jobs directly in Ozone.
+- Leveraging Ozone's scalability and object storage features for Spark workloads.
+- Using standard Spark DataFrame and RDD APIs to interact with Ozone data.
+
+## Prerequisites
+
+1. **Ozone Cluster:** A running Ozone cluster.
+2. **Ozone Client JARs:** The `ozone-filesystem-hadoop3.jar` must be available on the Spark driver and executor classpath.
+3. **Hadoop 3.4.x runtime (Ozone 2.1.0+):** Ozone 2.1.0 removed bundled copies of several Hadoop classes (`LeaseRecoverable`, `SafeMode`, `SafeModeAction`) and now requires them from the runtime classpath ([HDDS-13574](https://issues.apache.org/jira/browse/HDDS-13574)). Since Spark 3.5.x ships with Hadoop 3.3.4, you must add `hadoop-common-3.4.x.jar` to the Spark classpath alongside the existing Hadoop JARs.
+4. **Configuration:** Spark needs access to Ozone configuration (`core-site.xml` and potentially `ozone-site.xml`) to connect to the Ozone cluster.
+
+## Configuration
+
+### 1. Core Site (`core-site.xml`)
+
+For `core-site.xml` configuration, refer to the [Ozone File System (ofs) Configuration section](../01-client-interfaces/02-ofs.md#configuration).
+
+### 2. Spark Configuration (`spark-defaults.conf` or `--conf`)
+
+While Spark often picks up settings from `core-site.xml` on the classpath, explicitly setting the implementation can sometimes be necessary:
+
+```properties
+spark.hadoop.fs.ofs.impl=org.apache.hadoop.fs.ozone.RootedOzoneFileSystem
+spark.hadoop.fs.o3fs.impl=org.apache.hadoop.fs.ozone.OzoneFileSystem
+```
+
+### 3. Security (Kerberos)
+
+If your Ozone and Spark clusters are Kerberos-enabled, Spark needs permission to obtain delegation tokens for Ozone. Configure the following property in `spark-defaults.conf`or via`--conf`, specifying your Ozone filesystem URI:
+
+```properties
+# For YARN deployments in spark3+
+spark.kerberos.access.hadoopFileSystems=ofs://ozone1/
+```
+
+Replace `ozone1` with your OM Service ID. Ensure the user running the Spark job has a valid Kerberos ticket (`kinit`).
+
+## Usage Examples
+
+You can read and write data using `ofs://` URIs like any other Hadoop-compatible filesystem.
+
+**URI Format:** `ofs://<om-service-id>/<volume>/<bucket>/path/to/key>`
+
+### Reading Data (Scala)
+
+```scala
+import org.apache.spark.sql.SparkSession
+
+val spark = SparkSession.builder.appName("Ozone Spark Read Example").getOrCreate()
+
+// Read a CSV file from Ozone
+val df = spark.read.format("csv")
+  .option("header", "true")
+  .option("inferSchema", "true")
+  .load("ofs://ozone1/volume1/bucket1/input/data.csv")
+
+df.show()
+
+spark.stop()
+```
+
+### Writing Data (Scala)
+
+```scala
+import org.apache.spark.sql.SparkSession
+
+val spark = SparkSession.builder.appName("Ozone Spark Write Example").getOrCreate()
+
+// Assume 'df' is a DataFrame you want to write
+val data = Seq(("Alice", 1), ("Bob", 2), ("Charlie", 3))
+val df = spark.createDataFrame(data).toDF("name", "id")
+
+// Write DataFrame to Ozone as Parquet files
+df.write.mode("overwrite")
+  .parquet("ofs://ozone1/volume1/bucket1/output/users.parquet")
+
+spark.stop()
+```
+
+### Reading Data (Python)
+
+```python
+from pyspark.sql import SparkSession
+
+spark = SparkSession.builder.appName("Ozone Spark Read Example").getOrCreate()
+
+# Read a CSV file from Ozone
+df = spark.read.format("csv") \
+    .option("header", "true") \
+    .option("inferSchema", "true") \
+    .load("ofs://ozone1/volume1/bucket1/input/data.csv")
+
+df.show()
+
+spark.stop()
+```
+
+### Writing Data (Python)
+
+```python
+from pyspark.sql import SparkSession
+
+spark = SparkSession.builder.appName("Ozone Spark Write Example").getOrCreate()
+
+# Assume 'df' is a DataFrame you want to write
+data = [("Alice", 1), ("Bob", 2), ("Charlie", 3)]
+columns = ["name", "id"]
+df = spark.createDataFrame(data, columns)
+
+# Write DataFrame to Ozone as Parquet files
+df.write.mode("overwrite") \
+    .parquet("ofs://ozone1/volume1/bucket1/output/users.parquet")
+
+spark.stop()
+```
+
+## Spark on Kubernetes
+
+The recommended approach for running Spark on Kubernetes with Ozone is to bake the ozone-filesystem-hadoop3-client-*.jar, the hadoop-common-3.4.x.jar (if using Ozone 2.1.0+), and core-site.xml directly into a custom Spark image.
+
+1. **Build a Custom Spark Image:** Place the Ozone client JAR and Hadoop compatibility JAR in /opt/spark/jars/, which is on the default Spark classpath, and core-site.xml in /opt/spark/conf/:
+```dockerfile
+FROM apache/spark:3.5.8-scala2.12-java11-python3-ubuntu
+
+USER root
+# Files fetched by ADD from a URL are created with mode 600; --chmod=644 lets the spark user read them.
+ADD --chmod=644 https://repo1.maven.org/maven2/org/apache/ozone/ozone-filesystem-hadoop3-client/2.1.0/ozone-filesystem-hadoop3-client-2.1.0.jar \
+    /opt/spark/jars/
+
+# Ozone 2.1.0+ requires Hadoop 3.4.x classes (HDDS-13574).
+# Add alongside (not replacing) Spark's bundled hadoop-common-3.3.4.jar.
+ADD --chmod=644 https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/3.4.2/hadoop-common-3.4.2.jar \
+    /opt/spark/jars/
+
+COPY core-site.xml /opt/spark/conf/core-site.xml
+COPY ozone_write.py /opt/spark/work-dir/ozone_write.py
+
+USER spark
+```
+Where core-site.xml contains at minimum:
+```xml
+<?xml version="1.0"?>
+<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
+<configuration>
+  <property>
+    <name>fs.ofs.impl</name>
+    <value>org.apache.hadoop.fs.ozone.RootedOzoneFileSystem</value>
+  </property>
+  <property>
+    <name>fs.o3fs.impl</name>
+    <value>org.apache.hadoop.fs.ozone.OzoneFileSystem</value>
+  </property>
+  <property>
+    <name>ozone.om.address</name>
+    <value>om-host.example.com:9862</value>
+  </property>
+</configuration>
+```
+2. **Submit `Spark-submit`:**
+    ```bash
+   ./bin/spark-submit \
+     --master k8s://https://<kubernetes-api-server>:6443 \
+     --deploy-mode cluster \
+     --name spark-ozone-example \
+     --conf spark.executor.instances=2 \
+     --conf spark.kubernetes.container.image=<your-repo>/spark-ozone:latest \
+     --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
+     --conf spark.kubernetes.namespace=<your-namespace> \
+     local:///opt/spark/work-dir/ozone_example.py
+    ```
+Replace <kubernetes-api-server>, <your-repo>, and <your-namespace> with your environment values.
\ No newline at end of file
diff --git a/versioned_docs/version-2.1.0/04-user-guide/03-integrations/06-spark.md b/versioned_docs/version-2.1.0/04-user-guide/03-integrations/06-spark.md
index 5d0235c29e..c55e811e30 100644
--- a/versioned_docs/version-2.1.0/04-user-guide/03-integrations/06-spark.md
+++ b/versioned_docs/version-2.1.0/04-user-guide/03-integrations/06-spark.md
@@ -1,3 +1,189 @@
-# Spark
+---
+sidebar_label: Spark
+---
 
-**TODO:** File a subtask under [HDDS-9858](https://issues.apache.org/jira/browse/HDDS-9858) and complete this page or section.
+# Using Apache Spark with Ozone
+
+Apache Spark is a widely used unified analytics engine for large-scale data processing. Ozone can serve as a scalable storage layer for Spark applications, allowing you to read and write data directly from/to Ozone clusters using familiar Spark APIs.
+
+:::note
+This guide covers Apache Spark 3.x. Examples were tested with Spark 3.5.x and Apache Ozone 2.1.0.
+:::
+
+## Overview
+
+Spark interacts with Ozone primarily through the OzoneFileSystem (ofs) connector, which allows access using the `ofs://` URI scheme. You can also use the older `o3fs://` scheme, though `ofs://` is generally recommended, especially in CDP environments.
+
+Key benefits include:
+
+- Storing large datasets generated or consumed by Spark jobs directly in Ozone.
+- Leveraging Ozone's scalability and object storage features for Spark workloads.
+- Using standard Spark DataFrame and RDD APIs to interact with Ozone data.
+
+## Prerequisites
+
+1. **Ozone Cluster:** A running Ozone cluster.
+2. **Ozone Client JARs:** The `ozone-filesystem-hadoop3.jar` must be available on the Spark driver and executor classpath.
+3. **Hadoop 3.4.x runtime (Ozone 2.1.0+):** Ozone 2.1.0 removed bundled copies of several Hadoop classes (`LeaseRecoverable`, `SafeMode`, `SafeModeAction`) and now requires them from the runtime classpath ([HDDS-13574](https://issues.apache.org/jira/browse/HDDS-13574)). Since Spark 3.5.x ships with Hadoop 3.3.4, you must add `hadoop-common-3.4.x.jar` to the Spark classpath alongside the existing Hadoop JARs.
+4. **Configuration:** Spark needs access to Ozone configuration (`core-site.xml` and potentially `ozone-site.xml`) to connect to the Ozone cluster.
+
+## Configuration
+
+### 1. Core Site (`core-site.xml`)
+
+For `core-site.xml` configuration, refer to the [Ozone File System (ofs) Configuration section](../01-client-interfaces/02-ofs.md#configuration).
+
+### 2. Spark Configuration (`spark-defaults.conf` or `--conf`)
+
+While Spark often picks up settings from `core-site.xml` on the classpath, explicitly setting the implementation can sometimes be necessary:
+
+```properties
+spark.hadoop.fs.ofs.impl=org.apache.hadoop.fs.ozone.RootedOzoneFileSystem
+spark.hadoop.fs.o3fs.impl=org.apache.hadoop.fs.ozone.OzoneFileSystem
+```
+
+### 3. Security (Kerberos)
+
+If your Ozone and Spark clusters are Kerberos-enabled, Spark needs permission to obtain delegation tokens for Ozone. Configure the following property in `spark-defaults.conf`or via`--conf`, specifying your Ozone filesystem URI:
+
+```properties
+# For YARN deployments in spark3+
+spark.kerberos.access.hadoopFileSystems=ofs://ozone1/
+```
+
+Replace `ozone1` with your OM Service ID. Ensure the user running the Spark job has a valid Kerberos ticket (`kinit`).
+
+## Usage Examples
+
+You can read and write data using `ofs://` URIs like any other Hadoop-compatible filesystem.
+
+**URI Format:** `ofs://<om-service-id>/<volume>/<bucket>/path/to/key>`
+
+### Reading Data (Scala)
+
+```scala
+import org.apache.spark.sql.SparkSession
+
+val spark = SparkSession.builder.appName("Ozone Spark Read Example").getOrCreate()
+
+// Read a CSV file from Ozone
+val df = spark.read.format("csv")
+  .option("header", "true")
+  .option("inferSchema", "true")
+  .load("ofs://ozone1/volume1/bucket1/input/data.csv")
+
+df.show()
+
+spark.stop()
+```
+
+### Writing Data (Scala)
+
+```scala
+import org.apache.spark.sql.SparkSession
+
+val spark = SparkSession.builder.appName("Ozone Spark Write Example").getOrCreate()
+
+// Assume 'df' is a DataFrame you want to write
+val data = Seq(("Alice", 1), ("Bob", 2), ("Charlie", 3))
+val df = spark.createDataFrame(data).toDF("name", "id")
+
+// Write DataFrame to Ozone as Parquet files
+df.write.mode("overwrite")
+  .parquet("ofs://ozone1/volume1/bucket1/output/users.parquet")
+
+spark.stop()
+```
+
+### Reading Data (Python)
+
+```python
+from pyspark.sql import SparkSession
+
+spark = SparkSession.builder.appName("Ozone Spark Read Example").getOrCreate()
+
+# Read a CSV file from Ozone
+df = spark.read.format("csv") \
+    .option("header", "true") \
+    .option("inferSchema", "true") \
+    .load("ofs://ozone1/volume1/bucket1/input/data.csv")
+
+df.show()
+
+spark.stop()
+```
+
+### Writing Data (Python)
+
+```python
+from pyspark.sql import SparkSession
+
+spark = SparkSession.builder.appName("Ozone Spark Write Example").getOrCreate()
+
+# Assume 'df' is a DataFrame you want to write
+data = [("Alice", 1), ("Bob", 2), ("Charlie", 3)]
+columns = ["name", "id"]
+df = spark.createDataFrame(data, columns)
+
+# Write DataFrame to Ozone as Parquet files
+df.write.mode("overwrite") \
+    .parquet("ofs://ozone1/volume1/bucket1/output/users.parquet")
+
+spark.stop()
+```
+
+## Spark on Kubernetes
+
+The recommended approach for running Spark on Kubernetes with Ozone is to bake the ozone-filesystem-hadoop3-client-*.jar, the hadoop-common-3.4.x.jar (if using Ozone 2.1.0+), and core-site.xml directly into a custom Spark image.
+
+1. **Build a Custom Spark Image:** Place the Ozone client JAR and Hadoop compatibility JAR in /opt/spark/jars/, which is on the default Spark classpath, and core-site.xml in /opt/spark/conf/:
+```dockerfile
+FROM apache/spark:3.5.8-scala2.12-java11-python3-ubuntu
+
+USER root
+# Files fetched by ADD from a URL are created with mode 600; --chmod=644 lets the spark user read them.
+ADD --chmod=644 https://repo1.maven.org/maven2/org/apache/ozone/ozone-filesystem-hadoop3-client/2.1.0/ozone-filesystem-hadoop3-client-2.1.0.jar \
+    /opt/spark/jars/
+
+# Ozone 2.1.0+ requires Hadoop 3.4.x classes (HDDS-13574).
+# Add alongside (not replacing) Spark's bundled hadoop-common-3.3.4.jar.
+ADD --chmod=644 https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/3.4.2/hadoop-common-3.4.2.jar \
+    /opt/spark/jars/
+
+COPY core-site.xml /opt/spark/conf/core-site.xml
+COPY ozone_write.py /opt/spark/work-dir/ozone_write.py
+
+USER spark
+```
+Where core-site.xml contains at minimum:
+```xml
+<?xml version="1.0"?>
+<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
+<configuration>
+  <property>
+    <name>fs.ofs.impl</name>
+    <value>org.apache.hadoop.fs.ozone.RootedOzoneFileSystem</value>
+  </property>
+  <property>
+    <name>fs.o3fs.impl</name>
+    <value>org.apache.hadoop.fs.ozone.OzoneFileSystem</value>
+  </property>
+  <property>
+    <name>ozone.om.address</name>
+    <value>om-host.example.com:9862</value>
+  </property>
+</configuration>
+```
+2. **Submit `Spark-submit`:**
+    ```bash
+   ./bin/spark-submit \
+     --master k8s://https://<kubernetes-api-server>:6443 \
+     --deploy-mode cluster \
+     --name spark-ozone-example \
+     --conf spark.executor.instances=2 \
+     --conf spark.kubernetes.container.image=<your-repo>/spark-ozone:latest \
+     --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
+     --conf spark.kubernetes.namespace=<your-namespace> \
+     local:///opt/spark/work-dir/ozone_example.py
+    ```
+Replace <kubernetes-api-server>, <your-repo>, and <your-namespace> with your environment values.
\ No newline at end of file

From 02e9e8895296797594e659e6122b609a8b943eab Mon Sep 17 00:00:00 2001
From: Jason O'Sullivan <jason.osullivan@cloudera.com>
Date: Wed, 4 Mar 2026 12:30:47 +0000
Subject: [PATCH 2/5] HDDS-14303. updating spark3 user guide

---
 .../04-user-guide/02-integrations/06-spark.md | 48 +++++++++++--------
 .../04-user-guide/03-integrations/06-spark.md | 42 +++++++++-------
 2 files changed, 53 insertions(+), 37 deletions(-)

diff --git a/docs/04-user-guide/02-integrations/06-spark.md b/docs/04-user-guide/02-integrations/06-spark.md
index c55e811e30..69f01ef20c 100644
--- a/docs/04-user-guide/02-integrations/06-spark.md
+++ b/docs/04-user-guide/02-integrations/06-spark.md
@@ -12,13 +12,13 @@ This guide covers Apache Spark 3.x. Examples were tested with Spark 3.5.x and Ap
 
 ## Overview
 
-Spark interacts with Ozone primarily through the OzoneFileSystem (ofs) connector, which allows access using the `ofs://` URI scheme. You can also use the older `o3fs://` scheme, though `ofs://` is generally recommended, especially in CDP environments.
+Spark interacts with Ozone primarily through the OzoneFileSystem (ofs) connector, which allows access using the `ofs://` URI scheme. You can also use the older `o3fs://` scheme, though `ofs://` is generally recommended.
 
 Key benefits include:
 
 - Storing large datasets generated or consumed by Spark jobs directly in Ozone.
 - Leveraging Ozone's scalability and object storage features for Spark workloads.
-- Using standard Spark DataFrame and RDD APIs to interact with Ozone data.
+- Using standard Spark DataFrame and `RDD` APIs to interact with Ozone data.
 
 ## Prerequisites
 
@@ -103,9 +103,9 @@ from pyspark.sql import SparkSession
 spark = SparkSession.builder.appName("Ozone Spark Read Example").getOrCreate()
 
 # Read a CSV file from Ozone
-df = spark.read.format("csv") \
-    .option("header", "true") \
-    .option("inferSchema", "true") \
+df = spark.read.format("csv")
+    .option("header", "true")
+    .option("inferSchema", "true")
     .load("ofs://ozone1/volume1/bucket1/input/data.csv")
 
 df.show()
@@ -134,9 +134,12 @@ spark.stop()
 
 ## Spark on Kubernetes
 
-The recommended approach for running Spark on Kubernetes with Ozone is to bake the ozone-filesystem-hadoop3-client-*.jar, the hadoop-common-3.4.x.jar (if using Ozone 2.1.0+), and core-site.xml directly into a custom Spark image.
+The recommended approach for running Spark on Kubernetes with Ozone is to bake the `ozone-filesystem-hadoop3-client-*.jar` JAR, the `hadoop-common-3.4.x.jar` JAR (if using Ozone 2.1.0+), and core-site.xml directly into a custom Spark image.
+
+### Build a Custom Spark Image
+
+Place the Ozone client JAR and Hadoop compatibility JAR in /opt/spark/jars/, which is on the default Spark classpath, and core-site.xml in /opt/spark/conf/:
 
-1. **Build a Custom Spark Image:** Place the Ozone client JAR and Hadoop compatibility JAR in /opt/spark/jars/, which is on the default Spark classpath, and core-site.xml in /opt/spark/conf/:
 ```dockerfile
 FROM apache/spark:3.5.8-scala2.12-java11-python3-ubuntu
 
@@ -155,7 +158,9 @@ COPY ozone_write.py /opt/spark/work-dir/ozone_write.py
 
 USER spark
 ```
+
 Where core-site.xml contains at minimum:
+
 ```xml
 <?xml version="1.0"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
@@ -174,16 +179,19 @@ Where core-site.xml contains at minimum:
   </property>
 </configuration>
 ```
-2. **Submit `Spark-submit`:**
-    ```bash
-   ./bin/spark-submit \
-     --master k8s://https://<kubernetes-api-server>:6443 \
-     --deploy-mode cluster \
-     --name spark-ozone-example \
-     --conf spark.executor.instances=2 \
-     --conf spark.kubernetes.container.image=<your-repo>/spark-ozone:latest \
-     --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
-     --conf spark.kubernetes.namespace=<your-namespace> \
-     local:///opt/spark/work-dir/ozone_example.py
-    ```
-Replace <kubernetes-api-server>, <your-repo>, and <your-namespace> with your environment values.
\ No newline at end of file
+
+### Submit `Spark-submit`
+
+```bash
+./bin/spark-submit \
+  --master k8s://https://YOUR_KUBERNETES_API_SERVER:6443 \
+  --deploy-mode cluster \
+  --name spark-ozone-example \
+  --conf spark.executor.instances=2 \
+  --conf spark.kubernetes.container.image=YOUR_REPO/spark-ozone:latest \
+  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
+  --conf spark.kubernetes.namespace=YOUR_NAMESPACE \
+  local:///opt/spark/work-dir/ozone_example.py
+```
+
+Replace `YOUR_KUBERNETES_API_SERVER`, `YOUR_REPO`, and `YOUR_NAMESPACE` with your environment values.
diff --git a/versioned_docs/version-2.1.0/04-user-guide/03-integrations/06-spark.md b/versioned_docs/version-2.1.0/04-user-guide/03-integrations/06-spark.md
index c55e811e30..1a7e61f0b3 100644
--- a/versioned_docs/version-2.1.0/04-user-guide/03-integrations/06-spark.md
+++ b/versioned_docs/version-2.1.0/04-user-guide/03-integrations/06-spark.md
@@ -12,13 +12,13 @@ This guide covers Apache Spark 3.x. Examples were tested with Spark 3.5.x and Ap
 
 ## Overview
 
-Spark interacts with Ozone primarily through the OzoneFileSystem (ofs) connector, which allows access using the `ofs://` URI scheme. You can also use the older `o3fs://` scheme, though `ofs://` is generally recommended, especially in CDP environments.
+Spark interacts with Ozone primarily through the OzoneFileSystem (ofs) connector, which allows access using the `ofs://` URI scheme. You can also use the older `o3fs://` scheme, though `ofs://` is generally recommended.
 
 Key benefits include:
 
 - Storing large datasets generated or consumed by Spark jobs directly in Ozone.
 - Leveraging Ozone's scalability and object storage features for Spark workloads.
-- Using standard Spark DataFrame and RDD APIs to interact with Ozone data.
+- Using standard Spark DataFrame and `RDD` APIs to interact with Ozone data.
 
 ## Prerequisites
 
@@ -134,9 +134,12 @@ spark.stop()
 
 ## Spark on Kubernetes
 
-The recommended approach for running Spark on Kubernetes with Ozone is to bake the ozone-filesystem-hadoop3-client-*.jar, the hadoop-common-3.4.x.jar (if using Ozone 2.1.0+), and core-site.xml directly into a custom Spark image.
+The recommended approach for running Spark on Kubernetes with Ozone is to bake the `ozone-filesystem-hadoop3-client-*.jar` JAR, the `hadoop-common-3.4.x.jar` JAR (if using Ozone 2.1.0+), and core-site.xml directly into a custom Spark image.
+
+### Build a Custom Spark Image
+
+Place the Ozone client JAR and Hadoop compatibility JAR in /opt/spark/jars/, which is on the default Spark classpath, and core-site.xml in /opt/spark/conf/:
 
-1. **Build a Custom Spark Image:** Place the Ozone client JAR and Hadoop compatibility JAR in /opt/spark/jars/, which is on the default Spark classpath, and core-site.xml in /opt/spark/conf/:
 ```dockerfile
 FROM apache/spark:3.5.8-scala2.12-java11-python3-ubuntu
 
@@ -155,7 +158,9 @@ COPY ozone_write.py /opt/spark/work-dir/ozone_write.py
 
 USER spark
 ```
+
 Where core-site.xml contains at minimum:
+
 ```xml
 <?xml version="1.0"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
@@ -174,16 +179,19 @@ Where core-site.xml contains at minimum:
   </property>
 </configuration>
 ```
-2. **Submit `Spark-submit`:**
-    ```bash
-   ./bin/spark-submit \
-     --master k8s://https://<kubernetes-api-server>:6443 \
-     --deploy-mode cluster \
-     --name spark-ozone-example \
-     --conf spark.executor.instances=2 \
-     --conf spark.kubernetes.container.image=<your-repo>/spark-ozone:latest \
-     --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
-     --conf spark.kubernetes.namespace=<your-namespace> \
-     local:///opt/spark/work-dir/ozone_example.py
-    ```
-Replace <kubernetes-api-server>, <your-repo>, and <your-namespace> with your environment values.
\ No newline at end of file
+
+### Submit `Spark-submit`
+
+```bash
+./bin/spark-submit \
+  --master k8s://https://YOUR_KUBERNETES_API_SERVER:6443 \
+  --deploy-mode cluster \
+  --name spark-ozone-example \
+  --conf spark.executor.instances=2 \
+  --conf spark.kubernetes.container.image=YOUR_REPO/spark-ozone:latest \
+  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
+  --conf spark.kubernetes.namespace=YOUR_NAMESPACE \
+  local:///opt/spark/work-dir/ozone_example.py
+```
+
+Replace `YOUR_KUBERNETES_API_SERVER`, `YOUR_REPO`, and `YOUR_NAMESPACE` with your environment values.

From d619f3fcf47b2a4d93928827f6359877de612510 Mon Sep 17 00:00:00 2001
From: Jason O'Sullivan <jason.osullivan@cloudera.com>
Date: Wed, 4 Mar 2026 12:39:21 +0000
Subject: [PATCH 3/5] HDDS-14303. updating spark3 user guide

---
 .../04-user-guide/02-integrations/06-spark.md | 19 +++++++++++++------
 .../04-user-guide/03-integrations/06-spark.md | 13 ++++++++++---
 2 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/docs/04-user-guide/02-integrations/06-spark.md b/docs/04-user-guide/02-integrations/06-spark.md
index 69f01ef20c..765835577d 100644
--- a/docs/04-user-guide/02-integrations/06-spark.md
+++ b/docs/04-user-guide/02-integrations/06-spark.md
@@ -4,7 +4,7 @@ sidebar_label: Spark
 
 # Using Apache Spark with Ozone
 
-Apache Spark is a widely used unified analytics engine for large-scale data processing. Ozone can serve as a scalable storage layer for Spark applications, allowing you to read and write data directly from/to Ozone clusters using familiar Spark APIs.
+[Apache Spark](https://spark.apache.org/) is a widely used unified analytics engine for large-scale data processing. Ozone can serve as a scalable storage layer for Spark applications, allowing you to read data from and write data to Ozone clusters directly using familiar Spark APIs.
 
 :::note
 This guide covers Apache Spark 3.x. Examples were tested with Spark 3.5.x and Apache Ozone 2.1.0.
@@ -12,7 +12,10 @@ This guide covers Apache Spark 3.x. Examples were tested with Spark 3.5.x and Ap
 
 ## Overview
 
-Spark interacts with Ozone primarily through the OzoneFileSystem (ofs) connector, which allows access using the `ofs://` URI scheme. You can also use the older `o3fs://` scheme, though `ofs://` is generally recommended.
+Spark interacts with Ozone primarily through the OzoneFileSystem connector, which allows access using the `ofs://` URI scheme.
+Spark can also access Ozone through the S3 Gateway using the `s3a://` protocol, which is useful for porting existing cloud-native Spark applications to Ozone without changing application code.
+
+The older `o3fs://` scheme is supported for legacy compatibility but is not recommended for new deployments.
 
 Key benefits include:
 
@@ -39,7 +42,6 @@ While Spark often picks up settings from `core-site.xml` on the classpath, expli
 
 ```properties
 spark.hadoop.fs.ofs.impl=org.apache.hadoop.fs.ozone.RootedOzoneFileSystem
-spark.hadoop.fs.o3fs.impl=org.apache.hadoop.fs.ozone.OzoneFileSystem
 ```
 
 ### 3. Security (Kerberos)
@@ -103,9 +105,9 @@ from pyspark.sql import SparkSession
 spark = SparkSession.builder.appName("Ozone Spark Read Example").getOrCreate()
 
 # Read a CSV file from Ozone
-df = spark.read.format("csv")
-    .option("header", "true")
-    .option("inferSchema", "true")
+df = spark.read.format("csv") \
+    .option("header", "true") \
+    .option("inferSchema", "true") \
     .load("ofs://ozone1/volume1/bucket1/input/data.csv")
 
 df.show()
@@ -195,3 +197,8 @@ Where core-site.xml contains at minimum:
 ```
 
 Replace `YOUR_KUBERNETES_API_SERVER`, `YOUR_REPO`, and `YOUR_NAMESPACE` with your environment values.
+
+## Using the S3A Protocol
+
+Spark can also access Ozone through the S3 Gateway using the `s3a://` protocol. This is useful for porting existing cloud-native Spark applications to Ozone without changing application code.
+For configuration details, refer to the [S3A documentation](../01-client-interfaces/04-s3a.md).
diff --git a/versioned_docs/version-2.1.0/04-user-guide/03-integrations/06-spark.md b/versioned_docs/version-2.1.0/04-user-guide/03-integrations/06-spark.md
index 1a7e61f0b3..765835577d 100644
--- a/versioned_docs/version-2.1.0/04-user-guide/03-integrations/06-spark.md
+++ b/versioned_docs/version-2.1.0/04-user-guide/03-integrations/06-spark.md
@@ -4,7 +4,7 @@ sidebar_label: Spark
 
 # Using Apache Spark with Ozone
 
-Apache Spark is a widely used unified analytics engine for large-scale data processing. Ozone can serve as a scalable storage layer for Spark applications, allowing you to read and write data directly from/to Ozone clusters using familiar Spark APIs.
+[Apache Spark](https://spark.apache.org/) is a widely used unified analytics engine for large-scale data processing. Ozone can serve as a scalable storage layer for Spark applications, allowing you to read data from and write data to Ozone clusters directly using familiar Spark APIs.
 
 :::note
 This guide covers Apache Spark 3.x. Examples were tested with Spark 3.5.x and Apache Ozone 2.1.0.
@@ -12,7 +12,10 @@ This guide covers Apache Spark 3.x. Examples were tested with Spark 3.5.x and Ap
 
 ## Overview
 
-Spark interacts with Ozone primarily through the OzoneFileSystem (ofs) connector, which allows access using the `ofs://` URI scheme. You can also use the older `o3fs://` scheme, though `ofs://` is generally recommended.
+Spark interacts with Ozone primarily through the OzoneFileSystem connector, which allows access using the `ofs://` URI scheme.
+Spark can also access Ozone through the S3 Gateway using the `s3a://` protocol, which is useful for porting existing cloud-native Spark applications to Ozone without changing application code.
+
+The older `o3fs://` scheme is supported for legacy compatibility but is not recommended for new deployments.
 
 Key benefits include:
 
@@ -39,7 +42,6 @@ While Spark often picks up settings from `core-site.xml` on the classpath, expli
 
 ```properties
 spark.hadoop.fs.ofs.impl=org.apache.hadoop.fs.ozone.RootedOzoneFileSystem
-spark.hadoop.fs.o3fs.impl=org.apache.hadoop.fs.ozone.OzoneFileSystem
 ```
 
 ### 3. Security (Kerberos)
@@ -195,3 +197,8 @@ Where core-site.xml contains at minimum:
 ```
 
 Replace `YOUR_KUBERNETES_API_SERVER`, `YOUR_REPO`, and `YOUR_NAMESPACE` with your environment values.
+
+## Using the S3A Protocol
+
+Spark can also access Ozone through the S3 Gateway using the `s3a://` protocol. This is useful for porting existing cloud-native Spark applications to Ozone without changing application code.
+For configuration details, refer to the [S3A documentation](../01-client-interfaces/04-s3a.md).

From 091a3ef728d96477bec6dde1758d099fafd2d2b7 Mon Sep 17 00:00:00 2001
From: Jason O'Sullivan <jason.osullivan@cloudera.com>
Date: Wed, 4 Mar 2026 12:52:35 +0000
Subject: [PATCH 4/5] HDDS-14303. updating spark3 user guide

---
 .../04-user-guide/02-integrations/06-spark.md | 17 +++++++--------
 .../04-user-guide/03-integrations/06-spark.md | 21 +++++++++----------
 2 files changed, 18 insertions(+), 20 deletions(-)

diff --git a/docs/04-user-guide/02-integrations/06-spark.md b/docs/04-user-guide/02-integrations/06-spark.md
index 765835577d..8791213f17 100644
--- a/docs/04-user-guide/02-integrations/06-spark.md
+++ b/docs/04-user-guide/02-integrations/06-spark.md
@@ -26,7 +26,7 @@ Key benefits include:
 ## Prerequisites
 
 1. **Ozone Cluster:** A running Ozone cluster.
-2. **Ozone Client JARs:** The `ozone-filesystem-hadoop3.jar` must be available on the Spark driver and executor classpath.
+2. **Ozone Client JARs:** The `ozone-filesystem-hadoop3-client-*.jar` must be available on the Spark driver and executor classpath.
 3. **Hadoop 3.4.x runtime (Ozone 2.1.0+):** Ozone 2.1.0 removed bundled copies of several Hadoop classes (`LeaseRecoverable`, `SafeMode`, `SafeModeAction`) and now requires them from the runtime classpath ([HDDS-13574](https://issues.apache.org/jira/browse/HDDS-13574)). Since Spark 3.5.x ships with Hadoop 3.3.4, you must add `hadoop-common-3.4.x.jar` to the Spark classpath alongside the existing Hadoop JARs.
 4. **Configuration:** Spark needs access to Ozone configuration (`core-site.xml` and potentially `ozone-site.xml`) to connect to the Ozone cluster.
 
@@ -46,7 +46,9 @@ spark.hadoop.fs.ofs.impl=org.apache.hadoop.fs.ozone.RootedOzoneFileSystem
 
 ### 3. Security (Kerberos)
 
-If your Ozone and Spark clusters are Kerberos-enabled, Spark needs permission to obtain delegation tokens for Ozone. Configure the following property in `spark-defaults.conf`or via`--conf`, specifying your Ozone filesystem URI:
+If your Ozone and Spark clusters are Kerberos-enabled, Spark needs permission to obtain delegation tokens for Ozone.
+
+Configure the following property in `spark-defaults.conf` or via `--conf`, specifying your Ozone filesystem URI:
 
 ```properties
 # For YARN deployments in spark3+
@@ -59,7 +61,7 @@ Replace `ozone1` with your OM Service ID. Ensure the user running the Spark job
 
 You can read and write data using `ofs://` URIs like any other Hadoop-compatible filesystem.
 
-**URI Format:** `ofs://<om-service-id>/<volume>/<bucket>/path/to/key>`
+**URI Format:** `ofs://<om-service-id>/<volume>/<bucket>/path/to/key`
 
 ### Reading Data (Scala)
 
@@ -171,10 +173,6 @@ Where core-site.xml contains at minimum:
     <name>fs.ofs.impl</name>
     <value>org.apache.hadoop.fs.ozone.RootedOzoneFileSystem</value>
   </property>
-  <property>
-    <name>fs.o3fs.impl</name>
-    <value>org.apache.hadoop.fs.ozone.OzoneFileSystem</value>
-  </property>
   <property>
     <name>ozone.om.address</name>
     <value>om-host.example.com:9862</value>
@@ -182,7 +180,7 @@ Where core-site.xml contains at minimum:
 </configuration>
 ```
 
-### Submit `Spark-submit`
+### Submit a Spark Job
 
 ```bash
 ./bin/spark-submit \
@@ -193,7 +191,7 @@ Where core-site.xml contains at minimum:
   --conf spark.kubernetes.container.image=YOUR_REPO/spark-ozone:latest \
   --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
   --conf spark.kubernetes.namespace=YOUR_NAMESPACE \
-  local:///opt/spark/work-dir/ozone_example.py
+  local:///opt/spark/work-dir/ozone_write.py
 ```
 
 Replace `YOUR_KUBERNETES_API_SERVER`, `YOUR_REPO`, and `YOUR_NAMESPACE` with your environment values.
@@ -201,4 +199,5 @@ Replace `YOUR_KUBERNETES_API_SERVER`, `YOUR_REPO`, and `YOUR_NAMESPACE` with you
 ## Using the S3A Protocol
 
 Spark can also access Ozone through the S3 Gateway using the `s3a://` protocol. This is useful for porting existing cloud-native Spark applications to Ozone without changing application code.
+
 For configuration details, refer to the [S3A documentation](../01-client-interfaces/04-s3a.md).
diff --git a/versioned_docs/version-2.1.0/04-user-guide/03-integrations/06-spark.md b/versioned_docs/version-2.1.0/04-user-guide/03-integrations/06-spark.md
index 765835577d..9035ba092b 100644
--- a/versioned_docs/version-2.1.0/04-user-guide/03-integrations/06-spark.md
+++ b/versioned_docs/version-2.1.0/04-user-guide/03-integrations/06-spark.md
@@ -26,7 +26,7 @@ Key benefits include:
 ## Prerequisites
 
 1. **Ozone Cluster:** A running Ozone cluster.
-2. **Ozone Client JARs:** The `ozone-filesystem-hadoop3.jar` must be available on the Spark driver and executor classpath.
+2. **Ozone Client JARs:** The `ozone-filesystem-hadoop3-client-*.jar` must be available on the Spark driver and executor classpath.
 3. **Hadoop 3.4.x runtime (Ozone 2.1.0+):** Ozone 2.1.0 removed bundled copies of several Hadoop classes (`LeaseRecoverable`, `SafeMode`, `SafeModeAction`) and now requires them from the runtime classpath ([HDDS-13574](https://issues.apache.org/jira/browse/HDDS-13574)). Since Spark 3.5.x ships with Hadoop 3.3.4, you must add `hadoop-common-3.4.x.jar` to the Spark classpath alongside the existing Hadoop JARs.
 4. **Configuration:** Spark needs access to Ozone configuration (`core-site.xml` and potentially `ozone-site.xml`) to connect to the Ozone cluster.
 
@@ -34,7 +34,7 @@ Key benefits include:
 
 ### 1. Core Site (`core-site.xml`)
 
-For `core-site.xml` configuration, refer to the [Ozone File System (ofs) Configuration section](../01-client-interfaces/02-ofs.md#configuration).
+For `core-site.xml` configuration, refer to the [Ozone File System (ofs) Configuration section](../client-interfaces/ofs#configuration).
 
 ### 2. Spark Configuration (`spark-defaults.conf` or `--conf`)
 
@@ -46,7 +46,9 @@ spark.hadoop.fs.ofs.impl=org.apache.hadoop.fs.ozone.RootedOzoneFileSystem
 
 ### 3. Security (Kerberos)
 
-If your Ozone and Spark clusters are Kerberos-enabled, Spark needs permission to obtain delegation tokens for Ozone. Configure the following property in `spark-defaults.conf`or via`--conf`, specifying your Ozone filesystem URI:
+If your Ozone and Spark clusters are Kerberos-enabled, Spark needs permission to obtain delegation tokens for Ozone.
+
+Configure the following property in `spark-defaults.conf` or via `--conf`, specifying your Ozone filesystem URI:
 
 ```properties
 # For YARN deployments in spark3+
@@ -59,7 +61,7 @@ Replace `ozone1` with your OM Service ID. Ensure the user running the Spark job
 
 You can read and write data using `ofs://` URIs like any other Hadoop-compatible filesystem.
 
-**URI Format:** `ofs://<om-service-id>/<volume>/<bucket>/path/to/key>`
+**URI Format:** `ofs://<om-service-id>/<volume>/<bucket>/path/to/key`
 
 ### Reading Data (Scala)
 
@@ -171,10 +173,6 @@ Where core-site.xml contains at minimum:
     <name>fs.ofs.impl</name>
     <value>org.apache.hadoop.fs.ozone.RootedOzoneFileSystem</value>
   </property>
-  <property>
-    <name>fs.o3fs.impl</name>
-    <value>org.apache.hadoop.fs.ozone.OzoneFileSystem</value>
-  </property>
   <property>
     <name>ozone.om.address</name>
     <value>om-host.example.com:9862</value>
@@ -182,7 +180,7 @@ Where core-site.xml contains at minimum:
 </configuration>
 ```
 
-### Submit `Spark-submit`
+### Submit a Spark Job
 
 ```bash
 ./bin/spark-submit \
@@ -193,7 +191,7 @@ Where core-site.xml contains at minimum:
   --conf spark.kubernetes.container.image=YOUR_REPO/spark-ozone:latest \
   --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
   --conf spark.kubernetes.namespace=YOUR_NAMESPACE \
-  local:///opt/spark/work-dir/ozone_example.py
+  local:///opt/spark/work-dir/ozone_write.py
 ```
 
 Replace `YOUR_KUBERNETES_API_SERVER`, `YOUR_REPO`, and `YOUR_NAMESPACE` with your environment values.
@@ -201,4 +199,5 @@ Replace `YOUR_KUBERNETES_API_SERVER`, `YOUR_REPO`, and `YOUR_NAMESPACE` with you
 ## Using the S3A Protocol
 
 Spark can also access Ozone through the S3 Gateway using the `s3a://` protocol. This is useful for porting existing cloud-native Spark applications to Ozone without changing application code.
-For configuration details, refer to the [S3A documentation](../01-client-interfaces/04-s3a.md).
+
+For configuration details, refer to the [S3A documentation](../client-interfaces/s3a).

From 881abf935ca91d65dd16585e2821078a42cad7ab Mon Sep 17 00:00:00 2001
From: Jason O'Sullivan <jason.osullivan@cloudera.com>
Date: Wed, 4 Mar 2026 14:09:56 +0000
Subject: [PATCH 5/5] HDDS-14303. updating spark3 user guide

---
 docs/04-user-guide/02-integrations/06-spark.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/04-user-guide/02-integrations/06-spark.md b/docs/04-user-guide/02-integrations/06-spark.md
index 8791213f17..9035ba092b 100644
--- a/docs/04-user-guide/02-integrations/06-spark.md
+++ b/docs/04-user-guide/02-integrations/06-spark.md
@@ -34,7 +34,7 @@ Key benefits include:
 
 ### 1. Core Site (`core-site.xml`)
 
-For `core-site.xml` configuration, refer to the [Ozone File System (ofs) Configuration section](../01-client-interfaces/02-ofs.md#configuration).
+For `core-site.xml` configuration, refer to the [Ozone File System (ofs) Configuration section](../client-interfaces/ofs#configuration).
 
 ### 2. Spark Configuration (`spark-defaults.conf` or `--conf`)
 
@@ -200,4 +200,4 @@ Replace `YOUR_KUBERNETES_API_SERVER`, `YOUR_REPO`, and `YOUR_NAMESPACE` with you
 
 Spark can also access Ozone through the S3 Gateway using the `s3a://` protocol. This is useful for porting existing cloud-native Spark applications to Ozone without changing application code.
 
-For configuration details, refer to the [S3A documentation](../01-client-interfaces/04-s3a.md).
+For configuration details, refer to the [S3A documentation](../client-interfaces/s3a).