From bf34ede6d5dbe6a1bc62b101458b89a0ffe3e73f Mon Sep 17 00:00:00 2001
From: Abhishek Pal <pal.abhishek03012001@gmail.com>
Date: Thu, 19 Feb 2026 22:28:41 +0530
Subject: [PATCH 01/12] HDDS-10611. Design document for MPU GC Optimization

---
 .../content/design/mpu-gc-optimization.md     | 344 ++++++++++++++++++
 1 file changed, 344 insertions(+)
 create mode 100644 hadoop-hdds/docs/content/design/mpu-gc-optimization.md
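
Note on the WAL estimate in section 1(b) of the added design doc: the figures can be reproduced with the minimal sketch below. It assumes each serialized part entry is 1.5 KB (1536 bytes, the doc's own estimate) and ignores the small per-commit metadata row that the split layout still rewrites. The class and method names are hypothetical and are not part of Ozone.

```java
// Hypothetical helper for checking the doc's WAL arithmetic; not Ozone code.
final class WalEstimate {
    // Assumed size of one serialized part entry (~1.5 KB, per the design doc).
    static final long PART_BYTES = 1536L;

    /** Inline layout: commit k rewrites a value that holds k parts. */
    static long inlineWalBytes(int parts, long partBytes) {
        long entriesWritten = (long) parts * (parts + 1) / 2; // 1 + 2 + ... + parts
        return entriesWritten * partBytes;
    }

    /** Split layout: each commit writes exactly one part row. */
    static long splitWalBytes(int parts, long partBytes) {
        return (long) parts * partBytes;
    }

    public static void main(String[] args) {
        double mib = 1024.0 * 1024.0;
        System.out.printf("inline: %.1f MiB%n", inlineWalBytes(1000, PART_BYTES) / mib);
        System.out.printf("split:  %.2f MiB%n", splitWalBytes(1000, PART_BYTES) / mib);
    }
}
```

For 1,000 parts the inline layout writes 500,500 part entries in total, which is where the roughly 733 MB figure comes from; the split layout writes 1,000.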

diff --git a/hadoop-hdds/docs/content/design/mpu-gc-optimization.md b/hadoop-hdds/docs/content/design/mpu-gc-optimization.md
new file mode 100644
index 000000000000..f22d218cc255
--- /dev/null
+++ b/hadoop-hdds/docs/content/design/mpu-gc-optimization.md
@@ -0,0 +1,344 @@
+---
+title: Multipart Upload GC Pressure Optimizations
+summary: Change Multipart Upload logic to reduce OM GC pressure
+date: 2026-02-19
+jira: HDDS-10611
+status: implemented
+author: Abhishek Pal, Rakesh Radhakrishnan
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Ozone MPU Optimization - Design Doc
+
+
+## Table of Contents
+1. [Motivation](#1-motivation)
+2. [Proposal](#2-proposal)
+* [Backward Compatibility](#backward-compatibility)
+* [Split-table design (V1)](#split-table-design-v1)
+* [Comparison: V0 (legacy) vs V1](#comparison-v0-legacy-vs-v1)
+* [2.1 Approach-1: Reuse multipartInfoTable with empty part list](#21-approach-1--reuse-multipartinfotable-with-empty-part-list)
+* [2.2 Approach-2: Introduce new multipartMetadataTable](#22-approach-2--introduce-new-multipartmetadatatable)
+* [2.3 Summary](#23-summary)
+* [2.4 Chosen Approach: Approach-1](#24-chosen-approach-approach-1)
+3. [Upgrades](#3-upgrades)
+4. [Benchmarking and Performance](#4-benchmarking-and-performance)
+5. [Open Questions](#5-open-questions)
+
+---
+
+## 1. Motivation
+Presently Ozone has several overheads when uploading large files via Multipart upload (MPU). This document presents a detailed design for optimizing the MPU storage layout to reduce these overheads.
+
+### Problem with the current MPU schema
+**Current design:**
+* One row per MPU: `key = /{vol}/{bucket}/{key}/{uploadId}`
+* Value = full `OmMultipartKeyInfo` with all parts inline.
+
+**Implications:**
+1. Each MPU part commit reads the full `OmMultipartKeyInfo`, deserializes it, adds one part, serializes it, and writes it back (HDDS-10611).
+2. RocksDB WAL logs each full write → WAL growth (HDDS-8238).
+3. GC pressure grows with the size of the object (HDDS-10611).
+
+#### a) Deserialization overhead
+| Operation | Current |
+| :--- | :--- |
+| Commit part N | Read + deserialize whole OmMultipartKeyInfo (N-1 parts) |
+
+#### b) WAL overhead
+Assuming one MPU part info object takes ~1.5KB.
+
+| Scenario | Current WAL |
+| :--- | :--- |
+| 1,000 parts | ~733 MB (1+2+...+1000) × 1.5 KB |
+
+#### c) GC pressure
+Current: every part commit allocates large, short-lived objects, because the full part list is deserialized and re-serialized each time.
+
+#### Existing Storage Layout Overview
+```text
+MultipartKeyInfo {
+  uploadID : string
+  creationTime : uint64
+  type : ReplicationType
+  factor : ReplicationFactor (optional)
+  partKeyInfoList : repeated PartKeyInfo ← grows with each part
+  objectID : uint64 (optional)
+  updateID : uint64 (optional)
+  parentID : uint64 (optional)
+  ecReplicationConfig : optional
+}
+```
+
+---
+
+## 2. Proposal
+The idea is to split the content of `MultipartInfoTable`. Part information will be stored separately in a flattened schema (one row per part) instead of one giant object.
+
+### Split-table design (V1)
+Split MPU metadata into:
+* **Metadata table:** Lightweight per-MPU metadata (no part list).
+* **Parts table:** One row per part (flat structure).
+
+**New MultipartPartInfo Structure:**
+```protobuf
+message MultipartPartInfo {
+  required string partName = 1;
+  required uint32 partNumber = 2;
+  required string volumeName = 3;
+  required string bucketName = 4;
+  required string keyName = 5;
+  required uint64 dataSize = 6;
+  required uint64 modificationTime = 7;
+  repeated KeyLocationList keyLocationList = 8;
+  repeated hadoop.hdds.KeyValue metadata = 9;
+  optional FileEncryptionInfoProto fileEncryptionInfo = 10;
+  optional FileChecksumProto fileChecksum = 11;
+}
+```
+
+### Comparison: V0 (legacy) vs V1
+| Metric | Current (V0) | Split-Table (V1) |
+| :--- | :--- | :--- |
+| **Commit part N** | Read + deserialize whole list | Read Metadata (~200B) + write single PartKeyInfo |
+| **1,000 parts WAL** | ~733 MB | ~1.5 MB (or ~600KB with optimized info) |
+| **GC Pressure** | Large short-lived objects | Small metadata + single-part objects |
+
+---
+
+### 2.1. Approach-1 : Reuse multipartInfoTable with empty part list
+Reuse the existing table but introduce a new `multipartPartsTable`.
+
+**Storage Layout:**
+* **multipartInfoTable (RocksDB):**
+  * V0: Key → `OmMultipartKeyInfo` { parts inline }
+  * V1: Key → `OmMultipartKeyInfo` { empty list, schemaVersion: 1 }
+* **multipartPartsTable (RocksDB) [V1 only]:**
+  * `/uploadId/part1` → `PartKeyInfo`
+  * `/uploadId/part2` → `PartKeyInfo`
+  * `/uploadId/part3` → `PartKeyInfo`
+
+
+```protobuf
+message MultipartKeyInfo {
+    required string uploadID = 1;
+    required uint64 creationTime = 2;
+    required hadoop.hdds.ReplicationType type = 3;
+    optional hadoop.hdds.ReplicationFactor factor = 4;
+    repeated PartKeyInfo partKeyInfoList = 5;
+    optional uint64 objectID = 6;
+    optional uint64 updateID = 7;
+    optional uint64 parentID = 8;
+    optional hadoop.hdds.ECReplicationConfig ecReplicationConfig = 9;
+    optional uint32 schemaVersion = 10; // default 0
+}
+```
+
+#### V0: OmMultipartKeyInfo (parts inline)
+```
+OmMultipartKeyInfo {
+  uploadID
+  creationTime
+  type
+  factor
+  partKeyInfoList: [ PartKeyInfo, PartKeyInfo, ... ]   ← all parts inline
+  objectID
+  updateID
+  parentID
+  schemaVersion: 0 (or absent)
+}
+```
+##### V1: OmMultipartKeyInfo (empty list + schemaVersion)
+```
+OmMultipartKeyInfo {
+  uploadID
+  creationTime
+  type
+  factor
+  partKeyInfoList: []   ← empty
+  objectID
+  updateID
+  parentID
+  schemaVersion: 1
+}
+```
+
+#### Example (for a 10 part MPU)
+
+---
+#### MultipartInfoTable :
+```
+Key:   `/vol1/bucket1/mp_file1/abc123-uuid-456`
+
+Value:
+OmMultipartKeyInfo {
+  uploadID: "abc123-uuid-456"
+  creationTime: 1738742400000
+  type: RATIS
+  factor: THREE
+  partKeyInfoList: []
+  objectID: 1001
+  updateID: 12345
+  parentID: 0
+  schemaVersion: 1
+}
+```
+
+#### MultipartPartsTable – 10 rows:
+
+```text
+Key:   /vol1/bucket1/mp_file1/abc123-uuid-456/part1
+Value: PartKeyInfo { partName: ".../part1", partNumber: 1, partKeyInfo: KeyInfo{blocks, size,...} }
+
+Key:   /vol1/bucket1/mp_file1/abc123-uuid-456/part2
+Value: PartKeyInfo { partName: ".../part2", partNumber: 2, partKeyInfo: KeyInfo{...} }
+...
+...
+Key:   /vol1/bucket1/mp_file1/abc123-uuid-456/part10
+Value: PartKeyInfo { partName: ".../part10", partNumber: 10, partKeyInfo: KeyInfo{...} }
+```
+
+### 2.2. Approach-2 : Introduce new multipartMetadataTable
+
+Split metadata and introduce two new tables:
+- **multipartMetadataTable**: lightweight per-MPU metadata (no part list).
+- **multipartPartsTable**: one row per part (no aggregation).
+
+Below is the new metadata table info object structure:
+```protobuf
+message MultipartMetadataInfo {
+  required string uploadID = 1;
+  required uint64 creationTime = 2;
+  required hadoop.hdds.ReplicationType type = 3;
+  optional hadoop.hdds.ReplicationFactor factor = 4;
+  optional hadoop.hdds.ECReplicationConfig ecReplicationConfig = 5;
+  optional uint64 objectID = 6;
+  optional uint64 updateID = 7;
+  optional uint64 parentID = 8;
+  optional uint32 schemaVersion = 9; // default 0
+}
+```
+
+#### Storage Layout Overview
+
+* **multipartInfoTable (RocksDB):**
+  * V0: Key → `OmMultipartKeyInfo` { parts inline }
+  * V1: Key → `OmMultipartKeyInfo` { empty list, schemaVersion: 1 }
+* **multipartPartsTable (RocksDB) [V1 only]:**
+  * `/uploadId/part1` → `PartKeyInfo`
+  * `/uploadId/part2` → `PartKeyInfo`
+  * `/uploadId/part3` → `PartKeyInfo`
+
+* **multipartInfoTable (RocksDB):**
+  * V0: `/vol/bucket/key/uploadId` → `OmMultipartKeyInfo { partKeyInfoList: [...] }`
+
+
+* **multipartMetadataTable (RocksDB)**
+  * V1: `/vol/bucket/key/uploadId` → `MultipartMetadata { schemaVersion: 1 }`
+
+
+* **multipartPartsTable (RocksDB) [v1 only]**
+  * `/vol/bucket/key/uploadId/part1`  → `PartKeyInfo` 
+  * `/vol/bucket/key/uploadId/part2`  → PartKeyInfo 
+  * `/vol/bucket/key/uploadId/part3`  → PartKeyInfo 
+  * `...`
+
+#### multipartMetadataInfo Table – 1 row
+**V1: OmMultipartMetadataInfo (metadata only)**
+```text
+OmMultipartMetadataInfo {
+  uploadID
+  creationTime
+  type (ReplicationType)
+  factor (ReplicationFactor)
+  objectID
+  updateID
+  parentID
+  ecReplicationConfig
+  schemaVersion: 1
+}
+```
+
+```protobuf
+message MultipartMetadata {
+  required string uploadID = 1;
+  required uint64 creationTime = 2;
+  required hadoop.hdds.ReplicationType type = 3;
+  optional hadoop.hdds.ReplicationFactor factor = 4;
+  optional uint64 objectID = 5;
+  optional uint64 updateID = 6;
+  optional uint64 parentID = 7;
+  optional hadoop.hdds.ECReplicationConfig ecReplicationConfig = 8;
+  optional uint32 schemaVersion = 9;
+  // NO partKeyInfoList - moved to new table
+}
+```
+
+#### Example:
+
+---
+```
+Key: /vol1/bucket1/mp_file1/abc123-uuid-456
+
+Value:
+MultipartMetadata {
+  uploadID: "abc123-uuid-456"
+  creationTime: 1738742400000
+  type: RATIS
+  factor: THREE
+  objectID: 1001
+  updateID: 12345
+  parentID: 0
+  schemaVersion: 1
+}
+```
+
+---
+
+### 2.3. Summary
+* **Approach-1:** Minimal change, same value type, uses `schemaVersion` flag.
+* **Approach-2:** Dedicated table, cleanest separation, requires new lookup logic.
+
+----
+### 2.4. Chosen Approach: Approach-1
+We have chosen **Approach-1: Reuse multipartInfoTable with empty part list**
+as the preferred implementation for MPU optimization (V1).
+
+This approach is favored because it introduces minimal changes to the existing `OmMultipartKeyInfo` protobuf structure.
+<br>
+By simply introducing an optional `schemaVersion` field and ensuring the partKeyInfoList is empty for V1 entries,
+we maintain backward compatibility.
+
+The key advantages are:
+* **Minimal Protobuf Change**: Older clients and processes can still read the multipartInfoTable entries without issue,
+    as the **core structure remains the same**.
+* **Compatibility**: Older uploads (V0) **remain fully functional**, and new uploads (V1) can be distinguished by
+    the schemaVersion. This significantly reduces the risk of breaking existing functionality.
+* **Simplicity**: The transition logic between V0 and V1 is straightforward, primarily checking the
+    `schemaVersion` field upon read.
+---
+
+## 3. Upgrades
+Add a new feature in `OMLayoutFeature`:
+```java
+MPU_PARTS_TABLE_SPLIT(10, "Split multipart table into separate table for parts and key");
+```
+`schemaVersion` is set to `1` only when the upgrade is finalized.
+
+Each of the four S3MultipartUpload request types must check that the split-table layout feature
+is allowed before setting `schemaVersion` to `1`.
+<br>
+This prevents split-table writes on a cluster that does not yet support them,
+especially when a client tries to create a new MPU key before finalization.

From c4416bb99d64c02579468ee3e6a0eb39d6b007b7 Mon Sep 17 00:00:00 2001
From: Abhishek Pal <pal.abhishek03012001@gmail.com>
Date: Fri, 20 Feb 2026 19:32:08 +0530
Subject: [PATCH 02/12] Address review comments

---
 .../content/design/mpu-gc-optimization.md     | 44 ++++++++-----------
 1 file changed, 18 insertions(+), 26 deletions(-)
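
Note on the `part1` to `part00001` change in this patch: the zero padding is presumably there so that string keys sort in part-number order under RocksDB's default bytewise comparator. The sketch below illustrates that assumption. The helper name and the five-digit width are taken from the examples in the doc, not from Ozone code.

```java
import java.util.Locale;

// Hypothetical illustration of zero-padded part keys; not Ozone code.
final class PaddedPartKeySketch {
    /** Appends a zero-padded suffix such as "/part00002" to the MPU key. */
    static String partKey(String multipartKey, int partNumber) {
        return String.format(Locale.ROOT, "%s/part%05d", multipartKey, partNumber);
    }

    public static void main(String[] args) {
        String mpu = "/vol1/bucket1/mp_file1/abc123-uuid-456";
        // Unpadded suffixes compare character by character: "part10" vs "part2".
        System.out.println("part10".compareTo("part2") < 0);
        // Padded suffixes compare in the same order as the part numbers.
        System.out.println(partKey(mpu, 2).compareTo(partKey(mpu, 10)) < 0);
    }
}
```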

diff --git a/hadoop-hdds/docs/content/design/mpu-gc-optimization.md b/hadoop-hdds/docs/content/design/mpu-gc-optimization.md
index f22d218cc255..810ba5fb8146 100644
--- a/hadoop-hdds/docs/content/design/mpu-gc-optimization.md
+++ b/hadoop-hdds/docs/content/design/mpu-gc-optimization.md
@@ -53,15 +53,15 @@ Presently Ozone has several overheads when uploading large files via Multipart u
 3. GC pressure grows with the size of the object (HDDS-10611).
 
 #### a) Deserialization overhead
-| Operation | Current |
-| :--- | :--- |
+| Operation     | Current                                                 |
+|:--------------|:--------------------------------------------------------|
 | Commit part N | Read + deserialize whole OmMultipartKeyInfo (N-1 parts) |
 
 #### b) WAL overhead
 Assuming one MPU part info object takes ~1.5KB.
 
-| Scenario | Current WAL |
-| :--- | :--- |
+| Scenario    | Current WAL                     |
+|:------------|:--------------------------------|
 | 1,000 parts | ~733 MB (1+2+...+1000) × 1.5 KB |
 
 #### c) GC pressure
@@ -110,11 +110,11 @@ message MultipartPartInfo {
 ```
 
 ### Comparison: V0 (legacy) vs V1
-| Metric | Current (V0) | Split-Table (V1) |
-| :--- | :--- | :--- |
-| **Commit part N** | Read + deserialize whole list | Read Metadata (~200B) + write single PartKeyInfo |
-| **1,000 parts WAL** | ~733 MB | ~1.5 MB (or ~600KB with optimized info) |
-| **GC Pressure** | Large short-lived objects | Small metadata + single-part objects |
+| Metric              | Current (V0)                  | Split-Table (V1)                                 |
+|:--------------------|:------------------------------|:-------------------------------------------------|
+| **Commit part N**   | Read + deserialize whole list | Read Metadata (~200B) + write single PartKeyInfo |
+| **1,000 parts WAL** | ~733 MB                       | ~1.5 MB (or ~600KB with optimized info)          |
+| **GC Pressure**     | Large short-lived objects     | Small metadata + single-part objects             |
 
 ---
 
@@ -126,9 +126,9 @@ Reuse the existing table but introduce a new `multipartPartsTable`.
   * V0: Key → `OmMultipartKeyInfo` { parts inline }
   * V1: Key → `OmMultipartKeyInfo` { empty list, schemaVersion: 1 }
 * **multipartPartsTable (RocksDB) [V1 only]:**
-  * `/uploadId/part1` → `PartKeyInfo`
-  * `/uploadId/part2` → `PartKeyInfo`
-  * `/uploadId/part3` → `PartKeyInfo`
+  * `/uploadId/part00001` → `PartKeyInfo`
+  * `/uploadId/part00002` → `PartKeyInfo`
+  * `/uploadId/part00003` → `PartKeyInfo`
 
 
 ```protobuf
@@ -199,14 +199,14 @@ OmMultipartKeyInfo {
 #### MultipartPartsTable – 10 rows:
 
 ```text
-Key:   /vol1/bucket1/mp_file1/abc123-uuid-456/part1
+Key:   /vol1/bucket1/mp_file1/abc123-uuid-456/part00001
 Value: PartKeyInfo { partName: ".../part1", partNumber: 1, partKeyInfo: KeyInfo{blocks, size,...} }
 
-Key:   /vol1/bucket1/mp_file1/abc123-uuid-456/part2
+Key:   /vol1/bucket1/mp_file1/abc123-uuid-456/part00002
 Value: PartKeyInfo { partName: ".../part2", partNumber: 2, partKeyInfo: KeyInfo{...} }
 ...
 ...
-Key:   /vol1/bucket1/mp_file1/abc123-uuid-456/part10
+Key:   /vol1/bucket1/mp_file1/abc123-uuid-456/part00010
 Value: PartKeyInfo { partName: ".../part10", partNumber: 10, partKeyInfo: KeyInfo{...} }
 ```
 
@@ -233,14 +233,6 @@ message MultipartMetadataInfo {
 
 #### Storage Layout Overview
 
-* **multipartInfoTable (RocksDB):**
-  * V0: Key → `OmMultipartKeyInfo` { parts inline }
-  * V1: Key → `OmMultipartKeyInfo` { empty list, schemaVersion: 1 }
-* **multipartPartsTable (RocksDB) [V1 only]:**
-  * `/uploadId/part1` → `PartKeyInfo`
-  * `/uploadId/part2` → `PartKeyInfo`
-  * `/uploadId/part3` → `PartKeyInfo`
-
 * **multipartInfoTable (RocksDB):**
   * V0: `/vol/bucket/key/uploadId` → `OmMultipartKeyInfo { partKeyInfoList: [...] }`
 
@@ -250,9 +242,9 @@ message MultipartMetadataInfo {
 
 
 * **multipartPartsTable (RocksDB) [v1 only]**
-  * `/vol/bucket/key/uploadId/part1`  → `PartKeyInfo` 
-  * `/vol/bucket/key/uploadId/part2`  → PartKeyInfo 
-  * `/vol/bucket/key/uploadId/part3`  → PartKeyInfo 
+  * `/vol/bucket/key/uploadId/part00001`  → `PartKeyInfo` 
+  * `/vol/bucket/key/uploadId/part00002`  → `PartKeyInfo` 
+  * `/vol/bucket/key/uploadId/part00003`  → `PartKeyInfo`
   * `...`
 
 #### multipartMetadataInfo Table – 1 row

From 28f8ac6afd6f87b435afd2ebcd4e70ffd3199fc7 Mon Sep 17 00:00:00 2001
From: Abhishek Pal <pal.abhishek03012001@gmail.com>
Date: Sun, 22 Feb 2026 12:35:24 +0530
Subject: [PATCH 03/12] Update code flow logic details, fix sections, add
 industry practice section

---
 .../content/design/mpu-gc-optimization.md     | 525 ++++++++++++++----
 1 file changed, 419 insertions(+), 106 deletions(-)
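
Note on the `multipartPartsTable` key codec introduced in section 2.1.1 by this patch: the byte layout (`uploadId` as UTF-8, a `0x00` separator, then `partNumber` as a 4-byte big-endian int) can be sketched as below. This is an illustration of the layout as described in the doc, not the actual `OmMultipartPartKey` codec. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of the documented key layout; not the Ozone codec.
final class PartKeyCodecSketch {
    /** uploadId(UTF-8) + 0x00: the shared prefix of every part row of one upload. */
    static byte[] prefix(String uploadId) {
        byte[] id = uploadId.getBytes(StandardCharsets.UTF_8);
        return Arrays.copyOf(id, id.length + 1); // the extra trailing byte is 0x00
    }

    /** prefix + partNumber encoded as a 4-byte big-endian int. */
    static byte[] encode(String uploadId, int partNumber) {
        byte[] p = prefix(uploadId);
        return ByteBuffer.allocate(p.length + Integer.BYTES) // big-endian by default
            .put(p)
            .putInt(partNumber)
            .array();
    }

    /** Renders bytes as space-separated lowercase hex for inspection. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(encode("abc123-uuid-456", 2)));
    }
}
```

Because non-negative big-endian integers compare bytewise in numeric order, a prefix scan over this layout should return the parts of one upload in part-number order without string padding.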

diff --git a/hadoop-hdds/docs/content/design/mpu-gc-optimization.md b/hadoop-hdds/docs/content/design/mpu-gc-optimization.md
index 810ba5fb8146..9f3db8955c6e 100644
--- a/hadoop-hdds/docs/content/design/mpu-gc-optimization.md
+++ b/hadoop-hdds/docs/content/design/mpu-gc-optimization.md
@@ -26,17 +26,13 @@ author: Abhishek Pal, Rakesh Radhakrishnan
 ## Table of Contents
 1. [Motivation](#1-motivation)
 2. [Proposal](#2-proposal)
-* [Backward Compatibility](#backward-compatibility)
-* [Split-table design (V1)](#split-table-design-v1)
-* [Comparison: V0 (legacy) vs V1](#comparison-v0-legacy-vs-v1)
-* [2.1 Approach-1: Reuse multipartInfoTable with empty part list](#21-approach-1--reuse-multipartinfotable-with-empty-part-list)
-* [2.2 Approach-2: Introduce new multipartMetadataTable](#22-approach-2--introduce-new-multipartmetadatatable)
-* [2.3 Summary](#23-summary)
-* [2.4 Chosen Approach: Approach-1](#24-chosen-approach-approach-1)
+  * [Split-table design (V2)](#split-table-design-v2)
+  * [Comparison: V1 (legacy) vs V2](#comparison-v1-legacy-vs-v2)
+  * [2.1 Data Layout Changes](#21-data-layout-changes)
+  * [2.2 MPU Flow Changes](#22-mpu-flow-changes)
+  * [2.3 Summary and Trade-offs](#23-summary-and-trade-offs)
 3. [Upgrades](#3-upgrades)
-4. [Benchmarking and Performance](#4-benchmarking-and-performance)
-5. [Open Questions](#5-open-questions)
-
+4. [Industry Patterns](#4-industry-patterns-flattened-keys-in-lsmrocksdb-systems)
 ---
 
 ## 1. Motivation
@@ -87,7 +83,7 @@ MultipartKeyInfo {
 ## 2. Proposal
 The idea is to split the content of `MultipartInfoTable`. Part information will be stored separately in a flattened schema (one row per part) instead of one giant object.
 
-### Split-table design (V1)
+### Split-table design (V2)
 Split MPU metadata into:
 * **Metadata table:** Lightweight per-MPU metadata (no part list).
 * **Parts table:** One row per part (flat structure).
@@ -109,8 +105,8 @@ message MultipartPartInfo {
 }
 ```
 
-### Comparison: V0 (legacy) vs V1
-| Metric              | Current (V0)                  | Split-Table (V1)                                 |
+### Comparison: V1 (legacy) vs V2
+| Metric              | Current (V1)                  | Split-Table (V2)                                 |
 |:--------------------|:------------------------------|:-------------------------------------------------|
 | **Commit part N**   | Read + deserialize whole list | Read Metadata (~200B) + write single PartKeyInfo |
 | **1,000 parts WAL** | ~733 MB                       | ~1.5 MB (or ~600KB with optimized info)          |
@@ -118,18 +114,30 @@ message MultipartPartInfo {
 
 ---
 
-### 2.1. Approach-1 : Reuse multipartInfoTable with empty part list
-Reuse the existing table but introduce a new `multipartPartsTable`.
+### 2.1 Data Layout Changes
 
-**Storage Layout:**
-* **multipartInfoTable (RocksDB):**
-  * V0: Key → `OmMultipartKeyInfo` { parts inline }
-  * V1: Key → `OmMultipartKeyInfo` { empty list, schemaVersion: 1 }
-* **multipartPartsTable (RocksDB) [V1 only]:**
-  * `/uploadId/part00001` → `PartKeyInfo`
-  * `/uploadId/part00002` → `PartKeyInfo`
-  * `/uploadId/part00003` → `PartKeyInfo`
+#### 2.1.1 Chosen Approach (Implemented): Reuse `multipartInfoTable` + add `multipartPartsTable`
 
+Keep `multipartInfoTable` for MPU metadata, and store part rows in `multipartPartsTable`.
+
+**Storage Layout:**
+* **`multipartInfoTable` (RocksDB):**
+  * V1: Key -> `OmMultipartKeyInfo` { parts inline }
+  * V2: Key -> `OmMultipartKeyInfo` { empty list, `schemaVersion: 1` }
+* **`multipartPartsTable` (RocksDB):**
+  * Key type: `OmMultipartPartKey(uploadId, partNumber)`
+  * Value type: `OmMultipartPartInfo`
+
+**`multipartPartsTable` key codec (V2):**
+* `OmMultipartPartKey` uses two logical fields:
+  * `uploadId` (`String`)
+  * `partNumber` (`int32`)
+* Persisted key bytes are encoded as:
+  * `uploadId(UTF-8 bytes)` + `0x00` + `partNumber(4-byte big-endian int)`
+* Prefix scan for all parts in one upload uses:
+  * `uploadId(UTF-8 bytes)` + `0x00`
+
+This replaces the previous string-padding style examples (like `part00001`) with typed key encoding.
 
 ```protobuf
 message MultipartKeyInfo {
@@ -146,28 +154,29 @@ message MultipartKeyInfo {
 }
 ```
 
-#### V0: OmMultipartKeyInfo (parts inline)
+##### V1: `OmMultipartKeyInfo` (parts inline)
 ```
 OmMultipartKeyInfo {
   uploadID
   creationTime
   type
   factor
-  partKeyInfoList: [ PartKeyInfo, PartKeyInfo, ... ]   ← all parts inline
+  partKeyInfoList: [ PartKeyInfo, PartKeyInfo, ... ]   <- all parts inline
   objectID
   updateID
   parentID
   schemaVersion: 0 (or absent)
 }
 ```
-##### V1: OmMultipartKeyInfo (empty list + schemaVersion)
+
+##### V2: `OmMultipartKeyInfo` (empty list + schemaVersion)
 ```
 OmMultipartKeyInfo {
   uploadID
   creationTime
   type
   factor
-  partKeyInfoList: []   ← empty
+  partKeyInfoList: []   <- empty
   objectID
   updateID
   parentID
@@ -175,12 +184,11 @@ OmMultipartKeyInfo {
 }
 ```
 
-#### Example (for a 10 part MPU)
+##### Example (for a 10-part MPU)
 
----
-#### MultipartInfoTable :
+`multipartInfoTable`:
 ```
-Key:   `/vol1/bucket1/mp_file1/abc123-uuid-456`
+Key:   /vol1/bucket1/mp_file1/abc123-uuid-456
 
 Value:
 OmMultipartKeyInfo {
@@ -196,27 +204,35 @@ OmMultipartKeyInfo {
 }
 ```
 
-#### MultipartPartsTable – 10 rows:
-
+`multipartPartsTable` (logical keys):
 ```text
-Key:   /vol1/bucket1/mp_file1/abc123-uuid-456/part00001
-Value: PartKeyInfo { partName: ".../part1", partNumber: 1, partKeyInfo: KeyInfo{blocks, size,...} }
+Key:   OmMultipartPartKey{uploadId="abc123-uuid-456", partNumber=1}
+Value: OmMultipartPartInfo{partNumber=1, partName=".../part1", ...}
 
-Key:   /vol1/bucket1/mp_file1/abc123-uuid-456/part00002
-Value: PartKeyInfo { partName: ".../part2", partNumber: 2, partKeyInfo: KeyInfo{...} }
-...
+Key:   OmMultipartPartKey{uploadId="abc123-uuid-456", partNumber=2}
+Value: OmMultipartPartInfo{partNumber=2, partName=".../part2", ...}
 ...
-Key:   /vol1/bucket1/mp_file1/abc123-uuid-456/part00010
-Value: PartKeyInfo { partName: ".../part10", partNumber: 10, partKeyInfo: KeyInfo{...} }
+Key:   OmMultipartPartKey{uploadId="abc123-uuid-456", partNumber=10}
+Value: OmMultipartPartInfo{partNumber=10, partName=".../part10", ...}
 ```
 
-### 2.2. Approach-2 : Introduce new multipartMetadataTable
+`multipartPartsTable` (encoded key bytes, illustrative):
+```text
+uploadId = "abc123-uuid-456"
+partNumber = 2
+
+encodedKey = [61 62 63 31 32 33 2d 75 75 69 64 2d 34 35 36 00 00 00 00 02]
+             [--------------uploadId UTF-8------------][00][--int32 BE--]
+```
+
+#### 2.1.2 Alternative Approach (Proposed): Add `multipartMetadataTable` + `multipartPartsTable`
 
 Split metadata and introduce two new tables:
-- **multipartMetadataTable**: lightweight per-MPU metadata (no part list).
-- **multipartPartsTable**: one row per part (no aggregation).
+* **`multipartMetadataTable`**: lightweight per-MPU metadata (no part list).
+* **`multipartPartsTable`**: one row per part (no aggregation).
+
+> Note: This approach is proposed and not implemented in current code.
 
-Below is the new metadata table info object structure:
 ```protobuf
 message MultipartMetadataInfo {
   required string uploadID = 1;
@@ -231,37 +247,14 @@ message MultipartMetadataInfo {
 }
 ```
 
-#### Storage Layout Overview
-
-* **multipartInfoTable (RocksDB):**
-  * V0: `/vol/bucket/key/uploadId` → `OmMultipartKeyInfo { partKeyInfoList: [...] }`
-
-
-* **multipartMetadataTable (RocksDB)**
-  * V1: `/vol/bucket/key/uploadId` → `MultipartMetadata { schemaVersion: 1 }`
-
-
-* **multipartPartsTable (RocksDB) [v1 only]**
-  * `/vol/bucket/key/uploadId/part00001`  → `PartKeyInfo` 
-  * `/vol/bucket/key/uploadId/part00002`  → `PartKeyInfo` 
-  * `/vol/bucket/key/uploadId/part00003`  → `PartKeyInfo`
-  * `...`
-
-#### multipartMetadataInfo Table – 1 row
-**V1: OmMultipartMetadataInfo (metadata only)**
-```text
-OmMultipartMetadataInfo {
-  uploadID
-  creationTime
-  type (ReplicationType)
-  factor (ReplicationFactor)
-  objectID
-  updateID
-  parentID
-  ecReplicationConfig
-  schemaVersion: 1
-}
-```
+**Storage Layout Overview:**
+* **`multipartInfoTable` (RocksDB):**
+  * V1: `/vol/bucket/key/uploadId` -> `OmMultipartKeyInfo { partKeyInfoList: [...] }`
+* **`multipartMetadataTable` (RocksDB):**
+  * V2: `/vol/bucket/key/uploadId` -> `MultipartMetadata { schemaVersion: 1 }`
+* **`multipartPartsTable` (RocksDB):**
+  * Key: `OmMultipartPartKey(uploadId, partNumber)`
+  * Value: `PartKeyInfo`-equivalent part payload
 
 ```protobuf
 message MultipartMetadata {
@@ -278,9 +271,7 @@ message MultipartMetadata {
 }
 ```
 
-#### Example:
-
----
+Example:
 ```
 Key: /vol1/bucket1/mp_file1/abc123-uuid-456
 
@@ -297,40 +288,362 @@ MultipartMetadata {
 }
 ```
 
----
+### 2.2 MPU Flow Changes
+
+#### 2.2.1 Chosen Approach Flow Changes
+
+##### Multipart Upload Initiate
+
+**Old Flow**
+* Create `multipartKey = /{vol}/{bucket}/{key}/{uploadId}`.
+* Build `OmMultipartKeyInfo` (schema default/legacy, inline `partKeyInfoList` model).
+* Write:
+  * `openKeyTable[multipartKey] = OmKeyInfo`
+  * `multipartInfoTable[multipartKey] = OmMultipartKeyInfo`
+
+Example:
+```text
+multipartInfoTable[/vol1/b1/fileA/upload-001] ->
+  OmMultipartKeyInfo{schemaVersion=0, partKeyInfoList=[]}
+openKeyTable[/vol1/b1/fileA/upload-001] ->
+  OmKeyInfo{key=fileA, objectID=9001}
+```
+
+**New Flow**
+* Same keys/tables as old flow, but initiate sets `schemaVersion` explicitly:
+  * `schemaVersion=1` when `OMLayoutFeature.MPU_PARTS_TABLE_SPLIT` is allowed.
+  * `schemaVersion=0` otherwise.
+* No part row is created at initiate time; part rows are created during commit-part.
+* FSO response path (`S3InitiateMultipartUploadResponseWithFSO`) still writes parent directory entries, then open-file + multipart-info rows.
+* Backward compatibility check overview: write path selection is schema-based and layout-gated (see [3.1 Backward compatibility and layout gating](#31-backward-compatibility-and-layout-gating)).
+
+Example:
+```text
+multipartInfoTable[/vol1/b1/fileA/upload-001] ->
+  OmMultipartKeyInfo{schemaVersion=1, partKeyInfoList=[]}
+```
+
+##### Multipart Upload Commit Part
+
+**Old Flow**
+* Read `multipartInfoTable[multipartKey]`.
+* Read current uploaded part blocks from `openKeyTable[getOpenKey(..., clientID)]`.
+* Upsert in inline map:
+  * `oldPart = multipartKeyInfo.getPartKeyInfo(partNumber)`
+  * `multipartKeyInfo.addPartKeyInfo(currentPart)`
+* Delete committed one-shot open key for this part.
+* Update quota based on overwrite delta.
+
+Example:
+```text
+Before: partKeyInfoList=[{part=1,size=64MB},{part=2,size=32MB}]
+Commit part 2 size=40MB
+After:  partKeyInfoList=[{part=1,size=64MB},{part=2,size=40MB}]
+```
+
+**New Flow**
+* Load `multipartKeyInfo` and validate layout gate:
+  * if split feature is not allowed and `schemaVersion != 0`, fail early.
+* Branch by schema:
+  * `schemaVersion=0`: same old inline behavior.
+  * `schemaVersion=1`:
+    * create `multipartPartKey = OmMultipartPartKey(uploadId, partNumber)`,
+    * write `multipartPartsTable[multipartPartKey] = OmMultipartPartInfo{openKey, partName, partNumber, size, metadata, locations}`,
+    * keep current part open key in `openKeyTable` (needed later by list/complete/abort),
+    * if overwriting an existing part row, delete old part open key and adjust quota.
+* `multipartInfoTable[multipartKey]` is still updated for metadata/updateID.
+* Backward compatibility check overview: schema decides row format, and split behavior is blocked before finalization (see [3.1 Backward compatibility and layout gating](#31-backward-compatibility-and-layout-gating)).
+
+Example (`schemaVersion=1`):
+```text
+multipartPartsTable[OmMultipartPartKey{uploadId="upload-001", partNumber=2}] ->
+  OmMultipartPartInfo{partNumber=2, openKey=/vol1/b1/fileA#client-77, size=40MB}
+```
+
+##### Multipart Upload Complete
+
+**Old Flow**
+* Read `multipartInfoTable[multipartKey]` and validate ordered requested parts.
+* Validate each requested part from inline `partKeyInfoMap`.
+* Build final key block list from selected parts.
+* Write final key:
+  * `keyTable[/vol1/b1/fileA] = OmKeyInfo{locations=p1+p2+...}`
+* Delete MPU state:
+  * `openKeyTable[multipartOpenKey]`
+  * `multipartInfoTable[multipartKey]`
+* Move unused parts to deleted table.
+
+Example:
+```text
+Complete [1,2,3] -> keyTable[/vol1/b1/fileA] written, MPU rows removed
+```
+
+**New Flow**
+* Validate layout gate first (same pattern as commit/abort when `schemaVersion != 0`).
+* Materialize the part list according to schema:
+  * `schemaVersion=0`: use inline `partKeyInfoMap`.
+  * `schemaVersion=1`:
+    * scan `multipartPartsTable` with prefix `OmMultipartPartKey.prefix(uploadId)`,
+    * rebuild `PartKeyInfo` view from `OmMultipartPartInfo`,
+    * pull block locations from stored part metadata / open keys as needed.
+* Perform same user-facing validation (order, existence, eTag/partName, min size).
+* Commit final key to `keyTable`.
+* Cleanup:
+  * always delete `multipartInfoTable[multipartKey]` and `openKeyTable[multipartOpenKey]`,
+  * for the split schema, also delete all matching `multipartPartsTable` rows and their tracked part open keys.
+* Backward compatibility check overview: completion transparently supports both persisted schemas and enforces layout gating for split rows (see [3.1 Backward compatibility and layout gating](#31-backward-compatibility-and-layout-gating)).
+
+Example (`schemaVersion=1` cleanup):
+```text
+Delete:
+  multipartPartsTable[OmMultipartPartKey{uploadId="upload-001", partNumber=1}]
+  multipartPartsTable[OmMultipartPartKey{uploadId="upload-001", partNumber=2}]
+  ...
+  openKeyTable[part-open-key-1], openKeyTable[part-open-key-2], ...
+```
+
+##### Multipart Upload Abort
+
+**Old Flow**
+* Read `multipartInfoTable[multipartKey]`.
+* Release bucket used-bytes from inline `partKeyInfoMap`.
+* Delete `openKeyTable[multipartOpenKey]` and `multipartInfoTable[multipartKey]`.
+* Move part key infos to deleted table.
+
+Example:
+```text
+Abort upload-001 -> delete MPU metadata/open rows and tombstone parts
+```
+
+**New Flow**
+* Validate layout gate (reject split behavior before finalization if schema indicates split).
+* Branch by schema for quota and cleanup:
+  * `schemaVersion=0`: same legacy inline part iteration.
+  * `schemaVersion=1`:
+    * iterate `multipartPartsTable` by prefix (`OmMultipartPartKey.prefix(uploadId)`),
+    * compute released quota from part rows,
+    * for each part: move corresponding open key to deleted table, delete part open key, delete part row.
+* Delete `multipartInfoTable[multipartKey]` and `openKeyTable[multipartOpenKey]`.
+* Backward compatibility check overview: abort handles both old inline and new split rows in the same codepath (see [3.1 Backward compatibility and layout gating](#31-backward-compatibility-and-layout-gating)).
+
+Example (`schemaVersion=1`):
+```text
+multipartPartsTable[OmMultipartPartKey{uploadId="upload-001", partNumber=3}] -> deleted
+openKeyTable[/vol1/b1/fileA#client-90] -> deleted
+deletedTable[objectId:/vol1/b1/fileA/upload-001] -> appended
+```
+
+##### Multipart Upload List Parts
+
+**Old Flow**
+* API path:
+  * S3G `MultipartKeyHandler.listParts(...)`
+  * `OzoneManager.listParts(...)`
+  * `KeyManagerImpl.listParts(...)`
+* Read `multipartInfoTable[multipartKey]`.
+* Iterate inline `partKeyInfoMap` in part-number order; apply marker + `maxParts`.
+* If no part entries exist yet, read replication from `openKeyTable[multipartKey]`.
+
+Example:
+```text
+parts=[1,2,3,4], marker=1, maxParts=2 => return [2,3], nextMarker=3, truncated=true
+```
+
+**New Flow**
+* Same API path and response contract.
+* Branch by schema in `KeyManagerImpl.listParts`:
+  * `schemaVersion=0`: iterate inline `partKeyInfoMap`.
+  * `schemaVersion=1`:
+    * scan `multipartPartsTable` by `OmMultipartPartKey.prefix(uploadId)`,
+    * resolve each part's `openKey` from `openKeyTable`,
+    * build `PartKeyInfo` view on the fly and paginate.
+* Replication fallback remains from MPU open key when no part is returned.
+
+Example (`schemaVersion=1` read materialization):
+```text
+multipartPartsTable[OmMultipartPartKey{uploadId="upload-001", partNumber=2}] ->
+  openKey=/vol1/b1/fileA#c2
+openKeyTable[/vol1/b1/fileA#c2] -> KeyInfo{size=40MB, eTag=...}
+listParts response partNumber=2, size=40MB, eTag=...
+```
+
+##### Related APIs
+* `ListMultipartUploads` remains upload-session listing (not part listing), returning upload IDs and metadata from MPU entries.
+* Wire path for list-parts remains unchanged:
+  * `MultipartKeyHandler` (S3G) -> `OzoneBucket.listParts` -> OM RPC (`OzoneManagerRequestHandler`) -> `OzoneManager.listParts` -> `KeyManagerImpl.listParts`.
+
+#### 2.2.2 Alternative Approach Flow (Proposed)
+
+##### Multipart Upload Initiate
+
+**Old Flow**
+* Write `openKeyTable` + `multipartInfoTable` (`OmMultipartKeyInfo`), with legacy inline-part model.
+
+**New Flow**
+* Write `openKeyTable` + `multipartMetadataTable` (new metadata object, no part list).
+* `multipartInfoTable` is no longer used for new V1 MPU writes in this approach.
+* `schemaVersion` (or equivalent layout marker) is stored in `multipartMetadataTable` row.
+
+Example:
+```text
+multipartMetadataTable[/vol1/b1/fileA/upload-001] ->
+  MultipartMetadata{schemaVersion=1, replication=RATIS/THREE, objectID=9001}
+openKeyTable[/vol1/b1/fileA/upload-001] ->
+  OmKeyInfo{key=fileA, locations=[empty]}
+```
+
+##### Multipart Upload Commit Part
+
+**Old Flow**
+* Read/modify/write one `multipartInfoTable` row by updating inline part list.
+
+**New Flow**
+* Read `multipartMetadataTable` only for MPU metadata + validation context.
+* Write one row per part into `multipartPartsTable`:
+  * key: `OmMultipartPartKey(uploadId, partNumber)`,
+  * value: `PartKeyInfo` / equivalent flattened part payload.
+* Avoid rewriting a large aggregate MPU value for each part.
+
+Example:
+```text
+multipartPartsTable[OmMultipartPartKey{uploadId="upload-001", partNumber=2}] ->
+  PartKeyInfo{partNumber=2, partName=..., keyInfo={blocks,size,etag}}
+```
+
+##### Multipart Upload Complete
+
+**Old Flow**
+* Read parts from inline `partKeyInfoList` in `multipartInfoTable`.
+* Commit final key, delete MPU row and MPU open row.
+
+**New Flow**
+* Read MPU metadata from `multipartMetadataTable`.
+* Scan `multipartPartsTable` prefix to gather candidate parts; validate request list/order/eTags.
+* Build final key and commit to `keyTable`.
+* Cleanup:
+  * delete `multipartMetadataTable[multipartKey]`,
+  * delete all `multipartPartsTable` rows for that upload,
+  * delete MPU open key and part open keys.
+
+##### Multipart Upload Abort
+
+**Old Flow**
+* Iterate inline `partKeyInfoList` to release quota and move parts to deleted table.
+
+**New Flow**
+* Read `multipartMetadataTable` for replication + MPU identity.
+* Iterate `multipartPartsTable` by prefix for quota release and delete-table movement.
+* Delete metadata row + all part rows + corresponding open keys.
+
+##### Multipart Upload List Parts
+
+**Old Flow**
+* `listParts` reads `multipartInfoTable` and paginates inline `partKeyInfoList`.
 
-### 2.3. Summary
+**New Flow**
+* `listParts` reads `multipartMetadataTable` for MPU existence/metadata.
+* Materialize part listing from `multipartPartsTable` prefix scan and paginate by part number marker.
+* If needed, join with `openKeyTable` for additional per-part runtime fields.
+
+##### Compatibility and Upgrade Guard
+* Same compatibility strategy as section [3.1 Backward compatibility and layout gating](#31-backward-compatibility-and-layout-gating):
+  * legacy rows continue on old path,
+  * new writes only when layout feature is finalized,
+  * reject unsupported split-layout mutations pre-finalize.
+
+### 2.3 Summary and Trade-offs
 * **Approach-1:** Minimal change, same value type, uses `schemaVersion` flag.
-* **Approach-2:** Dedicated table, cleanest separation, requires new lookup logic.
-
-----
-### 2.4. Chosen Approach: Approach-1
-We have chosen **Approach-1: Reuse multipartInfoTable with empty part list**
-as the preferred implementation for MPU optimization (V1).
-
-This approach is favored because it introduces minimal changes to the existing `OmMultipartKeyInfo` protobuf structure.
-<br>
-By simply introducing an optional `schemaVersion` field and ensuring the partKeyInfoList is empty for V1 entries,
-we maintain backward compatibility.
-
-The key advantages are:
-* **Minimal Protobuf Change**: Older clients and processes can still read the multipartInfoTable entries without issue,
-    as the **core structure remains the same**.
-* **Compatibility**: Older uploads (V0) **remain fully functional**, and new uploads (V1) can be distinguished by
-    the schemaVersion. This significantly reduces the risk of breaking existing functionality.
-* **Simplicity**: The transition logic between V0 and V1 is straightforward, primarily checking the
-    `schemaVersion` field upon read.
----
+* **Approach-2:** Dedicated metadata table, cleanest separation, requires broader refactor.
+
+#### Pros and Cons
+
+|                      | Pros                                                                                                                                                                                                    | Cons                                                                                                                                                                                           |
+|----------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Chosen Approach      | * Minimal migration risk<br/> * Reuses existing `OmMultipartKeyInfo` plumbing<br/>* Easiest incremental rollout with `schemaVersion` gating.<br/> * Lower implementation impact request/response paths. | * Carries complexity for mixed code (same table serving legacy + split metadata modes).<br/> * Still coupled to `OmMultipartKeyInfo`.<br/> * More conditional logic over time.                 |
+| Alternative Approach | * Clean separation of concerns (`multipartMetadataTable` vs `multipartPartsTable`)<br/> * Clearer long-term model and easier mental mapping.<br/> * Avoids overloading legacy value type.               | * Requires wider code changes (new codecs/table wiring/request/response updates).<br/> * Higher migration and compatibility test surface.<br/> * More rollout complexity than chosen approach. |
 
 ## 3. Upgrades
 Add a new feature in `OMLayoutFeature`:
 ```java
 MPU_PARTS_TABLE_SPLIT(10, "Split multipart table into separate table for parts and key");
 ```
-`schemaVersion` is set to `1` only when the upgrade is finalized.
 
-For each of the four S3MultipartUpload request types we need to ensure the check that split table layout feature
-is allowed and only then we can set the schema as `version:1`
-<br>
-This ensures that no new writes are happening if the split table is not supported -
-specially in cases where in pre-finalize the client may try to write a new MPU key.
+### 3.1 Backward compatibility and layout gating
+
+Backward compatibility is handled by combining `schemaVersion` with layout-feature checks.
+
+- **New MPU initiate writes**
+  - `schemaVersion` is set to `1` only when `MPU_PARTS_TABLE_SPLIT` is allowed.
+  - Otherwise, initiate writes `schemaVersion=0` and the upload stays on the legacy inline-part path.
+
+- **Existing MPU rows**
+  - `schemaVersion=0` rows continue to use legacy inline-part read/write paths.
+  - `schemaVersion=1` rows use split-table paths (`multipartPartsTable` + tracked open keys).
+
+- **Pre-finalize protection**
+  - Mutating split-table operations (commit part / complete / abort) check:
+    - if feature is not allowed and `schemaVersion != 0`, reject with `NOT_SUPPORTED_OPERATION_PRIOR_FINALIZATION`.
+  - This prevents accidental split-layout writes/updates before finalization.
+
+- **Read compatibility**
+  - `listParts` supports both schemas:
+    - schema 0 -> read inline `partKeyInfoMap`,
+    - schema 1 -> materialize parts from `multipartPartsTable` + `openKeyTable`.
+
+In short: old MPU entries keep working unchanged, new entries use the split layout only when the cluster layout allows it, and write paths are guarded to avoid unsafe transitions.
+
+## 4. Industry Patterns: Flattened Keys in LSM/RocksDB Systems
+
+Using flattened keys (for example, `baseKey + sortable suffix`) is a common design in RocksDB-backed systems.
+The MPU `multipartPartsTable` layout follows the same principle by using one row per part keyed by
+`OmMultipartPartKey(uploadId, partNumber)` with byte-level ordering from codec serialization.
+
+### 4.1 Why this is common
+
+RocksDB is optimized for:
+- point lookups by key,
+- prefix/range scans over lexicographically ordered keys,
+- append-like write patterns with small values.
+
+Flattened schemas map naturally to this model:
+- each logical sub-record (part/version) becomes an independent KV row,
+- updates rewrite only one small row instead of a large aggregate object,
+- range scans can fetch all rows for one logical entity via a shared prefix.
+
+### 4.2 MVCC systems as examples
+
+In MVCC-oriented systems such as CockroachDB and TiKV, a common pattern is:
+- encode the user key as a prefix,
+- encode version dimension (timestamp / sequence / commit-ts) in key suffix,
+- use key ordering to make version reads and range scans efficient.
+
+High-level shape:
+```text
+<logical-key>/<version-or-ts>
+```
+
+Example (illustrative):
+```text
+user:42@1699000001
+user:42@1699000005
+user:42@1699000010
+```
+
+The idea is conceptually similar to MPU part storage:
+```text
+OmMultipartPartKey{uploadId="abc123-uuid-456", partNumber=1}
+OmMultipartPartKey{uploadId="abc123-uuid-456", partNumber=2}
+OmMultipartPartKey{uploadId="abc123-uuid-456", partNumber=3}
+```
+
+Both designs rely on ordered keys to make grouped reads/writes efficient.
+
+### 4.3 Relevance to MPU optimization
+
+For MPU, flattened part rows provide:
+- lower write amplification per part commit (single-row updates),
+- lower object allocation pressure in OM (no repeated large list rebuild),
+- straightforward cleanup by prefix scan during complete/abort,
+- better operational visibility (`one row = one part`).
+
+This is why the split schema is not just an optimization for this code path, but also a storage-layout pattern that aligns well with how LSM/RocksDB systems are typically modeled.

From 5a721f772f3f406bfce0eb2045eb3ab1598ea16a Mon Sep 17 00:00:00 2001
From: Abhishek Pal <pal.abhishek03012001@gmail.com>
Date: Sun, 22 Feb 2026 19:46:32 +0530
Subject: [PATCH 04/12] Update section headings

---
 hadoop-hdds/docs/content/design/mpu-gc-optimization.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hadoop-hdds/docs/content/design/mpu-gc-optimization.md b/hadoop-hdds/docs/content/design/mpu-gc-optimization.md
index 9f3db8955c6e..cfb373e701b3 100644
--- a/hadoop-hdds/docs/content/design/mpu-gc-optimization.md
+++ b/hadoop-hdds/docs/content/design/mpu-gc-optimization.md
@@ -116,7 +116,7 @@ message MultipartPartInfo {
 
 ### 2.1 Data Layout Changes
 
-#### 2.1.1 Chosen Approach (Implemented): Reuse `multipartInfoTable` + add `multipartPartsTable`
+#### 2.1.1 Chosen Approach: Reuse `multipartInfoTable` + add `multipartPartsTable`
 
 Keep `multipartInfoTable` for MPU metadata, and store part rows in `multipartPartsTable`.
 
@@ -225,7 +225,7 @@ encodedKey = [61 62 63 31 32 33 2d 75 75 69 64 2d 34 35 36 00 00 00 00 02]
              [--------------uploadId UTF-8------------][00][--int32 BE--]
 ```
 
-#### 2.1.2 Alternative Approach (Proposed): Add `multipartMetadataTable` + `multipartPartsTable`
+#### 2.1.2 Alternative Approach: Add `multipartMetadataTable` + `multipartPartsTable`
 
 Split metadata and introduce two new tables:
 * **`multipartMetadataTable`**: lightweight per-MPU metadata (no part list).

From c2102ba9779a300ba7786fc6d5764901985a48b1 Mon Sep 17 00:00:00 2001
From: Abhishek Pal <pal.abhishek03012001@gmail.com>
Date: Sun, 22 Feb 2026 21:16:35 +0530
Subject: [PATCH 05/12] Correct some statements

---
 .../content/design/mpu-gc-optimization.md     | 37 +++++++------------
 1 file changed, 14 insertions(+), 23 deletions(-)

diff --git a/hadoop-hdds/docs/content/design/mpu-gc-optimization.md b/hadoop-hdds/docs/content/design/mpu-gc-optimization.md
index cfb373e701b3..dc5f119cd5c4 100644
--- a/hadoop-hdds/docs/content/design/mpu-gc-optimization.md
+++ b/hadoop-hdds/docs/content/design/mpu-gc-optimization.md
@@ -3,7 +3,7 @@ title: Multipart Upload GC Pressure Optimizations
 summary: Change Multipart Upload Logic to improve OM GC Pressure
 date: 2026-02-19
 jira: HDDS-10611
-status: implemented
+status: proposed
 author: Abhishek Pal, Rakesh Radhakrishnan
 ---
 <!--
@@ -137,8 +137,6 @@ Keep `multipartInfoTable` for MPU metadata, and store part rows in `multipartPar
 * Prefix scan for all parts in one upload uses:
   * `uploadId(UTF-8 bytes)` + `0x00`
 
-This replaces the previous string-padding style examples (like `part00001`) with typed key encoding.
-
 ```protobuf
 message MultipartKeyInfo {
     required string uploadID = 1;
@@ -216,13 +214,13 @@ Key:   OmMultipartPartKey{uploadId="abc123-uuid-456", partNumber=10}
 Value: OmMultipartPartInfo{partNumber=10, partName=".../part10", ...}
 ```
 
-`multipartPartsTable` (encoded key bytes, illustrative):
+`multipartPartsTable` (encoded key bytes):
 ```text
 uploadId = "abc123-uuid-456"
 partNumber = 2
 
 encodedKey = [61 62 63 31 32 33 2d 75 75 69 64 2d 34 35 36 00 00 00 00 02]
-             [--------------uploadId UTF-8------------][00][--int32 BE--]
+             [--------------uploadId UTF-8---------------][00][--int32 BE--]
 ```
 
 #### 2.1.2 Alternative Approach: Add `multipartMetadataTable` + `multipartPartsTable`
@@ -231,8 +229,6 @@ Split metadata and introduce two new tables:
 * **`multipartMetadataTable`**: lightweight per-MPU metadata (no part list).
 * **`multipartPartsTable`**: one row per part (no aggregation).
 
-> Note: This approach is proposed and not implemented in current code.
-
 ```protobuf
 message MultipartMetadataInfo {
   required string uploadID = 1;
@@ -315,7 +311,7 @@ openKeyTable[/vol1/b1/fileA/upload-001] ->
   * `schemaVersion=0` otherwise.
 * No part row is created at initiate time; part rows are created during commit-part.
 * FSO response path (`S3InitiateMultipartUploadResponseWithFSO`) still writes parent directory entries, then open-file + multipart-info rows.
-* Backward compatibility check overview: write path selection is schema-based and layout-gated (see [3.1 Backward compatibility and layout gating](#31-backward-compatibility-and-layout-gating)).
+* Backward compatibility: write path selection is schema-based and layout-gated (see [3.1 Backward compatibility and layout gating](#31-backward-compatibility-and-layout-gating)).
 
 Example:
 ```text
@@ -328,7 +324,7 @@ multipartInfoTable[/vol1/b1/fileA/upload-001] ->
 **Old Flow**
 * Read `multipartInfoTable[multipartKey]`.
 * Read current uploaded part blocks from `openKeyTable[getOpenKey(..., clientID)]`.
-* Upsert in inline map:
+* Insert in inline map:
   * `oldPart = multipartKeyInfo.getPartKeyInfo(partNumber)`
   * `multipartKeyInfo.addPartKeyInfo(currentPart)`
 * Delete committed one-shot open key for this part.
@@ -348,11 +344,11 @@ After:  partKeyInfoList=[{part=1,size=64MB},{part=2,size=40MB}]
   * `schemaVersion=0`: same old inline behavior.
   * `schemaVersion=1`:
     * create `multipartPartKey = OmMultipartPartKey(uploadId, partNumber)`,
-    * write `multipartPartTable[multipartPartKey] = OmMultipartPartInfo{openKey, partName, partNumber, size, metadata, locations}`,
+    * write `multipartPartTable[multipartPartKey] = OmMultipartPartInfo{openKey, partName, partNumber, size, metadata, locations}`
     * keep current part open key in `openKeyTable` (needed later by list/complete/abort),
     * if overwriting an existing part row, delete old part open key and adjust quota.
 * `multipartInfoTable[multipartKey]` is still updated for metadata/updateID.
-* Backward compatibility check overview: schema decides row format, and split behavior is blocked before finalization (see [3.1 Backward compatibility and layout gating](#31-backward-compatibility-and-layout-gating)).
+* Backward compatibility: schema decides row format, and split behavior is blocked before finalization (see [3.1 Backward compatibility and layout gating](#31-backward-compatibility-and-layout-gating)).
 
 Example (`schemaVersion=1`):
 ```text
@@ -391,7 +387,7 @@ Complete [1,2,3] -> keyTable[/vol1/b1/fileA] written, MPU rows removed
 * Cleanup:
   * always delete `multipartInfoTable[multipartKey]` and `openKeyTable[multipartOpenKey]`,
   * for split schema also delete all matching `multipartPartTable` rows and their tracked part open keys.
-* Backward compatibility check overview: completion transparently supports both persisted schemas and enforces layout gating for split rows (see [3.1 Backward compatibility and layout gating](#31-backward-compatibility-and-layout-gating)).
+* Backward compatibility: completion transparently supports both persisted schemas and enforces layout gating for split rows (see [3.1 Backward compatibility and layout gating](#31-backward-compatibility-and-layout-gating)).
 
 Example (`schemaVersion=1` cleanup):
 ```text
@@ -424,7 +420,7 @@ Abort upload-001 -> delete MPU metadata/open rows and tombstone parts
     * compute released quota from part rows,
     * for each part: move corresponding open key to deleted table, delete part open key, delete part row.
 * Delete `multipartInfoTable[multipartKey]` and `multipartOpenKey` entry.
-* Backward compatibility check overview: abort handles both old inline and new split rows in the same codepath (see [3.1 Backward compatibility and layout gating](#31-backward-compatibility-and-layout-gating)).
+* Backward compatibility: abort handles both old inline and new split rows in the same codepath (see [3.1 Backward compatibility and layout gating](#31-backward-compatibility-and-layout-gating)).
 
 Example (`schemaVersion=1`):
 ```text
@@ -467,12 +463,7 @@ openKeyTable[/vol1/b1/fileA#c2] -> KeyInfo{size=40MB, eTag=...}
 listParts response partNumber=2, size=40MB, eTag=...
 ```
 
-##### Related APIs
-* `ListMultipartUploads` remains upload-session listing (not part listing), returning upload IDs and metadata from MPU entries.
-* Wire path for list-parts remains unchanged:
-  * `MultipartKeyHandler` (S3G) -> `OzoneBucket.listParts` -> OM RPC (`OzoneManagerRequestHandler`) -> `OzoneManager.listParts` -> `KeyManagerImpl.listParts`.
-
-#### 2.2.2 Alternative Approach Flow (Proposed)
+#### 2.2.2 Alternative Approach Flow
 
 ##### Multipart Upload Initiate
 
@@ -557,10 +548,10 @@ multipartPartsTable[OmMultipartPartKey{uploadId="upload-001", partNumber=2}] ->
 
 #### Pros and Cons
 
-|                      | Pros                                                                                                                                                                                                    | Cons                                                                                                                                                                                           |
-|----------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| Chosen Approach      | * Minimal migration risk<br/> * Reuses existing `OmMultipartKeyInfo` plumbing<br/>* Easiest incremental rollout with `schemaVersion` gating.<br/> * Lower implementation impact request/response paths. | * Carries complexity for mixed code (same table serving legacy + split metadata modes).<br/> * Still coupled to `OmMultipartKeyInfo`.<br/> * More conditional logic over time.                 |
-| Alternative Approach | * Clean separation of concerns (`multipartMetadataTable` vs `multipartPartsTable`)<br/> * Clearer long-term model and easier mental mapping.<br/> * Avoids overloading legacy value type.               | * Requires wider code changes (new codecs/table wiring/request/response updates).<br/> * Higher migration and compatibility test surface.<br/> * More rollout complexity than chosen approach. |
+|                      | Pros                                                                                                                                                                                                   | Cons                                                                                                                                                                                         |
+|----------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Chosen Approach      | * Minimal migration risk<br/> * Reuses existing `OmMultipartKeyInfo` message<br/>* Easiest incremental rollout with `schemaVersion` gating.<br/> * Lower implementation impact request/response paths. | * Carries complexity for mixed code (same table serving legacy + split metadata modes).<br/> * Still coupled to `OmMultipartKeyInfo`.<br/> * More conditional logic over time.               |
+| Alternative Approach | * Clean separation of concerns (`multipartMetadataTable` vs `multipartPartsTable`)<br/> * Clearer long-term model and easier mental mapping.<br/> * Avoids overloading legacy value type.              | * Requires wider code changes (new codecs/table wiring/request/response updates).<br/> * Higher migration and compatibility test scope.<br/> * More rollout complexity than chosen approach. |
 
 ## 3. Upgrades
 Add a new feature in `OMLayoutFeature`:

From 69b722097b9187b2a9fb5198d578b95f6d38495f Mon Sep 17 00:00:00 2001
From: Abhishek Pal <pal.abhishek03012001@gmail.com>
Date: Wed, 25 Feb 2026 22:06:35 +0530
Subject: [PATCH 06/12] Added some side notes and descriptions

---
 .../content/design/mpu-gc-optimization.md     | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/hadoop-hdds/docs/content/design/mpu-gc-optimization.md b/hadoop-hdds/docs/content/design/mpu-gc-optimization.md
index dc5f119cd5c4..bcfeb1014ac7 100644
--- a/hadoop-hdds/docs/content/design/mpu-gc-optimization.md
+++ b/hadoop-hdds/docs/content/design/mpu-gc-optimization.md
@@ -45,6 +45,10 @@ Presently Ozone has several overheads when uploading large files via Multipart u
 
 **Implications:**
 1. Each MPU part commit reads the full `OmMultipartKeyInfo`, deserializes it, adds one part, serializes it, and writes it back (HDDS-10611).
+<br>
+```
+Side note: This is a common pattern in regular open key writes as well, but the MPU case is more severe due to the growing part list and more frequent updates.
+```
 2. RocksDB WAL logs each full write → WAL growth (HDDS-8238).
 3. GC pressure grows with the size of the object (HDDS-10611).
 
@@ -137,6 +141,11 @@ Keep `multipartInfoTable` for MPU metadata, and store part rows in `multipartPar
 * Prefix scan for all parts in one upload uses:
   * `uploadId(UTF-8 bytes)` + `0x00`
 
+```text
+Note: The null byte separator ensures that the "uploadId" is properly delimited from the "partNumber" in the byte encoding, allowing for correct lexicographical ordering.
+```
+
+#### MultipartKeyInfo Structure
 ```protobuf
 message MultipartKeyInfo {
     required string uploadID = 1;
@@ -214,7 +223,7 @@ Key:   OmMultipartPartKey{uploadId="abc123-uuid-456", partNumber=10}
 Value: OmMultipartPartInfo{partNumber=10, partName=".../part10", ...}
 ```
 
-`multipartPartsTable` (encoded key bytes):
+`multipartPartsTable` (encoded key sample):
 ```text
 uploadId = "abc123-uuid-456"
 partNumber = 2
@@ -548,10 +557,10 @@ multipartPartsTable[OmMultipartPartKey{uploadId="upload-001", partNumber=2}] ->
 
 #### Pros and Cons
 
-|                      | Pros                                                                                                                                                                                                   | Cons                                                                                                                                                                                         |
-|----------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| Chosen Approach      | * Minimal migration risk<br/> * Reuses existing `OmMultipartKeyInfo` message<br/>* Easiest incremental rollout with `schemaVersion` gating.<br/> * Lower implementation impact request/response paths. | * Carries complexity for mixed code (same table serving legacy + split metadata modes).<br/> * Still coupled to `OmMultipartKeyInfo`.<br/> * More conditional logic over time.               |
-| Alternative Approach | * Clean separation of concerns (`multipartMetadataTable` vs `multipartPartsTable`)<br/> * Clearer long-term model and easier mental mapping.<br/> * Avoids overloading legacy value type.              | * Requires wider code changes (new codecs/table wiring/request/response updates).<br/> * Higher migration and compatibility test scope.<br/> * More rollout complexity than chosen approach. |
+|                      | Pros                                                                                                                                                                                                                                                                  | Cons                                                                                                                                                                                         |
+|----------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Chosen Approach      | * Minimal migration risk<br/> * Reuses existing `OmMultipartKeyInfo` message<br/>* Easiest incremental rollout with `schemaVersion` gating, so finalization of upgrades does not affect existing key writes.<br/> * Lower implementation impact on request/response paths. | * Carries complexity for mixed code (same table serving legacy + split metadata modes).<br/> * Still coupled to `OmMultipartKeyInfo`.<br/> * More conditional logic over time.               |
+| Alternative Approach | * Clean separation of concerns (`multipartMetadataTable` vs `multipartPartsTable`)<br/> * Clearer long-term model and easier mental mapping.<br/> * Avoids overloading legacy value type.                                                                             | * Requires wider code changes (new codecs/table wiring/request/response updates).<br/> * Higher migration and compatibility test scope.<br/> * More rollout complexity than chosen approach. |
 
 ## 3. Upgrades
 Add a new feature in `OMLayoutFeature`:

From 949602c5aa914db054ad6c9ed05334e17c0fc2ab Mon Sep 17 00:00:00 2001
From: Abhishek Pal <pal.abhishek03012001@gmail.com>
Date: Wed, 25 Feb 2026 22:23:39 +0530
Subject: [PATCH 07/12] Add clarification regarding sorting of part keys

---
 hadoop-hdds/docs/content/design/mpu-gc-optimization.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hadoop-hdds/docs/content/design/mpu-gc-optimization.md b/hadoop-hdds/docs/content/design/mpu-gc-optimization.md
index bcfeb1014ac7..28769423129a 100644
--- a/hadoop-hdds/docs/content/design/mpu-gc-optimization.md
+++ b/hadoop-hdds/docs/content/design/mpu-gc-optimization.md
@@ -145,6 +145,8 @@ Keep `multipartInfoTable` for MPU metadata, and store part rows in `multipartPar
 Note: The null byte separator ensures that the "uploadId" is properly delimited from the "partNumber" in the byte encoding, allowing for correct lexicographical ordering.
 ```
 
+Part rows are stored in byte-lexicographic order of `uploadId` followed by the big-endian `partNumber`, so a prefix scan returns the parts of one upload in ascending part-number order, as the S3 `ListParts` operation requires.
+
 #### MultipartKeyInfo Structure
 ```protobuf
 message MultipartKeyInfo {

From cf14bd9b2cdfdb74b4e109d713dfe1c54ca8478a Mon Sep 17 00:00:00 2001
From: Abhishek Pal <pal.abhishek03012001@gmail.com>
Date: Wed, 25 Feb 2026 23:18:32 +0530
Subject: [PATCH 08/12] Modify protobuf object, mark some fields as deprecated,
 pull in common info to metadata

---
 .../content/design/mpu-gc-optimization.md     | 21 ++++++++++---------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/hadoop-hdds/docs/content/design/mpu-gc-optimization.md b/hadoop-hdds/docs/content/design/mpu-gc-optimization.md
index 28769423129a..b90054e4e782 100644
--- a/hadoop-hdds/docs/content/design/mpu-gc-optimization.md
+++ b/hadoop-hdds/docs/content/design/mpu-gc-optimization.md
@@ -97,15 +97,12 @@ Split MPU metadata into:
 message MultipartPartInfo {
   required string partName = 1;
   required uint32 partNumber = 2;
-  required string volumeName = 3;
-  required string bucketName = 4;
-  required string keyName = 5;
-  required uint64 dataSize = 6;
-  required uint64 modificationTime = 7;
-  repeated KeyLocationList keyLocationList = 8;
-  repeated hadoop.hdds.KeyValue metadata = 9;
-  optional FileEncryptionInfoProto fileEncryptionInfo = 10;
-  optional FileChecksumProto fileChecksum = 11;
+  required uint64 dataSize = 3;
+  required uint64 modificationTime = 4;
+  repeated KeyLocationList keyLocationList = 5;
+  repeated hadoop.hdds.KeyValue metadata = 6;
+  optional FileEncryptionInfoProto fileEncryptionInfo = 7;
+  optional FileChecksumProto fileChecksum = 8;
 }
 ```
 
@@ -154,12 +151,16 @@ message MultipartKeyInfo {
     required uint64 creationTime = 2;
     required hadoop.hdds.ReplicationType type = 3;
     optional hadoop.hdds.ReplicationFactor factor = 4;
-    repeated PartKeyInfo partKeyInfoList = 5;
+    repeated PartKeyInfo partKeyInfoList = 5 [deprecated = true];
     optional uint64 objectID = 6;
     optional uint64 updateID = 7;
     optional uint64 parentID = 8;
     optional hadoop.hdds.ECReplicationConfig ecReplicationConfig = 9;
     optional uint32 schemaVersion = 10; // default 0
+    // Following is pulled in from part info as these will not change for a given part number
+    required string volumeName = 11;
+    required string bucketName = 12;
+    required string keyName = 13;
 }
 ```
 

From a47277f6313fb0bab5c5310dbc29b8929e072e8f Mon Sep 17 00:00:00 2001
From: Abhishek Pal <pal.abhishek03012001@gmail.com>
Date: Fri, 27 Feb 2026 14:57:44 +0530
Subject: [PATCH 09/12] Address review comments

---
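Reviewer note: the key layout documented in this patch (`uploadId` UTF-8 bytes, then `'/'` (0x2f), then `partNumber` as a 4-byte big-endian int) can be sketched as below. This is only a minimal illustration in Python, not the `OmMultipartPartKey` codec itself; `prefix_for`, `encode_part_key`, and `scan_parts` are hypothetical helper names, and the sorted list merely stands in for the table's key order.

```python
import struct

SEPARATOR = b"/"  # 0x2f, the separator described in the design doc


def prefix_for(upload_id: str) -> bytes:
    """Prefix shared by all part rows of one upload (hypothetical helper)."""
    return upload_id.encode("utf-8") + SEPARATOR


def encode_part_key(upload_id: str, part_number: int) -> bytes:
    """uploadId UTF-8 bytes + '/' + partNumber as a 4-byte big-endian int."""
    return prefix_for(upload_id) + struct.pack(">i", part_number)


def scan_parts(sorted_keys, upload_id: str):
    """Stand-in for a prefix scan: keep only keys that belong to upload_id."""
    prefix = prefix_for(upload_id)
    return [k for k in sorted_keys if k.startswith(prefix)]


key = encode_part_key("abc123-uuid-456", 2)
print(key.hex(" "))
# 61 62 63 31 32 33 2d 75 75 69 64 2d 34 35 36 2f 00 00 00 02

# Bytewise order of the encoded keys follows numeric part order (1, 2, 10).
table = sorted(encode_part_key("abc123-uuid-456", n) for n in (10, 2, 1))
print([struct.unpack(">i", k[-4:])[0] for k in scan_parts(table, "abc123-uuid-456")])
```

One thing the sketch makes visible: ordering by part number comes from the byte encoding, not from the `toString()` form, since as plain strings `"<uploadId>/10"` sorts before `"<uploadId>/2"`. Including the separator in the scan prefix also keeps an upload ID such as `abc` from matching rows of `abcd`.
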
 .../content/design/mpu-gc-optimization.md     | 48 +++++++++++--------
 1 file changed, 29 insertions(+), 19 deletions(-)

diff --git a/hadoop-hdds/docs/content/design/mpu-gc-optimization.md b/hadoop-hdds/docs/content/design/mpu-gc-optimization.md
index b90054e4e782..1d934b96680e 100644
--- a/hadoop-hdds/docs/content/design/mpu-gc-optimization.md
+++ b/hadoop-hdds/docs/content/design/mpu-gc-optimization.md
@@ -97,12 +97,14 @@ Split MPU metadata into:
 message MultipartPartInfo {
   required string partName = 1;
   required uint32 partNumber = 2;
-  required uint64 dataSize = 3;
-  required uint64 modificationTime = 4;
-  repeated KeyLocationList keyLocationList = 5;
-  repeated hadoop.hdds.KeyValue metadata = 6;
-  optional FileEncryptionInfoProto fileEncryptionInfo = 7;
-  optional FileChecksumProto fileChecksum = 8;
+  repeated KeyLocationList keyLocationList = 6;
+  required uint64 dataSize = 7;
+  required uint64 modificationTime = 8;
+  required uint64 objectID = 9;
+  required uint64 updateID = 10;
+  repeated hadoop.hdds.KeyValue metadata = 11;
+  optional FileEncryptionInfoProto fileEncryptionInfo = 12;
+  optional FileChecksumProto fileChecksum = 13;
 }
 ```
 
@@ -134,12 +136,17 @@ Keep `multipartInfoTable` for MPU metadata, and store part rows in `multipartPar
   * `uploadId` (`String`)
   * `partNumber` (`int32`)
 * Persisted key bytes are encoded as:
-  * `uploadId(UTF-8 bytes)` + `0x00` + `partNumber(4-byte big-endian int)`
+  * `uploadId(UTF-8 bytes)` + `'/' (0x2f)` + `partNumber(4-byte big-endian int)`
 * Prefix scan for all parts in one upload uses:
-  * `uploadId(UTF-8 bytes)` + `0x00`
+  * `uploadId(UTF-8 bytes)` + `'/' (0x2f)`
 
 ```text
-Note: The null byte separator ensures that the "uploadId" is properly delimited from the "partNumber" in the byte encoding, allowing for correct lexicographical ordering.
+`OmMultipartPartKey.toString()` returns:
+  - full key:   "<uploadId>/<partNumber>"
+  - prefix key: "<uploadId>" (used only as in-memory prefix object)
+
+Example:
+  OmMultipartPartKey.of("abc123-uuid-456", 2).toString() == "abc123-uuid-456/2"
 ```
 
 The parts are stored in lexicographical order by uploadID and part number, which complies with the S3 specifications for ordering of ListPart and ListMultipartUpload operations.
@@ -157,10 +164,10 @@ message MultipartKeyInfo {
     optional uint64 parentID = 8;
     optional hadoop.hdds.ECReplicationConfig ecReplicationConfig = 9;
     optional uint32 schemaVersion = 10; // default 0
-    // Following is pulled in from part info as these will not change for a given part number
-    required string volumeName = 11;
-    required string bucketName = 12;
-    required string keyName = 13;
+    // These fields are pulled up from the part information because they do not change per part for a given key
+    optional string volumeName = 11;
+    optional string bucketName = 12;
+    optional string keyName = 13;
 }
 ```
 
@@ -217,12 +224,15 @@ OmMultipartKeyInfo {
 `multipartPartsTable` (logical keys):
 ```text
 Key:   OmMultipartPartKey{uploadId="abc123-uuid-456", partNumber=1}
+      String form: "abc123-uuid-456/1"
 Value: OmMultipartPartInfo{partNumber=1, partName=".../part1", ...}
 
 Key:   OmMultipartPartKey{uploadId="abc123-uuid-456", partNumber=2}
+      String form: "abc123-uuid-456/2"
 Value: OmMultipartPartInfo{partNumber=2, partName=".../part2", ...}
 ...
 Key:   OmMultipartPartKey{uploadId="abc123-uuid-456", partNumber=10}
+      String form: "abc123-uuid-456/10"
 Value: OmMultipartPartInfo{partNumber=10, partName=".../part10", ...}
 ```
 
@@ -231,8 +241,8 @@ Value: OmMultipartPartInfo{partNumber=10, partName=".../part10", ...}
 uploadId = "abc123-uuid-456"
 partNumber = 2
 
-encodedKey = [61 62 63 31 32 33 2d 75 75 69 64 2d 34 35 36 00 00 00 00 02]
-             [--------------uploadId UTF-8---------------][00][--int32 BE--]
+encodedKey = [61 62 63 31 32 33 2d 75 75 69 64 2d 34 35 36 2f 00 00 00 02]
+             [--------------uploadId UTF-8---------------][2f][-int32 BE-]
 ```
 
 #### 2.1.2 Alternative Approach: Add `multipartMetadataTable` + `multipartPartsTable`
@@ -356,7 +366,7 @@ After:  partKeyInfoList=[{part=1,size=64MB},{part=2,size=40MB}]
   * `schemaVersion=0`: same old inline behavior.
   * `schemaVersion=1`:
     * create `multipartPartKey = OmMultipartPartKey(uploadId, partNumber)`,
-    * write `multipartPartTable[multipartPartKey] = OmMultipartPartInfo{openKey, partName, partNumber, size, metadata, locations}`
+    * write `multipartPartTable[multipartPartKey] = OmMultipartPartInfo{openKey, partName, partNumber, dataSize, modificationTime, objectID, updateID, metadata, keyLocationList, fileEncryptionInfo?, fileChecksum?}`
     * keep current part open key in `openKeyTable` (needed later by list/complete/abort),
     * if overwriting an existing part row, delete old part open key and adjust quota.
 * `multipartInfoTable[multipartKey]` is still updated for metadata/updateID.
@@ -365,7 +375,7 @@ After:  partKeyInfoList=[{part=1,size=64MB},{part=2,size=40MB}]
 Example (`schemaVersion=1`):
 ```text
 multipartPartTable[OmMultipartPartKey{uploadId="upload-001", partNumber=2}] ->
-  OmMultipartPartInfo{partNumber=2, openKey=/vol1/b1/fileA#client-77, size=40MB}
+  OmMultipartPartInfo{partNumber=2, openKey=/vol1/b1/fileA#client-77, dataSize=40MB}
 ```
 
 ##### Multipart Upload Complete
@@ -392,8 +402,8 @@ Complete [1,2,3] -> keyTable[/vol1/b1/fileA] written, MPU rows removed
   * `schemaVersion=0`: use inline `partKeyInfoMap`.
   * `schemaVersion=1`:
     * scan `multipartPartTable` with prefix `OmMultipartPartKey.prefix(uploadId)`,
-    * rebuild `PartKeyInfo` view from `OmMultipartPartInfo`,
-    * pull block locations from stored part metadata / open keys as needed.
+    * rebuild `PartKeyInfo` view directly from `OmMultipartPartInfo` (including locations, metadata, objectID/updateID, and optional encryption/checksum fields),
+    * track part open keys from part metadata for cleanup.
 * Perform same user-facing validation (order, existence, eTag/partName, min size).
 * Commit final key to `keyTable`.
 * Cleanup:

From 9591c736db5790247e06100703c24066d496a8a8 Mon Sep 17 00:00:00 2001
From: Abhishek Pal <pal.abhishek03012001@gmail.com>
Date: Thu, 5 Mar 2026 12:47:15 +0530
Subject: [PATCH 10/12] Added explicit eTag field

---
 hadoop-hdds/docs/content/design/mpu-gc-optimization.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hadoop-hdds/docs/content/design/mpu-gc-optimization.md b/hadoop-hdds/docs/content/design/mpu-gc-optimization.md
index 1d934b96680e..93931043fee2 100644
--- a/hadoop-hdds/docs/content/design/mpu-gc-optimization.md
+++ b/hadoop-hdds/docs/content/design/mpu-gc-optimization.md
@@ -102,7 +102,7 @@ message MultipartPartInfo {
   required uint64 modificationTime = 8;
   required uint64 objectID = 9;
   required uint64 updateID = 10;
-  repeated hadoop.hdds.KeyValue metadata = 11;
+  required string eTag = 11;
   optional FileEncryptionInfoProto fileEncryptionInfo = 12;
   optional FileChecksumProto fileChecksum = 13;
 }

From 4a660203d54a2c9ed63512031294aea2e285e458 Mon Sep 17 00:00:00 2001
From: Abhishek Pal <pal.abhishek03012001@gmail.com>
Date: Thu, 5 Mar 2026 15:43:40 +0530
Subject: [PATCH 11/12] Mark fields as optional

---
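Reviewer note: with every field declared `optional`, presence has to be checked by the application. The sketch below illustrates one way such a check could look. It is written in Python for brevity and is not the Ozone implementation; the class, `MANDATORY`, `missing_fields`, and `validate` are hypothetical names, and the choice of mandatory fields is an assumption based on the fields that earlier revisions of this document marked `required`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MultipartPartInfo:
    """Plain stand-in for the protobuf message; every field may be absent."""
    part_name: Optional[str] = None
    part_number: Optional[int] = None
    e_tag: Optional[str] = None
    data_size: Optional[int] = None
    modification_time: Optional[int] = None
    object_id: Optional[int] = None
    update_id: Optional[int] = None


# Assumption: fields the application treats as mandatory on write.
MANDATORY = (
    "part_name", "part_number", "e_tag", "data_size",
    "modification_time", "object_id", "update_id",
)


def missing_fields(info: MultipartPartInfo) -> List[str]:
    """Return the mandatory fields that are unset, in declaration order."""
    return [name for name in MANDATORY if getattr(info, name) is None]


def validate(info: MultipartPartInfo) -> None:
    """Reject a part row that lacks any mandatory field."""
    missing = missing_fields(info)
    if missing:
        raise ValueError("missing mandatory fields: " + ", ".join(missing))


print(missing_fields(MultipartPartInfo(part_name="part1", part_number=1)))
```

Keeping the check in one place means the schema can stay permissive for compatibility while writers still fail fast on incomplete rows.
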
 .../content/design/mpu-gc-optimization.md     | 24 +++++++++++--------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/hadoop-hdds/docs/content/design/mpu-gc-optimization.md b/hadoop-hdds/docs/content/design/mpu-gc-optimization.md
index 93931043fee2..00085914d275 100644
--- a/hadoop-hdds/docs/content/design/mpu-gc-optimization.md
+++ b/hadoop-hdds/docs/content/design/mpu-gc-optimization.md
@@ -95,19 +95,23 @@ Split MPU metadata into:
 **New MultipartPartInfo Structure:**
 ```protobuf
 message MultipartPartInfo {
-  required string partName = 1;
-  required uint32 partNumber = 2;
-  repeated KeyLocationList keyLocationList = 6;
-  required uint64 dataSize = 7;
-  required uint64 modificationTime = 8;
-  required uint64 objectID = 9;
-  required uint64 updateID = 10;
-  required string eTag = 11;
-  optional FileEncryptionInfoProto fileEncryptionInfo = 12;
-  optional FileChecksumProto fileChecksum = 13;
+  optional string partName = 1;
+  optional uint32 partNumber = 2;
+  optional string eTag = 3;
+  optional KeyLocationList keyLocationList = 4;
+  optional uint64 dataSize = 5;
+  optional uint64 modificationTime = 6;
+  optional uint64 objectID = 7;
+  optional uint64 updateID = 8;
+  optional FileEncryptionInfoProto fileEncryptionInfo = 9;
+  optional FileChecksumProto fileChecksum = 10;
 }
 ```
 
+```
+Note: All fields are marked optional because Protobuf guidance is to enforce required fields at the application level rather than in the schema. In addition, proto3 does not support required fields.
+```
+
 ### Comparison: V1 (legacy) vs V2
 | Metric              | Current (V1)                  | Split-Table (V2)                                 |
 |:--------------------|:------------------------------|:-------------------------------------------------|

From 2349e0545e7f9cc7f682f95d918bb60f711b03d3 Mon Sep 17 00:00:00 2001
From: Abhishek Pal <pal.abhishek03012001@gmail.com>
Date: Fri, 6 Mar 2026 18:44:35 +0530
Subject: [PATCH 12/12] Add owner and ACL information for the MPU key at
 metadata level

---
 hadoop-hdds/docs/content/design/mpu-gc-optimization.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hadoop-hdds/docs/content/design/mpu-gc-optimization.md b/hadoop-hdds/docs/content/design/mpu-gc-optimization.md
index 00085914d275..ad726abdd320 100644
--- a/hadoop-hdds/docs/content/design/mpu-gc-optimization.md
+++ b/hadoop-hdds/docs/content/design/mpu-gc-optimization.md
@@ -172,6 +172,8 @@ message MultipartKeyInfo {
     optional string volumeName = 11;
     optional string bucketName = 12;
     optional string keyName = 13;
+    optional string ownerName = 14;
+    repeated OzoneAclInfo acls = 15;
 }
 ```