🚀 The feature, motivation and pitch
This proposal details the design components shared between the ExecuTorch features ‘backend weight sharing’ and ‘data separation’.
At a high level, we introduce an opaque blob store that backends can add data to and request data from. Ahead-of-time, backends place opaque blobs into the store under unique keys. At runtime, backends request those blobs back using the same keys.
Motivation
- Provides a mechanism for backends to serialize and load shared data (Link GH Issue)
  - Ahead-of-time: backends can share data in the PTE file, instead of duplicating shared data across processed blobs.
  - Runtime: backends can load the shared data.
- Provides an interface for enabling data separation.
  - Ahead-of-time: backends can specify whether shared data is stored in an external file or not.
  - Runtime: backends can load the shared data.
RFC
AoT: NamedBlobStore
We first introduce the concept of the ‘NamedBlobStore’. It allows delegates to serialize bytes under string keys; those bytes can be retrieved from the NamedDataMap at runtime (see section Runtime: NamedDataMap). Delegates can serialize information shared across methods or subgraphs into the NamedBlobStore, and retrieve it when initializing each method or subgraph.
For data that is saved to multiple external files, users can set the field ‘external’ to indicate the desired grouping. E.g. blob1 and blob2 in ‘external_file1’, and blob3 in ‘external_file2’.
class NamedBlobStore:
    """
    NamedBlobStore manages the blobs that delegates want to share. Backends add bytes
    to the store under a unique key. These bytes can be retrieved at runtime using the
    same key with the NamedDataMap.
    """

    def add_named_blob(self, key: str, blob: bytes, alignment: int, external: Optional[str]) -> bool:
        """
        Adds a named blob to the NamedBlobStore.

        Args:
            key (str): key used to serialize the bytes.
            blob (bytes): bytes to be serialized.
            alignment (int): alignment for the bytes to be serialized with.
            external (Optional[str]): the external filename that this blob is saved to.

        Returns:
            bool: true if the blob was successfully added, false if not.
        """

The NamedBlobStore is part of EdgeProgramManager and is passed to ExecutorchProgramManager for serialization during to_executorch.
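To make the grouping semantics concrete, here is a minimal in-memory sketch. It is hypothetical (not the real ExecuTorch class; the class name, defaults, and `blobs_for_file` helper are invented for illustration), mirroring the example above: blob1 and blob2 go to ‘external_file1’, blob3 to ‘external_file2’.

```python
from typing import Dict, List, Optional, Tuple


class InMemoryBlobStore:
    """Hypothetical in-memory stand-in for NamedBlobStore, for illustration only."""

    def __init__(self) -> None:
        # key -> (blob, alignment, external file name or None for in-PTE data)
        self._blobs: Dict[str, Tuple[bytes, int, Optional[str]]] = {}

    def add_named_blob(self, key: str, blob: bytes, alignment: int = 1,
                       external: Optional[str] = None) -> bool:
        # Reject duplicate keys so two delegates cannot silently collide.
        if key in self._blobs:
            return False
        self._blobs[key] = (blob, alignment, external)
        return True

    def blobs_for_file(self, external: Optional[str]) -> List[str]:
        # Keys grouped by destination file (None means "inside the PTE").
        return [k for k, (_, _, ext) in self._blobs.items() if ext == external]


store = InMemoryBlobStore()
store.add_named_blob("blob1", b"\x00" * 16, alignment=16, external="external_file1")
store.add_named_blob("blob2", b"\x01" * 16, alignment=16, external="external_file1")
store.add_named_blob("blob3", b"\x02" * 16, alignment=16, external="external_file2")
```

The duplicate-key check sketches one reasonable policy; whether the real store rejects, overwrites, or deduplicates colliding keys is a design decision for the actual implementation.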
AoT: Preprocess
Preprocess
We provide the ‘NamedBlobStore’ to backends when processing their lowered graphs. While processing, backends can add to the NamedBlobStore with any data they wish to be shared.
class Backend(BackendDetails):
    @staticmethod
    def preprocess(
        edge_program,
        compile_specs,
        named_blob_store: NamedBlobStore,
    ) -> PreprocessResult:

Preprocess All
To further address backend weight sharing, we introduce a new API, preprocess_all. The limitation of the current preprocess API is that backends can only process a single graph at a time. As a result, they have no information about the larger model or the components shared with other graphs delegated to the same backend. The new preprocess_all API enables backends to process and lower all the delegated graphs from the model at once. This allows backends to identify the shared components (weights, tensors, etc.) across all the ExportedPrograms when producing their backend payloads, and to serialize those shared components through the NamedBlobStore.
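One way a backend's preprocess_all implementation might identify shared weights is by content hashing. The sketch below is illustrative only (the function name, key format, and alignment choice are hypothetical): each unique buffer is added to the store once, and every graph records the key it should later look up via the NamedDataMap.

```python
import hashlib
from typing import Callable, Dict, List, Optional


def dedupe_shared_weights(
    weights_per_graph: Dict[str, List[bytes]],
    add_named_blob: Callable[[str, bytes, int, Optional[str]], bool],
) -> Dict[str, List[str]]:
    """Store each unique weight buffer exactly once; return, per method, the keys
    that each graph's payload should record and later resolve at runtime."""
    seen: Dict[str, str] = {}  # content hash -> blob key
    keys_per_graph: Dict[str, List[str]] = {}
    for method, weights in weights_per_graph.items():
        keys: List[str] = []
        for w in weights:
            digest = hashlib.sha256(w).hexdigest()
            if digest not in seen:
                key = f"weight_{digest[:8]}"
                seen[digest] = key
                # Alignment and external-file placement are backend choices.
                add_named_blob(key, w, 16, None)
            keys.append(seen[digest])
        keys_per_graph[method] = keys
    return keys_per_graph
```

For example, if a "forward" and a "decode" method both reference the same weight buffer, the blob is added once and both methods record the same key.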
def preprocess_all(
    exported_programs: Dict[str, List[ExportedProgram]],
    named_blob_store: NamedBlobStore,
) -> Dict[str, List[PreprocessResult]]:
    """
    Args:
        exported_programs (Dict[str, List[ExportedProgram]]):
            Maps each method name to the list of partitions produced by the
            partitioner. If backend_id was specified instead of a partitioner,
            the list contains a single element: the ExportedProgram
            corresponding to that method.
        named_blob_store (NamedBlobStore):
            Blob store that delegates can use to request bytes to be serialized.
            Backends serialize bytes with a string key. At runtime, they can use
            the same key to request the same bytes back.

    Returns:
        Dict[str, List[PreprocessResult]]:
            Must produce one PreprocessResult for every ExportedProgram in
            exported_programs. The PreprocessResult for method [str] and index [i]
            corresponds to the ExportedProgram at exported_programs[str][i].
    """

Runtime: NamedDataMap
We define an interface called the NamedDataMap (NDM) that looks up data by string key. The NDM provides a view over ‘shared_delegate_data’ in the PTE file and ‘shared_external_data’ in the external data file.
The ExecuTorch-provided NDM will use a linear or binary search over the keys, to avoid pulling in C++ libraries and increasing the core runtime binary size.
For the external data case, users can bring their own implementation, using e.g. std::unordered_map for faster lookup.
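The default lookup strategy can be sketched in Python for illustration (the actual runtime is C++; this sketch assumes the keys are sorted at serialization time, which is what makes binary search possible without any extra data structures):

```python
from bisect import bisect_left
from typing import Optional, Sequence


def find_key(sorted_keys: Sequence[str], key: str) -> Optional[int]:
    """Binary search over keys sorted at serialization time; avoids a hash map,
    keeping extra data-structure code out of the core runtime."""
    i = bisect_left(sorted_keys, key)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return i
    return None


keys = sorted(["blob3", "blob1", "blob2"])
assert find_key(keys, "blob2") == 1
assert find_key(keys, "missing") is None
```

A returned index would then be used to locate the corresponding segment; a custom NDM backed by a hash map trades binary size for O(1) lookup.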
// NamedDataMap interface.
class NamedDataMap {
 public:
  virtual ~NamedDataMap() = default;
  // Get data by key.
  virtual Result<FreeableBuffer> get_data(const char* key) const = 0;
  // Get the number of keys.
  virtual int get_num_keys() const = 0;
  // Get the key at an index.
  virtual Result<const char*> get_key_at(int index) const = 0;
};

The NDM is passed to backend.init, and backends use it to retrieve data.
The NDM loads upon request and provides read-only data. If a backend wants to mutate the data, they should copy the data, mutate it, and then free the original. Ideally, mutated data is stored in a backend-wide cache so subsequent methods can access it without invoking another load.
Delegate flow
// Sample implementation for a backend.
// A backend-specific data cache to store data shared within that backend.
// This is owned and implemented by the backend. The backend must implement its own locking.
backend::shared_data shared_data_cache;
---
Result<DelegateHandle*> init(
    BackendInitContext& context,
    FreeableBuffer* processed,
    ArrayRef<CompileSpec> compile_specs,
    const NamedDataMap& shared_data_map) const override {
  ...
  // Resolve external data when we come across it in the preprocessed graph.
  // Note: backends should lock access to shared data, as multiple threads
  // could load models simultaneously.
  if (shared_data_cache.find(key) == shared_data_cache.end()) {
    Result<FreeableBuffer> data = shared_data_map.get_data(key);
    // Case 1: delegates that mutate data at runtime.
    if (mutate) {
      // Copy and mutate the data.
      auto initialized_data = backend::initialize_data(data);
      // Add to shared_data_cache.
      shared_data_cache.insert(initialized_data);
      // Free the original data.
      data->Free();
    }
    // Case 2: delegates that do not mutate data at runtime.
    else {
      shared_data_cache.insert(data);
    }
  }
  ...
}

User flow
The user flow with shared data inside the PTE file is unchanged.
Example user flow with data in an external file:
// Example ExecuTorch runtime flow
// Load program.
Result<FileDataLoader> program_loader = FileDataLoader::from(pte_file_path);
Result<Program> program = Program::load(&program_loader.get());
// Load shared data.
Result<FileDataLoader> data_loader = FileDataLoader::from(data_file_path);
Result<CustomNamedDataMap> custom_named_data_map = CustomNamedDataMap::load(data_loader);
// Pass into method.
Result<Method> method = program->load_method(
    "forward",             // method name
    memory_manager,        // memory manager
    nullptr,               // event_tracer
    custom_named_data_map  // external data
);
Error err = method->execute();
Schema Changes
Note that the runtime doesn’t depend on a specific data file format. The NamedDataMap can interface with data inside the PTE, any custom file format, or a wrapper around some separate service. As an initial example of an external file format, you can check out FlatTensor; note that it is still experimental and under development.
PTE File
We introduce new tables to the existing ‘program.fbs’ schema, for when shared data is stored inside the PTE.
This parallels the external data file schema below. If the NamedData and corresponding segments are removed and placed in an external file, the PTE file + External File should execute as expected.
table NamedData {
  // The unique id of the data blob.
  key: string;
  // Program.segments index where the data for this NamedBlob is stored.
  segment_index: uint32;
}

table Program {
  ...
  segments: [DataSegment];
  ...
  named_data: [NamedData];
}
External File
We introduce a new file schema for data-only files. The NamedBlobStore is serialized into this schema.
This parallels the PTE file schema changes. If the NamedData and corresponding segments are placed into the PTE file, the PTE file should execute as expected.
table NamedData {
  // The unique id of the data blob.
  key: string;
  // FlatData.segments index where the data for this NamedBlob is stored.
  segment_index: uint32;
}

// FlatData is a flatbuffer-based format for storing and loading opaque data.
table FlatData {
  // Schema version.
  version: uint32;
  // List of blobs and references to their location.
  named_data: [NamedData];
  // List of data segments that follow the FlatData file, sorted by
  // offset. Elements in this schema can refer to these segments by index.
  segments: [DataSegment];
}

root_type FlatData;
cc @mcr229