Current Implementation Ignores Nested Protobuf Messages

* The existing `mapping.py` logic iterates only over `message_types_by_name`, so it registers top-level messages and ignores nested message types.

```python
name_class_map = {}
for file in PROTO_FILES:
    for message_name in file.DESCRIPTOR.message_types_by_name:
        message_type = getattr(file, message_name)
        name_class_map[message_type.DESCRIPTOR.full_name] = message_type
```

* For example, given the following message, the above logic will find `TST.GroupByArchive` but will not detect the nested message `TST.GroupByArchive.GroupNodeArchive`:
```protobuf
message GroupByArchive {
  message AggNodeArchive {
    required .TSCE.CellCoordinateArchive formula_coord = 1;
    optional .TST.AccumulatorArchive accum = 2;
    repeated .TST.GroupByArchive.AggNodeArchive child = 3;
  }

  message GroupNodeArchive {
    message FormatManagerArchive {
      message RowSetArchive {
        repeated .TSP.UUID row_uids = 1;
      }

      optional .TSCE.CellValueArchive cell_value = 1;
      repeated .TSK.FormatStructArchive formats = 2;
      repeated .TST.GroupByArchive.GroupNodeArchive.FormatManagerArchive.RowSetArchive row_sets = 3;
      repeated .TSCE.IndexSetArchive row_uid_lookup_sets = 4;
    }

    required .TSP.UUID group_uid = 1;
    repeated .TST.GroupByArchive.GroupNodeArchive child = 3;
    repeated .TSP.UUID row_uid = 4;
    repeated .TSCE.CellCoordinateArchive agg_formula_coords = 5;
    optional .TST.GroupByArchive.GroupNodeArchive.FormatManagerArchive format_manager = 6;
    optional .TSCE.CellValueArchive group_cell_value = 7;
    optional .TSCE.IndexSetArchive row_indexes = 8;
    optional .TSCE.IndexSetArchive row_lookup_uids = 9;
  }

  // ... (rest of GroupByArchive not shown)
}
```
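
The same behaviour can be reproduced without the Keynote schemas. The sketch below is only an illustration and is not part of keynote-parser: it uses `descriptor_pb2`, which ships with the `protobuf` Python package and contains a nested message (`DescriptorProto.ExtensionRange`), as a stand-in for one of the generated modules in `PROTO_FILES`.

```python
from google.protobuf import descriptor_pb2

# Stand-in for one generated *_pb2 module from PROTO_FILES (assumption:
# the generated Keynote modules behave the same way).
module = descriptor_pb2

# message_types_by_name is keyed by short name and lists top-level messages only.
top_level = set(module.DESCRIPTOR.message_types_by_name)

# Nested messages are reachable only through their parent's descriptor.
nested = [d.full_name for d in module.DescriptorProto.DESCRIPTOR.nested_types]

print("DescriptorProto" in top_level)  # the parent message is listed
print("ExtensionRange" in top_level)   # the nested message is not
print(nested)
```

A loop over `message_types_by_name` alone therefore never sees `ExtensionRange`, just as it never sees `GroupNodeArchive`.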

* I tested this by adding a diagnostic `print` to `mapping.py`:
```python
    name_class_map = {}
    for file in PROTO_FILES:
        for message_name in file.DESCRIPTOR.message_types_by_name:
            message_type = getattr(file, message_name)
            name_class_map[message_type.DESCRIPTOR.full_name] = message_type

    id_name_map = {}
    for k, v in list(TSPRegistryMapping.items()):
        if v in name_class_map:
            id_name_map[int(k)] = name_class_map[v]
        else:                                                # <-------- (1)
            print(f"[-] [{__file__} not found {v}]")

```
The only change is the `print` statement at `(1)`. Running the parser shows that `TST.GroupByArchive.GroupNodeArchive` is not found in `name_class_map`:

```
./keynote-parser | grep GroupNodeArchive
[-]  not found TST.GroupByArchive.GroupNodeArchive]
```

* A possible fix, generated using an LLM, is to collect nested message types recursively:

```python
name_class_map = {}


def collect_all_message_types(module, file_descriptor):
    """Register every top-level message class of a generated module."""
    for message_name in file_descriptor.message_types_by_name:
        message_type = getattr(module, message_name)
        name_class_map[message_type.DESCRIPTOR.full_name] = message_type

        # Recursively collect nested types
        collect_nested_types(message_type)


def collect_nested_types(parent_type):
    """Register every message class nested (at any depth) in parent_type."""
    for nested_desc in parent_type.DESCRIPTOR.nested_types:
        # Generated classes expose nested messages as class attributes
        nested_type = getattr(parent_type, nested_desc.name)
        name_class_map[nested_desc.full_name] = nested_type

        # Recurse further if there are nested messages inside
        collect_nested_types(nested_type)


# Run collection
for file in PROTO_FILES:
    collect_all_message_types(file, file.DESCRIPTOR)
```
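
A variation on the recursive approach above, offered as a sketch and not as the project's implementation, avoids mutating a module-level dictionary by yielding `(full_name, class)` pairs from a generator. The helper names `iter_message_classes` and `build_name_class_map` are hypothetical, and `descriptor_pb2` from the `protobuf` package again stands in for `PROTO_FILES` so the example runs on its own.

```python
from google.protobuf import descriptor_pb2


def iter_message_classes(message_class):
    """Yield (full_name, class) for message_class and every message nested in it."""
    yield message_class.DESCRIPTOR.full_name, message_class
    for nested_desc in message_class.DESCRIPTOR.nested_types:
        # Nested message classes are attributes of their parent class.
        yield from iter_message_classes(getattr(message_class, nested_desc.name))


def build_name_class_map(modules):
    """Map fully qualified message names to classes for the given modules."""
    result = {}
    for module in modules:
        for message_name in module.DESCRIPTOR.message_types_by_name:
            result.update(iter_message_classes(getattr(module, message_name)))
    return result


# Stand-in input; in mapping.py this would be PROTO_FILES (assumption).
name_class_map = build_name_class_map([descriptor_pb2])
print("google.protobuf.DescriptorProto.ExtensionRange" in name_class_map)
```

Because the function returns a fresh dictionary, the later `id_name_map` lookup can stay unchanged and simply read from the returned map.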
