- The existing mapping.py logic only considers top-level messages and ignores nested message types.
    name_class_map = {}
    for file in PROTO_FILES:
        for message_name in file.DESCRIPTOR.message_types_by_name:
            message_type = getattr(file, message_name)
            name_class_map[message_type.DESCRIPTOR.full_name] = message_type
- For example, given the following message, the above logic will find TST.GroupByArchive but will not detect the nested message TST.GroupByArchive.GroupNodeArchive.
    message GroupByArchive {
      message AggNodeArchive {
        required .TSCE.CellCoordinateArchive formula_coord = 1;
        optional .TST.AccumulatorArchive accum = 2;
        repeated .TST.GroupByArchive.AggNodeArchive child = 3;
      }
      message GroupNodeArchive {
        message FormatManagerArchive {
          message RowSetArchive {
            repeated .TSP.UUID row_uids = 1;
          }
          optional .TSCE.CellValueArchive cell_value = 1;
          repeated .TSK.FormatStructArchive formats = 2;
          repeated .TST.GroupByArchive.GroupNodeArchive.FormatManagerArchive.RowSetArchive row_sets = 3;
          repeated .TSCE.IndexSetArchive row_uid_lookup_sets = 4;
        }
        required .TSP.UUID group_uid = 1;
        repeated .TST.GroupByArchive.GroupNodeArchive child = 3;
        repeated .TSP.UUID row_uid = 4;
        repeated .TSCE.CellCoordinateArchive agg_formula_coords = 5;
        optional .TST.GroupByArchive.GroupNodeArchive.FormatManagerArchive format_manager = 6;
        optional .TSCE.CellValueArchive group_cell_value = 7;
        optional .TSCE.IndexSetArchive row_indexes = 8;
        optional .TSCE.IndexSetArchive row_lookup_uids = 9;
      }
      // ... (excerpt ends here)
    }
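To make the gap concrete, here is a minimal sketch that models the nesting of GroupByArchive as a plain dict (a stand-in, not a real protobuf descriptor) and lists the full names that a scan limited to the top level would never see. The names `GROUP_BY_TREE` and `full_names` are made up for this illustration.

```python
# Stand-in for the nesting shown above: each key is a message name and
# each value holds the messages nested inside it.
# This is a plain-dict model, not a protobuf descriptor.
GROUP_BY_TREE = {
    "GroupByArchive": {
        "AggNodeArchive": {},
        "GroupNodeArchive": {
            "FormatManagerArchive": {
                "RowSetArchive": {},
            },
        },
    },
}


def full_names(tree, prefix):
    """Yield the dotted full name of every message in the tree, depth first."""
    for name, nested in tree.items():
        full_name = f"{prefix}.{name}"
        yield full_name
        yield from full_names(nested, full_name)


# What a top-level-only loop registers versus what actually exists.
top_level = {f"TST.{name}" for name in GROUP_BY_TREE}
everything = set(full_names(GROUP_BY_TREE, "TST"))
missed = sorted(everything - top_level)

for name in missed:
    print(name)
```

In this model only TST.GroupByArchive is registered, while the four nested messages, including TST.GroupByArchive.GroupNodeArchive, are skipped.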
- I tested this by updating mapping.py as follows:
    name_class_map = {}
    for file in PROTO_FILES:
        for message_name in file.DESCRIPTOR.message_types_by_name:
            message_type = getattr(file, message_name)
            name_class_map[message_type.DESCRIPTOR.full_name] = message_type

    id_name_map = {}
    for k, v in list(TSPRegistryMapping.items()):
        if v in name_class_map:
            id_name_map[int(k)] = name_class_map[v]
        else:  # <-------- (1)
            print(f"[-] [{__file__} not found {v}]")
I added the print statement at (1). Running the code shows that GroupNodeArchive is not found:

    ./keynote-parser | grep GroupNodeArchive
    [-] not found TST.GroupByArchive.GroupNodeArchive]
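Instead of printing each miss, the same check could return the unresolved names so they can be inspected or asserted on. This is only a sketch: `split_registry` is a hypothetical helper, and the sample IDs and the stand-in class object below are placeholders rather than values from the real TSPRegistryMapping.

```python
def split_registry(registry, name_class_map):
    """Return (id_name_map, missing) for a registry of {id string: full name}."""
    id_name_map = {}
    missing = []
    for type_id, full_name in registry.items():
        if full_name in name_class_map:
            id_name_map[int(type_id)] = name_class_map[full_name]
        else:
            # Same branch as (1), but the name is recorded instead of printed.
            missing.append(full_name)
    return id_name_map, sorted(missing)


# Placeholder data: the IDs and the stand-in class object are made up.
sample_registry = {
    "1": "TST.GroupByArchive",
    "2": "TST.GroupByArchive.GroupNodeArchive",
}
sample_classes = {"TST.GroupByArchive": object}

id_name_map, missing = split_registry(sample_registry, sample_classes)
print(missing)
```

An empty `missing` list after a fix would indicate that every registry entry resolves to a class.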
- The solution for this, generated using an LLM, is as follows:
    def collect_all_message_types(module, container):
        # Register every top-level message in the file, then descend into it.
        for message_name in container.message_types_by_name:
            message_type = getattr(module, message_name)
            full_name = message_type.DESCRIPTOR.full_name
            name_class_map[full_name] = message_type
            # Recursively collect nested types
            collect_nested_types(message_type)

    def collect_nested_types(parent_type):
        for nested_desc in parent_type.DESCRIPTOR.nested_types:
            nested_name = nested_desc.name
            nested_type = getattr(parent_type, nested_name)
            full_name = nested_type.DESCRIPTOR.full_name
            name_class_map[full_name] = nested_type
            # Recurse further if there are nested messages inside
            collect_nested_types(nested_type)

    # Run collection
    name_class_map = {}
    for file in PROTO_FILES:
        collect_all_message_types(file, file.DESCRIPTOR)
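The recursion can be exercised without the generated protobuf modules. The sketch below uses stand-in objects that mimic only the attributes the traversal touches (`DESCRIPTOR.name`, `DESCRIPTOR.full_name`, `DESCRIPTOR.nested_types`, and one attribute per nested class). `make_message` and `collect` are hypothetical names, and this variant passes the map as a parameter instead of relying on a module-level `name_class_map`.

```python
from types import SimpleNamespace


def make_message(full_name, *nested):
    """Build a stand-in for a generated message class (hypothetical helper).

    It mimics only what the traversal needs: a DESCRIPTOR with `name`,
    `full_name` and `nested_types`, plus one attribute per nested class.
    """
    descriptor = SimpleNamespace(
        name=full_name.rsplit(".", 1)[-1],
        full_name=full_name,
        nested_types=[child.DESCRIPTOR for child in nested],
    )
    message = SimpleNamespace(DESCRIPTOR=descriptor)
    for child in nested:
        setattr(message, child.DESCRIPTOR.name, child)
    return message


def collect(message_type, name_class_map):
    """Register message_type and, recursively, every message nested in it."""
    name_class_map[message_type.DESCRIPTOR.full_name] = message_type
    for nested_desc in message_type.DESCRIPTOR.nested_types:
        collect(getattr(message_type, nested_desc.name), name_class_map)
    return name_class_map


# Rebuild the GroupByArchive nesting from the example with stand-ins.
row_set = make_message(
    "TST.GroupByArchive.GroupNodeArchive.FormatManagerArchive.RowSetArchive"
)
format_manager = make_message(
    "TST.GroupByArchive.GroupNodeArchive.FormatManagerArchive", row_set
)
group_node = make_message("TST.GroupByArchive.GroupNodeArchive", format_manager)
agg_node = make_message("TST.GroupByArchive.AggNodeArchive")
group_by = make_message("TST.GroupByArchive", agg_node, group_node)

name_class_map = collect(group_by, {})
print(sorted(name_class_map))
```

With the stand-ins, all five messages end up in the map, including TST.GroupByArchive.GroupNodeArchive. Whether the real generated classes behave the same way depends on them exposing nested classes as attributes, as the fix above assumes.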