Fix typed proxy access to generic skeleton event storage #394

Open
rudresh-systream wants to merge 3 commits into eclipse-score:main from rudresh-systream:bugfix/311-generic-skeleton-typed-proxy-storage

rudresh-systream commented May 8, 2026

Summary:

  • Fixed typed ProxyEvent sample access so it no longer depends on interpreting shared memory as EventDataStorage<T>.
  • Added raw-slot based access using sample size/alignment, making typed proxies compatible with GenericSkeleton-created event storage.
  • Kept the existing typed skeleton behavior intact.
  • Updated the ProxyEvent and event storage tests and test resources.

Implemented a compatibility fix for the GenericSkeleton ↔ typed ProxyEvent shared-memory interaction described in #311.

Root cause

The issue arises because GenericSkeleton and typed skeletons create and interpret EventDataStorage differently.

For typed skeletons, the storage is created through:

EventDataStorage<SampleType>

which internally creates a typed DynamicArray<SampleType>.

For GenericSkeleton/GenericSkeletonEvent, the storage is instead created using:

EventDataStorage<std::max_align_t>

with the storage size being calculated in units of std::max_align_t.

As a result, the underlying raw memory region may be large enough, but the metadata/layout interpretation of the DynamicArray<T> becomes incompatible between the producer and consumer sides.

The typed proxy path was still assuming a typed EventDataStorage<T> representation and therefore interpreted slot count/layout using the wrong type information. This creates an incompatibility whenever:

  • a GenericSkeleton publishes samples
  • and a typed proxy consumes them.

This issue is especially visible because the DynamicArray element count depends on the template type T, which differs between the generic and typed paths.
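A small standalone sketch makes the element-count mismatch concrete. It assumes, for illustration only, that the typed path allocates one array element per slot while the generic path allocates slots * sample_size / sizeof(std::max_align_t) elements, rounded up. `Sample`, `kNumSlots`, `TypedElementCount` and `GenericElementCount` are hypothetical names; this is not the real EventDataStorage or DynamicArray code.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical 24-byte sample type standing in for a real event payload.
struct Sample {
    std::uint64_t a;
    std::uint64_t b;
    std::uint64_t c;
};

// Hypothetical number of event slots.
constexpr std::size_t kNumSlots = 10;

// Assumed typed path: EventDataStorage<Sample> holds one array element per slot.
constexpr std::size_t TypedElementCount(std::size_t num_slots) { return num_slots; }

// Assumed generic path: the same byte budget expressed in units of
// std::max_align_t, rounded up so that all slots fit.
constexpr std::size_t GenericElementCount(std::size_t num_slots, std::size_t sample_size) {
    const std::size_t bytes = num_slots * sample_size;
    const std::size_t unit = sizeof(std::max_align_t);
    return (bytes + unit - 1) / unit;
}

// The raw region is big enough for every slot, yet a consumer that reads the
// generic array's element count as "number of Sample slots" sees
// GenericElementCount(...) instead of TypedElementCount(...).
static_assert(GenericElementCount(kNumSlots, sizeof(Sample)) * sizeof(std::max_align_t) >=
                  kNumSlots * sizeof(Sample),
              "the raw region is large enough for every slot");
```

For example, if sizeof(std::max_align_t) is 32, ten 24-byte samples need 240 bytes, so the generic array holds 8 elements while a typed view expects 10: the bytes fit, but the element counts disagree.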


Investigated solution approaches

Several approaches were considered:

1. Extending DynamicArray<T>

One option was to add a constructor allowing externally managed/preallocated storage while preserving a typed DynamicArray<T> interface.

This was not chosen because:

  • it would introduce LoLa/event-storage specific semantics into a generic container abstraction,
  • it increases complexity in DynamicArray,
  • and it still keeps the shared-memory interpretation tightly coupled to template typing.

2. Replacing all event storage with DynamicArray<std::byte>

Another option was to fully migrate all event storage handling to raw byte storage.

While architecturally clean, this would require significantly broader refactoring across:

  • typed skeletons,
  • generic skeletons,
  • proxies,
  • allocation logic,
  • and existing tests.

Given its scope and risk, this approach was considered too invasive for the current issue.

3. Fixing typed proxy access to use raw slot storage (chosen solution)

The implemented solution changes typed proxy sample access so it no longer depends on interpreting the shared memory as EventDataStorage<T>.

Instead:

  • the proxy accesses the shared-memory event region through raw slot memory/meta information,

  • and calculates slot/sample access using:

    • sizeof(T)
    • alignof(T)

This keeps the existing shared-memory layout compatible while allowing typed proxies to correctly consume data produced by GenericSkeleton.
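The raw-slot idea can be sketched as follows, assuming slots are laid out back to back starting at the first suitably aligned byte of the region, with a stride equal to the sample size rounded up to its alignment. `SlotAddress`, `WriteSlotGeneric` and `ReadSlotTyped` are hypothetical helpers, not the actual ProxyEvent code, and the typed read copies the sample out with std::memcpy to stay within well-defined C++ for trivially copyable types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Round `value` up to a multiple of `alignment` (a power of two).
constexpr std::size_t RoundUp(std::size_t value, std::size_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Address of slot `index` in a raw region, derived only from sample size and
// alignment: the first slot starts at the first suitably aligned byte and
// consecutive slots are one aligned stride apart.
inline std::byte* SlotAddress(std::byte* region, std::size_t region_size, std::size_t index,
                              std::size_t sample_size, std::size_t sample_alignment) {
    assert(sample_alignment != 0 && (sample_alignment & (sample_alignment - 1)) == 0);
    const auto base = reinterpret_cast<std::uintptr_t>(region);
    const std::size_t padding = RoundUp(base, sample_alignment) - base;
    const std::size_t stride = RoundUp(sample_size, sample_alignment);
    const std::size_t offset = padding + index * stride;
    assert(offset + sample_size <= region_size);  // slot must lie inside the region
    return region + offset;
}

// Producer side (GenericSkeleton-like): only the size and alignment are known.
inline void WriteSlotGeneric(std::byte* region, std::size_t region_size, std::size_t index,
                             const void* sample, std::size_t sample_size,
                             std::size_t sample_alignment) {
    std::memcpy(SlotAddress(region, region_size, index, sample_size, sample_alignment), sample,
                sample_size);
}

// Consumer side (typed-proxy-like): slot geometry comes from sizeof(T)/alignof(T),
// independent of how the producer typed its storage.
template <typename T>
T ReadSlotTyped(std::byte* region, std::size_t region_size, std::size_t index) {
    static_assert(std::is_trivially_copyable_v<T>, "sketch assumes trivially copyable samples");
    T out;
    std::memcpy(&out, SlotAddress(region, region_size, index, sizeof(T), alignof(T)), sizeof(T));
    return out;
}
```

Because both sides derive the slot geometry only from (size, alignment), the typed reader finds the same slots the generic writer used, as long as the size and alignment registered on the producer side equal sizeof(T) and alignof(T).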

This approach was selected because it:

  • directly resolves the interoperability problem,
  • minimizes architectural disruption,
  • preserves existing typed skeleton behavior,
  • avoids changing generic container semantics,
  • and decouples the consumer's sample access from how the producer typed its EventDataStorage, which makes producer/consumer compatibility more robust going forward.

Files changed

score/mw/com/impl/bindings/lola/proxy_event.h

Main functional fix.

Updated typed ProxyEvent<T> sample access logic to avoid relying on interpreting the shared-memory region as EventDataStorage<T>.

Instead, access is now performed through raw event slot metadata and pointer arithmetic based on the actual sample type size/alignment.

This makes typed proxies compatible with GenericSkeleton-created storage.


score/mw/com/impl/bindings/lola/skeleton_memory_manager.h

score/mw/com/impl/bindings/lola/skeleton_memory_manager.cpp

Supporting changes to event storage creation and raw slot metadata handling.

These changes ensure both generic and typed paths expose compatible storage information to consumers.


score/mw/com/impl/bindings/lola/skeleton.cpp

Adjusted skeleton-side integration to align with the updated event storage access model and metadata usage.


score/mw/com/impl/bindings/lola/proxy_event_test.cpp

score/mw/com/impl/bindings/lola/test/proxy_event_test_resources.cpp

score/mw/com/impl/bindings/lola/test/proxy_event_test_resources.h

Updated and extended test resources/coverage to validate:

  • typed proxy interaction with GenericSkeleton-created event storage,
  • slot access correctness,
  • and compatibility of the updated shared-memory interpretation model.

Validation performed

The following builds/tests were executed successfully after the changes:

bazel build //score/mw/com/impl/bindings/lola:proxy
bazel build //score/mw/com/impl/bindings/lola:skeleton

bazel test //score/mw/com/impl/bindings/lola:proxy_event_test
bazel test //score/mw/com/impl/bindings/lola:generic_proxy_event_test
bazel test //score/mw/com/impl/bindings/lola:skeleton_test
bazel test //score/mw/com/impl/bindings/lola:event_data_storage_test

All relevant LoLa event/proxy/skeleton tests passed successfully after the fix.

Signed-off-by: Rudresh Shirwal <rudresh.shirwal@systream.io>
rudresh-systream force-pushed the bugfix/311-generic-skeleton-typed-proxy-storage branch from 37ebeb3 to dcb34ed on May 8, 2026 11:01
rudresh-systream (Author) commented:

The second commit adds a dedicated verification application that demonstrates the fix for #311.

Why this application was created

The bug is specifically about interoperability between:

  • a GenericSkeleton producer
  • a normal typed proxy consumer

So the verification app was created to reproduce that exact runtime architecture instead of only relying on unit tests.

The app was added under:

score/mw/com/test/generic_skeleton_typed_proxy/

It contains one binary that can run in two modes:

--mode generic_skeleton
--mode typed_proxy
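The PR does not show how the binary selects its mode, so here is a minimal sketch of one way to parse the flags used later in the manual run. The `Options` struct, `ParseOptions`, the defaults, and the reading of `--cycle-time` as milliseconds and `--num-cycles 0` as "run until stopped" are all assumptions, not the app's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <exception>
#include <optional>
#include <string>
#include <vector>

enum class Mode { kGenericSkeleton, kTypedProxy };

// Hypothetical option set mirroring the flags used in the manual run.
struct Options {
    Mode mode{Mode::kTypedProxy};
    std::uint32_t cycle_time{0};  // unit not stated in the PR; assumed milliseconds
    std::uint32_t num_cycles{0};  // assumed: 0 means "run until stopped"
};

// Parses "--flag value" pairs; returns std::nullopt on any malformed input.
inline std::optional<Options> ParseOptions(const std::vector<std::string>& args) {
    if (args.size() % 2 != 0) {
        return std::nullopt;  // every flag needs a value
    }
    Options opts;
    bool mode_seen = false;
    try {
        for (std::size_t i = 0; i < args.size(); i += 2) {
            const std::string& flag = args[i];
            const std::string& value = args[i + 1];
            if (flag == "--mode") {
                if (value == "generic_skeleton") {
                    opts.mode = Mode::kGenericSkeleton;
                } else if (value == "typed_proxy") {
                    opts.mode = Mode::kTypedProxy;
                } else {
                    return std::nullopt;  // unknown mode
                }
                mode_seen = true;
            } else if (flag == "--cycle-time") {
                opts.cycle_time = static_cast<std::uint32_t>(std::stoul(value));
            } else if (flag == "--num-cycles") {
                opts.num_cycles = static_cast<std::uint32_t>(std::stoul(value));
            } else {
                return std::nullopt;  // unknown flag
            }
        }
    } catch (const std::exception&) {
        return std::nullopt;  // std::stoul rejected the number
    }
    if (!mode_seen) {
        return std::nullopt;  // --mode is mandatory
    }
    return opts;
}
```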

How the application was created

1. GenericSkeleton side

In generic_skeleton mode, the application creates a GenericSkeleton, offers the LoLa service, creates the event storage, and continuously sends samples.

It registers the service events using explicit sample metadata:

  • sample size
  • sample alignment
  • event name

Both required events from the typed interface were registered so the typed proxy can create all required event control views correctly.
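For the typed proxy to read these events correctly, the registered size and alignment must match what it derives from its own sample types. A hypothetical sketch of deriving each event's metadata from the typed interface's sample type follows; `EventDescriptor`, `DescribeEvent`, `MatchesTypedLayout`, the sample structs and the event names are illustrative only and do not reproduce the GenericSkeleton registration API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical descriptor mirroring the per-event metadata the app registers.
struct EventDescriptor {
    std::string name;
    std::size_t sample_size;
    std::size_t sample_alignment;
};

// Derive the metadata from the typed sample so producer and consumer agree
// on the slot geometry.
template <typename T>
EventDescriptor DescribeEvent(std::string name) {
    return EventDescriptor{std::move(name), sizeof(T), alignof(T)};
}

// True if a descriptor matches what a typed proxy for T will assume.
template <typename T>
bool MatchesTypedLayout(const EventDescriptor& d) {
    return d.sample_size == sizeof(T) && d.sample_alignment == alignof(T);
}

// Hypothetical sample types standing in for the typed interface's events.
struct LanesSample {
    std::uint64_t timestamp;
    std::uint32_t lane_count;
};
struct StatusSample {
    std::uint8_t code;
};
```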

This side proves the producer is using the GenericSkeleton storage path that originally triggered the bug.

2. Typed proxy side

In typed_proxy mode, the same binary starts the existing typed proxy flow.

It searches for the service, instantiates the typed proxy, subscribes to the event, receives samples through a callback, validates the received data, and exits cleanly after the configured number of cycles.
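The PR does not show the validation logic itself. One plausible check, sketched under that caveat, treats the first received value as a baseline (the consumer may join after the producer has already sent samples) and then requires each later value to be exactly one higher. `SequenceChecker` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical validator for a monotonically increasing sample counter.
class SequenceChecker {
public:
    // Returns true if `value` is acceptable; rejected values do not advance state.
    bool Accept(std::uint64_t value) {
        if (last_.has_value() && value != *last_ + 1) {
            return false;  // gap or reordering detected
        }
        last_ = value;  // the first value becomes the baseline
        ++accepted_;
        return true;
    }

    std::uint32_t accepted() const { return accepted_; }

private:
    std::optional<std::uint64_t> last_;
    std::uint32_t accepted_{0};
};
```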

This side proves the consumer is using the normal typed proxy path, not a GenericProxy.

Build and integration test

The app was built with:

bazel build //score/mw/com/test/generic_skeleton_typed_proxy:generic_skeleton_typed_proxy
bazel build //score/mw/com/test/generic_skeleton_typed_proxy:generic_skeleton_typed_proxy-pkg

The automated integration test was executed with:

bazel test //score/mw/com/test/generic_skeleton_typed_proxy/integration_test:generic_skeleton_typed_proxy

The test passed.

Manual runtime verification

The application was also run manually as two real processes from the repo.

Before running, the runtime configuration files were copied into etc/ at the repo root:

mkdir -p etc
cp score/mw/com/test/generic_skeleton_typed_proxy/mw_com_config.json etc/mw_com_config.json
cp score/mw/com/test/generic_skeleton_typed_proxy/logging.json etc/logging.json

Terminal 1 ran the GenericSkeleton producer:

bazel-bin/score/mw/com/test/generic_skeleton_typed_proxy/generic_skeleton_typed_proxy \
  --mode generic_skeleton \
  --cycle-time 40 \
  --num-cycles 0

Terminal 2 ran the typed proxy consumer:

bazel-bin/score/mw/com/test/generic_skeleton_typed_proxy/generic_skeleton_typed_proxy \
  --mode typed_proxy \
  --num-cycles 25

Evidence from GenericSkeleton terminal

The GenericSkeleton side successfully created and offered the LoLa service:

Recreating SHM of Skeleton (S: 6432 I: 1)
Created shared-memory-object for DATA (S: 6432 I: 1)
Successfully created offer path /tmp/mw_com_lola/service_discovery/6432/1
created flag file for service: /tmp/mw_com_lola/service_discovery/6432/1/...

Then it continuously sent samples:

GenericSkeleton sent sample 0
GenericSkeleton sent sample 1
GenericSkeleton sent sample 2
...
GenericSkeleton sent sample 99
GenericSkeleton sent sample 100
GenericSkeleton sent sample 101
...
GenericSkeleton sent sample 123

This proves the GenericSkeleton producer was active and publishing samples through GenericSkeleton-created shared memory.

Evidence from typed proxy terminal

The typed proxy successfully discovered the service:

score/cp60/MapApiLanesStamped: Running as proxy, looking for services
score/cp60/MapApiLanesStamped: Found service, instantiating proxy
score/cp60/MapApiLanesStamped: Subscribing to service

Then the proxy callback was invoked and valid samples were received:

score/cp60/MapApiLanesStamped: Callback called
score/cp60/MapApiLanesStamped: Received sample: 99
score/cp60/MapApiLanesStamped: Received sample: 100
score/cp60/MapApiLanesStamped: Proxy received valid data

The proxy continued receiving sequential samples:

Received sample: 101
Proxy received valid data

Received sample: 102
Proxy received valid data

Received sample: 103
Proxy received valid data
...
Received sample: 123
Proxy received valid data

Finally, the typed proxy unsubscribed and terminated cleanly:

score/cp60/MapApiLanesStamped: Unsubscribing...
score/cp60/MapApiLanesStamped: and terminating, bye bye

How this proves the bug resolution

Before the fix, this exact combination was broken:

GenericSkeleton-created event storage
+
normal typed ProxyEvent<T> consumer

The typed proxy could interpret the shared-memory event storage incorrectly because GenericSkeleton and typed skeletons used different EventDataStorage representations.

The fix changed typed proxy sample access so it uses raw event slot access with the actual sample type’s sizeof(T) and alignof(T), instead of depending on the producer’s typed EventDataStorage<T> representation.

The verification app proves the fix because:

  1. The producer is a real GenericSkeleton.
  2. The consumer is a real typed proxy.
  3. They communicate through the real LoLa shared-memory path.
  4. The proxy discovers the offered service.
  5. The proxy subscribes successfully.
  6. The proxy callback is triggered.
  7. The proxy receives sequential samples from the GenericSkeleton.
  8. The proxy validates the samples and prints "Proxy received valid data".
  9. The proxy exits cleanly after receiving the configured samples.

This confirms the typed proxy can now correctly consume samples from GenericSkeleton-created event storage, which is the issue described in #311.
