RDKEMW-13253- Dobby l2 fix by Sonajeya31 · Pull Request #433 · rdkcentral/Dobby

Sonajeya31 · 2026-04-15T07:37:45Z

Fix L2 test failures on GitHub Actions CI (cgroupv2 compatibility and GCOV coverage issues)

This PR addresses multiple cascading failures in the L2 test suite when running on GitHub Actions runners which use cgroupv2 (unified hierarchy) and have coverage instrumentation enabled.
Root Causes Fixed:

cgroupv2 Compatibility - The codebase assumed cgroupv1 (separate controller mounts). GitHub Actions uses cgroupv2 where all controllers share a single mount point. Updated DobbyEnv.cpp, IonMemoryPlugin.cpp, GpuPlugin.cpp, and DobbyTemplate.cpp to detect and handle cgroupv2. Also added runtime patching in test_utils.py to remove unsupported settings (swappiness, kernel memory, RT scheduling) from bundle configs.

GCOV Coverage Write Failures - Instrumented binaries (DobbyPluginLauncher) couldn't write .gcda files to /home/runner/work, causing hooks to exit with code 1 and breaking all container operations. Fixed by setting GCOV_PREFIX=/tmp/gcov at workflow level and passing GCOV environment to DobbyDaemon via sudo -E.

Network Setup Issues - DNS resolution and NAT not working in containers because eth0 was hardcoded but doesn't exist on GitHub runners, and systemd stub resolver doesn't work well in containers. Fixed by detecting default-route interface dynamically and forcing reliable DNS servers (1.1.1.1, 8.8.8.8).

OCI Hook Config Path - createContainer hooks run in container namespace and couldn't access host paths for config.json. Fixed by mounting config.json to /tmp/dobby_config.json inside container and adding fallback path resolution in DobbyPluginLauncher.

Invalid Interface Index in Networking - Netlink code was attempting to set addresses on interface index 0 (invalid), causing network setup failures. Added validation to reject invalid interface indices before setting addresses.

Test Procedure
Push PR to trigger GitHub Actions L2 test workflow
Verify all test groups pass:
basic_sanity_tests
container_manipulations
bundle_generation
plugin_launcher
command_line_containers
annotation_tests
start_from_bundle
thunder_plugin
network_tests
pid_limit_tests
memcr_tests

Type of Change
[Bug fix (non-breaking change which fixes an issue)
Requires Bitbake Recipe changes?
No - This PR contains only code fixes and CI workflow changes. No new dependencies, build options, or installed files are added.

Rebase to develop

Copilot

Pull request overview

This PR targets GitHub Actions L2 test stability by improving bundle handling and platform-specific behavior (cgroupv2/CI quirks), and by adjusting templates/CI steps to reduce environment-dependent failures.

Changes:

Harden L2 test runner utilities (bundle extraction validation, DobbyDaemon readiness probing, container launch retries, improved async log reading).
Make multiple L2 tests more CI-tolerant (bundle config sanitization, cgroupv1/v2 pid-limit path resolution, more robust bundle comparisons).
Add CI workflow step to regenerate bundles for cgroupv2 compatibility and relax LCOV handling for unused coverage artifacts.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
tests/L2_testing/test_runner/thunder_plugin.py	Sanitizes bundle config and makes test set platform-conditional; improves handling of invalid bundles.
tests/L2_testing/test_runner/test_utils.py	Adds bundle extraction validation, nested bundle path detection, DobbyDaemon readiness probing, and more robust container launch behavior.
tests/L2_testing/test_runner/start_from_bundle.py	Uses validated bundle context and waits briefly for log flushing before asserting.
tests/L2_testing/test_runner/runner.py	Fixes test-results aggregation to only attach per-suite JSON when tests actually ran.
tests/L2_testing/test_runner/pid_limit_tests.py	Resolves `pids.max` across cgroup v1/v2 layouts, including `/proc/<pid>/cgroup`-derived paths.
tests/L2_testing/test_runner/network_tests.py	Switches to shared `launch_container` and adds invalid-bundle handling.
tests/L2_testing/test_runner/memcr_tests.py	Hardens PID parsing and optionally skips PID checkpoint validation when PIDs aren’t reported.
tests/L2_testing/test_runner/bundle_generation.py	Replaces filesystem diff with normalized `config.json` comparison and adds rootfs existence check.
tests/L2_testing/test_runner/bundle/regenerate_bundles.py	New CI helper to patch and repack test bundles for cgroupv2 compatibility.
tests/L2_testing/test_runner/basic_sanity_tests.py	Reworks async log reading with `select()` and makes daemon-stop verification less log-dependent.
tests/L2_testing/test_runner/annotation_tests.py	Starts container from spec path (vs extracted bundle) and stops container after annotation checks.
client/tool/source/Main.cpp	Removes mutex lock from stop callback before completing a promise.
bundle/lib/source/templates/OciConfigJsonVM1.0.2-dobby.template	Removes memory swappiness field from generated OCI config templates.
bundle/lib/source/templates/OciConfigJson1.0.2-dobby.template	Removes memory swappiness field from generated OCI config templates.
.github/workflows/L2-tests.yml	Adds bundle-regeneration step and relaxes LCOV errors for unused coverage.

Comments suppressed due to low confidence (1)

tests/L2_testing/test_runner/network_tests.py:121

If launch_container fails, the test still proceeds to sleep and read from netcat, and then overwrites the earlier launch-failure message with a generic "Received ... expected ..." message. This makes failures harder to diagnose and can produce misleading output. Consider short-circuiting when launch_result is false (or preserving the original error message/log) before attempting to validate the netcat payload.

        launch_result = test_utils.launch_container(container_name, bundle_path)

        message = ""
        result = True

        if not launch_result:
            message = "Container did not launch successfully"
            result = False

        # give container time to start and send message before checking netcat listener
        sleep(2)

        nc_message = nc.get_output().rstrip("\n")

        # check if netcat listener received message
        if test.expected_output.lower() not in nc_message.lower():
            message = "Received '%s' from container, expected '%s'" % (nc_message.lower(), test.expected_output.lower())
            result = False

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 10 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-16T06:06:03Z

+        iterations = int(exit_timeout / 0.5)
+        for i in range(0, iterations):
            sleep(0.5)


iterations = int(exit_timeout / 0.5) truncates and can become 0 for small exit_timeout values (or under-wait when exit_timeout isn’t a multiple of 0.5). That would immediately take the timeout path without any DobbyTool info checks. Consider using a ceiling calculation and/or enforcing a minimum of 1 iteration so the wait duration matches exit_timeout more closely.

Copilot · 2026-04-16T06:06:03Z

    if (state == IDobbyProxyEvents::ContainerState::Stopped && containerId == *id)
    {
        AI_LOG_INFO("Container %s has stopped", containerId.c_str());
-        std::lock_guard<std::mutex> locker(gLock);
        promise.set_value();
    }


Removing the mutex lock in the callback introduces a data race on the global promise: stopCommand reassigns promise under gLock, while this callback can call promise.set_value() concurrently from the proxy event thread. That’s undefined behavior and can crash intermittently. Restore synchronization (e.g., keep std::lock_guard<std::mutex> locker(gLock); here, or switch to a thread-safe pattern such as storing the promise in a shared_ptr captured by the callback).

Copilot · 2026-04-16T06:06:03Z

+Changes made for cgroupv2 compatibility:
+- Removes 'swappiness' from memory resources (not supported in cgroupv2)
+- Sets realtimeRuntime and realtimePeriod to valid values or removes them
+- Updates rootfsPropagation to 'slave' for better compatibility


The module docstring says the script “Updates rootfsPropagation to 'slave'”, but the implementation removes both linux.rootfsPropagation and top-level rootfsPropagation. This mismatch is confusing for anyone using the script to understand what’s being changed. Please update the docstring to reflect the actual behavior (or implement the documented change).

Suggested change

- Updates rootfsPropagation to 'slave' for better compatibility

- Removes both linux.rootfsPropagation and top-level rootfsPropagation

Copilot · 2026-04-16T06:06:04Z

-    with test_utils.dobby_daemon(), test_utils.untar_bundle(container_name) as bundle_path:
+    bundle_ctx = test_utils.untar_bundle(container_name)
+    with test_utils.dobby_daemon(), bundle_ctx as bundle_path:
+        if not bundle_ctx.valid:


bundle_ctx.valid is accessed here, but test_utils.untar_bundle currently doesn't define a valid attribute (it only untars in __init__ and returns the path in __enter__). This will raise AttributeError and fail the test before any bundle validation logic runs. Either add a valid field to untar_bundle (and set it based on tar exit status + existence of required files like config.json/rootfs) or remove these checks and handle extraction/validation failures another way.

Suggested change

if not bundle_ctx.valid:

bundle_valid = (

path.isdir(bundle_path) and

path.isfile(path.join(bundle_path, "config.json")) and

path.isdir(path.join(bundle_path, "rootfs"))

)

if not bundle_valid:

Copilot · 2026-04-16T06:06:04Z

+    bundle_ctx = test_utils.untar_bundle(container_id)
+    with bundle_ctx as bundle_path:
+        if not bundle_ctx.valid:
+            return False, "Bundle extraction or validation failed"


bundle_ctx.valid is referenced, but test_utils.untar_bundle doesn't currently provide a valid attribute. This will throw at runtime and mask the real extraction/validation failure. Consider extending untar_bundle to expose valid (and any failure reason) or remove this check and instead validate the extracted bundle path (e.g., check config.json exists) before launching.

Copilot · 2026-04-16T06:06:04Z

+    bundle_ctx = test_utils.untar_bundle(container_id)
+    with bundle_ctx as bundle_path:
+        if not bundle_ctx.valid:
+            return False, "Bundle extraction or validation failed"
+


bundle_ctx.valid is used here, but test_utils.untar_bundle has no valid attribute in its current implementation, so this will raise AttributeError. If you need to guard against missing/invalid bundles, implement that in untar_bundle (e.g., capture tar return code and verify required files) and expose it via valid/error, or do explicit file checks on bundle_path.

Copilot · 2026-04-16T06:06:04Z

+    bundle_ctx = test_utils.untar_bundle(container_name)
+    with bundle_ctx as bundle_path:
+        if not bundle_ctx.valid:
+            test = tests[0]
+            output = test_utils.create_simple_test_output(
+                test, False,
+                "Bundle extraction or validation failed",
+                "Bundle tarball could not be extracted or config.json was missing"
+            )


bundle_ctx.valid is accessed here, but test_utils.untar_bundle doesn't currently have a valid attribute. This will raise at runtime and cause this test group to error rather than report a clean bundle-extraction failure. If validation is desired, update untar_bundle to track success/failure (and maybe an error message) or validate bundle_path directly before proceeding.

Copilot · 2026-04-16T06:06:04Z

    if (state == wp->state && containerId == wp->containerId)
    {
        AI_LOG_INFO("Wait complete");
-        std::lock_guard<std::mutex> locker(gLock);
        promise.set_value();
    }


Same issue as the stop callback: without locking (or another synchronization mechanism), promise.set_value() can race with promise = std::promise<void>() in the command handler. Please reintroduce gLock protection here or use a callback-specific promise instance that can’t be concurrently reassigned.

Copilot · 2026-04-16T06:06:05Z

+
+    # Backup original
+    if backup:
+        backup_path = bundle_tarball.with_suffix('.tar.gz.bak')


backup_path = bundle_tarball.with_suffix('.tar.gz.bak') will produce an odd filename for inputs like foo.tar.gz (e.g., foo.tar.tar.gz.bak) because Path.with_suffix replaces only the last suffix. If you want foo.tar.gz.bak, build the backup name explicitly (e.g., Path(str(bundle_tarball) + '.bak') or bundle_tarball.with_name(bundle_tarball.name + '.bak')).

Suggested change

backup_path = bundle_tarball.with_suffix('.tar.gz.bak')

backup_path = bundle_tarball.with_name(bundle_tarball.name + '.bak')

Copilot · 2026-04-16T06:06:05Z

+    bundle_ctx = test_utils.untar_bundle(container_name)
+    with test_utils.dobby_daemon(), netcat_listener() as nc, bundle_ctx as bundle_path:
+        if not bundle_ctx.valid:
+            output = test_utils.create_simple_test_output(tests[0], False, "Bundle extraction or validation failed",
+                                                          log_content="Bundle extraction or validation failed; container was never launched.")
+            output_table.append(output)


bundle_ctx.valid is checked here, but test_utils.untar_bundle doesn't define a valid attribute. This will fail with AttributeError even when bundle extraction succeeds. Implement valid on the context manager (and set it based on extraction + required file existence) or replace this with explicit validation of bundle_path (e.g., check config.json exists) before starting the container.

Copilot

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-16T08:39:15Z

+        for member in tar.getmembers():
+            member_path = (bundle_dir / member.name).resolve()
+            if not str(member_path).startswith(str(bundle_dir.resolve())):
+                raise RuntimeError(
+                    f"Tarball member '{member.name}' would escape extraction "
+                    f"directory '{bundle_dir}' — aborting for safety"
+                )
+        tar.extractall(path=bundle_dir)


The tar path-traversal check is incorrect: using str(member_path).startswith(str(bundle_dir.resolve())) can be bypassed by paths like <bundle_dir>_evil/... (prefix match) and would allow extraction outside bundle_dir. Use a real path containment check (e.g., member_path.relative_to(bundle_dir.resolve()) in a try/except, or os.path.commonpath), or extract members individually after validating each target path.

Suggested change

for member in tar.getmembers():

member_path = (bundle_dir / member.name).resolve()

if not str(member_path).startswith(str(bundle_dir.resolve())):

raise RuntimeError(

f"Tarball member '{member.name}' would escape extraction "

f"directory '{bundle_dir}' — aborting for safety"

)

tar.extractall(path=bundle_dir)

extraction_root = bundle_dir.resolve()

for member in tar.getmembers():

member_path = (bundle_dir / member.name).resolve()

try:

member_path.relative_to(extraction_root)

except ValueError:

raise RuntimeError(

f"Tarball member '{member.name}' would escape extraction "

f"directory '{bundle_dir}' — aborting for safety"

)

tar.extract(member, path=bundle_dir)

Copilot · 2026-04-16T08:39:15Z

+"""
+Script to regenerate L2 test bundles for cgroupv2 compatibility.
+
+This script:
+1. Extracts each .tar.gz bundle
+2. Patches config.json to remove cgroupv2-incompatible settings
+3. Repacks the bundle
+
+Changes made for cgroupv2 compatibility:
+- Removes 'swappiness' from memory resources (not supported in cgroupv2)
+- Sets realtimeRuntime and realtimePeriod to valid values or removes them
+- Updates rootfsPropagation to 'slave' for better compatibility
+"""
+
+import json
+import shutil
+import sys
+import tarfile
+from pathlib import Path
+
+
+def patch_config_for_cgroupv2(config: dict, bundle_name: str = "") -> dict:
+    """Patch OCI config.json for cgroupv2 compatibility."""
+
+    # Remove swappiness from memory resources (not supported in cgroupv2)
+    if 'linux' in config and 'resources' in config['linux']:
+        resources = config['linux']['resources']
+
+        if 'memory' in resources:
+            memory = resources['memory']
+            if 'swappiness' in memory:
+                del memory['swappiness']
+                print("  - Removed 'swappiness' from memory resources")
+
+        # Fix cpu realtime settings - remove null values
+        if 'cpu' in resources:
+            cpu = resources['cpu']
+            if cpu.get('realtimeRuntime') is None:
+                del cpu['realtimeRuntime']
+                print("  - Removed null 'realtimeRuntime'")
+            if cpu.get('realtimePeriod') is None:
+                del cpu['realtimePeriod']
+                print("  - Removed null 'realtimePeriod'")
+            # Remove cpu section entirely if empty
+            if not cpu:
+                del resources['cpu']
+                print("  - Removed empty 'cpu' section")
+
+    # Remove rootfsPropagation entirely - it causes "make rootfs private" errors
+    # in user namespace environments like GitHub Actions
+    if 'linux' in config and 'rootfsPropagation' in config['linux']:
+        del config['linux']['rootfsPropagation']
+        print("  - Removed linux.rootfsPropagation")
+
+    # Remove top-level rootfsPropagation as well
+    if 'rootfsPropagation' in config:
+        del config['rootfsPropagation']
+        print("  - Removed top-level rootfsPropagation")


The module docstring says the script "Updates rootfsPropagation to 'slave'", but the implementation actually removes both linux.rootfsPropagation and top-level rootfsPropagation. Please update the docstring (or the logic) so it accurately reflects what the script does.

Sonajeya31 and others added 3 commits April 15, 2026 12:33

Merge pull request #3 from rdkcentral/develop

9117a1f

Rebase to develop

L2 Dobby Fix

2e7e32d

Remove cov lock

298f0e7

Copilot AI review requested due to automatic review settings April 15, 2026 07:37

Copilot started reviewing on behalf of Sonajeya31 April 15, 2026 07:38 View session

Copilot AI reviewed Apr 15, 2026

View reviewed changes

Comment thread tests/L2_testing/test_runner/thunder_plugin.py

Comment thread tests/L2_testing/test_runner/test_utils.py

Comment thread tests/L2_testing/test_runner/bundle/regenerate_bundles.py

Copilot AI review requested due to automatic review settings April 16, 2026 06:01

Copilot started reviewing on behalf of Sonajeya31 April 16, 2026 06:02 View session

Copilot AI reviewed Apr 16, 2026

View reviewed changes

Copilot AI review requested due to automatic review settings April 16, 2026 08:32

Sonajeya31 force-pushed the topic/RDKEMW-13253-5 branch from f1bf099 to 298f0e7 Compare April 16, 2026 08:32

Copilot started reviewing on behalf of Sonajeya31 April 16, 2026 08:33 View session

Copilot AI reviewed Apr 16, 2026

View reviewed changes

Swap fix

e7be695

	- Updates rootfsPropagation to 'slave' for better compatibility
	- Removes both linux.rootfsPropagation and top-level rootfsPropagation

-        if not bundle_ctx.valid:
+        bundle_valid = (
+            path.isdir(bundle_path) and
+            path.isfile(path.join(bundle_path, "config.json")) and
+            path.isdir(path.join(bundle_path, "rootfs"))
+        )
+        if not bundle_valid:

	backup_path = bundle_tarball.with_suffix('.tar.gz.bak')
	backup_path = bundle_tarball.with_name(bundle_tarball.name + '.bak')

Conversation

Sonajeya31 commented Apr 15, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants