Skip to content

RDKEMW-13253: L2 Dobby Fix#432

Open
Sonajeya31 wants to merge 1 commit intordkcentral:developfrom
Sonajeya31:topic/RDKEMW-13253-4
Open

RDKEMW-13253: L2 Dobby Fix#432
Sonajeya31 wants to merge 1 commit intordkcentral:developfrom
Sonajeya31:topic/RDKEMW-13253-4

Conversation

@Sonajeya31
Copy link
Copy Markdown

Fix L2 test failures on GitHub Actions CI (cgroupv2 compatibility and GCOV coverage issues)

This PR addresses multiple cascading failures in the L2 test suite when running on GitHub Actions runners which use cgroupv2 (unified hierarchy) and have coverage instrumentation enabled.
Root Causes Fixed:

cgroupv2 Compatibility - The codebase assumed cgroupv1 (separate controller mounts). GitHub Actions uses cgroupv2 where all controllers share a single mount point. Updated DobbyEnv.cpp, IonMemoryPlugin.cpp, GpuPlugin.cpp, and DobbyTemplate.cpp to detect and handle cgroupv2. Also added runtime patching in test_utils.py to remove unsupported settings (swappiness, kernel memory, RT scheduling) from bundle configs.

GCOV Coverage Write Failures - Instrumented binaries (DobbyPluginLauncher) couldn't write .gcda files to /home/runner/work, causing hooks to exit with code 1 and breaking all container operations. Fixed by setting GCOV_PREFIX=/tmp/gcov at workflow level and passing GCOV environment to DobbyDaemon via sudo -E.

Network Setup Issues - DNS resolution and NAT not working in containers because eth0 was hardcoded but doesn't exist on GitHub runners, and systemd stub resolver doesn't work well in containers. Fixed by detecting default-route interface dynamically and forcing reliable DNS servers (1.1.1.1, 8.8.8.8).

OCI Hook Config Path - createContainer hooks run in container namespace and couldn't access host paths for config.json. Fixed by mounting config.json to /tmp/dobby_config.json inside container and adding fallback path resolution in DobbyPluginLauncher.

Invalid Interface Index in Networking - Netlink code was attempting to set addresses on interface index 0 (invalid), causing network setup failures. Added validation to reject invalid interface indices before setting addresses.

Test Procedure

Push PR to trigger GitHub Actions L2 test workflow
Verify all test groups pass:
basic_sanity_tests
container_manipulations
bundle_generation
plugin_launcher
command_line_containers
annotation_tests
start_from_bundle
thunder_plugin
network_tests
pid_limit_tests
memcr_tests

Type of Change

  • [Bug fix (non-breaking change which fixes an issue)

Requires Bitbake Recipe changes?

  • No - This PR contains only code fixes and CI workflow changes. No new dependencies, build options, or installed files are added.

Copilot AI review requested due to automatic review settings April 15, 2026 06:30
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes GitHub Actions L2 test failures by improving cgroupv2/CI compatibility in the L2 test runner, normalizing bundle comparisons, and updating OCI templates/workflow to avoid unsupported settings and flaky startup races.

Changes:

  • Hardened L2 test runner bundle handling (validation, cleanup) and reduced CI flakiness (daemon readiness wait, retry logic).
  • Updated several L2 tests to better tolerate CI/kernel differences (pause/resume gating, pid/memcr robustness, network launch path).
  • Added CI-time regeneration of bundled test tarballs for cgroupv2 compatibility; removed swappiness from OCI templates; improved coverage generation robustness.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/L2_testing/test_runner/thunder_plugin.py Sanitizes thunder bundle config and gates pause/resume tests on platform; handles invalid bundle extraction.
tests/L2_testing/test_runner/test_utils.py Adds bundle extraction validation/nested config detection; adds daemon readiness probing; improves shutdown and container launch robustness.
tests/L2_testing/test_runner/start_from_bundle.py Uses validated bundle context + launch_container, adds small delay to allow log flush.
tests/L2_testing/test_runner/runner.py Only reads per-group results when tests actually ran (total > 0), fixes suite indexing.
tests/L2_testing/test_runner/pid_limit_tests.py Makes pid limit validation work across cgroup v1/v2 layouts (including /proc-based resolution).
tests/L2_testing/test_runner/network_tests.py Uses shared launch_container and fails gracefully when bundle extraction fails.
tests/L2_testing/test_runner/memcr_tests.py Makes pid handling more robust; adds optional waiting for pids and safer dump-dir handling.
tests/L2_testing/test_runner/bundle_generation.py Normalizes config.json differences before comparison; validates generated rootfs existence; handles invalid bundle extraction.
tests/L2_testing/test_runner/bundle/regenerate_bundles.py New script to patch/repacks L2 bundle tarballs for cgroupv2/CI compatibility.
tests/L2_testing/test_runner/basic_sanity_tests.py Replaces multiprocessing-based log wait with select()/timeout; changes stop verification to process absence; stronger kill.
tests/L2_testing/test_runner/annotation_tests.py Launches container from spec path and ensures container is stopped after the test.
bundle/lib/source/templates/OciConfigJsonVM1.0.2-dobby.template Removes swappiness from generated OCI config template.
bundle/lib/source/templates/OciConfigJson1.0.2-dobby.template Removes swappiness from generated OCI config template.
.github/workflows/L2-tests.yml Runs bundle regeneration step in CI; makes lcov more tolerant (--ignore-errors unused).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +86 to +89
"""Returns the bundle path when valid, or None when extraction/validation
failed. Callers must check .valid (or the returned path) before use."""
if not self.valid:
return None
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

untar_bundle.enter now returns None when extraction/validation fails. Some callers in this repo still use the older pattern with untar_bundle(...) as bundle_path: without checking .valid (e.g. tests/L2_testing/test_runner/plugin_launcher.py:36 and memcr_tests.py:182/237), which will crash with a TypeError when building command args (None in argv) instead of reporting a clean test failure. Consider either updating the remaining call sites to the new bundle_ctx = ...; with bundle_ctx as bundle_path: if not bundle_ctx.valid: ... pattern, or changing enter to raise a clear exception / return a consistently typed path value to preserve backward compatibility.

Suggested change
"""Returns the bundle path when valid, or None when extraction/validation
failed. Callers must check .valid (or the returned path) before use."""
if not self.valid:
return None
"""Returns the bundle path when valid; raises when extraction/validation failed."""
if not self.valid:
raise RuntimeError("Bundle extraction/validation failed for container '%s' at '%s'"
% (self.container_id, self.extract_root))

Copilot uses AI. Check for mistakes.

# Backup original
if backup:
backup_path = bundle_tarball.with_suffix('.tar.gz.bak')
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

backup_path is built with bundle_tarball.with_suffix('.tar.gz.bak'). For inputs like *_bundle.tar.gz, Path.with_suffix only replaces the final suffix (.gz), producing names like sleepy_bundle.tar.tar.gz.bak. If the intent is to create sleepy_bundle.tar.gz.bak, build the backup path by appending .bak to the full filename (e.g. bundle_tarball.with_name(bundle_tarball.name + '.bak')).

Suggested change
backup_path = bundle_tarball.with_suffix('.tar.gz.bak')
backup_path = bundle_tarball.with_name(bundle_tarball.name + '.bak')

Copilot uses AI. Check for mistakes.
Comment on lines +12 to +13
- Sets realtimeRuntime and realtimePeriod to valid values or removes them
- Updates rootfsPropagation to 'slave' for better compatibility
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The module docstring claims the script "Updates rootfsPropagation to 'slave'" and "Sets realtimeRuntime and realtimePeriod to valid values", but the implementation currently removes rootfsPropagation and only removes realtime fields when they are null. Please update the docstring to match actual behavior (or adjust the patch logic to match the documented intent).

Suggested change
- Sets realtimeRuntime and realtimePeriod to valid values or removes them
- Updates rootfsPropagation to 'slave' for better compatibility
- Removes null 'realtimeRuntime' and 'realtimePeriod' values from CPU resources
- Removes 'rootfsPropagation' settings for better compatibility

Copilot uses AI. Check for mistakes.
Comment on lines +111 to +123
# Extract (with path-traversal protection)
print(f" Extracting...")
with tarfile.open(bundle_tarball, 'r:gz') as tar:
# Reject members that escape the target directory via absolute paths
# or '..' components to prevent path-traversal attacks.
for member in tar.getmembers():
member_path = (bundle_dir / member.name).resolve()
if not str(member_path).startswith(str(bundle_dir.resolve())):
raise RuntimeError(
f"Tarball member '{member.name}' would escape extraction "
f"directory '{bundle_dir}' — aborting for safety"
)
tar.extractall(path=bundle_dir)
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tar extraction hardening only checks the resolved member path, but still calls tar.extractall() unfiltered. This doesn't protect against symlink/hardlink tricks (e.g., a symlink member inside the bundle pointing outside the extraction dir, followed by a regular file written through it). Since this runs in CI on PR content, consider rejecting symlink/hardlink members (member.issym()/islnk()) and/or using tar.extractall(..., filter='data') on Python 3.12+ to avoid writing links and special files.

Suggested change
# Extract (with path-traversal protection)
print(f" Extracting...")
with tarfile.open(bundle_tarball, 'r:gz') as tar:
# Reject members that escape the target directory via absolute paths
# or '..' components to prevent path-traversal attacks.
for member in tar.getmembers():
member_path = (bundle_dir / member.name).resolve()
if not str(member_path).startswith(str(bundle_dir.resolve())):
raise RuntimeError(
f"Tarball member '{member.name}' would escape extraction "
f"directory '{bundle_dir}' — aborting for safety"
)
tar.extractall(path=bundle_dir)
# Extract (with path-traversal and link/special-file protection)
print(f" Extracting...")
with tarfile.open(bundle_tarball, 'r:gz') as tar:
# Only allow regular files and directories that stay within the
# target directory. Reject links and special files to prevent
# symlink/hardlink escapes and unsafe filesystem side effects.
safe_members = []
bundle_dir_resolved = bundle_dir.resolve()
for member in tar.getmembers():
if member.issym() or member.islnk():
raise RuntimeError(
f"Tarball member '{member.name}' is a symlink/hardlink "
f"and is not allowed — aborting for safety"
)
if not (member.isdir() or member.isfile()):
raise RuntimeError(
f"Tarball member '{member.name}' is not a regular file "
f"or directory and is not allowed — aborting for safety"
)
member_path = (bundle_dir / member.name).resolve()
if not str(member_path).startswith(str(bundle_dir_resolved)):
raise RuntimeError(
f"Tarball member '{member.name}' would escape extraction "
f"directory '{bundle_dir}' — aborting for safety"
)
safe_members.append(member)
if sys.version_info >= (3, 12):
tar.extractall(path=bundle_dir, members=safe_members, filter='data')
else:
tar.extractall(path=bundle_dir, members=safe_members)

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants