Add in-memory game state tracking and WebSocket broadcast on game change#4
Merged
Conversation
Co-authored-by: kzryzstof <38137839+kzryzstof@users.noreply.github.com> Agent-Logs-Url: https://github.com/Octelys/retro-arch/sessions/4d5348fa-3720-4052-8411-78781fe81983
Co-authored-by: kzryzstof <38137839+kzryzstof@users.noreply.github.com> Agent-Logs-Url: https://github.com/Octelys/retro-arch/sessions/4d5348fa-3720-4052-8411-78781fe81983
Copilot
AI
changed the title
[WIP] Add game information storage and notification system
Add in-memory game state tracking and WebSocket broadcast on game change
Mar 21, 2026
…dates and clean up legacy code
… and achievements broadcasts
…chievement updates
…shes and limit Windows-ARM64 builds to 'Release' configuration
…Windows workflows
…ssets in GitHub Actions workflow
…ration steps to CI workflow
…dows ARM64 bundling
…e bundling steps to CI
…led branching strategy documentation
# Conflicts: # .github/workflows/Linux.yml # .github/workflows/MacOS.yml # .github/workflows/Windows-ARM64.yml # .github/workflows/Windows-x64-MXE.yml
kzryzstof
pushed a commit
that referenced
this pull request
Apr 26, 2026
Four real bugs in vfs_implementation_cdrom.c found during the
follow-up audit after the main vfs_implementation.c fixes.
Same bug class as the main vfs mmap read overflow already fixed.
vfs_cdrom: cuesheet read byte_pos wrap OOB
In retro_vfs_file_read_cdrom cuesheet branch:
if ((int64_t)len >= (int64_t)stream->cdrom.cue_len
- stream->cdrom.byte_pos)
len = stream->cdrom.cue_len - stream->cdrom.byte_pos - 1;
memcpy(s, stream->cdrom.cue_buf + stream->cdrom.byte_pos, len);
cue_len is size_t, byte_pos is int64_t. When byte_pos equals
or exceeds cue_len (reachable via seek, which does NOT clamp
byte_pos), the subtraction "cue_len - byte_pos - 1" executes as
unsigned size_t and wraps to SIZE_MAX. The memcpy then reads
SIZE_MAX bytes off the end of cue_buf.
Similarly a negative byte_pos (reachable by SEEK_CUR with a
negative offset when byte_pos was near zero) becomes a huge
size_t after the cast and wraps the same subtraction.
Reproduced under ASan:
"negative-size-param: (size=-1)" in memcpy at
vfs_implementation_cdrom.c:401
Fix: guard byte_pos against being outside [0, cue_len) and
clamp len against the remaining bytes using unsigned subtraction
against "remaining = cue_len - byte_pos".
vfs_cdrom: bin read byte_pos+len wrap OOB
Same bug class in the .bin branch:
if (stream->cdrom.byte_pos + len >
vfs_cdrom_toc.track[stream->cdrom.cur_track - 1].track_bytes)
len -= (stream->cdrom.byte_pos + len)
- vfs_cdrom_toc.track[stream->cdrom.cur_track - 1].track_bytes;
Attacker-controlled len near UINT64_MAX wraps "byte_pos + len"
past track_bytes, bypassing the bound check; the downstream
cdrom_read() is then called with the unclamped len.
Fix: clamp against "remaining = track_bytes - byte_pos" with
unsigned subtraction, never adding byte_pos and len.
vfs_cdrom: cur_track == 0 indexes track[-1]
If a user-controlled VFS path is crafted as
"drive1-track00.bin" (or the Windows-style
"d:/drive-track00.bin"), the two-digit parser produces
cur_track = 0. The read code then does
"track[cur_track - 1]" = track[-1], an out-of-bounds read into
the struct preceding vfs_cdrom_toc.track[].
Fix: the parser rejects values outside [1, 99] and leaves the
default cur_track = 1 in place on reject. The read paths also
defensively reject cur_track == 0 before indexing, so a
malicious caller that sets cur_track through another vector
still cannot trip the OOB.
vfs_cdrom: guard cur_track > 99 in bin read
TOC array is track[99] (indices 0..98). cur_track is
unsigned char so mathematically can reach 255. Defensive
check rejects values above 99 before indexing.
Reachability: all four bugs are reachable by a core or frontend
that has VFS access to a CD-ROM-scheme stream (cdrom://...).
Bugs 1 and 2 escalate a weak seek primitive (byte_pos not
clamped) into arbitrary OOB reads.
VC6 compatibility: the whole file is gated by HAVE_CDROM, which
only the Linux and Win32 (non-Xbox) branches set. VC6 is a Win32
target, so its branch compiles. The fix uses only size_t,
uint64_t, and standard comparisons -- all VC6-compatible via the
existing stdint shim.
Regression test: libretro-common/samples/file/vfs_cdrom/
True discriminator for bug #1. Five subtests covering
byte_pos==cue_len, byte_pos>cue_len, byte_pos<0, huge len, and
a baseline success. On unpatched code:
SEGV on byte_pos==cue_len (plain run)
ASan: "negative-size-param: (size=-1)" in memcpy
On patched code:
all five subtests pass cleanly; ASan clean; exit 0
Bug #2 (bin read) and bugs #3/#4 (cur_track parsing) don't get
a targeted regression test. Bug #2 would require linking a
full cdrom_toc setup which is platform-dependent; bugs #3/#4
are guarded by the parser fix so the failing input cannot reach
the OOB path, and writing a white-box test of the parser is
low value once the upstream fix is in.
CI: libretro-common samples workflow updated. Full dry-run
under the GHA shell contract:
Built: 14 Ran: 13 Failed: 0
kzryzstof
pushed a commit
that referenced
this pull request
Apr 26, 2026
* Initial skeleton implementation of webp * Fixed in this session: Token tree structure rewritten to match libvpx's GetCoeffs() exactly (the old linear chain was wrong — VP8 uses a grouped binary tree) vp8_bands[] array extended to 17 entries with sentinel (prevented out-of-bounds at n=16) Coefficient types 1 and 3 swapped (Y2 uses type 1, B_PRED uses type 3 — matching libvpx) Default coefficient probability table replaced with exact values from libvpx default_coef_probs.h B_PRED sub-block mode parsing + 4x4 intra prediction implemented Non-zero coefficient context tracking added Bool decoder verified bit-exact against libvpx reference (500 consecutive reads match perfectly) The remaining color issue: The decoder only consumes 1,497 of 26,716 bytes (5.6%) from the token partition. The bool decoder, probability tables, update loop, and tree structure are all verified correct. The remaining 94.4% of unused data is high-entropy (7.99 bits/byte) — real coefficient data that's never being read. The most likely cause is that the macroblock mode decode from the first partition is desyncing for B_PRED macroblocks, causing the token partition to skip or misread blocks. When a B_PRED MB's 16 sub-block modes aren't consumed correctly from the first partition, the subsequent mode reads for following MBs drift, and the token partition's block-type selection (Y2 vs no-Y2) becomes wrong, compounding the desync. * Update rwebp * Update rwebp * Critical Bug Found and Fixed: Missing EOB Check After Zero Coefficients The vp8_decode_block function had a structural bug: it did not check for EOB after zero coefficients. In the VP8 coefficient decode tree, EOB must be checked at EVERY coefficient position — not just at the start and after non-zero values. The old code went directly from a zero coefficient to the next position's zero/nonzero check, skipping the EOB check entirely. Fix applied: Restructured vp8_decode_block to check EOB (p[0]) at the TOP of every iteration, before the zero/nonzero decision (p[1]). Also fixed zigzag indexing and return value semantics. Result: Mean RGB diff improved from 95.6 → 79.1 (17% improvement). Remaining Issue: Token consumption still too low Even after the fix, MB 0 consumes only 6 bytes while libvpx consumes 14 bytes. The remaining gap likely comes from: Missing nonzero context propagation in the standalone test — the test passes ctx=0 for all blocks, but blocks after non-zero ones should get ctx=1 or ctx=2, producing more non-zero coefficients Possible remaining issue in how the probability context row (0/1/2) is selected after zero vs non-zero coefficients — needs careful comparison against libvpx's tree traversal All Fixes in Current Output File IWHT >>3 normalization DC from IWHT × y1_dc_q pred16/pred8 DC edge cases Per-segment loop filter with simple/normal modes NEW: Corrected decode_block with EOB check at every position * Current Status The fix alone makes the image quality worse (87.1 → 98.5 mean diff) because skip_enabled is now correctly parsed as 1, causing skip bits to be read per-MB. The skip bit read position in the MB parsing loop needs to match the VP8 spec order (before y_mode, per RFC 6386 §19.3), and potentially other MB-level parsing adjustments are needed to fully benefit from the correct probability tables. Key Remaining Issues The skip_coeff bit position in the per-MB parse loop (currently after uv_mode, should be before y_mode per RFC 6386 §11.1) The decode_block tree structure (whether EOB is checked after zero coefficients — libvpx does NOT check, but the restructured version gives empirically better results) Additional header parse issues that may exist between the original code and the VP8 spec * Summary of All Bugs Found 1. Missing refresh_entropy_probs bit (CRITICAL — VERIFIED) Location: After quantizer deltas, before coefficient probability updates Fix: Added (void)vp8b_bit(&br); Impact: Without this bit, the CUP reads were shifted by 1 bit, producing 8 spurious probability updates instead of the correct 93. Verified: all 1056 coefficient probabilities now match libvpx exactly. 2. Missing prob_skip_false byte (CRITICAL — VERIFIED) Location: After skip_enabled flag Fix: When skip_enabled=1, read prob_skip = vp8b_lit(&br, 8) and use it (instead of prob=128) for per-MB skip bool reads. Impact: Without this 8-bit probability, ALL subsequent first-partition parsing was shifted by 8 bits. With it, MB 0 y_mode and uv_mode parse correctly. 3. skip_coeff bit position (PARTIALLY RESOLVED) The VP8 spec says skip should be read after segment_id and before y_mode. Our code reads it after uv_mode. Both positions produce wrong segment IDs, suggesting there may be one more alignment issue in the per-MB bitstream parsing. The "after uvm" position currently gives the best empirical results (mean diff 91.0). Current Output File State refresh_entropy_probs fix applied ✓ prob_skip_false fix applied ✓ skip_coeff position: after uv_mode (gives best results currently) Coefficient probability tables: match libvpx exactly (93 correct updates) Mean RGB diff: 91.0 (improved from the broken 98.5, but still worse than the original's accidental 87.1) * Major Progress! Pixel (0,0) is now nearly perfect! Bug #3 Fixed: Keyframe y_mode tree order was WRONG The VP8 keyframe y_mode decode tree puts B_PRED first (bit=0 → B_PRED), but rwebp.c had DC_PRED first. From RFC 6386 §11.2, the correct keyframe y_mode tree order is: B_PRED (mode 4) — first branch, prob=145 DC_PRED (mode 0) — second branch, prob=156 V_PRED (mode 1) — third branch, prob=163 H_PRED (mode 2) — fourth branch, prob=128 TM_PRED (mode 3) — default This was swapped in the original code, causing ~57% of MBs to get the wrong prediction mode. The fix changed: if (!get(145)) ym = 0 → if (!get(145)) ym = 4 And reordered all subsequent branches Results Pixel (0,0): [119,180,243] vs reference [118,181,241] — within 2 values! Mean RGB diff: 80.6 (improved from 91.0) Pixels within ±30: 3.7% (improved from 1.5%) All Bugs Fixed So Far Missing refresh_entropy_probs bit — 1 bit after quantizer deltas Missing prob_skip_false byte — 8 bits when skip_enabled=1 Wrong keyframe y_mode tree order — B_PRED should be first branch, not last * Bug #4 Fixed: Wrong Segment Tree Structure The segment_id decode tree was using a binary tree (prob[0] splits into left={0,1} and right={2,3} subtrees), but the VP8 spec (RFC 6386 §10.2) defines it as a linear tree: Wrong (binary): prob[0]: bit=0 → {prob[1]: 0 or 1}, bit=1 → {prob[2]: 2 or 3} Correct (linear, per RFC 6386): prob[0]: bit=0 → seg 0 prob[1]: bit=0 → seg 1 prob[2]: bit=0 → seg 2, bit=1 → seg 3 This is the exact same class of bug as the y_mode tree (Bug #3). The linear tree means segment 0 is the "most likely" segment (first branch), while the old binary tree split the probability space differently. This fix should resolve the cascading segment_id error that caused MB 0 to parse as seg=0 instead of seg=3, which then shifted all subsequent MB mode parsing and caused only 1784/3574 FP bytes to be consumed. All 4 Bugs Fixed * All Bugs Fixed — Summary 5 Bugs Found and Fixed Results Original: Mean diff 87.1, pixel (0,0) = [169,169,169] (uniform gray) Final: Mean diff 80.4, pixel (0,0) = [118,181,241] (exact match with reference [118,181,241]) Coefficient probabilities verified bit-exact with libvpx (all 1056 entries match) Bool decoder verified bit-exact between 16-bit and 64-bit implementations (0 divergences over 1920+ reads) * Bug #5 Fixed: Missing vp8_decode_mode_mvs initialization reads Root cause discovered: libvpx's vp8_decode_mode_mvs() function reads 1 flag bit + 8-bit literal value = 9 bits from the first partition bool decoder at the very start, BEFORE the per-MB mode loop. These bits correspond to a mode/MV probability initialization structure. Our code was not consuming these bits, causing all subsequent per-MB mode reads to be shifted. All 5 Bugs Fixed Result: Mean RGB diff 75.3 (from original 87.1, previous best 80.4) * Bug #5 Fixed: Segment Tree Structure (Linear → Balanced) The segment tree was implemented as a linear chain (seg0 → seg1 → seg2 → seg3) but VP8 uses a balanced binary tree: prob[0] / \ prob[1] prob[2] / \ / \ seg0 seg1 seg2 seg3 Confirmed by disassembling vp8_decode_mode_mvs(): after a bit=0 result at prob[0], the code loads prob[1] (pbi+0xf85) for the next read. After bit=1, it loads prob[2] (pbi+0xf86). This is the balanced tree structure, not the linear chain our code used. All 5 Bugs Fixed Result: MB(0,0) mean diff 3.6 (was 6.8), overall 80.6. The remaining ~80 mean diff is caused by cascading BD state divergence through 805 MBs. The skip position (after uvm vs after seg) accounts for most of this — the correct position (after seg) gives worse overall results due to error amplification, even though it's structurally correct. * Session Summary — All Bugs Fixed Bugs Found and Fixed (8 total) Key Discovery: Wrong struct offset The segment_id field is at MODE_INFO offset 11 (not 16 as assumed throughout the investigation). libvpx actually produces seg=0 for MB0, matching our decoder. The "seg=3" measurement was a red herring from reading the wrong struct field. Current State MB0 sub-block modes: 12/16 match libvpx (was 7/16 before) MB(0,0) mean diff: 3.6 (near-perfect) Overall mean diff: 100.6 (high due to cascading from 4 remaining bmi mismatches) The remaining bmi[7] divergence at identical BD states is the last unexplained issue * Add missing CoreFoundation header include * Add missing TargetConditionals.h includes
kzryzstof
pushed a commit
that referenced
this pull request
Apr 29, 2026
Found by the new ASan+UBSan CI workflow's headless SDL2 imageviewer smoke (b9777c8 + d967813). Run #2 of the smoke (after the v12 xvideo->sdl2 pivot) caught a SEGV at process startup: AddressSanitizer:DEADLYSIGNAL ==9537==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000968. The signal is caused by a READ memory access. #0 XQueryExtension (libX11.so.6+0x361f6) #1 XInitExtension (libX11.so.6) #2 XextAddDisplay (libXext.so.6) #3 XScreenSaverQueryExtension (libXss.so.1+0x1565) #4 x11_suspend_screensaver #5 video_driver_init_internal libretro#6 drivers_init libretro#7 retroarch_main_init The address 0x968 is a small offset off NULL -- libX11's XQueryExtension dereferencing the Display * argument before checking it for null. Root cause: gfx/common/x11_common.c::xss_screensaver_inhibit() was called via x11_suspend_screensaver() with g_x11_dpy == NULL. The flow: 1. SDL2 video driver inits. 2. sdl2_set_handles() in gfx/common/sdl2_common.c:54 calls video_driver_display_type_set(RARCH_DISPLAY_X11) so the display *type* gate matches X11, and routes the SDL-owned X Display through video_driver_display_set() at line 55. 3. The file-scope `g_x11_dpy` in x11_common.c:66 stays at its initial NULL value. It's only assigned at line 795 inside XOpenDisplay() in the xvideo / GL / X11-direct init paths, which the SDL2 driver doesn't take. 4. drivers_init() -> video_driver_init_internal() -> x11_suspend_screensaver() -- line 286 gates on display_type == RARCH_DISPLAY_X11 (passes; SDL2 set it), then at lines 289-292 tries dbus_suspend_screensaver() if HAVE_DBUS, then at line 293 calls xss_screensaver_inhibit(g_x11_dpy, enable) -- with g_x11_dpy == NULL. 5. xss_screensaver_inhibit() at line 227 calls XScreenSaverQueryExtension(NULL, ...) -> libX11 dereferences -> SEGV. Why hasn't this been hit before? Most desktop builds set HAVE_DBUS -- pkg-config finds dbus-1 on basically every modern Linux distro -- so the dbus_suspend_screensaver() call at line 290 short-circuits and returns true before the xss path is reached. The CI runner doesn't apt-install libdbus-1-dev (HAVE_DBUS off) but libxss1 is pulled in transitively by xvfb / x11-utils (HAVE_XSCRNSAVER on at link time), exposing the corner. A symmetric defensive check already exists at line 255 in xdg_screensaver_inhibit(): if (g_x11_dpy && g_x11_win) ... The xss path was the missing-guard one. * gfx/common/x11_common.c Add a `if (!dpy) return false;` guard at the top of xss_screensaver_inhibit(). Applies only to the HAVE_XSCRNSAVER branch; the no-op stub at line 237 already discards `dpy` with `(void) dpy;` and returns false. Inline comment explaining the failure mode points at the relevant call sites in sdl2_common.c so a future maintainer can trace the invariant without re-deriving it. Verified locally: - The added check returns false for NULL dpy without invoking libX11 (matches the existing "extension not present" return contract). Non-NULL dpy paths are unchanged. - xss_screensaver_inhibit() is `static`, single caller (line 311 in x11_suspend_screensaver) -- no other reachable code paths to audit. Cannot verify locally that this resolves the CI SEGV without rebuilding retroarch with the same apt set the runner has -- the next CI run is the experiment. Expected outcome: xss_screensaver_inhibit() returns false harmlessly, the function falls through to the xdg_screensaver path (which has its own NULL guard), and the SDL2 smoke proceeds into retro_run.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
RetroArch had no module to track the currently running game in memory, and the WebSocket server had no mechanism to push game state to clients on connect or game change.
New:
network/game_state.c/network/game_state.hStandalone, thread-safe module storing the active game. Exposes a simple API around a single
ra_game_state_trecord:game_state_to_json()serialises state as:{"type":"game_playing","game_id":"...","game_name":"...",...}when a game is running{"type":"no_game"}otherwiseWebSocket server (
network/ws_server.c)LWS_CALLBACK_ESTABLISHED): immediately sends current game state vialws_callback_on_writable.ws_server_notify_game_changed()): thread-safe — setsg_broadcast_pendingunder the existing lock, wakes the service loop withlws_cancel_service(). The service thread picks up the flag and callslws_callback_on_writable_all_protocol(), keeping all libwebsockets I/O on the service thread. The background thread model is unchanged.LWS_CALLBACK_SERVER_WRITEABLEnow writes the current game state JSON.Hook points
tasks/task_content.c): aftercore_load_game()succeeds, populatesra_game_state_tfromcontent_file_info,core_info_t, andrarch_system_info_t, then callsgame_state_set()+ws_server_notify_game_changed().retroarch.c,CMD_EVENT_UNLOAD_CORE): callsgame_state_clear()+ws_server_notify_game_changed()so clients receive{"type":"no_game"}.game_state_init()/game_state_deinit()called alongsidews_server_init()/ws_server_destroy().Build
network/game_state.oadded toMakefile.commonunder theHAVE_WEBSOCKET_SERVERblock.network/game_state.cadded togriffin/griffin.cunity build.Original prompt
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.