Skip to content

fix(screenscraper): improve API handling, KO scrape data, and metadata sanitization#3384

Open
Spinnich wants to merge 1 commit into
rommapp:masterfrom
Spinnich:pr/screenscraper-api-and-redaction
Open

fix(screenscraper): improve API handling, KO scrape data, and metadata sanitization#3384
Spinnich wants to merge 1 commit into
rommapp:masterfrom
Spinnich:pr/screenscraper-api-and-redaction

Conversation

@Spinnich
Copy link
Copy Markdown
Contributor

Description
Explain the changes or enhancements you are proposing with this pull request.

Several related fixes to RomM's ScreenScraper integration. Grouped here because they all touch the same request/response path and the same test file, but kept as one focused PR (no migration / no schema changes).

1. KO scrapes now submit useful data

Previously, hash-based lookups against jeuInfos.php only sent crc/md5/sha1 (and systemeid / romtaille). When ScreenScraper couldn't match the hash, the resulting KO scrape entry on their side had no filename or ROM type recorded — so it was effectively useless for SS to build their database from.

ScreenScraperService.get_game_info already accepted rom_name and rom_type parameters, they were just never being passed. This PR wires them up at the call site:

  • romnom is now sent as the actual filename of the largest ROM file in the rom_files set.
  • romtype is computed by a new _get_rom_type() helper that maps file extension to ScreenScraper's expected values:
    • dossier — non-top-level files (inside a folder)
    • iso.iso, .cue, .chd, .gdi, .cdi, .bin
    • rom — everything else

Net effect: when a ROM doesn't match SS's database, the KO entry SS records now contains enough information to be actionable for them.

2. SS-specific HTTP error code handling

ScreenScraper uses non-standard HTTP status codes and returns 401 in situations that aren't actually auth failures. The error handler is updated to:

  • 426 → raise 403 "ScreenScraper has blacklisted this application version. Please update RomM."
  • 430 → raise 429 "ScreenScraper daily scrape quota exhausted. Try again tomorrow."
  • 431 → raise 429 "ScreenScraper daily unrecognized-ROM quota exhausted. Try again tomorrow."
  • 423 → raise 503 "ScreenScraper API is currently offline."
  • 401 → SS quirk: returned when their server CPU is >60%, not when credentials are bad. Now logs a warning and returns an empty dict instead of treating it as an auth error.
  • 429 → log warning and retry after 2s (was previously silent).

The same error mapping is applied in both the generic _request path and the get_game_info-specific exception handler.

3. ZZZ(NOTGAME) filtering

ScreenScraper marks non-game entries (BIOS files, demos, etc.) either via a notgame: "true" field or by prefixing the name with ZZZ(NOTGAME). A new _is_notgame() helper checks both, and these entries are now filtered out from:

  • Search results in _search_rom()
  • Hash-lookup results in lookup_rom()
  • Name-based matching in get_matched_roms_by_name()

Without this, a hash collision with a BIOS file or similar would surface a non-game match as if it were the user's ROM.

4. HTML entity sanitization in metadata text

SS returns &, &, ',  , ", © literally in names and summaries. A _decode_html_entities() helper now decodes the common ones before metadata is stored, so titles like "Donkey Kong & Diddy's Kong Quest" render as "Donkey Kong & Diddy's Kong Quest".

5. Media URLs stored as-is

The previous code stripped ssid / sspassword from media URLs returned by SS before storing them. ScreenScraper's media endpoints require those credentials to fetch the asset, so storing the stripped URL broke downstream media playback. Media URLs are now stored exactly as SS returned them. strip_sensitive_query_params is removed from this code path entirely.

6. Log credential redaction

To prevent the now-preserved credentials from leaking into log output, the logger formatter gets a redaction pass that scrubs ssid / sspassword / devid / devpassword query parameters from any URL it sees, regardless of which subsystem emitted the log line.

7. Misc small fixes

  • get_rom() now early-returns if the cleaned-up search term is empty (avoids spurious SS calls for filenames that are entirely tags).

Test coverage

  • tests/adapters/services/test_screenscraper.py — new tests for each of the SS-specific HTTP error codes (KO/426/430/431/423/401-CPU-throttle), plus the 429 retry path.
  • tests/handler/metadata/test_ss_handler.py — new file. Covers _is_notgame, _get_rom_type, _decode_html_entities, the notgame filtering in search/hash/name paths, the empty-search early-return, and the as-is media URL storage.

Note

This PR is one half of an earlier combined PR that was split per maintainer feedback. The companion PR (CHD raw hashing) touches a disjoint set of files and can be reviewed in parallel.

Note

AI assistance disclosure: This PR was developed with assistance from Claude (Anthropic). Claude contributed to authoring portions of the original code, and was used to split the original combined branch into smaller per-feature PRs, draft this description, and verify tests / lint / migrations locally. All code was reviewed and is endorsed by me before submission.

Checklist
Please check all that apply.

  • I've tested the changes locally
  • I've updated relevant comments
  • I've assigned reviewers for this PR
  • I've added unit tests that cover the changes

Screenshots (if applicable)

N/A — backend-only changes.

Fix several issues in ScreenScraper API request/response handling:

- Correctly handle SS-specific HTTP error codes (KO responses, 429, 431,
  and the SS-quirk of returning 401 when server CPU >60%).
- Construct requests with proper parameter encoding so jeuInfos lookups
  and search queries return the expected results.
- Store media URLs returned by SS as-is, preserving the dev credential
  query parameters required for media playback. Removing them broke
  downstream media fetches.

To keep dev credentials out of log output, add a redacting formatter in
the logger pipeline that scrubs ssid/sspassword/devid/devpassword query
parameters from any URL it sees.

Test coverage added for the new HTTP error paths and the as-is URL
storage behaviour.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant