fix(screenscraper): improve API handling, KO scrape data, and metadata sanitization#3384
Open
Spinnich wants to merge 1 commit into
Open
fix(screenscraper): improve API handling, KO scrape data, and metadata sanitization#3384Spinnich wants to merge 1 commit into
Spinnich wants to merge 1 commit into
Conversation
Fix several issues in ScreenScraper API request/response handling: - Correctly handle SS-specific HTTP error codes (KO responses, 429, 431, and the SS-quirk of returning 401 when server CPU >60%). - Construct requests with proper parameter encoding so jeuInfos lookups and search queries return the expected results. - Store media URLs returned by SS as-is, preserving the dev credential query parameters required for media playback. Removing them broke downstream media fetches. To keep dev credentials out of log output, add a redacting formatter in the logger pipeline that scrubs ssid/sspassword/devid/devpassword query parameters from any URL it sees. Test coverage added for the new HTTP error paths and the as-is URL storage behaviour.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Explain the changes or enhancements you are proposing with this pull request.
Several related fixes to RomM's ScreenScraper integration. Grouped here because they all touch the same request/response path and the same test file, but kept as one focused PR (no migration / no schema changes).
1. KO scrapes now submit useful data
Previously, hash-based lookups against
jeuInfos.phponly sentcrc/md5/sha1(andsystemeid/romtaille). When ScreenScraper couldn't match the hash, the resulting KO scrape entry on their side had no filename or ROM type recorded — so it was effectively useless for SS to build their database from.ScreenScraperService.get_game_infoalready acceptedrom_nameandrom_typeparameters, they were just never being passed. This PR wires them up at the call site:romnomis now sent as the actual filename of the largest ROM file in the rom_files set.romtypeis computed by a new_get_rom_type()helper that maps file extension to ScreenScraper's expected values:dossier— non-top-level files (inside a folder)iso—.iso,.cue,.chd,.gdi,.cdi,.binrom— everything elseNet effect: when a ROM doesn't match SS's database, the KO entry SS records now contains enough information to be actionable for them.
2. SS-specific HTTP error code handling
ScreenScraper uses non-standard HTTP status codes and returns 401 in situations that aren't actually auth failures. The error handler is updated to:
426→ raise 403 "ScreenScraper has blacklisted this application version. Please update RomM."430→ raise 429 "ScreenScraper daily scrape quota exhausted. Try again tomorrow."431→ raise 429 "ScreenScraper daily unrecognized-ROM quota exhausted. Try again tomorrow."423→ raise 503 "ScreenScraper API is currently offline."401→ SS quirk: returned when their server CPU is >60%, not when credentials are bad. Now logs a warning and returns an empty dict instead of treating it as an auth error.429→ log warning and retry after 2s (was previously silent).The same error mapping is applied in both the generic
_requestpath and theget_game_info-specific exception handler.3. ZZZ(NOTGAME) filtering
ScreenScraper marks non-game entries (BIOS files, demos, etc.) either via a
notgame: "true"field or by prefixing the name withZZZ(NOTGAME). A new_is_notgame()helper checks both, and these entries are now filtered out from:_search_rom()lookup_rom()get_matched_roms_by_name()Without this, a hash collision with a BIOS file or similar would surface a non-game match as if it were the user's ROM.
4. HTML entity sanitization in metadata text
SS returns
&,&,', ,",©literally in names and summaries. A_decode_html_entities()helper now decodes the common ones before metadata is stored, so titles like"Donkey Kong & Diddy's Kong Quest"render as"Donkey Kong & Diddy's Kong Quest".5. Media URLs stored as-is
The previous code stripped
ssid/sspasswordfrom media URLs returned by SS before storing them. ScreenScraper's media endpoints require those credentials to fetch the asset, so storing the stripped URL broke downstream media playback. Media URLs are now stored exactly as SS returned them.strip_sensitive_query_paramsis removed from this code path entirely.6. Log credential redaction
To prevent the now-preserved credentials from leaking into log output, the logger formatter gets a redaction pass that scrubs
ssid/sspassword/devid/devpasswordquery parameters from any URL it sees, regardless of which subsystem emitted the log line.7. Misc small fixes
get_rom()now early-returns if the cleaned-up search term is empty (avoids spurious SS calls for filenames that are entirely tags).Test coverage
tests/adapters/services/test_screenscraper.py— new tests for each of the SS-specific HTTP error codes (KO/426/430/431/423/401-CPU-throttle), plus the 429 retry path.tests/handler/metadata/test_ss_handler.py— new file. Covers_is_notgame,_get_rom_type,_decode_html_entities, the notgame filtering in search/hash/name paths, the empty-search early-return, and the as-is media URL storage.Note
This PR is one half of an earlier combined PR that was split per maintainer feedback. The companion PR (CHD raw hashing) touches a disjoint set of files and can be reviewed in parallel.
Note
AI assistance disclosure: This PR was developed with assistance from Claude (Anthropic). Claude contributed to authoring portions of the original code, and was used to split the original combined branch into smaller per-feature PRs, draft this description, and verify tests / lint / migrations locally. All code was reviewed and is endorsed by me before submission.
Checklist
Please check all that apply.
Screenshots (if applicable)
N/A — backend-only changes.