Context
Discovered during review of epic #271 (graphify parity closeout).
Description
In codegraph/transcribe.py:222-237, the yt-dlp outtmpl uses %(id)s, whose value comes from the remote source and is trusted as-is. A crafted video ID containing ../ sequences could cause files to be written outside output_dir. yt-dlp itself may sanitize the value when expanding outtmpl, but the subsequent path construction at line 237 (output_dir / f"{video_id}.wav") interpolates the raw info["id"] directly, without sanitization.
Suggested approach
Sanitize video_id by replacing path separators and .. sequences with underscores before using it in any path construction: video_id = re.sub(r'[/\\]|\.\.', '_', info["id"]). Alternatively, resolve the final path and assert that it is still under output_dir.
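A minimal sketch of both suggestions combined, assuming Python's standard re and pathlib modules. The helper names ensure_within and safe_wav_path are hypothetical and do not exist in codegraph/transcribe.py; the actual fix may be structured differently.

```python
import re
from pathlib import Path


def ensure_within(output_dir: Path, candidate: Path) -> Path:
    """Return the resolved candidate, or raise if it escapes output_dir.

    Hypothetical helper for illustration only.
    """
    base = output_dir.resolve()
    resolved = candidate.resolve()
    try:
        # relative_to raises ValueError when resolved is not under base.
        resolved.relative_to(base)
    except ValueError:
        raise ValueError(f"{resolved} is outside {base}") from None
    return resolved


def safe_wav_path(output_dir: Path, raw_id: str) -> Path:
    """Build the .wav path from an untrusted video ID (hypothetical helper)."""
    # Replace path separators and ".." sequences with underscores,
    # using the regex proposed in this issue.
    video_id = re.sub(r'[/\\]|\.\.', '_', raw_id)
    # Defense in depth: also confirm the final path stays under output_dir.
    return ensure_within(output_dir, output_dir / f"{video_id}.wav")
```

With this sketch, the call site at line 237 would become something like safe_wav_path(output_dir, info["id"]). Keeping the containment check in addition to the regex means an ID that slips past the substitution still cannot resolve to a location outside output_dir.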