
bug(transcribe): path traversal via yt-dlp video_id in download path #291

@egouilliard-leyton

Context

Discovered during review of epic #271 (graphify parity closeout).

Description

In codegraph/transcribe.py:222-237, the yt-dlp outtmpl uses %(id)s, whose value comes from the remote source and is trusted as-is. A crafted video ID containing ../ sequences could cause files to be written outside output_dir. yt-dlp itself may sanitize the template field, but the subsequent path construction at line 237 (output_dir / f"{video_id}.wav") interpolates the raw info["id"] directly, with no sanitization.
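To make the concern concrete, here is a minimal sketch of how the interpolation at line 237 behaves with a hostile ID. The directory and the ID value are hypothetical and chosen only for illustration; nothing here touches the filesystem or calls yt-dlp, and `Path.is_relative_to` assumes Python 3.9 or later.

```python
from pathlib import Path

# Hypothetical values for illustration only; not taken from transcribe.py.
output_dir = Path("/tmp/codegraph_out")
video_id = "../../etc/evil"  # what a crafted info["id"] might look like

# Same shape as the path construction described at line 237.
wav_path = output_dir / f"{video_id}.wav"

# resolve() collapses the ".." components (strict=False, so the path
# does not need to exist). The result lands outside output_dir.
resolved = wav_path.resolve()
escapes = not resolved.is_relative_to(output_dir.resolve())

# pathlib also discards the left operand when the right one is absolute,
# so an ID starting with "/" would bypass output_dir entirely.
absolute_case = output_dir / "/etc/evil.wav"

print(resolved, escapes, absolute_case)
```

Both cases end up outside output_dir, which is why the raw ID should not be joined into a path unchecked.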

Suggested approach

Sanitize video_id by replacing path separators and .. sequences before it is used in any path construction, for example: video_id = re.sub(r'[/\\]|\.\.', '_', info["id"]). Alternatively, resolve the final path and assert that it is still under output_dir.
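The two options could be combined roughly as follows. This is a sketch under assumptions, not the actual code in transcribe.py: the helper names `sanitize_video_id`, `ensure_within`, and `safe_wav_path` are hypothetical, the regex is the one proposed above, and `Path.is_relative_to` requires Python 3.9 or later.

```python
import re
from pathlib import Path


def sanitize_video_id(raw_id: str) -> str:
    """Replace path separators and '..' sequences with underscores."""
    return re.sub(r"[/\\]|\.\.", "_", raw_id)


def ensure_within(output_dir: Path, candidate: Path) -> Path:
    """Return the resolved candidate, or raise if it leaves output_dir."""
    base = output_dir.resolve()
    resolved = candidate.resolve()
    if not resolved.is_relative_to(base):
        raise ValueError(f"{resolved} is outside {base}")
    return resolved


def safe_wav_path(output_dir: Path, raw_id: str) -> Path:
    """Sanitize the ID first, then verify containment as a second check."""
    candidate = output_dir / f"{sanitize_video_id(raw_id)}.wav"
    return ensure_within(output_dir, candidate)


if __name__ == "__main__":
    out = Path("/tmp/codegraph_out")  # hypothetical output directory
    print(safe_wav_path(out, "../../etc/evil"))
```

Running both checks is a defense-in-depth choice: the regex keeps the filename predictable, while the containment check still holds if the sanitizer is later changed or bypassed.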

Metadata

Labels: bug
