Skip to content

Conversation

@Lettuceleaves
Copy link
Contributor

PR: Fix Temporary File Leak in HTMLParser's Download Link Handling

Description

This PR addresses a critical temporary file leak in the _handle_download_link method of HTMLParser. When downloading remote resources (PDF, Markdown, etc.) for delegated parsing, temporary files were not reliably deleted if an exception occurred during parsing or if the function exited early via the HTML branch. The fix implements a robust try...finally block to guarantee file cleanup under all execution paths.

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test update

Changes Made

  • Sentinel Variable: Introduced temp_path = None to track whether a temporary file was successfully created.
  • Unified Cleanup: Consolidated all file deletion logic into a single finally block, removing ad-hoc unlink calls in success branches.
  • Robust Exception Handling: Wrapped the cleanup logic in a silent try...except to ensure that errors during deletion (e.g., file system locks) do not interfere with the returned results.
  • Early Return Safety: Ensured that the html branch return path still triggers the finally block for cleanup.

Testing

A manual verification script was used to simulate failures:

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have tested this on the following platforms:
    • Linux
    • macOS
    • Windows (Verified on local dev environment)

Test Scenario: Forced a ValueError by passing an unsupported file type after download.
Result: Verified that the temporary file was physically removed from the system's Temp directory despite the exception.

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas (Code is simplified to standard Python patterns)
  • I have made corresponding changes to the documentation (This PR description)
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Screenshots (if applicable)

N/A

Additional Notes

The fix adheres to the KISS (Keep It Simple, Stupid) principle by using standard Python resource management instead of complex custom cleanup logic.

@CLAassistant
Copy link

CLAassistant commented Feb 7, 2026

CLA assistant check
All committers have signed the CLA.

@kkkwjx07 kkkwjx07 merged commit 81df7e3 into volcengine:main Feb 8, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants