Base switch #1520
jarlungoodoo73 wants to merge 2 commits into microsoft:dependabot/github_actions/actions/setup-python-6 from
@jarlungoodoo73 please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.
Contributor License Agreement
Contribution License Agreement
This Contribution License Agreement (“Agreement”) is agreed to by the party signing below (“You”),
Pull request overview
This pull request addresses CVE-2025-11849, a security vulnerability related to r:link resources in DOCX files. The changes upgrade the mammoth library dependency and remove a previous monkey-patch workaround that disabled r:link processing.
Key changes:
- Upgrades mammoth dependency from version 1.10.0 to 1.11.0 to leverage native security fixes
- Removes the monkey-patch code that previously disabled r:link resource processing in DOCX files
- Adds a new test case to verify that r:link resources are not embedded in output, preventing potential path traversal attacks
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| packages/markitdown/tests/test_module_misc.py | Adds test_doc_rlink() to verify CVE-2025-11849 fix; comments out YouTube test |
| packages/markitdown/tests/test_files/rlink.docx | New test fixture containing DOCX with r:link resource for security testing |
| packages/markitdown/src/markitdown/converters/_docx_converter.py | Removes monkey-patch that disabled r:link processing, relying on library fix instead |
| packages/markitdown/pyproject.toml | Updates mammoth dependency from ~1.10.0 to ~1.11.0 in both optional dependencies |
    @@ -15,13 +15,6 @@
    _dependency_exc_info = None
    try:
        import mammoth
Removing the monkey-patch that disabled r:link processing appears to be intentional for addressing CVE-2025-11849, presumably because mammoth 1.11.0 handles this securely. However, since version 1.11.0 doesn't exist yet, this change may introduce a security vulnerability by re-enabling r:link processing with mammoth 1.10.0. The code changes should be synchronized with the actual availability of the secure mammoth version.
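One way to keep the two changes in step is to retain the workaround unless the installed library is known to carry the fix. The following is a minimal sketch, not code from the PR: it assumes 1.11.0 is the first fixed mammoth release (as the PR description states), and `parse_release` and `mammoth_has_fix` are hypothetical helper names. The parser ignores pre-release suffixes, which is a simplification.

```python
from importlib import metadata

# Assumption: first mammoth release containing the fix, per the PR description.
MIN_SAFE_MAMMOTH = (1, 11, 0)


def parse_release(version):
    """Parse the leading numeric 'X.Y.Z' part of a version string into a 3-tuple."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that "1.11" compares equal to "1.11.0".
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def mammoth_has_fix(installed=None):
    """Return True if the given (or installed) mammoth version is at least the fixed one."""
    if installed is None:
        try:
            installed = metadata.version("mammoth")
        except metadata.PackageNotFoundError:
            return False
    return parse_release(installed) >= MIN_SAFE_MAMMOTH
```

A converter could call `mammoth_has_fix()` at import time and keep the r:link workaround in place whenever it returns `False`.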
    rlink_file_path = os.path.join(rlink_tmp_dir, "test_rlink.txt")
    rlink_content = "de658225-569e-4e3d-9ed2-cfb6abf927fc"
    b64_prefix = (
        "ZGU2NTgyMjUtNTY5ZS00ZTNkLTllZDItY2ZiNmFiZjk"  # base64 prefix of rlink_content
The comment "# base64 prefix of rlink_content" is misleading. The value "ZGU2NTgyMjUtNTY5ZS00ZTNkLTllZDItY2ZiNmFiZjk" is truncated and not a complete base64 encoding. The full base64 encoding of "de658225-569e-4e3d-9ed2-cfb6abf927fc" would be "ZGU2NTgyMjUtNTY5ZS00ZTNkLTllZDItY2ZiNmFiZjkyN2Zj". Consider either using the complete base64 string or clarifying in the comment that this is a partial/prefix match.
    -    "ZGU2NTgyMjUtNTY5ZS00ZTNkLTllZDItY2ZiNmFiZjk"  # base64 prefix of rlink_content
    +    "ZGU2NTgyMjUtNTY5ZS00ZTNkLTllZDItY2ZiNmFiZjkyN2Zj"  # base64 encoding of rlink_content
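The encoding can be checked directly with the standard library. This is a small illustrative sketch rather than part of the PR; the variable names mirror the test, and `full_b64` and `is_prefix` are introduced here only for the check.

```python
import base64

rlink_content = "de658225-569e-4e3d-9ed2-cfb6abf927fc"
b64_prefix = "ZGU2NTgyMjUtNTY5ZS00ZTNkLTllZDItY2ZiNmFiZjk"

# Encode the 36-byte string; 36 is a multiple of 3, so no "=" padding is produced.
full_b64 = base64.b64encode(rlink_content.encode("utf-8")).decode("ascii")

# Check whether the shorter value from the test is a leading slice of the full encoding.
is_prefix = full_b64.startswith(b64_prefix)
```

Here `full_b64` equals the 48-character string in the suggestion, and `is_prefix` is `True`, so either value would match an embedded copy of the file in a substring check; the full string is simply the stricter of the two.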
    finally:
        os.remove(rlink_file_path)
The test unconditionally deletes the rlink file in the finally block, even if the file existed before the test ran. This could delete user data if the file was already present. Consider only deleting the file if the test created it (when os.path.exists returned False initially).
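A minimal sketch of that idea, assuming a hypothetical helper named `owned_file` (not part of the PR): it records whether it created the file and removes the file on exit only in that case. Creation happens inside the `try`, so a failure partway through is still cleaned up.

```python
import os
from contextlib import contextmanager


@contextmanager
def owned_file(path, content):
    """Ensure `path` holds `content`; on exit, delete it only if this call created it."""
    created = False
    try:
        if os.path.exists(path):
            # Pre-existing file: verify it, but never take ownership of it.
            with open(path, "r", encoding="utf-8") as f:
                if f.read() != content:
                    raise ValueError(
                        f"Existing {path} content does not match expected content."
                    )
        else:
            # Mode "x" fails if the file appeared in the meantime, so nothing is clobbered.
            with open(path, "x", encoding="utf-8") as f:
                created = True
                f.write(content)
        yield path
    finally:
        if created and os.path.exists(path):
            os.remove(path)
```

In the test, the body that converts the DOCX and asserts on the output would sit inside `with owned_file(rlink_file_path, rlink_content):`, replacing the separate create and `finally` steps.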
    if os.path.exists(rlink_file_path):
        with open(rlink_file_path, "r", encoding="utf-8") as f:
            existing_content = f.read()
        if existing_content != rlink_content:
            raise ValueError(
                f"Existing {rlink_file_path} content does not match expected content."
            )
    else:
        with open(rlink_file_path, "w", encoding="utf-8") as f:
            f.write(rlink_content)

    try:
The test skips when /tmp doesn't exist but doesn't clean up the created file if the test fails between file creation (line 321) and the finally block. If an assertion fails or an exception is raised, the file will be left behind. Consider wrapping the file creation in the try block or using a proper temporary directory with proper cleanup.
    - if os.path.exists(rlink_file_path):
    -     with open(rlink_file_path, "r", encoding="utf-8") as f:
    -         existing_content = f.read()
    -     if existing_content != rlink_content:
    -         raise ValueError(
    -             f"Existing {rlink_file_path} content does not match expected content."
    -         )
    - else:
    -     with open(rlink_file_path, "w", encoding="utf-8") as f:
    -         f.write(rlink_content)
    - try:
    + try:
    +     if os.path.exists(rlink_file_path):
    +         with open(rlink_file_path, "r", encoding="utf-8") as f:
    +             existing_content = f.read()
    +         if existing_content != rlink_content:
    +             raise ValueError(
    +                 f"Existing {rlink_file_path} content does not match expected content."
    +             )
    +     else:
    +         with open(rlink_file_path, "w", encoding="utf-8") as f:
    +             f.write(rlink_content)
    return
The early return statement after pytest.skip is unnecessary. pytest.skip raises an exception that prevents further execution, so the return on line 304 will never be reached.
    - return
    # result = markitdown.convert(YOUTUBE_TEST_URL)
    # for test_string in YOUTUBE_TEST_STRINGS:
    #     assert test_string in result.text_content
This comment appears to contain commented-out code.
    - # result = markitdown.convert(YOUTUBE_TEST_URL)
    - # for test_string in YOUTUBE_TEST_STRINGS:
    - #     assert test_string in result.text_content
Git, ideas commits