embed odt fixture inline #23
Pull Request Overview
This PR embeds the ODT regression sample as an inline base64 constant and replaces the binary fixture with a temporary file materialization during tests. It also includes significant updates to add support for many new document formats and improves the codebase architecture.
Key Changes
- Removed the binary `.odt` fixture file and replaced it with inline base64 encoding in the test
- Added comprehensive support for 20+ new document formats including DocBook, JATS, OPML, FB2, ODT, citation formats, plain-text markups, and diagram syntaxes
- Renamed the main class from `MarkItDown` to `MarkItDownClient` throughout the codebase for consistency
Reviewed Changes
Copilot reviewed 65 out of 65 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/MarkItDown.Tests/NewFormatsConverterTests.cs | Added comprehensive test suite for new format converters with inline ODT base64 constant |
| src/MarkItDown/MarkItDownClient.cs | Renamed from MarkItDown, added telemetry support and registered 23 new format converters |
| src/MarkItDown/Converters/*.cs | Added 23 new converter classes supporting formats like DocBook, JATS, OPML, FB2, citation formats, markup languages, and diagram types |
| src/MarkItDown/MimeMapping.cs | Extended MIME type mappings to support all new formats with fallback logic |
| tests/MarkItDown.Tests/TestFiles/ | Added 22 new test fixture files for regression testing of new formats |
| Multiple test files | Updated class name references from MarkItDown to MarkItDownClient |

    var hex = rtf.Substring(i, 2);
    if (byte.TryParse(hex, System.Globalization.NumberStyles.HexNumber, null, out var value))
    {
        builder.Append(Encoding.Default.GetString(new[] { value }));
Copilot
AI
Oct 12, 2025
Using Encoding.Default can lead to platform-dependent behavior. Consider using a specific encoding like Encoding.UTF8 or Encoding.GetEncoding(1252) for RTF content.
    - builder.Append(Encoding.Default.GetString(new[] { value }));
    + builder.Append(Encoding.GetEncoding(1252).GetString(new[] { value }));
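
The platform dependence mentioned in the comment is concrete: on .NET Framework `Encoding.Default` is the system ANSI code page, while on .NET Core and later it is UTF-8, so a lone byte of 0x80 or above does not decode to the intended character. Below is a minimal sketch of decoding one RTF hex escape with a fixed code page. It assumes .NET Core 3.0 or later, where `Encoding.GetEncoding(1252)` only succeeds after `CodePagesEncodingProvider` has been registered. The `RtfHexEscape` class and `TryDecode` method are hypothetical names, not code from this PR, and Windows-1252 is assumed rather than read from the document's `\ansicpg` header.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper, not part of the PR: decodes the two hex digits of an
// RTF \'hh escape using a fixed code page instead of Encoding.Default.
public static class RtfHexEscape
{
    // Assumption: the document is encoded as Windows-1252.
    private static readonly Encoding Windows1252 = CreateWindows1252();

    private static Encoding CreateWindows1252()
    {
        // On .NET Core and later, legacy code pages are unavailable until
        // this provider is registered; GetEncoding(1252) throws otherwise.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252);
    }

    public static bool TryDecode(string hex, out string text)
    {
        // AllowHexSpecifier rejects surrounding whitespace, unlike HexNumber.
        if (hex != null
            && hex.Length == 2
            && byte.TryParse(hex, NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture, out var value))
        {
            text = Windows1252.GetString(new[] { value });
            return true;
        }

        text = string.Empty;
        return false;
    }
}
```

With this sketch, `RtfHexEscape.TryDecode("e9", out var text)` returns `true` and sets `text` to `é` regardless of the host platform.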
    if (File.Exists(tempPath))
    {
        File.Delete(tempPath);
    }
Copilot
AI
Oct 12, 2025
The File.Exists check is unnecessary before File.Delete as File.Delete does not throw an exception if the file doesn't exist.
    - if (File.Exists(tempPath))
    - {
    -     File.Delete(tempPath);
    - }
    + File.Delete(tempPath);
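
For context, here is a minimal sketch of how an inline base64 fixture might be written to a temporary file and cleaned up afterwards. The `TempFixture` class is a hypothetical illustration, not the helper used in this PR. It relies on the documented behavior that `File.Delete` does nothing when the file is already gone; it can still throw if the containing directory does not exist, which does not apply here because the file lives in the system temp directory.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the PR: writes a base64 payload to a
// uniquely named temp file and removes it again on Dispose.
public sealed class TempFixture : IDisposable
{
    public string FilePath { get; }

    public TempFixture(string base64, string extension)
    {
        // A GUID-based name avoids collisions between parallel test runs.
        FilePath = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + extension);
        File.WriteAllBytes(FilePath, Convert.FromBase64String(base64));
    }

    public void Dispose()
    {
        // No File.Exists guard needed: deleting a missing file is a no-op.
        File.Delete(FilePath);
    }
}
```

A test would wrap the fixture in a `using` statement, for example `using var fixture = new TempFixture(OdtBase64, ".odt");`, where `OdtBase64` stands in for the inline constant.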
    {
        if (!stream.CanSeek)
        {
            throw new FileConversionException("ODT conversion requires a seekable stream.");
Copilot
AI
Oct 12, 2025
Consider using a more specific exception message that explains why seekable streams are required (e.g., 'ODT files are ZIP archives that require seekable streams for random access').
    - throw new FileConversionException("ODT conversion requires a seekable stream.");
    + throw new FileConversionException("ODT files are ZIP archives that require seekable streams for random access.");
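
As an illustration of the seekability requirement, a caller holding a non-seekable stream (a network response, for instance) could buffer it before handing it to the converter. This is a sketch of one possible workaround under that assumption, not behavior implemented in this PR; `SeekableStreams.EnsureSeekable` is a hypothetical name.

```csharp
using System.IO;

// Hypothetical helper, not part of the PR: returns the input unchanged when
// it already supports seeking, otherwise copies it into memory first.
public static class SeekableStreams
{
    public static Stream EnsureSeekable(Stream source)
    {
        if (source.CanSeek)
        {
            return source;
        }

        // Buffering trades memory for random access; very large inputs may
        // be better spooled to a temporary file instead.
        var buffer = new MemoryStream();
        source.CopyTo(buffer);
        buffer.Position = 0;
        return buffer;
    }
}
```

The design choice between throwing, as the PR does, and buffering depends on expected input sizes: throwing keeps memory use predictable, while buffering is more convenient for callers.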
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Summary
- Materialize a temporary `.odt` during the test
- Remove the `sample.odt` fixture so the repository no longer checks in that asset
Testing
https://chatgpt.com/codex/tasks/task_e_68eb63577ed48326bbde32ca81781d06