Excludes Image objects when assembling plaintext content to write. #25
jtkiley wants to merge 1 commit into brendonh:master from jtkiley:master
That will do the trick, even though it's a bit hackish... I don't know if people would like this, but it might be useful to include a snippet:
I agree that it's a specific and not-at-all pretty fix. I'm just not familiar enough with pyth and the finer points of the RTF format to intelligently make changes to the design. As for the snippet, I do a lot of content analysis, and I use pyth to process RTFs into plain text. It's probably my specific research use case, but I'm wary of adding text into a document. Also, the images in my documents are an artifact of the data provider (not the original data). It may be a good option, though. If I were looking at documents with "real" embedded images, being able to capture that fact might lead to interesting results. I would guess that a lot of use cases would similarly be interested in at least knowing about images.
Fixes #24.
Obviously, this is the simple fix. When I looked at stopping `Image` from inheriting from `Paragraph`, I didn't get errors (and without this change, I still got the image hex in files). I'm still a little fuzzy on the finer points of the RTF spec and the reader's logic, so I probably need to clear that up before working on `Image`.
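
For anyone skimming the thread, here is a rough, self-contained sketch of the idea behind the change. It is not pyth's actual code: the `Paragraph` and `Image` classes are simplified stand-ins, and `assemble_plaintext` and its `image_placeholder` parameter are hypothetical names made up for illustration. It only shows why an `Image` that inherits from `Paragraph` ends up in the text output unless it is skipped, and how an optional marker (as floated in the conversation) could note that an image was there.

```python
class Paragraph:
    """Simplified stand-in for a paragraph-like node (not pyth's real class)."""

    def __init__(self, content):
        self.content = list(content)  # text fragments making up the paragraph


class Image(Paragraph):
    """Stand-in image node. Because it inherits from Paragraph, a writer that
    treats every paragraph alike would emit its hex data as text."""


def assemble_plaintext(paragraphs, image_placeholder=None):
    """Join paragraph text, skipping Image nodes.

    image_placeholder is a hypothetical option: when given, that string is
    written in place of each image instead of dropping the image silently.
    """
    parts = []
    for node in paragraphs:
        if isinstance(node, Image):
            if image_placeholder is not None:
                parts.append(image_placeholder)
            continue  # never write the image's hex payload
        parts.append("".join(node.content))
    return "\n\n".join(parts)


doc = [
    Paragraph(["Hello, ", "world."]),
    Image(["89504e47"]),  # made-up hex fragment standing in for image data
    Paragraph(["Second paragraph."]),
]

print(assemble_plaintext(doc))
print(assemble_plaintext(doc, image_placeholder="[image]"))
```

Leaving the placeholder off by default keeps the output free of text that was not in the source document, which matters for content analysis; callers who want to know that an image was present can opt in.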