[FEA]: to_markdown/to_markdown_by_page should differentiate by distinct document ingested #1630

@randerzander

Is this a new feature, an improvement, or a change to existing functionality?

New Feature

How would you describe the priority of this feature request

Significant improvement

Please provide a clear description of problem this feature solves

Using the snippet, if you ingest a single document, the markdown conversion makes sense.

However, if the ingestion job contains multiple documents, there is no way to tell which part of the output came from which document.

For example, if you ingest multimodal_test.pdf and an additional single-page PDF, to_markdown_by_page returns what looks like a single 4-page document.

Describe the feature, and optionally a solution or implementation and any alternatives

Both to_markdown and to_markdown_by_page should probably include a source_filename field by which chunks are grouped.
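To make the proposed grouping concrete, here is a minimal sketch of what the output could look like. It is not the library's implementation: the helper name `markdown_by_source` and the record keys `source_filename`, `page_number`, and `content` are assumptions chosen for illustration, and the real result schema may differ.

```python
from collections import defaultdict


def markdown_by_source(records):
    """Group chunk text by source file, then by page.

    `records` is a hypothetical list of dicts with the keys
    'source_filename', 'page_number', and 'content'. These names are
    assumptions for this sketch, not the actual result schema.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for rec in records:
        grouped[rec["source_filename"]][rec["page_number"]].append(rec["content"])

    # Join the chunks of each page, keeping pages in ascending order.
    return {
        name: {
            page: "\n\n".join(chunks)
            for page, chunks in sorted(by_page.items())
        }
        for name, by_page in grouped.items()
    }


if __name__ == "__main__":
    # Illustrative records only; 'extra.pdf' is a made-up file name.
    sample = [
        {"source_filename": "multimodal_test.pdf", "page_number": 0, "content": "Intro"},
        {"source_filename": "extra.pdf", "page_number": 0, "content": "Other doc"},
        {"source_filename": "multimodal_test.pdf", "page_number": 1, "content": "Table"},
    ]
    for name, pages in markdown_by_source(sample).items():
        print(name, list(pages))
```

With a structure like this, page 0 of one file can no longer be mistaken for a page of another, because every page is keyed under its own source file.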

Additional context
