
feat(stream): stream to zenodo-rdm deposit form #13

Open
mairasalazar wants to merge 7 commits into inveniosoftware:main from mairasalazar:allow-reading-file-from-path

Conversation

@mairasalazar mairasalazar commented Mar 6, 2026

Collaborator

❤️ Thank you for your review!

This PR allows integrating the Orcha workflow with InvenioRDM by:

  • Allowing a file URI to be passed instead of a URL.
  • Adding support for Server-Sent Events, to allow streaming.
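For context, a Server-Sent Events stream is a long-lived HTTP response made of small text frames. The sketch below builds such frames by hand with the standard library only. The `format_sse` and `stream_statuses` helpers and the status values are hypothetical and are not taken from this PR, which imports `sse_starlette` for this purpose.

```python
from typing import Iterable, Iterator, Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Serialize one Server-Sent Events frame (hypothetical helper)."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Each line of a multi-line payload needs its own "data:" field.
    for chunk in data.splitlines() or [""]:
        lines.append(f"data: {chunk}")
    # A blank line terminates the frame.
    return "\n".join(lines) + "\n\n"


def stream_statuses(statuses: Iterable[str]) -> Iterator[str]:
    """Yield one frame per workflow status update (illustrative values)."""
    for status in statuses:
        yield format_sse(status, event="status")
```

A client such as the browser `EventSource` API would receive each frame as a separate `status` event.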

@mairasalazar mairasalazar requested a review from yashlamba March 6, 2026 15:08
@mairasalazar mairasalazar force-pushed the allow-reading-file-from-path branch 2 times, most recently from c03fbb6 to 77626f8 Compare March 10, 2026 13:29
@mairasalazar mairasalazar changed the title from "refactor: read from path instead of url" to "integrate with invenio-app-rdm" Mar 10, 2026
@mairasalazar mairasalazar force-pushed the allow-reading-file-from-path branch from 77626f8 to ceddece Compare March 11, 2026 09:22
Comment thread app/activities/extract_pdf_content.py Outdated
@mairasalazar mairasalazar force-pushed the allow-reading-file-from-path branch 2 times, most recently from 497936d to 1195026 Compare March 12, 2026 16:10
Comment thread app/activities/extract_pdf_content.py Outdated
pdf_bytes = response.content
"""Read a file and extract its text content using the specified extractor."""
if settings.orcha_env in [Environment.LOCAL, Environment.DEV]:
    with open(request.url, "rb") as f:

Collaborator

Maybe we should rename it to request.source, as it can now be a local file path, which is not a URL.

Collaborator Author

I think we can keep it as it is since the local file path is a temporary thing for local dev
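To make the behaviour discussed in this thread concrete, here is a minimal sketch of reading bytes either from a local path or from a URL depending on the environment. It is modelled loosely on the diff excerpt above and is not the PR's actual code: `read_source_bytes`, the `PROD` member, and the 30-second timeout are assumptions made for illustration.

```python
import enum
import urllib.request
from pathlib import Path


class Environment(enum.Enum):
    # LOCAL and DEV appear in the diff excerpt; PROD is a hypothetical member.
    LOCAL = "local"
    DEV = "dev"
    PROD = "prod"


def read_source_bytes(source: str, env: Environment) -> bytes:
    """Return the raw bytes behind `source` (hypothetical helper)."""
    if env in (Environment.LOCAL, Environment.DEV):
        # In local development the value is treated as a filesystem path.
        return Path(source).read_bytes()
    # Everywhere else the value is treated as a URL and fetched.
    with urllib.request.urlopen(source, timeout=30) as response:
        return response.read()
```

Because the same field carries both kinds of value in this sketch, the neutral parameter name `source` is used here, which is the naming question raised in the review.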

Comment thread app/activities/extract_pdf_content.py Outdated
Comment thread app/routers/workflows.py Outdated
        return

except SQLAlchemyError as e:
    print("Error in fetching from database (stream_workflow)", e)

Collaborator

Maybe we should move to using logging, so we get the stack traces + observability.
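A minimal sketch of what this suggestion could look like, using the standard `logging` module. `DatabaseError` and `fetch_workflow` are hypothetical stand-ins (for `SQLAlchemyError` and the real query) so that the example runs without a database; only `stream_workflow` is a name from the diff.

```python
import logging

logger = logging.getLogger(__name__)


class DatabaseError(Exception):
    """Stand-in for sqlalchemy.exc.SQLAlchemyError in this sketch."""


def fetch_workflow(fail: bool) -> str:
    """Hypothetical query helper that can simulate a database failure."""
    if fail:
        raise DatabaseError("connection lost")
    return "workflow"


def stream_workflow(fail: bool = False):
    try:
        return fetch_workflow(fail)
    except DatabaseError:
        # logger.exception logs at ERROR level and appends the traceback,
        # which print() does not do.
        logger.exception("Error fetching from database (stream_workflow)")
        return None
```

Handlers and formatters can then be configured once at application start-up, so the same records reach whatever observability tooling is in place.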

Comment thread app/activities/extract_pdf_content.py Outdated
@mairasalazar mairasalazar force-pushed the allow-reading-file-from-path branch 2 times, most recently from c80e161 to b923845 Compare March 16, 2026 16:51
@mairasalazar mairasalazar changed the title from "integrate with invenio-app-rdm" to "feat(stream): stream to zenodo-rdm deposit form" Mar 18, 2026
Comment thread app/activities/extract_pdf_content.py Outdated
Comment thread app/routers/workflows.py Outdated
if status == WorkflowStatus.ERROR:
    yield ServerSentEvent(
        # TODO: improve it with a better error message for end users
        data="The Temporal Workflow failed",


This should include translations if the error will be shown directly in the interface. Otherwise, it is better to return error IDs or similar (numbers?), so that the UI can convert them into a translated error message.
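One way to read this suggestion is sketched below: the server sends only a stable error code, and the consumer maps it to a localized message. This is an illustration under assumptions, not the PR's implementation. `WORKFLOW_FAILED`, the `MESSAGES` catalogue, its wording, and the `translate` helper are all hypothetical; only the `WORKFLOW_NOT_FOUND` code appears elsewhere in this conversation.

```python
import json
from enum import Enum


class WorkflowErrorCode(str, Enum):
    WORKFLOW_NOT_FOUND = "WORKFLOW_NOT_FOUND"
    WORKFLOW_FAILED = "WORKFLOW_FAILED"  # hypothetical code


def error_event_payload(code: WorkflowErrorCode) -> str:
    """Build the event data: a machine-readable code, no user-facing text."""
    return json.dumps({"error_code": code.value})


# Hypothetical message catalogue that would live on the UI side.
MESSAGES = {
    "en": {
        "WORKFLOW_NOT_FOUND": "The workflow could not be found.",
        "WORKFLOW_FAILED": "The workflow failed.",
    },
    "fr": {
        "WORKFLOW_NOT_FOUND": "Le workflow est introuvable.",
    },
}


def translate(payload: str, locale: str) -> str:
    """Resolve a payload to a message, falling back to English."""
    code = json.loads(payload)["error_code"]
    catalogue = MESSAGES.get(locale, {})
    return catalogue.get(code) or MESSAGES["en"].get(code, code)
```

Keeping the wire format free of prose means the server does not need to know the user's locale, and messages can be reworded without changing the API.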

Comment thread app/routers/workflows.py Outdated
Comment thread app/routers/workflows.py Outdated
Comment thread app/config.py Outdated
@mairasalazar mairasalazar force-pushed the allow-reading-file-from-path branch from bebe3ab to 80919a9 Compare March 20, 2026 10:15
Comment thread app/routers/workflows.py
        select(Workflow).where(Workflow.public_id == workflow_id)
    ).one()
except NoResultFound:
    raise WorkflowEventError(error_code="WORKFLOW_NOT_FOUND")

Collaborator Author

todo: still need to handle error codes on the ui side

@mairasalazar mairasalazar force-pushed the allow-reading-file-from-path branch from 80919a9 to 20d2e76 Compare March 20, 2026 10:20
@mairasalazar

Collaborator Author

One test is failing because some changes here depend on #18 (e.g. the Workflow class).

@mairasalazar mairasalazar force-pushed the allow-reading-file-from-path branch 2 times, most recently from 72dde08 to 7577f4a Compare April 15, 2026 13:22

@slint slint left a comment

Member

Some minor/nits, but otherwise LGTM

Comment thread app/config.py Outdated
Comment thread app/routers/workflows.py
from sqlalchemy.exc import SQLAlchemyError
from sqlalchemy.orm.exc import NoResultFound
from sqlmodel import Session, select
from sse_starlette import EventSourceResponse, ServerSentEvent

Member

minor: I see from the FastAPI SSE docs that you can import these from fastapi.sse. Does this maybe also imply that sse-starlette is already bundled with FastAPI (or comes via an extra)?

Collaborator Author

fastapi.sse was introduced in version 0.135.0 and Orcha is on version 0.129.0, which I guess we could bump without issues :) From what I've seen, the implementation doesn't use sse-starlette for the SSE support
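As a rough sketch, the version threshold mentioned in this reply could be checked at runtime before choosing an import path. The 0.135.0 value is taken from the comment above at face value, and the helper names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Threshold as stated in the review comment; treated here as an assumption.
NATIVE_SSE_MIN_VERSION = (0, 135, 0)


def parse_version(text: str) -> tuple:
    """Parse the leading digits of up to three dot-separated components."""
    parts = []
    for part in text.split(".")[:3]:
        match = re.match(r"\d+", part)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def has_native_sse(fastapi_version: str) -> bool:
    """True if the given version is at or above the assumed threshold."""
    return parse_version(fastapi_version) >= NATIVE_SSE_MIN_VERSION


def installed_fastapi_has_native_sse() -> bool:
    """Check the installed FastAPI; False if it is not installed."""
    try:
        return has_native_sse(version("fastapi"))
    except PackageNotFoundError:
        return False
```

With the versions quoted in this thread, `has_native_sse("0.129.0")` is false, so the `sse_starlette` import would still be needed until the dependency is bumped.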

Collaborator Author

@yashlamba fyi, maybe we can look into using a newer version of fastapi

Comment thread app/routers/workflows.py
@github-project-automation github-project-automation Bot moved this from In review 🔍 to In progress in Sprint Q2 2026 ☀️ Apr 28, 2026
@mairasalazar mairasalazar force-pushed the allow-reading-file-from-path branch from 7577f4a to e970fce Compare April 28, 2026 13:51

5 participants