-
Notifications
You must be signed in to change notification settings - Fork 164
feat(BA-3744): Add multiple download handler and pydantic jwt model #8424
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements a new endpoint for downloading multiple files/directories as a single ZIP archive, addressing issue #7766 (BA-3744). The implementation introduces a streaming ZIP archive handler that leverages zipstream to avoid creating temporary files.
Changes:
- Adds new
ZipArchiveStreamReaderclass andArchiveDownloadTokenDataPydantic model for JWT validation - Implements
/folder/file/download-archivemanager endpoint to generate download tokens for multiple files - Implements
/download-archiveclient endpoint to stream requested files as a ZIP archive
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 16 comments.
| File | Description |
|---|---|
| src/ai/backend/storage/services/file_stream/zip.py | New module containing ZIP streaming infrastructure with ZipArchiveStreamReader and ArchiveDownloadTokenData Pydantic model |
| src/ai/backend/storage/api/manager.py | Adds create_archive_download_session endpoint and route for generating JWT tokens for multi-file downloads |
| src/ai/backend/storage/api/client.py | Adds download_archive handler for streaming multiple files as ZIP using token-based authentication |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| try: | ||
| payload = jwt.decode(raw_token, secret, algorithms=["HS256"]) | ||
| token_data = ArchiveDownloadTokenData.model_validate(payload) | ||
| except jwt.ExpiredSignatureError as e: | ||
| raise web.HTTPUnauthorized(reason="Token expired") from e | ||
| except jwt.InvalidTokenError as e: | ||
| raise web.HTTPUnauthorized(reason="Invalid token") from e | ||
| except ValidationError as e: | ||
| raise InvalidAPIParameters(extra_msg=str(e)) from e |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Manual JWT validation is inconsistent with existing patterns in this file. The existing download and upload handlers use tx.JsonWebToken validator in the check_params context manager for automatic JWT decoding and validation (see lines 145-148, 309-312). This approach should be used here for consistency and better error handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tx.JsonWebToken validator in the check_params uses Trafaret based validation.
Our ultimate goal is to migrate Trafaret into pydantic inside the program, Better implement it inline the function and replace with pydantic based validator later.
But at least not in this pr.
src/ai/backend/storage/api/client.py
Outdated
| sanitized: list[Path] = [] | ||
| for relpath in token_data.files: | ||
| file_path = volume.sanitize_vfpath(token_data.vfolder_id, relpath) | ||
| # Ensure the file_path is within the vfolder_root boundary | ||
| if not file_path.exists(): | ||
| raise web.HTTPNotFound(reason=f"File not found: {relpath}") |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation doesn't handle duplicate paths in the input list. If the same file or directory appears multiple times in token_data.files, it will be added to the zip archive multiple times with the same arcname, which may cause issues or confusion. Consider deduplicating the file paths before processing, or at least documenting this behavior.
| sanitized: list[Path] = [] | |
| for relpath in token_data.files: | |
| file_path = volume.sanitize_vfpath(token_data.vfolder_id, relpath) | |
| # Ensure the file_path is within the vfolder_root boundary | |
| if not file_path.exists(): | |
| raise web.HTTPNotFound(reason=f"File not found: {relpath}") | |
| sanitized: list[Path] = [] | |
| seen_paths: set[Path] = set() | |
| for relpath in token_data.files: | |
| file_path = volume.sanitize_vfpath(token_data.vfolder_id, relpath) | |
| # Ensure the file_path is within the vfolder_root boundary | |
| if not file_path.exists(): | |
| raise web.HTTPNotFound(reason=f"File not found: {relpath}") | |
| if file_path in seen_paths: | |
| continue | |
| seen_paths.add(file_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add comment on this one, but we need to decide which part should get charged of detecting duplicated files in the list
| raise web.HTTPUnauthorized(reason="Invalid token") from e | ||
| except ValidationError as e: | ||
| raise InvalidAPIParameters(extra_msg=str(e)) from e | ||
| # TODO: We need to implement pydantic jwt decorder to avoid this manual validation. include extract secret key from ctx. |
Copilot
AI
Jan 29, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The TODO comment indicates that the current implementation is incomplete and needs to be refactored. Since the codebase already has a tx.JsonWebToken validator that handles JWT decoding with secret extraction from context, this manual approach should be replaced with the existing pattern before merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends on trafaret based validation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please write it based on Pydantic this time.
2660427 to
0f00acc
Compare
| vfid: VFolderID | ||
| files: list[str] | ||
|
|
||
| async with cast( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of wrapping api handlers with pydantic, but it seems better to have it in the future PR
| raw_token = request.query.get("token") | ||
| if not raw_token: | ||
| raise InvalidAPIParameters(extra_msg="Missing 'token' query parameter") | ||
| try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't find pydantic wrapper to automatize validation and error handling
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will be done in following issue as migrating storage is needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use the api_handler decorator, which was also used in manager.
|
We can add three tests.
|
| raise InvalidAPIParameters(extra_msg=f"Path escapes vfolder boundary: {relpath}") | ||
| if not file_path.exists(): | ||
| raise web.HTTPNotFound(reason=f"File not found: {relpath}") | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sanitize_vfpath call mangle function which translate vfid into absolute path, not needed to use entire function.
| raw_token = request.query.get("token") | ||
| if not raw_token: | ||
| raise InvalidAPIParameters(extra_msg="Missing 'token' query parameter") | ||
| try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use the api_handler decorator, which was also used in manager.
| raise web.HTTPUnauthorized(reason="Invalid token") from e | ||
| except ValidationError as e: | ||
| raise InvalidAPIParameters(extra_msg=str(e)) from e | ||
| # TODO: We need to implement pydantic jwt decorder to avoid this manual validation. include extract secret key from ctx. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please write it based on Pydantic this time.
| async def create_archive_download_session(request: web.Request) -> web.Response: | ||
| class Params(TypedDict): | ||
| volume: str | ||
| vfid: VFolderID | ||
| files: list[str] | ||
|
|
||
| async with cast( | ||
| AbstractAsyncContextManager[Params], | ||
| check_params( | ||
| request, | ||
| t.Dict( | ||
| { | ||
| t.Key("volume"): t.String(), | ||
| t.Key("vfid"): tx.VFolderID(), | ||
| t.Key("files"): t.List(t.String, min_length=1), | ||
| }, | ||
| ), | ||
| ), | ||
| ) as params: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no reason to keep this implementation. Please define it based on pydantic.
| token_payload = ArchiveDownloadTokenData( | ||
| operation="download", | ||
| volume=params["volume"], | ||
| vfolder_id=params["vfid"], | ||
| files=params["files"], | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defining pydantic and first validating with trafaret is highly unusual.
| class ArchiveDownloadTokenData(BaseModel): | ||
| """Pydantic model for validating the JWT payload of archive download tokens.""" | ||
|
|
||
| operation: Literal["download"] | ||
| volume: str | ||
| vfolder_id: VFolderIDField | ||
| files: list[str] = Field(min_length=1) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the request must also be made in the manager, the values to be passed as DTOs should be defined in common/dto/storage.
And please define values like operation as an Enum rather than a Literal.
| async def stream_archive_response( | ||
| request: web.Request, | ||
| reader: StreamReader, | ||
| filename: str, | ||
| ) -> web.StreamResponse: | ||
| """Stream a StreamReader's output as an HTTP attachment response. | ||
|
|
||
| This helper is format-agnostic: it works with any StreamReader (ZIP, tar, etc.). | ||
| """ | ||
| ascii_filename = filename.encode("ascii", errors="ignore").decode("ascii").replace('"', r"\"") | ||
| encoded_filename = urllib.parse.quote(filename, encoding="utf-8") | ||
| response = web.StreamResponse( | ||
| headers={ | ||
| hdrs.CONTENT_TYPE: reader.content_type() or "application/octet-stream", | ||
| hdrs.CONTENT_DISPOSITION: " ".join([ | ||
| f'attachment;filename="{ascii_filename}";', # RFC-2616 sec2.2 | ||
| f"filename*=UTF-8''{encoded_filename}", # RFC-5987 | ||
| ]), | ||
| }, | ||
| ) | ||
| await response.prepare(request) | ||
| async for chunk in reader.read(): | ||
| await response.write(chunk) | ||
| return response |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please ensure API requests are not intertwined here. The service layer should only return a StreamReader, and we want it to use only the values returned from the API layer.
resolves #7766 (BA-3744)
implementation of BA-1040, multiple file zip downloader.
Checklist: (if applicable)
ai.backend.testdocsdirectory