feat: support endpoint upload file #2364
nguyenviet02 wants to merge 1 commit into router-for-me:main from
Conversation
Code Review
This pull request introduces a local file storage system and OpenAI-compatible file management endpoints (/v1/files). It includes a new internal/files package for handling file persistence, metadata, and text extraction from various formats like PDF and DOCX. The OpenAIResponsesAPIHandler was updated to preprocess input_file references by extracting and inlining text content. Feedback focuses on improving memory efficiency during file uploads by streaming data instead of using io.ReadAll, enhancing log clarity for store initialization failures, and simplifying JSON response construction in the handlers. Additionally, it was noted that silent errors during file listing should be logged for better diagnosability.
    data, err := io.ReadAll(f)
    if err != nil {
        writeFilesError(c, http.StatusBadRequest, fmt.Sprintf("read uploaded file: %v", err))
        return
    }
The Create handler reads the entire uploaded file into memory using io.ReadAll. This can lead to excessive memory consumption for large files, potentially causing performance issues or even a denial-of-service vulnerability.
To improve this, the file should be streamed to disk instead of being fully loaded into memory. This would involve modifying store.Create to accept an io.Reader for the file data and then using io.Copy to write to the destination file.
    if err != nil {
        log.WithError(err).Warn("failed to initialize local file store")
    }
The error from cliproxyfiles.NewStoreWithDir is only logged as a warning. While this allows the server to start, it might not be obvious to the operator that the file upload functionality is unavailable, especially if this error occurs due to a misconfiguration. Consider making the log message more explicit about the consequences.
Suggested change:

    if err != nil {
    -     log.WithError(err).Warn("failed to initialize local file store")
    +     log.WithError(err).Warn("failed to initialize local file store; file upload functionality will be unavailable")
    }
    if err != nil {
        continue
    }
    var meta storedMetadata
    if err := json.Unmarshal(blob, &meta); err != nil {
        continue
In the List function, errors from os.ReadFile and json.Unmarshal are silently ignored by using continue. While this makes the function robust against individual corrupted metadata files, it also hides potential problems from the administrator. It would be beneficial to log these errors to help diagnose issues with the file store.
You will need to add an import for a logger, for example log "github.com/sirupsen/logrus".
    if err != nil {
        log.WithError(err).Warnf("skipping corrupted metadata file %q", entry.Name())
        continue
    }
    var meta storedMetadata
    if err := json.Unmarshal(blob, &meta); err != nil {
        log.WithError(err).Warnf("skipping unparsable metadata file %q", entry.Name())
        continue
    }

    data := make([]gin.H, 0, len(items))
    for _, meta := range items {
        data = append(data, gin.H{
            "id":         meta.ID,
            "object":     meta.Object,
            "bytes":      meta.Bytes,
            "created_at": meta.CreatedAt,
            "filename":   meta.Filename,
            "purpose":    meta.Purpose,
            "status":     meta.Status,
        })
    }
    c.JSON(http.StatusOK, gin.H{"object": "list", "data": data})
The JSON response in the List handler is constructed manually using gin.H, which is verbose and repetitive. The cliproxyfiles.Metadata struct is already tagged for JSON serialization, so you can pass the items slice directly.
The same simplification applies to the Create and Get methods in this file, where the response can be written as c.JSON(http.StatusOK, meta).
Suggested change:

    - data := make([]gin.H, 0, len(items))
    - for _, meta := range items {
    -     data = append(data, gin.H{
    -         "id":         meta.ID,
    -         "object":     meta.Object,
    -         "bytes":      meta.Bytes,
    -         "created_at": meta.CreatedAt,
    -         "filename":   meta.Filename,
    -         "purpose":    meta.Purpose,
    -         "status":     meta.Status,
    -     })
    - }
    - c.JSON(http.StatusOK, gin.H{"object": "list", "data": data})
    + c.JSON(http.StatusOK, gin.H{"object": "list", "data": items})
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c06815d495
        return
    }
    defer f.Close()
    data, err := io.ReadAll(f)
Reject oversized uploads before buffering request body
The upload path reads the entire multipart file with io.ReadAll and does not enforce any maximum size, so a large authenticated upload can force a full in-memory allocation and potentially OOM the process before persistence happens. Gin’s multipart handling may spill to temp files, but this call still buffers the whole file in RAM; adding an explicit byte limit (e.g., limited reader / request size guard) is needed to prevent memory exhaustion.
    if h == nil || h.store == nil {
        return nil, fmt.Errorf("file store unavailable")
    }
Return 503 when file store is unavailable in GET path
When the file store cannot be initialized (for example, invalid or unwritable files-dir), lookup returns a generic error that is later mapped to HTTP 500, while the other file endpoints explicitly return 503 for the same condition. This makes /v1/files/:id inconsistent and incorrectly signals an internal server fault instead of a temporary dependency outage.
luispater left a comment
Summary:
This PR adds /v1/files CRUD endpoints, local file persistence, and input_file preprocessing for /v1/responses by converting uploaded files into input_text. The overall direction is good and key happy paths are covered by tests.
Blocking:
- The upload + preprocessing flow currently performs unbounded full-buffer reads (io.ReadAll) with no request/file size enforcement. This allows very large uploads to consume excessive memory and potentially crash the service. Please add explicit size limits (request-level and file-level) and enforce bounded reads before storing/processing.
Non-blocking:
- Consider aligning status codes for “file store unavailable” across all file endpoints (currently GET /v1/files/:id can return 500 while others return 503).
Test plan I ran:
go test ./internal/files ./sdk/api/handlers/openai ./internal/config ./internal/watcher/diff ./internal/api
Closing this PR because support for this is not planned at this time.
No description provided.