SLING-13202 Remove dependency on commons-fileupload#75
Conversation
|
@sagarmiglani, as you started with this approach some time ago #61 could you please have a look. |
|
Thanks @cziegeler for taking a look. I will wait for your feedback. Sorry, I didn't realize there was already SLING-12497 for the same issue as I didn't see any unresolved issues linked to the unreleased version when I started working on it. |
|
Hi @cziegeler and @enapps-enorman, Since Eric has already addressed the same issue in this PR, I'm happy to close #61 (SLING-12497) in favor of this one to avoid duplication. Thank you! |
sagarmiglani
left a comment
There was a problem hiding this comment.
Thank you, @enapps-enorman, for taking this on. I’ve shared my feedback to the best of my understanding. Please feel free to tag me if any clarification is needed (my responses may sometimes be delayed)
| public byte[] get() { | ||
| return this.delegatee.get(); | ||
| try { | ||
| return getInputStream().readAllBytes(); |
There was a problem hiding this comment.
This method re-reads the part on every call and never closes the stream. I will also suggest caching the content (https://github.com/apache/commons-fileupload/blob/FILEUPLOAD_1_3_1/src/main/java/org/apache/commons/fileupload/disk/DiskFileItem.java#L302)
There was a problem hiding this comment.
Good catch. I'll change this to a try-with-resources block to make sure the steam is closed.
I think caching the bytes would be redundant and a premature optimization? And what if the file is really big? Reading the stream on demand seems safer and lets the container spool it from memory or disk.
| } | ||
|
|
||
| return this.delegatee.getString(); | ||
| return new String(this.get()); |
There was a problem hiding this comment.
This now uses the JVM platform default charset. Same upload returns different strings on Linux UTF-8 vs Windows.
The old DiskFileItem.getString() used the part's Content-Type charset.
There was a problem hiding this comment.
I think I was confused by the comment above that that said "only apply encoding in the case of a form field". If the DiskFileItem did do some other encoding, then I can do the same there. But I am not sure what the practical use case is for needing the file content as a string.
| * from a multipart/form-data POST request. | ||
| * <p> | ||
| * To not add a dependency to Servlet API 3 this class does not implement the | ||
| * Servlet API 3 {@code Part} interface. To support Servlet API 3 {@code Part}s |
There was a problem hiding this comment.
this javadoc needs update :)
There was a problem hiding this comment.
I brought SlingPart back, so is the javadoc ok now? Or do you have some suggestion on different wording?
| this.parseMultiPartPost(parameters); | ||
| } catch (final ServletException | IOException e) { | ||
| this.log.error( | ||
| "getRequestParameterMapInternal: Error parsing multipart streamed request", e); |
There was a problem hiding this comment.
Log message says "streamed request" but this is the non-streamed branch. Also, I’m wondering if it's okay to swallow these exceptions.
There was a problem hiding this comment.
I will change the log message to remove "steamed". From what I can tell the old commons-fileupload based implementation would also eat parsing errors and log them so this isn't new behavior.
| parameters.addParameter(pp, false); | ||
| } | ||
| private void parseMultiPartPost(ParameterMap parameters) throws IOException, ServletException { | ||
| final Collection<Part> parts = this.getServletRequest().getParts(); |
There was a problem hiding this comment.
maxFileCount seems to be dropped, can we use something similar? https://github.com/apache/sling-org-apache-sling-engine/pull/61/changes#diff-7ada9ce1f0869aa363c184c894ab0b732ccf28ce4b7e16dda47de7c8a7c3982aR418
There was a problem hiding this comment.
I suppose. Limiting the file count wasn't part of any standard, so I assumed the "Maximum File Size" and "Maximum Request Size" accomplished the same objective without having to parse the whole upload.
But if people were using that for something, then we can check the part count after parsing and throw an exception. Though maybe doing that check inside jetty would be more useful so it fails faster?
There was a problem hiding this comment.
Felix Http Base has such a config already as "osgi.http.whiteboard.servlet.multipart.maxFileCount"`of the registered servlet. So we could set that property
There was a problem hiding this comment.
Ok, I guess I can pass that through again. However, my next step was going to be attempting to remove the commons-fileupload dependency from the felix.http stuff as well since they should just delegate to jetty to parse the multipart stuff too. And I don't believe jetty currently has any mechanism to limit the max number of file parts.
|
|
||
| /** extra config properties for multipart file upload support */ | ||
| @Reference | ||
| private transient RequestParameterConfig reqParamConfig; |
There was a problem hiding this comment.
nit: do we need transient?
There was a problem hiding this comment.
Without this modifier, sonar reports it as a violation of rule "java:S1948" with this reason:
Fields in a "Serializable" class should either be transient or serializable
There was a problem hiding this comment.
I personally think that this we should ignore such default sonar reports. While this makes sense in general for Serializable objects, it does imho not make sense for Servlets (or any other service). These objects will never be serialized and therefore this just clutters the code
There was a problem hiding this comment.
It doesn't bother me either way which is why I usually just change it to satisfy sonar. I have no idea how I would go about changing the sonar rules for specific "serializable" types for this or other sling projects. Please tell me how I would accomplish that.
| if (itemIterator == null) { | ||
| Collection<Part> parts; | ||
| try { | ||
| parts = request.getParts(); |
There was a problem hiding this comment.
IIRC, this was my biggest concern during the work on my PR.
Sling has a feature where a client could send a header Sling-uploadmode: stream, that used commons-fileupload's FileUpload.getItemIterator(...). That returned a true lazy iterator.
If i'm not wrong, request.getParts() is required to fully parse the entire multipart body before returning. The lazy-hasNext pattern is cosmetic here, not real.
A client that explicitly opted into streaming mode to upload a 50 GB file will now have the container try to buffer it (memory + disk).
There was a problem hiding this comment.
I suppose it is up to the servlet container how the returned collection of parts object is implemented. I'd need to do some more research to understand how jetty does it.
Maybe it still has value in that the parts are not preemptively parsed and converted into MulitpartRequestParameters?
OTOH, configuring your server to allow 50 GB uploads seems dangerous. Is there a real use case for that?
|



Jetty does not need the Apache commons-fileupload library. Modern versions of Jetty support the Java Servlet Specification (3.0 and newer), which includes built-in, native APIs to parse multipart/form-data requests.
Refactoring to use the standard servlet apis: