Skip to content

SLING-13202 Remove dependency on commons-fileupload#75

Open
enapps-enorman wants to merge 7 commits into
apache:masterfrom
enapps-enorman:issue/SLING-13202
Open

SLING-13202 Remove dependency on commons-fileupload#75
enapps-enorman wants to merge 7 commits into
apache:masterfrom
enapps-enorman:issue/SLING-13202

Conversation

@enapps-enorman
Copy link
Copy Markdown
Contributor

Jetty does not need the Apache commons-fileupload library. Modern versions of Jetty support the Java Servlet Specification (3.0 and newer), which includes built-in, native APIs to parse multipart/form-data requests.

Refactoring to use the standard servlet apis:

  1. remove the commons-fileupload dependency
  2. Move the request parameter support configuration properties from the RequestParameterSupportConfigurer component to a new RequestParameterConfig component using the same PID for backward compatibility. This config is now needed in two places (SlingMainServlet and RequestParameterSupportConfigurer).
  3. Configure the SlingMainServlet to enable and configure multipart support using the standard whiteboard properties during the servlet registration.
  4. Refactoring to replace the org.apache.commons.fileupload references with the servlet Part equivalent

@enapps-enorman enapps-enorman requested a review from cziegeler May 15, 2026 16:46
@cziegeler
Copy link
Copy Markdown
Contributor

@sagarmiglani, as you started with this approach some time ago #61 could you please have a look.
I'll try to do a review in the next days as well

@enapps-enorman
Copy link
Copy Markdown
Contributor Author

Thanks @cziegeler for taking a look. I will wait for your feedback. Sorry, I didn't realize there was already SLING-12497 for the same issue as I didn't see any unresolved issues linked to the unreleased version when I started working on it.

@sagarmiglani
Copy link
Copy Markdown
Contributor

Hi @cziegeler and @enapps-enorman,
Apologies for the delayed response - I haven't had the bandwidth to revisit this recently.

Since Eric has already addressed the same issue in this PR, I'm happy to close #61 (SLING-12497) in favor of this one to avoid duplication. Thank you!

Copy link
Copy Markdown
Contributor

@sagarmiglani sagarmiglani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @enapps-enorman, for taking this on. I’ve shared my feedback to the best of my understanding. Please feel free to tag me if any clarification is needed (my responses may sometimes be delayed)

public byte[] get() {
return this.delegatee.get();
try {
return getInputStream().readAllBytes();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method re-reads the part on every call and never closes the stream. I will also suggest caching the content (https://github.com/apache/commons-fileupload/blob/FILEUPLOAD_1_3_1/src/main/java/org/apache/commons/fileupload/disk/DiskFileItem.java#L302)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I'll change this to a try-with-resources block to make sure the steam is closed.

I think caching the bytes would be redundant and a premature optimization? And what if the file is really big? Reading the stream on demand seems safer and lets the container spool it from memory or disk.

}

return this.delegatee.getString();
return new String(this.get());
Copy link
Copy Markdown
Contributor

@sagarmiglani sagarmiglani May 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This now uses the JVM platform default charset. Same upload returns different strings on Linux UTF-8 vs Windows.
The old DiskFileItem.getString() used the part's Content-Type charset.

https://github.com/apache/sling-org-apache-sling-engine/pull/61/changes#diff-456ff33f6f11891702c3f5e9226e0ee6b15090fbd46a0a1ec5ea5901d40f7393L120

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I was confused by the comment above that that said "only apply encoding in the case of a form field". If the DiskFileItem did do some other encoding, then I can do the same there. But I am not sure what the practical use case is for needing the file content as a string.

* from a multipart/form-data POST request.
* <p>
* To not add a dependency to Servlet API 3 this class does not implement the
* Servlet API 3 {@code Part} interface. To support Servlet API 3 {@code Part}s
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this javadoc needs update :)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I brought SlingPart back, so is the javadoc ok now? Or do you have some suggestion on different wording?

Comment thread src/main/java/org/apache/sling/engine/impl/parameters/RequestPartsIterator.java Outdated
this.parseMultiPartPost(parameters);
} catch (final ServletException | IOException e) {
this.log.error(
"getRequestParameterMapInternal: Error parsing multipart streamed request", e);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log message says "streamed request" but this is the non-streamed branch. Also, I’m wondering if it's okay to swallow these exceptions.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will change the log message to remove "steamed". From what I can tell the old commons-fileupload based implementation would also eat parsing errors and log them so this isn't new behavior.

parameters.addParameter(pp, false);
}
private void parseMultiPartPost(ParameterMap parameters) throws IOException, ServletException {
final Collection<Part> parts = this.getServletRequest().getParts();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose. Limiting the file count wasn't part of any standard, so I assumed the "Maximum File Size" and "Maximum Request Size" accomplished the same objective without having to parse the whole upload.

But if people were using that for something, then we can check the part count after parsing and throw an exception. Though maybe doing that check inside jetty would be more useful so it fails faster?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Felix Http Base has such a config already as "osgi.http.whiteboard.servlet.multipart.maxFileCount"`of the registered servlet. So we could set that property

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I guess I can pass that through again. However, my next step was going to be attempting to remove the commons-fileupload dependency from the felix.http stuff as well since they should just delegate to jetty to parse the multipart stuff too. And I don't believe jetty currently has any mechanism to limit the max number of file parts.

Comment thread src/main/java/org/apache/sling/engine/impl/parameters/ParameterSupport.java Outdated

/** extra config properties for multipart file upload support */
@Reference
private transient RequestParameterConfig reqParamConfig;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: do we need transient?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this modifier, sonar reports it as a violation of rule "java:S1948" with this reason:
Fields in a "Serializable" class should either be transient or serializable

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally think that this we should ignore such default sonar reports. While this makes sense in general for Serializable objects, it does imho not make sense for Servlets (or any other service). These objects will never be serialized and therefore this just clutters the code

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't bother me either way which is why I usually just change it to satisfy sonar. I have no idea how I would go about changing the sonar rules for specific "serializable" types for this or other sling projects. Please tell me how I would accomplish that.

Comment thread src/main/java/org/apache/sling/engine/impl/parameters/ParameterMap.java Outdated
if (itemIterator == null) {
Collection<Part> parts;
try {
parts = request.getParts();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, this was my biggest concern during the work on my PR.

Sling has a feature where a client could send a header Sling-uploadmode: stream, that used commons-fileupload's FileUpload.getItemIterator(...). That returned a true lazy iterator.

If i'm not wrong, request.getParts() is required to fully parse the entire multipart body before returning. The lazy-hasNext pattern is cosmetic here, not real.
A client that explicitly opted into streaming mode to upload a 50 GB file will now have the container try to buffer it (memory + disk).

Copy link
Copy Markdown
Contributor Author

@enapps-enorman enapps-enorman May 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it is up to the servlet container how the returned collection of parts object is implemented. I'd need to do some more research to understand how jetty does it.

Maybe it still has value in that the parts are not preemptively parsed and converted into MulitpartRequestParameters?

OTOH, configuring your server to allow 50 GB uploads seems dangerous. Is there a real use case for that?

@sonarqubecloud
Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants