Conversation
📝 WalkthroughWalkthroughAdds multipart/form-data parsing to search routes. Introduces Changes
Sequence DiagramsequenceDiagram
autonumber
participant Client as HTTP Client
participant Handler as copy-of-body-handler
participant Detector as Content-Type Check
participant MultipartParser as parse-multipart-body
participant Stream as Reconstituted Body (ByteArrayInputStream)
Client->>Handler: POST request (maybe multipart/form-data)
Handler->>Detector: Inspect Content-Type header
alt multipart/form-data
Handler->>MultipartParser: Pass request with params
MultipartParser->>MultipartParser: Convert parts -> query-string bytes
MultipartParser->>Stream: Return ByteArrayInputStream
Handler->>Handler: Attach `:body-copy` and `:body` (Stream)
else not multipart
Handler->>Handler: Slurp raw body -> attach `:body-copy` and `:body`
end
Handler->>Client: Continue processing with preserved body
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
Pre-merge checks✅ Passed checks (3 passed)
📜 Recent review detailsConfiguration used: defaults Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
search-app/src/cmr/search/routes.clj
🔇 Additional comments (2)
search-app/src/cmr/search/routes.clj (2)
27-27: LGTM!The import addition is necessary for multipart form-data parsing and follows standard conventions.
70-73: Implementation looks correct, pending fixes toparse-multipart-body.The content-type detection using
string/starts-with?properly handles multipart content-types with boundary parameters (e.g.,multipart/form-data; boundary=...). The approach of parsing the multipart body and reconstructing it as a ByteArrayInputStream enables multiple reads, which addresses the original bug.However, ensure the issues flagged in
parse-multipart-body(URL encoding and file upload handling) are addressed, as this handler depends on that function's correctness.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #2366 +/- ##
===========================================
- Coverage 58.02% 29.14% -28.89%
===========================================
Files 1063 1003 -60
Lines 72750 69333 -3417
Branches 2110 1196 -914
===========================================
- Hits 42212 20204 -22008
- Misses 28569 47994 +19425
+ Partials 1969 1135 -834 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
search-app/src/cmr/search/routes.clj (1)
77-77: Major: Specify UTF-8 encoding explicitly when converting to bytes.Using
.getByteswithout specifying a charset uses the platform's default encoding, which can vary across environments and cause issues with non-ASCII characters (e.g., international characters, emojis).🔎 Proposed fix specifying UTF-8 encoding
(handler (assoc request :body-copy body - :body (java.io.ByteArrayInputStream. (.getBytes body))))))) + :body (java.io.ByteArrayInputStream. (.getBytes body "UTF-8")))))))This ensures consistent UTF-8 encoding across all environments and properly handles international characters.
♻️ Duplicate comments (1)
search-app/src/cmr/search/routes.clj (1)
59-62: Critical: URL encoding is still missing, plus type and nil safety concerns.This function has several issues that will cause runtime errors or malformed data:
URL encoding: Parameter names and values containing special characters (
&,=,%, spaces, etc.) will create malformed query strings that cannot be parsed correctly downstream. This was flagged in a previous review but remains unaddressed.Type safety:
(name k)assumes keys are keywords, but:multipart-paramscan contain string keys. This will throw an exception for string keys.Nil and non-string values: Values could be
nil,Fileobjects (for file uploads), or other non-string types. These need explicit handling before concatenation.🔎 Proposed fix addressing all three issues
+(ns cmr.search.routes + ... + (:require + ... + [ring.util.codec :as codec] + ...)) + (defn parse-multipart-body [request] (let [parsed (multipart/multipart-params-request request)] (string/join "&" (for [[k v] (:multipart-params parsed)] - (str (name k) "=" v))))) + (let [key-str (if (keyword? k) (name k) (str k)) + val-str (if (nil? v) "" (str v))] + (str (codec/url-encode key-str) "=" (codec/url-encode val-str)))))))This fix:
- Imports
ring.util.codecfor URL encoding- Handles both keyword and string keys safely
- Converts nil values to empty strings
- Converts all values to strings before encoding
- URL-encodes both keys and values to prevent malformed query strings
🧹 Nitpick comments (1)
search-app/src/cmr/search/routes.clj (1)
70-77: Consider adding error handling for parsing failures.The current implementation doesn't handle potential exceptions from
parse-multipart-bodyorslurp. While the outerexception-handlermiddleware (line 132) will catch these, adding explicit error handling here could:
- Provide more specific error messages for debugging
- Allow graceful fallbacks if multipart parsing fails
- Make the middleware more resilient to malformed requests
Example error handling approach
(defn copy-of-body-handler [handler] (fn [request] (let [content-type (get-in request [:headers "content-type"]) body (try (if (and content-type (string? content-type) (string/starts-with? content-type "multipart/form-data")) (parse-multipart-body request) (slurp (:body request))) (catch Exception e (throw (ex-info "Failed to parse request body" {:type :body-parsing-error :content-type content-type} e))))] (handler (assoc request :body-copy body :body (java.io.ByteArrayInputStream. (.getBytes body "UTF-8")))))))This provides clearer error context while still propagating the exception to be handled by the middleware chain.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
search-app/src/cmr/search/routes.clj
🔇 Additional comments (2)
search-app/src/cmr/search/routes.clj (2)
27-27: LGTM!The import of
ring.middleware.multipart-paramsis appropriate for the multipart form-data parsing functionality added in this PR.
70-73: Critical: Nil content-type will cause NullPointerException.Line 71 will throw a
NullPointerExceptionif thecontent-typeheader isnilbecausestring/starts-with?does not handle nil strings.🔎 Proposed fix to safely check content-type
(let [content-type (get-in request [:headers "content-type"]) - body (if (and content-type (string/starts-with? content-type "multipart/form-data")) + body (if (and content-type + (string? content-type) + (string/starts-with? content-type "multipart/form-data")) (parse-multipart-body request) (slurp (:body request)))]This ensures
content-typeis not only non-nil but also a string before callingstring/starts-with?.Likely an incorrect or invalid review comment.
| (defn parse-multipart-body [request] | ||
| (let [parsed (multipart/multipart-params-request request)] | ||
| (string/join "&" (for [[k v] (:multipart-params parsed)] | ||
| (str (name k) "=" v))))) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify file upload patterns and body-copy usage in the search app
echo "=== Check for file upload handling in search routes ==="
rg -n -C5 'file|upload|:tempfile|:filename' search-app/src/cmr/search/ --type=clojure
echo -e "\n=== Find all usages of :body-copy to understand expected format ==="
rg -n -C5 ':body-copy' search-app/ --type=clojure
echo -e "\n=== Check if shapefile middleware expects multipart or query string ==="
rg -n -C5 'body-copy.*shapefile|shapefile.*body-copy' search-app/src/cmr/search/middleware/ --type=clojureRepository: nasa/Common-Metadata-Repository
Length of output: 50388
🏁 Script executed:
#!/bin/bash
# Find where parse-multipart-body is defined and used
echo "=== Find parse-multipart-body usages ==="
rg -n "parse-multipart-body" search-app/src/cmr/search/ --type=clojure
# Search specifically for :body-copy assignments and usage
echo -e "\n=== Find :body-copy usage patterns ==="
rg -n ":body-copy" search-app/src/cmr/search/ --type=clojure -C3 | head -100
# Check copy-of-body-handler middleware
echo -e "\n=== Find copy-of-body-handler ==="
rg -n "copy-of-body-handler\|body-copy" search-app/src/cmr/search/ --type=clojure -B2 -A5 | head -80Repository: nasa/Common-Metadata-Repository
Length of output: 4255
🏁 Script executed:
#!/bin/bash
# Get the complete copy-of-body-handler function
echo "=== Complete copy-of-body-handler function ==="
rg -A20 "defn copy-of-body-handler" search-app/src/cmr/search/routes.clj
# Check if shapefile uploads use multipart/form-data
echo -e "\n=== Shapefile middleware inspection ==="
rg -n "shapefile-upload" search-app/src/cmr/search/middleware/ -A10 -B5 --type=clojure | head -60
# Find the shapefile middleware file
echo -e "\n=== Find shapefile middleware ==="
fd "shapefile" search-app/src/cmr/search/middleware/Repository: nasa/Common-Metadata-Repository
Length of output: 2770
🏁 Script executed:
#!/bin/bash
# Check what the parse-multipart-body function actually does with multipart params
echo "=== Check parse-multipart-body full implementation ==="
rg -B2 -A8 "defn parse-multipart-body" search-app/src/cmr/search/routes.clj
# Check middleware order and where shapefile param flows
echo -e "\n=== Understand middleware chain order in routes.clj ==="
sed -n '130,145p' search-app/src/cmr/search/routes.clj
# Check if body-copy is actually used in request parameter processing
echo -e "\n=== Check if body or body-copy is used in find-concepts ==="
rg -n "find-concepts" search-app/src/cmr/search/api/concepts_search.clj -A30 | head -50
# Check what parameters are expected in the search API
echo -e "\n=== Check parameter usage in search ==="
rg -n "defn find-concepts" search-app/src/cmr/search/api/concepts_search.clj -A5Repository: nasa/Common-Metadata-Repository
Length of output: 4254
🏁 Script executed:
#!/bin/bash
# Let's check what multipart/multipart-params-request actually returns
echo "=== Find multipart-params-request usage ==="
rg -n "multipart-params-request" search-app/src/cmr/search/ -B2 -A5 --type=clojure
# Check if body or body-copy is used anywhere for shapefile/file processing
echo -e "\n=== Check how body-copy is consumed in the API ==="
rg -n "body" search-app/src/cmr/search/api/concepts_search.clj | grep -E "(body-copy|body)" | head -30
# Look at find-concepts function to see if body is used as shapefile parameter
echo -e "\n=== Check find-concepts implementation ==="
rg -B5 -A20 "defn- find-concepts" search-app/src/cmr/search/api/concepts_search.clj | head -50Repository: nasa/Common-Metadata-Repository
Length of output: 4808
🏁 Script executed:
#!/bin/bash
# Check the shapefile parameter conversion logic
echo "=== Check how shapefile parameter is converted ==="
rg -B5 -A15 "defmethod p/parameter->condition :shapefile" search-app/src/cmr/search/services/parameters/converters/shapefile.clj
# Check what the shapefile-upload middleware actually extracts
echo -e "\n=== Check shapefile-upload middleware ==="
cat search-app/src/cmr/search/middleware/shapefile.clj | head -100
# Check if shapefile param goes through parse-multipart-body or is handled separately
echo -e "\n=== Check wrap-multipart-params usage ==="
rg -n "wrap-multipart-params" search-app/ --type=clojure -B3 -A3Repository: nasa/Common-Metadata-Repository
Length of output: 4732
🏁 Script executed:
#!/bin/bash
# Check if shapefile param actually flows through find-concepts with body parameter
echo "=== Check how body-copy is used in find-concepts ==="
rg -B2 -A10 "defn- find-concepts\b" search-app/src/cmr/search/api/concepts_search.clj
# Check what params structure looks like and if shapefile is passed in params or via body
echo -e "\n=== Check core-api/process-params to understand param processing ==="
rg -n "defn process-params" search-app/src/cmr/search/api/ --type=clojure -A5
# Confirm: does shapefile come as a query param or multipart param?
echo -e "\n=== Check shapefile parameter mapping ==="
rg -n "shapefile.*:" search-app/src/cmr/search/services/parameters/conversion.clj | head -20Repository: nasa/Common-Metadata-Repository
Length of output: 4129
🏁 Script executed:
#!/bin/bash
# Check if body or query-string (from body-copy) is actually used for shapefile parameter conversion
echo "=== Check how query-string is used in query service ==="
rg -n "query-string" search-app/src/cmr/search/services/query_service.clj -B2 -A2 | head -40
# Check if shapefile comes from params or from body
echo -e "\n=== Check shapefile parameter flow in parameter conversion ==="
rg -n "shapefile" search-app/src/cmr/search/services/parameters/conversion.clj -B3 -A3 | head -60
# Confirm: is :body-copy actually used for anything related to parameter parsing?
echo -e "\n=== Check where body-copy is used in query execution ==="
rg -n "body-copy\|query-string" search-app/src/cmr/search/services/query_service.clj -C2 | head -50Repository: nasa/Common-Metadata-Repository
Length of output: 2649
🏁 Script executed:
#!/bin/bash
# Check where params are populated - from which source?
echo "=== Check params/wrap-params ==="
rg -n "wrap-params\|defn wrap-params" search-app/src/cmr/search/ --type=clojure -B2 -A10 | head -50
# Check if body-copy is actually consumed anywhere - search exhaustively
echo -e "\n=== Search for all body-copy usages ==="
rg -n "body-copy" search-app/src/cmr/search/ --type=clojure
# Check if params are sourced from body or from URL params
echo -e "\n=== Check if params are parsed from body-copy or URL ==="
rg -n "params.*wrap-params\|wrap-params.*params" search-app/src/cmr/search/ --type=clojure -B3 -A3 | head -40Repository: nasa/Common-Metadata-Repository
Length of output: 912
🏁 Script executed:
#!/bin/bash
# Check the middleware execution order more carefully
echo "=== Middleware order in routes.clj ==="
sed -n '125,145p' search-app/src/cmr/search/routes.clj
# Verify: does shapefile-upload populate params, and when does copy-of-body-handler run?
echo -e "\n=== Check what shapefile-upload middleware does ==="
cat search-app/src/cmr/search/middleware/shapefile.clj
# Check if there are any other file upload params besides shapefile
echo -e "\n=== Check if there are other multipart/file params ==="
rg -n "multipart\|:tempfile\|file.*upload" search-app/src/cmr/search/services/parameters/ --type=clojure | grep -v shapefile | head -20Repository: nasa/Common-Metadata-Repository
Length of output: 3304
🏁 Script executed:
#!/bin/bash
# Let me check what actually happens to multipart-params after wrap-multipart-params
echo "=== Check Ring's wrap-multipart-params documentation behavior ==="
rg -n "multipart-params\|:multipart-params" search-app/ --type=clojure | head -30
# Check if shapefile comes via params or via explicit multipart handling
echo -e "\n=== Trace shapefile parameter flow ==="
rg -n "shapefile" search-app/src/cmr/search/api/concepts_search.clj --type=clojure -B3 -A3
# Most importantly: verify if body-copy is ever used for anything that matters
echo -e "\n=== Check all functions that receive body parameter ==="
rg -n "defn.*body\]" search-app/src/cmr/search/api/concepts_search.clj --type=clojure -A5 | head -60Repository: nasa/Common-Metadata-Repository
Length of output: 237
🏁 Script executed:
#!/bin/bash
# Let's trace how shapefile actually gets to the handler
echo "=== Check if shapefile is a query param or multipart field ==="
rg -n "shapefile" search-app/src/cmr/search/api/concepts_search.clj --type=clojure -C5
# Check what params/wrap-params does - does it consume multipart-params?
echo -e "\n=== Check wrap-params behavior ==="
rg -n "defn wrap-params\|require.*wrap-params" search-app/src/cmr/search/ --type=clojure -B2 -A2
# Check if shapefile comes from query string or body
echo -e "\n=== Check Ring wrap-params documentation usage ==="
rg -n "require.*params\|ring.middleware.params" search-app/src/cmr/search/routes.clj -B2 -A2
# Look at complete routes file to understand param flow
echo -e "\n=== Check complete search routes configuration ==="
head -80 search-app/src/cmr/search/routes.cljRepository: nasa/Common-Metadata-Repository
Length of output: 4080
🏁 Script executed:
#!/bin/bash
# Verify how wrap-multipart-params populates :multipart-params and how params/wrap-params handles it
echo "=== Check if Ring's wrap-params merges multipart-params into params ==="
rg -n "wrap-params\|multipart" search-app/src/cmr/search/routes.clj
# Confirm: does the shapefile File object make it to the converter?
echo -e "\n=== Check shapefile converter input (shapefile-info) ==="
rg -B5 -A15 "esri-shapefile->condition-vec\|geojson->conditions-vec" search-app/src/cmr/search/services/parameters/converters/shapefile.clj | head -80
# Final confirmation: what is shapefile-info and where does it come from?
echo -e "\n=== Check parameter->condition flow for shapefile ==="
rg -B10 "parameter->condition :shapefile" search-app/src/cmr/search/services/parameters/converters/shapefile.cljRepository: nasa/Common-Metadata-Repository
Length of output: 640
🏁 Script executed:
#!/bin/bash
# Let's trace exactly how shapefile gets from multipart to the converter
echo "=== Check if shapefile File object is in params after wrap-multipart-params ==="
# Find where parameter->condition is called with shapefile
rg -n "parameter->condition" search-app/src/cmr/search/services/ --type=clojure -B3 -A3 | head -40
# Check how Ring's wrap-multipart-params works - does it merge into :params?
echo -e "\n=== Verify Ring params middleware behavior ==="
rg -n "require.*ring.middleware.params\|require.*params" search-app/src/cmr/search/routes.clj
# Final check: confirm wrap-multipart-params puts files into :multipart-params
echo -e "\n=== Confirm multipart-params location after wrap-multipart-params ==="
cat search-app/src/cmr/search/middleware/shapefile.clj | grep -A5 "wrap-multipart-params"Repository: nasa/Common-Metadata-Repository
Length of output: 5529
🏁 Script executed:
#!/bin/bash
# Check what happens to other multipart params - are there any that are not files?
echo "=== Check if there are other multipart param types besides files ==="
rg -n "multipart\|:tempfile\|File\|multipart-params" search-app/src/cmr/search/services/parameters/ --type=clojure | grep -v shapefile
# Check Ring's wrap-params behavior - does it merge multipart-params into params?
echo -e "\n=== Verify that wrap-params merges multipart-params into params ==="
# Ring's wrap-params is standard, but let's confirm by checking how the params are used
rg -n ":params\|multipart-params" search-app/src/cmr/search/api/concepts_search.clj | head -20
# Check if body-copy string representation breaks anything
echo -e "\n=== Check if query-string (from body-copy) is used in parameter parsing ==="
rg -n "query-string" search-app/src/cmr/search/services/parameters/ --type=clojure -B2 -A2Repository: nasa/Common-Metadata-Repository
Length of output: 289
🏁 Script executed:
#!/bin/bash
# Let's check where params destructuring happens in the handler
echo "=== Check handler params destructuring ==="
rg -n "{params :params" search-app/src/cmr/search/api/concepts_search.clj -B2 -A10 | head -50
# Check what Ring's wrap-params does - this should merge URL params and multipart-params
echo -e "\n=== Check which params middleware is used ==="
grep -n "wrap-params\|wrap-keyword-params\|wrap-nested-params" search-app/src/cmr/search/routes.clj
# Let me check Ring's standard behavior by looking at CMR's usage of wrap-params
echo -e "\n=== Understand params flow: which order and which merge ==="
sed -n '130,145p' search-app/src/cmr/search/routes.cljRepository: nasa/Common-Metadata-Repository
Length of output: 1145
The conversion from multipart form data to a query string format is fragile and creates unnecessary data loss, though it doesn't currently break file uploads.
The original concern is partially valid. The (str v) conversion in parse-multipart-body does create a lossy string representation (e.g., File objects become "java.io.File@abc123" references). However, the code functions for shapefiles because:
- Ring's
wrap-multipart-paramsmiddleware preserves actual File objects in:multipart-params - Ring's subsequent
wrap-paramsmiddleware merges:multipart-paramsinto:params - The shapefile converter accesses files via
(:shapefile params), retrieving the actual File object, not the string from:body-copy
The real issue is that parse-multipart-body creates a string representation that discards complex data types, and :body-copy receives this degraded information. While this doesn't affect file uploads (which work independently via :params), the design is fragile and could fail if:
- Other multipart parameters containing complex data are added
- The middleware ordering changes
- Code attempts to use
:body-copyfor parameter extraction instead of:params
The string representation in :body-copy is only appropriate for JSON/AQL query bodies, not for multipart parameter preservation.
🤖 Prompt for AI Agents
search-app/src/cmr/search/routes.clj around lines 59-62: parse-multipart-body
currently joins all multipart values with str which loses complex types (Files,
streams, maps) and yields a fragile :body-copy; change it to detect multipart
requests and preserve non-scalar values instead of stringifying them —
specifically, when :multipart-params exists, build the body-copy only from
scalar/text params (or omit multipart params entirely) and skip/ignore File and
InputStream instances (or serialize only their metadata if needed);
alternatively, if you must provide a copy for logging, convert only
primitive/string values to a safe representation (or JSON encode simple params)
and leave complex objects out so files remain accessible via :params and
:body-copy does not drop type information.
Overview
What is the objective?
Original reported bug noted that collection facets search was returning an inaccurate field; further investigation determined the root cause was that this middleware function for attaching "body-copy" when processing form encoded data was not working correctly, and actually just returning an empty string. This middleware has been fixed in this PR to correctly parse the body copy in the case of form encoded requests while still maintaining the original slurp functionality otherwise.
What are the changes?
This middleware has been fixed in this PR to correctly parse the body copy in the case of form encoded requests while still maintaining the original slurp functionality otherwise.
What areas of the application does this impact?
search-app
Required Checklist
Additional Checklist
Summary by CodeRabbit
New Features
Bug Fixes
✏️ Tip: You can customize this high-level summary in your review settings.