fix(api): validate input before sanitization for security#19
fix(api): validate input before sanitization for security#19arunsanna wants to merge 4 commits intoGenAI-Security-Project:mainfrom
Conversation
Fixes GenAI-Security-Project#14 - Input validation order bug The validation was happening AFTER sanitization, which is a security issue because sanitization could transform malicious input into something that passes validation. This commit swaps the order to: 1. Validate the raw user input first (catches malicious patterns) 2. Sanitize after validation (for safe display/processing) Test results: - Valid model IDs: Successfully generates AIBOM - Invalid inputs (e.g., <script>alert(1)</script>): Correctly rejected - Server logs confirm validation catches invalid input before sanitize
Test ResultsTest 1: Valid Model IDTest 2: Invalid Model ID (XSS attempt)Test 3: Server Logs Confirm Validation WorksTest 4: API Import TestAll tests confirm the validation now correctly runs BEFORE sanitization, catching malicious input patterns early. |
There was a problem hiding this comment.
Pull request overview
This PR addresses a security anti-pattern by reordering input validation and sanitization in the /generate endpoint. The fix ensures that raw user input is validated first, then sanitized only for display purposes in error responses, preventing potential bypass scenarios where sanitized input might slip through validation.
Changes:
- Moved validation of
model_idto occur before sanitization withhtml.escape() - Introduced a separate
sanitized_for_displayvariable for safe error message rendering - Updated comment documentation to clarify the security-driven ordering
Comments suppressed due to low confidence (4)
HF_files/aibom-generator/src/aibom-generator/api.py:867
- This endpoint (
/api/generate) still follows the old pattern of sanitizing before validation. For consistency with the security fix applied to the/generateendpoint, validation should occur before sanitization here as well. This prevents potential edge cases where sanitization could transform malicious input into something that bypasses validation.
sanitized_model_id = html.escape(request.model_id)
if not is_valid_hf_input(sanitized_model_id):
HF_files/aibom-generator/src/aibom-generator/api.py:941
- This endpoint (
/api/generate-with-report) still follows the old pattern of sanitizing before validation. For consistency with the security fix applied to the/generateendpoint, validation should occur before sanitization here as well.
sanitized_model_id = html.escape(request.model_id)
if not is_valid_hf_input(sanitized_model_id):
HF_files/aibom-generator/src/aibom-generator/api.py:1053
- This endpoint (
/api/models/{model_id:path}/score) still follows the old pattern of sanitizing before validation. For consistency with the security fix applied to the/generateendpoint, validation should occur before sanitization here as well.
sanitized_model_id = html.escape(model_id)
if not is_valid_hf_input(sanitized_model_id):
HF_files/aibom-generator/src/aibom-generator/api.py:1137
- This endpoint (
/api/batch) still follows the old pattern of sanitizing before validation. For consistency with the security fix applied to the/generateendpoint, validation should occur before sanitization here as well.
sanitized_id = html.escape(model_id)
if is_valid_hf_input(sanitized_id):
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Address Copilot review feedback - apply the same security pattern
to 4 additional endpoints that were missed in the original fix:
- /api/generate
- /api/generate-with-report
- /api/models/{model_id}/score
- /api/batch
All endpoints now validate raw input FIRST, then sanitize only
after validation passes. This ensures consistent security posture
across the entire API surface.
Copilot Review Feedback Addressed ✅Fixed all 4 additional endpoints identified by Copilot:
All endpoints now follow the same security pattern:
|
Address Copilot review: _normalise_model_id() should receive raw validated input (not HTML-escaped) since it needs to parse URLs with special characters like / and :. - Normalize raw validated model_id first - Sanitize only when passing to HTML templates for display - Consistent pattern across all template responses
Additional Copilot Review Feedback Addressed ✅Fixed normalization order issue:
Change: Normalization needs to parse URLs with special chars ( Pattern applied consistently to all template responses in the |
|
| Test | Result |
|---|---|
Invalid input rejection (<script>alert(1)</script>) |
✅ Correctly rejected |
Valid input processing (openai/whisper-tiny) |
❌ Error |
Bug Details
Error: name 'sanitized_model_id' is not defined
The validation order change is correct (validates before sanitizing), but there's a variable reference issue - sanitized_model_id is used somewhere in the code path before it's assigned.
Suggested Fix
Check that sanitized_model_id = html.escape(model_id) is called before any code that references it in the success path.
Needs fix before merge.
The /generate form endpoint was missing the sanitized_model_id assignment after validation passes. This caused a NameError when the variable was referenced later in the code.
✅ Bug Fixed and Re-TestedCommit: IssueThe FixAdded the missing assignment at line 601: # Sanitize for safe display/logging after validation passes
sanitized_model_id = html.escape(model_id)Re-Test Results
Test Space: https://megamind1-aibom-pr19-validation-order.hf.space Ready for merge. ✓ |
Reapply of PR GenAI-Security-Project#19 fix for v0.2 architecture. Security improvement: Validate model_id BEFORE html.escape() sanitization to prevent potential bypass attacks where malicious input could be transformed into something that passes validation. Example: <script>org/model</script> → <script>org/model</script> could slip through if sanitization occurs first.
Status Update: Reapplied to v0.2This security fix (validate before sanitize) has been reapplied to the v0.2 branch in PR #36. Testing completed:
The fix correctly rejects XSS attempts like This PR can be closed in favor of PR #36 which targets v0.2. |
Summary
/generateendpointProblem
The code was sanitizing input (with
html.escape()) BEFORE validating it. This is a security anti-pattern because:<script>org/model</script>→<script>org/model</script>could slip throughSolution
Validate the raw user input first, then sanitize after validation:
Test Plan
<script>alert(1)</script>) are rejected