fix(api): validate input before sanitization for security by arunsanna · Pull Request #19 · GenAI-Security-Project/aibom-generator

arunsanna · 2026-01-15T03:59:53Z

Summary

Fixes #14 (Bug: Input validation occurs after sanitization): input validation should occur before sanitization for security
Swaps the order of validation and sanitization in the /generate endpoint

Problem

The code was sanitizing input (with html.escape()) BEFORE validating it. This is a security anti-pattern because:

Sanitization can transform malicious input into something that bypasses validation
Example: <script>org/model</script> → &lt;script&gt;org/model&lt;/script&gt; no longer contains the raw < and > characters, so a check that looks for them could let it slip through
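
To make the bypass concrete, here is a minimal sketch. The denylist check `rejects_angle_brackets` is hypothetical and is not the project's `is_valid_hf_input`, whose rules are not shown in this thread; only `html.escape` is a real standard-library call.

```python
import html


def rejects_angle_brackets(value: str) -> bool:
    """Hypothetical denylist check: True means the input is rejected."""
    return "<" in value or ">" in value


raw = "<script>org/model</script>"
escaped = html.escape(raw)

# html.escape rewrites exactly the characters the check is looking for.
print(escaped)

# Validating the raw input catches the payload...
raw_rejected = rejects_angle_brackets(raw)
# ...while validating the escaped copy does not.
escaped_rejected = rejects_angle_brackets(escaped)
print(raw_rejected, escaped_rejected)
```

Whether a given validator is actually bypassed depends on its rules; the point is only that the validator should see what the user sent, not a transformed copy.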

Solution

Validate the raw user input first, then sanitize after validation:

# BEFORE (wrong order):
sanitized_model_id = html.escape(model_id)
if not is_valid_hf_input(sanitized_model_id):  # Validating sanitized input

# AFTER (correct order):
if not is_valid_hf_input(model_id):            # Validate raw input first
    sanitized_for_display = html.escape(model_id)
    return error_response(...)
sanitized_model_id = html.escape(model_id)     # Then sanitize
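
For readers who want to run the corrected ordering in isolation, the following is a self-contained sketch. The regular expression, the `handle_generate` function, and the dictionary return shape are all assumptions made for illustration; the real `is_valid_hf_input` and `error_response` in `api.py` may behave differently.

```python
import html
import re

# Assumed pattern for "org/model" style IDs; the project's real rules may differ.
_MODEL_ID_RE = re.compile(r"[A-Za-z0-9][\w.-]*/[\w.-]+")


def is_valid_hf_input(model_id: str) -> bool:
    """Stand-in validator: accepts only plain org/model identifiers."""
    return _MODEL_ID_RE.fullmatch(model_id) is not None


def handle_generate(model_id: str) -> dict:
    """Hypothetical handler showing validate-first, sanitize-second."""
    # 1. Validate the raw input exactly as the user sent it.
    if not is_valid_hf_input(model_id):
        # Escape only so the rejected value can be echoed back safely.
        sanitized_for_display = html.escape(model_id)
        return {"ok": False, "error": f"Invalid model ID format: {sanitized_for_display}"}

    # 2. Input is known to be valid; sanitize for display/logging.
    sanitized_model_id = html.escape(model_id)
    return {"ok": True, "model_id": sanitized_model_id}


good = handle_generate("meta-llama/Llama-3.1-8B")
bad = handle_generate("<script>alert(1)</script>")
```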

Test Plan

Valid model IDs still generate AIBOM correctly
Invalid inputs (e.g., <script>alert(1)</script>) are rejected
Server logs confirm validation catches invalid input before sanitization
Docker build passes
API syntax verified via import test

Fixes GenAI-Security-Project#14 - Input validation order bug

The validation was happening AFTER sanitization, which is a security issue because sanitization could transform malicious input into something that passes validation.

This commit swaps the order to:
1. Validate the raw user input first (catches malicious patterns)
2. Sanitize after validation (for safe display/processing)

Test results:
- Valid model IDs: Successfully generates AIBOM
- Invalid inputs (e.g., <script>alert(1)</script>): Correctly rejected
- Server logs confirm validation catches invalid input before sanitize

arunsanna · 2026-01-15T04:00:03Z

Test Results

Test 1: Valid Model ID

$ curl -X POST http://localhost:7860/generate -d "model_id=meta-llama/Llama-3.1-8B"
# Result: AIBOM generated successfully
# Completeness score: 85/100

Test 2: Invalid Model ID (XSS attempt)

$ curl -X POST http://localhost:7860/generate -d "model_id=<script>alert(1)</script>"
# Result: Error page returned (input rejected)
# Server log: "Invalid model input format received: <script>alert(1)</script>"

Test 3: Server Logs Confirm Validation Works

2026-01-15 03:58:53,306 - src.aibom_generator.api - WARNING - Invalid model input format received: <script>alert(1)</script>

Test 4: API Import Test

$ docker run --rm --entrypoint python aibom-test -c "from src.aibom_generator.api import app; print('api.py imports successfully')"
# Result: api.py imports successfully

All tests confirm the validation now correctly runs BEFORE sanitization, catching malicious input patterns early.

Copilot

Pull request overview

This PR addresses a security anti-pattern by reordering input validation and sanitization in the /generate endpoint. The fix ensures that raw user input is validated first, then sanitized only for display purposes in error responses, preventing potential bypass scenarios where sanitized input might slip through validation.

Changes:

Moved validation of model_id to occur before sanitization with html.escape()
Introduced a separate sanitized_for_display variable for safe error message rendering
Updated comment documentation to clarify the security-driven ordering

Comments suppressed due to low confidence (4)

HF_files/aibom-generator/src/aibom-generator/api.py:867

This endpoint (/api/generate) still follows the old pattern of sanitizing before validation. For consistency with the security fix applied to the /generate endpoint, validation should occur before sanitization here as well. This prevents potential edge cases where sanitization could transform malicious input into something that bypasses validation.

        sanitized_model_id = html.escape(request.model_id)
        if not is_valid_hf_input(sanitized_model_id):

HF_files/aibom-generator/src/aibom-generator/api.py:941

This endpoint (/api/generate-with-report) still follows the old pattern of sanitizing before validation. For consistency with the security fix applied to the /generate endpoint, validation should occur before sanitization here as well.

        sanitized_model_id = html.escape(request.model_id)
        if not is_valid_hf_input(sanitized_model_id):

HF_files/aibom-generator/src/aibom-generator/api.py:1053

This endpoint (/api/models/{model_id:path}/score) still follows the old pattern of sanitizing before validation. For consistency with the security fix applied to the /generate endpoint, validation should occur before sanitization here as well.

        sanitized_model_id = html.escape(model_id)
        if not is_valid_hf_input(sanitized_model_id):

HF_files/aibom-generator/src/aibom-generator/api.py:1137

This endpoint (/api/batch) still follows the old pattern of sanitizing before validation. For consistency with the security fix applied to the /generate endpoint, validation should occur before sanitization here as well.

            sanitized_id = html.escape(model_id)
            if is_valid_hf_input(sanitized_id):

Address Copilot review feedback - apply the same security pattern to 4 additional endpoints that were missed in the original fix:
- /api/generate
- /api/generate-with-report
- /api/models/{model_id}/score
- /api/batch

All endpoints now validate raw input FIRST, then sanitize only after validation passes. This ensures consistent security posture across the entire API surface.

arunsanna · 2026-01-15T04:24:51Z

Copilot Review Feedback Addressed ✅

Fixed all 4 additional endpoints identified by Copilot:

Endpoint	Status
`/api/generate`	✅ Fixed
`/api/generate-with-report`	✅ Fixed
`/api/models/{model_id}/score`	✅ Fixed
`/api/batch`	✅ Fixed

All endpoints now follow the same security pattern:

Validate raw input FIRST
Sanitize only AFTER validation passes
Use sanitized value only for display/logging
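
As a rough illustration of how the three rules above might look for a batch-style endpoint, here is a sketch that validates each raw ID and escapes only the values that will be echoed back. The function name `partition_batch`, the regular expression, and the return shape are assumptions for this example and do not reflect the actual `/api/batch` implementation.

```python
import html
import re

# Assumed "org/model" pattern; the project's real validator may differ.
_MODEL_ID_RE = re.compile(r"[\w.-]+/[\w.-]+")


def is_valid_hf_input(model_id: str) -> bool:
    """Stand-in validator used only for this sketch."""
    return _MODEL_ID_RE.fullmatch(model_id) is not None


def partition_batch(model_ids: list[str]) -> tuple[list[str], list[str]]:
    """Return (valid raw IDs for processing, escaped rejected IDs for display)."""
    valid: list[str] = []
    rejected_for_display: list[str] = []
    for model_id in model_ids:
        # Validate the raw value first.
        if is_valid_hf_input(model_id):
            valid.append(model_id)
        else:
            # Escape only what will be shown or logged.
            rejected_for_display.append(html.escape(model_id))
    return valid, rejected_for_display


valid, rejected = partition_batch(["openai/whisper-tiny", "<script>alert(1)</script>"])
```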

Address Copilot review: _normalise_model_id() should receive raw validated input (not HTML-escaped) since it needs to parse URLs with special characters like / and :.
- Normalize raw validated model_id first
- Sanitize only when passing to HTML templates for display
- Consistent pattern across all template responses

arunsanna · 2026-01-15T04:31:09Z

Additional Copilot Review Feedback Addressed ✅

Fixed normalization order issue:

File	Line	Fix
api.py	601	`_normalise_model_id()` now receives raw validated input instead of HTML-escaped

Change: Normalization needs to parse URLs with special chars (/, :), so it should operate on raw validated input. HTML sanitization is now done only when passing to templates for display.
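
The ordering described here (normalize the raw validated value, escape only at the template boundary) can be sketched as follows. `normalise_model_id` is a hypothetical stand-in for the project's `_normalise_model_id()`, and the `https://huggingface.co/` prefix handling and the `template_context` helper are assumptions for illustration.

```python
import html

HF_PREFIX = "https://huggingface.co/"  # assumed URL form accepted by the app


def normalise_model_id(value: str) -> str:
    """Hypothetical normalizer: reduce a model URL to its org/model ID."""
    if value.startswith(HF_PREFIX):
        value = value[len(HF_PREFIX):]
    return value.strip("/")


def template_context(raw_validated_id: str) -> dict:
    """Normalize raw input first, then escape only the value handed to the template."""
    normalised = normalise_model_id(raw_validated_id)
    return {"model_id": html.escape(normalised)}


ctx_from_url = template_context("https://huggingface.co/openai/whisper-tiny")
ctx_from_id = template_context("openai/whisper-tiny")
```

Keeping the escape call at the display boundary means any lookup or parsing step works on the same string the user supplied.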

Pattern applied consistently to all template responses in the /generate endpoint.

arunsanna · 2026-02-02T19:09:14Z

⚠️ Testing Found Bug

Test Space: https://megamind1-aibom-pr19-validation-order.hf.space

Test Results

Test	Result
Invalid input rejection (`<script>alert(1)</script>`)	✅ Correctly rejected
Valid input processing (`openai/whisper-tiny`)	❌ Error

Bug Details

Error: name 'sanitized_model_id' is not defined

The validation order change is correct (validates before sanitizing), but there's a variable reference issue - sanitized_model_id is used somewhere in the code path before it's assigned.

Suggested Fix

Check that sanitized_model_id = html.escape(model_id) is called before any code that references it in the success path.

Needs fix before merge.

The /generate form endpoint was missing the sanitized_model_id assignment after validation passes. This caused a NameError when the variable was referenced later in the code.
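
The failure mode can be reproduced in a few lines. The real handler is not shown in this thread, so both functions below are hypothetical; they only illustrate how a success path that references a never-assigned name produces the reported error, and how the missing assignment resolves it.

```python
import html


def handle_broken(model_id: str) -> str:
    # sanitized_model_id is never assigned in this function, so Python
    # falls back to a global lookup and raises NameError at call time.
    return f"Generating AIBOM for {sanitized_model_id}"


def handle_fixed(model_id: str) -> str:
    # Assign after validation has passed and before any use.
    sanitized_model_id = html.escape(model_id)
    return f"Generating AIBOM for {sanitized_model_id}"


error_message = None
try:
    handle_broken("openai/whisper-tiny")
except NameError as exc:
    error_message = str(exc)

fixed_result = handle_fixed("openai/whisper-tiny")
```

Because the error only appears when the success path runs, a test suite that exercises invalid input alone would not catch it; a valid-input case is needed as well.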

arunsanna · 2026-02-02T19:15:45Z

✅ Bug Fixed and Re-Tested

Commit: 877a650 - fix: add missing sanitized_model_id assignment in form endpoint

Issue

The /generate form endpoint was missing sanitized_model_id = html.escape(model_id) after validation passes, causing a NameError.

Fix

Added the missing assignment at line 601:

# Sanitize for safe display/logging after validation passes
sanitized_model_id = html.escape(model_id)

Re-Test Results

Test	Result
Valid input (`openai/whisper-tiny`)	✅ AIBOM generated successfully
Invalid input (`<script>alert(1)</script>`)	✅ Correctly rejected
Security validation order	✅ Validates BEFORE sanitization

Test Space: https://megamind1-aibom-pr19-validation-order.hf.space

Ready for merge. ✓

Reapply of PR GenAI-Security-Project#19 fix for v0.2 architecture.

Security improvement: Validate model_id BEFORE html.escape() sanitization to prevent potential bypass attacks where malicious input could be transformed into something that passes validation.

Example: <script>org/model</script> → &lt;script&gt;org/model&lt;/script&gt; could slip through if sanitization occurs first.

arunsanna · 2026-02-03T04:24:11Z

Status Update: Reapplied to v0.2

This security fix (validate before sanitize) has been reapplied to the v0.2 branch in PR #36.

Testing completed:

✅ Local pytest (5/5 tests pass)
✅ Local server tested with malicious input
✅ HF Space aibom-generator-test deployed and tested
✅ HF Space aibom-pr19-validation-order deployed and tested

The fix correctly rejects XSS attempts like <script>alert('xss')</script>/model with "Invalid model ID format."

This PR can be closed in favor of PR #36 which targets v0.2.

Copilot AI review requested due to automatic review settings January 15, 2026 03:59

Copilot started reviewing on behalf of arunsanna January 15, 2026 04:00

Copilot AI reviewed Jan 15, 2026

arunsanna mentioned this pull request Feb 3, 2026

fix: reapply security fix and docs for v0.2 #36

Merged

3 tasks

arunsanna wants to merge 4 commits into GenAI-Security-Project:main from arunsanna:fix/validation-order-issue-14
