
fix(generator): correct PURL encoding for model IDs #18

Open
arunsanna wants to merge 1 commit into GenAI-Security-Project:main from arunsanna:fix/purl-encoding-issue-13

Conversation

@arunsanna

Summary

  • Fix no-op .replace('/', '/') calls that were intended to URL-encode forward slashes in model IDs
  • Replace with .replace('/', '%2F') to properly encode per the PURL specification

Problem

The code had 6 instances of .replace('/', '/'), which is a no-op (it replaces / with /). This resulted in invalid PURLs like:

pkg:huggingface/meta-llama/Llama-3.1-8B@1.0

Solution

Changed to .replace('/', '%2F') to produce valid PURLs:

pkg:huggingface/meta-llama%2FLlama-3.1-8B@1.0
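
For illustration, a minimal sketch of the intended encoding; the helper name and call site below are hypothetical, not the generator's actual code:

```python
def encode_model_id(model_id: str) -> str:
    """Percent-encode forward slashes in a HuggingFace model ID for use in a PURL."""
    # Before the fix: model_id.replace('/', '/') returned the string unchanged.
    # After the fix: every '/' becomes its percent-encoded form '%2F'.
    return model_id.replace('/', '%2F')


purl = f"pkg:huggingface/{encode_model_id('meta-llama/Llama-3.1-8B')}@1.0"
print(purl)  # pkg:huggingface/meta-llama%2FLlama-3.1-8B@1.0
```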

Test plan

  • Built Docker image with fix
  • Tested with meta-llama/Llama-3.1-8B model
  • Verified bom-ref and purl fields contain %2F encoding
  • Confirmed no remaining instances of the bug pattern

Fixes #13

Replace no-op `.replace('/', '/')` with `.replace('/', '%2F')` to
properly URL-encode forward slashes in model IDs per the PURL spec.

This fix ensures PURLs like `pkg:huggingface/meta-llama/Llama-3.1-8B`
are correctly encoded as `pkg:huggingface/meta-llama%2FLlama-3.1-8B`.

Fixes GenAI-Security-Project#13
Copilot AI review requested due to automatic review settings on January 15, 2026 03:39

Copilot AI left a comment


Pull request overview

This PR fixes a critical bug where no-op .replace('/', '/') calls prevented proper URL encoding of forward slashes in model IDs, resulting in invalid Package URLs (PURLs) that don't comply with the PURL specification.

Changes:

  • Corrected 6 instances of .replace('/', '/') to .replace('/', '%2F') to properly URL-encode forward slashes in model IDs
  • Ensures PURL identifiers like pkg:huggingface/meta-llama/Llama-3.1-8B@1.0 are correctly encoded as pkg:huggingface/meta-llama%2FLlama-3.1-8B@1.0
  • Makes PURL generation consistent with existing correctly-encoded instances already in the codebase


@arunsanna
Author

Test Results ✅

Tested with 3 different HuggingFace models to verify the fix:

Test 1: meta-llama/Llama-3.1-8B

AIBOM generated successfully
Completeness score: 85/100

PURL fields from JSON:
  bom-ref: pkg:huggingface/meta-llama%2FLlama-3.1-8B@1.0
  purl: pkg:huggingface/meta-llama%2FLlama-3.1-8B@1.0
  dependencies.ref: pkg:generic/meta-llama%2FLlama-3.1-8B@1.0
  dependencies.dependsOn: pkg:huggingface/meta-llama%2FLlama-3.1-8B@1.0

Test 2: openai/whisper-large-v3

AIBOM generated successfully
Completeness score: 85/100

PURL fields from JSON:
  bom-ref: pkg:huggingface/openai%2Fwhisper-large-v3@1.0
  purl: pkg:huggingface/openai%2Fwhisper-large-v3@1.0
  dependencies.ref: pkg:generic/openai%2Fwhisper-large-v3@1.0
  dependencies.dependsOn: pkg:huggingface/openai%2Fwhisper-large-v3@1.0

Test 3: google/gemma-2-9b

AIBOM generated successfully
Completeness score: 85/100

PURL fields from JSON:
  bom-ref: pkg:huggingface/google%2Fgemma-2-9b@1.0
  purl: pkg:huggingface/google%2Fgemma-2-9b@1.0
  dependencies.ref: pkg:generic/google%2Fgemma-2-9b@1.0
  dependencies.dependsOn: pkg:huggingface/google%2Fgemma-2-9b@1.0

Summary

  • ✅ All 4 PURL-related fields now contain proper %2F encoding
  • ✅ Tested across LLM, Audio, and multi-modal model types
  • ✅ No regressions in completeness scoring
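
A rough sketch of the kind of check behind these results; the output filename and the CycloneDX field layout are assumptions inferred from the fields listed above, not the project's actual test code:

```python
import json

# Assumed output path; the real location depends on how the generator was invoked.
with open("aibom.json") as f:
    aibom = json.load(f)

# Check the component-level identifiers.
component = aibom["components"][0]
for field in ("bom-ref", "purl"):
    assert "%2F" in component[field], f"{field} not percent-encoded: {component[field]}"

# Check the dependency graph references.
for dep in aibom.get("dependencies", []):
    assert "%2F" in dep["ref"], f"ref not percent-encoded: {dep['ref']}"
    for ref in dep.get("dependsOn", []):
        assert "%2F" in ref, f"dependsOn not percent-encoded: {ref}"
```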

arunsanna added a commit to arunsanna/aibom-generator that referenced this pull request Jan 15, 2026
Fixes GenAI-Security-Project#15 - Add unit test infrastructure for the AIBOM Generator

Added:
- tests/ directory with pytest configuration
- conftest.py with mock HuggingFace API fixtures
- test_generator.py with 15 tests for AIBOMGenerator
- test_scoring.py with 7 tests for completeness scoring
- Sample fixtures for testing (sample_model_card.json, expected_aibom.json)
- pytest.ini configuration
- Test dependencies in requirements.txt (pytest, pytest-mock, pytest-cov)

Test coverage:
- AIBOM generation structure validation
- CycloneDX compliance checks
- PURL encoding (xfail until PR GenAI-Security-Project#18 merged)
- Model card extraction
- Error handling
- Model ID normalization
- Completeness scoring

All tests run offline using mocked HuggingFace API responses.

Results: 21 passed, 1 xfailed (expected)
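
A minimal sketch of what the xfail-marked PURL test mentioned above could look like; the fixture names and generator API are placeholders, assuming the mocked-HuggingFace setup described in the commit message:

```python
import pytest


@pytest.mark.xfail(reason="PURL encoding fix pending in GenAI-Security-Project#18")
def test_purl_encodes_forward_slash(generator, mock_hf_api):
    """Model IDs containing '/' should be percent-encoded in the generated PURL."""
    # 'generator' and 'mock_hf_api' are hypothetical conftest.py fixtures;
    # 'generate_aibom' stands in for whatever method the real class exposes.
    aibom = generator.generate_aibom("meta-llama/Llama-3.1-8B")
    purl = aibom["components"][0]["purl"]
    assert purl == "pkg:huggingface/meta-llama%2FLlama-3.1-8B@1.0"
```
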
arunsanna added a commit to arunsanna/aibom-generator that referenced this pull request Jan 15, 2026
- Remove unused variable `result` in test_generate_aibom_with_output_file
- Simplify xfail reason to just reference PR GenAI-Security-Project#18
- Remove unused `import pytest` from test_scoring.py
- Replace permissive `or` assertions with specific checks
@arunsanna
Author

✅ Testing Completed - VERIFIED

Test Space: https://megamind1-aibom-pr18-purl-fix.hf.space

Test Results

| Test | Result |
| --- | --- |
| PURL encoding for meta-llama/Llama-3.1-8B | ✅ Pass |
| Generated %2F encoding | ✅ Confirmed |
| App functionality | ✅ Working |

Comparison

| Version | PURL Output |
| --- | --- |
| ❌ Baseline (unfixed) | pkg:huggingface/meta-llama/Llama-3.1-8B@1.0 |
| ✅ This PR (fixed) | pkg:huggingface/meta-llama%2FLlama-3.1-8B@1.0 |

Ready for merge.

@arunsanna
Author

Status Update: Superseded by v0.2

This PURL encoding fix has been incorporated into the v0.2 branch architecture.

Evidence: src/models/service.py line 288 now uses .replace('/', '%2F') correctly.

This PR can be closed as the fix is already in v0.2. See PR #36 for the consolidated v0.2 reapplication.


Development

Successfully merging this pull request may close these issues.

Bug: PURL format incorrect - .replace('/', '/') is a no-op
