Commit e213a40

Fix #4609: Handle file-type license references in NuGet packages
Detect <license type='file'> in .nuspec files and extract file path to
license_file_references field. Keep extracted_license_statement as raw
path value to integrate with existing license resolution in
process_codebase function.

This follows the two-phase architecture pattern:
- Phase 1: Extract and store file path (this change)
- Phase 2: Existing process_codebase resolves file references

Minimal changes (37 lines) following maintainer feedback from PR #4689.

Fixes #4609

Signed-off-by: Jayant <jayantmcom@gmail.com>
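To make the two-phase split concrete, here is a minimal sketch of what the second phase could look like once a path has been stored in license_file_references. It is an illustration only: `resolve_license_references`, its parameters, and the backslash normalization are assumptions, and the actual process_codebase logic is not part of this commit and may work differently.

```python
import posixpath


def resolve_license_references(manifest_path, references, scanned_paths):
    """
    Hypothetical phase-2 helper: map each stored license file reference to a
    scanned path, resolved relative to the directory of the manifest.
    References that match no scanned path are skipped.
    """
    manifest_dir = posixpath.dirname(manifest_path)
    resolved = []
    for reference in references:
        # Assumption: a manifest may use Windows separators; normalize them.
        relative = reference.replace('\\', '/')
        candidate = posixpath.normpath(posixpath.join(manifest_dir, relative))
        if candidate in scanned_paths:
            resolved.append(candidate)
    return resolved


scanned = {'pkg/Example.nuspec', 'pkg/licenses/LICENSE.txt'}
matches = resolve_license_references(
    'pkg/Example.nuspec', ['licenses\\LICENSE.txt'], scanned)
print(matches)
```

Keeping phase 1 limited to storing the raw path means the parser never has to touch the filesystem; only the later step needs to know which files were scanned.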
1 parent 022ddc8 commit e213a40

20 files changed

Lines changed: 4829 additions & 4456 deletions

AUTHORS.rst

Lines changed: 1 addition & 0 deletions
@@ -105,3 +105,4 @@ The following organizations or individuals have contributed to ScanCode:
 - Yash Sharma @yasharmaster
 - Yunus Rahbar @yns88
 - Stefano Zacchiroli @zacchiro
+- Jayant <jayantmcom@gmail.com>

CHANGELOG.rst

Lines changed: 4 additions & 0 deletions
@@ -4,9 +4,13 @@ Changelog
 Next release
 --------------
 
+
 v3.5.0 - 2026-01-15
 -------------------
 
+- Fix #4609: Handle NuGet license file references properly. Added license_file_references
+  field to PackageData model to store file paths from <license type="file"> elements.
+
 - Improve package scan performance by:
 
   - Skipping binary package detection steps by default,

final_fix_msg.txt

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+Fix: Add ignorable copyrights and holders to BSD license rule
+
+The validation test checks that all detected copyrights, holders,
+and URLs in the license text are declared in the rule metadata.
+
+Added:
+- ignorable_copyrights
+- ignorable_holders
+
+This matches the ignorable clues detected in the license text.

models_start.txt

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
+#
+# Copyright (c) nexB Inc. and others. All rights reserved.
+# ScanCode is a trademark of nexB Inc.
+# SPDX-License-Identifier: Apache-2.0
+# See http://www.apache.org/licenses/LICENSE-2.0 for the license text.
+# See https://github.com/nexB/scancode-toolkit for support or download.
+# See https://aboutcode.org for more information about nexB OSS projects.
+#
+
+import logging
+import os
+import uuid
+import sys
+
+from fnmatch import fnmatchcase
+
+import attr
+import saneyaml
+
+from commoncode import filetype
+from commoncode.fileutils import as_posixpath
+from commoncode.datautils import choices
+from commoncode.datautils import Boolean
+from commoncode.datautils import Date
+from commoncode.datautils import Integer
+from commoncode.datautils import List
+from commoncode.datautils import Mapping
+from commoncode.datautils import String
+from commoncode.resource import Resource
+from license_expression import combine_expressions
+from license_expression import Licensing
+from packageurl import normalize_qualifiers
+from packageurl import PackageURL
+
+try:
+    from typecode import contenttype
+except ImportError:
+    contenttype = None
+
+try:
+    from packagedcode import licensing
+except ImportError:
+    licensing = None
+
+# FIXME: what if licensing is not importable?
+from packagedcode.licensing import get_declared_license_expression_spdx
+
+"""
+This module contain data models for package and dependencies, abstracting and
+normalizing the small differences that exist across different package types
+(aka. ecosystems), manifest file formats and tools.
+
+A package is a unit of code that is provisioned and installable. More commonly a
+package is stored in an archive and found in a package repository, though it can
+be as simple as a single file such as a script or may be stored in a VCS
+repository such as git.
+
+A package contains:
+
+- package information and metadata in some "manifest" file,
+- a payload such as code, documentation, or data.
+
+
+Structured package information come in three primary kinds:
+
+- "metadata" such as a name, version or description,
+
+- "dependencies" on other packages either potential with version requirements or
+  resolved and locked with concrete versions), and
+
+- "build" and packaging scripts and instructions.
+
+Package types combine these in one or more manifest or script that we
+collectively call datafiles. For instance a Maven POM XML file contains combined
+metadata, dependencies and build instructions in an XML file while a pip
+requirements.txt file contains only dependencies.
+
+These package "data" files come in many different shapes:
+
+- Manifest files proper such as a Maven POM, NPM package.json and several others.
+- Dependency lockfiles such as pip requirements.txt or Go go.sum.
+- Build scripts such as Makefile.
+- Various structured or semi-structured metadata files in JSON, YAML or plain text
+- Property files that supplement manifests such as a pom.properties
+- Structured data headers or sections in binaries such as in an ELF, LKM or
+  Windows PE; or the header of an RPM archive.
+- Code tags or conventional variables such JavaDoc tags or Python __copyright__
+  magic variables and variable in Yocto/Bitbake.
+- In JSON datafiles (or similar) fetched from registry or package repository APIs.
+
+We handle package information at two levels:
+
+- First, we parse manifests or lockfiles in a common package data model.
+
+- Second, we assemble lists of top-level Package and Dependency by aggregating
+  the data from one or more parsed package datafiles.
+
+The key models defined here are:
+
+- PackageData: a class holding package data as parsed from a package datafile

nuget_class.txt

Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
+class NugetNuspecHandler(models.DatafileHandler):
+    datasource_id = 'nuget_nupsec'
+    path_patterns = ('*.nuspec',)
+    default_package_type = 'nuget'
+    description = 'NuGet nuspec package manifest'
+    documentation_url = 'https://docs.microsoft.com/en-us/nuget/reference/nuspec'
+
+    @classmethod
+    def parse(cls, location, package_only=False):
+        with open(location, 'rb') as loc:
+            parsed = xmltodict.parse(loc)
+
+        if not parsed:
+            return
+
+        pack = parsed.get('package') or {}
+        nuspec = pack.get('metadata')
+        if not nuspec:
+            return
+
+        name = nuspec.get('id')
+        version = nuspec.get('version')
+
+        # Summary: A short description of the package for UI display. If omitted, a
+        # truncated version of description is used.
+        description = build_description(nuspec.get('summary'), nuspec.get('description'))
+
+        # title: A human-friendly title of the package, typically used in UI
+        # displays as on nuget.org and the Package Manager in Visual Studio. If not
+        # specified, the package ID is used.
+        title = nuspec.get('title')
+        if title and title != name:
+            description = build_description(title, description)
+
+        parties = []
+        authors = nuspec.get('authors')
+        if authors:
+            parties.append(models.Party(name=authors, role='author'))
+
+        owners = nuspec.get('owners')
+        if owners:
+            parties.append(models.Party(name=owners, role='owner'))
+
+        vcs_url = None
+
+        repo = nuspec.get('repository') or {}
+        vcs_repository = repo.get('@url') or ''
+        if vcs_repository:
+            vcs_tool = repo.get('@type') or ''
+            if vcs_tool:
+                vcs_url = f'{vcs_tool}+{vcs_repository}'
+            else:
+                vcs_url = vcs_repository
+
+        urls = get_urls(name, version)
+
+        extracted_license_statement = None
+        # See https://docs.microsoft.com/en-us/nuget/reference/nuspec#license
+        # This is a SPDX license expression
+        if 'license' in nuspec:
+            extracted_license_statement = nuspec.get('license')
+        # Deprecated and not a license expression, just a URL
+        elif 'licenseUrl' in nuspec:
+            extracted_license_statement = nuspec.get('licenseUrl')
+
+        package_data = dict(
+            datasource_id=cls.datasource_id,
+            type=cls.default_package_type,
+            primary_language=cls.default_primary_language,
+            name=name,
+            version=version,
+            description=description or None,
+            homepage_url=nuspec.get('projectUrl') or None,
+            parties=parties,
+            dependencies=list(get_dependencies(nuspec)),
+            extracted_license_statement=extracted_license_statement,
+            copyright=nuspec.get('copyright') or None,
+            vcs_url=vcs_url,
+            **urls,
+        )
+        yield models.PackageData.from_data(package_data, package_only)
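The license branch in this snapshot stores whatever `nuspec.get('license')` returns. As a hedged sketch of the phase-1 behavior the commit message describes, the helper below separates the statement from any file reference. `split_license` is a hypothetical name, not code from the commit; it assumes the mapping shape xmltodict produces for an element with attributes (`@type` for the attribute, `#text` for the element text), and flattening a `type="expression"` element to its text is likewise an assumption.

```python
def split_license(nuspec):
    """
    Hypothetical helper: return a tuple of
    (extracted_license_statement, license_file_references)
    from a parsed nuspec <metadata> mapping.
    """
    license_file_references = []
    extracted_license_statement = None

    if 'license' in nuspec:
        license_element = nuspec.get('license')
        if isinstance(license_element, dict):
            # <license type="..."> parsed with attributes: keep the raw text.
            text = (license_element.get('#text') or '').strip() or None
            if license_element.get('@type') == 'file' and text:
                # The text is a path to a license file, not an expression.
                license_file_references.append(text)
            extracted_license_statement = text
        else:
            # Plain <license> element without attributes.
            extracted_license_statement = license_element
    elif 'licenseUrl' in nuspec:
        # Deprecated form: just a URL.
        extracted_license_statement = nuspec.get('licenseUrl')

    return extracted_license_statement, license_file_references


statement, references = split_license(
    {'id': 'Example', 'license': {'@type': 'file', '#text': 'LICENSE.txt'}})
print(statement, references)
```

Under these assumptions the statement stays the raw path for file-type licenses, so downstream code that already reads extracted_license_statement keeps working while the new list records that the value is a file reference.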

nuget_parse.txt

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+    @classmethod
+    def parse(cls, location, package_only=False):
+        with open(location, 'rb') as loc:
+            parsed = xmltodict.parse(loc)
+
+        if not parsed:
+            return
+
+        pack = parsed.get('package') or {}
+        nuspec = pack.get('metadata')
+        if not nuspec:
+            return
+
+        name = nuspec.get('id')
+        version = nuspec.get('version')
+
+        # Summary: A short description of the package for UI display. If omitted, a
+        # truncated version of description is used.
+        description = build_description(nuspec.get('summary'), nuspec.get('description'))
+
+        # title: A human-friendly title of the package, typically used in UI
+        # displays as on nuget.org and the Package Manager in Visual Studio. If not
+        # specified, the package ID is used.
+        title = nuspec.get('title')
+        if title and title != name:
+            description = build_description(title, description)

src/packagedcode/models.py

Lines changed: 7 additions & 0 deletions
@@ -672,6 +672,13 @@ class PackageData(IdentifiablePackageData):
              'package manifest and extracted. This can be a string, a list or dict of '
              'strings possibly nested, as found originally in the manifest.')
 
+    license_file_references = attr.ib(
+        default=attr.Factory(list),
+        metadata=dict(
+            help='List of file paths to license files referenced in a package manifest.'
+        )
+    )
+
     notice_text = String(
         label='notice text',
         help='A notice text for this package.')
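The new field uses `attr.Factory(list)` as its default, which gives each instance its own empty list instead of one list shared across instances. The sketch below shows the same idea with the standard library's `dataclasses.field(default_factory=list)`. This is a stand-in chosen so the example runs without third-party packages, not the attrs API used in the diff, and `PackageDataSketch` is a hypothetical class.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PackageDataSketch:
    # Hypothetical two-field stand-in for the real PackageData model.
    extracted_license_statement: Optional[str] = None
    # default_factory builds a fresh list per instance, the role that
    # attr.Factory(list) plays in the diff above.
    license_file_references: List[str] = field(default_factory=list)


first = PackageDataSketch()
second = PackageDataSketch()
first.license_file_references.append('LICENSE.txt')

# Appending to one instance must not leak into the other.
print(first.license_file_references, second.license_file_references)
```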
