Commit d9be250

chore: tidy up validation helpers and improve test coverage (#18)
- Consolidate shared validation patterns
- Improve primordial coverage for consistency
- Add bounds to internal caches and string processing
- Fix VERS containment for compound range expressions
- Update tests to match improved validation behavior
- Freeze cached instances for immutability guarantees
1 parent cfe0d6e commit d9be250

14 files changed, 394 additions and 114 deletions

CHANGELOG.md

Lines changed: 14 additions & 20 deletions
Original file line number | Diff line number | Diff line change
@@ -8,38 +8,32 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
88

99
### Added
1010

11-
- **VERS parser**: First JavaScript implementation of the VERS (VErsion Range Specifier) companion spec to PURL. Supports parsing, serialization, and containment checking for semver-based schemes (npm, cargo, golang, gem, hex, pub, cran, swift)
12-
- **URL-to-PURL conversion**: `UrlConverter.fromUrl()` converts registry URLs to PackageURLs across 27 hostnames and 17 purl types (npm, pypi, maven, cargo, nuget, github, gitlab, bitbucket, docker, hex, pub, cocoapods, hackage, conda, cpan, luarocks, huggingface, swift, cran, vscode)
13-
- **`toSpec()` method**: Returns the package identity without the `pkg:type/` prefix (the npm "spec" equivalent)
11+
- **VERS parser**: First JavaScript implementation of the VERS (VErsion Range Specifier) companion spec to PURL
12+
- **URL-to-PURL conversion**: `UrlConverter.fromUrl()` converts registry URLs to PackageURLs
13+
- **`toSpec()` method**: Returns the package identity without the `pkg:type/` prefix
1414
- **`isValid()` static method**: Quick validation without throwing
1515
- **`fromUrl()` static method**: Convenience wrapper for `UrlConverter.fromUrl()`
1616
- **Immutable copy methods**: `withVersion()`, `withNamespace()`, `withQualifier()`, `withQualifiers()`, `withSubpath()` return new instances
17-
- **PurlBuilder factories**: Added 18 new type factories (bitbucket, cocoapods, conan, conda, cran, deb, docker, github, gitlab, hackage, hex, huggingface, luarocks, oci, pub, rpm, swift, vscode-extension)
18-
- **Injection character detection**: `containsInjectionCharacters()` utility for shell metacharacter detection
17+
- **PurlBuilder factories**: Added type factories for common ecosystems
18+
- **Input validation utilities**: Character detection for dangerous input
1919
- **`vers` qualifier**: Added 6th standard qualifier per purl spec
2020
- **`./exists` entry point**: Registry existence checks available via `@socketregistry/packageurl-js/exists`
2121

2222
### Changed
2323

24-
- **Bundle size reduced 95%**: Core bundle is 178 KB (was 3.3 MB). Exists functions moved to separate entry point to avoid bundling HTTP dependencies
25-
- **Primordials module**: All 43 built-in references captured at module load time via `uncurryThis` pattern (mirrors Node.js internals). Zero raw prototype method calls remain
26-
- **Frozen constants**: Module-level Maps, Sets, regex patterns, and arrays are frozen
27-
- **Null prototype objects**: All user-facing object literals use `__proto__: null`
28-
- **Flyweight cache**: `fromString()` caches up to 1024 instances; `toString()` memoized
24+
- **Bundle size reduced 95%**: Exists functions moved to separate entry point to avoid bundling HTTP dependencies
25+
- **Hardened against prototype pollution**: Built-in references captured at module load time
26+
- **Frozen constants**: Module-level data structures are immutable
27+
- **Null prototype objects**: All user-facing object literals use null prototypes
28+
- **Performance**: Instance caching for `fromString()`; `toString()` memoized
2929
- **Version lowercasing**: Added for oci, pypi, and vscode-extension per upstream spec
3030

3131
### Fixed
3232

33-
- **ReDoS prevention**: Consecutive `.*` groups collapsed in wildcard regex
34-
- **Null byte rejection**: All string components reject `\x00` to prevent truncation in C-based consumers
35-
- **VERS resource limits**: 1000 constraint maximum, MAX_SAFE_INTEGER validation
36-
- **vscode-extension validation**: Rejects illegal characters in namespace, name, version, and platform qualifier
37-
38-
### Security
39-
40-
- Prototype pollution resilience via primordials (captured String, Array, RegExp, Object, Reflect methods)
41-
- Global tampering protection verified (replacing `global.URL` after import has no effect)
42-
- Inline regex patterns hoisted to frozen module-scope constants
33+
- **ReDoS prevention**: Fixed potential denial-of-service in pattern matching
34+
- **Input validation**: Reject dangerous characters in string components
35+
- **VERS resource limits**: Constraint and value bounds enforced
36+
- **vscode-extension validation**: Improved input validation
4337

4438
## [1.3.5](https://github.com/SocketDev/socket-packageurl-js/releases/tag/v1.3.5) - 2025-11-02
4539

src/compare.ts

Lines changed: 6 additions & 0 deletions
@@ -52,7 +52,13 @@ const WILDCARD_CACHE_MAX = 1024
5252
* Supports * (match any chars), ? (match single char), ** (match anything including empty).
5353
* Designed for version strings and package names, not file paths.
5454
*/
55+
const MAX_PATTERN_LENGTH = 4096
56+
5557
function matchWildcard(pattern: string, value: string): boolean {
58+
// Reject excessively long patterns to prevent regex compilation DoS
59+
if (pattern.length > MAX_PATTERN_LENGTH) {
60+
return false
61+
}
5662
let regex = wildcardRegexCache.get(pattern)
5763
if (regex === undefined) {
5864
// Convert glob pattern to regex
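
The guard above can be read in isolation with a minimal sketch. Only `MAX_PATTERN_LENGTH = 4096` and the 1024-entry cache bound come from the diff. `globToRegExp`, `regexCache`, and `CACHE_MAX` are hypothetical names. The glob conversion is a simplification that treats `*` and `**` identically, so it is not the repository's actual implementation.

```typescript
// Values taken from the diff; every other name here is an illustrative assumption.
const MAX_PATTERN_LENGTH = 4096
const CACHE_MAX = 1024
const regexCache = new Map<string, RegExp>()

// Simplified glob conversion: any run of * becomes a single ".*", ? becomes ".".
function globToRegExp(pattern: string): RegExp {
  let source = ''
  for (let i = 0; i < pattern.length; i += 1) {
    const ch = pattern[i]!
    if (ch === '*') {
      // Collapse consecutive stars so the regex never contains ".*.*"
      while (pattern[i + 1] === '*') {
        i += 1
      }
      source += '.*'
    } else if (ch === '?') {
      source += '.'
    } else {
      // Escape regex metacharacters so they match literally
      source += ch.replace(/[.+^${}()|[\]\\]/g, '\\$&')
    }
  }
  return new RegExp(`^${source}$`)
}

function matchWildcard(pattern: string, value: string): boolean {
  // Reject oversized patterns before any regex is compiled
  if (pattern.length > MAX_PATTERN_LENGTH) {
    return false
  }
  let regex = regexCache.get(pattern)
  if (regex === undefined) {
    if (regexCache.size >= CACHE_MAX) {
      // Evict the oldest entry (Map iterates in insertion order)
      const oldest = regexCache.keys().next().value
      if (oldest !== undefined) {
        regexCache.delete(oldest)
      }
    }
    regex = globToRegExp(pattern)
    regexCache.set(pattern, regex)
  }
  return regex.test(value)
}
```

Returning `false` for an oversized pattern, rather than throwing, keeps the function's boolean contract. The trade-off is that callers cannot distinguish "rejected" from "did not match".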

src/normalize.ts

Lines changed: 8 additions & 1 deletion
@@ -95,7 +95,14 @@ function normalizeQualifiers(
9595
let qualifiers: Record<string, string> | undefined
9696
// Use for-of to work with entries iterators
9797
for (const { 0: key, 1: value } of qualifiersToEntries(rawQualifiers)) {
98-
const strValue = typeof value === 'string' ? value : String(value)
98+
// Only coerce primitive types — reject objects/functions that could
99+
// execute arbitrary code via toString() during coercion.
100+
const strValue =
101+
typeof value === 'string'
102+
? value
103+
: typeof value === 'number' || typeof value === 'boolean'
104+
? `${value}`
105+
: ''
99106
const trimmed = StringPrototypeTrim(strValue)
100107
// A key=value pair with an empty value is the same as no key/value
101108
// at all for this key
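
A small standalone sketch of the coercion rule in this hunk, to show why `String(value)` was replaced. The helper name `coerceQualifierValue` and the `hostile` object are hypothetical. The branch logic mirrors the added lines.

```typescript
// Hypothetical helper mirroring the ternary added in the hunk above.
function coerceQualifierValue(value: unknown): string {
  if (typeof value === 'string') {
    return value
  }
  if (typeof value === 'number' || typeof value === 'boolean') {
    // Template-literal coercion of a primitive cannot run user code
    return `${value}`
  }
  // Objects, functions, symbols, bigint, null, undefined: treated as empty
  return ''
}

// An object whose toString() has a side effect. String(hostile) would run it.
let sideEffects = 0
const hostile = {
  toString(): string {
    sideEffects += 1
    return 'pwned'
  },
}

const fromNumber = coerceQualifierValue(42)
const fromBoolean = coerceQualifierValue(true)
const fromHostile = coerceQualifierValue(hostile)
```

Because the empty string is later treated as "no value for this key", a non-primitive qualifier value is dropped silently rather than raising an error.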

src/package-url.ts

Lines changed: 7 additions & 1 deletion
@@ -508,7 +508,13 @@ class PackageURL {
508508
}
509509
}
510510
const purl = new PackageURL(...PackageURL.parseString(purlStr))
511-
// Cache the result for future lookups
511+
// Eagerly populate the toString cache before freezing
512+
purl.toString()
513+
// Deep freeze the instance and its nested qualifiers object to prevent
514+
// cache poisoning via mutation of shared cached instances.
515+
recursiveFreeze(purl)
516+
// Cache the frozen result for future lookups — freezing prevents
517+
// cache poisoning via property mutation on shared instances.
512518
if (typeof purlStr === 'string') {
513519
if (flyweightCache.size >= FLYWEIGHT_CACHE_MAX) {
514520
// Evict oldest entry (first key in Map iteration order)
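
The ordering in this hunk (memoize, then freeze, then cache) can be sketched outside the class. Everything below is an assumption made for illustration: the toy `Purl` class, its plain-property memo slot, the `deepFreeze` helper standing in for `recursiveFreeze`, and the one-line parser. The real `PackageURL` internals are not shown in the diff.

```typescript
const CACHE_MAX = 1024
const cache = new Map<string, Purl>()

// Toy stand-in for PackageURL, with an ordinary property as the toString memo.
class Purl {
  readonly type: string
  readonly name: string
  readonly qualifiers: Record<string, string>
  private cachedString: string | undefined = undefined

  constructor(type: string, name: string, qualifiers: Record<string, string>) {
    this.type = type
    this.name = name
    this.qualifiers = qualifiers
  }

  toString(): string {
    if (this.cachedString === undefined) {
      // On a frozen instance this write would fail, hence the eager call below
      this.cachedString = `pkg:${this.type}/${this.name}`
    }
    return this.cachedString
  }
}

// Hypothetical helper: freeze an object and every nested object it owns.
function deepFreeze(obj: object): void {
  const record = obj as Record<string, unknown>
  for (const key of Object.keys(record)) {
    const value = record[key]
    if (typeof value === 'object' && value !== null && !Object.isFrozen(value)) {
      deepFreeze(value)
    }
  }
  Object.freeze(obj)
}

function fromString(purlStr: string): Purl {
  const cached = cache.get(purlStr)
  if (cached !== undefined) {
    return cached
  }
  // Toy parser: accepts only "pkg:<type>/<name>"
  const match = /^pkg:([^/]+)\/([^?#]+)$/.exec(purlStr)
  if (match === null) {
    throw new Error(`Unsupported purl: ${purlStr}`)
  }
  const purl = new Purl(match[1]!, match[2]!, {})
  purl.toString() // populate the memo while the instance is still writable
  deepFreeze(purl) // shared instances must not be mutable by any caller
  if (cache.size >= CACHE_MAX) {
    const oldest = cache.keys().next().value
    if (oldest !== undefined) {
      cache.delete(oldest)
    }
  }
  cache.set(purlStr, purl)
  return purl
}
```

Since every caller that parses the same string receives the same object, a single unfrozen instance would let one caller change what all others see. That is the cache-poisoning risk the comments in the hunk describe.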

src/purl-types/conda.ts

Lines changed: 2 additions & 1 deletion
@@ -116,8 +116,9 @@ export async function condaExists(
116116

117117
const fetchResult = async (): Promise<ExistsResult> => {
118118
try {
119+
const encodedChannel = encodeComponent(channelName)
119120
const encodedName = encodeComponent(name)
120-
const url = `https://api.anaconda.org/package/${channelName}/${encodedName}`
121+
const url = `https://api.anaconda.org/package/${encodedChannel}/${encodedName}`
121122

122123
const data = await httpJson<{
123124
latest_version?: string

src/purl-types/docker.ts

Lines changed: 4 additions & 1 deletion
@@ -107,7 +107,10 @@ export async function dockerExists(
107107

108108
const fetchResult = async (): Promise<ExistsResult> => {
109109
try {
110-
const encodedRepo = encodeComponent(repo)
110+
// Encode each path segment separately to preserve the / delimiter
111+
const encodedRepo = namespace
112+
? `${encodeComponent(namespace)}/${encodeComponent(name)}`
113+
: encodeComponent(name)
111114
const url = `https://hub.docker.com/v2/repositories/${encodedRepo}`
112115

113116
const data = await httpJson<{
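
To illustrate why the repository path is now encoded per segment, here is a minimal sketch. It assumes `encodeComponent` behaves like the standard `encodeURIComponent`, which the diff does not confirm. `dockerRepoPath` is a hypothetical helper name.

```typescript
// Hypothetical helper; encodeURIComponent stands in for the repo's encodeComponent.
function dockerRepoPath(namespace: string | undefined, name: string): string {
  // Encode each segment on its own so the "/" between them stays a delimiter
  return namespace
    ? `${encodeURIComponent(namespace)}/${encodeURIComponent(name)}`
    : encodeURIComponent(name)
}

// Encoding the joined string escapes the delimiter and yields a single segment
const wholeString = encodeURIComponent('library/nginx')

// Per-segment encoding keeps the two-segment shape the API path expects
const perSegment = dockerRepoPath('library', 'nginx')

// A slash smuggled into one component is escaped and cannot add path segments
const smuggled = dockerRepoPath('a/b', 'img')
```

The same reasoning applies to the conda hunk, where the channel name was previously interpolated into the URL without any encoding.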

src/purl-types/golang.ts

Lines changed: 8 additions & 5 deletions
@@ -33,6 +33,7 @@ import { httpJson } from '@socketsecurity/lib/http-request'
3333
import { PurlError } from '../error.js'
3434
import {
3535
ArrayPrototypeJoin,
36+
encodeComponent,
3637
StringPrototypeCharCodeAt,
3738
StringPrototypeIncludes,
3839
StringPrototypeReplace,
@@ -108,10 +109,12 @@ export async function golangExists(
108109
// Go proxy uses case-encoded paths where uppercase letters are !lowercase
109110
const parts = StringPrototypeSplit(modulePath, '/' as any)
110111
for (let i = 0; i < parts.length; i++) {
111-
parts[i] = StringPrototypeReplace(
112-
parts[i]!,
113-
/[A-Z]/g,
114-
letter => `!${StringPrototypeToLowerCase(letter)}`,
112+
parts[i] = encodeComponent(
113+
StringPrototypeReplace(
114+
parts[i]!,
115+
/[A-Z]/g,
116+
letter => `!${StringPrototypeToLowerCase(letter)}`,
117+
),
115118
)
116119
}
117120
const encodedPath = ArrayPrototypeJoin(parts, '/')
@@ -126,7 +129,7 @@ export async function golangExists(
126129
const latestVersion = data.Version
127130

128131
if (version) {
129-
const versionUrl = `https://proxy.golang.org/${encodedPath}/@v/${version}.info`
132+
const versionUrl = `https://proxy.golang.org/${encodedPath}/@v/${encodeComponent(version)}.info`
130133
try {
131134
await httpJson(versionUrl)
132135
} catch {
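
The two golang changes combine case-encoding with percent-encoding. A rough standalone sketch follows, again assuming `encodeComponent` is equivalent to `encodeURIComponent`. The function names are hypothetical, and the module path in the example is only sample input.

```typescript
// Case-encode then percent-encode each path segment of a Go module path.
function encodeGoModulePath(modulePath: string): string {
  return modulePath
    .split('/')
    .map(part =>
      // Uppercase letters become "!" + lowercase, per the Go proxy convention.
      // encodeURIComponent leaves "!" untouched, so the marker survives.
      encodeURIComponent(part.replace(/[A-Z]/g, letter => `!${letter.toLowerCase()}`)),
    )
    .join('/')
}

// Build the version-info URL with the version percent-encoded as well.
function goVersionInfoUrl(modulePath: string, version: string): string {
  return `https://proxy.golang.org/${encodeGoModulePath(modulePath)}/@v/${encodeURIComponent(version)}.info`
}

const encodedPath = encodeGoModulePath('github.com/Azure/azure-sdk-for-go')
const infoUrl = goVersionInfoUrl('github.com/Azure/azure-sdk-for-go', 'v1.0.0+incompatible')
```

Order matters here: case-encoding runs first, and the percent-encoding step is safe to apply afterwards only because `!` is outside the set of characters that `encodeURIComponent` escapes.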

src/purl-types/npm.ts

Lines changed: 8 additions & 0 deletions
@@ -20,6 +20,7 @@ import {
2020
StringPrototypeTrim,
2121
} from '../primordials.js'
2222
import { isBlank, lowerName, lowerNamespace } from '../strings.js'
23+
import { validateNoInjectionByType } from '../validate.js'
2324

2425
import type { TtlCache } from '@socketsecurity/lib/cache-with-ttl'
2526

@@ -434,6 +435,13 @@ export function parseNpmSpecifier(specifier: unknown): NpmPackageComponents {
434435
*/
435436
export function validate(purl: PurlObject, throws: boolean): boolean {
436437
const { name, namespace } = purl
438+
// Validate name and namespace for injection characters
439+
if (!validateNoInjectionByType('npm', 'name', name, throws)) {
440+
return false
441+
}
442+
if (!validateNoInjectionByType('npm', 'namespace', namespace, throws)) {
443+
return false
444+
}
437445
const hasNs = namespace && namespace.length > 0
438446
const id = getNpmId(purl)
439447
const code0 = StringPrototypeCharCodeAt(id, 0)

src/strings.ts

Lines changed: 154 additions & 21 deletions
@@ -203,55 +203,187 @@ function replaceUnderscoresWithDashes(str: string): string {
203203
* space (0x20), DEL (0x7f)
204204
*/
205205
function isInjectionCharCode(code: number): boolean {
206-
// C0 control characters (0x00-0x1f) — includes NUL, tab, newline, CR,
207-
// ESC (0x1b, terminal escape sequences), and all other control chars.
208-
// Also catches vertical tab (0x0b), form feed (0x0c), and bell (0x07)
209-
// which can be used for log injection and terminal manipulation.
206+
// C0 control characters (0x00-0x1f)
210207
if (code <= 0x1f) {
211208
return true
212209
}
213210
// biome-ignore format: newlines
214211
if (
215-
// space — argument splitting in shell contexts
212+
// space
216213
code === 0x20 ||
217-
// " — breaks double-quoted shell/SQL/URL contexts
214+
// !
215+
code === 0x21 ||
216+
// "
218217
code === 0x22 ||
219-
// # — URL fragment injection, shell comments
218+
// #
220219
code === 0x23 ||
221-
// $ — shell variable expansion, command substitution $()
220+
// $
222221
code === 0x24 ||
223-
// & — shell background execution, URL parameter delimiter
222+
// %
223+
code === 0x25 ||
224+
// &
224225
code === 0x26 ||
225-
// ' — breaks single-quoted shell/SQL contexts
226+
// '
226227
code === 0x27 ||
227-
// ( — shell subshell, command grouping
228+
// (
228229
code === 0x28 ||
229-
// ) — shell subshell, command grouping
230+
// )
230231
code === 0x29 ||
231-
// ; — shell command separator
232+
// *
233+
code === 0x2a ||
234+
// ;
232235
code === 0x3b ||
233-
// < — shell input redirection, XML/HTML injection
236+
// <
234237
code === 0x3c ||
235-
// > — shell output redirection, XML/HTML injection
238+
// =
239+
code === 0x3d ||
240+
// >
236241
code === 0x3e ||
237-
// \ — shell escape character, path traversal on Windows
242+
// ?
243+
code === 0x3f ||
244+
// [
245+
code === 0x5b ||
246+
// \
238247
code === 0x5c ||
239-
// ` — shell command substitution (legacy backtick form)
248+
// ]
249+
code === 0x5d ||
250+
// `
240251
code === 0x60 ||
241-
// { — shell brace expansion
252+
// {
242253
code === 0x7b ||
243-
// | — shell pipe
254+
// |
244255
code === 0x7c ||
245-
// } — shell brace expansion
256+
// }
246257
code === 0x7d ||
247-
// DEL (0x7f) — control character, terminal manipulation
258+
// ~
259+
code === 0x7e ||
260+
// DEL
248261
code === 0x7f
249262
) {
250263
return true
251264
}
265+
// C1 control characters (0x80-0x9f)
266+
if (code >= 0x80 && code <= 0x9f) {
267+
return true
268+
}
269+
// Unicode dangerous characters
270+
// biome-ignore format: newlines
271+
if (
272+
// Zero-width space
273+
code === 0x200b ||
274+
// Zero-width non-joiner
275+
code === 0x200c ||
276+
// Zero-width joiner
277+
code === 0x200d ||
278+
// Left-to-right mark
279+
code === 0x200e ||
280+
// Right-to-left mark
281+
code === 0x200f ||
282+
// Left-to-right embedding
283+
code === 0x202a ||
284+
// Right-to-left embedding
285+
code === 0x202b ||
286+
// Pop directional formatting
287+
code === 0x202c ||
288+
// Left-to-right override
289+
code === 0x202d ||
290+
// Right-to-left override
291+
code === 0x202e ||
292+
// Word joiner
293+
code === 0x2060 ||
294+
// BOM / zero-width no-break space
295+
code === 0xfeff ||
296+
// Object replacement character
297+
code === 0xfffc ||
298+
// Replacement character
299+
code === 0xfffd
300+
) {
301+
return true
302+
}
303+
return false
304+
}
305+
306+
/**
307+
* Test whether a character code enables command execution.
308+
*
309+
* A narrower scanner than isInjectionCharCode, targeting characters that
310+
* enable shell command execution and code injection. Allows characters
311+
* that are legitimate in version strings and URL-based qualifier values
312+
* (like !, +, ?, &, =, %, :, /, #, space) while still blocking the
313+
* most dangerous execution vectors.
314+
*
315+
* Used for version, subpath, and qualifier value validation where the
316+
* full injection scanner would cause false positives.
317+
*/
318+
function isCommandInjectionCharCode(code: number): boolean {
319+
// C0 control characters except tab (0x09) — tab is used in some
320+
// version metadata but other controls are never legitimate
321+
if (code <= 0x1f && code !== 0x09) {
322+
return true
323+
}
324+
// biome-ignore format: newlines
325+
if (
326+
// $ — command substitution $()
327+
code === 0x24 ||
328+
// ; — command separator
329+
code === 0x3b ||
330+
// < — input redirection
331+
code === 0x3c ||
332+
// > — output redirection
333+
code === 0x3e ||
334+
// \ — escape character
335+
code === 0x5c ||
336+
// ` — command substitution (backtick form)
337+
code === 0x60 ||
338+
// | — pipe
339+
code === 0x7c ||
340+
// DEL
341+
code === 0x7f
342+
) {
343+
return true
344+
}
345+
// C1 control characters
346+
if (code >= 0x80 && code <= 0x9f) {
347+
return true
348+
}
349+
// Unicode dangerous characters (same set as isInjectionCharCode)
350+
// biome-ignore format: newlines
351+
if (
352+
code === 0x200b ||
353+
code === 0x200c ||
354+
code === 0x200d ||
355+
code === 0x200e ||
356+
code === 0x200f ||
357+
code === 0x202a ||
358+
code === 0x202b ||
359+
code === 0x202c ||
360+
code === 0x202d ||
361+
code === 0x202e ||
362+
code === 0x2060 ||
363+
code === 0xfeff ||
364+
code === 0xfffc ||
365+
code === 0xfffd
366+
) {
367+
return true
368+
}
252369
return false
253370
}
254371

372+
/**
373+
* Find the first command injection character in a string.
374+
* Like findInjectionCharCode but uses the narrower command injection set.
375+
* Returns the character code found, or -1.
376+
*/
377+
function findCommandInjectionCharCode(str: string): number {
378+
for (let i = 0, { length } = str; i < length; i += 1) {
379+
const code = StringPrototypeCharCodeAt(str, i)
380+
if (isCommandInjectionCharCode(code)) {
381+
return code
382+
}
383+
}
384+
return -1
385+
}
386+
255387
/**
256388
* Find the first injection character in a string.
257389
* Returns the character code of the first dangerous character found, or -1.
@@ -310,6 +442,7 @@ function trimLeadingSlashes(str: string): string {
310442

311443
export {
312444
containsInjectionCharacters,
445+
findCommandInjectionCharCode,
313446
findInjectionCharCode,
314447
formatInjectionChar,
315448
isBlank,
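
The narrower scanner added in this file can be condensed into a table-driven sketch that is easier to try out. The character sets are copied from the hunk above. The `Set`-based layout and the constant names `COMMAND_CHARS` and `UNICODE_DANGEROUS` are assumptions and differ from the `if`-chain style the source uses.

```typescript
// Shell-execution characters from the hunk: $ ; < > \ ` |
const COMMAND_CHARS = new Set<number>([0x24, 0x3b, 0x3c, 0x3e, 0x5c, 0x60, 0x7c])

// Zero-width, bidi-control, BOM, and replacement characters from the hunk
const UNICODE_DANGEROUS = new Set<number>([
  0x200b, 0x200c, 0x200d, 0x200e, 0x200f,
  0x202a, 0x202b, 0x202c, 0x202d, 0x202e,
  0x2060, 0xfeff, 0xfffc, 0xfffd,
])

function isCommandInjectionCharCode(code: number): boolean {
  // C0 controls except tab
  if (code <= 0x1f && code !== 0x09) {
    return true
  }
  // DEL (0x7f) plus C1 controls (0x80-0x9f) form one contiguous range
  if (code >= 0x7f && code <= 0x9f) {
    return true
  }
  return COMMAND_CHARS.has(code) || UNICODE_DANGEROUS.has(code)
}

// Returns the first offending character code, or -1 if the string is clean.
function findCommandInjectionCharCode(str: string): number {
  for (let i = 0; i < str.length; i += 1) {
    const code = str.charCodeAt(i)
    if (isCommandInjectionCharCode(code)) {
      return code
    }
  }
  return -1
}
```

With this set, a version such as `1.0.0+build` or a qualifier value containing `?`, `&`, `=`, or spaces passes, while `;`, backticks, newlines, and bidi overrides are reported.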
