feat: package registry providers + 0.4.0 release #48
Adds 22 method pairs for seeding test databases of package registries (PyPI, npm, Maven, Cargo, RubyGems). Cross-ecosystem primitives share one API; ecosystem-specific shapes have their own methods where the canonical form genuinely differs.

New providers:
- commit_sha / short_commit_sha
- semver / semver_prerelease / calver
- spdx_license (50 common IDs)
- git_username (strict GitHub rules)
- pypi_version (PEP 440), maven_version (with qualifiers)
- pypi/npm/cargo/maven/gem version constraints
- pypi/npm/cargo/gem package names, maven group/artifact/coordinate
- pypi_requirement (full pip-install line)

Bumps version to 0.4.0, updates README, ARCHITECTURE.md, CHANGELOG.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Current word lists can't produce a username long enough to enter the truncation branch, so this fix is defensive rather than bug-chasing. The invariant was wrong, though: if a future data entry were a run of hyphens, the pop loop would empty the string and we'd return "". Add a length guard so at least one character always remains.

Found in review of PR branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
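For reviewers who want to see the invariant in isolation, here is a minimal Python sketch of the truncate-then-pop logic with the length guard. The function name `clamp_username` and the constant `MAX_LEN` are hypothetical and not taken from the Rust source; the 39-character limit is the one quoted in the PR summary.

```python
MAX_LEN = 39  # GitHub username length limit quoted in the PR summary


def clamp_username(raw: str, max_len: int = MAX_LEN) -> str:
    """Truncate to max_len, then strip trailing hyphens without emptying the string.

    Hypothetical helper: it illustrates the guard, not the actual Rust code.
    """
    name = raw[:max_len]
    # The `len(name) > 1` condition is the guard. Without it, an input made
    # only of hyphens would be popped down to the empty string.
    while len(name) > 1 and name.endswith("-"):
        name = name[:-1]
    return name


print(clamp_username("a" * 38 + "--b"))  # truncation leaves a trailing hyphen, which is popped
print(repr(clamp_username("-" * 45)))    # guard keeps one character instead of ""
```

Note that the guard only preserves non-emptiness for non-empty input. A lone `-` is still not a valid username, so the sketch mirrors the "defensive" framing of the commit rather than a full validity check.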
Pull request overview
This PR introduces a new “package registry data” provider to generate realistic fake registry identifiers (PyPI, npm, Maven, Cargo, RubyGems) across both the Rust and Python APIs, and rolls the project version forward to 0.4.0 with updated documentation/changelog entries.
Changes:
- Add a new Rust provider (`packages`) plus PyO3-exposed `Faker` methods and Python module-level convenience functions for package-registry primitives, versions, constraints, and identifiers.
- Add comprehensive Rust + Python test coverage for the new generators, including determinism and parallel-shape checks.
- Bump version to `0.4.0` and update README/ARCHITECTURE/CHANGELOG to document the new provider and backfill prior shipped features.
Reviewed changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `src/providers/packages.rs` | Implements package-registry generators (SHAs, versions, constraints, package IDs, requirements) with unit + proptest coverage. |
| `src/providers/mod.rs` | Registers the new `packages` provider module. |
| `src/lib.rs` | Exposes new provider methods on `Faker` (Rust API) and adds corresponding PyO3 Python-visible methods. |
| `src/data/en_us/packages.rs` | Adds wordlists for package keywords/modifiers, Maven components, npm scopes, prerelease tags, qualifiers. |
| `src/data/en_us/spdx_licenses.rs` | Adds curated list of common SPDX identifiers for generation. |
| `src/data/en_us/mod.rs` | Wires new `packages` and `spdx_licenses` datasets into the locale data exports. |
| `python/forgery/__init__.py` | Adds module-level convenience functions and exports; bumps `__version__` to 0.4.0. |
| `python/forgery/__init__.pyi` | Adds type stubs for the new module-level convenience functions. |
| `tests/test_packages.py` | Adds Python test suite for new package-registry APIs (shape checks, determinism, parallel invariants, convenience functions). |
| `README.md` | Documents the new “Package Registry Data” generator section and example usage. |
| `ARCHITECTURE.md` | Adds provider/data-file documentation entries for packages + SPDX data. |
| `CHANGELOG.md` | Adds 0.4.0 section and updates comparison links/backfills. |
| `Cargo.toml` | Bumps crate version to 0.4.0. |
| `Cargo.lock` | Updates lockfile to reflect crate version bump. |
| `pyproject.toml` | Bumps Python package version to 0.4.0. |
Response to Copilot review on PR #48.

The docstring and README claimed "PEP 503 normalised" but the generator was emitting underscores ~20% of the time. PEP 503 §Normalized Names collapses runs of `[-_.]+` to a single `-`, so normalized output must contain only `[a-z0-9-]`.

Changes:
- Drop the underscore branch from generate_pypi_package_name; hyphen is the sole separator. Collapse the two now-identical `py-{primary}` match arms.
- Tighten the Rust test to reject any char outside [a-z0-9-] and to enforce no leading/trailing/double hyphens.
- Tighten the Python regex to `^[a-z0-9]([a-z0-9-]*[a-z0-9])?$` and add explicit assertions against `_` and `.`.
- Revert the README / CHANGELOG softening from the previous iteration; the "PEP 503 normalised" claim is now accurate.
- Update the Rust + Python docstrings to explain the normalization.

Also addresses three other Copilot findings:
- Add the 44 missing Faker-class stubs to python/forgery/_forgery.pyi so IDE autocomplete and type checking work for callers using the Faker class directly.
- Fix a broken assertion in test_sometimes_has_qualifier: the check `"." in v.split(".")[-1]` was tautologically false (the last split segment never contains a dot). Replace with `v.count(".") > 2`, which correctly identifies dot-separated Maven qualifiers like `.Final` / `.RELEASE`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
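Two of the points in this commit can be reproduced with the Python standard library alone. The sketch below applies the PEP 503 normalization rule, checks names against the tightened regex quoted above, and contrasts the old and new Maven qualifier checks. The function names (`normalize`, `is_normalized`, `old_check`, `has_dot_qualifier`) are hypothetical and are not the names used in the test suite.

```python
import re


def normalize(name: str) -> str:
    """PEP 503 normalization: collapse runs of '-', '_', '.' to one '-', then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


# Shape regex quoted in the commit message.
NORMALIZED_SHAPE = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?$")


def is_normalized(name: str) -> bool:
    """True if the name is already in normalized form and matches the shape regex."""
    return normalize(name) == name and NORMALIZED_SHAPE.match(name) is not None


def old_check(version: str) -> bool:
    """The broken assertion: the last '.'-split segment can never contain a dot."""
    return "." in version.split(".")[-1]


def has_dot_qualifier(version: str) -> bool:
    """The replacement check; assumes a three-part numeric base such as 1.2.3."""
    return version.count(".") > 2


print(normalize("Py_Requests"), is_normalized("py_requests"))
print(old_check("1.2.3.Final"), has_dot_qualifier("1.2.3.Final"))
```

The regex on its own still admits a doubled hyphen such as `a--b`, which is why the sketch also compares the name with its normalized form, in the same spirit as the separate "no double hyphens" assertion mentioned above.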
Nine methods that select from combinatorial patterns now accept unique=True to guarantee no duplicates, matching the names(n, unique=True) contract used elsewhere in the library. Useful for seeding registry tables that have a unique-name constraint.

Methods with unique support:
- pypi_package_names, npm_package_names, cargo_package_names, gem_names
- maven_group_ids, maven_artifact_ids, maven_coordinates
- git_usernames
- spdx_licenses (capped at 50, the pool size)

Also:
- PACKAGE_KEYWORDS: 94 -> 245 entries
- PACKAGE_MODIFIERS: 32 -> 67 entries

Combinatorial headroom is now ~77k distinct pypi names, ~1.9M distinct git usernames, and millions for maven coordinates, which leaves plenty of room before UniqueExhaustedError hits for realistic batch sizes.

Implementation:
- New batch_simple_unique! macro mirrors batch_locale_unique! for generators that don't take locale; wraps them in a closure that ignores the locale argument generate_unique passes.
- Return type shifts from Result<_, BatchSizeError> to Result<_, ForgeryError> for the affected methods, since ForgeryError covers both batch-size and unique-exhaustion failures.
- Python signatures gain `unique: bool = False` (keyword-compatible, so existing calls keep working).
- Both .pyi stub files updated.
- New TestUnique pytest class covers all 9 methods: no-duplicates, determinism under seed, exhaustion error, non-unique path unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
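The `unique=True` contract described above can be illustrated with a small stand-alone Python sketch. Everything here is an assumption for illustration: the word lists are tiny placeholders, `package_names` is a hypothetical function, and `UniqueExhaustedError` is a local stand-in for the library's error of the same name. The sketch enumerates the whole pool up front, which the real `generate_unique` path may well not do.

```python
import random
from itertools import product


class UniqueExhaustedError(Exception):
    """Local stand-in for the library error of the same name."""


# Placeholder word lists; the real PACKAGE_KEYWORDS / PACKAGE_MODIFIERS are far larger.
KEYWORDS = ["http", "json", "cache", "auth"]
MODIFIERS = ["lite", "core", "utils"]


def package_names(n: int, unique: bool = False, seed: int = 0) -> list[str]:
    """Return n names built from keyword-modifier pairs, optionally without duplicates."""
    rng = random.Random(seed)  # seeded so that output is deterministic
    pool = [f"{k}-{m}" for k, m in product(KEYWORDS, MODIFIERS)]
    if not unique:
        # Non-unique path: independent draws, duplicates allowed.
        return [rng.choice(pool) for _ in range(n)]
    if n > len(pool):
        # Asking for more distinct names than the pool holds cannot succeed.
        raise UniqueExhaustedError(f"requested {n}, pool has {len(pool)}")
    return rng.sample(pool, n)


names = package_names(5, unique=True, seed=42)
print(names, len(set(names)) == len(names))
```

With these placeholder lists the pool holds 4 × 3 = 12 names, so a request for 13 unique names raises, which is the same shape of failure the commit caps at 50 for `spdx_licenses`.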
The v0.4.0 release workflow failed at the Publish step because twine
rejected sbom.cdx.json as an InvalidDistribution. The first download
step used `merge-multiple: true` with no filter, which pulled every
workflow artifact — including the SBOM — into dist/. pypa/gh-action-pypi-publish
v1.13.0 then ran twine against dist/* and choked on the JSON file:
Checking dist/sbom.cdx.json: ERROR InvalidDistribution:
Unknown distribution format: 'sbom.cdx.json'
Fix: replace the single "download all" step with three targeted steps.
Wheels (pattern wheels-*) and sdist (name: sdist) land in dist/; the
SBOM (name: sbom, no path) lands in the workflow root where the later
`gh release upload` step already expects it. twine now only sees valid
distribution files.
Noticed on PR #48's v0.4.0 release run; nothing ever made it to PyPI
and the GitHub release assets never populated. Re-tagging v0.4.0 after
this merges will go through cleanly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
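To make the failure mode concrete, here is a rough Python sketch that separates files twine would plausibly accept from strays such as the SBOM. It is a simplified stand-in based only on filename suffixes, not twine's actual validation logic. The function name `split_artifacts`, the suffix list, and the wheel filename are assumptions for illustration.

```python
from pathlib import PurePosixPath

# Assumed set of distribution suffixes; twine's real checks inspect file contents too.
DIST_SUFFIXES = (".whl", ".tar.gz", ".zip")


def split_artifacts(paths: list[str]) -> tuple[list[str], list[str]]:
    """Partition paths into (uploadable, stray) by filename suffix."""
    uploadable: list[str] = []
    stray: list[str] = []
    for path in paths:
        name = PurePosixPath(path).name
        if name.endswith(DIST_SUFFIXES):
            uploadable.append(path)
        else:
            stray.append(path)
    return uploadable, stray


# Roughly what dist/ looked like after the unfiltered download (filenames illustrative).
downloaded = [
    "dist/forgery-0.4.0-cp312-cp312-manylinux_2_17_x86_64.whl",
    "dist/forgery-0.4.0.tar.gz",
    "dist/sbom.cdx.json",
]
ok, stray = split_artifacts(downloaded)
print("stray files:", stray)
```

The workflow fix avoids the problem at the source by never downloading the SBOM into `dist/`, so a check like this would only serve as an extra guard.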



Summary
- Version bump `0.3.0` → `0.4.0` across `Cargo.toml`, `pyproject.toml`, `__version__`.
- Backfilled previously shipped features into the `0.4.0` changelog section; they shipped on `main` but weren't in the changelog.

What's in the new provider
- Cross-ecosystem primitives: `commit_sha` / `short_commit_sha`, `semver` / `semver_prerelease`, `calver`, `spdx_license` (50 common IDs), `git_username` (enforces GitHub's rules: alphanumerics + single hyphens, no leading/trailing hyphen, no consecutive hyphens, ≤ 39 chars).
- Ecosystem-specific versions: `pypi_version` (PEP 440, including pre/post/dev releases), `maven_version` (with qualifiers `-SNAPSHOT`, `.RELEASE`, `.Final`, etc.).
- Version constraints (syntax differs per ecosystem): `pypi_version_specifier`, `npm_version_range`, `cargo_version_req`, `maven_version_range`, `gem_version_requirement`.
- Package identity: `pypi_package_name` (PEP 503-normalised), `npm_package_name` (plain or `@scope/pkg`), `cargo_package_name`, `gem_name`, `maven_group_id` / `maven_artifact_id` / `maven_coordinate` (GAV).
- Full lines: `pypi_requirement`, e.g. `requests>=2.0.0,<3.0.0`.

All batch methods are parallel-safe via `set_parallel()`.

Docs
- `README.md`: new "Package Registry Data" section under `## Available Generators` with a usage snippet
- `ARCHITECTURE.md`: provider table + en_us data directory
- `CHANGELOG.md`: `[0.4.0]` section dated `2026-04-17` with packages + backfilled features

Test plan
- `cargo test --lib`: 832 tests pass (30 new; unit + proptest)
- `pytest`: 1446 tests pass, 100% Python coverage
- `cargo clippy --all-targets -- -D warnings` clean
- `cargo fmt --check` clean
- `ruff check` + `ruff format --check` clean
- `mypy --strict` clean
- `bandit -r python/ -ll` clean
- `pypi_version` output checked against `packaging.version.Version`, and `pypi_version_specifier` against `packaging.specifiers.SpecifierSet`

🤖 Generated with Claude Code
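As a rough illustration of what a generated requirement line such as `requests>=2.0.0,<3.0.0` encodes, the following standard-library sketch splits the line into a name and version clauses and tests candidate versions against them. It is deliberately simplified: it handles only plain numeric versions with the same number of components and the five comparison operators shown, and it is no substitute for the `packaging` checks listed in the test plan. All function names are hypothetical.

```python
import operator
import re

OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}

NAME_AND_REST = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)(.*)$")
CLAUSE = re.compile(r"(>=|<=|==|>|<)\s*(\d+(?:\.\d+)*)")


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '2.31.0' into (2, 31, 0); numeric release segments only."""
    return tuple(int(part) for part in text.split("."))


def split_requirement(line: str) -> tuple[str, list[tuple[str, str]]]:
    """Split 'name>=a,<b' into the name and a list of (operator, version) clauses."""
    match = NAME_AND_REST.match(line.strip())
    if match is None:
        raise ValueError(f"not a requirement line: {line!r}")
    name, rest = match.groups()
    clauses = []
    for raw in filter(None, (part.strip() for part in rest.split(","))):
        clause = CLAUSE.fullmatch(raw)
        if clause is None:
            raise ValueError(f"unsupported clause: {raw!r}")
        clauses.append((clause.group(1), clause.group(2)))
    return name, clauses


def satisfies(version: str, clauses: list[tuple[str, str]]) -> bool:
    """True if the version meets every clause (tuple comparison, equal lengths assumed)."""
    candidate = parse_version(version)
    return all(OPS[op](candidate, parse_version(bound)) for op, bound in clauses)


name, clauses = split_requirement("requests>=2.0.0,<3.0.0")
print(name, clauses, satisfies("2.31.0", clauses))
```

Pre-, post-, and dev-release segments from PEP 440 are out of scope here, which is one reason the real validation relies on `packaging`.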