SA-653/migrate sic data access to library#70

Merged
dstewartons merged 14 commits into main from SA-653/migrate-sic-data-access-to-library on Jun 15, 2026

@dstewartons dstewartons commented Jun 5, 2026

Contributor

📌 Pull Request Template

Please complete all sections

✨ Summary

Consumes SIC workbook data access from sic-classification-library (SA-653) and removes the duplicate sic_data_access module from utils. LLM, prompt, and sic_specific_embed now import industrial_classification.data_access.sic_data_access instead; behaviour and config tuples are unchanged.

CI note: CI is expected to fail here. Until the dependency tags exist in the remote remerged repos, pyproject.toml temporarily uses local editable paths.

📜 Changes Introduced

  • Feature implementation (feat:) / bug fix (fix:) / refactoring (chore:) / documentation (docs:) / testing (test:)

  • Updates to tests and/or documentation

  • Terraform changes (if applicable) — N/A

  • Removed src/industrial_classification_utils/utils/sic_data_access.py and tests/test_sic_data_access.py (coverage lives in library tests/test_data_access.py).

  • llm/llm.py: load_sic_hierarchy from library for lazy hierarchy build in _prompt_candidate.

  • llm/prompt.py: load_sic_index from library (import-time index load unchanged).

  • embed/sic_specific_embed.py: load_sic_hierarchy from library for vector-store build helper.

  • tests/test_embedding.py: patch load_sic_hierarchy instead of separate index/structure/hierarchy mocks.

  • docs/utils.md, README.md: point to library sic_data_access module.

  • pyproject.toml / poetry.lock: pin sic-classification-library v0.1.5; utils version 0.1.14.
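The lazy hierarchy build mentioned for llm/llm.py can be sketched roughly as below. This is a minimal illustration, not the code in this PR: `LazyHierarchy` and `fake_loader` are hypothetical names, and `fake_loader` only stands in for the library's `load_sic_hierarchy`, whose real signature is not shown here.

```python
from typing import Callable, Optional


class LazyHierarchy:
    """Defer an expensive load until first use, then reuse the result."""

    def __init__(self, loader: Callable[[], dict]) -> None:
        # Hypothetical stand-in for the library's load_sic_hierarchy.
        self._loader = loader
        self._hierarchy: Optional[dict] = None
        self.load_count = 0

    def get(self) -> dict:
        # Build the hierarchy only on the first call; later calls reuse it.
        if self._hierarchy is None:
            self._hierarchy = self._loader()
            self.load_count += 1
        return self._hierarchy


def fake_loader() -> dict:
    # Placeholder data; the real loader reads the packaged SIC workbooks.
    return {"00000": "placeholder description"}


lazy = LazyHierarchy(fake_loader)
first = lazy.get()
second = lazy.get()
```

In contrast to the import-time index load in llm/prompt.py, nothing is read until `get()` is first called, so importing the module stays cheap.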

Unchanged

  • Packaged .xlsx paths under industrial_classification_utils.data.sic_index.
  • get_default_config() lookup tuples and flat-file EmbeddingHandler behaviour.

✅ Checklist

Please confirm you've completed these checks before requesting a review.

  • Code is formatted using Black
  • Imports are sorted using isort
  • Code passes linting with Ruff, Pylint, and Mypy
  • Security checks pass using Bandit
  • API and Unit tests are written and pass using pytest
  • Terraform files (if applicable) follow best practices and have been validated (terraform fmt & terraform validate)
  • DocStrings follow Google-style and are added as per Pylint recommendations
  • Documentation has been updated if needed

🔍 How to Test

Unit tests (this repo only)

cd sic-classification-utils
poetry install
poetry run pytest tests/test_embedding.py -k "sic_index_files" -q

Verified: 2 passed

Import-time prompt load (loads full SIC index at module import):

poetry run python -c "from industrial_classification_utils.llm import prompt; print(len(prompt.sic_index))"

Expected: large row count (e.g. ~15k), no import error.
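The same import-time check could be wrapped in a small helper, for example for use in a smoke test. This is only a sketch: `index_size` is a hypothetical name, and the module and attribute names in the usage note are taken from the command above.

```python
import importlib


def index_size(module_name: str, attribute: str) -> int:
    """Import a module by name and return the length of one of its attributes."""
    # Importing triggers any import-time loading the module performs.
    module = importlib.import_module(module_name)
    return len(getattr(module, attribute))
```

With this repo installed, `index_size("industrial_classification_utils.llm.prompt", "sic_index")` should return the same row count as the one-liner above, and raise ImportError if the library dependency is missing.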

Broader regression (optional):

poetry run pytest tests/test_embedding.py -q
make check-python   # if used in this repo

@dstewartons dstewartons requested review from gibbardsteve and ivyONS and removed request for ivyONS June 5, 2026 11:40
@dstewartons dstewartons force-pushed the SA-653/migrate-sic-data-access-to-library branch from 15f4065 to 817e8fd Compare June 5, 2026 12:02

@gibbardsteve gibbardsteve left a comment

Contributor

See comments against sic-classification-library. Happy to approve the PRs.

- use load_sic_hierarchy instead of utils sic_data_access and load_hierarchy
- use load_sic_hierarchy in _prompt_candidate instead of utils sic_data_access and load_hierarchy
…for SA-653

- patch load_sic_hierarchy once instead of separate index, structure and hierarchy mocks
Data access now lives in sic-classification-library; drop the old module and its tests.
@dstewartons dstewartons force-pushed the SA-653/migrate-sic-data-access-to-library branch from 817e8fd to 4e5686e Compare June 15, 2026 11:59
@dstewartons dstewartons merged commit 7f8303f into main Jun 15, 2026
5 checks passed
@dstewartons dstewartons deleted the SA-653/migrate-sic-data-access-to-library branch June 15, 2026 12:43
2 participants