SA-653/migrate sic data access to library#70
Merged
Conversation
11 tasks
15f4065 to
817e8fd
Compare
gibbardsteve
left a comment
Contributor
There was a problem hiding this comment.
See comments against sic-classification-library. Happy to approve the PRs
- use load_sic_hierarchy instead of utils sic_data_access and load_hierarchy
- use load_sic_hierarchy in _prompt_candidate instead of utils sic_data_access and load_hierarchy
…for SA-653 - patch load_sic_hierarchy once instead of separate index, structure and hierarchy mocks
Data access now lives in sic-classification-library; drop the old module and its tests.
817e8fd to
4e5686e
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
📌 Pull Request Template
✨ Summary
Consumes SIC workbook data access from
sic-classification-library(SA-653) and removes the duplicatesic_data_accessmodule from utils. LLM, prompt, andsic_specific_embedimportindustrial_classification.data_access.sic_data_accessinstead, behaviour and config tuples are unchanged).📜 Changes Introduced
Feature implementation (feat:) / bug fix (fix:) / refactoring (chore:) / documentation (docs:) / testing (test:)
Updates to tests and/or documentation
Terraform changes (if applicable) — N/A
Removed
src/industrial_classification_utils/utils/sic_data_access.pyandtests/test_sic_data_access.py(coverage lives in librarytests/test_data_access.py).llm/llm.py:load_sic_hierarchyfrom library for lazy hierarchy build in_prompt_candidate.llm/prompt.py:load_sic_indexfrom library (import-time index load unchanged).embed/sic_specific_embed.py:load_sic_hierarchyfrom library for vector-store build helper.tests/test_embedding.py: patchload_sic_hierarchyinstead of separate index/structure/hierarchy mocks.docs/utils.md,README.md: point to librarysic_data_accessmodule.pyproject.toml/poetry.lock: pinsic-classification-libraryv0.1.5; utils version 0.1.14.Unchanged
.xlsxpaths underindustrial_classification_utils.data.sic_index.get_default_config()lookup tuples and flat-fileEmbeddingHandlerbehaviour.✅ Checklist
terraform fmt&terraform validate)🔍 How to Test
Unit tests (this repo only)
Verified:
2 passedImport-time prompt load (loads full SIC index at module import):
poetry run python -c "from industrial_classification_utils.llm import prompt; print(len(prompt.sic_index))"Expected: large row count (e.g. ~15k), no import error.
Broader regression (optional):
poetry run pytest tests/test_embedding.py -q make check-python # if used in this repo