
SA-753/new release with complete socmeta#53

Merged
dstewartons merged 8 commits into main from SA-753/new-release-with-complete-socmeta
Jun 23, 2026


@dstewartons dstewartons commented Jun 22, 2026


📌 Pull Request Template

Please complete all sections

✨ Summary

Release v0.1.6 of soc-classification-library with the full in-code SOCmeta map from SOC 2020 Volume 1 and removal of silent parent-group fallback in SocMeta.get_meta_by_code().

Previously, v0.1.5 shipped a stub metadata file (~13 entries). Missing unit codes silently fell back to major-group text, so lookup and classification appeared to work but returned broad parent descriptions instead of unit-level SOC 2020 detail.

After this merge, the soc-classification-library dependency will be updated in survey-assist-api and soc-classification-utils.

📜 Changes Introduced

  • Expanded src/occupational_classification/meta/soc_meta.py to the full SOC 2020 Volume 1 in-code map (~550 entries).

  • Removed silent parent fallback from SocMeta.get_meta_by_code(): it now matches exactly and returns {"error": ...} on a miss.

  • get_meta_by_code_exact() now delegates to get_meta_by_code() (same behaviour).

  • SOCLookup uses exact metadata lookup for code_meta and all hierarchy levels.

  • soc_hierarchy uses exact metadata lookup; missing entries get empty fields instead of parent substitution.

  • Version bump to 0.1.6 in pyproject.toml and CHANGELOG.md.

  • Added tests/test_soc_meta.py and extended lookup tests for unit-level metadata and missing-code behaviour.

  • Feature implementation (feat:) / bug fix (fix:) / refactoring (chore:) / documentation (docs:) / testing (test:)

  • Updates to tests and/or documentation

  • Terraform changes (if applicable) - N/A
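
The behaviour change described above can be illustrated with a minimal sketch. It does not reproduce the library's code: the `META` dict, the helper `get_meta_with_parent_fallback`, and the error message text are hypothetical stand-ins, and the fallback shown (truncating the code to its first digit) is only an assumed reading of how the v0.1.5 major-group fallback worked.

```python
"""Illustrative sketch only: a toy metadata map, not the real SocMeta."""

# Hypothetical two-entry map; the real in-code map has ~550 entries.
META = {
    "1": {"code": "1", "group_title": "Managers, directors and senior officials"},
    "1111": {"code": "1111", "group_title": "Chief executives and senior officials"},
}


def get_meta_with_parent_fallback(code: str) -> dict:
    """Assumed old behaviour: fall back to the major group (first digit) on a miss."""
    if code in META:
        return META[code]
    parent = META.get(code[:1])
    if parent is not None:
        return parent  # silently returns broader, parent-level text
    return {"error": f"No metadata for code {code}"}


def get_meta_by_code(code: str) -> dict:
    """New behaviour as described in this PR: exact match or an error dict."""
    if code in META:
        return META[code]
    return {"error": f"No metadata for code {code}"}


def get_meta_by_code_exact(code: str) -> dict:
    """Delegates to get_meta_by_code, so both behave identically."""
    return get_meta_by_code(code)


if __name__ == "__main__":
    # "1112" is absent from the toy map, so the two strategies diverge.
    print(get_meta_with_parent_fallback("1112")["code"])
    print("error" in get_meta_by_code("1112"))
    print(get_meta_by_code_exact("1111")["code"])
```

With the fallback, a caller asking for a missing unit code receives a dict that looks valid but describes the major group. With the exact lookup, the miss is visible to the caller, who can decide how to handle it.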

✅ Checklist

  • Code is formatted using Black
  • Imports are sorted using isort
  • Code passes linting with Ruff, Pylint, and Mypy
  • Security checks pass using Bandit
  • API and Unit tests are written and pass using pytest
  • Terraform files (if applicable) follow best practices and have been validated
  • DocStrings follow Google-style and are added as per Pylint recommendations
  • Documentation has been updated if needed

🔍 How to Test

1) Library unit tests

From soc-classification-library repo root:

poetry install
poetry run pytest -q

Observed on this branch: 61 passed.

Focused SA-753 checks:

poetry run pytest -q tests/test_soc_meta.py tests/test_lookup.py::test_soc_lookup_default_path_uses_example_csv tests/test_lookup.py::test_lookup_returns_null_code_meta_when_metadata_missing

2) Library spot-check (no parent fallback)

poetry run python -c "
from occupational_classification.meta.soc_meta import SocMeta
from occupational_classification.lookup.soc_lookup import SOCLookup
m = SocMeta()
assert m.get_meta_by_code('1111')['code'] == '1111'
assert 'error' in m.get_meta_by_code('9999')
r = SOCLookup().lookup('chief executives and senior officials')
assert r['code_meta']['code'] == '1111'
print('OK')
"

Expected:

  • 1111 returns unit-level metadata (code is 1111, not major group 1).
  • 9999 returns an error dict (no parent fallback).
  • The lookup's code_meta.code equals the matched unit code, 1111.

3) Local API verification (optional, with path deps)

Point survey-assist-api and soc-classification-utils at this library locally:

soc-classification-library = { path = "../soc-classification-library", develop = true }

Then:

# survey-assist-api
poetry lock && poetry install
make run-api   # port 8080

# soc + sic vector stores (separate terminals)
# soc-classification-vector-store: make run-vector-store  # port 8089
# sic-classification-vector-store: make run-vector-store  # port 8088

Live lookup check:

curl -sS "http://localhost:8080/v1/survey-assist/soc-lookup?description=chief%20executives%20and%20senior%20officials&similarity=false" | python3 -m json.tool

Observed on this branch (local API with path dep to this library):

  • code: "1111"
  • code_meta.code: "1111" (not "1")
  • code_meta.group_title: "Chief executives and senior officials"
  • Layered fields present: code_minor_group_meta, code_sub_major_group_meta, code_major_group_meta
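
The checks in the list above could be scripted rather than read off the JSON by eye. The following is a sketch under assumptions: `check_lookup_payload` is a hypothetical helper, and it assumes `code`, `code_meta`, and the three layered fields are top-level keys of the response, which the field names above suggest but this PR does not spell out.

```python
"""Sketch: check that a soc-lookup response carries unit-level metadata."""

# Layered field names as listed in the observed response.
LAYERED_FIELDS = (
    "code_minor_group_meta",
    "code_sub_major_group_meta",
    "code_major_group_meta",
)


def check_lookup_payload(payload: dict) -> list[str]:
    """Return a list of problems found; an empty list means nothing was flagged."""
    problems = []
    code = payload.get("code")
    code_meta = payload.get("code_meta")
    if not code_meta:
        problems.append("code_meta is missing or null")
    elif code_meta.get("code") != code:
        # This is the symptom of parent fallback: e.g. "1" instead of "1111".
        problems.append(
            f"code_meta.code is {code_meta.get('code')!r}, expected {code!r}"
        )
    for field in LAYERED_FIELDS:
        if field not in payload:
            problems.append(f"missing layered field: {field}")
    return problems


if __name__ == "__main__":
    # Hand-written sample shaped like the observed response; the layered
    # field contents are placeholders, not real API output.
    sample = {
        "code": "1111",
        "code_meta": {
            "code": "1111",
            "group_title": "Chief executives and senior officials",
        },
        "code_minor_group_meta": {},
        "code_sub_major_group_meta": {},
        "code_major_group_meta": {},
    }
    print(check_lookup_payload(sample))
```

To run it against the local API, the output of the curl command could be parsed with `json.loads` and passed to the helper in place of `sample`.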

Expected outcomes

  • Unit-level code_meta reflects the matched SOC code, not a parent major group.
  • Missing metadata returns an error dict or a null code_meta, with no silent parent substitution.
  • Downstream repos can pin v0.1.6 after the tag is published.

@dstewartons dstewartons force-pushed the SA-753/new-release-with-complete-socmeta branch from 6c85e5b to 62871f8 Compare June 22, 2026 15:11
@dstewartons dstewartons requested a review from ivyONS June 23, 2026 08:30
node.tasks = tasks_list
meta = soc_meta.get_meta_by_code_exact(code)
if "error" in meta:
    node.qualifications = ""


Can we also retrieve the description into the node metadata?
(we may choose not to do it as part of this PR, because at some point we will want to load all this meta from xls instead anyway)

@dstewartons dstewartons Jun 23, 2026


Yes, group_description is already populated on hierarchy nodes from SocMeta.
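
For readers following the thread, here is a rough sketch of how a hierarchy node might take its fields from an exact metadata lookup, leaving them empty on a miss. The `Node` dataclass, the `apply_meta` helper, and the `qualifications` metadata key are hypothetical; only `group_description`, `node.qualifications`, `node.tasks`, and the `"error"` check come from this PR.

```python
"""Sketch: fill hierarchy node fields from exact metadata, empty on a miss."""
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a soc_hierarchy node."""

    code: str
    group_description: str = ""
    qualifications: str = ""
    tasks: list = field(default_factory=list)


def apply_meta(node: Node, meta: dict) -> Node:
    """Copy metadata onto the node; leave fields empty if the lookup missed."""
    if "error" in meta:
        # No parent substitution: a missing entry yields empty fields.
        node.group_description = ""
        node.qualifications = ""
        return node
    node.group_description = meta.get("group_description", "")
    node.qualifications = meta.get("qualifications", "")
    return node


if __name__ == "__main__":
    hit = apply_meta(Node("1111"), {"code": "1111", "group_description": "example text"})
    miss = apply_meta(Node("9999"), {"error": "not found"})
    print(repr(hit.group_description), repr(miss.group_description))
```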

@dstewartons dstewartons merged commit 62871f8 into main Jun 23, 2026
5 checks passed
@dstewartons dstewartons deleted the SA-753/new-release-with-complete-socmeta branch June 23, 2026 14:01