Standard Industrial Classification (SIC) Library, initially developed for the Survey Assist API but usable elsewhere.
A library of utilities for classifying industry codes from a Job Title, Job Description and Organisation Description.
- SIC Lookup. A utility that uses a well-known set of mappings from organisation descriptions to SIC classification codes.
- SIC Classification. A retrieval-augmented generation (RAG) approach to SIC classification using input data, semantic search and an LLM.
Ensure you have the following installed on your local machine:
- Python 3.12 (Recommended: use pyenv to manage versions)
- poetry (for dependency management)
- Colima (if running locally with containers)
- Terraform (for infrastructure management)
- Google Cloud SDK (gcloud) with appropriate permissions
The Makefile defines a set of commonly used commands and workflows. Where possible, use the commands defined in the Makefile.
git clone https://github.com/ONSdigital/sic-classification-library.git
cd sic-classification-library
poetry install
Git hooks can be used to check code before commit. To install them, run:
pre-commit install
Example source for the SIC Lookup functionality (and, by proxy, the SIC Meta code) is in sic_lookup_example.py. To run it:
poetry run python src/industrial_classification/lookup/sic_lookup_example.py
Placeholder
Code quality and static analysis will be enforced using isort, black, ruff, mypy and pylint. Security checking will be enhanced by running bandit.
To check the code quality and only report errors, without auto-fixing, run:
make check-python-nofix
To check the code quality and automatically fix errors where possible, run:
make check-python
Documentation is available in the docs folder and can be viewed using mkdocs:
make run-docs
Pytest is used for testing, alongside pytest-cov for coverage. /tests/conftest.py defines the configuration used by the tests.
Unit tests for utility functions are in /tests/tests_utils.py. Unit tests can be run using:
make unit-tests
All tests can be run using:
make all-tests
Placeholder