ONSdigital/sic-classification-library

SIC Classification Library

A Standard Industrial Classification (SIC) library, initially developed for the Survey Assist API but usable elsewhere.

Overview

The SIC classification library provides utilities to classify an industry code based on a job title, a job description and an organisation description.
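As a minimal sketch of how the three inputs might be bundled into a single query string before classification, consider the example below. The `ClassificationInput` class and its `as_query` method are hypothetical illustrations, not this library's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassificationInput:
    """Hypothetical container for the three free-text survey inputs."""

    job_title: str
    job_description: str
    org_description: str

    def as_query(self) -> str:
        """Join the non-blank fields into one normalised query string."""
        parts = (self.job_title, self.job_description, self.org_description)
        # Lower-case each field, collapse runs of whitespace and skip blank fields.
        cleaned = [" ".join(p.lower().split()) for p in parts if p and p.strip()]
        return " | ".join(cleaned)
```

For example, `ClassificationInput("Head Chef", "Cooks  meals", "Restaurant").as_query()` gives `"head chef | cooks meals | restaurant"`.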

Features

  • SIC Lookup. A utility that maps organisation descriptions to SIC classification codes using a well-known set of SIC mappings.
  • SIC Classification. A retrieval-augmented generation (RAG) approach to SIC classification that combines the input data, semantic search and a large language model (LLM).

Prerequisites

Ensure you have the following installed on your local machine:

  • Python 3.12 (Recommended: use pyenv to manage versions)
  • poetry (for dependency management)
  • Colima (if running locally with containers)
  • Terraform (for infrastructure management)
  • Google Cloud SDK (gcloud) with appropriate permissions

Local Development Setup

The Makefile defines a set of commonly used commands and workflows. Where possible, use the targets defined in the Makefile.

Clone the repository

git clone https://github.com/ONSdigital/sic-classification-library.git
cd sic-classification-library

Install Dependencies

poetry install

Add Git Hooks

Git hooks can be used to check code before each commit. To install them, run:

pre-commit install

Run Locally

Example source showing how to use the SIC Lookup functionality (and, by proxy, the SIC Meta code) is provided in sic_lookup_example.py. To run it:

poetry run python src/industrial_classification/lookup/sic_lookup_example.py

GCP Setup

Placeholder

Code Quality

Code quality and static analysis will be enforced using isort, black, ruff, mypy and pylint. Security checking will be enhanced by running bandit.

To check code quality and report errors without auto-fixing them, run:

make check-python-nofix

To check code quality and automatically fix errors where possible, run:

make check-python

Documentation

Documentation is available in the docs folder and can be viewed using MkDocs:

make run-docs

Testing

Pytest is used for testing, alongside pytest-cov for coverage. /tests/conftest.py defines the configuration used by the tests.

Unit tests for utility functions are in /tests/tests_utils.py. To run the unit tests:

make unit-tests
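As a minimal sketch of the style of unit test this layout supports, consider the example below. The `normalise` helper is a hypothetical utility, not one taken from this repository. Pytest collects plain functions whose names start with `test_` and reports any failing bare `assert`, so no test class or framework import is required.

```python
def normalise(text: str) -> str:
    """Hypothetical utility under test: lower-case and collapse whitespace."""
    return " ".join(text.lower().split())


def test_normalise_collapses_whitespace() -> None:
    # Leading, trailing and repeated internal spaces are reduced to single spaces.
    assert normalise("  Head   Chef ") == "head chef"


def test_normalise_lowercases() -> None:
    assert normalise("NURSE") == "nurse"


def test_normalise_empty_string() -> None:
    # An empty input should stay empty rather than raise.
    assert normalise("") == ""
```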

All tests can be run using:

make all-tests

Environment Variables

Placeholder

About

Library of classification functionality associated with UK SIC (Standard Industrial Classification)
