Vantage6 algorithm that retrieves descriptive statistics.
The algorithm computes descriptive statistics in a federated manner while upholding basic privacy considerations. The computed statistics include:
- Count
- Mean
- Minimum
- Maximum
- Percentiles/Quantiles (set to 25th, 50th, and 75th percentile by default)
- Frequency counts for categorical variables
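As a rough illustration of what "federated" means for the simpler statistics above, the sketch below has each node share only a few aggregates, which are then merged centrally. It is not the algorithm's actual code: `PartialStats`, `partial_stats`, and `combine` are hypothetical names, and no vantage6 API is used. Percentiles do not merge this simply and are left out.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PartialStats:
    """Aggregates a node shares instead of raw rows (hypothetical type)."""
    count: int
    total: float
    minimum: float
    maximum: float


def partial_stats(values: Sequence[float]) -> PartialStats:
    """Runs at a data station; only these four aggregates leave it."""
    return PartialStats(len(values), float(sum(values)), min(values), max(values))


def combine(partials: Sequence[PartialStats]) -> dict:
    """Runs centrally; merges per-node aggregates into global statistics."""
    count = sum(p.count for p in partials)
    total = sum(p.total for p in partials)
    return {
        "count": count,
        "mean": total / count,  # weighted by node size, not a mean of means
        "min": min(p.minimum for p in partials),
        "max": max(p.maximum for p in partials),
    }


# Two illustrative nodes with made-up values
node_a = partial_stats([1.0, 2.0, 3.0])
node_b = partial_stats([10.0, 20.0])
result = combine([node_a, node_b])
```

Sharing sums and counts rather than per-node means keeps the global mean exact when nodes hold different numbers of records.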
A more detailed description of the algorithm and how to use it can be found in the algorithm's
Wiki and the /docs directory.
This algorithm is designed to be run with the vantage6 infrastructure for distributed analysis and learning.
The base code for this algorithm has been created via the v6-algorithm-template template generator.
The repository includes comprehensive CI/CD workflows:
At every commit and pull request, the following workflows are executed:
- Comprehensive Test Suite (test-suite.yml): Runs all test categories
- Individual Quality Checks: Separate workflows for Black, Flake8, MyPy, Bandit, Safety
- Docker Integration: Validates Docker builds and container execution
- Vantage6 Integration: Tests integration with Vantage6 infrastructure
These workflows verify code quality and security, and run integration tests for the scenarios the algorithm may encounter.
At releases and tags on the main branch, the following workflows are executed:
- Release Build (release.yml): Builds and pushes the Docker image to GitHub Container Registry
- Documentation Deployment (deploy-docs.yml): Builds and deploys documentation to GitHub Pages
These workflows ensure that new releases are properly built, documented, and made publicly available.
When contributing new functionality:
- Add tests for all new features (unit, integration, empirical as appropriate)
- Ensure all quality checks pass (Black, Flake8, MyPy, Bandit, Safety)
- Update documentation for any new features or changes
- Use descriptive test names that explain what is being tested
- Include both positive and negative test cases
- Test edge cases and error conditions
- Use realistic synthetic data and test-case scenarios
- Mock external dependencies appropriately
- Validate both structure and values of results
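A minimal sketch of how several of these guidelines can look in practice: descriptive test names, one positive and one negative case, and a mocked external dependency. `mean_from_nodes` and `get_partials` are hypothetical names made up for this example; they are not part of this repository or of the vantage6 client.

```python
from unittest.mock import Mock


def mean_from_nodes(client) -> float:
    """Hypothetical central step: combine per-node counts and sums."""
    partials = client.get_partials()
    count = sum(p["count"] for p in partials)
    if count == 0:
        raise ValueError("no observations available")
    return sum(p["sum"] for p in partials) / count


def test_mean_is_weighted_by_node_counts():
    # Positive case: the mocked client stands in for the real infrastructure
    client = Mock()
    client.get_partials.return_value = [
        {"count": 3, "sum": 6.0},
        {"count": 1, "sum": 10.0},
    ]
    assert mean_from_nodes(client) == 4.0


def test_mean_raises_when_all_nodes_are_empty():
    # Negative case: an edge condition must fail loudly, not return NaN
    client = Mock()
    client.get_partials.return_value = [{"count": 0, "sum": 0.0}]
    try:
        mean_from_nodes(client)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for empty nodes")
```

Both functions follow the `test_` naming convention, so a runner such as pytest would collect them without further configuration.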
If you want to create your own version of this algorithm, fork or clone the repository and follow the steps below.
To finally run your algorithm on the vantage6 infrastructure, you need to create a Docker image of your algorithm.
The easiest way to create a Docker image is to use the GitHub Actions pipeline to
automatically build and push the Docker image.
All that you need to do is create a release or tag on the main branch.
Alternatively, a Docker image can be created manually by executing the following command in the root of your algorithm directory:
    docker build -t [my_docker_image_name] .
Here you should provide a sensible value for the Docker image name.
The docker build command will create a Docker image that contains your algorithm.
You can create an additional tag for it by running:
    docker tag [my_docker_image_name] [another_image_name]
This way, you can, for example, run
    docker tag local_average_algorithm harbor2.vantage6.ai/algorithms/average
to give the image a name that points at a remote Docker registry (in this case harbor2.vantage6.ai).
Finally, you need to push the image to the Docker registry. This can be done by running:
    docker push [my_docker_image_name]
Note that you need to be logged in to the Docker registry before you can push the image. You can do this by running docker login and providing your credentials. See the Docker documentation for more details on sharing images on Docker Hub. If you are using a different Docker registry, check the documentation of that registry and make sure that you have sufficient permissions.