From a8ad18b471361c86bad00087ff0366ed6f73205e Mon Sep 17 00:00:00 2001 From: LennartPurucker Date: Mon, 16 Jun 2025 10:20:21 +0200 Subject: [PATCH 1/4] maint: switch local install description to use uv --- CONTRIBUTING.md | 25 +++++++++++++++++-------- 1 file changed, 17 insertions(+), 8 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index cc8633f84..24e32b0d0 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -181,15 +181,24 @@ following rules before you submit a pull request: However it is also possible to use the [openml-python docker image](https://github.com/openml/openml-python/blob/main/docker/readme.md) for testing and building documentation. This can be useful for one-off contributions or when you are experiencing installation issues. -First install openml with its test dependencies by running - ```bash - $ pip install -e .[test] - ``` +First install Python 3.8 or higher, pip, and git. +Then clone the repository: +```bash +pip install uv # Install uv via pip (or see https://docs.astral.sh/uv/getting-started/installation/) +uv venv --seed --python 3.8 ~/.venvs/openml-python +source ~/.venvs/openml-python/bin/activate +pip install uv # Install uv within the virtual environment +``` + +Then install openml with its test dependencies by running +```bash +uv pip install -e .[test] +``` from the repository folder. Then configure pre-commit through - ```bash - $ pre-commit install - ``` +```bash +pre-commit install +``` This will install dependencies to run unit tests, as well as [pre-commit](https://pre-commit.com/). To run the unit tests, and check their code coverage, run: ```bash @@ -247,7 +256,7 @@ information. 
For building the documentation, you will need to install a few additional dependencies: ```bash -$ pip install -e .[examples,docs] +$ uv pip install -e .[examples,docs] ``` When dependencies are installed, run ```bash From b306c5ec1f855659fe77d230510f05b7951fe097 Mon Sep 17 00:00:00 2001 From: LennartPurucker Date: Tue, 17 Jun 2025 16:25:00 +0200 Subject: [PATCH 2/4] refactor/maint: big documentation update --- .all-contributorsrc | 36 ------ CONTRIBUTING.md | 245 +++++++++++++++------------------------ ISSUE_TEMPLATE.md | 20 +++- PULL_REQUEST_TEMPLATE.md | 23 ++-- doc/contributing.rst | 5 +- pyproject.toml | 7 +- 6 files changed, 135 insertions(+), 201 deletions(-) delete mode 100644 .all-contributorsrc diff --git a/.all-contributorsrc b/.all-contributorsrc deleted file mode 100644 index 3e16fe084..000000000 --- a/.all-contributorsrc +++ /dev/null @@ -1,36 +0,0 @@ -{ - "files": [ - "README.md" - ], - "imageSize": 100, - "commit": false, - "contributors": [ - { - "login": "a-moadel", - "name": "a-moadel", - "avatar_url": "https://avatars0.githubusercontent.com/u/46557866?v=4", - "profile": "https://github.com/a-moadel", - "contributions": [ - "doc", - "example" - ] - }, - { - "login": "Neeratyoy", - "name": "Neeratyoy Mallik", - "avatar_url": "https://avatars2.githubusercontent.com/u/3191233?v=4", - "profile": "https://github.com/Neeratyoy", - "contributions": [ - "code", - "doc", - "example" - ] - } - ], - "contributorsPerLine": 7, - "projectName": "openml-python", - "projectOwner": "openml", - "repoType": "github", - "repoHost": "https://github.com", - "skipCi": true -} diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 24e32b0d0..6e2f41df2 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,9 +1,9 @@ +# Contributing to `openml-python` This document describes the workflow on how to contribute to the openml-python package. If you are interested in connecting a machine learning package with OpenML (i.e. 
write an openml-python extension) or want to find other ways to contribute, see [this page](https://openml.github.io/openml-python/main/contributing.html#contributing). -Scope of the package --------------------- +## Scope of the package The scope of the OpenML Python package is to provide a Python interface to the OpenML platform which integrates well with Python's scientific stack, most @@ -15,66 +15,112 @@ in Python, [scikit-learn](http://scikit-learn.org/stable/index.html). Thereby it will automatically be compatible with many machine learning libraries written in Python. -We aim to keep the package as light-weight as possible and we will try to +We aim to keep the package as light-weight as possible, and we will try to keep the number of potential installation dependencies as low as possible. Therefore, the connection to other machine learning libraries such as *pytorch*, *keras* or *tensorflow* should not be done directly inside this package, but in a separate package using the OpenML Python connector. More information on OpenML Python connectors can be found [here](https://openml.github.io/openml-python/main/contributing.html#contributing). -Reporting bugs --------------- -We use GitHub issues to track all bugs and feature requests; feel free to -open an issue if you have found a bug or wish to see a feature implemented. - -It is recommended to check that your issue complies with the -following rules before submitting: - -- Verify that your issue is not being currently addressed by other - [issues](https://github.com/openml/openml-python/issues) - or [pull requests](https://github.com/openml/openml-python/pulls). - -- Please ensure all code snippets and error messages are formatted in - appropriate code blocks. - See [Creating and highlighting code blocks](https://help.github.com/articles/creating-and-highlighting-code-blocks). 
- -- Please include your operating system type and version number, as well - as your Python, openml, scikit-learn, numpy, and scipy versions. This information - can be found by running the following code snippet: -```python -import platform; print(platform.platform()) -import sys; print("Python", sys.version) -import numpy; print("NumPy", numpy.__version__) -import scipy; print("SciPy", scipy.__version__) -import sklearn; print("Scikit-Learn", sklearn.__version__) -import openml; print("OpenML", openml.__version__) -``` +## Determine what contribution to make -Determine what contribution to make ------------------------------------ Great! You've decided you want to help out. Now what? -All contributions should be linked to issues on the [Github issue tracker](https://github.com/openml/openml-python/issues). +All contributions should be linked to issues on the [GitHub issue tracker](https://github.com/openml/openml-python/issues). In particular for new contributors, the *good first issue* label should help you find -issues which are suitable for beginners. Resolving these issues allow you to start +issues which are suitable for beginners. Resolving these issues allows you to start contributing to the project without much prior knowledge. Your assistance in this area will be greatly appreciated by the more experienced developers as it helps free up their time to concentrate on other issues. -If you encountered a particular part of the documentation or code that you want to improve, +If you encounter a particular part of the documentation or code that you want to improve, but there is no related open issue yet, open one first. This is important since you can first get feedback or pointers from experienced contributors. To let everyone know you are working on an issue, please leave a comment that states you will work on the issue (or, if you have the permission, *assign* yourself to the issue). This avoids double work! 
-General git workflow --------------------- +## Contributing Workflow Overview +To contribute to the openml-python package, follow these steps: + +0. Determine how you want to contribute (see above). +1. Set up your local development environment. + 1. Fork and clone the `openml-python` repository. Then, create a new branch from the ``develop`` branch. If you are new to `git`, see our [detailed documentation](#basic-git-workflow), or rely on your favorite IDE. + 2. [Install the local dependencies](#install-local-dependencies) to run the tests your contribution. + 3. [Test your installation](#testing-your-installation) to ensure everything is set up correctly. +4. Implement your contribution. If contributing to the documentation, see [here](. +5. [Create a pull request](#pull-request-checklist). + +### Install Local Dependencies + +We recommend following the instructions below to install all requirements locally. +However, it is also possible to use the [openml-python docker image](https://github.com/openml/openml-python/blob/main/docker/readme.md) for testing and building documentation. You are also free to use an alternative package manager, such as `pip`. + + +1. To ensure a smooth development experience, we recommend using the `uv` package manager, so first install `uv`. If Python is already installed on your system, you can install `uv` with `pip` as shown below; otherwise, see [here](https://docs.astral.sh/uv/getting-started/installation/). + ```bash + pip install uv + ``` +2. Create a virtual environment using `uv` and activate it. This ensures that the dependencies for `openml-python` do not interfere with other Python projects on your system. + ```bash + uv venv --seed --python 3.8 ~/.venvs/openml-python + source ~/.venvs/openml-python/bin/activate + pip install uv # Install uv within the virtual environment + ``` +3. Then install openml with its test dependencies by running + ```bash + uv pip install -e ".[test]" + ``` + from the repository folder. The quotes keep shells such as zsh from interpreting the square brackets.
+ Then set up the [pre-commit](#pre-commit-details) hooks, which run style checks and code formatting before each commit: + ```bash + pre-commit install + ``` + +### Testing (Your Installation) +To test your installation and run the tests for the first time, run the following from the repository folder: +```bash +pytest tests +``` +For Windows systems, you may need to add `pytest` to PATH before executing the command. + +You can also run only part of the test suite by specifying a module, test case, or individual test. +For example, to run a specific module, test case, or unit test, respectively: +```bash +pytest tests/test_datasets/test_dataset.py +pytest tests/test_datasets/test_dataset.py::OpenMLDatasetTest +pytest tests/test_datasets/test_dataset.py::OpenMLDatasetTest::test_get_data +``` + +To test your new contribution, add [unit tests](https://github.com/openml/openml-python/tree/develop/tests), and, if needed, [examples](https://github.com/openml/openml-python/tree/develop/examples) for any new functionality being introduced. Some notes on unit tests and examples: +* If a unit test contains an upload to the test server, please ensure that it is followed by a file collection for deletion, to prevent the test server from bulking up. For example, `TestBase._mark_entity_for_removal('data', dataset.dataset_id)`, `TestBase._mark_entity_for_removal('flow', (flow.flow_id, flow.name))`. +* Please ensure that the example is run on the test server by beginning with the call to `openml.config.start_using_configuration_for_example()`. +* Add the `@pytest.mark.sklearn` marker to your unit tests if they have a dependency on scikit-learn. + +### Pull Request Checklist + +You can go to the `openml-python` GitHub repository to create the pull request by [comparing the branch](https://github.com/openml/openml-python/compare) from your fork with the `develop` branch of the `openml-python` repository.
When creating a pull request, make sure to follow the comments and structure provided by the template on GitHub. + +**An incomplete contribution** -- where you expect to do more work before +receiving a full review -- should be submitted as a `draft`. These may be useful +to indicate you are working on something to avoid duplicated work, +request broad review of functionality or API, or seek collaborators. +Drafts often benefit from the inclusion of a +[task list](https://github.com/blog/1375-task-lists-in-gfm-issues-pulls-comments) +in the PR description. + +--- + +# Appendix + +## Basic `git` Workflow The preferred workflow for contributing to openml-python is to fork the [main repository](https://github.com/openml/openml-python) on GitHub, clone, check out the branch `develop`, and develop on a new branch branch. Steps: +0. Make sure you have git installed and a GitHub account. + 1. Fork the [project repository](https://github.com/openml/openml-python) by clicking on the 'Fork' button near the top right of the page. This creates a copy of the code under your GitHub user account. For more details on @@ -84,20 +130,20 @@ branch. Steps: local disk: ```bash - $ git clone git@github.com:YourLogin/openml-python.git - $ cd openml-python + git clone git@github.com:YourLogin/openml-python.git + cd openml-python ``` 3. Switch to the ``develop`` branch: ```bash - $ git checkout develop + git checkout develop ``` 3. Create a ``feature`` branch to hold your development changes: ```bash - $ git checkout -b feature/my-feature + git checkout -b feature/my-feature ``` Always use a ``feature`` branch. It's good practice to never work on the ``main`` or ``develop`` branch! @@ -106,107 +152,24 @@ branch. Steps: 4. Develop the feature on your feature branch.
Add changed files using ``git add`` and then ``git commit`` files: ```bash - $ git add modified_files - $ git commit + git add modified_files + git commit ``` to record your changes in Git, then push the changes to your GitHub account with: ```bash - $ git push -u origin my-feature + git push -u origin feature/my-feature ``` 5. Follow [these instructions](https://help.github.com/articles/creating-a-pull-request-from-a-fork) -to create a pull request from your fork. This will send an email to the committers. +to create a pull request from your fork. (If any of the above seems like magic to you, please look up the [Git documentation](https://git-scm.com/documentation) on the web, or ask a friend or another contributor for help.) -Pull Request Checklist ----------------------- - -We recommended that your contribution complies with the -following rules before you submit a pull request: - -- Follow the - [pep8 style guide](https://www.python.org/dev/peps/pep-0008/). - With the following exceptions or additions: - - The max line length is 100 characters instead of 80. - - When creating a multi-line expression with binary operators, break before the operator. - - Add type hints to all function signatures. - (note: not all functions have type hints yet, this is work in progress.) - - Use the [`str.format`](https://docs.python.org/3/library/stdtypes.html#str.format) over [`printf`](https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting) style formatting. - E.g. use `"{} {}".format('hello', 'world')` not `"%s %s" % ('hello', 'world')`. - (note: old code may still use `printf`-formatting, this is work in progress.) - - -- If your pull request addresses an issue, please use the pull request title - to describe the issue and mention the issue number in the pull request description. This will make sure a link back to the original issue is - created. Make sure the title is descriptive enough to understand what the pull request does!
- -- An incomplete contribution -- where you expect to do more work before - receiving a full review -- should be submitted as a `draft`. These may be useful - to: indicate you are working on something to avoid duplicated work, - request broad review of functionality or API, or seek collaborators. - Drafts often benefit from the inclusion of a - [task list](https://github.com/blog/1375-task-lists-in-gfm-issues-pulls-comments) - in the PR description. - -- Add [unit tests](https://github.com/openml/openml-python/tree/develop/tests) and [examples](https://github.com/openml/openml-python/tree/develop/examples) for any new functionality being introduced. - - If an unit test contains an upload to the test server, please ensure that it is followed by a file collection for deletion, to prevent the test server from bulking up. For example, `TestBase._mark_entity_for_removal('data', dataset.dataset_id)`, `TestBase._mark_entity_for_removal('flow', (flow.flow_id, flow.name))`. - - Please ensure that the example is run on the test server by beginning with the call to `openml.config.start_using_configuration_for_example()`. - - Add the `@pytest.mark.sklearn` marker to your unit tests if they have a dependency on scikit-learn. - -- All tests pass when running `pytest`. On - Unix-like systems, check with (from the toplevel source folder): - - ```bash - $ pytest - ``` - - For Windows systems, execute the command from an Anaconda Prompt or add `pytest` to PATH before executing the command. - -- Documentation and high-coverage tests are necessary for enhancements to be - accepted. Bug-fixes or new features should be provided with - [non-regression tests](https://en.wikipedia.org/wiki/Non-regression_testing). - These tests verify the correct behavior of the fix or feature. In this - manner, further modifications on the code base are granted to be consistent - with the desired behavior. 
- For the Bug-fixes case, at the time of the PR, this tests should fail for - the code base in develop and pass for the PR code. - - - If any source file is being added to the repository, please add the BSD 3-Clause license to it. - - -*Note*: We recommend to follow the instructions below to install all requirements locally. -However it is also possible to use the [openml-python docker image](https://github.com/openml/openml-python/blob/main/docker/readme.md) for testing and building documentation. -This can be useful for one-off contributions or when you are experiencing installation issues. - -First install Python 3.8 or higher, pip, and git. -Then clone the repository: -```bash -pip install uv # Install uv via pip (or see https://docs.astral.sh/uv/getting-started/installation/) -uv venv --seed --python 3.8 ~/.venvs/openml-python -source ~/.venvs/openml-python/bin/activate -pip install uv # Install uv within the virtual environment -``` - -Then install openml with its test dependencies by running -```bash -uv pip install -e .[test] -``` -from the repository folder. -Then configure pre-commit through -```bash -pre-commit install -``` -This will install dependencies to run unit tests, as well as [pre-commit](https://pre-commit.com/). -To run the unit tests, and check their code coverage, run: - ```bash - $ pytest --cov=. path/to/tests_for_package - ``` -Make sure your code has good unittest **coverage** (at least 80%). - -Pre-commit is used for various style checking and code formatting. +## Pre-commit Details +[Pre-commit](https://pre-commit.com/) is used for various style checking and code formatting. Before each commit, it will automatically run: - [ruff](https://docs.astral.sh/ruff/) a code formatter and linter. This will automatically format your code. @@ -225,23 +188,7 @@ $ pre-commit run --all-files ``` Make sure to do this at least once before your first commit to check your setup works. 
-Executing a specific unit test can be done by specifying the module, test case, and test. -You may then run a specific module, test case, or unit test respectively: -```bash - $ pytest tests/test_datasets/test_dataset.py - $ pytest tests/test_datasets/test_dataset.py::OpenMLDatasetTest - $ pytest tests/test_datasets/test_dataset.py::OpenMLDatasetTest::test_get_data -``` - -*NOTE*: In the case the examples build fails during the Continuous Integration test online, please -fix the first failing example. If the first failing example switched the server from live to test -or vice-versa, and the subsequent examples expect the other server, the ensuing examples will fail -to be built as well. - -Happy testing! - -Documentation -------------- +## Contributing to the Documentation We are glad to accept any sort of documentation: function docstrings, reStructuredText documents, tutorials, etc. @@ -256,9 +203,9 @@ information. For building the documentation, you will need to install a few additional dependencies: ```bash -$ uv pip install -e .[examples,docs] +uv pip install -e ".[examples,docs]" ``` When dependencies are installed, run ```bash -$ sphinx-build -b html doc YOUR_PREFERRED_OUTPUT_DIRECTORY +sphinx-build -b html doc YOUR_PREFERRED_OUTPUT_DIRECTORY ``` diff --git a/ISSUE_TEMPLATE.md b/ISSUE_TEMPLATE.md index bcd5e0c1e..11290dc66 100644 --- a/ISSUE_TEMPLATE.md +++ b/ISSUE_TEMPLATE.md @@ -1,3 +1,15 @@ + + #### Description @@ -20,7 +32,10 @@ it in the issue: https://gist.github.com #### Versions - \ No newline at end of file + + diff --git a/PULL_REQUEST_TEMPLATE.md b/PULL_REQUEST_TEMPLATE.md index f0bee81e0..068f69872 100644 --- a/PULL_REQUEST_TEMPLATE.md +++ b/PULL_REQUEST_TEMPLATE.md @@ -4,8 +4,8 @@ the contribution guidelines: https://github.com/openml/openml-python/blob/main/C Please make sure that: +* the title of the pull request is descriptive * this pull requests is against the `develop` branch -* you updated all docs, this includes the changelog
(doc/progress.rst) * for any new function or class added, please add it to doc/api.rst * the list of classes and functions should be alphabetical * for any new functionality, consider adding a relevant example @@ -14,15 +14,20 @@ Please make sure that: * add the BSD 3-Clause license to any new file created --> -#### Reference Issue - +#### Metadata +* Reference Issue: +* New Tests Added: +* Documentation Updated: +* Change Log Entry: -#### What does this PR implement/fix? Explain your changes. - - -#### How should this PR be tested? - +#### Details + diff --git a/doc/contributing.rst b/doc/contributing.rst index 34d1edb14..affe597de 100644 --- a/doc/contributing.rst +++ b/doc/contributing.rst @@ -16,10 +16,7 @@ In particular, a few ways to contribute to openml-python are: * A contribution to an openml-python extension. An extension package allows OpenML to interface with a machine learning package (such as scikit-learn or keras). These extensions are hosted in separate repositories and may have their own guidelines. - For more information, see the :ref:`extensions` below. - - * Bug reports. If something doesn't work for you or is cumbersome, please open a new issue to let - us know about the problem. See `this section `_. + For more information, see the :ref:`extensions`. * `Cite OpenML `_ if you use it in a scientific publication. 
diff --git a/pyproject.toml b/pyproject.toml index 83f0793f7..215d0f824 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -23,8 +23,12 @@ dependencies = [ "packaging", ] requires-python = ">=3.8" +maintainers = [ + { name = "Pieter Gijsbers", email="p.gijsbers@tue.nl"}, + { name = "Lennart Purucker"}, +] authors = [ - { name = "Matthias Feurer", email="feurerm@informatik.uni-freiburg.de" }, + { name = "Matthias Feurer"}, { name = "Jan van Rijn" }, { name = "Arlind Kadra" }, { name = "Pieter Gijsbers" }, @@ -52,6 +56,7 @@ classifiers = [ "Programming Language :: Python :: 3.10", "Programming Language :: Python :: 3.11", "Programming Language :: Python :: 3.12", + "Programming Language :: Python :: 3.13", ] license = { file = "LICENSE" } From 39b485e4c4ccb8ddcbeea4b55c9298de32aa0798 Mon Sep 17 00:00:00 2001 From: Lennart Purucker Date: Tue, 17 Jun 2025 16:49:43 +0200 Subject: [PATCH 3/4] Update CONTRIBUTING.md Co-authored-by: Pieter Gijsbers --- CONTRIBUTING.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 6e2f41df2..1c91dd296 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -45,7 +45,7 @@ To contribute to the openml-python package, follow these steps: 0. Determine how you want to contribute (see above). 1. Set up your local development environment. 1. Fork and clone the `openml-python` repository. Then, create a new branch from the ``develop`` branch. If you are new to `git`, see our [detailed documentation](#basic-git-workflow), or rely on your favorite IDE. - 2. [Install the local dependencies](#install-local-dependencies) to run the tests your contribution. + 2. [Install the local dependencies](#install-local-dependencies) to run the tests for your contribution. 3. [Test your installation](#testing-your-installation) to ensure everything is set up correctly. 4. Implement your contribution. If contributing to the documentation, see [here](. 5. [Create a pull request](#pull-request-checklist). 
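The "Test your installation" step in the workflow overview could be complemented by a quick environment check before running `pytest tests`. The following is a minimal, hypothetical Python sketch. It is not part of openml-python, and the names `check_dev_setup` and `report` are illustrative. It assumes the package is importable as `openml`, that `pytest` is installed by the `test` extras, and that `pre-commit` ends up as an executable on `PATH`. The minimum version `(3, 8)` mirrors `requires-python = ">=3.8"` in `pyproject.toml`.

```python
import importlib.util
import shutil
import sys
from typing import Dict, Tuple

# Lower bound mirrors `requires-python = ">=3.8"` in pyproject.toml.
MIN_PYTHON: Tuple[int, int] = (3, 8)


def check_dev_setup(min_python: Tuple[int, int] = MIN_PYTHON) -> Dict[str, bool]:
    """Hypothetical helper: map each check name to whether it passed."""
    return {
        # The interpreter of the active virtual environment is new enough.
        "python_version_ok": tuple(sys.version_info[:2]) >= tuple(min_python),
        # An editable install with the `test` extras should make this importable.
        "openml_importable": importlib.util.find_spec("openml") is not None,
        # Assumed to be installed by the `test` extras.
        "pytest_importable": importlib.util.find_spec("pytest") is not None,
        # `pre-commit install` needs the executable on PATH.
        "pre_commit_on_path": shutil.which("pre-commit") is not None,
    }


def report(results: Dict[str, bool]) -> str:
    """Format one 'OK' / 'MISSING' line per check."""
    return "\n".join(
        f"{'OK' if ok else 'MISSING':<8}{name}" for name, ok in results.items()
    )


if __name__ == "__main__":
    print(report(check_dev_setup()))
```

Run inside the activated virtual environment, each `MISSING` line points to the installation step that is most likely incomplete. The check only inspects the environment. It does not replace running the test suite.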
From 17cb3c9afe4c5c30d3a0ea6d67a50df39569feb6 Mon Sep 17 00:00:00 2001 From: LennartPurucker Date: Tue, 17 Jun 2025 17:26:15 +0200 Subject: [PATCH 4/4] final changes --- CONTRIBUTING.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 6e2f41df2..861c8219e 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -47,7 +47,7 @@ To contribute to the openml-python package, follow these steps: 1. Fork and clone the `openml-python` repository. Then, create a new branch from the ``develop`` branch. If you are new to `git`, see our [detailed documentation](#basic-git-workflow), or rely on your favorite IDE. 2. [Install the local dependencies](#install-local-dependencies) to run the tests your contribution. 3. [Test your installation](#testing-your-installation) to ensure everything is set up correctly. -4. Implement your contribution. If contributing to the documentation, see [here](. +4. Implement your contribution. If contributing to the documentation, see [here](#contributing-to-the-documentation). 5. [Create a pull request](#pull-request-checklist). ### Install Local Dependencies @@ -93,7 +93,7 @@ pytest tests/test_datasets/test_dataset.py::OpenMLDatasetTest::test_get_data To test your new contribution, add [unit tests](https://github.com/openml/openml-python/tree/develop/tests), and, if needed, [examples](https://github.com/openml/openml-python/tree/develop/examples) for any new functionality being introduced. Some notes on unit tests and examples: * If a unit test contains an upload to the test server, please ensure that it is followed by a file collection for deletion, to prevent the test server from bulking up. For example, `TestBase._mark_entity_for_removal('data', dataset.dataset_id)`, `TestBase._mark_entity_for_removal('flow', (flow.flow_id, flow.name))`. -* Please ensure that the example is run on the test server by beginning with the call to `openml.config.start_using_configuration_for_example()`. 
+* Please ensure that the example is run on the test server by beginning with the call to `openml.config.start_using_configuration_for_example()`, which is done by default for tests derived from `TestBase`. * Add the `@pytest.mark.sklearn` marker to your unit tests if they have a dependency on scikit-learn. ### Pull Request Checklist
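The note about marking test-server uploads for removal describes a bookkeeping pattern. Each upload is recorded right after it succeeds, so that it can be deleted once the tests are done. The sketch below is a simplified, hypothetical model of that pattern in plain Python. `UploadTracker`, `mark_for_removal`, and `cleanup` are illustrative names, not openml-python's `TestBase` API. The `delete` callback stands in for whatever function actually removes an entity from the test server. The `(id, name)` tuple for flows mirrors the `(flow.flow_id, flow.name)` example in the guidelines.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List


class UploadTracker:
    """Hypothetical stand-in for 'mark uploaded entities for removal' bookkeeping."""

    def __init__(self) -> None:
        # Entity type (e.g. "data", "flow") -> identifiers uploaded during tests.
        self._marked: Dict[str, List[Hashable]] = defaultdict(list)

    def mark_for_removal(self, entity_type: str, entity: Hashable) -> None:
        """Record an uploaded entity immediately after the upload succeeds."""
        self._marked[entity_type].append(entity)

    def cleanup(self, delete: Callable[[str, Hashable], None]) -> List[str]:
        """Call `delete` for every marked entity and return failure messages.

        Failures are collected instead of raised, so one failing deletion
        does not leave the remaining uploads on the server.
        """
        failures: List[str] = []
        for entity_type, entities in self._marked.items():
            for entity in entities:
                try:
                    delete(entity_type, entity)
                except Exception as exc:
                    failures.append(f"{entity_type} {entity!r}: {exc}")
        self._marked.clear()
        return failures


if __name__ == "__main__":
    tracker = UploadTracker()
    # Hypothetical IDs; in a real test these come from the uploaded objects.
    tracker.mark_for_removal("data", 42)
    tracker.mark_for_removal("flow", (7, "example.flow"))
    print(tracker.cleanup(lambda kind, entity: print("deleting", kind, entity)))
```

Collecting failures instead of stopping at the first error is a design choice of this sketch. It keeps cleanup best-effort, which fits the goal of not bulking up the test server.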