openml · LennartPurucker · Jun 17, 2025 · Jun 16, 2025 · Jun 17, 2025 · Jun 17, 2025
diff --git a/.all-contributorsrc b/.all-contributorsrc
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -1,9 +1,9 @@
+# Contributing to `openml-python`
 This document describes the workflow on how to contribute to the openml-python package.
 If you are interested in connecting a machine learning package with OpenML (i.e.
 write an openml-python extension) or want to find other ways to contribute, see [this page](https://openml.github.io/openml-python/main/contributing.html#contributing).
 
-Scope of the package
---------------------
+## Scope of the package
 
 The scope of the OpenML Python package is to provide a Python interface to
 the OpenML platform which integrates well with Python's scientific stack, most
@@ -15,66 +15,112 @@ in Python, [scikit-learn](http://scikit-learn.org/stable/index.html).
 Thereby it will automatically be compatible with many machine learning
 libraries written in Python.
 
-We aim to keep the package as light-weight as possible and we will try to
+We aim to keep the package as light-weight as possible, and we will try to
 keep the number of potential installation dependencies as low as possible.
 Therefore, the connection to other machine learning libraries such as
 *pytorch*, *keras* or *tensorflow* should not be done directly inside this
 package, but in a separate package using the OpenML Python connector.
 More information on OpenML Python connectors can be found [here](https://openml.github.io/openml-python/main/contributing.html#contributing).
 
-Reporting bugs
---------------
-We use GitHub issues to track all bugs and feature requests; feel free to
-open an issue if you have found a bug or wish to see a feature implemented.
-
-It is recommended to check that your issue complies with the
-following rules before submitting:
-
--  Verify that your issue is not being currently addressed by other
-   [issues](https://github.com/openml/openml-python/issues)
-   or [pull requests](https://github.com/openml/openml-python/pulls).
-
--  Please ensure all code snippets and error messages are formatted in
-   appropriate code blocks.
-   See [Creating and highlighting code blocks](https://help.github.com/articles/creating-and-highlighting-code-blocks).
-
--  Please include your operating system type and version number, as well
-   as your Python, openml, scikit-learn, numpy, and scipy versions. This information
-   can be found by running the following code snippet:
-```python
-import platform; print(platform.platform())
-import sys; print("Python", sys.version)
-import numpy; print("NumPy", numpy.__version__)
-import scipy; print("SciPy", scipy.__version__)
-import sklearn; print("Scikit-Learn", sklearn.__version__)
-import openml; print("OpenML", openml.__version__)
-```
+## Determine what contribution to make
 
-Determine what contribution to make
------------------------------------
 Great! You've decided you want to help out. Now what?
-All contributions should be linked to issues on the [Github issue tracker](https://github.com/openml/openml-python/issues).
+All contributions should be linked to issues on the [GitHub issue tracker](https://github.com/openml/openml-python/issues).
 In particular for new contributors, the *good first issue* label should help you find
-issues which are suitable for beginners.  Resolving these issues allow you to start
+issues which are suitable for beginners.  Resolving these issues allows you to start
 contributing to the project without much prior knowledge. Your assistance in this area 
 will be greatly appreciated by the more experienced developers as it helps free up 
 their time to concentrate on other issues.
 
-If you encountered a particular part of the documentation or code that you want to improve,
+If you encounter a particular part of the documentation or code that you want to improve,
 but there is no related open issue yet, open one first.
 This is important since you can first get feedback or pointers from experienced contributors.
 
 To let everyone know you are working on an issue, please leave a comment that states you will work on the issue
 (or, if you have the permission, *assign* yourself to the issue). This avoids double work!
 
-General git workflow
---------------------
+## Contributing Workflow Overview 
+To contribute to the openml-python package, follow these steps:
+
+0. Determine how you want to contribute (see above).
+1. Set up your local development environment.
+   1. Fork and clone the `openml-python` repository. Then, create a new branch from the ``develop`` branch. If you are new to `git`, see our [detailed documentation](#basic-git-workflow), or rely on your favorite IDE.   
+   2. [Install the local dependencies](#install-local-dependencies) to run the tests for your contribution.
+   3. [Test your installation](#testing-your-installation) to ensure everything is set up correctly.
+4. Implement your contribution. If contributing to the documentation, see [here](#contributing-to-the-documentation).
+5. [Create a pull request](#pull-request-checklist). 
+
+### Install Local Dependencies
+
+We recommend following the instructions below to install all requirements locally.
+However, it is also possible to use the [openml-python docker image](https://github.com/openml/openml-python/blob/main/docker/readme.md) for testing and building documentation. Moreover, feel free to use any alternative package managers, such as `pip`.
+
+
+1. To ensure a smooth development experience, we recommend using the `uv` package manager. Thus, first install `uv`. If any Python version already exists on your system, follow the steps below, otherwise see [here](https://docs.astral.sh/uv/getting-started/installation/). 
+    ```bash
+    pip install uv
+    ```
+2. Create a virtual environment using `uv` and activate it. This will ensure that the dependencies for `openml-python` do not interfere with other Python projects on your system. 
+   ```bash
+   uv venv --seed --python 3.8 ~/.venvs/openml-python
+   source ~/.venvs/openml-python/bin/activate
+   pip install uv # Install uv within the virtual environment
+   ```
+3. Then install openml with its test dependencies by running
+   ```bash
+   uv pip install -e .[test]
+   ```
+   from the repository folder.
+   Then configure the pre-commit to be able to run unit tests, as well as [pre-commit](#pre-commit-details) through:
+   ```bash
+   pre-commit install
+   ```
+
+### Testing (Your Installation)
+To test your installation and run the tests for the first time, run the following from the repository folder:
+```bash
+pytest tests
+```
+For Windows systems, you may need to add `pytest` to PATH before executing the command.
+
+Executing a specific unit test can be done by specifying the module, test case, and test.
+You may then run a specific module, test case, or unit test respectively:
+```bash
+pytest tests/test_datasets/test_dataset.py
+pytest tests/test_datasets/test_dataset.py::OpenMLDatasetTest
+pytest tests/test_datasets/test_dataset.py::OpenMLDatasetTest::test_get_data
+```
+
+To test your new contribution, add [unit tests](https://github.com/openml/openml-python/tree/develop/tests), and, if needed, [examples](https://github.com/openml/openml-python/tree/develop/examples) for any new functionality being introduced. Some notes on unit tests and examples:
+* If a unit test contains an upload to the test server, please ensure that it is followed by a file collection for deletion, to prevent the test server from bulking up. For example, `TestBase._mark_entity_for_removal('data', dataset.dataset_id)`, `TestBase._mark_entity_for_removal('flow', (flow.flow_id, flow.name))`.
+* Please ensure that the example is run on the test server by beginning with the call to `openml.config.start_using_configuration_for_example()`, which is done by default for tests derived from `TestBase`.
+* Add the `@pytest.mark.sklearn` marker to your unit tests if they have a dependency on scikit-learn.
+
+### Pull Request Checklist
+
+You can go to the `openml-python` GitHub repository to create the pull request by [comparing the branch](https://github.com/openml/openml-python/compare) from your fork with the `develop` branch of the `openml-python` repository. When creating a pull request, make sure to follow the comments and structured provided by the template on GitHub.
+
+**An incomplete contribution** -- where you expect to do more work before
+receiving a full review -- should be submitted as a `draft`. These may be useful
+to: indicate you are working on something to avoid duplicated work,
+request broad review of functionality or API, or seek collaborators.
+Drafts often benefit from the inclusion of a
+[task list](https://github.com/blog/1375-task-lists-in-gfm-issues-pulls-comments)
+in the PR description.
+
+--- 
+
+# Appendix
+
+## Basic `git` Workflow
 
 The preferred workflow for contributing to openml-python is to
 fork the [main repository](https://github.com/openml/openml-python) on
 GitHub, clone, check out the branch `develop`, and develop on a new branch
 branch. Steps:
 
+0. Make sure you have git installed, and a GitHub account.
+
 1. Fork the [project repository](https://github.com/openml/openml-python)
    by clicking on the 'Fork' button near the top right of the page. This creates
    a copy of the code under your GitHub user account. For more details on
@@ -84,20 +130,20 @@ branch. Steps:
 local disk:
 
    ```bash
-   $ git clone git@github.com:YourLogin/openml-python.git
-   $ cd openml-python
+   git clone git@github.com:YourLogin/openml-python.git
+   cd openml-python
    ```
 
 3. Switch to the ``develop`` branch:
 
    ```bash
-   $ git checkout develop
+   git checkout develop
    ```
 
 3. Create a ``feature`` branch to hold your development changes:
 
    ```bash
-   $ git checkout -b feature/my-feature
+   git checkout -b feature/my-feature
    ```
 
    Always use a ``feature`` branch. It's good practice to never work on the ``main`` or ``develop`` branch! 
@@ -106,98 +152,24 @@ local disk:
 4. Develop the feature on your feature branch. Add changed files using ``git add`` and then ``git commit`` files:
 
    ```bash
-   $ git add modified_files
-   $ git commit
+   git add modified_files
+   git commit
    ```
 
    to record your changes in Git, then push the changes to your GitHub account with:
 
    ```bash
-   $ git push -u origin my-feature
+   git push -u origin my-feature
    ```
 
 5. Follow [these instructions](https://help.github.com/articles/creating-a-pull-request-from-a-fork)
-to create a pull request from your fork. This will send an email to the committers.
+to create a pull request from your fork.
 
 (If any of the above seems like magic to you, please look up the
 [Git documentation](https://git-scm.com/documentation) on the web, or ask a friend or another contributor for help.)
 
-Pull Request Checklist
-----------------------
-
-We recommended that your contribution complies with the
-following rules before you submit a pull request:
-
--  Follow the
-   [pep8 style guide](https://www.python.org/dev/peps/pep-0008/).
-   With the following exceptions or additions:
-    - The max line length is 100 characters instead of 80.
-    - When creating a multi-line expression with binary operators, break before the operator.
-    - Add type hints to all function signatures.
-    (note: not all functions have type hints yet, this is work in progress.)
-    - Use the [`str.format`](https://docs.python.org/3/library/stdtypes.html#str.format) over [`printf`](https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting) style formatting.
-     E.g. use `"{} {}".format('hello', 'world')` not `"%s %s" % ('hello', 'world')`.
-     (note: old code may still use `printf`-formatting, this is work in progress.)
-
--  If your pull request addresses an issue, please use the pull request title
-   to describe the issue and mention the issue number in the pull request description. This will make sure a link back to the original issue is
-   created. Make sure the title is descriptive enough to understand what the pull request does! 
-
--  An incomplete contribution -- where you expect to do more work before
-   receiving a full review -- should be submitted as a `draft`. These may be useful
-   to: indicate you are working on something to avoid duplicated work,
-   request broad review of functionality or API, or seek collaborators.
-   Drafts often benefit from the inclusion of a
-   [task list](https://github.com/blog/1375-task-lists-in-gfm-issues-pulls-comments)
-   in the PR description.
-
-- Add [unit tests](https://github.com/openml/openml-python/tree/develop/tests) and [examples](https://github.com/openml/openml-python/tree/develop/examples) for any new functionality being introduced. 
-    - If an unit test contains an upload to the test server, please ensure that it is followed by a file collection for deletion, to prevent the test server from bulking up. For example, `TestBase._mark_entity_for_removal('data', dataset.dataset_id)`, `TestBase._mark_entity_for_removal('flow', (flow.flow_id, flow.name))`.
-    - Please ensure that the example is run on the test server by beginning with the call to `openml.config.start_using_configuration_for_example()`.
-    - Add the `@pytest.mark.sklearn` marker to your unit tests if they have a dependency on scikit-learn.
-
--  All tests pass when running `pytest`. On
-   Unix-like systems, check with (from the toplevel source folder):
-
-      ```bash
-      $ pytest
-      ```
-
-   For Windows systems, execute the command from an Anaconda Prompt or add `pytest` to PATH before executing the command.
-
--  Documentation and high-coverage tests are necessary for enhancements to be
-   accepted. Bug-fixes or new features should be provided with
-   [non-regression tests](https://en.wikipedia.org/wiki/Non-regression_testing).
-   These tests verify the correct behavior of the fix or feature. In this
-   manner, further modifications on the code base are granted to be consistent
-   with the desired behavior.
-   For the Bug-fixes case, at the time of the PR, this tests should fail for
-   the code base in develop and pass for the PR code.
-
- - If any source file is being added to the repository, please add the BSD 3-Clause license to it.
-
-
-*Note*: We recommend to follow the instructions below to install all requirements locally.
-However it is also possible to use the [openml-python docker image](https://github.com/openml/openml-python/blob/main/docker/readme.md) for testing and building documentation.
-This can be useful for one-off contributions or when you are experiencing installation issues.
-
-First install openml with its test dependencies by running
-  ```bash
-  $ pip install -e .[test]
-  ```
-from the repository folder.
-Then configure pre-commit through
- ```bash
- $ pre-commit install
- ```
-This will install dependencies to run unit tests, as well as [pre-commit](https://pre-commit.com/).
-To run the unit tests, and check their code coverage, run:
-  ```bash
-  $ pytest --cov=. path/to/tests_for_package
-  ```
-Make sure your code has good unittest **coverage** (at least 80%).
-
-Pre-commit is used for various style checking and code formatting.
+## Pre-commit Details
+[Pre-commit](https://pre-commit.com/) is used for various style checking and code formatting.
 Before each commit, it will automatically run:
  - [ruff](https://docs.astral.sh/ruff/) a code formatter and linter.
    This will automatically format your code.
@@ -216,23 +188,7 @@ $ pre-commit run --all-files
 ```
 Make sure to do this at least once before your first commit to check your setup works.
 
-Executing a specific unit test can be done by specifying the module, test case, and test.
-You may then run a specific module, test case, or unit test respectively:
-```bash
-  $ pytest tests/test_datasets/test_dataset.py
-  $ pytest tests/test_datasets/test_dataset.py::OpenMLDatasetTest
-  $ pytest tests/test_datasets/test_dataset.py::OpenMLDatasetTest::test_get_data
-```
-
-*NOTE*: In the case the examples build fails during the Continuous Integration test online, please 
-fix the first failing example. If the first failing example switched the server from live to test 
-or vice-versa, and the subsequent examples expect the other server, the ensuing examples will fail 
-to be built as well.
-
-Happy testing!
-
-Documentation
--------------
+## Contributing to the Documentation
 
 We are glad to accept any sort of documentation: function docstrings,
 reStructuredText documents, tutorials, etc.
@@ -247,9 +203,9 @@ information.
 
 For building the documentation, you will need to install a few additional dependencies:
 ```bash
-$ pip install -e .[examples,docs]
+uv pip install -e .[examples,docs]
 ```
 When dependencies are installed, run
 ```bash
-$ sphinx-build -b html doc YOUR_PREFERRED_OUTPUT_DIRECTORY
+sphinx-build -b html doc YOUR_PREFERRED_OUTPUT_DIRECTORY
 ```
diff --git a/ISSUE_TEMPLATE.md b/ISSUE_TEMPLATE.md
@@ -1,3 +1,15 @@
+<!--
+It is recommended to check that your issue complies with the
+following rules before submitting:
+
+-  Verify that your issue is not being currently addressed by other
+   issues (https://github.com/openml/openml-python/issues)
+   or pull requests (https://github.com/openml/openml-python/pulls).
+
+-  Please ensure all code snippets and error messages are formatted in
+   appropriate code blocks. See https://help.github.com/articles/creating-and-highlighting-code-blocks
+-->
+
 #### Description
 <!-- Example: Joblib Error thrown when calling fit on LatentDirichletAllocation with evaluate_every > 0-->
 
@@ -20,7 +32,10 @@ it in the issue: https://gist.github.com
 
 #### Versions
 <!--
-Please run the following snippet and paste the output below.
+Please include your operating system type and version number, as well
+as your Python, openml, scikit-learn, numpy, and scipy versions. This information
+can be found by running the following code snippet:
+
 import platform; print(platform.platform())
 import sys; print("Python", sys.version)
 import numpy; print("NumPy", numpy.__version__)
@@ -30,4 +45,5 @@ import openml; print("OpenML", openml.__version__)
 -->
 
 
-<!-- Thanks for contributing! -->
+<!-- Thanks for contributing! -->
+