This is a codebase for basic Python utilities.
We recommend using uv for environment management (its developers report it is 10-100x faster than pip).
To install uv and set up the environment:

```bash
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create and activate the environment
uv venv ~/.venvs/pybase --python 3.11
source ~/.venvs/pybase/bin/activate
# Install dependencies
uv pip install -r requirements.txt
```
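You can check that the activated environment is the interpreter actually in use by comparing `sys.prefix` with `sys.base_prefix`. The two differ inside a virtual environment, such as the one created by `uv venv`. This is a minimal sketch; the helper name `in_virtualenv` is hypothetical and not part of this codebase.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment.

    A virtual environment sets ``sys.prefix`` to the environment directory,
    while ``sys.base_prefix`` keeps pointing at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a virtual environment:", in_virtualenv())
```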
Instructions for PySpark on Linux or macOS:
For PySpark, make sure Java is installed. We recommend Temurin JDK 21:

```bash
# Install Java (Ubuntu/Debian)
sudo apt install -y wget apt-transport-https gpg
wget -qO - https://packages.adoptium.net/artifactory/api/gpg/key/public | sudo gpg --dearmor -o /usr/share/keyrings/adoptium.gpg
echo "deb [signed-by=/usr/share/keyrings/adoptium.gpg] https://packages.adoptium.net/artifactory/deb $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/adoptium.list
sudo apt update && sudo apt install -y temurin-21-jdk
```
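Set the environment variables by adding these lines to ~/.bashrc. The JAVA_HOME path below is the x86-64 location; check /usr/lib/jvm/ if your JDK is installed under a different name.

```bash
export JAVA_HOME=/usr/lib/jvm/temurin-21-jdk-amd64
export PYSPARK_PYTHON=~/.venvs/pybase/bin/python
export PYSPARK_DRIVER_PYTHON=~/.venvs/pybase/bin/python
```

Then reload the file:

```bash
source ~/.bashrc
```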
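Before starting Spark, you can check that these variables are visible to Python. The sketch below uses only the standard library and reads `os.environ`, so it applies to the Windows setup as well. `missing_spark_vars` and `java_home_exists` are hypothetical helpers; they are not part of this codebase or of PySpark.

```python
import os
from typing import List, Mapping, Optional

# Variable names taken from the setup instructions above.
REQUIRED_VARS = ("JAVA_HOME", "PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON")


def missing_spark_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the PySpark-related variables that are unset or empty in ``env``."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def java_home_exists(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if JAVA_HOME points to an existing directory."""
    env = os.environ if env is None else env
    # os.path.isdir("") is False, so an unset JAVA_HOME is reported as missing
    return os.path.isdir(env.get("JAVA_HOME", ""))


if __name__ == "__main__":
    print("missing variables:", missing_spark_vars() or "none")
    print("JAVA_HOME is a directory:", java_home_exists())
```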
Instructions for PySpark on Windows:
- Install Java (download Temurin JDK from https://adoptium.net/).
- Set the following environment variables in System Properties > Environment Variables:

```
JAVA_HOME=C:\Program Files\Eclipse Adoptium\jdk-21...
PYSPARK_PYTHON=%USERPROFILE%\.venvs\pybase\Scripts\python.exe
PYSPARK_DRIVER_PYTHON=%USERPROFILE%\.venvs\pybase\Scripts\python.exe
```
See more details on how to install PySpark on Windows here.
Instructions for CUDA and cuDNN on Linux or macOS:
TODO
Instructions for CUDA and cuDNN on Windows:
- Check the compute capability of your GPU here.
- Select the version of the CUDA Toolkit you want to download. The latest version can be found here.
- Download the cuDNN release that matches your CUDA version here.
- Copy the files from the three directories of the unzipped cuDNN archive to the CUDA X.X install location. NVIDIA ships the files in separate directories, so copy:
  - {unzipped dir}/bin/ --> C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vX.X\bin
  - {unzipped dir}/include/ --> C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vX.X\include
  - {unzipped dir}/lib/ --> C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vX.X\lib
See the full installation guide here.
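The copy step can also be scripted. This sketch rests on a few assumptions. `install_cudnn` is a hypothetical helper. The archive is assumed to contain the bin/, include/ and lib/ directories listed above. Files that already exist in the CUDA directory are overwritten.

```python
import shutil
from pathlib import Path


def install_cudnn(unzipped_dir, cuda_dir):
    """Copy cuDNN's bin/, include/ and lib/ into a CUDA toolkit directory.

    Existing files with the same name in ``cuda_dir`` are overwritten.
    Returns the list of destination directories that were updated.
    """
    unzipped_dir, cuda_dir = Path(unzipped_dir), Path(cuda_dir)
    updated = []
    for sub in ("bin", "include", "lib"):
        src = unzipped_dir / sub
        if not src.is_dir():
            raise FileNotFoundError(f"expected directory not found: {src}")
        # dirs_exist_ok=True (Python 3.8+) merges into the existing CUDA folder
        shutil.copytree(src, cuda_dir / sub, dirs_exist_ok=True)
        updated.append(cuda_dir / sub)
    return updated
```

For example, call `install_cudnn(r"{unzipped dir}", r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vX.X")` from an administrator prompt. Writing under Program Files normally requires administrator rights.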
To execute the tests:

```bash
pytest --doctest-modules --continue-on-collection-errors --durations 0 --disable-warnings
```
To run coverage and see the report:

```bash
coverage run playground.py
coverage report
```
For more detail, the following command generates an HTML report (written to htmlcov/ by default) in which coverage can be examined line by line:

```bash
coverage html
```
To handle variable output in doctests, add the directive `# doctest: +ELLIPSIS` at the end of the example line and replace the variable part of the expected output with `...`. An example can be found in the file timer.py.
Original:

```python
>>> "Time elapsed {}".format(t)
'Time elapsed 0:00:1.9875734'
```

With ellipsis:

```python
>>> "Time elapsed {}".format(t)  # doctest: +ELLIPSIS
'Time elapsed 0:00:...'
```
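You can verify the same pattern with the standard `doctest` module. In this sketch, `elapsed_message` is a hypothetical function. It formats the duration with `datetime.timedelta`, which is an assumption and not necessarily how timer.py does it.

```python
import doctest
import time
from datetime import timedelta


def elapsed_message(start):
    """Return a message with the time elapsed since ``start``.

    The exact duration changes on every run, so the expected output
    keeps the stable prefix and replaces the rest with an ellipsis:

    >>> elapsed_message(time.perf_counter())  # doctest: +ELLIPSIS
    'Time elapsed 0:00:...'
    """
    elapsed = timedelta(seconds=time.perf_counter() - start)
    return "Time elapsed {}".format(elapsed)


if __name__ == "__main__":
    # Runs every docstring example in this module and prints the totals
    print(doctest.testmod())
```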
To skip an example entirely, add `# doctest: +SKIP` at the end of its line.
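To test exceptions, write the `Traceback (most recent call last):` header, then `...` in place of the stack trace, and then the exception line. The exception message appears without quotes:

```python
>>> raise ValueError("Something bad happened")
Traceback (most recent call last):
    ...
ValueError: Something bad happened
```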
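The exception format and the `+SKIP` directive can be combined in one docstring and checked with the standard `doctest` module. `parse_positive` is a hypothetical function used only for illustration.

```python
import doctest


def parse_positive(text):
    """Parse ``text`` as a positive integer.

    >>> parse_positive("42")
    42

    Doctest only compares the traceback header and the final exception line:

    >>> parse_positive("-1")
    Traceback (most recent call last):
        ...
    ValueError: expected a positive integer, got -1

    Examples that need user input or are slow can be skipped:

    >>> parse_positive(input())  # doctest: +SKIP
    7
    """
    value = int(text)
    if value <= 0:
        raise ValueError(f"expected a positive integer, got {value}")
    return value


if __name__ == "__main__":
    print(doctest.testmod())
```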
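To use a context manager in a doctest, prefix the indented lines of the block with `...` and put the expected output after the block. Note that `tempfile.TemporaryDirectory` yields the directory path as a string:

```python
>>> import os
>>> from tempfile import TemporaryDirectory
>>> with TemporaryDirectory() as td:
...     print(os.path.isdir(td))
True
```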
For docstrings, I use the Google style.
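For reference, a Google-style docstring with the usual `Args:`, `Returns:`, `Raises:` and `Examples:` sections might look like this. `moving_average` is a hypothetical function and not part of this codebase. Its `Examples:` section is also collected by `pytest --doctest-modules`.

```python
def moving_average(values, window):
    """Compute the simple moving average of a sequence.

    Args:
        values (list of float): Input values.
        window (int): Number of consecutive values in each average.

    Returns:
        list of float: One average per full window, so the result has
        ``len(values) - window + 1`` elements.

    Raises:
        ValueError: If ``window`` is not between 1 and ``len(values)``.

    Examples:
        >>> moving_average([1, 2, 3, 4], 2)
        [1.5, 2.5, 3.5]
    """
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]
```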
To add a code block that Sphinx can render:

```rst
.. code-block:: python

    import sys
    print(sys.executable)
```
An equivalent form is a literal block introduced by `::`, which Sphinx highlights as Python by default:

```rst
Code::

    import sys
    print(sys.executable)
```
To add a note:

```rst
.. note::

    This is a note
```

or, inside a Google-style docstring:

```
Note:
    This is a note
```
In the requirements.txt file, you can restrict a requirement to specific Python versions with an environment marker (PEP 508). For example:

```
dask[dataframe]>=0.17.1;python_version=='3.6'
dask>=0.17.1;python_version>='3.7'
```
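The following toy evaluator shows which line of such a file applies on a given interpreter. It understands only `python_version <op> 'X.Y'` markers and is an illustration, not how pip or uv evaluate markers; those tools implement the full PEP 508 grammar. `marker_applies` is a hypothetical name. Versions are compared as integer tuples, so '3.10' correctly counts as newer than '3.9'.

```python
import operator
import re
import sys

# Toy evaluator: supports only "python_version <op> 'X.Y'" markers.
_OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}
_MARKER_RE = re.compile(r"python_version\s*(==|!=|>=|<=|>|<)\s*[\"'](\d+(?:\.\d+)*)[\"']")


def marker_applies(requirement, version_info=sys.version_info):
    """Return True if the requirement line's python_version marker is satisfied."""
    if ";" not in requirement:
        return True  # no marker: the requirement always applies
    marker = requirement.split(";", 1)[1].strip()
    match = _MARKER_RE.fullmatch(marker)
    if match is None:
        raise ValueError(f"unsupported marker: {marker!r}")
    op, version = match.groups()
    wanted = tuple(int(part) for part in version.split("."))
    current = tuple(version_info[:2])  # python_version is "major.minor"
    return _OPS[op](current, wanted)
```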