diff --git a/.envrc b/.envrc
new file mode 100644
index 0000000..a1e6d24
--- /dev/null
+++ b/.envrc
@@ -0,0 +1,19 @@
+# direnv: project-scoped Java + Python environment for Impulse.
+# Loaded only when you cd into this directory. Does not pollute global PATH.
+#
+# Setup (one-time): direnv allow
+
+# Java: required by PySpark (Spark 4.0 needs JDK 17 or 21).
+# Installed via Homebrew as keg-only so it is NOT on the global PATH by default.
+export JAVA_HOME="$(brew --prefix openjdk@17)/libexec/openjdk.jdk/Contents/Home"
+PATH_add "$JAVA_HOME/bin"
+
+# Python: activate the uv-managed virtualenv so `python` and `pytest` resolve
+# to this project's interpreter rather than system Python.
+if [ -d "$PWD/.venv" ]; then
+  source "$PWD/.venv/bin/activate"
+fi
+
+# PySpark Python interpreter: must match the venv to avoid worker mismatch.
+export PYSPARK_PYTHON="$PWD/.venv/bin/python"
+export PYSPARK_DRIVER_PYTHON="$PWD/.venv/bin/python"
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index d67ccd7..032ead6 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -34,6 +34,37 @@ make dev
 runtime and development dependencies (pytest, ruff, black, plus pyspark/pandas/etc.
 declared as `local-dev` extras for use outside Databricks runtimes).
 
+### Project-scoped environment via direnv (recommended on macOS)
+
+This repo ships an [`.envrc`](.envrc) that activates Java, the venv, and the
+PySpark interpreter **only when you `cd` into the repo**, with no pollution of your
+global `PATH` or `JAVA_HOME`.
+
+One-time setup:
+
+```bash
+brew install direnv openjdk@17                # JDK 17 stays keg-only, not on global PATH
+echo 'eval "$(direnv hook zsh)"' >> ~/.zshrc  # or ~/.bashrc for bash
+exec $SHELL                                   # reload so the hook is active
+cd && direnv allow                            # whitelist the .envrc
+```
+
+After that, every fresh terminal that enters this directory will auto-export:
+
+| Variable | Value |
+|---|---|
+| `JAVA_HOME` | `$(brew --prefix openjdk@17)/libexec/openjdk.jdk/Contents/Home` |
+| `PATH` | `$JAVA_HOME/bin` prepended |
+| Python venv | `.venv/` activated |
+| `PYSPARK_PYTHON` / `PYSPARK_DRIVER_PYTHON` | `.venv/bin/python` (worker = driver) |
+
+Verify with `java -version` (should show OpenJDK 17) and `which python` (should
+point inside `.venv/`). Outside the repo, both should fall back to whatever
+your system default is, because direnv unloads automatically.
+
+> If you can't use direnv, `source .envrc` works as a manual alternative inside a single shell session,
+> except that `PATH_add` is a direnv helper: run `export PATH="$JAVA_HOME/bin:$PATH"` yourself afterwards.
+
 ## Running tests
 
 ```bash
diff --git a/docs/impulse/docs/references/tsal.md b/docs/impulse/docs/references/tsal.md
index 3b51d19..572b20b 100644
--- a/docs/impulse/docs/references/tsal.md
+++ b/docs/impulse/docs/references/tsal.md
@@ -41,7 +41,7 @@ The solver tries each alias in order and returns the first match.
 For workflows where a stable logical name should resolve to one of many
 physical channels through a separately maintained mapping table, use
 `channel_with_alias()` instead. This is supported by `KeyValueStoreSolver` and requires
-a `channel_mapping_table` to be configured in `source` (see [Configuration](../config/configuration)).
+a `channel_mapping_table` to be configured in `source` (see [Configuration](../config/configuration.md)).
 
 ```python
 rpm = db.query.channel_with_alias(channel_name='Engine RPM')