diff --git a/PRIVACY.md b/PRIVACY.md index 468fbc0..092b4fa 100644 --- a/PRIVACY.md +++ b/PRIVACY.md @@ -58,7 +58,7 @@ For air-gapped or egress-restricted environments, see [`docs/admin-guide.md`](do > Treat `~/.jupyter/nbi/config.json` and `~/.jupyter/nbi/user-data.json` as secrets. They contain your API keys and (encrypted) GitHub token. Do not commit them to git, share them, or sync them across users. If a key leaks, rotate it at the provider immediately. -The encrypted GitHub token uses a default password (`nbi-access-token-password`) unless you set `NBI_GH_ACCESS_TOKEN_PASSWORD`. The default is **shared across installs** and provides obfuscation, not real protection. Set a custom password before enabling "remember login" on any shared or multi-tenant system. +The encrypted GitHub token uses a default password (`nbi-access-token-password`) unless you set `NBI_GH_ACCESS_TOKEN_PASSWORD`. As of v4.9 the password is mixed with the local `/etc/machine-id` (or hostname) plus POSIX uid before PBKDF2, so a stolen `user-data.json` can no longer be decrypted on a different machine using the default password alone. On Kubernetes deployments where co-tenants share a host machine-id and uid, set `NBI_POD_IDENTITY` (pin per pod or per user) **or** `NBI_GH_ACCESS_TOKEN_PASSWORD` for real cross-tenant isolation. Disabling "remember login" entirely is the strongest option. ## Telemetry diff --git a/docs/admin-guide.md b/docs/admin-guide.md index 774ca69..bb508d1 100644 --- a/docs/admin-guide.md +++ b/docs/admin-guide.md @@ -66,7 +66,7 @@ For Kubeflow or KubeSpawner: mount the user's home directory on a PVC and ensure If users share a home directory across nodes (NFS-backed shared HPC, classroom labs): - **Race conditions in `~/.jupyter/nbi/`.** Concurrent writes from two login nodes can corrupt `config.json`. NBI does not file-lock. Pin each user to one node, or use a per-node config prefix. -- **`NBI_GH_ACCESS_TOKEN_PASSWORD` default is unsafe.** The default password (`nbi-access-token-password`) is shared across installs. On a multi-tenant cluster, anyone with read access to another user's `~/.jupyter/nbi/user-data.json` can decrypt their Copilot token. Set a per-user password (e.g., derived from the Hub user secret), or disable "remember login" entirely (see [Restricting features](#restricting-features-for-managed-deployments)). +- **`NBI_GH_ACCESS_TOKEN_PASSWORD` default needs help on shared hosts.** NBI now mixes the local `/etc/machine-id` (or hostname fallback) and POSIX uid into the KDF so a stolen `user-data.json` cannot be decrypted on a different machine using the default password alone. However, on a Kubernetes deployment where `/etc/machine-id` resolves to the host node's value and every container runs as uid `1000`, the automatic entropy sources collapse to a value shared across co-tenants on that node. For real cross-tenant isolation, set **either** a per-pod `NBI_POD_IDENTITY` (recommended: pin to the JupyterHub username, the spawn token, or the pod's `metadata.uid` via the downward API) **or** a per-user `NBI_GH_ACCESS_TOKEN_PASSWORD`. Disabling "remember login" entirely is the strongest option; see [Restricting features](#restricting-features-for-managed-deployments). - **Skill collisions.** Two users sharing `~/.claude/skills/` will see each other's skills. Make sure each user has a unique home. --- @@ -107,7 +107,8 @@ The full surface, in one table. | `NBI_UPLOAD_RETENTION_HOURS` | int | unset | env (overrides traitlet) | Same as above; env takes precedence. | | `tour_config_path` | str | `""` | traitlet | Filesystem path to a YAML/JSON file with admin overrides for the first-run sidebar tour copy. See [`docs/admin-tour-config.md`](admin-tour-config.md). | | `NBI_TOUR_CONFIG_PATH` | str | unset | env (overrides traitlet) | Same as above; env takes precedence. | -| `NBI_GH_ACCESS_TOKEN_PASSWORD` | str | `nbi-access-token-password` | env | Password used to encrypt the stored Copilot token in `user-data.json`. **Change in multi-tenant deployments.** | +| `NBI_GH_ACCESS_TOKEN_PASSWORD` | str | `nbi-access-token-password` | env | Password used to encrypt the stored Copilot token in `user-data.json`. Combined with the local machine-id (or hostname) and POSIX uid before PBKDF2 (since v4.9), so a stolen file cannot be decrypted on another machine using the default. Still **change in multi-tenant deployments where pods share a machine-id**. | +| `NBI_POD_IDENTITY` | str | unset | env | Highest-priority entropy source mixed into the Copilot token KDF. Pin to a per-pod or per-user value (JupyterHub username, spawn token, Pod metadata.uid via the downward API) when `/etc/machine-id` is shared across co-tenants on the same node. Recommended for any K8s deployment using the default token password. | | `NBI_RULES_AUTO_RELOAD` | bool | `true` | env | When `false`, ruleset edits require a JupyterLab restart to take effect. | | `NBI_CLAUDE_CLI_PATH` | str | unset | env | Absolute path to the Claude Code CLI binary. When unset, NBI looks up `claude` on `PATH`. | | `NBI_OPENCODE_CLI_PATH` | str | unset | env | Absolute path to the opencode CLI. When unset, NBI looks up `opencode` on `PATH`. Gates the opencode launcher tile. | diff --git a/notebook_intelligence/github_copilot.py b/notebook_intelligence/github_copilot.py index 608fb7f..1d69a91 100644 --- a/notebook_intelligence/github_copilot.py +++ b/notebook_intelligence/github_copilot.py @@ -14,7 +14,7 @@ import logging from notebook_intelligence.api import BackendMessageType, CancelToken, ChatResponse, CompletionContext, MarkdownData from notebook_intelligence.config import _atomic_write_json -from notebook_intelligence.util import decrypt_with_password, encrypt_with_password, ThreadSafeWebSocketConnector +from notebook_intelligence.util import decrypt_user_secret, encrypt_user_secret, ThreadSafeWebSocketConnector from ._version import __version__ as NBI_VERSION @@ -121,7 +121,25 @@ def read_stored_github_access_token() -> str: if base64_access_token is not None: base64_bytes = base64.b64decode(base64_access_token.encode('utf-8')) - return decrypt_with_password(access_token_password, base64_bytes).decode('utf-8') + token_bytes, was_legacy = decrypt_user_secret( + access_token_password, base64_bytes + ) + if was_legacy: + # Re-encrypt under the machine-derived KDF so subsequent + # reads no longer fall back to the bare-password path. + # Best-effort: if the rewrite fails the next read just + # takes the same legacy branch again. + log.info( + "Upgrading stored GitHub access token to per-pod KDF" + ) + try: + write_github_access_token(token_bytes.decode('utf-8')) + except Exception as rewrite_exc: + log.warning( + "Token upgrade rewrite failed (will retry next read): %s", + rewrite_exc, + ) + return token_bytes.decode('utf-8') except Exception as e: log.error(f"Failed to read GitHub access token: {e}") @@ -142,7 +160,9 @@ def _save_user_data(user_data: dict) -> None: def write_github_access_token(access_token: str) -> bool: try: - encrypted_access_token = encrypt_with_password(access_token_password, access_token.encode()) + encrypted_access_token = encrypt_user_secret( + access_token_password, access_token.encode() + ) base64_bytes = base64.b64encode(encrypted_access_token) base64_access_token = base64_bytes.decode('utf-8') diff --git a/notebook_intelligence/util.py b/notebook_intelligence/util.py index 6194688..8713310 100644 --- a/notebook_intelligence/util.py +++ b/notebook_intelligence/util.py @@ -4,11 +4,12 @@ import base64 import re import shutil +import socket import subprocess from typing import Optional, Set from cryptography.hazmat.primitives import hashes from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC -from cryptography.fernet import Fernet +from cryptography.fernet import Fernet, InvalidToken import asyncio from tornado import ioloop @@ -180,6 +181,122 @@ def split_csv(raw) -> list: return [token for token in (part.strip() for part in raw.split(",")) if token] +_MACHINE_ID_PATHS: tuple[str, ...] = ( + "/etc/machine-id", + "/var/lib/dbus/machine-id", +) + + +def _read_machine_id() -> str: + """Read the systemd / dbus machine ID. Empty string when unavailable. + + The two standard paths are tried in priority order. Containers that + don't surface either file still produce a stable per-pod string from + the hostname fallback in ``_machine_user_secret``. + """ + for path in _MACHINE_ID_PATHS: + try: + with open(path, "r", encoding="utf-8") as f: + value = f.read().strip() + if value: + return value + except OSError: + continue + return "" + + +def _safe_getuid() -> str: + """``str(os.getuid())`` on POSIX, ``""`` on platforms without ``getuid``.""" + getuid = getattr(os, "getuid", None) + if getuid is None: + return "" + try: + return str(getuid()) + except OSError: + return "" + + +def _machine_user_secret() -> str: + """Stable per-pod + per-user secret to mix into token KDF input. + + Resolution order, first non-empty wins (the rest still contribute + to the concatenation, since stacking weakly-unique values can only + help isolation, never hurt): + + 1. ``NBI_POD_IDENTITY`` env var. Highest-priority because in a + multi-tenant K8s deployment ``/etc/machine-id`` is typically the + host node's value (shared across co-tenants) and POSIX uid + collapses to ``1000`` for every ``jovyan`` container. Admins + point this at a per-pod or per-user value (e.g., the JupyterHub + username, the spawn token) for real cross-tenant isolation. + 2. ``/etc/machine-id`` or ``/var/lib/dbus/machine-id`` if mounted. + 3. ``socket.gethostname()`` as a last resort; in K8s the hostname + equals the pod name, which differs across pods but is not a + confidential value. + + POSIX uid is always included, but is not by itself sufficient on + a typical JupyterHub deployment (every tenant runs as uid 1000). + """ + parts: list[str] = [_safe_getuid()] + explicit = os.environ.get("NBI_POD_IDENTITY", "").strip() + if explicit: + parts.append(explicit) + machine_id = _read_machine_id() + if machine_id: + parts.append(machine_id) + try: + hostname = socket.gethostname() or "" + except OSError: + hostname = "" + parts.append(hostname) + return "::".join(parts) + + +def _derive_token_password(password: str) -> str: + """Combine the admin-supplied password with the machine/user secret. + + The result is fed to ``encrypt_with_password`` / ``decrypt_with_password`` + in place of the bare ``password``. The on-disk blob format is + unchanged; the change is purely in the KDF input. + """ + return f"{_machine_user_secret()}::{password}" + + +def encrypt_user_secret(password: str, data: bytes) -> bytes: + """Encrypt ``data`` with a password mixed with the per-pod secret. + + The on-disk format is identical to ``encrypt_with_password`` so + callers writing through the existing read/write pipelines don't + need a schema bump; the change is purely in the KDF input. + """ + return encrypt_with_password(_derive_token_password(password), data) + + +def decrypt_user_secret( + password: str, ciphertext: bytes, *, allow_legacy: bool = True +) -> tuple[bytes, bool]: + """Decrypt with the per-pod password; on failure, fall back to the + bare password for blobs written before this change rolled out. + + Returns a (plaintext, was_legacy) tuple so the caller can re-encrypt + legacy blobs in place to upgrade them transparently. When + ``allow_legacy=False`` the bare-password fallback is skipped, useful + for tests that want to confirm a blob was written under v2. + + Only ``InvalidToken`` falls through to the legacy path; other + exceptions (e.g. ``ValueError`` from a corrupted blob) propagate so + a programmer error or malformed-on-disk file isn't masked as a + legacy blob. + """ + derived = _derive_token_password(password) + try: + return decrypt_with_password(derived, ciphertext), False + except InvalidToken: + if not allow_legacy: + raise + return decrypt_with_password(password, ciphertext), True + + def get_enabled_builtin_tools_in_env() -> Set[str]: global _enabled_tools if _enabled_tools is not None: diff --git a/tests/test_user_secret_kdf.py b/tests/test_user_secret_kdf.py new file mode 100644 index 0000000..e0d7a88 --- /dev/null +++ b/tests/test_user_secret_kdf.py @@ -0,0 +1,231 @@ +# Copyright (c) Mehmet Bektas + +"""Tests for the per-pod KDF that protects the stored Copilot token. + +``encrypt_user_secret`` / ``decrypt_user_secret`` mix the admin-supplied +``NBI_GH_ACCESS_TOKEN_PASSWORD`` with a machine + user secret before +running PBKDF2. A blob written on one pod must not decrypt on a +different pod even when both ship with the default password, which is +the shared-NFS-home cross-tenant exposure the audit called out. +""" + +from __future__ import annotations + +from unittest.mock import patch + +import pytest + +from cryptography.fernet import InvalidToken + +from notebook_intelligence.util import ( + decrypt_user_secret, + decrypt_with_password, + encrypt_user_secret, + encrypt_with_password, +) + + +PASSWORD = "test-password" +PLAINTEXT = b"ghu_test_token_value" + + +class TestRoundTripV2: + def test_basic_round_trip(self): + ciphertext = encrypt_user_secret(PASSWORD, PLAINTEXT) + decrypted, was_legacy = decrypt_user_secret(PASSWORD, ciphertext) + assert decrypted == PLAINTEXT + assert was_legacy is False + + def test_format_is_unchanged_on_disk(self): + # ``encrypt_user_secret`` must produce a blob shape identical to + # ``encrypt_with_password`` so existing readers (and the + # ``user-data.json`` JSON shape) don't need a schema bump. The + # first 16 bytes are still the random salt; the rest is Fernet. + ciphertext = encrypt_user_secret(PASSWORD, PLAINTEXT) + assert len(ciphertext) > 16 + + +class TestCrossPodIsolation: + def test_blob_from_pod_a_does_not_decrypt_on_pod_b(self): + # Simulate two pods with different machine-ids. A token + # encrypted on pod A must not decrypt on pod B, even though both + # are using the same admin-supplied password. This is the + # central guarantee against the shared-NFS cross-tenant attack. + with patch( + "notebook_intelligence.util._read_machine_id", + return_value="pod-a-machine-id", + ): + ciphertext = encrypt_user_secret(PASSWORD, PLAINTEXT) + with patch( + "notebook_intelligence.util._read_machine_id", + return_value="pod-b-machine-id", + ): + with pytest.raises(InvalidToken): + decrypt_user_secret(PASSWORD, ciphertext, allow_legacy=False) + + def test_different_uid_yields_different_ciphertext(self): + # Same machine-id, different POSIX uid => still isolated. This + # covers the rare multi-user pod (e.g., a shared-cluster shell + # node) where the machine-id is constant but UIDs differ. + with patch("os.getuid", return_value=1001), patch( + "notebook_intelligence.util._read_machine_id", + return_value="m", + ): + ciphertext = encrypt_user_secret(PASSWORD, PLAINTEXT) + with patch("os.getuid", return_value=1002), patch( + "notebook_intelligence.util._read_machine_id", + return_value="m", + ): + with pytest.raises(InvalidToken): + decrypt_user_secret(PASSWORD, ciphertext, allow_legacy=False) + + +class TestLegacyFallback: + def test_legacy_blob_decrypts_via_fallback_path(self): + # A blob produced by the old bare-password encrypt path must + # still decrypt; the helper signals was_legacy=True so the + # caller can rewrite under v2. + legacy_ciphertext = encrypt_with_password(PASSWORD, PLAINTEXT) + plaintext, was_legacy = decrypt_user_secret( + PASSWORD, legacy_ciphertext + ) + assert plaintext == PLAINTEXT + assert was_legacy is True + + def test_allow_legacy_false_refuses_old_blobs(self): + legacy_ciphertext = encrypt_with_password(PASSWORD, PLAINTEXT) + with pytest.raises(InvalidToken): + decrypt_user_secret( + PASSWORD, legacy_ciphertext, allow_legacy=False + ) + + +class TestMachineIdFallback: + def test_missing_machine_id_falls_back_to_hostname(self): + # In containers that don't mount /etc/machine-id, the helper + # falls back to the hostname (which equals the pod name in + # Kubernetes), preserving per-pod isolation. + with patch( + "notebook_intelligence.util._read_machine_id", return_value="" + ), patch("socket.gethostname", return_value="pod-x"): + ciphertext = encrypt_user_secret(PASSWORD, PLAINTEXT) + with patch( + "notebook_intelligence.util._read_machine_id", return_value="" + ), patch("socket.gethostname", return_value="pod-x"): + plaintext, was_legacy = decrypt_user_secret( + PASSWORD, ciphertext, allow_legacy=False + ) + assert plaintext == PLAINTEXT + assert was_legacy is False + + def test_hostname_change_isolates_pods(self): + with patch( + "notebook_intelligence.util._read_machine_id", return_value="" + ), patch("socket.gethostname", return_value="pod-x"): + ciphertext = encrypt_user_secret(PASSWORD, PLAINTEXT) + with patch( + "notebook_intelligence.util._read_machine_id", return_value="" + ), patch("socket.gethostname", return_value="pod-y"): + with pytest.raises(InvalidToken): + decrypt_user_secret( + PASSWORD, ciphertext, allow_legacy=False + ) + + +class TestPasswordStillMatters: + def test_changing_admin_password_invalidates_existing_blob(self): + # The machine-derived secret augments the admin password; it + # does not replace it. A change to NBI_GH_ACCESS_TOKEN_PASSWORD + # must still invalidate every previously-encrypted blob. + ciphertext = encrypt_user_secret(PASSWORD, PLAINTEXT) + with pytest.raises(InvalidToken): + decrypt_user_secret( + "different-password", ciphertext, allow_legacy=False + ) + + +class TestPodIdentityEnv: + """NBI_POD_IDENTITY is the highest-priority entropy source. + + On a K8s deployment with /etc/machine-id mounted from the host node + and uid pinned to 1000 (the default for jupyter/jovyan images), the + automatic sources collapse to a value shared across co-tenants on + that node. An admin who pins NBI_POD_IDENTITY to a per-pod or + per-user identity (e.g., the JupyterHub username, the spawn token) + restores cross-tenant isolation. + """ + + def test_pod_identity_isolates_when_machine_id_is_shared(self, monkeypatch): + # Same machine-id + same uid (the realistic K8s shape), but + # different NBI_POD_IDENTITY between two pods. + monkeypatch.setattr( + "notebook_intelligence.util._read_machine_id", + lambda: "shared-host-machine-id", + ) + monkeypatch.setattr("socket.gethostname", lambda: "shared-host") + monkeypatch.setattr("os.getuid", lambda: 1000) + + monkeypatch.setenv("NBI_POD_IDENTITY", "tenant-a") + ciphertext = encrypt_user_secret(PASSWORD, PLAINTEXT) + monkeypatch.setenv("NBI_POD_IDENTITY", "tenant-b") + with pytest.raises(InvalidToken): + decrypt_user_secret(PASSWORD, ciphertext, allow_legacy=False) + + def test_pod_identity_unset_uses_machine_id(self, monkeypatch): + # When NBI_POD_IDENTITY is absent, behavior matches the + # machine-id path. Pin so removing the env-var branch doesn't + # quietly break the auto-detection fallback. + monkeypatch.delenv("NBI_POD_IDENTITY", raising=False) + monkeypatch.setattr( + "notebook_intelligence.util._read_machine_id", + lambda: "machine-id-x", + ) + ciphertext = encrypt_user_secret(PASSWORD, PLAINTEXT) + plaintext, was_legacy = decrypt_user_secret( + PASSWORD, ciphertext, allow_legacy=False + ) + assert plaintext == PLAINTEXT + assert was_legacy is False + + +class TestSilentLegacyUpgrade: + """The token reader rewrites legacy blobs under v2 on first read. + + Without this test, a regression that drops the rewrite call (or + inverts the ``if was_legacy`` branch) ships clean because no other + test exercises the github_copilot flow end-to-end. + """ + + def test_legacy_blob_on_disk_gets_upgraded(self, tmp_path, monkeypatch): + import base64 + import json + + from notebook_intelligence import github_copilot + + # Stand up a fake user-data.json that contains a legacy blob + # (encrypted with the bare password) and point the module at it. + legacy_ciphertext = encrypt_with_password(PASSWORD, PLAINTEXT) + legacy_b64 = base64.b64encode(legacy_ciphertext).decode("utf-8") + user_data_path = tmp_path / "user-data.json" + user_data_path.write_text( + json.dumps({"github_access_token": legacy_b64}), + encoding="utf-8", + ) + monkeypatch.setattr(github_copilot, "user_data_file", str(user_data_path)) + monkeypatch.setattr(github_copilot, "access_token_password", PASSWORD) + + first = github_copilot.read_stored_github_access_token() + assert first == PLAINTEXT.decode("utf-8") + + # After the first read the file should now contain a v2 blob, + # so a second read decrypts via the non-legacy path. Confirm by + # checking the on-disk bytes changed (rewritten) and that + # decrypt_user_secret reports was_legacy=False on the new bytes. + on_disk_now = json.loads(user_data_path.read_text(encoding="utf-8")) + new_b64 = on_disk_now["github_access_token"] + assert new_b64 != legacy_b64 + plaintext, was_legacy = decrypt_user_secret( + PASSWORD, base64.b64decode(new_b64.encode("utf-8")) + ) + assert plaintext == PLAINTEXT + assert was_legacy is False