plmbr · pjdoland · May 18, 2026
diff --git a/PRIVACY.md b/PRIVACY.md
@@ -58,7 +58,7 @@ For air-gapped or egress-restricted environments, see [`docs/admin-guide.md`](do
 
 > Treat `~/.jupyter/nbi/config.json` and `~/.jupyter/nbi/user-data.json` as secrets. They contain your API keys and (encrypted) GitHub token. Do not commit them to git, share them, or sync them across users. If a key leaks, rotate it at the provider immediately.
 
-The encrypted GitHub token uses a default password (`nbi-access-token-password`) unless you set `NBI_GH_ACCESS_TOKEN_PASSWORD`. The default is **shared across installs** and provides obfuscation, not real protection. Set a custom password before enabling "remember login" on any shared or multi-tenant system.
+The encrypted GitHub token uses a default password (`nbi-access-token-password`) unless you set `NBI_GH_ACCESS_TOKEN_PASSWORD`. As of v4.9 the password is mixed with the local `/etc/machine-id` (or hostname) plus POSIX uid before PBKDF2, so a stolen `user-data.json` can no longer be decrypted on a different machine using the default password alone. On Kubernetes deployments where co-tenants share a host machine-id and uid, set `NBI_POD_IDENTITY` (pin per pod or per user) **or** `NBI_GH_ACCESS_TOKEN_PASSWORD` for real cross-tenant isolation. Disabling "remember login" entirely is the strongest option.
 
 ## Telemetry
 

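The PRIVACY.md wording above rests on one property: mixing machine-bound values into the password makes the same default password derive a different key on each machine. A minimal sketch of that property follows, using only the standard library. The helper name `derive_key`, the salt, and the iteration count are assumptions made for illustration; they are not taken from NBI, which goes through `cryptography`'s `PBKDF2HMAC` and Fernet.

```python
import hashlib


def derive_key(password: str, machine_id: str, uid: str) -> bytes:
    """Hypothetical helper: PBKDF2 over machine context joined with the password."""
    kdf_input = f"{uid}::{machine_id}::{password}".encode("utf-8")
    # Salt and iteration count are illustrative assumptions, not NBI's values.
    return hashlib.pbkdf2_hmac("sha256", kdf_input, b"example-salt", 10_000)


default_password = "nbi-access-token-password"
key_a = derive_key(default_password, "machine-a", "1000")
key_b = derive_key(default_password, "machine-b", "1000")
key_a_again = derive_key(default_password, "machine-a", "1000")
```

Here `key_a` and `key_a_again` match while `key_b` differs, which is why a blob copied to another machine no longer opens with the default password alone.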
diff --git a/docs/admin-guide.md b/docs/admin-guide.md
@@ -66,7 +66,7 @@ For Kubeflow or KubeSpawner: mount the user's home directory on a PVC and ensure
 If users share a home directory across nodes (NFS-backed shared HPC, classroom labs):
 
 - **Race conditions in `~/.jupyter/nbi/`.** Concurrent writes from two login nodes can corrupt `config.json`. NBI does not file-lock. Pin each user to one node, or use a per-node config prefix.
-- **`NBI_GH_ACCESS_TOKEN_PASSWORD` default is unsafe.** The default password (`nbi-access-token-password`) is shared across installs. On a multi-tenant cluster, anyone with read access to another user's `~/.jupyter/nbi/user-data.json` can decrypt their Copilot token. Set a per-user password (e.g., derived from the Hub user secret), or disable "remember login" entirely (see [Restricting features](#restricting-features-for-managed-deployments)).
+- **`NBI_GH_ACCESS_TOKEN_PASSWORD` default needs help on shared hosts.** NBI now mixes the local `/etc/machine-id` (or hostname fallback) and POSIX uid into the KDF so a stolen `user-data.json` cannot be decrypted on a different machine using the default password alone. However, on a Kubernetes deployment where `/etc/machine-id` resolves to the host node's value and every container runs as uid `1000`, the automatic entropy sources collapse to a value shared across co-tenants on that node. For real cross-tenant isolation, set **either** a per-pod `NBI_POD_IDENTITY` (recommended: pin to the JupyterHub username, the spawn token, or the pod's `metadata.uid` via the downward API) **or** a per-user `NBI_GH_ACCESS_TOKEN_PASSWORD`. Disabling "remember login" entirely is the strongest option; see [Restricting features](#restricting-features-for-managed-deployments).
 - **Skill collisions.** Two users sharing `~/.claude/skills/` will see each other's skills. Make sure each user has a unique home.
 
 ---
@@ -107,7 +107,8 @@ The full surface, in one table.
 | `NBI_UPLOAD_RETENTION_HOURS`                     | int  | unset                       | env (overrides traitlet)           | Same as above; env takes precedence.                                                                                                                                                                                                                                                                                                                                                                                                                                     |
 | `tour_config_path`                               | str  | `""`                        | traitlet                           | Filesystem path to a YAML/JSON file with admin overrides for the first-run sidebar tour copy. See [`docs/admin-tour-config.md`](admin-tour-config.md).                                                                                                                                                                                                                                                                                                                   |
 | `NBI_TOUR_CONFIG_PATH`                           | str  | unset                       | env (overrides traitlet)           | Same as above; env takes precedence.                                                                                                                                                                                                                                                                                                                                                                                                                                     |
-| `NBI_GH_ACCESS_TOKEN_PASSWORD`                   | str  | `nbi-access-token-password` | env                                | Password used to encrypt the stored Copilot token in `user-data.json`. **Change in multi-tenant deployments.**                                                                                                                                                                                                                                                                                                                                                           |
+| `NBI_GH_ACCESS_TOKEN_PASSWORD`                   | str  | `nbi-access-token-password` | env                                | Password used to encrypt the stored Copilot token in `user-data.json`. Combined with the local machine-id (or hostname) and POSIX uid before PBKDF2 (since v4.9), so a stolen file cannot be decrypted on another machine using the default. Still **change in multi-tenant deployments where pods share a machine-id**.                                                                                                                                                 |
+| `NBI_POD_IDENTITY`                               | str  | unset                       | env                                | Highest-priority entropy source mixed into the Copilot token KDF. Pin to a per-pod or per-user value (JupyterHub username, spawn token, pod `metadata.uid` via the downward API) when `/etc/machine-id` is shared across co-tenants on the same node. Recommended for any K8s deployment using the default token password.                                                                                                                                               |
 | `NBI_RULES_AUTO_RELOAD`                          | bool | `true`                      | env                                | When `false`, ruleset edits require a JupyterLab restart to take effect.                                                                                                                                                                                                                                                                                                                                                                                                 |
 | `NBI_CLAUDE_CLI_PATH`                            | str  | unset                       | env                                | Absolute path to the Claude Code CLI binary. When unset, NBI looks up `claude` on `PATH`.                                                                                                                                                                                                                                                                                                                                                                                |
 | `NBI_OPENCODE_CLI_PATH`                          | str  | unset                       | env                                | Absolute path to the opencode CLI. When unset, NBI looks up `opencode` on `PATH`. Gates the opencode launcher tile.                                                                                                                                                                                                                                                                                                                                                      |

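The shared-host caveat in the admin guide can be sketched with a small pure function. This is a hedged illustration only: `identity_string` is a hypothetical name, the input values are made up, and the sketch models just the case the guide describes (a machine-id that is present and shared by co-tenants, every container running as uid `1000`). Hostname handling is left out of the sketch.

```python
def identity_string(uid: str, machine_id: str, pod_identity: str = "") -> str:
    """Hypothetical model of the per-tenant context mixed into the KDF input."""
    parts = [uid]
    if pod_identity:
        # An explicit NBI_POD_IDENTITY-style value, when the admin sets one.
        parts.append(pod_identity)
    parts.append(machine_id)
    return "::".join(parts)


# Two tenants on the same node, same uid, no explicit pod identity.
alice_default = identity_string("1000", "node-machine-id")
bob_default = identity_string("1000", "node-machine-id")

# The same tenants once a per-user value is pinned.
alice_pinned = identity_string("1000", "node-machine-id", "alice")
bob_pinned = identity_string("1000", "node-machine-id", "bob")
```

Without the pinned value both tenants end up with the same string, so the automatic sources add nothing between them; pinning a per-user value separates them again.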
diff --git a/notebook_intelligence/github_copilot.py b/notebook_intelligence/github_copilot.py
@@ -14,7 +14,7 @@
 import logging
 from notebook_intelligence.api import BackendMessageType, CancelToken, ChatResponse, CompletionContext, MarkdownData
 from notebook_intelligence.config import _atomic_write_json
-from notebook_intelligence.util import decrypt_with_password, encrypt_with_password, ThreadSafeWebSocketConnector
+from notebook_intelligence.util import decrypt_user_secret, encrypt_user_secret, ThreadSafeWebSocketConnector
 
 from ._version import __version__ as NBI_VERSION
 
@@ -121,7 +121,25 @@ def read_stored_github_access_token() -> str:
 
         if base64_access_token is not None:
             base64_bytes = base64.b64decode(base64_access_token.encode('utf-8'))
-            return decrypt_with_password(access_token_password, base64_bytes).decode('utf-8')
+            token_bytes, was_legacy = decrypt_user_secret(
+                access_token_password, base64_bytes
+            )
+            if was_legacy:
+                # Re-encrypt under the machine-derived KDF so subsequent
+                # reads no longer fall back to the bare-password path.
+                # Best-effort: if the rewrite fails the next read just
+                # takes the same legacy branch again.
+                log.info(
+                    "Upgrading stored GitHub access token to per-pod KDF"
+                )
+                try:
+                    write_github_access_token(token_bytes.decode('utf-8'))
+                except Exception as rewrite_exc:
+                    log.warning(
+                        "Token upgrade rewrite failed (will retry next read): %s",
+                        rewrite_exc,
+                    )
+            return token_bytes.decode('utf-8')
     except Exception as e:
         log.error(f"Failed to read GitHub access token: {e}")
 
@@ -142,7 +160,9 @@ def _save_user_data(user_data: dict) -> None:
 
 def write_github_access_token(access_token: str) -> bool:
     try:
-        encrypted_access_token = encrypt_with_password(access_token_password, access_token.encode())
+        encrypted_access_token = encrypt_user_secret(
+            access_token_password, access_token.encode()
+        )
         base64_bytes = base64.b64encode(encrypted_access_token)
         base64_access_token = base64_bytes.decode('utf-8')
 

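The read path changed above follows a lazy-migration pattern: try the new key first, fall back to the legacy key, and rewrite the blob when the fallback was needed. The sketch below is a hedged, standard-library illustration of that flow. `_seal`, `_open`, `BadTag`, `open_with_fallback`, and `read_and_upgrade` are hypothetical names, and the toy envelope only authenticates the data (it provides no confidentiality); it stands in for Fernet and `InvalidToken` purely to show the control flow.

```python
import hashlib
import hmac


class BadTag(Exception):
    """Stand-in for cryptography.fernet.InvalidToken in this sketch."""


def _seal(password: str, data: bytes) -> bytes:
    # Toy authenticated envelope: 32-byte HMAC tag followed by the data.
    key = hashlib.sha256(password.encode("utf-8")).digest()
    return hmac.new(key, data, hashlib.sha256).digest() + data


def _open(password: str, blob: bytes) -> bytes:
    key = hashlib.sha256(password.encode("utf-8")).digest()
    tag, data = blob[:32], blob[32:]
    if not hmac.compare_digest(tag, hmac.new(key, data, hashlib.sha256).digest()):
        raise BadTag("wrong password or corrupted blob")
    return data


def open_with_fallback(password: str, context: str, blob: bytes) -> tuple[bytes, bool]:
    """Try the context-bound password first, then the bare legacy password."""
    try:
        return _open(f"{context}::{password}", blob), False
    except BadTag:
        return _open(password, blob), True


def read_and_upgrade(store: dict, password: str, context: str) -> bytes:
    """Read store['token']; rewrite it under the new scheme if it was legacy."""
    data, was_legacy = open_with_fallback(password, context, store["token"])
    if was_legacy:
        store["token"] = _seal(f"{context}::{password}", data)
    return data


store = {"token": _seal("pw", b"gho_example")}  # legacy blob: bare password
first = read_and_upgrade(store, "pw", "1000::machine-a")
_, still_legacy = open_with_fallback("pw", "1000::machine-a", store["token"])
```

After the first read the stored blob opens on the primary path, so later reads skip the fallback. With a different context, both paths fail and the error propagates to the caller.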
diff --git a/notebook_intelligence/util.py b/notebook_intelligence/util.py
@@ -4,11 +4,12 @@
 import base64
 import re
 import shutil
+import socket
 import subprocess
 from typing import Optional, Set
 from cryptography.hazmat.primitives import hashes
 from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC
-from cryptography.fernet import Fernet
+from cryptography.fernet import Fernet, InvalidToken
 import asyncio
 from tornado import ioloop
 
@@ -180,6 +181,122 @@ def split_csv(raw) -> list:
     return [token for token in (part.strip() for part in raw.split(",")) if token]
 
 
+_MACHINE_ID_PATHS: tuple[str, ...] = (
+    "/etc/machine-id",
+    "/var/lib/dbus/machine-id",
+)
+
+
+def _read_machine_id() -> str:
+    """Read the systemd / dbus machine ID. Empty string when unavailable.
+
+    The two standard paths are tried in priority order. Containers that
+    don't surface either file still get a per-pod string from the
+    hostname that ``_machine_user_secret`` mixes in.
+    """
+    for path in _MACHINE_ID_PATHS:
+        try:
+            with open(path, "r", encoding="utf-8") as f:
+                value = f.read().strip()
+                if value:
+                    return value
+        except OSError:
+            continue
+    return ""
+
+
+def _safe_getuid() -> str:
+    """``str(os.getuid())`` on POSIX, ``""`` on platforms without ``getuid``."""
+    getuid = getattr(os, "getuid", None)
+    if getuid is None:
+        return ""
+    try:
+        return str(getuid())
+    except OSError:
+        return ""
+
+
+def _machine_user_secret() -> str:
+    """Stable per-pod + per-user secret to mix into token KDF input.
+
+    Every available source is joined with ``::``; none replaces another,
+    since stacking weakly-unique values can only help isolation, never
+    hurt. POSIX uid comes first, followed by:
+
+    1. ``NBI_POD_IDENTITY`` env var, when set. The strongest source
+       because in a multi-tenant K8s deployment ``/etc/machine-id`` is
+       typically the host node's value (shared across co-tenants) and
+       POSIX uid collapses to ``1000`` for every ``jovyan`` container.
+       Admins point this at a per-pod or per-user value (e.g., the
+       JupyterHub username, the spawn token) for real cross-tenant
+       isolation.
+    2. ``/etc/machine-id`` or ``/var/lib/dbus/machine-id``, when readable
+       and non-empty.
+    3. ``socket.gethostname()``, always appended (empty on failure); in
+       K8s the hostname equals the pod name, which differs across pods
+       but is not a confidential value.
+
+    POSIX uid is always included, but is not by itself sufficient on
+    a typical JupyterHub deployment (every tenant runs as uid 1000).
+    """
+    parts: list[str] = [_safe_getuid()]
+    explicit = os.environ.get("NBI_POD_IDENTITY", "").strip()
+    if explicit:
+        parts.append(explicit)
+    machine_id = _read_machine_id()
+    if machine_id:
+        parts.append(machine_id)
+    try:
+        hostname = socket.gethostname() or ""
+    except OSError:
+        hostname = ""
+    parts.append(hostname)
+    return "::".join(parts)
+
+
+def _derive_token_password(password: str) -> str:
+    """Combine the admin-supplied password with the machine/user secret.
+
+    The result is fed to ``encrypt_with_password`` / ``decrypt_with_password``
+    in place of the bare ``password``. The on-disk blob format is
+    unchanged; the change is purely in the KDF input.
+    """
+    return f"{_machine_user_secret()}::{password}"
+
+
+def encrypt_user_secret(password: str, data: bytes) -> bytes:
+    """Encrypt ``data`` with a password mixed with the per-pod secret.
+
+    The on-disk format is identical to ``encrypt_with_password`` so
+    callers writing through the existing read/write pipelines don't
+    need a schema bump; the change is purely in the KDF input.
+    """
+    return encrypt_with_password(_derive_token_password(password), data)
+
+
+def decrypt_user_secret(
+    password: str, ciphertext: bytes, *, allow_legacy: bool = True
+) -> tuple[bytes, bool]:
+    """Decrypt with the per-pod password; on failure, fall back to the
+    bare password for blobs written before this change rolled out.
+
+    Returns a (plaintext, was_legacy) tuple so the caller can re-encrypt
+    legacy blobs in place to upgrade them transparently. When
+    ``allow_legacy=False`` the bare-password fallback is skipped, useful
+    for tests that want to confirm a blob was written under the
+    machine-derived KDF input rather than the bare password.
+
+    Only ``InvalidToken`` falls through to the legacy path; other
+    exceptions (e.g. ``ValueError`` from a corrupted blob) propagate so
+    a programmer error or malformed-on-disk file isn't masked as a
+    legacy blob.
+    """
+    derived = _derive_token_password(password)
+    try:
+        return decrypt_with_password(derived, ciphertext), False
+    except InvalidToken:
+        if not allow_legacy:
+            raise
+        return decrypt_with_password(password, ciphertext), True
+
+
 def get_enabled_builtin_tools_in_env() -> Set[str]:
   global _enabled_tools
   if _enabled_tools is not None: