Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion PRIVACY.md
Original file line number Diff line number Diff line change
Expand Up @@ -58,7 +58,7 @@ For air-gapped or egress-restricted environments, see [`docs/admin-guide.md`](do

> Treat `~/.jupyter/nbi/config.json` and `~/.jupyter/nbi/user-data.json` as secrets. They contain your API keys and (encrypted) GitHub token. Do not commit them to git, share them, or sync them across users. If a key leaks, rotate it at the provider immediately.

The encrypted GitHub token uses a default password (`nbi-access-token-password`) unless you set `NBI_GH_ACCESS_TOKEN_PASSWORD`. The default is **shared across installs** and provides obfuscation, not real protection. Set a custom password before enabling "remember login" on any shared or multi-tenant system.
The encrypted GitHub token uses a default password (`nbi-access-token-password`) unless you set `NBI_GH_ACCESS_TOKEN_PASSWORD`. As of v4.9 the password is mixed with the local `/etc/machine-id` (or hostname) plus POSIX uid before PBKDF2, so a stolen `user-data.json` can no longer be decrypted on a different machine using the default password alone. On Kubernetes deployments where co-tenants share a host machine-id and uid, set `NBI_POD_IDENTITY` (pin per pod or per user) **or** `NBI_GH_ACCESS_TOKEN_PASSWORD` for real cross-tenant isolation. Disabling "remember login" entirely is the strongest option.

## Telemetry

Expand Down
5 changes: 3 additions & 2 deletions docs/admin-guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -66,7 +66,7 @@ For Kubeflow or KubeSpawner: mount the user's home directory on a PVC and ensure
If users share a home directory across nodes (NFS-backed shared HPC, classroom labs):

- **Race conditions in `~/.jupyter/nbi/`.** Concurrent writes from two login nodes can corrupt `config.json`. NBI does not file-lock. Pin each user to one node, or use a per-node config prefix.
- **`NBI_GH_ACCESS_TOKEN_PASSWORD` default is unsafe.** The default password (`nbi-access-token-password`) is shared across installs. On a multi-tenant cluster, anyone with read access to another user's `~/.jupyter/nbi/user-data.json` can decrypt their Copilot token. Set a per-user password (e.g., derived from the Hub user secret), or disable "remember login" entirely (see [Restricting features](#restricting-features-for-managed-deployments)).
- **`NBI_GH_ACCESS_TOKEN_PASSWORD` default needs help on shared hosts.** NBI now mixes the local `/etc/machine-id` (or hostname fallback) and POSIX uid into the KDF so a stolen `user-data.json` cannot be decrypted on a different machine using the default password alone. However, on a Kubernetes deployment where `/etc/machine-id` resolves to the host node's value and every container runs as uid `1000`, the automatic entropy sources collapse to a value shared across co-tenants on that node. For real cross-tenant isolation, set **either** a per-pod `NBI_POD_IDENTITY` (recommended: pin to the JupyterHub username, the spawn token, or the pod's `metadata.uid` via the downward API) **or** a per-user `NBI_GH_ACCESS_TOKEN_PASSWORD`. Disabling "remember login" entirely is the strongest option; see [Restricting features](#restricting-features-for-managed-deployments).
- **Skill collisions.** Two users sharing `~/.claude/skills/` will see each other's skills. Make sure each user has a unique home.

---
Expand Down Expand Up @@ -107,7 +107,8 @@ The full surface, in one table.
| `NBI_UPLOAD_RETENTION_HOURS` | int | unset | env (overrides traitlet) | Same as above; env takes precedence. |
| `tour_config_path` | str | `""` | traitlet | Filesystem path to a YAML/JSON file with admin overrides for the first-run sidebar tour copy. See [`docs/admin-tour-config.md`](admin-tour-config.md). |
| `NBI_TOUR_CONFIG_PATH` | str | unset | env (overrides traitlet) | Same as above; env takes precedence. |
| `NBI_GH_ACCESS_TOKEN_PASSWORD` | str | `nbi-access-token-password` | env | Password used to encrypt the stored Copilot token in `user-data.json`. **Change in multi-tenant deployments.** |
| `NBI_GH_ACCESS_TOKEN_PASSWORD` | str | `nbi-access-token-password` | env | Password used to encrypt the stored Copilot token in `user-data.json`. Combined with the local machine-id (or hostname) and POSIX uid before PBKDF2 (since v4.9), so a stolen file cannot be decrypted on another machine using the default. Still **change in multi-tenant deployments where pods share a machine-id**. |
| `NBI_POD_IDENTITY` | str | unset | env | Highest-priority entropy source mixed into the Copilot token KDF. Pin to a per-pod or per-user value (JupyterHub username, spawn token, Pod metadata.uid via the downward API) when `/etc/machine-id` is shared across co-tenants on the same node. Recommended for any K8s deployment using the default token password. |
| `NBI_RULES_AUTO_RELOAD` | bool | `true` | env | When `false`, ruleset edits require a JupyterLab restart to take effect. |
| `NBI_CLAUDE_CLI_PATH` | str | unset | env | Absolute path to the Claude Code CLI binary. When unset, NBI looks up `claude` on `PATH`. |
| `NBI_OPENCODE_CLI_PATH` | str | unset | env | Absolute path to the opencode CLI. When unset, NBI looks up `opencode` on `PATH`. Gates the opencode launcher tile. |
Expand Down
26 changes: 23 additions & 3 deletions notebook_intelligence/github_copilot.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@
import logging
from notebook_intelligence.api import BackendMessageType, CancelToken, ChatResponse, CompletionContext, MarkdownData
from notebook_intelligence.config import _atomic_write_json
from notebook_intelligence.util import decrypt_with_password, encrypt_with_password, ThreadSafeWebSocketConnector
from notebook_intelligence.util import decrypt_user_secret, encrypt_user_secret, ThreadSafeWebSocketConnector

from ._version import __version__ as NBI_VERSION

Expand Down Expand Up @@ -121,7 +121,25 @@ def read_stored_github_access_token() -> str:

if base64_access_token is not None:
base64_bytes = base64.b64decode(base64_access_token.encode('utf-8'))
return decrypt_with_password(access_token_password, base64_bytes).decode('utf-8')
token_bytes, was_legacy = decrypt_user_secret(
access_token_password, base64_bytes
)
if was_legacy:
# Re-encrypt under the machine-derived KDF so subsequent
# reads no longer fall back to the bare-password path.
# Best-effort: if the rewrite fails the next read just
# takes the same legacy branch again.
log.info(
"Upgrading stored GitHub access token to per-pod KDF"
)
try:
write_github_access_token(token_bytes.decode('utf-8'))
except Exception as rewrite_exc:
log.warning(
"Token upgrade rewrite failed (will retry next read): %s",
rewrite_exc,
)
return token_bytes.decode('utf-8')
except Exception as e:
log.error(f"Failed to read GitHub access token: {e}")

Expand All @@ -142,7 +160,9 @@ def _save_user_data(user_data: dict) -> None:

def write_github_access_token(access_token: str) -> bool:
try:
encrypted_access_token = encrypt_with_password(access_token_password, access_token.encode())
encrypted_access_token = encrypt_user_secret(
access_token_password, access_token.encode()
)
base64_bytes = base64.b64encode(encrypted_access_token)
base64_access_token = base64_bytes.decode('utf-8')

Expand Down
119 changes: 118 additions & 1 deletion notebook_intelligence/util.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,11 +4,12 @@
import base64
import re
import shutil
import socket
import subprocess
from typing import Optional, Set
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC
from cryptography.fernet import Fernet
from cryptography.fernet import Fernet, InvalidToken
import asyncio
from tornado import ioloop

Expand Down Expand Up @@ -180,6 +181,122 @@ def split_csv(raw) -> list:
return [token for token in (part.strip() for part in raw.split(",")) if token]


_MACHINE_ID_PATHS: tuple[str, ...] = (
"/etc/machine-id",
"/var/lib/dbus/machine-id",
)


def _read_machine_id() -> str:
"""Read the systemd / dbus machine ID. Empty string when unavailable.

The two standard paths are tried in priority order. Containers that
don't surface either file still produce a stable per-pod string from
the hostname fallback in ``_machine_user_secret``.
"""
for path in _MACHINE_ID_PATHS:
try:
with open(path, "r", encoding="utf-8") as f:
value = f.read().strip()
if value:
return value
except OSError:
continue
return ""


def _safe_getuid() -> str:
"""``str(os.getuid())`` on POSIX, ``""`` on platforms without ``getuid``."""
getuid = getattr(os, "getuid", None)
if getuid is None:
return ""
try:
return str(getuid())
except OSError:
return ""


def _machine_user_secret() -> str:
"""Stable per-pod + per-user secret to mix into token KDF input.

Resolution order, first non-empty wins (the rest still contribute
to the concatenation, since stacking weakly-unique values can only
help isolation, never hurt):

1. ``NBI_POD_IDENTITY`` env var. Highest-priority because in a
multi-tenant K8s deployment ``/etc/machine-id`` is typically the
host node's value (shared across co-tenants) and POSIX uid
collapses to ``1000`` for every ``jovyan`` container. Admins
point this at a per-pod or per-user value (e.g., the JupyterHub
username, the spawn token) for real cross-tenant isolation.
2. ``/etc/machine-id`` or ``/var/lib/dbus/machine-id`` if mounted.
3. ``socket.gethostname()`` as a last resort; in K8s the hostname
equals the pod name, which differs across pods but is not a
confidential value.

POSIX uid is always included, but is not by itself sufficient on
a typical JupyterHub deployment (every tenant runs as uid 1000).
"""
parts: list[str] = [_safe_getuid()]
explicit = os.environ.get("NBI_POD_IDENTITY", "").strip()
if explicit:
parts.append(explicit)
machine_id = _read_machine_id()
if machine_id:
parts.append(machine_id)
try:
hostname = socket.gethostname() or ""
except OSError:
hostname = ""
parts.append(hostname)
return "::".join(parts)


def _derive_token_password(password: str) -> str:
"""Combine the admin-supplied password with the machine/user secret.

The result is fed to ``encrypt_with_password`` / ``decrypt_with_password``
in place of the bare ``password``. The on-disk blob format is
unchanged; the change is purely in the KDF input.
"""
return f"{_machine_user_secret()}::{password}"


def encrypt_user_secret(password: str, data: bytes) -> bytes:
"""Encrypt ``data`` with a password mixed with the per-pod secret.

The on-disk format is identical to ``encrypt_with_password`` so
callers writing through the existing read/write pipelines don't
need a schema bump; the change is purely in the KDF input.
"""
return encrypt_with_password(_derive_token_password(password), data)


def decrypt_user_secret(
password: str, ciphertext: bytes, *, allow_legacy: bool = True
) -> tuple[bytes, bool]:
"""Decrypt with the per-pod password; on failure, fall back to the
bare password for blobs written before this change rolled out.

Returns a (plaintext, was_legacy) tuple so the caller can re-encrypt
legacy blobs in place to upgrade them transparently. When
``allow_legacy=False`` the bare-password fallback is skipped, useful
for tests that want to confirm a blob was written under v2.

Only ``InvalidToken`` falls through to the legacy path; other
exceptions (e.g. ``ValueError`` from a corrupted blob) propagate so
a programmer error or malformed-on-disk file isn't masked as a
legacy blob.
"""
derived = _derive_token_password(password)
try:
return decrypt_with_password(derived, ciphertext), False
except InvalidToken:
if not allow_legacy:
raise
return decrypt_with_password(password, ciphertext), True


def get_enabled_builtin_tools_in_env() -> Set[str]:
global _enabled_tools
if _enabled_tools is not None:
Expand Down
Loading
Loading