Architecture

MidTerm is a web-based terminal workspace built around a native server (mt), a per-session PTY host (mthost), and a browser frontend that adds layout, files, git, commands, web preview, mobile controls, and operations UI around live terminal sessions.

The important architectural point is that MidTerm is not only a terminal renderer. The browser shell coordinates multiple long-lived sessions, several WebSocket channels, local settings and storage, browser preview bridges, session sharing, and an installer/update pipeline that has to keep real user installs recoverable.

Runtime Topology

Browser
├─ xterm.js terminals
├─ sidebar, layout engine, files/git/commands panels
├─ Command Bay (smart input, automation bar, touch/mobile shell), diagnostics
├─ web preview iframe or detached preview window
├─ /ws/mux       binary terminal I/O
├─ /ws/state     JSON session/update state
├─ /ws/settings  JSON settings sync
└─ REST APIs for auth, sessions, files, preview, updates, logs
            │
            ▼
mt / mt.exe
├─ Kestrel HTTP + WebSocket host
├─ Session lifecycle + mux fanout
├─ settings, auth, share, cert, update, diagnostics services
├─ embedded static assets
└─ web preview proxy + browser bridge coordination
            │
            ▼
mthost / mthost.exe (one per session)
└─ PTY host for ConPTY on Windows or forkpty on Unix

1. Runtime Model

mt

mt is the long-lived server process. It owns:

  • HTTP endpoints, authentication, and static file serving
  • the terminal session registry and lifecycle
  • the per-instance ownership identity used to claim and reconnect only its own sidecars
  • mux fanout for terminal output and client input
  • settings persistence and settings WebSocket sync
  • updates, logs, diagnostics, certificate lifecycle, and share-link services
  • the web preview reverse proxy and preview/browser bridge routing

The server is compiled with Native AOT, uses source-generated JSON serialization, and keeps platform-specific behavior explicit rather than reflection-driven.

mthost

Each terminal session runs in its own mthost process. That gives MidTerm:

  • crash isolation between sessions
  • a clean privilege boundary between the web server and the PTY process
  • platform-specific PTY handling without pulling terminal lifecycle into the web host
  • the ability to restart or replace the web server separately from terminal hosts in web-only update flows

Instance Ownership Model

MidTerm now treats the connection between mt and mthost as an explicit ownership contract instead of a best-effort local reconnect.

  • every running mt instance loads a stable install-scope secret from the settings directory
  • the live instance identity is derived from that stable scope plus the configured port
  • mthost is launched with that instance identity and owner token
  • IPC endpoints are namespaced by instance identity, so side-by-side MidTerm instances on different ports do not enumerate each other's PTY hosts
  • after connecting, mt must still complete an attach handshake; mthost rejects foreign instances even if they somehow reach the endpoint
  • only a successfully attached owner is allowed to replace the current mt connection during reconnect

This is what allows multiple MidTerm installations or ports to run side by side while still keeping reconnect fast and deterministic.
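
As a rough illustration of the ownership contract, the sketch below models the host-side attach check in TypeScript. Every name here (PtyHostGate, AttachRequest, endpointName, the endpoint string format) is hypothetical and not taken from the MidTerm codebase; the real server is a .NET process and would compare tokens in fixed time.

```typescript
// Hypothetical sketch: none of these names come from the MidTerm codebase.
interface AttachRequest {
  instanceId: string; // identity derived from the install-scope secret plus port
  ownerToken: string; // token handed to mthost at launch
}

class PtyHostGate {
  private readonly instanceId: string;
  private readonly ownerToken: string;
  private attachedConnection: string | null = null;

  constructor(instanceId: string, ownerToken: string) {
    this.instanceId = instanceId;
    this.ownerToken = ownerToken;
  }

  // Returns true when the caller is accepted as the owning mt instance.
  attach(connectionId: string, request: AttachRequest): boolean {
    const isOwner =
      request.instanceId === this.instanceId &&
      request.ownerToken === this.ownerToken;
    if (!isOwner) {
      return false; // foreign instance: the current connection is left untouched
    }
    this.attachedConnection = connectionId; // a verified owner may replace the old connection
    return true;
  }

  currentConnection(): string | null {
    return this.attachedConnection;
  }
}

// Assumed naming scheme: the endpoint name embeds the instance identity.
function endpointName(instanceId: string, sessionId: string): string {
  return `midterm-${instanceId}-${sessionId}`;
}
```

The point of the sketch is that reaching the endpoint is not sufficient: a caller that fails the handshake can neither attach nor displace an existing owner.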

Static Assets

Production assets are precompressed and embedded into the server assembly. MidTerm serves its frontend from memory instead of relying on a mutable on-disk web root.

2. Frontend Composition

MidTerm's frontend is vanilla TypeScript organized by feature modules rather than a component framework. main.ts wires the subsystems together at startup.

The browser shell includes:

  • sidebar modules for sessions, history, update notices, network/share, and voice controls
  • terminal modules for creation, sizing, search, paste/drop handling, scaling, and mobile PiP
  • layout modules for split panes and dock overlays
  • session wrappers that add Files tabs plus web, commands, share, git, and experimental Lens surfaces per session
  • feature panels for files, git, commands, and web preview
  • Command Bay modules for smart input, the automation bar, touch controller, Lens quick settings, and attachment/media affordances, plus chat, PWA, and diagnostics modules

State is split between:

  • nanostores for reactive shared state such as sessions, active session, settings, layout, and process metadata
  • module-local state for ephemeral UI concerns such as DOM handles, timers, drag state, preview clients, and pending buffers

That split keeps high-frequency terminal paths imperative while still allowing the rest of the UI to react to shared state changes.

3. Session and Terminal Pipeline

Session Lifecycle

Session creation, deletion, reordering, naming, bookmarking, sharing, and resize requests go through the server APIs and state WebSocket updates. The frontend renders the session list from live state instead of polling.

mt also persists an instance-owned session registry for PTY hosts. That registry is used on restart to reconnect directly to known mthost processes instead of adopting arbitrary local endpoints.

Mux Channel

/ws/mux carries multiplexed binary terminal traffic for every visible session. The server prioritizes the active session and can batch and compress background output.

Relevant frame families include:

  • output
  • input
  • resize
  • resync
  • compressed background output
  • active-session hint
  • foreground-process change
  • data-loss notification
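
The actual wire format of /ws/mux is not specified in this document. The following is a minimal sketch of how a multiplexed binary frame could be encoded and decoded, assuming a one-byte type code, a two-byte session slot, and a raw payload. The type codes, the header layout, and all function names are assumptions made for illustration.

```typescript
// Hypothetical frame type codes; the real wire values are not given in this document.
const MuxFrameType = {
  Output: 1,
  Input: 2,
  Resize: 3,
  Resync: 4,
} as const;

interface MuxFrame {
  type: number;
  sessionId: number; // assumed numeric session slot, for brevity
  payload: Uint8Array;
}

// Assumed layout: [type: u8][sessionId: u16 big-endian][payload bytes...]
function encodeMuxFrame(frame: MuxFrame): Uint8Array {
  const bytes = new Uint8Array(3 + frame.payload.length);
  const view = new DataView(bytes.buffer);
  view.setUint8(0, frame.type);
  view.setUint16(1, frame.sessionId); // DataView writes big-endian by default
  bytes.set(frame.payload, 3);
  return bytes;
}

function decodeMuxFrame(bytes: Uint8Array): MuxFrame {
  if (bytes.length < 3) {
    throw new Error("frame shorter than header");
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return {
    type: view.getUint8(0),
    sessionId: view.getUint16(1),
    payload: bytes.subarray(3),
  };
}

// Assumed resize payload: cols and rows as two big-endian u16 values.
function resizePayload(cols: number, rows: number): Uint8Array {
  const bytes = new Uint8Array(4);
  const view = new DataView(bytes.buffer);
  view.setUint16(0, cols);
  view.setUint16(2, rows);
  return bytes;
}
```

A fixed binary header keeps per-frame overhead small, which matters because every visible session shares the one socket.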

Foreground Process and Session Metadata

MidTerm tracks foreground cwd, process, command line, and terminal title. That data feeds:

  • session naming fallbacks
  • per-session cwd display in the session bar
  • tab-title modes
  • history/bookmark labeling
  • session heat and activity presentation

Terminal Resize Principle

MidTerm intentionally does not auto-resize existing sessions just because another client connected or a page reloaded. MidTerm also treats terminal size ownership as a manual decision, not something the system should guess.

The model is:

  1. One browser is the explicit leading browser for terminal sizing.
  2. Only the leading browser may send authoritative server-side cols/rows.
  3. New sessions are created at the best size for the leading browser's viewport, never from a follower's viewport.
  4. Existing sessions keep their server-side dimensions until the leading browser explicitly changes them.
  5. Secondary browsers CSS-scale terminals locally instead of sending resize commands.
  6. Users explicitly claim size ownership from another browser when they want a different screen to become authoritative.
  7. Disconnects, reconnects, inactivity, focus changes, visibility changes, or device changes must not automatically transfer size ownership.

This is what makes multi-device usage predictable instead of having one client constantly break another client's layout. The engineering goal is therefore twofold:

  • keep the leading browser's sizing path reliable for all relevant UI changes such as window resizes, panel open/close, layout changes, session switches, and new session creation
  • keep follower browsers strictly non-authoritative even when they render a different viewport more cleanly
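
The leading-browser rules above can be sketched as a small state holder. This is an illustrative model only: SizeOwnership, followerScale, and their signatures are hypothetical names, and the scale formula is an assumption about how a follower might fit the server-side grid into its own viewport.

```typescript
interface TerminalSize {
  cols: number;
  rows: number;
}

class SizeOwnership {
  private leader: string | null = null;
  private size: TerminalSize;

  constructor(initial: TerminalSize) {
    this.size = initial;
  }

  // Ownership changes only through an explicit user claim.
  claim(browserId: string): void {
    this.leader = browserId;
  }

  // Resize requests from anyone but the leader are ignored.
  requestResize(browserId: string, next: TerminalSize): boolean {
    if (browserId !== this.leader) {
      return false;
    }
    this.size = next;
    return true;
  }

  // Disconnects never transfer ownership, so this is deliberately a no-op.
  onDisconnect(_browserId: string): void {}

  current(): TerminalSize {
    return this.size;
  }

  leaderId(): string | null {
    return this.leader;
  }
}

// Assumed follower behavior: scale the fixed grid to fit the local viewport.
function followerScale(
  size: TerminalSize,
  cellPx: { width: number; height: number },
  viewportPx: { width: number; height: number },
): number {
  const scaleX = viewportPx.width / (size.cols * cellPx.width);
  const scaleY = viewportPx.height / (size.rows * cellPx.height);
  return Math.min(scaleX, scaleY);
}
```

Keeping the claim as the only mutation path for the leader is what makes rule 7 hold: no lifecycle event can change ownership as a side effect.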

Host Reconnect and Updates

MidTerm's PTY reconnect path is now split into two cases:

  • owned reconnect: mt reconnects to namespaced mthost endpoints belonging to its current instance identity
  • legacy import: after upgrading from older single-instance builds, mt can do a one-time import of pre-ownership mthost endpoints and then records them in its owned session registry

The legacy path exists so a full mt + mthost upgrade can keep already-running PTY hosts alive while the web server restarts. Once those legacy hosts exit, all newly spawned hosts use the owned endpoint namespace plus attach handshake.

Terminal UX Layer

Around the raw PTY stream, MidTerm adds:

  • font preloading and calibration terminals
  • WebGL-backed rendering when enabled
  • search UI with keyboard navigation
  • copy/paste and OSC52 clipboard support
  • image paste and file-drop handling
  • File Radar path detection with a per-session allowlist boundary
  • scrollback protection and visibility-aware focus handling

MidTerm intentionally keeps shown sessions as live terminals. Latency work is expected to optimize transport, scheduling, buffering, and rendering costs; virtualizing or deactivating visible sessions is not an acceptable optimization.

4. Workspace Surfaces Around the Terminal

Sidebar and Layout

The sidebar is a full control surface, not just a tab strip. It handles:

  • create/settings/history entry points
  • session rename, close, bookmark, inject-guidance, and undock actions
  • session ordering and drag-to-layout docking
  • update notices, voice controls, network/share helpers, and footer telemetry
  • mobile open/close behavior and desktop collapse/resize persistence

The layout subsystem stores split trees in backend state and reattaches sessions into panes without resizing them behind the user's back.

Files, Git, and Commands

Each session wrapper adds:

  • a Files tab with a cwd-rooted tree, previews, syntax-highlighted text viewing, and inline save
  • git status summaries with sectioned file lists, hierarchical trees, dock-native diff/commit inspection, and terminal command handoff for write actions
  • a commands panel for saved scripts that run in hidden backing sessions

Command Bay

The Command Bay is the shared active-session footer system beneath Terminal and Lens. It absorbs what used to be separate pieces: the Smart Input composer, the automation bar (formerly the middle manager bar), the Lens quick settings strip, the embedded touch controller path, attachment/media affordances, and the small session status controls. It exists because MidTerm no longer treats those pieces as unrelated bars stacked under the pane.

  • the primary rail hosts Smart Input / the composer when input is visible
  • the automation rail hosts the old automation bar and keeps it to one line with overflow instead of wrapping into extra toolbar bands; on cramped mobile Terminal layouts it may collapse visible action chips into overflow-first chrome rather than spending a full inline row on them
  • the Command Bay queue is backend-owned and persists queued work per session so follow-up prompts and Automation Bar items survive browser disconnects or reconnects
  • Terminal queue draining is heat-gated: one queued item may dispatch when heat falls below 25%, then the session must rearm above that threshold before the next queued item can drain
  • explicit Lens queue draining is turn-gated: one queued item may dispatch only after the current provider turn has settled back to the user
  • the context rail hosts attachment/media controls for mobile Lens or terminal special keys from the touch controller for mobile Terminal, including the collapsed special-keys toggle when the full key row is hidden
  • the status rail hosts Lens model / effort / plan / permission awareness or other compact terminal state pills without forcing a dedicated extra row just to reopen special keys
  • mobile Terminal keeps the compact status rail above the expanded special-keys grid so the keys toggle and automation proxies stay on the same header row while the key grid opens beneath them
  • Lens always uses the Command Bay; Terminal may show the full bay, a reduced bay, or only automation depending on Smart Input mode
  • Lens keeps model / effort / plan awareness visible at all times even when the editable controls collapse on mobile
  • desktop Terminal assumes a hardware keyboard and therefore does not surface cursor-key buttons in the Command Bay
  • mobile Terminal may expand or collapse terminal special keys without changing Terminal size ownership rules
  • desktop glass styling follows terminal transparency; mobile Command Bay stays solid for contrast and touch reliability
  • the Command Bay itself must reserve space beneath Terminal or Lens instead of floating over session content
  • only the prompt textbox's extra multiline growth may overflow upward over the pane; command-bay rails and visible command-bay panels must not hide session content underneath
  • on Android and iOS, the Command Bay must stay attached to the visual viewport above the on-screen keyboard; when space gets tight it should compress and scroll internally instead of slipping under the OSK
  • voice capture still hangs off the Smart Input mic affordance, with the current experimental gating unchanged
  • the mobile action menu still mirrors common quick actions, but the Command Bay is the primary active-session interaction shell
  • mobile Lens uses automation above context controls; other permutations keep the default primary -> context -> automation -> status flow
  • document Picture-in-Picture remains separate from the Command Bay and can still show a miniature live terminal when the app backgrounds on supported mobile browsers
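
The heat-gated Terminal queue rule in the list above behaves like a latch: one item drains when heat drops below 25%, and nothing else drains until heat has risen above the threshold again. The sketch below models that latch. HeatGatedQueue and its methods are hypothetical names, heat is assumed to be a fraction between 0 and 1, and the assumption that a fresh queue starts armed is not stated in the document.

```typescript
const HEAT_THRESHOLD = 0.25; // the 25% threshold, expressed as a fraction

class HeatGatedQueue {
  private readonly items: string[] = [];
  private armed = true; // assumption: a fresh session starts armed

  enqueue(item: string): void {
    this.items.push(item);
  }

  // Feed each heat sample; returns the item to dispatch, or null.
  onHeatSample(heat: number): string | null {
    if (heat > HEAT_THRESHOLD) {
      this.armed = true; // rising above the threshold rearms the gate
      return null;
    }
    if (heat === HEAT_THRESHOLD || !this.armed || this.items.length === 0) {
      return null;
    }
    this.armed = false; // exactly one dispatch per cool-down
    return this.items.shift() ?? null;
  }
}
```

With two queued items and heat samples of 0.8, 0.1, 0.05, 0.5, 0.2, the first item dispatches at 0.1, nothing happens at 0.05, and the second item dispatches at 0.2 after the 0.5 sample rearms the gate.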

Agent Conversation Surface

Lens is MidTerm's conversation-first surface for agent-controlled sessions. Architecturally it stays thin on purpose:

  • the canonical turn, request, and stream state still belongs to the backend Lens runtime
  • the frontend Lens panel renders that state as provider-backed history/timeline UI without taking ownership away from Terminal
  • when live attach is unavailable, Lens can stay open on read-only history or a terminal-buffer fallback instead of pretending the conversation lane is authoritative
  • Lens is currently dev-gated in the session tabs while the UX is still being refined

The boundary between Terminal and Lens is a core design rule:

  • a plain terminal session remains terminal-owned even if its foreground process is codex, claude, or another AI CLI
  • foreground process detection may label, summarize, or describe a session, but it must not by itself promote that session into Lens
  • only sessions explicitly created as Lens sessions should expose provider-primary tabs such as Codex or Claude
  • the IDE bar is exclusive by surface: terminal sessions show Terminal plus Files, while explicit Lens sessions show the provider tab plus Files

Lens Provider Runtime Decision

For provider-backed Lens sessions, MidTerm should treat the provider runtime as the source of truth instead of trying to reconstruct an agent conversation from PTY output.

Terminology matters here:

  • history means the canonical provider-backed ordered sequence of Lens items
  • timeline means the rendered web presentation of that history
  • transcript is reserved for PTY/terminal capture or unavoidable legacy wire/schema names, not Lens semantics

That means:

  • an explicit Codex or Claude Lens session owns a dedicated Lens runtime for that provider
  • mtagenthost is the intended MidTerm host/runtime boundary for those provider-backed Lens sessions
  • explicit Lens sessions do not use mthost and do not gain terminal access through the PTY layer
  • the runtime launches or attaches using the provider's supported structured protocol
  • MidTerm normalizes that provider traffic into canonical Lens turn, item, request, stream, and diff events
  • the Lens UI renders those canonical events and snapshots as a conversation surface
  • the terminal remains a separate surface with separate ownership and behavior

This rule exists to prevent a class of design failures:

  • terminal transcripts are not a reliable protocol boundary
  • foreground process detection is not enough to define conversation identity
  • Lens is not a terminal transcript view and must not treat PTY stdout/stderr as its authoritative event stream
  • screen-scraping or buffer-parsing makes streaming, tool lifecycle, approvals, plan-mode questions, and diff state fragile
  • terminal behavior and Lens behavior become entangled unless the runtime boundary is explicit

The correct architectural direction is therefore:

  • Terminal stays terminal-native
  • Lens stays provider-runtime-native through mtagenthost plus provider APIs and structured protocols intended for rich UI clients
  • mthost is for real terminals; mtagenthost is for explicit provider Lens sessions
  • canonical Lens events bridge the runtime and the web UI

Lens Sync Transport

Lens sync is now owned by a dedicated /ws/lens channel rather than REST snapshot polling plus SSE.

  • HTTP remains for explicit Lens session creation/bootstrap only
  • after session start, Lens attach, snapshot reads, history window reads, turn submission, interrupts, approvals, and user-input answers all flow through /ws/lens
  • mt remains the state master and durable owner of canonical Lens history plus the derived live read model
  • the browser keeps one multiplexed Lens socket and can subscribe to many Lens sessions at once
  • Lens history is synchronized as a windowed read model, not as a full-history replay on every reconnect
  • reconnect starts from a fresh bounded history window, usually anchored at the live bottom, then resumes ordered live events
  • the frontend stays provider-neutral and does not reconstruct Lens state from PTY output or provider-specific raw transports

Lens History Ownership And Byte Budget

Provider-backed Lens runtimes can emit huge amounts of low-value transport noise: repetitive progress chatter, superseded intermediate states, raw command stdout, and full file bodies that are far larger than any useful on-screen view.

Lens must therefore enforce a strict ownership and byte-budget model:

  • mtagenthost and MidTerm own the in-flight provider reduction path plus the canonical derived Lens history
  • the browser does not own full Lens history and must not accumulate the full provider event stream in memory
  • the browser consumes a bounded view window over canonical history, not an unbounded raw-event feed
  • multiple browsers may view the same Lens session concurrently, but each browser owns only its own local viewport/window state
  • browser scrolling is a read-window operation against MidTerm-owned canonical history, not a request for provider raw-event replay

This leads to the following transport rules:

  • raw provider payloads are transient reducer inputs, not retained Lens history
  • giant file bodies, giant command stdout blobs, and repetitive transport chatter must be summarized, windowed, or suppressed before they become canonical history rows
  • the canonical Lens history should preserve what a human needs to understand the work, not every raw provider emission
  • /ws/lens should transport only:
    • the currently materialized history slice
    • stable total-count/window metadata
    • live deltas that affect rows already in or near the active slice
    • explicit older/newer window fetch results when requested
  • scrolling one browser must not force all other browsers to download the same older slices
  • hidden/background browsers should collapse back to a latest anchored slice and stop retaining wide browser-side history windows

The architectural target is:

  • one canonical history store in MidTerm
  • MidTerm durability uses canonical reduced Lens state, not appended provider-shaped event logs
  • one bounded visible history window per browser/session view
  • deterministic fetches for arbitrary older/newer portions of that history
  • minimal duplicated byte transfer across reconnects and across multiple browsers
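
A bounded read window over canonical history can be sketched as follows. The row shape, the function names, and the margin parameter are hypothetical; the sketch only shows the idea that each browser asks for a slice plus total-count metadata, and that a live delta is forwarded only when it touches rows in or near that slice.

```typescript
interface LensRow {
  id: number;
  text: string;
}

interface HistoryWindow {
  totalCount: number;
  start: number; // index of the first row in the slice
  rows: LensRow[];
}

// Reads a bounded slice; omitting `start` anchors the window at the live bottom.
function readHistoryWindow(
  canonical: readonly LensRow[],
  size: number,
  start?: number,
): HistoryWindow {
  const total = canonical.length;
  const clampedSize = Math.max(0, Math.min(size, total));
  const bottomStart = total - clampedSize;
  const from =
    start === undefined ? bottomStart : Math.max(0, Math.min(start, bottomStart));
  return {
    totalCount: total,
    start: from,
    rows: canonical.slice(from, from + clampedSize),
  };
}

// A live delta matters to a browser only if it touches a row in or near its window.
function deltaAffectsWindow(
  win: HistoryWindow,
  rowIndex: number,
  margin: number,
): boolean {
  return (
    rowIndex >= win.start - margin &&
    rowIndex < win.start + win.rows.length + margin
  );
}
```

Because the window is computed per request, scrolling in one browser is just a different `start` value for that browser and has no effect on what other browsers receive.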

Lens History Reduction Policy

MidTerm needs an explicit reduction layer between raw provider events and canonical Lens history.

Canonical history should keep:

  • user prompts and durable assistant output
  • stable tool identity and meaningful tool lifecycle state
  • compact command invocations plus bounded output summaries
  • compact file-read/file-change summaries and working diffs
  • approvals, plan-mode questions, user-input requests, and their resolutions
  • durable runtime notices that materially affect operator understanding

Canonical history should usually reduce or suppress:

  • repetitive in-progress status chatter that conveys no new operator value
  • duplicate final content that only restates already-streamed material
  • full raw command/file payloads when a bounded summary or excerpt is sufficient
  • transport-level noise that exists only because of provider protocol granularity
  • superseded intermediate states once the canonical row has settled
  • any content that is neither shown later nor required to determine what is shown later

Where giant payloads exist, MidTerm should prefer:

  • command invocation + bounded tail/head window + omitted-line markers
  • file-read path + excerpt policy + compact preview, not full file body
  • summarized tool output for timeline rendering instead of hidden retained raw payloads
  • canonical identity-preserving row updates instead of spawning many noisy sibling rows
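
The "bounded tail/head window + omitted-line markers" preference can be illustrated with a small reducer. The function name, the result shape, and the marker text are hypothetical; head and tail are assumed to be non-negative line counts.

```typescript
interface BoundedOutput {
  lines: string[];
  omitted: number;
}

// Keeps the first `head` and last `tail` lines and marks what was dropped.
function boundOutput(
  lines: readonly string[],
  head: number,
  tail: number,
): BoundedOutput {
  if (lines.length <= head + tail) {
    return { lines: [...lines], omitted: 0 }; // small output passes through unchanged
  }
  const omitted = lines.length - head - tail;
  return {
    lines: [
      ...lines.slice(0, head),
      `... ${omitted} lines omitted ...`,
      ...lines.slice(lines.length - tail),
    ],
    omitted,
  };
}
```

Recording the omitted count alongside the excerpt lets the timeline show how much was dropped without retaining the raw payload.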

Lens Screen Logs

For UI iteration and bug discussion, Lens also emits a dev-only per-session screen log derived from the same canonical backend history model that drives /ws/lens.

  • the screen log is written by MidTerm, not by the browser
  • one GUID-named log file is created per Lens session under the normal MidTerm log root
  • records are screen-oriented and capture rendered-history facts such as kind, label, title, meta, body, render mode, and collapsed-by-default hints
  • raw tool output should be summarized before it reaches both the Lens timeline and the screen log, and duplicate no-op screen states should not be re-logged
  • raw provider payloads and PTY output are not the screen log contract

Lens UX Target And DOD

The intended Definition of Done for provider-backed Lens sessions is:

  1. A user can create a new session in MidTerm and explicitly choose Codex or Claude.
  2. The session opens on the provider Lens surface with the Smart Input / composer visible.
  3. MidTerm shows a subtle ready indication when the provider runtime is connected and able to accept a prompt.
  4. The user can submit a prompt from the Lens composer without switching to Terminal.
  5. Assistant output streams into the Lens history/timeline incrementally as it is generated, rather than appearing only after full completion.
  6. Tool activity is visible as it happens, including starts, updates, completions, approvals, and user-input questions.
  7. File edits and working diff updates are surfaced live in the Lens UI.
  8. Plan-mode or equivalent provider-driven question flows appear as first-class Lens interactions, not as raw terminal text.
  9. The full Lens experience is implemented without hijacking or reclassifying normal terminal sessions.

In practical terms, the user should experience Lens as a polished web conversation surface for explicit provider sessions, with the same functional breadth as the provider CLI, while Terminal remains an independent real terminal.

The visual and interaction design rules for that Lens surface are maintained separately in LensDesign.md. Architecture decisions belong here; the concrete Lens UX contract, hierarchy, history/timeline behavior, and performance-oriented rendering rules belong in that design document and should evolve alongside implementation.

5. Web Preview and Browser Automation

Web preview is its own subsystem, not a simple iframe wrapper.

Preview Model

Each terminal session can own multiple named previews. Every named preview keeps separate:

  • target URL
  • proxy route key
  • cookie jar
  • detached/docked state
  • proxy log
  • browser bridge client identity

Previews can be hidden, docked beside the terminal, or detached into a dedicated popup window.

Reverse Proxy

The preview proxy rewrites outgoing browser-side requests so the embedded app stays inside /webpreview/{routeKey}/.... The injected runtime handles:

  • fetch
  • XHR
  • WebSocket and EventSource
  • history mutations
  • DOM src / href / action writes

HTTP and HTML handling are separate from WebSocket relay. HTTP responses may be rewritten or augmented; WebSocket payloads are intentionally relayed without content rewriting.
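
To make the rewriting idea concrete, here is a simplified sketch of how a request URL issued by the previewed page could be mapped under the /webpreview/{routeKey}/ prefix. The helper name and its handling of cross-origin and relative URLs are assumptions; the real injected runtime covers more cases than this.

```typescript
// Hypothetical helper: maps a URL requested by the previewed page under the preview route prefix.
function toPreviewPath(
  routeKey: string,
  targetOrigin: string,
  requested: string,
): string {
  const prefix = `/webpreview/${routeKey}`;
  if (requested === prefix || requested.startsWith(`${prefix}/`)) {
    return requested; // already proxied
  }
  if (requested === targetOrigin) {
    return `${prefix}/`;
  }
  if (requested.startsWith(`${targetOrigin}/`)) {
    return prefix + requested.slice(targetOrigin.length); // absolute URL to the target app
  }
  if (requested.startsWith("/") && !requested.startsWith("//")) {
    return prefix + requested; // root-relative path
  }
  return requested; // relative or cross-origin URLs are left alone in this sketch
}
```

Document-relative URLs need no rewriting in this model because the page itself is already served from under the prefix, so the browser resolves them there.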

Browser Bridge

MidTerm also exposes browser-control APIs and CLI helpers for the current preview client. That bridge is preview-scoped, not global, so browser actions target the intended session and preview.

The same design principle now applies to native sidecars: mtagenthost processes are launched with the current MidTerm instance identity so auxiliary session runtimes stay aligned with the owning mt instance.

Available operations include:

  • open, dock, detach, and viewport changes
  • DOM query/click/fill/submit
  • script execution and wait operations
  • screenshot, snapshot, outline, attrs, CSS, forms, links, and proxy-log flows

For deeper implementation detail, see devbrowser.md.

6. Settings, Data Model, and Storage

Public vs Internal Settings

MidTerm uses two settings models:

  • MidTermSettings for internal state, including secrets and platform-only details
  • MidTermSettingsPublic for the API-safe subset exposed to the browser

That separation prevents accidental secret exposure even if serialization or endpoint code changes.
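
The two-model split can be illustrated with an explicit allowlist projection. Only the type names MidTermSettings and MidTermSettingsPublic come from this document; the fields and the toPublicSettings helper are hypothetical, and the real models live in the .NET server rather than in TypeScript.

```typescript
// Hypothetical field names; only the two type names come from the document.
interface MidTermSettings {
  theme: string;
  fontSize: number;
  passwordHash: string; // internal only
  sessionSecret: string; // internal only
}

interface MidTermSettingsPublic {
  theme: string;
  fontSize: number;
}

// Explicit allowlist: a new internal field is never exposed unless it is copied here on purpose.
function toPublicSettings(settings: MidTermSettings): MidTermSettingsPublic {
  return {
    theme: settings.theme,
    fontSize: settings.fontSize,
  };
}
```

Copying fields one by one matters: a spread such as `{ ...settings }` would satisfy the public type at compile time while still carrying the secret fields at runtime.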

Settings Transport

Settings are:

  • loaded from disk on the server
  • served to clients during bootstrap
  • edited through the settings API
  • synchronized live over /ws/settings

The frontend settings registry defines editability, apply mode, control ownership, and special writers such as background-image upload/delete flows.

Storage Boundaries

MidTerm uses a mix of server-side and browser-side storage:

Area | Storage
Server settings | settings.json
Secrets | platform-specific secret storage
Certificates and keys | settings directory plus protected key storage
History and share data | server-side files/services
Split layout | server-side session-layout.json
Sidebar width/collapse | cookies
Smart Input/chat/touch prefs | browser localStorage
Preview snapshots | .midterm/snapshot_* under the working tree

7. Security and Remote Access

MidTerm assumes that anyone who reaches the UI could gain shell access, so the design layers multiple controls.

Authentication

  • PBKDF2-SHA256 password hashing
  • fixed-time comparison for secrets
  • signed session cookies
  • rate limiting on failed logins
  • session invalidation on password changes
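
Fixed-time comparison means the time taken does not depend on where two secrets first differ. The sketch below shows the general technique in TypeScript for illustration; it is not MidTerm's implementation. On the .NET side the framework provides CryptographicOperations.FixedTimeEquals for this purpose, though whether MidTerm uses that exact call is not stated here.

```typescript
// Generic fixed-time byte comparison; illustrative, not MidTerm's code.
function fixedTimeEquals(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) {
    return false; // length is not treated as secret in this sketch
  }
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    // Accumulate differences instead of returning at the first mismatch.
    diff |= (a[i] ?? 0) ^ (b[i] ?? 0);
  }
  return diff === 0;
}
```

An early-exit comparison leaks, through timing, how many leading bytes of a guess were correct; accumulating the XOR over the full length removes that signal.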

Secret Storage

Platform | Secret storage
Windows | DPAPI-backed secrets.bin
macOS user mode | Keychain-backed storage
macOS service mode / Linux | file-backed secret storage with restricted permissions

Certificates

MidTerm generates and manages a local HTTPS certificate, exposes trust helpers in the UI, and can download platform-friendly trust artifacts such as PEM output and Apple mobileconfig profiles.

Additional Security Surfaces

MidTerm also includes:

  • API-key management
  • run-as-user support for service installs
  • Windows firewall helpers
  • single-session share grants with expiry and scoped access modes
  • shared-session UI reduction so the recipient only sees the granted terminal context

8. Install and Update Pipeline

MidTerm treats installer and self-update reliability as part of the architecture, not an afterthought.

Installers

The root install.ps1 and install.sh scripts handle:

  • service mode versus user mode decisions
  • password setup, preservation, and intentional replacement during reinstall
  • certificate reuse plus trust flows for both newly generated and reused certificates
  • platform-specific install paths and service registration
  • channel selection and release download
  • update logging

Update Service

The update service reads version.json, checks GitHub releases, compares protocol/web/PTY versions, and classifies releases as:

  • web-only when only the web server/UI needs replacement
  • full when PTY compatibility or protocol changes require replacing mthost too
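
The classification step can be sketched as a comparison of three version fields. The manifest shape, the field names, and the extra "none" result are assumptions; the actual version.json schema is not shown in this document, and this sketch treats any difference as a change rather than ordering versions.

```typescript
// Hypothetical manifest shape; the real version.json schema is not shown here.
interface VersionManifest {
  web: string;
  pty: string;
  protocol: number;
}

type UpdateKind = "none" | "web-only" | "full";

function classifyUpdate(
  installed: VersionManifest,
  release: VersionManifest,
): UpdateKind {
  // Any PTY or protocol difference means mthost must be replaced too.
  if (release.pty !== installed.pty || release.protocol !== installed.protocol) {
    return "full";
  }
  if (release.web !== installed.web) {
    return "web-only";
  }
  return "none";
}
```

Checking the PTY and protocol fields first ensures that a release which changes both web and PTY versions is never misclassified as web-only.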

Generated Update Scripts

The update-script generator produces non-interactive scripts that:

  • stop services and running processes
  • wait for file handles to release
  • create backups of binaries, settings, secrets, and certificates
  • copy and verify replacement files
  • write logs and a structured result file
  • roll back if replacement or restart fails

That is how MidTerm can update installed systems without asking users to manually babysit file replacement.

9. Protocols and APIs

WebSockets

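Endpoint | Purpose
/ws/mux | Binary multiplexed terminal I/O
/ws/state | Session list, update state, and related JSON state pushes
/ws/settings | Live settings synchronization
/ws/lens | Lens attach, history windows, turn submission, interrupts, approvals, and user-input answers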

HTTP API Groups

Major API areas include:

  • auth and password management
  • bootstrap and system info
  • sessions, resize, names, bookmarks, clipboard image paste, guidance injection
  • files, tree browsing, viewing, and save
  • git and commands panels
  • certificates, trust assets, and share packets
  • share grants and shared-session bootstrap
  • browser preview and browser-control commands
  • update check/apply/result/log
  • diagnostics, logs, restart, and shutdown

MidTerm's API surface is large because the browser shell is a real workstation shell, not only a terminal transport.

10. Diagnostics and Operations

The diagnostics layer exposes:

  • server RTT
  • mthost RTT
  • output latency
  • latency and git debug overlays
  • settings, secrets, certificate, and log paths
  • settings reload and server restart actions
  • frontend logging helpers

Operationally, MidTerm also tracks update results, log files, session ordering, and preview proxy logs so users can debug the product from inside the product.

Related Documents