Architecture

MidTerm is a web-based terminal workspace built around a native server (mt), a per-session PTY host (mthost), and a browser frontend that adds layout, files, git, commands, web preview, mobile controls, and operations UI around live terminal sessions.

The important architectural point is that MidTerm is not only a terminal renderer. The browser shell coordinates multiple long-lived sessions, several WebSocket channels, local settings and storage, browser preview bridges, session sharing, and an installer/update pipeline that has to keep real user installs recoverable.

Runtime Topology

Browser
├─ xterm.js terminals
├─ sidebar, layout engine, files/git/commands panels
├─ Command Bay (smart input, automation bar, touch/mobile shell), diagnostics
├─ web preview iframe or detached preview window
├─ /ws/mux       binary terminal I/O
├─ /ws/state     JSON session/update state
├─ /ws/settings  JSON settings sync
└─ REST APIs for auth, sessions, files, preview, updates, logs
            │
            ▼
mt / mt.exe
├─ Kestrel HTTP + WebSocket host
├─ Session lifecycle + mux fanout
├─ settings, auth, share, cert, update, diagnostics services
├─ embedded static assets
└─ web preview proxy + browser bridge coordination
            │
            ▼
mthost / mthost.exe (one per session)
└─ PTY host for ConPTY on Windows or forkpty on Unix

1. Runtime Model

`mt`

mt is the long-lived server process. It owns:

HTTP endpoints, authentication, and static file serving
the terminal session registry and lifecycle
the per-instance ownership identity used to claim and reconnect only its own sidecars
mux fanout for terminal output and client input
settings persistence and settings WebSocket sync
updates, logs, diagnostics, certificate lifecycle, and share-link services
the web preview reverse proxy and preview/browser bridge routing

The server is compiled with Native AOT, uses source-generated JSON serialization, and keeps platform-specific behavior explicit rather than reflection-driven.

`mthost`

Each terminal session runs in its own mthost process. That gives MidTerm:

crash isolation between sessions
a clean privilege boundary between the web server and the PTY process
platform-specific PTY handling without pulling terminal lifecycle into the web host
the ability to restart or replace the web server separately from terminal hosts in web-only update flows

Instance Ownership Model

MidTerm now treats the connection between mt and mthost as an explicit ownership contract instead of a best-effort local reconnect.

every running mt instance loads a stable install-scope secret from the settings directory
the live instance identity is derived from that stable scope plus the configured port
mthost is launched with that instance identity and owner token
IPC endpoints are namespaced by instance identity, so side-by-side MidTerm instances on different ports do not enumerate each other's PTY hosts
after connecting, mt must still complete an attach handshake; mthost rejects foreign instances even if they somehow reach the endpoint
only a successfully attached owner is allowed to replace the current mt connection during reconnect

This is what allows multiple MidTerm installations or ports to run side by side while still keeping reconnect fast and deterministic.
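
The ownership contract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `deriveIdentity`, `endpointName`, and `PtyHostStub`, the endpoint format, and the FNV-1a hash, which is a non-cryptographic stand-in for whatever derivation mt actually uses.

```typescript
// Hypothetical sketch: names, hashing, and endpoint format are illustrative only.
interface InstanceIdentity {
  instanceId: string;
  ownerToken: string;
}

// FNV-1a: a NON-cryptographic stand-in for the real derivation, which is not shown here.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16);
}

// Identity = stable install-scope secret + configured port.
function deriveIdentity(installScopeSecret: string, port: number): InstanceIdentity {
  return {
    instanceId: fnv1a(`${installScopeSecret}|${port}|id`),
    ownerToken: fnv1a(`${installScopeSecret}|${port}|token`),
  };
}

// IPC endpoints are namespaced by instance identity.
function endpointName(instanceId: string, sessionId: string): string {
  return `midterm.${instanceId}.${sessionId}`;
}

class PtyHostStub {
  private readonly owner: InstanceIdentity;
  private attachCount = 0;

  constructor(owner: InstanceIdentity) {
    this.owner = owner;
  }

  // Attach handshake: reaching the endpoint is not enough, the caller must
  // present the identity this host was launched with.
  attach(candidate: InstanceIdentity): boolean {
    const isOwner =
      candidate.instanceId === this.owner.instanceId &&
      candidate.ownerToken === this.owner.ownerToken;
    if (!isOwner) return false; // foreign instance: rejected
    this.attachCount += 1; // the owner may replace its earlier connection
    return true;
  }

  get attaches(): number {
    return this.attachCount;
  }
}
```

In this sketch, two instances that share an install scope but listen on different ports get different endpoint namespaces, and a host launched by one rejects an attach attempt from the other.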

Static Assets

Production assets are precompressed and embedded into the server assembly. MidTerm serves its frontend from memory instead of relying on a mutable on-disk web root.

2. Frontend Composition

MidTerm's frontend is vanilla TypeScript organized by feature modules rather than a component framework. main.ts wires the subsystems together at startup.

The browser shell includes:

sidebar modules for sessions, history, update notices, network/share, and voice controls
terminal modules for creation, sizing, search, paste/drop handling, scaling, and mobile PiP
layout modules for split panes and dock overlays
session wrappers that add Files tabs plus web, commands, share, git, and experimental Lens surfaces per session
feature panels for files, git, commands, and web preview
Command Bay modules for smart input, the automation bar, touch controller, Lens quick settings, and attachment/media affordances, plus chat, PWA, and diagnostics modules

State is split between:

nanostores for reactive shared state such as sessions, active session, settings, layout, and process metadata
module-local state for ephemeral UI concerns such as DOM handles, timers, drag state, preview clients, and pending buffers

That split keeps high-frequency terminal paths imperative while still allowing the rest of the UI to react to shared state changes.

3. Session and Terminal Pipeline

Session Lifecycle

Session creation, deletion, reordering, naming, bookmarking, sharing, and resize requests go through the server APIs and state WebSocket updates. The frontend renders the session list from live state instead of polling.

mt also persists an instance-owned session registry for PTY hosts. That registry is used on restart to reconnect directly to known mthost processes instead of adopting arbitrary local endpoints.

Mux Channel

/ws/mux carries multiplexed binary terminal traffic for every visible session. The server prioritizes the active session and can batch and compress background output.

Relevant frame families include:

output
input
resize
resync
compressed background output
active-session hint
foreground-process change
data-loss notification
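
As a rough illustration of multiplexed binary framing, the sketch below encodes and decodes frames for a few of the families listed above. The layout (one type byte, a 32-bit session id, then payload) and the type codes are assumptions made for this example; the actual /ws/mux wire format is not described in this document.

```typescript
// Hypothetical frame layout: [type: u8][sessionId: u32 big-endian][payload bytes].
// Type codes are illustrative, not MidTerm's real values.
const MuxFrameType = {
  Output: 1,
  Input: 2,
  Resize: 3,
  Resync: 4,
} as const;

interface MuxFrame {
  type: number;
  sessionId: number;
  payload: Uint8Array;
}

const MUX_HEADER_BYTES = 5;

function encodeMuxFrame(frame: MuxFrame): Uint8Array {
  const bytes = new Uint8Array(MUX_HEADER_BYTES + frame.payload.length);
  const view = new DataView(bytes.buffer);
  view.setUint8(0, frame.type);
  view.setUint32(1, frame.sessionId, false);
  bytes.set(frame.payload, MUX_HEADER_BYTES);
  return bytes;
}

function decodeMuxFrame(bytes: Uint8Array): MuxFrame {
  if (bytes.length < MUX_HEADER_BYTES) throw new Error("frame shorter than header");
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return {
    type: view.getUint8(0),
    sessionId: view.getUint32(1, false),
    payload: bytes.subarray(MUX_HEADER_BYTES),
  };
}

// Resize payload in this sketch: cols and rows as two big-endian u16 values.
function resizePayload(cols: number, rows: number): Uint8Array {
  const payload = new Uint8Array(4);
  const view = new DataView(payload.buffer);
  view.setUint16(0, cols, false);
  view.setUint16(2, rows, false);
  return payload;
}
```

A single socket can then carry traffic for many sessions, with the receiver routing each frame by its session id.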

Foreground Process and Session Metadata

MidTerm tracks foreground cwd, process, command line, and terminal title. That data feeds:

session naming fallbacks
per-session cwd display in the session bar
tab-title modes
history/bookmark labeling
session heat and activity presentation

Terminal Resize Principle

MidTerm intentionally does not auto-resize existing sessions just because another client connected or a page reloaded. MidTerm also treats terminal size ownership as a manual decision, not something the system should guess.

The model is:

One browser is the explicit leading browser for terminal sizing.
Only the leading browser may send authoritative server-side cols/rows.
New sessions are created at the best size for the leading browser's viewport, never from a follower's viewport.
Existing sessions keep their server-side dimensions until the leading browser explicitly changes them.
Secondary browsers CSS-scale terminals locally instead of sending resize commands.
Users explicitly claim size ownership from another browser when they want a different screen to become authoritative.
Disconnects, reconnects, inactivity, focus changes, visibility changes, or device changes must not automatically transfer size ownership.

This is what makes multi-device usage predictable instead of having one client constantly break another client's layout. The engineering goal is therefore twofold:

keep the leading browser's sizing path reliable for all relevant UI changes such as window resizes, panel open/close, layout changes, session switches, and new session creation
keep follower browsers strictly non-authoritative even when they render a different viewport more cleanly
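
The leading-browser rule can be sketched as a small authority object. `SizeAuthority` and `followerScale` are hypothetical names, and the cap that prevents followers from upscaling is an assumption of this example rather than documented behavior.

```typescript
interface TermSize {
  cols: number;
  rows: number;
}

class SizeAuthority {
  private leaderId: string | null = null;
  private size: TermSize;

  constructor(initial: TermSize) {
    this.size = initial;
  }

  // Ownership changes only through an explicit user claim.
  claim(clientId: string): void {
    this.leaderId = clientId;
  }

  // Only the leader's request changes server-side cols/rows.
  requestResize(clientId: string, next: TermSize): boolean {
    if (clientId !== this.leaderId) return false;
    this.size = next;
    return true;
  }

  // Disconnects deliberately leave both leader and size untouched.
  onDisconnect(_clientId: string): void {}

  get current(): TermSize {
    return this.size;
  }

  get leader(): string | null {
    return this.leaderId;
  }
}

// Follower-side CSS scale: fit the server-side grid into the local viewport.
// Assumption: followers shrink to fit but never upscale beyond 1.
function followerScale(
  server: TermSize,
  cellWidth: number,
  cellHeight: number,
  viewportWidth: number,
  viewportHeight: number,
): number {
  const fitX = viewportWidth / (server.cols * cellWidth);
  const fitY = viewportHeight / (server.rows * cellHeight);
  return Math.min(fitX, fitY, 1);
}
```

A follower would apply the returned factor as a local CSS transform and never send a resize command.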

Host Reconnect and Updates

MidTerm's PTY reconnect path is now split into two cases:

owned reconnect: mt reconnects to namespaced mthost endpoints belonging to its current instance identity
legacy import: after upgrading from older single-instance builds, mt can perform a one-time import of pre-ownership mthost endpoints and then record them in its owned session registry

The legacy path exists so a full mt + mthost upgrade can keep already-running PTY hosts alive while the web server restarts. Once those legacy hosts exit, all newly spawned hosts use the owned endpoint namespace plus attach handshake.

Terminal UX Layer

Around the raw PTY stream, MidTerm adds:

font preloading and calibration terminals
WebGL-backed rendering when enabled
search UI with keyboard navigation
copy/paste and OSC52 clipboard support
image paste and file-drop handling
File Radar path detection with a per-session allowlist boundary
scrollback protection and visibility-aware focus handling

MidTerm intentionally keeps visible sessions as live terminals. Latency work is expected to optimize transport, scheduling, buffering, and rendering costs rather than virtualize or deactivate visible sessions.

4. Workspace Surfaces Around the Terminal

Sidebar and Layout

The sidebar is a full control surface, not just a tab strip. It handles:

create/settings/history entry points
session rename, close, bookmark, inject-guidance, and undock actions
session ordering and drag-to-layout docking
update notices, voice controls, network/share helpers, and footer telemetry
mobile open/close behavior and desktop collapse/resize persistence

The layout subsystem stores split trees in backend state and reattaches sessions into panes without resizing them behind the user's back.

Files, Git, and Commands

Each session wrapper adds:

a Files tab with a cwd-rooted tree, previews, syntax-highlighted text viewing, and inline save
git status summaries with sectioned file lists, hierarchical trees, dock-native diff/commit inspection, and terminal command handoff for write actions
a commands panel for saved scripts that run in hidden backing sessions

Command Bay

The Command Bay is the shared active-session footer beneath Terminal and Lens. It absorbs what used to be separate bars: the Smart Input composer, the automation bar (formerly the middle manager bar), the Lens quick settings strip, the embedded touch controller path, attachment/media affordances, and the small session status controls. MidTerm no longer treats those pieces as unrelated bars stacked under the pane.

the primary rail hosts Smart Input / the composer when input is visible
the automation rail hosts the old automation bar and keeps it to one line with overflow instead of wrapping into extra toolbar bands; on cramped mobile Terminal layouts it may collapse visible action chips into overflow-first chrome rather than spending a full inline row on them
the Command Bay queue is backend-owned and persists queued work per session so follow-up prompts and Automation Bar items survive browser disconnects or reconnects
Terminal queue draining is heat-gated: one queued item may dispatch when heat falls below 25%, then the session must rearm above that threshold before the next queued item can drain
explicit Lens queue draining is turn-gated: one queued item may dispatch only after the current provider turn has settled back to the user
the context rail hosts attachment/media controls for mobile Lens or terminal special keys from the touch controller for mobile Terminal, including the collapsed special-keys toggle when the full key row is hidden
the status rail hosts Lens model / effort / plan / permission awareness or other compact terminal state pills without forcing a dedicated extra row just to reopen special keys
mobile Terminal keeps the compact status rail above the expanded special-keys grid so the keys toggle and automation proxies stay on the same header row while the key grid opens beneath them
Lens always uses the Command Bay; Terminal may show the full bay, a reduced bay, or only automation depending on Smart Input mode
Lens keeps model / effort / plan awareness visible at all times even when the editable controls collapse on mobile
desktop Terminal assumes a hardware keyboard and therefore does not surface cursor-key buttons in the Command Bay
mobile Terminal may expand or collapse terminal special keys without changing Terminal size ownership rules
desktop glass styling follows terminal transparency; mobile Command Bay stays solid for contrast and touch reliability
the Command Bay itself must reserve space beneath Terminal or Lens instead of floating over session content
only the prompt textbox's extra multiline growth may overflow upward over the pane; command-bay rails and visible command-bay panels must not hide session content underneath
on Android and iOS, the Command Bay must stay attached to the visual viewport above the on-screen keyboard; when space gets tight it should compress and scroll internally instead of slipping under the OSK
voice capture still hangs off the Smart Input mic affordance, with the current experimental gating unchanged
the mobile action menu still mirrors common quick actions, but the Command Bay is the primary active-session interaction shell
mobile Lens uses automation above context controls; other permutations keep the default primary -> context -> automation -> status flow
document Picture-in-Picture remains separate from the Command Bay and can still show a miniature live terminal when the app backgrounds on supported mobile browsers
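
The Terminal queue-draining rule in the list above (dispatch one item when heat falls below 25%, then require a rearm above that threshold) behaves like a small state machine. The sketch below is one possible reading of that rule. `HeatGatedQueue` is a hypothetical name, heat is modeled as a 0..1 sample, and starting in the armed state is an assumption.

```typescript
class HeatGatedQueue<T> {
  private readonly items: T[] = [];
  private readonly threshold: number;
  private armed = true; // assumption: a fresh queue may dispatch on the first cool sample

  constructor(threshold = 0.25) {
    this.threshold = threshold;
  }

  enqueue(item: T): void {
    this.items.push(item);
  }

  // Feed the latest heat sample. Returns the item to dispatch, if any.
  onHeat(heat: number): T | undefined {
    if (heat >= this.threshold) {
      // Strictly above the threshold rearms the gate; exactly at it changes nothing.
      if (heat > this.threshold) this.armed = true;
      return undefined;
    }
    if (!this.armed || this.items.length === 0) return undefined;
    this.armed = false; // at most one dispatch per cool-down
    return this.items.shift();
  }

  get pending(): number {
    return this.items.length;
  }
}
```

With two queued items, the first cool sample releases one item, further cool samples release nothing, and the second item waits until the session has heated up and cooled down again.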

Agent Conversation Surface

Lens is MidTerm's conversation-first surface for agent-controlled sessions. Architecturally it stays thin on purpose:

the canonical turn, request, and stream state still belongs to the backend Lens runtime
the frontend Lens panel renders that state as provider-backed history/timeline UI without taking ownership away from Terminal
when live attach is unavailable, Lens can stay open on read-only history or a terminal-buffer fallback instead of pretending the conversation lane is authoritative
Lens is currently dev-gated in the session tabs while the UX is still being refined

The boundary between Terminal and Lens is a core design rule:

a plain terminal session remains terminal-owned even if its foreground process is codex, claude, or another AI CLI
foreground process detection may label, summarize, or describe a session, but it must not by itself promote that session into Lens
only sessions explicitly created as Lens sessions should expose provider-primary tabs such as Codex or Claude
the IDE bar is exclusive by surface: terminal sessions show Terminal plus Files, while explicit Lens sessions show the provider tab plus Files
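
A minimal sketch of that boundary, using hypothetical type and function names: the tab set depends only on how the session was created, and the foreground process is carried along purely as metadata.

```typescript
type SessionSurface =
  | { kind: "terminal" }
  | { kind: "lens"; provider: "Codex" | "Claude" };

interface SessionDescriptor {
  surface: SessionSurface;
  foregroundProcess?: string; // may label the session, never promotes it
}

function ideBarTabs(session: SessionDescriptor): string[] {
  const surface = session.surface;
  // foregroundProcess is intentionally not consulted here.
  if (surface.kind === "lens") return [surface.provider, "Files"];
  return ["Terminal", "Files"];
}
```

A plain terminal session whose foreground process happens to be `claude` still gets Terminal plus Files under this rule.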

Lens Provider Runtime Decision

For provider-backed Lens sessions, MidTerm should treat the provider runtime as the source of truth instead of trying to reconstruct an agent conversation from PTY output.

Terminology matters here:

history means the canonical provider-backed ordered sequence of Lens items
timeline means the rendered web presentation of that history
transcript is reserved for PTY/terminal capture or unavoidable legacy wire/schema names, not Lens semantics

That means:

an explicit Codex or Claude Lens session owns a dedicated Lens runtime for that provider
mtagenthost is the intended MidTerm host/runtime boundary for those provider-backed Lens sessions
explicit Lens sessions do not use mthost and do not gain terminal access through the PTY layer
the runtime launches or attaches using the provider's supported structured protocol
MidTerm normalizes that provider traffic into canonical Lens turn, item, request, stream, and diff events
the Lens UI renders those canonical events and snapshots as a conversation surface
the terminal remains a separate surface with separate ownership and behavior

This rule exists to prevent a class of design failures:

terminal transcripts are not a reliable protocol boundary
foreground process detection is not enough to define conversation identity
Lens is not a terminal transcript view and must not treat PTY stdout/stderr as its authoritative event stream
screen-scraping or buffer-parsing makes streaming, tool lifecycle, approvals, plan-mode questions, and diff state fragile
terminal behavior and Lens behavior become entangled unless the runtime boundary is explicit

The correct architectural direction is therefore:

Terminal stays terminal-native
Lens stays provider-runtime-native through mtagenthost plus provider APIs and structured protocols intended for rich UI clients
mthost is for real terminals; mtagenthost is for explicit provider Lens sessions
canonical Lens events bridge the runtime and the web UI

Lens Sync Transport

Lens sync is now owned by a dedicated /ws/lens channel rather than REST snapshot polling plus SSE.

HTTP remains for explicit Lens session creation/bootstrap only
after session start, Lens attach, snapshot reads, history window reads, turn submission, interrupts, approvals, and user-input answers all flow through /ws/lens
mt remains the state master and durable owner of canonical Lens history plus the derived live read model
the browser keeps one multiplexed Lens socket and can subscribe to many Lens sessions at once
Lens history is synchronized as a windowed read model, not as a full-history replay on every reconnect
reconnect starts from a fresh bounded history window, usually anchored at the live bottom, then resumes ordered live events
the frontend stays provider-neutral and does not reconstruct Lens state from PTY output or provider-specific raw transports

Lens History Ownership And Byte Budget

Provider-backed Lens runtimes can emit huge amounts of low-value transport noise: repetitive progress chatter, superseded intermediate states, raw command stdout, and full file bodies that are far larger than any useful on-screen view.

Lens must therefore enforce a strict ownership and byte-budget model:

mtagenthost and MidTerm own the in-flight provider reduction path plus the canonical derived Lens history
the browser does not own full Lens history and must not accumulate the full provider event stream in memory
the browser consumes a bounded view window over canonical history, not an unbounded raw-event feed
multiple browsers may view the same Lens session concurrently, but each browser owns only its own local viewport/window state
browser scrolling is a read-window operation against MidTerm-owned canonical history, not a request for provider raw-event replay

This leads to the following transport rules:

raw provider payloads are transient reducer inputs, not retained Lens history
giant file bodies, giant command stdout blobs, and repetitive transport chatter must be summarized, windowed, or suppressed before they become canonical history rows
the canonical Lens history should preserve what a human needs to understand the work, not every raw provider emission
/ws/lens should transport only:
- the currently materialized history slice
- stable total-count/window metadata
- live deltas that affect rows already in or near the active slice
- explicit older/newer window fetch results when requested
scrolling one browser must not force all other browsers to download the same older slices
hidden/background browsers should collapse back to a latest anchored slice and stop retaining wide browser-side history windows
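
The windowed read model can be sketched as pure functions over a canonical row list. The names `HistoryWindow`, `readWindow`, `latestWindow`, and `olderWindow` are hypothetical, and the real /ws/lens message shapes are not shown in this document.

```typescript
interface HistoryWindow<T> {
  rows: T[];
  start: number; // index of rows[0] within canonical history
  totalCount: number; // stable total-count metadata
}

// Reads a bounded slice ending just before `anchor`.
function readWindow<T>(canonical: readonly T[], anchor: number, size: number): HistoryWindow<T> {
  const end = Math.max(0, Math.min(anchor, canonical.length));
  const start = Math.max(0, end - size);
  return { rows: canonical.slice(start, end), start, totalCount: canonical.length };
}

// Reconnect case: a fresh window anchored at the live bottom.
function latestWindow<T>(canonical: readonly T[], size: number): HistoryWindow<T> {
  return readWindow(canonical, canonical.length, size);
}

// Scroll-up case: an explicit fetch of the slice just above the current one.
function olderWindow<T>(
  canonical: readonly T[],
  current: HistoryWindow<T>,
  size: number,
): HistoryWindow<T> {
  return readWindow(canonical, current.start, size);
}
```

Because each call depends only on canonical history plus the caller's own anchor, one browser scrolling back does not change what any other browser holds.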

The architectural target is:

one canonical history store in MidTerm
MidTerm durability uses canonical reduced Lens state, not appended provider-shaped event logs
one bounded visible history window per browser/session view
deterministic fetches for arbitrary older/newer portions of that history
minimal duplicated byte transfer across reconnects and across multiple browsers

Lens History Reduction Policy

MidTerm needs an explicit reduction layer between raw provider events and canonical Lens history.

Canonical history should keep:

user prompts and durable assistant output
stable tool identity and meaningful tool lifecycle state
compact command invocations plus bounded output summaries
compact file-read/file-change summaries and working diffs
approvals, plan-mode questions, user-input requests, and their resolutions
durable runtime notices that materially affect operator understanding

Canonical history should usually reduce or suppress:

repetitive in-progress status chatter that conveys no new operator value
duplicate final content that only restates already-streamed material
full raw command/file payloads when a bounded summary or excerpt is sufficient
transport-level noise that exists only because of provider protocol granularity
superseded intermediate states once the canonical row has settled
any content that is neither shown later nor required to determine what is shown later

Where giant payloads exist, MidTerm should prefer:

command invocation + bounded tail/head window + omitted-line markers
file-read path + excerpt policy + compact preview, not full file body
summarized tool output for timeline rendering instead of hidden retained raw payloads
canonical identity-preserving row updates instead of spawning many noisy sibling rows
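
The "bounded tail/head window + omitted-line markers" preference can be illustrated with a short helper. `boundOutput` and the marker text are assumptions of this sketch, not MidTerm's actual reducer.

```typescript
// Keeps the first `head` and last `tail` lines and replaces the middle with a marker.
function boundOutput(lines: readonly string[], head: number, tail: number): string[] {
  if (lines.length <= head + tail) return [...lines];
  const omitted = lines.length - head - tail;
  return [
    ...lines.slice(0, head),
    `[... ${omitted} lines omitted ...]`,
    ...lines.slice(lines.length - tail),
  ];
}
```

The reducer would store only the bounded result as the canonical row body, so giant command output never becomes retained history.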

Lens Screen Logs

For UI iteration and bug discussion, Lens also emits a dev-only per-session screen log derived from the same canonical backend history model that drives /ws/lens.

the screen log is written by MidTerm, not by the browser
one GUID-named log file is created per Lens session under the normal MidTerm log root
records are screen-oriented and capture rendered-history facts such as kind, label, title, meta, body, render mode, and collapsed-by-default hints
raw tool output should be summarized before it reaches either the Lens timeline or the screen log, and duplicate no-op screen states should not be re-logged
raw provider payloads and PTY output are not the screen log contract

Lens UX Target and Definition of Done

The intended Definition of Done for provider-backed Lens sessions is:

A user can create a new session in MidTerm and explicitly choose Codex or Claude.
The session opens on the provider Lens surface with the Smart Input / composer visible.
MidTerm shows a subtle ready indication when the provider runtime is connected and able to accept a prompt.
The user can submit a prompt from the Lens composer without switching to Terminal.
Assistant output streams into the Lens history/timeline incrementally as it is generated, rather than appearing only after full completion.
Tool activity is visible as it happens, including starts, updates, completions, approvals, and user-input questions.
File edits and working diff updates are surfaced live in the Lens UI.
Plan-mode or equivalent provider-driven question flows appear as first-class Lens interactions, not as raw terminal text.
The full Lens experience is implemented without hijacking or reclassifying normal terminal sessions.

In practical terms, the user should experience Lens as a polished web conversation surface for explicit provider sessions, with the same functional breadth as the provider CLI, while Terminal remains an independent real terminal.

The visual and interaction design rules for that Lens surface are maintained separately in LensDesign.md. Architecture decisions belong here; the concrete Lens UX contract, hierarchy, history/timeline behavior, and performance-oriented rendering rules belong in that design document and should evolve alongside implementation.

5. Web Preview and Browser Automation

Web preview is its own subsystem, not a simple iframe wrapper.

Preview Model

Each terminal session can own multiple named previews. Every named preview keeps separate:

target URL
proxy route key
cookie jar
detached/docked state
proxy log
browser bridge client identity

Previews can be hidden, docked beside the terminal, or detached into a dedicated popup window.

Reverse Proxy

The preview proxy rewrites outgoing browser-side requests so the embedded app stays inside /webpreview/{routeKey}/.... The injected runtime handles:

fetch
XHR
WebSocket and EventSource
history mutations
DOM src / href / action writes

HTTP and HTML handling are separate from WebSocket relay. HTTP responses may be rewritten or augmented; WebSocket payloads are intentionally relayed without content rewriting.
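
To make the URL rewriting concrete, here is a hedged sketch of the kind of mapping an injected runtime could apply inside its patched fetch, XHR, or DOM attribute setters. `rewritePreviewUrl` is a hypothetical helper. Leaving other origins and document-relative paths untouched is an assumption of this example, not documented MidTerm behavior.

```typescript
function rewritePreviewUrl(routeKey: string, targetOrigin: string, url: string): string {
  const prefix = `/webpreview/${routeKey}`;
  // Already inside the proxy route: leave as-is.
  if (url === prefix || url.startsWith(prefix + "/")) return url;
  // Absolute URL pointing at the previewed app: keep path and query, swap the origin.
  if (url === targetOrigin) return prefix + "/";
  if (url.startsWith(targetOrigin + "/")) return prefix + url.slice(targetOrigin.length);
  // Root-relative path (but not protocol-relative "//host/..."): pin under the route.
  if (url.startsWith("/") && !url.startsWith("//")) return prefix + url;
  // Assumption: document-relative paths already resolve under the proxied page,
  // and other origins pass through unchanged.
  return url;
}
```

For example, with route key `abc` and target `http://localhost:3000`, a request to `/api/users` would be sent to `/webpreview/abc/api/users`.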

Browser Bridge

MidTerm also exposes browser-control APIs and CLI helpers for the current preview client. That bridge is preview-scoped, not global, so browser actions target the intended session and preview.

The same design principle now applies to native sidecars: mtagenthost processes are launched with the current MidTerm instance identity so auxiliary session runtimes stay aligned with the owning mt instance.

Available operations include:

open, dock, detach, and viewport changes
DOM query/click/fill/submit
script execution and wait operations
screenshot, snapshot, outline, attrs, CSS, forms, links, and proxy-log flows

For deeper implementation detail, see devbrowser.md.

6. Settings, Data Model, and Storage

Public vs Internal Settings

MidTerm uses two settings models:

MidTermSettings for internal state, including secrets and platform-only details
MidTermSettingsPublic for the API-safe subset exposed to the browser

That separation prevents accidental secret exposure even if serialization or endpoint code changes.
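
The server implements this split in its own language. The TypeScript sketch below only illustrates the idea of an explicit allowlist projection, and every field name in it is hypothetical.

```typescript
// Hypothetical field names; the real models are not listed in this document.
interface InternalSettingsSketch {
  fontSize: number;
  theme: string;
  passwordHash: string; // secret: must never reach the browser
  certificateKeyPath: string; // platform-only detail
}

interface PublicSettingsSketch {
  fontSize: number;
  theme: string;
}

// Explicit field-by-field projection: a field added to the internal model
// stays private until someone deliberately adds it here.
function toPublicSettings(internal: InternalSettingsSketch): PublicSettingsSketch {
  return { fontSize: internal.fontSize, theme: internal.theme };
}
```

Copying by allowlist rather than deleting known secrets from a clone is what keeps a newly added internal field from leaking by default.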

Settings Transport

Settings are:

loaded from disk on the server
served to clients during bootstrap
edited through the settings API
synchronized live over /ws/settings

The frontend settings registry defines editability, apply mode, control ownership, and special writers such as background-image upload/delete flows.

Storage Boundaries

MidTerm uses a mix of server-side and browser-side storage:

Area	Storage
Server settings	`settings.json`
Secrets	platform-specific secret storage
Certificates and keys	settings directory plus protected key storage
History and share data	server-side files/services
Split layout	server-side `session-layout.json`
Sidebar width/collapse	cookies
Smart Input/chat/touch prefs	browser `localStorage`
Preview snapshots	`.midterm/snapshot_*` under the working tree

7. Security and Remote Access

MidTerm assumes that anyone who reaches the UI could gain shell access, so the design layers multiple controls.

Authentication

PBKDF2-SHA256 password hashing
fixed-time comparison for secrets
signed session cookies
rate limiting on failed logins
session invalidation on password changes
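
Fixed-time comparison means the time taken does not depend on where two secrets first differ. The sketch below shows the general technique, not MidTerm's server code; production code should prefer a vetted platform primitive over a hand-rolled loop.

```typescript
// Compares every byte regardless of early mismatches and accumulates differences.
function fixedTimeEquals(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false; // length is not treated as secret here
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a[i] ^ b[i];
  }
  return diff === 0;
}
```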

Secret Storage

Platform	Secret storage
Windows	DPAPI-backed `secrets.bin`
macOS user mode	Keychain-backed storage
macOS service mode / Linux	file-backed secret storage with restricted permissions

Certificates

MidTerm generates and manages a local HTTPS certificate, exposes trust helpers in the UI, and can download platform-friendly trust artifacts such as PEM output and Apple mobileconfig profiles.

Additional Security Surfaces

MidTerm also includes:

API-key management
run-as-user support for service installs
Windows firewall helpers
single-session share grants with expiry and scoped access modes
shared-session UI reduction so the recipient only sees the granted terminal context

8. Install and Update Pipeline

MidTerm treats installer and self-update reliability as part of the architecture, not an afterthought.

Installers

The root install.ps1 and install.sh scripts handle:

service mode versus user mode decisions
password setup, preservation, and intentional replacement during reinstall
certificate reuse plus trust flows for both newly generated and reused certificates
platform-specific install paths and service registration
channel selection and release download
update logging

Update Service

The update service reads version.json, checks GitHub releases, compares protocol/web/PTY versions, and classifies releases as:

web-only when only the web server/UI needs replacement
full when PTY compatibility or protocol changes require replacing mthost too
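
The classification can be sketched as a comparison of two version manifests. The shape of `VersionManifestSketch` and the use of plain inequality instead of ordered version comparison are assumptions of this example; the real version.json schema is not shown here.

```typescript
// Hypothetical manifest shape.
interface VersionManifestSketch {
  web: string;
  pty: string;
  protocol: number;
}

type UpdateKind = "none" | "web-only" | "full";

function classifyRelease(
  installed: VersionManifestSketch,
  release: VersionManifestSketch,
): UpdateKind {
  // Any PTY or protocol change means mthost must be replaced as well.
  if (release.protocol !== installed.protocol || release.pty !== installed.pty) return "full";
  // Only the web server/UI changed: running PTY hosts can stay alive.
  if (release.web !== installed.web) return "web-only";
  return "none";
}
```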

Generated Update Scripts

The update-script generator produces non-interactive scripts that:

stop services and running processes
wait for file handles to release
create backups of binaries, settings, secrets, and certificates
copy and verify replacement files
write logs and a structured result file
roll back if replacement or restart fails

That is how MidTerm can update installed systems without asking users to manually babysit file replacement.
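
The generated scripts are platform shell scripts, but their backup, replace, verify, and roll-back control flow can be modeled in a few lines. This in-memory sketch uses hypothetical names (`FileSet`, `applyWithRollback`) and leaves out process stopping, file-handle waits, and service restarts.

```typescript
type FileSet = Record<string, string>;

interface UpdateOutcome {
  files: FileSet;
  success: boolean;
  rolledBack: boolean;
  log: string[];
}

function applyWithRollback(
  installed: FileSet,
  replacements: FileSet,
  verify: (candidate: FileSet) => boolean,
): UpdateOutcome {
  const log: string[] = [];
  // 1. Back up before touching anything.
  const backup: FileSet = { ...installed };
  log.push(`backup: ${Object.keys(backup).length} files`);
  // 2. Copy replacement files over the install.
  const candidate: FileSet = { ...installed, ...replacements };
  log.push(`copy: ${Object.keys(replacements).length} files`);
  // 3. Verify, and fall back to the backup on failure.
  if (verify(candidate)) {
    log.push("verify: ok");
    return { files: candidate, success: true, rolledBack: false, log };
  }
  log.push("verify: failed, restoring backup");
  return { files: backup, success: false, rolledBack: true, log };
}
```

The returned object plays the role of the structured result file: it records whether the update succeeded, whether a rollback happened, and what was logged along the way.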

9. Protocols and APIs

WebSockets

Endpoint	Purpose
`/ws/mux`	Binary multiplexed terminal I/O
`/ws/state`	Session list, update state, and related JSON state pushes
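`/ws/settings`	Live settings synchronization
`/ws/lens`	Multiplexed Lens sync: attach, snapshot and history window reads, turn submission, interrupts, approvals, and user-input answers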

HTTP API Groups

Major API areas include:

auth and password management
bootstrap and system info
sessions, resize, names, bookmarks, clipboard image paste, guidance injection
files, tree browsing, viewing, and save
git and commands panels
certificates, trust assets, and share packets
share grants and shared-session bootstrap
browser preview and browser-control commands
update check/apply/result/log
diagnostics, logs, restart, and shutdown

MidTerm's API surface is large because the browser shell is a real workstation shell, not only a terminal transport.

10. Diagnostics and Operations

The diagnostics layer exposes:

server RTT
mthost RTT
output latency
latency and git debug overlays
settings, secrets, certificate, and log paths
settings reload and server restart actions
frontend logging helpers

Operationally, MidTerm also tracks update results, log files, session ordering, and preview proxy logs so users can debug the product from inside the product.
