MidTerm is a web-based terminal workspace built around a native server (mt), a per-session PTY host (mthost), and a browser frontend that adds layout, files, git, commands, web preview, mobile controls, and operations UI around live terminal sessions.
The important architectural point is that MidTerm is not only a terminal renderer. The browser shell coordinates multiple long-lived sessions, several WebSocket channels, local settings and storage, browser preview bridges, session sharing, and an installer/update pipeline that has to keep real user installs recoverable.
Browser
├─ xterm.js terminals
├─ sidebar, layout engine, files/git/commands panels
├─ Command Bay (smart input, automation bar, touch/mobile shell), diagnostics
├─ web preview iframe or detached preview window
├─ /ws/mux binary terminal I/O
├─ /ws/state JSON session/update state
├─ /ws/settings JSON settings sync
└─ REST APIs for auth, sessions, files, preview, updates, logs
│
▼
mt / mt.exe
├─ Kestrel HTTP + WebSocket host
├─ Session lifecycle + mux fanout
├─ settings, auth, share, cert, update, diagnostics services
├─ embedded static assets
└─ web preview proxy + browser bridge coordination
│
▼
mthost / mthost.exe (one per session)
└─ PTY host for ConPTY on Windows or forkpty on Unix
mt is the long-lived server process. It owns:
- HTTP endpoints, authentication, and static file serving
- the terminal session registry and lifecycle
- the per-instance ownership identity used to claim and reconnect only its own sidecars
- mux fanout for terminal output and client input
- settings persistence and settings WebSocket sync
- updates, logs, diagnostics, certificate lifecycle, and share-link services
- the web preview reverse proxy and preview/browser bridge routing
The server is compiled with Native AOT, uses source-generated JSON serialization, and keeps platform-specific behavior explicit rather than reflection-driven.
Each terminal session runs in its own mthost process. That gives MidTerm:
- crash isolation between sessions
- a clean privilege boundary between the web server and the PTY process
- platform-specific PTY handling without pulling terminal lifecycle into the web host
- the ability to restart or replace the web server separately from terminal hosts in web-only update flows
MidTerm now treats the connection between mt and mthost as an explicit ownership contract instead of a best-effort local reconnect.
- every running mt instance loads a stable install-scope secret from the settings directory
- the live instance identity is derived from that stable scope plus the configured port
- mthost is launched with that instance identity and owner token
- IPC endpoints are namespaced by instance identity, so side-by-side MidTerm instances on different ports do not enumerate each other's PTY hosts
- after connecting, mt must still complete an attach handshake; mthost rejects foreign instances even if they somehow reach the endpoint
- only a successfully attached owner is allowed to replace the current mt connection during reconnect
This is what allows multiple MidTerm installations or ports to run side by side while still keeping reconnect fast and deterministic.
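The ownership rules above can be illustrated with a small model. This is only a sketch under stated assumptions: the class, the field names, and the endpoint naming scheme are hypothetical, and the real identity derivation and IPC transport are not shown here.

```typescript
// Hypothetical model of the mt <-> mthost ownership contract.
// None of these names come from MidTerm's code base.
interface AttachRequest {
  instanceId: string;   // assumed: derived from install-scope secret + port
  ownerToken: string;   // assumed: opaque token handed to mthost at launch
  connectionId: string; // assumed: label for the connecting mt pipe
}

type AttachResult = "attached" | "replaced" | "rejected";

class PtyHostOwnership {
  private readonly instanceId: string;
  private readonly ownerToken: string;
  private currentConnection: string | null = null;

  constructor(instanceId: string, ownerToken: string) {
    this.instanceId = instanceId;
    this.ownerToken = ownerToken;
  }

  // Attach handshake: foreign instances are rejected even if they reached
  // the endpoint. A real implementation would compare tokens in fixed time.
  attach(req: AttachRequest): AttachResult {
    if (req.instanceId !== this.instanceId || req.ownerToken !== this.ownerToken) {
      return "rejected";
    }
    // Only a successfully attached owner may replace the current connection.
    const result: AttachResult = this.currentConnection === null ? "attached" : "replaced";
    this.currentConnection = req.connectionId;
    return result;
  }

  get connection(): string | null {
    return this.currentConnection;
  }
}

// Namespacing endpoints by instance identity keeps side-by-side instances
// from enumerating each other's hosts (naming scheme is an assumption).
function endpointName(instanceId: string, sessionId: string): string {
  return `midterm-${instanceId}-${sessionId}`;
}
```

In this model a rejected attach never touches the stored connection, which is the property that keeps a stray local process from displacing the real owner.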
Production assets are precompressed and embedded into the server assembly. MidTerm serves its frontend from memory instead of relying on a mutable on-disk web root.
MidTerm's frontend is vanilla TypeScript organized by feature modules rather than a component framework. main.ts wires the subsystems together at startup.
The browser shell includes:
- sidebar modules for sessions, history, update notices, network/share, and voice controls
- terminal modules for creation, sizing, search, paste/drop handling, scaling, and mobile PiP
- layout modules for split panes and dock overlays
- session wrappers that add Files tabs plus web, commands, share, git, and experimental Lens surfaces per session
- feature panels for files, git, commands, and web preview
- Command Bay modules for smart input, the automation bar, touch controller, Lens quick settings, and attachment/media affordances, plus chat, PWA, and diagnostics modules
State is split between:
- nanostores for reactive shared state such as sessions, active session, settings, layout, and process metadata
- module-local state for ephemeral UI concerns such as DOM handles, timers, drag state, preview clients, and pending buffers
That split keeps high-frequency terminal paths imperative while still allowing the rest of the UI to react to shared state changes.
Session creation, deletion, reordering, naming, bookmarking, sharing, and resize requests go through the server APIs and state WebSocket updates. The frontend renders the session list from live state instead of polling.
mt also persists an instance-owned session registry for PTY hosts. That registry is used on restart to reconnect directly to known mthost processes instead of adopting arbitrary local endpoints.
/ws/mux carries multiplexed binary terminal traffic for every visible session. The server prioritizes the active session and can batch and compress background output.
Relevant frame families include:
- output
- input
- resize
- resync
- compressed background output
- active-session hint
- foreground-process change
- data-loss notification
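To make the frame families concrete, here is a minimal sketch of one way such binary frames could be encoded and decoded. The wire layout, the numeric type codes, and every name below are assumptions for illustration; the actual /ws/mux format is not specified in this document.

```typescript
// Hypothetical wire layout: [type: u8][session: u16 big-endian][payload bytes].
// Type codes are invented for this sketch.
const FrameType = {
  Output: 1,
  Input: 2,
  Resize: 3,
  Resync: 4,
  CompressedOutput: 5,
  ActiveSessionHint: 6,
  ForegroundChange: 7,
  DataLoss: 8,
} as const;

type FrameTypeValue = (typeof FrameType)[keyof typeof FrameType];

interface MuxFrame {
  type: FrameTypeValue;
  session: number; // assumed: small numeric session index
  payload: Uint8Array;
}

const HEADER_BYTES = 3;

function encodeFrame(frame: MuxFrame): Uint8Array {
  const out = new Uint8Array(HEADER_BYTES + frame.payload.length);
  const view = new DataView(out.buffer);
  view.setUint8(0, frame.type);
  view.setUint16(1, frame.session); // DataView writes big-endian by default
  out.set(frame.payload, HEADER_BYTES);
  return out;
}

function decodeFrame(bytes: Uint8Array): MuxFrame {
  if (bytes.length < HEADER_BYTES) throw new Error("frame too short");
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const type = view.getUint8(0);
  if (type < FrameType.Output || type > FrameType.DataLoss) {
    throw new Error(`unknown frame type ${type}`);
  }
  return {
    type: type as FrameTypeValue,
    session: view.getUint16(1),
    payload: bytes.subarray(HEADER_BYTES), // view over the input, not a copy
  };
}

// Example payload for a resize frame: cols and rows as two u16 values.
function resizePayload(cols: number, rows: number): Uint8Array {
  const out = new Uint8Array(4);
  const view = new DataView(out.buffer);
  view.setUint16(0, cols);
  view.setUint16(2, rows);
  return out;
}
```

A fixed small header like this keeps the hot output path cheap to parse, which matters when one socket carries traffic for every visible session.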
MidTerm tracks foreground cwd, process, command line, and terminal title. That data feeds:
- session naming fallbacks
- per-session cwd display in the session bar
- tab-title modes
- history/bookmark labeling
- session heat and activity presentation
MidTerm intentionally does not auto-resize existing sessions just because another client connected or a page reloaded. MidTerm also treats terminal size ownership as a manual decision, not something the system should guess.
The model is:
- One browser is the explicit leading browser for terminal sizing.
- Only the leading browser may send authoritative server-side cols/rows.
- New sessions are created at the best size for the leading browser's viewport, never from a follower's viewport.
- Existing sessions keep their server-side dimensions until the leading browser explicitly changes them.
- Secondary browsers CSS-scale terminals locally instead of sending resize commands.
- Users explicitly claim size ownership from another browser when they want a different screen to become authoritative.
- Disconnects, reconnects, inactivity, focus changes, visibility changes, or device changes must not automatically transfer size ownership.
This is what makes multi-device usage predictable instead of having one client constantly break another client's layout. The engineering goal is therefore twofold:
- keep the leading browser's sizing path reliable for all relevant UI changes such as window resizes, panel open/close, layout changes, session switches, and new session creation
- keep follower browsers strictly non-authoritative even when they render a different viewport more cleanly
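The sizing model above can be sketched as a small state holder plus a follower-side scale calculation. All names are hypothetical, and capping the follower scale at 1 is an assumption of this sketch rather than documented behavior.

```typescript
interface TerminalSize {
  cols: number;
  rows: number;
}

// Hypothetical server-side view of size ownership for one MidTerm instance.
class SizeOwnership {
  private leader: string | null = null;
  private size: TerminalSize = { cols: 80, rows: 24 };

  // Ownership changes only through an explicit user claim.
  claim(browserId: string): void {
    this.leader = browserId;
  }

  // Disconnects (and, by the same rule, focus or visibility changes)
  // deliberately do nothing to ownership.
  onDisconnect(_browserId: string): void {}

  // Only the leading browser's cols/rows are authoritative.
  requestResize(browserId: string, size: TerminalSize): boolean {
    if (browserId !== this.leader) return false;
    this.size = { cols: size.cols, rows: size.rows };
    return true;
  }

  get current(): TerminalSize {
    return { ...this.size };
  }

  get leadingBrowser(): string | null {
    return this.leader;
  }
}

// Follower browsers scale locally with CSS instead of sending a resize.
// Returns a factor suitable for something like `transform: scale(...)`.
function followerScale(
  size: TerminalSize,
  cellWidthPx: number,
  cellHeightPx: number,
  availWidthPx: number,
  availHeightPx: number,
): number {
  const naturalWidth = size.cols * cellWidthPx;
  const naturalHeight = size.rows * cellHeightPx;
  // Assumption: never upscale beyond the natural size.
  return Math.min(1, availWidthPx / naturalWidth, availHeightPx / naturalHeight);
}
```

For example, a follower with 500 px of width showing a 100-column terminal at 10 px per cell would render at half scale while the server-side dimensions stay untouched.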
MidTerm's PTY reconnect path is now split into two cases:
- owned reconnect: mt reconnects to namespaced mthost endpoints belonging to its current instance identity
- legacy import: after upgrading from older single-instance builds, mt can do a one-time import of pre-ownership mthost endpoints and then record them in its owned session registry
The legacy path exists so a full mt + mthost upgrade can keep already-running PTY hosts alive while the web server restarts. Once those legacy hosts exit, all newly spawned hosts use the owned endpoint namespace plus attach handshake.
Around the raw PTY stream, MidTerm adds:
- font preloading and calibration terminals
- WebGL-backed rendering when enabled
- search UI with keyboard navigation
- copy/paste and OSC52 clipboard support
- image paste and file-drop handling
- File Radar path detection with a per-session allowlist boundary
- scrollback protection and visibility-aware focus handling
MidTerm intentionally keeps visible sessions as live terminals. Latency work is expected to optimize transport, scheduling, buffering, and rendering costs without proposing terminal virtualization or deactivation for visible sessions.
The sidebar is a full control surface, not just a tab strip. It handles:
- create/settings/history entry points
- session rename, close, bookmark, inject-guidance, and undock actions
- session ordering and drag-to-layout docking
- update notices, voice controls, network/share helpers, and footer telemetry
- mobile open/close behavior and desktop collapse/resize persistence
The layout subsystem stores split trees in backend state and reattaches sessions into panes without resizing them behind the user's back.
Each session wrapper adds:
- a Files tab with a cwd-rooted tree, previews, syntax-highlighted text viewing, and inline save
- git status summaries with sectioned file lists, hierarchical trees, dock-native diff/commit inspection, and terminal command handoff for write actions
- a commands panel for saved scripts that run in hidden backing sessions
The Command Bay is the shared active-session footer system beneath Terminal and Lens. It is the superset that now contains the old Smart Input composer, the old automation bar (formerly the middle manager bar), the old Lens quick settings strip, the embedded touch controller path, attachment/media affordances, and the small session status controls. It exists because MidTerm no longer treats those pieces as unrelated bars stacked under the pane.
- the primary rail hosts Smart Input / the composer when input is visible
- the automation rail hosts the old automation bar and keeps it to one line with overflow instead of wrapping into extra toolbar bands; on cramped mobile Terminal layouts it may collapse visible action chips into overflow-first chrome rather than spending a full inline row on them
- the Command Bay queue is backend-owned and persists queued work per session so follow-up prompts and Automation Bar items survive browser disconnects or reconnects
- Terminal queue draining is heat-gated: one queued item may dispatch when heat falls below 25%, then the session must rearm above that threshold before the next queued item can drain
- explicit Lens queue draining is turn-gated: one queued item may dispatch only after the current provider turn has settled back to the user
- the context rail hosts attachment/media controls for mobile Lens or terminal special keys from the touch controller for mobile Terminal, including the collapsed special-keys toggle when the full key row is hidden
- the status rail hosts Lens model / effort / plan / permission awareness or other compact terminal state pills without forcing a dedicated extra row just to reopen special keys
- mobile Terminal keeps the compact status rail above the expanded special-keys grid so the keys toggle and automation proxies stay on the same header row while the key grid opens beneath them
- Lens always uses the Command Bay; Terminal may show the full bay, a reduced bay, or only automation depending on Smart Input mode
- Lens keeps model / effort / plan awareness visible at all times even when the editable controls collapse on mobile
- desktop Terminal assumes a hardware keyboard and therefore does not surface cursor-key buttons in the Command Bay
- mobile Terminal may expand or collapse terminal special keys without changing Terminal size ownership rules
- desktop glass styling follows terminal transparency; mobile Command Bay stays solid for contrast and touch reliability
- the Command Bay itself must reserve space beneath Terminal or Lens instead of floating over session content
- only the prompt textbox's extra multiline growth may overflow upward over the pane; command-bay rails and visible command-bay panels must not hide session content underneath
- on Android and iOS, the Command Bay must stay attached to the visual viewport above the on-screen keyboard; when space gets tight it should compress and scroll internally instead of slipping under the OSK
- voice capture still hangs off the Smart Input mic affordance, with the current experimental gating unchanged
- the mobile action menu still mirrors common quick actions, but the Command Bay is the primary active-session interaction shell
- mobile Lens uses automation above context controls; other permutations keep the default primary -> context -> automation -> status flow
- document Picture-in-Picture remains separate from the Command Bay and can still show a miniature live terminal when the app backgrounds on supported mobile browsers
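The heat-gated Terminal queue rule in the list above (dispatch one item when heat falls below 25%, then rearm above the threshold) can be sketched as follows. The class name, the 0..1 heat scale, and the choice to start in the armed state are assumptions of this sketch; the real queue is backend-owned and persisted per session.

```typescript
// Hypothetical in-memory model of heat-gated queue draining.
class HeatGatedQueue<T> {
  private readonly items: T[] = [];
  private readonly threshold: number;
  private armed = true; // assumption: a fresh queue may dispatch immediately

  constructor(threshold: number = 0.25) {
    this.threshold = threshold;
  }

  enqueue(item: T): void {
    this.items.push(item);
  }

  get pending(): number {
    return this.items.length;
  }

  // Feed one heat sample (0..1). Returns the item to dispatch, if any.
  onHeat(heat: number): T | undefined {
    if (heat > this.threshold) {
      // Session became busy again: the gate rearms.
      this.armed = true;
      return undefined;
    }
    if (heat < this.threshold && this.armed && this.items.length > 0) {
      // One item per cool-down; the next needs a rearm first.
      this.armed = false;
      return this.items.shift();
    }
    return undefined;
  }
}
```

The rearm step is what prevents a quiet terminal from draining the whole queue in one burst: each dispatched item is expected to heat the session up before the next one becomes eligible.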
Lens is MidTerm's conversation-first surface for agent-controlled sessions. Architecturally it stays thin on purpose:
- the canonical turn, request, and stream state still belongs to the backend Lens runtime
- the frontend Lens panel renders that state as provider-backed history/timeline UI without taking ownership away from Terminal
- when live attach is unavailable, Lens can stay open on read-only history or a terminal-buffer fallback instead of pretending the conversation lane is authoritative
- Lens is currently dev-gated in the session tabs while the UX is still being refined
The boundary between Terminal and Lens is a core design rule:
- a plain terminal session remains terminal-owned even if its foreground process is codex, claude, or another AI CLI
- foreground process detection may label, summarize, or describe a session, but it must not by itself promote that session into Lens
- only sessions explicitly created as Lens sessions should expose provider-primary tabs such as Codex or Claude
- the IDE bar is exclusive by surface: terminal sessions show Terminal plus Files, while explicit Lens sessions show the provider tab plus Files
For provider-backed Lens sessions, MidTerm should treat the provider runtime as the source of truth instead of trying to reconstruct an agent conversation from PTY output.
Terminology matters here:
- history means the canonical provider-backed ordered sequence of Lens items
- timeline means the rendered web presentation of that history
- transcript is reserved for PTY/terminal capture or unavoidable legacy wire/schema names, not Lens semantics
That means:
- an explicit Codex or Claude Lens session owns a dedicated Lens runtime for that provider
- mtagenthost is the intended MidTerm host/runtime boundary for those provider-backed Lens sessions
- explicit Lens sessions do not use mthost and do not gain terminal access through the PTY layer
- the runtime launches or attaches using the provider's supported structured protocol
- MidTerm normalizes that provider traffic into canonical Lens turn, item, request, stream, and diff events
- the Lens UI renders those canonical events and snapshots as a conversation surface
- the terminal remains a separate surface with separate ownership and behavior
This rule exists to prevent a class of design failures:
- terminal transcripts are not a reliable protocol boundary
- foreground process detection is not enough to define conversation identity
- Lens is not a terminal transcript view and must not treat PTY stdout/stderr as its authoritative event stream
- screen-scraping or buffer-parsing makes streaming, tool lifecycle, approvals, plan-mode questions, and diff state fragile
- terminal behavior and Lens behavior become entangled unless the runtime boundary is explicit
The correct architectural direction is therefore:
- Terminal stays terminal-native
- Lens stays provider-runtime-native through mtagenthost plus provider APIs and structured protocols intended for rich UI clients
- mthost is for real terminals; mtagenthost is for explicit provider Lens sessions
- canonical Lens events bridge the runtime and the web UI
Lens sync is now owned by a dedicated /ws/lens channel rather than REST snapshot polling plus SSE.
- HTTP remains for explicit Lens session creation/bootstrap only
- after session start, Lens attach, snapshot reads, history window reads, turn submission, interrupts, approvals, and user-input answers all flow through /ws/lens
- mt remains the state master and durable owner of canonical Lens history plus the derived live read model
- the browser keeps one multiplexed Lens socket and can subscribe to many Lens sessions at once
- Lens history is synchronized as a windowed read model, not as a full-history replay on every reconnect
- reconnect starts from a fresh bounded history window, usually anchored at the live bottom, then resumes ordered live events
- the frontend stays provider-neutral and does not reconstruct Lens state from PTY output or provider-specific raw transports
Provider-backed Lens runtimes can emit huge amounts of low-value transport noise: repetitive progress chatter, superseded intermediate states, raw command stdout, and full file bodies that are far larger than any useful on-screen view.
Lens must therefore enforce a strict ownership and byte-budget model:
- mtagenthost and MidTerm own the in-flight provider reduction path plus the canonical derived Lens history
- the browser does not own full Lens history and must not accumulate the full provider event stream in memory
- the browser consumes a bounded view window over canonical history, not an unbounded raw-event feed
- multiple browsers may view the same Lens session concurrently, but each browser owns only its own local viewport/window state
- browser scrolling is a read-window operation against MidTerm-owned canonical history, not a request for provider raw-event replay
This leads to the following transport rules:
- raw provider payloads are transient reducer inputs, not retained Lens history
- giant file bodies, giant command stdout blobs, and repetitive transport chatter must be summarized, windowed, or suppressed before they become canonical history rows
- the canonical Lens history should preserve what a human needs to understand the work, not every raw provider emission
- /ws/lens should transport only:
  - the currently materialized history slice
  - stable total-count/window metadata
  - live deltas that affect rows already in or near the active slice
  - explicit older/newer window fetch results when requested
- scrolling one browser must not force all other browsers to download the same older slices
- hidden/background browsers should collapse back to a latest anchored slice and stop retaining wide browser-side history windows
The architectural target is:
- one canonical history store in MidTerm
- MidTerm durability uses canonical reduced Lens state, not appended provider-shaped event logs
- one bounded visible history window per browser/session view
- deterministic fetches for arbitrary older/newer portions of that history
- minimal duplicated byte transfer across reconnects and across multiple browsers
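The windowed read model described in this section can be sketched as a canonical store that only ever hands out bounded slices. The types and method names are hypothetical, and a real store would be durable rather than an in-memory array.

```typescript
// A bounded slice of canonical history plus stable window metadata.
interface HistoryWindow<T> {
  start: number; // index of the first row in this slice
  total: number; // total rows in canonical history at read time
  rows: T[];
}

// Hypothetical MidTerm-side canonical history; browsers never hold all of it.
class CanonicalHistory<T> {
  private readonly rows: T[] = [];

  append(row: T): void {
    this.rows.push(row);
  }

  // Deterministic fetch of an arbitrary window, clamped to valid bounds.
  window(start: number, limit: number): HistoryWindow<T> {
    const s = Math.min(Math.max(0, start), this.rows.length);
    const n = Math.max(0, limit);
    return { start: s, total: this.rows.length, rows: this.rows.slice(s, s + n) };
  }

  // What a reconnecting or backgrounded browser would ask for:
  // a fresh bounded window anchored at the live bottom.
  latest(limit: number): HistoryWindow<T> {
    return this.window(Math.max(0, this.rows.length - limit), limit);
  }
}
```

Scrolling up in one browser then becomes another `window(...)` call with a smaller `start`, which affects only that browser's local slice.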
MidTerm needs an explicit reduction layer between raw provider events and canonical Lens history.
Canonical history should keep:
- user prompts and durable assistant output
- stable tool identity and meaningful tool lifecycle state
- compact command invocations plus bounded output summaries
- compact file-read/file-change summaries and working diffs
- approvals, plan-mode questions, user-input requests, and their resolutions
- durable runtime notices that materially affect operator understanding
Canonical history should usually reduce or suppress:
- repetitive in-progress status chatter that conveys no new operator value
- duplicate final content that only restates already-streamed material
- full raw command/file payloads when a bounded summary or excerpt is sufficient
- transport-level noise that exists only because of provider protocol granularity
- superseded intermediate states once the canonical row has settled
- any content that is neither shown later nor required to determine what is shown later
Where giant payloads exist, MidTerm should prefer:
- command invocation + bounded tail/head window + omitted-line markers
- file-read path + excerpt policy + compact preview, not full file body
- summarized tool output for timeline rendering instead of hidden retained raw payloads
- canonical identity-preserving row updates instead of spawning many noisy sibling rows
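As an illustration of the head/tail window with omitted-line markers, the following sketch bounds a large command output by line count. The function name and the marker text are assumptions; the actual reducer may bound by bytes or use a different marker format.

```typescript
// Keep the first `head` and last `tail` lines, replacing the middle with a
// single marker line that records how many lines were dropped.
function boundOutput(text: string, head: number, tail: number): string {
  const lines = text.split("\n");
  if (lines.length <= head + tail) return text; // already small enough
  const omitted = lines.length - head - tail;
  return [
    ...lines.slice(0, head),
    `... ${omitted} lines omitted ...`,
    ...lines.slice(lines.length - tail),
  ].join("\n");
}
```

Because the marker carries the omitted count, the timeline can still tell the operator how much was cut without retaining the raw payload.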
For UI iteration and bug discussion, Lens also emits a dev-only per-session screen log derived from the same canonical backend history model that drives /ws/lens.
- the screen log is written by MidTerm, not by the browser
- one GUID-named log file is created per Lens session under the normal MidTerm log root
- records are screen-oriented and capture rendered-history facts such as kind, label, title, meta, body, render mode, and collapsed-by-default hints
- raw tool output should be summarized before it reaches both the Lens timeline and the screen log, and duplicate no-op screen states should not be re-logged
- raw provider payloads and PTY output are not the screen log contract
The intended Definition of Done for provider-backed Lens sessions is:
- A user can create a new session in MidTerm and explicitly choose Codex or Claude.
- The session opens on the provider Lens surface with the Smart Input / composer visible.
- MidTerm shows a subtle ready indication when the provider runtime is connected and able to accept a prompt.
- The user can submit a prompt from the Lens composer without switching to Terminal.
- Assistant output streams into the Lens history/timeline incrementally as it is generated, rather than appearing only after full completion.
- Tool activity is visible as it happens, including starts, updates, completions, approvals, and user-input questions.
- File edits and working diff updates are surfaced live in the Lens UI.
- Plan-mode or equivalent provider-driven question flows appear as first-class Lens interactions, not as raw terminal text.
- The full Lens experience is implemented without hijacking or reclassifying normal terminal sessions.
In practical terms, the user should experience Lens as a polished web conversation surface for explicit provider sessions, with the same functional breadth as the provider CLI, while Terminal remains an independent real terminal.
The visual and interaction design rules for that Lens surface are maintained separately in LensDesign.md. Architecture decisions belong here; the concrete Lens UX contract, hierarchy, history/timeline behavior, and performance-oriented rendering rules belong in that design document and should evolve alongside implementation.
Web preview is its own subsystem, not a simple iframe wrapper.
Each terminal session can own multiple named previews. Every named preview keeps separate:
- target URL
- proxy route key
- cookie jar
- detached/docked state
- proxy log
- browser bridge client identity
Previews can be hidden, docked beside the terminal, or detached into a dedicated popup window.
The preview proxy rewrites outgoing browser-side requests so the embedded app stays inside /webpreview/{routeKey}/.... The injected runtime handles:
- fetch
- XHR
- WebSocket and EventSource
- history mutations
- DOM src/href/action writes
HTTP and HTML handling are separate from WebSocket relay. HTTP responses may be rewritten or augmented; WebSocket payloads are intentionally relayed without content rewriting.
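The core of the request rewriting can be sketched as a pure URL-mapping function. This is a simplified illustration under stated assumptions: the function name is hypothetical, `targetOrigin` is assumed to have no trailing slash, and leaving relative and other-origin URLs untouched is a choice of this sketch, not a statement about the real proxy.

```typescript
// Map a URL requested by the embedded app into the preview's proxy route.
function rewriteForPreview(url: string, routeKey: string, targetOrigin: string): string {
  const prefix = `/webpreview/${routeKey}`;
  // Already inside the proxy route: leave as-is so rewriting is idempotent.
  if (url === prefix || url.startsWith(prefix + "/")) return url;
  // Absolute URL pointing at the preview target.
  if (url === targetOrigin) return prefix + "/";
  if (url.startsWith(targetOrigin + "/")) return prefix + url.slice(targetOrigin.length);
  // Root-relative path (but not a protocol-relative "//host" URL).
  if (url.startsWith("/") && !url.startsWith("//")) return prefix + url;
  // Relative, protocol-relative, and other-origin URLs are untouched here.
  return url;
}
```

An injected runtime could apply a mapping like this inside wrappers around `fetch`, XHR, and DOM attribute writes so that every request path goes through the same rule.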
MidTerm also exposes browser-control APIs and CLI helpers for the current preview client. That bridge is preview-scoped, not global, so browser actions target the intended session and preview.
The same design principle now applies to native sidecars: mtagenthost processes are launched with the current MidTerm instance identity so auxiliary session runtimes stay aligned with the owning mt instance.
Available operations include:
- open, dock, detach, and viewport changes
- DOM query/click/fill/submit
- script execution and wait operations
- screenshot, snapshot, outline, attrs, CSS, forms, links, and proxy-log flows
For deeper implementation detail, see devbrowser.md.
MidTerm uses two settings models:
- MidTermSettings for internal state, including secrets and platform-only details
- MidTermSettingsPublic for the API-safe subset exposed to the browser
That separation prevents accidental secret exposure even if serialization or endpoint code changes.
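The idea behind the two-model split can be shown with an allowlist projection. The field names below are invented for illustration and the real models live on the server, so treat this purely as a sketch of the pattern.

```typescript
// Hypothetical internal model: includes fields that must never leave the server.
interface MidTermSettings {
  theme: string;
  fontSize: number;
  passwordHash: string;       // assumed secret field
  certificateKeyPath: string; // assumed platform-only detail
}

// The public model names only the fields that are safe to expose.
type MidTermSettingsPublic = Pick<MidTermSettings, "theme" | "fontSize">;

// Build the public view by copying allowed fields, never by deleting secret
// ones, so a newly added internal field stays private by default.
function toPublic(settings: MidTermSettings): MidTermSettingsPublic {
  return { theme: settings.theme, fontSize: settings.fontSize };
}
```

With this shape, exposing a new setting to the browser requires a deliberate change to the public type, which is the safety property the separation is meant to provide.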
Settings are:
- loaded from disk on the server
- served to clients during bootstrap
- edited through the settings API
- synchronized live over /ws/settings
The frontend settings registry defines editability, apply mode, control ownership, and special writers such as background-image upload/delete flows.
MidTerm uses a mix of server-side and browser-side storage:
| Area | Storage |
|---|---|
| Server settings | settings.json |
| Secrets | platform-specific secret storage |
| Certificates and keys | settings directory plus protected key storage |
| History and share data | server-side files/services |
| Split layout | server-side session-layout.json |
| Sidebar width/collapse | cookies |
| Smart Input/chat/touch prefs | browser localStorage |
| Preview snapshots | .midterm/snapshot_* under the working tree |
MidTerm assumes that anyone who reaches the UI could gain shell access, so the design layers multiple controls.
- PBKDF2-SHA256 password hashing
- fixed-time comparison for secrets
- signed session cookies
- rate limiting on failed logins
- session invalidation on password changes
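For readers unfamiliar with fixed-time comparison, the sketch below shows the general technique: accumulate differences across the full length instead of returning at the first mismatch. It is an illustration only. JavaScript engines give no timing guarantees, and a .NET server would more plausibly rely on a platform primitive such as `CryptographicOperations.FixedTimeEquals`; which primitive MidTerm uses is not stated here.

```typescript
// Compare two byte arrays without an early exit on the first difference.
function fixedTimeEquals(a: Uint8Array, b: Uint8Array): boolean {
  // A length mismatch is folded into the result instead of returning early.
  let diff = a.length ^ b.length;
  const len = Math.max(a.length, b.length);
  for (let i = 0; i < len; i++) {
    // Out-of-range reads are treated as 0 so the loop always runs `len` times.
    diff |= (a[i] ?? 0) ^ (b[i] ?? 0);
  }
  return diff === 0;
}
```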
| Platform | Secret storage |
|---|---|
| Windows | DPAPI-backed secrets.bin |
| macOS user mode | Keychain-backed storage |
| macOS service mode / Linux | file-backed secret storage with restricted permissions |
MidTerm generates and manages a local HTTPS certificate, exposes trust helpers in the UI, and can download platform-friendly trust artifacts such as PEM output and Apple mobileconfig profiles.
MidTerm also includes:
- API-key management
- run-as-user support for service installs
- Windows firewall helpers
- single-session share grants with expiry and scoped access modes
- shared-session UI reduction so the recipient only sees the granted terminal context
MidTerm treats installer and self-update reliability as part of the architecture, not an afterthought.
The root install.ps1 and install.sh scripts handle:
- service mode versus user mode decisions
- password setup, preservation, and intentional replacement during reinstall
- certificate reuse plus trust flows for both newly generated and reused certificates
- platform-specific install paths and service registration
- channel selection and release download
- update logging
The update service reads version.json, checks GitHub releases, compares protocol/web/PTY versions, and classifies releases as:
- web-only when only the web server/UI needs replacement
- full when PTY compatibility or protocol changes require replacing mthost too
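The classification rule can be sketched as a comparison of version manifests. The manifest field names and the extra "none" result are assumptions of this sketch; the actual version.json schema is not reproduced here.

```typescript
// Hypothetical shape of the version data compared by the update service.
interface VersionManifest {
  web: string;      // web server/UI version
  pty: string;      // mthost version
  protocol: number; // mt <-> mthost protocol version
}

type UpdateKind = "none" | "web-only" | "full";

function classifyUpdate(installed: VersionManifest, release: VersionManifest): UpdateKind {
  // Any PTY or protocol change means running mthost processes must be replaced.
  if (release.pty !== installed.pty || release.protocol !== installed.protocol) {
    return "full";
  }
  // Only the web server/UI changed: terminal hosts can stay alive.
  if (release.web !== installed.web) return "web-only";
  return "none";
}
```

Checking the PTY and protocol versions first ensures a release that changes both web and PTY components is never misclassified as web-only.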
The update-script generator produces non-interactive scripts that:
- stop services and running processes
- wait for file handles to release
- create backups of binaries, settings, secrets, and certificates
- copy and verify replacement files
- write logs and a structured result file
- roll back if replacement or restart fails
That is how MidTerm can update installed systems without asking users to manually babysit file replacement.
| Endpoint | Purpose |
|---|---|
| /ws/mux | Binary multiplexed terminal I/O |
| /ws/state | Session list, update state, and related JSON state pushes |
| /ws/settings | Live settings synchronization |
Major API areas include:
- auth and password management
- bootstrap and system info
- sessions, resize, names, bookmarks, clipboard image paste, guidance injection
- files, tree browsing, viewing, and save
- git and commands panels
- certificates, trust assets, and share packets
- share grants and shared-session bootstrap
- browser preview and browser-control commands
- update check/apply/result/log
- diagnostics, logs, restart, and shutdown
MidTerm's API surface is large because the browser shell is a real workstation shell, not only a terminal transport.
The diagnostics layer exposes:
- server RTT
- mthost RTT
- output latency
- latency and git debug overlays
- settings, secrets, certificate, and log paths
- settings reload and server restart actions
- frontend logging helpers
Operationally, MidTerm also tracks update results, log files, session ordering, and preview proxy logs so users can debug the product from inside the product.
- FEATURES.md for the exhaustive capability inventory
- devbrowser.md for preview proxy and browser-control internals
- file-radar.md for path detection design