Claude Opus 4.8:
Yes, but quality varies sharply by language. The pattern generalizes; the ergonomics don't.
What must hold for the REPL-loop to work
- Persistent namespace across executions (bindings survive).
- Eval-and-print (incremental expression eval, not just run-to-exit).
- Capturable stdout/stderr + structured errors/tracebacks.
- Interruptible / killable on runaway.
Any language with a real REPL satisfies the first three. Fourth depends on runtime.
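A minimal sketch of the first three conditions, in Python. `MiniRepl` and the shape of its result dict are hypothetical names made up for illustration, not any real kernel's API. It keeps one namespace across calls, evaluates a trailing expression so its value can be reported, and captures stdout/stderr plus a structured error. Interrupting a runaway cell is not covered here; that needs a separate process or thread.

```python
import ast
import contextlib
import io
import traceback


class MiniRepl:
    """Hypothetical helper: one persistent namespace, one observation per step."""

    def __init__(self):
        self.namespace = {}  # bindings survive across run() calls

    def run(self, source):
        out, err = io.StringIO(), io.StringIO()
        result = {"value": None, "stdout": "", "stderr": "", "error": None}
        try:
            with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
                tree = ast.parse(source)
                last_expr = None
                # Eval-and-print: if the final statement is an expression,
                # split it off so its value can be reported separately.
                if tree.body and isinstance(tree.body[-1], ast.Expr):
                    last_expr = ast.Expression(tree.body.pop().value)
                exec(compile(tree, "<cell>", "exec"), self.namespace)
                if last_expr is not None:
                    value = eval(compile(last_expr, "<cell>", "eval"), self.namespace)
                    if value is not None:
                        result["value"] = repr(value)
        except Exception as exc:
            # Structured error: type and message as fields, full traceback as text.
            result["error"] = {
                "type": type(exc).__name__,
                "message": str(exc),
                "traceback": traceback.format_exc(),
            }
        result["stdout"] = out.getvalue()
        result["stderr"] = err.getvalue()
        return result


repl = MiniRepl()
repl.run("x = 41")                       # binding persists
step = repl.run("print('hi'); x + 1")    # value "42", stdout "hi\n"
failed = repl.run("1/0")                 # error type "ZeroDivisionError"
```

The agent reasons over the returned dict, not over raw terminal text, which is why the structure of `error` matters as much as the persistence.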
Tiers
First-class (do this naturally)
- Python (IPython), R, Julia, Ruby (irb/pry), Node/JS, Elixir (IEx), Clojure, Lisp/Scheme, Haskell (ghci), Lua.
- Julia/R especially: built for the data-exploration loop, same as Python. The name Jupyter itself comes from Julia + Python + R.
Workable but awkward
- Shell (bash/zsh) — state is env vars + cwd, "namespace" is fuzzy but persists.
- SQL — session/temp tables persist within connection. Stateful in its own way.
- PHP, Perl — REPLs exist, less loved.
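To make the SQL case concrete, here is a small sketch using Python's built-in `sqlite3` module. SQLite is only a stand-in; other databases scope temp tables to the session in a similar way, but details differ. The table name `scratch` and the helper `temp_table_visible` are made up for the example.

```python
import os
import sqlite3
import tempfile


def temp_table_visible(conn):
    """Return True if the temp table 'scratch' exists on this connection."""
    try:
        conn.execute("SELECT COUNT(*) FROM scratch")
        return True
    except sqlite3.OperationalError:
        return False


db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

# Step 1: create session state on one connection.
session = sqlite3.connect(db_path)
session.execute("CREATE TEMP TABLE scratch (n INTEGER)")
session.execute("INSERT INTO scratch VALUES (1), (2), (3)")

# Step 2: a later statement on the same connection still sees it.
total = session.execute("SELECT SUM(n) FROM scratch").fetchone()[0]

# A second connection to the same file has its own, empty temp schema.
other = sqlite3.connect(db_path)
same_session_sees_it = temp_table_visible(session)
new_session_sees_it = temp_table_visible(other)

session.close()
other.close()
```

So the "namespace" here is the connection: an agent loop has to hold one connection open across steps, or the state is gone.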
Faked, not native
- C/C++, Rust, Go, Java: compiled, no true persistent namespace in the usual toolchain (Java's JShell, bundled with the JDK since Java 9, is a partial exception). Tools simulate it: Cling (C++ interpreter), evcxr (Rust), Jupyter Java kernels.
- Trick: re-compile/re-run accumulated cell history each step, or snapshot state. Works, but slow + leaky. Not a real REPL.
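A rough sketch of the replay trick, with Python standing in for the compiled language so it stays runnable. `ReplaySession` is a hypothetical helper, and `world` is a made-up stand-in for anything outside the process (files, network, a database). Real tools are more careful than this; the point is only to show where "slow" and "leaky" come from.

```python
class ReplaySession:
    """Hypothetical helper: fakes persistence by re-running all prior cells."""

    def __init__(self):
        self.history = []

    def run(self, cell, world):
        # Fresh namespace every step: nothing actually persists in memory.
        namespace = {"world": world}
        # Slow: the cost of each step grows with the length of the history.
        for source in self.history + [cell]:
            exec(source, namespace)
        self.history.append(cell)  # recorded only if the replay succeeded
        return namespace


world = []
session = ReplaySession()
session.run("x = 2", world)
session.run("world.append('sent email')", world)
ns = session.run("y = x * 10", world)

# ns["y"] is 20, so the fake namespace works, but `world` now holds
# two 'sent email' entries: the side effect ran again during replay.
```

Snapshotting state avoids the repeated side effects, at the cost of needing a way to serialize whatever the program holds.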
The catch
Output capture + error structure isn't uniform. Python tracebacks are easy to parse; a C++ template error or a hung shell pipe is not. The loop generalizes; the observation quality the agent reasons over degrades in compiled/low-level langs. That degradation is what actually limits agent usefulness, not the language count.
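As an illustration of what a clean observation looks like in the easy case, here is a sketch that runs a snippet in a child Python process, kills it on timeout, and pulls the exception type off the last traceback line. `observe` and its result fields are hypothetical. It starts a fresh process per call, so it shows only the capture and kill side, not persistence. The same last-line parsing would not work on a C++ template error, which is the point.

```python
import subprocess
import sys


def observe(code, timeout_s=2.0):
    """Run `code` in a child Python process and return a structured observation."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed if it runs past this
        )
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "stdout": "", "stderr": "", "error_type": None}

    error_type = None
    if proc.returncode != 0:
        lines = proc.stderr.strip().splitlines()
        if lines:
            # Python tracebacks end with a line like "ExceptionType: message".
            error_type = lines[-1].split(":", 1)[0]
    return {
        "status": "ok" if proc.returncode == 0 else "error",
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "error_type": error_type,
    }


good = observe("print(1)")
bad = observe("1/0")
hung = observe("while True: pass", timeout_s=0.5)
```

Three distinct statuses (`ok`, `error`, `timeout`) give the agent something to branch on; a hung shell pipe with no timeout gives it nothing.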