Minimal cognitive runtime engine for observable, testable, and evolvable AI agents.
Nullysh Engine is a minimal cognitive runtime engine for building AI-agent systems with a strong focus on:
- deterministic behavior before autonomy;
- inspectable intermediate representations;
- traceable execution artifacts;
- small, testable stages;
- low operational complexity;
- no premature agentic overengineering.
The current version is not trying to be a full autonomous agent framework yet.
It is the foundation layer: a local, deterministic pipeline that turns a human-written agent specification into an executable, inspectable, and traceable agent session.
```
Markdown Agent Spec
  → Cognitive IR JSON
  → XML-like Prompt
  → Runtime Session
  → Scheduler Decision
  → Mock LLM Response
  → Output
  → Trace JSONL
```

The pipeline is intentionally simple.
The goal is to make every step visible before adding real LLM providers, tool execution, memory, or complex agent loops.
Most agent projects fail early because they start with too much abstraction:
- multi-agent orchestration before a stable single-agent loop;
- vector memory before basic traceability;
- real LLM providers before deterministic testability;
- tool execution before clear runtime state;
- complex planners before simple decisions;
- autonomy before observability.
Nullysh Engine starts from the opposite direction:
Spec first.
IR second.
Prompt third.
Runtime fourth.
Trace always.
Autonomy later.

| Stage | Feature | Status | Output |
|---|---|---|---|
| Stage 1 | Cognitive IR schema | ✅ Done | Type-safe IR models |
| Stage 2 | Markdown compiler | ✅ Done | .nullain/*.ir.json |
| Stage 3 | XML-like prompt renderer | ✅ Done | .nullain/*.prompt.xml |
| Stage 4 | Runtime session | ✅ Done | state.json, prompt.xml |
| Stage 5 | JSONL tracing | ✅ Done | trace.jsonl |
| Stage 6 | Deterministic scheduler | ✅ Done | decision.json |
| Stage 7 | Mock LLM provider | ✅ Done | output.md |
| Stage 8 | Minimal pipeline orchestrator | ✅ Done | reusable pipeline + artifact writer |
| Stage 9 | CLI cleanup and helper reuse | ✅ Done | src/cli/helpers.ts + unified CLI |
| Stage 10 | Ollama provider adapter | ✅ Done | src/llm/ollama-provider.ts + run-ollama command |
| Stage 11 | User task injection | ✅ Done | --task option for run and run-ollama |
| Stage 12 | Basic local tool registry | ✅ Done | src/tools/ → registry + resolver (no execution yet) |
| Stage 13 | Tool availability injection via CLI | ✅ Done | --tool option for run and run-ollama |
| Stage 14 | Tool capability contract | ✅ Done | execution_mode="metadata_only" + `<tool_capability_contract>` in prompt |
| Stage 15 | Minimal tool execution protocol | ✅ Done | src/tools/execution.ts + echo-tool.ts → local-safe echo executor, no pipeline integration yet |
| Stage 16 | Manual tool execution CLI + artifacts | ✅ Done | pnpm dev tool echo --input "hello" → writes request.json, result.json, trace.jsonl |
| Stage 17 | Local memory prototype | ✅ Done | pnpm dev memory add/search/list → append-only JSONL, no vector DB or embeddings |
| Stage 18 | Evaluation harness | ✅ Done | pnpm dev eval → deterministic mock-based eval with fixtures, no LLM judge |
| Stage 19 | Memory injection experiment | ✅ Done | --memory-query / --memory-limit on run / run-ollama → explicit opt-in only |
Given an agent spec like:
```markdown
# Deep Research Agent

## Goal
Produce a comprehensive, well-sourced research summary.

## Instructions
- Be accurate.
- Do not invent facts.
- Separate fact, inference, and uncertainty.

## Constraints
- Do not hide uncertainty.
- Do not execute destructive actions.

## Tools
- mock_search

## Output
Return a clear, structured answer.
```

The engine can generate:
```
.nullain/
  research-agent.ir.json
  research-agent.prompt.xml
  sessions/
    session-<id>/
      state.json
      prompt.xml
      decision.json
      output.md
      trace.jsonl
```

Install dependencies:
```
pnpm install
```

Run typecheck:

```
pnpm typecheck
```

Run tests:

```
pnpm test
```

Build:

```
pnpm build
```

Compile a Markdown Agent Spec into Cognitive IR:

```
pnpm dev compile examples/research-agent.md
```

Render an XML-like prompt:

```
pnpm dev render examples/research-agent.md
```

Create a runtime session:

```
pnpm dev session examples/research-agent.md
```

Create a runtime session and scheduler decision:

```
pnpm dev decide examples/research-agent.md
```

Run the full local pipeline with mock LLM:

```
pnpm dev run examples/research-agent.md
```

Inject a user task into the prompt:

```
pnpm dev run examples/research-agent.md --task "Research local LLM agent frameworks"
pnpm dev run-ollama examples/research-agent.md --task "Research local LLM agent frameworks"
```

Inject available tool metadata into the prompt:

```
pnpm dev run examples/research-agent.md --task "Research local LLM agent frameworks" --tool web_search
pnpm dev run-ollama examples/research-agent.md --task "Research local LLM agent frameworks" --tool web_search --tool citation_manager
```

Run the full pipeline with Ollama LLM:

```
pnpm dev run-ollama examples/research-agent.md
```

Execute a tool manually (Stage 16 – only echo is available):

```
pnpm dev tool echo --input "hello from nullysh"
```

This writes artifacts to .nullain/tool-runs/<request-id>/:

- request.json
- result.json
- trace.jsonl
Note: web_search and citation_manager are not executable yet. Only echo is supported as a local-safe proof-of-concept. The LLM pipeline does not call tools automatically in this stage.
Manage local memory records (Stage 17 – append-only JSONL, no vector DB):

```
pnpm dev memory add --content "Nullysh Engine uses a local JSONL memory prototype."
pnpm dev memory list
pnpm dev memory search --query "jsonl"
```

This stores memory in .nullain/memory/memory.jsonl. It is not integrated into the LLM pipeline yet.
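The substring search behind `memory search` can be sketched as follows. The record shape and function name here are illustrative assumptions, not the engine's actual API:

```typescript
// Hypothetical simplified memory record; the real JSONL schema may differ.
interface MemoryRecord {
  id: string;
  content: string;
}

// Parse a JSONL string and return records whose content contains the query,
// case-insensitively, capped at `limit` matches.
function searchMemory(jsonl: string, query: string, limit = 5): MemoryRecord[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as MemoryRecord)
    .filter((r) => r.content.toLowerCase().includes(query.toLowerCase()))
    .slice(0, limit);
}
```

Because each record is one JSON line, search stays a plain linear scan with no index to maintain.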
Run the evaluation harness (Stage 18 – deterministic, mock-based, no LLM judge):

```
pnpm dev eval
```

This runs a small fixture suite against the mock provider and writes a JSON report to .nullain/evals/. It checks:
- Prompt includes expected strings
- Output includes expected strings
- Event types are emitted in the correct order
It does not use external datasets, embeddings, or real LLMs.
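The string-inclusion and event-order checks can be sketched like this; the function names are hypothetical, not the harness's real API:

```typescript
// Check that every expected substring appears in the text.
function includesAll(text: string, expected: string[]): boolean {
  return expected.every((s) => text.includes(s));
}

// Check that expected event types appear in order (not necessarily adjacent).
function eventsInOrder(events: string[], expected: string[]): boolean {
  let i = 0;
  for (const e of events) {
    if (e === expected[i]) i++;
  }
  return i === expected.length;
}
```

Both checks are pure string comparisons, which is what keeps the harness deterministic without an LLM judge.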
Inject local memory into the prompt (Stage 19 – explicit opt-in only):

```
pnpm dev run examples/research-agent.md --task "Explain the memory approach" --memory-query "jsonl"
pnpm dev run-ollama examples/research-agent.md --task "Explain the memory approach" --memory-query "jsonl" --memory-limit 2
```

This searches .nullain/memory/memory.jsonl by substring and injects matches into the prompt as `<memory_context>`. It only activates when --memory-query is explicitly passed. No embeddings, no vector DB, no auto-injection.
```
src/
  cli/        # CLI entrypoint and commands
  compiler/   # Markdown Agent Spec → Cognitive IR
  ir/         # Zod schemas and TypeScript IR types
  llm/        # LLM provider interface and mock provider
  pipeline/   # Reusable orchestration and artifact writing
  prompt/     # Cognitive IR → XML-like prompt renderer
  runtime/    # Runtime state and session creation
  scheduler/  # Deterministic scheduler decisions
  trace/      # JSONL trace events and writers
examples/
  research-agent.md
tests/
  compiler.test.ts
  ir.test.ts
  llm.test.ts
  pipeline.test.ts
  prompt.test.ts
  runtime.test.ts
  scheduler.test.ts
  trace.test.ts
```

Defines the minimal cognitive intermediate representation:
- CognitiveNode
- CognitiveDocument
- CognitiveNodeType
The IR is the source of truth.
Prompt XML, runtime state, output files, and traces are projections or artifacts generated from the IR.
Parses a small Markdown Agent Spec format and produces a validated Cognitive IR document.
Current mapping:
| Markdown Section | IR Node Type |
|---|---|
| `# Title` | `CognitiveDocument.title` |
| `## Goal` | `goal` |
| `## Instructions` | `instruction` |
| `## Constraints` | `constraint` |
| `## Tools` | `tool` |
| `## Output` | `output` |
The compiler is deterministic and validates the final document with Zod.
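The section-to-node-type mapping above can be sketched as a lookup table; the names below are illustrative, not the compiler's actual identifiers:

```typescript
// Hypothetical sketch of the heading → node-type mapping; the real compiler
// lives in src/compiler/ and validates its output with Zod.
type CognitiveNodeType = "goal" | "instruction" | "constraint" | "tool" | "output";

const SECTION_TO_NODE_TYPE: Record<string, CognitiveNodeType> = {
  "## Goal": "goal",
  "## Instructions": "instruction",
  "## Constraints": "constraint",
  "## Tools": "tool",
  "## Output": "output",
};

// Map a markdown H2 heading line to its IR node type, if the section is known.
function nodeTypeForHeading(line: string): CognitiveNodeType | undefined {
  return SECTION_TO_NODE_TYPE[line.trim()];
}
```

A fixed lookup like this is one way the compiler can stay deterministic: the same spec always yields the same IR.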
Renders the Cognitive IR into an XML-like prompt structure.
The renderer:
- groups nodes by type;
- preserves node IDs;
- escapes XML special characters;
- produces deterministic output;
- keeps XML as a rendering format only, never as the source of truth.
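The escaping and per-node rendering steps can be sketched as follows; the helper names are hypothetical, not the renderer's real API:

```typescript
// Escape the five XML special characters; '&' must be replaced first so the
// other entities are not double-escaped.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Render one IR node as a deterministic XML-like element, preserving its ID.
function renderNode(type: string, id: string, text: string): string {
  return `<${type} id="${id}">${escapeXml(text)}</${type}>`;
}
```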
Creates a minimal runtime session state.
Current state includes:
- sessionId
- documentId
- title
- version
- status
- step
- createdAt
- updatedAt
- currentGoal
- prompt
- nodeCount
Writes JSONL trace events.
Current events:
- compile.completed
- prompt.rendered
- session.created
- scheduler.decided
- llm.mock.completed
Each line in trace.jsonl is a valid JSON event.
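Producing a valid JSONL line is a one-liner over a structured event; the event shape below is an assumed simplification, not the engine's exact schema:

```typescript
// Hypothetical minimal trace event shape.
interface TraceEvent {
  type: string;                    // e.g. "compile.completed"
  timestamp: string;               // ISO-8601
  data: Record<string, unknown>;   // event-specific payload
}

// Serialize one event as a single JSON line, ready to append to trace.jsonl.
// JSON.stringify never emits raw newlines, so one event stays on one line.
function toTraceLine(event: TraceEvent): string {
  return JSON.stringify(event) + "\n";
}
```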
Implements a deterministic rule-based scheduler.
Current behavior:
| Runtime State | Decision |
|---|---|
| created | answer |
| completed | finish |
| failed | error |
| step < 0 | error |
The scheduler does not call an LLM.
It does not execute tools.
It does not create plans.
It only returns a simple decision.
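The rule table above translates directly into a small pure function; the type and function names here are illustrative, not the scheduler's actual exports:

```typescript
// Hypothetical sketch of the deterministic rule table; no LLM, no tools, no plans.
type Decision = "answer" | "finish" | "error";

function decide(status: "created" | "completed" | "failed", step: number): Decision {
  if (step < 0) return "error";          // invalid step always wins
  if (status === "created") return "answer";
  if (status === "completed") return "finish";
  return "error";                         // status === "failed"
}
```

Because the function is pure, the scheduler is trivially unit-testable, which is the point of doing it before any autonomy.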
Defines a provider interface and a deterministic mock provider.
Current provider:
createMockLlmProvider()

Default mock response:

```
Mock response generated by Nullysh Engine.
```

This is intentional. Real providers come after the local pipeline is stable.
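A deterministic mock behind a minimal provider interface might look like this; the exact interface in src/llm/ may use different names and signatures:

```typescript
// Hypothetical minimal provider interface.
interface LlmProvider {
  name: string;
  complete(prompt: string): Promise<string>;
}

// The mock ignores the prompt and always returns the same text, so pipeline
// runs and traces stay byte-for-byte reproducible.
function createMockProvider(): LlmProvider {
  return {
    name: "mock",
    complete: async () => "Mock response generated by Nullysh Engine.",
  };
}
```

Swapping in a real provider later only requires another object satisfying the same interface.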
Extracts the main orchestration logic out of the CLI.
Current functions:
- runMockPipeline(input) → RunPipelineResult
- writeRunArtifacts(result, sessionDir) → ArtifactPaths

The pipeline orchestrator:
- compiles Markdown into Cognitive IR;
- renders the XML-like prompt;
- creates a runtime session;
- gets a deterministic scheduler decision;
- calls the mock LLM provider when the decision is answer;
- creates trace events;
- returns structured results without directly touching the filesystem.
Filesystem writes are handled separately by the artifact writer.
This keeps the CLI thin and makes the pipeline reusable, testable, and ready for future provider integrations.
- `.nullain/research-agent.ir.json` – canonical compiled representation of the Markdown Agent Spec.
- `.nullain/research-agent.prompt.xml` – LLM-facing XML-like prompt rendering.
- `.nullain/sessions/<session-id>/state.json` – initial runtime state for an agent session.
- `.nullain/sessions/<session-id>/decision.json` – scheduler decision for the next action.
- `.nullain/sessions/<session-id>/output.md` – mock LLM output.
- `.nullain/sessions/<session-id>/trace.jsonl` – one JSON event per line, useful for debugging and future observability.
Example trace event types:
- compile.completed
- prompt.rendered
- session.created
- scheduler.decided
- llm.mock.completed

The current version intentionally does not include:
- OpenAI or Anthropic provider integration;
- automatic tool execution inside the LLM pipeline;
- vector or graph databases;
- multi-agent orchestration;
- MCP integration;
- LangChain or LangGraph dependency;
- Rust modules;
- a long-running execution loop;
- autonomous planning.
This is by design.
The goal is to keep the foundation small, testable, observable, and easy to refactor.
Nullysh Engine is being developed with a grounded, red-team-first mindset.
1. Only extract abstractions after repeated pressure appears.
2. No hidden magic: every stage should write inspectable artifacts. If the engine cannot explain what happened, it should not become more autonomous.
3. XML, prompts, runtime state, outputs, and traces are projections of the IR, not replacements for it.
4. Mock providers and rule-based scheduling come before real models.
5. Each stage should be independently testable, reversible, and understandable.
6. Avoid dependencies, servers, databases, and frameworks until the pain is real.
- Stage 1 – Cognitive IR schema
- Stage 2 – Markdown compiler
- Stage 3 – XML-like prompt renderer
- Stage 4 – Runtime session
- Stage 5 – JSONL tracing
- Stage 6 – Deterministic scheduler
- Stage 7 – Mock LLM provider
- Stage 8 – Minimal pipeline orchestrator
- Stage 9 – CLI cleanup and helper reuse
- Stage 10 – Ollama provider adapter
- Stage 11 – User task injection
- Stage 12 – Basic local tool registry
- Stage 13 – Tool availability injection via CLI
- Stage 14 – Tool capability contract
- Stage 15 – Minimal tool execution protocol
- Stage 16 – Manual tool execution CLI + artifacts
- Stage 17 – Local memory prototype
- Stage 18 – Evaluation harness
- Stage 19 – Memory injection experiment
- Stage 20 – Controlled tool execution in pipeline
- Stage 21 – Prompt compact mode
- Stage 22 – Release hardening
Clone:

```
git clone https://github.com/nettycpu/nullysh-engine.git
cd nullysh-engine
```

Install:

```
pnpm install
```

Validate:

```
pnpm typecheck
pnpm test
pnpm build
```

Run full local mock pipeline:

```
pnpm dev run examples/research-agent.md
```

Nullysh Engine is currently in early v0.1 foundation stage.
It is not production-ready yet.
It is a deliberately small and inspectable base for building future agentic systems.
MIT
