Minimal cognitive runtime engine for observable, testable, and evolvable AI agents.
Nullysh Engine is a minimal cognitive runtime engine for building AI-agent systems with a strong focus on:
- deterministic behavior before autonomy;
- inspectable intermediate representations;
- traceable execution artifacts;
- small, testable stages;
- low operational complexity;
- no premature agentic overengineering.
The current version is not trying to be a full autonomous agent framework yet.
It is the foundation layer: a local, deterministic pipeline that turns a human-written agent specification into an executable, inspectable, and traceable agent session.
```
Markdown Agent Spec
  → Cognitive IR JSON
  → XML-like Prompt
  → Runtime Session
  → Scheduler Decision
  → Mock LLM Response
  → Output
  → Trace JSONL
```

The pipeline is intentionally simple.
The goal is to make every step visible before adding real LLM providers, tool execution, memory, or complex agent loops.
Most agent projects fail early because they start with too much abstraction:
- multi-agent orchestration before a stable single-agent loop;
- vector memory before basic traceability;
- real LLM providers before deterministic testability;
- tool execution before clear runtime state;
- complex planners before simple decisions;
- autonomy before observability.
Nullysh Engine starts from the opposite direction:
Spec first.
IR second.
Prompt third.
Runtime fourth.
Trace always.
Autonomy later.

| Stage | Feature | Status | Output |
|---|---|---|---|
| Stage 1 | Cognitive IR schema | ✅ Done | Type-safe IR models |
| Stage 2 | Markdown compiler | ✅ Done | .nullain/*.ir.json |
| Stage 3 | XML-like prompt renderer | ✅ Done | .nullain/*.prompt.xml |
| Stage 4 | Runtime session | ✅ Done | state.json, prompt.xml |
| Stage 5 | JSONL tracing | ✅ Done | trace.jsonl |
| Stage 6 | Deterministic scheduler | ✅ Done | decision.json |
| Stage 7 | Mock LLM provider | ✅ Done | output.md |
| Stage 8 | Minimal pipeline orchestrator | ✅ Done | reusable pipeline + artifact writer |
| Stage 9 | CLI cleanup and helper reuse | ✅ Done | src/cli/helpers.ts + unified CLI |
| Stage 10 | Ollama provider adapter | ✅ Done | src/llm/ollama-provider.ts + run-ollama command |
| Stage 11 | User task injection | ✅ Done | --task option for run and run-ollama |
| Stage 12 | Basic local tool registry | ✅ Done | src/tools/ → registry + resolver (no execution yet) |
| Stage 13 | Tool availability injection via CLI | ✅ Done | --tool option for run and run-ollama |
| Stage 14 | Tool capability contract | ✅ Done | execution_mode="metadata_only" + `<tool_capability_contract>` in prompt |
| Stage 15 | Minimal tool execution protocol | ✅ Done | src/tools/execution.ts + echo-tool.ts → local-safe echo executor, no pipeline integration yet |
| Stage 16 | Manual tool execution CLI + artifacts | ✅ Done | pnpm dev tool echo --input "hello" → writes request.json, result.json, trace.jsonl |
| Stage 17 | Local memory prototype | ✅ Done | pnpm dev memory add/search/list → append-only JSONL, no vector DB or embeddings |
| Stage 18 | Evaluation harness | ✅ Done | pnpm dev eval → deterministic mock-based eval with fixtures, no LLM judge |
| Stage 19 | Memory injection experiment | ✅ Done | --memory-query / --memory-limit on run / run-ollama → explicit opt-in only |
Given an agent spec like:
```markdown
# Deep Research Agent

## Goal
Produce a comprehensive, well-sourced research summary.

## Instructions
- Be accurate.
- Do not invent facts.
- Separate fact, inference, and uncertainty.

## Constraints
- Do not hide uncertainty.
- Do not execute destructive actions.

## Tools
- mock_search

## Output
Return a clear, structured answer.
```

The engine can generate:
```
.nullain/
  research-agent.ir.json
  research-agent.prompt.xml
  sessions/
    session-<id>/
      state.json
      prompt.xml
      decision.json
      output.md
      trace.jsonl
```

Install dependencies:
```
pnpm install
```

Run typecheck:

```
pnpm typecheck
```

Run tests:

```
pnpm test
```

Build:

```
pnpm build
```

Compile a Markdown Agent Spec into Cognitive IR:

```
pnpm dev compile examples/research-agent.md
```

Render an XML-like prompt:

```
pnpm dev render examples/research-agent.md
```

Create a runtime session:

```
pnpm dev session examples/research-agent.md
```

Create a runtime session and scheduler decision:

```
pnpm dev decide examples/research-agent.md
```

Run the full local pipeline with mock LLM:

```
pnpm dev run examples/research-agent.md
```

Inject a user task into the prompt:

```
pnpm dev run examples/research-agent.md --task "Research local LLM agent frameworks"
pnpm dev run-ollama examples/research-agent.md --task "Research local LLM agent frameworks"
```

Inject available tool metadata into the prompt:

```
pnpm dev run examples/research-agent.md --task "Research local LLM agent frameworks" --tool web_search
pnpm dev run-ollama examples/research-agent.md --task "Research local LLM agent frameworks" --tool web_search --tool citation_manager
```

Run the full pipeline with Ollama LLM:

```
pnpm dev run-ollama examples/research-agent.md
```

Execute a tool manually (Stage 16 – only echo is available):

```
pnpm dev tool echo --input "hello from nullysh"
```

This writes artifacts to .nullain/tool-runs/<request-id>/:

- request.json
- result.json
- trace.jsonl
Note: web_search and citation_manager are not executable yet. Only echo is supported as a local-safe proof-of-concept. The LLM pipeline does not call tools automatically in this stage.
Manage local memory records (Stage 17 – append-only JSONL, no vector DB):

```
pnpm dev memory add --content "Nullysh Engine uses a local JSONL memory prototype."
pnpm dev memory list
pnpm dev memory search --query "jsonl"
```

This stores memory in .nullain/memory/memory.jsonl. It is not integrated into the LLM pipeline yet.
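The substring search behind `memory search` can be sketched as follows. The record shape and function name here are illustrative assumptions, not the engine's actual API:

```typescript
// Hypothetical simplified memory record; the real JSONL schema may differ.
interface MemoryRecord {
  id: string;
  content: string;
}

// Parse a JSONL string and return records whose content contains the query,
// case-insensitively, capped at `limit` matches.
function searchMemory(jsonl: string, query: string, limit = 5): MemoryRecord[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as MemoryRecord)
    .filter((r) => r.content.toLowerCase().includes(query.toLowerCase()))
    .slice(0, limit);
}
```

Because each record is one JSON line, search stays a plain linear scan with no index to maintain.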
Run the evaluation harness (Stage 18 – deterministic, mock-based, no LLM judge):

```
pnpm dev eval
```

This runs a small fixture suite against the mock provider and writes a JSON report to .nullain/evals/. It checks:
- Prompt includes expected strings
- Output includes expected strings
- Event types are emitted in the correct order
It does not use external datasets, embeddings, or real LLMs.
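The string-inclusion and event-order checks can be sketched like this; the function names are hypothetical, not the harness's real API:

```typescript
// Check that every expected substring appears in the text.
function includesAll(text: string, expected: string[]): boolean {
  return expected.every((s) => text.includes(s));
}

// Check that expected event types appear in order (not necessarily adjacent).
function eventsInOrder(events: string[], expected: string[]): boolean {
  let i = 0;
  for (const e of events) {
    if (e === expected[i]) i++;
  }
  return i === expected.length;
}
```

Both checks are pure string comparisons, which is what keeps the harness deterministic without an LLM judge.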
Inject local memory into the prompt (Stage 19 – explicit opt-in only):

```
pnpm dev run examples/research-agent.md --task "Explain the memory approach" --memory-query "jsonl"
pnpm dev run-ollama examples/research-agent.md --task "Explain the memory approach" --memory-query "jsonl" --memory-limit 2
```

This searches .nullain/memory/memory.jsonl by substring and injects matches into the prompt as `<memory_context>`. It only activates when --memory-query is explicitly passed. No embeddings, no vector DB, no auto-injection.
```
src/
  cli/        # CLI entrypoint and commands
  compiler/   # Markdown Agent Spec → Cognitive IR
  ir/         # Zod schemas and TypeScript IR types
  llm/        # LLM provider interface and mock provider
  pipeline/   # Reusable orchestration and artifact writing
  prompt/     # Cognitive IR → XML-like prompt renderer
  runtime/    # Runtime state and session creation
  scheduler/  # Deterministic scheduler decisions
  trace/      # JSONL trace events and writers
examples/
  research-agent.md
tests/
  compiler.test.ts
  ir.test.ts
  llm.test.ts
  pipeline.test.ts
  prompt.test.ts
  runtime.test.ts
  scheduler.test.ts
  trace.test.ts
```

Defines the minimal cognitive intermediate representation:
- CognitiveNode
- CognitiveDocument
- CognitiveNodeType
The IR is the source of truth.
Prompt XML, runtime state, output files, and traces are projections or artifacts generated from the IR.
Parses a small Markdown Agent Spec format and produces a validated Cognitive IR document.
Current mapping:
| Markdown Section | IR Node Type |
|---|---|
| `# Title` | `CognitiveDocument.title` |
| `## Goal` | `goal` |
| `## Instructions` | `instruction` |
| `## Constraints` | `constraint` |
| `## Tools` | `tool` |
| `## Output` | `output` |
The compiler is deterministic and validates the final document with Zod.
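The section-to-node-type mapping above can be sketched as a lookup table; the names below are illustrative, not the compiler's actual identifiers:

```typescript
// Hypothetical sketch of the heading → node-type mapping; the real compiler
// lives in src/compiler/ and validates its output with Zod.
type CognitiveNodeType = "goal" | "instruction" | "constraint" | "tool" | "output";

const SECTION_TO_NODE_TYPE: Record<string, CognitiveNodeType> = {
  "## Goal": "goal",
  "## Instructions": "instruction",
  "## Constraints": "constraint",
  "## Tools": "tool",
  "## Output": "output",
};

// Map a markdown H2 heading line to its IR node type, if the section is known.
function nodeTypeForHeading(line: string): CognitiveNodeType | undefined {
  return SECTION_TO_NODE_TYPE[line.trim()];
}
```

A fixed lookup like this is one way the compiler can stay deterministic: the same spec always yields the same IR.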
Renders the Cognitive IR into an XML-like prompt structure.
The renderer:
- groups nodes by type;
- preserves node IDs;
- escapes XML special characters;
- produces deterministic output;
- keeps XML as a rendering format only, never as the source of truth.
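The escaping and per-node rendering steps can be sketched as follows; the helper names are hypothetical, not the renderer's real API:

```typescript
// Escape the five XML special characters; '&' must be replaced first so the
// other entities are not double-escaped.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Render one IR node as a deterministic XML-like element, preserving its ID.
function renderNode(type: string, id: string, text: string): string {
  return `<${type} id="${id}">${escapeXml(text)}</${type}>`;
}
```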
Creates a minimal runtime session state.
Current state includes:
- sessionId
- documentId
- title
- version
- status
- step
- createdAt
- updatedAt
- currentGoal
- prompt
- nodeCount
Writes JSONL trace events.
Current events:
- compile.completed
- prompt.rendered
- session.created
- scheduler.decided
- llm.mock.completed
Each line in trace.jsonl is a valid JSON event.
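Producing a valid JSONL line is a one-liner over a structured event; the event shape below is an assumed simplification, not the engine's exact schema:

```typescript
// Hypothetical minimal trace event shape.
interface TraceEvent {
  type: string;                    // e.g. "compile.completed"
  timestamp: string;               // ISO-8601
  data: Record<string, unknown>;   // event-specific payload
}

// Serialize one event as a single JSON line, ready to append to trace.jsonl.
// JSON.stringify never emits raw newlines, so one event stays on one line.
function toTraceLine(event: TraceEvent): string {
  return JSON.stringify(event) + "\n";
}
```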
Implements a deterministic rule-based scheduler.
Current behavior:
| Runtime State | Decision |
|---|---|
| created | answer |
| completed | finish |
| failed | error |
| step < 0 | error |
The scheduler does not call an LLM.
It does not execute tools.
It does not create plans.
It only returns a simple decision.
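The rule table above translates directly into a small pure function; the type and function names here are illustrative, not the scheduler's actual exports:

```typescript
// Hypothetical sketch of the deterministic rule table; no LLM, no tools, no plans.
type Decision = "answer" | "finish" | "error";

function decide(status: "created" | "completed" | "failed", step: number): Decision {
  if (step < 0) return "error";          // invalid step always wins
  if (status === "created") return "answer";
  if (status === "completed") return "finish";
  return "error";                         // status === "failed"
}
```

Because the function is pure, the scheduler is trivially unit-testable, which is the point of doing it before any autonomy.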
Defines a provider interface and a deterministic mock provider.
Current provider:
createMockLlmProvider()

Default mock response:

```
Mock response generated by Nullysh Engine.
```

This is intentional. Real providers come after the local pipeline is stable.
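A deterministic mock behind a minimal provider interface might look like this; the exact interface in src/llm/ may use different names and signatures:

```typescript
// Hypothetical minimal provider interface.
interface LlmProvider {
  name: string;
  complete(prompt: string): Promise<string>;
}

// The mock ignores the prompt and always returns the same text, so pipeline
// runs and traces stay byte-for-byte reproducible.
function createMockProvider(): LlmProvider {
  return {
    name: "mock",
    complete: async () => "Mock response generated by Nullysh Engine.",
  };
}
```

Swapping in a real provider later only requires another object satisfying the same interface.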
Extracts the main orchestration logic out of the CLI.
Current functions:
- runMockPipeline(input) → RunPipelineResult
- writeRunArtifacts(result, sessionDir) → ArtifactPaths

The pipeline orchestrator:
- compiles Markdown into Cognitive IR;
- renders the XML-like prompt;
- creates a runtime session;
- gets a deterministic scheduler decision;
- calls the mock LLM provider when the decision is answer;
- creates trace events;
- returns structured results without directly touching the filesystem.
Filesystem writes are handled separately by the artifact writer.
This keeps the CLI thin and makes the pipeline reusable, testable, and ready for future provider integrations.
- `.nullain/research-agent.ir.json` – canonical compiled representation of the Markdown Agent Spec.
- `.nullain/research-agent.prompt.xml` – LLM-facing XML-like prompt rendering.
- `.nullain/sessions/<session-id>/state.json` – initial runtime state for an agent session.
- `.nullain/sessions/<session-id>/decision.json` – scheduler decision for the next action.
- `.nullain/sessions/<session-id>/output.md` – mock LLM output.
- `.nullain/sessions/<session-id>/trace.jsonl` – one JSON event per line, useful for debugging and future observability.
Example trace event types:
- compile.completed
- prompt.rendered
- session.created
- scheduler.decided
- llm.mock.completed

The current version intentionally does not include:
- OpenAI or Anthropic provider integration;
- automatic tool execution inside the LLM pipeline;
- vector or graph databases;
- multi-agent orchestration;
- MCP integration;
- LangChain or LangGraph dependency;
- Rust modules;
- a long-running execution loop;
- autonomous planning.
This is by design.
The goal is to keep the foundation small, testable, observable, and easy to refactor.
Nullysh Engine is being developed with a grounded, red-team-first mindset.
1. Only extract abstractions after repeated pressure appears.
2. No hidden magic: every stage should write inspectable artifacts. If the engine cannot explain what happened, it should not become more autonomous.
3. XML, prompts, runtime state, outputs, and traces are projections of the IR, not replacements for it.
4. Mock providers and rule-based scheduling come before real models.
5. Each stage should be independently testable, reversible, and understandable.
6. Avoid dependencies, servers, databases, and frameworks until the pain is real.
- Stage 1 – Cognitive IR schema
- Stage 2 – Markdown compiler
- Stage 3 – XML-like prompt renderer
- Stage 4 – Runtime session
- Stage 5 – JSONL tracing
- Stage 6 – Deterministic scheduler
- Stage 7 – Mock LLM provider
- Stage 8 – Minimal pipeline orchestrator
- Stage 9 – CLI cleanup and helper reuse
- Stage 10 – Ollama provider adapter
- Stage 11 – User task injection
- Stage 12 – Basic local tool registry
- Stage 13 – Tool availability injection via CLI
- Stage 14 – Tool capability contract
- Stage 15 – Minimal tool execution protocol
- Stage 16 – Manual tool execution CLI + artifacts
- Stage 17 – Local memory prototype
- Stage 18 – Evaluation harness
- Stage 19 – Memory injection experiment
- Stage 20 – Controlled tool execution in pipeline
- Stage 21 – Prompt compact mode
- Stage 22 – Release hardening
Clone:

```
git clone https://github.com/nettycpu/nullysh-engine.git
cd nullysh-engine
```

Install:

```
pnpm install
```

Validate:

```
pnpm typecheck
pnpm test
pnpm build
```

Run full local mock pipeline:

```
pnpm dev run examples/research-agent.md
```

Nullysh Engine is currently in early v0.1 foundation stage.
It is not production-ready yet.
It is a deliberately small and inspectable base for building future agentic systems.
MIT
