Nullysh Engine

Minimal cognitive runtime engine for observable, testable, and evolvable AI agents.

*(Diagram: Nullysh Engine v0.1 architecture)*


Overview

Nullysh Engine is a minimal cognitive runtime engine for building AI-agent systems with a strong focus on:

  • deterministic behavior before autonomy;
  • inspectable intermediate representations;
  • traceable execution artifacts;
  • small, testable stages;
  • low operational complexity;
  • no premature agentic overengineering.

The current version is not trying to be a full autonomous agent framework yet.

It is the foundation layer: a local, deterministic pipeline that turns a human-written agent specification into an executable, inspectable, and traceable agent session.


Current Pipeline

Markdown Agent Spec
→ Cognitive IR JSON
→ XML-like Prompt
→ Runtime Session
→ Scheduler Decision
→ Mock LLM Response
→ Output
→ Trace JSONL

The pipeline is intentionally simple.

The goal is to make every step visible before adding real LLM providers, tool execution, memory, or complex agent loops.


Why This Exists

Most agent projects fail early because they start with too much abstraction:

  • multi-agent orchestration before a stable single-agent loop;
  • vector memory before basic traceability;
  • real LLM providers before deterministic testability;
  • tool execution before clear runtime state;
  • complex planners before simple decisions;
  • autonomy before observability.

Nullysh Engine starts from the opposite direction:

Spec first.
IR second.
Prompt third.
Runtime fourth.
Trace always.
Autonomy later.

Features Implemented

| Stage | Feature | Status | Output |
| ----- | ------- | ------ | ------ |
| Stage 1 | Cognitive IR schema | ✅ Done | Type-safe IR models |
| Stage 2 | Markdown compiler | ✅ Done | .nullain/*.ir.json |
| Stage 3 | XML-like prompt renderer | ✅ Done | .nullain/*.prompt.xml |
| Stage 4 | Runtime session | ✅ Done | state.json, prompt.xml |
| Stage 5 | JSONL tracing | ✅ Done | trace.jsonl |
| Stage 6 | Deterministic scheduler | ✅ Done | decision.json |
| Stage 7 | Mock LLM provider | ✅ Done | output.md |
| Stage 8 | Minimal pipeline orchestrator | ✅ Done | reusable pipeline + artifact writer |
| Stage 9 | CLI cleanup and helper reuse | ✅ Done | src/cli/helpers.ts + unified CLI |
| Stage 10 | Ollama provider adapter | ✅ Done | src/llm/ollama-provider.ts + run-ollama command |
| Stage 11 | User task injection | ✅ Done | --task option for run and run-ollama |
| Stage 12 | Basic local tool registry | ✅ Done | src/tools/ — registry + resolver (no execution yet) |
| Stage 13 | Tool availability injection via CLI | ✅ Done | --tool option for run and run-ollama |
| Stage 14 | Tool capability contract | ✅ Done | execution_mode="metadata_only" + <tool_capability_contract> in prompt |
| Stage 15 | Minimal tool execution protocol | ✅ Done | src/tools/execution.ts + echo-tool.ts — local-safe echo executor, no pipeline integration yet |
| Stage 16 | Manual tool execution CLI + artifacts | ✅ Done | pnpm dev tool echo --input "hello" — writes request.json, result.json, trace.jsonl |
| Stage 17 | Local memory prototype | ✅ Done | pnpm dev memory add/search/list — append-only JSONL, no vector DB or embeddings |
| Stage 18 | Evaluation harness | ✅ Done | pnpm dev eval — deterministic mock-based eval with fixtures, no LLM judge |
| Stage 19 | Memory injection experiment | ✅ Done | --memory-query / --memory-limit on run / run-ollama — explicit opt-in only |

What It Does Today

Given an agent spec like:

# Deep Research Agent

## Goal
Produce a comprehensive, well-sourced research summary.

## Instructions
- Be accurate.
- Do not invent facts.
- Separate fact, inference, and uncertainty.

## Constraints
- Do not hide uncertainty.
- Do not execute destructive actions.

## Tools
- mock_search

## Output
Return a clear, structured answer.

The engine can generate:

.nullain/
  research-agent.ir.json
  research-agent.prompt.xml

  sessions/
    session-<id>/
      state.json
      prompt.xml
      decision.json
      output.md
      trace.jsonl

Commands

Install dependencies:

pnpm install

Run typecheck:

pnpm typecheck

Run tests:

pnpm test

Build:

pnpm build

Compile a Markdown Agent Spec into Cognitive IR:

pnpm dev compile examples/research-agent.md

Render an XML-like prompt:

pnpm dev render examples/research-agent.md

Create a runtime session:

pnpm dev session examples/research-agent.md

Create a runtime session and scheduler decision:

pnpm dev decide examples/research-agent.md

Run the full local pipeline with mock LLM:

pnpm dev run examples/research-agent.md

Inject a user task into the prompt:

pnpm dev run examples/research-agent.md --task "Research local LLM agent frameworks"
pnpm dev run-ollama examples/research-agent.md --task "Research local LLM agent frameworks"

Inject available tool metadata into the prompt:

pnpm dev run examples/research-agent.md --task "Research local LLM agent frameworks" --tool web_search
pnpm dev run-ollama examples/research-agent.md --task "Research local LLM agent frameworks" --tool web_search --tool citation_manager

Run the full pipeline with Ollama LLM:

pnpm dev run-ollama examples/research-agent.md

Execute a tool manually (Stage 16 — only echo is available):

pnpm dev tool echo --input "hello from nullysh"

This writes artifacts to .nullain/tool-runs/<request-id>/:

  • request.json
  • result.json
  • trace.jsonl

Note: web_search and citation_manager are not executable yet. Only echo is supported as a local-safe proof-of-concept. The LLM pipeline does not call tools automatically in this stage.

Manage local memory records (Stage 17 — append-only JSONL, no vector DB):

pnpm dev memory add --content "Nullysh Engine uses a local JSONL memory prototype."
pnpm dev memory list
pnpm dev memory search --query "jsonl"

This stores memory in .nullain/memory/memory.jsonl. It is not integrated into the LLM pipeline yet.
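The substring search over the JSONL store can be sketched as follows. This is an illustrative assumption, not the repository's actual implementation; the record shape (a `content` field per line) and the `limit` default are hypothetical.

```typescript
// Each line of memory.jsonl is assumed to be one JSON record with a
// `content` field; this sketch searches it by case-insensitive substring.
interface MemoryRecord {
  content: string;
}

function searchMemory(jsonl: string, query: string, limit = 5): MemoryRecord[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim().length > 0)      // skip blank lines
    .map((line) => JSON.parse(line) as MemoryRecord)
    .filter((r) => r.content.toLowerCase().includes(query.toLowerCase()))
    .slice(0, limit);                              // cap the result set
}
```

Because the file is append-only and line-oriented, search is a pure scan; no index, embeddings, or vector DB are involved.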

Run the evaluation harness (Stage 18 — deterministic, mock-based, no LLM judge):

pnpm dev eval

This runs a small fixture suite against the mock provider and writes a JSON report to .nullain/evals/. It checks:

  • Prompt includes expected strings
  • Output includes expected strings
  • Event types are emitted in the correct order

It does not use external datasets, embeddings, or real LLMs.

Inject local memory into the prompt (Stage 19 — explicit opt-in only):

pnpm dev run examples/research-agent.md --task "Explain the memory approach" --memory-query "jsonl"
pnpm dev run-ollama examples/research-agent.md --task "Explain the memory approach" --memory-query "jsonl" --memory-limit 2

This searches .nullain/memory/memory.jsonl by substring and injects matches into the prompt as <memory_context>. It only activates when --memory-query is explicitly passed. No embeddings, no vector DB, no auto-injection.


Repository Structure

src/
  cli/          # CLI entrypoint and commands
  compiler/     # Markdown Agent Spec → Cognitive IR
  ir/           # Zod schemas and TypeScript IR types
  llm/          # LLM provider interface and mock provider
  pipeline/     # Reusable orchestration and artifact writing
  prompt/       # Cognitive IR → XML-like prompt renderer
  runtime/      # Runtime state and session creation
  scheduler/    # Deterministic scheduler decisions
  trace/        # JSONL trace events and writers

examples/
  research-agent.md

tests/
  compiler.test.ts
  ir.test.ts
  llm.test.ts
  pipeline.test.ts
  prompt.test.ts
  runtime.test.ts
  scheduler.test.ts
  trace.test.ts

Core Modules

src/ir/

Defines the minimal cognitive intermediate representation:

  • CognitiveNode
  • CognitiveDocument
  • CognitiveNodeType

The IR is the source of truth.

Prompt XML, runtime state, output files, and traces are projections or artifacts generated from the IR.
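A minimal sketch of what these IR types might look like. Only the three type names come from this README; every field name here is an assumption, and the sketch uses plain TypeScript with a hand-written check, whereas the real repo validates with Zod.

```typescript
// Hypothetical shape of the Cognitive IR; field names beyond the three
// documented type names are illustrative assumptions.
type CognitiveNodeType = "goal" | "instruction" | "constraint" | "tool" | "output";

interface CognitiveNode {
  id: string;              // stable node ID, preserved in the rendered prompt
  type: CognitiveNodeType;
  content: string;
}

interface CognitiveDocument {
  id: string;
  title: string;           // from the `# Title` heading
  nodes: CognitiveNode[];
}

// Minimal structural check; the actual compiler validates with Zod schemas.
function isCognitiveDocument(v: unknown): v is CognitiveDocument {
  const d = v as CognitiveDocument;
  return typeof d?.id === "string" &&
    typeof d?.title === "string" &&
    Array.isArray(d?.nodes);
}
```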


src/compiler/

Parses a small Markdown Agent Spec format and produces a validated Cognitive IR document.

Current mapping:

| Markdown Section | IR Node Type |
| ---------------- | ------------ |
| # Title | CognitiveDocument.title |
| ## Goal | goal |
| ## Instructions | instruction |
| ## Constraints | constraint |
| ## Tools | tool |
| ## Output | output |

The compiler is deterministic and validates the final document with Zod.
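The heading-to-node-type mapping above can be sketched as a small lookup; the helper name and parsing details are assumptions, not the compiler's actual code.

```typescript
// Section-to-node-type mapping from the table above; the function
// `sectionType` is a hypothetical helper, not the repo's API.
const SECTION_TO_TYPE: Record<string, string> = {
  Goal: "goal",
  Instructions: "instruction",
  Constraints: "constraint",
  Tools: "tool",
  Output: "output",
};

// Map a `## Heading` line to an IR node type, or null if the section
// is not part of the spec format.
function sectionType(heading: string): string | null {
  const name = heading.replace(/^##\s*/, "").trim();
  return SECTION_TO_TYPE[name] ?? null;
}
```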


src/prompt/

Renders the Cognitive IR into an XML-like prompt structure.

The renderer:

  • groups nodes by type;
  • preserves node IDs;
  • escapes XML special characters;
  • produces deterministic output;
  • keeps XML as a rendering format, not as the source of truth.
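The escaping step can be sketched as below. This is a generic XML-escaping sketch under stated assumptions; the renderer's real tag names, attribute layout, and grouping logic are not documented here.

```typescript
// Escape the XML special characters so node content cannot break the
// rendered structure; the exact entity set is an assumption.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")   // must run first so entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical single-node rendering: type as tag name, ID preserved
// as an attribute, content escaped.
function renderNode(type: string, id: string, content: string): string {
  return `<${type} id="${escapeXml(id)}">${escapeXml(content)}</${type}>`;
}
```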

src/runtime/

Creates a minimal runtime session state.

Current state includes:

  • sessionId
  • documentId
  • title
  • version
  • status
  • step
  • createdAt
  • updatedAt
  • currentGoal
  • prompt
  • nodeCount
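The field list above can be written down as a TypeScript interface. The field names come from this README; the field types, the status values, and the `createInitialState` factory are assumptions.

```typescript
// Documented session-state fields as an interface; types are assumptions.
interface RuntimeSessionState {
  sessionId: string;
  documentId: string;
  title: string;
  version: number;
  status: "created" | "completed" | "failed"; // states the scheduler reacts to
  step: number;
  createdAt: string;   // ISO timestamp (assumption)
  updatedAt: string;
  currentGoal: string;
  prompt: string;
  nodeCount: number;
}

// Hypothetical factory: a fresh session starts in "created" at step 0,
// which the scheduler maps to an "answer" decision.
function createInitialState(sessionId: string, documentId: string, title: string): RuntimeSessionState {
  const now = new Date().toISOString();
  return {
    sessionId, documentId, title,
    version: 1,
    status: "created",
    step: 0,
    createdAt: now,
    updatedAt: now,
    currentGoal: "",
    prompt: "",
    nodeCount: 0,
  };
}
```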

src/trace/

Writes JSONL trace events.

Current events:

  • compile.completed
  • prompt.rendered
  • session.created
  • scheduler.decided
  • llm.mock.completed

Each line in trace.jsonl is a valid JSON event.
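A JSONL trace writer in this spirit can be sketched as follows; the event payload shape (`type` plus an `at` timestamp) and the function names are assumptions.

```typescript
import { appendFileSync } from "node:fs";

// Serialize one trace event to a single JSON line; the payload shape
// (type, at, extra fields) is an illustrative assumption.
function formatEvent(type: string, data: Record<string, unknown> = {}): string {
  return JSON.stringify({ type, at: new Date().toISOString(), ...data });
}

// Append-only write: one valid JSON event per line, as described above.
function appendTrace(file: string, type: string, data?: Record<string, unknown>): void {
  appendFileSync(file, formatEvent(type, data) + "\n");
}
```

Because each event is a self-contained line, the trace can be tailed, grepped, or replayed without parsing the whole file.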


src/scheduler/

Implements a deterministic rule-based scheduler.

Current behavior:

| Runtime State | Decision |
| ------------- | -------- |
| created | answer |
| completed | finish |
| failed | error |
| step < 0 | error |

The scheduler does not call an LLM.
It does not execute tools.
It does not create plans.
It only returns a simple decision.
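The rule table above fits in a single pure function. The decision values come from this README; returning a bare string and failing closed on unknown states are assumptions.

```typescript
// Deterministic rule-based scheduling: pure function of runtime state,
// no LLM calls, no tools, no plans.
type SchedulerAction = "answer" | "finish" | "error";

function decide(state: { status: string; step: number }): SchedulerAction {
  if (state.step < 0) return "error";   // invalid step always wins
  switch (state.status) {
    case "created":   return "answer";
    case "completed": return "finish";
    case "failed":    return "error";
    default:          return "error";   // unknown states fail closed (assumption)
  }
}
```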


src/llm/

Defines a provider interface and a deterministic mock provider.

Current provider:

createMockLlmProvider()

Default mock response:

Mock response generated by Nullysh Engine.

This is intentional. Real providers come after the local pipeline is stable.
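The provider seam might look like this. `createMockLlmProvider()` and the default response string come from this README; the interface name, the `complete` method, and its signature are assumptions.

```typescript
// Hypothetical provider interface; only createMockLlmProvider() and the
// default response text are documented, the rest is an assumption.
interface LlmProvider {
  name: string;
  complete(prompt: string): Promise<string>;
}

function createMockLlmProvider(): LlmProvider {
  return {
    name: "mock",
    // Deterministic fixed response, matching the documented default.
    complete: async () => "Mock response generated by Nullysh Engine.",
  };
}
```

Keeping the mock behind the same interface as real providers means the Ollama adapter (Stage 10) can slot in without changing the pipeline.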


src/pipeline/

Extracts the main orchestration logic out of the CLI.

Current functions:

runMockPipeline(input) → RunPipelineResult
writeRunArtifacts(result, sessionDir) → ArtifactPaths

The pipeline orchestrator:

  • compiles Markdown into Cognitive IR;
  • renders the XML-like prompt;
  • creates a runtime session;
  • gets a deterministic scheduler decision;
  • calls the mock LLM provider when the decision is answer;
  • creates trace events;
  • returns structured results without directly touching the filesystem.

Filesystem writes are handled separately by the artifact writer.

This keeps the CLI thin and makes the pipeline reusable, testable, and ready for future provider integrations.
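The orchestration steps above can be sketched end to end with toy stand-ins for each stage. Every function body here is an illustrative assumption; only the stage order, the event names, and the "call the mock LLM only on an answer decision" rule come from this README.

```typescript
// Toy end-to-end pipeline mirroring the documented stage order; each
// stage body is a stand-in, not the repo's real implementation.
interface RunPipelineResult {
  prompt: string;
  decision: string;
  output: string | null;
  events: string[];
}

function runMockPipeline(markdown: string): RunPipelineResult {
  const events: string[] = [];
  const title = markdown.split("\n")[0].replace(/^#\s*/, "");  // toy "compile"
  events.push("compile.completed");
  const prompt = `<agent><title>${title}</title></agent>`;     // toy "render"
  events.push("prompt.rendered");
  events.push("session.created");                              // toy "session"
  const decision = "answer";                                   // new session -> answer
  events.push("scheduler.decided");
  // The mock LLM is only invoked when the decision is "answer";
  // no filesystem writes happen here (the artifact writer does that).
  const output = decision === "answer"
    ? "Mock response generated by Nullysh Engine."
    : null;
  if (output !== null) events.push("llm.mock.completed");
  return { prompt, decision, output, events };
}
```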


Generated Artifacts

IR JSON

.nullain/research-agent.ir.json

Canonical compiled representation of the Markdown Agent Spec.


Prompt XML

.nullain/research-agent.prompt.xml

LLM-facing XML-like prompt rendering.


Session State

.nullain/sessions/<session-id>/state.json

Initial runtime state for an agent session.


Decision

.nullain/sessions/<session-id>/decision.json

Scheduler decision for the next action.


Output

.nullain/sessions/<session-id>/output.md

Mock LLM output.


Trace

.nullain/sessions/<session-id>/trace.jsonl

One JSON event per line, useful for debugging and future observability.

Example trace event types:

compile.completed
prompt.rendered
session.created
scheduler.decided
llm.mock.completed

Current Scope

The current version intentionally does not include:

  • hosted LLM providers (OpenAI, Anthropic);
  • automatic tool execution inside the LLM pipeline (only the manual echo tool runs);
  • memory beyond the local append-only JSONL prototype;
  • a vector database;
  • a graph database;
  • multi-agent orchestration;
  • MCP integration;
  • a LangChain or LangGraph dependency;
  • Rust modules;
  • a long-running execution loop;
  • autonomous planning.

The local Ollama adapter (Stage 10) is the only real provider integration so far.

This is by design.

The goal is to keep the foundation small, testable, observable, and easy to refactor.


Development Principles

Nullysh Engine is being developed with a grounded, red-team-first mindset.

1. No premature abstraction

Only extract abstractions after repeated pressure appears.

2. No hidden magic

Every stage should write inspectable artifacts.

3. Trace before autonomy

If the engine cannot explain what happened, it should not become more autonomous.

4. IR as source of truth

XML, prompts, runtime state, outputs, and traces are projections of the IR, not replacements for it.

5. Determinism before intelligence

Mock providers and rule-based scheduling come before real models.

6. Small stages, clean commits

Each stage should be independently testable, reversible, and understandable.

7. Operational simplicity first

Avoid dependencies, servers, databases, and frameworks until the pain is real.


Roadmap

Completed

  • Stage 1 β€” Cognitive IR schema
  • Stage 2 β€” Markdown compiler
  • Stage 3 β€” XML-like prompt renderer
  • Stage 4 β€” Runtime session
  • Stage 5 β€” JSONL tracing
  • Stage 6 β€” Deterministic scheduler
  • Stage 7 β€” Mock LLM provider
  • Stage 8 β€” Minimal pipeline orchestrator
  • Stage 9 β€” CLI cleanup and helper reuse
  • Stage 10 β€” Ollama provider adapter
  • Stage 11 β€” User task injection
  • Stage 12 β€” Basic local tool registry
  • Stage 13 β€” Tool availability injection via CLI
  • Stage 14 β€” Tool capability contract
  • Stage 15 β€” Minimal tool execution protocol
  • Stage 16 β€” Manual tool execution CLI + artifacts
  • Stage 17 β€” Local memory prototype
  • Stage 18 β€” Evaluation harness
  • Stage 19 β€” Memory injection experiment

Next

  • Stage 20 β€” Controlled tool execution in pipeline
  • Stage 21 β€” Prompt compact mode
  • Stage 22 β€” Release hardening

Development

Clone:

git clone https://github.com/nettycpu/nullysh-engine.git
cd nullysh-engine

Install:

pnpm install

Validate:

pnpm typecheck
pnpm test
pnpm build

Run full local mock pipeline:

pnpm dev run examples/research-agent.md

Status

Nullysh Engine is currently in early v0.1 foundation stage.

It is not production-ready yet.

It is a deliberately small and inspectable base for building future agentic systems.


License

MIT

About

A TypeScript engine for structured agentic systems, cognitive IR, and modular AI execution flows ⚙️🧠
