中文 / English
GitHub · Issues · Architecture · Deployment · Pipeline DAG · MCP API
👋 Join the community
Anyone tracking a fast-moving technical domain runs into the same six problems eventually:
- Signal overload — dozens of new papers/articles daily; without a rating mechanism it's pure noise, and reading everything is impossible
- Opaque reasoning — AI summaries tell you "this is important" without explaining why; you can't trust or reproduce the judgment
- Knowledge doesn't accumulate — papers you read today, community debates from last week — all lost in inboxes and message streams
- Unreliable pipelines — cron jobs fail silently with no alert; by the time you notice, a week of data is missing
- Unverifiable judgments — "AI trend predictions" have no historical accuracy record; there's no way to evaluate the source's credibility
- Static assumptions — domain beliefs never update as new evidence arrives, drifting further from reality over time
Pulsar is a server-side domain intelligence pipeline. You define the domain; Pulsar runs the engine. Configure your RSS feeds, keywords, and LLM provider once — then rating, filtering, reasoning, archiving, self-healing, and self-calibration all run autonomously.
- Rate first, cut noise before LLM cost → solves signal overload: A four-tier rating engine (⚡/🔧/📖/❌) evaluates every signal before it reaches the LLM. Raw signals → 3–5 selected for deep analysis, saving 80%+ inference cost
- Three-stage observable reasoning chain → transparent and reproducible:
  `prep → agent → post`, each stage with defined I/O formats and intermediate artifacts written to disk; when something breaks, you see exactly which stage failed
- Structured knowledge written to Git → knowledge accumulates permanently: All outputs are Markdown pushed to GitHub via the Contents API; full commit history, full-text grep, no SaaS dependency
- Watchdog self-healing → pipelines recover automatically: 16 health checks, 7 failure categories handled automatically in DAG order, full run logs persisted to `memory/watchdog-log.json`
- Biweekly predictions + mandatory ✅/❌ grading → forecast accuracy on record: Every reasoning report must include verifiable predictions with explicit verification conditions. The next report grades each one ✅/❌. Accuracy history accumulates permanently in Git — the system cannot quietly revise past claims
- Self-evolving belief system → the pipeline finds and fixes its own blind spots: The system maintains explicit domain hypotheses × confidence scores (0–1). Every month it identifies which beliefs are confirmed by data and which are drifting. Drifting hypotheses enter a watch-list and automatically receive boosted signal injection the next cycle — the system actively investigates what it might be wrong about, with no human prompting. Together with biweekly ✅/❌ grading, this forms a closed self-correction loop
🚀 Quick Start (click to expand)
Clone the repo and run the guided installer — it handles Python, mcp, config files, and prints your Claude Desktop JSON block:
```bash
git clone https://github.com/sou350121/Pulsar ~/clawd
bash ~/clawd/scripts/setup.sh
```

The script will prompt you for: LLM API key, GitHub token, Telegram bot token + chat ID, and your research domain details. All config files are written automatically.
Non-interactive / CI:
```bash
bash ~/clawd/scripts/setup.sh --non-interactive --memory-dir /path/to/memory
```

Note: `setup.sh` requires Python 3.10+ and installs the `mcp` package automatically. For manual setup, continue with the steps below.
Use this prompt with any AI coding assistant to get guided, interactive setup:
I've cloned Pulsar (https://github.com/sou350121/Pulsar) — an automated domain intelligence pipeline.
Please help me set it up for my research domain.
First, read these files:
- AGENTS.md — verified deployment guide
- config/active-config.template.json — domain config (RSS feeds, keywords, hypotheses)
- config/github-config.template.json — GitHub push target config
- .env.example — required API keys
Then help me complete these steps:
1. Configure my domain (memory/active-config.json):
- My research domain: [describe your domain — e.g. "biomedical AI", "climate policy", "fintech"]
- RSS feeds to monitor: [list your feeds, or ask me for suggestions]
- Institutions/orgs to prioritize in ratings: [e.g. "NIH", "Fed", "TSMC"]
- 3–5 domain hypotheses I want to track and calibrate monthly
2. Set up .env:
- LLM provider: [OpenAI / DeepSeek / Moonshot / DashScope / Groq / self-hosted]
- I will provide my API key when ready
3. Configure GitHub push target (memory/github-config-primary.json):
- My knowledge-base repo: [your-username/your-repo]
4. Update path references if I cloned outside ~/clawd/:
MYUSER=$(whoami)
find scripts/ -name "*.py" | xargs sed -i "s|/home/admin|/home/$MYUSER|g"
5. Verify setup by running the first pipeline step.
After reading the config files, ask me the questions needed to fill in the blanks.
🚀 TL;DR: Want the 10-minute path?
`docs/deployment/quickstart.md` walks through clone → preset → first RSS pull with an `ai-news` working example you can verify before touching your real domain config.
- OS: Linux (recommended), macOS
- Python: 3.9 or higher
- Node.js: 22 or higher
- Moltbot: https://molt.bot — handles cron scheduling and Telegram delivery
- Network: stable access to your RSS sources, GitHub API, and your chosen LLM provider
Keys required:
| Key | Purpose | Compatible providers |
|---|---|---|
| LLM API Key | All inference calls (rating, reasoning, intel) | Any OpenAI-compatible endpoint — OpenAI, DeepSeek, Moonshot, DashScope (Alibaba), Groq, etc. |
| GitHub Token | Push knowledge to your GitHub repos | GitHub Settings → Developer Settings → Fine-grained tokens (repo write) |
| Telegram Bot Token | Send daily intelligence updates | Telegram → search @BotFather → /newbot |
| Telegram Chat ID | Target channel or user ID | Telegram → search @userinfobot, send any message |
| Tophub API Key (optional) | Trending tech article feed | tophubdata.com |
```bash
git clone https://github.com/sou350121/Pulsar ~/clawd
cd ~/clawd
```

⚠️ Important: Scripts are pre-configured for the `~/clawd/` directory. Cloning elsewhere requires updating hardcoded paths with:

```bash
MYUSER=$(whoami)
find scripts/ -name "*.py" | xargs sed -i "s|/home/admin|/home/$MYUSER|g"
```
Copy the config template and edit it for your domain:
```bash
mkdir -p memory
cp config/active-config.template.json memory/active-config.json
```

Open `memory/active-config.json` and define:
- RSS feeds — any Atom/RSS URLs: arXiv category feeds, blog feeds, GitHub release feeds, news sites, anything with a feed
- Keywords — terms that mark a signal as domain-relevant (used by the rating engine to filter noise)
- Institution labels — org/lab tags for rating priority (e.g. `"[MIT]"`, `"[Google DeepMind]"`, `"[YCombinator]"`)
- Hypotheses — the domain beliefs you want to track and calibrate monthly
The included reference configuration tracks VLA robotics (arXiv cs.RO, cs.AI) and AI developer tools (tech news feeds). The pipeline logic is fully domain-agnostic — swap the config to track fintech, biomedical research, climate policy, or any other domain.
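To make those four fields concrete, here is a hypothetical sketch of a `memory/active-config.json` for a biomedical-AI domain. The field names and values are illustrative only; the authoritative schema is `config/active-config.template.json`:

```json
{
  "domain": "biomedical-ai",
  "rss_feeds": [
    "https://export.arxiv.org/rss/q-bio.QM",
    "https://example.org/lab-blog/feed.xml"
  ],
  "keywords": ["protein folding", "drug discovery", "foundation model"],
  "institutions": ["[NIH]", "[Broad Institute]", "[DeepMind]"],
  "hypotheses": [
    {
      "id": "B-001",
      "text": "Foundation models will displace task-specific predictors in this domain",
      "confidence": 0.6
    }
  ]
}
```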
Also set your GitHub knowledge-base target:
```bash
cp config/github-config.template.json memory/github-config-primary.json
# Edit: set "repo" to your knowledge-base repo (e.g. "your-username/your-domain-handbook")
```

```bash
cp .env.example .env
```

Open `.env` and fill in your values:
```bash
# LLM provider key — any OpenAI-compatible endpoint works (see detail below)
DASHSCOPE_API_KEY=sk-xxxxxxxxxxxxxxxx
GITHUB_TOKEN=ghp_xxxxxxxxxxxxxxxx
TELEGRAM_BOT_TOKEN=xxxxxxxxx:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TELEGRAM_CHAT_ID=YOUR_CHAT_ID
MOLTBOT_GATEWAY_PORT=18789
TOPHUB_API_KEY=your_tophubdata_api_key  # optional: trending tech articles
```

💡 Tip: Telegram Chat ID is a positive integer for users, negative for channels. For channels, add the Bot as an admin first.
👇 Expand for configuration details:
LLM Provider — any OpenAI-compatible API works
All Pulsar LLM calls use the OpenAI SDK format throughout, so any compatible provider works without changing pipeline logic. The reference deployment uses DashScope + qwen3.5-plus, but you can swap to any provider:
| Provider | Base URL | Example model |
|---|---|---|
| OpenAI | `https://api.openai.com/v1` | `gpt-4o-mini` |
| DeepSeek | `https://api.deepseek.com/v1` | `deepseek-chat` |
| Moonshot | `https://api.moonshot.cn/v1` | `moonshot-v1-8k` |
| DashScope (Alibaba) | `https://dashscope.aliyuncs.com/compatible-mode/v1` | `qwen3.5-plus` |
| Groq | `https://api.groq.com/openai/v1` | `llama-3.1-8b-instant` |
| Self-hosted | your endpoint | Ollama, vLLM, llama.cpp, etc. |
To switch providers: put your provider's API key in DASHSCOPE_API_KEY (or rename the var), then update the base URL constant in scripts/_vla_expert.py — a single-line change.
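As a sketch of that swap, the table above can be expressed as a small lookup. The helper below is illustrative and not part of Pulsar's codebase (`scripts/_vla_expert.py` keeps its own constant); the base URLs come from the table:

```python
# Illustrative mapping of the provider table above to client kwargs.
# The function name is not a Pulsar API; it only shows that switching
# providers is a base_url + api_key change under the OpenAI SDK format.
PROVIDER_BASE_URLS = {
    "openai": "https://api.openai.com/v1",
    "deepseek": "https://api.deepseek.com/v1",
    "moonshot": "https://api.moonshot.cn/v1",
    "dashscope": "https://dashscope.aliyuncs.com/compatible-mode/v1",
    "groq": "https://api.groq.com/openai/v1",
}

def client_config(provider: str, api_key: str) -> dict:
    """Return kwargs suitable for openai.OpenAI(**kwargs) on any compatible endpoint."""
    return {"base_url": PROVIDER_BASE_URLS[provider], "api_key": api_key}
```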
Telegram Bot Setup
Pulsar sends daily intelligence updates via Moltbot — no direct Telegram API calls needed.
- Open Telegram, search for `@BotFather`
- Send `/newbot`, follow prompts, and get your Token (format: `123456789:ABCdef...`)
- Add the Token to `TELEGRAM_BOT_TOKEN` in `.env`
- Get your Chat ID: search `@userinfobot`, send any message — it replies with your ID
- Add the Chat ID to `TELEGRAM_CHAT_ID` in `.env`
To push to a channel:
```bash
# Add Bot as channel admin first, then get the channel ID
# Channel IDs are negative integers, e.g. -1001234567890
```

💡 Tip: Pulsar supports multiple TG accounts (e.g. separate channels per domain). See AGENTS.md.
GitHub Token Setup
Pulsar pushes daily outputs to your knowledge-base repos via the GitHub Contents API.
- Go to GitHub → Settings → Developer Settings → Personal Access Tokens → Fine-grained tokens
- Create a new token; under Repository access, select your target repos
- Set permission: `Contents: Read and Write`
- Add the token to `GITHUB_TOKEN` in `.env`
Create the GitHub config file in memory/ (not in the repo — created from the template):
```bash
mkdir -p memory
cp config/github-config.template.json memory/github-config-primary.json
```

Then edit to point to your repo:
```json
{
  "repo": "your-username/your-domain-handbook",
  "api_base": "https://api.github.com",
  "token_env": "GITHUB_TOKEN",
  "branch": "main"
}
```

Moltbot schedules all cron jobs and sends Telegram messages. Install first:
```bash
npm install -g moltbot
```

Then start the gateway:

```bash
pkill -f moltbot-gateway || true
nohup moltbot gateway run --bind loopback --port 18789 --force \
  > /tmp/moltbot-gateway.log 2>&1 &
```

Verify it's running:

```bash
ss -ltnp | grep 18789
tail -n 20 /tmp/moltbot-gateway.log
```

Expected output:

```
Gateway running on ws://127.0.0.1:18789
```
The scheduled jobs are stored in `config/jobs.template.json`. Load them by copying to the Moltbot cron directory before starting the gateway (or stop it first):

```bash
pkill -f moltbot-gateway || true
mkdir -p ~/.openclaw/cron
cp config/jobs.template.json ~/.openclaw/cron/jobs.json
```

Then restart the gateway and verify:

```bash
nohup moltbot gateway run --bind loopback --port 18789 --force \
  > /tmp/moltbot-gateway.log 2>&1 &
moltbot cron list
```

The reference deployment ships a complete VLA robotics + AI developer tools configuration. Here's how to verify it end to end:
```bash
python3 scripts/vla-rss-collect.py
```

Verify the collection worked:

```bash
ls ~/clawd/memory/vla-rss-*.json
python3 -c "
import json
with open('memory/vla-daily-hotspots.json') as f:
    d = json.load(f)
papers = sorted(d.get('reported_papers', []), key=lambda x: x.get('date',''), reverse=True)[:3]
for p in papers:
    print(p.get('rating','?'), p.get('title',''))
"
moltbot cron run <vla-hotspots-job-id> --force --timeout 180000 --expect-final
```

📝 Adapting to your domain: the reference scripts are named `vla-*` and `ai-app-*`. To track a different domain, update `memory/active-config.json` with your RSS feeds and keywords, then fork and rename the relevant scripts. The three-stage structure (prep → run → post) stays the same.
Congratulations — Pulsar is live! 🎉
After cloning (and any time you suspect drift), run the self-check:
```bash
python3 scripts/check-pipeline.py            # full check (~30 s)
python3 scripts/check-pipeline.py --parse    # AST only, fast
python3 scripts/check-pipeline.py --quiet    # suppress per-file OK lines
```

It does three things: AST-parses every script in `scripts/`, confirms the
shared helpers (_vla_expert.py, _domain_loader.py, _gh_issues_config.py)
are importable, and smoke-runs the leaf "mechanical" scripts (field-state,
cross-domain, gh-adoption, community-context, gh-issues collector) against a
temporary empty memory dir. Day-1 "no upstream data" exits are pinned to their
expected error messages, so this check is meaningful even before your first
real cron run.
Exit code: 0 = all green, 1 = anything off.
Pulsar runs a rule-based rating engine on every raw signal before any LLM call — it doesn't feed everything to the model.
Four-tier system:
| Rating | Meaning | Criteria | Daily cap |
|---|---|---|---|
| ⚡ | Breakthrough | All 4 conditions met (top institution + key technology + high engineering value + strong relevance) | 1 |
| 🔧 | Engineering value | 3 of 4 conditions | 5 |
| 📖 | Worth watching | 2 of 4 conditions | unlimited |
| ❌ | Not relevant | 0–1 conditions | — |
The rating conditions — keywords, institution tags, relevance rules — are all defined in your memory/active-config.json. No code changes needed to adapt the rating engine to a new domain.
Only ⚡ and 🔧 signals enter downstream LLM analysis; the rest are filtered out.
Result: dozens of raw signals per day → average 4–6 enter reasoning → ~80% reduction in LLM cost.
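The counting logic behind the table can be sketched in a few lines. This is an illustration of the tier rules only, not Pulsar's implementation; the real criteria are data-driven from `memory/active-config.json`:

```python
# Sketch of the four-tier rating rule: count how many of the four conditions
# a signal meets, then map the count to a tier. Condition names are
# illustrative; the actual rules come from active-config.json.
def rate_signal(top_institution: bool, key_technology: bool,
                engineering_value: bool, strong_relevance: bool) -> str:
    met = sum([top_institution, key_technology, engineering_value, strong_relevance])
    if met == 4:
        return "⚡"   # breakthrough (daily cap 1)
    if met == 3:
        return "🔧"   # engineering value (daily cap 5)
    if met == 2:
        return "📖"   # worth watching
    return "❌"       # filtered out before any LLM call
```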
Every pipeline follows the same three-stage structure:
```
prep-*.py              →   run-*-two-phase.py   →   post-*.py
(structured collect)       (LLM reasoning)          (validation + output)
        ↓                        ↓                        ↓
  candidates JSON           LLM output JSON         memory + GitHub + TG
```
All intermediate artifacts are written to memory/tmp/ for debugging. If a pipeline fails mid-run, Watchdog detects the orphaned llm-output file and resumes from the post stage — skipping the expensive collection and LLM steps.
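The resume behavior described above can be sketched as a simple artifact check. The file-naming convention below is an assumption for illustration, not Pulsar's actual scheme:

```python
# Sketch of orphan-artifact resume: if a run left an LLM-output file in
# memory/tmp/ but no final output, re-run only the post stage and skip the
# expensive collection and LLM steps.
from pathlib import Path

def resume_stage(tmp_dir: Path, pipeline: str, date: str) -> str:
    llm_out = tmp_dir / f"{pipeline}-{date}-llm-output.json"
    final = tmp_dir / f"{pipeline}-{date}-final.json"
    if final.exists():
        return "done"   # run already completed
    if llm_out.exists():
        return "post"   # orphaned LLM output: validate and publish only
    return "prep"       # nothing salvageable: full run from the start
```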
All outputs are pushed to your configured repos via the GitHub Contents API:
```
Output type                  →  Target path (set in memory/github-config-*.json)
Domain signal ratings        →  your-repo/knowledge/ratings/
Social intelligence          →  your-repo/memory/blog/archives/social-intel/
Daily picks                  →  your-repo/memory/blog/archives/daily-pick/
Biweekly reasoning reports   →  your-repo/reports/biweekly/
```
Push script: scripts/gh-contents-upload.py (handles create/update, auto-resolves SHA).
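As a sketch of what such an upsert involves: the GitHub Contents API expects base64-encoded content, and updates (as opposed to creates) must include the existing file's SHA. The helper below builds only the request payload; the names are ours, not from `gh-contents-upload.py`:

```python
# Illustrative payload builder for a Contents API PUT. Creating a file omits
# "sha"; updating an existing file must include the file's current SHA,
# which is why the push script "auto-resolves SHA" first.
import base64
import json
from typing import Optional

def contents_put_payload(text: str, message: str, sha: Optional[str]) -> bytes:
    body = {
        "message": message,
        "content": base64.b64encode(text.encode()).decode(),
    }
    if sha:
        body["sha"] = sha  # required when overwriting an existing file
    return json.dumps(body).encode()
```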
Benefits:
- Full-text grep across all historical outputs (`git log -S "your keyword"`)
- Permanent archive, zero SaaS dependency
- Fork any knowledge repo and build your own domain knowledge graph
scripts/daily-watchdog.py runs daily and checks 16 health signals:
| Check | Pass condition | Self-healing action |
|---|---|---|
| `rss` | Today's RSS collected | Trigger RSS collector script |
| `hotspots` | Today's hotspots updated | Trigger hotspots cron job |
| `social` | Today's social intel has signals | Trigger social pipeline |
| `release` | Release tracker checked today | Trigger release tracker |
| `rating` | Rating completed within 10h | Warn (no auto-heal) |
| `disk_space` | Disk usage < 85% | Warn; > 95% → error |
| ... | ... | ... |
Self-healing follows DAG order (rss → daily → social) to avoid consuming data that hasn't been collected yet.
Run log: memory/watchdog-log.json (retains 60 entries; killed runs and recoveries both recorded).
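Both behaviors can be sketched briefly; the check names come from the DAG order above, everything else is illustrative:

```python
# Sketch of two watchdog behaviors: heal failed checks in DAG order (never
# re-run a consumer before its producer has recovered), and trim the run log
# to the 60 most recent entries.
DAG_ORDER = ["rss", "daily", "social"]

def heal_order(failed: set) -> list:
    """Return the failed checks sorted by pipeline dependency order."""
    return [check for check in DAG_ORDER if check in failed]

def trim_log(entries: list, keep: int = 60) -> list:
    """Keep only the most recent `keep` run-log entries."""
    return entries[-keep:]
```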
Every two weeks, Pulsar generates a reasoning report containing verifiable predictions:
```markdown
### Predictions (next 2 weeks)
1. ⏳ [Your hypothesis] — Verification: [specific, measurable condition]
2. ⏳ [Another hypothesis] — Verification: [what you'd observe if true]
```

The next report reviews those predictions:

```markdown
### Previous Predictions Review
1. ✅ Confirmed — [evidence found]
2. ❌ Not confirmed — [counter-evidence]
```

This makes the system's judgment accuracy traceable and measurable, not just claimed.
This is Pulsar's most distinctive capability — and what separates it from static pipelines.
Most intelligence systems collect, summarize, and forget. Pulsar maintains an explicit model of its own beliefs, tracks their accuracy over time, and actively revises them based on evidence.
The system tracks domain hypotheses in memory/assumptions.json, each with a confidence score (0–1):
```json
{
  "id": "V-001",
  "text": "Your domain hypothesis here",
  "confidence": 0.72,
  "last_updated": "2026-02-01"
}
```

The closed self-correction loop:
```
Daily signals
      │
      ▼
Calibration check ─── each signal matched against hypotheses
      │
      │  (monthly, on the 28th)
      ▼
Trigger rate computed per hypothesis
      │
      ├── confirmed by data ─────────▶ confidence ▲ (max +0.08)
      │
      └── drifting / low evidence ──▶ confidence ▼
                                           │
                                           ▼
                                     Watch-list entry
                                           │
                                           ▼
                                Next cycle: signal boost
                                (extra relevant signals
                                 injected from RSS/social)
                                           │
                                           ▼
                                More evidence → re-evaluate
                                           │
                                           └──▶ loop continues

Biweekly predictions ──▶ ✅/❌ grade ──▶ accuracy history in Git
```
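The monthly confidence-update step in the loop above might look roughly like this. The +0.08 cap comes from the diagram; the trigger-rate threshold and the size of the downward step are assumptions for illustration:

```python
# Illustrative monthly calibration: raise confidence (capped at +0.08 per
# cycle) when a hypothesis's trigger rate is high enough, otherwise lower it
# and flag it for the watch-list. Threshold and downward step are assumed.
def calibrate(hypothesis: dict, trigger_rate: float,
              confirm_threshold: float = 0.3) -> dict:
    h = dict(hypothesis)
    if trigger_rate >= confirm_threshold:
        h["confidence"] = min(1.0, h["confidence"] + 0.08)
        h["watch"] = False
    else:
        h["confidence"] = max(0.0, h["confidence"] - 0.08)
        h["watch"] = True  # triggers boosted signal injection next cycle
    return h
```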
What makes this self-evolution, not just automation:
- The system decides what to look harder for — not just passively collect more
- Declining confidence is a trigger for active investigation, not just a metric
- Biweekly ✅/❌ reviews provide an independent check on the system's reasoning quality
- Confidence history is committed to Git — belief changes are traceable, never silent
- No human curates the watch-list: it emerges from the data itself
This loop runs continuously. After enough cycles, the system's confidence scores reflect accumulated real-world evidence — not the priors you started with.
These features are shipped in the public template but not surfaced in the bullet list above. Pick the ones relevant to your domain — they all run on the same 2 GB VPS profile.
Cross-domain Rule Engine v2 (7 built-in rules)
scripts/cross-domain-rule-engine.py runs daily and surfaces signals that bridge two domains. Rules are deterministic (auditable, predictable) and the engine adds a one-sentence LLM "cross-domain significance" per insight batch.
| ID | Direction | Trigger |
|---|---|---|
| R001 | VLA technique → AI App | Diffusion / flow matching / transformer / RLHF / quantization techniques crossing into AI app development |
| R002 | AI App framework → VLA | New agent frameworks (LangGraph, CrewAI, etc.) adopted in robotics stacks |
| R003 | AI embodied → VLA | Embodied-AI papers from generalist labs reaching VLA practitioners |
| R004 | VLA foundations → AI | Foundation-model and pre-training methods originating in robotics |
| R005 | Paradigm fusion | Both domains converging on the same paradigm in the same week |
| R006 | GitHub-repo convergence | Top issues / PRs touching shared dependencies (e.g. PyTorch, JAX, Triton) |
| R007 | Hypothesis-driven transfer | LLM-generated transfer hypotheses (AI → VLA), capped per cycle |
Output: memory/cross-domain-insight.json; reports include the latest insights with their LLM significance.
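As a sketch, an R001-style rule reduces to a deterministic predicate over a rated signal. The keyword list is a subset of the triggers named in the table; the function and field names are illustrative, not the encoding used in `cross-domain-rule-engine.py`:

```python
# Illustrative R001-style rule: flag a ⚡-rated VLA signal whose text mentions
# a technique known to cross into AI app development.
R001_TECHNIQUES = ("diffusion", "flow matching", "rlhf", "quantization")

def r001_fires(signal: dict) -> bool:
    text = (signal.get("title", "") + " " + signal.get("abstract", "")).lower()
    return signal.get("rating") == "⚡" and any(t in text for t in R001_TECHNIQUES)
```

Keeping the predicate deterministic is the point: every fired insight can be traced back to an explicit keyword and rating condition, with the LLM only adding a significance sentence afterwards.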
GitHub Issues Adoption Sensor (4 scripts)
A mechanical signal pipeline that watches OSS adoption in your domain via issue/PR velocity, not just star counts.
```
collect-github-issues.py   (daily, all tier-1 repos)
        ↓
memory/gh-issues-*.json
        ↓
compute-gh-adoption.py     (Fri 13:00 — tier-1 + tier-2)
        ↓
Adoption phases · DFI (Daily Field Index) · Convergence signals
        ↓
update-gh-field-notes.py   (push to your knowledge-base repo)
```
- Repo registry: `scripts/_gh_issues_config.py` — edit to fit your domain (tier-1 = daily, tier-2 = weekly)
- Adoption phases: incubation → growth → mainstream → maturity, detected from issue cadence + community-question density
- Convergence signals: when ≥3 monitored repos hit the same dependency / method / benchmark in the same week
- Env override: `PULSAR_FIELD_NOTES_REPO`, `PULSAR_FIELD_NOTES_PATH` — point at your own knowledge-base file
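The convergence rule can be sketched as a counting pass over per-repo term sets; the input shape here is an assumption for illustration:

```python
# Illustrative convergence detector: a term (dependency, method, benchmark)
# fires when it appears in at least `min_repos` monitored repos in one week.
# Input shape assumed: {repo_name: set_of_terms_seen_this_week}.
from collections import Counter

def convergence_signals(weekly_terms: dict, min_repos: int = 3) -> list:
    counts = Counter(t for terms in weekly_terms.values() for t in set(terms))
    return sorted(t for t, n in counts.items() if n >= min_repos)
```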
Field-State Trigger (mechanical, zero LLM)
scripts/ai-field-state.py evaluates the daily corpus against 6 trigger types before any LLM step runs:
| Trigger | Catches |
|---|---|
| `breakthrough_density` | Spike of ⚡ items in a 3-day window |
| `paradigm_shift` | New family entering top-5 method share |
| `consensus_drift` | An assumption being repeatedly contradicted |
| `silent_decay` | A previously hot family dropping out for 14+ days |
| `cross_domain_pull` | A foreign-domain technique entering rated signals |
| `release_clustering` | ≥3 model releases targeting the same benchmark within 7 days |
Field-state runs before deep-dive scheduling; only signals matching a trigger become deep-dive candidates. Mechanical filtering keeps LLM cost bounded and outputs auditable.
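As an example of how mechanical these triggers are, `breakthrough_density` reduces to a windowed count; the threshold below is an assumed value, not the one the script ships with:

```python
# Illustrative breakthrough_density trigger: fire when the number of ⚡ items
# over the most recent 3-day window reaches a threshold. Zero LLM involved.
def breakthrough_density(daily_flash_counts: list, threshold: int = 3) -> bool:
    """daily_flash_counts: per-day ⚡ counts, most recent day last."""
    return sum(daily_flash_counts[-3:]) >= threshold
```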
Pulsar maintains an embedding index over the rolling 60-day window:
```bash
# Build / refresh (incremental, batch=10)
python3 scripts/semantic-index-builder.py

# Pure-Python cosine query — no torch / numpy required
python3 scripts/semantic-search.py "what contradicted assumption V-003 last month?"
```

- Embedder: DashScope `text-embedding-v3` (1024-dim); any OpenAI-compatible embedding endpoint works with a one-line URL swap
- Storage: `memory/semantic-index/chunks.jsonl` + `memory/semantic-index/vectors.bin`
- Retrieval: top-k cosine over chunk vectors, returns `{source, date, snippet, score}`
- MCP-ready: also exposed as the `search_memory` MCP tool
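The retrieval step is small enough to sketch in pure Python, matching the no-numpy claim above; chunk storage and the binary vector format are omitted:

```python
# Illustrative pure-Python top-k cosine retrieval over in-memory chunks.
import math

def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query: list, chunks: list, k: int = 5) -> list:
    """chunks: [{"vector": [...], ...}] sorted by similarity to `query`."""
    return sorted(chunks, key=lambda c: cosine(query, c["vector"]), reverse=True)[:k]
```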
memory/domains.json registry + scripts/_domain_loader.py shared loader let you add a 3rd or 4th research domain by editing one file. Per-domain active-config.json, hypotheses, and Telegram routing keep streams isolated.
```
Pulsar/
├── scripts/                         # Pipeline scripts
│   ├── prep-*.py                    # Data collection (RSS, web search, GitHub API)
│   ├── run-*-two-phase.py           # Two-phase execution (prep + LLM agent)
│   ├── post-*.py                    # Post-processing (validate + memory + GitHub + TG)
│   ├── daily-watchdog.py            # Health monitoring + self-healing (16 checks)
│   ├── cross-domain-rule-engine.py  # 7 built-in rules (R001-R007) for cross-domain bridging
│   ├── ai-field-state.py            # Mechanical field-state trigger (no LLM, 6 trigger types)
│   ├── collect-github-issues.py     # Daily issues collector across tracked OSS repos
│   ├── compute-gh-adoption.py       # Weekly adoption-phase analysis (DFI/convergence)
│   ├── update-gh-field-notes.py     # Push field-notes back to your knowledge-base
│   ├── prep-community-context.py    # Bundle community + adoption context for reports
│   ├── semantic-index-builder.py    # DashScope text-embedding-v3, incremental
│   ├── semantic-search.py           # Pure-Python cosine similarity over the index
│   ├── entity-tracker.py            # Author/lab/method/benchmark index across 90-day window
│   ├── upstream-signal-monitor.py   # Track 1-2 upstream domains for early signals
│   ├── memory-janitor.py            # Periodic cleanup of expired files
│   ├── memory-upsert.py             # Generic append-write tool for memory files
│   ├── gh-contents-upload.py        # GitHub Contents API push
│   ├── _vla_expert.py               # Shared LLM client + domain context module
│   └── SCRIPTS.md                   # Full pipeline DAG documentation
│
├── config/
│   ├── active-config.template.json  # ← Start here: RSS feeds, keywords, domain settings
│   ├── assumptions.template.json    # Domain hypotheses template
│   ├── github-config.template.json  # GitHub push target template
│   └── jobs.template.json           # Cron job configurations
│
├── memory/                          # Local knowledge store (.gitignored, auto-created at runtime)
│   ├── active-config.json           # Your domain config (created from template)
│   ├── assumptions.json             # Domain hypotheses + confidence scores
│   ├── watchdog-log.json            # Watchdog run history
│   ├── tmp/                         # Pipeline intermediates (auto-cleaned after 60 days)
│   └── github-config-*.json         # GitHub push targets (created from template)
│
├── docs/
│   └── banner.svg
│
├── AGENTS.md                        # Full deployment guide (AI agent-readable)
├── .env.example                     # Key template (copy to .env and fill in)
└── LICENSE                          # MIT
```
Each tool has a genuine sweet spot — here's an honest breakdown:
Pick Feedly AI if you want zero setup, polished mobile UX, 1M+ curated sources, and team collaboration. Pick ResearchRabbit if you're doing academic literature reviews — visual citation graphs and 270M+ papers. Pick MineContext if you want to capture your own reading context — local-first, private. Pick Pulsar if you need a server-side pipeline that runs autonomously, generates structured knowledge assets, self-heals on failure, and self-calibrates monthly.
| Dimension | Feedly AI | ResearchRabbit | MineContext | Pulsar |
|---|---|---|---|---|
| Best at | Team intel feeds, mobile | Academic citation mapping | Personal context capture | Autonomous domain pipeline |
| Hosting | ☁️ SaaS only | ☁️ SaaS only | ✅ Local / OSS | ✅ Self-hosted / OSS |
| Cost | $1,600–3,200 / month | Closed pricing | Free | Free |
| Setup effort | ✅ Zero | ✅ Zero | ✅ Desktop install | ⚙️ Guided installer (minutes) |
| LLM provider | ❌ Fixed | ❌ Fixed | ❌ Fixed | ✅ Any OpenAI-compatible |
| RSS configurable | ❌ | ❌ | ❌ | ✅ Any feed URL |
| Domain configurable | ❌ | ❌ | ❌ | ✅ Fully custom |
| Signal rating | ❌ | ❌ | ❌ | ✅ ⚡/🔧/📖/❌ before LLM |
| Reasoning transparency | ❌ Black-box | ❌ | ❌ | ✅ 3-stage observable chain |
| Self-healing | ❌ | ❌ | ❌ | ✅ 7 auto-recovery paths |
| Belief calibration | ❌ | ❌ | ❌ | ✅ Hypotheses, monthly |
| Prediction tracking | ❌ | ❌ | ❌ | ✅ ✅/❌ every 2 weeks |
| Knowledge output | Feed / inbox | Graph visualization | Local summaries | Structured Markdown → Git |
| RAM footprint | N/A (cloud) | N/A (cloud) | Desktop app | 2 GB VPS |
Pulsar's internal layers follow a cognitive organism model, not a traditional data pipeline:
| Layer | Biological analog | Pulsar component |
|---|---|---|
| Perception | Sensory organs | Configurable RSS feeds · GitHub releases · community feeds |
| Filtering | Thalamic gate | Rating engine (⚡/🔧/📖/❌) — noise cut before LLM |
| Reasoning | Cortical processing | Three-stage LLM: prep → agent → post |
| Memory | Hippocampal encoding | Structured Markdown → GitHub |
| Metacognition | Prefrontal reflection | Biweekly prediction reviews · monthly calibration |
| Immune system | Immune response | Watchdog: 16 health checks, 7 self-healing paths |
The included configuration tracks two domains simultaneously — VLA robotics research and AI developer tools. Numbers below reflect this dual-domain setup:
| Metric | Value |
|---|---|
| Scheduled jobs | 33 cron jobs across both domains |
| Pipeline scripts | 67 across VLA, AI, calibration, cross-domain, GH adoption, and semantic-memory pipelines |
| Tracked hypotheses | 19 with monthly confidence auto-updates |
| Watchdog checks | 16 health signals, 7 auto-recovery paths |
| Cross-domain rules | 7 built-in rules (R001-R007) — VLA technique transfer, AI framework adoption, paradigm convergence, GitHub-repo convergence, hypothesis-driven transfer |
| MCP tools | 12 read-only tools (signals, knowledge, meta, search, domain registry) |
| End-to-end latency | < 2 hours: RSS → rated signals → TG notification |
| Knowledge retention | Social intel / hotspots 90-day rolling · reports permanent in Git |
| Hardware requirement | 2 GB RAM — minimal VPS |
A single-domain deployment needs roughly half the scripts and cron jobs.
The reference deployment pushes content to these public repos daily:
| Repo | Domain | Contents |
|---|---|---|
| VLA-Handbook | Robotics · VLA research | Daily paper ratings · theory deep dives · biweekly forecasts |
| Agent-Playbook | AI apps · agent tools | Tool index · framework analyses · daily picks |
To connect Pulsar to your own repos, edit memory/github-config-*.json (copied from config/github-config.template.json).
Pulsar aims to evolve from a single-domain pipeline into a self-evolving domain intelligence platform, helping define the emerging "personal domain intelligence" category alongside AI swarm and agentic RAG developments in 2026–2027.
The table below reflects debate from three perspectives (product strategy, engineering constraints, researcher utility) and includes the rationale behind each prioritization decision.
| Priority | Feature | Description | Rationale | Status |
|---|---|---|---|---|
| P0 | MCP Server | Expose Pulsar's knowledge base, signal history, and hypothesis confidence scores as an MCP endpoint — queryable by Claude, Cursor, or any MCP-compatible client | Strategic moat: no competitor (n8n, Dify, RAGFlow) offers a domain-knowledge MCP endpoint. Turns Pulsar from a "script collection" into queryable intelligence infrastructure; makes every downstream AI tool domain-aware without custom integration | ✅ Done |
| P0 | Multi-domain config | Extend from 1 domain to N domains under a shared scheduler and delivery layer, with per-domain config files and isolated memory paths | Structural prerequisite for all cross-domain features; already partially supported at config level — needs pipeline unification and routing logic | ✅ Done |
| P0 | One-click deploy script | Interactive `setup.sh` that scaffolds `.env`, `active-config.json`, GitHub config, and first cron load in a single guided run | Reduces copy-and-adapt friction from ~1 hour to minutes; community adoption depends on this; first impression determines whether anyone clones beyond the original deployment | ✅ Done |
| P1 | Quality Drift Detector | Track signal density, rating distribution, and LLM output quality per source; alert when metrics drop systematically over 3+ consecutive days | More fundamental than Spike Detector: a spike is a one-time event, drift is silent pipeline degradation. Watchdog already catches "did it run"; drift detection catches "is what it produces still meaningful" | ✅ Done |
| P1 | Agent Role-Switching (requires 4 GB RAM) | Refactor the three-stage chain into named roles — Reader, Analyst, Memory, Delivery — executed sequentially; each role can use a different model size | Sequential role-switching (not true parallel swarm) is the only architecture compatible with 2 GB servers. The value is model-level targeting: cheap model for Reader, strong model for Analyst, without rewriting the whole pipeline | ✅ Done |
| P1 | Cross-domain Rule Engine → v2 | User-defined deterministic rules for cross-domain signal bridging: `IF vla_rating:⚡ AND keyword IN ["diffusion", "flow matching"] THEN flag_for_ai_app_review` | LLM-generated cross-domain discovery produces too many false positives ("both domains mention transformers"). Deterministic rules are auditable, predictable, and encode the user's actual cross-domain hypotheses rather than letting the model guess | ✅ Done (v2: 7 rules + LLM significance) |
| P1 | Field-State Trigger | Zero-LLM gate that decides whether a daily deep-dive should run at all; 6 trigger types (breakthrough density, paradigm shift, consensus drift, silent decay, cross-domain pull, release clustering) | A pipeline that always generates a deep-dive eventually generates noise deep-dives on slow days. Field-state lets the system stay quiet with an auditable reason; bounds LLM cost without losing time-sensitive signals | ✅ Done |
| P2 | GitHub Issues Adoption Sensor | Watch OSS-repo issue/PR cadence (not stars) and infer adoption phase (incubation → growth → mainstream → maturity), Daily Field Index, and cross-repo convergence | Stars measure attention, not adoption. The maintainer-facing signal of "is this library winning" lives in issue cadence and contributor diversity; a 4-script pipeline turns that into a structured field map you can grep | ✅ Done |
| P2 | Spike Detector | Out-of-schedule alert when a keyword's signal density exceeds 3× its 7-day baseline within 24 hours; triggers push immediately, bypasses daily batch | Daily batch is insufficient for ⚡-level events: top-conference papers generate community debate within hours, not the next morning. Spike detection restores time-sensitivity without replacing the batch pipeline | ⏭️ Skipped |
| P2 | Devil's Advocate Report | Each reasoning report appends a "Strongest Counterargument" section via a separate adversarial agent pass | Replaces "Debate mode": users don't need to read a full debate, they need the best objection in 2 sentences. Reduces confirmation bias in output without doubling report length; the prior framing added UX friction with no analytical gain | ✅ Done |
| P2 | Entity Tracker | Extract `{author, lab, benchmark, method}` from every ⚡/🔧-rated signal into a structured JSON index, queried across the rolling 90-day window | Covers 80% of knowledge-graph use cases at a fraction of the cost. Answering "what has this lab published in 3 months?" requires an index, not full GraphRAG — and this index can be built incrementally with no upfront batch cost | ✅ Done |
| P2 | Upstream Signal Monitor | Track 1–2 upstream domains (e.g. computer vision for robotics, materials science for biomedical) for signals that historically precede breakthroughs in your domain; flag without deep analysis | Domain breakthroughs rarely originate within the domain itself: diffusion models came from image generation, not robotics. Upstream monitoring provides 1–3 month advance signals with near-zero added pipeline cost | ✅ Done |
| P2 | Semantic Memory Search | Vector index over the 60-day knowledge window; enables natural-language queries like "what contradicted assumption V-003 last month?" | Bridges the gap between file-based storage and actual knowledge retrieval. Without this, cross-report reasoning requires re-reading all historical outputs; with it, the system can answer questions about its own history | ✅ Done |
| P3 | GraphRAG Knowledge Graph | Convert Git commit history and Entity Tracker index into a relationship graph: paper ↔ author ↔ benchmark ↔ lab ↔ method; supports structured traversal queries | Deferred: Entity Tracker (P2) satisfies most retrieval needs first. GraphRAG's index construction is O(n²) in LLM calls and only becomes cost-effective at 6+ months of accumulated data or with significantly cheaper models than are available today | 📋 Planned |
| P3 | Prediction Score Public API | Expose each domain's biweekly prediction hit-rate and hypothesis confidence scores as a queryable endpoint — the "credibility score" for a domain intelligence source | Makes Pulsar's accuracy claims independently verifiable. Differentiates from every black-box AI summary tool that asserts importance without a track record; this turns the prediction loop into a public signal of system quality | 📋 Planned |
| P4 | Config & Domain Template Marketplace | Community hub for sharing domain configs, assumption templates, keyword sets, and validated cron blueprints across Pulsar instances | Replaces "Federated Calibration" (hypothesis confidence scores are context-dependent and cannot be meaningfully shared across instances with different sources, keywords, and rating criteria). What can be shared — and immediately useful — is the structure: domain config templates, RSS feed lists, hypothesis starter sets | 📋 Planned |
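The spike-detection rule in the table above (alert when a keyword's last-24-hour signal count exceeds 3× its 7-day daily baseline) is concrete enough to sketch. This is a minimal illustration of that rule, not the project's actual implementation — function and parameter names here are made up for the example:

```python
from datetime import datetime, timedelta

def is_spike(timestamps, now=None, factor=3.0, baseline_days=7):
    """Return True when the last-24h signal count exceeds `factor` times
    the average daily count over the prior `baseline_days`.

    `timestamps` is a list of datetimes for signals matching one keyword.
    Illustrative sketch only; names are not the real script's API.
    """
    now = now or datetime.utcnow()
    day_ago = now - timedelta(days=1)
    window_start = now - timedelta(days=baseline_days + 1)

    recent = sum(1 for t in timestamps if t > day_ago)
    baseline = [t for t in timestamps if window_start <= t <= day_ago]
    daily_avg = len(baseline) / baseline_days
    # Guard against a zero baseline: require a minimum absolute count so a
    # single signal on an otherwise dead keyword doesn't fire an alert.
    return recent >= max(3, factor * daily_avg)
```

A floor like `max(3, …)` is one way to keep a dead keyword from alerting on its first signal; the real threshold choice would be tuned per domain.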
| Feature | Description |
|---|---|
| Cross-domain Rule Engine v2 | Bumped cross-domain-rule-engine.py to 7 built-in rules (R001–R007) — covers VLA→AI technique transfer, AI→VLA framework adoption, embodied-AI bridging, foundation-model backflow, paradigm fusion, GitHub-repo convergence, and LLM-generated transfer hypotheses. Each insight batch is now annotated with a one-sentence LLM "cross-domain significance" line. |
| GitHub Issues Adoption Sensor | New 4-script pipeline (collect-github-issues.py, compute-gh-adoption.py, _gh_issues_config.py, update-gh-field-notes.py) — watches a curated registry of OSS repos and infers adoption phases (incubation → growth → mainstream → maturity), the Daily Field Index (DFI), and cross-repo convergence signals. Output can be pushed back into your knowledge base via an env-configurable target. |
| Field-State Trigger | New ai-field-state.py — mechanical, zero-LLM gate with 6 trigger types (breakthrough density, paradigm shift, consensus drift, silent decay, cross-domain pull, release clustering). Runs ahead of deep-dive scheduling to bound LLM cost. |
| Community Context Prep | New prep-community-context.py — fetches compact summaries of community field notes and the latest adoption snapshot into a single tmp file that weekly/biweekly reports can read. Target repo is configurable via PULSAR_FIELD_NOTES_REPO. |
| MCP Server: 12 tools | Docs and counts corrected: list_domains, get_domain_config, search_memory are documented alongside the original 9. Tool count was previously misstated as 11. |
| Watchdog: 16 checks | Count corrected from 15 (added quality_drift and ai_deep_dive follow-up after Drift Detector and AI deep-dive shipped). |
| Architecture doc | New docs/architecture.md documents the 4-layer model (INPUT → PROCESSING → OUTPUT → QUALITY) and the closed self-correction loop. |
| Self-check script | New scripts/check-pipeline.py — AST-parses every script, confirms shared helpers import, and smoke-runs the no-LLM leaves against an empty memory dir; day-1 expected exits are pinned to their error messages so the check is meaningful before the first real cron run. |
| Roadmap rows | Field-State Trigger and GitHub Issues Adoption Sensor added to the Roadmap as ✅ Done (previously surfaced only in Advanced Capabilities). |
| Bug fix | prep-community-context.py had three broken string literals (literal newlines instead of \n escapes) — would raise SyntaxError on import. Same bug exists in the maintainer's private fork; this patch is OSS-template only. |
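The Field-State Trigger row above describes a mechanical, zero-LLM gate: pre-computed counters checked against fixed rules, with an auditable reason for every decision. A toy sketch of that shape (the trigger names match the six types listed, but the thresholds and the `stats` fields are invented for illustration; the real ai-field-state.py may compute its gate differently):

```python
# Illustrative thresholds only — not the real ai-field-state.py values.
TRIGGERS = {
    "breakthrough_density": lambda s: s["lightning_count_24h"] >= 3,
    "release_clustering":   lambda s: s["releases_7d"] >= 4,
    "silent_decay":         lambda s: s["signals_7d"] == 0,
}

def field_state_gate(stats):
    """Zero-LLM gate: return (run_deep_dive, fired_triggers).

    `stats` is a dict of counters computed upstream; no model call is
    made, so staying quiet is free, and the returned trigger names give
    every skip/run decision an auditable reason.
    """
    fired = [name for name, rule in TRIGGERS.items() if rule(stats)]
    return (bool(fired), fired)
```

On a slow day nothing fires and the deep-dive is skipped with `fired == []` as the recorded reason, which is exactly the "stay quiet with an auditable reason" behavior the roadmap describes.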
| Feature | Description |
|---|---|
| MCP Server | 11-tool MCP server exposing the full knowledge base to Claude Desktop, Cursor, or any MCP client — query VLA signals, SOTA, releases, social intel, predictions, and pipeline health in plain conversation |
| Multi-domain config | memory/domains.json registry + scripts/_domain_loader.py shared loader — add a 3rd research domain by editing one file instead of modifying scripts |
| One-click deploy | scripts/setup.sh — guided 6-step installer (Python check, mcp install, interactive config prompts, config file generation, path substitution, verification + Claude Desktop JSON output) |
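The multi-domain row above centers on a single registry file (memory/domains.json) read through a shared loader. A minimal sketch of what such a loader might look like — the field names (`domains`, `feeds`, `keywords`) are assumptions for illustration, not the actual schema of scripts/_domain_loader.py:

```python
import json

def load_domain(registry_path, name):
    """Read the domain registry once and return one domain's config.

    Sketch of a shared loader in the spirit of scripts/_domain_loader.py;
    the real schema may differ. Adding a new research domain then means
    adding one entry to the registry file, not editing any script.
    """
    with open(registry_path) as f:
        registry = json.load(f)
    domain = registry["domains"][name]
    # Fill optional fields so downstream scripts don't need None checks.
    domain.setdefault("keywords", [])
    return domain
```

With a registry entry like `{"domains": {"vla": {"feeds": [...]}}}`, every pipeline script calls the same loader instead of hard-coding per-domain paths.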
| Document | Contents |
|---|---|
| docs/deployment/README.md | Start-here deployment guide — quickstart preset, your-own-domain walkthrough, troubleshooting |
| AGENTS.md | Full deployment guide: key config · path reference · troubleshooting |
| docs/architecture.md | 4-layer model · 3-pass dedup · closed self-correction loop · end-to-end timing |
| docs/use-cases/README.md | Index of all 14 capabilities — what each does, status, when to enable |
| docs/mcp.md | 12-tool MCP API reference |
| scripts/SCRIPTS.md | Full DAG of all pipeline scripts · I/O for each script |
| VLA-Handbook | Reference VLA knowledge repo (live output) |
| Agent-Playbook | Reference AI tools knowledge repo (live output) |
Pulsar is built on top of Moltbot (formerly OpenClaw) — the agent gateway that handles cron scheduling, LLM routing, and Telegram delivery. Without Moltbot's reliable scheduling and agent runtime, the 33-job autonomous pipeline wouldn't be possible.
Thanks to the Moltbot team for building and maintaining the infrastructure that Pulsar runs on.
Have a question or an idea, or want to fork this into your own domain's Pulsar?
- 💬 File an issue: GitHub Issues
- 🔀 Pull requests: improvements to pipelines, new domain support, and bug fixes are welcome
- 📡 See the reference outputs: VLA-Handbook · Agent-Playbook
MIT License — fork it, adapt it, make it your own domain's Pulsar.