An AI-powered desktop automation agent that executes web searches, opens websites, launches applications, and takes screenshots from natural-language commands. A hybrid Python + LLM architecture keeps the URL hallucination rate at 0% across the project's test queries.
Most AI agents are demo-grade.
They hallucinate structured data (URLs, IDs, file paths), execute incorrect actions, and collapse the moment they are connected to real systems.
Not-Jarvis is designed from the opposite direction: production reliability first.
It enforces hard boundaries between reasoning and execution:
- Python performs all deterministic extraction (no guessing, no generation)
- LLMs are restricted to semantic planning only
- The agent re-plans after every action using real execution results
Result: an OS-level automation agent that behaves deterministically, streams execution in real time, and remains safe to run outside of toy demos.
- Zero URL Hallucination: Python copies website URLs verbatim from search results, so the model never generates one
- Persistent Memory: Conversation history stored in PostgreSQL survives server restarts
- Real-Time Streaming: Server-Sent Events provide live task execution updates
- Iterative Planning: Re-plans after each action based on actual results (no wasted multi-step plans)
- Multi-Turn Conversations: Maintains context across requests using LangGraph checkpointing
Client Request → FastAPI → LangGraph StateGraph → Gemini 2.5 Flash
↓
PostgreSQL (Supabase)
↓
Persistent Conversation Memory
Execution Loop:
- TaskPlanner: Analyzes goal + conversation history → plans single next action
- Executor: Executes action (search, open_website, screenshot, open_app)
- Loop Back: TaskPlanner re-evaluates with execution results
- Reception: Formats final response when task complete
Key Design Decision: Single-step planning instead of multi-step plans eliminates wasted LLM calls when intermediate results differ from expectations.
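The loop above can be sketched in plain Python, independent of LangGraph. This is a minimal illustration rather than the project's code: `Action`, `plan_next_action`, `execute`, `run`, and `MAX_LOOPS` are hypothetical names, and the planner is a hard-coded stand-in for the LLM call.

```python
from dataclasses import dataclass, field

MAX_LOOPS = 5  # hypothetical safety cap on iterations


@dataclass
class Action:
    name: str  # e.g. "search", "open_website", or "done"
    args: dict = field(default_factory=dict)


def plan_next_action(goal: str, history: list) -> Action:
    """Stand-in for the LLM planner: returns exactly one next action."""
    if not history:
        return Action("search", {"query": goal})
    if history[-1][0].name == "search":
        return Action("open_website", {"url_index": 0})
    return Action("done")


def execute(action: Action) -> str:
    """Stand-in executor: returns a result string instead of touching the OS."""
    return f"{action.name} ok"


def run(goal: str) -> list:
    history = []  # (action, result) pairs fed back to the planner
    for _ in range(MAX_LOOPS):
        # Plan one step only, using the real results gathered so far
        action = plan_next_action(goal, history)
        if action.name == "done":
            break
        history.append((action, execute(action)))
    return history


steps = run("MIT website")
print([action.name for action, _ in steps])
```

Because the planner sees every real result before choosing the next step, no planned-but-unused steps are ever produced.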
LLMs frequently hallucinate URLs:
User: "Find the MIT website"
LLM: "Opening https://mit.com" ❌ (hallucinated; the real site is mit.edu)
Separate deterministic operations from semantic decisions:

    # Python extracts URLs (zero hallucination)
    def enhanced_search(query: str) -> str:
        # results is the SerpAPI response dict with an 'organic_results' list
        results = serpapi.search(query)
        # Extract the top 4 URLs into an indexed map
        organic = results.get('organic_results', [])
        url_map = {i: item['link'] for i, item in enumerate(organic[:4])}
        return f"{results}\n\n[URL_MAP]: {url_map}"

    # The LLM selects an index (0-3) and never generates a URL itself
    # The Executor resolves url_index 0 → actual URL → opens browser

Results: 0% URL hallucination across all test queries.
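The resolution step on the executor side might look like the sketch below. `resolve_url` is a hypothetical helper, not a function from the repository, and the second URL in the map is a placeholder; the point is that an out-of-range or malformed index fails loudly instead of falling back to a generated URL.

```python
def resolve_url(url_map: dict, url_index: int) -> str:
    """Return the URL stored at url_index, or raise if the index is invalid."""
    # bool is a subclass of int in Python, so reject it explicitly
    if not isinstance(url_index, int) or isinstance(url_index, bool):
        raise ValueError(f"url_index must be an integer, got {url_index!r}")
    if url_index not in url_map:
        raise ValueError(f"url_index {url_index} not in URL map {sorted(url_map)}")
    return url_map[url_index]


# Example map shaped like the output of the extraction step (values illustrative)
url_map = {0: "https://web.mit.edu", 1: "https://example.org/about-mit"}
print(resolve_url(url_map, 0))
```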
| Component | Technology | Purpose |
|---|---|---|
| Backend | FastAPI (Async) | HTTP server with SSE streaming |
| AI Model | Gemini 2.5 Flash | Task planning & decision making |
| Agent Framework | LangGraph | State management & workflow orchestration |
| Database | PostgreSQL (Supabase) | Persistent conversation checkpoints |
| Web Search | SerpAPI | Real-time search results |
| OS Automation | webbrowser, pyautogui | System-level actions |
- Python 3.11+
- PostgreSQL database (Supabase recommended)
- Google API key (Gemini)
- SerpAPI key
- Clone & Install

    git clone https://github.com/YourUsername/Not-Jarvis.git
    cd Not-Jarvis
    python -m venv Jarvis
    Jarvis\Scripts\activate  # Windows
    pip install -r requirements.txt

- Environment Variables

Copy .env.example to .env and fill in your API keys:

    cp .env.example .env

Then edit .env:

    GOOGLE_API_KEY=your_gemini_api_key
    SERPAPI_API_KEY=your_serpapi_key
    DATABASE_URL=postgresql://user:pass@host:5432/db

- Run Server

    uvicorn main:app --reload

- Run Client

    python client.py

You: Find the MIT website and open it
Searching for: MIT website...
✅ Search complete
Opening website...
✅ Website opened
I have opened the MIT website for you.
[Browser opens https://web.mit.edu]
You: What was my previous request?
You asked me to find and open the MIT website.
You: Take a screenshot
📸 Taking screenshot...
✅ Screenshot saved
Screenshot has been captured and saved.
Note: Each action streams progress updates in real-time via Server-Sent Events.
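On the client side, consuming such a stream means reading `data:` fields and treating a blank line as the end of an event. The parser below is a minimal sketch under that assumption; `parse_sse` is a hypothetical name and is not taken from client.py.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each event in a Server-Sent Events stream."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            # The SSE format drops one leading space after the colon
            buffer.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and buffer:
            # A blank line terminates the current event
            yield "\n".join(buffer)
            buffer = []


stream = [
    "data: Searching for: MIT website...\n",
    "\n",
    "data: Search complete\n",
    "\n",
]
events = list(parse_sse(stream))
print(events)
```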

    # Each request uses the same thread_id for continuity
    config = {"configurable": {"thread_id": "USER_123"}}
    # LangGraph automatically:
    # - Loads previous messages from DB
    # - Merges with new request
    # - Saves updated state after each node

    from typing import Annotated, TypedDict
    from langgraph.graph.message import add_messages

    class State(TypedDict):
        messages: Annotated[list, add_messages]  # Persists (reducer accumulates)
        executor_memory: list                    # Resets per request
        user_goal: str                           # New each request
        loop_count: int                          # Tracks iterations
        reception_output: str                    # Streaming updates (any node can set)

Design:
- messages persist for conversation context
- executor_memory resets for independent task execution
- reception_output is set by any node for real-time progress updates
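The split between persistent and per-request fields can be illustrated without LangGraph. In this sketch, `append_messages` is a simplified stand-in for the `add_messages` reducer (the real reducer also reconciles messages by ID), and `start_request` is a hypothetical helper showing which fields carry over from the last checkpoint. Resetting `loop_count` per request is an assumption, not something the README states.

```python
def append_messages(existing: list, new: list) -> list:
    """Simplified reducer: accumulate messages instead of overwriting them."""
    return existing + new


def start_request(saved_state: dict, user_goal: str) -> dict:
    """Build the state for a new request from the last saved checkpoint."""
    return {
        "messages": append_messages(
            saved_state.get("messages", []), [("user", user_goal)]
        ),
        "executor_memory": [],   # reset: each task starts clean
        "user_goal": user_goal,  # new each request
        "loop_count": 0,         # assumption: counter restarts per request
        "reception_output": "",
    }


previous = {
    "messages": [("user", "Find the MIT website"), ("assistant", "Opened it.")],
    "executor_memory": ["search ok", "open_website ok"],
    "loop_count": 2,
}
state = start_request(previous, "Take a screenshot")
print(len(state["messages"]), state["executor_memory"])
```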

    # SerpAPI returns structured JSON
    search_results = searching_tool.results(query)
    # Python extracts the top 4 URLs deterministically
    organic = search_results.get('organic_results', [])
    url_map = {i: result['link'] for i, result in enumerate(organic[:4])}
    # LLM receives: "[URL_MAP]: {0: url1, 1: url2, 2: url3, 3: url4}"
    # LLM returns: url_index: 0 (just a number, never generates URLs)
    # Executor resolves index → actual URL (no LLM generation = no hallucination)

1. Client → POST /not-jarvis/stream
2. FastAPI → workflow.compile(checkpointer=supabase)
3. LangGraph → loads messages from DB (thread_id lookup)
4. TaskPlanner → streams "Searching..." → plans action
5. Executor → streams "✅ Search complete" → executes
6. Loop → repeats until is_complete=True (each node streams updates)
7. Reception → streams final summary
8. SSE → delivers all updates to client in real time
Streaming Pattern: Every node (TaskPlanner, Executor, Reception) can set reception_output, which FastAPI immediately streams via Server-Sent Events. User sees progress as it happens.
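On the server side, the pattern amounts to turning each node update that carries `reception_output` into one SSE frame. The sketch below uses only the standard library; `format_sse` and `stream_updates` are hypothetical names, and the update dictionaries are illustrative. In FastAPI, a generator like this would typically be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`.

```python
from typing import Iterable, Iterator


def format_sse(data: str) -> str:
    """Encode one message as a Server-Sent Events frame."""
    # Each payload line gets its own "data:" field; a blank line ends the event
    body = "".join(f"data: {line}\n" for line in data.split("\n"))
    return body + "\n"


def stream_updates(node_updates: Iterable[dict]) -> Iterator[str]:
    """Yield an SSE frame for every node update that sets reception_output."""
    for update in node_updates:
        text = update.get("reception_output")
        if text:
            yield format_sse(text)


updates = [
    {"reception_output": "Searching for: MIT website..."},
    {"loop_count": 1},  # no progress text, so nothing is streamed
    {"reception_output": "Search complete"},
]
frames = list(stream_updates(updates))
print(frames)
```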
- checkpoints: State snapshots (user_goal, loop_count, route_to)
- checkpoint_writes: Messages & executor_memory (msgpack encoded)
- checkpoint_blobs: Large data overflow
- checkpoint_migrations: Schema versioning
Problem: LLMs hallucinate structured data (URLs, emails, IDs)
Solution: Use Python for extraction, LLMs for semantic decisions
Before: TaskPlanner returned 10 steps, executed only first (9 wasted)
After: Single-step planning, re-plan after each execution
Messages: Persist (conversation context)
Executor Memory: Reset (task independence)
Design trade-off: Conversation continuity vs clean task boundaries
- Windows-Only: Uses Windows start command and webbrowser module
- No Error Recovery: Failed actions don't retry automatically
- URL_MAP Limited to 4 Results: Only top 4 search results available
- No Observability: Missing structured logging, metrics, and tracing (LangSmith, OpenTelemetry)
- Single-User Design: Hardcoded session ID, no multi-user support
- Cross-platform support (macOS, Linux)
- Add observability: LangSmith for LLM tracing, Prometheus for metrics
- Authentication & multi-user support
- Retry logic and error recovery
- Add more tools (file operations, email, calendar)
- Voice input/output integration
- Deploy with Docker + managed PostgreSQL
Apache License 2.0 - See LICENSE file for details
Gokul Sree Chandra
Designing and building AI agents & backend infrastructure
Agentic systems, FastAPI, LangGraph, reliability-first architecture
Built with focus on production-ready patterns: async/await, connection pooling, structured outputs, error handling, and architectural clarity.