Skip to content

itsyasirkhandev/chatwithpage

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Chat with Page ✨

A sophisticated Chrome extension that enables AI-powered conversations about any webpage with advanced content extraction, intelligent routing, smart caching, and a premium glassmorphic UI. Built with LangChain, Google Gemini, and modern web technologies.

Chrome Extension LangChain Gemini License

UI Preview

chrome extension

✨ Key Features

🚀 Core Capabilities

  • Smart Content Extraction - Mozilla Readability + DOMPurify + Turndown pipeline for pristine markdown
  • Auth-Aware - Works on authenticated pages using browser cookies (Notion, Gmail, paywalled sites)
  • Real-Time Streaming - Progressive AI responses with live streaming
  • Persistent Conversations - Tab-based chat history with session storage
  • Dynamic Content - Handles JavaScript-rendered and SPA pages
  • Smart Caching - Instant reload - caches embeddings and summaries per page/URL
  • Intelligent Rate Limit Handling - Auto-detects 429 errors and offers model fallback with one click

🎯 Intelligent Processing

  • 4-Path Adaptive Routing - Automatically selects optimal strategy based on document size
  • Token Optimization - 25-30% reduction through advanced text cleaning
  • Hybrid Summary+Search - Best-of-both-worlds approach for quality + cost efficiency
  • LLM-Driven Auto-Escalation - Seamlessly switches from summary to detailed retrieval when needed
  • Cost Optimization - Up to 87% cost reduction vs naive approaches
  • Processing Cache - No rebuilding on reopen - instant load from cache
  • Markdown-Aware Splitting - Respects document structure (headers, code blocks, lists) for better chunks
  • Dynamic Chunk Sizing - Automatically adjusts chunk size (500-2500) with 10-20% overlap based on document length
  • Model Fallback Chain - Seamless switching between 4 Gemini models when rate limited

🎨 Premium UI/UX

  • Glassmorphic Design - Modern frosted-glass aesthetic with backdrop blur
  • Dark Mode - Seamless theme switching with system detection
  • Toast Notifications - Non-intrusive feedback (success, error, info, warning)
  • Processing Indicators - Real-time status during background operations
  • Mode Badges - Visual indicators (📦 Direct, ⚡ Hybrid, 🔍 RAG)
  • Token Counter - Live token count with cost estimation
  • Document Stats - Word count and reading time display
  • Markdown Rendering - Beautiful formatting with syntax highlighting
  • Keyboard Shortcuts - Cmd/Ctrl+K (new chat), Cmd/Ctrl+L (clear), Escape (close)

🛠️ Tech Stack

Technology Purpose
LangChain LLM orchestration, chains, prompts, and memory management
Google Gemini 2.5 Flash Lightning-fast AI for chat and summarization
Google Embeddings API text-embedding-004 model for semantic search
Mozilla Readability Article extraction (powers Firefox Reader View)
DOMPurify XSS-proof HTML sanitization
Turndown HTML to Markdown conversion
Marked.js Markdown rendering with GFM support
Highlight.js Syntax highlighting for code blocks
Vite Next-gen build tool with HMR
CRXJS Chrome extension dev with Vite
Lucide Beautiful consistent icons

Installation

Prerequisites

Setup

  1. Clone the repository

    git clone <repository-url>
    cd langchain-chrome-extension
  2. Install dependencies

    npm install
  3. Build the extension

    npm run build
  4. Load in Chrome

    • Open chrome://extensions/
    • Enable "Developer mode" (top right toggle)
    • Click "Load unpacked"
    • Select the dist folder

Usage

  1. Navigate to any webpage you want to chat about

  2. Click the extension icon in Chrome's toolbar

  3. Enter your Gemini API key (first time only)

    • Get a free key from Google AI Studio
    • The key is stored locally and never shared
  4. Start chatting!

    • Ask questions about the page content
    • Follow up with related questions
    • The AI remembers your conversation

Example Questions

  • "What is this article about?"
  • "Summarize the main points in 3 bullets"
  • "What did the author say about X?"
  • "Can you explain that technical concept in simpler terms?"
  • "Compare and contrast the arguments presented"
  • "What are the key takeaways?"

UI Features

Visual Indicators:

  • 📦 Direct Mode - Small doc, full content sent
  • Hybrid Mode - Medium doc, summary + search
  • 🔍 RAG Mode - Large doc, retrieval only
  • 🪙 Token Counter - Live count with cost (e.g., "12.5K tokens • $0.009")
  • 📄 Document Stats - Word count and estimated reading time

Notifications:

  • ✅ Green toast - Success (API key saved, ready to chat)
  • ❌ Red toast - Error (rate limit, API issues)
  • ⚠️ Yellow toast - Warning (fallback mode)
  • ℹ️ Blue toast - Info (processing status)

Keyboard Shortcuts:

  • Cmd/Ctrl + K - Clear chat and start new conversation
  • Cmd/Ctrl + L - Clear current input
  • Escape - Close extension popup
  • Enter - Send message
  • Shift + Enter - New line in message

Project Structure

langchain-chrome-extension/
├── src/
│   ├── popup/
│   │   ├── index.html      # Chat UI markup
│   │   ├── main.js         # LangChain integration & chat logic
│   │   └── style.css       # Modern styling
│   ├── content/
│   │   └── content.js      # Page content extraction
│   └── background/
│       └── background.js   # Service worker & history cleanup
├── public/
│   └── icons/              # Extension icons (16, 48, 128px)
├── dist/                   # Built extension (load this in Chrome)
├── manifest.json           # Extension configuration
├── vite.config.js          # Vite build configuration
└── package.json

Development

Commands

Command Description
npm run dev Start development server with hot reload
npm run build Build for production
npm run preview Preview production build

Development Mode

npm run dev

This starts Vite's dev server with hot module replacement. Changes to your code will automatically update the extension.

How It Works

Architecture

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Content       │     │    Popup        │     │   Background    │
│   Script        │────▶│    (Chat UI)    │────▶│   Service       │
│                 │     │                 │     │   Worker        │
│ Readability +   │     │ Smart Router:   │     │ Clears history  │
│ DOMPurify +     │     │ Stuffing/Hybrid │     │ on tab close    │
│ Turndown +      │     │ Summary/RAG     │     │                 │
│ Text Cleaning   │     │ + Gemini API    │     │                 │
└─────────────────┘     └─────────────────┘     └─────────────────┘

Intelligent 4-Path Routing System

The extension automatically chooses the best strategy based on document size:

Document Size Detection:
├─ < 10k tokens   → Path A: Direct Stuffing
│                   ├─ Strategy: Send full content to LLM
│                   ├─ Speed: Instant ⚡
│                   └─ Cost: ~$0.002/query
│
├─ 10k-20k tokens → Path B: Hybrid Summary+Search ⭐
│                   ├─ Strategy: Compact summary (500-1k tokens) + embeddings
│                   ├─ Auto-escalation: Summary first → retrieval if needed
│                   ├─ Speed: 1-2s (summary) or 3-4s (retrieval)
│                   └─ Cost: ~$0.0003/query (summary) or ~$0.001 (retrieval)
│
├─ 20k-30k tokens → Path C: Summary-First with Auto-Escalation
│                   ├─ Strategy: Standard summary (1k-2k tokens) + embeddings
│                   ├─ Auto-escalation: Same as Path B
│                   ├─ Speed: 2-3s (summary) or 4-5s (retrieval)
│                   └─ Cost: ~$0.0004/query average
│
└─ > 30k tokens   → Path D: Pure RAG
                    ├─ Strategy: Chunk + embed + retrieve only
                    ├─ Speed: 3-4s per query
                    └─ Cost: ~$0.001/query

Smart Auto-Escalation (Paths B & C)

When using Hybrid mode, the system intelligently routes questions:

  1. General questions → Answered from summary

    • "What's this article about?"
    • "What's the author's main argument?"
    • "Summarize the key points"
    • Result: Fast, cheap (~$0.0003)
  2. Specific questions → Auto-escalates to retrieval

    • LLM detects need: 🔍 NEEDS_DETAIL: [description]
    • System retrieves relevant chunks
    • Answers with exact details
    • Result: Accurate, still efficient (~$0.001)

User sees: "Searching for [detail]..." then detailed answer

Content Extraction Pipeline

1. Readability (Mozilla)
   └─ Extracts "Reader View" content
   └─ Removes: ads, navigation, sidebars, footers
   
2. DOMPurify
   └─ Sanitizes HTML (XSS protection)
   └─ Removes: scripts, dangerous attributes
   
3. Turndown
   └─ Converts to Markdown
   └─ Preserves: headings, lists, tables, code blocks, links
   
4. Advanced Text Cleaning (15 regex patterns)
   └─ Removes: copyright footers, "related articles", comments sections
   └─ Removes: social buttons, email prompts, excessive whitespace
   └─ Result: 25-30% token reduction

Smart Caching System

Cache Behavior:

First Open (Cold Cache):
  ├─ Extract content from page
  ├─ Generate summary (if needed) ⏱️ 5-15s
  ├─ Create embeddings ⏱️ 10-30s
  ├─ Save to chrome.storage.session
  └─ Total: 15-45 seconds

Reopen Same Page (Warm Cache):
  ├─ Check cache: matching tabId + URL ✓
  ├─ Restore summary (instant)
  ├─ Restore embeddings (instant)
  └─ Total: <1 second ⚡

Different Page or URL Changed:
  ├─ Cache miss (URL mismatch)
  ├─ Process as new page
  └─ Create new cache entry

Cache Storage:

  • Location: chrome.storage.session (temporary, browser session only)
  • Cleared: When browser closes or tab closes completely
  • Size: Embeddings + summary (~100KB - 2MB per page)
  • No disk footprint: All in-memory during session

Benefits:

  • ⚡ 95-99% faster on reopen
  • 💰 Zero API cost for cached pages
  • 🎯 Only process once per page visit
  • 🔒 Privacy-first: Data cleared automatically

LangChain Components Used

  • ChatGoogleGenerativeAI - Gemini chat model wrapper
  • GoogleGenerativeAIEmbeddings - Gemini embeddings for RAG
  • ChatPromptTemplate - Structured prompts with system messages
  • MessagesPlaceholder - Injects conversation history into prompts
  • RunnableSequence - Chains components together
  • StringOutputParser - Parses streaming text output
  • RecursiveCharacterTextSplitter - Markdown-aware splitting with dynamic chunk sizing (500-2500 chars, 10-20% overlap)
  • HumanMessage / AIMessage - Message types for history

Custom Implementation

  • MemoryVectorStore - In-browser embedding storage (no external DB)
  • Cosine Similarity - Custom retrieval using vector similarity
  • Auto-Escalation Logic - Detects LLM signals for detail needs
  • Parallel Processing - Summary + embeddings generated simultaneously

Data Flow

  1. Page Load → Readability extracts clean content → DOMPurify sanitizes → Turndown converts to Markdown → Advanced cleaning
  2. Size Detection → Estimates tokens → Routes to appropriate path (A/B/C/D)
  3. Preprocessing (if needed):
    • Path B/C: Generate summary + embeddings in parallel
    • Path D: Generate embeddings only
  4. User Question → Added to chat history
  5. Smart Routing:
    • Hybrid mode: Try summary → Auto-escalate if needed
    • RAG mode: Retrieve relevant chunks
    • Direct mode: Use full content
  6. AI Response → Streamed to UI, added to history, saved to storage
  7. Tab Close → Background script clears that tab's history

Cost Optimization

Real-World Savings Example

Scenario: 15k token document, 10 queries (8 general + 2 specific)

Approach Cost Breakdown Total
Naive Stuffing 10 × $0.01 $0.10
Pure Summary $0.01 + 9 × $0.0003 $0.013
Hybrid (This Extension) Setup: $0.01 + Summary: $0.0024 + Specific: $0.002 $0.0144
Savings - 85-87%

Cost per Query by Mode

Mode Context Size Cost/Query
Direct Stuffing 10k tokens ~$0.002
Hybrid (Summary) 500-1k tokens ~$0.0003
Hybrid (Retrieval) 4k tokens ~$0.001
Pure RAG 4k tokens ~$0.001

Key Insight: 80% of questions are general → Hybrid mode answers 80% of queries at 1/3 the cost!

Configuration

Changing the Model

Edit src/popup/main.js:

const model = new ChatGoogleGenerativeAI({
  model: 'gemini-2.5-flash-lite',  // Change model here
  apiKey: apiKey,
  streaming: true,
});

Available fast models:

  • gemini-2.5-flash-lite (fastest)
  • gemini-2.5-flash (balanced)
  • gemini-2.0-flash (legacy)

Adjusting History Limit

Edit the history limit in src/popup/main.js:

// Keep last N exchanges (currently 10 exchanges = 20 messages)
if (chatHistory.length > 20) {
  chatHistory = chatHistory.slice(-20);
}

Adjusting Token Thresholds

Change when different routing modes activate in src/popup/main.js:

async function processPageContent(apiKey) {
  const estimatedTokens = markdown.length / 4;
  
  if (estimatedTokens < 10000) {          // Path A: Direct
  } else if (estimatedTokens < 20000) {   // Path B: Hybrid
  } else if (estimatedTokens < 30000) {   // Path C: Hybrid+
  } else {                                // Path D: RAG
  }
}

Customizing RAG Parameters

Adjust retrieval settings:

// In retrieveRelevantChunks() function
const relevantChunks = await retrieveRelevantChunks(
  question, 
  4,      // k: number of chunks to retrieve (increase for more context)
  0.5     // scoreThreshold: minimum similarity (0-1, lower = more results)
);

// In setupRAGPipeline() function
const textSplitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,        // Size of each chunk
  chunkOverlap: 200,      // Overlap between chunks
});

Permissions

Permission Reason
activeTab Access current tab's content
storage Store API key and chat history
scripting Inject content script
tabs Detect tab closure for history cleanup

Advanced Features

Token Cleaning Pipeline

The extension applies aggressive text cleaning to reduce token usage:

Removes:

  • Copyright footers and legal text
  • "Related Articles" / "You May Also Like" sections
  • Comments sections
  • Social media share buttons
  • Email subscription prompts
  • Navigation breadcrumbs
  • Standalone URLs and reference links
  • Excessive whitespace

Result: 25-30% token reduction while preserving actual content

Markdown Preservation

Converting to Markdown helps LLMs understand document structure:

  • Headings (#, ##, ###) → Better section understanding
  • Lists (-, 1.) → Clearer enumeration
  • Tables → Structured data comprehension
  • Code blocks → Syntax preservation

In-Browser Vector Store

Custom implementation with no external dependencies:

  • Storage: Plain JavaScript arrays in memory
  • Embeddings: Gemini text-embedding-004 model
  • Retrieval: Cosine similarity calculation
  • Performance: ~100ms for similarity search across 100 chunks
  • Privacy: Everything stays in your browser

Auto-Escalation Mechanism

The system uses a special prompt pattern to detect when details are needed:

LLM Response: "🔍 NEEDS_DETAIL: specific date from section 3"
                ↓
Extension detects signal
                ↓
Retrieves relevant chunks
                ↓
Re-queries with detailed context
                ↓
Returns specific answer

This happens automatically without user intervention.

🔧 Troubleshooting

Extension icon not showing

  • Refresh the extension in chrome://extensions/
  • Remove and re-add the extension
  • Make sure you loaded the dist folder, not the root folder

"Could not load page content"

  • Refresh the webpage and reopen the extension
  • Some pages cannot be accessed: chrome://, chrome-extension://, edge://, about: URLs
  • Very dynamic SPAs may need a moment to fully render before extraction
  • Check browser console (F12) for detailed error messages

API Errors & Rate Limits

"🚫 Rate Limit Reached - Switch Model to Continue"

  • Extension detects rate limits instantly and offers to switch to next model
  • Model fallback chain: gemini-2.5-flash-lite → gemini-2.5-flash → gemini-2.0-flash-lite → gemini-2.0-flash
  • Click "Switch Model & Continue" to automatically retry with next model
  • If all models exhausted, wait 60 seconds and try again
  • Free tier limits: 15 requests/minute per model, 1,500 requests/day
  • Check quota: Google AI Studio

"🔧 Gemini API is temporarily unavailable (503)"

  • Google's servers are overloaded (temporary)
  • Wait 10-30 seconds and try again
  • Usually resolves within seconds

"❌ Invalid API key"

  • Verify your API key at Google AI Studio
  • Generate a new key if needed
  • Make sure you copied the entire key

Performance & Processing

Slow initial processing

  • Normal: Large documents (15k+ tokens) take 15-45 seconds to process
  • Processing happens once - subsequent reopens are instant (<1s) thanks to caching
  • Progress shown: Watch the processing status indicator for updates
  • "Creating Summary", "Building Search Index", etc.

Cache not working

  • Cache is per-tab + URL - changing URL creates new cache
  • Cache cleared when browser closes or tab closes completely
  • Check console for: "✓ Loaded processing cache from [timestamp]"

Theme not persisting

  • Theme preference is stored in localStorage
  • Clearing browser data will reset to system preference
  • Manual toggle always overrides system preference

UI/UX Issues

Toast notifications not showing

  • Check if browser notifications are blocked
  • Look for toasts in top-right corner of popup
  • They auto-dismiss after 3-5 seconds

Keyboard shortcuts not working

  • Make sure extension popup is focused
  • Shortcuts: Cmd/Ctrl+K, Cmd/Ctrl+L, Escape
  • Refresh extension if shortcuts stop responding

Mode badge showing wrong mode

  • This updates after processing completes
  • Badge reflects the actual strategy being used
  • 📦 Direct, ⚡ Hybrid, 🔍 RAG

Console Debugging

Open DevTools (F12) on the popup for detailed logs:

  • Processing status: "🔄 No cache found - processing document"
  • Cache hits: "📦 Using cached processing"
  • API calls: "Initializing LangChain in [mode] mode"
  • Errors: Full stack traces with context

License

MIT License - feel free to use this project for learning or as a base for your own extensions.

Acknowledgments

  • LangChain for the excellent JS/TS library and RAG implementation
  • Google AI for Gemini API (chat and embeddings)
  • Mozilla for Readability - the gold standard in content extraction
  • DOMPurify for bulletproof HTML sanitization
  • Turndown for reliable HTML-to-Markdown conversion
  • CRXJS for making Chrome extension development with Vite seamless

🏆 Technical Highlights

This extension demonstrates several advanced RAG (Retrieval-Augmented Generation) techniques:

  1. Intelligent 4-Path Routing - Automatically chooses between stuffing, hybrid, and RAG based on document size
  2. Hybrid Summary+Search - Novel approach combining summaries with retrieval for optimal cost/quality
  3. LLM-Driven Auto-Escalation - Intelligent detection of when to switch from summary to detailed retrieval
  4. Smart Caching System - Session-based cache for instant reload without API calls (95-99% faster)
  5. Token Optimization - Multi-stage cleaning pipeline (Readability → DOMPurify → Turndown → 15 regex patterns)
  6. In-Browser Vector Store - Custom cosine similarity implementation requiring no external database
  7. Parallel Processing - Summary and embeddings generated simultaneously for 2x faster setup
  8. Auth-Aware Extraction - Uses browser cookies for authenticated pages (Notion, Gmail, paywalled sites)
  9. Premium UI/UX - Glassmorphic design, dark mode, toast notifications, processing indicators
  10. Markdown Rendering - Beautiful formatting with syntax highlighting and copy buttons
  11. Markdown-Aware Splitting - Respects document structure across all modes (headers, code blocks, lists, tables)
  12. Dynamic Chunk Sizing - Automatically adjusts chunk size and overlap (10-20%) based on document length
  13. Smart Rate Limit Detection - 3-guard system with cooldown, console clearing, and instant feedback
  14. Model Fallback Chain - Automatic model switching with one-click retry when rate limited

Performance: 95-99% faster on reopen thanks to intelligent caching Cost efficiency: Up to 87% reduction compared to naive approaches while maintaining quality User Experience: Professional, polished interface with real-time feedback


Built with ❤️ for efficient, intelligent web content interaction.