Chat with Page ✨

A sophisticated Chrome extension that enables AI-powered conversations about any webpage with advanced content extraction, intelligent routing, smart caching, and a premium glassmorphic UI. Built with LangChain, Google Gemini, and modern web technologies.

UI Preview

✨ Key Features

🚀 Core Capabilities

Smart Content Extraction - Mozilla Readability + DOMPurify + Turndown pipeline for pristine markdown
Auth-Aware - Works on authenticated pages using browser cookies (Notion, Gmail, paywalled sites)
Real-Time Streaming - Progressive AI responses with live streaming
Persistent Conversations - Tab-based chat history with session storage
Dynamic Content - Handles JavaScript-rendered and SPA pages
Smart Caching - Instant reload - caches embeddings and summaries per page/URL
Intelligent Rate Limit Handling - Auto-detects 429 errors and offers model fallback with one click

🎯 Intelligent Processing

4-Path Adaptive Routing - Automatically selects optimal strategy based on document size
Token Optimization - 25-30% reduction through advanced text cleaning
Hybrid Summary+Search - Best-of-both-worlds approach for quality + cost efficiency
LLM-Driven Auto-Escalation - Seamlessly switches from summary to detailed retrieval when needed
Cost Optimization - Up to 87% cost reduction vs naive approaches
Processing Cache - No rebuilding on reopen - instant load from cache
Markdown-Aware Splitting - Respects document structure (headers, code blocks, lists) for better chunks
Dynamic Chunk Sizing - Automatically adjusts chunk size (500-2500) with 10-20% overlap based on document length
Model Fallback Chain - Seamless switching between 4 Gemini models when rate limited

🎨 Premium UI/UX

Glassmorphic Design - Modern frosted-glass aesthetic with backdrop blur
Dark Mode - Seamless theme switching with system detection
Toast Notifications - Non-intrusive feedback (success, error, info, warning)
Processing Indicators - Real-time status during background operations
Mode Badges - Visual indicators (📦 Direct, ⚡ Hybrid, 🔍 RAG)
Token Counter - Live token count with cost estimation
Document Stats - Word count and reading time display
Markdown Rendering - Beautiful formatting with syntax highlighting
Keyboard Shortcuts - Cmd/Ctrl+K (new chat), Cmd/Ctrl+L (clear), Escape (close)

🛠️ Tech Stack

Technology	Purpose
LangChain	LLM orchestration, chains, prompts, and memory management
Google Gemini 2.5 Flash	Lightning-fast AI for chat and summarization
Google Embeddings API	text-embedding-004 model for semantic search
Mozilla Readability	Article extraction (powers Firefox Reader View)
DOMPurify	XSS-proof HTML sanitization
Turndown	HTML to Markdown conversion
Marked.js	Markdown rendering with GFM support
Highlight.js	Syntax highlighting for code blocks
Vite	Next-gen build tool with HMR
CRXJS	Chrome extension dev with Vite
Lucide	Beautiful consistent icons

Installation

Prerequisites

Node.js (v18 or higher)
Google AI Studio API Key

Setup

Clone the repository

git clone <repository-url>
cd langchain-chrome-extension

Install dependencies
```
npm install
```
Build the extension
```
npm run build
```
Load in Chrome
- Open chrome://extensions/
- Enable "Developer mode" (top right toggle)
- Click "Load unpacked"
- Select the dist folder

Usage

Navigate to any webpage you want to chat about
Click the extension icon in Chrome's toolbar
Enter your Gemini API key (first time only)
- Get a free key from Google AI Studio
- The key is stored locally and never shared
Start chatting!
- Ask questions about the page content
- Follow up with related questions
- The AI remembers your conversation

Example Questions

"What is this article about?"
"Summarize the main points in 3 bullets"
"What did the author say about X?"
"Can you explain that technical concept in simpler terms?"
"Compare and contrast the arguments presented"
"What are the key takeaways?"

UI Features

Visual Indicators:

📦 Direct Mode - Small doc, full content sent
⚡ Hybrid Mode - Medium doc, summary + search
🔍 RAG Mode - Large doc, retrieval only
🪙 Token Counter - Live count with cost (e.g., "12.5K tokens • $0.009")
📄 Document Stats - Word count and estimated reading time

Notifications:

✅ Green toast - Success (API key saved, ready to chat)
❌ Red toast - Error (rate limit, API issues)
⚠️ Yellow toast - Warning (fallback mode)
ℹ️ Blue toast - Info (processing status)

Keyboard Shortcuts:

Cmd/Ctrl + K - Clear chat and start new conversation
Cmd/Ctrl + L - Clear current input
Escape - Close extension popup
Enter - Send message
Shift + Enter - New line in message

Project Structure

langchain-chrome-extension/
├── src/
│   ├── popup/
│   │   ├── index.html      # Chat UI markup
│   │   ├── main.js         # LangChain integration & chat logic
│   │   └── style.css       # Modern styling
│   ├── content/
│   │   └── content.js      # Page content extraction
│   └── background/
│       └── background.js   # Service worker & history cleanup
├── public/
│   └── icons/              # Extension icons (16, 48, 128px)
├── dist/                   # Built extension (load this in Chrome)
├── manifest.json           # Extension configuration
├── vite.config.js          # Vite build configuration
└── package.json

Development

Commands

Command	Description
`npm run dev`	Start development server with hot reload
`npm run build`	Build for production
`npm run preview`	Preview production build

Development Mode

npm run dev

This starts Vite's dev server with hot module replacement. Changes to your code will automatically update the extension.

How It Works

Architecture

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   Content       │     │    Popup        │     │   Background    │
│   Script        │────▶│    (Chat UI)    │────▶│   Service       │
│                 │     │                 │     │   Worker        │
│ Readability +   │     │ Smart Router:   │     │ Clears history  │
│ DOMPurify +     │     │ Stuffing/Hybrid │     │ on tab close    │
│ Turndown +      │     │ Summary/RAG     │     │                 │
│ Text Cleaning   │     │ + Gemini API    │     │                 │
└─────────────────┘     └─────────────────┘     └─────────────────┘

Intelligent 4-Path Routing System

The extension automatically chooses the best strategy based on document size:

Document Size Detection:
├─ < 10k tokens   → Path A: Direct Stuffing
│                   ├─ Strategy: Send full content to LLM
│                   ├─ Speed: Instant ⚡
│                   └─ Cost: ~$0.002/query
│
├─ 10k-20k tokens → Path B: Hybrid Summary+Search ⭐
│                   ├─ Strategy: Compact summary (500-1k tokens) + embeddings
│                   ├─ Auto-escalation: Summary first → retrieval if needed
│                   ├─ Speed: 1-2s (summary) or 3-4s (retrieval)
│                   └─ Cost: ~$0.0003/query (summary) or ~$0.001 (retrieval)
│
├─ 20k-30k tokens → Path C: Summary-First with Auto-Escalation
│                   ├─ Strategy: Standard summary (1k-2k tokens) + embeddings
│                   ├─ Auto-escalation: Same as Path B
│                   ├─ Speed: 2-3s (summary) or 4-5s (retrieval)
│                   └─ Cost: ~$0.0004/query average
│
└─ > 30k tokens   → Path D: Pure RAG
                    ├─ Strategy: Chunk + embed + retrieve only
                    ├─ Speed: 3-4s per query
                    └─ Cost: ~$0.001/query

Smart Auto-Escalation (Paths B & C)

When using Hybrid mode, the system intelligently routes questions:

General questions → Answered from summary
- "What's this article about?"
- "What's the author's main argument?"
- "Summarize the key points"
- Result: Fast, cheap (~$0.0003)
Specific questions → Auto-escalates to retrieval
- LLM detects need: 🔍 NEEDS_DETAIL: [description]
- System retrieves relevant chunks
- Answers with exact details
- Result: Accurate, still efficient (~$0.001)

User sees: "Searching for [detail]..." then detailed answer

Content Extraction Pipeline

1. Readability (Mozilla)
   └─ Extracts "Reader View" content
   └─ Removes: ads, navigation, sidebars, footers
   
2. DOMPurify
   └─ Sanitizes HTML (XSS protection)
   └─ Removes: scripts, dangerous attributes
   
3. Turndown
   └─ Converts to Markdown
   └─ Preserves: headings, lists, tables, code blocks, links
   
4. Advanced Text Cleaning (15 regex patterns)
   └─ Removes: copyright footers, "related articles", comments sections
   └─ Removes: social buttons, email prompts, excessive whitespace
   └─ Result: 25-30% token reduction

Smart Caching System

Cache Behavior:

First Open (Cold Cache):
  ├─ Extract content from page
  ├─ Generate summary (if needed) ⏱️ 5-15s
  ├─ Create embeddings ⏱️ 10-30s
  ├─ Save to chrome.storage.session
  └─ Total: 15-45 seconds

Reopen Same Page (Warm Cache):
  ├─ Check cache: matching tabId + URL ✓
  ├─ Restore summary (instant)
  ├─ Restore embeddings (instant)
  └─ Total: <1 second ⚡

Different Page or URL Changed:
  ├─ Cache miss (URL mismatch)
  ├─ Process as new page
  └─ Create new cache entry

Cache Storage:

Location: chrome.storage.session (temporary, browser session only)
Cleared: When browser closes or tab closes completely
Size: Embeddings + summary (~100KB - 2MB per page)
No disk footprint: All in-memory during session

Benefits:

⚡ 95-99% faster on reopen
💰 Zero API cost for cached pages
🎯 Only process once per page visit
🔒 Privacy-first: Data cleared automatically

LangChain Components Used

ChatGoogleGenerativeAI - Gemini chat model wrapper
GoogleGenerativeAIEmbeddings - Gemini embeddings for RAG
ChatPromptTemplate - Structured prompts with system messages
MessagesPlaceholder - Injects conversation history into prompts
RunnableSequence - Chains components together
StringOutputParser - Parses streaming text output
RecursiveCharacterTextSplitter - Markdown-aware splitting with dynamic chunk sizing (500-2500 chars, 10-20% overlap)
HumanMessage / AIMessage - Message types for history

Custom Implementation

MemoryVectorStore - In-browser embedding storage (no external DB)
Cosine Similarity - Custom retrieval using vector similarity
Auto-Escalation Logic - Detects LLM signals for detail needs
Parallel Processing - Summary + embeddings generated simultaneously

Data Flow

Page Load → Readability extracts clean content → DOMPurify sanitizes → Turndown converts to Markdown → Advanced cleaning
Size Detection → Estimates tokens → Routes to appropriate path (A/B/C/D)
Preprocessing (if needed):
- Path B/C: Generate summary + embeddings in parallel
- Path D: Generate embeddings only
User Question → Added to chat history
Smart Routing:
- Hybrid mode: Try summary → Auto-escalate if needed
- RAG mode: Retrieve relevant chunks
- Direct mode: Use full content
AI Response → Streamed to UI, added to history, saved to storage
Tab Close → Background script clears that tab's history

Cost Optimization

Real-World Savings Example

Scenario: 15k token document, 10 queries (8 general + 2 specific)

Approach	Cost Breakdown	Total
Naive Stuffing	10 × $0.01	$0.10
Pure Summary	$0.01 + 9 × $0.0003	$0.013
Hybrid (This Extension) ⭐	Setup: $0.01 + Summary: $0.0024 + Specific: $0.002	$0.0144
Savings	-	85-87%

Cost per Query by Mode

Mode	Context Size	Cost/Query
Direct Stuffing	10k tokens	~$0.002
Hybrid (Summary)	500-1k tokens	~$0.0003
Hybrid (Retrieval)	4k tokens	~$0.001
Pure RAG	4k tokens	~$0.001

Key Insight: 80% of questions are general → Hybrid mode answers 80% of queries at 1/3 the cost!

Configuration

Changing the Model

Edit src/popup/main.js:

const model = new ChatGoogleGenerativeAI({
  model: 'gemini-2.5-flash-lite',  // Change model here
  apiKey: apiKey,
  streaming: true,
});

Available fast models:

gemini-2.5-flash-lite (fastest)
gemini-2.5-flash (balanced)
gemini-2.0-flash (legacy)

Adjusting History Limit

Edit the history limit in src/popup/main.js:

// Keep last N exchanges (currently 10 exchanges = 20 messages)
if (chatHistory.length > 20) {
  chatHistory = chatHistory.slice(-20);
}

Adjusting Token Thresholds

Change when different routing modes activate in src/popup/main.js:

async function processPageContent(apiKey) {
  const estimatedTokens = markdown.length / 4;
  
  if (estimatedTokens < 10000) {          // Path A: Direct
  } else if (estimatedTokens < 20000) {   // Path B: Hybrid
  } else if (estimatedTokens < 30000) {   // Path C: Hybrid+
  } else {                                // Path D: RAG
  }
}

Customizing RAG Parameters

Adjust retrieval settings:

// In retrieveRelevantChunks() function
const relevantChunks = await retrieveRelevantChunks(
  question, 
  4,      // k: number of chunks to retrieve (increase for more context)
  0.5     // scoreThreshold: minimum similarity (0-1, lower = more results)
);

// In setupRAGPipeline() function
const textSplitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,        // Size of each chunk
  chunkOverlap: 200,      // Overlap between chunks
});

Permissions

Permission	Reason
`activeTab`	Access current tab's content
`storage`	Store API key and chat history
`scripting`	Inject content script
`tabs`	Detect tab closure for history cleanup

Advanced Features

Token Cleaning Pipeline

The extension applies aggressive text cleaning to reduce token usage:

Removes:

Copyright footers and legal text
"Related Articles" / "You May Also Like" sections
Comments sections
Social media share buttons
Email subscription prompts
Navigation breadcrumbs
Standalone URLs and reference links
Excessive whitespace

Result: 25-30% token reduction while preserving actual content

Markdown Preservation

Converting to Markdown helps LLMs understand document structure:

Headings (#, ##, ###) → Better section understanding
Lists (-, 1.) → Clearer enumeration
Tables → Structured data comprehension
Code blocks → Syntax preservation

In-Browser Vector Store

Custom implementation with no external dependencies:

Storage: Plain JavaScript arrays in memory
Embeddings: Gemini text-embedding-004 model
Retrieval: Cosine similarity calculation
Performance: ~100ms for similarity search across 100 chunks
Privacy: Everything stays in your browser

Auto-Escalation Mechanism

The system uses a special prompt pattern to detect when details are needed:

LLM Response: "🔍 NEEDS_DETAIL: specific date from section 3"
                ↓
Extension detects signal
                ↓
Retrieves relevant chunks
                ↓
Re-queries with detailed context
                ↓
Returns specific answer

This happens automatically without user intervention.

🔧 Troubleshooting

Extension icon not showing

Refresh the extension in chrome://extensions/
Remove and re-add the extension
Make sure you loaded the dist folder, not the root folder

"Could not load page content"

Refresh the webpage and reopen the extension
Some pages cannot be accessed: chrome://, chrome-extension://, edge://, about: URLs
Very dynamic SPAs may need a moment to fully render before extraction
Check browser console (F12) for detailed error messages

API Errors & Rate Limits

"🚫 Rate Limit Reached - Switch Model to Continue"

Extension detects rate limits instantly and offers to switch to next model
Model fallback chain: gemini-2.5-flash-lite → gemini-2.5-flash → gemini-2.0-flash-lite → gemini-2.0-flash
Click "Switch Model & Continue" to automatically retry with next model
If all models exhausted, wait 60 seconds and try again
Free tier limits: 15 requests/minute per model, 1,500 requests/day
Check quota: Google AI Studio

"🔧 Gemini API is temporarily unavailable (503)"

Google's servers are overloaded (temporary)
Wait 10-30 seconds and try again
Usually resolves within seconds

"❌ Invalid API key"

Verify your API key at Google AI Studio
Generate a new key if needed
Make sure you copied the entire key

Performance & Processing

Slow initial processing

Normal: Large documents (15k+ tokens) take 15-45 seconds to process
Processing happens once - subsequent reopens are instant (<1s) thanks to caching
Progress shown: Watch the processing status indicator for updates
"Creating Summary", "Building Search Index", etc.

Cache not working

Cache is per-tab + URL - changing URL creates new cache
Cache cleared when browser closes or tab closes completely
Check console for: "✓ Loaded processing cache from [timestamp]"

Theme not persisting

Theme preference is stored in localStorage
Clearing browser data will reset to system preference
Manual toggle always overrides system preference

UI/UX Issues

Toast notifications not showing

Check if browser notifications are blocked
Look for toasts in top-right corner of popup
They auto-dismiss after 3-5 seconds

Keyboard shortcuts not working

Make sure extension popup is focused
Shortcuts: Cmd/Ctrl+K, Cmd/Ctrl+L, Escape
Refresh extension if shortcuts stop responding

Mode badge showing wrong mode

This updates after processing completes
Badge reflects the actual strategy being used
📦 Direct, ⚡ Hybrid, 🔍 RAG

Console Debugging

Open DevTools (F12) on the popup for detailed logs:

Processing status: "🔄 No cache found - processing document"
Cache hits: "📦 Using cached processing"
API calls: "Initializing LangChain in [mode] mode"
Errors: Full stack traces with context

License

MIT License - feel free to use this project for learning or as a base for your own extensions.

Acknowledgments

LangChain for the excellent JS/TS library and RAG implementation
Google AI for Gemini API (chat and embeddings)
Mozilla for Readability - the gold standard in content extraction
DOMPurify for bulletproof HTML sanitization
Turndown for reliable HTML-to-Markdown conversion
CRXJS for making Chrome extension development with Vite seamless

🏆 Technical Highlights

This extension demonstrates several advanced RAG (Retrieval-Augmented Generation) techniques:

Intelligent 4-Path Routing - Automatically chooses between stuffing, hybrid, and RAG based on document size
Hybrid Summary+Search - Novel approach combining summaries with retrieval for optimal cost/quality
LLM-Driven Auto-Escalation - Intelligent detection of when to switch from summary to detailed retrieval
Smart Caching System - Session-based cache for instant reload without API calls (95-99% faster)
Token Optimization - Multi-stage cleaning pipeline (Readability → DOMPurify → Turndown → 15 regex patterns)
In-Browser Vector Store - Custom cosine similarity implementation requiring no external database
Parallel Processing - Summary and embeddings generated simultaneously for 2x faster setup
Auth-Aware Extraction - Uses browser cookies for authenticated pages (Notion, Gmail, paywalled sites)
Premium UI/UX - Glassmorphic design, dark mode, toast notifications, processing indicators
Markdown Rendering - Beautiful formatting with syntax highlighting and copy buttons
Markdown-Aware Splitting - Respects document structure across all modes (headers, code blocks, lists, tables)
Dynamic Chunk Sizing - Automatically adjusts chunk size and overlap (10-20%) based on document length
Smart Rate Limit Detection - 3-guard system with cooldown, console clearing, and instant feedback
Model Fallback Chain - Automatic model switching with one-click retry when rate limited

Performance: 95-99% faster on reopen thanks to intelligent caching Cost efficiency: Up to 87% reduction compared to naive approaches while maintaining quality User Experience: Professional, polished interface with real-time feedback

Built with ❤️ for efficient, intelligent web content interaction.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
public/icons		public/icons
scripts		scripts
src		src
.gitignore		.gitignore
README.md		README.md
manifest.json		manifest.json
package-lock.json		package-lock.json
package.json		package.json
vite.config.js		vite.config.js

Folders and files

Latest commit

History

Repository files navigation