Ensure both services are running:
# Terminal 1: Node.js Backend
npm install # if not done
node server.js
# Terminal 2: Python RAG Service
pip install -r requirements.txt # if needed
python rag-service/main.py

What it tests: Upload different PDFs and verify that only the current PDF is used.
# 1. Open browser and go to http://localhost:3000
# (Or use curl if preferred - see advanced tests below)
# 2. Upload Coursera Certificate
# - Use the upload form
# - Select a Coursera certificate PDF
# 3. Ask questions about Coursera
# Q: "What course is mentioned?"
# Expected: Coursera course name (e.g., "IBM Professional Certificate")
# ✓ PASS if you get Coursera-specific info
# 4. Upload NPTEL Certificate (Different PDF)
# - Use the upload form again
# - Select the NPTEL certificate PDF
# - Observe: Chat history should be cleared
# 5. Ask questions about NPTEL
# Q: "What platform issued this certificate?"
# Expected: "NPTEL"
# ✗ FAIL if it says "Coursera" or mentions old PDF
# ✓ PASS if it ONLY uses NPTEL information

What it tests: Previous conversation context doesn't affect a new PDF.
# 1. Upload PDF A (e.g., Coursera)
# 2. Ask: "Who issued this certificate?"
# Answer: "IBM / Coursera"
# 3. Ask: "Tell me more about it"
# Answer: References previous context (normal - same PDF)
# 4. Upload PDF B (e.g., NPTEL)
# 5. Ask: "Who issued this certificate?"
# Expected: "NPTEL"
# ✗ FAIL if it references Coursera or PDF A
# ✓ PASS if it answers ONLY about PDF B

What it tests: Backend state is properly tracked.
# Terminal/PowerShell:
# Before uploading any PDF
curl http://localhost:4000/pdf-status
# Expected response:
# {
# "backend": {
# "pdf_loaded": false,
# "session_id": null,
# "upload_time": null
# },
# "frontend": { ... }
# }
# After uploading a PDF
curl http://localhost:4000/pdf-status
# Expected response:
# {
# "backend": {
# "pdf_loaded": true,
# "session_id": "some-uuid-here",
# "upload_time": "2024-02-24T..."
# },
# "frontend": { ... }
# }
# ✓ PASS if session_id changes between uploads

# Upload a PDF (the -Form parameter requires PowerShell 6.1 or later)
$file = "C:\path\to\your\pdf.pdf"
$form = @{
file = Get-Item $file
}
Invoke-WebRequest -Uri "http://localhost:4000/upload" -Form $form -Method Post

$body = @{
question = "What PDF is this?"
} | ConvertTo-Json
Invoke-WebRequest -Uri "http://localhost:4000/ask" `
-Method Post `
-ContentType "application/json" `
-Body $body

Invoke-WebRequest -Uri "http://localhost:4000/pdf-status" -Method Get | Select-Object -ExpandProperty Content

Invoke-WebRequest -Uri "http://localhost:4000/clear-history" -Method Post

| Scenario | Before Fix | After Fix |
|---|---|---|
| Upload PDF A, ask question | ✓ Works | ✓ Works |
| Upload PDF B immediately | ✗ Leaks PDF A context | ✓ Clears PDF A context |
| Ask about PDF B | ✗ Mixes with PDF A info | ✓ Only PDF B info |
| Session status | ✗ Not available | ✓ Available via /pdf-status (Node) and /status (Python) |
| Multiple rapid uploads | ✗ Race conditions | ✓ Thread-safe |
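
The "Only PDF B info" row can be checked mechanically instead of by eye. The following is a minimal sketch, assuming the answer text has already been retrieved as a string; the function name `find_leaked_terms` and the term list are hypothetical and not part of the project.

```python
def find_leaked_terms(answer: str, previous_pdf_terms: list[str]) -> list[str]:
    """Return the terms tied to the previous PDF that appear in the answer.

    Matching is a case-insensitive substring check, so it is deliberately crude.
    """
    lowered = answer.lower()
    return [term for term in previous_pdf_terms if term.lower() in lowered]


# Example: after uploading the NPTEL PDF, the answer should not mention Coursera or IBM.
leaks = find_leaked_terms("This certificate was issued by NPTEL.", ["Coursera", "IBM"])
print("PASS" if not leaks else f"FAIL: answer mentions {leaks}")
```

Because this is plain substring matching, pick terms that are unlikely to occur in the new PDF by coincidence.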
Solution:
- Stop both services (Ctrl+C in terminals)
- Do NOT delete the uploads folder (file paths are used)
- Start services again
- Test again
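
After a restart, it can help to confirm that both services respond before testing again. The sketch below polls the two status URLs used elsewhere in this guide; the helper names (`fetch_json`, `wait_for_services`) and the retry settings are assumptions for illustration, not part of the project.

```python
import json
import time
import urllib.request
from typing import Callable

# URLs taken from this guide; adjust them if your ports differ.
SERVICES = {
    "node": "http://localhost:4000/pdf-status",
    "python": "http://localhost:5000/status",
}


def fetch_json(url: str) -> dict:
    """GET a URL and parse the body as JSON."""
    with urllib.request.urlopen(url, timeout=3) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_services(
    fetch: Callable[[str], dict] = fetch_json,
    attempts: int = 10,
    delay: float = 1.0,
) -> dict[str, bool]:
    """Poll each service until it answers with JSON or the attempts run out."""
    ready = {name: False for name in SERVICES}
    for attempt in range(attempts):
        for name, url in SERVICES.items():
            if ready[name]:
                continue
            try:
                fetch(url)
                ready[name] = True
            except (OSError, ValueError):
                # OSError covers connection failures; ValueError covers invalid JSON.
                pass
        if all(ready.values()):
            break
        if attempt < attempts - 1:
            time.sleep(delay)
    return ready


# Usage (requires both services to be running):
#   print(wait_for_services())
```

The `fetch` parameter is injectable so the polling logic can be exercised without a live server.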
This is correct! It means the fix is working.
- Ensure PDF was uploaded successfully
- Check browser console for errors
- Check terminal output for Python errors
Check:
# In Python terminal, should see:
# Uvicorn running on http://0.0.0.0:5000
# Test connectivity:
curl http://localhost:5000/status
# Should return JSON with PDF status

Check:
# In Node terminal, should see:
# Backend running on http://localhost:4000
# Test connectivity:
curl http://localhost:4000/pdf-status
# Should return JSON with status

✓ All tests pass when:
- Uploading Coursera PDF returns Coursera-specific answers
- Uploading NPTEL PDF afterwards returns ONLY NPTEL-specific answers
- Coursera context is NOT mentioned when asking about NPTEL
- /pdf-status shows a different session_id for each upload
- Chat history is empty after uploading a new PDF
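
The session_id criterion can be verified by comparing two /pdf-status responses, one captured before an upload and one after. This is a minimal sketch that assumes the response shape shown in the expected output earlier in this guide; the function names are hypothetical.

```python
from typing import Optional


def session_id_of(status: dict) -> Optional[str]:
    """Extract backend.session_id from a /pdf-status response, if present."""
    return status.get("backend", {}).get("session_id")


def upload_rotated_session(before: dict, after: dict) -> bool:
    """True if `after` reports a loaded PDF under a session ID that differs from `before`."""
    new_id = session_id_of(after)
    loaded = after.get("backend", {}).get("pdf_loaded") is True
    return loaded and new_id is not None and new_id != session_id_of(before)


# Sample payloads copied from the expected responses in this guide.
before = {"backend": {"pdf_loaded": False, "session_id": None, "upload_time": None}}
after = {"backend": {"pdf_loaded": True, "session_id": "some-uuid-here", "upload_time": "2024-02-24T..."}}
print(upload_rotated_session(before, after))  # True
```

The same function works for a second upload: pass the status captured after PDF A as `before` and the status captured after PDF B as `after`.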
Node.js (server.js):
- Clears chat history on new PDF upload
- Calls the new /reset endpoint on the Python service
- Stores the PDF session ID in the browser session
Python (main.py):
- Generates unique session ID for each PDF
- Uses thread-safe locks for state management
- Validates session before answering questions
- New /reset endpoint explicitly clears state
- New /status endpoint reports current state
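
The Python-side behavior listed above (a fresh session ID per PDF, lock-protected state, session validation, reset, and status) could look roughly like the sketch below. It is not the code in main.py: the class name `PdfSessionState` and its methods are hypothetical, and the vectorstore is treated as an opaque object.

```python
import threading
import uuid
from datetime import datetime, timezone
from typing import Optional


class PdfSessionState:
    """In-memory state for the currently loaded PDF, guarded by a lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._vectorstore: Optional[object] = None
        self._session_id: Optional[str] = None
        self._upload_time: Optional[str] = None

    def load(self, vectorstore: object) -> str:
        """Replace the current PDF state and return the new session ID."""
        with self._lock:
            self._vectorstore = vectorstore
            self._session_id = str(uuid.uuid4())
            self._upload_time = datetime.now(timezone.utc).isoformat()
            return self._session_id

    def reset(self) -> None:
        """Clear all state, as a /reset handler would."""
        with self._lock:
            self._vectorstore = None
            self._session_id = None
            self._upload_time = None

    def is_current(self, session_id: Optional[str]) -> bool:
        """Validate that a caller's session ID matches the loaded PDF."""
        with self._lock:
            return self._session_id is not None and session_id == self._session_id

    def status(self) -> dict:
        """Report state in the shape of the "backend" object in /pdf-status."""
        with self._lock:
            return {
                "pdf_loaded": self._vectorstore is not None,
                "session_id": self._session_id,
                "upload_time": self._upload_time,
            }


state = PdfSessionState()
first = state.load(object())   # stand-in for PDF A's vectorstore
second = state.load(object())  # stand-in for PDF B's vectorstore
print(state.is_current(first), state.is_current(second))  # False True
```

Taking the lock in every method keeps the session ID and vectorstore consistent with each other when uploads and questions arrive concurrently.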
No database changes needed. Everything is in-memory with proper cleanup.
- No noticeable performance impact
- Actually improves memory usage (old vectorstores are garbage collected)
- PDF processing speed unchanged
- Question answering speed unchanged
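
The memory claim rests on the old vectorstore becoming unreachable once a new upload replaces it. The sketch below illustrates that with a weak reference; `FakeVectorStore` and `RagState` are stand-ins invented for this example, and the immediate reclamation shown assumes CPython's reference counting.

```python
import gc
import weakref


class FakeVectorStore:
    """Stand-in for the real vectorstore so the example is self-contained."""

    def __init__(self, source: str) -> None:
        self.source = source


class RagState:
    """Holds the only strong reference to the current vectorstore."""

    def __init__(self) -> None:
        self.vectorstore = None


state = RagState()
state.vectorstore = FakeVectorStore("pdf-a")
old_store = weakref.ref(state.vectorstore)  # a weak reference does not keep PDF A's store alive

state.vectorstore = FakeVectorStore("pdf-b")  # the new upload replaces the only strong reference
gc.collect()  # not strictly needed in CPython; makes the intent explicit

print(old_store() is None)  # True
```

If anything else still holds a reference to the old store (for example a cached chain or retriever), it will not be reclaimed, so that is worth checking if memory grows across uploads.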
Ready to test? Start with Test 1 (5 minutes) and you'll immediately see if the fix works!