MarouaHattab/Log-Analysis-RAG


Log Analysis RAG System

Log Analyzer Logo

Problem Overview

Across the large-scale LLM ecosystem, many companies now use AI to analyze and debug their log systems, mainly because log data is massive, complex, and difficult to investigate manually. Software engineering teams, large enterprises, and public or government organizations also face strict data-privacy requirements, which make self-hosted or otherwise controlled AI-based log analysis systems essential.

  • LLM adoption: 98% of organizations surveyed are adopting or have adopted LLM infrastructure, indicating near‑universal usage in technical environments.
  • Security automation: roughly 49% of cybersecurity teams are expected to use LLMs to automate threat detection by 2026.
  • Modern log analysis: ~55–70% of medium to large enterprises have or plan to adopt enhanced log analysis systems (with or without AI/LLMs) to support monitoring, reliability engineering, and compliance.

This project provides a self-hosted, privacy‑aware LLM‑powered Log Analyzer built on Retrieval‑Augmented Generation (RAG) to turn raw logs into actionable insights.

High‑Level System Workflow

Everything begins when the user submits log files (application logs, server logs, or system logs) to the platform.

  1. Log Ingestion

    • Users upload log files through the FastAPI backend.
    • The API receives the upload request and immediately delegates heavy processing tasks to Celery workers through RabbitMQ, which serves as the message broker between services.
    • This design keeps the API responsive while intensive operations are handled asynchronously.
  2. Preprocessing & Chunking

    • Celery workers clean, normalize, and split large log files into smaller, meaningful chunks (by timestamp, service, IP, error pattern, or status code), making the data suitable for analysis.
  3. Embedding Generation

    • Each log chunk is converted into a numerical vector using Ollama’s nomic-embed-text:latest embedding model.
    • These embeddings capture the semantic meaning of log messages, enabling the system to understand similarities between errors, warnings, and behavioral patterns instead of relying on simple keyword matching.
  4. Storage Layer

    • The generated vectors are stored and indexed in a vector database (Qdrant / pgvector).
    • At the same time, structured metadata (log source, severity level, service name, IP, time window, processing status, etc.) is stored in PostgreSQL.
  5. Semantic Retrieval & Question Answering

    • When the user submits a query—such as “Why did the service crash?” or “Show similar errors to this stack trace”—the system performs a semantic similarity search against the vector database to retrieve the most relevant log chunks.
    • The retrieved log context is then passed to Ollama’s qwen2.5-coder language model (qwen2.5-coder:7b for local deployment, qwen2.5-coder:1.5b for the Azure deployment).
    • The LLM analyzes the logs, correlates events, and generates clear, context‑aware explanations, potential root causes, or troubleshooting suggestions.
  6. Answer Delivery

    • The final analyzed response is returned to the user through the FastAPI API, providing actionable insights instead of raw log data.
  7. Observability & Monitoring

    • Flower monitors Celery workers and task execution in real time.
    • Prometheus collects system and application metrics.
    • Grafana visualizes performance, queue health, and resource usage.
    • This ensures full observability, reliability, and scalability of the Log Analyzer system.
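
Conceptually, step 5 (semantic retrieval) reduces to nearest-neighbour search over embedding vectors. Below is a minimal stdlib sketch of cosine-similarity ranking; the chunk texts and three-dimensional vectors are illustrative stand-ins, not real nomic-embed-text embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, indexed_chunks, top_k=2):
    """Rank (text, vector) pairs by similarity to the query and keep top_k."""
    scored = sorted(indexed_chunks,
                    key=lambda pair: cosine_similarity(query_vec, pair[1]),
                    reverse=True)
    return [text for text, _ in scored[:top_k]]

# Toy index of (chunk text, embedding) pairs.
index = [
    ("ERROR db timeout on /checkout", [0.9, 0.1, 0.0]),
    ("GET /static/logo.png 200",      [0.1, 0.9, 0.1]),
    ("ERROR connection refused 5432", [0.8, 0.2, 0.1]),
]
print(retrieve([1.0, 0.0, 0.0], index))  # the two ERROR chunks rank highest
```

In the deployed system this ranking is delegated to Qdrant or pgvector, which index the vectors for fast search instead of the linear scan shown here.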

Architecture & Workflow Diagrams

  • End-to-end workflow
%%{init: {'theme':'base', 'themeVariables': { 'fontSize':'15px', 'fontFamily':'arial'}}}%%

graph TB
    %% Styling
    classDef userStyle fill:#e0e7ff,stroke:#6366f1,stroke-width:3px,color:#1e1b4b
    classDef apiStyle fill:#4ade80,stroke:#22c55e,stroke-width:3px,color:#064e3b
    classDef brokerStyle fill:#fb923c,stroke:#f97316,stroke-width:3px,color:#7c2d12
    classDef queueStyle fill:#fbbf24,stroke:#f59e0b,stroke-width:3px,color:#78350f
    classDef workerStyle fill:#fde68a,stroke:#f59e0b,stroke-width:3px,color:#78350f
    classDef aiStyle fill:#60a5fa,stroke:#3b82f6,stroke-width:3px,color:#1e3a8a
    classDef dbStyle fill:#34d399,stroke:#10b981,stroke-width:3px,color:#064e3b
    classDef monitorStyle fill:#fca5a5,stroke:#ef4444,stroke-width:3px,color:#7f1d1d
    classDef llmStyle fill:#c4b5fd,stroke:#a855f7,stroke-width:3px,color:#581c87

    %% Main Components
    User["👤 User<br/>File Upload & Queries"]
    FastAPI["⚡ FastAPI<br/>REST API"]
    RabbitMQ["🐰 RabbitMQ<br/>Message Broker"]

    %% Celery Task Queue
    CeleryQueue["📋 Celery<br/>Task Queue"]

    %% Celery Workers Processing Pipeline
    subgraph CeleryWorkers["🔧 Celery Workers - File Processing & Indexing Tasks"]
        direction TB
        Chunking["📄 Chunking<br/>Split logs into chunks"]
        Embedding["🧮 Embedding<br/>nomic-embed-text:latest"]
        Indexing["📊 Indexing<br/>Prepare vectors"]

        Chunking --> Embedding --> Indexing
    end

    %% Storage Layer
    VectorDB["🗄️ Vector Database<br/>Qdrant / PgVector<br/>Embeddings Storage"]
    PostgreSQL["🐘 PostgreSQL<br/>Database<br/>Metadata & Logs"]
    Redis["⚡ Redis<br/>Database<br/>Task State & Cache"]

    %% AI/ML Layer
    Ollama["🤖 Ollama LLM<br/>qwen2.5-coder:7b<br/>Analysis Engine"]

    Response["💬 LLM Response<br/>Root Cause Analysis"]

    %% Monitoring
    subgraph Monitoring["📊 Monitoring"]
        direction LR
        Prometheus["🔥 Prometheus<br/>Monitoring"]
        Grafana["📈 Grafana<br/>Monitoring"]
    end

    Flower["🌸 Flower<br/>Celery Monitoring"]

    %% ========== INGESTION FLOW (Solid Lines) ==========
    User -->|"1. Upload<br/>Log Files"| FastAPI
    FastAPI -->|"2. Send<br/>Message"| RabbitMQ
    RabbitMQ -->|"3. Route to"| CeleryQueue
    CeleryQueue -->|"4. Assign<br/>Task"| CeleryWorkers

    Indexing -->|"5. Store<br/>Embeddings"| VectorDB
    Indexing -->|"6. Save<br/>Metadata"| PostgreSQL
    Indexing -->|"7. Cache<br/>Results"| Redis

    %% ========== QUERY FLOW (Dashed Lines) ==========
    User -.->|"A. Submit<br/>Query"| FastAPI
    FastAPI -.->|"B. Semantic<br/>Search"| VectorDB
    FastAPI -.->|"C. Fetch<br/>Metadata"| PostgreSQL
    FastAPI -.->|"D. Get<br/>Cache"| Redis
    VectorDB -.->|"E. Relevant<br/>Chunks"| Ollama
    Ollama -.->|"F. Generated<br/>Insights"| Response
    Response -.->|"G. Display<br/>Answer"| User

    %% ========== MONITORING CONNECTIONS (Dotted Lines) ==========
    Flower -.-|monitor| CeleryQueue
    Flower -.-|monitor| CeleryWorkers
    Grafana -.-|metrics| FastAPI
    Grafana -.-|metrics| PostgreSQL
    Grafana -.-|metrics| Redis
    Grafana -.-|metrics| VectorDB
    Prometheus -.-|collect| FastAPI
    Prometheus -.-|collect| CeleryWorkers

    %% Apply Styles
    class User userStyle
    class FastAPI apiStyle
    class RabbitMQ brokerStyle
    class CeleryQueue queueStyle
    class CeleryWorkers,Chunking,Embedding,Indexing workerStyle
    class VectorDB,PostgreSQL dbStyle
    class Redis brokerStyle
    class Ollama aiStyle
    class Response llmStyle
    class Flower,Prometheus,Grafana monitorStyle
  • Monitoring with Prometheus & Grafana

    Grafana Dashboard

    FastAPI & Monitoring

    PostgreSQL & Monitoring

    Node Exporter & Monitoring

    Qdrant & Monitoring

  • Dockerized infrastructure
    Dockerized Stack

  • Database schema & assets
    Project Database

    Database Assets

    Database Data Chunks

    Database Celery Task Execution

    Database Relationships

  • Celery & Flower monitoring
    Flower Dashboard

    Flower Detailed View

  • RabbitMQ management
    RabbitMQ Management

  • Chunking & indexing views
    Collection Chunks
    Data Chunks
    Index Info


Retrieval Augmented Generation implementation for log file question answering and analysis. This project uses FastAPI, Celery, and various vector databases to provide a scalable and efficient RAG pipeline, optimized for local performance using Ollama as the LLM provider.

Tools & Technologies

Backend & API

  • Python 3.12
  • FastAPI – REST API for handling user requests
  • Uvicorn – ASGI server

LLM & AI

  • Ollama
    • nomic-embed-text:latest – text embedding generation (local & Azure deployments)
    • qwen2.5-coder:7b – large language model for answer generation (local deployment)
    • qwen2.5-coder:1.5b – large language model for answer generation (Azure deployment)

Vector & Databases

  • Qdrant – vector database for similarity search
  • PostgreSQL (pgvector) – metadata and vector storage
  • SQLAlchemy & Alembic – ORM and database migrations

Asynchronous Processing

  • Celery – background task processing
  • RabbitMQ – message broker
  • Celery Beat – task scheduling
  • Flower – Celery monitoring

Infrastructure & DevOps

  • Docker & Docker Compose – containerization and service orchestration
  • Nginx – reverse proxy
  • Azure – cloud deployment option
  • Streamlit Cloud – frontend deployment option

Monitoring & Observability

  • Prometheus – metrics collection
  • Grafana – metrics visualization
  • Node Exporter – system metrics
  • Postgres Exporter – database metrics

Frontend & Visualization

  • Chart.js – Interactive data visualization library used in the web interface (src/templates/index.html)
    • Traffic Over Time Chart – Line chart displaying request patterns over time periods
    • Status Codes Distribution – Doughnut chart showing HTTP status code breakdown (2xx, 3xx, 4xx, 5xx)
    • Top IPs Chart – Bar chart displaying most active IP addresses
    • Top URLs Chart – Bar chart showing most frequently accessed endpoints
    • Real-time Dashboard – Metrics cards displaying total requests, unique visitors, bandwidth, and error rates
    • Charts are dynamically rendered using Chart.js CDN and updated via REST API calls to the EDA endpoint
    • Responsive design with loading states and error handling

Testing & Development

  • Postman – API testing
  • Git & GitHub – version control

Deployment Options

🌐 Live Deployments

🚀 Deployment Methods

  • Azure – Cloud deployment for backend services with Ollama cloud-hosted models
  • Streamlit Cloud – Frontend deployment with Ollama cloud-hosted models
  • GitHub Actions – CI/CD pipeline for automated deployment with Ollama cloud-hosted models

Component Responsibilities

  • FastAPI: Main entry point of the system. Handles user requests, file uploads, and search queries, and orchestrates communication with backend services.
  • Uvicorn: ASGI server responsible for running the FastAPI application efficiently with high performance and async support.
  • RabbitMQ: Message broker enabling reliable and asynchronous communication between FastAPI and Celery workers, allowing smooth horizontal scaling.
  • Celery Workers: Execute background and long‑running tasks such as file processing, text/log chunking, and data indexing without blocking the API or degrading user experience.
  • Celery Beat: Handles scheduled and periodic tasks such as cleanup jobs or recurring background processes.
  • Vector Databases (Qdrant / pgvector): Store and index embeddings generated from log chunks, enabling fast and accurate similarity search during retrieval.
  • Ollama – nomic-embed-text:latest: Generates dense vector embeddings from text chunks, forming the foundation of semantic search.
  • Ollama – qwen2.5-coder:7b: Generates context‑aware responses based on the most relevant retrieved chunks.
  • PostgreSQL: Stores structured application data including metadata, project information, and task execution details.
  • SQLAlchemy & Alembic: Provide ORM capabilities and database schema migrations to manage PostgreSQL efficiently.
  • Nginx: Acts as a reverse proxy in front of the FastAPI application, improving security, routing, and performance.
  • Docker & Docker Compose: Containerize and orchestrate all system services, ensuring consistent environments and simplified deployment.
  • Monitoring Stack:
    • Flower – real‑time monitoring of Celery workers and task execution.
    • Prometheus – metrics collection.
    • Grafana – metrics visualization and dashboards.
    • Node Exporter – system‑level metrics.
    • Postgres Exporter – database metrics.
  • Chart.js: Frontend visualization library used to render interactive charts for log analysis dashboard, including traffic patterns, status code distributions, top IPs, and top URLs.
  • Postman: Used for API testing and endpoint validation during development.
  • Git & GitHub: Version control and source code management.

Log Chunking Methods Evaluation

This system includes multiple log‑specific chunking strategies for RAG, evaluated on a dataset of 150+ Apache web server log entries over a 65‑minute period (08:15:23–09:20:05, January 8, 2026). The dataset covers multiple IPs, HTTP methods (GET, POST, PUT, DELETE), and status codes (200, 304, 401, 403, 404) across static assets, APIs, product pages, admin, search, cart, and checkout flows.

Chunking Methods: Technical Overview

The system provides 7 advanced chunking methods optimized for log analysis and RAG applications. Each method is designed to handle different query types and analysis scenarios.

Chunking Algorithm Logic

Each chunking method follows a systematic approach:

  1. Pattern Recognition: Extract key features from log lines (timestamps, IP addresses, status codes, URLs, HTTP methods)
  2. Boundary Detection: Identify natural boundaries based on the method's strategy (time windows, error blocks, IP changes, etc.)
  3. Chunk Formation: Group log entries into chunks respecting boundaries and size constraints
  4. Metadata Extraction: Generate rich metadata for each chunk (error counts, time windows, IP addresses, status categories)
  5. Overlap Management: Maintain context overlap between chunks for better RAG retrieval (where applicable)

The chunking process ensures that semantically related log entries stay together, improving the quality of retrieval and answer generation in the RAG pipeline.
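As a concrete (simplified) illustration of steps 3–5, here is a generic size-based chunker with overlap and basic metadata extraction. It is a sketch of the pattern, not the repository's actual code, and the status-code regex assumes Apache-style combined log lines:

```python
import re

ERROR_CODES = {"400", "401", "403", "404", "405", "500", "502", "503", "504"}
STATUS_RE = re.compile(r'"\s(\d{3})\s')  # status code right after the quoted request

def chunk_with_overlap(lines, chunk_size=100, overlap_size=20):
    """Split log lines into chunks that share `overlap_size` trailing lines
    with the next chunk, and attach simple per-chunk metadata."""
    chunks, start, index = [], 0, 0
    step = max(chunk_size - overlap_size, 1)
    while start < len(lines):
        entries = lines[start:start + chunk_size]
        statuses = [m.group(1) for m in map(STATUS_RE.search, entries) if m]
        chunks.append({
            "chunk_index": index,
            "entries": len(entries),
            "error_count": sum(s in ERROR_CODES for s in statuses),
            "has_overlap": start > 0,
            "text": "\n".join(entries),
        })
        index += 1
        start += step
    return chunks
```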

Method 1 – log_hybrid_adaptive ⭐ (Recommended Default)

Best for: General-purpose RAG systems that need to answer diverse queries about errors, performance, user behavior, and temporal patterns.

  • Logic: Combines multiple strategies intelligently:
    • Time-window awareness (keeps logs from same time period together)
    • Error-block awareness (never splits error contexts)
    • Component awareness (groups requests from same IP when beneficial)
    • Semantic sliding (maintains overlap for context)
    • Status-code awareness (groups similar HTTP responses)
  • Strategy:
    • Primary grouping: Time windows (hourly)
    • Secondary grouping: Keep errors with their context
    • Tertiary grouping: Consider IP/component patterns
    • Always maintain overlap for RAG context
  • Config: chunk_size = 100 (configurable), overlap_size = 20 (configurable)
  • Metadata: Includes chunk_index, entries count, time_window, has_errors, error_count, primary_ip, status_category, chunk_reasons
  • Pros: Best balance across all query types, intelligent boundary detection, preserves context
  • Cons: More complex logic, slightly higher processing overhead

Method 2 – log_hybrid_intelligent

Best for: RAG systems requiring high-quality semantic search and accurate context retrieval for complex queries.

  • Logic: Context-aware smart splitting using intelligent boundaries:
    • Never splits IP session patterns (consecutive requests from same IP)
    • Never splits error sequences (errors + immediate context)
    • Respects natural log boundaries (time gaps, URL pattern changes)
    • Maintains semantic overlap
    • Optimizes chunk size for embedding models
  • Boundary Detection:
    • Time gap > 60 seconds = natural boundary
    • IP change after 5+ consecutive requests = session boundary
    • Error sequences kept intact (error + 2 before + 2 after)
    • URL pattern shifts = activity boundary
  • Config: chunk_size = 100 (configurable), overlap_size = 15 (configurable)
  • Metadata: Includes chunk_index, entries, unique_ips, error_count, boundary_reasons, has_overlap
  • Pros: High-quality semantic chunks, protects error context, respects natural boundaries
  • Cons: Requires parsing all lines first, more memory intensive
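
The time-gap rule above can be shown in isolation. A sketch assuming Apache-style timestamps, using the 60-second threshold from the boundary list:

```python
import re
from datetime import datetime

TS_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2})")

def parse_ts(line):
    """Extract an Apache-style timestamp from a log line, or None."""
    m = TS_RE.search(line)
    return datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S") if m else None

def split_on_time_gaps(lines, max_gap_seconds=60):
    """Start a new chunk whenever consecutive entries are more than
    max_gap_seconds apart -- one of the natural boundaries described above."""
    chunks, current, prev = [], [], None
    for line in lines:
        ts = parse_ts(line)
        if prev and ts and (ts - prev).total_seconds() > max_gap_seconds and current:
            chunks.append(current)
            current = []
        current.append(line)
        prev = ts or prev
    if current:
        chunks.append(current)
    return chunks
```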

Method 3 – log_semantic_sliding

Best for: RAG applications where context between chunks matters, general-purpose log analysis.

  • Logic: Sliding window over sequential log entries with configurable overlap to preserve context across chunk boundaries.
  • Config: chunk_size = 100 (configurable), overlap_size = 20 (configurable, percentage-based)
  • Metadata: Includes chunk_index, entries count, has_overlap flag
  • Pros: Strong temporal/context preservation, good for multi-entry analysis and general RAG queries
  • Cons: Slight storage overhead due to overlap, may mix unrelated entries for very focused queries

Method 4 – log_error_block

Best for: Error analysis, debugging, security incident investigation, authentication failure patterns.

  • Logic: Detects error status codes (400, 401, 403, 404, 405, 500, 501, 502, 503, 504) and groups them with nearby non-error context into blocks. Chunks are created when size threshold is reached OR when errors are followed by non-error lines.
  • Config: chunk_size = 100 (configurable), overlap = 0
  • Metadata: Includes error_lines count, total_lines count
  • Pros: Excellent for error analysis, debugging, and security incident investigation
  • Cons: Less effective for non-error queries; can fragment successful traffic patterns
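
A stripped-down sketch of this strategy (the flush rule follows the description above, while the real method's surrounding-context handling is simplified away):

```python
import re

ERROR_CODES = {400, 401, 403, 404, 405, 500, 501, 502, 503, 504}
STATUS_RE = re.compile(r'"\s(\d{3})\s')  # status code after the quoted request

def is_error(line):
    m = STATUS_RE.search(line)
    return bool(m) and int(m.group(1)) in ERROR_CODES

def chunk_error_blocks(lines, chunk_size=100):
    """Flush a chunk when the size threshold is reached OR when a run of
    error lines is followed by a non-error line."""
    chunks, current, saw_error = [], [], False
    def flush():
        chunks.append({"total_lines": len(current),
                       "error_lines": sum(map(is_error, current)),
                       "lines": current})
    for line in lines:
        err = is_error(line)
        if current and (len(current) >= chunk_size or (saw_error and not err)):
            flush()
            current, saw_error = [], False
        current.append(line)
        saw_error = saw_error or err
    if current:
        flush()
    return chunks
```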

Method 5 – log_time_window

Best for: Temporal/traffic pattern analysis, peak-hour identification, time-series queries.

  • Logic: Groups logs into fixed hourly windows using extracted timestamps (e.g., 2026-Jan-08_08:00). Detects timestamp patterns like [23/Jan/2019:03:56:14 +0330] and groups by hour.
  • Config: chunk_size = 100 (configurable), overlap = 0, hourly granularity
  • Metadata: Includes time_window identifier, entries count
  • Pros: Ideal for temporal/traffic pattern analysis and peak-hour identification
  • Cons: May split related requests across hour boundaries; less suited for user-centric queries
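
Hourly bucketing follows directly from the timestamp pattern. A sketch producing window keys in the 2026-Jan-08_08:00 format mentioned above:

```python
import re

TS_RE = re.compile(r"\[(\d{2})/(\w{3})/(\d{4}):(\d{2}):")

def hour_window(line):
    """Build an hourly window key like '2026-Jan-08_08:00', or None."""
    m = TS_RE.search(line)
    if not m:
        return None
    day, mon, year, hour = m.groups()
    return f"{year}-{mon}-{day}_{hour}:00"

def chunk_by_hour(lines):
    """Group log lines into fixed hourly windows with entry counts."""
    windows = {}
    for line in lines:
        key = hour_window(line)
        if key:
            windows.setdefault(key, []).append(line)
    return [{"time_window": w, "entries": len(ls), "lines": ls}
            for w, ls in windows.items()]
```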

Method 6 – log_component_based

Best for: Client/user behavior analysis, session tracking, suspicious activity detection, user journey analysis.

  • Logic: Groups logs by client IP address using regex pattern matching at the start of log lines. Chunks are created on size threshold OR when IP address changes.
  • Config: chunk_size = 100 (configurable), overlap = 0, component identifier = IP
  • Metadata: Includes component (IP address), entries count
  • Pros: Great for client/user behavior analysis, session tracking, and suspicious activity detection
  • Cons: Fragments time-based patterns; multi-IP queries require multiple chunks
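
A minimal sketch of IP-based grouping, assuming the client IP leads each log line as described:

```python
import re

IP_RE = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})")

def chunk_by_ip(lines, chunk_size=100):
    """Flush a chunk on the size threshold OR when the leading IP changes."""
    chunks, current, current_ip = [], [], None
    for line in lines:
        m = IP_RE.match(line)
        ip = m.group(1) if m else current_ip
        if current and (len(current) >= chunk_size or ip != current_ip):
            chunks.append({"component": current_ip,
                           "entries": len(current),
                           "lines": current})
            current = []
        current.append(line)
        current_ip = ip
    if current:
        chunks.append({"component": current_ip,
                       "entries": len(current),
                       "lines": current})
    return chunks
```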

Method 7 – log_status_code

Best for: Performance monitoring, error-rate analysis, status-code-focused analytics.

  • Logic: Groups logs by HTTP status code categories:
    • 2xx_success - Successful requests
    • 3xx_redirect - Redirect responses
    • 4xx_client_error - Client-side errors
    • 5xx_server_error - Server-side errors
  • Config: chunk_size = 100 (configurable), overlap = 0
  • Metadata: Includes status_category, entries count
  • Pros: Excellent for performance monitoring and error-rate analysis
  • Cons: Fragments user journeys and limits context to same-status entries
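
The category mapping is mechanical. A sketch, assuming status codes have already been parsed out of the lines:

```python
from collections import defaultdict

def status_category(code):
    """Map an HTTP status code to the four categories listed above."""
    if 200 <= code < 300:
        return "2xx_success"
    if 300 <= code < 400:
        return "3xx_redirect"
    if 400 <= code < 500:
        return "4xx_client_error"
    if 500 <= code < 600:
        return "5xx_server_error"
    return "other"

def group_by_status(entries):
    """Group (log_line, status_code) pairs into per-category chunks."""
    groups = defaultdict(list)
    for line, code in entries:
        groups[status_category(code)].append(line)
    return {cat: {"status_category": cat, "entries": len(ls), "lines": ls}
            for cat, ls in groups.items()}
```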

Question–Answer Evaluation (Summary)

The methods were evaluated on multiple query types, including user journey analysis, error analysis, time‑based analysis, status code analysis, authentication failure patterns, and cart operations.

  • User journey & cart flows:

    • log_component_based scored highest (9/10) by grouping all events for a given IP (e.g., full checkout for IP 172.16.54.78, cart operations across users).
    • log_semantic_sliding provided good context but sometimes mixed users or missed parts of sequences.
  • Error & authentication analysis:

    • log_error_block excelled (9/10) for listing 4xx errors, authentication failures, and grouping related error context.
    • log_status_code also performed well for summarizing error distributions.
  • Time‑based traffic patterns:

    • log_time_window achieved 9/10 for analyzing traffic between 08:00 and 09:00, peak periods, and time‑localized behaviors.
  • Status distribution:

    • log_status_code was best suited to compute percentages of 2xx/3xx/4xx responses and identify 404 endpoints, though full accuracy depends on covering the complete dataset.

Comparative Scores (by Query Type)

Query Type        Hybrid Adaptive ⭐   Hybrid Intelligent   Semantic Sliding   Error Block   Time Window   Component-Based   Status Code
User Journey      8.5/10              8.0/10               7/10               4/10          6/10          9/10              5/10
Error Analysis    9.0/10              8.5/10               7/10               9/10          6/10          7/10              8/10
Time Patterns     8.5/10              7.5/10               7/10               5/10          9/10          5/10              6/10
Status Analysis   8.0/10              7.5/10               7/10               6/10          6/10          6/10              8/10
Auth Failures     8.5/10              8.0/10               6/10               9/10          6/10          8/10              7/10
Cart Operations   8.5/10              8.0/10               7/10               5/10          6/10          9/10              5/10
Average           8.5/10              7.9/10               7.0/10             6.3/10        6.5/10        7.3/10            6.5/10

Recommendations

  • Best general‑purpose method:

    • log_hybrid_adaptive ⭐ (Default) - Recommended for default RAG usage due to intelligent combination of multiple strategies, balanced performance across all query types, and design optimized for retrieval‑augmented generation. It adaptively combines time-window, error-block, component, and semantic sliding strategies.
    • log_semantic_sliding - Alternative default choice with strong context preservation and balanced performance (7.0/10).
    • log_hybrid_intelligent - For high-quality semantic search requiring context-aware splitting and natural boundary detection.
  • Best for specialized tasks:

    • log_error_block – Error analysis, security incidents, and authentication failures.
    • log_time_window – Temporal/traffic pattern queries, peak-hour identification.
    • log_component_based – User journey, cart operations, and IP‑based behavior analysis.
    • log_status_code – Performance monitoring and status‑code‑focused analytics.

This evaluation demonstrates that no single chunking method is optimal for all query types, and the system can select or combine strategies depending on the question type for more accurate log analysis. The hybrid methods (log_hybrid_adaptive and log_hybrid_intelligent) provide the best balance by intelligently combining multiple strategies.
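
One way to act on these findings is a simple dispatch table. The mapping below is a hypothetical illustration derived from the per-task recommendations above, not code from the repository:

```python
# Hypothetical query-type -> chunking-method routing based on the evaluation.
BEST_METHOD = {
    "user_journey":    "log_component_based",
    "error_analysis":  "log_error_block",
    "time_patterns":   "log_time_window",
    "status_analysis": "log_status_code",
    "auth_failures":   "log_error_block",
    "cart_operations": "log_component_based",
}

def pick_chunking_method(query_type):
    """Fall back to the recommended hybrid default for anything unclassified."""
    return BEST_METHOD.get(query_type, "log_hybrid_adaptive")
```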


Installation

1. Clone the Repository

git clone https://github.com/MarouaHattab/mini-rag-app
cd mini-rag-app

2. Environment Setup

Option A: Local Development

Create Virtual Environment

python3 -m venv env
source env/bin/activate  # On Windows: env\Scripts\activate

Install Dependencies

cd src
pip install -r requirements.txt

Configure Environment

cd ../docker/env
cp .env.example.app .env.app
cp .env.example.postgres .env.postgres
cp .env.example.rabbitmq .env.rabbitmq
cp .env.example.redis .env.redis
cp .env.example.grafana .env.grafana

Configure Ollama (Local LLM)

This project is configured to use Ollama by default, removing the need for external API keys. Update your .env.app with the following:

# LLM Configuration for local Ollama
GENERATION_BACKEND="OPENAI"
EMBEDDING_BACKEND="OPENAI"

# Use local Ollama endpoint (OpenAI compatible)
OPENAI_API_URL="http://host.docker.internal:11434/v1"
OPENAI_API_KEY="ollama"  # Placeholder value

# Model Selection
GENERATION_MODEL_ID="qwen2.5-coder:1.5b"
EMBEDDING_MODEL_ID="nomic-embed-text"

Option B: Docker-Only Setup

cd docker/env
# Copy all environment files
for file in .env.example.*; do cp "$file" "${file//.example/}"; done
# Update LLM configuration in .env.app as shown above

3. Database Setup

cd docker
docker compose up pgvector rabbitmq redis -d

Run Migrations

cd ../src/models/db_schemes/minirag
source ../../../../env/bin/activate  # If using local setup
alembic upgrade head

Running the Application

Using Docker Compose (Recommended)

cd docker
docker compose up --build

Services will be available at:

  • FastAPI API & Swagger UI: http://localhost:8000 (interactive docs at /docs)
  • Grafana: http://localhost:3000
  • Prometheus: http://localhost:9090

Local Development

Terminal 1: Start services

cd docker
docker compose up pgvector rabbitmq redis qdrant prometheus grafana -d

Terminal 2: Start FastAPI

cd src
source ../env/bin/activate
uvicorn main:app --reload --host 0.0.0.0 --port 8000

Terminal 3: Start Celery Worker

celery -A celery_app worker --queues=default,file_processing,data_indexing --loglevel=info

Terminal 4: Start Flower (optional)

celery -A celery_app flower --conf=flowerconfig.py

API Usage

1. Upload Documents

curl -X POST "http://localhost:8000/data/upload/1" \
  -H "Content-Type: multipart/form-data" \
  -F "files=@document.log"

Document Uploaded

2. Process Documents

curl -X POST "http://localhost:8000/data/process/1"

Processing In Progress

3. Index for Search

curl -X POST "http://localhost:8000/api/v1/data/process-and-push/1"

Indexing Completed

4. Search Documents

curl -X POST "http://localhost:8000/nlp/index/search/1" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the main topic?", "top_k": 5}'

5. Ask Questions (RAG)

curl -X POST "http://localhost:8000/nlp/index/answer/1" \
  -H "Content-Type: application/json" \
  -d '{"query": "Explain the main concepts in the document"}'

RAG Answer Example

API Testing

Postman Collection

Import the Postman collection from src/assets/mini-rag-app.postman_collection.json

Interactive Documentation

Visit http://localhost:8000/docs for the Swagger UI documentation.

Swagger UI

Monitoring

Grafana Dashboards

  • URL: http://localhost:3000
  • Default Credentials: admin / admin (configure in .env.grafana)
  • Pre-configured Dashboards: System metrics, PostgreSQL metrics, application metrics

Celery Task Monitoring

Prometheus Metrics

  • URL: http://localhost:9090
  • Available Metrics: Application performance, database health, system resources

Configuration

Supported File Types

  • PDF: .pdf
  • Text: .txt
  • Maximum file size: 10MB (configurable)

Vector Databases

  • PostgreSQL + pgvector: Default, integrated with main database.
  • Qdrant: Dedicated vector database, better for large-scale deployments.