Across the large-scale LLM ecosystem, many organizations apply AI to analyze and debug their logging systems, chiefly because log data is massive, complex, and impractical to investigate manually. Software engineering teams, large enterprises, and public-sector organizations also face strict data-privacy requirements, which makes self-hosted or otherwise controlled AI-based log analysis essential.
- LLM adoption: 98% of organizations surveyed are adopting or have already adopted LLM infrastructure, indicating near-universal usage in technical environments.
- Security automation: security teams increasingly use LLMs to automate threat detection (projected to reach roughly 49% of cybersecurity teams by 2026).
- Modern log analysis: roughly 55–70% of medium-to-large enterprises have adopted, or plan to adopt, enhanced log analysis systems (with or without AI/LLMs) to support monitoring, reliability engineering, and compliance.
This project provides a self-hosted, privacy‑aware LLM‑powered Log Analyzer built on Retrieval‑Augmented Generation (RAG) to turn raw logs into actionable insights.
Everything begins when the user submits log files (application logs, server logs, or system logs) to the platform.
- **Log Ingestion**
- Users upload log files through the FastAPI backend.
- The API receives the upload request and immediately delegates heavy processing tasks to Celery workers through RabbitMQ, which serves as the message broker between services.
- This design keeps the API responsive while intensive operations are handled asynchronously.
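The delegate-to-worker pattern described above can be sketched with stdlib stand-ins: a `queue.Queue` in place of RabbitMQ and a thread in place of a Celery worker. All names here are illustrative, not the project's actual code:

```python
import queue
import threading

task_queue = queue.Queue()  # stand-in for RabbitMQ
results = {}

def worker():
    # Stand-in for a Celery worker: pulls tasks and processes them in the background
    while True:
        task_id, payload = task_queue.get()
        results[task_id] = f"processed {len(payload)} bytes"  # heavy work happens here
        task_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

def upload_endpoint(task_id: str, file_bytes: bytes) -> dict:
    """Stand-in for the FastAPI upload handler: enqueue and return immediately."""
    task_queue.put((task_id, file_bytes))
    return {"task_id": task_id, "status": "queued"}

response = upload_endpoint("job-1", b"127.0.0.1 - GET /index.html 200")
task_queue.join()  # wait for the background worker (demo only)
```

The key point is that `upload_endpoint` never blocks on the heavy processing; it only enqueues and responds.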
- **Preprocessing & Chunking**
- Celery workers clean, normalize, and split large log files into smaller, meaningful chunks (by timestamp, service, IP, error pattern, or status code), making the data suitable for analysis.
- **Embedding Generation**
- Each log chunk is converted into a numerical vector using Ollama's `nomic-embed-text:latest` embedding model.
- These embeddings capture the semantic meaning of log messages, enabling the system to recognize similarities between errors, warnings, and behavioral patterns instead of relying on simple keyword matching.
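Similarity between embedding vectors is typically measured with cosine similarity. A minimal sketch with toy 3-dimensional vectors (real `nomic-embed-text` outputs have hundreds of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 = identical direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": two semantically close error messages vs. an unrelated line
timeout_error = [0.9, 0.1, 0.0]
conn_refused = [0.8, 0.2, 0.0]   # close in meaning to a timeout
static_asset = [0.0, 0.1, 0.9]   # unrelated log line

close = cosine_similarity(timeout_error, conn_refused)
far = cosine_similarity(timeout_error, static_asset)
```

This is what lets the retriever rank "connection refused" near "timeout" even though they share no keywords.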
- **Storage Layer**
- The generated vectors are stored and indexed in a vector database (Qdrant / pgvector).
- At the same time, structured metadata (log source, severity level, service name, IP, time window, processing status, etc.) is stored in PostgreSQL.
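A rough sketch of the kind of metadata table described here, using `sqlite3` as a stand-in for PostgreSQL (the column names are illustrative, not the project's actual schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for PostgreSQL
conn.execute("""
    CREATE TABLE chunk_metadata (
        chunk_id INTEGER PRIMARY KEY,
        log_source TEXT,
        severity TEXT,
        service_name TEXT,
        ip TEXT,
        time_window TEXT,
        processing_status TEXT
    )
""")
conn.execute(
    "INSERT INTO chunk_metadata VALUES "
    "(1, 'nginx', 'ERROR', 'checkout', '172.16.54.78', '2026-Jan-08_08:00', 'indexed')"
)

# Structured filters like this complement the vector search in Qdrant/pgvector
row = conn.execute(
    "SELECT severity, service_name FROM chunk_metadata WHERE ip = '172.16.54.78'"
).fetchone()
```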
- **Semantic Retrieval & Question Answering**
- When the user submits a query—such as “Why did the service crash?” or “Show similar errors to this stack trace”—the system performs a semantic similarity search against the vector database to retrieve the most relevant log chunks.
- The retrieved log context is then passed to Ollama's `qwen2.5-coder:1.5b` language model.
- The LLM analyzes the logs, correlates events, and generates clear, context‑aware explanations, potential root causes, and troubleshooting suggestions.
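The retrieve-then-generate step can be sketched as assembling retrieved chunks into a single LLM prompt (a simplified illustration; the project's actual prompt templates may differ):

```python
def build_prompt(query: str, retrieved_chunks: list[str]) -> str:
    """Assemble retrieved log context and the user query into one LLM prompt."""
    context = "\n---\n".join(retrieved_chunks)
    return (
        "You are a log analysis assistant. Using only the log context below,\n"
        "explain likely root causes and suggest troubleshooting steps.\n\n"
        f"Log context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

# Chunks as they might come back from the vector database (illustrative lines)
chunks = [
    '10.0.0.5 - [08/Jan/2026:08:41:02] "POST /checkout" 500',
    '10.0.0.5 - [08/Jan/2026:08:41:03] "POST /checkout" 503',
]
prompt = build_prompt("Why did the service crash?", chunks)
```

The prompt is then sent to the generation model; grounding the model in retrieved chunks is what keeps answers tied to the actual logs.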
- **Answer Delivery**
- The final analyzed response is returned to the user through the FastAPI API, providing actionable insights instead of raw log data.
- **Observability & Monitoring**
- Flower monitors Celery workers and task execution in real time.
- Prometheus collects system and application metrics.
- Grafana visualizes performance, queue health, and resource usage.
- This ensures full observability, reliability, and scalability of the Log Analyzer system.
- **End-to-end workflow**
```mermaid
%%{init: {'theme':'base', 'themeVariables': { 'fontSize':'15px', 'fontFamily':'arial'}}}%%
graph TB
%% Styling
classDef userStyle fill:#e0e7ff,stroke:#6366f1,stroke-width:3px,color:#1e1b4b
classDef apiStyle fill:#4ade80,stroke:#22c55e,stroke-width:3px,color:#064e3b
classDef brokerStyle fill:#fb923c,stroke:#f97316,stroke-width:3px,color:#7c2d12
classDef queueStyle fill:#fbbf24,stroke:#f59e0b,stroke-width:3px,color:#78350f
classDef workerStyle fill:#fde68a,stroke:#f59e0b,stroke-width:3px,color:#78350f
classDef aiStyle fill:#60a5fa,stroke:#3b82f6,stroke-width:3px,color:#1e3a8a
classDef dbStyle fill:#34d399,stroke:#10b981,stroke-width:3px,color:#064e3b
classDef monitorStyle fill:#fca5a5,stroke:#ef4444,stroke-width:3px,color:#7f1d1d
classDef llmStyle fill:#c4b5fd,stroke:#a855f7,stroke-width:3px,color:#581c87
%% Main Components
User["👤 User<br/>File Upload & Queries"]
FastAPI["⚡ FastAPI<br/>REST API"]
RabbitMQ["🐰 RabbitMQ<br/>Message Broker"]
%% Celery Task Queue
CeleryQueue["📋 Celery<br/>Task Queue"]
%% Celery Workers Processing Pipeline
subgraph CeleryWorkers["🔧 Celery Workers - File Processing & Indexing Tasks"]
direction TB
Chunking["📄 Chunking<br/>Split logs into chunks"]
Embedding["🧮 Embedding<br/>nomic-embed-text:latest"]
Indexing["📊 Indexing<br/>Prepare vectors"]
Chunking --> Embedding --> Indexing
end
%% Storage Layer
VectorDB["🗄️ Vector Database<br/>Qdrant / PgVector<br/>Embeddings Storage"]
PostgreSQL["🐘 PostgreSQL<br/>Database<br/>Metadata & Logs"]
Redis["⚡ Redis<br/>Database<br/>Task State & Cache"]
%% AI/ML Layer
Ollama["🤖 Ollama LLM<br/>qwen2.5-coder:7b<br/>Analysis Engine"]
Response["💬 LLM Response<br/>Root Cause Analysis"]
%% Monitoring
subgraph Monitoring["📊 Monitoring"]
direction LR
Prometheus["🔥 Prometheus<br/>Monitoring"]
Grafana["📈 Grafana<br/>Monitoring"]
end
Flower["🌸 Flower<br/>Celery Monitoring"]
%% ========== INGESTION FLOW (Solid Lines) ==========
User -->|"1. Upload<br/>Log Files"| FastAPI
FastAPI -->|"2. Send<br/>Message"| RabbitMQ
RabbitMQ -->|"3. Route to"| CeleryQueue
CeleryQueue -->|"4. Assign<br/>Task"| CeleryWorkers
Indexing -->|"5. Store<br/>Embeddings"| VectorDB
Indexing -->|"6. Save<br/>Metadata"| PostgreSQL
Indexing -->|"7. Cache<br/>Results"| Redis
%% ========== QUERY FLOW (Dashed Lines) ==========
User -.->|"A. Submit<br/>Query"| FastAPI
FastAPI -.->|"B. Semantic<br/>Search"| VectorDB
FastAPI -.->|"C. Fetch<br/>Metadata"| PostgreSQL
FastAPI -.->|"D. Get<br/>Cache"| Redis
VectorDB -.->|"E. Relevant<br/>Chunks"| Ollama
Ollama -.->|"F. Generated<br/>Insights"| Response
Response -.->|"G. Display<br/>Answer"| User
%% ========== MONITORING CONNECTIONS (Dotted Lines) ==========
Flower -.-|monitor| CeleryQueue
Flower -.-|monitor| CeleryWorkers
Grafana -.-|metrics| FastAPI
Grafana -.-|metrics| PostgreSQL
Grafana -.-|metrics| Redis
Grafana -.-|metrics| VectorDB
Prometheus -.-|collect| FastAPI
Prometheus -.-|collect| CeleryWorkers
%% Apply Styles
class User userStyle
class FastAPI apiStyle
class RabbitMQ brokerStyle
class CeleryQueue queueStyle
class CeleryWorkers,Chunking,Embedding,Indexing workerStyle
class VectorDB,PostgreSQL dbStyle
class Redis brokerStyle
class Ollama aiStyle
class Response llmStyle
class Flower,Prometheus,Grafana monitorStyle
```
- **Monitoring with Prometheus & Grafana**
Retrieval Augmented Generation implementation for log file question answering and analysis. This project uses FastAPI, Celery, and various vector databases to provide a scalable and efficient RAG pipeline, optimized for local performance using Ollama as the LLM provider.
- Python 3.12
- FastAPI – REST API for handling user requests
- Uvicorn – ASGI server
- Ollama models:
  - `nomic-embed-text:latest` – text embedding generation (local and Azure deployments)
  - `qwen2.5-coder:7b` – large language model for answer generation (local deployment)
  - `qwen2.5-coder:1.5b` – large language model for answer generation (Azure deployment)
- Qdrant – vector database for similarity search
- PostgreSQL (pgvector) – metadata and vector storage
- SQLAlchemy & Alembic – ORM and database migrations
- Celery – background task processing
- RabbitMQ – message broker
- Celery Beat – task scheduling
- Flower – Celery monitoring
- Docker & Docker Compose – containerization and service orchestration
- Nginx – reverse proxy
- Azure – cloud deployment option
- Streamlit Cloud – frontend deployment option
- Prometheus – metrics collection
- Grafana – metrics visualization
- Node Exporter – system metrics
- Postgres Exporter – database metrics
- Chart.js – interactive data visualization library used in the web interface (`src/templates/index.html`)
- Traffic Over Time Chart – line chart displaying request patterns over time periods
- Status Codes Distribution – Doughnut chart showing HTTP status code breakdown (2xx, 3xx, 4xx, 5xx)
- Top IPs Chart – Bar chart displaying most active IP addresses
- Top URLs Chart – Bar chart showing most frequently accessed endpoints
- Real-time Dashboard – Metrics cards displaying total requests, unique visitors, bandwidth, and error rates
- Charts are dynamically rendered using Chart.js CDN and updated via REST API calls to the EDA endpoint
- Responsive design with loading states and error handling
- Postman – API testing
- Git & GitHub – version control
- Azure Deployment: http://rag-log-analysis.italynorth.cloudapp.azure.com/
- Backend API deployed on Azure with Ollama cloud-hosted models
- Full RAG system with FastAPI, Celery, and vector databases
- Streamlit Cloud Deployment: https://log-analysis-rag.streamlit.app/
- Frontend interface deployed on Streamlit Cloud
- Interactive log analysis dashboard with RAG capabilities
- Connected to Ollama cloud-hosted models
- Azure – Cloud deployment for backend services with Ollama cloud-hosted models
- Streamlit Cloud – Frontend deployment with Ollama cloud-hosted models
- GitHub Actions – CI/CD pipeline for automated deployment with Ollama cloud-hosted models
- FastAPI: Main entry point of the system. Handles user requests, file uploads, and search queries, and orchestrates communication with backend services.
- Uvicorn: ASGI server responsible for running the FastAPI application efficiently with high performance and async support.
- RabbitMQ: Message broker enabling reliable and asynchronous communication between FastAPI and Celery workers, allowing smooth horizontal scaling.
- Celery Workers: Execute background and long‑running tasks such as file processing, text/log chunking, and data indexing without blocking the API or degrading user experience.
- Celery Beat: Handles scheduled and periodic tasks such as cleanup jobs or recurring background processes.
- Vector Databases (Qdrant / pgvector): Store and index embeddings generated from log chunks, enabling fast and accurate similarity search during retrieval.
- Ollama – `nomic-embed-text:latest`: Generates dense vector embeddings from text chunks, forming the foundation of semantic search.
- Ollama – `qwen2.5-coder:7b`: Generates context‑aware responses based on the most relevant retrieved chunks.
- PostgreSQL: Stores structured application data including metadata, project information, and task execution details.
- SQLAlchemy & Alembic: Provide ORM capabilities and database schema migrations to manage PostgreSQL efficiently.
- Nginx: Acts as a reverse proxy in front of the FastAPI application, improving security, routing, and performance.
- Docker & Docker Compose: Containerize and orchestrate all system services, ensuring consistent environments and simplified deployment.
- Monitoring Stack:
- Flower – real‑time monitoring of Celery workers and task execution.
- Prometheus – metrics collection.
- Grafana – metrics visualization and dashboards.
- Node Exporter – system‑level metrics.
- Postgres Exporter – database metrics.
- Chart.js: Frontend visualization library used to render interactive charts for log analysis dashboard, including traffic patterns, status code distributions, top IPs, and top URLs.
- Postman: Used for API testing and endpoint validation during development.
- Git & GitHub: Version control and source code management.
This system includes multiple log‑specific chunking strategies for RAG, evaluated on a dataset of 150+ Apache web server log entries over a 65‑minute period (08:15:23–09:20:05, January 8, 2026). The dataset covers multiple IPs, HTTP methods (GET, POST, PUT, DELETE), and status codes (200, 304, 401, 403, 404) across static assets, APIs, product pages, admin, search, cart, and checkout flows.
The system provides 7 advanced chunking methods optimized for log analysis and RAG applications. Each method is designed to handle different query types and analysis scenarios.
Each chunking method follows a systematic approach:
- Pattern Recognition: Extract key features from log lines (timestamps, IP addresses, status codes, URLs, HTTP methods)
- Boundary Detection: Identify natural boundaries based on the method's strategy (time windows, error blocks, IP changes, etc.)
- Chunk Formation: Group log entries into chunks respecting boundaries and size constraints
- Metadata Extraction: Generate rich metadata for each chunk (error counts, time windows, IP addresses, status categories)
- Overlap Management: Maintain context overlap between chunks for better RAG retrieval (where applicable)
The chunking process ensures that semantically related log entries stay together, improving the quality of retrieval and answer generation in the RAG pipeline.
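The five steps above can be sketched as a minimal chunker for Apache-style access logs. The regex and thresholds are illustrative, not the project's exact implementation:

```python
import re

# 1. Pattern recognition: extract IP, URL, and status code from each line
LOG_RE = re.compile(
    r'^(?P<ip>\d+\.\d+\.\d+\.\d+).*?"\w+ (?P<url>\S+).*?" (?P<status>\d{3})'
)

def _finish(entries: list[dict]) -> dict:
    # 4. Metadata extraction: error counts and distinct IPs per chunk
    errors = sum(1 for e in entries if e["status"].startswith(("4", "5")))
    return {"entries": entries, "error_count": errors,
            "ips": sorted({e["ip"] for e in entries})}

def chunk_logs(lines: list[str], chunk_size: int = 3) -> list[dict]:
    """Group parsed log lines into chunks and attach simple metadata."""
    chunks, current = [], []
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue
        current.append(m.groupdict())
        if len(current) >= chunk_size:   # 2. boundary detection (size-based)
            chunks.append(_finish(current))  # 3. chunk formation
            current = []
    if current:
        chunks.append(_finish(current))
    return chunks

lines = [
    '10.0.0.1 - - [08/Jan/2026:08:15:23] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [08/Jan/2026:08:15:40] "POST /login HTTP/1.1" 401 128',
]
chunks = chunk_logs(lines)
```

Step 5 (overlap management) is omitted here for brevity; the sliding-window method below shows it.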
Best for: General-purpose RAG systems that need to answer diverse queries about errors, performance, user behavior, and temporal patterns.
- Logic: Combines multiple strategies intelligently:
- Time-window awareness (keeps logs from same time period together)
- Error-block awareness (never splits error contexts)
- Component awareness (groups requests from same IP when beneficial)
- Semantic sliding (maintains overlap for context)
- Status-code awareness (groups similar HTTP responses)
- Strategy:
- Primary grouping: Time windows (hourly)
- Secondary grouping: Keep errors with their context
- Tertiary grouping: Consider IP/component patterns
- Always maintain overlap for RAG context
- Config: `chunk_size = 100` (configurable), `overlap_size = 20` (configurable)
- Metadata: Includes chunk_index, entries count, time_window, has_errors, error_count, primary_ip, status_category, chunk_reasons
- Pros: Best balance across all query types, intelligent boundary detection, preserves context
- Cons: More complex logic, slightly higher processing overhead
Best for: RAG systems requiring high-quality semantic search and accurate context retrieval for complex queries.
- Logic: Context-aware smart splitting using intelligent boundaries:
- Never splits IP session patterns (consecutive requests from same IP)
- Never splits error sequences (errors + immediate context)
- Respects natural log boundaries (time gaps, URL pattern changes)
- Maintains semantic overlap
- Optimizes chunk size for embedding models
- Boundary Detection:
- Time gap > 60 seconds = natural boundary
- IP change after 5+ consecutive requests = session boundary
- Error sequences kept intact (error + 2 before + 2 after)
- URL pattern shifts = activity boundary
- Config: `chunk_size = 100` (configurable), `overlap_size = 15` (configurable)
- Metadata: Includes chunk_index, entries, unique_ips, error_count, boundary_reasons, has_overlap
- Pros: High-quality semantic chunks, protects error context, respects natural boundaries
- Cons: Requires parsing all lines first, more memory intensive
Best for: RAG applications where context between chunks matters, general-purpose log analysis.
- Logic: Sliding window over sequential log entries with configurable overlap to preserve context across chunk boundaries.
- Config: `chunk_size = 100` (configurable), `overlap_size = 20` (configurable, percentage-based)
- Metadata: Includes chunk_index, entries count, has_overlap flag
- Pros: Strong temporal/context preservation, good for multi-entry analysis and general RAG queries
- Cons: Slight storage overhead due to overlap, may mix unrelated entries for very focused queries
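Sliding-window chunking with overlap can be sketched as follows (sizes here are small for illustration; the project uses `chunk_size = 100` and `overlap_size = 20`):

```python
def sliding_window_chunks(lines: list[str], chunk_size: int, overlap: int) -> list[list[str]]:
    """Split lines into chunks of chunk_size, repeating the last `overlap` lines."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [lines[i:i + chunk_size]
            for i in range(0, len(lines), step)
            if lines[i:i + chunk_size]]

lines = [f"log line {i}" for i in range(10)]
chunks = sliding_window_chunks(lines, chunk_size=4, overlap=1)
```

The repeated boundary lines are the "slight storage overhead" noted above; they buy context continuity at retrieval time.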
Best for: Error analysis, debugging, security incident investigation, authentication failure patterns.
- Logic: Detects error status codes (400, 401, 403, 404, 405, 500, 501, 502, 503, 504) and groups them with nearby non-error context into blocks. Chunks are created when size threshold is reached OR when errors are followed by non-error lines.
- Config: `chunk_size = 100` (configurable), `overlap = 0`
- Metadata: Includes error_lines count, total_lines count
- Pros: Excellent for error analysis, debugging, and security incident investigation
- Cons: Less effective for non-error queries; can fragment successful traffic patterns
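Grouping errors with their surrounding context can be sketched as below (the one-line context window and regex are illustrative simplifications of the method described above):

```python
import re

STATUS_RE = re.compile(r'" (\d{3}) ')
ERROR_CODES = {"400", "401", "403", "404", "405", "500", "501", "502", "503", "504"}

def error_blocks(lines: list[str], context: int = 1) -> list[list[str]]:
    """Return one block per error line: the error plus `context` lines either side."""
    blocks = []
    for i, line in enumerate(lines):
        m = STATUS_RE.search(line)
        if m and m.group(1) in ERROR_CODES:
            blocks.append(lines[max(0, i - context):i + context + 1])
    return blocks

lines = [
    'a - "GET / HTTP/1.1" 200 1024',
    'b - "POST /login HTTP/1.1" 401 128',
    'c - "GET /css/site.css HTTP/1.1" 200 2048',
]
blocks = error_blocks(lines)
```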
Best for: Temporal/traffic pattern analysis, peak-hour identification, time-series queries.
- Logic: Groups logs into fixed hourly windows using extracted timestamps (e.g., `2026-Jan-08_08:00`). Detects timestamp patterns like `[23/Jan/2019:03:56:14 +0330]` and groups by hour.
- Config: `chunk_size = 100` (configurable), `overlap = 0`, hourly granularity
- Metadata: Includes time_window identifier, entries count
- Pros: Ideal for temporal/traffic pattern analysis and peak-hour identification
- Cons: May split related requests across hour boundaries; less suited for user-centric queries
Best for: Client/user behavior analysis, session tracking, suspicious activity detection, user journey analysis.
- Logic: Groups logs by client IP address using regex pattern matching at the start of log lines. Chunks are created on size threshold OR when IP address changes.
- Config: `chunk_size = 100` (configurable), `overlap = 0`, component identifier = IP
- Metadata: Includes component (IP address), entries count
- Pros: Great for client/user behavior analysis, session tracking, and suspicious activity detection
- Cons: Fragments time-based patterns; multi-IP queries require multiple chunks
Best for: Performance monitoring, error-rate analysis, status-code-focused analytics.
- Logic: Groups logs by HTTP status code categories:
  - `2xx_success` – successful requests
  - `3xx_redirect` – redirect responses
  - `4xx_client_error` – client-side errors
  - `5xx_server_error` – server-side errors
- Config: `chunk_size = 100` (configurable), `overlap = 0`
- Metadata: Includes status_category, entries count
- Pros: Excellent for performance monitoring and error-rate analysis
- Cons: Fragments user journeys and limits context to same-status entries
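The status-to-category mapping used by this method can be sketched as:

```python
def status_category(status: int) -> str:
    """Map an HTTP status code to the chunking categories described above."""
    if 200 <= status < 300:
        return "2xx_success"
    if 300 <= status < 400:
        return "3xx_redirect"
    if 400 <= status < 500:
        return "4xx_client_error"
    if 500 <= status < 600:
        return "5xx_server_error"
    return "other"
```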
The methods were evaluated on multiple query types, including user journey analysis, error analysis, time‑based analysis, status code analysis, authentication failure patterns, and cart operations.
- **User journey & cart flows:** `log_component_based` scored highest (9/10) by grouping all events for a given IP (e.g., the full checkout for IP `172.16.54.78`, cart operations across users). `log_semantic_sliding` provided good context but sometimes mixed users or missed parts of sequences.
- **Error & authentication analysis:** `log_error_block` excelled (9/10) at listing 4xx errors and authentication failures and at grouping related error context. `log_status_code` also performed well for summarizing error distributions.
- **Time‑based traffic patterns:** `log_time_window` achieved 9/10 for analyzing traffic between 08:00 and 09:00, peak periods, and time‑localized behaviors.
- **Status distribution:** `log_status_code` was best suited to compute percentages of 2xx/3xx/4xx responses and identify 404 endpoints, though full accuracy depends on covering the complete dataset.
| Query Type | Hybrid Adaptive ⭐ | Hybrid Intelligent | Semantic Sliding | Error Block | Time Window | Component‑Based | Status Code |
|---|---|---|---|---|---|---|---|
| User Journey | 8.5/10 | 8.0/10 | 7/10 | 4/10 | 6/10 | 9/10 | 5/10 |
| Error Analysis | 9.0/10 | 8.5/10 | 7/10 | 9/10 | 6/10 | 7/10 | 8/10 |
| Time Patterns | 8.5/10 | 7.5/10 | 7/10 | 5/10 | 9/10 | 5/10 | 6/10 |
| Status Analysis | 8.0/10 | 7.5/10 | 7/10 | 6/10 | 6/10 | 6/10 | 8/10 |
| Auth Failures | 8.5/10 | 8.0/10 | 6/10 | 9/10 | 6/10 | 8/10 | 7/10 |
| Cart Operations | 8.5/10 | 8.0/10 | 7/10 | 5/10 | 6/10 | 9/10 | 5/10 |
| Average | 8.5/10 ⭐ | 7.9/10 | 7.0/10 | 6.3/10 | 6.5/10 | 7.3/10 | 6.5/10 |
- **Best general‑purpose method: `log_hybrid_adaptive` ⭐ (default).** Recommended for default RAG usage due to its intelligent combination of multiple strategies, balanced performance across all query types, and design optimized for retrieval‑augmented generation. It adaptively combines time-window, error-block, component, and semantic sliding strategies.
  - `log_semantic_sliding` – alternative default choice with strong context preservation and balanced performance (7.0/10).
  - `log_hybrid_intelligent` – for high-quality semantic search requiring context-aware splitting and natural boundary detection.
- **Best for specialized tasks:**
  - `log_error_block` – error analysis, security incidents, and authentication failures.
  - `log_time_window` – temporal/traffic pattern queries, peak-hour identification.
  - `log_component_based` – user journey, cart operations, and IP‑based behavior analysis.
  - `log_status_code` – performance monitoring and status‑code‑focused analytics.
This evaluation demonstrates that no single chunking method is optimal for all query types, and the system can select or combine strategies depending on the question type for more accurate log analysis. The hybrid methods (log_hybrid_adaptive and log_hybrid_intelligent) provide the best balance by intelligently combining multiple strategies.
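Selecting a strategy from the query type, as this conclusion suggests, can be sketched as a simple dispatch table (the keyword lists are illustrative, not the project's actual routing logic):

```python
# Hypothetical query-to-strategy router based on the evaluation table above
ROUTES = [
    ({"error", "fail", "401", "403", "404", "5xx"}, "log_error_block"),
    ({"hour", "peak", "time", "between"}, "log_time_window"),
    ({"ip", "user", "session", "journey", "cart"}, "log_component_based"),
    ({"status", "percentage", "distribution"}, "log_status_code"),
]

def pick_strategy(query: str) -> str:
    """Return the best specialized method, falling back to the hybrid default."""
    words = set(query.lower().split())
    for keywords, method in ROUTES:
        if words & keywords:
            return method
    return "log_hybrid_adaptive"

choice = pick_strategy("Show all 401 authentication failures")
```

A production router would more likely classify the query with the LLM itself, but the fallback-to-hybrid principle is the same.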
```bash
git clone https://github.com/MarouaHattab/mini-rag-app
cd mini-rag-app
```
**Create Virtual Environment**
```bash
python3 -m venv env
source env/bin/activate  # On Windows: env\Scripts\activate
```
**Install Dependencies**
```bash
cd src
pip install -r requirements.txt
```
**Configure Environment**
```bash
cd ../docker/env
cp .env.example.app .env.app
cp .env.example.postgres .env.postgres
cp .env.example.rabbitmq .env.rabbitmq
cp .env.example.redis .env.redis
cp .env.example.grafana .env.grafana
```
**Configure Ollama (Local LLM)**
This project is configured to use Ollama by default, removing the need for external API keys. Update your `.env.app` with the following:
```bash
# LLM Configuration for local Ollama
GENERATION_BACKEND="OPENAI"
EMBEDDING_BACKEND="OPENAI"

# Use local Ollama endpoint (OpenAI compatible)
OPENAI_API_URL="http://host.docker.internal:11434/v1"
OPENAI_API_KEY="ollama"  # Placeholder value

# Model Selection
GENERATION_MODEL_ID="qwen2.5-coder:1.5b"
EMBEDDING_MODEL_ID="nomic-embed-text"
```

Alternatively, copy all environment files in one step:

```bash
cd docker/env

# Copy all environment files
for file in .env.example.*; do cp "$file" "${file//.example/}"; done

# Update LLM configuration in .env.app as shown above
```

**Start Core Services**

```bash
cd docker
docker compose up pgvector rabbitmq redis -d
```
**Run Migrations**
```bash
cd ../src/models/db_schemes/minirag
source ../../../../env/bin/activate  # If using local setup
alembic upgrade head
```

**Start All Services**

```bash
cd docker
docker compose up --build
```
Services will be available at:
- FastAPI API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
- Nginx (Load Balancer): http://localhost:81
- Flower (Celery Monitor): http://localhost:5555
- Grafana Dashboard: http://localhost:3000
- Prometheus Metrics: http://localhost:9090
- RabbitMQ Management: http://localhost:15672
**Terminal 1: Start services**
```bash
cd docker
docker compose up pgvector rabbitmq redis qdrant prometheus grafana -d
```
**Terminal 2: Start FastAPI**
```bash
cd src
source ../env/bin/activate
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```
**Terminal 3: Start Celery Worker**
```bash
celery -A celery_app worker --queues=default,file_processing,data_indexing --loglevel=info
```
**Terminal 4: Start Flower (optional)**
```bash
celery -A celery_app flower --conf=flowerconfig.py
```

Upload a log file:
```bash
curl -X POST "http://localhost:8000/data/upload/1" \
  -H "Content-Type: multipart/form-data" \
  -F "files=@document.log"
```
Process the uploaded file:
```bash
curl -X POST "http://localhost:8000/data/process/1"
```
Process and push to the vector store:
```bash
curl -X POST "http://localhost:8000/api/v1/data/process-and-push/1"
```
Semantic search:
```bash
curl -X POST "http://localhost:8000/nlp/index/search/1" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the main topic?", "top_k": 5}'
```
Answer generation:
```bash
curl -X POST "http://localhost:8000/nlp/index/answer/1" \
  -H "Content-Type: application/json" \
  -d '{"query": "Explain the main concepts in the document"}'
```
Import the Postman collection from `src/assets/mini-rag-app.postman_collection.json`.
Visit http://localhost:8000/docs for Swagger UI documentation. 
- Pre-configured Dashboards: System metrics, PostgreSQL metrics, application metrics
- Flower Dashboard: http://localhost:5555
- Password: Set in CELERY_FLOWER_PASSWORD environment variable
- URL: http://localhost:9090
- Available Metrics: Application performance, database health, system resources
- PDF: .pdf
- Text: .txt
- Maximum file size: 10MB (configurable)
- PostgreSQL + pgvector: Default, integrated with main database.
- Qdrant: Dedicated vector database, better for large-scale deployments.