AI-powered documentation generation using multi-provider LLMs and specialized micro-agent architecture for comprehensive README creation.
An AI-powered full-stack application that automatically generates high-quality project documentation from source code repositories. Connect a GitHub repo, let specialized micro-agents analyze the codebase, architecture, dependencies, and APIs, and get structured README documentation in minutes, powered by multi-provider LLMs, OpenAI-compatible endpoints, or locally hosted models such as Ollama.
---
## 📋 Table of Contents
## Project Overview
**DocuBot** shows how agentic AI can be applied to one of the most time-consuming software tasks: documentation. The application analyzes real project evidence from a repository and uses specialized micro-agents to generate structured, context-aware README documentation that is more accurate and maintainable than traditional single-prompt generation.
The application supports a flexible inference layer, allowing it to work with OpenAI, Groq, OpenRouter, custom OpenAI-compatible APIs, and local Ollama deployments. This makes it practical for cloud-based teams, enterprise environments, and privacy-sensitive local setups alike.
This makes DocuBot suitable for:
- **Enterprise teams** — integrate with internal gateways, hosted APIs, or private inference infrastructure
- **Local experimentation** — run documentation generation with self-hosted models through Ollama
- **Hardware benchmarking** — measure SLM throughput on Apple Silicon, CUDA, or Intel Gaudi hardware
### How It Works
### Performance Tips
- **Use the largest model your hardware can sustain.** `qwen3:14b` produces the best documentation quality; `qwen3:4b` is faster and good for benchmarking.
- **Lower `LLM_TEMPERATURE`** (e.g., `0.1`) for more factual, evidence-grounded documentation. Raise it slightly (e.g., `0.3–0.5`) for more descriptive, narrative-style README prose.
- **Keep repositories focused.** The agents analyze up to `MAX_FILES_TO_SCAN` files (default: 500). For large monorepos, use the built-in project selector to target a specific subproject rather than letting agents scan the entire repo.
- **On Apple Silicon**, always run Ollama natively — never inside Docker. The Metal GPU backend delivers significantly higher throughput for sequential multi-agent workloads compared to CPU-only inference.
- **On Linux with an NVIDIA GPU**, set `CUDA_VISIBLE_DEVICES` before starting Ollama to target a specific GPU (see the sketch after this list).
- **For enterprise remote APIs**, choose a model with a large context window (≥16k tokens) to avoid truncation on longer inputs.
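The tuning knobs above can be set in one place. A minimal sketch, assuming `LLM_TEMPERATURE` and `MAX_FILES_TO_SCAN` are read from `api/.env` (this README implies the location but does not show the file verbatim):

```bash
# api/.env — conservative defaults based on the tips above (assumed location)
LLM_TEMPERATURE=0.1        # more factual, evidence-grounded output
MAX_FILES_TO_SCAN=500      # default; lower it or use the project selector for large monorepos

# Separately, on a multi-GPU Linux host, pin Ollama to one NVIDIA GPU before starting it:
CUDA_VISIBLE_DEVICES=0 ollama serve
```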
---
## LLM Provider Configuration
DocuBot supports multiple LLM providers. All providers are configured via the `.env` file. Set `INFERENCE_PROVIDER=ollama` for local inference.
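For example, a minimal `api/.env` sketch for local inference with Ollama; `INFERENCE_PROVIDER` comes from this README, while the model and base-URL key names are illustrative assumptions:

```bash
# api/.env — local Ollama sketch (LLM_MODEL and OLLAMA_BASE_URL are assumed key names)
INFERENCE_PROVIDER=ollama
LLM_MODEL=qwen3:14b                      # model tag taken from the Performance Tips above
OLLAMA_BASE_URL=http://localhost:11434   # Ollama's default local endpoint
```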
### OpenAI
- **Get API Key**: https://platform.openai.com/account/api-keys
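A minimal sketch of the matching `api/.env` entries, assuming the API key and model are set alongside the provider switch (the key names other than `INFERENCE_PROVIDER` are assumptions, not the project's documented variables):

```bash
# api/.env — OpenAI sketch (OPENAI_API_KEY and LLM_MODEL are assumed key names)
INFERENCE_PROVIDER=openai
OPENAI_API_KEY=sk-...        # key from the link above
LLM_MODEL=gpt-4o-mini        # model referenced elsewhere in this README
```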
If the endpoint uses a private domain mapped in `/etc/hosts`, also set:
```bash
LOCAL_URL_ENDPOINT=your-private-domain.internal
```
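For reference, the corresponding `/etc/hosts` entry would look something like this (the IP address is purely hypothetical):

```bash
# /etc/hosts — hypothetical mapping for the private endpoint above
10.0.0.12    your-private-domain.internal
```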
### Switching Providers
To switch providers, simply update `api/.env` and restart:
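For example, assuming the Docker Compose deployment referenced in this README:

```bash
# 1. Edit api/.env and change the provider, e.g. INFERENCE_PROVIDER=ollama
# 2. Restart the stack so the new configuration is picked up:
docker compose up -d
```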
---
## Inference Benchmarks
The table below compares inference performance across different providers, deployment modes, and hardware profiles, using a standardized run of DocuBot's full 9-agent documentation pipeline.
### Results
| Model Type / Inference Provider | Model Name | Deployment | Context Window | Avg Input Tokens | Avg Output Tokens | Avg Total Tokens / Request | P50 Latency (ms) | P95 Latency (ms) | Throughput (req/sec) | Hardware Profile |
|---|---|---|---|---|---|---|---|---|---|---|

> - All benchmarks use the same Documentation generation workflow. Token counts may vary slightly per run due to non-deterministic model output.
> - vLLM on Apple Silicon uses Metal (MPS) GPU acceleration.
> - [Intel OPEA Enterprise Inference](https://github.com/opea-project/Enterprise-Inference) runs on Intel Xeon CPUs without GPU acceleration.
---
## Model Capabilities
### Qwen3-4B-Instruct-2507
A 4-billion-parameter open-weight code model from Alibaba's Qwen team (July 2025 release), designed for on-prem and edge deployment.
> Both models support Code Analysis & Documentation Generation, Multi-agent / agentic task execution, Mermaid diagram generation, function calling, and JSON-structured output. However, only Qwen3-4B offers open weights, data sovereignty, and local deployment flexibility — making it suitable for air-gapped, regulated, or cost-sensitive environments. GPT-4o-mini offers lower latency and higher throughput via OpenAI's cloud infrastructure, with added multimodal capabilities.