Commit b92f4d5

Merge pull request #5 from cld2labs/dev
update README, github workflow file and add PR template
2 parents dbbe69c + 5a9f48a

3 files changed: 158 additions & 39 deletions

.github/pull_request_template.md

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+## Summary
+
+<!-- What does this PR do? Keep it to 1-3 bullet points. -->
+
+-
+
+## Type of Change
+
+<!-- Check the one that applies. -->
+
+- [ ] Bug fix
+- [ ] New feature / enhancement
+- [ ] Documentation update
+- [ ] Refactor (no behavior change)
+- [ ] Chore (dependencies, CI, tooling)
+
+## Changes Made
+
+<!-- Briefly describe the key changes. Link to relevant issues if applicable. -->
+
+Resolves #<!-- issue number -->
+
+## How to Test
+
+<!-- Steps a reviewer can follow to verify the changes. -->
+
+1.
+
+## Checklist
+
+- [ ] I have read the [Contributing Guide](../CONTRIBUTING.md)
+- [ ] My branch is up to date with `main`
+- [ ] New environment variables (if any) are documented in `.env.example` and the README
+- [ ] No secrets, API keys, or credentials are included in this PR
+- [ ] I have tested my changes locally
+
+## Screenshots (if applicable)
+
+<!-- Add screenshots for UI changes. Delete this section if not applicable. -->

.github/workflows/code-scans.yaml

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@ jobs:
         run: mkdir -p trivy-reports
 
       - name: Run Trivy FS Scan
-        uses: aquasecurity/trivy-action@0.24.0
+        uses: aquasecurity/trivy-action@0.35.0
         with:
           scan-type: 'fs'
           scan-ref: '.'
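To sanity-check the version bump before pushing, the workflow's scan can be approximated locally with the Trivy CLI; a minimal sketch, assuming Trivy is installed and mirroring the `trivy-reports` directory the job creates (the flags and report name here are illustrative):

```bash
# Approximate the CI filesystem scan from the repository root
mkdir -p trivy-reports
trivy fs --format table --output trivy-reports/fs-scan.txt .

# Review the findings
cat trivy-reports/fs-scan.txt
```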

README.md

Lines changed: 118 additions & 38 deletions
@@ -1,11 +1,10 @@
 <p align="center">
-<img src="docs/assets/InnovationHub-HeaderImage.png" width="800" alt="DocuBot AI Documentation Generator">
+<img src="docs/assets/InnovationHub-HeaderImage.png" width="800" alt="Company logo">
 </p>
 
 # 📚 DocuBot - AI Documentation Generator
 
-AI-powered documentation generation using multi-provider LLMs and specialized micro-agent architecture for comprehensive README creation.
-
+An AI-powered full-stack application that automatically generates high-quality project documentation from source code repositories. Connect a GitHub repo, let specialized micro-agents analyze the codebase, architecture, dependencies, and APIs, and get structured README documentation in minutes, powered by multi-provider LLMs, OpenAI-compatible endpoints, or locally hosted models such as Ollama.
 ---
 
 ## 📋 Table of Contents
@@ -18,7 +17,8 @@ AI-powered documentation generation using multi-provider LLMs and specialized mi
 - [Project Structure](#project-structure)
 - [Usage Guide](#usage-guide)
 - [LLM Provider Configuration](#llm-provider-configuration)
-- [Performance Benchmarks](#performance-benchmarks)
+- [Inference Benchmarks](#inference-benchmarks)
+- [Model Capabilities](#model-capabilities)
 - [Environment Variables](#environment-variables)
 - [Technology Stack](#technology-stack)
 - [Troubleshooting](#troubleshooting)
@@ -28,7 +28,15 @@ AI-powered documentation generation using multi-provider LLMs and specialized mi
 
 ## Project Overview
 
-**DocuBot** is an intelligent documentation generation platform that analyzes GitHub repositories using specialized micro-agents to automatically create comprehensive, well-structured README documentation with minimal human intervention.
+**DocuBot** shows how agentic AI can be applied to one of the most time-consuming software tasks: documentation. The application analyzes real project evidence from a repository and uses specialized micro-agents to generate structured, context-aware README documentation that is more accurate and maintainable than traditional single-prompt generation.
+
+The application supports a flexible inference layer, allowing it to work with OpenAI, Groq, OpenRouter, custom OpenAI-compatible APIs, and local Ollama deployments. This makes it practical for cloud-based teams, enterprise environments, and privacy-sensitive local setups alike.
+
+DocuBot is therefore suitable for:
+
+- **Enterprise teams** — integrate with internal gateways, hosted APIs, or private inference infrastructure
+- **Local experimentation** — run documentation generation with self-hosted models through Ollama
+- **Hardware benchmarking** — measure SLM throughput on Apple Silicon, CUDA, or Intel Gaudi hardware
 
 ### How It Works
 
@@ -442,20 +450,21 @@ DocuBot/
 
 ### Performance Tips
 
-- **Model Selection**: For faster processing, use `gpt-4o-mini` or Groq's `llama-3.2-90b-text-preview`
-- **Local Development**: Use Ollama with `qwen2.5:7b` for private, offline documentation generation
-- **Monorepo**: Select specific subprojects for focused documentation
-- **PR Creation**: Requires `GITHUB_TOKEN` with `repo` scope in `api/.env`
+- **Use the largest model your hardware can sustain.** `qwen3:14b` produces the best documentation quality; `qwen3:4b` is faster and good for benchmarking.
+- **Lower `LLM_TEMPERATURE`** (e.g., `0.1`) for more factual, evidence-grounded documentation. Raise it slightly (e.g., `0.3–0.5`) for more descriptive, narrative-style README prose (see the sketch after this list).
+- **Keep repositories focused.** The agents analyze up to `MAX_FILES_TO_SCAN` files (default: 500). For large monorepos, use the built-in project selector to target a specific subproject rather than letting agents scan the entire repo.
+- **On Apple Silicon**, always run Ollama natively — never inside Docker. The Metal GPU backend delivers significantly higher throughput for sequential multi-agent workloads compared to CPU-only inference.
+- **On Linux with an NVIDIA GPU**, set `CUDA_VISIBLE_DEVICES` before starting Ollama to target a specific GPU.
+- **For enterprise remote APIs**, choose a model with a large context window (≥16k tokens) to avoid truncation on longer inputs.
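The temperature and scan-limit tips above map onto concrete settings; a minimal sketch, assuming `LLM_TEMPERATURE` and `MAX_FILES_TO_SCAN` live in `api/.env` like the other variables shown in this README (values are illustrative):

```bash
# api/.env - tuning knobs from the tips above (illustrative values)
LLM_TEMPERATURE=0.1      # factual, evidence-grounded output; 0.3-0.5 reads more narrative
MAX_FILES_TO_SCAN=500    # default; lower it to keep agents focused on huge monorepos

# On Linux with multiple NVIDIA GPUs, pin Ollama to one device at startup:
CUDA_VISIBLE_DEVICES=0 ollama serve
```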
 
 ---
 
 ## LLM Provider Configuration
 
-DocuBot supports multiple LLM providers. Choose the one that best fits your needs:
+DocuBot supports multiple LLM providers. All providers are configured via the `.env` file. Set `LLM_PROVIDER=ollama` for local inference.
 
-### OpenAI (Recommended for Production)
 
-**Best for**: Highest quality outputs, production deployments
+### OpenAI
 
 - **Get API Key**: https://platform.openai.com/account/api-keys
 - **Models**: `gpt-4o`, `gpt-4-turbo`, `gpt-4o-mini`
@@ -468,9 +477,9 @@ DocuBot supports multiple LLM providers. Choose the one that best fits your need
 LLM_MODEL=gpt-4o
 ```
 
-### Groq (Fast & Free Tier)
+### Groq
 
-**Best for**: Fast inference, development, free tier testing
+Groq provides OpenAI-compatible endpoints with extremely fast inference (LPU hardware).
 
 - **Get API Key**: https://console.groq.com/keys
 - **Models**: `llama-3.2-90b-text-preview`, `llama-3.1-70b-versatile`
@@ -484,14 +493,13 @@ DocuBot supports multiple LLM providers. Choose the one that best fits your need
 LLM_MODEL=llama-3.2-90b-text-preview
 ```
 
-### Ollama (Local & Private)
+### Ollama
 
-**Best for**: Local deployment, privacy, no API costs, offline operation
+Runs inference locally on the host machine with full GPU acceleration.
 
-- **Install**: https://ollama.com/download
-- **Pull Model**: `ollama pull qwen2.5:7b`
-- **Models**: `qwen2.5:7b`, `llama3.1:8b`, `llama3.2:3b`
-- **Pricing**: Free (local hardware costs only)
+- **Install Ollama**: https://ollama.com/download
+- **Pull Model**: `ollama pull qwen3:14b`
+- **Models**: `qwen3:4b`, `llama3.1:8b`, `llama3.2:3b`
 - **Configuration**:
 ```bash
 LLM_PROVIDER=ollama
@@ -505,15 +513,15 @@ DocuBot supports multiple LLM providers. Choose the one that best fits your need
 curl -fsSL https://ollama.com/install.sh | sh
 
 # Pull model
-ollama pull qwen2.5:7b
+ollama pull qwen3:14b
 
-# Verify it's running
+# Verify Ollama is running:
 curl http://localhost:11434/api/tags
 ```
 
-### OpenRouter (Multi-Model Access)
+### OpenRouter
 
-**Best for**: Access to multiple models through one API, model flexibility
+OpenRouter provides a unified API across hundreds of models from different providers.
 
 - **Get API Key**: https://openrouter.ai/keys
 - **Models**: Claude, Gemini, GPT-4, Llama, and 100+ others
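The OpenRouter configuration block itself falls outside this diff's context; purely as a hypothetical illustration following the pattern of the OpenAI and Groq blocks above (`LLM_API_KEY` and the model slug are placeholders, not taken from the repo):

```bash
# Hypothetical OpenRouter configuration (not shown in this diff)
LLM_PROVIDER=openrouter
LLM_API_KEY=your-openrouter-key           # from https://openrouter.ai/keys
LLM_BASE_URL=https://openrouter.ai/api/v1
LLM_MODEL=anthropic/claude-3.5-sonnet     # any OpenRouter model slug
```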
@@ -539,6 +547,13 @@ LLM_BASE_URL=https://your-custom-endpoint.com/v1
 LLM_MODEL=your-model-name
 ```
 
+If the endpoint uses a private domain mapped in `/etc/hosts`, also set:
+
+```bash
+LOCAL_URL_ENDPOINT=your-private-domain.internal
+```
+
+
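For the private-domain case just above, the corresponding `/etc/hosts` entry might look like this (a hypothetical sketch; substitute the gateway's real address):

```bash
# /etc/hosts (illustrative): map the private domain to the endpoint's address
127.0.0.1   your-private-domain.internal
```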
 ### Switching Providers
 
 To switch providers, simply update `api/.env` and restart:
@@ -557,27 +572,92 @@ docker compose up -d
 
 ---
 
-## Performance Benchmarks
-
-The following benchmarks were collected by running DocuBot's full 9-agent documentation pipeline across three inference environments. Use these results to choose the right deployment profile for your needs.
+## Inference Benchmarks
 
-> **Note:** Intel Enterprise Inference was tested on Intel Xeon hardware to demonstrate on-premises SLM deployment for enterprise codebases.
+The table below compares inference performance across different providers, deployment modes, and hardware profiles using a standardized run of DocuBot's full 9-agent documentation pipeline.
 
-### Results
-
-| Model Type / Inference Provider | Model Name | Deployment | Context Window | Avg Input Tokens | Avg Output Tokens | Avg Total Tokens / Request | P50 Latency (ms) | P95 Latency (ms) | Throughput (req/sec) | Hardware Profile |
+| Provider | Model | Deployment | Context Window | Avg Input Tokens | Avg Output Tokens | Avg Total Tokens / Request | P50 Latency (ms) | P95 Latency (ms) | Throughput (req/sec) | Hardware |
 |---|---|---|---|---|---|---|---|---|---|---|
-| vLLM | Qwen3-4B-Instruct-2507 | Local | 262.1K | 3,040 | 307.7 | 5809 | 15,864 | 40,809 | 0.0580 | Apple Silicon (Metal) |
-| Enterprise Inference / SLM · [Intel OPEA EI](https://opea.dev) | Qwen3-4B-Instruct-2507 | CPU (Xeon) | 8.1K | 4,211.9 | 270 | 4481 | 10,540 | 32,205 | 0.076 | CPU-only |
+| vLLM | Qwen3-4B-Instruct-2507 | Local | 262.1K | 3,040 | 307.7 | 5,809 | 15,864 | 40,809 | 0.0580 | Apple Silicon (Metal, MacBook Pro M4) |
+| [Intel OPEA EI](https://github.com/opea-project/Enterprise-Inference) | Qwen3-4B-Instruct-2507 | CPU (Xeon) | 8.1K | 4,211.9 | 270 | 4,481 | 10,540 | 32,205 | 0.076 | CPU-only |
 | OpenAI (Cloud) | gpt-4o-mini | API (Cloud) | 128K | 3,820.11 | 316.41 | 4,136.52 | 7,760 | 23,535 | 0.108 | N/A |
 
+> **Notes:**
+>
+> - All benchmarks use the same documentation-generation workflow. Token counts may vary slightly per run due to non-deterministic model output.
+> - vLLM on Apple Silicon uses Metal (MPS) GPU acceleration.
+> - [Intel OPEA Enterprise Inference](https://github.com/opea-project/Enterprise-Inference) runs on Intel Xeon CPUs without GPU acceleration.
+
+---
+
+## Model Capabilities
+
+### Qwen3-4B-Instruct-2507
 
-### Model Capabilities
+A 4-billion-parameter open-weight code model from Alibaba's Qwen team (July 2025 release), designed for on-prem and edge deployment.
 
-| Model | Highlights |
-|---|---|
-| **Qwen3-4B-Instruct-2507** | 4B-parameter code-specialized model with 262.1K native context (deployment-limited to 8.1K on Xeon CPU). Supports multi-agent documentation generation, code analysis, and structured JSON output. Enables full on-premises deployment with data sovereignty for enterprise codebases. |
-| **gpt-4o-mini** | Cloud-native multimodal model with 128K context, optimized for code understanding and technical documentation. Delivers 42% higher throughput and 26% lower latency versus CPU-based alternatives while supporting concurrent multi-agent orchestration at cloud scale. |
+
+| Attribute | Details |
+|---|---|
+| **Parameters** | 4.0B total (3.6B non-embedding) |
+| **Architecture** | Transformer with Grouped Query Attention (GQA) — 36 layers, 32 Q-heads / 8 KV-heads |
+| **Context Window** | 262,144 tokens (256K) native |
+| **Reasoning Mode** | Non-thinking only (Instruct-2507 variant). Separate Thinking-2507 variant available with always-on chain-of-thought |
+| **Tool / Function Calling** | Supported; MCP (Model Context Protocol) compatible |
+| **Structured Output** | JSON-structured responses supported |
+| **Multilingual** | 100+ languages and dialects |
+| **Code Benchmarks** | MultiPL-E: 76.8%, LiveCodeBench v6: 35.1%, BFCL-v3 (tool use): 61.9 |
+| **Quantization Formats** | GGUF (Q4_K_M ~2.5 GB, Q8_0 ~4.3 GB), AWQ (int4), GPTQ (int4), MLX (4-bit ~2.3 GB) |
+| **Inference Runtimes** | Ollama, vLLM, llama.cpp, LMStudio, SGLang, KTransformers |
+| **Fine-Tuning** | Full fine-tuning and adapter-based (LoRA); 5,000+ community adapters on HuggingFace |
+| **License** | Apache 2.0 |
+| **Deployment** | Local, on-prem, air-gapped, cloud — full data sovereignty |
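Given the quantization and runtime rows above, a quick local smoke test is straightforward; a hedged sketch using Ollama with the `qwen3:4b` tag referenced elsewhere in this README (tag availability and default quantization may vary):

```bash
# Pull a small quantized build and confirm it is available
ollama pull qwen3:4b
ollama list

# One-off probe of instruction following and structured output
ollama run qwen3:4b "Return a JSON object listing three common README sections."
```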
+
+
+### GPT-4o-mini
+
+OpenAI's cost-efficient multimodal model, accessible exclusively via cloud API.
+
+
+| Attribute | Details |
+|---|---|
+| **Parameters** | Not publicly disclosed |
+| **Architecture** | Multimodal Transformer (text + image input, text output) |
+| **Context Window** | 128,000 tokens input / 16,384 tokens max output |
+| **Reasoning Mode** | Standard inference (no explicit chain-of-thought toggle) |
+| **Tool / Function Calling** | Supported; parallel function calling |
+| **Structured Output** | JSON mode and strict JSON schema adherence supported |
+| **Multilingual** | Broad multilingual support |
+| **Code Benchmarks** | MMMLU: ~87%, strong HumanEval and MBPP scores |
+| **Pricing** | $0.15 / 1M input tokens, $0.60 / 1M output tokens (Batch API: 50% discount) |
+| **Fine-Tuning** | Supervised fine-tuning via OpenAI API |
+| **License** | Proprietary (OpenAI Terms of Use) |
+| **Deployment** | Cloud-only — OpenAI API or Azure OpenAI Service. No self-hosted or on-prem option |
+| **Knowledge Cutoff** | October 2023 |
+
+
+### Comparison Summary
+
+
+| Capability | Qwen3-4B-Instruct-2507 | GPT-4o-mini |
+|---|---|---|
+| Code analysis & documentation generation | Yes | Yes |
+| Multi-agent / agentic task execution | Yes | Yes |
+| Mermaid / architecture diagram generation | Yes | Yes |
+| Function / tool calling | Yes | Yes |
+| JSON structured output | Yes | Yes |
+| On-prem / air-gapped deployment | Yes | No |
+| Data sovereignty | Full (weights run locally) | No (data sent to cloud API) |
+| Open weights | Yes (Apache 2.0) | No (proprietary) |
+| Custom fine-tuning | Full fine-tuning + LoRA adapters | Supervised fine-tuning (API only) |
+| Quantization for edge devices | GGUF / AWQ / GPTQ / MLX | N/A |
+| Multimodal (image input) | No | Yes |
+| Native context window | 256K | 128K |
+
+
+> Both models support code analysis and documentation generation, multi-agent/agentic task execution, Mermaid diagram generation, function calling, and JSON-structured output. However, only Qwen3-4B offers open weights, data sovereignty, and local deployment flexibility — making it suitable for air-gapped, regulated, or cost-sensitive environments. GPT-4o-mini offers lower latency and higher throughput via OpenAI's cloud infrastructure, with added multimodal capabilities.
+
+---
 
 ## Environment Variables
 