Intelligent developer news monitoring powered by Gemma 4 E4B — running on a $7/month server with zero API costs.
Monitors 25+ developer RSS feeds every 6 hours, uses Gemma 4 E4B to distinguish genuine releases and security alerts from tutorial spam, and posts clean digests to Slack — automatically.
Before: Manually scanning feeds or paying $15–20/month in OpenAI API calls.
After: Gemma 4 running locally. $0 in AI costs. Runs on any VPS with 3GB RAM.
┌─────────────────────────────────────────────────────────────────┐
│ Architecture Overview │
│ │
│ ┌──────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │ 25+ RSS │───▶│ feedparser │───▶│ │ │
│ │ Feeds │ │ (Python) │ │ Gemma 4 E4B │ │
│ └──────────┘ └──────────────┘ │ via Ollama │ │
│ │ │ │
│ Rust Blog 200 items │ • Classifies each │ │
│ Django Releases ──────────────────▶│ item: signal/noise│ │
│ K8s Releases │ • Summarizes │ │
│ GitHub Security │ newsworthy items │ │
│ AWS Blog │ • 128K ctx window │ │
│ + 20 more... ~4-8 seconds │ handles batches │ │
│ │ │ │
│ └──────────┬───────────┘ │
│ │ │
│ ┌──────────▼───────────┐ │
│ │ Slack Digest │ │
│ │ │ │
│ │ ## Dev Digest │ │
│ │ • Django 5.2 — ... │ │
│ │ • CVE-2026-XXXX ... │ │
│ └──────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
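The flow in the diagram can be sketched in a few lines of Python. This is an illustrative sketch, not the code in monitor.py: the function names, the prompt wording, and the item dictionary shape are all assumptions. It relies on two external facts only: feedparser exposes entry timestamps as UTC `published_parsed` values, and Slack incoming webhooks accept a JSON body with a `text` field.

```python
import calendar
import json
import time


def filter_recent(entries, hours_back, now=None):
    """Keep entries published within the lookback window.

    `entries` are dict-like objects with a UTC `published_parsed`
    struct_time, which is the shape feedparser entries have.
    """
    now = time.time() if now is None else now
    cutoff = now - hours_back * 3600
    recent = []
    for entry in entries:
        stamp = entry.get("published_parsed")
        if stamp is not None and calendar.timegm(stamp) >= cutoff:
            recent.append(entry)
    return recent


def build_batch_prompt(items):
    """Render all items as one numbered prompt (hypothetical wording)."""
    lines = [
        "Classify each item as SIGNAL (release, security alert) or NOISE",
        "(tutorial, marketing). Summarize each SIGNAL item in one line.",
        "",
    ]
    for number, item in enumerate(items, start=1):
        lines.append(f"{number}. [{item['feed']}] {item['title']}")
    return "\n".join(lines)


def build_slack_payload(digest):
    """Wrap the model's digest in the JSON body a Slack webhook accepts."""
    return json.dumps({"text": digest}).encode("utf-8")


# Made-up sample items, for illustration only.
now = time.time()
items = [
    {"feed": "Rust Blog", "title": "Sample release post",
     "published_parsed": time.gmtime(now - 3600)},
    {"feed": "AWS Blog", "title": "Sample tutorial post",
     "published_parsed": time.gmtime(now - 48 * 3600)},
]
prompt = build_batch_prompt(filter_recent(items, hours_back=6, now=now))
print(prompt)
```

Sending every recent item in one prompt, rather than one request per item, is what lets a single inference pass cover the whole batch.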
This project deliberately uses the E4B model (the "E" stands for effective parameters, about 4 billion), not the largest available. Here's why:
┌──────────────────────────────────────────────────────────────────────────┐
│ Gemma 4 Model Comparison │
├──────────┬────────┬─────────────┬─────────────┬──────────────────────────┤
│ Model │ Params │ Active │ RAM Needed │ Best For │
├──────────┼────────┼─────────────┼─────────────┼──────────────────────────┤
│ E2B │ 2B │ 2B (all) │ ~1.5 GB │ Mobile, Raspberry Pi │
│ E4B ★ │ 4B │ 4B (all) │ ~2.5 GB │ Edge, cheap VPS, CPU │
│ 26B MoE │ 26B │ ~3.8B active│ ~14 GB │ Consumer GPU │
│ 31B Dense│ 31B │ 31B (all) │ ~20 GB │ Workstation, H100 │
└──────────┴────────┴─────────────┴─────────────┴──────────────────────────┘
★ = This project's choice
The reasoning:
- Task fit: Feed classification is a reasoning task, not a creative one. E4B's 4B parameters and 128K context window are more than sufficient.
- Economics: E4B runs comfortably on a €6.99/month Hetzner CPX21 (3 vCPU, 4GB RAM). The 26B or 31B would need a $40–80/month server.
- Throughput: On CPU, E4B processes a 200-item batch in 4–8 seconds. The 31B Dense would take 60–90 seconds for the same batch.
- Per-Layer Embeddings: Gemma 4's edge models use PLE architecture — they have the representational depth of a much larger model at a fraction of the memory cost.
The best model for a job isn't always the biggest one.
┌─────────────────────────────────────────────────────────────────┐
│ Monthly Cost Comparison │
│ │
│ API-based approach (OpenAI GPT-3.5-turbo): │
│ ────────────────────────────────────────── │
│ 2.4M tokens/month × $0.50/1M = $1.20 for THIS workflow │
│ × 5 similar automation workflows = ~$15–20/month │
│ │
│ This project (Gemma 4 E4B on Hetzner CPX21): │
│ ───────────────────────────────────────────── │
│ VPS cost (4GB RAM, 3 vCPU): $7.40/month │
│ AI inference cost (unlimited): $0.00/month │
│ ───────────────────────────────────────────────── │
│ Total: $7.40/month │
│ │
│ Savings vs API-only (5 workflows): ~$10–15/month │
│ │
│ More importantly: ZERO marginal cost per inference │
│ → Run it 100x more often. No budget anxiety. │
└─────────────────────────────────────────────────────────────────┘
Measured on a Hetzner CPX21 (3 vCPU AMD EPYC, 4GB RAM, no GPU):
┌─────────────────────────────────────────────────────────────────┐
│ Real-World Performance (CPU Inference) │
│ │
│ Feed fetch (25 feeds, ~180 items): │
│ ████████░░░░░░░░░░░░░░░░░░░░ 1.8 seconds │
│ │
│ Gemma 4 E4B analysis (180 items, ~6K tokens): │
│ ████████████████████░░░░░░░░ 4.2 seconds │
│ │
│ Total cycle time: │
│ ██████████████████████░░░░░░ ~6 seconds │
│ │
│ Peak RAM usage during inference: 2.7 GB / 4.0 GB │
│ CPU usage during inference: 72% (3 vCPU) │
│ CPU usage idle: < 1% │
│ │
│ Spam filter accuracy: ~87% │
│ (vs GPT-3.5: ~88%, GPT-4o: ~95%) │
└─────────────────────────────────────────────────────────────────┘
- Python 3.9+
- 3GB+ free RAM (E4B model needs ~2.5GB)
- Linux, macOS, or WSL2
- (Optional) Slack webhook URL
git clone https://github.com/YOUR_USERNAME/gemma4-rss-intelligence
cd gemma4-rss-intelligence
chmod +x install.sh && ./install.sh
The installer handles everything:
- Checks Python version
- Installs Ollama (if needed)
- Downloads Gemma 4 E4B (~2.5GB)
- Creates Python virtualenv
- Installs dependencies
- Runs a config check
Total setup time: ~10 minutes (most of it spent downloading the model)
# 1. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# 2. Download Gemma 4 E4B
ollama pull gemma4:e4b
# 3. Clone and set up Python env
git clone https://github.com/YOUR_USERNAME/gemma4-rss-intelligence
cd gemma4-rss-intelligence
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# 4. Test run (24h lookback, no Slack)
python3 monitor.py --hours 24 --dry-run
Edit config.yaml to customise everything:
# Choose your Gemma 4 model
gemma4:
  model: "gemma4:e4b"             # Change to gemma4:26b or gemma4:31b for more power
  host: "http://localhost:11434"
# Monitoring window
monitoring:
  hours_back: 6                   # Cron runs every 6 hours, so this matches
  max_items_per_feed: 15          # Safety cap; E4B handles 128K tokens
# Slack (optional)
slack:
  webhook_url: "YOUR_SLACK_WEBHOOK_HERE"
  channel: "#dev-digest"
# Add/remove feeds freely
feeds:
  - url: "https://blog.rust-lang.org/feed.xml"
    name: "Rust Blog"
  # ... 24 more pre-configured
Any RSS or Atom feed works:
feeds:
  - url: "https://github.com/YOUR_ORG/YOUR_REPO/releases.atom"
    name: "Internal Releases"
  - url: "https://your-company-blog.com/feed.xml"
    name: "Company Blog"
Run every 6 hours automatically:
crontab -e
Add this line (replace /path/to with your actual path):
0 */6 * * * cd /path/to/gemma4-rss-intelligence && ./venv/bin/python3 monitor.py >> monitor.log 2>&1
To verify it's set:
crontab -l
python3 monitor.py [OPTIONS]
Options:
--config PATH Config file (default: config.yaml)
--hours N Override lookback window in hours
--dry-run Analyze but don't post to Slack
--check Check Ollama connection, then exit
Examples:
python3 monitor.py # Normal run
python3 monitor.py --hours 24 --dry-run # Test 24h lookback
python3 monitor.py --config prod.yaml # Custom config
python3 monitor.py --check # Health check
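For readers extending the CLI, here is a minimal sketch of how the four options listed above could be declared with argparse. It mirrors the documented flags and defaults only; the actual parser in monitor.py may be structured differently, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Declare the documented options (sketch, not the project's own parser)."""
    parser = argparse.ArgumentParser(
        prog="monitor.py",
        description="Fetch feeds, analyze them with Gemma 4, post a digest.",
    )
    parser.add_argument("--config", metavar="PATH", default="config.yaml",
                        help="Config file (default: config.yaml)")
    parser.add_argument("--hours", metavar="N", type=int, default=None,
                        help="Override lookback window in hours")
    parser.add_argument("--dry-run", action="store_true",
                        help="Analyze but don't post to Slack")
    parser.add_argument("--check", action="store_true",
                        help="Check Ollama connection, then exit")
    return parser


args = build_parser().parse_args(["--hours", "24", "--dry-run"])
print(args.config, args.hours, args.dry_run, args.check)
# prints: config.yaml 24 True False
```

Leaving `--hours` as `None` by default lets the script fall back to `hours_back` from config.yaml when the flag is not given.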
If you have more RAM, upgrade easily:
# 26B MoE — near-flagship quality, ~14GB RAM
ollama pull gemma4:26b
# Update config.yaml: model: "gemma4:26b"
# 31B Dense — flagship, ~20GB RAM, #3 on Arena AI leaderboard
ollama pull gemma4:31b
# Update config.yaml: model: "gemma4:31b"
┌─────────────────────────────────────────────────────────────────┐
│ Which Model Should You Use? │
│ │
│ RAM Available Recommended Model Monthly VPS Cost │
│ ───────────────────────────────────────────────────────────── │
│ 2–3 GB gemma4:e2b $6–7 (Hetzner CX22) │
│ 3–4 GB gemma4:e4b ★ $7 (Hetzner CPX21) │
│ 12–16 GB gemma4:26b $25 (Hetzner CCX23) │
│ 20+ GB gemma4:31b $50+ or local GPU │
│ │
│ ★ Recommended for this use case │
└─────────────────────────────────────────────────────────────────┘
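The table above can also be expressed as a small lookup, which is handy if an install script should choose a model automatically. This is a sketch under assumptions: `pick_model` and `MODEL_TIERS` are hypothetical names, and the thresholds are simply the lower bound of each RAM range in the table. RAM sizes that fall between two rows map to the smaller model.

```python
# (minimum RAM in GB, Ollama model tag), largest model first.
# Thresholds are taken from the lower bounds in the table above.
MODEL_TIERS = [
    (20.0, "gemma4:31b"),
    (12.0, "gemma4:26b"),
    (3.0, "gemma4:e4b"),
    (2.0, "gemma4:e2b"),
]


def pick_model(ram_gb):
    """Return the largest model whose minimum RAM fits, or None if none fit."""
    for min_ram_gb, tag in MODEL_TIERS:
        if ram_gb >= min_ram_gb:
            return tag
    return None


for ram in (1.0, 2.5, 4.0, 8.0, 16.0, 32.0):
    print(f"{ram:>5} GB -> {pick_model(ram)}")
```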
"Cannot reach Ollama"
ollama serve # Start manually
# Or check if port 11434 is blocked
curl http://localhost:11434/api/tags
"Model not found"
ollama list # See what's downloaded
ollama pull gemma4:e4b # Download E4B
"Inference timeout (120s)"
- Your server may be under-provisioned. Try:
  - Reduce max_items_per_feed in config.yaml to 8–10
  - Switch to gemma4:e2b (faster, less accurate)
  - Add a GPU (even a cheap one makes a 10x difference)
High memory usage / OOM
# Check available RAM
free -h
# Switch to smaller model
# config.yaml: model: "gemma4:e2b" (~1.5GB RAM)
Feed not updating
# Test a specific feed URL
python3 -c "
import feedparser
f = feedparser.parse('https://blog.rust-lang.org/feed.xml')
print(len(f.entries), 'entries')
print(f.entries[0].title if f.entries else 'empty')
"
gemma4-rss-intelligence/
├── monitor.py # Main script — feed fetch + Gemma 4 analysis + notify
├── config.yaml # All configuration (feeds, model, Slack)
├── requirements.txt # Python dependencies (4 packages)
├── install.sh # One-command setup script
├── .env.example # Environment variable template
├── .gitignore
└── README.md
The core call_gemma4() function is a simple HTTP wrapper around Ollama. You can extend it for other automation tasks:
# Code review on every PR
digest = call_gemma4(
    prompt=f"Review this diff for bugs and style issues:\n{diff_content}",
    model="gemma4:e4b",
    host="http://localhost:11434"
)
# Log anomaly detection
digest = call_gemma4(
    prompt=f"Identify anomalies in these error logs:\n{log_lines}",
    model="gemma4:e4b",
    host="http://localhost:11434"
)
Since it's running locally at zero marginal cost, there's no reason to be conservative about how often you call it.
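The body of call_gemma4() is not shown in this README. As a rough guide, a wrapper of this kind could look like the sketch below, which uses only the standard library and Ollama's `/api/generate` endpoint with streaming disabled. The helper names (`build_generate_request`, `call_gemma4_sketch`) are hypothetical, and the 120-second timeout is an assumption based on the timeout message in the troubleshooting section.

```python
import json
import urllib.request


def build_generate_request(prompt, model="gemma4:e4b",
                           host="http://localhost:11434"):
    """Build the POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{host.rstrip('/')}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_gemma4_sketch(prompt, model="gemma4:e4b",
                       host="http://localhost:11434", timeout=120):
    """Send one prompt to a local Ollama server and return the reply text.

    Raises urllib.error.URLError if the server cannot be reached.
    """
    request = build_generate_request(prompt, model, host)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False, Ollama returns a single JSON object whose
    # "response" field holds the generated text.
    return body.get("response", "")
```

Calling `call_gemma4_sketch("Summarize: ...")` requires `ollama serve` to be running with the model already pulled. Keeping request construction in its own function makes the wrapper testable without a live server.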
- Gemma 4 Official Docs
- Gemma 4 on Hugging Face
- Ollama Documentation
- Google AI Studio (free cloud access)
Apache 2.0, the same license as Gemma 4 itself. Use it for almost anything, within the terms of the license.