Merged
1 change: 1 addition & 0 deletions .github/workflows/03-build-secure.yml
@@ -121,6 +121,7 @@
cache-to: type=local,dest=/tmp/.buildx-cache-new,mode=max
build-args: |
BUILDKIT_INLINE_CACHE=1
${{ matrix.service == 'backend' && format('BACKEND_CACHE_BUST={0}', hashFiles('backend/**/*.py', 'backend/Dockerfile.backend', 'pyproject.toml', 'poetry.lock')) || '' }}

Check warning on line 124 in .github/workflows/03-build-secure.yml

GitHub Actions / YAML Lint

124:121 [line-length] line too long (181 > 120 characters)

# Move cache to optimize for next run (temp workaround for actions/cache#828)
- name: 💾 Save BuildKit Cache
38 changes: 38 additions & 0 deletions CLAUDE.md
@@ -102,6 +102,44 @@ make prod-logs # View logs
- ✅ **Local dev**: Feature development, bug fixes, rapid iteration
- ✅ **Production**: Docker containers for deployment to staging/production environments

#### Docker Build Cache Invalidation Strategy

The backend Dockerfile uses content-based cache invalidation to optimize build performance:

**Local Builds**:

- Default value `BACKEND_CACHE_BUST=local-build` is used automatically
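- The `BACKEND_CACHE_BUST` layer is rebuilt only on manual rebuilds with `--no-cache`, when the value is overridden, or when the Dockerfile changes; `COPY` layers still rebuild whenever the copied files change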
- Fast local iteration without unnecessary cache invalidation

**CI/CD Builds** (GitHub Actions):

- Content-based hash:
`BACKEND_CACHE_BUST=${{ hashFiles('backend/**/*.py', 'backend/Dockerfile.backend', 'pyproject.toml', 'poetry.lock') }}`
- Cache invalidates automatically when:
- Backend Python files change (`backend/**/*.py`)
- Dockerfile changes (`backend/Dockerfile.backend`)
- Dependencies change (`pyproject.toml`, `poetry.lock`)
- Cache preserved when backend files are unchanged → **faster CI builds** (Issue #349)
- Targeted pattern excludes `.pyc`, `__pycache__`, `.log` files to avoid unnecessary cache invalidation
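
For local experiments, the same content-based behaviour can be approximated in the shell. This is a minimal sketch and not part of this change: `backend_cache_bust` is a hypothetical helper, it assumes GNU coreutils (`sha256sum`, `sort -z`), and its output will not equal the value that `hashFiles()` produces in GitHub Actions. Only the property "same inputs, same value" carries over.

```bash
# Hypothetical helper (not in the repository): derives a cache-bust value
# from the same file set the workflow passes to hashFiles().
# Assumption: GNU coreutils are available; run from the project root or
# pass the project root as the first argument.
backend_cache_bust() {
  local root="${1:-.}"
  (
    cd "$root" || exit 1
    {
      # Backend Python sources only, so .pyc/.log files never affect the hash.
      find backend -type f -name '*.py' -print0
      # Dockerfile and dependency manifests.
      printf '%s\0' backend/Dockerfile.backend pyproject.toml poetry.lock
    } | LC_ALL=C sort -z | xargs -0 sha256sum | sha256sum | cut -d' ' -f1
  )
}

# Usage:
#   docker build --build-arg BACKEND_CACHE_BUST="$(backend_cache_bust)" \
#     -f backend/Dockerfile.backend -t rag-modulo-backend:latest .
```

Unlike `$(date +%s)`, this value stays stable between builds until one of the listed files changes, so repeated local builds can still reuse the cached layer.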

**Implementation Details**:

- Builder stage: Uses static `CACHE_BUST=20251119` for PyTorch CPU-only migration (Issue #506)
- Runtime stage: Uses dynamic `BACKEND_CACHE_BUST` for content-based invalidation
- Both stages are independent - builder cache separate from runtime cache
- See `backend/Dockerfile.backend` lines 38 and 83 for ARG declarations

**Manual Cache Invalidation**:

```bash
# Force cache invalidation for local builds
docker build --build-arg BACKEND_CACHE_BUST=$(date +%s) \
-f backend/Dockerfile.backend -t rag-modulo-backend:latest .
```

See `docs/troubleshooting/docker.md` for detailed troubleshooting.

### Testing

#### Test Categories
2 changes: 2 additions & 0 deletions Makefile
@@ -237,6 +237,8 @@ local-dev-status:

build-backend:
@echo "$(CYAN)🔨 Building backend image...$(NC)"
@# Note: BACKEND_CACHE_BUST uses default value 'local-build' from Dockerfile
@# For CI builds, this is set via workflow using content hash
@if [ "$(BUILDX_AVAILABLE)" = "yes" ]; then \
echo "Using Docker BuildKit with buildx..."; \
$(CONTAINER_CLI) buildx build --load \
15 changes: 12 additions & 3 deletions backend/Dockerfile.backend
@@ -74,9 +74,13 @@ RUN apt-get update && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*

# CACHE_BUST: Force rebuild to use CPU-only PyTorch (Issue #506)
# Ensure final stage cache is also invalidated
ARG CACHE_BUST=20251119
# BACKEND_CACHE_BUST: Content-based cache invalidation for backend source files
# This ARG is set by the workflow using hashFiles() to invalidate cache when backend
# source files change. For local builds, defaults to 'local-build' marker.
# In CI, this is set to content hash (hashFiles('backend/**/*.py', ...)) ensuring
# cache is invalidated only when backend files actually change, preserving Docker
# layer cache benefits when unchanged.
ARG BACKEND_CACHE_BUST=local-build

WORKDIR /app

@@ -87,6 +91,11 @@ COPY --from=builder /usr/local/bin /usr/local/bin
# Copy Poetry config from project root (moved from backend/ in Issue #501)
COPY pyproject.toml poetry.lock ./

# Use BACKEND_CACHE_BUST to invalidate cache before copying backend files
# This ensures that when backend/** files change, this layer (and subsequent COPY layers)
# are rebuilt, but when backend files are unchanged, Docker can reuse cached layers.
RUN echo "Backend content hash (cache invalidation): $BACKEND_CACHE_BUST"

# Copy only essential application files from backend directory
COPY backend/main.py backend/healthcheck.py ./
COPY backend/rag_solution/ ./rag_solution/
64 changes: 56 additions & 8 deletions docs/troubleshooting/docker.md
@@ -18,6 +18,7 @@ This guide covers common Docker and container-related issues in RAG Modulo, incl
RAG Modulo uses Docker Compose for orchestrating multiple containers:

**Services**:

- `backend`: FastAPI application (port 8000)
- `frontend`: React/Nginx (port 3000/8080)
- `postgres`: PostgreSQL database (port 5432)
@@ -27,6 +28,7 @@ RAG Modulo uses Docker Compose for orchestrating multiple containers:
- `mlflow-server`: Model tracking (port 5001)

**Docker Compose Files**:

- `./docker-compose.yml` - Production deployment
- `./docker-compose-infra.yml` - Infrastructure services
- `./docker-compose.dev.yml` - Development overrides
@@ -37,6 +39,7 @@ RAG Modulo uses Docker Compose for orchestrating multiple containers:
### Issue 1: Container Immediately Exits

**Symptoms**:

```bash
$ docker compose ps
NAME STATUS
@@ -72,6 +75,7 @@ docker compose exec backend env | grep COLLECTIONDB
```

**Solution**:

```bash
# Copy example .env
cp .env.example .env
@@ -121,6 +125,7 @@ docker compose logs backend | grep -i error
```

**Solution**:

```bash
# Check PYTHONPATH in Dockerfile
cat backend/Dockerfile.backend | grep PYTHONPATH
@@ -135,6 +140,7 @@ lsof -i :8000 || netstat -tuln | grep 8000
### Issue 2: Container Health Check Failures

**Symptoms**:

```bash
$ docker compose ps
NAME STATUS
@@ -171,6 +177,7 @@ docker compose logs backend | tail -50
```

**Solution**:

```bash
# Restart backend
docker compose restart backend
@@ -193,6 +200,7 @@ backend:
### Issue 3: Container Restarts Continuously

**Symptoms**:

```bash
$ docker compose ps
NAME STATUS
@@ -228,6 +236,7 @@ journalctl -u docker | grep oom
```

**Solution**:

```yaml
# Increase memory limit
# File: docker-compose.yml
@@ -257,6 +266,7 @@ backend:
### Issue 1: Cannot Connect to Database

**Symptoms**:

```python
sqlalchemy.exc.OperationalError: could not connect to server: Connection refused
```
@@ -316,6 +326,7 @@ netstat -tuln | grep 5432
### Issue 2: Cannot Access Backend from Host

**Symptoms**:

```bash
$ curl http://localhost:8000/api/health
curl: (7) Failed to connect to localhost port 8000: Connection refused
@@ -373,6 +384,7 @@ curl http://localhost:8000/api/health
### Issue 3: Container Cannot Reach External APIs

**Symptoms**:

```python
httpx.ConnectError: All connection attempts failed
# When calling WatsonX/OpenAI APIs
@@ -434,6 +446,7 @@ sudo iptables -I DOCKER-USER -p tcp --dport 443 -j ACCEPT
### Issue 1: Volume Mount Errors

**Symptoms**:

```bash
Error response from daemon: invalid mount config for type "bind": bind source path does not exist
```
@@ -483,6 +496,7 @@ services:
### Issue 2: Permission Denied Errors

**Symptoms**:

```bash
postgres_1 | FATAL: data directory "/var/lib/postgresql/data" has wrong ownership
backend_1 | PermissionError: [Errno 13] Permission denied: '/app/logs/rag_modulo.log'
@@ -534,6 +548,7 @@ services:
### Issue 3: Disk Space Exhausted

**Symptoms**:

```bash
Error: No space left on device
```
@@ -601,43 +616,68 @@ docker compose up -d

## Image Build Problems

### Issue 1: Build Fails with CACHE_BUST
### Issue 1: Build Fails with BACKEND_CACHE_BUST

**Symptoms**:

```bash
ERROR: failed to solve: failed to compute cache key:
# Or cache not invalidating when backend files change
```

**Diagnosis**:

```bash
# Check Dockerfile
cat backend/Dockerfile.backend | grep CACHE_BUST
cat backend/Dockerfile.backend | grep BACKEND_CACHE_BUST

# Try build with no cache
docker build --no-cache -f backend/Dockerfile.backend -t test-build .
```

**Solutions**:

**A) Update CACHE_BUST Value**:
**A) Local Builds** (uses default value):

```bash
# Local builds use default value 'local-build' automatically
docker build -f backend/Dockerfile.backend -t rag-modulo-backend:latest .
make build-backend # Also works - uses default value
```

**B) Force Cache Invalidation**:

```dockerfile
# File: backend/Dockerfile.backend
ARG CACHE_BUST=20251028 # Change date to force rebuild
RUN echo "Cache bust: $CACHE_BUST"
```bash
# Override with a new value to force cache invalidation
docker build --build-arg BACKEND_CACHE_BUST=$(date +%s) \
-f backend/Dockerfile.backend -t rag-modulo-backend:latest .
```

**B) Build with --pull**:
**C) CI/CD Builds** (content-based invalidation):

```yaml
# In GitHub Actions workflows, BACKEND_CACHE_BUST is set automatically
# based on content hash of backend files:
BACKEND_CACHE_BUST=${{ hashFiles('backend/**/*.py', 'backend/Dockerfile.backend', 'pyproject.toml', 'poetry.lock') }}
```

**D) Build with --pull**:

```bash
# Pull latest base image
docker build --pull -f backend/Dockerfile.backend -t rag-modulo-backend:latest .
```

**Understanding Cache Invalidation Strategy**:

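- **Local builds**: Use default `BACKEND_CACHE_BUST=local-build`, so the cache-bust layer is rebuilt only on manual invalidation (`--no-cache` or a new `--build-arg` value); `COPY` layers still rebuild whenever the copied files change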
- **CI builds**: Use content hash - cache invalidates automatically when backend Python files, Dockerfile, or dependency files change
- **Cache benefits**: Docker layer cache is preserved when backend files are unchanged, significantly speeding up builds
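
One way to check this behaviour on a given machine is to count how many build steps BuildKit reports as reused. The following is a minimal sketch, assuming BuildKit's `--progress=plain` output, in which a reused step is printed as a line ending in `CACHED`; `count_cached_steps` is a hypothetical helper name, not something defined in the repository.

```bash
# Hypothetical helper: counts build steps that BuildKit reports as cache hits.
# Assumption: input is `docker build --progress=plain` output, where a reused
# step appears as a line such as "#7 CACHED".
count_cached_steps() {
  # grep -c prints 0 but exits non-zero when nothing matches; keep the 0.
  grep -c ' CACHED$' || true
}

# Usage (run the same build twice and compare the two counts):
#   docker build --progress=plain -f backend/Dockerfile.backend \
#     -t rag-modulo-backend:latest . 2>&1 | count_cached_steps
```

If the count drops after passing a new `--build-arg BACKEND_CACHE_BUST=...` value, the layers after the `RUN echo` step were rebuilt as intended.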

### Issue 2: Poetry Lock File Issues

**Symptoms**:

```bash
ERROR: poetry.lock does not exist or is out of sync with pyproject.toml
```
@@ -669,6 +709,7 @@ docker build --build-arg SKIP_LOCK_CHECK=1 -f backend/Dockerfile.backend .
### Issue 3: Build Timeouts

**Symptoms**:

```bash
ERROR: failed to solve: DeadlineExceeded
```
@@ -692,6 +733,7 @@ COMPOSE_HTTP_TIMEOUT=600 docker compose build backend
### Issue 1: Backend OOM (Out of Memory)

**Symptoms**:

```bash
docker compose ps
rag-modulo-backend-1 Restarting (137) 1 minute ago
@@ -748,6 +790,7 @@ WEB_CONCURRENCY=2 # Default is 4
### Issue 2: CPU Throttling

**Symptoms**:

```bash
# Slow response times
# High CPU usage: docker stats shows 100% CPU
@@ -803,6 +846,7 @@ nginx:
### Issue 1: Services Start Out of Order

**Symptoms**:

```bash
backend_1 | sqlalchemy.exc.OperationalError: could not connect to server
# Backend starts before PostgreSQL is ready
@@ -825,6 +869,7 @@ backend:
### Issue 2: Circular Dependency

**Symptoms**:

```bash
Error: Circular dependency between services:
service1 depends on service2
@@ -850,6 +895,7 @@ def connect_to_database():
### Issue 1: Docker Compose V1 vs V2

**Symptoms**:

```bash
docker-compose: command not found
# Or
@@ -876,6 +922,7 @@ DOCKER_COMPOSE := docker compose
### Issue 2: Multiple Compose Files

**Symptoms**:

```bash
# Confusion about which services are running
# Different configurations in different files
@@ -902,6 +949,7 @@ docker compose -f docker-compose.yml -f docker-compose.dev.yml config
### Issue 3: Environment Variable Conflicts

**Symptoms**:

```bash
# Different values in .env vs docker-compose.yml
# Variables not being picked up