2 changes: 1 addition & 1 deletion AGENTS.md
@@ -56,7 +56,7 @@ The Cluster Logging Operator (CLO) is a Kubernetes operator for configuring log
### Data Flow Architecture

Logs flow through the collector in this pattern:
```text
collect from source → move to ._internal → transform → apply output datamodel → apply sink changes → send
```

357 changes: 357 additions & 0 deletions ARCHITECTURE.md
@@ -0,0 +1,357 @@
# Cluster Logging Operator - Architecture

This document describes the internal architecture of the Cluster Logging Operator (CLO), including design decisions, key components, and important tradeoffs.

## Overview

The Cluster Logging Operator manages log collection and forwarding in OpenShift clusters using a declarative, Kubernetes-native approach. Users define what logs to collect and where to send them via the ClusterLogForwarder CRD, and the operator translates this into a working deployment of Vector collectors.
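For example, a minimal ClusterLogForwarder forwarding application logs to Loki might look like the sketch below (names and the URL are placeholders; consult the v1 API reference for the full set of required fields):

```yaml
apiVersion: observability.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: collector
  namespace: openshift-logging
spec:
  serviceAccount:
    name: logcollector          # must have permission to collect the chosen log types
  outputs:
  - name: my-loki
    type: loki
    loki:
      url: https://loki.example.com   # placeholder endpoint
  pipelines:
  - name: app-logs
    inputRefs: [application]    # built-in input
    outputRefs: [my-loki]
```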

### Core Design Principles

1. **Declarative Configuration**: Users specify desired state via CRDs; the operator ensures the cluster matches that state
2. **Separation of Concerns**: Collector logic is separated from forwarding logic through Vector's transform and sink model
3. **Multi-tenancy**: Infrastructure, application, and audit logs are isolated to support per-tenant controls
4. **Struct-Based Generation**: Complex configurations are generated using Go structs to ensure consistency and maintainability

## Architecture Diagram

```text
┌─────────────────────────────────────────────────────────────────────┐
│ User (via oc) │
└────────────────────────────┬────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ ClusterLogForwarder CRD (User Intent) │
│ ├─ Inputs (what logs to collect) │
│ ├─ Outputs (where to send logs) │
│ ├─ Pipelines (how to route logs) │
│ └─ Filters (transformations to apply) │
└────────────────────────────┬────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ Cluster Logging Operator Controller │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ 1. Validate CRD │ │
│ │ 2. Generate Vector Config │ │
│ │ 3. Create/Update Secrets & ConfigMaps │ │
│ │ 4. Deploy Collector DaemonSets/Deployments │ │
│ └──────────────────────────────────────────────────────────────┘ │
└────────────────────────────┬────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ Vector Collectors (on each node) │
│ ┌────────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Sources │ │ Transforms │ │ Sinks │ │
│ │ - journald │→ │ - normalize │→ │ - Loki │ │
│ │ - containers │ │ - filter │ │ - Splunk │ │
│ │ - audit │ │ - enrich │ │ - CloudWatch │ │
│ └────────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
```

## Core Components

### 1. Controllers (`internal/controller/`)

**ClusterLogForwarder Controller** (`observability/clusterlogforwarder_controller.go`)
- Watches ClusterLogForwarder resources for changes
- Validates CRD specifications
- Manages Secret and ConfigMap creation for collector credentials
- Orchestrates Vector deployment (DaemonSets/Deployments)
- Handles reconciliation and error reporting

**LogFileMetricsExporter Controller** (`logfilemetricsexporter/logfilemetricsexporter_controller.go`)
- Manages LogFileMetricsExporter resources
- Exposes metrics about log file processing
- Provides observability into the logging system

### 2. API Definitions (`api/observability/v1/`)

**ClusterLogForwarder Types** (`clusterlogforwarder_types.go`)
- Main CRD definition for user-facing configuration
- Defines the structure for inputs, outputs, pipelines, and filters
- Includes validation rules

**Output Types** (`output_types.go`)
- Specifications for each supported output type
- Examples: Kafka, Splunk, CloudWatch, Google Cloud Logging, Elasticsearch, Azure, etc.
- Handles output-specific configuration parameters

### 3. Configuration Generator (`internal/generator/`)

The configuration generator translates ClusterLogForwarder specs into Vector configurations.

**Key Design**: Struct-based generation for consistency and maintainability

```text
ClusterLogForwarder Spec
┌─────────────────────────┐
│ Generator Module │
│ ├─ inputs.go │
│ ├─ outputs.go │
│ ├─ filters.go │
│ └─ pipelines.go │
└──────────┬──────────────┘
Vector Configuration (TOML)
```

**Output Structure** (`internal/generator/vector/output/`)

Each output type is a separate package:
- `aws/` - AWS CloudWatch
- `azurelogsingestion/` - Azure Logs Ingestion
- `azuremonitor/` - Azure Monitor
- `elasticsearch/` - Elasticsearch
- `gcl/` - Google Cloud Logging
- `http/` - Generic HTTP
- `kafka/` - Kafka
- `loki/` - Grafana Loki
- `lokistack/` - LokiStack (managed)
- `otlp/` - OpenTelemetry Protocol
- `splunk/` - Splunk
- `syslog/` - Syslog

Each output module:
1. Defines Go struct types for configuration
2. Assembles the transforms and sink configuration to represent an output
3. Handles output-specific validation and transformations

### 4. Data Flow Architecture

Logs flow through the Vector collector following this pattern:

```text
Sources → internal preprocessing → global transforms → output-specific transforms → sinks
```
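Concretely, a heavily abbreviated Vector TOML fragment following this chain might look like the sketch below. Component names are invented, and a real generated config sets many more options per component:

```toml
[sources.raw_containers]
type = "kubernetes_logs"

[transforms.normalize]
type = "remap"
inputs = ["raw_containers"]
source = '''
.log_type = "application"
'''

[sinks.default_loki]
type = "loki"
inputs = ["normalize"]
endpoint = "https://loki.example.com"   # placeholder
encoding.codec = "json"
labels.log_type = "{{ log_type }}"
```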

**Minimal Required Attributes**:
- `log_type` - classification (application, infrastructure, audit)
- `log_source` - specific source (e.g., node, container, kubeAPI)
- `timestamp` - when the log was generated
- `message` - log content
- `level` - log severity
- `kubernetes_metadata` - pod, namespace, labels, etc.
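As an illustration, a normalized application log record carrying these attributes might look roughly like the following (field names and nesting are illustrative; the actual data model differs in detail):

```json
{
  "log_type": "application",
  "log_source": "container",
  "timestamp": "2024-01-01T12:00:00Z",
  "message": "request served",
  "level": "info",
  "kubernetes": {
    "namespace_name": "demo",
    "pod_name": "web-1"
  }
}
```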

### 5. Key Directories

- `api/` - API/CRD definitions
- `cmd/` - Operator entry point
- `config/` - Kubernetes manifests (RBAC, CRDs, ServiceAccount)
- `internal/controller/` - Reconciliation logic
- `internal/generator/` - Configuration generation system
- `internal/validations/` - Input validation logic
- `test/functional/` - Functional tests for outputs
- `test/e2e/` - End-to-end integration tests
- `hack/` - Build scripts and utilities
- `docs/` - Documentation

## Design Decisions

### 1. Vector as the Collector

**Decision**: Use Vector for log collection and forwarding

**Rationale**:
- High-performance, memory-efficient collector
- Extensive output support (100+ destinations)
- Strong community and maintenance
- ViaQ data model compatibility

**Tradeoff**: Operator must stay updated with Vector releases and API changes

### 2. Struct-Based Configuration Generation

**Decision**: Generate Vector configs using Go structs rather than templates

**Rationale**:
- Type-safe configuration construction
- Maintains consistency across different output types
- Easier to review, test, and extend
- Simpler to add new outputs

**Tradeoff**: Requires keeping structs in sync with Vector's configuration schema

### 3. Separate Input, Output, and Transform Definitions

**Decision**: Keep inputs, outputs, and filters as distinct CRD fields with separate pipeline specification

**Rationale**:
- Allows flexible routing of logs from multiple inputs to multiple outputs
- Simplifies validation of each component independently
- Enables reuse of filter definitions

**Tradeoff**: More verbose CRD structure; requires careful validation to ensure pipeline validity
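The pipeline-validity check this tradeoff calls for can be sketched as below. This is a hypothetical, simplified version; the operator's real validation lives in `internal/validations/` and covers much more:

```go
package main

import "fmt"

// validatePipeline checks that every outputRef in a pipeline names a
// declared output. Simplified for illustration only.
func validatePipeline(declaredOutputs map[string]bool, outputRefs []string) error {
	for _, ref := range outputRefs {
		if !declaredOutputs[ref] {
			return fmt.Errorf("pipeline references undefined output %q", ref)
		}
	}
	return nil
}

func main() {
	declared := map[string]bool{"my-loki": true}
	fmt.Println(validatePipeline(declared, []string{"my-loki"}))  // ok: <nil>
	fmt.Println(validatePipeline(declared, []string{"missing"})) // error
}
```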

### 4. DaemonSet-Based Collection

**Decision**: Deploy Vector as a DaemonSet on each node for node-level logs

**Rationale**:
- DaemonSets ensure collection from every node
- Reduces network hops for node-local logs
- A Deployment is used instead when the collector acts as an audit log receiver and does not need to collect logs from the node it runs on

**Tradeoff**: Multiple collection processes may have higher resource overhead than a single centralized collector

## Important Tradeoffs Still in Effect

### 1. Complexity vs. Flexibility

**Trade**: Supporting complex log routing (multiple inputs → transformations → multiple outputs) increases operator complexity.

**Mitigation**: Validate pipelines thoroughly; provide clear error messages.

### 2. Performance vs. Compatibility

**Trade**: Supporting older OpenShift versions requires maintaining compatibility with older Kubernetes APIs.

**Mitigation**: Use operator-sdk abstractions; test against supported versions.

### 3. Security vs. Usability

**Trade**: Storing secrets securely requires additional complexity (Secret management, RBAC validation).

**Mitigation**: Use Kubernetes Secrets; validate RBAC permissions; document security best practices.

### 4. Configuration Customization vs. Maintainability

**Trade**: Allowing arbitrary Vector configuration increases flexibility but risks unsupported or broken configurations.

**Mitigation**: Provide a well-defined CRD with validation; disallow direct Vector config modification.

## Adding New Features

### Adding a New Output Type

See [docs/contributing/how-to-add-new-output.md](docs/contributing/how-to-add-new-output.md) for detailed steps.

Basic approach:
1. Add output type definition to `api/observability/v1/output_types.go`
2. Add struct definitions to `internal/api/observability/` (Vector API structs)
3. Create generator package in `internal/generator/vector/output/[type]/`
4. Register in `internal/generator/vector/outputs.go`
5. Add functional tests in `test/functional/outputs/`

### Adding a New Input Type

Similar process to outputs:
1. Update `api/observability/v1/clusterlogforwarder_types.go` with new input type
2. Create input generator in `internal/generator/vector/input/[type]/`
3. Add validation logic to controller
4. Test with functional tests

### Adding a New Filter

Filters are transformations applied to logs:
1. Add filter definition to `api/observability/v1/clusterlogforwarder_types.go`
2. Implement in `internal/generator/vector/filter/`
3. Update pipeline logic to apply filters correctly
4. Add tests

## Testing Strategy

### Unit Tests

Test individual components in isolation:
- Configuration generation correctness
- Configuration validation
- Data transformations

Location: Throughout codebase with `*_test.go` files

### Functional Tests

Test output connector integration:
- Actual log forwarding to supported destinations
- Connection handling and credential management
- Data format compliance

Location: `test/functional/outputs/`

### E2E Tests

Test full operator functionality:
- CRD application and reconciliation
- Collector deployment and configuration
- End-to-end log flow

Location: `test/e2e/`

## Dependency Points

### External Dependencies

1. **Vector** - The actual log collector (versioned in Dockerfile)
2. **operator-sdk** - Kubernetes operator framework
3. **controller-runtime** - Kubernetes controller libraries
4. **client-go** - Kubernetes client library
5. **loki-operator** - For managed LokiStack integration (optional)

### Internal Dependencies

- `api/` packages imported by `internal/controller/`
- `internal/generator/` imported by controllers for config generation
- Controllers depend on Kubernetes primitives via client-go

## Version Compatibility

- **Go**: See `go.mod` for the exact version
- **Vector**: See Dockerfile for the exact version

## Monitoring and Observability

### Metrics Exposed

- Reconciliation timing and success rates
- Vector deployment status
- Configuration generation statistics
- Resource usage (via Vector itself)

### Logs

- Controller reconciliation logs
- Configuration generation debug information
- Error reporting on CRD validation failures

### Status Conditions

ClusterLogForwarder Status includes:
- Validation status (Valid/Invalid)
- Deployment status (Deployed/Failing)
- Collector readiness status
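In YAML, a healthy status might resemble the fragment below (the condition type and reason names here are illustrative, not an exhaustive or exact list):

```yaml
status:
  conditions:
  - type: Valid            # hypothetical condition name
    status: "True"
    reason: ValidationSuccess
    message: pipelines validated
```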

## Build Artifacts

### Dockerfiles

**Dockerfile** - Development and CI image
- Standard build used for local development and CI/CD testing
- Compiles Go binary inside container for cross-platform compatibility
- Used by: `make image`, `make deploy` (for development/testing)

**Dockerfile.art** - Red Hat production image
- Official production image distributed by Red Hat
- Builds using Red Hat's internal build system
- Enables strict FIPS mode for security and compliance requirements
- Required for Red Hat official builds and certified releases

**Dockerfile.macos-dev** - macOS development image
- Development convenience for Apple Silicon Macs
- Avoids QEMU amd64 emulation (which segfaults on ARM machines)
- Expects pre-built binary at `bin/cluster-logging-operator`
- Usage: cross-compile on the host with `GOOS=linux GOARCH=amd64 make build`, then run `docker build -f Dockerfile.macos-dev .`

## Security Considerations

1. **Secret Management**: Credentials stored in Kubernetes Secrets, not in configs
2. **RBAC**: Operator validates permissions before accessing external services
3. **Network**: TLS/mTLS configuration for secure communication with outputs
4. **Audit**: Audit log collection with special handling for sensitive data
