diff --git a/AGENTS.md b/AGENTS.md
index 6758b6bd9..4481cda2e 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -56,7 +56,7 @@ The Cluster Logging Operator (CLO) is a Kubernetes operator for configuring log
 ### Data Flow Architecture
 
 Logs flow through the collector in this pattern:
-```
+```text
 collect from source → move to ._internal → transform → apply output datamodel → apply sink changes → send
 ```
diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
new file mode 100644
index 000000000..9139b9e2b
--- /dev/null
+++ b/ARCHITECTURE.md
@@ -0,0 +1,357 @@
# Cluster Logging Operator - Architecture

This document describes the internal architecture of the Cluster Logging Operator (CLO), including design decisions, key components, and important tradeoffs.

## Overview

The Cluster Logging Operator manages log collection and forwarding in OpenShift clusters using a declarative, Kubernetes-native approach. Users define what logs to collect and where to send them via the ClusterLogForwarder CRD, and the operator translates this into a working deployment of Vector collectors.

### Core Design Principles

1. **Declarative Configuration**: Users specify desired state via CRDs; the operator ensures the cluster matches that state
2. **Separation of Concerns**: Collector logic is separated from forwarding logic through Vector's transform and sink model
3. **Multi-tenancy**: Infrastructure, application, and audit logs are isolated to support per-tenant controls
4. **Struct-Based Generation**: Complex configurations are generated using Go structs to ensure consistency and maintainability

## Architecture Diagram

```text
┌─────────────────────────────────────────────────────────────────────┐
│                           User (via oc)                              │
└────────────────────────────┬────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│              ClusterLogForwarder CRD (User Intent)                   │
│  ├─ Inputs (what logs to collect)                                    │
│  ├─ Outputs (where to send logs)                                     │
│  ├─ Pipelines (how to route logs)                                    │
│  └─ Filters (transformations to apply)                               │
└────────────────────────────┬────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│              Cluster Logging Operator Controller                     │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │ 1. Validate CRD                                              │   │
│  │ 2. Generate Vector Config                                    │   │
│  │ 3. Create/Update Secrets & ConfigMaps                        │   │
│  │ 4. Deploy Collector DaemonSets/Deployments                   │   │
│  └──────────────────────────────────────────────────────────────┘   │
└────────────────────────────┬────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│                  Vector Collectors (on each node)                    │
│  ┌────────────────┐  ┌──────────────┐  ┌──────────────┐             │
│  │   Sources      │  │  Transforms  │  │    Sinks     │             │
│  │ - journald     │→ │ - normalize  │→ │ - Loki       │             │
│  │ - containers   │  │ - filter     │  │ - Splunk     │             │
│  │ - audit        │  │ - enrich     │  │ - CloudWatch │             │
│  └────────────────┘  └──────────────┘  └──────────────┘             │
└─────────────────────────────────────────────────────────────────────┘
```
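To make the flow above concrete, here is a sketch of a forwarder spec modeled as plain Go structs. Every type and field name in this sketch is an illustrative simplification, not the operator's API; the authoritative definitions live in `api/observability/v1/clusterlogforwarder_types.go`.

```go
// Illustrative only: a simplified model of the ClusterLogForwarder spec.
// The real types live in api/observability/v1/clusterlogforwarder_types.go.
package main

import "fmt"

// InputSpec names a source of logs (hypothetical simplification).
type InputSpec struct {
	Name string // e.g. "application", "infrastructure", "audit"
}

// OutputSpec names a destination and its type (hypothetical simplification).
type OutputSpec struct {
	Name string
	Type string // e.g. "loki", "splunk", "cloudwatch"
	URL  string
}

// PipelineSpec routes inputs to outputs, optionally through filters.
type PipelineSpec struct {
	Name       string
	InputRefs  []string
	OutputRefs []string
	FilterRefs []string
}

// ForwarderSpec ties the pieces together, mirroring the CRD's overall shape.
type ForwarderSpec struct {
	Inputs    []InputSpec
	Outputs   []OutputSpec
	Pipelines []PipelineSpec
}

func main() {
	spec := ForwarderSpec{
		Inputs:  []InputSpec{{Name: "application"}},
		Outputs: []OutputSpec{{Name: "central-loki", Type: "loki", URL: "https://loki.example.com"}},
		Pipelines: []PipelineSpec{{
			Name:       "app-to-loki",
			InputRefs:  []string{"application"},
			OutputRefs: []string{"central-loki"},
		}},
	}
	fmt.Printf("%+v\n", spec)
}
```

A spec like this names one input, one output, and a pipeline connecting them; that routing intent is exactly what the controller reconciles into a running collector.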
## Core Components

### 1. Controllers (`internal/controller/`)

**ClusterLogForwarder Controller** (`observability/clusterlogforwarder_controller.go`)
- Watches ClusterLogForwarder resources for changes
- Validates CRD specifications
- Manages Secret and ConfigMap creation for collector credentials
- Orchestrates Vector deployment (DaemonSets/Deployments)
- Handles reconciliation and error reporting

**LogFileMetricsExporter Controller** (`logfilemetricsexporter/logfilemetricsexporter_controller.go`)
- Manages LogFileMetricsExporter resources
- Exposes metrics about log file processing
- Provides observability into the logging system

### 2. API Definitions (`api/observability/v1/`)

**ClusterLogForwarder Types** (`clusterlogforwarder_types.go`)
- Main CRD definition for user-facing configuration
- Defines the structure for inputs, outputs, pipelines, and filters
- Includes validation rules

**Output Types** (`output_types.go`)
- Specifications for each supported output type
- Examples: Kafka, Splunk, CloudWatch, Google Cloud Logging, Elasticsearch, Azure, etc.
- Handles output-specific configuration parameters

### 3. Configuration Generator (`internal/generator/`)

The configuration generator translates ClusterLogForwarder specs into Vector configurations.

**Key Design**: Struct-based generation for consistency and maintainability

```text
    ClusterLogForwarder Spec
           │
           ▼
    ┌─────────────────────────┐
    │   Generator Module      │
    │   ├─ inputs.go          │
    │   ├─ outputs.go         │
    │   ├─ filters.go         │
    │   └─ pipelines.go       │
    └──────────┬──────────────┘
               │
               ▼
    Vector Configuration (TOML)
```

**Output Structure** (`internal/generator/vector/output/`)

Each output type is a separate package:
- `aws/` - AWS CloudWatch
- `azurelogsingestion/` - Azure Logs Ingestion
- `azuremonitor/` - Azure Monitor
- `elasticsearch/` - Elasticsearch
- `gcl/` - Google Cloud Logging
- `http/` - Generic HTTP
- `kafka/` - Kafka
- `loki/` - Grafana Loki
- `lokistack/` - LokiStack (managed)
- `otlp/` - OpenTelemetry Protocol
- `splunk/` - Splunk
- `syslog/` - Syslog

Each output module:
1. Defines Go struct types for configuration
2. Assembles the transforms and sink configuration that represent an output
3. Handles output-specific validation and transformations

### 4. Data Flow Architecture

Logs flow through the Vector collector following this pattern:

```text
Sources → internal preprocessing → global transforms → output-specific transforms → sinks
```

**Minimal Required Attributes** (modeled in the sketch that follows):
- `log_type` - classification (application, infrastructure, audit)
- `log_source` - specific source (e.g., node, container, kubeAPI)
- `timestamp` - when the log was generated
- `message` - log content
- `level` - log severity
- `kubernetes_metadata` - pod, namespace, labels, etc.
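As a rough model of that minimal schema, the sketch below shows what a normalized record might look like if expressed as Go types. These types are assumptions for illustration only; the actual schema is produced by Vector transforms at runtime, not defined by Go types in the operator.

```go
// Illustrative sketch of the minimal normalized log record described above.
// The real schema is produced by Vector transforms, not by these Go types.
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// KubernetesMetadata captures the pod context attached to container logs.
type KubernetesMetadata struct {
	PodName   string            `json:"pod_name"`
	Namespace string            `json:"namespace_name"`
	Labels    map[string]string `json:"labels,omitempty"`
}

// LogRecord models the minimal required attributes listed above.
type LogRecord struct {
	LogType    string              `json:"log_type"`   // application | infrastructure | audit
	LogSource  string              `json:"log_source"` // e.g. node, container, kubeAPI
	Timestamp  time.Time           `json:"timestamp"`
	Message    string              `json:"message"`
	Level      string              `json:"level"`
	Kubernetes *KubernetesMetadata `json:"kubernetes,omitempty"`
}

func main() {
	rec := LogRecord{
		LogType:   "application",
		LogSource: "container",
		Timestamp: time.Now().UTC(),
		Message:   "request handled",
		Level:     "info",
		Kubernetes: &KubernetesMetadata{
			PodName:   "web-0",
			Namespace: "shop",
			Labels:    map[string]string{"app": "web"},
		},
	}
	out, _ := json.MarshalIndent(rec, "", "  ")
	fmt.Println(string(out))
}
```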
### 5. Key Directories

- `api/` - API/CRD definitions
- `cmd/` - Operator entry point
- `config/` - Kubernetes manifests (RBAC, CRDs, ServiceAccount)
- `internal/controller/` - Reconciliation logic
- `internal/generator/` - Configuration generation system
- `internal/validations/` - Input validation logic
- `test/functional/` - Functional tests for outputs
- `test/e2e/` - End-to-end integration tests
- `hack/` - Build scripts and utilities
- `docs/` - Documentation

## Design Decisions

### 1. Vector as the Collector

**Decision**: Use Vector for log collection and forwarding

**Rationale**:
- High-performance, memory-efficient collector
- Extensive output support (100+ destinations)
- Strong community and maintenance
- ViaQ data model compatibility

**Tradeoff**: The operator must stay current with Vector releases and API changes

### 2. Struct-Based Configuration Generation

**Decision**: Generate Vector configs using Go structs rather than templates

**Rationale**:
- Type-safe configuration construction
- Maintains consistency across different output types
- Easier to review, test, and extend
- Simpler to add new outputs

**Tradeoff**: Requires keeping structs in sync with Vector's configuration schema

### 3. Separate Input, Output, and Transform Definitions

**Decision**: Keep inputs, outputs, and filters as distinct CRD fields with separate pipeline specification

**Rationale**:
- Allows flexible routing of logs from multiple inputs to multiple outputs
- Simplifies validation of each component independently
- Enables reuse of filter definitions

**Tradeoff**: More verbose CRD structure; requires careful validation to ensure pipeline validity

### 4. DaemonSet-Based Collection

**Decision**: Deploy Vector as a DaemonSet on each node for node-level logs

**Rationale**:
- DaemonSets ensure collection from every node
- Reduces network hops for node-local logs
- Deployments are used instead when the collector acts as a receiver (for example, of audit logs) and does not need to collect from the node itself

**Tradeoff**: Running a collector process per node may carry higher resource overhead than a single centralized collector

## Important Tradeoffs Still in Effect

### 1. Complexity vs. Flexibility

**Tradeoff**: Supporting complex log routing (multiple inputs → transformations → multiple outputs) increases operator complexity.

**Mitigation**: Validate pipelines thoroughly; provide clear error messages.

### 2. Performance vs. Compatibility

**Tradeoff**: Supporting older OpenShift versions requires maintaining compatibility with older Kubernetes APIs.

**Mitigation**: Use operator-sdk abstractions; test against supported versions.

### 3. Security vs. Usability

**Tradeoff**: Storing secrets securely requires additional complexity (Secret management, RBAC validation).

**Mitigation**: Use Kubernetes Secrets; validate RBAC permissions; document security best practices.

### 4. Configuration Customization vs. Maintainability

**Tradeoff**: Allowing arbitrary Vector configuration increases flexibility but risks unsupported or broken configurations.

**Mitigation**: Provide a well-defined CRD with validation; disallow direct Vector config modification.

## Adding New Features

### Adding a New Output Type

See [docs/contributing/how-to-add-new-output.md](docs/contributing/how-to-add-new-output.md) for detailed steps.

Basic approach:
1. Add the output type definition to `api/observability/v1/output_types.go`
2. Add struct definitions to `internal/api/observability/` (Vector API structs)
3. Create a generator package in `internal/generator/vector/output/[type]/`
4. Register it in `internal/generator/vector/outputs.go`
5. Add functional tests in `test/functional/outputs/`

### Adding a New Input Type

Similar process to outputs:
1. Update `api/observability/v1/clusterlogforwarder_types.go` with the new input type
2. Create an input generator in `internal/generator/vector/input/[type]/`
3. Add validation logic to the controller
4. Test with functional tests

### Adding a New Filter

Filters are transformations applied to logs:
1. Add the filter definition to `api/observability/v1/clusterlogforwarder_types.go`
2. Implement it in `internal/generator/vector/filter/` (see the sketch below)
3. Update the pipeline logic to apply filters correctly
4. Add tests
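To illustrate step 2, here is a minimal sketch of a filter generator that emits a Vector `remap` transform wrapping a small VRL program. The function name and the exact TOML shape are assumptions for illustration, not the project's actual generator API.

```go
// Hypothetical sketch of a filter generator: renders a Vector "remap"
// transform that drops debug-level records. Names and the emitted TOML
// are illustrative, not the operator's real generator API.
package main

import (
	"fmt"
	"strings"
)

// DropDebugFilter renders the transform with the given component ID.
// inputs are the upstream component IDs feeding this transform.
func DropDebugFilter(id string, inputs []string) string {
	quoted := make([]string, len(inputs))
	for i, in := range inputs {
		quoted[i] = fmt.Sprintf("%q", in)
	}
	// VRL source: abort (and therefore drop) any debug-level event.
	const vrl = `if .level == "debug" { abort }`
	return fmt.Sprintf(`[transforms.%s]
type = "remap"
inputs = [%s]
drop_on_abort = true
source = '''
%s
'''
`, id, strings.Join(quoted, ", "), vrl)
}

func main() {
	fmt.Print(DropDebugFilter("drop_debug", []string{"pipeline_app_viaq"}))
}
```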
## Testing Strategy

### Unit Tests

Test individual components in isolation:
- Configuration generation correctness
- Configuration validation
- Data transformations

Location: throughout the codebase in `*_test.go` files

### Functional Tests

Test output connector integration:
- Actual log forwarding to supported destinations
- Connection handling and credential management
- Data format compliance

Location: `test/functional/outputs/`

### E2E Tests

Test full operator functionality:
- CRD application and reconciliation
- Collector deployment and configuration
- End-to-end log flow

Location: `test/e2e/`

## Dependency Points

### External Dependencies

1. **Vector** - The actual log collector (versioned in the Dockerfile)
2. **operator-sdk** - Kubernetes operator framework
3. **controller-runtime** - Kubernetes controller libraries
4. **client-go** - Kubernetes client library
5. **loki-operator** - For managed LokiStack integration (optional)

### Internal Dependencies

- `api/` packages are imported by `internal/controller/`
- `internal/generator/` is imported by controllers for config generation
- Controllers depend on Kubernetes primitives via client-go

## Version Compatibility

- **Go**: See `go.mod` for the exact version
- **Vector**: See the Dockerfile for the exact version

## Monitoring and Observability

### Metrics Exposed

- Reconciliation timing and success rates
- Vector deployment status
- Configuration generation statistics
- Resource usage (via Vector itself)

### Logs

- Controller reconciliation logs
- Configuration generation debug information
- Error reporting on CRD validation failures

### Status Conditions

The ClusterLogForwarder status includes (see the sketch below for how a condition is set):
- Validation status (Valid/Invalid)
- Deployment status (Deployed/Failing)
- Collector readiness status
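These conditions follow the standard Kubernetes condition conventions. Below is a minimal sketch of how a controller might set the validation condition using the apimachinery helpers; the condition type and reason strings are illustrative, since the operator defines its own constants.

```go
// Sketch of surfacing a "Valid" condition via standard apimachinery helpers.
// The condition type and reason strings here are illustrative.
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	var conditions []metav1.Condition

	// Mark the forwarder spec as validated. SetStatusCondition updates
	// LastTransitionTime only when the status value actually changes.
	meta.SetStatusCondition(&conditions, metav1.Condition{
		Type:    "Valid", // hypothetical condition type
		Status:  metav1.ConditionTrue,
		Reason:  "ValidationSuccess",
		Message: "all pipelines reference defined inputs and outputs",
	})

	for _, c := range conditions {
		fmt.Printf("%s=%s (%s)\n", c.Type, c.Status, c.Reason)
	}
}
```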
## Build Artifacts

### Dockerfiles

**Dockerfile** - Development and CI image
- Standard build used for local development and CI/CD testing
- Compiles the Go binary inside the container for cross-platform compatibility
- Used by: `make image`, `make deploy` (for development/testing)

**Dockerfile.art** - Red Hat production image
- Official production image distributed by Red Hat
- Built using Red Hat's internal build system
- Enables strict FIPS mode for security and compliance requirements
- Required for Red Hat official builds and certified releases

**Dockerfile.macos-dev** - macOS development image
- Development convenience for Apple Silicon Macs
- Avoids QEMU amd64 emulation (which segfaults on ARM machines)
- Expects a pre-built binary at `bin/cluster-logging-operator`
- Usage: cross-compile on the host with `GOOS=linux GOARCH=amd64 make build`, then `docker build -f Dockerfile.macos-dev`

## Security Considerations

1. **Secret Management**: Credentials are stored in Kubernetes Secrets, not in configs
2. **RBAC**: The operator validates permissions before accessing external services
3. **Network**: TLS/mTLS configuration for secure communication with outputs
4. **Audit**: Audit log collection with special handling for sensitive data
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 000000000..cb71dbfe3
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,241 @@
# Contributing to Cluster Logging Operator

Thank you for your interest in contributing! This document provides guidelines for submitting changes to the Cluster Logging Operator project.

## Getting Started

1. **Fork and Clone**
   ```bash
   git clone https://github.com/yourusername/cluster-logging-operator.git
   cd cluster-logging-operator
   ```

2. **Install Dependencies**
   ```bash
   make tools
   ```

3. **Create a Branch**
   ```bash
   git checkout -b feature/your-feature-name
   ```

## Development Workflow

### Before Making Changes

- Check existing [issues](https://github.com/openshift/cluster-logging-operator/issues) and the [LOG Jira project](https://redhat.atlassian.net/browse/LOG) to avoid duplicate work
- For significant changes, open a Jira issue or GitHub issue to discuss the approach first
- Review [ARCHITECTURE.md](ARCHITECTURE.md) to understand the design

### Making Changes

1. **Write Your Code**
   - Follow Go conventions and style guidelines
   - Keep changes focused and minimal
   - Add tests for new functionality

2. **Run Tests Locally**
   ```bash
   # Unit tests
   make test-unit

   # All validation
   make check

   # Functional tests (requires cluster)
   make test-functional

   # E2E tests (requires cluster)
   make test-e2e-local
   ```

3. **Code Quality Checks**
   ```bash
   # Run pre-commit validation
   make pre-commit

   # Run linter
   make lint
   ```

## Submitting a Pull Request

### PR Requirements

1. **PR Title and Description**
   - Use clear, descriptive titles
   - Start with the Jira issue number if applicable (e.g., "LOG-1234: Add feature X")
   - Provide context on what the change does and why it's needed

2. **Code Changes**
   - Keep PRs focused on a single feature or bug fix
   - Update related documentation
   - Include tests for new functionality
   - Ensure all tests pass locally before pushing

3. **Commit Messages**
   - Use clear, descriptive commit messages
   - Reference related issues when applicable
   - Keep commits atomic and logical

### CI/CD Pipeline

The repository uses automated CI/CD:
- **Unit Tests** - Run on every PR
- **Linting** - Code style validation
- **Build Validation** - Ensures the code compiles
- **Functional Tests** - Test component integration
- **E2E Tests** - Test full operator functionality in a cluster

All checks must pass before merging. Don't be discouraged if CI catches issues; this is normal. Push fixes and the CI will re-run automatically.

### Review Process

- A maintainer will review your PR
- Address feedback and push updates as needed
- Be patient; reviews may take time due to other priorities
- For urgent changes, mention it in the PR description

## Testing Requirements

### Unit Tests

Unit tests are required for:
- New API types and validations
- Configuration generation logic
- Controller reconciliation logic

Run with:
```bash
make test-unit
```
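For orientation, here is a minimal sketch of the unit-test style using only the standard library (many suites in this project are written with Ginkgo/Gomega instead). `GenerateSinkConfig` is a hypothetical stand-in for the real generator entry points.

```go
// Sketch of a unit test for configuration generation. GenerateSinkConfig
// is a hypothetical stand-in for the real generator entry points.
package generator_test

import (
	"strings"
	"testing"
)

// GenerateSinkConfig stands in for the code under test: it renders a
// trivial TOML sink fragment for a named endpoint.
func GenerateSinkConfig(name, endpoint string) string {
	return "[sinks." + name + "]\nendpoint = \"" + endpoint + "\"\n"
}

func TestGenerateSinkConfigIncludesEndpoint(t *testing.T) {
	cfg := GenerateSinkConfig("my_loki", "https://loki.example.com")
	if !strings.Contains(cfg, `endpoint = "https://loki.example.com"`) {
		t.Errorf("generated config missing endpoint, got:\n%s", cfg)
	}
}
```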
### Functional Tests

Functional tests verify output connector integration (requires a cluster):
```bash
make test-functional
```

### E2E Tests

End-to-end tests require a Kubernetes cluster:
```bash
make test-e2e-local
```

## Coding Conventions

### Go Style

- Follow [Go Code Review Comments](https://github.com/golang/go/wiki/CodeReviewComments)
- Use `gofmt` for formatting
- Run `make lint` before committing

### API Changes

When modifying CRD types in `api/`:
1. Update the struct tags and comments
2. Run `make generate` to update generated code
3. Run `make bundle docs` to update bundle manifests and documentation
4. Add tests for validation logic
5. Update relevant documentation

### Configuration Generation

When adding new output types:
1. Add the type definition to `api/observability/v1/output_types.go`
2. Add struct definitions to `internal/api/observability/` (Vector API structs)
3. Create a generator in `internal/generator/vector/output/[type]/` (see the sketch below)
4. Add an entry point to `internal/generator/vector/outputs.go`
5. Add functional tests in `test/functional/outputs/`

See [docs/contributing/how-to-add-new-output.md](docs/contributing/how-to-add-new-output.md) for detailed examples.
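As a sketch of steps 3 and 4, the fragment below shows the general shape of a per-output generator plus a dispatch point keyed on the CRD's output type. All names here are hypothetical; the real dispatch lives in `internal/generator/vector/outputs.go`.

```go
// Hypothetical sketch: each output type gets its own config struct, and a
// dispatch point selects it by the CRD's output type. Illustrative only.
package main

import "fmt"

// Element is a renderable piece of Vector configuration.
type Element interface {
	TOML() string
}

// httpSink stands in for a per-output package's config struct.
type httpSink struct{ id, uri string }

func (s httpSink) TOML() string {
	return fmt.Sprintf("[sinks.%s]\ntype = \"http\"\nuri = %q\n", s.id, s.uri)
}

// elementForOutput dispatches on the output type from the CRD spec.
func elementForOutput(outputType, name, target string) (Element, error) {
	switch outputType {
	case "http":
		return httpSink{id: name, uri: target}, nil
	// case "kafka", "loki", ...: delegate to the matching package
	default:
		return nil, fmt.Errorf("unsupported output type %q", outputType)
	}
}

func main() {
	el, err := elementForOutput("http", "my_http", "https://collector.example.com")
	if err != nil {
		panic(err)
	}
	fmt.Print(el.TOML())
}
```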
## Adding New Output Types

Comprehensive guide: [How to Add a New Output Type](docs/contributing/how-to-add-new-output.md)

Quick summary:
1. Add API type definitions
2. Add struct definitions and implement configuration generation
3. Add functional tests to verify connectivity and log forwarding
4. Update documentation

## Documentation

### When to Update Docs

- New features should include documentation
- API changes need corresponding doc updates
- Complex logic should have implementation notes
- Architecture changes should be documented in ARCHITECTURE.md

### Documentation Locations

- **User Guides**: `/docs/administration/`
- **Developer Guides**: `/docs/contributing/`
- **Architecture**: `/docs/architecture/`
- **Feature Docs**: `/docs/features/`
- **API Reference**: `/docs/reference/`

## Project Structure

For navigation tips, see [ARCHITECTURE.md](ARCHITECTURE.md#key-directories).

## Common Issues

### Tests Failing

1. **Unit tests fail**: Check for linting errors with `make lint`
2. **Functional tests fail**: Verify cluster connectivity and required permissions
3. **E2E tests fail**: Ensure a clean cluster state with `make undeploy-all`

### Build Issues

- Run `make clean` then `make build` to start fresh
- Ensure your Go version matches what is specified in `go.mod`
- Check that all dependencies are installed: `make tools`

## Development Tools

### IDE Setup

This project works with:
- GoLand / IntelliJ IDEA with the Go plugin
- VS Code with the Go extension
- Vim/Neovim with gopls

### Debugging

Run the operator in debug mode:
```bash
make run-debug
```

This starts the operator under the `dlv` debugger.

## Reporting Issues

When reporting bugs:
1. Search existing issues first
2. Provide:
   - Steps to reproduce
   - Expected vs. actual behavior
   - OpenShift version
   - CLO version
   - Relevant logs or error messages

## Code Review Guidelines

See [docs/contributing/REVIEW.adoc](docs/contributing/REVIEW.adoc) for detailed review guidelines.

## Questions?

- Check [docs/](docs/) for detailed information
- Create an [issue](https://github.com/openshift/cluster-logging-operator/issues) with a question label

## License

By contributing, you agree that your contributions will be licensed under the Apache License 2.0. See [LICENSE](LICENSE) for details.
diff --git a/README.md b/README.md
new file mode 100644
index 000000000..f29270a39
--- /dev/null
+++ b/README.md
@@ -0,0 +1,94 @@
# Cluster Logging Operator

The Cluster Logging Operator (CLO) is a Kubernetes operator for configuring log collection and forwarding on OpenShift clusters. It manages the deployment of Vector as the log collector and supports complex log routing through inputs, outputs, filters, and pipelines defined via the ClusterLogForwarder CRD.

## What This Repository Does

This repository contains the source code and configuration for the Cluster Logging Operator, which:

- Collects logs from multiple sources (application, infrastructure, and audit logs)
- Transforms and filters logs using configurable pipelines
- Forwards logs to various destinations (Loki, Splunk, AWS CloudWatch, Google Cloud Logging, Elasticsearch, Kafka, Azure, and more)
- Manages log collection through Kubernetes Custom Resources
- Provides metrics and observability for the logging infrastructure

## Building and Running Locally

### Prerequisites

- Go (see `go.mod` for the required version)
- Podman or Docker
- A Kubernetes cluster (or a local cluster such as Kind, minikube, or CodeReady Containers)
- A kubeconfig configured for your target cluster

### Quick Start

```bash
# Install development tools
make tools

# Build the operator binary
make build

# Run the operator locally (requires kubeconfig)
make run

# Or deploy to a cluster
make deploy
```

For a full list of development commands and workflows, see [CONTRIBUTING.md](CONTRIBUTING.md).

## Directory Structure

- `api/` - API definitions and CRD schemas
- `cmd/` - Main entry point for the operator
- `config/` - Kubernetes configuration manifests
- `internal/controller/` - Reconciliation logic
- `internal/generator/` - Configuration generation system for Vector
- `test/` - All testing infrastructure (unit, functional, e2e)
- `hack/` - Development and build scripts
- `docs/` - Detailed documentation, including administration and contribution guides

## Architecture

The operator uses a multi-layered architecture:

- **Controllers**: Manage ClusterLogForwarder and LogFileMetricsExporter resources
- **Configuration Generator**: Translates CRD specs into Vector collector configurations
- **Vector Integration**: Uses Vector as the log collector/forwarder
- **Kubernetes Integration**: Deploys and manages collector components via DaemonSets and Deployments

For detailed architecture information, see [ARCHITECTURE.md](ARCHITECTURE.md).
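The controllers follow the standard controller-runtime shape. The sketch below is a simplified illustration of the reconciliation entry point, not the operator's actual implementation:

```go
// Minimal controller-runtime reconciler sketch (simplified illustration,
// not the operator's actual implementation).
package controller

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// ForwarderReconciler reconciles ClusterLogForwarder resources.
type ForwarderReconciler struct {
	client.Client
}

// Reconcile drives the cluster toward the state declared in the CRD.
func (r *ForwarderReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// 1. Fetch the ClusterLogForwarder named by req.NamespacedName.
	// 2. Validate inputs, outputs, pipelines, and filters.
	// 3. Generate the Vector configuration and supporting Secrets/ConfigMaps.
	// 4. Create or update the collector DaemonSet/Deployment.
	return ctrl.Result{}, nil
}
```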
## Documentation

- **[CONTRIBUTING.md](CONTRIBUTING.md)** - How to submit changes and development guidelines
- **[ARCHITECTURE.md](ARCHITECTURE.md)** - Design decisions, dependency points, and architecture details
- **[AGENTS.md](AGENTS.md)** - AI agent instructions and conventions
- **[docs/](docs/)** - Detailed developer documentation, including:
  - [Administration Guides](docs/administration/)
  - [Architecture Overview](docs/architecture/)
  - [Contributing Guidelines](docs/contributing/)
  - [Feature Documentation](docs/features/)
  - [API Reference](docs/reference/)

For official OpenShift Logging documentation, see the [OpenShift Container Platform documentation](https://docs.redhat.com/en/documentation/openshift_container_platform/latest/html/logging).

## Contributing

We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for information on how to:
- Submit changes via pull requests
- Follow coding conventions
- Run tests before submitting
- Participate in code reviews

## License

This repository is licensed under the Apache License 2.0. See the [LICENSE](LICENSE) file for details.

## Getting Help

- Open an issue in this repository for bugs, feature requests, or documentation problems
- Check existing issues in the [LOG Jira project](https://redhat.atlassian.net/browse/LOG)
- Visit the [OpenShift Logging documentation](https://docs.redhat.com/en/documentation/openshift_container_platform/latest/html/logging) for user-facing information