Skip to content

Latest commit

 

History

History
574 lines (428 loc) · 13.1 KB

File metadata and controls

574 lines (428 loc) · 13.1 KB

Structured Logging Guide for AgentFarm

This guide explains how to use the new structured logging system powered by structlog in AgentFarm.

Table of Contents

Why Structured Logging?

Structured logging provides several advantages over traditional text-based logging:

  • Searchable & Filterable: Every log entry is a structured dict with searchable fields
  • Rich Context: Automatically include simulation_id, step_number, agent_id, etc.
  • Machine Readable: JSON output for analysis tools and log aggregators
  • Better Debugging: Easily trace events across distributed systems
  • Performance Metrics: Built-in performance tracking and slow operation detection
  • Type Safety: Better IDE support and fewer string formatting bugs

Quick Start

1. Basic Logging

from farm.utils import get_logger

logger = get_logger(__name__)

# Simple logging
logger.info("simulation_started", num_agents=100, num_steps=1000)
logger.warning("resource_low", resource_id="res_001", level=10)
logger.error("agent_died", agent_id="agent_123", cause="starvation")

Output (development mode):

2025-10-01T12:34:56.789Z [info     ] simulation_started    num_agents=100 num_steps=1000
2025-10-01T12:34:57.123Z [warning  ] resource_low          resource_id=res_001 level=10
2025-10-01T12:34:58.456Z [error    ] agent_died            agent_id=agent_123 cause=starvation

2. Context Binding

from farm.utils import get_logger, bind_context, unbind_context

logger = get_logger(__name__)

# Bind context that persists across logs
bind_context(simulation_id="sim_001", experiment_id="exp_42")

logger.info("step_started", step=1)  # Includes simulation_id and experiment_id
logger.info("step_started", step=2)  # Still includes the context

# Unbind when done
unbind_context("simulation_id", "experiment_id")

3. Context Managers

from farm.utils import log_simulation, log_step

# Simulation context
with log_simulation(simulation_id="sim_001", num_agents=100, num_steps=1000):
    # All logs here include simulation_id
    
    for step in range(1000):
        with log_step(step_number=step):
            # All logs here include both simulation_id and step
            process_agents()

Configuration

Configure logging at application startup:

from farm.utils import configure_logging

# Development: Pretty console output with colors
configure_logging(
    environment="development",
    log_dir="logs",
    log_level="DEBUG",
    enable_colors=True,
)

# Production: JSON logs for analysis
configure_logging(
    environment="production",
    log_dir="/var/log/agentfarm",
    log_level="INFO",
    json_logs=True,
)

# Testing: Simple format, no colors
configure_logging(
    environment="testing",
    log_level="WARNING",
)

Configuration Options

Parameter Description Default
environment Environment name (development/production/testing) "development"
log_dir Directory for log files (None = console only) None
log_level Minimum log level (DEBUG/INFO/WARNING/ERROR/CRITICAL) "INFO"
json_logs Output JSON formatted logs to file False
enable_colors Enable colored console output True
include_caller_info Include file/line/function info True

Basic Usage

Event-Style Logging

Instead of traditional messages, use event names with context:

# ❌ Old way (traditional logging)
logger.info(f"Agent {agent_id} moved from {old_pos} to {new_pos}")

# ✅ New way (structured logging)
logger.info(
    "agent_moved",
    agent_id=agent_id,
    old_position=old_pos,
    new_position=new_pos,
)

Log Levels

logger.debug("detailed_operation", operation="rebuild_spatial_index", items=1000)
logger.info("simulation_started", config=config_dict)
logger.warning("performance_degraded", fps=15, expected_fps=30)
logger.error("database_error", error=str(e), query=sql)
logger.critical("system_failure", reason="out_of_memory")

Exception Logging

try:
    risky_operation()
except Exception as e:
    logger.error(
        "operation_failed",
        operation="risky_operation",
        error_type=type(e).__name__,
        error_message=str(e),
        exc_info=True,  # Include full traceback
    )

Context Binding

Global Context

Use bind_context() to add fields to all subsequent logs:

from farm.utils import bind_context, unbind_context, clear_context

# Bind context
bind_context(simulation_id="sim_001", user_id="user_123")

logger.info("event_occurred")  # Includes simulation_id and user_id

# Unbind specific keys
unbind_context("user_id")

# Clear all context
clear_context()

Scoped Context

Use context managers for automatic cleanup:

from farm.utils import log_context

with log_context(simulation_id="sim_001", step=42):
    logger.info("agent_action", action="move")
    # Includes simulation_id and step
    
# Context automatically cleaned up after exiting block
logger.info("other_event")  # No simulation_id or step

Logger-Level Binding

# Create a logger with permanent bindings
logger = get_logger(__name__).bind(
    component="spatial_index",
    version="2.0",
)

# All logs from this logger include component and version
logger.info("index_rebuilt", num_items=5000)

Performance Logging

Performance Decorator

Automatically log function execution time:

from farm.utils import log_performance

@log_performance(operation_name="agent_step", slow_threshold_ms=50.0)
def step(self, action):
    # Implementation
    pass

Output:

[debug] operation_complete    operation=agent_step duration_ms=23.45 status=success

If operation is slow:

[warning] operation_slow      operation=agent_step duration_ms=156.78 status=success performance_warning=slow_operation

Error Logging Decorator

Automatically log unhandled exceptions:

from farm.utils import log_errors

@log_errors()
def risky_function(data):
    # If this raises an exception, it's automatically logged
    process(data)

Specialized Loggers

Simulation Logging

from farm.utils import log_simulation

with log_simulation(
    simulation_id="sim_001",
    num_agents=100,
    num_steps=1000,
):
    run_simulation()

Logs:

[info] simulation_started    simulation_id=sim_001 num_agents=100 num_steps=1000
... simulation logs (all include simulation_id) ...
[info] simulation_completed  simulation_id=sim_001 duration_seconds=45.67

Step Logging

from farm.utils import log_step

for step_num in range(1000):
    with log_step(step_number=step_num, simulation_id="sim_001"):
        # All logs include step and simulation_id
        process_step()

Experiment Logging

from farm.utils import log_experiment

with log_experiment(
    experiment_id="exp_001",
    experiment_name="parameter_sweep",
    num_iterations=10,
):
    run_experiment()

Agent Logger

Specialized logger for agent events:

from farm.utils import AgentLogger

class MyAgent:
    def __init__(self, agent_id, agent_type):
        self.logger = AgentLogger(agent_id, agent_type)
    
    def move(self, new_position):
        self.logger.log_action(
            action_type="move",
            success=True,
            reward=0.5,
            position=new_position,
        )
    
    def interact(self, other_agent):
        self.logger.log_interaction(
            interaction_type="share_resource",
            target_id=other_agent.id,
            amount=10,
        )
    
    def die(self, cause):
        self.logger.log_death(cause=cause, final_resources=self.resources)

Log Sampling

Reduce log volume for high-frequency events:

from farm.utils import LogSampler

# Log only 10% of events
sampler = LogSampler(sample_rate=0.1)

for i in range(10000):
    if sampler.should_log():
        logger.debug("high_frequency_event", iteration=i)

# Or use minimum interval (log at most once per 100ms)
sampler = LogSampler(min_interval_ms=100)

Best Practices

1. Use Event Names, Not Messages

# ❌ Don't
logger.info("The simulation has started")

# ✅ Do
logger.info("simulation_started")

2. Include Rich Context

# ❌ Don't
logger.error("Agent failed")

# ✅ Do
logger.error(
    "agent_action_failed",
    agent_id=agent.id,
    agent_type=agent.type,
    action=action,
    reason="insufficient_resources",
    current_resources=agent.resources,
)

3. Use Structured Data

# ❌ Don't
logger.info(f"Position: ({x}, {y})")

# ✅ Do
logger.info("agent_position", x=x, y=y)
# Or
logger.info("agent_position", position=(x, y))

4. Bind Context Early

# At simulation start
bind_context(simulation_id=sim_id)

# At agent creation
agent_logger = logger.bind(agent_id=agent.id, agent_type=agent.type)

# In loops
with log_step(step_number=step):
    # Process step
    pass

5. Use Appropriate Log Levels

  • DEBUG: Detailed diagnostic information (spatial index rebuilds, cache hits)
  • INFO: General informational events (simulation started, agent born)
  • WARNING: Warning events that might need attention (low resources, slow operations)
  • ERROR: Error events that need investigation (failed actions, database errors)
  • CRITICAL: Critical failures requiring immediate action (system crash)

6. Don't Log Sensitive Data

The logging system automatically censors common sensitive fields (password, token, secret, api_key), but be careful:

# ❌ Don't
logger.info("user_login", password=password)

# ✅ Do
logger.info("user_login", username=username)  # password auto-censored

7. Sample High-Frequency Logs

# For per-agent, per-step operations
sampler = LogSampler(sample_rate=0.01)  # 1% of events

for agent in agents:
    if sampler.should_log():
        logger.debug("agent_state", agent_id=agent.id, state=agent.state)

Migration from Standard Logging

Replace Imports

# ❌ Old
import logging
logger = logging.getLogger(__name__)

# ✅ New
from farm.utils import get_logger
logger = get_logger(__name__)

Replace basicConfig

# ❌ Old
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)

# ✅ New
from farm.utils import configure_logging
configure_logging(
    environment="development",
    log_level="INFO",
)

Replace String Formatting

# ❌ Old
logger.info(f"Agent {agent_id} moved to {position}")
logger.error("Error: %s", str(e))

# ✅ New
logger.info("agent_moved", agent_id=agent_id, position=position)
logger.error("error_occurred", error=str(e), error_type=type(e).__name__)

Replace exc_info

# ❌ Old
logger.error(f"Operation failed: {e}", exc_info=True)

# ✅ New
logger.error(
    "operation_failed",
    error_type=type(e).__name__,
    error_message=str(e),
    exc_info=True,
)

Output Formats

Development (Console)

2025-10-01T12:34:56.789Z [info     ] simulation_started    simulation_id=sim_001 num_agents=100

Production (JSON)

{
  "timestamp": "2025-10-01T12:34:56.789Z",
  "level": "info",
  "event": "simulation_started",
  "simulation_id": "sim_001",
  "num_agents": 100,
  "logger": "farm.simulation",
  "filename": "simulation.py",
  "func_name": "run_simulation",
  "lineno": 123
}

Testing (Simple)

[info] simulation_started simulation_id=sim_001 num_agents=100

Integration with Analysis Tools

The JSON log format can be easily imported into analysis tools:

import json
import pandas as pd

# Load logs as DataFrame
with open("logs/application.json.log") as f:
    logs = [json.loads(line) for line in f]
    df = pd.DataFrame(logs)

# Analyze
slow_ops = df[df['duration_ms'] > 100]
errors_by_type = df[df['level'] == 'error'].groupby('error_type').size()

Examples

See the logging package README for runnable snippets and patterns; this guide also includes sections above with copy-pastable setup.

Troubleshooting

Logs not appearing

Ensure logging is configured before any other imports:

from farm.utils import configure_logging
configure_logging()  # Must be first!

Context not persisting

Use bind_context() for global context or context managers for scoped context:

from farm.utils import bind_context
bind_context(simulation_id="sim_001")  # Persists globally

Performance impact

Use sampling for high-frequency logs:

from farm.utils import LogSampler
sampler = LogSampler(sample_rate=0.1)

Third-party library noise

The logging config automatically reduces noise from common libraries, but you can adjust:

import logging
logging.getLogger("noisy_library").setLevel(logging.ERROR)