This document explains how Station works internally, how its components interact, and what it provides beyond Laravel Horizon.
For related documentation, see:
- Queue Drivers — Feature matrix, driver configs, and per-driver details
- Facades — All 7 facades with every public method
- Security, Alerting & Resilience — API auth, encryption, masking, alerting, circuit breaker
- Configuration & Environment Variables — Full config reference and env var list
- Migrating from Horizon — Step-by-step migration guide and feature mapping
- Overview
- How Station Integrates with Laravel
- Jobs
- Batches
- Workflows
- Workers and Supervisor
- Recovery System
- Metrics and Monitoring
- Dashboard
- Database Schema
- Job Middleware
- Testing
- Telemetry
- Auto-Scaling
- Worker and Supervisor Events
- Station vs Horizon
Station is a Laravel package that replaces Horizon as the queue management and monitoring layer. It provides:
- A custom worker and supervisor that fork child processes with `pcntl_fork()`
- Transparent job tracking via Laravel event listeners (no code changes needed)
- Batch management that wraps Laravel's native `Bus::batch()` with an overlay table for richer metadata
- A persistent workflow engine with conditional steps, branching, and pause/resume
- Job checkpointing and stuck job recovery
- A real-time Inertia.js + Vue 3 dashboard
- Multi-driver support: RabbitMQ (primary), Redis, SQS, Beanstalkd, Kafka
The key design principle: your existing Laravel queue code works unchanged. Station hooks into Laravel's queue event system to track everything transparently.
```mermaid
graph TB
    subgraph "Your Laravel Application"
        A["dispatch()"] --> Q[Laravel Queue System]
        B["Bus::batch()"] --> Q
        C["Station::dispatch()"] --> Q
    end
    subgraph "Station Core"
        Q -->|JobQueued| SP[StationServiceProvider]
        SP -->|trackQueued| DB[(station_jobs)]
        W[station:work] --> SUP[WorkerSupervisor]
        SUP -->|pcntl_fork| W1[Worker 1]
        SUP -->|pcntl_fork| W2[Worker 2]
        SUP -->|pcntl_fork| W3[Worker N]
    end
    subgraph "Queue Drivers"
        Q --> RMQ[RabbitMQ]
        Q --> RED[Redis]
        Q --> SQS[Amazon SQS]
        Q --> BKD[Beanstalkd]
        Q --> KFK[Kafka]
    end
    subgraph "Worker Processing"
        W1 -->|pop job| RMQ
        W1 -->|JobProcessing| SP
        W1 -->|JobProcessed / JobFailed| SP
        SP -->|trackCompleted / trackFailed| DB
    end
    subgraph "Dashboard"
        DASH[Inertia.js + Vue 3] -->|reads| DB
        API[REST API] -->|reads/writes| DB
    end
```
```mermaid
sequenceDiagram
    participant App as Your App
    participant Laravel as Laravel Queue
    participant Driver as Queue Driver
    participant SP as StationServiceProvider
    participant DB as station_jobs
    participant Worker as Station Worker
    App->>Laravel: dispatch(new MyJob)
    Laravel->>Driver: Push to queue
    Laravel-->>SP: JobQueued event
    SP->>DB: INSERT (status: pending)
    Worker->>Driver: Pop next job
    Worker-->>SP: JobProcessing event
    SP->>DB: UPDATE (status: processing)
    Worker->>Worker: Execute job->handle()
    alt Job succeeds
        Worker-->>SP: JobProcessed event
        SP->>DB: UPDATE (status: completed)
    else Job fails
        Worker-->>SP: JobFailed event
        SP->>DB: UPDATE (status: failed)
    end
```
Station's StationServiceProvider registers listeners on four Laravel queue events:
| Laravel Event | Station Listener | What It Does |
|---|---|---|
| `JobQueued` | `trackQueued()` | Creates a record in `station_jobs` with status `pending` |
| `JobProcessing` | `trackProcessing()` | Updates status to `processing`, increments attempts |
| `JobProcessed` | `trackCompleted()` | Updates status to `completed`, records processing time. If the job belongs to a batch, calls `BatchManager::recordJobCompletion()` |
| `JobFailed` | `trackFailed()` | Updates status to `failed`, stores exception. If the job belongs to a batch, calls `BatchManager::recordJobFailure()` |
These listeners fire for any job dispatched through Laravel's queue system, whether via dispatch(), Bus::dispatch(), Queue::push(), or Station's own Station::dispatch(). No special traits or interfaces are required on jobs.
Batch ID extraction: When a job uses Laravel's Batchable trait, the listener deserializes the job payload and reads the batchId property from the trait. This connects individual job events to their parent batch.
Station Job ID tracking: Station generates a UUID7 for each job via Job::create(). Before pushing to the queue, it sets a stationJobId property on the job object. Event listeners extract this from the deserialized payload to correlate queue events back to the station_jobs record.
Disabling tracking: Set station.tracking.enabled to false in config to disable all event listeners. This prevents Station from recording job state changes, batch progress, and workflow metrics. Useful for testing or environments where tracking overhead is unwanted.
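As a sketch, the relevant fragment of `config/station.php` might look like this (only the documented `station.tracking.enabled` key is taken from the text above; the surrounding file layout is an assumption, so consult the package's published config for the authoritative structure):

```php
<?php

// Illustrative config/station.php fragment (layout assumed).
return [
    'tracking' => [
        // false disables all of Station's queue event listeners
        'enabled' => true,
    ],
];
```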
Standard Laravel dispatch works out of the box:
```php
// All of these are tracked by Station automatically
ProcessOrderJob::dispatch($order);
ProcessOrderJob::dispatch($order)->onQueue('high');
ProcessOrderJob::dispatch($order)->delay(now()->addMinutes(5));
```

Station also provides a fluent API via the `Station` facade:
```php
Station::job(new ProcessOrderJob($order))
    ->onQueue('high')
    ->delay(now()->addMinutes(5))
    ->tags(['orders', 'payments'])
    ->dispatch();
```

```mermaid
stateDiagram-v2
    [*] --> pending: dispatch()
    pending --> reserved: Worker reserves job
    reserved --> processing: Execution begins
    processing --> completed: handle() succeeds
    processing --> failed: handle() throws exception
    failed --> pending: retry()
    completed --> [*]
```
| Status | Meaning |
|---|---|
| `pending` | Job is in the queue, waiting to be picked up |
| `reserved` | A worker has reserved the job but has not yet started executing it |
| `processing` | The job is actively being executed |
| `completed` | Job finished successfully |
| `failed` | Job threw an exception |
Each job tracked in station_jobs stores:
| Field | Type | Description |
|---|---|---|
| `id` | UUID | Station's own ID (UUID7, time-sortable) |
| `queue` | string | Queue name (e.g., `default`, `high`) |
| `connection` | string | Driver connection name |
| `job_class` | string | Fully qualified class name |
| `payload` | longText | Serialized job object |
| `status` | string | `pending`, `reserved`, `processing`, `completed`, `failed` |
| `attempts` | int | How many times the job has been attempted |
| `max_tries` | int | Maximum retry attempts allowed |
| `timeout` | int | Per-attempt timeout in seconds |
| `priority` | int | Queue priority level |
| `batch_id` | UUID | Parent batch ID (if part of a batch) |
| `tags` | JSON | User-defined tags for filtering |
| `worker_id` | string | Which worker is processing this job |
| `memory_used` | int | Peak memory during processing (bytes) |
| `processing_time` | int | How long execution took (milliseconds) |
| `available_at` | timestamp | When the job becomes available (for delayed jobs) |
| `reserved_at` | timestamp | When a worker reserved the job |
| `started_at` | timestamp | When execution began |
| `completed_at` | timestamp | When execution finished |
| `exception` | text | Exception message (populated for failed jobs) |
Failed jobs are also stored in station_failed_jobs with the full exception trace and context.
| Feature | `dispatch()` | `Station::dispatch()` |
|---|---|---|
| Job tracked in `station_jobs` | Yes (via event listener) | Yes |
| Custom tags | No | Yes |
| Priority control | Via queue name | Explicit priority field |
| Batch ID correlation | Automatic (via `Batchable` trait) | Automatic |
| Metrics recorded | Yes | Yes |
Both paths end up tracked identically. The difference is that Station's API adds fluent options for tags and priority.
Tags are configured via station.tags in config:
| Setting | Default | Description |
|---|---|---|
| `enabled` | `true` | Set to `false` to disable all tagging |
| `max_length` | 100 | Maximum character length per tag (truncated) |
| `max_per_job` | 10 | Maximum number of tags per job (excess dropped) |
| `auto_tags` | `[]` | Tags added automatically to every job. Supported values: `'environment'` (adds `env:production`), `'queue'` (adds `queue:default`), `'connection'` (adds `connection:redis`) |
Jobs implementing Taggable (with a tags(): array method) have their tags merged with any auto-tags. Tags from Station::job()->tags([...]) are also merged.
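The merge-and-limit behavior described above can be sketched in plain PHP. This is an illustration of the documented rules only; the function name `normalizeTags` and the exact merge order are assumptions, not Station's actual implementation:

```php
<?php

/**
 * Sketch of the documented tag rules: merge job tags with auto-tags,
 * truncate each tag to max_length, and cap the list at max_per_job.
 */
function normalizeTags(array $jobTags, array $autoTags, int $maxLength = 100, int $maxPerJob = 10): array
{
    // Merge auto-tags with the job's own tags, dropping duplicates
    $merged = array_values(array_unique(array_merge($autoTags, $jobTags)));

    // Truncate overly long tags (documented max_length behavior)
    $merged = array_map(
        fn (string $tag): string => substr($tag, 0, $maxLength),
        $merged
    );

    // Drop excess tags beyond the per-job cap (documented max_per_job behavior)
    return array_slice($merged, 0, $maxPerJob);
}
```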
In addition to Laravel's native queue events, Station dispatches its own events that you can listen to:
| Event | When Fired | Key Properties |
|---|---|---|
| `JobDispatched` | After a job is dispatched via Station | `Job $job` |
| `JobStarted` | When a worker begins executing a job | `string $jobId, string $jobClass, string $queue, string $connection` |
| `JobCompleted` | When a job finishes successfully | `string $jobId, string $jobClass, string $queue, string $connection, float $durationMs` |
| `JobProcessed` | After job processing is recorded | `Job $job, string $worker, int $duration, int $memoryUsed` |
| `JobFailed` | When a job fails | `Job $job, Throwable $exception, int $attempts, bool $willRetry` |
| `JobRetrying` | When a failed job is retried | `Job $job, int $attempt, ?int $delay, ?Throwable $exception` |
| `JobRecovered` | When a stuck job is recovered | `Job $job, string $strategy, bool $fromCheckpoint` |
All events are in the Station\Events namespace.
Station's batch system wraps Laravel's native Bus::batch(). When you create a batch through Station, it:
- Calls `Bus::batch($jobs)->onQueue($queue)->allowFailures()->dispatch()`
- Gets back a Laravel batch with an ID from the `job_batches` table
- Creates a Station overlay record in `station_batches` with the same ID
- Tracks progress via event listeners as individual jobs complete or fail
```mermaid
sequenceDiagram
    participant App as Your App
    participant BM as BatchManager
    participant Bus as Bus::batch()
    participant LB as job_batches
    participant SB as station_batches
    participant Driver as Queue Driver
    participant Worker as Worker
    participant SP as Event Listener
    App->>BM: Batch::create($jobs)
    BM->>Bus: Bus::batch($jobs)->dispatch()
    Bus->>LB: INSERT batch record
    Bus->>Driver: Push all jobs to queue
    Bus-->>BM: Laravel Batch (with ID)
    BM->>SB: INSERT overlay (same ID, status: pending)
    loop For each job
        Worker->>Driver: Pop job
        Worker->>Worker: Execute job
        Worker-->>SP: JobProcessed / JobFailed
        SP->>SB: incrementProcessed() / incrementFailed()
    end
    SP->>SB: UPDATE status (completed/failed)
```
Jobs in a batch must use Laravel's Batchable trait. This is a Laravel requirement, not a Station one.
```php
use Illuminate\Bus\Batchable;

class ProcessOrderJob implements ShouldQueue
{
    use Batchable, Dispatchable, InteractsWithQueue, Queueable;

    public function handle(): void
    {
        // Check if batch was cancelled
        if ($this->batch()?->cancelled()) {
            return;
        }

        // Do work...
    }
}
```

Laravel's `job_batches` table is the source of truth for batch dispatching and the `Batchable` trait. Station's `station_batches` is an overlay that adds:
| Field | In `job_batches` | In `station_batches` | Why Station needs it |
|---|---|---|---|
| `id` | Yes | Yes (same value) | Shared identifier |
| `name` | Yes | Yes | - |
| `status` | No (derived) | Yes (explicit) | Explicit status tracking: `pending`, `processing`, `completed`, `failed`, `cancelled` |
| `total_jobs` | Yes | Yes | - |
| `pending_jobs` | Yes | Yes | Atomic counter (see below) |
| `processed_jobs` | No | Yes | Laravel doesn't track this directly |
| `failed_jobs` | Yes | Yes | Atomic counter |
| `allowed_failures` | No (boolean only) | Yes (integer) | Station supports "allow up to N failures" |
| `failed_job_ids` | Yes | Yes | Tracks which specific jobs failed |
| `started_at` | No | Yes | When first job began processing |
| `finished_at` | No | Yes | When batch reached terminal state |
| `cancelled_at` | No | Yes | When batch was cancelled |
| `connection` | No | Yes | Which queue driver connection |
A critical implementation detail: Laravel fires JobProcessed before updating job_batches counters. This means that in Station's event listener, calling Bus::findBatch($id) reads stale data from job_batches.
Station solves this with atomic SQL operations directly on station_batches:
```sql
-- incrementProcessed() runs this SQL:
UPDATE station_batches
SET processed_jobs = processed_jobs + 1,
    pending_jobs = GREATEST(pending_jobs - 1, 0)
WHERE id = ?;

-- incrementFailed() runs this SQL:
UPDATE station_batches
SET failed_jobs = failed_jobs + 1,
    processed_jobs = processed_jobs + 1,
    pending_jobs = GREATEST(pending_jobs - 1, 0)
WHERE id = ?;
```

The `syncFromLaravel()` method exists for reconciliation, but real-time tracking uses atomic increments.
```mermaid
stateDiagram-v2
    [*] --> pending: Batch create()
    pending --> processing: First job starts
    processing --> completed: All jobs done, failures ≤ allowed
    processing --> failed: Failures > allowed_failures
    pending --> cancelled: cancel()
    processing --> cancelled: cancel()
    completed --> [*]
    failed --> [*]
    cancelled --> [*]
```
- **Created:** `BatchManager::create()` dispatches via `Bus::batch()`, creates overlay record with status `pending`
- **Started:** When the first job begins processing, status moves to `processing`
- **Progress:** Each `JobProcessed` event increments `processed_jobs` and decrements `pending_jobs`
- **Failure check:** On each job failure, if `failed_jobs > allowed_failures`, the batch is marked `failed` immediately (and remaining jobs are cancelled via Laravel)
- **Finished:** When `pending_jobs` reaches 0:
  - If `failed_jobs == 0` → status = `completed`
  - If `failed_jobs > 0` → status = `failed`
- **Cancelled:** `cancel()` delegates to Laravel's `$laravelBatch->cancel()`. Jobs check `$this->batch()->cancelled()` in their `handle()` method.
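The finish and failure rules in the lifecycle above can be condensed into a small decision function. This is a sketch of the documented semantics only; `resolveBatchStatus` is an illustrative name, not a Station API:

```php
<?php

/**
 * Sketch of the documented batch finish rules: returns the status a
 * batch should have, given its counters and the failure threshold.
 */
function resolveBatchStatus(int $pendingJobs, int $failedJobs, int $allowedFailures): string
{
    // Failure check: exceeding the threshold fails the batch immediately
    if ($failedJobs > $allowedFailures) {
        return 'failed';
    }

    // Jobs still outstanding: the batch keeps processing
    if ($pendingJobs > 0) {
        return 'processing';
    }

    // Finished: completed only if no job failed
    return $failedJobs === 0 ? 'completed' : 'failed';
}
```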
Laravel's allowFailures() is boolean. Station extends this with an integer allowed_failures:
```php
$batch = Batch::create(
    jobs: $jobs,
    name: 'Import Users',
    allowedFailures: 5, // Allow up to 5 jobs to fail
);
```

- `allowed_failures = 0`: Any single failure marks the batch as failed immediately
- `allowed_failures = 5`: The batch continues processing even with up to 5 failures. If a 6th failure occurs, the batch is failed immediately and remaining jobs are cancelled
- Station always tells Laravel `allowFailures(true)` (allow all at the Laravel level) and handles the threshold check itself via `hasExceededAllowedFailures()`
Station dispatches events at key points in the batch lifecycle:
| Event | When Fired | Key Properties |
|---|---|---|
| `BatchCreated` | After a batch is created and dispatched | `Batch $batch, int $totalJobs, array $options` |
| `BatchProgress` | After a job in the batch completes or fails | `Batch $batch, int $processed, int $failed, float $percentage` |
| `BatchCompleted` | When all jobs finish and the batch succeeds | `Batch $batch, int $duration, int $jobsProcessed` |
| `BatchFailed` | When the batch exceeds its failure threshold | `Batch $batch, array $failedJobs, ?Throwable $exception` |
| `BatchCancelled` | When a batch is cancelled | `Batch $batch` |
All events are in the Station\Events namespace.
Station provides two workflow systems that serve different use cases.
A lightweight DAG wrapper around Bus::batch(). Define steps with dependencies, Station resolves execution order via topological sort, and dispatches as a single batch.
```mermaid
graph LR
    V[validate] --> P[payment]
    V --> I[inventory]
    P --> S[ship]
    I --> S
    style V fill:#3b82f6,color:#fff
    style P fill:#3b82f6,color:#fff
    style I fill:#3b82f6,color:#fff
    style S fill:#3b82f6,color:#fff
```
Topological sort produces execution groups — steps in the same group run in parallel:
```mermaid
graph TD
    subgraph "Group 0"
        V[validate]
    end
    subgraph "Group 1 (parallel)"
        P[payment]
        I[inventory]
    end
    subgraph "Group 2"
        S[ship]
    end
    V --> P
    V --> I
    P --> S
    I --> S
```
```php
use Station\Core\Workflow;

$workflowId = Workflow::create('order-pipeline')
    ->add('validate', new ValidateOrderJob($order))
    ->add('payment', new ProcessPaymentJob($order), ['validate'])
    ->add('inventory', new ReserveInventoryJob($order), ['validate'])
    ->add('ship', new ShipOrderJob($order), ['payment', 'inventory'])
    ->onQueue('high')
    ->dispatch();
```

How it works internally:
- `resolveExecutionOrder()` uses Kahn's algorithm to produce execution groups:
  - Group 0: `[validate]` (no dependencies)
  - Group 1: `[payment, inventory]` (both depend on validate)
  - Group 2: `[ship]` (depends on payment and inventory)
- `buildBatch()` converts groups into a nested `Bus::batch()` structure
- Jobs in the same group run in parallel; groups run sequentially
- Sets `workflowId`, `workflowName`, `workflowStep` properties on each job
Limitations: No persistence, no pause/resume, no conditional logic. Once dispatched, it's a standard Laravel batch.
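The grouped topological sort described above can be sketched as a standalone function over a step-to-dependencies map. This is an illustration of the layer-by-layer Kahn's algorithm approach, not Station's actual `resolveExecutionOrder()` implementation:

```php
<?php

/**
 * Sketch: group steps into waves where each wave depends only on
 * earlier waves (Kahn's algorithm, processed layer by layer).
 *
 * @param array<string, string[]> $deps step name => list of dependency step names
 * @return string[][] execution groups, in order
 */
function executionGroups(array $deps): array
{
    $remaining = $deps;
    $done = [];
    $groups = [];

    while ($remaining !== []) {
        // A step is ready when all of its dependencies are already done
        $ready = array_keys(array_filter(
            $remaining,
            fn (array $d): bool => array_diff($d, $done) === []
        ));

        if ($ready === []) {
            throw new RuntimeException('Cycle detected in workflow graph');
        }

        $groups[] = $ready;
        $done = array_merge($done, $ready);
        foreach ($ready as $step) {
            unset($remaining[$step]);
        }
    }

    return $groups;
}
```

Running this on the order-pipeline example yields the three groups shown in the diagram above.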
For strictly sequential job execution, Station provides Chain::create() — a wrapper around Laravel's Bus::chain():
```php
use Station\Facades\Chain;

Chain::create([
    new ValidateOrderJob($order),
    new ChargePaymentJob($order),
    new ShipOrderJob($order),
])
    ->name('order-fulfillment')
    ->onQueue('high')
    ->catch(fn (Throwable $e) => Log::error("Chain failed: {$e->getMessage()}"))
    ->finally(fn () => Log::info('Chain finished'))
    ->dispatch();
```

Chain sets `chainId`, `chainIndex`, and `chainName` properties on each job (if the property exists). Unlike Simple Workflows, chains are purely sequential — each job runs only after the previous one completes. Unlike the full workflow engine, chains have no persistence, no conditional logic, and no pause/resume.
A persistent, stateful workflow system stored in station_workflows.
```php
use Station\Facades\Workflow;

// Define a reusable workflow
$definition = Workflow::define('payment-processing')
    ->description('Process payments and send confirmations')
    ->addStep('validate', ValidatePaymentJob::class)
    ->addStep('charge', ChargeCardJob::class, ['validate'])
    ->addConditionalStep('notify',
        SendNotificationJob::class,
        fn ($context) => $context['charge_successful'] ?? false,
        ['charge']
    )
    ->timeout(3600)
    ->maxRetries(3)
    ->source('api') // Optional: track where the definition came from
    ->withMetadata(['team' => 'billing']); // Optional: attach arbitrary metadata

// Run synchronously
$instance = Workflow::run('payment-processing', [
    'amount' => 99.99,
    'currency' => 'usd',
    'user_id' => 42,
]);

// Or run asynchronously (recommended for production)
$instance = Workflow::runAsync('payment-processing', [
    'amount' => 99.99,
    'currency' => 'usd',
    'user_id' => 42,
]);
```

Instance lifecycle:
```mermaid
stateDiagram-v2
    [*] --> pending: Workflow run()
    pending --> running: Execution starts
    running --> completed: All steps done
    running --> failed: Step fails (max retries exceeded)
    running --> paused: pause()
    paused --> running: resume()
    pending --> cancelled: cancel()
    running --> cancelled: cancel()
    paused --> cancelled: cancel()
    completed --> [*]
    failed --> [*]
    cancelled --> [*]
```
Each instance tracks:
- `status` — current state
- `current_step` — which step is executing
- `step_statuses` — per-step status map (`pending`, `queued`, `running`, `completed`, `failed`, `skipped`)
- `context` — shared mutable state passed between steps
- `results` — output from each completed step
- `input` — original input data
- `connection` — queue connection for async dispatch (null for sync)
- `definition_steps` — snapshot of the step graph at execution time
- `progress` — percentage derived from completed/skipped steps
Station supports two execution modes: Workflow::run() (synchronous) and Workflow::runAsync() (asynchronous). Async mode is recommended for production because each step runs as a queued job, enabling true parallel execution across workers.
Steps are instantiated with (string $instanceId, array $context, array $results) as constructor arguments. $results contains the return values from all previously completed steps, keyed by step name. After each step, context updates are merged if the job implements getContextUpdates().
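A step job compatible with that calling convention might look like the sketch below. Only the constructor signature and the `getContextUpdates()` hook come from the text above; the class name, its internals, and the omission of the queue interfaces and traits (left out so the sketch is self-contained) are illustrative assumptions:

```php
<?php

/**
 * Sketch of a workflow step job. Station instantiates steps with
 * (instanceId, context, results); handle()'s return value becomes
 * this step's entry in $results, and the array returned by
 * getContextUpdates() is merged into the shared context afterwards.
 */
class ChargeCardJob
{
    private bool $charged = false;

    public function __construct(
        private string $instanceId,
        private array $context,
        private array $results,
    ) {
    }

    public function handle(): string
    {
        // Read shared context and the output of an earlier step
        $amount = $this->context['amount'] ?? 0;
        $validation = $this->results['validate'] ?? null;

        $this->charged = $validation !== null && $amount > 0;

        // Becomes $results['charge'] for downstream steps
        return $this->charged ? 'charged' : 'declined';
    }

    public function getContextUpdates(): array
    {
        return ['charge_successful' => $this->charged];
    }
}
```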
Workflow::run() executes all steps in the calling process. Topological sort groups independent steps; groups run sequentially, and steps within a group also run sequentially.
```mermaid
sequenceDiagram
    participant App as Your App
    participant WM as WorkflowManager
    participant DB as station_workflows
    participant Step as Step Job
    App->>WM: Workflow::run('payment', $input)
    WM->>DB: INSERT instance (status: running)
    loop For each execution group
        loop For each step in group
            WM->>WM: Check conditions / virtual
            alt Condition met or unconditional
                WM->>Step: new $jobClass($instanceId, $context, $results)
                WM->>Step: app()->call([$job, 'handle'])
                Step-->>WM: getContextUpdates()
                WM->>DB: UPDATE step_statuses, context
            else Condition not met
                WM->>DB: UPDATE step_statuses (skipped)
            end
        end
    end
    WM->>WM: checkWorkflowCompletion()
    WM->>DB: UPDATE status (completed/failed)
    WM-->>App: WorkflowInstance
```
Both sync and async paths use areDependenciesResolved() (completed or skipped) and checkWorkflowCompletion() (deadlock detection) for step evaluation.
Workflow::runAsync() persists the instance and dispatches a RunWorkflowJob. From there, all step execution is queued:
- `runAsync()` → creates and persists instance → dispatches `RunWorkflowJob`
- `RunWorkflowJob::handle()` → calls `executeExistingInstance()` → starts workflow → `advanceWorkflowAsync()`
- `advanceWorkflowAsync()` — the core orchestration loop:
  - Evaluates which pending steps have resolved dependencies
  - Auto-completes virtual steps inline (fan-in completion markers)
  - Skips conditional steps whose condition is false
  - Marks real steps as `queued` and dispatches a `WorkflowStepJob` per step
- `WorkflowStepJob::handle()` → calls `executeAsyncStep()` — three-phase pattern:
  - Phase 1 (Claim): `lockForUpdate()`, verify step is `queued`, transition to `running`
  - Phase 2 (Execute): outside the lock, `app()->call([$job, 'handle'])`
  - Phase 3 (Record + Advance): `lockForUpdate()`, record result, call `advanceWorkflowAsync()` to dispatch newly unblocked steps
- If `RunWorkflowJob` itself fails (e.g., definition not found), `RunWorkflowJob::failed()` calls `handleAsyncStepFailure()` with the pseudo-step name `_starter`, which fails the entire workflow
```mermaid
sequenceDiagram
    participant App as Your App
    participant WM as WorkflowManager
    participant DB as station_workflows
    participant Queue as Queue Driver
    participant Worker as Worker
    App->>WM: Workflow::runAsync('payment', $input)
    WM->>DB: INSERT instance (status: pending)
    WM->>Queue: dispatch(RunWorkflowJob)
    WM-->>App: WorkflowInstance
    Worker->>WM: executeExistingInstance()
    WM->>DB: SELECT FOR UPDATE → start()
    WM->>WM: advanceWorkflowAsync()
    WM->>DB: UPDATE steps (queued)
    WM->>Queue: dispatch(WorkflowStepJob) × N
    par Each ready step
        Worker->>WM: executeAsyncStep(step)
        WM->>DB: SELECT FOR UPDATE → running
        WM->>WM: app()->call([$job, 'handle'])
        WM->>DB: SELECT FOR UPDATE → completed
        WM->>WM: advanceWorkflowAsync()
        WM->>Queue: dispatch next steps
    end
    WM->>WM: checkWorkflowCompletion()
    WM->>DB: UPDATE status (completed/failed)
```
```mermaid
stateDiagram-v2
    [*] --> pending: Step defined
    pending --> queued: advanceWorkflowAsync() (async only)
    pending --> running: executeStep() (sync only)
    queued --> running: executeAsyncStep() claims step
    running --> completed: Step succeeds
    running --> failed: Step fails (retries exhausted)
    pending --> skipped: Condition is false
    pending --> completed: Virtual step (auto-complete)
    completed --> [*]
    failed --> [*]
    skipped --> [*]
```
The queued status only exists in async mode. In sync mode, steps transition directly from pending to running.
All async state transitions use loadInstanceForUpdate() (SELECT ... FOR UPDATE). This prevents race conditions when multiple workers complete steps concurrently and try to advance the workflow simultaneously. Each critical section (claim, record, advance) runs inside a database transaction with the row lock held.
addParallel() creates a virtual completion step that depends on all parallel steps. When all parallel steps finish, the virtual step auto-completes, enabling downstream steps to proceed. This is the fan-in pattern for DAG convergence.
```php
Workflow::define('order-processing')
    ->addStep('validate', ValidateOrderJob::class)
    ->addParallel('fulfillment', [
        'payment' => ProcessPaymentJob::class,
        'inventory' => ReserveInventoryJob::class,
        'notification' => SendConfirmationJob::class,
    ], ['validate']) // All 3 depend on 'validate'
    ->addStep('ship', ShipOrderJob::class, ['fulfillment']); // Depends on virtual step
```

This creates steps `payment`, `inventory`, `notification` (all depending on `validate`), plus a virtual step `fulfillment` that depends on all three. The `ship` step waits for the virtual `fulfillment` step, which auto-completes when all parallel steps finish.
When a workflow starts (sync or async), the current definition steps are serialized into the definition_steps column on station_workflows. Running instances use this snapshot of the step graph, ensuring they are unaffected if the definition is modified after execution begins.
checkWorkflowCompletion() evaluates three possible states:
- All steps done (completed, skipped, or failed) → workflow completes or fails depending on whether any step failed
- No progress possible — remaining pending steps have only failed dependencies → workflow fails with "deadlocked"
- Active steps exist (queued or running) → workflow continues
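Those three outcomes can be sketched as a pure function over the per-step status map and dependency graph. This is an illustration of the documented rules, not the actual `checkWorkflowCompletion()` signature, and the deadlock test is a simplified single-pass check (it only looks at direct dependencies):

```php
<?php

/**
 * Sketch of the three completion outcomes.
 *
 * @param array<string, string>   $statuses step => pending|queued|running|completed|failed|skipped
 * @param array<string, string[]> $deps     step => dependency step names
 */
function workflowState(array $statuses, array $deps): string
{
    // Active steps exist: the workflow continues
    if (in_array('queued', $statuses, true) || in_array('running', $statuses, true)) {
        return 'continue';
    }

    // Every step reached a terminal state: completed, or failed if any step failed
    if (!in_array('pending', $statuses, true)) {
        return in_array('failed', $statuses, true) ? 'failed' : 'completed';
    }

    // Pending steps remain with nothing active: can any of them still start?
    foreach ($statuses as $step => $status) {
        if ($status !== 'pending') {
            continue;
        }
        $blocked = false;
        foreach ($deps[$step] ?? [] as $dep) {
            if (($statuses[$dep] ?? null) === 'failed') {
                $blocked = true;
                break;
            }
        }
        if (!$blocked) {
            return 'continue'; // this step may still become ready
        }
    }

    return 'deadlocked'; // every remaining pending step has a failed dependency
}
```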
recoverStuckWorkflows(threshold) finds running workflows that haven't been updated within the threshold (default 300s) and takes corrective action:
- Re-dispatches queued steps that were lost (worker died before picking up the job)
- Fails running steps that timed out (worker died mid-execution)
- Advances stalled workflows where no steps are queued/running but the workflow isn't complete
Triggered via php artisan station:recover --workflows.
Conditional steps only execute if a condition evaluates to true:
```php
->addConditionalStep('send_email',
    SendEmailJob::class,
    fn ($context) => $context['email_opted_in'] ?? false,
    ['previous_step']
)
```

If the condition is false, the step is marked as `skipped` and doesn't block downstream steps.
Branch steps dynamically select which job to run based on context:
```php
->addBranch('route_payment',
    fn ($context) => $context['gateway'], // returns 'stripe' or 'paypal'
    [
        'stripe' => ProcessStripeJob::class,
        'paypal' => ProcessPayPalJob::class,
    ],
    ['validate']
)
```

The context is a shared array that accumulates across steps:
- Workflow starts with `context = input` (the data passed to `Workflow::run()`)
- Each step receives the current context in its constructor
- After each step, `getContextUpdates()` is called if the method exists
- The returned array is merged into context for subsequent steps
In addition to context, steps receive $results — an array of return values from all completed steps keyed by step name (e.g., $results['validate']). This is separate from context: results are read-only outputs from each step's handle() return value, while context is the shared mutable state.
This allows steps to communicate without tight coupling.
Each WorkflowStep supports individual retry and timeout configuration:
| Config | Default | Description |
|---|---|---|
| `timeout` | `null` (no limit) | Per-step timeout in seconds |
| `retries` | 3 | Maximum retry attempts for the step |
| `retryDelay` | 60 | Delay in seconds between retries |
In async mode, these are applied to the WorkflowStepJob dispatched for each step. In sync mode, retries are not applied (the step runs once).
Station dispatches events at key points in the workflow lifecycle:
| Event | When Fired | Key Properties |
|---|---|---|
| `WorkflowStarted` | When a workflow instance begins execution | `WorkflowInstance $instance` |
| `WorkflowStepCompleted` | After a step finishes successfully | `WorkflowInstance $instance, string $stepName, mixed $result` |
| `WorkflowCompleted` | When all steps finish and the workflow succeeds | `WorkflowInstance $instance` |
| `WorkflowFailed` | When a step fails (retries exhausted) or the workflow deadlocks | `WorkflowInstance $instance` |
All events are in the Station\Events namespace.
Station's StationServiceProvider also records metrics for workflow steps via event listeners, using a synthetic queue name workflow:{definition_name}. This means workflow step performance appears in the metrics system alongside regular queue metrics.
station:work starts a WorkerSupervisor that manages child worker processes:
```mermaid
graph TD
    SUP["WorkerSupervisor (parent)"]
    SUP -->|pcntl_fork| W1["Worker 1 (PID 1234)"]
    SUP -->|pcntl_fork| W2["Worker 2 (PID 1235)"]
    SUP -->|pcntl_fork| W3["Worker N (PID 1236)"]
    SUP -.->|SIGTERM| W1
    SUP -.->|SIGTERM| W2
    SUP -.->|SIGTERM| W3
    W1 -->|pop| Q[(Queue Driver)]
    W2 -->|pop| Q
    W3 -->|pop| Q
    W1 -->|track| DB[(station_jobs)]
    W2 -->|track| DB
    W3 -->|track| DB
```
```
WorkerSupervisor (parent process)
├── Worker 1 (forked child, PID 1234)
├── Worker 2 (forked child, PID 1235)
└── Worker 3 (forked child, PID 1236)
```
The supervisor:
- Forks N child processes (configurable via `--workers` or `config[supervisors][default][processes]`)
- Runs a loop every 100ms that:
  - Reaps dead workers (`pcntl_waitpid` with `WNOHANG`)
  - Restarts workers to maintain the desired pool size
  - Collects metrics
- On shutdown (`SIGTERM`/`SIGINT`), sends `SIGTERM` to all workers, waits up to a configurable timeout, then `SIGKILL`s any remaining
Each forked child runs `Worker::run()`, which:

- Reconnects the database (required after `pcntl_fork()`)
- Creates fresh instances of `Worker`, `JobRepository`, `MetricsCollector` from the container
- Enters a loop:
  - Pops a job from the queue via the configured driver
  - Dispatches `JobProcessing` event (triggers Station's tracking)
  - Calls `$job->fire()` (Laravel's job execution)
  - Dispatches `JobProcessed` or `JobFailed` event
  - Tracks completion/failure in `station_jobs`
  - Records metrics
  - Sleeps if no job available
The worker fires Laravel's standard queue events (JobProcessing, JobProcessed, JobFailed). This is critical because the StationServiceProvider event listeners depend on these events to update batch progress, record metrics, and track job state.
Both supervisor and workers use pcntl_async_signals(true):
| Signal | Supervisor | Worker |
|---|---|---|
| `SIGTERM` | Graceful shutdown | Stop after current job |
| `SIGINT` | Graceful shutdown | Stop after current job |
| `SIGQUIT` | Graceful shutdown | - |
| `SIGUSR1` | Pause (forwards to workers) | Pause processing |
| `SIGUSR2` | Resume (forwards to workers) | Resume processing |
| `SIGALRM` | - | Job timeout (throws RuntimeException) |
Workers enforce several limits:
| Limit | Config Key | Default | Behavior |
|---|---|---|---|
| Memory | `supervisors.default.memory` | 128 MB | Worker exits, supervisor restarts it |
| Max jobs | `supervisors.default.max_jobs` | 0 (unlimited) | Worker exits after N jobs |
| Max time | `supervisors.default.max_time` | 0 (unlimited) | Worker exits after N seconds |
| Job timeout | `supervisors.default.timeout` | 60 seconds | `pcntl_alarm()` kills the job |
| Sleep | `supervisors.default.sleep` | 3 seconds | Delay between polls when queue is empty |
```mermaid
graph TD
    SD[StuckJobDetector] -->|Scan processing jobs| DB[(station_jobs)]
    SD -->|Calculate stuck score| SCORE{Score > 0.7?}
    SCORE -->|Yes| CONFIRM{Confirm 3x / 10s}
    SCORE -->|No| OK[Job is healthy]
    CONFIRM -->|Confirmed stuck| STRATEGY{Recovery Strategy}
    STRATEGY -->|graceful| GR[Load checkpoint → re-queue]
    STRATEGY -->|restart| RS[Kill process → re-queue from start]
    STRATEGY -->|checkpoint| CP[Resume from checkpoint data]
    GR --> QUEUE[Job back in queue]
    RS --> QUEUE
    CP --> QUEUE
    subgraph "Stuck Score Calculation"
        H["Heartbeat age (0.4)"]
        R["Runtime vs expected (0.3)"]
        M["Memory growth (0.15)"]
        P["Process state (0.15)"]
    end
```
Jobs can implement Checkpointable to save progress that survives failures:
```php
use Station\Contracts\Checkpointable;

class ImportUsersJob implements ShouldQueue, Checkpointable
{
    private int $lastId = 0;
    private int $processed = 0;

    public function handle(): void
    {
        User::where('id', '>', $this->lastId)->chunk(100, function ($users) {
            foreach ($users as $user) {
                $this->processUser($user);
                $this->processed++;
            }
            $this->lastId = $users->last()->id;
        });
    }

    // Called periodically to save progress
    public function checkpoint(): array
    {
        return ['last_id' => $this->lastId, 'processed' => $this->processed];
    }

    // Called before handle() when resuming from a checkpoint
    public function restore(array $data): void
    {
        $this->lastId = $data['last_id'] ?? 0;
        $this->processed = $data['processed'] ?? 0;
    }

    // Return false when all work is done
    public function hasMoreWork(): bool
    {
        return User::where('id', '>', $this->lastId)->exists();
    }
}
```

Checkpoints are stored in `station_checkpoints` (one row per job, keyed by job ID). Optional encryption via `STATION_CHECKPOINT_ENCRYPT=true`.
StuckJobDetector identifies jobs that are likely hung:
- Finds jobs in `processing` status longer than the configured timeout
- Calculates a "stuck score" (0.0 to 1.0) based on weighted factors:
  - Heartbeat age: weight 0.4
  - Runtime vs expected: weight 0.3
  - Memory growth: weight 0.15
  - Process state: weight 0.15
- Jobs scoring above 0.7 are considered stuck
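The weighted combination is simple arithmetic. A minimal sketch, assuming each factor has already been normalized to the 0.0 to 1.0 range (the function and factor names are illustrative, not Station's internal API):

```php
<?php
// Simplified sketch of the weighted stuck-score combination.
// Factor names are illustrative; weights match the list above.
function stuckScore(array $factors): float
{
    $weights = [
        'heartbeat_age' => 0.4,
        'runtime_ratio' => 0.3,
        'memory_growth' => 0.15,
        'process_state' => 0.15,
    ];

    $score = 0.0;
    foreach ($weights as $factor => $weight) {
        // Clamp each factor into [0.0, 1.0] before weighting.
        $score += $weight * min(1.0, max(0.0, $factors[$factor] ?? 0.0));
    }

    return $score;
}

// A job with a stale heartbeat and long runtime scores about 0.73,
// crossing the 0.7 threshold:
$score = stuckScore([
    'heartbeat_age' => 1.0,
    'runtime_ratio' => 1.0,
    'memory_growth' => 0.2,
    'process_state' => 0.0,
]);
```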
| Strategy | Behavior |
|---|---|
| `graceful` | Load checkpoint, re-queue job to resume from last saved position |
| `restart` | Kill the worker process, re-queue job from the beginning |
| `checkpoint` | Specifically use checkpoint data to resume |
# Scan and recover stuck jobs
php artisan station:recover
# Dry run to see what would be recovered
php artisan station:recover --dry-run
# Use specific strategy (graceful, restart, checkpoint)
php artisan station:recover --strategy=graceful
# Custom stuck threshold (seconds)
php artisan station:recover --threshold=600
# Also recover stuck workflow steps
php artisan station:recover --workflows
# Only recover jobs on a specific queue
php artisan station:recover --queue=high
HealthChecker evaluates the overall system:
| Check | What It Verifies |
|---|---|
| Database | Station database connection is accessible (latency measured) |
| Connections | Can reach each configured queue driver (deep check via ->size(), latency measured) |
| Disk | Storage disk usage below warning/critical thresholds |
Returns an overall HealthStatus enum:
| Status | Value | Meaning |
|---|---|---|
| Healthy | `healthy` | All checks pass |
| Warning | `warning` | Disk usage above warning threshold but below critical |
| Degraded | `degraded` | Non-critical checks have warnings |
| Unhealthy | `unhealthy` | Database or connections unreachable, or disk above critical threshold |
Two connectivity check modes:
- `checkConnections()` — Deep check: actually calls `->size()` on each queue driver. Used by `station:health` and the health API endpoint.
- `checkConnectivityQuick()` — Lightweight TCP probe with 2-second timeout. Sends protocol-specific commands (PING for Redis, AMQP header for RabbitMQ, stats for Beanstalkd, ApiVersions for Kafka). SQS is skipped (cloud service). Used for dashboard polling.
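Both modes can also be invoked directly, for example from a custom status endpoint. A sketch, assuming the class resolves from the container and lives under a `Station\Health` namespace (verify the exact namespace and return types in your installed version):

```php
// Namespace is an assumption; adjust to the actual class location.
use Station\Health\HealthChecker;

$checker = app(HealthChecker::class);

// Deep check: actually calls ->size() on each configured driver.
$deep = $checker->checkConnections();

// Fast path: 2-second TCP probe, suitable for frequent polling.
$quick = $checker->checkConnectivityQuick();
```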
Station records metrics at two levels: per-job and aggregate.
Stored on each station_jobs record:
| Metric | Field | Unit |
|---|---|---|
| Processing time | `processing_time` | Milliseconds |
| Memory usage | `memory_used` | Bytes |
| Attempts | `attempts` | Count |
| Wait time | Derived from `created_at` to `started_at` | Milliseconds |
Stored in station_metrics at configurable intervals:
| Metric | Description |
|---|---|
| `jobs_processed` | Total jobs completed in this interval |
| `jobs_failed` | Total jobs failed in this interval |
| `jobs_pending` | Queue depth at time of recording |
| `avg_processing_time` | Mean execution time (ms) |
| `avg_wait_time` | Mean time in queue before processing (ms) |
| `peak_memory` | Highest memory usage seen (bytes) |
| `active_workers` | Number of workers running |
The MetricsCollector computes:
| Metric | Method | Description |
|---|---|---|
| Throughput | `getThroughput(?queue)` | Jobs per minute |
| Failure rate | `getFailureRate(?queue)` | `failed / (processed + failed)` as percentage |
| Average wait time | `getAverageWaitTime(?queue)` | Mean queue wait (ms) |
| Average processing time | `getAverageProcessingTime(?queue)` | Mean execution time (ms) |
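These methods can be called directly, for example to feed a custom status page. A sketch, assuming MetricsCollector resolves from the container (the namespace is a guess; the method names come from the table above, with a null queue meaning all queues):

```php
// Namespace is an assumption; verify against your installed version.
use Station\Metrics\MetricsCollector;

$metrics = app(MetricsCollector::class);

$throughput  = $metrics->getThroughput('default');       // jobs per minute
$failureRate = $metrics->getFailureRate('default');      // percentage
$avgWaitMs   = $metrics->getAverageWaitTime('default');  // milliseconds
$avgProcMs   = $metrics->getAverageProcessingTime();     // null queue = all queues
```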
Available at GET /api/station/metrics/prometheus:
station_jobs_total{queue="default",status="completed"} 15234
station_jobs_total{queue="default",status="failed"} 42
station_queue_size{queue="default"} 150
station_workers_active 8
| Page | Route | Content |
|---|---|---|
| Dashboard | `GET /station` | Overview stats, recent jobs, health indicators |
| Connections | `GET /station/connections` | Queue list with depths, pause/resume controls |
| Jobs | `GET /station/jobs` | Paginated job list with filters (status, queue, class) |
| Job Detail | `GET /station/jobs/{id}` | Full job payload, timeline, attempts, exception |
| Pending | `GET /station/pending` | Pending jobs awaiting processing |
| Failed | `GET /station/failed` | Failed jobs with retry/delete actions |
| Stuck | `GET /station/stuck` | Stuck jobs with recovery actions |
| Completed | `GET /station/completed` | Recently completed jobs |
| Silenced | `GET /station/silenced` | Jobs from silenced job classes |
| Batches | `GET /station/batches` | Batch list with progress bars, status filters |
| Batch Detail | `GET /station/batches/{id}` | Batch jobs, counters, failed job IDs |
| Workflows | `GET /station/workflows` | Running workflow instances |
| Workflow Definitions | `GET /station/workflows/definitions` | Reusable workflow definitions |
| Workflow Detail | `GET /station/workflows/{id}` | Step-by-step progress, context, results |
| Metrics | `GET /station/metrics` | Aggregate metrics with time range selection |
| Queue Metrics | `GET /station/metrics/queues` | Per-queue metric breakdowns |
| Metric Records | `GET /station/metrics/records` | Raw metric data records |
| Alerts | `GET /station/alerts` | Alert history with severity and status |
| Alert Rules | `GET /station/alerts/rules` | Alert rule management |
| Alert Channels | `GET /station/alerts/channels` | Notification channel configuration |
| Tags | `GET /station/tags` | Tag-based job browser |
| Audit Log | `GET /station/audit-log` | Admin action history |
| Settings | `GET /station/settings` | Configuration viewer |
All paths below are relative to the API prefix (default: /api/station, configurable via station.api.prefix). Authenticated via Bearer token. See Security - API Authentication for auth details.
Note: The dashboard uses a separate set of session-authenticated web routes under `/station/api/` (configurable via `station.dashboard.path`). The routes below are the external API for programmatic access.
Jobs:
- `GET /stats` — Overview statistics
- `GET /jobs` — List jobs (filterable by status, queue, class, tags)
- `GET /jobs/{id}` — Job details
- `POST /jobs/{id}/retry` — Retry a job
- `POST /jobs/{id}/cancel` — Cancel a job
- `POST /jobs/bulk/cancel` — Bulk cancel jobs
- `POST /jobs/bulk/retry` — Bulk retry jobs
- `POST /jobs/bulk/delete` — Bulk delete jobs
Failed Jobs:
- `GET /failed` — List failed jobs
- `POST /failed/{id}/retry` — Retry specific failed job
- `DELETE /failed/{id}` — Delete specific failed job
- `POST /failed/retry-all` — Retry all failed jobs
- `DELETE /failed` — Flush all failed jobs
- `POST /failed/bulk/retry` — Bulk retry failed jobs
- `POST /failed/bulk/delete` — Bulk delete failed jobs
Batches:
- `GET /batches` — List batches
- `GET /batches/{id}` — Batch details
- `POST /batches/{id}/cancel` — Cancel batch
- `POST /batches/{id}/retry` — Retry failed batch jobs
- `POST /batches/bulk/cancel` — Bulk cancel batches
- `POST /batches/bulk/retry` — Bulk retry batches
- `POST /batches/bulk/delete` — Bulk delete batches
Workflows:
- `POST /workflows/run` — Run a workflow definition
- `GET /workflows/{name}/{id}/status` — Workflow instance status
- `POST /workflows/{id}/pause` — Pause workflow
- `POST /workflows/{id}/resume` — Resume workflow
- `POST /workflows/{id}/cancel` — Cancel workflow
- `POST /workflows/bulk/pause` — Bulk pause workflows
- `POST /workflows/bulk/resume` — Bulk resume workflows
- `POST /workflows/bulk/cancel` — Bulk cancel workflows
Workers & Supervisor:
- `GET /workers/dashboard-status` — Worker status for dashboard
- `GET /workers/status` — Detailed worker status
- `POST /workers/start` — Start a worker
- `POST /workers/stop` — Stop a worker
- `POST /workers/stop-external` — Stop an external worker
- `GET /supervisor/status` — Supervisor status
- `POST /supervisor/start` — Start supervisor
- `POST /supervisor/stop` — Stop supervisor
Queue Management:
- `GET /queues/connections` — List queue connections
- `POST /queues/pause` — Pause a queue
- `POST /queues/resume` — Resume a queue
- `GET /queues/pause-status` — Get queue pause status
Monitoring & Metrics:
- `GET /health` — Health check
- `GET /metrics` — Metrics summary
- `GET /metrics/time-series` — Time-series metrics data
- `GET /metrics/driver-info` — Per-driver metric info
- `GET /metrics/prometheus` — Prometheus-format metrics export
- `GET /metrics/driver-time-series?connection={name}&period={1h}` — Per-driver time-series metrics (web route only)
- `GET /monitoring` — Monitoring overview
- `GET /drivers` — Driver connectivity and info
Tags:
- `GET /tags` — List all tags
- `POST /jobs/{id}/tags` — Add tag to a job
- `DELETE /jobs/{id}/tags/{tag}` — Remove tag from a job
Stuck Jobs & Recovery:
- `GET /stuck` — List stuck jobs
- `POST /stuck/{id}/recover` — Recover a specific stuck job
- `POST /stuck/bulk/recover` — Bulk recover stuck jobs
- `POST /recover` — Trigger stuck job recovery
Alerts:
- `GET /alerts/channels` — List alert channels
- `POST /alerts/channels` — Create alert channel
- `PUT /alerts/channels/{id}` — Update alert channel
- `DELETE /alerts/channels/{id}` — Delete alert channel
- `POST /alerts/channels/{id}/test` — Test alert channel
- `GET /alerts/rules` — List alert rules
- `GET /alerts/rules/{id}` — Get alert rule details
- `POST /alerts/rules` — Create alert rule
- `PUT /alerts/rules/{id}` — Update alert rule
- `DELETE /alerts/rules/{id}` — Delete alert rule
- `POST /alerts/rules/{id}/toggle` — Toggle alert rule enabled/disabled
- `POST /alerts/rules/{id}/test` — Test alert rule
- `GET /alerts/history` — Alert history
- `POST /alerts/history/{id}/resolve` — Resolve an alert
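From another service, these endpoints can be called with any HTTP client. A sketch using Laravel's Http facade (the host, the job ID, and the token config key are placeholders, not part of Station):

```php
use Illuminate\Support\Facades\Http;

$base  = 'https://queue.example.test/api/station'; // default prefix, configurable
$token = config('services.station.token');          // placeholder config key

// Retry one failed job; throw() raises on 4xx/5xx responses.
Http::withToken($token)->post("{$base}/failed/123/retry")->throw();

// Read the health check as an array.
$health = Http::withToken($token)->get("{$base}/health")->json();
```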
All dashboard pages auto-refresh via Inertia.js partial reloads every 3 seconds. Only the data props are reloaded, not the full page, keeping the experience smooth.
Station creates these tables (all prefixed with station_):
| Table | Purpose |
|---|---|
| `station_jobs` | Every dispatched job, with status and metrics |
| `station_failed_jobs` | Failed jobs with exception and context |
| `station_job_events` | Event timeline per job (queued, processing, completed, failed) |
| `station_batches` | Batch metadata overlay on Laravel's `job_batches` |
| `station_checkpoints` | Checkpoint data for resumable jobs |
| `station_metrics` | Aggregated metrics snapshots per queue |
| `station_supervisors` | Active supervisor processes with heartbeats |
| `station_workers` | Active worker processes |
| `station_audit_log` | Admin actions (retry, delete, pause, etc.) |
| `station_queue_status` | Queue pause/resume state |
| `station_workflows` | Workflow instance state, progress, connection, and definition step snapshots |
| `station_kafka_delayed_jobs` | Kafka-specific delayed job storage |
| `station_driver_snapshots` | Per-driver metric snapshots for connectivity and performance |
| `station_alert_channels` | Alert notification channel configuration |
| `station_alert_rules` | Alert rule definitions and thresholds |
| `station_alert_history` | Triggered alert history with severity and resolution status |
Key indexes are on (queue, status), (status, created_at), (batch_id), and (worker_id) for efficient dashboard queries.
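Those indexes match the dashboard's hottest access patterns, and custom reporting can lean on them the same way. A sketch using the query builder (column names as described in this section):

```php
use Illuminate\Support\Facades\DB;

// Served by the (queue, status) index: current depth of one queue.
$pending = DB::table('station_jobs')
    ->where('queue', 'default')
    ->where('status', 'pending')
    ->count();

// Served by the (status, created_at) index: failures in the last hour.
$recentFailures = DB::table('station_jobs')
    ->where('status', 'failed')
    ->where('created_at', '>=', now()->subHour())
    ->count();
```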
| Column | Type | Description |
|---|---|---|
| `id` | UUID | Workflow instance ID |
| `definition_id` | UUID | ID of the workflow definition |
| `definition_name` | string | Name of the workflow definition |
| `connection` | string | Queue connection for async dispatch (null for sync) |
| `status` | string | pending, running, paused, completed, failed, cancelled |
| `current_step` | string | Name of the currently executing step |
| `input` | JSON | Original input data passed to `run()` or `runAsync()` |
| `context` | JSON | Shared mutable context that accumulates across steps |
| `results` | JSON | Return values from completed steps, keyed by step name |
| `step_statuses` | JSON | Per-step status map (pending, queued, running, completed, failed, skipped) |
| `definition_steps` | JSON | Snapshot of the step graph at execution time |
| `error` | text | Error message (if workflow failed) |
| `progress` | int | Completion percentage derived from completed/skipped steps |
| `started_at` | timestamp | When execution began |
| `completed_at` | timestamp | When execution finished |
| `created_at` | timestamp | When the instance was created |
| `updated_at` | timestamp | Last state change |
Station provides four middleware classes that can be attached to jobs via the middleware() method:
Limits how frequently a job class can be processed. Uses Laravel's RateLimiter under the hood:
use Illuminate\Contracts\Queue\ShouldQueue;
use Station\Middleware\RateLimited;
class MyJob implements ShouldQueue
{
public function middleware(): array
{
return [new RateLimited(
key: 'my-job',
maxAttempts: 10, // Max executions...
decaySeconds: 60, // ...per this window
releaseOnLimit: true, // Release back to queue when limited
releaseDelay: 60, // Seconds before retry
)];
}
// Optional: customize the rate limiter key per-job instance
public function rateLimiterKey(): string
{
return $this->userId;
}
}
Sets an execution time limit using `pcntl_alarm`. Throws `JobTimeoutException` when exceeded:
use Station\Middleware\Timeout;
public function middleware(): array
{
return [new Timeout(
seconds: 30,
releaseOnTimeout: false, // If true, release back to queue instead of failing
releaseDelay: 0,
)];
)];
Requires `ext-pcntl`. Without it, the middleware is a no-op.
Ensures only one instance of a job exists in the queue at a time, using cache-based locks:
use Illuminate\Contracts\Queue\ShouldQueue;
use Station\Middleware\Unique;
class MyJob implements ShouldQueue
{
public function middleware(): array
{
return [new Unique(lockFor: 3600)]; // Lock for 1 hour
}
public function uniqueId(): string
{
return $this->userId; // Required: defines uniqueness
}
}
The lock is held for `lockFor` seconds. If another job with the same `uniqueId()` is already locked, this job is skipped.
Prevents concurrent execution of jobs that share a key. Unlike Unique (which prevents duplicate queueing), this prevents simultaneous processing:
use Station\Middleware\WithoutOverlapping;
public function middleware(): array
{
return [new WithoutOverlapping(
key: 'process-order-' . $this->orderId,
expiresAfter: 300, // Lock expires after 5 minutes (safety net)
releaseOnOverlap: true, // Release back to queue if another instance is running
releaseDelay: 5, // Seconds before retry
)];
}
The lock is released when the job finishes. `expiresAfter` is a safety net for crashed workers.
Station provides a WithStation trait and StationFake for testing job dispatch without hitting real queues.
use Station\Testing\WithStation;
class MyJobTest extends TestCase
{
use WithStation;
public function testJobIsDispatched(): void
{
$this->fakeStation();
MyJob::dispatch($data);
$this->assertStationDispatched(MyJob::class);
}
}
`fakeStation()` also calls `Queue::fake()` and `Bus::fake()` internally.
| Method | Description |
|---|---|
| `assertStationDispatched(string $job, ?callable $callback)` | Assert a job class was dispatched, optionally filtering with a callback |
| `assertStationNotDispatched(string $job, ?callable $callback)` | Assert a job was not dispatched |
| `assertNothingDispatched()` | Assert no jobs were dispatched at all |
| `assertStationDispatchedTimes(string $job, int $times)` | Assert exact dispatch count |
| `assertBatchDispatched(?callable $callback)` | Assert a batch was dispatched |
| `assertChainDispatched(array $expectedChain)` | Assert a chain with specific job classes |
| Method | Description |
|---|---|
| `getDispatchedJobs(?string $type)` | Get all dispatched job objects, optionally filtered by class |
| `processStationJobs()` | Process all pending jobs synchronously (calls `handle()`) |
Station includes an OpenTelemetry-compatible telemetry system for distributed tracing and metrics.
'telemetry' => [
'enabled' => env('STATION_TELEMETRY_ENABLED', false),
'service_name' => env('STATION_TELEMETRY_SERVICE', 'station'),
'tracing' => [
'enabled' => true,
'driver' => env('STATION_TRACING_DRIVER', 'internal'), // 'internal', 'opentelemetry'
'sample_rate' => (float) env('STATION_TRACING_SAMPLE_RATE', 1.0),
'propagation' => 'w3c', // 'w3c', 'b3', 'jaeger'
],
'metrics' => [
'enabled' => true,
'driver' => env('STATION_METRICS_DRIVER', 'internal'), // 'internal', 'opentelemetry'
'export_interval' => 60,
'prometheus_endpoint' => env('STATION_PROMETHEUS_ENDPOINT', '/station/metrics'),
],
'exporters' => [
'otlp' => ['endpoint' => env('OTEL_EXPORTER_OTLP_ENDPOINT')],
'jaeger' => ['endpoint' => env('JAEGER_ENDPOINT')],
'zipkin' => ['endpoint' => env('ZIPKIN_ENDPOINT')],
],
],
When `ext-opentelemetry` is installed, Station uses `OpenTelemetryTracer` and `OpenTelemetryMeter`. Otherwise it falls back to `InternalTracer` and `InternalMeter`.
When enabled, Station automatically creates spans and records metrics for:
- Job spans — Started on `JobStarted`, ended on `JobCompleted`/`JobFailed` with status and exception attributes
- Workflow spans — Started on `WorkflowStarted`, ended on `WorkflowCompleted`/`WorkflowFailed`
Metrics counters automatically incremented:
- `station.jobs.started` (labels: queue, job_class)
- `station.jobs.completed` (labels: queue, job_class)
- `station.jobs.failed` (labels: queue, job_class)
- `station.workflows.started` (labels: workflow)
- `station.workflows.completed` (labels: workflow)
- `station.workflows.failed` (labels: workflow)
$telemetry = app(TelemetryManager::class);
// Custom span
$span = $telemetry->startSpan('my-operation', ['key' => 'value']);
// ... do work ...
$span->end();
// Record metrics
$telemetry->incrementCounter('my.counter', ['label' => 'value']);
$telemetry->recordHistogram('my.duration', $durationMs, ['queue' => 'default']);
$telemetry->recordMetric('my.gauge', $currentValue);
Active spans are pruned periodically (every 100 job starts). Ended spans older than 10 minutes and in-flight spans older than 1 hour are removed. A hard cap of 1,000 active spans prevents unbounded growth.
Station supports automatic worker scaling based on queue metrics.
| Strategy | Value | Description |
|---|---|---|
| Queue Size | `queue_size` | Scale based on pending job count (jobs per worker) |
| Throughput | `throughput` | Scale based on target throughput (jobs per minute) |
| Wait Time | `wait_time` | Scale based on average job wait time |
| Combined | `combined` | Weighted combination of queue_size, throughput, and wait_time |
use Station\Scaling\ScalingPolicyBuilder;
$policy = (new ScalingPolicyBuilder('my-policy'))
->strategy('queue_size')
->workers(min: 1, max: 10)
->cooldown(60) // Seconds between scaling actions
->scaleUpWhenQueueExceeds(1000, increment: 2)
->scaleDownWhenQueueBelow(100, decrement: 1)
->scaleUpWhenWaitTimeExceeds(30.0, increment: 1)
->scaleDownWhenWaitTimeBelow(5.0, decrement: 1)
->build();
$policy = (new ScalingPolicyBuilder('time-based'))
->peakHours([9, 10, 11, 12, 13, 14, 15, 16, 17], workers: 8)
->offPeakHours([0, 1, 2, 3, 4, 5, 6, 22, 23], workers: 2)
->weekdays(workers: 6)
->weekends(workers: 2)
->build();
'scaling' => [
'enabled' => env('STATION_SCALING_ENABLED', false),
'policies' => [
'default' => [
'min_workers' => 1,
'max_workers' => 10,
'cooldown' => 60,
'scale_up_threshold' => 0.8,
'scale_down_threshold' => 0.2,
],
],
'strategies' => [
'queue_size' => ['enabled' => true, 'high_watermark' => 1000, 'low_watermark' => 100],
'throughput' => ['enabled' => false, 'target_jobs_per_minute' => 100],
'wait_time' => ['enabled' => false, 'max_wait_seconds' => 30],
],
'metrics_window' => 300,
],
Station fires events for worker and supervisor lifecycle changes, plus scaling actions.
| Event | Properties | When |
|---|---|---|
| `WorkerStarted` | worker, queues, options | Worker process begins |
| `WorkerStopped` | worker, reason, jobsProcessed | Worker process ends |
| `WorkerLoopIteration` | workerId | Each worker loop tick |
| `SupervisorStarted` | supervisorId, name, queues, options | Supervisor starts |
| `SupervisorStopped` | supervisorId, name, reason, jobsProcessed | Supervisor stops |
| `WorkersScaledUp` | queue, previousCount, newCount | Auto-scaler adds workers |
| `WorkersScaledDown` | queue, previousCount, newCount | Auto-scaler removes workers |
| `AlertTriggered` | alert, rule, channel, severity | Alert rule triggers a notification |
WorkersScaledUp provides getAddedCount() and WorkersScaledDown provides getRemovedCount() convenience methods.
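These can be subscribed to like any Laravel event. A sketch for the scaling events (the `Station\Events` namespace is an assumption; the properties and convenience methods are documented above):

```php
// e.g. in a service provider's boot() method.
use Illuminate\Support\Facades\Event;
use Station\Events\WorkersScaledUp;   // namespace assumed
use Station\Events\WorkersScaledDown; // namespace assumed

Event::listen(WorkersScaledUp::class, function (WorkersScaledUp $event) {
    logger()->info("{$event->queue}: +{$event->getAddedCount()} workers");
});

Event::listen(WorkersScaledDown::class, function (WorkersScaledDown $event) {
    logger()->info("{$event->queue}: -{$event->getRemovedCount()} workers");
});
```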
| Feature | Horizon | Station |
|---|---|---|
| Queue Drivers | Redis only | RabbitMQ, Redis, SQS, Beanstalkd, Kafka |
| Message Durability | In-memory (Redis) | Disk-persisted (RabbitMQ) |
| Dead Letter Queues | No | Yes (automatic, per-queue) |
| Job Checkpointing | No | Yes (resume from saved progress) |
| Stuck Job Detection | Basic (timeout only) | Advanced (weighted scoring: heartbeat, runtime, memory, process state) |
| Stuck Job Recovery | Manual retry only | Automatic recovery with strategies (graceful, restart, checkpoint) |
| Batch Progress | Basic (via `job_batches`) | Enhanced (explicit status, atomic counters, integer failure thresholds) |
| Failure Thresholds | Boolean (`allowFailures()`) | Integer (`allowFailures(5)` — allow up to N) |
| Workflows | No | Yes (DAG with conditions, branching, pause/resume) |
| Process Supervision | Yes | Yes (with `pcntl_fork` and automatic pool maintenance) |
| Dashboard | React SPA | Inertia.js + Vue 3 SPA |
| Per-Job Metrics | Limited (tags, runtime) | Detailed (processing time, memory, wait time, attempts) |
| Aggregate Metrics | Throughput, wait times | Throughput, wait times, failure rate, memory, worker count |
| Prometheus Export | No (third-party) | Built-in (`/metrics/prometheus`) |
| Health Checks | No | Yes (connections, database, disk, workers) |
| Alerting | No (third-party) | Built-in (Slack, Discord, Teams, webhooks, log) |
| Job Encryption | No | Yes (`ShouldBeEncrypted` interface) |
| Payload Masking | No | Yes (`ShouldMaskPayload` for dashboard) |
| Audit Logging | No | Yes (tracks retry, delete, pause actions) |
| API | Limited internal API | Full REST API with Bearer auth |
| RBAC | Gate-based authorization | Gate-based authorization |
1. Job Recovery and Checkpointing
Horizon has no concept of saving job progress. If a worker dies mid-execution, the job restarts from scratch. Station's checkpointing lets long-running jobs save state and resume:
// With Station: 2-hour import job saves progress every 100 records
// If it fails at record 500,000 — resumes from 500,001
$checkpoint->save(['last_id' => $user->id, 'processed' => $count]);
2. Multi-Driver Support
Horizon is locked to Redis. Station supports 5 drivers, allowing you to choose based on requirements. See the driver feature matrix for a detailed comparison.
3. Workflow Orchestration
Horizon has no workflow support. Station provides both quick DAG-based workflows and a full persistent engine with conditional steps, branching, and pause/resume.
4. Dead Letter Queues
When a job exhausts all retries, Horizon stores it in the failed_jobs table. Station also routes it to a dead letter queue in the message broker, providing broker-level inspection and requeuing.
5. Integer Failure Thresholds for Batches
Horizon (Laravel's default) only supports allowFailures() as a boolean. Station extends this to an integer: "allow up to 5 failures" gives you nuanced control over batch completion criteria.
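At the dispatch site, the integer threshold reads naturally. A sketch, assuming Station's batch builder mirrors `Bus::batch()` as described earlier (`Station::batch()` and `ProcessOrder` are placeholders, so verify the facade method against your version):

```php
use Station\Facades\Station; // facade namespace assumed

Station::batch([
    new ProcessOrder(1),
    new ProcessOrder(2),
    new ProcessOrder(3),
])
    ->allowFailures(5) // tolerate up to 5 failed jobs before the batch fails
    ->dispatch();
```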
6. Built-in Observability
Horizon requires third-party integrations for Prometheus metrics, health checks, and alerting. Station includes all of these natively. See Security, Alerting & Resilience for details.
1. Redis-Native Auto-Balancing
Horizon's auto-balancer dynamically adjusts worker counts per queue using Redis Lua scripts for atomic queue operations. Station's AutoScaler integrates with WorkerSupervisor to scale workers based on queue metrics (queue size, throughput, wait time, or a weighted combination), but uses database queries rather than Redis Lua scripts for metrics collection.
2. Redis Lua Script Optimizations
Horizon uses Redis Lua scripts for atomic queue operations (pop, push, metrics recording) in a single round-trip. Station uses standard database queries, which have slightly higher overhead.
3. Mature Ecosystem
Horizon is a first-party Laravel package maintained by the Laravel team with extensive community support, tutorials, and third-party integrations. Station is newer and has a smaller ecosystem.
| Aspect | Horizon | Station |
|---|---|---|
| Data storage | Redis (metrics, state) | Database (jobs, metrics, state) |
| Worker model | PHP processes managed by Supervisor daemon | `pcntl_fork()` children managed by parent process |
| Job tracking | Stores in Redis with TTL | Stores in `station_jobs` table permanently (prunable) |
| Batch tracking | Uses Laravel's `job_batches` only | Overlay table (`station_batches`) with atomic counters |
| Configuration | `config/horizon.php` with environment-based supervisors | `config/station.php` with driver-specific configs |
| Event integration | Fires and listens to Laravel queue events | Same — transparently hooks into the same events |
| Dashboard framework | React + Tailwind | Vue 3 + Inertia.js + Tailwind |