Laravel Horizon Is Not Just a Pretty Dashboard — Here’s What It’s Actually Telling You

Throughput, wait time, runtime, failed job rates, queue balancing strategies, supervisor configuration, and the metrics that tell you your queue is about to fall over before it does — the Horizon guide that goes past “run php artisan horizon” and actually teaches you what to watch.


Most Laravel developers install Horizon, open the dashboard, confirm that jobs are running, and close the tab. The dashboard shows green numbers. The queues are processing. That’s enough — until it isn’t. Wait times climb silently. A burst of traffic floods one queue while workers sit idle on another. A job that normally completes in 400ms starts taking 8 seconds. Nobody notices until emails stop delivering or a user complains that their export is three hours late.

Horizon’s dashboard isn’t decorative. Every metric it exposes is a signal. Understanding what those signals mean — what normal looks like, what warning looks like, and what “this is about to break” looks like — is the difference between reactive and proactive queue management. This post covers the metrics that matter, the configuration that controls them, and the patterns that keep production queues healthy under load.


The Snapshot Command You’re Probably Missing

For the metrics graphs to show anything, this has to be in your scheduler, and the scheduler itself has to be running (a cron entry calling php artisan schedule:run every minute):

// routes/console.php (Laravel 11+)
use Illuminate\Support\Facades\Schedule;

Schedule::command('horizon:snapshot')->everyFiveMinutes();

// app/Console/Kernel.php (Laravel 10 and below; Schedule is Illuminate\Console\Scheduling\Schedule)
protected function schedule(Schedule $schedule): void
{
    $schedule->command('horizon:snapshot')->everyFiveMinutes();
}

horizon:snapshot is what populates the throughput and runtime graphs in the metrics tab. Without it, those charts stay empty. The queues still process; you just have no record of how they performed. This is the most common reason developers open the Horizon dashboard and see blank charts.

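Five minutes is the interval Horizon’s documentation recommends, and it suits most applications. Shorter intervals increase Redis write frequency and, because Horizon keeps only a fixed number of snapshots (metrics.trim_snapshots in config/horizon.php, 24 by default), shrink how much history the graphs cover. Longer intervals make it harder to correlate a spike in throughput or runtime with the event that caused it.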


What Throughput Actually Means

Throughput is the number of jobs processed per minute, averaged across the snapshot interval. It appears per-queue and across the application as a whole.

What throughput tells you:
→ High throughput + low wait time   = healthy, processing faster than jobs arrive
→ Low throughput + high wait time   = backlog building, workers can't keep up
→ Throughput drops suddenly         = workers died, Horizon restarted, Redis issue
→ Throughput spikes then drops      = burst processed, queue draining normally

Throughput in isolation is not a health metric. A queue with 10 jobs/minute throughput is healthy if 8 jobs/minute arrive. It’s critical if 200 jobs/minute arrive. Throughput is meaningful in relation to arrival rate — and Horizon doesn’t show arrival rate directly, which is why wait time is the more actionable metric.


Wait Time: The Number That Actually Matters

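Wait time is how long a job sits on the queue before a worker picks it up. This is the metric your users feel. The email didn’t arrive. The export isn’t ready. The webhook didn’t fire. Every user-visible delay in your application’s async features maps to queue wait time. Horizon doesn’t time each pickup, though: the wait it displays is an estimate of how long the current backlog will take to clear, roughly pending jobs × average runtime ÷ the worker processes serving the queue. It is only as accurate as the runtime figures behind it.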

Wait time thresholds (rough guide — calibrate to your application):

< 1 second     → Healthy. Workers are ahead of the backlog.
1–5 seconds    → Normal under moderate load. Worth watching.
5–30 seconds   → Workers are struggling to keep up. Investigate.
30+ seconds    → Backlog building. Add workers or reduce job dispatch rate.
Minutes        → Queue overwhelmed. Immediate action required.
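As a sketch, the estimate and the bands above can be expressed as two small functions. The formula (pending jobs × average runtime ÷ processes) is modelled on Horizon’s WaitTimeCalculator in recent releases; treat it as an approximation and check your version. The function names are hypothetical, and the 60-second boundary for “minutes” is an assumption.

```php
<?php
declare(strict_types=1);

// Horizon-style wait estimate: pending jobs x average runtime, spread across
// the worker processes serving the queue. An approximation, not Horizon's code.
function estimateWaitSeconds(int $pendingJobs, float $avgRuntimeMs, int $processes): float
{
    $totalMs = $pendingJobs * $avgRuntimeMs;

    // A queue with no workers reports the full time to clear.
    return $totalMs / max($processes, 1) / 1000;
}

// Map a wait time onto the rough bands above. The 60-second cut-off for
// "minutes" is an assumption; calibrate every threshold to your application.
function classifyWait(float $seconds): string
{
    return match (true) {
        $seconds < 1  => 'healthy',
        $seconds < 5  => 'watch',
        $seconds < 30 => 'investigate',
        $seconds < 60 => 'backlog',
        default       => 'overwhelmed',
    };
}

$wait = estimateWaitSeconds(pendingJobs: 300, avgRuntimeMs: 400.0, processes: 10);
echo $wait, ' s => ', classifyWait($wait), PHP_EOL; // 12 s => investigate
```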

Wait time rising is almost always caused by one of three things: dispatch rate exceeding processing capacity, a slow job blocking workers, or workers dying without being replaced. Horizon’s dashboard shows wait time per queue, which is how you distinguish between “everything is slow” and “this specific queue has a problem.”


Runtime: Finding the Jobs That Are Eating Your Workers

Runtime is the average execution time of completed jobs on a queue. It appears in the metrics tab alongside throughput.

A queue with 10 workers and an average job runtime of 5 seconds can process about 2 jobs per second, or 120 per minute. If 200 jobs per minute arrive, the backlog grows by 80 jobs every minute. The minimum worker count is arrival rate × average runtime, in consistent units: 200 jobs per minute is about 3.33 jobs per second, times 5 seconds, so at least 17 workers. Understanding this relationship tells you whether adding workers will solve a wait time problem or whether the jobs themselves need to be faster.
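The arithmetic above, sketched as two hypothetical helpers: one for pool capacity, one for the minimum worker count. Plain PHP, no Horizon APIs involved.

```php
<?php
declare(strict_types=1);

// Capacity of a pool: each worker finishes 60 / runtime jobs per minute.
function capacityPerMinute(int $workers, float $avgRuntimeSeconds): float
{
    return $workers * 60 / $avgRuntimeSeconds;
}

// Minimum workers for capacity to keep pace with arrivals:
// arrival rate (jobs per second) x average runtime (seconds), rounded up.
function minimumWorkers(float $arrivalsPerMinute, float $avgRuntimeSeconds): int
{
    return (int) ceil($arrivalsPerMinute / 60 * $avgRuntimeSeconds);
}

echo capacityPerMinute(10, 5.0), PHP_EOL; // 120
echo minimumWorkers(200, 5.0), PHP_EOL;   // 17
```

Treat the result as a floor. A pool running at 100% utilisation builds a backlog on any burst, so leave headroom under maxProcesses.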

What rising runtime tells you:
→ External API the job calls is slowing down
→ Database query the job runs isn't scaling with data volume
→ Memory pressure causing swap usage, slowing execution
→ Job is doing more work than it used to (code change introduced)
→ N+1 inside a job that wasn't visible at low data volumes

When runtime climbs on a specific queue, open a recent job in the Horizon dashboard. The job views show its runtime, the queue it ran on, the number of attempts, and the full payload. That’s enough to tell whether the problem is data-volume-related (look at payload size or related record count). Horizon records only wall-clock runtime, though, so separating CPU-bound work from time spent waiting on a database, API, or lock means instrumenting the job itself.
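To tell CPU-bound work from time spent waiting, here is a minimal sketch using PHP’s built-in hrtime() and getrusage(). In a real job you would wrap a section of handle() with profile() and log the result; profile() and cpuSeconds() are hypothetical names.

```php
<?php
declare(strict_types=1);

// Total user + system CPU time consumed by this process, in seconds.
function cpuSeconds(): float
{
    $u = getrusage();
    return $u['ru_utime.tv_sec'] + $u['ru_utime.tv_usec'] / 1e6
         + $u['ru_stime.tv_sec'] + $u['ru_stime.tv_usec'] / 1e6;
}

// Run $work and report wall-clock vs CPU time. A low CPU share means the time
// went to waiting (network, database, locks), not to PHP doing work.
function profile(callable $work): array
{
    $wallStart = hrtime(true);
    $cpuStart  = cpuSeconds();

    $work();

    $wall = (hrtime(true) - $wallStart) / 1e9;
    $cpu  = cpuSeconds() - $cpuStart;

    return ['wall' => $wall, 'cpu' => $cpu, 'cpu_share' => $wall > 0 ? $cpu / $wall : 0.0];
}

// Simulated I/O-bound section: sleeping burns wall time but almost no CPU.
$result = profile(fn () => usleep(200_000));
printf("wall %.2fs, cpu %.2fs, cpu share %.0f%%\n", $result['wall'], $result['cpu'], $result['cpu_share'] * 100);
```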


The Three Balancing Strategies — and When to Use Each

Horizon allows you to choose from three worker balancing strategies: auto, simple, and false. The choice between them drives almost everything about how your workers respond to load.

balance: 'auto' — Dynamic allocation

The auto strategy adjusts the number of worker processes per queue based on the current workload of the queue. For example, if your notifications queue has 1,000 pending jobs while your default queue is empty, Horizon will allocate more workers to your notifications queue until the queue is empty.

// config/horizon.php
'environments' => [
    'production' => [
        'supervisor-1' => [
            'connection'       => 'redis',
            'queue'            => ['high', 'default', 'low'],
            'balance'          => 'auto',
            'minProcesses'     => 1,    // minimum per queue
            'maxProcesses'     => 20,   // total ceiling across all queues
            'balanceMaxShift'  => 1,    // max workers to add/remove per rebalance cycle
            'balanceCooldown'  => 3,    // seconds between rebalance decisions
            'tries'            => 3,
            'timeout'          => 90,
        ],
    ],
],

balanceMaxShift and balanceCooldown control how aggressively Horizon rebalances. A balanceMaxShift of 1 means it adds or removes at most one worker per rebalance cycle: conservative, and it avoids thrashing. A cooldown of 3 seconds means Horizon waits 3 seconds between scaling decisions, so this configuration changes the pool by at most one process every 3 seconds.
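To make the interaction of minProcesses, maxProcesses, and balanceMaxShift concrete, here is a deliberately simplified model of a rebalance cycle. It is not Horizon’s AutoScaler, which by default weighs queues by estimated time to clear rather than raw job counts; rebalanceStep() is a hypothetical helper.

```php
<?php
declare(strict_types=1);

// Simplified model of one auto-balance cycle (NOT Horizon's AutoScaler).
// Spare capacity above the per-queue minimum is shared by pending job count,
// and each queue moves toward its target by at most $maxShift processes.
function rebalanceStep(array $pending, array $current, int $min, int $max, int $maxShift): array
{
    $totalPending = array_sum($pending);
    $spare = $max - $min * count($pending); // workers beyond the per-queue minimum
    $next = [];

    foreach ($pending as $queue => $jobs) {
        $share = $totalPending > 0 ? $jobs / $totalPending : 0;
        $desired = $min + (int) floor($spare * $share);

        $have = $current[$queue] ?? $min;
        $delta = max(-$maxShift, min($maxShift, $desired - $have));
        $next[$queue] = $have + $delta;
    }

    return $next;
}

$pending = ['notifications' => 1000, 'default' => 0];
$workers = ['notifications' => 1, 'default' => 1];

// Each iteration stands in for one balanceCooldown period.
for ($cycle = 1; $cycle <= 3; $cycle++) {
    $workers = rebalanceStep($pending, $workers, min: 1, max: 20, maxShift: 1);
    echo "cycle {$cycle}: ", json_encode($workers), PHP_EOL;
}
```

With balanceMaxShift at 1, the notifications pool climbs one process per cycle toward its target of 19 instead of jumping there at once.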

minProcesses defines the minimum number of worker processes per queue — this value must be greater than or equal to 1. maxProcesses defines the maximum total number of worker processes Horizon may scale up to across all queues — this value should typically be greater than the number of queues multiplied by the minProcesses value.

The critical constraint: when using the auto balancing strategy, Horizon will consider in-progress workers as “hanging” and force-kill them after the Horizon timeout during scale down. Always ensure the Horizon timeout is greater than any job-level timeout, otherwise jobs may be terminated mid-execution.
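The timeout ordering can be checked mechanically. This hypothetical helper flags job-level $timeout values at or above the supervisor timeout, and a supervisor timeout that isn’t a few seconds below the connection’s retry_after in config/queue.php (Horizon’s documentation asks for that gap as well; the 5-second margin is an assumption). Laravel’s stock config/queue.php sets retry_after to 90 for the redis connection, the same as the 90-second timeout in the config above, so the example flags it.

```php
<?php
declare(strict_types=1);

// Flag timeout settings that break the ordering:
// job $timeout < supervisor timeout < retry_after (with a safety margin).
function checkTimeouts(int $supervisorTimeout, int $retryAfter, array $jobTimeouts, int $margin = 5): array
{
    $problems = [];

    foreach ($jobTimeouts as $job => $timeout) {
        if ($timeout >= $supervisorTimeout) {
            $problems[] = "{$job}: \$timeout {$timeout}s is not below the supervisor timeout {$supervisorTimeout}s";
        }
    }

    if ($supervisorTimeout + $margin > $retryAfter) {
        $problems[] = "retry_after {$retryAfter}s should exceed the supervisor timeout {$supervisorTimeout}s by at least {$margin}s";
    }

    return $problems;
}

// Hypothetical job names and values, for illustration only.
$problems = checkTimeouts(
    supervisorTimeout: 90,
    retryAfter: 90,
    jobTimeouts: ['SendInvoiceJob' => 60, 'BuildExportJob' => 120],
);

foreach ($problems as $problem) {
    echo $problem, PHP_EOL;
}
```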

Use auto when: your queues have uneven, unpredictable load. Most production applications.

balance: 'simple' — Even distribution

The simple strategy splits workers evenly across the queues a supervisor handles. If a supervisor manages three queues with 9 workers, each queue gets 3 workers — regardless of how many jobs are waiting on each.

'supervisor-1' => [
    'queue'    => ['default', 'notifications', 'exports'],
    'balance'  => 'simple',
    'processes' => 9,  // 3 per queue, static
    'tries'    => 3,
    'timeout'  => 60,
],

Use simple when: your queues have consistent, predictable load and you want deterministic resource allocation. Useful for queues that have SLA-like requirements — you always want exactly N workers on the payments queue regardless of what other queues are doing.

balance: false — Priority ordering

When balance is false, Horizon uses default Laravel queue behaviour: queues are processed strictly in the order listed. Workers only move to the next queue when the first is empty.

'supervisor-1' => [
    'queue'   => ['critical', 'high', 'default', 'low'], // strict priority order
    'balance' => false,
    'processes' => 5,
    'tries'   => 3,
],

This means a flooded critical queue will hold all workers until it drains — low jobs wait indefinitely during high load. Use this only when absolute priority is a hard requirement and starvation of lower queues is acceptable.
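A toy model of strict priority ordering: every pick takes from the first non-empty queue in the listed order, so low only gets a worker once everything above it is empty. nextQueue() is hypothetical and single-threaded; it mirrors the behaviour described, not Laravel’s worker code.

```php
<?php
declare(strict_types=1);

// With balance => false, each time a worker looks for work it checks the
// queues in the listed order and takes from the first non-empty one.
function nextQueue(array $priorityOrder, array $depths): ?string
{
    foreach ($priorityOrder as $queue) {
        if (($depths[$queue] ?? 0) > 0) {
            return $queue;
        }
    }
    return null; // everything is empty: the worker sleeps
}

$order  = ['critical', 'high', 'default', 'low'];
$depths = ['critical' => 3, 'high' => 0, 'default' => 2, 'low' => 50];

// Drain one job at a time and record which queue each job came from.
// If critical kept receiving jobs, low would never be reached.
$taken = [];
while (($queue = nextQueue($order, $depths)) !== null) {
    $depths[$queue]--;
    $taken[] = $queue;
}

echo implode(' ', array_slice($taken, 0, 6)), PHP_EOL; // critical critical critical default default low
```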


Multiple Supervisors: The Right Way to Separate Concerns

A single supervisor handling all queues is the configuration that looks fine until you need to tune anything. The better pattern: separate supervisors for separate job profiles.

// config/horizon.php
'environments' => [
    'production' => [

        // Critical path — small, fast jobs, generous worker pool, simple balance
        // Payments, auth tokens, webhooks that must fire within seconds
        'supervisor-critical' => [
            'connection'  => 'redis',
            'queue'       => ['critical', 'payments'],
            'balance'     => 'simple',
            'processes'   => 10,
            'tries'       => 5,
            'timeout'     => 30,   // these jobs should be fast
            'memory'      => 128,
        ],

        // General async — emails, notifications, cache warming, event listeners
        // Auto-balance, moderate pool, standard timeout
        'supervisor-default' => [
            'connection'    => 'redis',
            'queue'         => ['default', 'notifications'],
            'balance'       => 'auto',
            'minProcesses'  => 2,
            'maxProcesses'  => 15,
            'tries'         => 3,
            'timeout'       => 90,
            'memory'        => 256,
        ],

        // Long-running — exports, report generation, AI processing, video encoding
        // Few workers, long timeout, high memory, isolated so they can't starve other queues
        'supervisor-heavy' => [
            'connection'    => 'redis',
            'queue'         => ['exports', 'processing'],
            'balance'       => 'auto',
            'minProcesses'  => 1,
            'maxProcesses'  => 5,
            'tries'         => 2,
            'timeout'       => 600,  // 10 minutes
            'memory'        => 1024,
        ],

    ],
],

The isolation between supervisors is what makes this valuable. A flood of export jobs can’t steal workers from payment processing. A slow external API call in the processing queue doesn’t affect notification delivery. Each supervisor has its own worker pool, its own timeout, its own retry budget.

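The timeout and memory values deserve attention. timeout at the supervisor level is the default limit for jobs on that supervisor: a job that runs longer, and doesn’t declare its own $timeout, is killed. memory is not a per-job cap: a worker whose memory use has grown past it exits after finishing its current job, and a fresh process replaces it, which keeps slow leaks in check. For retries, if you don’t set the tries option, Horizon defaults to a single attempt, unless the job class defines $tries, which takes precedence over the Horizon configuration. Setting tries or $tries to 0 allows unlimited attempts, which is useful when the number of attempts a job will need is uncertain. To prevent endless failures in that case, limit the number of unhandled exceptions by setting the $maxExceptions property on the job class.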
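A simplified model of how tries and $maxExceptions interact, assuming Laravel’s rule that a job which calls release() uses up an attempt without counting as an exception. simulateAttempts() is a hypothetical helper; the real worker marks a job as failed at the end of its last permitted attempt rather than at the start of the next one.

```php
<?php
declare(strict_types=1);

// $tries caps total attempts (null or 0 = unlimited); $maxExceptions caps
// unhandled exceptions. 'released' uses an attempt but is not an exception.
function simulateAttempts(array $outcomes, ?int $tries, ?int $maxExceptions): string
{
    $limit = ($tries === null || $tries === 0) ? PHP_INT_MAX : $tries;
    $exceptions = 0;

    foreach ($outcomes as $index => $outcome) {
        $attempt = $index + 1;
        if ($attempt > $limit) {
            return "failed: exceeded {$tries} attempts";
        }
        if ($outcome === 'success') {
            return "succeeded on attempt {$attempt}";
        }
        if ($outcome === 'exception') {
            $exceptions++;
            if ($maxExceptions !== null && $exceptions >= $maxExceptions) {
                return "failed: {$exceptions} unhandled exceptions";
            }
        }
        // Otherwise the job goes back on the queue for another attempt.
    }

    return 'still queued';
}

// Unlimited tries while the job keeps releasing itself, but three real exceptions end it.
echo simulateAttempts(['released', 'released', 'exception', 'released', 'exception', 'exception'], tries: 0, maxExceptions: 3), PHP_EOL;

echo simulateAttempts(['exception', 'exception', 'exception'], tries: 2, maxExceptions: null), PHP_EOL;
```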


The defaults Block: DRY Supervisor Configuration

Most supervisor options are shared across environments. The defaults block defines them once:

// config/horizon.php
'defaults' => [
    'supervisor-critical' => [
        'connection'  => 'redis',
        'queue'       => ['critical', 'payments'],
        'balance'     => 'simple',
        'processes'   => 10,
        'tries'       => 5,
        'timeout'     => 30,
        'memory'      => 128,
    ],
    'supervisor-default' => [
        'connection'    => 'redis',
        'queue'         => ['default', 'notifications'],
        'balance'       => 'auto',
        'minProcesses'  => 2,
        'maxProcesses'  => 15,
        'tries'         => 3,
        'timeout'       => 90,
        'memory'        => 256,
    ],
],

'environments' => [
    'production' => [
        // A supervisor runs only if it is named here; an empty array inherits every default
        'supervisor-critical' => [],
        'supervisor-default' => [
            'maxProcesses' => 30,  // more workers in production
        ],
    ],
    'staging' => [
        // Inherits defaults, with a smaller pool in staging
        'supervisor-critical' => [],
        'supervisor-default' => [
            'maxProcesses' => 5,
        ],
    ],
    'local' => [
        'supervisor-default' => [
            'maxProcesses' => 3,
        ],
    ],
],

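The environment-specific block is merged over the defaults, so you only declare what changes. A supervisor runs only in the environments that name it, though; listing it with an empty array is enough to inherit everything from defaults. All supervisor configuration stays in config/horizon.php, version-controlled and reviewable in pull requests.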
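A plain-PHP model of that merge, assuming Horizon combines the two blocks the way its ProvisioningPlan class does in recent releases (array_replace_recursive() over array_intersect_key(); check your version). resolveSupervisors() is a hypothetical name. It shows both the inheritance and a gotcha with list values such as queue.

```php
<?php
declare(strict_types=1);

// Assumed model of Horizon's merge: keep only supervisors the environment
// names, then let environment values replace defaults key by key, recursively.
function resolveSupervisors(array $defaults, array $environment): array
{
    return array_replace_recursive(
        array_intersect_key($defaults, $environment), // drop supervisors the environment doesn't list
        $environment                                  // environment values win
    );
}

$defaults = [
    'supervisor-critical' => ['queue' => ['critical', 'payments'], 'processes' => 10],
    'supervisor-default'  => ['queue' => ['default', 'notifications'], 'maxProcesses' => 15],
];

$production = ['supervisor-default' => ['maxProcesses' => 30]];
$resolved = resolveSupervisors($defaults, $production);
echo json_encode($resolved), PHP_EOL;
// {"supervisor-default":{"queue":["default","notifications"],"maxProcesses":30}}
// supervisor-critical is gone: production never named it.

// Gotcha: list values merge index by index instead of being swapped out whole.
$override = resolveSupervisors($defaults, ['supervisor-default' => ['queue' => ['emails']]]);
echo json_encode($override['supervisor-default']['queue']), PHP_EOL; // ["emails","notifications"]
```

If an environment needs a different queue list, declare the full list and check the resolved result, rather than assuming a shorter list replaces the default outright.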


Tags: Tracking Jobs Through the System

Horizon’s tag system lets you filter jobs in the dashboard and trace specific records through the queue. Without tags, you know that ProcessOrderJob is failing — you don’t know which orders or which customers.

class ProcessOrderJob implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function __construct(
        private readonly Order $order,
    ) {}

    // Horizon reads this method to tag the job in the dashboard
    public function tags(): array
    {
        return [
            'order:' . $this->order->id,
            'customer:' . $this->order->customer_id,
            'tenant:' . $this->order->tenant_id,
        ];
    }

    public function handle(): void
    {
        // ...
    }
}

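In the Horizon dashboard, you can now search for order:12345 and see the jobs related to that order across every queue, including retries and failures, as long as they are still within Horizon’s retention window. By default, the trim option in config/horizon.php keeps recent jobs for 60 minutes and failed jobs for 7 days; adding a tag on the Monitoring tab retains its jobs for longer. For a customer complaint about a specific order, this reduces debugging from “search the logs” to “type the order ID.”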

Tags are stored in Redis alongside the job. Don’t add tags that are expensive to compute or that pull database records: the tags() method runs when the job is dispatched, not when it’s processed, so its cost lands on whatever request or command dispatches the job.


Reading the Failed Jobs Tab

The failed jobs tab is the most actionable part of the Horizon dashboard, and it’s the one most developers treat as a log viewer rather than a diagnostic tool. The information it exposes:

For every failed job:
→ Job class + queue it ran on
→ Exact exception message and stack trace
→ Payload — the serialized data the job received
→ Number of attempts before final failure
→ When it failed, plus the history of retries launched from the dashboard
→ "Retry" button — dispatches a fresh copy of the job

The patterns that appear in the failed jobs tab:

Same job class, many failures in a short window: Usually an external dependency — an API that went down, a database table that’s locked, a third-party service returning 503s. Check the exception message. If it’s a connection error or HTTP 5xx, the problem is upstream.

Failures cluster at specific payloads: If ProcessOrderJob fails for order IDs in a specific range but not others, the payload is the signal. Open three failed jobs with different payloads. The failure specific to certain records is usually a data issue — a missing relationship, a null field the job assumed was populated, a record in an unexpected state.

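Memory limit exceeded: A fatal “Allowed memory size exhausted” error comes from PHP’s memory_limit, not from the supervisor’s memory option, which only tells Horizon to recycle a worker once it has grown past that size. First option: raise memory_limit for the Horizon processes. Second option, usually the better one: refactor the job to process a smaller chunk, paginating through records instead of loading the entire collection.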
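The chunking approach, sketched in plain PHP: a generator hands over one page of records at a time, so memory stays flat however large the table grows. In a Laravel job, Eloquent’s chunkById() or lazyById() play this role; fetchInPages() here is a hypothetical stand-in that yields fake IDs.

```php
<?php
declare(strict_types=1);

// Stand-in for a large result set fetched page by page. Only the current
// page is held in memory; earlier pages can be garbage-collected.
function fetchInPages(int $totalRows, int $pageSize): Generator
{
    for ($offset = 0; $offset < $totalRows; $offset += $pageSize) {
        $end = min($offset + $pageSize, $totalRows);
        yield range($offset + 1, $end); // pretend these are record IDs
    }
}

$processed = 0;
$largestPage = 0;

foreach (fetchInPages(totalRows: 10_000, pageSize: 500) as $page) {
    $largestPage = max($largestPage, count($page));
    $processed += count($page); // real per-record work would go here
}

echo "processed {$processed} rows, at most {$largestPage} held at once", PHP_EOL;
```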

Timeout exceeded: The job ran longer than the supervisor’s timeout. If this is legitimate — the job genuinely needs more time — raise the timeout on the appropriate supervisor. If it shouldn’t be taking that long, add runtime logging to identify the slow section.
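For that runtime logging, a minimal section timer is enough to show which stage of a job grew. SectionTimer and the stage names are hypothetical, and usleep() stands in for real work.

```php
<?php
declare(strict_types=1);

// Accumulates wall-clock milliseconds per named section of a job.
final class SectionTimer
{
    /** @var array<string, float> milliseconds per section */
    private array $timings = [];

    public function measure(string $section, callable $work): mixed
    {
        $start = hrtime(true);
        try {
            return $work();
        } finally {
            // Recorded even if the section throws, so failed runs are timed too.
            $this->timings[$section] = ($this->timings[$section] ?? 0.0)
                + (hrtime(true) - $start) / 1e6;
        }
    }

    /** @return array<string, float> sections sorted slowest first */
    public function report(): array
    {
        $sorted = $this->timings;
        arsort($sorted);
        return $sorted;
    }
}

$timer = new SectionTimer();
$rows = $timer->measure('load-records', function () { usleep(50_000); return range(1, 100); });
$timer->measure('call-api', fn () => usleep(150_000));
$timer->measure('write-results', fn () => usleep(10_000));

// In a job you would log this, e.g. with Laravel's Log facade.
print_r($timer->report());
```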


The Metrics Tab in Detail

The metrics tab shows throughput and runtime over time, per queue. This is where you look for trends rather than current state.

Trend patterns and what they mean:

Throughput steady, runtime climbing    → jobs getting slower, not more numerous
                                         → investigate the job, not the workers

Throughput falling, runtime steady     → fewer jobs arriving (could be normal)
                                         or workers dying quietly

Both climbing                          → burst of more complex jobs, or data growth
                                         making each job touch more records

Throughput flat, wait time climbing    → dispatch rate now exceeds processing capacity
                                         → add workers or raise the maxProcesses ceiling

Runtime spike then return to baseline  → external dependency was slow for a period
                                         → check your dependency's status history

Runtime steadily increasing over weeks → data growth making jobs slower over time
                                         → most insidious, easiest to miss
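The first rows of that table can also be encoded as a rough classifier over two readings of the metrics tab, say last week’s and today’s. classifyTrend() is hypothetical, covers only the first three rows, and the 10% tolerance for “steady” is an assumption.

```php
<?php
declare(strict_types=1);

// Compare an earlier and a later reading of throughput and runtime.
function classifyTrend(array $before, array $after, float $tolerance = 0.10): string
{
    $change = fn (string $key): float =>
        ($after[$key] - $before[$key]) / max($before[$key], 1e-9);

    $throughput = $change('throughput');
    $runtime    = $change('runtime');
    $steady     = fn (float $c): bool => abs($c) <= $tolerance;

    return match (true) {
        $steady($throughput) && $runtime > $tolerance     => 'jobs getting slower: investigate the job',
        $throughput < -$tolerance && $steady($runtime)    => 'fewer jobs arriving, or workers dying quietly',
        $throughput > $tolerance && $runtime > $tolerance => 'heavier burst or data growth',
        default                                           => 'no pattern from the table',
    };
}

echo classifyTrend(
    ['throughput' => 120, 'runtime' => 400],
    ['throughput' => 118, 'runtime' => 900],
), PHP_EOL; // jobs getting slower: investigate the job
```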

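The last pattern, a slow and steady runtime increase over weeks, is the one that catches most teams off guard. A job that processes user data takes 200ms when the table has 10,000 rows. Six months later, the table has 800,000 rows and the job takes 4 seconds. The throughput looks the same. The wait time is fine. Horizon’s own graphs won’t reveal it either: by default it keeps only the last 24 snapshots per queue and job (metrics.trim_snapshots), about two hours at a five-minute interval. Catching this drift means exporting runtime to a store with longer retention, where a 90-day window shows the trend clearly.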
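To spot that drift, fit a line through one average-runtime sample per day and watch the slope. A least-squares sketch in plain PHP; slope() and the sample numbers are hypothetical, and the daily samples have to come from a store you keep yourself.

```php
<?php
declare(strict_types=1);

// Least-squares slope of $ys against $xs (runtime change per unit of time).
function slope(array $xs, array $ys): float
{
    $n = count($xs);
    $meanX = array_sum($xs) / $n;
    $meanY = array_sum($ys) / $n;

    $cov = 0.0;
    $var = 0.0;
    foreach ($xs as $i => $x) {
        $cov += ($x - $meanX) * ($ys[$i] - $meanY);
        $var += ($x - $meanX) ** 2;
    }

    return $var > 0 ? $cov / $var : 0.0;
}

// Hypothetical samples: average runtime in ms on days 0..5.
$days    = [0, 1, 2, 3, 4, 5];
$runtime = [200, 212, 219, 231, 240, 251];

printf("runtime grows by about %.1f ms per day\n", slope($days, $runtime));
```

A slope that stays positive week after week is the signal; a single noisy day barely moves it.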


Deploying Horizon Without Dropping Jobs

The wrong deployment sequence for Horizon:

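# ❌ supervisorctl stop sends SIGTERM, then SIGKILL after stopwaitsecs (10 seconds by default):
# any job still running at that point dies mid-execution and may partially complete
sudo supervisorctl stop horizon
git pull
# ❌ Runs Horizon in your shell session, outside the process monitor
php artisan horizon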

The correct sequence:

# ✅ Pull new code and run migrations first
git pull
php artisan migrate --force

# Then terminate gracefully: in-progress jobs finish, then Horizon exits
php artisan horizon:terminate

# Supervisor (the Linux process monitor, not Horizon's supervisor) restarts Horizon
# automatically because autorestart=true in the supervisor config

horizon:terminate signals Horizon to finish all in-progress jobs and then exit cleanly. The Linux Supervisor process monitor detects the exit and restarts php artisan horizon, which loads the code you just deployed. That’s why terminate comes last: run it before git pull and Supervisor can restart Horizon on the old code before the new code lands. No jobs are dropped, since anything still pending stays in Redis for the restarted workers. No workers are force-killed mid-execution.

The Linux Supervisor config that makes this work:

; /etc/supervisor/conf.d/horizon.conf
[program:horizon]
process_name=%(program_name)s
command=php /var/www/app/artisan horizon
autostart=true
autorestart=true
user=www-data
redirect_stderr=true
stdout_logfile=/var/www/app/storage/logs/horizon.log
; wait up to 1 hour for in-progress jobs to complete
stopwaitsecs=3600

stopwaitsecs=3600 is the detail most configurations miss. Without it, when the Linux Supervisor stops or restarts Horizon, it sends SIGTERM and waits only 10 seconds (its default stopwaitsecs) before sending SIGKILL, which force-terminates in-progress jobs. With stopwaitsecs=3600, it waits up to an hour for Horizon to drain current jobs before killing the process.

Set stopwaitsecs to at least the length of your longest-running job.


The Dashboard Gate: Don’t Leave It Open

By default, the Horizon dashboard is only accessible in the local environment. In production, it requires a gate:

// app/Providers/HorizonServiceProvider.php
protected function gate(): void
{
    Gate::define('viewHorizon', function (User $user) {
        return in_array($user->email, [
            'engineering@yourcompany.com',
        ]);
    });
}

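The gate receives the authenticated user; guests are denied without the closure running. The simplest implementation is an email allowlist. A more robust one checks a role (hasRole() assumes a roles package such as spatie/laravel-permission, since Laravel has no built-in roles):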

Gate::define('viewHorizon', function (User $user) {
    return $user->hasRole('engineering') || $user->hasRole('devops');
});

The Horizon dashboard exposes job payloads — which may contain customer data, sensitive parameters, or internal identifiers. Treat it with the same access controls as your admin panel.


The Signal You’re Looking For

The combination of metrics that predicts a queue problem before it becomes a user-visible problem:

Early warning signs — investigate now:
→ Wait time > 5 seconds on a queue that normally stays under 1 second
→ Runtime climbing more than 50% over a 24-hour period
→ Failed job rate > 1% of throughput
→ Throughput drop > 20% with no corresponding drop in traffic

Immediate action required:
→ Wait time in minutes on any user-facing queue
→ Failed job rate > 10% of throughput
→ Throughput drops to zero (Horizon has stopped)
→ Same job failing repeatedly with a non-transient error
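The checklist above, as a hypothetical function over numbers you collect yourself. It simplifies two rules: “minutes” becomes 60 seconds or more, and the repeated non-transient failure check is left out.

```php
<?php
declare(strict_types=1);

// Apply the thresholds above to one queue. baseline_* values are the
// readings you compare against (for runtime, the figure from 24 hours ago).
function queueHealth(array $m): string
{
    $failRate = $m['processed'] > 0 ? $m['failed'] / $m['processed'] : 0.0;
    $throughputDrop = $m['baseline_throughput'] > 0
        ? 1 - $m['throughput'] / $m['baseline_throughput']
        : 0.0;
    $runtimeGrowth = $m['baseline_runtime'] > 0
        ? $m['runtime'] / $m['baseline_runtime'] - 1
        : 0.0;

    if ($m['throughput'] <= 0 || $m['wait_seconds'] >= 60 || $failRate > 0.10) {
        return 'critical';
    }

    if ($m['wait_seconds'] > 5 || $runtimeGrowth > 0.5 || $failRate > 0.01 || $throughputDrop > 0.2) {
        return 'warning';
    }

    return 'ok';
}

$baseline = ['baseline_throughput' => 100, 'baseline_runtime' => 400];

echo queueHealth($baseline + ['throughput' => 98, 'runtime' => 420, 'wait_seconds' => 0.8, 'processed' => 1000, 'failed' => 3]), PHP_EOL;  // ok
echo queueHealth($baseline + ['throughput' => 97, 'runtime' => 650, 'wait_seconds' => 2.0, 'processed' => 1000, 'failed' => 4]), PHP_EOL;  // warning
echo queueHealth($baseline + ['throughput' => 60, 'runtime' => 410, 'wait_seconds' => 240.0, 'processed' => 1000, 'failed' => 2]), PHP_EOL; // critical
```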

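The dashboard can’t page you when most of these thresholds are crossed. Horizon does ship one alert of its own, the long-wait notification: set per-queue thresholds with the waits option in config/horizon.php and register a destination with Horizon::routeMailNotificationsTo() or Horizon::routeSlackNotificationsTo() in HorizonServiceProvider. For everything else, pair Horizon with your application’s monitoring layer: a scheduled command can read current wait times from Horizon’s WaitTimeCalculator class (internal, so pin your Horizon version) and dispatch an event of your own, such as a hypothetical QueueHealthDegraded, or a monitoring service can read the metrics Horizon stores in Redis.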
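Horizon’s built-in alerting covers long waits: when a queue’s estimated wait passes a threshold you set, it dispatches a LongWaitDetected event and sends the notification through the routes you register. A configuration sketch using the documented waits option and routing methods; the thresholds, email address, webhook URL, and channel are placeholders to adapt.

```php
// config/horizon.php: seconds of estimated wait before a long-wait notification.
// A threshold of 0 disables the check for that queue.
'waits' => [
    'redis:critical' => 5,
    'redis:default'  => 60,
    'redis:exports'  => 0,
],

// app/Providers/HorizonServiceProvider.php
public function boot(): void
{
    parent::boot();

    Horizon::routeMailNotificationsTo('oncall@yourcompany.com');
    Horizon::routeSlackNotificationsTo('slack-webhook-url', '#queues');
}
```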


What Horizon Actually Is

Horizon is an observability tool that happens to also configure your workers. The configuration part is where most tutorials stop. The observability part is what makes the difference between a queue system you understand and one you restart when it breaks and hope for the best.

Every metric in that dashboard is a number you wouldn’t have without Horizon: throughput you guessed at, runtime you never measured, wait time you discovered when users complained. The dashboard makes these numbers visible. Reading them is the skill that makes Horizon worth running.
