
Output assertions: the cron job check most monitoring tools skip

A follow-up to an earlier post. A reader comment made me realise I'd only solved half the problem, so this is a deeper reference guide on output assertions specifically.

"Did it run?" is the wrong question.

Every monitoring tool asks it. Heartbeat monitors, cron schedulers, even purpose-built tools like Cronitor and Healthchecks.io — they all fundamentally ask: did the job check in? If yes, green. If no, red.

It's a useful question. But it's not the useful question.

The failure mode that looks like success

Imagine a nightly job that syncs user records from your CRM into your database. It runs at midnight, takes about 90 seconds, and exits cleanly. Your heartbeat monitor sees the ping at 12:01:34am and marks it healthy.

What it doesn't see: the job synced 0 records. It has been syncing 0 records for eight days, since someone rotated the CRM API credentials and forgot to update the environment variable. The job connects, gets a 401, logs a warning, falls back to a no-op, and exits 0.

All monitoring: green. Business: broken for eight days.

This is not a hypothetical. Variants of this failure happen constantly. The job ran. That fact is true and also completely useless.
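
To make the mechanism concrete, here's a minimal sketch of that pattern. Everything in it (the endpoint, the env var name, the requests call) is illustrative, not taken from any real job:

import logging
import os

import requests

CRM_URL = "https://crm.example.com/api/users"  # placeholder endpoint

def sync_from_crm():
    # Hypothetical sync routine showing the silent-failure pattern above.
    headers = {"Authorization": f"Bearer {os.environ.get('CRM_API_KEY', '')}"}
    resp = requests.get(CRM_URL, headers=headers, timeout=30)
    if resp.status_code == 401:
        # Stale credentials degrade to a no-op: nothing is synced,
        # but the process still exits 0, so heartbeat checks stay green.
        logging.warning("CRM auth failed; skipping sync")
        return []
    resp.raise_for_status()
    return resp.json()  # records for the caller to upsert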

What "did it do anything?" looks like

Output assertions flip the question. Instead of only checking that the job pinged in, you also check what it reported.

A job that processes records should report how many it processed. A job that generates a file should report the file size. A job that sends emails should report how many it sent. You instrument the job to emit a count — one number representing meaningful work done — and your monitoring layer validates it falls within expected bounds.

The failure modes this catches:

  • Zero when non-zero expected: sync runs, processes nothing, exits clean
  • Suspiciously low counts: normally syncs 500 records, today synced 3
  • Count drift over time: weekly report used to include 10k rows, now consistently 200

None of these trip a heartbeat check. All of them are real problems.
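
The assertion side fits in a few lines. Here's a minimal sketch of the logic a monitoring layer might apply; the function name and thresholds are illustrative, not any particular tool's API:

def check_count(count, history, min_expected=1, drop_ratio=0.5):
    # Zero (or below-minimum) when non-zero work was expected.
    if count < min_expected:
        return "alert: count below minimum"
    # Suspiciously low compared with a baseline of recent runs.
    if history:
        baseline = sum(history) / len(history)
        if count < baseline * drop_ratio:
            return "alert: count well below recent baseline"
    return "ok"

# check_count(3, [500, 510, 495]) -> "alert: count well below recent baseline"

One design note: a rolling baseline catches sudden drops, but slow drift can drag the baseline down along with the counts, which is why a fixed floor like min_expected is worth keeping alongside it.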

Why most tools don't do this

Heartbeat monitoring is architecturally simple: job pings URL, URL records timestamp, alerting checks timestamp age. The data model is just "last seen at".
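
That simplicity fits in a toy implementation. This sketch is illustrative, not any real tool's code:

import time

last_seen = {}  # token -> unix timestamp; this is the entire data model

def ping(token):
    last_seen[token] = time.time()

def is_healthy(token, max_age_seconds):
    # "Healthy" just means "checked in recently": it says nothing
    # about what the job actually did.
    return time.time() - last_seen.get(token, 0) < max_age_seconds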

Output assertions require more: the job must emit structured data, the tool must store it, and the alerting logic must understand what "normal" looks like for that specific job. That's a significantly more complex product to build.

Most tools solve the simpler problem because it covers the obvious failure mode and is much easier to ship.

How to instrument your jobs

The instrumentation is lightweight. Pick a number that represents meaningful work and emit it at the end:

import os
import subprocess

# Database backup — report dump file size
subprocess.run(["pg_dump", "-Fc", "mydb", "-f", "/backups/mydb.dump"], check=True)
dump_size = os.path.getsize("/backups/mydb.dump")
ping_monitor(count=dump_size)

# CRM sync — report records synced
synced = sync_from_crm()
ping_monitor(count=len(synced))

# Email campaign — report emails sent
sent = send_campaign(campaign_id)
ping_monitor(count=sent)

Three extra lines per job. The return is knowing your job didn't just run — it did something. (ping_monitor is a wrapper around your monitoring call — implementation below.)

Sending the count to your monitor

DeadManCheck accepts a count parameter with each ping:

curl -fsS "https://deadmancheck.io/ping/YOUR-TOKEN?count=1547" > /dev/null

You configure the assertion on the monitor: "alert if count is 0" or "alert if count drops below threshold". If the job checks in but reports zero records, you get alerted — even though the job technically ran fine.
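
That makes the ping_monitor wrapper from the examples above a thin shim around this endpoint. A minimal sketch; the DEADMANCHECK_TOKEN environment variable name is my assumption, not part of the service:

import os
import urllib.request

def ping_monitor(count):
    # Thin wrapper around the ping endpoint shown above.
    token = os.environ["DEADMANCHECK_TOKEN"]  # assumed env var name
    url = f"https://deadmancheck.io/ping/{token}?count={count}"
    urllib.request.urlopen(url, timeout=10)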

It also does duration monitoring with rolling average anomaly detection. If your 90-second job starts taking 45 minutes, that gets flagged too. Jobs that hang are a separate silent failure mode that output counts don't catch on their own.
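
The rolling-average idea is straightforward to picture. A rough sketch, not DeadManCheck's actual algorithm:

def duration_anomalous(duration_s, recent_durations, factor=3.0):
    # Flag runs far above the rolling average of recent run durations.
    if len(recent_durations) < 5:
        return False  # too little history to judge
    average = sum(recent_durations) / len(recent_durations)
    return duration_s > average * factor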

The right question

Monitoring that only asks "did it run?" will eventually lie to you at the worst possible moment.

The right question is "did it do anything useful?" Output assertions are how you ask that question automatically, at 2am, every night, without anyone having to check.

Start with your backup jobs. That's where the answer matters most.

Source

This article was originally published by DEV Community and written by Kriss.
