Technology Apr 22, 2026 · 9 min read

One Fleet, Two State Machines — Firmware Architecture for GPS Trackers


DEV Community
by applekoiot

A client once asked me why their "universal" GPS tracker firmware worked perfectly in bench tests but bricked half the fleet within a month. The answer was in a single line of their main loop:

while (1) {
    read_gnss();
    send_over_lte();
    sleep_ms(30000);   // 30s, "good enough for everything"
}

That loop is fine on a delivery van with a 12V battery feeding the device. On a pallet running off an 8500 mAh lithium primary cell, it drains the battery in under three weeks.
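A back-of-envelope check shows why. The per-phase currents below are assumptions (typical GNSS and LTE figures, not measurements from that client's hardware), and the key detail is that sleep_ms() is a light sleep, not a deep sleep:

```python
# Per 30 s cycle of the naive loop: (current_mA, duration_s) -- assumed values
phases = [(25.0, 5.0),    # read_gnss(): hot GNSS fix
          (80.0, 5.0),    # send_over_lte(): attach + publish
          (1.0, 20.0)]    # sleep_ms(): light sleep, radios still warm

charge_mAs = sum(i * t for i, t in phases)   # 545 mA·s per cycle
avg_mA = charge_mAs / 30.0                   # ~18.2 mA average
days = 8500.0 / avg_mA / 24.0                # ~19.5 days on 8500 mAh
print(f"{days:.1f} days")
```

Under these assumptions the cell is flat in under three weeks, which matches what the fleet saw.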

Why does one GPS tracker firmware not fit every asset in a fleet?

GPS tracker firmware is not one problem: it is two problems with the same API surface. Wired vehicle trackers and battery-powered asset trackers run fundamentally different state machines, speak different MQTT cadences, and fail in different ways. Treating them as one firmware codebase with a "sleep interval" config knob is the single most common reason fleet deployments stall after a successful pilot.

I have spent 20+ years designing IoT hardware at Eelink, shipping GPS/cellular trackers to fleet operators across North America, Europe, and Asia. The pattern below (one fleet, two state machines) is what separates the deployments that survive procurement review from the ones that quietly get shelved after six months.

How does asset class change the firmware state machine?

The two axes that matter are onboard power (does the asset have a 12V/24V/48V supply?) and movement pattern (is it in active duty-cycled use, or mostly idle?). Every tracker on the market is optimized for one quadrant of that 2×2 matrix, even when the product page claims otherwise.

That means every fleet needs at least two firmware builds — one for powered vehicles (Eelink TK417 class: 4G LTE CAT-1, wired install, continuous reporting) and one for unpowered assets (Eelink GPT50 class: NB-IoT/LTE-M, multi-year battery, daily reporting). They share a cellular stack. They share nothing else.
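The quadrant pick is a hard fork at build time, not a config knob. A sketch (build names and device-class mapping are illustrative):

```python
def pick_firmware_build(has_wired_power: bool, mostly_idle: bool) -> str:
    """Map the 2x2 matrix (onboard power x movement pattern) to a build.
    There is no universal cell: each quadrant changes the state machine,
    the radio, and the reporting cadence."""
    if has_wired_power and not mostly_idle:
        return "vehicle-polled"        # TK417 class: CAT-1, continuous
    if not has_wired_power and mostly_idle:
        return "asset-wake-on-event"   # GPT50 class: NB-IoT/LTE-M, daily
    # Off-diagonal quadrants (powered-but-idle, unpowered-but-active)
    # need their own tuning; the point is they are distinct builds too.
    return "needs-dedicated-profile"

build = pick_firmware_build(has_wired_power=False, mostly_idle=True)
```

The two main builds can share a cellular stack as a library, but the state machine at the top is never shared.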

[Figure: side-by-side state machine diagrams. Left: the wired vehicle tracker's always-on polling loop. Right: the battery-powered asset tracker's deep-sleep-driven flow with accelerometer wake and GNSS cold-fix transitions.]

The wired tracker is a polling state machine: it runs a tight loop of GNSS reads and LTE uplinks because cheap vehicle power makes radio duty cycle a non-constraint. The battery tracker is a wake-on-event state machine: it spends 23+ hours per day in deep sleep and only wakes on motion, then runs a strictly bounded sequence of GNSS fix → LTE attach → MQTT publish → back to sleep. One wrong transition and the radio stays powered for minutes instead of seconds, and you are now measuring battery life in weeks, not years.

What does a battery-powered tracker's main loop actually look like?

A sleep-dominant state machine is not a loop you debug with printf. Every state has a power cost and a hard time budget. Here is a reference main loop for a GPT50-class tracker running on an nRF9160-style SiP with an LIS2DW12 accelerometer on the SPI bus:

typedef enum {
    ST_DEEP_SLEEP,
    ST_MOTION_WAKE,
    ST_GNSS_FIX,
    ST_LTE_ATTACH,
    ST_MQTT_PUBLISH,
    ST_SLEEP_ENTRY
} tracker_state_t;

static volatile bool motion_flag = false;

void accel_isr(void) {
    motion_flag = true;   // ISR sets a flag only; debounce runs in task context
}

void tracker_main(void) {
    tracker_state_t s = ST_DEEP_SLEEP;
    fix_t fix = {0};
    uint32_t fix_deadline_ms, lte_deadline_ms;

    for (;;) {
        switch (s) {
        case ST_DEEP_SLEEP:
            lpm_enter_system_off();          // ~3 µA budget
            if (motion_flag || rtc_daily_tick()) s = ST_MOTION_WAKE;
            break;

        case ST_MOTION_WAKE:
            motion_flag = false;
            if (!accel_is_real_motion(500 /*ms*/)) {
                s = ST_SLEEP_ENTRY;          // debounce: ignore vibration
                break;
            }
            gnss_power_on();
            fix_deadline_ms = now_ms() + 90000;   // 90s hard cap
            s = ST_GNSS_FIX;
            break;

        case ST_GNSS_FIX:
            if (gnss_try_fix(&fix)) {
                gnss_power_off();
                s = ST_LTE_ATTACH;
            } else if (now_ms() > fix_deadline_ms) {
                gnss_power_off();
                fix.valid = false;            // report last-known + error flag
                s = ST_LTE_ATTACH;
            }
            if (s == ST_LTE_ATTACH) {         // power the modem on once, on entry
                lte_power_on_with_psm(/*TAU=*/86400, /*Active=*/10);
                lte_deadline_ms = now_ms() + 60000;
            }
            break;

        case ST_LTE_ATTACH:
            if (lte_is_registered()) {
                s = ST_MQTT_PUBLISH;
            } else if (now_ms() > lte_deadline_ms) {
                lte_power_off();              // never sleep with the modem powered
                s = ST_SLEEP_ENTRY;
            }
            break;

        case ST_MQTT_PUBLISH:
            mqtt_publish_cbor(&fix, QOS_1);
            lte_power_off();
            s = ST_SLEEP_ENTRY;
            break;

        case ST_SLEEP_ENTRY:
            rtc_arm_next_wake(24 * 3600);     // 24h backstop
            s = ST_DEEP_SLEEP;
            break;
        }
    }
}

Three things matter here more than anything else. First, every "power on" has a paired "power off" on every exit path — there is no way to leak a powered radio. Second, the accelerometer ISR only sets a flag; the debounce happens in task context with real sampling, because any vibration (a forklift driving past a pallet) would otherwise wake GNSS. Third, the GNSS and LTE stages have hard deadlines: you publish what you have and sleep, because a bad-signal loop that keeps the LTE modem attached for an extra two minutes per day is the difference between a 5-year and a 2-year battery life.
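The debounce behind accel_is_real_motion() can be as simple as requiring a sustained fraction of above-threshold samples across the window, so a single jolt never wins. A sketch of that policy (threshold and fraction are illustrative, not the GPT50's tuning):

```python
def is_real_motion(samples_mg, threshold_mg=100, min_fraction=0.3):
    """Debounce in task context: a wake counts as real motion only if a
    sustained fraction of the window's samples exceeds the threshold."""
    if not samples_mg:
        return False
    above = sum(1 for s in samples_mg if abs(s) > threshold_mg)
    return above / len(samples_mg) >= min_fraction

# One 400 mg jolt in an otherwise quiet window: not real motion
print(is_real_motion([5, 400, 8, 6, 7, 9, 4, 6, 5, 7]))                  # False
# Sustained vibration across the whole window: real motion
print(is_real_motion([120, 300, 250, 90, 180, 220, 140, 160, 30, 200]))  # True
```

The same shape works in C on the device; the point is that the decision uses many samples, not the single interrupt that woke the MCU.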

The PSM parameters in lte_power_on_with_psm(TAU=86400, Active=10) make this the most consequential line in the whole machine. TAU (T3412 Extended) tells the network to expect the device to stay registered for 24 hours even though it is off-air, which avoids a full attach on every wake. The Active Time (T3324) of 10 seconds is how long the modem listens for mobile-terminated traffic after a publish before going RRC-idle. Ten seconds is aggressive: you trade downlink responsiveness for battery. For fleet telemetry that is uplink-only, that trade is free. For anything needing remote config push, you either negotiate a longer T3324 or add eDRX on top so the modem wakes periodically in a low-power paging cycle. Getting these timers wrong in firmware is the single most common battery-life failure I see in deployed trackers, and it never shows up in bench testing because a bench has perfect coverage.
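The two timers are requested from the network as encoded bytes, typically through the standard AT+CPSMS command (3GPP TS 27.007). A sketch of the 3GPP TS 24.008 timer encodings, worth deriving in firmware rather than hard-coding (the helper names are mine):

```python
def encode_t3412_ext(seconds: int) -> str:
    """GPRS Timer 3 (periodic TAU): 3-bit unit + 5-bit value.
    Units tried finest-first so the encoding is as precise as possible."""
    units = [(0b011, 2), (0b100, 30), (0b101, 60), (0b000, 600),
             (0b001, 3600), (0b010, 36000), (0b110, 1152000)]
    for bits, step in units:
        if seconds % step == 0 and seconds // step <= 31:
            return f"{(bits << 5) | (seconds // step):08b}"
    raise ValueError("not representable as GPRS Timer 3")

def encode_t3324(seconds: int) -> str:
    """GPRS Timer 2 (Active Time): units of 2 s, 1 min, or 6 min."""
    for bits, step in [(0b000, 2), (0b001, 60), (0b010, 360)]:
        if seconds % step == 0 and seconds // step <= 31:
            return f"{(bits << 5) | (seconds // step):08b}"
    raise ValueError("not representable as GPRS Timer 2")

# TAU = 24 h, Active Time = 10 s, as in the state machine above:
# AT+CPSMS=1,,,"00111000","00000101"
print(encode_t3412_ext(86400), encode_t3324(10))
```

Note the network may grant different values than requested; firmware should read back the granted timers rather than assume the request was honored.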

How do you compute real battery life from the state machine?

Battery life is not a datasheet number. It is an integral of current over every state the firmware enters in a 24-hour window. The only number that matters is average current over one full wake cycle, with every transition accounted for. Given the following reference numbers for an nRF9160-class SiP:

  • I_sleep = 3 µA (deep sleep)
  • I_accel = 2 µA (accelerometer always-on low-g detect)
  • I_gnss = 25 mA average over fix window
  • I_lte_tx = 150 mA peak, ~80 mA average during attach + publish
  • t_wake = 30 s GNSS fix + 15 s LTE attach + 2 s publish = 47 s active

If the device wakes once per day for a scheduled uplink plus three motion events of 47 seconds each, the daily energy budget is:

E_active  = (25 mA * 30 s + 80 mA * 17 s) * 4 wakes = 8440 mA·s
E_sleep   = (3 + 2) µA * (86400 - 188) s            ≈ 431 mA·s
E_daily   ≈ 8871 mA·s ≈ 2.46 mAh per day

With an 8500 mAh primary cell at 70% usable capacity (temperature derating and self-discharge), you get roughly 5950 / 2.46 ≈ 2420 days, or 6.6 years. Drop the radio active time by 10 seconds and you gain another year. Add one extra motion wake per day and you lose nine months. This is what "battery life" actually means at the firmware level — not a spec-sheet number but an integral you can tune.
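The same integral as a script you can retune per deployment, using the reference numbers above:

```python
I_SLEEP_UA = 3.0 + 2.0           # deep sleep + always-on accel low-g detect
WAKES_PER_DAY = 4                # 1 scheduled uplink + 3 motion events
T_GNSS_S, I_GNSS_MA = 30.0, 25.0 # GNSS fix window
T_LTE_S, I_LTE_MA = 17.0, 80.0   # LTE attach + publish

t_active = WAKES_PER_DAY * (T_GNSS_S + T_LTE_S)             # 188 s/day awake
e_active = WAKES_PER_DAY * (I_GNSS_MA * T_GNSS_S + I_LTE_MA * T_LTE_S)
e_sleep = (I_SLEEP_UA / 1000.0) * (86400.0 - t_active)      # mA·s
daily_mah = (e_active + e_sleep) / 3600.0                   # ~2.46 mAh/day
usable_mah = 8500.0 * 0.70                                  # derated cell
years = usable_mah / daily_mah / 365.0
print(f"{daily_mah:.2f} mAh/day -> {years:.1f} years")
```

Bump WAKES_PER_DAY to 5 and rerun to see the nine-month hit from one extra daily motion wake; that sensitivity is the whole argument for aggressive debouncing.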

How should MQTT payloads differ between the two tracker classes?

The payload design reveals the deployment model. A wired vehicle tracker publishing every 30 seconds can afford a verbose JSON payload because bandwidth is free on CAT-1 and the device never sleeps. A battery tracker publishing once per day cannot afford any bytes it does not need, because every byte maps directly to modem-on-time, and the LTE attach itself is what drains the battery, not the payload size. So you want every publish to be a single small PDU that fits in one packet.
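To put numbers on the gap, here is the verbose JSON record next to a fixed-point, integer-keyed CBOR record (lat/lon scaled by 1e5, battery in mV), with a minimal RFC 8949 encoder so the sketch is self-contained. The field names and the fixed-point schema are assumptions, not either device's actual wire format:

```python
import json
import struct

def cbor_uint(major: int, n: int) -> bytes:
    """Minimal RFC 8949 head for unsigned ints up to 32 bits (demo only)."""
    if n < 24:
        return bytes([(major << 5) | n])
    if n < 256:
        return bytes([(major << 5) | 24, n])
    if n < 65536:
        return bytes([(major << 5) | 25]) + struct.pack(">H", n)
    return bytes([(major << 5) | 26]) + struct.pack(">I", n)

def cbor_int_map(d: dict) -> bytes:
    out = cbor_uint(5, len(d))                    # major type 5: map
    for k, v in d.items():
        out += cbor_uint(0, k) + cbor_uint(0, v)  # major type 0: uint
    return out

vehicle_json = json.dumps({"ts": 1745251200, "lat": 22.5428,
    "lon": 114.0588, "spd_kmh": 42.3, "hdg": 178, "ign": 1,
    "fuel_v": 12.7, "sats": 11, "hdop": 0.9}).encode()

# Same position, fixed-point: ts, lat*1e5, lon*1e5, batt mV, event, sats
asset_cbor = cbor_int_map({1: 1745251200, 2: 2254280, 3: 11405880,
                           4: 3620, 5: 1, 6: 17})

print(len(vehicle_json), len(asset_cbor))   # roughly 131 vs 27 bytes
```

Fixed-point integers are doing as much work here as CBOR itself: IEEE doubles cost 9 bytes each in CBOR, while a scaled 32-bit latitude costs 5.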

[Figure: event-driven MQTT telemetry architecture. Wired (CAT-1) and battery (LTE-M/NB-IoT) trackers publish through a shared cellular path into an MQTT broker and backend pipeline with topic routing, payload decoding, and a time-series database.]

For the TK417-class wired tracker, verbose JSON at QoS 1 on topic fleet/vehicle/{device_id}/telemetry:

{
  "ts": 1745251200,
  "lat": 22.5428,
  "lon": 114.0588,
  "spd_kmh": 42.3,
  "hdg": 178,
  "ign": 1,
  "fuel_v": 12.7,
  "sats": 11,
  "hdop": 0.9
}

For the GPT50-class battery tracker, CBOR at QoS 1 on topic fleet/asset/{device_id}/telemetry, using IETF RFC 8949 encoding with short integer keys:

# Server-side decode (Python with cbor2)
import cbor2

# Integer-keyed record per RFC 8949 (device id and values illustrative)
record = {1: "GPT50-A7", 2: 1745251200, 3: 22.5428, 4: 114.0588,
          5: 3.62, 6: "moved", 7: 17}
payload = cbor2.dumps(record)    # ~55 bytes on the wire
msg = cbor2.loads(payload)       # round-trips to the same dict
# {1: 'GPT50-A7', 2: 1745251200, 3: 22.5428, 4: 114.0588,
#  5: 3.62, 6: 'moved', 7: 17}   vs ~140 bytes as string-keyed JSON

The exact byte count is not the point. The point is the LTE attach-release cycle that a tiny payload permits. A payload of a few dozen bytes fits inside a single NB-IoT UE-originated message with Release Assistance Indication set, which lets the modem release its RRC connection immediately after the publish instead of waiting out the network-side inactivity timer, commonly around 20 seconds. That single behavior is worth roughly 2 years of battery life over the device lifetime.

What does server-side reconciliation look like for mixed fleets?

You end up with two payload schemas (verbose JSON for vehicles, compact CBOR for assets) flowing into one MQTT broker, and the backend has to route by topic, decode by schema, and normalize both into a single internal record shape before they hit the time-series database:

import json, cbor2
from dataclasses import dataclass

@dataclass
class Fix:
    device_id: str
    ts: int
    lat: float
    lon: float
    battery_v: float | None = None
    ignition: bool | None = None

def decode(topic: str, payload: bytes) -> Fix:
    parts = topic.split("/")
    device_id = parts[-2]
    if parts[1] == "vehicle":
        d = json.loads(payload)
        return Fix(device_id, d["ts"], d["lat"], d["lon"],
                   ignition=bool(d.get("ign")))
    elif parts[1] == "asset":
        d = cbor2.loads(payload)
        return Fix(device_id, d[2], d[3], d[4], battery_v=d[5])
    raise ValueError(f"Unknown asset class: {topic}")

Two decoders, one Fix record, one time-series table. The firmware is two codebases, but the platform above it is one. That is the inversion most fleet platforms get wrong — they try to unify the firmware and end up with two platforms.

What is the one architectural decision that matters most for mixed-asset fleets?

If you are running a fleet with more than one asset type, your first firmware architectural decision is not which RTOS or which cellular module, it is where you draw the line between polled and wake-on-event devices. Draw that line wrong and every subsequent decision compounds the mistake: the cellular module you pick will be wrong, the battery chemistry you pick will be wrong, the MQTT payload you design will be wrong.

The matrix I published on the procurement side of this same question — the five classification mistakes fleet managers make — is the non-engineering version of this article. If you work with the procurement people, that one is for them.

What pattern has worked for you? Do you run two firmware codebases in parallel or have you found a way to unify them that does not blow up battery life? I would like to hear how others have handled the class boundary — leave a note or reach out.

This article was written with AI assistance for research and drafting.
