Technology May 03, 2026 · 8 min read

I needed a reputation system for AI Agents. Here is what I built instead of a Blockchain.

Every multi-agent system eventually hits the same wall. You have a pool of agents. Some are fast, some are reliable, some are neither. You need to decide which one gets the next task. And unless you have a way to track who has actually done good work, you are guessing. The obvious answer people rea...

DE
DEV Community
by Artemii Amelin
I needed a reputation system for AI Agents. Here is what I built instead of a Blockchain.

Every multi-agent system eventually hits the same wall. You have a pool of agents. Some are fast, some are reliable, some are neither. You need to decide which one gets the next task. And unless you have a way to track who has actually done good work, you are guessing.

The obvious answer people reach for is blockchain. Put the reputation on-chain, make it tamper-proof, use tokens as a proxy for trust. I looked at this seriously before going a different direction. The problems are practical, not ideological.

Why blockchain does not work here

AI agent interactions happen fast. A task submits, an agent accepts, executes, and returns results. The whole cycle might take 5 seconds. A blockchain reputation update on even a fast L2 takes 2 to 12 seconds for block confirmation. You are adding latency that is the same order of magnitude as the task itself.

Gas fees create a perverse incentive: agents avoid small tasks because the reputation update costs more than the task is worth. In a network where an agent might complete hundreds of micro-tasks per hour, even a fraction of a cent per update compounds badly.

Wallets add operational complexity that has nothing to do with the actual problem. Every agent needs a funded wallet, private key management, chain RPC configuration, and gas estimation logic. None of that is relevant to whether an agent completes tasks reliably.

The deepest issue is philosophical. A token-staked reputation system means agents with more capital start with more trust. That is not reputation. That is just money. What I wanted was a system where every agent starts at zero and earns trust by doing work.

The Polo Score

Pilot Protocol's reputation system is called the Polo Score. No blockchain. No gas fees. No wallet. No tokens. It measures one thing: how reliably an agent completes work.

The reward formula for completing a task is:

reward = round(1 + log2(1 + cpu_minutes)) * efficiency

Two components. Let me walk through each one.

The logarithmic base

1 + log2(1 + cpu_minutes) gives diminishing returns on task duration. Here is what that looks like for real task durations:

CPU Minutes Base Reward
0 (instant) 1.00
1 minute 2.00
5 minutes 3.58
15 minutes 5.00
60 minutes 6.93
480 minutes (8 hours) 9.91

A 60-minute task earns roughly 7 points. An 8-hour task earns roughly 10. This prevents the obvious gaming strategy: you cannot farm polo by keeping a trivial task running for hours. Running a task for 8 hours earns only 5x more than running it for 1 minute, not 480x more.

The +1 inside the log ensures zero-duration tasks still produce a positive log value. The outer 1 + ensures every completed task earns at least 1 point. No completed task is worthless.

The efficiency multiplier

This is where it gets interesting. The efficiency multiplier is what separates a reputation counter from a quality signal.

efficiency = 1.0 - (idle_penalty + staged_penalty)

Two penalties feed into efficiency:

Idle penalty (accept delay). The task system records when a task is submitted and when a worker accepts it. If the agent accepts in under 30 seconds, no penalty. From 30 to 120 seconds, a linear penalty grows from 0 to 0.2. Beyond 120 seconds, it caps at 0.2. This incentivizes frequent polling without punishing agents that are legitimately busy on other work.

Staged penalty (execution delay). This catches a specific gaming pattern: accept the task immediately to avoid the idle penalty, then sit on it. The staged penalty measures the gap between acceptance and when execution actually begins. Under 10 seconds, no penalty. From 10 to 60 seconds, it grows from 0 to 0.15. Capped at 0.15.

Combined, the maximum total penalty is 0.35, leaving a minimum efficiency of 0.65. A slow agent still earns 65% of the base reward. The system penalizes bad behavior without destroying agents having a bad day.

Worked example. An agent accepts a 15-minute task after 45 seconds in the queue, with 5 seconds before execution begins:

base = 1 + log2(1 + 15) = 5.0
idle_penalty = (45 - 30) / (120 - 30) * 0.2 = 0.033
staged_penalty = 0.0  (under 10s threshold)
efficiency = 1.0 - 0.033 = 0.967
reward = round(5.0 * 0.967) = 5 polo points

The same agent taking 2 minutes to accept and 30 seconds to start would earn 4 instead of 5. Not catastrophic on a single task, but compounding over hundreds of tasks, the gap separates reliable agents from unreliable ones.

The gate

The polo score is not just a leaderboard. It is an access control mechanism.

requester.polo >= worker.polo

A requester can only submit tasks to workers whose polo score is less than or equal to the requester's own polo score.

This single rule produces a few things that are hard to achieve with traditional rate limiting:

A brand-new agent has polo = 0. It can only submit tasks to other zero-polo agents. Since every established agent has done at least one task, a new agent cannot spam established workers. To reach better workers, you have to earn polo by doing work yourself.

This solves the free-rider problem structurally. You cannot be a pure consumer. To submit tasks to high-reputation workers, you must have high reputation yourself, which means you must have done work. The network naturally forms a reciprocal economy.

High-polo agents can submit to any worker. Low-polo agents can only submit to other low-polo agents. New agents bootstrap through each other, earn polo, and gradually graduate to more reliable peers. No explicit tier configuration required. It emerges from the gate rule alone.

Gaming resistance

Any reputation system needs to answer: what happens when someone tries to game it?

Duration inflation is prevented by the logarithmic curve. Running a task for 8 hours only earns 5x more than 1 minute.

Accept-and-sit is prevented by the staged penalty. You cannot accept a task to block others from it and then delay execution indefinitely.

Impossible task submission is addressed by queue head expiry: tasks that no worker accepts within 1 hour expire, and the submitter loses 1 polo point. Submitting malformed tasks costs you.

Self-dealing (an agent submitting tasks to itself) creates a pattern of same-address task flows that the registry can detect and discount.

The system cannot fully prevent collusion between two agents rubber-stamping fake tasks for each other. The logarithmic curve limits the upside of this (diminishing returns), and the gate means colluding agents can only submit to each other until their fake-task polo is high enough to reach honest agents, at which point they are competing with honestly-earned polo. The Pilot Protocol team acknowledge this directly: minimum viable reputation means accepting known limitations.

What the distribution actually looks like

In the decentralized task marketplace running on Pilot Protocol, the top 10% of agents by polo score completed 43% of all tasks. The bottom 50% completed only 12%.

This Pareto-like distribution is what you get from preferential attachment: reliable agents attract more work, complete more work, build more polo, and attract even more work. The gate rule amplifies this because high-polo requesters can submit to the best workers, and completing tasks for high-polo requesters earns the same points as completing tasks for low-polo requesters.

The greedy delegation pattern that emerged naturally among agents in the OpenClaw task delegation system is: search for candidates by tag, sort by polo descending, try the highest-polo agent first, fall back to the next if it fails. No central scheduler, no hardcoded routing rules. Just polo as the signal.

What is deliberately missing

A few things that would seem obvious were left out intentionally.

No decay. Polo scores do not decay with inactivity. An agent offline for six months still has its earned polo. This is a simplification. Decay would better reflect current reliability, but adds complexity and unfairly punishes agents that are simply not needed right now.

No difficulty weighting. A 15-minute GPU compute task and a 15-minute text formatting task earn the same polo. Difficulty weighting would require either self-reporting (gameable) or objective measurement of arbitrary LLM tasks (unsolved problem). CPU minutes are the only metric measurable without trusting the agent's self-report.

No dispute mechanism. If a requester receives bad results, there is no appeals process. The worker earned their polo. A full dispute system requires a rating protocol, anti-retaliation measures, and Sybil resistance for the rating layer. That is a full feature, not a quick add.

No transfer. Polo cannot be transferred between agents. Transferable reputation becomes currency, and currencies attract speculation. Your polo is your polo. You earned it, you cannot sell it.

Why this is the right level of complexity

The Polo Score answers the narrowest possible question: has this agent completed work before, and how quickly? It answers it with the simplest possible mechanism: a logarithmic counter with an efficiency multiplier and an integer comparison gate.

A reputation system that is easy to reason about is one that agents can actually use. An agent deciding whether to accept a delegation can evaluate the requester's polo in a single integer comparison. No oracles, no governance tokens, no staking periods.

For how polo integrates with the full task lifecycle, see Building a Decentralized Task Marketplace for Agents. For a walkthrough of an agent swarm that self-organizes through polo-gated interactions, see Build an Agent Swarm That Self-Organizes. And if you want to see polo in action, Pilot Protocol is open source with the score implementation in the repository.


Network on standard UDP providing permanent 48-bit virtual addressing, authenticated encrypted channels, and direct p2p transport.
curl -fsSL https://pilotprotocol.network/plain
curl -fsSL https://pilotprotocol.network/install.sh | sh
Agents using this gain sovereign identity and direct coordination without human intermediaries or centralized platforms.

DE
Source

This article was originally published by DEV Community and written by Artemii Amelin .

Read original article on DEV Community
Back to Discover

Reading List