1,294 commits across two repos in 61 days. 50 of those were active working days — about 26 commits per active day. I don't know JavaScript. I don't know TypeScript. I don't know SQL. I have never written a function, defined a variable, or read a stack trace...
The product is Gate. It is a desktop application where AI workers walk between four desks executing developer tickets. It is in beta on Linux, with 281 IPC handlers, 21,724 lines in main.js, and a 4-desk pipeline that runs against any model provider you bring keys for. I built it as a solo founder over March and April.
This post is about how that actually worked, where the workflow broke, and what I think it means.
What I actually do all day-
I am not the engineer. I am the product manager and the eyes.
The workflow has three layers:
PM Claude — the orchestration layer. I describe what I want or what is broken in plain English. PM Claude asks clarifying questions, pushes back when my request is wrong, and writes structured prompts to send to the implementation agents. Every prompt I copy-paste into a terminal was generated here.
Opus — the architecture and investigation agent. Runs in Claude Code with --dangerously-skip-permissions. Owns hard bugs, multi-file refactors, and any work where I cannot describe the fix because I do not know what file the bug lives in. I read its summaries, not its code.
Codex — the surgical fix agent. Bounded one-liner work with explicit grep-before-touch constraints. When I know exactly what string needs to change in exactly which file, Codex changes it.
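To make the handoff concrete, here is a condensed, hypothetical example of the kind of structured prompt PM Claude hands me to paste into a terminal. Every specific in it (file, handler, route) is illustrative, not lifted from a real prompt:

```text
AGENT: Codex (surgical fix)
CONTEXT: Connection status dropdown still shows "configured for
  Claude" after the user selects OpenAI.
CONSTRAINT: grep for the provider-default string before touching
  anything. Change one line in one file; nothing else.
TASK: In electron/main.js, find the handler that resolves the
  displayed provider and make it respect the stored provider
  setting instead of defaulting to a Claude model string.
VERIFY: Relaunch Gate, set provider to OpenAI, open the dropdown,
  confirm it no longer mentions Claude.
```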
I commit. I do not write commits. I read commit messages and decide whether to push.
The hardest part of this workflow is not the prompting. It is noticing when something is wrong by looking at the running app. The robot did not walk to the right desk. The cost meter shows zero when I know it spent something. The dropdown shows "unconfigured" for a provider I just configured. I take a screenshot, drop it into PM Claude, describe what I see, and an investigation prompt comes out the other side.
If you can read code, you can audit a diff before you commit. I cannot. I audit the running product. The diff is opaque to me; the screenshot is not.
The receipts-
Aggregated stats across the Gate repo and the soliddark.net homepage repo, March 3 to April 27, 2026:
1,294 commits across the two repos
1,143 commits to Gate alone, all branches
151 commits to the soliddark.net homepage (the landing page, the trial flow, the webhook handler, the admin dashboard)
50 active working days out of a 55-day window
281 IPC handlers in electron/main.js (a sketch of what a handler looks like follows this list)
21,724 lines in main.js
31,534 lines in gate_ui_v2.html
17 production dependencies, 12 dev dependencies
4 specialized agent desks (Kitty, Strategist, Engineer, Auditor)
3 model providers supported (Anthropic, OpenAI, and local Ollama via Rashomon, the LLM gateway sidecar)
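For readers who have never opened an Electron codebase: an IPC handler is the bridge between the UI and the main process, a named channel the renderer can invoke. A minimal sketch of the shape, with the channel name and data entirely assumed rather than taken from Gate's source:

```javascript
const { ipcMain } = require('electron');

// Stubbed store; the real app would read SQLite.
const tickets = new Map([['T-42', 'in_progress']]);

// Hypothetical handler: the renderer asks over a named channel,
// the main process answers with a plain object.
ipcMain.handle('gate:get-ticket-status', async (_event, ticketId) => {
  return { id: ticketId, status: tickets.get(ticketId) ?? 'unknown' };
});

// Renderer side (through a preload bridge), roughly:
//   const s = await ipcRenderer.invoke('gate:get-ticket-status', 'T-42');
```

Multiply that shape by 281 and you have the wiring between gate_ui_v2.html and main.js.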
The largest single commit (a pre-launch architecture audit, hash ed3968d) touched 38 files with +55,053 / -3,339. About 49,500 of that was vendored Three.js and React for the 3D robot rendering layer; the hand-touched delta was ~5,500 lines across 26 files. One prompt produced that.
A real ticket, end to end-
One ticket from the live system, last week:
A robot named Orochimaru picked up a documentation gap ticket on a Tauri sidecar example. Total wall clock: 14 minutes 51 seconds. The four desks split the work:
Kitty (creative summary): 13 seconds
Strategist (planning): 11 seconds
Engineer (writing): 169 seconds — produced a new file, examples/state-manager.html, 426 lines
Auditor (verification): 97 seconds — verdict APPROVED
Models used: Haiku 4.5 for the lighter desks, Sonnet 4.6 for Engineer and Auditor.
Total telemetry-attributed cost: $0.005589.
Half a cent. For one ticket that wrote 426 lines of working HTML and validated itself.
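Architecturally, this is a sequential pass: each desk receives the ticket plus everything the previous desks produced, and the run only ships if the Auditor approves. A minimal sketch of that loop, with all names, numbers, and desk bodies assumed (the real desks call out to a model provider):

```javascript
// Stub desk: a real one would call Anthropic/OpenAI/Ollama and
// return the desk's output, its cost, and (for Auditor) a verdict.
const stubDesk = (label) => async (context, model) =>
  ({ output: `${label} via ${model}`, cost: 0.001, verdict: 'APPROVED' });

const desks = [
  { name: 'Kitty',      model: 'haiku',  run: stubDesk('summary') },
  { name: 'Strategist', model: 'haiku',  run: stubDesk('plan') },
  { name: 'Engineer',   model: 'sonnet', run: stubDesk('code') },
  { name: 'Auditor',    model: 'sonnet', run: stubDesk('audit') },
];

async function runTicket(ticket) {
  let context = { ticket };
  let totalCost = 0;
  for (const desk of desks) {
    const start = Date.now();
    const result = await desk.run(context, desk.model);
    totalCost += result.cost;
    console.log(`${desk.name}: ${((Date.now() - start) / 1000).toFixed(1)}s`);
    if (desk.name === 'Auditor' && result.verdict !== 'APPROVED') {
      throw new Error(`Auditor rejected ticket ${ticket.id}`);
    }
    context[desk.name.toLowerCase()] = result.output; // pass work forward
  }
  return { context, totalCost };
}

runTicket({ id: 'T-42', title: 'Document the Tauri sidecar example' })
  .then(({ totalCost }) => console.log(`total: $${totalCost.toFixed(6)}`));
```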
I watched it happen on the Gate UI. A pixel-art robot walked from desk to desk in a 2D cross-section of the workspace. Speech bubbles fired above its head as each desk reported in. The cost meter ticked up in real time. When Auditor approved, the robot did a celebration emote and walked back to the break room.
I did not write a line of the HTML. I couldn't have if I tried.
When the workflow breaks-
The honest version of the story includes the times it does not work.
Earlier this month, Gate had a bug where new users selecting OpenAI, Codex CLI, or Ollama still saw "configured for Claude" in the connection status dropdown. The product was supposed to be model-agnostic. Visually, it was not.
The bug had four leaks, in four different files, all of which independently defaulted to a Claude model string when no explicit provider was set. Three of them got fixed quickly. The fourth, in an IPC handler called gate:get-connection-status, survived because it lived in a layer no one suspected. It was also the only layer the user actually saw.
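From the audit summaries I read, the shared shape of all four leaks was a silent fallback. A reconstructed sketch, not Gate's actual code: the handler name is real, everything else is assumed:

```javascript
const { ipcMain } = require('electron');
const settings = new Map(); // stub for the real settings store

// The buggy shape: with no provider stored, fall back to a Claude
// model string, so the UI reports Claude even for OpenAI/Ollama users.
ipcMain.handle('gate:get-connection-status', async () => {
  const provider = settings.get('provider') || 'claude-sonnet'; // the leak
  return { provider, configured: Boolean(settings.get('apiKey')) };
});

// The spirit of the one-line fix: surface the unset state instead
// of inventing a default.
//   const provider = settings.get('provider') ?? 'unconfigured';
```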
I caught it because I opened Gate after the "fix" deploy and the dropdown still said wrong things. I took a screenshot. I told PM Claude what I saw. PM Claude wrote an audit prompt for Opus. Opus found the missed function in commit 10b9615. Codex applied the one-line fix.
The "I don't write code" claim has a real caveat: I am the regression detector. The agents wrote the fix, but they would not have known to look for it without a human staring at the running app and noticing what was wrong.
A different bug — a breakroom dual-mesh animation — took eight commits across a week to chase down (327d766, 75c06a2, 2078ebe, df347a3, 066b09e, c311dcc, two more after). Each fix introduced a new visual artifact. Each artifact required me to notice it and describe it. The agents kept finding plausible-looking solutions that were wrong because they could not see the result. I could.
What this actually is-
The bottleneck of building software has moved.
It used to be: can you write the code? You either could or you couldn't, and if you couldn't you needed someone who could before you could ship anything.
Now the bottleneck is: can you specify what you want with enough precision that an agent can build it, can you notice when the agent's output is wrong, and can you direct the next prompt? Those are not coding skills. They are product skills, communication skills, and pattern recognition.
I am not arguing that this is true for all software. Compilers, kernels, distributed systems, anything where the cost of a wrong assumption is catastrophic — those still need engineers who can read every line. I would not direct an agent to write firmware for a pacemaker. I would not deploy AI-generated cryptography. There is a real category of software where "I cannot read code" is a disqualification, not a story.
But Gate is a desktop application. Electron, Tauri, SQLite, JavaScript, HTML, Three.js. The patterns are well-understood. The agents have seen a million examples. The cost of a wrong line is a bug I can see, not a system that explodes silently. For this category of software, the question of whether I can read the code matters less than whether I can recognize when the product is doing the wrong thing.
The honest weaknesses are obvious.
Telemetry attribution has gaps — three desks in the Orochimaru ticket showed $0 cost when the actual spend was real. Linux-only at v1.0; Mac is three weeks out, pending Apple Developer enrollment. The custom Tauri auto-updater is fragile. Vendored code inflates the LOC numbers. Cargo.toml sat at version 0.1.0 for weeks before anyone caught it. None of these would have been caught by reading the diffs. All of them were caught by using the product.
What I learned-
If you can't code and you want to build software with AI agents, the things that matter most are:
Be the eyes. The agent cannot see the running product. You can. Your job is not the diff; it is the screenshot. Take more screenshots than feels reasonable. Drop them into your PM agent constantly.
Specify like an engineer talks to another engineer. Vague requests get vague code. "The button looks weird" is a useless prompt. "The download button on gate.html line 671 still routes to GitHub directly instead of through the proxy at /api/download/gate" is a workable one. You learn the precision over time. You do not need to know what an IPC handler is to describe one accurately if you have read enough commit messages.
Use the audit-first pattern. Do not let an agent jump straight from problem to fix on hard bugs. Have it write a read-only audit document first, naming the suspected files and root-cause hypotheses. You read the audit. You approve the implementation pass. The audit takes ten minutes and saves hours of wrong-direction fixes. (An example audit prompt follows this list.)
Trust your instincts when something feels off. I caught the model-agnostic bug because I looked at the dropdown and it felt wrong. I caught a dashboard reporting a number 36x larger than reality because the panel value made my gut hurt. I do not have technical reasons for these instincts; I have product instincts. Those are valid.
Commit boundaries are sacred. I review every commit message before pushing. The diff itself is opaque to me, but the message tells me what the agent thought it did. If the message and the behavior I see in the product disagree, that is the bug.
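For the audit-first pattern above, here is the shape of the prompt I mean, condensed and hypothetical; the structure is what matters, the specifics are illustrative:

```text
MODE: Read-only audit. Do not modify any files.
SYMPTOM: Connection dropdown shows "configured for Claude" after
  selecting OpenAI. Screenshot attached.
DELIVERABLE: An audit document listing (1) every file that can set
  or default the provider string, (2) a root-cause hypothesis per
  file, (3) the smallest fix you would propose for each.
STOP: Implement nothing until the audit is approved.
```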
The work this took-
Two months.
Roughly 14-hour days, 7 days a week.
About 850 hours of human attention even though I wrote no code by hand.
The agents did the typing. I did the looking, the deciding, and the prompting.
Gate's beta is live on Linux. Mac is May. The trial is 3 days, no card. If you want to see what 1,294 commits of structured prompting produces, the link is at the bottom.
I am not arguing this is the future of all software. I am one data point. But I am the data point standing here, holding a working multi-agent desktop OS, having never written a function in my life.
The bottleneck moved, and I'm proof of where it moved to.
Gate beta — Linux, $50/mo, 3-day free trial, no card
soliddark.net/gate
Repo (public release): github.com/AkrijSama/gate-public