Technology Apr 29, 2026 · 7 min read

I Let One AI Agent Ship My Entire iOS Portfolio. Here's What Broke.

DEV Community
by 孫昊

Over the last two weeks, a single Claude Code agent — operating with my approval only at four "hard gates" (Apple Developer enrollment, payment info, App Store final-acceptance, and any third rejection of the same app) — has scaffolded, polished, and prepared four iOS apps for the App Store.

Total human time investment: roughly 4 hours of decisions, plus the time to read this post. The apps share an identity: every one is offline-first, ships a one-time IAP at $2.99, includes zero analytics SDKs, and carries a Privacy Manifest whose claims are verifiable from the source code.

What follows isn't a "look how easy AI makes shipping" piece. It's a structural breakdown of where one autonomous agent works extraordinarily well, where it stalls, and the orchestration layer I had to build for it to operate without becoming a chaos engine.

The four apps (concrete, not abstract)

For credibility, the actual technical fingerprints:

  • AutoChoice (decision wheel, Lifestyle category) — com.jiejuefuyou.autochoice
  • AltitudeNow (offline barometric altimeter, no GPS, Health & Fitness) — uses CoreMotion's CMAltimeter
  • DaysUntil (countdown list, Productivity, no notifications) — single screen, JSON persistence
  • PromptVault (offline AI prompt manager with {{variable}} substitution, Productivity) — driven by the agent itself analyzing my Bilibili + Xiaohongshu saved-content history to find a real user need (this is the meta moment: agent built an app whose first user was the agent's own observed behavior pattern)

All four binaries run zero network calls. Verify:

nm -gU <app>.app/<app> | grep -iE 'URL|HTTP|Network'

returns nothing in any of them. The Privacy Manifest declares zero data collected; the source confirms it.
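The same check can be wrapped so it runs across every bundle at once. A minimal sketch, assuming the binaries sit under a local build/ directory (the paths are assumptions about layout, not the toolkit's actual script):

```shell
# check_network_symbols: returns 0 ("clean") when a binary exports no
# URL/HTTP/Network symbols, 1 otherwise. Same nm | grep pipeline as above.
check_network_symbols() {
  if nm -gU "$1" 2>/dev/null | grep -iqE 'URL|HTTP|Network'; then
    return 1
  fi
  return 0
}

# Hypothetical usage; build/ paths are illustrative:
# for app in AutoChoice AltitudeNow DaysUntil PromptVault; do
#   check_network_symbols "build/$app.app/$app" && echo "$app: clean"
# done
```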

The orchestration layer (the part nobody else writes about)

Most "I used AI to build my app" posts stop at "I used Cursor / Claude Code." This one shows the layer underneath that makes the agent operable:

~/.claude/projects/.../memory/
  user_persona.md          — communication style; injected into every subagent prompt
  feedback_autonomy.md     — exactly what authority is delegated, exactly what isn't
  project_autoapp.md       — current state, identity facts, decision history

orchestrator/
  state.yml                — single source of truth across 4 product repos
  decisions.md             — append-only ADR log; every non-trivial choice has a paragraph
  RESUME.md                — what the agent reads first when re-entering a session
  verify_all.sh            — runs across all 4 repos, checks 32 ASC hard requirements
  setup-asc-secrets.sh     — one command to populate 8 secrets × 4 repos
  asc_sales_report.sh      — pulls daily TSV from App Store Connect API after launch
  dday_runbook.sh          — generates platform-specific launch posts on submission day
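For flavor, setup-asc-secrets.sh can be pictured as one loop over repos × secrets via the GitHub CLI. This is a dry-run sketch with placeholder secret names, not the toolkit's real script:

```shell
# Dry-run sketch: print (rather than execute) one `gh secret set` per
# repo x secret pair. Swap `echo` for the real command to apply.
REPOS="autochoice altitudenow daysuntil promptvault"
SECRETS="ASC_KEY_ID ASC_ISSUER_ID ASC_KEY_CONTENT APPLE_TEAM_ID \
         MATCH_PASSWORD MATCH_GIT_URL APP_IDENTIFIER FASTLANE_USER"

count=0
for repo in $REPOS; do
  for name in $SECRETS; do
    echo "gh secret set $name --repo jiejuefuyou/$repo"
    count=$((count + 1))
  done
done
echo "would set $count secret values"   # 8 secrets x 4 repos = 32
```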

The lesson: the orchestration layer is the bottleneck, not the model. A 5-hour rolling token window is plenty if the agent doesn't waste cycles re-deriving context every session. The memory system + state.yml + ADR + RESUME.md is what makes the agent re-entrant. Without it, the same agent would burn half its context on "where was I?" instead of shipping code.
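One way to picture the re-entrancy trick: RESUME.md is cheap to regenerate from state.yml on every state change, so the agent never starts a session cold. A toy sketch, with YAML keys invented for illustration rather than taken from the toolkit's real schema:

```shell
# Toy example: derive a RESUME.md summary from a minimal state.yml.
cd "$(mktemp -d)"

cat > state.yml <<'EOF'
phase: pre-submission
blocked_on: apple-developer-enrollment
next_action: run verify_all.sh across all repos
EOF

# Extract each key into a human-readable line the agent reads first.
{
  echo "# RESUME"
  sed -n 's/^phase: */Current phase: /p'   state.yml
  sed -n 's/^blocked_on: */Blocked on: /p' state.yml
  sed -n 's/^next_action: */Next: /p'      state.yml
} > RESUME.md

cat RESUME.md
```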

The toolkit is open source: github.com/jiejuefuyou/autoapp-toolkit (MIT).

Where it works extraordinarily well

  • Repository scaffolding. Cloning a working template into a new product, doing 50 search-and-replace operations, fixing the inevitable Swift "redeclaration of body" naming collision (because var body in a SwiftUI View clashes with the View.body requirement and you only catch it at compile-time), pushing to a fresh gh repo create-d origin. Hours → minutes.
  • Cross-platform reasoning. Every app is iOS but the build/sign/release surface area is one of the worst in software (Xcode + fastlane + match + StoreKit 2 + Privacy Manifest + ASC API). An agent that has all of this in context simultaneously beats the human pattern of "let me look up the right INFOPLIST_KEY_* again."
  • Web data ingestion. The agent built its own browser automation harness (Playwright + cookie polling) to pull 597 saved items from Bilibili (347) and Xiaohongshu (250), classify 63 of them as AI-related, and propose a fourth product (PromptVault) backed by a direct user-behavior signal — then scaffolded that product. This isn't "AI replaces engineer"; it's "AI does something an engineer wouldn't bother to do."
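The scaffolding bullet above mostly reduces to a rename pass over a cloned template. A local sketch using a stand-in template (the real toolkit template and the `gh repo create` push are omitted):

```shell
cd "$(mktemp -d)"

# Stand-in template: one Swift file using a placeholder product name.
mkdir -p template/Sources
printf 'struct TemplateAppView: View { var body: some View { Text("hi") } }\n' \
  > template/Sources/Main.swift

NEW=DaysUntil
cp -r template "$NEW"

# Rename pass. GNU and BSD sed disagree on -i, so go via a temp file.
for f in $(grep -rl 'TemplateApp' "$NEW"); do
  sed "s/TemplateApp/${NEW}/g" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

# Next step in the real flow (not run here):
#   gh repo create "jiejuefuyou/$NEW" --private --source "$NEW" --push
```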

Where it stalls

Honesty about the failure modes is what makes a post like this earn HN's respect:

  • Aesthetic taste decisions. App icons: two attempts at CoreGraphics-drawn icons came out "fine but generic." A human designer does better in 30 minutes.
  • Domain expertise outside its training data. When Xiaohongshu added a new client-side request-signature scheme, the agent's first 5 attempts at the API endpoint failed silently. It needed me to say "the headers are signed by JS, you have to let the browser issue the request." Without that nudge it would have looped. (Full write-up coming in part 2 of this series.)
  • Reversible vs. irreversible action discrimination. Without explicit gates in feedback_autonomy.md, the agent over-asks ("should I commit this?") on reversible actions and under-asks on irreversible ones. The orchestration layer's job is to put those gates in the right places.
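For a sense of what those gates look like, here is a hypothetical excerpt of feedback_autonomy.md (invented wording, not the real file):

```markdown
## Hard gates (always ask)
- Apple Developer enrollment, payment information
- Final App Store submission
- Third rejection of the same app

## Never ask, just do, and log to decisions.md
- Commits, refactors, dependency bumps inside a repo
- Anything a `git revert` can undo
```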

Pricing thesis (what makes this a startup story not just a tech demo)

Every utility iPhone app published in the last three years follows the same monetization playbook: subscription, $4.99-$9.99 per month, free tier crippled.

The thesis I'm testing: for utility apps, one-time IAP at $2.99 is a strictly better model in 2026.

  1. App Store conversion benchmark in Utilities: 48.6%. One-time IAP shows 25-45% higher purchase rate vs. subscription in this category (Adapty 2024 report).
  2. Subscriptions create churn surface area. One-time IAP has zero churn surface area.
  3. For utility apps, "lifetime value" is a fiction — most users uninstall within 30 days. The subscription model overprices early users and underprices the long tail.
  4. App Store reviewers (humans, with their own grievances) are softer on apps that don't pull subscription tricks.

If the AutoApp portfolio's first month shows revenue, the thesis is confirmed, and the agent can replicate it. If it shows $0, the thesis dies, and I fold this back into a personal-use side quest. Either way, it's testable.

What the agent didn't do

To preempt the snarky comment that says "you're just using AI as a glorified template engine":

The agent did not:

  • Write its own privacy policy from scratch (I reviewed and edited template output)
  • Make any decision about pricing, category, or "should I ship this app at all"
  • Touch my Apple Developer account or payment information
  • Ship anything to the App Store without my explicit go (still a hard gate)

It did:

  • Make 100% of small implementation decisions (variable names, file structure, error handling, when to refactor vs. when to leave it alone)
  • Decide which 4 apps to scaffold (with my opt-in to "go" on each)
  • Choose the brand identity per app (icons, color palette, copy tone)
  • Negotiate its own constraints — when it found a Swift 6 strict-concurrency warning that would block future Xcode upgrades, it fixed it without being asked, because feedback_autonomy.md says "fix the root cause, don't ship the symptom"

Coming next in this series

  • Part 2: How I sniffed Xiaohongshu's collection API in 90 seconds — and why CORS made me rewrite the whole approach. (Drops in 3-5 days.)
  • Part 3: Memory layers in a Claude Code agent — and why yours probably needs them.

The four apps will hit TestFlight as soon as Apple Developer enrollment clears. I'll publish first-week data either way it goes.

Either the orchestrator + agent layer makes shipping iOS apps a 4-hour decision cost, in which case the App Store is about to get a lot more crowded and the floor for "is this app worth it" rises. Or the model misses on something that humans currently don't even notice we're doing — and the next year is about discovering what that is.

Either way, the numbers will be public.

