I'm not an engineer. I'm a non-developer who decided, in late 2025, that I wanted Claude Code to keep working when I was asleep.
Six months later, that decision compiled into 857 autonomous sessions, 840 self-merged pull requests, and 17,328 evolution cycles distributed across 49 launchd services — all running on a 2015 MacBook Air with 8GB of RAM.
Yesterday I open-sourced the framework that kept it alive: claude-autonomous-kit.
This post is the long version of what I wrote in r/ClaudeAI. If you've ever wondered what breaks when you actually try to run an LLM agent on a schedule — not just demo it — this is the field report.
The problem nobody warns you about
Claude Code is excellent in interactive mode. You sit down, you ask, you iterate, you ship. The friction is tiny.
But the moment you try to run it without you in the chair, three things break:
- Every session starts blank. Claude can't remember what it did yesterday. It can't remember what it decided two hours ago.
- Failures cascade silently. A scheduled session fails at 3am. The next scheduled session has no idea. By the time you wake up, you have a corrupted state and no audit trail.
- Git becomes a graveyard. The agent creates branches, half-finishes work, gets interrupted, moves on. Within a week you have 30+ orphan branches and no idea which ones contain anything valuable.
These aren't theoretical. I lived through all of them. The framework exists because each one cost me real time before I built guardrails for it.
What the framework actually does
claude-autonomous-kit is not another agent framework. There's no DSL, no graph, no orchestration layer. It's a small set of Bash hooks and Python scripts that wrap the existing claude CLI to make scheduled operation survivable.
The five pieces:
1. launchd plists — schedule sessions (macOS native, no daemon to maintain)
2. session-start hook — injects the previous session's state into context
3. session-end hook — extracts what happened from the transcript JSONL
4. scope-guard hook — blocks Claude from modifying protected files
5. healthcheck — independent watcher; opens GitHub Issues on failure
That's the whole thing. No magic. Each piece is under 200 lines.
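To make "no magic" concrete: the scheduling layer is just a LaunchAgent pointing at a runner script. Here's an illustrative sketch, not the kit's actual files; the label, paths, and four-hour interval are all invented, and the runner script is where the claude CLI gets invoked non-interactively.

```bash
#!/usr/bin/env bash
# Illustrative launchd wiring. Every label, path, and interval here is
# invented for the sketch; the kit's real files differ.
set -euo pipefail

PLIST="$HOME/Library/LaunchAgents/com.example.claude-session.plist"

# The plist fires a runner script every 4 hours. The runner is where the
# claude CLI runs non-interactively (e.g. `claude -p "..."`).
cat > "$PLIST" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0"><dict>
  <key>Label</key><string>com.example.claude-session</string>
  <key>ProgramArguments</key>
  <array><string>/Users/me/bin/run-session.sh</string></array>
  <key>StartInterval</key><integer>14400</integer>
  <key>StandardOutPath</key><string>/tmp/claude-session.log</string>
  <key>StandardErrorPath</key><string>/tmp/claude-session.err</string>
</dict></plist>
EOF

launchctl load "$PLIST"
```

The rest of the kit hangs off hooks that fire inside those scheduled sessions.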
Three lessons that took me months to land on
1. git stash breaks under autonomous operation
The first version stashed working changes before each session. It seemed clean. It wasn't.
Sessions get interrupted. They time out. They get killed by OOM. When that happens, the stash is still there but the session that owned it is gone. The next session has no idea what those stashed changes were or whether they're safe to pop.
I now never use stash. Every WIP gets committed to a dedicated branch. If a session dies mid-work, the next one inherits an obvious "previous session left you this branch" state. Stash is invisible; a branch is visible.
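The end-of-session version of that rule is a few lines of git. A sketch of the pattern; the branch naming is mine, not necessarily the kit's:

```bash
#!/usr/bin/env bash
# End-of-session cleanup sketch: park uncommitted work on a visible branch
# instead of `git stash`. Branch naming is invented for illustration.
set -euo pipefail

if [ -n "$(git status --porcelain)" ]; then
  branch="wip/session-$(date +%Y%m%d-%H%M%S)"
  git checkout -b "$branch"
  git add -A
  git commit -m "WIP: session ended with uncommitted work; parked for handover"
  git push -u origin "$branch"
  git checkout -   # return so the next session starts from the main line
fi
```

If the session dies before this runs, the dirty working tree itself is the signal; either way, nothing valuable hides in a stash.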
2. AI summaries of sessions are unreliable
The intuitive answer is: at the end of each session, ask Claude what it did. Use that as the handover for tomorrow's session.
Don't.
Claude will give you a confident, fluent answer that is partially fabricated. Not because it lies — because the model genuinely doesn't have perfect introspection over its own tool calls within a long context. It will report files it didn't touch. It will skip files it did. The discrepancy is small in any one session and catastrophic when it compounds across 30 sessions.
The actual source of truth is the transcript JSONL that Claude Code writes. Parse it. Extract tool calls deterministically. The handover should be structured data extracted from the transcript, not a prose summary written by the model.
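A minimal version of that extraction, assuming assistant entries in the JSONL carry `tool_use` content blocks with a `name` and an `input` object. That's the shape I've observed, but field names may shift between Claude Code versions:

```bash
#!/usr/bin/env bash
# Handover extraction sketch: pull tool calls out of the transcript JSONL.
# Assumes assistant entries carry content blocks of type "tool_use" with a
# tool name and an input object; field names may vary by version.
set -euo pipefail

TRANSCRIPT="$1"   # path to the session's .jsonl transcript

jq -r '
  select(.type == "assistant")
  | .message.content[]?
  | select(.type == "tool_use")
  | [.name, (.input.file_path // .input.command // "")]
  | @tsv
' "$TRANSCRIPT" | sort | uniq -c | sort -rn
```

The output is a count of tool-plus-target pairs: already a more trustworthy handover than any prose summary the model writes about itself.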
3. CLAUDE.md is your AI's constitution
Treat it like the US Constitution: default freedom, amendments only when something actually fails.
Mine has 9 amendments right now. Each one has a date, a context, and a specific incident:
- Amendment 7: Review your own code before committing. Added after `eval()` shipped to production because nothing intercepted it.
- Amendment 8: Every push must result in a merged PR. Added after 33 orphan branches accumulated from agents that pushed but never opened the PR.
- Amendment 9: Inaction is also failure. Added after a session spent 40 turns asking "should I proceed?" instead of executing the obviously-correct action.
The shape of these matters. They are not preventive rules invented from imagination. They are post-incident scar tissue. A constitution full of preemptive rules stops being read; a constitution where every line maps to a real bruise gets respected.
I'd rather have 9 amendments earned through pain than 90 written from anxiety.
What it looks like in numbers (May 2026)
After six months of refinement, here's what one MacBook Air sustains:
- 857 autonomous Claude Code sessions logged
- 840 PRs auto-created and auto-merged
- 17,328 evolution cycles across 49 background launchd services (trading bots, content engines, healthchecks, briefings)
- ~$2 average API cost per autonomous session
- 0 broken deployments shipped to anything user-facing
That last number is the one I'm most quietly proud of. The scope guard is boring. It just refuses to let Claude touch certain files (the constitution, the hooks themselves, the secrets). But "boring and refuses" is exactly what you want from a guardrail. It's caught about 40 attempted edits over six months — every one of them would have been embarrassing.
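For anyone curious, the entire idea fits in a few lines. A sketch, assuming a PreToolUse-style Claude Code hook that receives the tool call as JSON on stdin and can veto it by exiting with code 2; the protected paths here are illustrative, not the kit's actual config:

```bash
#!/usr/bin/env bash
# scope-guard sketch: veto edits to protected files.
# Assumes a PreToolUse-style hook: the tool call arrives as JSON on stdin,
# and exit code 2 blocks the call (stderr is fed back to Claude).
# The protected paths below are illustrative.
set -euo pipefail

PROTECTED=("CLAUDE.md" ".claude/hooks/" ".env")

payload=$(cat)                                   # hook input JSON
file=$(jq -r '.tool_input.file_path // empty' <<<"$payload")

[ -z "$file" ] && exit 0                         # not a file-editing tool call

for p in "${PROTECTED[@]}"; do
  if [[ "$file" == *"$p"* ]]; then
    echo "scope-guard: $file is protected; edit refused." >&2
    exit 2                                       # 2 = block the tool call
  fi
done
exit 0
```

The design choice is that the guard knows nothing about intent. It doesn't try to judge whether an edit is good; it just makes certain files unreachable.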
What still hurts
I want to be honest about the unfinished parts.
- Claude Code 2.1.113+ requires macOS 13. My 2015 MBA is stuck on Big Sur. I'm pinned to an older version until a Mac mini migration happens (planned for July 2026 with an M4 24GB — a separate decision documented in the repo).
- Hallucinations that look correct are still my biggest unsolved problem. When Claude confidently writes a function that compiles and sort of runs but is subtly wrong, no current guardrail catches it before merge. I rely on `python3 -m py_compile` and a human eyeball, neither of which is sufficient.
- Recovering from prolonged outages is manual. When Anthropic has a quality regression that lasts hours, my whole pipeline degrades, and I haven't built a "skip this session and resume tomorrow" path that's smarter than just letting healthcheck open Issues (sketched below).
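That fallback, the healthcheck, is deliberately dumb. A sketch of its shape, with the log path and six-hour threshold invented; `gh issue create` is the real GitHub CLI command:

```bash
#!/usr/bin/env bash
# Healthcheck sketch: runs independently of the sessions it watches.
# The log path and 6-hour threshold are illustrative.
set -euo pipefail

LOG="$HOME/.local/log/claude-sessions.log"
MAX_AGE=$((6 * 3600))   # alert if no session activity for 6 hours

# macOS stat: %m is the file's modification time as a Unix epoch.
age=$(( $(date +%s) - $(stat -f %m "$LOG" 2>/dev/null || echo 0) ))

if (( age > MAX_AGE )); then
  gh issue create \
    --title "healthcheck: no session activity for $((age / 3600))h" \
    --body "Last write to $LOG was $((age / 60)) minutes ago. Investigate."
fi
```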
These are the things I'd want a community of people doing the same thing to push back on.
Three open questions
If you're running Claude Code (or any agent) on a schedule:
- How do you handle the blank-slate problem? Hooks? System prompts that re-inject context? `/memory`? Something custom?
- Anyone else running Claude Code via launchd, cron, or systemd? What broke first when you tried it?
- For the autonomous-PR pattern — do you trust auto-merge, or do you have a human gate? I auto-merge with scope-guard as the safety net. Curious whether others gate harder.
I'd love to hear how other people have solved (or failed to solve) the same problems.
The repo
github.com/uzucky/claude-autonomous-kit
It includes two skills you can install today: autonomous-bootstrap (sets up the launchd plists and hooks for you) and autonomous-handover (the structured-extraction handover writer).
Discussion is also happening in the r/ClaudeAI Project Showcase Megathread if you prefer that channel.
If something here resonated — or if you've solved one of the open questions better than I have — I'd genuinely like to hear from you. Issues, PRs, or just a comment below all work.