Here's the short version: I went from an empty folder to a published npm package in one sitting. Nine TypeScript files. Forty-eight tests. Real cryptographic security. One external dependency. And I deployed the hosting infrastructure on top of that.
This isn't a story about asking ChatGPT to write code and hoping for the best. This is what happens when you treat AI like a team — with roles, rules, and accountability.
The Problem I Needed to Solve
I write MCP servers in Rust. (MCP is a protocol that lets AI tools talk to external services.) The problem? The tools I use only support packages from npm, PyPI, or container registries. There's no built-in way to run a compiled binary.
So if someone wanted to use my server, they had to manually download it, unzip it, set permissions, and wire up the file path. That's a lot of friction.
I wanted it to be this simple:
```json
{
  "mcpServers": {
    "local-memory-mcp": {
      "command": "npx",
      "args": ["-y", "@mcp-bin/runner", "local-memory-mcp", "0.2.1"]
    }
  }
}
```
First run? Downloads the binary in about five seconds. Every run after that? Pulls from cache in under 100 milliseconds. That's it. That's the whole user experience.
Step One: Talk It Out, Then Spec It Out
Most people start a project by writing code. I started by having a conversation.
I sat down with Kiro — an AI coding agent — and just talked through what I wanted. Not in formal requirements language. Just: "Here's the problem. Here's how I want it to work. Here's what I care about." I described the user experience, the security concerns, the edge cases I was worried about. Kiro asked clarifying questions, pushed back on some assumptions, and helped me think through trade-offs.
From that conversation, Kiro generated a full requirements specification:
- 41 numbered requirements
- 12 security requirements
- 15 error codes
- 11 test scenarios
I didn't write that spec by hand. It came out of a back-and-forth dialogue — me explaining intent, Kiro turning it into a structured contract. Think of it like describing your dream house to an architect and walking away with a blueprint. The ideas were mine, but the precision came from the collaboration.
This is an important point: the spec wasn't an upfront tax I had to pay. It was a natural byproduct of explaining what I wanted.
Step Two: Let AI Reviewers Tear It Apart
Here's where it gets interesting. I set up 7 AI "review personas" — each one an expert in a different area:
| Persona | What They Focus On |
|---|---|
| Security | Crypto, input validation, file safety |
| Scalability | Performance, caching, handling multiple users |
| Reliability | Correctness, timeouts, cleanup |
| Maintainability | Code organization, tests, API design |
| Marketability | Developer experience, error messages, ease of adoption |
| Resilience | Failure recovery, state consistency |
| Profitability | Scope discipline, build vs. buy decisions |
I ran all seven against the spec. Not the code — the spec. And they found problems. Big ones.
The first review flagged 6 Critical and 17 High-severity issues. Some highlights:
- The checksum verification was useless. I was planning to verify file checksums against a manifest, but if an attacker compromises the server, they control both the file and the checksum. It's like locking your front door but leaving the key under the mat.
- No protection against path traversal. When you extract a zip file, a malicious archive could write files outside the intended folder. Classic security hole.
- No file locking. If two processes tried to update the cache at the same time, they'd corrupt each other's data.
I fixed every single one. Re-ran the reviews. Zero Critical, zero High. Now I had something worth building.
Step Three: Design the Interfaces First
Before writing any real code, I designed the contracts between components — basically, "here's what each piece expects as input and gives as output."
```typescript
type CacheLookupResult =
  | { hit: true; binaryPath: string }
  | { hit: false };
```
Why does this matter? Because once you agree on the interfaces, you can build everything at the same time. The downloader doesn't need to wait for the cache manager to be finished. They just need to agree on the shape of the data they'll pass back and forth.
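Here's a minimal sketch of what consuming that contract might look like in the runner's main path. The helper names (`lookupCache`, `downloadAndCache`, `resolveBinary`) are placeholders for illustration, not the package's real API:

```typescript
// A minimal sketch of consuming the discriminated union. lookupCache and
// downloadAndCache are hypothetical stand-ins for the real modules.
type CacheLookupResult =
  | { hit: true; binaryPath: string }
  | { hit: false };

// Stub implementations so the sketch stands alone.
async function lookupCache(server: string, version: string): Promise<CacheLookupResult> {
  return { hit: false };
}

async function downloadAndCache(server: string, version: string): Promise<string> {
  return `/tmp/${server}-${version}/bin`;
}

async function resolveBinary(server: string, version: string): Promise<string> {
  const result = await lookupCache(server, version);
  if (result.hit) {
    // TypeScript narrows the union here, so binaryPath is guaranteed to exist.
    return result.binaryPath;
  }
  // Cache miss: download, verify, extract, store, then return the new path.
  return downloadAndCache(server, version);
}
```

Whoever builds the cache manager and whoever builds the downloader can both code against this shape without waiting on each other.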
I also set up a "Chief Architect" persona whose job was to settle disagreements. When four reviewers wanted to expand a security denylist, the Chief Architect said no — use a simple escape hatch instead of maintaining an ever-growing list. One decision, one line of config, problem solved.
Step Four: Build Everything in Parallel
Five components, built simultaneously by independent AI agents:
- Downloader — fetches files over HTTPS with automatic retries
- Extractor — unpacks archives safely (blocks path traversal tricks)
- Cache Manager — stores binaries with file locking so nothing gets corrupted
- Manifest Client — verifies cryptographic signatures on the server registry
- Process Runner — launches the binary and forwards system signals cleanly
Because every component was coded against the same interfaces, they snapped together like LEGO bricks. No "wait, your output doesn't match my input" surprises.
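For a sense of what those shared contracts looked like, here's an illustrative sketch. The names and shapes are representative, not the package's actual exports:

```typescript
// Illustrative contracts for the five components. Each agent implemented one
// of these in isolation; only the shapes below were shared up front.
interface Downloader {
  download(url: string, destPath: string): Promise<void>; // HTTPS only, with retries
}

interface Extractor {
  extract(archivePath: string, destDir: string): Promise<void>; // rejects path traversal
}

interface CacheManager {
  lookup(server: string, version: string): Promise<{ hit: true; binaryPath: string } | { hit: false }>;
  store(server: string, version: string, binaryPath: string): Promise<string>; // atomic, with locking
}

interface ManifestClient {
  resolve(server: string, version: string): Promise<{ url: string; sha256: string }>; // signature-verified
}

interface ProcessRunner {
  run(binaryPath: string, args: string[]): Promise<number>; // forwards signals, returns exit code
}
```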
Step Five: Review After Every Phase
After each round of implementation, all 7 personas reviewed again. This caught things that only show up in real code:
- HTTP redirects weren't followed. GitHub Releases always redirects you to a CDN. Node.js's built-in HTTPS module doesn't follow redirects automatically. This would have silently failed for every single user.
- Signal handlers skipped cleanup. When you force-quit a program, `finally` blocks don't always run. Lock files would have been left behind, blocking future runs.
- Cached signatures could be replayed. An attacker could reuse an old signature to trick the system into accepting an outdated (potentially vulnerable) manifest.
Each of these would have been a real bug in production. Caught before shipping.
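To give a feel for the redirect fix, here's a hedged sketch of following redirects manually with Node's built-in https module while keeping the HTTPS-only rule. It's simplified and not the actual downloader:

```typescript
// Node's https module does not follow redirects on its own, and GitHub
// Releases always redirects to a CDN. This sketch follows a bounded number
// of redirects and refuses any hop that isn't HTTPS.
import { get } from "node:https";
import type { IncomingMessage } from "node:http";

function fetchFollowingRedirects(url: string, remaining = 5): Promise<IncomingMessage> {
  return new Promise((resolve, reject) => {
    get(url, (res) => {
      const status = res.statusCode ?? 0;
      const location = res.headers.location;
      if (status >= 300 && status < 400 && location) {
        res.resume(); // discard the redirect body
        if (remaining === 0) return reject(new Error("too many redirects"));
        const next = new URL(location, url); // location may be relative
        if (next.protocol !== "https:") return reject(new Error("insecure redirect"));
        return resolve(fetchFollowingRedirects(next.toString(), remaining - 1));
      }
      if (status !== 200) return reject(new Error(`unexpected status ${status}`));
      resolve(res); // stream the response body to disk from here
    }).on("error", reject);
  });
}
```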
The Final Scorecard
| What | How Much |
|---|---|
| Total time | ~6.5 hours |
| Source files | 9 modules |
| Production code | ~750 lines |
| Tests | 48 (38 unit + 10 integration) |
| External dependencies | 1 |
| Architecture decisions documented | 10 |
| Critical/High bugs fixed before shipping | 14 |
Why Bounded Scope Is the Secret Sauce
Early on, I had a noise problem. Four personas would flag the same issue. The false positive rate was 30-40%. Research shows developers stop reading AI review comments after a couple of sprints of noise — they just tune it out.
The fix was simple but powerful:
- Each persona owns specific concerns exclusively — no overlap
- Each persona has a "do NOT review" list
- "No findings" is a perfectly valid output
- Every finding must include a Tradeoff field — what does the fix cost?
After making these changes, the false positive rate dropped below 10%. Every comment was actionable. No duplicates.
This is fundamentally different from tools like GitHub Copilot code review, which run one model with one prompt and produce overlapping, noisy output.
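If it helps to picture it, here's a rough sketch of how a bounded persona could be encoded. The field names and the example scope are illustrative of the rules above, not the exact prompts I used:

```typescript
// Illustrative shape for a bounded review persona: exclusive ownership,
// an explicit do-not-review list, and a required tradeoff on every finding.
interface ReviewPersona {
  name: string;
  owns: string[];        // concerns this persona reviews, exclusively
  doNotReview: string[]; // concerns explicitly out of scope
}

interface Finding {
  severity: "Critical" | "High" | "Medium" | "Low";
  description: string;
  tradeoff: string;      // required: what does the fix cost?
}

const security: ReviewPersona = {
  name: "Security",
  owns: ["cryptography", "input validation", "file safety"],
  doNotReview: ["performance", "code style", "developer experience"],
};

// "No findings" is a perfectly valid review result.
const emptyReview: Finding[] = [];
```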
The Security Model (Because It Matters)
This package executes binaries on your machine. The trust chain has to be solid:
- The manifest (the list of available servers) is cryptographically signed
- Every downloaded archive is verified against a checksum before extraction
- Archive extraction blocks path traversal, symlinks, and absolute paths
- All downloads are HTTPS-only — even after redirects
- Sensitive environment variables are stripped before launching the binary
- The cache uses atomic writes with file locking — no half-written files
The Security persona reviewed the project four separate times across its lifecycle. Security wasn't an afterthought — it was baked into every phase.
The full security review cycle:
- Spec phase: Caught missing manifest signing, path traversal, no file locking
- Design phase: Validated crypto choices (Ed25519 over RSA), env var filtering approach
- Implementation phase: Found missing HTTPS enforcement on redirects, signal handler cleanup gaps
- Integration phase: Identified cached signature replay attack vector
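To make the extraction guard concrete, here's a hedged sketch of the kind of check involved. It's simplified and not the actual extractor code:

```typescript
// Sketch of a path-traversal guard: resolve each archive entry's target path
// and refuse anything that would land outside the destination directory.
import * as path from "node:path";

function safeEntryPath(destDir: string, entryName: string): string {
  if (path.isAbsolute(entryName)) {
    throw new Error(`absolute path in archive: ${entryName}`);
  }
  const resolvedDest = path.resolve(destDir);
  const target = path.resolve(resolvedDest, entryName);
  // An entry like "../../etc/passwd" resolves outside destDir and is rejected here.
  if (target !== resolvedDest && !target.startsWith(resolvedDest + path.sep)) {
    throw new Error(`path traversal attempt in archive: ${entryName}`);
  }
  return target;
}
```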
Deploying the Infrastructure
After publishing to npm, I needed somewhere to host the manifest. Set up in about 5 minutes:
- S3 bucket for the registry files
- CloudFront for HTTPS delivery
- ACM certificate for the custom domain
- Route53 to point the domain at CloudFront
Adding a new server version is a few shell commands — update the manifest, sign it, upload it. Done.
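The signing step itself is just Node's built-in crypto. Here's a rough sketch of what Ed25519 signing and verification can look like; the file names and signature format are illustrative, not the actual registry layout:

```typescript
// Sketch of manifest signing/verification with Node's built-in Ed25519 support.
// Key and file names here are placeholders, not the real mcp-bin layout.
import { createPrivateKey, createPublicKey, sign, verify } from "node:crypto";
import { readFileSync, writeFileSync } from "node:fs";

const manifestBytes = readFileSync("manifest.json");

// Ed25519 takes no digest algorithm, hence the `null` first argument.
const privateKey = createPrivateKey(readFileSync("signing-key.pem"));
const signature = sign(null, manifestBytes, privateKey);
writeFileSync("manifest.json.sig", signature.toString("base64"));

// The runner does the inverse against a pinned public key before trusting the manifest.
const publicKey = createPublicKey(readFileSync("signing-key.pub.pem"));
const ok = verify(null, manifestBytes, publicKey, signature);
console.log(ok ? "signature valid" : "signature INVALID");
```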
How This Compares to "Just Vibing with AI"
I use AI coding tools every day for quick fixes and exploration. This was different. This was orchestrated.
| Approach | Great For | Not Great For |
|---|---|---|
| Interactive (Cursor, Claude Code) | Quick fixes, exploration, refactoring | Multi-file architecture, consistency |
| Autonomous (Devin-style) | Boilerplate, migrations | Nuanced logic (67% merge rate in benchmarks) |
| Spec-driven + review gates (this approach) | Correctness, security, parallel work | Requires ~35 min upfront conversation |
The spec costs about 35 minutes of conversation upfront. Not writing — just talking through what you want and letting the AI structure it. It pays back immediately: parallel implementation, no rework, and security issues caught before code even exists.
Try It
Install @mcp-bin/runner on npm
```bash
npx @mcp-bin/runner <server-name> <version>
```
The full source is on GitHub at chriswessells/mcp-bin. Here's the README overview:
mcp-bin
A generic runner for distributing prebuilt native MCP servers through Kiro's npm-based MCP registry.
The Problem
Kiro's MCP registry supports npm, pypi, and oci packages — but not compiled binaries from languages like Rust, Go, or C++. Server authors have to rely on manual installation, which doesn't integrate with enterprise registry allowlists or provide automatic versioning and caching.
How It Works
```mermaid
flowchart TD
  A["npx @mcp-bin/runner server version"] --> B[Signed Manifest]
  B --> C[GitHub Releases]
  C -->|"SHA256 verified"| D[~/.cache/mcp-bin/]
  D --> E[exec over stdio]
```
- Runner (`@mcp-bin/runner`) — downloads, verifies, caches, and executes native binaries
- Manifest — Ed25519-signed JSON mapping `{server, version, platform}` → download URL + SHA256
- Server binaries — prebuilt `.tar.gz` archives on GitHub Releases
Install
```bash
npm install -g @mcp-bin/runner
```

Or use directly via npx (no install needed):

```bash
npx @mcp-bin/runner <server-name> <version>
```
The full development process — every spec, design doc, review finding, and architecture decision — is in the repo. Everything about how this software was built is transparent and version-controlled.
What I Took Away from This
- Writing a spec first isn't slow — it's faster. Especially when you're not writing it alone. A 35-minute conversation with an AI agent produced a spec that eliminated the entire rework cycle.
- Give each AI reviewer a lane. Exclusive ownership eliminates duplicate noise.
- Define interfaces before code. It's the unlock for parallel implementation.
- Have a "Chief Architect" to say no. Without one, review pressure slowly inflates scope.
- The process matters more than the code. The code is just the output. The process is what makes it trustworthy.
The AI coding world is optimizing for either speed or correctness. This workflow gets both — speed from parallel execution, correctness from structured review at every phase.
Have you tried structuring AI-assisted development with review gates or personas? I'd love to hear what worked (or didn't) for you — drop a comment below.
This article was originally published by DEV Community and written by Chris Wessells.
Read original article on DEV Community