Here's the short version: I went from an empty folder to a published npm package in one sitting. Nine TypeScript files. Forty-eight tests. Real cryptographic security. One external dependency. And I deployed the hosting infrastructure on top of that.
This isn't a story about asking ChatGPT to write code and hoping for the best. This is what happens when you treat AI like a team — with roles, rules, and accountability.
The Problem I Needed to Solve
I write MCP servers in Rust. (MCP is a protocol that lets AI tools talk to external services.) The problem? The tools I use only support packages from npm, PyPI, or container registries. There's no built-in way to run a compiled binary.
So if someone wanted to use my server, they had to manually download it, unzip it, set permissions, and wire up the file path. That's a lot of friction.
I wanted it to be this simple:
```json
{
  "mcpServers": {
    "local-memory-mcp": {
      "command": "npx",
      "args": ["-y", "@mcp-bin/runner", "local-memory-mcp", "0.2.1"]
    }
  }
}
```
First run? Downloads the binary in about five seconds. Every run after that? Pulls from cache in under 100 milliseconds. That's it. That's the whole user experience.
Step One: Talk It Out, Then Spec It Out
Most people start a project by writing code. I started by having a conversation.
I sat down with Kiro — an AI coding agent — and just talked through what I wanted. Not in formal requirements language. Just: "Here's the problem. Here's how I want it to work. Here's what I care about." I described the user experience, the security concerns, the edge cases I was worried about. Kiro asked clarifying questions, pushed back on some assumptions, and helped me think through trade-offs.
From that conversation, Kiro generated a full requirements specification:
- 41 numbered requirements
- 12 security requirements
- 15 error codes
- 11 test scenarios
I didn't write that spec by hand. It came out of a back-and-forth dialogue — me explaining intent, Kiro turning it into a structured contract. Think of it like describing your dream house to an architect and walking away with a blueprint. The ideas were mine, but the precision came from the collaboration.
This is an important point: the spec wasn't an upfront tax I had to pay. It was a natural byproduct of explaining what I wanted.
Step Two: Let AI Reviewers Tear It Apart
Here's where it gets interesting. I set up 7 AI "review personas" — each one an expert in a different area:
| Persona | What They Focus On |
|---|---|
| Security | Crypto, input validation, file safety |
| Scalability | Performance, caching, handling multiple users |
| Reliability | Correctness, timeouts, cleanup |
| Maintainability | Code organization, tests, API design |
| Marketability | Developer experience, error messages, ease of adoption |
| Resilience | Failure recovery, state consistency |
| Profitability | Scope discipline, build vs. buy decisions |
I ran all seven against the spec. Not the code — the spec. And they found problems. Big ones.
The first review flagged 6 Critical and 17 High-severity issues. Some highlights:
- The checksum verification was useless. I was planning to verify file checksums against a manifest, but if an attacker compromises the server, they control both the file and the checksum. It's like locking your front door but leaving the key under the mat.
- No protection against path traversal. When you extract a zip file, a malicious archive could write files outside the intended folder. Classic security hole.
- No file locking. If two processes tried to update the cache at the same time, they'd corrupt each other's data.
I fixed every single one. Re-ran the reviews. Zero Critical, zero High. Now I had something worth building.
Step Three: Design the Interfaces First
Before writing any real code, I designed the contracts between components — basically, "here's what each piece expects as input and gives as output."
```typescript
type CacheLookupResult =
  | { hit: true; binaryPath: string }
  | { hit: false };
```
Why does this matter? Because once you agree on the interfaces, you can build everything at the same time. The downloader doesn't need to wait for the cache manager to be finished. They just need to agree on the shape of the data they'll pass back and forth.
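Here's a minimal sketch of what consuming that contract might look like in the runner's main path. The helper names (`lookupCache`, `downloadAndCache`, `resolveBinary`) are placeholders for illustration, not the package's real API:

```typescript
// A minimal sketch of consuming the discriminated union. lookupCache and
// downloadAndCache are hypothetical stand-ins for the real modules.
type CacheLookupResult =
  | { hit: true; binaryPath: string }
  | { hit: false };

// Stub implementations so the sketch stands alone.
async function lookupCache(server: string, version: string): Promise<CacheLookupResult> {
  return { hit: false };
}

async function downloadAndCache(server: string, version: string): Promise<string> {
  return `/tmp/${server}-${version}/bin`;
}

async function resolveBinary(server: string, version: string): Promise<string> {
  const result = await lookupCache(server, version);
  if (result.hit) {
    // TypeScript narrows the union here, so binaryPath is guaranteed to exist.
    return result.binaryPath;
  }
  // Cache miss: download, verify, extract, store, then return the new path.
  return downloadAndCache(server, version);
}
```

Whoever builds the cache manager and whoever builds the downloader can both code against this shape without waiting on each other.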
I also set up a "Chief Architect" persona whose job was to settle disagreements. When four reviewers wanted to expand a security denylist, the Chief Architect said no — use a simple escape hatch instead of maintaining an ever-growing list. One decision, one line of config, problem solved.
Step Four: Build Everything in Parallel
Five components, built simultaneously by independent AI agents:
- Downloader — fetches files over HTTPS with automatic retries
- Extractor — unpacks archives safely (blocks path traversal tricks)
- Cache Manager — stores binaries with file locking so nothing gets corrupted
- Manifest Client — verifies cryptographic signatures on the server registry
- Process Runner — launches the binary and forwards system signals cleanly
Because every component was coded against the same interfaces, they snapped together like LEGO bricks. No "wait, your output doesn't match my input" surprises.
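For a sense of what those shared contracts looked like, here's an illustrative sketch. The names and shapes are representative, not the package's actual exports:

```typescript
// Illustrative contracts for the five components. Each agent implemented one
// of these in isolation; only the shapes below were shared up front.
interface Downloader {
  download(url: string, destPath: string): Promise<void>; // HTTPS only, with retries
}

interface Extractor {
  extract(archivePath: string, destDir: string): Promise<void>; // rejects path traversal
}

interface CacheManager {
  lookup(server: string, version: string): Promise<{ hit: true; binaryPath: string } | { hit: false }>;
  store(server: string, version: string, binaryPath: string): Promise<string>; // atomic, with locking
}

interface ManifestClient {
  resolve(server: string, version: string): Promise<{ url: string; sha256: string }>; // signature-verified
}

interface ProcessRunner {
  run(binaryPath: string, args: string[]): Promise<number>; // forwards signals, returns exit code
}
```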
Step Five: Review After Every Phase
After each round of implementation, all 7 personas reviewed again. This caught things that only show up in real code:
- HTTP redirects weren't followed. GitHub Releases always redirects you to a CDN. Node.js's built-in HTTPS module doesn't follow redirects automatically. This would have silently failed for every single user.
- Signal handlers skipped cleanup. When you force-quit a program, `finally` blocks don't always run. Lock files would have been left behind, blocking future runs.
- Cached signatures could be replayed. An attacker could reuse an old signature to trick the system into accepting an outdated (potentially vulnerable) manifest.
Each of these would have been a real bug in production. Caught before shipping.
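To give a feel for the redirect fix, here's a hedged sketch of following redirects manually with Node's built-in https module while keeping the HTTPS-only rule. It's simplified and not the actual downloader:

```typescript
// Node's https module does not follow redirects on its own, and GitHub
// Releases always redirects to a CDN. This sketch follows a bounded number
// of redirects and refuses any hop that isn't HTTPS.
import { get } from "node:https";
import type { IncomingMessage } from "node:http";

function fetchFollowingRedirects(url: string, remaining = 5): Promise<IncomingMessage> {
  return new Promise((resolve, reject) => {
    get(url, (res) => {
      const status = res.statusCode ?? 0;
      const location = res.headers.location;
      if (status >= 300 && status < 400 && location) {
        res.resume(); // discard the redirect body
        if (remaining === 0) return reject(new Error("too many redirects"));
        const next = new URL(location, url); // location may be relative
        if (next.protocol !== "https:") return reject(new Error("insecure redirect"));
        return resolve(fetchFollowingRedirects(next.toString(), remaining - 1));
      }
      if (status !== 200) return reject(new Error(`unexpected status ${status}`));
      resolve(res); // stream the response body to disk from here
    }).on("error", reject);
  });
}
```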
The Final Scorecard
| What | How Much |
|---|---|
| Total time | ~6.5 hours |
| Source files | 9 modules |
| Production code | ~750 lines |
| Tests | 48 (38 unit + 10 integration) |
| External dependencies | 1 |
| Architecture decisions documented | 10 |
| Critical/High bugs fixed before shipping | 14 |
Why Bounded Scope Is the Secret Sauce
Early on, I had a noise problem. Four personas would flag the same issue. The false positive rate was 30-40%. Research shows developers stop reading AI review comments after a couple of sprints of noise — they just tune it out.
The fix was simple but powerful:
- Each persona owns specific concerns exclusively — no overlap
- Each persona has a "do NOT review" list
- "No findings" is a perfectly valid output
- Every finding must include a Tradeoff field — what does the fix cost?
After making these changes, the false positive rate dropped below 10%. Every comment was actionable. No duplicates.
This is fundamentally different from tools like GitHub Copilot code review, which run one model with one prompt and produce overlapping, noisy output.
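If it helps to picture it, here's a rough sketch of how a bounded persona could be encoded. The field names and the example scope are illustrative of the rules above, not the exact prompts I used:

```typescript
// Illustrative shape for a bounded review persona: exclusive ownership,
// an explicit do-not-review list, and a required tradeoff on every finding.
interface ReviewPersona {
  name: string;
  owns: string[];        // concerns this persona reviews, exclusively
  doNotReview: string[]; // concerns explicitly out of scope
}

interface Finding {
  severity: "Critical" | "High" | "Medium" | "Low";
  description: string;
  tradeoff: string;      // required: what does the fix cost?
}

const security: ReviewPersona = {
  name: "Security",
  owns: ["cryptography", "input validation", "file safety"],
  doNotReview: ["performance", "code style", "developer experience"],
};

// "No findings" is a perfectly valid review result.
const emptyReview: Finding[] = [];
```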
The Security Model (Because It Matters)
This package executes binaries on your machine. The trust chain has to be solid:
- The manifest (the list of available servers) is cryptographically signed
- Every downloaded archive is verified against a checksum before extraction
- Archive extraction blocks path traversal, symlinks, and absolute paths
- All downloads are HTTPS-only — even after redirects
- Sensitive environment variables are stripped before launching the binary
- The cache uses atomic writes with file locking — no half-written files
The Security persona reviewed the project four separate times across its lifecycle. Security wasn't an afterthought — it was baked into every phase.
The full security review cycle:
- Spec phase: Caught missing manifest signing, path traversal, no file locking
- Design phase: Validated crypto choices (Ed25519 over RSA), env var filtering approach
- Implementation phase: Found missing HTTPS enforcement on redirects, signal handler cleanup gaps
- Integration phase: Identified cached signature replay attack vector
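To make the extraction guard concrete, here's a hedged sketch of the kind of check involved. It's simplified and not the actual extractor code:

```typescript
// Sketch of a path-traversal guard: resolve each archive entry's target path
// and refuse anything that would land outside the destination directory.
import * as path from "node:path";

function safeEntryPath(destDir: string, entryName: string): string {
  if (path.isAbsolute(entryName)) {
    throw new Error(`absolute path in archive: ${entryName}`);
  }
  const resolvedDest = path.resolve(destDir);
  const target = path.resolve(resolvedDest, entryName);
  // An entry like "../../etc/passwd" resolves outside destDir and is rejected here.
  if (target !== resolvedDest && !target.startsWith(resolvedDest + path.sep)) {
    throw new Error(`path traversal attempt in archive: ${entryName}`);
  }
  return target;
}
```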
Deploying the Infrastructure
After publishing to npm, I needed somewhere to host the manifest. Set up in about 5 minutes:
- S3 bucket for the registry files
- CloudFront for HTTPS delivery
- ACM certificate for the custom domain
- Route53 to point the domain at CloudFront
Adding a new server version is a few shell commands — update the manifest, sign it, upload it. Done.
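The signing step itself is just Node's built-in crypto. Here's a rough sketch of what Ed25519 signing and verification can look like; the file names and signature format are illustrative, not the actual registry layout:

```typescript
// Sketch of manifest signing/verification with Node's built-in Ed25519 support.
// Key and file names here are placeholders, not the real mcp-bin layout.
import { createPrivateKey, createPublicKey, sign, verify } from "node:crypto";
import { readFileSync, writeFileSync } from "node:fs";

const manifestBytes = readFileSync("manifest.json");

// Ed25519 takes no digest algorithm, hence the `null` first argument.
const privateKey = createPrivateKey(readFileSync("signing-key.pem"));
const signature = sign(null, manifestBytes, privateKey);
writeFileSync("manifest.json.sig", signature.toString("base64"));

// The runner does the inverse against a pinned public key before trusting the manifest.
const publicKey = createPublicKey(readFileSync("signing-key.pub.pem"));
const ok = verify(null, manifestBytes, publicKey, signature);
console.log(ok ? "signature valid" : "signature INVALID");
```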
How This Compares to "Just Vibing with AI"
I use AI coding tools every day for quick fixes and exploration. This was different. This was orchestrated.
| Approach | Great For | Not Great For |
|---|---|---|
| Interactive (Cursor, Claude Code) | Quick fixes, exploration, refactoring | Multi-file architecture, consistency |
| Autonomous (Devin-style) | Boilerplate, migrations | Nuanced logic (67% merge rate in benchmarks) |
| Spec-driven + review gates (this approach) | Correctness, security, parallel work | Requires ~35 min upfront conversation |
The spec costs about 35 minutes of conversation upfront. Not writing — just talking through what you want and letting the AI structure it. It pays back immediately: parallel implementation, no rework, and security issues caught before code even exists.
Try It
Install @mcp-bin/runner on npm
```bash
npx @mcp-bin/runner <server-name> <version>
```
The full source is on GitHub at chriswessells/mcp-bin. Here's the README overview:
mcp-bin
A generic runner for distributing prebuilt native MCP servers through Kiro's npm-based MCP registry.
The Problem
Kiro's MCP registry supports npm, pypi, and oci packages — but not compiled binaries from languages like Rust, Go, or C++. Server authors have to rely on manual installation, which doesn't integrate with enterprise registry allowlists or provide automatic versioning and caching.
How It Works
```mermaid
flowchart TD
  A["npx @mcp-bin/runner server version"] --> B[Signed Manifest]
  B --> C[GitHub Releases]
  C -->|"SHA256 verified"| D[~/.cache/mcp-bin/]
  D --> E[exec over stdio]
```
- Runner (`@mcp-bin/runner`) — downloads, verifies, caches, and executes native binaries
- Manifest — Ed25519-signed JSON mapping `{server, version, platform}` → download URL + SHA256
- Server binaries — prebuilt `.tar.gz` archives on GitHub Releases
Install
```bash
npm install -g @mcp-bin/runner
```

Or use directly via npx (no install needed):

```bash
npx @mcp-bin/runner <server-name> <version>
```
The full development process — every spec, design doc, review finding, and architecture decision — is in the repo. Everything about how this software was built is transparent and version-controlled.
What I Took Away from This
- Writing a spec first isn't slow — it's faster. Especially when you're not writing it alone. A 35-minute conversation with an AI agent produced a spec that eliminated the entire rework cycle.
- Give each AI reviewer a lane. Exclusive ownership eliminates duplicate noise.
- Define interfaces before code. It's the unlock for parallel implementation.
- Have a "Chief Architect" to say no. Without one, review pressure slowly inflates scope.
- The process matters more than the code. The code is just the output. The process is what makes it trustworthy.
The AI coding world is optimizing for either speed or correctness. This workflow gets both — speed from parallel execution, correctness from structured review at every phase.
Have you tried structuring AI-assisted development with review gates or personas? I'd love to hear what worked (or didn't) for you — drop a comment below.
This article was originally published by DEV Community and written by Chris Wessells.
Read original article on DEV Community