The setup
I was building an AI article generator. Four phases:
- Research — pull live SERP data, extract entities, score competitors
- Brief — Claude Sonnet writes a structured brief from the SERP
- Draft — section-by-section drafting in a fixed persona voice
- Polish — adversarial review pass that flags AI tells
End-to-end: 6–8 minutes. Each phase: 60–180 seconds. Multiple LLM calls per phase. Token streaming back to the user the whole time.
The first version was a single Vercel API route. It worked locally. It died in production.
Why Vercel kills you at 300s
Vercel Pro tops out at 300s per function. My pipeline needs 480s+. A naive await chain in a single route returns a 504 halfway through the draft phase.
Worse: the user closes the tab. The HTTP connection drops. The function dies. State is lost. The draft is gone. Refund issued.
This is the real problem. Long-running AI workflows can't live in request–response. They need durability — the workflow continues even when the user disconnects, and resumes streaming when they come back.
That's a durable workflow engine. Two real options:
- AWS Step Functions — battle-tested, AWS-native, ASL definition language
- Inngest — newer, Next.js-native, code-as-config
I tried both. Here's the decision tree.
How AWS Step Functions works
Step Functions is a state-machine service. You define the workflow in Amazon States Language (ASL) — a JSON DSL with states like Task, Choice, Parallel, Wait. Each Task points to a Lambda function (or another AWS service: SQS, DynamoDB, EventBridge, etc).
```json
{
  "Comment": "Article pipeline",
  "StartAt": "Research",
  "States": {
    "Research": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123:function:research",
      "Next": "Brief"
    },
    "Brief": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123:function:brief",
      "Next": "Draft"
    },
    "Draft": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123:function:draft",
      "Next": "Polish"
    },
    "Polish": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123:function:polish",
      "End": true
    }
  }
}
```
You upload Lambdas. You define the state machine. Step Functions executes it. Each state's output becomes the next state's input. Failures retry per the ASL config.
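The output-becomes-input chaining can be sketched as a toy interpreter (illustrative TypeScript, not real ASL tooling or the AWS SDK; the state names mirror the pipeline phases, and the handler bodies are made-up stand-ins for Lambdas):

```typescript
// Toy model of the Step Functions execution model: each state is a
// handler, and the engine feeds one state's output into the next state.
type Handler = (input: unknown) => unknown;

interface ToyMachine {
  startAt: string;
  states: Record<string, { handler: Handler; next?: string }>;
}

function execute(machine: ToyMachine, input: unknown): unknown {
  let current: string | undefined = machine.startAt;
  let payload = input;
  while (current) {
    const state = machine.states[current];
    if (!state) throw new Error(`Unknown state: ${current}`);
    payload = state.handler(payload); // output becomes the next state's input
    current = state.next;             // no "next" means a terminal state
  }
  return payload;
}

// Two states standing in for the Research → Brief chain above
const pipeline: ToyMachine = {
  startAt: "Research",
  states: {
    Research: {
      handler: (kw) => ({ keyword: kw, serp: ["r1", "r2"] }),
      next: "Brief",
    },
    Brief: {
      handler: (r: any) => ({ brief: `brief for ${r.keyword}` }),
    },
  },
};

console.log(execute(pipeline, "durable workflows"));
// → { brief: 'brief for durable workflows' }
```

The real service adds retries, timeouts, and per-state IAM on top, but the data flow is exactly this chain.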
Strengths:
- 220+ AWS service integrations native (no Lambda wrapper needed)
- Visual workflow studio (debugging is genuinely good)
- Pay per state transition ($0.025 per 1k transitions)
- Battle-tested at FAANG scale
- ASL is a portable standard
Weaknesses:
- ASL is a separate language from your app code
- Workflow definition lives outside your Next.js repo (Terraform / CDK)
- Lambda cold starts on every state transition
- AWS-account-coupled — local dev requires LocalStack or `sam local`
- Vercel hosting + AWS Step Functions = two bills, two consoles, two deploy flows
How Inngest works
Inngest is also a durable workflow engine, but the programming model is different. You write a TypeScript function with step.run() calls. Inngest invokes your function as an HTTP endpoint, executes one step, persists the result, then re-invokes the function with the next step.
```typescript
import { Inngest } from "inngest";

const inngest = new Inngest({ id: "briefworks" });

export const generateArticle = inngest.createFunction(
  { id: "generate-article" },
  { event: "article.requested" },
  async ({ event, step }) => {
    const research = await step.run("research", async () =>
      runSerpResearch(event.data.keyword)
    );
    const brief = await step.run("brief", async () =>
      generateBrief(research)
    );
    const draft = await step.run("draft", async () =>
      draftSections(brief)
    );
    const polished = await step.run("polish", async () =>
      reviewDraft(draft)
    );
    return polished;
  }
);
```
The function lives in your Next.js repo at `/api/inngest`. You deploy with the rest of your app. No separate infrastructure.
Strengths:
- One repo, one deploy
- TypeScript everywhere, no DSL
- Local dev is `npx inngest-cli dev` — actual local replay
- Built-in concurrency, throttling, debounce, idempotency
- Step memoization (more on this below)
- Free tier covers most pre-revenue projects
Weaknesses:
- Smaller ecosystem (no 220+ AWS native integrations)
- Newer (founded 2022 — you're betting on the company)
- Self-hosting is possible but not the default path
- Visual debugging is improving but not Step-Functions tier
The technical fork — replay vs memoization
This is the part nobody explains clearly.
Step Functions replay model: Step Functions never re-runs your code. It calls Lambda once per state. The state machine itself holds the workflow position. Each Lambda is stateless and only knows its inputs.
Inngest memoization model: Inngest re-invokes your entire function on each step, but step.run() checks if that step's result is already persisted. If yes, it returns the cached result without executing the inner closure. If no, it runs the closure and persists.
The implication: in Inngest, code outside step.run() runs every time the function re-invokes. If you have side effects outside step.run(), you'll execute them N times for N steps.
```typescript
// WRONG — this top-level log runs on every re-invocation,
// so a 4-step workflow logs it 4+ times
console.log("Starting article gen");
const research = await step.run("research", () => runSerp());
const brief = await step.run("brief", () => writeBrief(research));
```

```typescript
// RIGHT — wrap side effects in step.run() so they execute exactly once
await step.run("log-start", () => console.log("Starting"));
const research = await step.run("research", () => runSerp());
```
For AI workflows specifically, this matters a lot. LLM calls go inside step.run() — they run once per workflow even if the function re-invokes. Anything left outside — token counting, stream setup — re-executes on every invocation, so it must be cheap and idempotent; anything with real side effects (persistence, billing) belongs inside a step.
Step Functions sidesteps this entirely because it never re-invokes. But you pay for it in language overhead — every cross-state value goes through ASL JSON serialization.
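A toy model makes the memoization mechanics concrete. This is a sketch of the idea, not the real Inngest SDK or its wire protocol: a fresh step runs once, persists its result, and suspends; the "engine" re-invokes the function body from the top, and completed steps return their cached results.

```typescript
// Minimal sketch of Inngest-style step memoization (illustrative only).
type Step = { run<T>(id: string, fn: () => T): T };

// Sentinel thrown when a fresh step completes: "persist and re-invoke me"
class Suspend {
  constructor(public stepId: string) {}
}

function makeStep(store: Map<string, unknown>): Step {
  return {
    run<T>(id: string, fn: () => T): T {
      if (store.has(id)) return store.get(id) as T; // memoized: closure skipped
      store.set(id, fn()); // execute once, persist result
      throw new Suspend(id); // suspend; engine re-invokes from the top
    },
  };
}

// Stand-in for the engine: re-invoke the body until it returns normally.
function drive<T>(body: (step: Step) => T): { result: T; invocations: number } {
  const store = new Map<string, unknown>();
  let invocations = 0;
  for (;;) {
    invocations++;
    try {
      return { result: body(makeStep(store)), invocations };
    } catch (e) {
      if (!(e instanceof Suspend)) throw e; // real errors still propagate
    }
  }
}

// Code OUTSIDE step.run() runs on every invocation:
let topLevelRuns = 0;
const { result, invocations } = drive((step) => {
  topLevelRuns++; // unwrapped side effect
  const research = step.run("research", () => "serp-data");
  return step.run("brief", () => `brief(${research})`);
});
console.log(result, invocations, topLevelRuns);
// → brief(serp-data) 3 3
```

Two steps cost three invocations, and the unwrapped counter incremented on every one of them — exactly the bug class the WRONG snippet above illustrates.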
Worked example — cost on 1k articles/month
Assume each article = 4 phases × 2 LLM calls = 8 step transitions. Plus retries, logging steps: ~12 step transitions per article.
Step Functions:
- 12k state transitions/month × $0.025 per 1k = $0.30/month
- Lambda: 4 invocations × 60s × 1024 MB = 240 GB-s per article; at $0.0000166667/GB-s that's ~$0.004 per article, ~$4/month
- Total ≈ $4.30/month for 1k articles (~$0.004 per article)
- Plus Lambda concurrency limits, plus VPC config if needed
Inngest:
- Cloud Pro: $20/month base + $0.005 per step run
- 12k step runs × $0.005 = $60
- Total ≈ $80/month for 1k articles
- Or self-host: $0 + your server costs
On paid plans, Step Functions wins on cost by an order of magnitude at this volume. But 12k step runs/month fits comfortably inside Inngest's free tier (50k step runs/month), so pre-revenue, Inngest can be free.
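The arithmetic above can be double-checked in a few lines. The rates are the ones quoted in this section; 1024 MB of Lambda memory and 60s per phase are the same assumptions as above:

```typescript
// Back-of-envelope cost check for 1k articles/month, 12 steps per article.
const articles = 1_000;
const stepsPerArticle = 12;

// Step Functions: state transitions + Lambda compute
const transitionCost = (articles * stepsPerArticle / 1_000) * 0.025; // $0.30
const lambdaGbSeconds = articles * 4 * 60 * (1024 / 1024);           // 240k GB-s
const lambdaCost = lambdaGbSeconds * 0.0000166667;                   // ~$4.00
const sfnMonthly = transitionCost + lambdaCost;

// Inngest Cloud Pro: base fee + per-step-run charge
const inngestMonthly = 20 + articles * stepsPerArticle * 0.005;

console.log(sfnMonthly.toFixed(2), inngestMonthly.toFixed(2));
// → 4.30 80.00
```

Lambda compute, not state transitions, dominates the Step Functions bill here — which is why longer phases move the numbers more than extra steps do.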
But cost wasn't the deciding factor for me. Setup time was.
When Step Functions wins
Pick Step Functions if:
- You're already heavy AWS. Your team writes ASL, debugs in CloudWatch, deploys via CDK/Terraform. Step Functions is one more service you already operate.
- You need 220+ native integrations — DynamoDB, S3, SNS, SES, EventBridge, ECS tasks, etc. Inngest can't match this without you writing wrapper functions.
- Visual workflow approval flows — long-running workflows with human-in-the-loop steps (a manager approves a state). Step Functions has `Wait` and `Activity` states that match this pattern natively.
- Regulated data residency — workflow state must live in your AWS account, in a specific region. Inngest Cloud doesn't give you that level of control.
- You have AWS credits. Startup credits ($1k–$100k) make Step Functions effectively free for years.
When Inngest wins
Pick Inngest if:
- You're on Vercel / Next.js. Inngest deploys with your app, runs as an API route, requires zero infrastructure setup. No CDK, no Terraform, no IAM roles to debug.
- You're a small team. One repo, one deploy, one dashboard. Step Functions adds a second context-switch surface.
- TypeScript everywhere. Your workflow is code, not ASL JSON. Refactor with the same tools you use for everything else.
- AI workflows with branching logic. `if (briefHasGap) await step.run("regenerate-brief", ...)` is trivial. The same logic in ASL is a `Choice` state with conditions in JSON.
- You want fast local dev. `npx inngest-cli dev` runs a local Inngest server, replays workflows, and lets you debug step by step in your terminal. Step Functions local dev is `sam local` plus LocalStack, and it's not pleasant.
- You're shipping in days, not weeks. Inngest is set up in 30 minutes. Step Functions plus AWS infra setup is 2–3 days minimum.
What we picked
We picked Inngest. Three reasons:
- The stack matched. Next.js on Vercel, TypeScript end-to-end, no AWS account beyond S3 for image hosting. Bringing in Step Functions meant standing up an entire AWS shadow stack just for workflow orchestration.
- AI workflows want code, not JSON. Our pipeline has dynamic step counts based on the article outline. Generating ASL dynamically from JS is doable but ugly. In Inngest, it's a `for` loop inside the function.
- Step memoization fits LLM calls. Each LLM call costs real money. We don't want to re-run an 8-second draft because the next step failed. Inngest persists the result and skips it on retry. Step Functions gives the same guarantee by construction — completed states are never re-executed; only the failed state retries.
We'd reconsider if:
- We hit Inngest's pricing wall (>50k step runs/month consistently)
- We needed AWS-native integrations (Step Functions wins for ECS task orchestration etc)
- The team grew past 5 engineers and we needed visual workflow tooling for non-engineers
Real production code
This is what runs in production for the article pipeline:
```typescript
import { Inngest } from "inngest";
import { runResearchPhase } from "@/lib/article-gen/research";
import { runBriefPhase } from "@/lib/article-gen/brief";
import { runDraftPhase } from "@/lib/article-gen/draft";
import { runPolishPhase } from "@/lib/article-gen/polish";
import { writeStreamToken } from "@/lib/article-gen/stream";

const inngest = new Inngest({ id: "briefworks" });

export const generateArticle = inngest.createFunction(
  {
    id: "generate-article",
    concurrency: { limit: 5 },
    retries: 2,
  },
  { event: "article.requested" },
  async ({ event, step }) => {
    const { projectId, keyword, persona } = event.data;

    const research = await step.run("research", async () => {
      const result = await runResearchPhase(keyword);
      await writeStreamToken(projectId, "research-done", result);
      return result;
    });

    const brief = await step.run("brief", async () => {
      const result = await runBriefPhase(research, persona);
      await writeStreamToken(projectId, "brief-done", result);
      return result;
    });

    const draft = await step.run("draft", async () => {
      const result = await runDraftPhase(brief, persona, (token) =>
        writeStreamToken(projectId, "draft-token", { token })
      );
      await writeStreamToken(projectId, "draft-done", result);
      return result;
    });

    const polished = await step.run("polish", async () => {
      const result = await runPolishPhase(draft, persona);
      await writeStreamToken(projectId, "polish-done", result);
      return result;
    });

    return polished;
  }
);
```
The browser subscribes to a Supabase Realtime channel keyed on `projectId`. `writeStreamToken` writes JSONB updates to an `article_streams` table; Realtime fans them out to the client. When the user closes the tab, the workflow keeps running. When they come back, they re-subscribe and get the latest state.
That's the whole answer to "how do I run AI workflows that survive page close."
The decision tree
```text
Are you on AWS / regulated data?
├── YES → Step Functions
└── NO
    ├── Need 220+ AWS native integrations? → Step Functions
    ├── Need visual tooling for non-engineers? → Step Functions
    └── Otherwise → Inngest
```
Step Functions is the safer enterprise pick. Inngest is the faster startup pick. Both work. The wrong question is which is better. The right question is which matches the rest of my stack.
If you're shipping a Next.js AI app on Vercel: Inngest. If you're shipping an enterprise workflow with audit trails and AWS service integrations: Step Functions.
This is the durable workflow pattern behind BriefWorks — an AI article generator that runs research, brief, drafting, and polish in a single pass. If you've been writing AI articles that need a rewrite before they ship, briefworks.io is built around this exact pipeline.
This article was originally published by DEV Community and written by Ethan.