The setup
I was building an AI article generator. Four phases:
- Research — pull live SERP data, extract entities, score competitors
- Brief — Claude Sonnet writes a structured brief from the SERP
- Draft — section-by-section drafting in a fixed persona voice
- Polish — adversarial review pass that flags AI tells
End-to-end: 6–8 minutes. Each phase: 60–180 seconds. Multiple LLM calls per phase. Token streaming back to the user the whole time.
The first version was a single Vercel API route. It worked locally. It died in production.
Why Vercel kills you at 300s
Vercel Pro tops out at 300s per function. My pipeline needs 480s+. A naive await chain in a single route returns a 504 halfway through the draft phase.
Worse: the user closes the tab. The HTTP connection drops. The function dies. State is lost. The draft is gone. Refund issued.
This is the real problem. Long-running AI workflows can't live in request–response. They need durability — the workflow continues even when the user disconnects, and resumes streaming when they come back.
That's a durable workflow engine. Two real options:
- AWS Step Functions — battle-tested, AWS-native, ASL definition language
- Inngest — newer, Next.js-native, code-as-config
I tried both. Here's the decision tree.
How AWS Step Functions works
Step Functions is a state-machine service. You define the workflow in Amazon States Language (ASL) — a JSON DSL with states like Task, Choice, Parallel, Wait. Each Task points to a Lambda function (or another AWS service: SQS, DynamoDB, EventBridge, etc).
```json
{
  "Comment": "Article pipeline",
  "StartAt": "Research",
  "States": {
    "Research": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123:function:research",
      "Next": "Brief"
    },
    "Brief": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123:function:brief",
      "Next": "Draft"
    },
    "Draft": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123:function:draft",
      "Next": "Polish"
    },
    "Polish": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123:function:polish",
      "End": true
    }
  }
}
```
You upload Lambdas. You define the state machine. Step Functions executes it. Each state's output becomes the next state's input. Failures retry per the ASL config.
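The output-becomes-input chaining can be sketched as a toy interpreter (illustrative TypeScript, not real ASL tooling or the AWS SDK; the state names mirror the pipeline phases, and the handler bodies are made-up stand-ins for Lambdas):

```typescript
// Toy model of the Step Functions execution model: each state is a
// handler, and the engine feeds one state's output into the next state.
type Handler = (input: unknown) => unknown;

interface ToyMachine {
  startAt: string;
  states: Record<string, { handler: Handler; next?: string }>;
}

function execute(machine: ToyMachine, input: unknown): unknown {
  let current: string | undefined = machine.startAt;
  let payload = input;
  while (current) {
    const state = machine.states[current];
    if (!state) throw new Error(`Unknown state: ${current}`);
    payload = state.handler(payload); // output becomes the next state's input
    current = state.next;             // no "next" means a terminal state
  }
  return payload;
}

// Two states standing in for the Research → Brief chain above
const pipeline: ToyMachine = {
  startAt: "Research",
  states: {
    Research: {
      handler: (kw) => ({ keyword: kw, serp: ["r1", "r2"] }),
      next: "Brief",
    },
    Brief: {
      handler: (r: any) => ({ brief: `brief for ${r.keyword}` }),
    },
  },
};

console.log(execute(pipeline, "durable workflows"));
// → { brief: 'brief for durable workflows' }
```

The real service adds retries, timeouts, and per-state IAM on top, but the data flow is exactly this chain.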
Strengths:
- 220+ AWS service integrations native (no Lambda wrapper needed)
- Visual workflow studio (debugging is genuinely good)
- Pay per state transition ($0.025 per 1k transitions)
- Battle-tested at FAANG scale
- ASL is a portable standard
Weaknesses:
- ASL is a separate language from your app code
- Workflow definition lives outside your Next.js repo (Terraform / CDK)
- Lambda cold starts on every state transition
- AWS-account-coupled — local dev requires LocalStack or `sam local`
- Vercel hosting + AWS Step Functions = two bills, two consoles, two deploy flows
How Inngest works
Inngest is also a durable workflow engine, but the programming model is different. You write a TypeScript function with step.run() calls. Inngest invokes your function as an HTTP endpoint, executes one step, persists the result, then re-invokes the function with the next step.
```typescript
import { Inngest } from "inngest";

const inngest = new Inngest({ id: "briefworks" });

export const generateArticle = inngest.createFunction(
  { id: "generate-article" },
  { event: "article.requested" },
  async ({ event, step }) => {
    const research = await step.run("research", async () =>
      runSerpResearch(event.data.keyword)
    );
    const brief = await step.run("brief", async () =>
      generateBrief(research)
    );
    const draft = await step.run("draft", async () =>
      draftSections(brief)
    );
    const polished = await step.run("polish", async () =>
      reviewDraft(draft)
    );
    return polished;
  }
);
```
The function lives in your Next.js repo at `/api/inngest`. You deploy with the rest of your app. No separate infrastructure.
Strengths:
- One repo, one deploy
- TypeScript everywhere, no DSL
- Local dev is `npx inngest-cli dev` — actual local replay
- Built-in concurrency, throttling, debounce, idempotency
- Step memoization (more on this below)
- Free tier covers most pre-revenue projects
Weaknesses:
- Smaller ecosystem (no 220+ AWS native integrations)
- Newer (founded 2022 — you're betting on the company)
- Self-hosting is possible but not the default path
- Visual debugging is improving but not Step-Functions tier
The technical fork — replay vs memoization
This is the part nobody explains clearly.
Step Functions replay model: Step Functions never re-runs your code. It calls Lambda once per state. The state machine itself holds the workflow position. Each Lambda is stateless and only knows its inputs.
Inngest memoization model: Inngest re-invokes your entire function on each step, but step.run() checks if that step's result is already persisted. If yes, it returns the cached result without executing the inner closure. If no, it runs the closure and persists.
The implication: in Inngest, code outside step.run() runs every time the function re-invokes. If you have side effects outside step.run(), you'll execute them N times for N steps.
```typescript
// WRONG — this top-level log runs on every re-invocation,
// so a 4-step workflow logs it 4+ times
console.log("Starting article gen");
const research = await step.run("research", () => runSerp());
const brief = await step.run("brief", () => writeBrief(research));
```

```typescript
// RIGHT — wrap side effects in step.run() so they execute exactly once
await step.run("log-start", () => console.log("Starting"));
const research = await step.run("research", () => runSerp());
```
For AI workflows specifically, this matters a lot. LLM calls go inside step.run() — they run once per workflow even if the function re-invokes. Anything left outside — token counting, stream setup — re-executes on every invocation, so it must be cheap and idempotent; anything with real side effects (persistence, billing) belongs inside a step.
Step Functions sidesteps this entirely because it never re-invokes. But you pay for it in language overhead — every cross-state value goes through ASL JSON serialization.
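A toy model makes the memoization mechanics concrete. This is a sketch of the idea, not the real Inngest SDK or its wire protocol: a fresh step runs once, persists its result, and suspends; the "engine" re-invokes the function body from the top, and completed steps return their cached results.

```typescript
// Minimal sketch of Inngest-style step memoization (illustrative only).
type Step = { run<T>(id: string, fn: () => T): T };

// Sentinel thrown when a fresh step completes: "persist and re-invoke me"
class Suspend {
  constructor(public stepId: string) {}
}

function makeStep(store: Map<string, unknown>): Step {
  return {
    run<T>(id: string, fn: () => T): T {
      if (store.has(id)) return store.get(id) as T; // memoized: closure skipped
      store.set(id, fn()); // execute once, persist result
      throw new Suspend(id); // suspend; engine re-invokes from the top
    },
  };
}

// Stand-in for the engine: re-invoke the body until it returns normally.
function drive<T>(body: (step: Step) => T): { result: T; invocations: number } {
  const store = new Map<string, unknown>();
  let invocations = 0;
  for (;;) {
    invocations++;
    try {
      return { result: body(makeStep(store)), invocations };
    } catch (e) {
      if (!(e instanceof Suspend)) throw e; // real errors still propagate
    }
  }
}

// Code OUTSIDE step.run() runs on every invocation:
let topLevelRuns = 0;
const { result, invocations } = drive((step) => {
  topLevelRuns++; // unwrapped side effect
  const research = step.run("research", () => "serp-data");
  return step.run("brief", () => `brief(${research})`);
});
console.log(result, invocations, topLevelRuns);
// → brief(serp-data) 3 3
```

Two steps cost three invocations, and the unwrapped counter incremented on every one of them — exactly the bug class the WRONG snippet above illustrates.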
Worked example — cost on 1k articles/month
Assume each article = 4 phases × 2 LLM calls = 8 step transitions. Plus retries, logging steps: ~12 step transitions per article.
Step Functions:
- 12k state transitions/month × $0.025 per 1k = $0.30/month
- Lambda: 4 invocations × 60s × 1024 MB = 240 GB-s per article; at $0.0000166667/GB-s that's ~$0.004 per article, ~$4/month
- Total ≈ $4.30/month for 1k articles (~$0.004 per article)
- Plus Lambda concurrency limits, plus VPC config if needed
Inngest:
- Cloud Pro: $20/month base + $0.005 per step run
- 12k step runs × $0.005 = $60
- Total ≈ $80/month for 1k articles
- Or self-host: $0 + your server costs
On paid plans, Step Functions wins on cost by an order of magnitude at this volume. But 12k step runs/month fits comfortably inside Inngest's free tier (50k step runs/month), so pre-revenue, Inngest can be free.
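The arithmetic above can be double-checked in a few lines. The rates are the ones quoted in this section; 1024 MB of Lambda memory and 60s per phase are the same assumptions as above:

```typescript
// Back-of-envelope cost check for 1k articles/month, 12 steps per article.
const articles = 1_000;
const stepsPerArticle = 12;

// Step Functions: state transitions + Lambda compute
const transitionCost = (articles * stepsPerArticle / 1_000) * 0.025; // $0.30
const lambdaGbSeconds = articles * 4 * 60 * (1024 / 1024);           // 240k GB-s
const lambdaCost = lambdaGbSeconds * 0.0000166667;                   // ~$4.00
const sfnMonthly = transitionCost + lambdaCost;

// Inngest Cloud Pro: base fee + per-step-run charge
const inngestMonthly = 20 + articles * stepsPerArticle * 0.005;

console.log(sfnMonthly.toFixed(2), inngestMonthly.toFixed(2));
// → 4.30 80.00
```

Lambda compute, not state transitions, dominates the Step Functions bill here — which is why longer phases move the numbers more than extra steps do.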
But cost wasn't the deciding factor for me. Setup time was.
When Step Functions wins
Pick Step Functions if:
- You're already heavy AWS. Your team writes ASL, debugs in CloudWatch, deploys via CDK/Terraform. Step Functions is one more service you already operate.
- You need 220+ native integrations — DynamoDB, S3, SNS, SES, EventBridge, ECS tasks, etc. Inngest can't match this without you writing wrapper functions.
- Visual workflow approval flows — long-running workflows with human-in-the-loop steps (a manager approves a state). Step Functions has `Wait` and `Activity` states that match this pattern natively.
- Regulated data residency — workflow state must live in your AWS account, in a specific region. Inngest Cloud doesn't give you that level of control.
- You have AWS credits. Startup credits ($1k–$100k) make Step Functions effectively free for years.
When Inngest wins
Pick Inngest if:
- You're on Vercel / Next.js. Inngest deploys with your app, runs as an API route, requires zero infrastructure setup. No CDK, no Terraform, no IAM roles to debug.
- You're a small team. One repo, one deploy, one dashboard. Step Functions adds a second context-switch surface.
- TypeScript everywhere. Your workflow is code, not ASL JSON. Refactor with the same tools you use for everything else.
- AI workflows with branching logic. `if (briefHasGap) await step.run("regenerate-brief", ...)` is trivial. The same logic in ASL is a `Choice` state with conditions in JSON.
- You want fast local dev. `npx inngest-cli dev` runs a local Inngest server, replays workflows, and lets you debug step by step in your terminal. Step Functions local dev is `sam local` plus LocalStack, and it's not pleasant.
- You're shipping in days, not weeks. Inngest is set up in 30 minutes. Step Functions plus AWS infra setup is 2–3 days minimum.
What we picked
We picked Inngest. Three reasons:
- The stack matched. Next.js on Vercel, TypeScript end-to-end, no AWS account beyond S3 for image hosting. Bringing in Step Functions meant standing up an entire AWS shadow stack just for workflow orchestration.
- AI workflows want code, not JSON. Our pipeline has dynamic step counts based on the article outline. Generating ASL dynamically from JS is doable but ugly. In Inngest, it's a `for` loop inside the function.
- Step memoization fits LLM calls. Each LLM call costs real money. We don't want to re-run an 8-second draft because the next step failed. Inngest persists the result and skips it on retry. Step Functions gives the same guarantee by construction — completed states are never re-executed; only the failed state retries.
We'd reconsider if:
- We hit Inngest's pricing wall (>50k step runs/month consistently)
- We needed AWS-native integrations (Step Functions wins for ECS task orchestration etc)
- The team grew past 5 engineers and we needed visual workflow tooling for non-engineers
Real production code
This is what runs in production for the article pipeline:
```typescript
import { Inngest } from "inngest";
import { runResearchPhase } from "@/lib/article-gen/research";
import { runBriefPhase } from "@/lib/article-gen/brief";
import { runDraftPhase } from "@/lib/article-gen/draft";
import { runPolishPhase } from "@/lib/article-gen/polish";
import { writeStreamToken } from "@/lib/article-gen/stream";

const inngest = new Inngest({ id: "briefworks" });

export const generateArticle = inngest.createFunction(
  {
    id: "generate-article",
    concurrency: { limit: 5 },
    retries: 2,
  },
  { event: "article.requested" },
  async ({ event, step }) => {
    const { projectId, keyword, persona } = event.data;

    const research = await step.run("research", async () => {
      const result = await runResearchPhase(keyword);
      await writeStreamToken(projectId, "research-done", result);
      return result;
    });

    const brief = await step.run("brief", async () => {
      const result = await runBriefPhase(research, persona);
      await writeStreamToken(projectId, "brief-done", result);
      return result;
    });

    const draft = await step.run("draft", async () => {
      const result = await runDraftPhase(brief, persona, (token) =>
        writeStreamToken(projectId, "draft-token", { token })
      );
      await writeStreamToken(projectId, "draft-done", result);
      return result;
    });

    const polished = await step.run("polish", async () => {
      const result = await runPolishPhase(draft, persona);
      await writeStreamToken(projectId, "polish-done", result);
      return result;
    });

    return polished;
  }
);
```
The browser subscribes to a Supabase Realtime channel keyed on `projectId`. `writeStreamToken` writes JSONB updates to an `article_streams` table; Realtime fans them out to the client. When the user closes the tab, the workflow keeps running. When they come back, they re-subscribe and get the latest state.
That's the whole answer to "how do I run AI workflows that survive page close."
The decision tree
```text
Are you on AWS / regulated data?
├── YES → Step Functions
└── NO
    ├── Need 220+ AWS native integrations? → Step Functions
    ├── Need visual tooling for non-engineers? → Step Functions
    └── Otherwise → Inngest
```
Step Functions is the safer enterprise pick. Inngest is the faster startup pick. Both work. The wrong question is which is better. The right question is which matches the rest of my stack.
If you're shipping a Next.js AI app on Vercel: Inngest. If you're shipping an enterprise workflow with audit trails and AWS service integrations: Step Functions.
This is the durable workflow pattern behind BriefWorks — an AI article generator that runs research, brief, drafting, and polish in a single pass. If you've been writing AI articles that need a rewrite before they ship, briefworks.io is built around this exact pipeline.
This article was originally published by DEV Community and written by Ethan.