Technology Apr 19, 2026 · 7 min read

DEV Community
by aman-bhandari
QA is not clicking buttons: it is the quality gatekeeper for the whole SDLC

QA gets stereotyped into a clicker-of-buttons role: open the app, walk the flow, file a Jira ticket when something looks wrong. That caricature is what junior QA looks like at companies that have not yet figured out what QA is actually for. The senior role is something else entirely — the single person who owns quality across the full software development lifecycle, from the requirements meeting through post-deployment monitoring. The button-clicker is a subset of the role, not the role.

I run this posture in claude-code-mcp-qa-automation as the concrete operator surface: 16 Claude Code skills, a Python implementation that pulls Jira, aggregates sprint trends into a 7-table SQLite store, emits deterministic HTML reports, and posts Slack digests. The tool exists because senior QA work is pipeline work, and pipeline work wants reproducible artifacts, not manual checklists.
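To make "reproducible artifacts" concrete, here is a minimal sketch of the sprint-aggregation shape — illustrative only, with a toy one-table schema, not the repo's actual 7-table store:

```python
import sqlite3

# Illustrative only: a toy stand-in for the repo's SQLite store.
# Table and column names are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (sprint TEXT, status TEXT)")
conn.executemany("INSERT INTO tickets VALUES (?, ?)", [
    ("S1", "done"), ("S1", "bug"), ("S2", "done"), ("S2", "done"),
])

# The digest is a deterministic query over the store, not a manual tally.
rows = conn.execute("""
    SELECT sprint,
           COUNT(*) AS total,
           SUM(status = 'bug') AS bugs   -- SQLite sums booleans as 0/1
    FROM tickets
    GROUP BY sprint
    ORDER BY sprint
""").fetchall()

for sprint, total, bugs in rows:
    print(f"{sprint}: {total} tickets, {bugs} bugs")
```

The point is the shape: Jira in, SQL aggregation in the middle, a rendered report out. Re-run it tomorrow and you get the same artifact from the same data.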

Six SDLC stages, one gate per stage. Each gate has the button-clicker version (the failure mode) and the systems version (the senior posture).

Stage 1 — Requirements review

Button-clicker version. QA shows up after requirements are finalized, reads the spec, writes test cases against it.

Systems version. QA is in the requirements meeting and asks problem-first questions before the spec is written. Who is the user? What outcome are they after? Does a non-software workaround already solve 70% of this? What is the observable failure condition? This is Eugene Yan's "start with the problem, not the technology" applied at the gate where technology gets chosen.

The reason QA has to be here is that a large share of the bugs that ship are not implementation bugs — they are requirements bugs. A feature that was built correctly against a vague spec is still a failure to the user. Catching vagueness at Stage 1 is orders of magnitude cheaper than catching it in Stage 6, when the user is asking why the thing does not do what they expected.

Stage 2 — Design review

Button-clicker version. QA waits until the code is ready to be tested, then asks "how do I test this?"

Systems version. QA reads the architecture doc or attends the design review. Asks which services are touched, which data flows cross a trust boundary, which new external dependencies are introduced. Names the error budget. Names the rollback plan.

The question the senior QA is trying to answer at Stage 2 is: "What is the smallest change downstream that breaks this design?" That question is what separates a testable design from one that will collapse the first time production traffic hits a pattern the designers did not consider.

Design-review QA does not require a separate title. It requires the literacy to read an architecture doc and the authority to flag a risk before implementation starts.

Stage 3 — Implementation

Button-clicker version. QA is a ticket consumer. Developer writes code, opens a PR, QA picks up the ticket when it hits the test-ready column.

Systems version. QA has peer-review authority on PRs touching user-facing flows or critical paths. Not final sign-off on code — sign-off on the test coverage, the observability hooks, the feature flag plan, the rollback path. A PR without a flag wrapping a risky change does not pass Stage 3. A PR that adds an endpoint without a structured-logging statement on the failure path does not pass Stage 3.

This is the stage where "works on my machine" gets caught. The senior QA is reading the PR not for correctness of the logic — that is the developer's job — but for the surface area the logic creates. What can break? How would we know? How would we undo it?
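The Stage 3 gate can be written down as an explicit check rather than a vibe. This is a hypothetical sketch — the field names are illustrative, not a real CI integration — but it shows the shape: the gate returns reasons, not a bare pass/fail:

```python
from dataclasses import dataclass

# Hypothetical sketch of the Stage 3 gate. Field names are illustrative;
# in practice these would be derived from the PR diff and its description.
@dataclass
class PullRequest:
    touches_critical_path: bool
    behind_feature_flag: bool
    logs_failure_path: bool
    has_rollback_note: bool

def stage3_gate(pr: PullRequest) -> list[str]:
    """Return every reason this PR fails the gate (empty list = pass)."""
    failures = []
    if pr.touches_critical_path and not pr.behind_feature_flag:
        failures.append("risky change is not wrapped in a feature flag")
    if not pr.logs_failure_path:
        failures.append("no structured log statement on the failure path")
    if not pr.has_rollback_note:
        failures.append("no documented rollback path")
    return failures

pr = PullRequest(touches_critical_path=True, behind_feature_flag=False,
                 logs_failure_path=True, has_rollback_note=True)
print(stage3_gate(pr))  # → ['risky change is not wrapped in a feature flag']
```

Returning the full list of failures instead of the first one matters: the developer fixes the PR in one round trip instead of three.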

Stage 4 — Pre-merge

Button-clicker version. CI runs the test suite. If it passes, merge. QA's role is "make sure there is a test."

Systems version. QA owns the test pyramid. Unit tests for logic. Integration tests for service boundaries. Contract tests for APIs that other services consume. End-to-end tests only for the three or four flows that literally cannot regress. The shape of the pyramid is the senior QA's artifact, not a byproduct.

Pre-merge QA also owns the flake-free contract. A flaky test in the suite is a tax on every future PR — developers start ignoring the red, which is how a real failure slips through undetected. Flake investigation is senior QA work; "re-run the failing job" is not.
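The flake investigation starts with a probe: rerun the suspect test N times and classify. The classification is pure and sketched below; the rerun loop itself (e.g. shelling out to `pytest -q <test_id>`) is environment-specific, so it is left as a comment:

```python
# Flake probe classification. results[i] is True if rerun i passed.
# The reruns themselves would come from repeatedly invoking the suite on
# one test id (e.g. `pytest -q tests/test_upload.py::test_retry`), which
# is environment-specific and omitted here.
def classify_reruns(results: list[bool]) -> str:
    if all(results):
        return "stable-pass"   # green every time: keep it in the suite
    if not any(results):
        return "stable-fail"   # red every time: a real failure, not a flake
    passed = results.count(True)
    return f"flaky ({passed}/{len(results)} passed)"  # quarantine, investigate

print(classify_reruns([True, False, True, True]))  # → flaky (3/4 passed)
```

The "stable-fail" branch is the important one: a test that fails every rerun is not a flake, and "re-run the job" on it is how real failures get rubber-stamped through.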

Stage 5 — Pre-deploy

Button-clicker version. QA smoke-tests the staging environment by clicking the happy path.

Systems version. QA owns the staging canary. Before a release ships, the senior QA has run against staging the same load profile prod sees in its first 60 minutes. Has checked that feature flags default to off, or default to on for a small cohort. Has verified the migration rolls back cleanly on a cloned schema.

This is also the stage where environment teardown and recreation matter. A staging environment that drifted out of sync with prod weeks ago is not a pre-deploy gate; it is a rubber stamp. The senior QA rebuilds staging from scratch on a defined cadence, or secures the pipeline authority to do it themselves. Which connects to the next stage.
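"Verify the migration rolls back cleanly on a cloned schema" can itself be a script, not a manual step. A minimal sketch against an in-memory SQLite database standing in for the clone — the migration pair is a placeholder, and `DROP COLUMN` needs SQLite ≥ 3.35:

```python
import sqlite3

# Placeholder migration pair; a real one would come from the migrations dir.
MIGRATE_UP = "ALTER TABLE users ADD COLUMN last_seen TEXT"
MIGRATE_DOWN = "ALTER TABLE users DROP COLUMN last_seen"  # SQLite >= 3.35

def columns(conn, table):
    """Column names as reported by the database itself."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

conn = sqlite3.connect(":memory:")  # stands in for the cloned schema
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

before = columns(conn, "users")
conn.execute(MIGRATE_UP)
assert "last_seen" in columns(conn, "users")   # forward migration applied
conn.execute(MIGRATE_DOWN)
assert columns(conn, "users") == before, "rollback did not restore the schema"
print("rollback clean")
```

Comparing the schema the database reports, rather than trusting the migration's own down step, is the whole point: the gate checks the outcome, not the intent.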

Stage 6 — Post-deploy

Button-clicker version. QA files tickets when users complain.

Systems version. QA reads CloudWatch (or the equivalent) first, before triaging user reports. Correlation IDs on every request. Structured-logging queries that surface error-rate deltas. X-Ray traces or OpenTelemetry spans that localize the blame span. Alarms written by QA, tuned by QA, routed to the right team by QA.

Triage theater is the 40-minute meeting where QA says "the user reports the upload is broken," devs say "we do not see anything in the logs," and nobody has opened CloudWatch. A senior QA who opened CloudWatch first routes the bug directly to the developer who owns the failing span — with the correlation ID, the timestamp, and the stack trace already attached. The "is this a real issue?" step and the "who owns it?" step are already done.
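What "opened CloudWatch first" looks like in practice, sketched below: two Logs Insights queries plus the handoff payload attached to the routed bug. Field names (`level`, `correlation_id`) assume structured JSON logs and are illustrative, not a known schema:

```python
# Illustrative CloudWatch Logs Insights queries. Field names assume
# structured JSON logs with `level` and `correlation_id` keys.
ERROR_RATE_QUERY = """
filter level = "ERROR"
| stats count(*) as errors by bin(5m)
"""

FAILING_REQUESTS_QUERY = """
filter level = "ERROR" and @message like /upload/
| fields @timestamp, correlation_id, @message
| sort @timestamp desc
| limit 20
"""

def handoff(correlation_id: str, timestamp: str, stack_trace: str) -> dict:
    """The routed bug carries proof, not a vague symptom."""
    return {
        "correlation_id": correlation_id,
        "timestamp": timestamp,
        "stack_trace": stack_trace,
        "triage_done": True,  # "is this real?" and "who owns it?" answered
    }
```

The first query answers "is this real?" (an error-rate delta, or not); the second hands you the correlation IDs that name the owning span. The handoff dict is what replaces the 40-minute meeting.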

Why this shape matches the Claude-Code operator pattern

The six-stage gate shape is structurally identical to the agentic-engineering discipline I run on the code side: spec-first work at Stage 1, eval loops at Stage 2, review authority at Stage 3, test pyramid at Stage 4, canary + rollback at Stage 5, traced observability at Stage 6. Same operator pattern, different surface.

That is not an accident. The QA-automation repo exists because the same "spec-first, sub-agent orchestration, eval on output" pattern that ships reviewable Claude-Code artifacts is what makes sprint-reporting and alarm-triaging reproducible instead of manual. The role-specific parts of QA (the Jira literacy, the CloudWatch reading, the flag hygiene) plug into a pipeline shape that does not care whether the artifact is a sprint digest, a staging deploy, or a Claude-Code skill file.

The cultural objection

The objection senior QAs hit when they try to operate this way is always some version of "QA should not have that authority." Not prod write, not design-meeting presence, not rollback triggering, not peer-review sign-off on PRs.

The answer is not that QA should have prod write. The answer is that the role requires the authority to do the job it is measured against. If QA is accountable for quality across the SDLC but only has the authority to file tickets, QA is being set up to fail publicly with no lever to succeed. The six gates above are what that authority looks like concretely.

What this means for a QA career

The stereotype of QA as a dead-end "tester" role is true for the button-clicker version and false for the senior version. The ceiling on button-clicker QA is flat. The ceiling on SDLC-wide quality gatekeeper work is the same as any other senior engineering role that owns a cross-cutting concern.

Two capabilities decide which version a QA becomes: CS fundamentals literacy (how the stack works, so you can diagnose instead of triage) and pipeline freedom (the authority to create environments, trigger deploys, and write alarms without filing a ticket to somebody else). Both are skills. Neither is inherent to the role. Both compound.

The next two posts in this series go deeper on each: why CS fundamentals are a QA superpower, and why pipeline freedom is what separates a senior QA from a blocked one.

Aman Bhandari. Operator of an AI-engineering research lab running Claude Opus as the coaching partner, plus a QA-automation surface shipping against a real sprint workload. Public artifacts: claude-code-agent-skills-framework and claude-code-mcp-qa-automation. github.com/aman-bhandari.

Source

This article was originally published by DEV Community and written by aman-bhandari.
