Whitepaper · Updated April 2026 · 16 min read

Test Release Processes: Seven Steps, Nine Quality Indicators, One Owner Table

A practitioner's guide to the test release process — the seven-step reference flow, nine quality indicators that separate a healthy release pipeline from a chronic fire drill, a task-ownership table that settles the 'who builds the build' argument, and a modern refresh covering CI/CD, containers, feature flags, and progressive rollout.

Release Management · Test Release · CI/CD · Build Management · DevOps · Test Management


Without software to test, there can be no software testing. Obvious — and yet an astonishing number of organizations do a poor job of getting a build into the test lab. The work drifts onto the test team, who then improvise the release process reactively, one fire drill at a time. This paper covers the seven-step reference flow, the nine quality indicators that tell you whether your release process is healthy, and the ownership table that settles the arguments about who does what.

Pairs with the Test Release Process checklist in the QA Library — a printable one-pager you can take into the room and work through with your release manager, build owner, and test lead.

Definitions: so we can argue about the right things

Different shops use different words for the same concepts and the same words for different concepts. Before we go any further, here are the terms this paper uses. If your shop uses different words that's fine — don't worry that you're "using the wrong vocabulary," worry about whether you're doing the wrong things.

  • Build, release, or test release. An installable, deployable software artifact transmitted to the test group for testing.
  • Repository or source control system. The system — git-based in practice, hosted on GitHub, GitLab, Bitbucket, or similar — that stores source, configuration, infrastructure code, and tests.
  • Build manager, release manager, or release engineer. The person (or team, or automation) responsible for producing the deployable artifact from source.
  • Build ID or version. Some sequence of characters that uniquely identifies the release. Ideally semver plus a git SHA, or a monotonic build number bound to a commit hash.
  • Test cycle. A set of tests run against a given build.
  • Test pass. A complete run of all suites in scope, either during one cycle or across multiple.

Out of scope: how you decide when to release to customers. That is a release-criteria question, not a test-release-process question. They interact, but they are not the same thing.

The seven-step reference test release process

Here is the process this paper is organized around:

  1. Select the content — bug fixes, features, configuration, and documentation — for a particular release.
  2. Implement the changes.
  3. Fetch the source, build, sign, and tag the artifact — mark the artifact and the repository with an unambiguous version.
  4. Smoke-test the build in a release-engineering environment. If the tests pass, continue; if they fail, stop and figure out what went wrong before shipping to test.
  5. Package and deliver the build — produce a deployable artifact (container image, signed installer, bundle) and make it available to whoever deploys it.
  6. Install the build in the test environment.
  7. Smoke-test the build in the test environment. If the tests pass, begin the test cycle; if they fail, uninstall, resume the prior cycle, and return the build to development.

Your process may have more or fewer steps. This one is based on processes that consistently work under pressure.
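The gate logic in steps 4 and 7 is the part teams most often leave implicit. Here is a minimal sketch of just those two gates as a pure function; the outcome names are illustrative, not any particular tool's vocabulary:

```python
def release_outcome(releng_smoke_passes: bool, test_smoke_passes: bool) -> str:
    """Gate logic for steps 4 and 7 of the reference flow."""
    if not releng_smoke_passes:
        # Step 4: the build never leaves release engineering.
        return "stopped-in-releng"
    if not test_smoke_passes:
        # Step 7: uninstall, resume the prior cycle, return build to development.
        return "returned-to-dev"
    # Both smoke tests green: the test cycle begins against this build.
    return "test-cycle-begins"
```

The point of writing it down, even this crudely, is that the two failure paths become explicit decisions rather than improvised arguments in the hallway.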

Nine quality indicators for a healthy release process

A release process can score perfectly on steps 1–7 and still be awful to live with. The nine quality indicators below are what separate a healthy pipeline from a chronic fire drill.

The nine indicators, each with a plain-language translation:

  1. Predictable and timely releases. Builds show up on a cadence. No surprise drops, no three-week droughts.
  2. Simple install procedure. Idempotent, scriptable, one command ideally.
  3. Simple uninstall / rollback procedure. A bad build does not eat a week of test environment time.
  4. Testing the real install process. The way you install in the test lab is the way you install in production.
  5. Consistent version naming. Unambiguous, monotonic, traceable back to a commit.
  6. Simple interrogation. Any tester can ask the running system "what version are you?" and get an answer.
  7. Documented content. Release notes identify exactly which tickets made it in.
  8. Intelligent content selection. Someone actually decides what goes in each release based on risk and value, not just whatever was ready.
  9. Coordination of data and code. Schema migrations, feature flags, and config travel with the code, not behind it.

The rest of this paper walks through these in four clusters.

Cluster 1 — Release timing (QI 1)

Accepting a new release has real cost for the test operation. The test manager has to communicate which tests run against this build. The release has to be installed, data loaded, databases migrated, dependent systems synced. In client-server or multiplatform environments, "install" can span many machines. The team has to confirm the build is testable. Then they often have to do a pass of confirmation testing — re-running every test that failed against a now-fixed bug — before they can resume the planned test cycle. When multiple projects share a lab, an urgent release for one project can push everything else aside.

Under these conditions, installing a new release is not a decision to take lightly. Three failure modes are common.

Releases show up unpredictably. Unscheduled releases interrupt test cycle work and maintenance activities. A typical pattern: users or stakeholders get used to "ask and receive" behavior, and the random unscheduled drops never let the test team get ahead on test case maintenance, regression automation, or process improvement. Over a year or two, the test team's productivity degrades even though nothing visibly broke.

Releases show up too often. In one common setup, developers push builds into test several times a week early in a project. The test team spends 50–75% of the first few weeks just installing the system under test instead of testing it. If you have eight person-weeks of test work planned for the first month, you complete maybe two. By the end of the first month you are a month behind on a two-month project, and serious bugs still wait to be found.

Releases show up too rarely. Test against a stale build and you will file bugs that have already been fixed, waste everyone's time on regressions already closed, and lose the early-warning signal testing is supposed to provide.

The goal is a cadence the team can plan around. During system test a weekly or bi-weekly drop cadence is still a reasonable target for larger releases. A good pattern looks like: development buttons up content Friday; release engineering assembles and smoke-tests the build over the weekend; the test team receives an installed, sanity-tested environment Monday morning; the test cycle plan is already written against the known content; the cycle runs through the week; findings feed the next content-selection meeting. Simple, rhythmic, predictable.

In continuous-delivery shops the cadence is different — per-commit builds, canary deploys, and ring rollouts — but the underlying quality indicator is the same: the test team knows when to expect what, and emergencies are rare and visible. Uncontrolled urgency reveals a governance problem, not a delivery capability.

Cluster 2 — Install, uninstall, and real-life testing (QI 2, 3, 4)

The install process should be as simple as the target environment allows, because simpler processes are less error-prone, finish faster, and don't need a pod of experts to run. Wherever possible, the test team installs the build themselves — both because it removes a hand-off, and because the install process is something that needs to be tested.

Two things usually go wrong here.

Install is not exercised as a test. The test team "just installs" the system so they can get to the "real" testing. But in shrink-wrap, mobile, embedded, and on-prem software, the install is the system as far as your customer is concerned. Every cycle you install is an opportunity to vary the configuration, the upgrade path, the migration, the order of operations. Spread these variations across cycles and at the end of the phase the install process is systematically tested.

Uninstall / rollback is not supported. For Windows installers, Linux package managers, Homebrew bottles, and most mobile app stores, uninstall is a solved problem. Containers and Kubernetes make rollback trivial — kubectl rollout undo — so long as migrations are reversible. For complex IT systems and data-bearing services, however, rollback can be genuinely hard. Feature flags (LaunchDarkly, ConfigCat, OpenFeature), database schema versioning (Flyway, Liquibase, Alembic), and blue-green / canary deploys are what make rollback tractable. Without that, a fatally flawed build can destroy a test environment for a week while someone rebuilds the OS, the supporting software, and the previous version of the system under test. The prudent test manager lobbies hard for rollback capability and treats its absence as a schedule risk.
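To make the flag-as-rollback idea concrete, here is a minimal in-memory sketch. Real flag systems (LaunchDarkly, OpenFeature, and the like) add streaming updates, targeting rules, and audit trails, but the kill-switch mechanic is the same; the FlagStore class and flag name below are hypothetical:

```python
import time

class FlagStore:
    """Minimal in-memory feature-flag store illustrating 'deploy != release'."""

    def __init__(self):
        self._flags = {}

    def set(self, name, enabled):
        # Record when the flag changed, for the audit trail.
        self._flags[name] = {"enabled": enabled, "changed_at": time.time()}

    def is_enabled(self, name, default=False):
        # Unknown flags fall back to a safe default.
        return self._flags.get(name, {"enabled": default})["enabled"]

store = FlagStore()
store.set("new-billing-path", True)   # code is deployed AND live
# Bad telemetry arrives: kill the feature with no redeploy and no rollback.
store.set("new-billing-path", False)
```

The rollback here is a data change, not a deploy, which is exactly why flags make rollback tractable for systems where reinstalling the previous build is expensive.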

Cluster 3 — Naming and interrogation (QI 5, 6)

Every bug filed by the test team points at a release. If the test team can't reliably name the release they were testing, the bug tracker's release field becomes noise, developers can't reproduce from what's in the report, and per-release metrics go sideways. Two things have to work: naming and interrogation.

Naming. Release names have to be sequential, unambiguous, and unique. Semver (2.14.0), build numbers (2.14.0+build.1438), or version-plus-git-SHA (2.14.0-a1b2c3d) all work. What does not work: date-based labels without seconds (two builds from the same day collide), developer names, freeform labels, or schemes that try to encode too much information. If you have a thirty-five-number compound version for a system of seven components, testers will make mistakes — use a single build number and resolve the component versions from a manifest.
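The version-plus-SHA scheme is easy to enforce mechanically. A sketch, assuming the "2.14.0-a1b2c3d" convention from the examples above (the helper names and regex are illustrative):

```python
import re

# Build ID: semver plus an abbreviated git SHA, e.g. "2.14.0-a1b2c3d".
BUILD_ID = re.compile(r"^(\d+)\.(\d+)\.(\d+)-([0-9a-f]{7,40})$")

def make_build_id(major, minor, patch, git_sha):
    # Abbreviate the SHA to 7 characters, git's conventional short form.
    return f"{major}.{minor}.{patch}-{git_sha[:7]}"

def parse_build_id(build_id):
    m = BUILD_ID.match(build_id)
    if not m:
        raise ValueError(f"ambiguous or malformed build id: {build_id!r}")
    major, minor, patch, sha = m.groups()
    return (int(major), int(minor), int(patch), sha)
```

A check like this belongs in the build pipeline itself, so a malformed label fails the build instead of surfacing weeks later as an unreproducible bug report.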

Interrogation. Every running system should be able to tell you what it is. In web applications this is a /version endpoint or a <meta name="build"> tag. In mobile, it's the app's Settings → About screen and the build metadata published to the store console. In backend services, a /healthz or /version JSON response exposing the image tag, git SHA, and deploy timestamp. In embedded systems, a CLI command or a UI About screen. In Kubernetes, image tag + deployed labels queryable via kubectl describe. Whatever the surface, the rule is the same: a tester who finds a bug should be able to answer "what version" without asking a human.

When the system itself can't be queried, the orchestrator that manages it usually can. A configuration server that "pushes" firmware to devices can answer which version is running where. A CI/CD pipeline can publish a manifest for each environment. A feature-flag dashboard can report which flags were live at the time of a report. Build the interrogation surface deliberately, even if it takes forethought.
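A /version endpoint is a few lines in most stacks. Here is a sketch using only Python's standard library; the build metadata would normally be stamped in by the CI pipeline at build time, and the field names are illustrative:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stamped in at build time in a real pipeline (e.g. from CI environment
# variables); hard-coded here for illustration.
BUILD_INFO = {
    "version": "2.14.0",
    "git_sha": "a1b2c3d",
    "built_at": "2026-04-01T03:14:00Z",
}

class VersionHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/version":
            body = json.dumps(BUILD_INFO).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

# To serve: HTTPServer(("", 8080), VersionHandler).serve_forever()
```

Whatever the stack, the payload should carry enough to pin the running system to a commit: version, SHA, build timestamp.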

Cluster 4 — Content selection, coordination, documentation (QI 7, 8, 9)

Over the course of a cycle, the test team files bug reports, sales or product logs enhancement requests, customers report production issues, telemetry surfaces new failure modes. Collectively these become the input to the next release. Three questions have to be answered in order:

  1. Selection. Which of these candidate changes go into the next release?
  2. Notification. Which of the selected changes does the release team believe were actually implemented and delivered?
  3. Confirmation. Which changes, once tested, were genuinely implemented without regression, which were implemented with regression, and which failed?

The standard pattern is a recurring cross-functional triage — sometimes called change control board (CCB), release triage, or just "the Thursday meeting" — with product, engineering, test, and support in the room. Inputs: open bug reports, enhancement requests, production signals. Output: a scored, prioritized list for the next release and an explicit deferral list for everything else.

Two field-tested rules, close cousins of the old maxim to "cut early and often," consistently hold up here:

  1. Don't develop what you're not going to test.
  2. Don't test what you're not going to release.

Print them, frame them, hang them in the triage room.

A few notes:

  • Data and metadata changes count as content. Schema migrations, seed data changes, feature-flag configuration, and model weights in ML-backed systems are release content. Manage them like code — version-controlled, reviewed, and documented. The classic failure mode is a team calling itself "feature-complete" while silently reshaping the schema underneath every release; every tester who used the old schema now has a broken test suite and a pile of false-positive bugs.
  • Bug and enhancement tracking should be the system of record. Modern trackers (Jira, Linear, GitHub Issues, ClickUp) all support the selection → in-development → ready-for-test → done lifecycle natively. The release notes for a build should be derivable from the tracker, not hand-written after the fact. AI-assisted release-note generation from ticket histories is a good force multiplier here, but a human should still sign off on the final notes.
  • Closed-loop confirmation is the step most often skipped. Every ticket marked "ready for test" has to end up as "verified" or "reopened" inside the same cycle. Without the closed loop, your tracker gradually fills with tickets developers think are fixed and testers think are still broken.
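Both the tracker-derived release notes and the closed-loop check are small queries over ticket data. A sketch, with plain dictionaries standing in for whatever your tracker's API (Jira, Linear, GitHub Issues) actually returns; the field names are assumptions:

```python
def release_notes(tickets):
    """Group the tickets selected for this release by type, as a notes draft."""
    notes = {}
    for t in tickets:
        notes.setdefault(t["type"], []).append(f'{t["id"]}: {t["title"]}')
    return notes

def unconfirmed(tickets):
    """Closed-loop check: tickets still 'ready-for-test' at the end of a cycle.

    Every one of these should have moved to 'verified' or 'reopened' by now.
    """
    return [t["id"] for t in tickets if t["status"] == "ready-for-test"]
```

If `unconfirmed()` is non-empty when the cycle closes, the loop is open: those are the tickets developers think are fixed and testers have not yet confirmed.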

Who owns what

The argument every organization eventually has: whose job is release management?

Step, task, and typical owner (proposed):

  1. Select the content: cross-functional triage (product + engineering + test + support)
  2. Implement the changes: development
  3. Build the artifact: release engineering / build automation
  4. Smoke-test the build: release engineering
  5. Package and deliver: release engineering
  6. Install in the test environment: test team (owns the install-as-test activity)
  7. Smoke-test in the test environment: test team

In many organizations, steps 3–5 live in a no-man's-land — people vaguely assume that software magically migrates from development to test with no further effort. The steps then fall on the test team by default, which is the worst place for them: the test team is not equipped to be release engineers, and once they own the build pipeline they lose both bandwidth and objectivity. Developers, in turn, stop feeling responsible for producing code that builds and deploys cleanly, because "QA will sort it out."

The cure is to put release engineering somewhere explicit. A dedicated release engineer (or platform team, or SRE group) reporting into either development or a shared services function, with a clear charter, removes the ambiguity. In modern shops this is usually a platform / DevOps / SRE team that owns the CI/CD pipeline, the build tooling, and the deploy automation, and a release manager role that owns the content-selection cadence.

Two things the test team should own:

  • Install in the test environment. If the test team installs the way a customer or operator installs, the install process itself gets tested. This has caught an enormous number of production install failures over the years.
  • Smoke-test after install. The entry gate for the next test cycle is the test team's to defend.

What's changed, what hasn't

The framework above was born in the shrink-wrap and early-web era. The quality indicators have held up almost perfectly — they describe properties of a well-run release process regardless of whether you release weekly or hourly. What has changed is the machinery you use to hit them.

Indicator by indicator, the current machinery:

  • QI 1, predictable cadence: CI/CD pipelines (GitHub Actions, GitLab CI, CircleCI, Jenkins, Buildkite); release trains; weekly / biweekly mobile app-store cadence; progressive delivery for continuous-deploy shops
  • QI 2, simple install: container images + Helm / Kustomize; Terraform / Pulumi / CDK for infrastructure; blue-green and canary deploys; managed-service auto-provisioning
  • QI 3, simple rollback: kubectl rollout undo; blue-green flipping; feature flags (LaunchDarkly, ConfigCat, OpenFeature) as the primary rollback mechanism; reversible migrations (Flyway, Liquibase, Alembic); image immutability
  • QI 4, real-life install: deploy-the-same-way-prod-does in staging; GitOps (ArgoCD, Flux) so "deploy to staging" and "deploy to prod" run the same controllers; chaos and upgrade drills
  • QI 5, consistent naming: semver + git SHA; OCI image digests; commit-based provenance (SLSA levels)
  • QI 6, interrogation: /version endpoints; Kubernetes labels; image-digest tracking via SBOM tooling (Syft, Grype); feature-flag evaluation logs
  • QI 7, release notes: auto-generated from Jira / Linear / GitHub via conventional commits; LLM-assisted summarization with human review; CHANGELOG.md maintained in-repo
  • QI 8, intelligent selection: weekly release-triage cadence; risk-scored backlog using your quality risk analysis; production-telemetry inputs (Sentry, Datadog, Crashlytics) feeding the triage
  • QI 9, coordination: schema migrations version-controlled; config-as-code; feature flags for decoupling deploy from release; LLM-output evaluation harnesses for AI features

Two newer considerations deserve special mention.

Continuous deployment changes the cadence question, not the discipline. If you deploy to production on every merge, "test release" collapses into "the deploy pipeline." But the nine quality indicators still apply: you still need predictable cadence (commits per day, batch size, release windows); you still need simple install and rollback (autoscaled canaries, automatic rollback on SLO breach); you still need interrogation (deploy markers, trace-level version tags); you still need a governance loop that decides which risky changes go out when.

Feature flags have replaced half of traditional rollback. Flags let you ship code that isn't active yet, promote it to 1% / 10% / 50% / 100% of users, and kill it instantly on bad telemetry. They are now part of release management, not an exotic add-on. Flag hygiene (time-bounded flags, owner names, cleanup policies) becomes its own release-management discipline — if left unchecked, flag debt is as damaging as code debt.
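The percentage promotion works because each user's bucket comes from a stable hash, so widening the rollout only ever adds users, never flips existing ones. A sketch of the bucketing (the hash scheme and function names are illustrative, not any vendor's algorithm):

```python
import hashlib

def rollout_bucket(user_id: str, flag: str) -> float:
    """Map (user, flag) to a stable value in [0, 100).

    Hashing user and flag together means each flag rolls out to a
    different, but per-user stable, slice of the population.
    """
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest[:8], 16) / 2**32 * 100

def flag_enabled(user_id: str, flag: str, rollout_percent: float) -> bool:
    return rollout_bucket(user_id, flag) < rollout_percent
```

Promoting a flag from 1% to 10% to 50% to 100% keeps earlier users enabled throughout, because a user's bucket value never changes between evaluations.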

Moving forward from where you are

Instituting process changes — especially changes that require cooperation outside the test organization — is hard. When those changes can affect customer-delivery dates, it's harder still. The irony is that the very urgency that makes individual release schedules feel non-negotiable is what undermines the predictable cadence that scarce test resources and competing demands make so important.

A few guidelines for the test manager trying to drive this kind of change:

  • Frame the business cost, not your inconvenience. Don't push back on out-of-cycle releases because they make your life difficult. Push back because they introduce quantifiable waste — installs you pay for and can't test against, regression you can't keep up with, escaped defects you can forecast — and therefore degrade the quality assessment the business pays you to produce.
  • Pitch the benefits across the organization. Better install processes save development time. Better interrogation makes bug reports easier for developers to act on. Better rollback reduces support escalations. If you can express what you need in terms that help other managers hit their numbers, you will find allies.
  • Get help from natural allies. Customer support, SRE, and operations often want the same things the test team wants — simpler installs, reliable rollback, clear versions. Their advocacy is often more persuasive than the test organization's.
  • Document the baseline, then improve incrementally. A rough metric for your current state (install time per release, rollback success rate, time-to-smoke-test-green, release-note completeness) beats no metric. Improve one at a time.

Tool support: what you can't do without

Release management into the test lab (or to customers) is something you can't do properly without at least:

  • Source control. Git-based, hosted, with branch protection and signed commits. Today there is no legitimate excuse for anything less.
  • An issue / change tracker that is the system of record. Jira, Linear, GitHub Issues — anything with a real API and a state machine. Tickets drive release notes and close the loop between content selection and confirmation.
  • A build pipeline that produces deterministic, reproducible artifacts. Signed images, SBOMs, provenance, and a manifest per environment.
  • A deploy tool that supports rollback. Kubernetes + GitOps, platform-as-a-service, or a managed service with first-class rollback. For mobile, staged rollout via the app stores.
  • A feature-flag system. Even a simple one. Feature flags are a release-management tool, not just a product tool.
  • Observability tuned for release events. Deploy markers in your APM of choice (Datadog, New Relic, Grafana, Sentry) so your team can answer "did the release degrade X" in seconds.

Skimping on any one of these reintroduces problems the framework above already solved decades ago.


Working on this?

Rex Black, Inc. has been advising enterprise teams on release and test processes since 1994. If you want help designing, measuring, or rescuing a release process — or training your test and release teams to run one of their own — talk to us.

Rex Black, Inc. · Enterprise technology consulting · Dallas, Texas

