Talk · Rex Black, Inc.

Investing in software testing. The ROI case and how to earn it.

Software testing is an investment. The wrong tests return zero — or worse, a false sense of security. The right tests, selected against customer usage profiles and weighted by quality risk, can cut customer-found defects by two-thirds and return roughly four dollars for every dollar spent. This talk walks through the baseline, the math, and the program that makes the numbers real.

Slides: 19
ROI example: 445%
Customer-found bugs cut: −66%
Format: Executive talk

Abstract

If you can't show ROI, you will lose the testing budget.

Every engineering leader eventually ends up in a room where someone asks, "What does testing actually return?" If the answer is "it's just good practice," the budget shrinks. If the answer is "here's the baseline, here's the investment, here's the measurable change in customer-found defects and cost," the budget holds — and sometimes grows.

This talk gives you that answer. It walks through a concrete worked example in three stages: a project with no testing discipline, a project with a manual testing team, and a project that complements the manual team with test automation. In that worked example, the customer-found defect rate drops by half once the manual team is added and by two-thirds once automation complements it, with a tooling investment of $150,000 amortized across twelve quarterly releases and a return on investment of 445%.

The numbers are an example, not a guarantee. The point is the structure. To get those returns you have to invest in the right tests — not the tests you have, not the tests that are easy to automate, not the tests the vendor suggests, but the tests that reduce the risks your customers actually care about. The rest of the talk is about how to pick those tests: customer usage profiles, quality risk analysis, the right technique for each risk, and the pervasive-testing operating model that makes all of it work.

Don't waste money testing actions customers will never perform, verifying configurations they don't use, and fixing bugs they will never see. The smart test investments are in the tests that find bugs your customer would want fixed.


Outline

What the talk covers, in order.

01

Start with the baseline

Before you can argue for testing investment, you need to know what you'd be replacing. The baseline is the project that ships with developer-only testing — every bug that makes it out the door is a bug the customer found, and every one of those costs you support time, engineering time, opportunity, and trust.

  • Name the baseline concretely: how many customer-found defects per release, how much cost per incident, how much engineering time spent on post-ship firefighting.
  • This is the number you are trying to move. Everything else in the argument points at it.
02

Stage 1 — manual testing team

Add a dedicated manual testing team to a developer-tested project. In the worked example, developers find 250 bugs pre-release and testers find another 350 — roughly two to three bugs found internally for every bug the developers were catching alone. Customers find about 50% fewer bugs. Costs come down. Customers are happier. But the ceiling on manual testing is real: you can only run so many tests, so many times, against so many configurations.

  • A manual testing team is the first lever and often the biggest.
  • The question after stage 1 is always the same: can we find more bugs and reduce costs further without doubling the team?
03

Stage 2 — complement manual with automation

Invest $150,000 in tools, amortized over twelve quarterly releases. Use automation where it belongs: regression, load/volume, performance, reliability, API-driven structural checks, standards compliance. Keep manual where it belongs: usability, localization, installation, error handling, configuration. In the worked example, customer-found bugs fall another ~30% (to −66% vs. the baseline), quality costs are cut in half, and the return on the tooling investment is 445%.

  • $150k / 12 releases = $12,500 per release — the denominator of the ROI calc.
  • The numerator is the difference in quality costs between stage 1 and stage 2, plus the savings from tests that would otherwise have been run manually every release.
  • The 445% figure is illustrative — your mileage varies with release cadence, tool mix, and test selection. The shape of the curve is what holds.
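The arithmetic above can be sketched directly. The $150,000 tool cost and the twelve-release amortization come from the talk; the per-release benefit figure below is an assumption, back-derived so that it reproduces the illustrative 445% number, and stands in for your own stage-1 vs. stage-2 quality-cost difference:

```python
def roi(total_benefit: float, total_cost: float) -> float:
    """Return on investment as a percentage: (benefit - cost) / cost."""
    return (total_benefit - total_cost) / total_cost * 100

TOOL_COST = 150_000   # tooling investment (from the talk)
RELEASES = 12         # quarterly releases it is amortized over (from the talk)

per_release_cost = TOOL_COST / RELEASES
print(per_release_cost)  # 12500.0 -- the denominator of the ROI calc

# Assumed benefit per release: quality-cost savings plus avoided manual
# execution effort. This value is hypothetical, chosen to reproduce the
# deck's illustrative 445% figure; substitute your own measurements.
assumed_benefit_per_release = 68_125

print(round(roi(assumed_benefit_per_release * RELEASES, TOOL_COST), 1))  # 445.0
```

Plugging in your own baseline costs and benefit estimates is the whole exercise: the formula holds even when the 445% does not.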
04

Smart test investments — and the alternative

Picking the wrong stocks can result in a total loss of your investment, or worse, and the same applies to tests: pick the wrong ones and you won't get big returns. Test investment returns can be zero, or negative when a misguided test program gives management a false sense of security. The smart test investments are in tests that find bugs your customers would want fixed. To identify those tests, you have to understand how customers actually use the system.

05

Usage profiles — high fidelity vs. low fidelity

A high-fidelity test system mimics customer usage: real data volumes, real configurations, real workflow sequences, real transaction mixes, real latency and failure conditions. A low-fidelity test system runs the developers' happy path on clean data in an ideal environment. High-fidelity testing finds customer-critical bugs before the customer does; low-fidelity testing mostly proves the code compiles. Your first architectural question on any test-investment program is how much fidelity you can afford and how to spend it.

  • High-fidelity = same data shape, same volumes, same configurations, same concurrency, same failure modes as production.
  • Low-fidelity is cheap and tells you almost nothing about the customer experience.
  • Smart investment: spend more on fidelity for the highest-risk subsystems, less for the rest.
06

Quality risk categories

Quality is fitness for use — the presence of customer-satisfying behaviors and the absence of customer-dissatisfying behaviors. Quality risks are the potential for dissatisfying behaviors. They come in categories: functional (missing or broken features), use cases (features okay alone, workflows broken), robustness (common errors not handled), performance (too slow at key points), localization (dates, language, money, culture), usability (it works, but what a pain), volume/capacity (can't handle large datasets), and reliability (crashing, hanging, misbehaving). Quality risks go beyond broken functionality.

  • Functional, use-case, robustness, performance, localization, usability, capacity, reliability.
  • Most test programs overweight functional and underweight the rest. Look at the customer-found defect profile from your baseline — the shape of that histogram is your starting prescription for re-weighting.
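Building that histogram takes little more than a tally. A minimal sketch, assuming you can tag each customer-found defect from the baseline with one of the categories above (the defect list here is hypothetical):

```python
from collections import Counter

# Hypothetical customer-found defects from a baseline release, each tagged
# with one of the quality-risk categories named above.
baseline_defects = [
    "functional", "performance", "functional", "usability",
    "reliability", "usability", "functional", "localization",
]

profile = Counter(baseline_defects)

# Heaviest categories first: this shape is the starting prescription
# for re-weighting the test program.
for category, count in profile.most_common():
    print(category, count)
```

If "performance" or "usability" dominates the tally while the test program is overwhelmingly functional, the re-weighting argument writes itself.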
07

Analyzing quality risks — three options

There are three practical ways to analyze quality risks. Informal analysis starts with the classic quality-risk categories listed above. ISO 9126 starts with six main quality characteristics (Functionality, Reliability, Usability, Efficiency, Maintainability, Portability — FRUEMP) and decomposes into subcharacteristics for your system. Failure Mode and Effect Analysis (FMEA) lets key stakeholders list possible failure modes, predict their effects on system/user/society, assign severity/priority/likelihood, and calculate a Risk Priority Number (RPN). Pick the one that fits your organization's appetite for process. All three get you to the same outcome: tests weighted by risk.

  • Informal — for teams that will not tolerate heavy process.
  • ISO 9126 / FRUEMP — for teams already using standards.
  • FMEA — for safety-critical, regulated, or enterprise work where you need a defensible audit trail.
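The FMEA arithmetic is simple enough to sketch: rate each failure mode for severity, priority, and likelihood, and the RPN is their product. The failure modes, the 1-to-10 scales, and the "higher means riskier" direction below are all assumptions for illustration; Rex Black's own scheme may orient the scales differently.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    name: str
    severity: int    # 1-10, higher = worse effect (scale is an assumption)
    priority: int    # 1-10, higher = more important to stakeholders
    likelihood: int  # 1-10, higher = more probable

    @property
    def rpn(self) -> int:
        # Risk Priority Number: the product of the three ratings.
        return self.severity * self.priority * self.likelihood

# Hypothetical failure modes for illustration only.
modes = [
    FailureMode("checkout times out under load", 8, 9, 6),
    FailureMode("date rollover mishandled",      6, 4, 3),
    FailureMode("help text truncated",           2, 3, 5),
]

# Riskiest first; test effort follows this ordering.
for m in sorted(modes, key=lambda m: m.rpn, reverse=True):
    print(m.rpn, m.name)
```

The ranked list is the defensible audit trail: each test you fund traces back to a numbered risk.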
08

Essential techniques — static / structural / behavioral

Three technique families cover most of what a test investment has to buy. Static testing tests without running the code — inspections, reviews, static analysis, spec checks — and identifies bugs before they're built. Structural testing (white-box) tests how the system works internally — unit, component, integration, coverage-driven, typically owned by developers. Behavioral testing (black-box) tests what the system does from the outside — integration, system, and acceptance, typically owned by testers. The three families need different tools, different phase placements, and different owners, but they share data, cases, and harnesses readily. A good test investment encourages that cross-pollination.

  • Static: simulators, code analyzers, diagramming, spec reviewers. Runs during specification and development.
  • Structural: profilers, coverage analyzers, harnesses, data generators. Runs at unit/component/integration.
  • Behavioral: GUI tools, load generators, performance tools. Runs at integration/system/acceptance.
09

Automated or manual? — a decision grid

A common mistake is trying to automate everything or leaving everything manual. Neither is an investment strategy. Some test types are well-suited to manual: operations and maintenance, configuration and compatibility, error handling and recovery, localization, usability, installation and setup, documentation and help. Some are well-suited to automation: regression and confirmation, monkey/random, load/volume/capacity, performance and reliability, standards compliance, API-driven structural tests, static complexity and code analysis. A third bucket is either or combined: functional, use cases, user interface, date/time handling. Inappropriate manual testing misleads people about coverage; inappropriate automation usually fails.
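One way to make the grid operational is to encode it as a lookup table your planning scripts can query. The mapping below transcribes the three buckets from the talk; the `plan` helper and its fallback value are illustrative additions.

```python
# Decision grid from the talk, encoded as a lookup table.
EXECUTION_MODE = {
    # well-suited to manual testing
    "operations & maintenance": "manual",
    "configuration & compatibility": "manual",
    "error handling & recovery": "manual",
    "localization": "manual",
    "usability": "manual",
    "installation & setup": "manual",
    "documentation & help": "manual",
    # well-suited to automation
    "regression & confirmation": "automated",
    "monkey/random": "automated",
    "load/volume/capacity": "automated",
    "performance & reliability": "automated",
    "standards compliance": "automated",
    "API-driven structural": "automated",
    "static complexity & code analysis": "automated",
    # either, or a combination of both
    "functional": "either",
    "use cases": "either",
    "user interface": "either",
    "date/time handling": "either",
}

def plan(test_type: str) -> str:
    """Look up the recommended execution mode for a test type."""
    return EXECUTION_MODE.get(test_type, "unclassified")

print(plan("regression & confirmation"))  # automated
print(plan("usability"))                  # manual
```

A test type that comes back "unclassified" is a planning gap, not a license to guess.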

10

Pervasive testing — the operating model

Successful testing is not a few people in a dark lab at the end of the project. It is concurrent (starts with requirements and happens across the whole lifecycle), cross-functional (many people play active roles — salespeople define requirements, customer support provides usage profiles, developers write specs and do structural tests, testers build tools and do behavioral tests), collaborative (testing has dependencies throughout the project team), and committed (teams deliver what they promise on schedule so testing has something to test). A pervasive testing operating model is what turns the ROI math from a one-time projection into a repeating number.

  • Time: begin testing on day one. Specs, requirements, and sometimes source code get independent evaluation and feed test development.
  • Tasks: the whole team participates — re-usable tools across teams and phases, integrated HW/SW testing, test results inside the PM dashboard.
  • Teamwork: development, management, release/config, and ops must deliver to the test team on schedule so the test team can deliver to them.

Key takeaways

Four things to remember.

01

Frame testing as an investment, not an expense.

The question "what does testing cost?" loses. The question "what does testing return, relative to the baseline?" wins. Run the math before you argue for the budget, not after you lose it.

02

ROI comes from the right tests.

Start with customer usage profiles. Layer on quality risk analysis. Tests that cover actions customers will never perform, configurations they don't use, and bugs they'd never see return zero — or worse, give management a false sense of security.

03

Technique has to match the risk.

Static, structural, and behavioral each find different bug classes. Manual and automated each belong in different test types. Running the wrong technique against a risk wastes money and misses the bug.

04

Testing is pervasive, not a phase.

Concurrent, cross-functional, collaborative, committed. Testing that only happens in the last two weeks of the schedule cannot compound its investment the way a lifecycle-wide program can.

Worked examples


The ROI walkthrough, three stages.

The canonical worked example from the deck. The absolute numbers are illustrative; the relative shape is what holds. Use this as a template to fill in your own baseline and tool-investment figures.

Stage 0 — developer testing only

No dedicated testing team. Developers test their own code.

Every bug that ships is a customer-found bug.

Support cost per incident and engineering cost per hotfix are at full rate. Quality costs are whatever they are — and nobody's measuring.

This is the baseline against which the next two stages are compared.

Stage 1 — manual testing team

Add a manual testing team.

Developers: 250 bugs found pre-release. Testers: 350 bugs found pre-release.

Customers find roughly 50% fewer bugs post-release.

Quality costs drop. Customer satisfaction rises. The open question: can we reduce further without doubling headcount?

Stage 2 — manual + automated

$150,000 tools investment, amortized over twelve quarterly releases ($12,500 per release).

Automation picks up regression, load/volume, performance, structural API checks, standards compliance.

Manual team retains usability, localization, error handling, configuration, installation.

Customers find 66% fewer bugs vs. baseline. Quality costs halved. Return on the tooling investment: 445%.

Closing

Two closing pieces of advice. First, build your own version of the ROI table above before you go into the next budget conversation. The numbers on this deck are illustrative — yours will differ, and they are more persuasive because they're yours.

Second, don't spend the investment on the wrong tests. A program that triples its test count but still runs mostly low-fidelity happy-path scripts will not move the customer-found defect number. The tests have to be selected against how the customer actually uses the product and weighted by the risks that actually matter to them. That selection work is where the returns come from.


Want this talk delivered in-house?

Rex Black, Inc. delivers every talk on this site as a live workshop, a keynote, or a conference session. Tailored to your stack, your team, and your timeline.