Series · Part 3 of 4 · Managing with facts
Project metrics measure progress on a single project — where we stand today, what the trends look like, and what course-corrections we need. They are the most commonly used test metrics, and the most commonly misused. This paper walks through a balanced minimum set: a five-series bug trend, a test-case fulfillment chart that keeps the first one honest, and the internal test-hours chart that tells the test manager whether execution itself is efficient.
Four-part series: Part 1 — Why & how · Part 2 — Process metrics · Part 3 (this paper) — Project metrics · Part 4 — Product metrics
What project metrics are for
Project metrics help you understand how a project is progressing against its goals — where it is today, what the trends are, and whether course-corrections are needed to hit schedule, cost, or scope targets. When you see tables and charts of bugs and test cases on a test dashboard, you're almost always looking at project metrics.
Applying the framework from Part 1:
- Effectiveness. Is the project on track to achieve its desired results?
- Efficiency. Is it achieving those results economically?
- Elegance. Is the execution graceful and credible to outside stakeholders?
Two cautions, both earned in the field.
First, don't over-rely on project metrics. They tell you whether testing is on schedule — they don't tell you whether the product is any good. Many dashboards show nothing but project metrics, and the project management team ends up confident they're on track right up until release, when product problems show up in user feedback. Part 4 fixes this.
Second, don't conflate project status with individual performance. Most of the factors that determine project status are under management's control, not the individual contributor's. Project metrics tied to individual reviews will be gamed, distorted, and eventually useless. This is the same warning from Part 1 and Part 2, and it applies here too.
The foundation chart — bug open and close trend
A single well-built chart carries five metrics at once. It's the most valuable chart on most project dashboards, and the easiest to build from any mainstream bug tracker (Jira, Linear, GitHub Issues, Azure DevOps — all export the three fields you need: opened date, closed date, status).
Primary project dashboard chart
Bug opened and closed — weekly cumulative trend
Weeks W1–W23 of formal integration and system testing. Release is scheduled at W23.
Cumulative totals of bugs opened and bugs closed are plotted against the right-side axis. As the project approaches release the opened curve should flatten and the closed curve should converge with it.
Four of those metrics come straight from the same two columns of data — bugs opened and bugs closed, each plotted as a weekly count and as a cumulative total.
A fifth derived metric — the average daily backlog (cumulative opened − cumulative closed, averaged across the week's days) — can be overlaid on the same chart on the left axis. If the backlog metric trends up, the development team is falling behind. If it trends to zero, the project is closing its quality gap.
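The rollup behind this chart is small enough to sketch. The function below is an illustrative example, not taken from the paper: it takes bug records as (opened, closed-or-None) date pairs and produces the weekly series, with the backlog simplified to the end-of-week open count rather than a daily average.

```python
from collections import Counter
from datetime import date

def weekly_bug_series(bugs, start, weeks):
    """Roll (opened_date, closed_date_or_None) pairs into weekly series.

    Returns, per week: opened, closed, cumulative opened, cumulative
    closed, and end-of-week backlog (cum_opened - cum_closed).
    """
    opened = Counter()
    closed = Counter()
    for opened_on, closed_on in bugs:
        opened[(opened_on - start).days // 7] += 1
        if closed_on is not None:
            closed[(closed_on - start).days // 7] += 1

    series = []
    cum_open = cum_close = 0
    for w in range(weeks):
        cum_open += opened.get(w, 0)
        cum_close += closed.get(w, 0)
        series.append({
            "week": w + 1,
            "opened": opened.get(w, 0),
            "closed": closed.get(w, 0),
            "cum_opened": cum_open,
            "cum_closed": cum_close,
            "backlog": cum_open - cum_close,
        })
    return series
```

Any tracker export with opened and closed dates feeds this directly; the weekly and cumulative columns become the plotted curves, and the backlog column becomes the left-axis overlay.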
How the chart is read in a weekly review
Walk it this way in the status meeting, in this order:
- Is the opened curve flattening? If yes, the test system is approaching diminishing returns — most of the important bugs for this phase have been found. If not, you don't know what you're shipping.
- Is the closed curve converging? If the gap is closing, development is keeping up. If not, there's either a fix-capacity problem or an upstream defect-introduction problem.
- What do the annotations say about process artifacts? Holidays, scrub meetings, scope decisions, and exploratory testing days all create visible features in the chart. Annotate them — the chart without annotations is harder to argue about.
- Is the pattern one of the three pathologies? (Endless discovery; ignored reports; chaotic bug management. See the charting defect data paper for the full pathology catalog.)
Balancing it — test case fulfillment
The opened/closed trend is necessary but not sufficient. It's possible to see the "healthy shape" while something else is broken. The classic way it gets gamed: testing is blocked on environment or on prerequisite fixes, so nothing is being filed — the opened curve flattens for the wrong reason.
Balance it with a test-case fulfillment chart, plotted per test pass (two-week intervals for this example project). Four metrics share the same axes:
Balance metric for bug trend
Test case fulfillment within the current pass
A single two-week test pass — day 1 through day 10.
Fulfilled = passed + failed + deliberately skipped. Planned is the dashed reference line. Gap between Planned and Fulfilled is the test-execution shortfall. Gap between Fulfilled and Passed is bug-driven attrition — cases that were fulfilled but did not pass.
Each of the four series tracks a different project question:
- Planned (dashed blue line) — how many test cases were scheduled to be complete by each day. Straight-line planning in this example; in practice often stepped.
- Fulfilled — how many were actually completed. A case is fulfilled if it ran and passed, ran and failed, or was deliberately skipped. Should track planned closely.
- Passed — how many ran and passed. Should converge with fulfilled near the end of the pass.
- Failed — how many ran and failed. Should trend toward zero in the final pass.
How this balances the bug trend
Put the two charts side by side. Suppose the bug-trend chart looks healthy (opened flattening, closed converging), but the test-case fulfillment chart shows planned pulling away from fulfilled. That means testing is falling behind, the bug-trend chart is flattening because tests aren't running, and the apparently-healthy signal is spurious. You'd miss this by looking at either chart alone. Balance is the point.
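That cross-check can itself be expressed as a guard. This is an illustrative heuristic with assumed thresholds — "flat" and "falling behind" will mean different numbers on different projects:

```python
def spurious_flattening(weekly_opened, shortfalls,
                        flat_factor=0.25, shortfall_growth=2):
    """Flag the trap described above: the opened curve looks flat, but
    only because the execution shortfall (planned minus fulfilled test
    cases) is growing -- tests aren't running, so bugs aren't found.
    """
    flat = (max(weekly_opened) > 0 and
            weekly_opened[-1] < flat_factor * max(weekly_opened))
    falling_behind = (len(shortfalls) >= 2 and
                      shortfalls[-1] - shortfalls[-2] >= shortfall_growth)
    return flat and falling_behind
```

A flat opened curve with a flat shortfall is the healthy shape; a flat opened curve with a growing shortfall is the spurious one.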
Both charts are trend charts: they show outcomes with a temporal dimension, not causes. A trend of rising cancer rates in a population could reflect worse lifestyle behaviors, worse environmental exposure, or a population that's simply older and living longer. Same discipline applies here. The charts surface questions. The conversations — and the drill-downs into specific test cases, subsystems, or weeks — produce the answers.
Internal efficiency — test-hours-spent
The two charts above are external — the kind you put on a project dashboard. The test manager needs a third chart for internal use: a measurement of test-execution efficiency. The question: is each test case costing us more or fewer hours to execute than we planned for?
Internal efficiency metric
Planned vs. actual test-execution hours per day
Internal metric — used by the test manager, not reported to the project dashboard.
Actual (solid line) drifts below the lower normal-variation band starting day 3. Either the test cases are harder than planned, the environment is intermittently blocked, meetings are eating the day, or the planning assumption about execution hours per day was wrong.
Where the actual curve goes below the lower bound of normal variation, execution is less efficient than planned — and that's the moment to investigate why. Three common causes in modern programs:
- Environment flake. Containers failing to start, data setups timing out, or shared test environments in contention. The fix is usually infrastructure, not test case rewrite.
- Meeting creep. Planning, standup, retro, grooming, and stakeholder reviews have compressed the execution day to 4 hours instead of the planned 6. Chart the hours actually available for execution against calendar hours.
- Planning assumption wrong. The planned hours-per-day figure was never realistic for this team or this product. The actual curve is telling you about true process capability (see Part 2) rather than project incidents.
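Flagging the days worth investigating is mechanical once the band is defined. The sketch below assumes a fixed fractional tolerance for simplicity; a real chart would derive the normal-variation band from historical data, as in the process-capability discussion in Part 2:

```python
def below_band_days(planned_hours, actual_hours, tolerance=0.15):
    """Return the 1-indexed days where actual execution hours fall
    below the lower normal-variation band.

    `tolerance` is an assumed band half-width (fraction of plan);
    derive it from historical variation in practice.
    """
    flagged = []
    for day, (plan, actual) in enumerate(zip(planned_hours, actual_hours), 1):
        lower_bound = plan * (1 - tolerance)
        if actual < lower_bound:
            flagged.append(day)
    return flagged
```

With 6 planned hours per day and a 15% band, the lower bound is 5.1 hours — the chart's "drifts below the band starting day 3" becomes a list of days to ask about in the next standup.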
This chart also illustrates why project metrics and process metrics overlap. A persistently-low actual curve isn't a project problem — it's a process-capability signal hiding in a project chart.
Other common project metrics worth considering
Beyond the three foundational charts above, a small number of additional project metrics can earn a place on specific kinds of programs.
Add them when the project's specific risk profile makes them useful. Don't add them by default — the value of a dashboard comes from focus.
What to put on the project dashboard
A minimum balanced project-metrics dashboard:
- Bug opened/closed trend, cumulative and weekly-average, with annotations for milestones and process events.
- Test-case fulfillment chart for the current test pass.
- Requirements coverage table, snapshotted weekly. (Details in Part 4.)
- Residual quality risk, if you're running a risk-based strategy. (Details in Part 4.)
That's four views. Together they answer: Are we finding bugs and fixing them? Are we running the tests we planned to run? Are the requirements getting covered? Are the risks getting mitigated? Anything beyond those four should earn its place by answering a specific question that those four don't.
Where this goes next
Part 4 — Product Metrics covers the metrics that measure what you're actually about to ship — requirements coverage, residual quality risk, and the risk-category breakdown. These are the metrics most often missing from test dashboards, and without them you don't truly know what the release looks like.
Related resources
- Part 2 — Process Metrics — process capability behind the project.
- Part 4 — Product Metrics — next in the series.
- Charting the Progress of System Development Using Defect Data — the foundational-charts paper that extends the bug-trend here into four specific charts.
- Effective Test Status Reporting — reporting these numbers to executives.
- Test Execution Processes — the execution process these metrics measure.