A working FMEA for software, not a form.
Ten columns. Every risk. Every release.
The Sumatra FMEA is a Failure Mode and Effects Analysis adapted from aerospace and automotive practice for software quality risk. It converts stakeholder judgment into a numerically prioritized risk register you can defend to any audience.
A prioritized risk register does one thing spreadsheets rarely do: it survives its first change request. The FMEA is the shape that makes that survival possible.
Key Takeaways
Four things to remember.
Severity × Priority × Likelihood = RPN
The Risk Priority Number is the numerical shorthand for a risk. Each axis is scored from 1 (worst case: catastrophic, must-fix, certain) to 5 (mildest: cosmetic, nice-to-have, negligible), and the three scores are multiplied together.
Categorize before you enumerate
Functionality, Load, Usability, Compatibility, Reliability, Security, Maintainability — pick the categories first, then populate risks under each.
Recommended action belongs in the register
Extensive, Balanced, Opportunity, or Report-and-Move-On. The register tells you how to test each risk, not just that it exists.
Trace to requirements
Every risk links back to at least one requirement or user story. No link means the risk is either invented or the requirements are incomplete.
Why this exists
What this template is for.
The FMEA originated in 1940s aerospace engineering and migrated into automotive and medical device quality programs. We adapted it for software because it does something unusual: it forces stakeholders to reason numerically about risks they would otherwise argue about qualitatively.
This template ships with SpeedyWriter (a fictional document management system) as a worked example. Replace the system name, stakeholders, and risks with your own — the column structure is what matters.
The columns
What each field means.
Risk ID
Hierarchical identifier. Top-level category gets "1", "2", etc. Each risk under the category gets "1.001", "1.002", ... so you can sort, filter, and trace without losing structure.
Example: 1.005 (for the fifth risk under Functionality)
Quality Risk Category
The ISO/IEC 25010-style category the risk belongs to. Filled in only on the header row for each category, left blank on individual risks inside that category.
Example: Functionality, Load/Capacity/Volume, Reliability, Security, Usability
Failure Mode / Quality Risk / Effect
One-line description of what could fail and how the failure would present. Focus on the observable effect on the user, not the internal cause.
Example: Check-in of new document to DMS fails.
Severity
How bad this failure is for users, on a 1–5 scale. 1 = catastrophic / safety / data loss. 5 = cosmetic / minor.
Example: 1 = catastrophic; 5 = cosmetic
Priority
How important this risk is to the business, independent of severity, on a 1–5 scale. A rarely-used feature can be low priority even if a failure there would be severe.
Example: 1 = must-fix; 5 = nice-to-have
Likelihood
How likely this failure is to occur, on a 1–5 scale. 1 = will definitely happen without mitigation; 5 = extremely unlikely.
Example: 1 = certain; 5 = negligible
RPN (Risk Priority Number)
The product of Severity × Priority × Likelihood. Lower RPN = higher risk. Sort the register by RPN ascending to see what to test first.
Example: 1 × 1 × 3 = 3 (very high); 5 × 5 × 5 = 125 (negligible)
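The scoring and sort can be sketched in a few lines. This is a minimal illustration of the convention above, not code from the workbook; the `Risk` class and sample rows are this sketch's own:

```python
# Minimal sketch of RPN scoring, assuming the 1-5 scales described above
# (1 = worst on every axis). Class and data are illustrative.
from dataclasses import dataclass

@dataclass
class Risk:
    risk_id: str
    failure_mode: str
    severity: int    # 1 = catastrophic ... 5 = cosmetic
    priority: int    # 1 = must-fix ... 5 = nice-to-have
    likelihood: int  # 1 = certain ... 5 = negligible

    @property
    def rpn(self) -> int:
        # Lower RPN = higher risk, matching the convention above.
        return self.severity * self.priority * self.likelihood

register = [
    Risk("1.002", "Can't cancel incomplete actions.", 2, 3, 4),
    Risk("1.001", "Regression of existing features.", 1, 1, 3),
]

# Sort ascending so the riskiest rows come first.
for risk in sorted(register, key=lambda r: r.rpn):
    print(risk.risk_id, risk.rpn)  # prints 1.001 3, then 1.002 24
```

The property keeps the RPN derived rather than stored, so it can never drift from its three factors.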
Recommended Action
How to mitigate the risk in testing. Extensive, Balanced, or Opportunity testing; Report-and-Move-On for known-deferred; Rerun-Regression for existing coverage.
Example: Extensive testing; Opportunity testing; Rerun entire R3.0 test set
Who / Phase
Who owns the test (Dev, Test, User-Cert, N/A) and at which phase it applies (Unit, Component, Integration, System). Multiple can be listed.
Example: Dev/UC-Test/IS (Dev unit tests + User Certification + Integration-System test)
Requirement
Link(s) to the requirement(s) this risk covers. Every risk should trace; an untraced risk is a sign that either the risk is invented or requirements are missing.
Example: 3.1 (ties to Requirement 3.1 in the spec)
Live preview
What it looks like populated.
Excerpt from the Basic Sumatra FMEA workbook, Test Dev sheet.
| Risk ID | Category | Failure Mode | Sev | Pri | Like | RPN | Recommended Action | Who / Phase | Req |
|---|---|---|---|---|---|---|---|---|---|
| 1.0 | Functionality | Failures causing specific features not to work | | | | | | | |
| 1.001 | | Regression of existing SpeedyWriter features. | 1 | 1 | 3 | 3 | Rerun entire R3.0 test set. | Test/IS | 3.0 |
| 1.002 | | Can't cancel incomplete actions using cancel or back. | 2 | 3 | 4 | 24 | Opportunity testing. | N/A | |
| 1.005 | | Check-in of new document to DMS fails. | 2 | 1 | 2 | 4 | Extensive testing. | Dev/UC-Test/IS | 3.1 |
| 2.0 | Load, Capacity, Volume | Failures in scaling to expected peak concurrent usage | | | | | | | |
| 2.001 | | System fails at or before 25 concurrent users. | 1 | 1 | 3 | 3 | Extensive testing. | Test/S | 1.1 |
| 2.004 | | System disallows 32,767 or fewer user accounts. | 1 | 3 | 3 | 9 | Balanced testing. | Test/S | 1.3 |
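If the register lives outside the workbook's formula, the stored RPN can drift from its factors. A hedged sketch of a consistency check over the rows in the excerpt above (the tuple layout is this sketch's assumption, not the workbook's):

```python
# Sanity-check that each row's RPN column really is
# Severity x Priority x Likelihood. Values copied from the excerpt.
rows = [
    # (risk_id, sev, pri, like, rpn)
    ("1.001", 1, 1, 3, 3),
    ("1.002", 2, 3, 4, 24),
    ("1.005", 2, 1, 2, 4),
    ("2.001", 1, 1, 3, 3),
    ("2.004", 1, 3, 3, 9),
]

# Any ID listed here has a stale RPN that needs recomputing.
bad = [rid for rid, s, p, l, rpn in rows if s * p * l != rpn]
print("rows with stale RPN:", bad)  # -> rows with stale RPN: []
```

Run it after every edit that touches a score column; an empty list means the register is internally consistent.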
How to use it
8 steps, in order.
1. Convene the FMEA workshop with every stakeholder who owns quality: product, engineering, support, security, operations. One meeting per category is fine.
2. Fill in the Risk ID and Quality Risk Category header rows first, one per category you will cover.
3. Under each category, enumerate failure modes in the Failure Mode / Quality Risk / Effect column. One row per distinct failure. Do not dedupe aggressively at this stage.
4. Score Severity, Priority, and Likelihood as a group. When the group disagrees, capture the disagreement in a comment — do not paper over it.
5. Compute the RPN (the workbook does this for you via a formula). Sort the register by RPN ascending and review the top of the list with the group.
6. Fill in Recommended Action for every row. Tests that will never run (Opportunity, Report-and-Move-On) are still logged here so the judgment is explicit.
7. Trace each risk to a requirement. Mark rows with no link for a follow-up requirements review, and handle the incipient-bugs step of the Quality Risk Analysis Process.
8. Check the finalized workbook into configuration management and require change requests to alter it afterwards.
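The traceability check in step 7 is easy to automate. A minimal sketch, assuming each row carries a (possibly empty) list of requirement links; the field names are illustrative:

```python
# Sketch of the step-7 traceability check. An untraced risk is either
# invented or evidence of a missing requirement, so it goes on the
# follow-up requirements-review agenda.
register = [
    {"risk_id": "1.001", "requirements": ["3.0"]},
    {"risk_id": "1.002", "requirements": []},   # untraced
    {"risk_id": "1.005", "requirements": ["3.1"]},
]

untraced = [r["risk_id"] for r in register if not r["requirements"]]
print(untraced)  # -> ['1.002']
```

The same loop inverted (requirements with no covering risk) finds gaps in the other direction.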
Methodology
The thinking behind it.
The FMEA scale is counter-intuitive: lower numbers mean more severe, higher priority, more likely. This is deliberate — it lets you multiply into an RPN where lower means "do this first". If your organization finds it confusing, invert it in display but keep the multiplication convention internally.
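One way to invert the scale in display while keeping the internal multiplication convention, assuming the 1–5 scales described above (the `6 - x` mapping is this sketch's assumption):

```python
# Keep the internal 1-is-worst convention for the RPN multiplication,
# but show stakeholders a 5-is-worst score.
def to_display(internal: int) -> int:
    return 6 - internal  # internal 1 (worst) -> display 5 (worst)

def to_internal(display: int) -> int:
    return 6 - display   # the mapping is its own inverse

# RPN is always computed on the internal scale:
severity, priority, likelihood = 1, 1, 3
print(severity * priority * likelihood)  # -> 3
print(to_display(severity))              # -> 5 (shown to stakeholders)
```

Because the mapping is a bijection on 1–5, nothing is lost in translation; only the multiplication must always use the internal values.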
The Basic variant uses one sheet. The Advanced variant uses four sheets (Initial, Estimate, Planning, Test Dev) — each snapshots the FMEA at a different phase of the project so you can see how risk understanding evolves over time.
When in doubt, score higher (lower number). An over-scored risk costs a few extra tests. An under-scored risk costs a production incident.
Take it with you
Download the piece you just read.
We keep this library free. All we ask is that you tell us who you are, so we know whom to follow up with if we release an updated version. It's a one-time form; this browser remembers you after that.
Related in the library
Pair this with.
Need a QA program to back this up in your organization?
If a checklist is not enough and you want help applying it to a live engagement, we can have a call this week.
Related reading
Articles, talks, guides, and case studies tagged for the same audience.
- Whitepaper
Evaluation Before Shipping: How to Test an AI Application Before It Hits Production
The release-gate playbook for AI features. Covers the five evaluation dimensions, how to build a lean golden set, where LLM-as-judge is trustworthy and where it lies, rollout mechanics with named exit criteria, and the regression suite that keeps a shipped AI feature from quietly rotting in production.
- Whitepaper
Choosing the Right Model (and Knowing When to Switch)
A practical framework for matching LLM model tier to task. Covers the four axes (capability, latency, cost, reliability), cascade routing patterns that cut cost 60 to 80 percent without measurable quality loss, switching costs you did not plan for, and the worked economics at 10K, 100K, and 1M decisions per day.
- Whitepaper
Beyond ISTQB: A Multi-Domain Certification Roadmap for Technical L&D
Most engineering L&D programs over-index on a single certification family, usually ISTQB on the QA side, AWS on the infrastructure side, and under-invest across the rest of the technical domains the org actually needs. This paper covers a multi-domain certification roadmap (QA, AI, cloud, data, security, project management, software engineering) with sequencing logic for each level of the engineering ladder, plus the maintenance discipline that keeps the roadmap relevant as the technology shifts underneath it.
- Guide
The ISTQB Advanced Level path, mapped
The Advanced Level landscape keeps changing — CTAL-TA v4.0 shipped May 2025, CTAL-TM is on v3.0, CTAL-TAE is on v2.0. This guide maps all four core modules, prerequisites, exam formats, sunset dates, and which module a given role should take first. Links directly to the authoritative istqb.org syllabi.
- Whitepaper
Bug Triage: A Cross-Functional Framework for Deciding Which Defects to Fix
Bug triage is the cross-functional decision process that converts raw defect reports into prioritized action. Done well, it optimizes limited engineering capacity against risk; done poorly, it becomes a backlog-management ritual that neither fixes the important defects nor drops the unimportant ones. This whitepaper covers the triage process, the participants, the six action outcomes, the four decision factors, and the governance disciplines that keep triage effective in continuous-delivery environments.
- Whitepaper
Building Quality In: What Engineering Organizations Do from Day One
Testing at the end builds confidence, but the most efficient quality assurance is building the system the right way from day one. This whitepaper covers the upstream disciplines — requirements clarity, lifecycle selection, per-unit programmer practices, and continuous integration — that make system-level testing cheap and fast rather than the only thing holding a release together.
Where this leads
- Service · Quality engineering
Software Quality & Security
Independent test programs, security testing, and quality engineering for systems where defects cost real money.
- Solution
Risk Reduction & Clear Decisions
Quality programs and decision frameworks that shift risk discussions from anecdote to evidence.
- Solution
Reliable Software at Scale
Quality engineering programs for organizations whose software is now operationally critical.