The risk taxonomy every FMEA starts from.
Sixteen categories, three test-level views, and the 2026 additions nobody used to need.
The list of quality risk categories our FMEAs, test plans, and risk-based strategies start from. Use it as a completeness check — every category is a question you should have answered for this release, even if the answer is "not applicable."
A risk taxonomy is a completeness check, not a filing cabinet. Use it to be confident you have not missed an entire class of failure — then discard the categories that do not apply to this release.
Key Takeaways
Four things to remember.
Sixteen categories, zero ceremony
The flat list is a walk-through: name each category out loud at the FMEA kickoff. "Localization — applies? Installation — applies? Competitive inferiority — applies?" Three minutes, no missed class.
Different test levels see different risks
Component testing sees states, transactions, and code coverage. System testing sees localization, installation, and competitive inferiority. The per-level cross-reference keeps each team focused on what they can actually test.
The taxonomy has to move forward
The 2002 taxonomy did not include accessibility as a legal exposure, observability as a release signal, or AI-system-specific risks at all. The 2026 additions section names the seven categories every modern risk analysis should add.
Pair with ISO/IEC 25010:2023
When a formal framework is required (regulated environments, buyer-mandated compliance), map this taxonomy to ISO/IEC 25010:2023 product-quality characteristics. Both views are useful; neither is a substitute for the other.
Overview
Every risk-based test strategy starts with a list. A quality risk category is a class of failure the system could exhibit; the risk items inside each category are the specific failures a specific engagement is exposed to.
Most programs waste their first FMEA session reinventing the category list. This page publishes the one we use. Treat it as a starting menu — not every category applies to every product, but naming them all forces an explicit answer instead of an implicit gap.
Below: (1) the flat 16-category reference, (2) the per-test-level cross-reference (which categories are visible at component, integration, and system/acceptance levels), and (3) the seven categories we added between 2002 and today because products and their failure modes have changed.
Three views of one taxonomy
What to expect below.
The 16-category flat list is the completeness check every FMEA should walk at session start. The per-test-level re-cut is the planning aid that tells you which team owns which risk. The seven 2026 additions close the gap between the 2002 taxonomy and today’s failure surface.
01
1. The 16 quality risk categories
The canonical flat taxonomy. Each category is described as "what kind of problems fit here" — deliberately narrow, so an item belongs in exactly one category when you are populating an FMEA.
Functionality
Failures that cause specific features not to work as specified.
Load, Capacity, and Volume
Failures in scaling of the system to expected peak concurrent usage levels and data volumes. Distinct from Performance — Performance is "does a single request meet its SLO," Load is "does the system still meet its SLO with 10,000 concurrent users."
Reliability / Stability
Failures to meet reasonable expectations of availability and mean time between failures. Includes memory leaks, resource exhaustion, and degradation over uptime.
Stress, Error Handling, and Recovery
Failures under beyond-peak or illegal conditions, and the knock-on effects of deliberately inflicted errors. Covers recovery from power loss, network partition, and upstream service failure.
Date Handling
Failures in date math and handling — time-zone boundaries, daylight-saving transitions, leap seconds, year boundaries, fiscal vs. calendar year, and (still, in 2026) systems whose internal epochs expire.
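The daylight-saving item above is easy to reproduce. This is an illustrative Python sketch, not anything the taxonomy prescribes; the 2026-03-08 US spring-forward date and the America/New_York zone are just a convenient example of the boundary:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

NY = ZoneInfo("America/New_York")

# Noon the day before the 2026 US spring-forward transition (2026-03-08).
before = datetime(2026, 3, 7, 12, 0, tzinfo=NY)
after = before + timedelta(days=1)  # wall-clock arithmetic: still "noon"

wall = after - before  # same tzinfo, so Python subtracts naive wall times
real = after.astimezone(timezone.utc) - before.astimezone(timezone.utc)

print(wall)  # 1 day, 0:00:00  (what the calendar says)
print(real)  # 23:00:00        (what actually elapsed)
```

The two answers disagree by an hour; a system that mixes the two conventions (say, a scheduler using wall-clock math against a billing engine using elapsed time) fails exactly twice a year.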
Competitive Inferiority
Failures to match competing systems in quality. Frequently overlooked in internal risk analysis because it requires market research, not engineering research.
Operations and Maintenance
Failures that endanger continuing operation, including backup and restore, runbooks, on-call workflows, and the ability for operations staff to recover from production incidents without engineering escalation.
Usability
Failures in human factors — especially at the user interface, but also in the installation flow, onboarding, and recovery from user error.
Data Quality
Failures in processing, storing, or retrieving data. Includes silent corruption, precision loss, truncation, encoding mangling, and failed referential integrity.
Performance
Failures to perform as required under expected loads. Latency, throughput, and responsiveness against an SLO or budget.
Localization
Failures in specific localities — language, dictionary/thesaurus, collation order, number/date/currency formatting, and localized error messages.
Compatibility
Failures with specific supported OS / browser / device / runtime / dependency combinations. Includes regression against dependency upgrades (the kind CI finds before users do) and the combinatorial explosion of "minor versions."
Security and Privacy
Failures to protect the system and secured data from fraudulent or malicious misuse. Pair with the seven-step security risk reduction whitepaper — this is a surface area with its own deeper taxonomy (OWASP Top 10, MITRE ATT&CK, CWE).
Installation / Migration
Failures that prevent or impede deploying the system. In 2026 this also covers CI/CD failures, canary/blue-green/rollback integrity, database migration safety, and rollback data loss.
Documentation
Failures in operating instructions for users or system administrators, including API reference accuracy, runbook accuracy, and deprecation notice quality.
Interfaces
Failures in interfaces between components — wire formats, contract violations, schema drift, protocol version mismatches, silent field removal.
02
2. Quality risks by test level
The same taxonomy re-cut by which test level has the signal to find each risk. This is a planning aid: assign risk categories to the test level where they are most cost-effectively caught, and use it to identify gaps where a category does not appear at any level (which is almost always a test-plan bug, not a product bug).
Component testing
What the component-test layer is responsible for. Run close to the code, fast feedback, high volume.
- States — internal state transitions, state-machine correctness.
- Transactions — single-component unit-of-work correctness.
- Code coverage — structural coverage of the implementation.
- Data flow coverage — variable definition/use pairs, data-flow anomalies.
- Functionality — component-level feature behavior.
- User interface — component-local UI rendering / input handling (if applicable).
- Mechanical / signal / embedded properties — for physical products, component-level physical correctness.
Integration testing
What the integration-test layer is responsible for. Crossing component boundaries, contract verification, data flow between subsystems.
- Component or subsystem interfaces — contract verification, schema compliance.
- Functionality — feature behavior that spans components.
- Capacity and volume — subsystem-level load behavior.
- Error / disaster handling and recovery — failure propagation across component boundaries.
- Data quality — integrity across subsystem boundaries.
- Performance — subsystem-level latency and throughput.
- User interface — UI-layer integration with data layer.
System and acceptance testing
What only the fully assembled system can test. The categories below are the reason system test exists — they are invisible at lower levels.
- Functionality — end-to-end feature correctness.
- User interface — whole-experience usability.
- States and transactions — end-to-end workflow correctness.
- Data quality — persistent-store integrity and recovery.
- Operations — backup/restore, runbook correctness, incident response.
- Capacity and volume — whole-system load behavior.
- Reliability, availability, stability — uptime against SLO.
- Error / disaster handling and recovery — full-system failure and recovery paths.
- Stress — beyond-peak and illegal-input behavior.
- Performance — end-to-end latency and throughput.
- Date and time handling — calendar correctness in context.
- Localization — locale-specific correctness in context.
- Networked and distributed environment behavior — behavior across network topologies.
- Configuration options and compatibility — cross-configuration correctness.
- Standards compliance — regulatory, accessibility, security standards.
- Security and privacy — full-surface security posture.
- Environment — deployment-environment correctness.
- Installation, cut-over, setup, and initial configuration — first-run correctness.
- Documentation and packaging — docs accuracy and operator-readiness.
- Maintainability — post-release operational burden.
- Alpha, beta, and other live tests — controlled-exposure pre-release validation.
03
3. Categories added since 2002
Seven risk categories that were either absent from the original taxonomy or subsumed into generic "non-functional" risk. Each of these has earned a named slot in the current-era list because programs that ignore them get publicly bitten.
Accessibility
Failures to meet accessibility standards (WCAG 2.2, Section 508, EN 301 549, ADA case law). In 2002 this sat inside Usability; in 2026 it is a distinct legal and regulatory exposure with its own testable surface (screen-reader behavior, keyboard navigation, contrast, motion/animation preferences, ARIA semantics, cognitive load, localization of accessibility affordances). Enterprise programs that ignore it invite lawsuits.
Observability
Failures in the instrumentation that would let an operator tell whether the system is working in production. Missing / mis-tagged / too-noisy logs, metrics without cardinality controls, spans that do not cross service boundaries, dashboards that contradict each other. This is distinct from Operations — Operations is "can we run it," Observability is "can we tell what it is doing." Pair with production-telemetry-driven test authoring (see building-quality-in whitepaper).
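The "mis-tagged logs" item comes down to emitting records with consistent, queryable keys. A minimal sketch, with illustrative field names (`service`, `trace_id`, `event` are conventions, not something this taxonomy mandates):

```python
import json
import time
import uuid

def log_event(event, **fields):
    """Build and emit one structured log record. Consistent keys are what
    make production logs queryable instead of noisy free text."""
    record = {"ts": time.time(), "service": "checkout", "event": event, **fields}
    print(json.dumps(record))
    return record

rec = log_event("payment.authorized",
                trace_id=str(uuid.uuid4()),
                amount_cents=1299)
```

An observability risk item for a release is then concrete: "does every record on the payment path carry `trace_id`, and does that id survive the hop to the fulfillment service?"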
Cost and financial operations (FinOps)
Failures that cause unexpected cost blowouts — runaway background workers, recursive retry storms, uncapped logging, uncapped LLM-API calls, storage leaks, N+1 query patterns in production traffic. For cloud-native systems this is first-class quality risk: a correctness-passing release can still cause a business-critical incident via the monthly bill.
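The retry-storm item has a standard mitigation: bound both the number of attempts and each delay. A minimal sketch (the function name and defaults are illustrative, not from the source):

```python
import random

def backoff_delays(max_attempts=5, base=0.5, cap=30.0):
    """Yield capped, jittered retry delays. Bounding both the attempt count
    and each delay keeps a dependency outage from multiplying into a retry
    storm (and a surprise cloud bill)."""
    for attempt in range(max_attempts):
        yield random.uniform(0, min(cap, base * 2 ** attempt))

delays = list(backoff_delays())
print(len(delays))  # 5
```

The FinOps risk item is the absence of such a bound anywhere in the call chain: five layers each retrying five times is 3,125 requests per user action.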
Supply chain
Failures introduced via dependencies — vulnerable packages, transitive dependency drift, build-system compromise, dependency-confusion attacks, malicious maintainer takeover, SBOM inaccuracy. Includes the CI/CD pipeline itself and the ecosystems it pulls from. In 2002 this was a subset of Compatibility; in 2026 it is its own surface with its own tooling (SCA, SBOM attestation, provenance verification, pinned dependencies).
AI-system accuracy and calibration
For systems whose primary job is inference (classification, generation, recommendation, forecasting), the output is probabilistic — correctness becomes a distribution, not a pass/fail. Risk items: held-out evaluation set accuracy, calibration (does 80% confidence mean 80%?), performance on adversarial / out-of-distribution / long-tail inputs, regression against specific demographic slices. Cannot be tested with example-based assertions; requires eval-set testing, golden-set testing, and slice-based metrics.
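The calibration question ("does 80% confidence mean 80%?") is directly measurable. A toy expected-calibration-error sketch; the binning construction is a standard technique, not something the source prescribes:

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence, compare each bin's mean confidence
    with its empirical accuracy, and return the weighted gap."""
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 -> top bin
        bins[idx].append((conf, hit))
    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(h for _, h in bucket) / len(bucket)
        ece += (len(bucket) / total) * abs(mean_conf - accuracy)
    return ece

# Toy check: 80%-confidence predictions that are right 8 times out of 10
# are perfectly calibrated, so the gap is zero.
print(round(expected_calibration_error([0.8] * 10,
                                       [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]), 6))  # 0.0
```

Run against a held-out eval set per release, this turns "the model feels overconfident" into a tracked number with a regression threshold.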
AI-system safety, integrity, and alignment
Distinct from generic Security: prompt injection, indirect prompt injection via retrieved documents, over-trust of tool outputs, jailbreaking, training-data poisoning, model-supply-chain risk, hallucination on high-stakes outputs, bias/fairness failures with demonstrable disparate impact. For LLM-backed systems, most of OWASP LLM Top 10 lives here rather than under Security.
Explainability and auditability
Failures to produce a defensible account of why the system did what it did — in regulated contexts (credit, hiring, healthcare, underwriting, insurance pricing, immigration, education), increasingly mandatory. Includes decision logs, feature attribution, lineage of data and model versions, and the ability to reconstruct a specific production decision six months later. Pair with the Cost-of-Exposure and Compliance risks inside Security / Privacy.
04
4. How to use this list
A completeness check, walked top to bottom before the FMEA session closes.
- Walk the 16-category flat list at the top of the session. For each, ask "does this category contain any items that apply to this release?" If yes, the team drafts items. If no, log an explicit "not applicable because…" — the explicit negative answer is the completeness signal.
- Cross-check against the per-test-level view. Every risk category that has items should map to at least one test level with a plan to cover it. Categories with items but no test level mapped = test-plan gap.
- Apply the 2026 additions as a second pass. For any modern system, at least three of the seven will apply; for AI-backed systems, usually five or more.
- Map the final category list to ISO/IEC 25010:2023 characteristics if a formal framework is required for the engagement (regulated buyer, compliance audit, procurement scoring). Both views coexist — the named categories drive the FMEA items, the ISO mapping supports external reporting.
- Treat the list as living. When a new failure class bites an engagement, propose a category addition in the next methodology review. The 2002 list got us 24 years; no taxonomy survives forever.
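The first step of the walk, forcing an explicit answer per category, can be mechanized. A hypothetical sketch (the category strings and the `walk_taxonomy` helper are illustrative, not a tool this page ships):

```python
# The 16 flat categories from section 1, as checklist keys.
CATEGORIES = [
    "Functionality", "Load/Capacity/Volume", "Reliability/Stability",
    "Stress/Error Handling/Recovery", "Date Handling", "Competitive Inferiority",
    "Operations and Maintenance", "Usability", "Data Quality", "Performance",
    "Localization", "Compatibility", "Security and Privacy",
    "Installation/Migration", "Documentation", "Interfaces",
]

def walk_taxonomy(answers):
    """answers maps category -> list of risk items, or a
    'not applicable because ...' string. Any category missing from the
    map is an implicit gap, which is exactly what the walk exists to catch."""
    return [c for c in CATEGORIES if c not in answers]

answers = {c: "not applicable because ..." for c in CATEGORIES}
del answers["Localization"]          # simulate a category nobody answered
print(walk_taxonomy(answers))        # ['Localization']
```

The point of the helper is the return value: an empty list is the completeness signal; anything else is a category the session skipped.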
Where each category is first catchable
System test exists for the bottom category.
[Chart: count of the 16 categories by test level, each category counted once at the lowest level where it is first cost-effectively catchable.]
A category catchable at component test is usually cheapest to catch there; pushing it later is more expensive. Any category that never appears at any level is a test-plan gap. System and acceptance testing catches the most categories because many (localization, usability, installation, compatibility, reliability, stress, recovery) are invisible at lower levels: they require the fully assembled system.