Whitepaper · Vendor and Third-Party Quality · ~14 min read
When work crosses a vendor boundary, quality does not fail in the usual ways — it fails because expectations were never made explicit, measurable, or enforceable in the first place. By the time a problem surfaces, it has become a contractual dispute instead of a bug report. Good quality gates across vendor boundaries prevent that conversion.
This paper covers the entry/exit criteria framework that turns vendor quality expectations into enforceable, measurable gates, the contractual mechanics that make them stick, and the governance discipline needed to run them without putting the test function in an untenable enforcement role. Pairs with the Integrating Outsourced Components whitepaper (risk framework) and the Exit and Release Criteria whitepaper (the in-house counterpart).
Why the vendor boundary needs its own treatment
Inside a single organization, unwritten expectations are a tractable problem. People talk, teams align, and informal feedback loops correct course quickly. Across a vendor boundary none of that is reliable. The vendor has a different incentive structure, a different internal vocabulary, a different reporting cadence, and — critically — a contract that defines what they are obligated to deliver. Anything outside the contract is either a renegotiation or a favor.
Three consequences follow, each of which shapes the framework below:
- Expectations must be written, measurable, and specific. "We need good quality" is not a requirement. "Unit test coverage ≥85% of statements and branches on all new or changed modules, with a specified tool producing an auditable report" is a requirement. The vendor cannot meet an expectation that only exists as a feeling on the client side.
- The quality gates must be tied to the contract. A criterion that is not traceable to a clause in the contract — or referenced by one — is not enforceable. Vendors can and do push back on expectations that are asserted but not contracted. Plan for that up front.
- The enforcement mechanism must be clear, and the test function must not be the sole enforcer. A test manager empowered to stop a project on their signature alone will either be overruled politically or become the designated villain. Neither is a sustainable operating model. The decision to invoke or waive a criterion belongs with program and product leadership; the test function reports status against the criterion.
The four types of criteria
A complete vendor-boundary quality regime defines four categories of criteria, aligned to the phases of work:
- Entry criteria — conditions that must be satisfied before a phase of work can start. For a vendor-delivered component, entry criteria typically cover the availability and readiness of the deliverable, the test environment, and the supporting documentation.
- Exit criteria — conditions that must be satisfied before a phase of work can be declared complete. For a vendor-delivered component, exit criteria typically cover coverage achieved, residual defects, and non-functional requirements met.
- Suspension criteria — conditions under which an in-flight phase is paused. In a vendor context these cover situations where the deliverable is demonstrably not ready — blocking defects, environment instability, missing artifacts — and continuing to test would produce noise instead of signal.
- Resumption criteria — conditions for restarting a suspended phase. These are typically stricter than the original entry criteria, because the vendor has already failed to deliver once at the baseline level.
These four together form the quality envelope around a vendor deliverable. Any of the four can be omitted in a low-stakes engagement, but for material outsourced work all four earn their place.
What good entry criteria cover
Entry criteria across a vendor boundary should address at minimum:
- Deliverable readiness. The build, package, model artifact, service endpoint, or document set has been delivered to an agreed location in an agreed format, with a manifest listing what is included.
- Deliverable integrity. Checksums, digital signatures, SBOM inclusion, provenance attestations — whatever the contract requires — are present and verify cleanly.
- Test environment readiness. The environment required to exercise the deliverable is provisioned, configured, and has successfully run a smoke test. For API- or cloud-delivered components, this includes whatever credentials, rate limits, and sandbox endpoints the vendor has contracted to provide.
- Test asset readiness. The test cases, test data, automated test harness, and supporting tooling required to verify the deliverable exist and have been reviewed.
- Documentation readiness. Release notes, interface specifications, API changelogs, error catalogues, known limitations, and operational runbooks exist at a contracted level of detail.
- Prior-phase exit compliance. The vendor's internal upstream phases (unit testing, component integration, static analysis, security review) have satisfied their own exit criteria and produced the expected evidence.
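Several of these checks are mechanical enough to automate at delivery time. A minimal sketch of the integrity check, assuming a hypothetical JSON manifest format that maps relative file paths to hex-encoded SHA-256 digests (real contracts may mandate signatures or SBOM validation on top of this):

```python
import hashlib
import json
from pathlib import Path

def verify_manifest(manifest_path: str) -> list[str]:
    """Check each file listed in the manifest against its SHA-256 checksum.

    Assumes a hypothetical manifest format: a JSON object mapping relative
    file paths to hex-encoded SHA-256 digests. Returns a list of failures;
    an empty list means the integrity entry criterion is satisfied.
    """
    manifest = json.loads(Path(manifest_path).read_text())
    base = Path(manifest_path).parent
    failures = []
    for rel_path, expected in manifest.items():
        target = base / rel_path
        if not target.is_file():
            failures.append(f"{rel_path}: missing")
            continue
        actual = hashlib.sha256(target.read_bytes()).hexdigest()
        if actual != expected:
            failures.append(f"{rel_path}: checksum mismatch")
    return failures
```

Running this as the first step of intake turns "deliverable integrity" from a checkbox into a verifiable fact.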
The ISTQB Foundation and Advanced syllabi treat entry criteria at a general level; for vendor boundaries we recommend a tighter, more explicit set because the cost of renegotiation mid-test is high.
What good exit criteria cover
Exit criteria for a vendor-delivered phase should address:
- Coverage achieved. Code coverage (statement, decision, and where the contract demands it, MC/DC for safety-critical work), requirements coverage, risk coverage, interface coverage, and test-type coverage (functional, performance, security, reliability, as contracted).
- Defect state. Counts and trends of known defects at each severity and priority level, residual-defect density estimates, and — crucially — a list of any waived or deferred defects with the waiver rationale and the owner.
- Non-functional requirements. Performance (response time, throughput, resource utilization) against contracted SLAs; security posture (vulnerabilities at each severity, SAST/DAST/SCA findings, dependency-age profile); reliability (MTBF, fault tolerance, graceful degradation behavior); availability, where applicable.
- Automated regression asset delivery. For any vendor delivering code, automated unit and integration tests built with a specified tool, documented to run against a specified environment, and maintained across subsequent deliveries. Without this, each new vendor release forces a full re-test from the client side.
- Documentation delivery. Updated versions of the documentation set named in the entry criteria, reflecting the state of the delivered artifact.
Writing exit criteria that are actually measurable
A criterion that cannot be measured cannot be enforced. Common anti-patterns:
- "High quality," "enterprise-grade," "production-ready" — undefined, therefore unmeasurable. Replace with specific, measurable criteria.
- "Substantially all defects fixed" — "substantially" is a weasel word. Replace with "all Severity 1 and Severity 2 defects fixed or waived via the defined waiver process, Severity 3 defects tracked with named remediation dates."
- "Meets performance requirements" — without specifying the requirements, this is rhetoric. Each performance requirement needs a target number, a workload profile, and an acceptable tolerance.
- "Acceptable test coverage" — replace with specific coverage targets at named levels: statement ≥X%, decision ≥Y%, requirements ≥Z%, named high-priority risk items = 100%.
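The replacements above all share one property: a script can evaluate them. A minimal sketch of exit criteria as machine-checkable thresholds; the metric names and targets are illustrative, not contractual boilerplate:

```python
# Each criterion is (direction, target): "min" means the measured value
# must be at least the target, "max" means it must not exceed it.
EXIT_CRITERIA = {
    "statement_coverage": ("min", 85.0),
    "decision_coverage": ("min", 80.0),
    "requirements_coverage": ("min", 100.0),
    "open_sev1_defects": ("max", 0),
    "open_sev2_defects": ("max", 0),
    "p95_response_ms": ("max", 300.0),
}

def evaluate_exit(measured: dict) -> list[str]:
    """Return a list of failed criteria; an empty list means the gate passes."""
    failures = []
    for name, (direction, target) in EXIT_CRITERIA.items():
        value = measured.get(name)
        if value is None:
            failures.append(f"{name}: not measured")
        elif direction == "min" and value < target:
            failures.append(f"{name}: {value} < required {target}")
        elif direction == "max" and value > target:
            failures.append(f"{name}: {value} > allowed {target}")
    return failures
```

Note that a metric that was never measured fails the gate; "we didn't check" is not "we passed."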
Suspension and resumption criteria
These two are often omitted, which is a mistake in vendor engagements. A well-run program defines:
Suspension criteria. Continuing to test becomes counterproductive when:
- A critical-path blocking defect prevents exercising the bulk of the test suite.
- The test environment is unstable in a way that causes more than a specified fraction of test results to be unreliable (10% is a reasonable threshold).
- The deliverable fails a smoke test run against a clean environment after a new delivery, indicating the package itself is broken.
- The vendor is not providing the escalation or support coverage the contract requires, blocking progress on open issues.
- Security findings above a specified severity are discovered, triggering a required halt under the organization's security policy.
Suspension does not automatically mean penalty; it means "stop spending testing effort that produces no signal, return the work to the vendor, and resume when the resumption criteria are met."
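The environment-instability trigger lends itself to a mechanical check. A minimal sketch using the 10% threshold suggested above; the outcome labels are assumptions about how the test tooling records results:

```python
def should_suspend(results: list[str], unreliable_threshold: float = 0.10) -> bool:
    """Suspend when the fraction of unreliable results exceeds the threshold.

    `results` is a list of per-test outcomes; "unreliable" marks runs whose
    outcome could not be trusted (environment flake, tooling failure).
    The 10% default mirrors the threshold suggested in the text.
    """
    if not results:
        return False
    unreliable = sum(1 for r in results if r == "unreliable")
    return unreliable / len(results) > unreliable_threshold
```

A check like this keeps the suspension decision objective: the trigger fires on data, and the argument with the vendor is about the fix, not about whether the trigger fired.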
Resumption criteria. Typically tighter than entry criteria. Usually include:
- The issue that triggered suspension has been formally resolved, with evidence (a rebuild, a patched deliverable, a corrected environment, a vendor root-cause analysis).
- A delta-smoke test passes on the repaired deliverable.
- The vendor has explicitly requested resumption in writing and accepted any contractual implications (e.g. a revised delivery schedule).
- Leadership has signed off on resumption, not just the test function.
The combination — explicit suspension criteria, tighter resumption criteria, contract-tied enforcement — is what keeps a struggling vendor engagement from becoming an endless bleed of testing effort against broken deliverables.
The contractual mechanics
Quality gates that are not tied to the contract are requests, not requirements. The mechanics that make them stick:
Direct incorporation. The preferred path: quality criteria appear directly in the contract or in a statement of work that is contractually binding. This is the model used in most regulated industries and is worth fighting for in high-stakes vendor work.
Reference by clause. A middle path: the contract contains a clause of the form "the vendor will meet the quality criteria defined in Appendix X," and Appendix X is an exhibit to the contract. This works well when criteria are expected to evolve across a multi-year engagement; the appendix can be updated via a defined amendment process without renegotiating the master contract.
Acceptance testing clause. A necessary-but-insufficient pattern: the contract specifies an acceptance testing phase during which the client can verify the deliverable. Acceptance testing alone is too late to catch structural quality problems — it's a gate at the end, not quality throughout delivery — but it gives the client refusal rights if the criteria are not met.
Service Level Agreement (SLA) with credits. For ongoing vendor services (SaaS, hosted APIs, managed test environments), an SLA defines availability, response-time, and resolution-time targets and ties financial credits or penalties to breaches. SLAs do not replace deliverable quality gates but complement them for operational aspects of the engagement.
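A tiered credit schedule is straightforward to encode. A minimal sketch; the tiers and credit percentages are illustrative assumptions, not standard terms:

```python
# Availability floors paired with the credit owed when the month's measured
# availability falls at or above that floor (checked from highest to lowest).
CREDIT_TIERS = [
    (99.9, 0.0),   # at or above 99.9% availability: no credit
    (99.0, 10.0),  # 99.0% up to 99.9%: 10% of the monthly fee credited
    (95.0, 25.0),  # 95.0% up to 99.0%: 25% credited
    (0.0, 50.0),   # below 95.0%: 50% credited
]

def sla_credit(availability_pct: float, monthly_fee: float) -> float:
    """Return the credit owed for the month under the tiered schedule."""
    for floor, credit_pct in CREDIT_TIERS:
        if availability_pct >= floor:
            return monthly_fee * credit_pct / 100.0
    return monthly_fee * 0.5  # unreachable with a 0.0 floor, kept for safety
```

The point of encoding the schedule is that both sides compute the same number from the same measurement, which removes one class of dispute entirely.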
The escalation path
Even with well-drafted contracts, disputes happen. The governance framework should pre-define the escalation path:
- Test-manager-to-vendor-test-lead resolution — the first-line exchange, typically documented in a shared issue tracker.
- Program manager escalation — when direct test-to-test resolution fails, the discussion moves to program management on both sides.
- Executive sponsor escalation — for contract-level disputes, the designated executive sponsors on each side take the conversation.
- Contractual remedy — the mechanisms defined in the contract, typically involving notice periods, cure periods, and ultimately termination rights.
The escalation path should be named in the contract and should include named roles and contact paths. Unnamed escalation paths tend to produce unresolved disputes.
Role discipline — keeping the test function out of the enforcer position
One of the most common failure patterns in vendor engagements is the test function being dragooned into a sole-enforcer role against unhappy colleagues and vendors. The pattern:
- Rigorous exit criteria are defined.
- A vendor deliverable fails the criteria on a schedule-critical path.
- Leadership pressures the test function to "waive just this one" to preserve the schedule.
- The test function either caves (criteria lose all credibility) or holds the line (becomes organizationally isolated and eventually dissolved).
The sustainable pattern instead:
- The test function reports status against the criteria. It measures, documents, and communicates. Its deliverable is an honest assessment of where the deliverable stands relative to the agreed gates.
- Program leadership decides whether to invoke or waive. The waiver decision has schedule, cost, and risk implications that belong at the leadership level. The test function supports that decision with information but does not own it.
- The waiver itself is documented. A waived criterion is not an ignored criterion. The waiver rationale, the owner, the remediation plan if any, and the residual risk are captured in writing. This produces an audit trail and prevents "we always waive this" from becoming the silent default.
The practical consequence: test managers operating across vendor boundaries need explicit authority to report bad news honestly, explicit insulation from negative consequences for doing so, and explicit redirection of waiver authority to leadership. Without those three, the test function's position is untenable regardless of how well the criteria are drafted.
Tailoring to the engagement
The stringency and formality of criteria should match the stakes. Factors that argue for tighter criteria:
- Safety-critical or regulated product — aviation (DO-178C), medical (IEC 62304), automotive (ISO 26262), financial (SOX), privacy (GDPR, CCPA, HIPAA). In these domains, criteria are partly dictated by regulation and are non-negotiable.
- Security-critical functionality — authentication, payment handling, access control, LLM-based decisioning that affects user outcomes.
- Prior problematic vendor history — past deliveries with material quality issues argue for tighter entry criteria and shorter suspension triggers on the next engagement.
- High switching cost for the deliverable — if ripping out and replacing the vendor component is prohibitively expensive, the quality bar at acceptance needs to be correspondingly higher.
- Customer-visible or revenue-critical path — the more directly the deliverable touches customers or revenue, the less tolerance for defects.
Factors that allow looser criteria:
- Low-stakes, replaceable components — a minor utility library, a non-critical internal tool, a throwaway MVP. Over-investing in quality gates here is waste.
- Strong prior vendor track record — earned trust justifies lighter gates. New engagements should not start here, but a long-standing vendor with a clean delivery history has earned a lighter touch.
- Shared architectural understanding — vendors embedded enough in the client's architecture to preempt problems before they reach the gate often need less formal gating.
Modern considerations: SaaS, LLM APIs, and continuous delivery
The classic vendor-boundary framework assumed a periodic-delivery model: a vendor built a component, delivered it, the client tested it, and the gates fired at defined milestones. Modern vendor engagements often look different:
- SaaS vendors ship continuously. The client is not testing a version at a milestone; the client is operating against a live service that updates without a traditional release cadence. Quality gates shift toward operational SLAs, regression monitoring against the live service, contract-defined deprecation windows for breaking changes, and feature-flag-based rollouts negotiated with the vendor.
- LLM API vendors provide non-deterministic output. Traditional exit criteria ("defect-free on tested inputs") don't apply to probabilistic systems. Modern equivalents involve accuracy thresholds on held-out eval sets, latency SLAs, content-safety classifier pass rates, and explicit contractual terms about model updates, training-data usage, and uptime.
- Continuous-delivery vendors integrate into the client's CI. Instead of a discrete acceptance phase, the vendor pushes changes into the client's pipeline and the pipeline's own gates enforce quality. The gating mechanism is the pipeline rather than a phase boundary; the contractual equivalent is SLA-backed gate health.
The four-category framework (entry, exit, suspension, resumption) still applies, but the artifacts change. Entry becomes "the vendor's change enters the pipeline"; exit becomes "the change clears the pipeline gates"; suspension becomes "the vendor's changes are blocked at a gate that requires remediation"; resumption becomes "the gate passes with the fix." The contractual mechanics then tie vendor SLAs to gate health rather than to milestone acceptance.
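The accuracy-threshold gate for an LLM vendor can be sketched briefly. The `run_model` callable stands in for the vendor's API and the 0.92 threshold is an illustrative assumption; a real engagement would contract the eval set's provenance and the threshold explicitly:

```python
def accuracy_gate(eval_set, run_model, threshold: float = 0.92) -> bool:
    """Pass when accuracy on the held-out eval set meets the threshold.

    `eval_set` is a list of (input, expected_output) pairs held out from
    anything the vendor could have tuned against; `run_model` wraps the
    vendor's API call. Exact-match scoring is the simplest case — real
    evals often need fuzzier judges, which this sketch does not cover.
    """
    if not eval_set:
        raise ValueError("empty eval set cannot gate anything")
    correct = sum(1 for prompt, expected in eval_set
                  if run_model(prompt) == expected)
    return correct / len(eval_set) >= threshold
```

Re-running the same gate after every vendor-announced model update is the probabilistic-system equivalent of regression testing a new build.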
Implementation checklist
For a new vendor engagement, in rough order of priority:
- Define the criteria (entry, exit, suspension, resumption) jointly with the vendor before the contract is signed.
- Tie the criteria to the contract via direct incorporation or contract-referenced appendix.
- Define measurable, specific criteria (coverage percentages, defect counts by severity, performance targets with workload profiles).
- Define the escalation path in the contract, with named roles and contact paths.
- Clarify who owns invocation vs. waiver of criteria (program leadership, not the test function).
- Document an SLA for ongoing-service aspects of the engagement with financial or remedy consequences for breach.
- Confirm that the test function has the insulation needed to report honestly without career risk.
- Put a review cadence in place — quarterly for long engagements, per-milestone for shorter ones — to reassess whether the criteria are still calibrated to the engagement.
Related resources
- Integrating Outsourced Components — the risk framework companion to this gate framework.
- Exit and Release Criteria — the in-house counterpart for internal release decisions.
- Critical Testing Processes — the methodology framework that positions vendor-boundary QA within a complete test function.
- Release Management Processes — companion for the release-management side of vendor-delivered artifacts.