Whitepaper · Updated April 2026 · 16 min read

Seven Steps to Reducing Software Security Risk

A seven-step program for systematically reducing security risk in an existing codebase — assess, test, analyze, evaluate, repair carefully, examine, institutionalize — updated for the modern landscape of SAST/DAST/IAST, SBOM and supply-chain scanning, DevSecOps, LLM application security, and the current OWASP Top 10.

Software Security · Application Security · SAST · DAST · Supply Chain · DevSecOps · OWASP · LLM Security

Flagship whitepaper · Software Security

Developing secure software is no longer merely desirable — it is essential. Most of the exploits making the news affect the application layer, not the operating system or the network. This article lays out a seven-step program for systematically reducing security risk in an existing codebase, and explains what has changed between the original 2008 playbook and today's security landscape.

Read time: ~14 minutes. Written for engineering leaders, application-security engineers, and test managers whose products ship into an increasingly hostile environment.

Why the application layer matters

Some engineering teams still assume most security problems arise from the operating system or networking layers, well below the application code they are working on. Published data on web-application exploits has shown for over fifteen years that the opposite is true: the large majority of exploits arise from applications, not the underlying infrastructure. That pattern is stronger now, not weaker. The operating system and network layers have hardened dramatically since the 2000s; the application layer, with its accelerating surface area (APIs, integrations, SaaS ingress points, browser-side code, LLM features, third-party SDKs), has gotten harder to defend.

The current trend is also away from blunt-force mass attacks and toward carefully crafted, criminal attacks on specific applications — often economically motivated (ransomware, payment fraud, data theft for resale, supply-chain compromise) and increasingly AI-assisted on the attacker side. Your code is a target. So is every dependency it ships.

You know you need secure software. Where do you start? What are your risks? What vulnerabilities are already in your codebase? How do you reduce risk without introducing new problems? How do you track progress? The seven-step program below is an answer. It works for a codebase you already have — not a greenfield project with perfect foresight — and it is designed to be run as a continuous program, not a one-off audit.

Step 1 — Assess the risks

Applications have characteristic security risks that come from their implementation technology, their business domain, and their deployment context. A systematic first pass should cover all three.

Implementation-technology risks. Every language, framework, and runtime has a characteristic bug profile. Memory-unsafe languages (C and C++) introduce buffer overflows. Languages and frameworks that concatenate queries expose SQL injection. Client-side code (JavaScript / TypeScript running in a browser) introduces XSS and DOM-based attacks. Serverless runtimes expose over-permissioned IAM roles. Container images ship with vulnerable base layers. Each technology carries an implicit threat model; recognize yours.

Business-domain risks. Applications dealing with money attract financial attackers. Applications holding personal data are subject to regulation (HIPAA for health data in the US; GDPR in the EU; CCPA and the state-level privacy laws in the US; LGPD in Brazil; PIPEDA in Canada; and the current alphabet soup of sector-specific rules). Applications with access to infrastructure (cloud IAM, payment rails, messaging, internal APIs) attract supply-chain attackers who will ride your application into the broader environment.

Deployment-context risks. An application that is correct in isolation can be made insecure by how it is deployed: a secrets-free codebase with credentials leaked into CI logs; an OAuth flow whose callback is misconfigured; an internal service exposed to the public internet by a load-balancer rule; a perfectly good API rate-limited at the wrong layer.

Sources to pull from

  • OWASP Top 10 — the canonical web-application vulnerability list; the 2021 edition is the current baseline as of this writing, with revisions tracked by OWASP.
  • OWASP Top 10 for LLM Applications — essential for anything using foundation models: prompt injection, training-data poisoning, model denial of service, over-reliance on LLM output, insecure output handling.
  • OWASP ASVS — the Application Security Verification Standard; good for a structured controls-based analysis.
  • MITRE ATT&CK — the industry-standard model of adversary tactics and techniques for threat modeling.
  • CWE — the weakness enumeration that most scanner output is keyed to.
  • CISA KEV — the Known Exploited Vulnerabilities catalog; if something on this list affects your stack, it is not theoretical.
  • The Risks Digest archives — still an excellent source of long-tail anecdote and commentary.
  • Your industry's regulator guidance — PCI DSS for payments, HIPAA for health, FedRAMP for US federal, SOC 2 for B2B SaaS.

Producing the risk list

Meet with stakeholders — not just engineering: product, legal, compliance, security, operations, customer success — and use the sources above to build a prioritized list of security risk items, each with a likelihood and impact rating. The Quality Risk Analysis methodology covers the mechanics.

Likelihood is the chance of a given risk becoming an actual security bug in your software. Impact is the effect on customers, users, business, regulators, and reputation if the bug is exploited. Multiply the two for an aggregate score, then band the scores into tiers that set the extent of testing each risk receives.
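A minimal sketch of the resulting risk register, in Python. The risk items and band thresholds here are illustrative assumptions, not prescriptions; calibrate both with your stakeholders:

```python
# Sketch of a risk register: likelihood x impact gives an aggregate
# score, and score bands map to an extent of testing.
# Items and thresholds below are illustrative only.

RISKS = [
    # (risk item, likelihood 1-5, impact 1-5)
    ("SQL injection in reporting API", 4, 5),
    ("Secrets leaked via CI logs", 3, 4),
    ("XSS in admin dashboard", 2, 3),
    ("Over-permissioned service account", 3, 2),
]

def band(score: int) -> str:
    """Map an aggregate score to an extent-of-testing band."""
    if score >= 15:
        return "extensive"    # dedicated test design + pen-test focus
    if score >= 8:
        return "broad"        # targeted automated + exploratory tests
    return "opportunistic"    # covered only incidentally

register = sorted(
    ((name, lik * imp, band(lik * imp)) for name, lik, imp in RISKS),
    key=lambda row: row[1],
    reverse=True,
)

for name, score, extent in register:
    print(f"{score:>2}  {extent:<13}  {name}")
```

The point of the banding is that the register drives effort allocation directly: an "extensive" item gets its own test design, an "opportunistic" item does not.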

Step 2 — Test to know where you stand

If you're like most teams, you aren't starting fresh on every release. How secure is the code already? If you haven't checked, check.

This is the penetration test step. Its purpose is to find the failures your application currently presents to the outside world.

What a modern penetration test looks like

Modern pen testing is several things combined:

  • External black-box testing — simulated attackers hitting the application's public surface with no internal knowledge, looking for exploitable vulnerabilities.
  • Authenticated testing — simulated attackers operating as legitimate users, looking for privilege escalation, access-control gaps, and tenant-isolation failures.
  • Configuration testing — looking at the deployment: IAM policies, network ACLs, logging, secret handling, container image provenance.
  • Misuse-case testing — end-to-end scenarios of what an attacker would do, not just what a feature requires. A misuse case for an e-commerce checkout might include account takeover, gift-card fraud, and coupon enumeration. A misuse case for an AI product might include prompt injection via document upload, training-data leakage through the model's responses, and cost-exhaustion attacks against the vendor API.
  • Mobile-specific testing — tampered clients, jailbroken devices, intercepted traffic, reverse-engineered API keys.

Build or buy?

You have two options, and many teams do both.

  • Hire external specialists for periodic red-team engagements. They bring adversarial perspective you can't easily grow in-house. Required annually (or more) for most compliance regimes.
  • Build internal capability for continuous testing. Bug-bounty programs (HackerOne, Bugcrowd, Intigriti) are the productized version of this. An internal AppSec team running continuous DAST (Dynamic Application Security Testing) against staging is the engineering version.

The investment trade-off has shifted since the 2000s. Bug-bounty programs and DAST-as-a-service have made continuous external testing much cheaper than it used to be. Most teams today use a combination of continuous DAST + periodic red team + bug bounty.

Don't neglect the deployment

The best lock in the world does no good if it's installed in a rotten-wood door. Applications with good security features in insecurely configured environments get hacked. Pay attention to:

  • Secrets in configuration files, CI logs, container images, client-side JavaScript, S3 buckets.
  • Default passwords and over-permissioned service accounts.
  • Unencrypted email containing account credentials (still common, still catastrophic).
  • Notification flows that echo sensitive data.
  • CI/CD pipeline compromise vectors — this is the supply-chain layer.

Step 3 — Analyze to know where you stand

Pen testing finds security failures. Not every security bug causes a failure in a given test, so pen testing alone misses underlying bugs that don't happen to produce a symptom under the test conditions. Static analysis finds bugs whether or not they've produced symptoms yet.

Static Application Security Testing (SAST)

SAST scans source code without running it. It looks for known dangerous patterns — unvalidated input being passed to a database query, unsanitized user input being concatenated into an HTML response, weak cryptographic primitives, hard-coded secrets, unsafe deserialization.
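The first of those patterns is easy to see in a toy example. The sketch below uses Python's sqlite3 with an in-memory database and illustrative data; the concatenated query is the shape a SAST rule flags, the parameterized query is the fix:

```python
import sqlite3

# Toy demonstration of the pattern a SAST rule matches: attacker-
# controlled input concatenated into a SQL string. Data is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin'), ('bob', 'user')")

user_input = "alice' OR '1'='1"  # attacker-controlled value

# FLAGGED: input concatenated into the query string. A SAST rule
# matches this pattern whether or not any test ever exercises it
# with hostile input.
rows_bad = conn.execute(
    "SELECT name FROM users WHERE name = '" + user_input + "'"
).fetchall()

# FIXED: parameterized query; the driver keeps data out of the SQL grammar.
rows_good = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()

print(rows_bad)   # both rows leak: the injected OR clause defeats the filter
print(rows_good)  # no rows: no user is literally named "alice' OR '1'='1"
```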

Current tooling:

  • Semgrep — rule-based, fast, open source. The modern default for most teams.
  • GitHub CodeQL — query-based, deeper, integrated with GitHub Advanced Security.
  • SonarQube — broader code-quality + security.
  • Snyk Code, Checkmarx, Veracode — commercial, with varying focus.
  • AI-assisted analysis — LLM-powered tools that understand data flow across languages and frameworks. Useful but noisy; pair with SAST, don't replace.

For a large existing code base, any of these tools will identify many findings. Not all are the same severity. Good tools let you tune rules at fine granularity. Your risk list from Step 1 guides which rules to prioritize.

Dynamic Application Security Testing (DAST)

DAST scans the running application — the HTTP traffic, the browser behavior, the API responses — for exploitable behavior. Runs against staging or production (carefully).

Interactive Application Security Testing (IAST) and RASP

IAST instruments the running application to observe both the code paths exercised and the security-relevant events. RASP (Runtime Application Self-Protection) goes further and blocks exploitation attempts in real time. Both are now mainstream in modern enterprise stacks (Contrast Security, Imperva, Waratek, etc.).

Software Composition Analysis (SCA) and SBOM

This was the big change in the decade after the original 2008 playbook: most application code isn't written by the application team. Modern apps pull in hundreds to thousands of transitive open-source dependencies. Those dependencies have their own vulnerability streams, their own maintainer dynamics, and today their own sophisticated supply-chain attack surface (typosquatting, compromised maintainer accounts, malicious package versions).

SCA tools — Snyk, Dependabot, Renovate, GitHub Dependency Graph, Sonatype, JFrog Xray — scan your dependency graph for known vulnerabilities and produce a Software Bill of Materials (SBOM). Executive Order 14028 in the US made SBOMs table stakes for federal software suppliers, and most enterprise buyers now require them.

Treat SCA as a first-class scanner alongside SAST and DAST. Today, more exploitable findings arrive through your supply chain than through your own code.
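At its core, an SCA pass is a match of pinned versions against an advisory feed. The sketch below is a minimal illustration: the manifest is hypothetical, and real tools consume lockfiles and feeds such as OSV or the GitHub Advisory Database rather than a hand-written dict:

```python
# Minimal sketch of the core SCA check: match each pinned dependency
# version against a known-vulnerability feed. Manifest is illustrative;
# the two advisories reference real CVEs and their fixed-in versions.

MANIFEST = {              # package -> pinned version
    "requests": (2, 19, 0),
    "pyyaml":   (5, 4, 1),
    "jinja2":   (3, 1, 4),
}

ADVISORIES = [            # (package, fixed-in version, advisory id)
    ("requests", (2, 20, 0), "CVE-2018-18074"),
    ("pyyaml",   (5, 4, 0),  "CVE-2020-14343"),
]

def vulnerable(manifest, advisories):
    """Return advisories whose fixed-in version is newer than the pin."""
    hits = []
    for name, fixed_in, advisory_id in advisories:
        pinned = manifest.get(name)
        if pinned is not None and pinned < fixed_in:  # tuple comparison
            hits.append((name, advisory_id))
    return hits

print(vulnerable(MANIFEST, ADVISORIES))
```

Here requests 2.19.0 predates the 2.20.0 fix and is flagged; pyyaml 5.4.1 already includes its fix and is not.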

After the scan

For every scanner and tool, the output is a list of findings. Feed them into the same defect-tracking process you use for functional bugs. For each, capture:

  • Type / CWE / OWASP category.
  • Severity and priority.
  • Affected code or component.
  • Date introduced (git blame; often revealing — see Step 4).
  • Classification against the Step 1 risk list.
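One workable shape for that record, sketched as a Python dataclass so that SAST, DAST, and SCA output all land in the tracker in one normalized form. The field names are illustrative:

```python
from dataclasses import dataclass
from datetime import date

# Sketch of a normalized finding record covering the fields listed
# above. Field names and the example values are illustrative.

@dataclass
class Finding:
    tool: str                # which scanner reported it
    cwe: str                 # e.g. "CWE-89"
    owasp: str               # e.g. "A03:2021 Injection"
    severity: str            # critical / high / medium / low
    priority: int            # 1 = fix now
    component: str           # affected code or dependency
    introduced: date         # from git blame, feeds Step 4 analysis
    risk_item: str = ""      # link back to the Step 1 risk list

f = Finding(
    tool="semgrep",
    cwe="CWE-89",
    owasp="A03:2021 Injection",
    severity="high",
    priority=1,
    component="billing/reports.py",
    introduced=date(2021, 3, 14),
    risk_item="SQL injection in reporting API",
)
print(f.cwe, f.severity, f.component)
```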

Step 4 — Evaluate to understand where you stand

Now you have data. What does it mean?

Immediate triage

Sort by priority × severity. The highest-severity, highest-priority findings get fixed immediately. Supply-chain advisories with active exploitation in the wild (CISA KEV entries affecting your stack) are top priority — they are not theoretical risks.

The canonical historical example is Microsoft's "Trustworthy Computing" pivot in 2002, where the number of critical security bugs became so high that Microsoft programmers did nothing but address them for months. You might not be in that deep a hole, or be able to spare that much effort, but the urgent items get fixed first regardless.

Cluster analysis

Bugs cluster. They are not uniformly distributed across the codebase — they live in the minority of modules. Decades ago IBM reported that 38% of production bugs in MVS lived in 4% of modules. The pattern holds in modern codebases: a small percentage of services accounts for a large percentage of vulnerabilities.

Identify those clusters. If one or two components account for most of the findings, refactoring or rewriting those components is often more effective than patching them one bug at a time. Budget for that as a security program, not as a feature.
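A quick way to surface clusters is to ask what fraction of findings the worst few components hold. The sketch below uses illustrative counts:

```python
from collections import Counter

# Sketch of a cluster check: what share of findings live in the worst
# few components? Counts are illustrative.

findings_by_component = Counter({
    "legacy/auth":     41,
    "billing/reports": 23,
    "api/gateway":      6,
    "ui/dashboard":     4,
    "search":           3,
    "notifications":    2,
})

total = sum(findings_by_component.values())
top2 = findings_by_component.most_common(2)
share = sum(count for _, count in top2) / total

print(f"top 2 of {len(findings_by_component)} components hold "
      f"{share:.0%} of all findings")
```

When two components out of six hold roughly four-fifths of the findings, a scoped rewrite of those two is usually a better investment than eighty individual patches.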

Pattern analysis

Which CWE categories show up most? Those are the categories your team is underpowered against. For each top CWE, ask:

  • Is there training that would prevent the category? (e.g. secure-coding training for SQLi, XSS, SSRF, insecure deserialization).
  • Is there a framework-level mitigation that would eliminate the category? (parameterized queries, auto-escaping templates, CSP headers, an authorization middleware).
  • Is there a linter rule or SAST rule that should be blocking these at PR time?

Don't play whack-a-mole on individual bugs. Find the source of the pattern — a missing framework, a missing review step, a missing training, a missing CI gate — and close it.

Age-of-code analysis

Software "wears out" not through physical decay but through ongoing maintenance that gradually reduces code quality. Older code, especially code from a period when the team was new to the language or framework, often concentrates vulnerabilities. Plan for long-term refactoring of decrepit modules that are disproportionate contributors.

Step 5 — Repair the problems — carefully

Any bug fix introduces risk that a new bug will be shipped in the fix. This applies to security bugs as strongly as to any other. Worse: a fix for a security bug can introduce a functional bug that appears to be unrelated.

Manage regression risk deliberately

  • Code review by at least two reviewers with relevant security knowledge. For high-severity fixes, include an AppSec engineer.
  • Static analysis on the fix. Run the SAST tool on the diff — not just on the full codebase at the next scheduled scan.
  • Automated unit tests that specifically cover the vulnerable path — a test that would have caught the bug and would catch a regression if the fix is incomplete.
  • Integration tests that exercise the end-to-end flow under which the bug was exploitable.
  • Pen-test re-verification after deployment to staging.
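The third item on that list is worth making concrete. The sketch below shows regression tests pinned to a fixed path-traversal bug in a hypothetical download handler; the exploit input from the original finding is encoded in a test, so an incomplete or reverted fix fails CI immediately. Handler, paths, and names are all illustrative:

```python
from pathlib import Path

# Hypothetical fixed handler plus regression tests that encode the
# original exploit input. Paths and names are illustrative.

UPLOAD_ROOT = Path("/srv/uploads")

def resolve_upload(name: str) -> Path:
    """Fixed handler: resolve, then confirm the path stays inside the root."""
    candidate = (UPLOAD_ROOT / name).resolve()
    if not candidate.is_relative_to(UPLOAD_ROOT):
        raise ValueError("path escapes upload root")
    return candidate

def test_exploit_input_rejected():
    # the literal input from the original pen-test finding
    try:
        resolve_upload("../../etc/passwd")
    except ValueError:
        return
    raise AssertionError("traversal input was accepted: fix regressed")

def test_legitimate_input_allowed():
    assert resolve_upload("report.pdf") == UPLOAD_ROOT / "report.pdf"

test_exploit_input_rejected()
test_legitimate_input_allowed()
print("regression tests pass")
```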

The shift-left principle

Every step above is cheaper the earlier you do it. Today the common pattern is:

  • IDE — Semgrep IDE, CodeQL extension, AI-assisted code review: catch obvious issues at keystroke time.
  • Pre-commit hook — git-secrets, gitleaks, Semgrep: block secrets and trivially bad code before they land.
  • Pull request — SAST, SCA, CodeQL, AI review: automated security review on every PR.
  • CI — SAST, DAST, SCA, fuzzing, container scanning: full security scan before merge to main.
  • Staging — continuous DAST, IAST: observe runtime behavior before production.
  • Production — RASP, WAF, runtime monitoring, anomaly detection: detect and block active exploitation.

No single layer catches everything. Layering is the approach.

Secure-by-default frameworks

Many of the modern AppSec wins are not security tooling but secure-by-default frameworks: web frameworks that template with auto-escaping on, ORMs that parameterize queries by default, HTTP client libraries that validate TLS properly without configuration, authentication libraries that invoke proper session hygiene by default. Where possible, choose these over libraries that require explicit correct configuration.
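To illustrate the principle (not any real framework's API), here is a toy rendering helper that HTML-escapes every substitution unless the caller explicitly opts out. Frameworks like Jinja2 with autoescaping enabled, or Django templates, bake this in; the sketch only shows the shape of the default:

```python
import html

# Toy illustration of secure-by-default rendering: every substituted
# value is escaped unless explicitly marked safe. Names are illustrative.

class Safe(str):
    """Marker for strings the caller explicitly vouches for."""

def render(template: str, **values) -> str:
    escaped = {
        key: val if isinstance(val, Safe) else html.escape(str(val))
        for key, val in values.items()
    }
    return template.format(**escaped)

payload = "<script>alert(1)</script>"  # attacker-controlled value
print(render("<p>Hello, {name}</p>", name=payload))
# escaping happens by default; emitting raw markup requires Safe(...)
```

The design point is the direction of the opt-out: the dangerous behavior requires an explicit, greppable marker, while the safe behavior requires nothing.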

Step 6 — Examine results in the real world

Process changes only matter if they change outcomes. Instrument the ones you've made.

Metrics that matter

  • Known-security-bugs trend over time. Monthly or quarterly count of open security findings by severity. The expectation is a long-term downward trend, with some month-to-month variation.
  • Time-to-remediate by severity. Mean time from discovery to fix, by critical / high / medium. A good program is driving these numbers down.
  • Field incident trend. Security-related customer incidents and support tickets by month. Similar pattern: expect long-term decline.
  • Escaped defects. Bugs that reached production and were found by external parties (researchers, customers, attackers) vs. internally. A declining ratio of external-to-internal discovery is one of the strongest signals of a maturing program.
  • Supply-chain CVE count. Open CVEs in your dependency graph, by severity.
  • Coverage of security tests against the risk list. How many of the Step 1 risks have active, passing tests?

Expect variation. Financial applications see security issues cluster around fiscal-year-close; e-commerce applications see them around high-traffic events. The long-term trend is the signal, not the month-to-month.
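The escaped-defect ratio from the list above is simple enough to track in a spreadsheet or a few lines of code. The quarterly counts below are illustrative:

```python
# Sketch of the escaped-defect signal: externally discovered security
# bugs divided by internally discovered ones, per quarter. A maturing
# program drives the ratio down. Counts are illustrative.

quarters = ["2025Q1", "2025Q2", "2025Q3", "2025Q4"]
external = [9, 7, 4, 3]      # found by researchers, customers, attackers
internal = [12, 15, 18, 21]  # found by your own scanning and testing

ratios = [ext / intern for ext, intern in zip(external, internal)]

for q, ext, intern, ratio in zip(quarters, external, internal, ratios):
    print(f"{q}: {ext:>2} external / {intern:>2} internal -> ratio {ratio:.2f}")

# the trend, not any single quarter, is the signal
improving = all(a > b for a, b in zip(ratios, ratios[1:]))
print("external/internal ratio declining:", improving)
```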

External intelligence

In addition to monitoring your own data, monitor the industry:

  • CISA advisories, vendor security bulletins, and CVE feeds for your stack.
  • News of breaches in applications like yours — business-domain analogues, implementation-technology analogues, or both.
  • Bug bounty reports across your industry — HackerOne and Bugcrowd publish aggregated statistics that show what classes of bugs are being actively exploited.

When you hear about a problem that might apply to you, update your risk analysis and re-evaluate.

Step 7 — Institutionalize success

The last step is to do the first six over and over again, on every project, every release. You don't start from a clean slate each time — you use existing work as a baseline — but you do need to revisit:

  1. Re-assess security risks. New features, new deployment contexts, new dependencies, new threat intelligence.
  2. Re-test the application for security failures. New code is new attack surface.
  3. Re-analyze the codebase for underlying bugs. Scanners improve; run the current tools.
  4. Re-evaluate patterns in risks, failures, and bugs. What has changed?
  5. Repair with care. Regression management doesn't stop.
  6. Re-examine the real-world results. Has the trend held?

Why this step matters most

A common failure pattern: after a big push to improve security, teams celebrate, relax vigilance, and gradually slip back into old practices. A pen test finds issues; the team fixes them; the team builds new features with the same insecure practices; the next pen test finds the same classes of bug in the new features. The symptoms were treated; the underlying practice was not.

Institutionalizing success means making the security program part of the development process, not a phase or a project. Concretely:

  • Every PR is security-reviewed (by tools, by human, or both). Not just the ones touching "security-sensitive" code.
  • Every sprint's risk analysis includes security risks.
  • Every retrospective includes a security-findings review.
  • Every new hire gets secure-coding onboarding.
  • Every quarter, refresh the Step 1 risk list against current threat intelligence.
  • Every release, track the security metrics from Step 6.

If the security program is not scheduled, not owned, and not measured, it is not a program.

Current additions: AI and LLM security

AI-powered features are now routine in application code, and they introduce a distinct class of security risk that didn't exist when the original seven-step framework was written. The OWASP Top 10 for LLM Applications is the current reference. Key concerns to fold into your risk assessment:

  • Prompt injection. An attacker crafts input that manipulates the model to ignore its system prompt. Defend at the input layer, the output layer, and the tool-use layer — not just one.
  • Insecure output handling. LLM output consumed downstream as code, SQL, HTML, or commands is the LLM-era version of SQL injection. Treat LLM output as untrusted user input at every boundary.
  • Training-data poisoning and model theft. Relevant primarily to teams training their own models; less relevant to teams using vendor APIs.
  • Model denial of service / cost exhaustion. LLM inference is expensive; an attacker can drive your bills up by crafting long or expensive prompts. Rate-limit at the user level and the token level.
  • Sensitive data in prompts. What you send to a vendor's API leaves your perimeter. Understand your vendor's retention and training policies.
  • Over-reliance on LLM output. An LLM that "confidently" emits wrong, biased, or unsafe content at scale is a failure mode with no analog in traditional software. Design human-in-the-loop or guardrail gates for consequential outputs.

Each of these maps onto the same seven-step program — assess, test (against LLM-specific misuse cases), analyze (tooling is emerging: Lakera, Rebuff, Protect AI, LangSmith evals, NVIDIA NeMo Guardrails), evaluate, repair, examine, institutionalize. The framework is stable; the attack surface is new.
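Two of the defenses above — treating LLM output as untrusted input, and gating tool use through an allowlist — are concrete enough to sketch. The tool names and the model responses below are illustrative, not any vendor's API:

```python
import html
import json

# Sketch of two "insecure output handling" defenses: model-proposed
# tool calls go through an allowlist, and model text destined for an
# HTML context is escaped. Tool names and responses are illustrative.

ALLOWED_TOOLS = {"search_docs", "get_order_status"}

def dispatch_tool_call(raw_model_output: str) -> dict:
    """Parse a model-proposed tool call; refuse anything off-allowlist."""
    call = json.loads(raw_model_output)
    if call.get("tool") not in ALLOWED_TOOLS:
        raise PermissionError(
            f"model requested disallowed tool: {call.get('tool')!r}")
    return call

def render_answer(model_text: str) -> str:
    """Escape model text before it reaches an HTML context."""
    return f"<div class='answer'>{html.escape(model_text)}</div>"

# A prompt-injected response trying to invoke an unapproved tool:
hostile = '{"tool": "delete_account", "args": {"user": "alice"}}'
try:
    dispatch_tool_call(hostile)
except PermissionError as err:
    print("blocked:", err)

# Injected markup in the model's prose is neutralized at the boundary:
print(render_answer("Ignore previous instructions. <img src=x onerror=alert(1)>"))
```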

Takeaways

  • Software security is a program, not a project. The seven-step cycle runs continuously.
  • The application layer is where most exploits happen. The operating system and network have hardened; applications haven't, relatively.
  • Every step has a current tooling landscape (SAST, DAST, IAST, SCA/SBOM, DevSecOps pipelines, bug bounties, RASP) that makes the work cheaper than it was when the original playbook was written. Use it.
  • Clusters matter more than individual bugs. Find the source of a pattern and close it.
  • LLM-integrated applications add a new class of risk that plugs into the same framework.
  • Institutionalizing success is the step most commonly skipped and the one most responsible for long-term outcomes.

Rex Black, Inc.

Enterprise technology consulting · Dallas, Texas