Structural detection vs pattern matching: why idioms defeat regex scanners

3 min read · AppSec practice
TL;DR

Pattern-matching scanners recognize vulnerable code by its surface shape, so any unfamiliar idiom, refactor, or wrapper function walks straight past them. Structural analysis reasons over syntax trees, control flow, and data flow, detecting the class of vulnerability regardless of how it is spelled. The practical test: rewrite a known-vulnerable snippet in a different idiom and see if your scanner still finds it.

Two scanners can both claim to detect SQL injection and be doing profoundly different work. One is checking whether your code looks like SQL injection examples it has been taught. The other is checking whether attacker-controlled data actually flows into a query. The difference sounds academic until the first refactor, when one of them goes quiet.

How pattern matching fails politely

Pattern-based detection (regexes and syntax templates applied to source text) has real virtues: it is fast to run, fast to write rules for, and easy to understand. Its failure mode is equally easy to understand: it detects the spelling of a vulnerability, not the vulnerability.

Consider the same injection written three ways: string concatenation directly at the query call; the same concatenation assigned through two intermediate variables first; the query built inside a helper function that a dozen call sites feed. A pattern tuned to the first spelling misses the second and third, not because the risk changed, but because the surface text did. Every codebase’s private idioms, wrapper layers, and utility functions become, in effect, camouflage.
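Here is a minimal Python sketch of the three spellings. The snippets are scanned as text and never run. `request.args.get` and `cursor.execute` are Flask-style names chosen for illustration, and `NAIVE_RULE` is a hypothetical rule, not any real scanner’s. The rule fires on the first spelling only.

```python
import re

# Three spellings of one SQL injection, held as source text (never executed).
SPELLING_DIRECT = '''
cursor.execute("SELECT * FROM users WHERE name = '" + request.args.get("name") + "'")
'''

SPELLING_INDIRECT = '''
name = request.args.get("name")
clause = "name = '" + name + "'"
sql = "SELECT * FROM users WHERE " + clause
cursor.execute(sql)
'''

SPELLING_HELPER = '''
def find_user(cursor, name):
    sql = "SELECT * FROM users WHERE name = '" + name + "'"
    cursor.execute(sql)

find_user(cursor, request.args.get("name"))
'''

# Hypothetical rule tuned to the first spelling: a string literal
# concatenated with "+" directly inside the execute(...) call.
NAIVE_RULE = re.compile(r'\.execute\(\s*"[^"]*"\s*\+')


def naive_scan(source: str) -> bool:
    """Return True if the regex rule matches anywhere in the source text."""
    return NAIVE_RULE.search(source) is not None


for label, src in [("direct", SPELLING_DIRECT),
                   ("indirect", SPELLING_INDIRECT),
                   ("helper", SPELLING_HELPER)]:
    print(label, naive_scan(src))
# prints: direct True, indirect False, helper False (one per line)
```

The risk is identical in all three snippets. Only the text next to `execute(` changed, and that is all the rule can see.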

The insidious part is the silence. A pattern that no longer matches does not report reduced confidence; it reports nothing, and a clean report reads as safety. Teams then optimize for the scanner: code that avoids the flagged spellings passes, whatever it actually does.

What structural analysis does instead

Structural detection operates on what the code means: the abstract syntax tree, control flow, and, critically, data flow. For injection classes, the analysis is taint tracking: identify where attacker-influenced data enters (a request parameter, a file, a message), follow it through assignments, calls, string operations, and returns, and report when it reaches a sensitive sink (a query, a command, an output stream) without passing through a recognized sanitizer.

Under that analysis, the three spellings above are the same finding, because they are the same data flow. Rename the variables, add layers of indirection, move the sink behind a helper: the taint path persists, and so does the detection. Cross-file and cross-function tracking extends the same reasoning through real architectures, where source and sink are rarely adjacent.

There is a second benefit that matters as much as recall: evidence. A structural finding is a path, source to sink, step by step, which is exactly the proof that makes triage fast and lets a developer confirm the issue without a security escort. A pattern match, by contrast, can only say “this line resembles something bad.”
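The sketch below shows what “a path” means as evidence. It stores data-flow facts as edges and recovers the source-to-sink chain with a breadth-first search. The edges are hand-written, hypothetical facts for the helper-function spelling. A real engine would derive them from the syntax tree.

```python
from collections import deque

# Hypothetical flow facts for the helper-function spelling: (from, to, how).
FLOWS = [
    ("request.args.get", "handler:name", "parameter read into local variable"),
    ("handler:name", "find_user:name", "passed as call argument"),
    ("find_user:name", "find_user:sql", "string concatenation"),
    ("find_user:sql", "cursor.execute", "query argument (sink)"),
]


def taint_path(flows, source, sink):
    """Breadth-first search from source to sink; return the steps, or None."""
    graph = {}
    for src, dst, how in flows:
        graph.setdefault(src, []).append((dst, how))
    prev = {source: None}          # node -> (predecessor, how the value got here)
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt, how in graph.get(node, []):
            if nxt not in prev:
                prev[nxt] = (node, how)
                queue.append(nxt)
    if sink not in prev:
        return None                # no flow, so nothing to report
    steps, node = [], sink
    while prev[node] is not None:  # walk predecessors back to the source
        parent, how = prev[node]
        steps.append(f"{parent} -> {node}: {how}")
        node = parent
    return steps[::-1]


for step in taint_path(FLOWS, "request.args.get", "cursor.execute"):
    print(step)
```

Each printed line is one step a developer can check against the code, from `request.args.get` to `cursor.execute`. A regex hit offers no equivalent. It has a single line and no explanation of how the data got there.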

The different-idiom test

You can evaluate any scanner’s depth in an afternoon without reading its marketing:

  1. Take a vulnerability it definitely detects, in its obvious spelling.
  2. Rewrite the same flaw in an idiom the tool has plausibly never seen: route the data through an object field, wrap the sink in a local helper, or rebuild the string in a loop.
  3. Scan again.
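These three steps can be scripted so the test reruns on every scanner upgrade. In this hypothetical harness, `regex_scan` stands in for the detector under test. It is a naive rule, not any real product’s. The variants implement step 2’s idioms:

- the query routed through an object field
- the sink wrapped in a local helper
- the string rebuilt in a loop

```python
import re
from typing import Callable, Dict, List

# Detector under test: a naive, hypothetical pattern rule.
NAIVE_RULE = re.compile(r'\.execute\(\s*"[^"]*"\s*\+')


def regex_scan(source: str) -> bool:
    """Return True if the scanner reports a finding for this source text."""
    return NAIVE_RULE.search(source) is not None


# One flaw, four idioms (source text only; nothing here is executed).
VARIANTS: Dict[str, str] = {
    "canonical": '''
cursor.execute("SELECT * FROM users WHERE id = " + request.args.get("id"))
''',
    "field": '''
req = Request()
req.sql = "SELECT * FROM users WHERE id = " + request.args.get("id")
cursor.execute(req.sql)
''',
    "helper": '''
def run(sql):
    cursor.execute(sql)

run("SELECT * FROM users WHERE id = " + request.args.get("id"))
''',
    "loop": '''
parts = ["SELECT * FROM users WHERE id = "]
for ch in request.args.get("id"):
    parts.append(ch)
cursor.execute("".join(parts))
''',
}


def idiom_test(scan: Callable[[str], bool], variants: Dict[str, str]) -> List[str]:
    """Return the names of variants on which the scanner stays silent."""
    return [name for name, src in variants.items() if not scan(src)]


print(idiom_test(regex_scan, VARIANTS))  # ['field', 'helper', 'loop']
```

To evaluate a real tool, replace `regex_scan` with a wrapper that writes each variant to a file, runs the scanner on it, and reports whether a finding came back. That wrapper depends on the tool’s own command line and is left out here.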

Structural engines keep finding it, because the data flow survived the rewrite. Pattern engines go quiet on some variant, and each silent variant marks a blind spot where your real codebase’s own idioms may already be living. This test is a standing gate inside SecuSAST’s own development: detection that only works for the canonical spelling does not ship as detection.

Where patterns still belong

Honesty requires the converse note: not everything needs data flow. Configuration keys, protocol constants, and file formats are legitimately pattern-shaped problems, and secrets detection rightly combines format awareness with contextual analysis. The failure is not using patterns; it is using patterns for semantic vulnerability classes (injection, deserialization, path traversal) where meaning, not spelling, defines the flaw.
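Here is a small Python sketch of that combination for AWS access key IDs.

- **Format awareness** comes from the well-known shape of long-term key IDs: the prefix `AKIA` followed by 16 characters, approximated here as uppercase letters and digits.
- **Context** comes from two hypothetical heuristics. The scanner skips documentation placeholders such as `AKIAIOSFODNN7EXAMPLE`, the example key AWS uses in its docs. It also raises confidence when a secret-key assignment appears on a nearby line.

The non-placeholder key ID in the sample is made up.

```python
import re

# Format awareness: long-term AWS access key IDs start with "AKIA" and are
# 20 characters long; the remaining 16 are approximated as [0-9A-Z].
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")

# Contextual analysis (hypothetical heuristics for this sketch).
PLACEHOLDERS = ("EXAMPLE", "XXXXXXXX")  # documentation-style dummy keys
SECRET_CONTEXT = re.compile(r"secret_access_key|aws_secret", re.IGNORECASE)


def scan_for_key_ids(text, window=3):
    """Return (line number, key ID, confidence) for each plausible AWS key ID."""
    lines = text.splitlines()
    findings = []
    for i, line in enumerate(lines):
        for match in AWS_KEY_ID.finditer(line):
            key_id = match.group(0)
            if any(marker in key_id for marker in PLACEHOLDERS):
                continue  # a documented placeholder, not a live credential
            nearby = lines[max(0, i - window): i + window + 1]
            paired = any(SECRET_CONTEXT.search(near) for near in nearby)
            findings.append((i + 1, key_id, "high" if paired else "medium"))
    return findings


# Made-up key ID for illustration, next to AWS's documentation placeholder.
SAMPLE = '''
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_access_key_id = AKIAZ7Q3R5T8W2Y4K6M9
aws_secret_access_key = <redacted>
'''
print(scan_for_key_ids(SAMPLE))  # [(3, 'AKIAZ7Q3R5T8W2Y4K6M9', 'high')]
```

Here the pattern is the right tool because the secret’s format really is its defining property. The contextual checks only tune precision and confidence around that match.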

The question to put to any scanner vendor is therefore simple: when my team writes the same vulnerability differently, do you still see it, and can you show me the path? Tools built on structural analysis answer with a trace. Ask for the demonstration, on code written your way.
