Earlier this year I was invited to give a talk at University of California San Diego (UCSD) for Nadia Heninger's CSE 127 ("Intro to Computer Security"). I chose to talk about modern exploit development, stepping through the process of finding and exploiting some of the memory corruption bugs that the class had been studying so far with Nadia.
I think it's a great school, and the main campus and surrounding area have a nice feel too. The sandy beaches and crumbling bluffs at Torrey Pines, the famous golf course, the cluster of biotech companies, the Eucalyptus-lined streets, and a frankly excessive amount of brutalist architecture for a place that gets this much sunshine.
One interesting effect of planning a talk about exploit development in an academic setting is remembering how un-referenceable large parts of exploit development really are. These days you can find some great resources for the parts about finding a good bug and building a proof-of-concept exploit, but there's not much beyond that. Turning a proof-of-concept into a fully-fledged product that can be packaged up and sold as a capability is something entirely different, and that work is mostly done in secret.
A big consideration when you're writing exploits professionally is exploit reliability. Reliability hasn't traditionally been a big focus area in the defensive community's thinking about security, so the purpose of this blog post is to explore the concept in more detail, but from a primarily defensive point of view.
What is exploit reliability?
Exploit reliability is a measure of an exploit's failure rate. If your exploit is 100% reliable, you know it's always going to succeed. If it's 90% reliable, then your exploit will fail in one in ten cases, and so on.
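To see why small per-attempt failure rates matter, keep in mind that modern exploits are usually chains of several stages, and the chain only succeeds if every stage does. A quick back-of-the-envelope sketch (the stage counts and rates here are hypothetical):

```python
# Hypothetical illustration: per-stage reliability compounds multiplicatively
# across a multi-stage exploit chain.
def chain_reliability(stage_rates):
    """Overall success rate of an exploit chain that needs every stage to work."""
    total = 1.0
    for rate in stage_rates:
        total *= rate
    return total

# Three stages that each succeed 90% of the time yield a chain that
# succeeds only about 73% of the time overall.
overall = chain_reliability([0.9, 0.9, 0.9])
```

This compounding is why exploit developers sweat every percentage point: a few "pretty reliable" stages multiply out to a chain that fails more than a quarter of the time.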
There are lots of related questions that follow. What happens when a "failure" occurs? What work can you put in to minimize the chance of failure? How do you return a successfully exploited process to its original state so that it continues to function correctly? How do you cleanly integrate a successful exploit with your implanted payload, and ensure that your implant can execute in a stable manner?
It's not exactly glamorous work, but in certain scenarios exploit reliability is a make-or-break factor for an attacker.
Chris Evans has a great technical overview here, and Skylar Rampersaud's "Creating User-Friendly Exploits" from 2009 gives an excellent insight into the exploit developer's mindset around reliability.
Reliability is an important consideration for every exploit, but particularly for 0day exploits, and particularly for 0day exploits that are using memory corruption bugs (which is most of them, for now).
First let's step back and think about what exploits are used for. At a basic level we use exploits to gain unauthorized access to systems and/or data. Sometimes it doesn't really matter if an exploit fails: just keep trying, just move on to a new target. But there are other times where only one target matters, and it's imperative that they don't know that they are the target.
If an exploit fails, that can lead to all sorts of negative outcomes for an attacker. The first concern is that a failed exploit will often fail in ways that lead to the attacker's target getting suspicious. If you wake up in the morning to 100 missed WhatsApp calls from an unknown number in your notifications, that's weird. If you click on a link that your boss sends you and your browser crashes, that seems weird. If your phone mysteriously reboots after you join the WiFi network in Davos, that's also pretty weird.
This is all particularly challenging for attackers, because the target of these exploits often knows that they're a target, and so they're naturally more inclined to be suspicious of weirdness. The outcome of these suspicions can be varied -- perhaps the target changes their behavior in some way that's detrimental to you as an attacker (like shifting certain types of communications offline), or they change technologies (like upgrading a mobile device), or they enlist the support of technical experts (like all the activists who have Citizen Lab on speed-dial). All of this is bad news for an attacker with an objective, and may lead to losing the exploit capability entirely if the bug you were using gets found and fixed.
The other side of things is that you might only have one good opportunity to compromise the target -- perhaps you're only confident that you can get one or two clicks on your malicious link, or you have a limited timeframe where you're in close enough proximity to launch a WiFi-based exploit. In that case, poor exploit reliability means that your core objectives will not be met. That's a deal breaker.
The core value proposition of a 0day exploit is a high rate of success combined with a low rate of detection. That's why the people who write these exploits are obsessed with exploit reliability, because bad exploit reliability affects both of these factors.
Why should defenders care about exploit reliability?
Whenever you come across a situation where attackers are much more engaged and interested in a topic than defenders, that's an asymmetry that's worth noticing. As defenders it's our job to impose additional costs on attackers, and so it makes sense to understand what keeps an attacker up at night. Exploit reliability is one of those things. So what can a defender do to take advantage of this asymmetry?
The challenge for defenders is that it's very hard to measure exploit reliability when you're not in a position to develop and deploy exploits. Most of what I've learned about exploit reliability has come from late-night chats with professional exploit developers, rather than first-hand experience. That's hard to scale. That said, I think we know enough (just barely enough) to start thinking about what a defensive strategy might look like.
At a high level, we know that exploits fail because something unexpected happened. An unexpected heap layout, an unexpected value for a pointer, an unexpected behavior for an API or system call, and so on. So as defenders you want to cause unexpected things to happen when an exploit is running, and to detect and act upon any failures that result.
This is easier said than done. In my experience, it's already very difficult to enable defensive security measures that have a known, concrete, and measurable impact. Defensive changes that have some sort of probabilistic benefit are not very popular, and there are good reasons for that. Still, defenses that make reliable exploitation harder are an interesting idea to explore, and it's still possible to find simple and cost-effective improvements.
What can defenders do to make writing a reliable exploit harder?
If you're writing the code that's being attacked (like if you're an operating system or browser vendor) then you have quite a lot of options, and you're in a good position to affect exploit reliability. The general theme is to increase the entropy in your execution environment, and we've already seen some good progress on this front.
GWP-ASan is a good example of this, where the heap allocator changes its behavior to be more favorable for bug detection (at the expense of performance) for a certain low percentage of allocations. An attacker has to factor in this probability of running into an "attacker-hostile heap", and in some cases they may not be able to do so completely. That's bad for exploit reliability, and good for defenders.
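The sampling idea can be sketched in a few lines. This is a toy model only -- the names, sampling rate, and mechanics are illustrative, not GWP-ASan's actual implementation:

```python
import random

SAMPLE_RATE = 1 / 1000  # hypothetical sampling probability

class GuardedAllocationError(RuntimeError):
    """Stands in for the immediate crash a real guard page would cause."""

def allocate(size, rng=random.random):
    # A small, random fraction of allocations land in a guarded pool.
    guarded = rng() < SAMPLE_RATE
    return {"guarded": guarded, "size": size}

def write_byte(chunk, offset):
    # In the guarded pool, an out-of-bounds access is detected instantly;
    # in the normal pool, the same bug may corrupt memory silently.
    if chunk["guarded"] and offset >= chunk["size"]:
        raise GuardedAllocationError("out-of-bounds write detected")
```

From the attacker's perspective, every allocation their exploit touches now carries a small chance of turning a silent corruption into a loud, reportable crash.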
Another example is the default memory allocator for Android, Scudo, which randomizes the location of allocated chunks. This is on top of the normal address space layout randomization (ASLR) that most platforms support. For certain types of bugs (particularly where the ability to heap spray is constrained), this chunk-based randomization can mean that an attacker is uncertain which data structure their heap overflow will overwrite, and that can lead to instability in an exploit.
An even more sophisticated approach is used in iOS's kernel allocator, called kalloc_type. With kalloc_type, Apple introduced both randomization and partitioning into kernel heap allocations, meaning that an attacker doesn't know if two different allocation types will be allocated in the same memory region, and the random partitioning of allocation types changes every time the device boots. This added uncertainty makes writing a reliable exploit for a use-after-free or out-of-bounds write kernel vulnerability much more challenging in recent versions of iOS.
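The effect of randomized partitioning can be sketched like this. Again, a toy model: the partition count, type names, and seeding scheme are illustrative, not Apple's actual design:

```python
import random

NUM_PARTITIONS = 4  # hypothetical; a real allocator uses far more structure

def build_partition_map(type_names, boot_seed):
    """Assign each allocation type to a heap partition using a per-boot seed."""
    rng = random.Random(boot_seed)
    return {name: rng.randrange(NUM_PARTITIONS) for name in type_names}

kernel_types = ["vm_map_entry", "ipc_port", "socket", "pipe_buffer"]

# Two different boots can produce two different layouts: an attacker can't
# know in advance which types share a partition on the victim's device.
layout_boot_1 = build_partition_map(kernel_types, boot_seed=101)
layout_boot_2 = build_partition_map(kernel_types, boot_seed=202)
```

The layout is stable within a boot (the same seed always yields the same map), but unpredictable across devices and reboots -- exactly the property that makes a use-after-free reclaim strategy unreliable.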
As a platform vendor, you also have the ability to collect real-time telemetry like behavioral events, performance events, and crash logs. Analyzing crash logs to find signs of exploitation is like looking for a needle in a haystack, and most suspicious-looking crashes are actually the result of hardware failure or buggy third-party software, but there have been some recent successes with using heuristic analysis of crashes to find exploits.
Attackers tend to assume that any crashing failure of their exploit will lead to detection, and defenders should work hard to make that a reality. The science behind finding exploits in crash logs is still in its infancy, and the technical details are a closely-guarded secret -- but if you're in a position to perform this type of work, you should probably be trying to learn more about this type of analysis.
What about enterprise networks?
For enterprise defenders, your options are perhaps more limited, and any efforts here should be carefully weighed against deployment risks and maintenance costs.
At a high level, your first goal would be to make your systems unpredictable compared to a normal enterprise network, and especially to make your critical systems unpredictable compared to your other systems. Your second aim would be to have a very good understanding of what your "baseline" looks like given the logging and telemetry that's available to you, such that any anomalies stand out and can be investigated further.
An example of this would be tactical deployments of Microsoft Edge's Super Duper Secure Mode, Grsecurity for Linux, or Apple's Lockdown Mode for iOS and macOS. Each of these technologies has behavior that could decrease the reliability of exploits, or even block certain exploits entirely.
The key idea is to utilize the fact that attackers don't always know how your systems are configured until after the exploit has succeeded, and to use that to your best advantage. Even Endpoint Detection & Response (EDR) tools have been known to disrupt browser exploits in the past -- for example, by installing hooks on certain NTDLL APIs in a way that the exploit's second stage didn't anticipate.
What about all those enterprise server exploits we've been seeing recently?
While I've mainly been discussing exploit reliability in the context of 0day exploits using memory corruption bugs, the idea applies to any type of attacker and any type of bug.
Just take a look at the CISA KEV database or the CISA/FBI most exploited CVEs list -- most of the issues listed are not memory corruption vulnerabilities. Instead, we commonly see logic issues like command injection, path traversal, and deserialization bugs, all of which typically have extremely high exploit reliability.
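To see why, compare a logic bug to a memory corruption bug. A path traversal, for instance, is pure deterministic computation: the same input escapes the intended directory on every system running the flawed code, with no dependence on heap layout, timing, or mitigations. A minimal sketch (the function and paths here are hypothetical):

```python
import os.path

def vulnerable_read_path(base_dir, user_path):
    # Flawed: joins the attacker-controlled path without validating
    # that the result stays inside base_dir.
    return os.path.normpath(os.path.join(base_dir, user_path))

# The traversal resolves outside the base directory deterministically,
# on every target, every time -- no probabilistic steps involved.
resolved = vulnerable_read_path("/srv/files", "../../etc/passwd")
```

Contrast that with a heap overflow, where success depends on the allocator's state at the moment of exploitation -- the logic bug simply has no equivalent source of uncertainty.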
I don't think it's a coincidence that the opportunistic attackers who are performing widespread, indiscriminate exploitation typically gravitate toward exploits with high exploit reliability: high-reliability exploits are easy-to-use exploits.
In general, vulnerabilities that are based on design flaws and logic bugs that can be exploited with 100% reliability are going to be increasingly valued by attackers, and this is true at every level of sophistication.
Memory safety, exploit mitigations, and sandboxing are improving software security, and so it's getting much harder to find vulnerabilities that can be reliably exploited. Attackers need reliable exploits to achieve their objectives, and so this will remain a big focus and challenge for attackers in the near future.
A challenge for attackers is an opportunity for defenders. The more we see exploits being used in-the-wild, the more we understand how fragile and complex exploits can be. The margin between a fully reliable exploit and one that barely functions at all is often razor thin.
As a defender, understanding the areas of fragility in an exploit can unlock interesting approaches to improving security. Some of those approaches are simple enough to be cost-effective, even when the defensive impact is probabilistic in nature.
- Ben Hawkes