Every so often a piece of security research will generate a level of excitement and buzz that's palpable. Dan Kaminsky's DNS bug, Barnaby Jack's ATM Jackpotting, Chris Valasek and Charlie Miller's Jeep hacking escapades. There's something special about the overheard conversations, the whispered sightings of the superstar du jour, and the packed-to-the-rafters conference hall. These moments have delivered something more than just research: they delivered entertainment.
Stagefright was one of these big moments. A frenzied feeling in the air, a willing showman, and a message to deliver. Mobile security was broken, seriously broken.
It's been 8 years since Stagefright's careful dissection of Android's remote security posture, and it seems like a great time to revisit the event and its aftermath. Like any great piece of research, Stagefright changed the world, and it's only with hindsight that it's really possible to understand how.
Setting the Stage
The story starts with Joshua "jduck" Drake. Jduck had been in and out of the hacking scene since the late 90's, but I first met him on the pulltheplug IRC servers in the early 2000s. We were both working our way through the vortex wargame at roughly the same time, and I felt a bit of a competitive spirit emerge with him. I wanted to beat jduck at the game, but he clearly knew his stuff. I parsed every tiny detail in his messages for tips and tricks to guide my own efforts, and I remember learning a lot.
Soon after this, jduck started working at iDefense. iDefense is a fascinating company that had a big role in the creation of the modern bug bounty, but that's a different story. Eventually jduck moved to Metasploit (you can still find a bunch of his code there), and then to the security services company Accuvant.
That brings us to early 2015, when jduck switched from Accuvant to the mobile security company Zimperium. It looks like some of the Stagefright research traveled with him, probably from the tail end of his time at Accuvant. Sometimes it's these small quirks of timing that have to happen to achieve something great. I don't think Accuvant had the right DNA to pull off the publicity around Stagefright, but Zimperium did. Zimperium had Zuk Avraham as its CEO, and Zuk is a natural showman. It seems he had all the right instincts for bringing jduck's research to the world, and it was a great show.
So what exactly was Stagefright?
We take this for granted now, but Android lets you play videos -- like for example when you browse a webpage, or when someone sends you a video message in a messaging app. There's a lot of software engineering that has to happen to enable this, and it doesn't make sense for each mobile app to implement this on their own. Android provides an API that lets apps (including your browser and chat apps) display different forms of media without having to worry about all of the technical details.
And there are a lot of technical details: container formats, codecs, metadata, color profiles, hardware acceleration, and so on. In 2015 when jduck was researching this stuff, the media architecture was fairly simple. An app would receive some data over the network (like an .mp4 file), the app would invoke the Android media APIs, the Android framework would use interprocess communication (IPC) to send that data to a special process called the media server, the media server would process the data and display it on the right surface.
Inside of the media server, most of this processing happened in a library called libstagefright. With a library name that good, the exploits name themselves.
The attack surface of this library was huge, but jduck quickly narrowed in on one area that looked potentially fruitful, which was MPEG-4 decoding. Jduck realized that this code looked particularly sketchy and undertested, and also that it was a remote attack surface via text messaging (MMS). That means that an attacker could cause your device to start processing video files without your involvement or permission; all they had to know was your phone number. Jduck knew that if he could find a suitable vulnerability, that could be enough to compromise any Android device in the world.
In the end, jduck and Amir Etemadieh (Zenofex) wrote a simple fuzzer for MPEG-4 files. The fuzzer found several crashes on real Android devices, and although the crashes weren't directly interesting, they pointed him to areas of fragility in the code. From there, jduck manually audited the code line-by-line, and he found several excellent memory corruption vulnerabilities.
Jduck is quick to point out that while he is widely known for this research, he was standing on the shoulders of giants. Contributions and foundational research from Zenofex, Collin Mulliner, Mathew Solnik, Alexandru Blanda, Wang Hao Lee et al., and Charlie Miller all had an important role in enabling jduck's discoveries on Stagefright.
Was Stagefright actually exploitable?
Before we consider the impact that Stagefright had on mobile security, there's a short diversion on exploitability to consider.
When you have a memory corruption bug, all sorts of things can go wrong for you as an attacker. Weird quirks of memory layout, the order of dereferences on a structure, the range of values you can write. All of these can have devastating effects on the practical exploitability of an issue.
When you're considering remote exploitation of memory corruption bugs, there's one consistent hurdle for you as an attacker: address space layout randomization (ASLR). ASLR is like a slot machine that plays each time you boot up your device, spinning the location of each memory mapping in a process to a new random location each time. The theory is that an attacker needs to know those locations before they can do anything but crash your process.
But with a media-based vulnerability, the attack looks like it might be one-shot. Take the text messaging attack, for example: we can clearly send data into the media server to get processed (i.e. triggering the vulnerability), but what information can we get back from the media server? And if we can't get information back from that process, how will we leak the ASLR locations?
One advantage at the time was that the media server process ran in a 32-bit address space; 64-bit support was only just starting to roll out. That potentially meant that you could use a heap spray technique, spamming the process with new memory mappings until it was statistically likely that there would be a mapping at a fixed location that you know. That solves the problem for data, but the code mappings would still be at an unknown location, and we'd typically need to know where the code is to trigger the next stage of the exploit's payload.
At the time I was working as a security researcher on Project Zero, and we decided this would be a fun problem to look at. Shortly after BlackHat, our team took off from Vegas and drove to Utah. We had rented a house in St George, planning to spend the days hiking Zion National Park, and the evenings hacking. The result of that trip was Mark Brand's blog post: Stagefrightened?
The basic summary is that the issue we looked at was exploitable. The MPEG-4 parser was flexible enough to allow the consistent overwrite of a known object, heap spray was feasible due to a reasonably small range for the randomized mmap base, and from there you could bruteforce the base address of libc due to the media server restarting after each crash (albeit after 5 seconds). Successful exploitation ranged from 30 seconds to 1 hour.
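To get a feel for why a 5-second restart delay still leaves bruteforce practical, here's a back-of-the-envelope sketch. The 8 bits of entropy and the function names are my own assumptions for illustration, not figures from this post, and the real numbers depend on the device and Android version. It also assumes the process is re-randomized on every restart, so each attempt is an independent guess and the number of attempts follows a geometric distribution.

```python
import math

def expected_attempts(entropy_bits: int) -> float:
    """Mean number of independent guesses until one hits (geometric distribution)."""
    return float(2 ** entropy_bits)

def attempts_for_confidence(entropy_bits: int, confidence: float) -> int:
    """Guesses needed so that at least one succeeds with the given probability."""
    p = 1.0 / (2 ** entropy_bits)
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))

def seconds_needed(attempts: float, restart_delay_s: float = 5.0) -> float:
    """Wall-clock time if every guess costs one service restart."""
    return attempts * restart_delay_s

if __name__ == "__main__":
    bits = 8  # ASSUMPTION: illustrative entropy for a library base, not from the post
    mean = expected_attempts(bits)
    print(f"mean: {mean:.0f} attempts, about {seconds_needed(mean) / 60:.0f} minutes")
    n95 = attempts_for_confidence(bits, 0.95)
    print(f"95%: {n95} attempts, about {seconds_needed(n95) / 60:.0f} minutes")
```

Under those assumptions the average case lands in the tens of minutes, which is at least in the same ballpark as the 30 seconds to 1 hour range quoted above. Every extra bit of entropy doubles the cost, which is why the same trick looks so much worse on a 64-bit process.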
Once your exploit was successful, you were executing in a slightly restricted process that didn't give you full access to all the data on the system -- but the media server still had access to both the camera and microphone, and pivoting to a more privileged process or the kernel would have been relatively straightforward.
So yes, Stagefright was exploitable.
What changed after Stagefright?
The news of Stagefright's impending release reverberated up and down Shoreline Boulevard, and it was all that anyone was talking about in our office. It quickly became clear that this was going to be a major event in Android's history. It would be fair to say that Android had a shaky reputation for security at this point, and Stagefright threatened to be the final nail in the coffin.
Many organizations face watershed moments like this in security and privacy, something with such force and exposure that inevitably the organization will have to change. Android was no exception here. Up until this point, Google's security team had been consistently arguing for a bigger investment in Android's security posture, and Android's fledgling security team were stuck in a reactive cycle -- Play Store malware, lockscreen bypasses, permission bypasses, rooting apps -- these were all flowing in on a daily basis, and addressed in one-off fixes. They were a small team tasked with an impossible mission, given the meteoric rise of Android and the break-neck pace of development.
Arguably Stagefright was the event that helped executives at Android realize (as Bill Gates had 14 years earlier) that user trust was foundational to the success of their platform. The attitude of managing fires as they popped up had to change, and strategic investment had to occur. Much of the ensuing story happened behind closed doors at Google, but we can see just some of the tangible impact that Stagefright had from what was launched publicly in the months and years that followed:
- Monthly Security Bulletins: up until this point Android had no concept of a security update. Vulnerabilities were fixed in minor releases of the operating system, e.g. if you were on KitKat 4.4.1 and someone found a new Linux kernel bug, you'd have to wait until KitKat 4.4.2 was released to get the fix, and that could take several months. After Stagefright (in fact just days after Stagefright was released), Android started releasing security updates every month, and also published a bulletin describing all of the vulnerabilities that were fixed. In practice Android OEM vendors (like Samsung and LG) and carriers (like Verizon) wouldn't consistently release these patches in a timely manner, but the foundational shift had begun, and eventually monthly updates for flagship and popular devices became standard.
- Compartmentalization: shortly after the patches for Stagefright had landed, Android security engineers started working on "de-privileging and isolating components that handle untrusted content". In other words, they took the singular process that did all of this complicated parsing and split it up into smaller and more restricted chunks. Now, if you exploited a bug similar to Stagefright, you wouldn't have free access to the camera and microphone. Aside from reduced privileges, a new layer of sandboxing called "seccomp-bpf" was used for the first time, reducing direct access to the kernel.
- Sanitizers: at the same time as working to improve sandboxing, the Android security engineers did something very creative. Compiler sanitizers (such as AddressSanitizer) had been used in testing and debugging for some time, but hadn't seen any real production usage. The Android team realized that a very specific sanitizer called UBSan (the "undefined behavior sanitizer") seemed to have acceptable performance characteristics, while also targeting a large portion of the bug classes that were seen in Stagefright. By enabling UBSan, Android could take certain types of integer overflows that would have been serious security vulnerabilities, and change them into benign crashes.
- Exploit Mitigations: the changes didn't stop there, and with Android P we started to see new exploit mitigations being enabled on the media stack. These are changes to the execution environment intended to make exploitation more expensive (or in some cases, impossible). Control Flow Integrity (CFI) was the first of these, which is a compiler mitigation designed to make it hard for an attacker to utilize the target application once they've managed to take control of it. At this point in time we already knew of a number of bypasses for CFI (such as finding useful function entry points or overwriting saved return addresses on the stack), but since the media stack was becoming so limited, CFI may have been a practical challenge for some types of bugs.
- And yet more compartmentalization, sanitizers, and exploit mitigations: then in Android Q, a new round of hardening appeared in the media stack. They strengthened the sandbox for the codec processing of media streams (historically a super buggy area), they enabled the integer overflow sanitizers across the entire media stack, and launched a new security-focused heap allocator called Scudo, designed by a hacker called Kostya Kortchinsky (Kostya wrote one of the first publicly known virtual machine escape exploits and knows a lot about heaps). Scudo uses a variety of clever tricks to make heap-related exploitation more challenging.
- Fuzzing: it turns out that binary-based media file formats like MPEG-4 are very amenable to fuzzing. You build a corpus of interesting files (which is easy if you're Google and you have the entire Internet packaged neatly in a box), you randomly mutate them, and then you start to observe what happens when you run them. If you see files that touch new areas of the target application, you keep those and keep going, running millions of tests on thousands of machine cores. Eventually your target crashes, and then you fix the bug that caused the crash. Rinse and repeat a few hundred times, and you've got yourself a "fuzz clean" file format parser, which is arguably (and many people do argue) a bit more secure than it used to be. In 2015, practically nothing on Android was being fuzzed. As of today, essentially all of the natively supported container formats and media codecs are being fuzzed by Google.
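The sanitizer item in the list above is easier to picture with a small model. This is a sketch in Python, not Android's code: the function names, the buffer-growing step, and the example values are all hypothetical. It simulates 32-bit unsigned arithmetic to show how a wrapped size calculation turns into an undersized allocation, and how a checked addition (standing in for the abort a sanitizer would trigger) turns the same input into a crash instead.

```python
U32_MAX = 0xFFFFFFFF

def add_u32_wrapping(a: int, b: int) -> int:
    """Unchecked 32-bit unsigned addition: silently wraps modulo 2**32."""
    return (a + b) & U32_MAX

class OverflowTrap(Exception):
    """Stands in for the process abort a sanitizer check would trigger."""

def add_u32_checked(a: int, b: int) -> int:
    """Same addition, but an overflow stops execution instead of wrapping."""
    result = a + b
    if result > U32_MAX:
        raise OverflowTrap(f"{a:#x} + {b:#x} overflows 32 bits")
    return result

def buffer_size_for_chunk(existing: int, chunk_size: int, add=add_u32_wrapping) -> int:
    """Hypothetical parser step: size a buffer for existing data plus a new chunk."""
    return add(existing, chunk_size)

if __name__ == "__main__":
    existing = 0x100
    chunk_size = 0xFFFFFF10  # attacker-controlled length field from the file
    size = buffer_size_for_chunk(existing, chunk_size)
    print(f"wrapping: allocate {size:#x} bytes, then copy {chunk_size:#x} bytes into it")
    try:
        buffer_size_for_chunk(existing, chunk_size, add=add_u32_checked)
    except OverflowTrap as exc:
        print(f"checked: aborted ({exc})")
```

In the wrapping case the parser believes it needs a 0x10-byte buffer and then copies far more than that into it, which is a heap overflow. In the checked case the worst outcome is a crash and a restart.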
In essence, the media stack became the experimental playground for new security features in the Android platform, and most of those security features have held up very well. This is the defensive strategy of throwing the kitchen sink at the problem, and so far it seems to have worked well. Since Stagefright, there hasn't been a serious remote exploit for the media stack. ALHACK came close, but they only exploited it from a malicious app -- and of course my old crew at Project Zero managed remote exploits against WebRTC and image decoding that were similar in shape. A recent keynote (definitely worth a read) by DFSec's Android technical lead, Ki Chan Anh (@Externalist), declared the "end of the Stagefright era".
What does the attack surface look like today?
One of the attractions of the media stack is that it's one of the few remote C/C++ attack surfaces for messaging apps (image decoding is another one, but that's a separate technology stack on Android). That still sounds interesting, even if we know it will be difficult (or impossible). For the sake of curiosity, let's lay out a research plan for finding a Stagefright-style vulnerability that could work on a modern Android device running Signal.
The beauty of Stagefright was that it was a zero-click attack surface: no user interaction was required to start the exploit because thumbnailing and the media scanner were so aggressive. This immediately makes life challenging when thinking about Signal, because Signal has message requests. That means that any message (such as one containing an exploit) that originates from a new contact needs to be accepted before a two-way conversation can begin. There's no guarantee that our attacker's media file will be parsed automatically, and indeed this can quickly be confirmed by running frida-trace on Signal and hooking the native media APIs.
That means we have to adjust our expectations. We can't get an entirely zero-click attack surface on Signal just by sending a video from a random device: it has to be sent from a contact that's already trusted. Perhaps there are other zero-click ways to trigger the media attack surface on Signal, I certainly haven't done an exhaustive search, but the naive approach doesn't seem to work.
Still, we can imagine a world where our intended target already has a conversation going with someone we can utilize to send an exploit (like a diplomat talking to a different embassy), or perhaps we think we can trick them into accepting a new conversation. Trickery is the worst form of hacking though, so this is all a bit disappointing. Well done to Signal for being thoughtful about their zero-click attack surface.
Let's soldier on, and assume we can send a video from an existing contact. The first thing we can visibly observe is a thumbnail image for the video appearing in the chat window. This implies that the media file format is being parsed, at least enough to extract a single frame of the video. It would be possible to generate a thumbnail on the sender's device (as is done with link previews) and then send that image alongside the video, but fortunately for us that doesn't appear to be the design used in Signal.
The nice thing about targeting thumbnailing is that it requires one less click by the victim user, i.e. the victim doesn't need to click to play the video before the attack can succeed, since the thumbnailing happens automatically. Browsing the Signal source code, there seem to be several different Android APIs that are used for video thumbnailing, but all paths lead to the Android media extractor service, and we can confirm this with our frida-trace session.
The media extractor is a special process used by Android to unpack container formats and extract metadata from media files. When Stagefright was originally released, this process didn't exist. The media extractor process only appeared as part of the compartmentalization efforts in Android N. On the plus side, this process does a lot of complicated parsing of untrusted content. On the down side, it's a hardened and heavily sandboxed process, so even if we succeed in exploiting a bug, we've got a lot of work left to do.
So which formats can we actually attack? It depends on your device, but there are several standard formats that will be present on every Android device, and the device manufacturer can also add their own supported formats. Let's take a look at the Samsung S10, a common device that just recently fell out of security support:
|Library||File Formats||Support Type|
|libamrextractor.so||amr awb||Android Standard|
|libflacextractor.so||flac fl||Android Standard|
|libmidiextractor.so||imy mid midi mxmf ota rtttl rtx smf xmf||Android Standard|
|libmkvextractor.so||mka mkv webm||Android Standard|
|libmp3extractor.so||mp2 mp3 mpeg mpg mpga||Android Standard|
|libmp4extractor.so||3g2 3ga 3gp 3gpp 3gpp2 m4a m4r m4v mov mp4 qt||Android Standard|
|libmpeg2extractor.so||m2p m2ts mts ts||Android Standard|
|liboggextractor.so||oga ogg opus||Android Standard|
|libsecamrextractor.so||amr awb||Samsung Added|
|libsecmidiextractor.so||imy mid midi mxmf ota rtttl rtx smf xmf||Samsung Added|
|libsecmkvextractor.so||mka mkv webm||Samsung Added|
|libsecmp3extractor.so||mp2 mp3 mpeg mpg mpga||Samsung Added|
|libsecmp4extractor.so||3g2 3ga 3gp 3gpp 3gpp2 m4a m4v mov mp4||Samsung Added|
|libsecmpeg2extractor.so||m2p m2ts mts ts||Samsung Added|
|libsecoggextractor.so||oga ogg opus||Samsung Added|
|libsmkvextractor.so||mkv mka||Samsung Added|
|libswmfextractor.so||asf wma wmv||Samsung Added|
From this, a research strategy starts to emerge. On the one hand, the standard Android libraries are attractive because they are a ubiquitous attack surface, every device will have them. On the other hand, we know that Android invested heavily in fuzzing after Stagefright, and it's unlikely that we can dedicate more resources to this than they have to fuzzing these core libraries. I think this allows two distinct approaches:
- Target the vendor-supplied additions ("Samsung Added" in this table). While some of this attack surface looks like it might be duplicated, a lot of it isn't. File formats like WMV ("Windows Media Video") and FLV ("Flash Video") have had issues in their container formats in the past, and some of these other formats like DFF/DSF are entirely new to me. Samsung-provided code has a reputation for being under-tested relative to core Android, so it seems like a high probability area to explore.
I'd start by manually reviewing (e.g. reverse engineering, since these libraries aren't present in the Samsung open source package for this device) each of the extractor sniff routines and working forward from there. These sniff routines (you'll see them in the ExtractorDef structure for each library) are responsible for content-sniffing the untrusted data and deciding whether to return an extractor for this particular file. The extractor is then used to pull out the metadata required to eventually decode a single frame of the video (e.g. the thumbnail). At some point, either when the extractor is instantiated or when metadata is retrieved, the file format will be fully parsed and you'll be looking at some interesting code.
With a bit of work you may even be able to find that a library corresponds to an open source implementation, and then you have the option of cross-referencing recent bug fixes in the open source repository with the binary that you're reviewing. Sometimes vendors are slow to integrate upstream patches into their own products, and this could be all it takes to find a useful vulnerability.
Perhaps not though, and perhaps you've encountered an area of code that you feel is a bit too complex to review manually. At this point we would start fuzzing. The first step would be to build a corpus of interesting files for the file format we're targeting (this is an art of its own), and the next step is to use greybox fuzzing using AFL++'s frida mode. There are other options of course, but I think this approach would be sensible. A recent guide can be found here.
- Use Android's fuzzing efforts to guide our own research. The observation here is that we're not going to be able to out-fuzz Google in terms of machine resources or engineering time, but Google's fuzzing efforts do still provide a roadmap of sorts. Specifically, if we can assess what the Google fuzzers are doing, then we can focus our attention on areas they're not testing well.
The ideal way to do this is to look at code coverage reports, like those that oss-fuzz provide, but with Android that doesn't appear to be readily available. We can examine and run the fuzzers that are present in AOSP though, and that should usually give us a pretty good idea of what has been tested.
Based on this we have two directions to pursue. The first is to extend the existing fuzzers into areas and features that aren't currently supported, and the second is to manually review the code paths that aren't being reached. A good example where this would have been successful is the FreeType Load_SBit_Png 0day bug that was exploited in-the-wild (CVE-2020-15999); it was a shallow bug that slipped past the Chrome/oss-fuzz teams due to limitations in the FreeType fuzzing harness.
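Both approaches above lean on greybox fuzzing, so here's a toy sketch of the core loop: mutate an input, run it, keep it if it reaches something new. Everything here is hypothetical. The "parser" is a few lines of Python that reports which branches it took, whereas a real setup would get that feedback from instrumentation in a tool like AFL++. The shape of the loop is the point.

```python
import random

class ParserCrash(Exception):
    """Stands in for a memory-safety crash in a native parser."""

def toy_parser(data: bytes) -> set:
    """Hypothetical target. Returns the branch IDs it executed (fake instrumentation)."""
    covered = {"entry"}
    for depth, expected in enumerate(b"ftyp"):
        if len(data) <= depth or data[depth] != expected:
            return covered
        covered.add(f"magic{depth}")
    if len(data) > 4 and data[4] > 0x7F:
        raise ParserCrash("length field out of range")
    return covered

def mutate(data: bytes, rng: random.Random) -> bytes:
    """Overwrite one random byte with a random value."""
    out = bytearray(data)
    out[rng.randrange(len(out))] = rng.randrange(256)
    return bytes(out)

def fuzz(seeds, iterations, rng):
    """Keep inputs that reach new branches; collect inputs that crash."""
    corpus, seen, crashes = list(seeds), set(), []
    for seed in seeds:
        seen |= toy_parser(seed)
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        try:
            covered = toy_parser(candidate)
        except ParserCrash:
            crashes.append(candidate)
            continue
        if not covered <= seen:  # candidate reached a branch we had not seen
            seen |= covered
            corpus.append(candidate)
    return corpus, crashes

if __name__ == "__main__":
    corpus, crashes = fuzz([bytes(6)], 200_000, random.Random(0))
    print(f"corpus size: {len(corpus)}, crashing inputs: {len(crashes)}")
```

Without the coverage feedback, random mutation would have to guess all four magic bytes at once. With it, the fuzzer gets to keep each partial match and build on it, which is also why the quality of the harness (what it can reach at all) matters so much.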
The first approach is the most likely to find any bug at all, but the bug will only affect Samsung devices. The second approach is your best chance to find a bug that's ubiquitous across all of Android, but that chance is quite a bit smaller. I'm hopeful that some intrepid reader will take up the mantle here and find some real bugs, but regardless we'll explore this type of vulnerability research in more detail in a future blog post.
For now though, let's imagine that we have found a potentially interesting bug. What would exploitation look like, and what would that buy us? The hurdles are quite immense. Even if you manage to find a memory corruption bug that isn't covered by one of the enabled sanitizers, and that you can wrangle with the attacker-hostile heap implementation (Scudo) to get a reliable arbitrary R/W primitive, you still need to bypass ASLR remotely.
Remember that the initial Stagefright exploits used bruteforce to bypass ASLR, and they could do so because most devices were 32-bit. Nowadays, most devices are 64-bit, and that's almost certainly going to make the heap spray technique a non-starter. That means you either have to find a way to leak a value back to the attacker (but remember that the media.extractor process doesn't have networking access), use a crashing oracle (described in more detail here and here), or make the entire payload consist of partial overwrites, such that the entire exploit is relative and an absolute base address is never needed. The crashing oracle route has been the best option recently, but it's not exactly pretty.
Even if we're successful, the net result is code execution in a heavily sandboxed process. Between SELinux and minijail, the media.extractor process has very limited access indeed. Yes, you have access to all future media decodes, but it won't be easy to find a way to exfiltrate them. Realistically, you need to break out of this sandbox, and that means having a very powerful Android or kernel bug, one in a core system service or a part of the Linux kernel that isn't blocked by seccomp-bpf. This privilege escalation bug is likely going to be more valuable than the media bug you're exploiting, and you're probably going to have to include it with every exploit attempt, since there's very limited ability to stage payloads here. That's not good.
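The partial overwrite idea is worth a quick illustration. This is a sketch with made-up base addresses and offsets. It assumes little-endian pointers and a page-aligned mapping, so the randomized base only affects the upper bits of an address. If the pointer you can corrupt and the place you want it to point differ only in their low byte, you can redirect it without ever learning the base.

```python
import struct

def partial_overwrite(pointer: int, low_bytes: bytes) -> int:
    """Replace only the least-significant bytes of a little-endian 64-bit pointer."""
    raw = bytearray(struct.pack("<Q", pointer))
    raw[: len(low_bytes)] = low_bytes
    return struct.unpack("<Q", bytes(raw))[0]

if __name__ == "__main__":
    # Hypothetical offsets of two locations inside the same library.
    original_offset, target_offset = 0x4A10, 0x4A80
    for base in (0x7A1234560000, 0x7F00DEAD0000):  # two made-up randomized bases
        stored = base + original_offset
        redirected = partial_overwrite(stored, bytes([target_offset & 0xFF]))
        assert redirected == base + target_offset
        print(f"base {base:#x}: {stored:#x} -> {redirected:#x}")
```

The same one-byte write lands on the right target under both bases. The catch is reach: a one-byte overwrite only moves the pointer within a 256-byte window, and going wider than the page offset means guessing randomized bits again.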
The Stagefright era is over. There's no doubt there are still bugs here, and probably some of them are theoretically exploitable, but it's hard to see a world here in which this is the most cost-efficient way for an attacker to compromise a device, even in a scenario where all you have is the target's phone number. The browser, the baseband, even network device drivers -- they all are better options for remote attacks on Android today.
We can clearly say that the "Legacy of Stagefright" is seen in almost every aspect of Android platform security today. When confronted with an unwieldy and dangerous media codebase, the Android team could have taken the easy route: patch the bugs and move on, rinse and repeat. Instead, we saw generation after generation of structural security improvements being made to the Android platform, and the Android media stack became an incubator for some of the most important security technologies in modern computing. That's an impressive legacy.
Thanks to Joshua Drake (jduck) for technical review and feedback.
Update (2023-07-31): Android security engineer Jeff Vander Stoep helpfully added some additional context in this thread reply on X.
Jeff makes two important points – firstly that the Android media team deserve credit for doing the "heaviest lifting" in the work to harden the media framework, particularly around decomposing the framework into the smaller components that enabled future hardening work. Kudos to them!
Interestingly, he also makes the point that monthly security bulletins were a work-in-progress even before Stagefright, and were nearly ready to launch when Stagefright happened. He says that Stagefright may have accelerated things by a month or two.
Zuk Avraham, the founder of Zimperium (the mobile security firm that published the Stagefright research) disputes this in a follow-up reply, suggesting that it was unlikely that Android OEM vendors like Samsung would have adopted monthly updates if not for Stagefright, and that Jeff's timeline of a 1-2 month acceleration may be overly optimistic.
In any case, Jeff is right to point out that a monthly security update program doesn't appear out of thin air, and that Android certainly knew they had to work on improving updates at a much earlier point than Stagefright. Thanks for the feedback!
- Ben Hawkes