Pointer Authentication (github.com/apple)
162 points by mpweiher on Dec 15, 2019 | hide | past | favorite | 43 comments


I’m guessing this was developed by or at the behest of Apple and ARM, based off the supported hardware and languages? Are any versions of the iOS or macOS kernel (or even user lands) utilizing this “across the board” now? I’d read papers and theory on strong pointer authentication to mitigate control-flow attacks a very long time ago, but I did not realize this was now “mainstream” in a consumer compiler (with support for multiple C-like languages to boot!); but it certainly is a tough sell without hardware support, both for security and (perceived) performance reasons. (I say perceived because it turns out that a lot of runtime safety checks are actually virtually undetectable: they are perfect fits for speculative execution and branch prediction, as demonstrated by Rust benchmarks showing nearly identical performance with and without runtime safety checks enabled on modern x86_64 architectures.)

ARMv8.3 shipped with the instructions needed for this implementation of signed pointers in 2016, and presumably Apple played a large role in contributing this feature to the Clang codebase, as no other hardware-accelerated authentication scheme is supported, per the document. I wonder if there are any plans to bring this to the desktop, by either Intel or AMD. AMD is now in a position to actually develop new extensions rather than largely playing catch-up to Intel’s extensions (as in recent years). (Then again, AMD remains the only one to really offer hardware acceleration for SHA [0], and that doesn’t seem to have really motivated developers to take advantage of it.)

[0]: https://neosmart.net/blog/2017/will-amds-ryzen-finally-bring...


iOS since at least 12 has been using it. Here's a blog post by Project Zero describing it: https://googleprojectzero.blogspot.com/2019/02/examining-poi...

And I also remember attending a DEF CON (or similar conference) talk about how the keys used to sign shared library code are shared across all processes (including unsandboxed, privileged ones); depending on your perspective, that could be considered a vulnerability.


It’s been in the compiler for over a year to support the A12 chip, which shipped with pointer authentication.


> AMD remains the only one to really offer hardware acceleration for SHA

Intel added SHA with Goldmont, for smaller laptops. But it's only SHA1 and SHA2, which are already outdated. Good enough for ptrauth though.


SHA-2 isn't outdated if used correctly (e.g. combine with HMAC if length extension attacks are a threat to your protocol). SHA-3 is the "break glass in case of emergency" hash.


You don't even need HMAC to stop length extension. You can just truncate the result ("SHA-512/256"), or use SHA_d(x) = SHA(SHA(x)) or SHA(0^B || SHA(x)). (Although SHA_d starts to look a lot like HMAC.)


Absolutely, but it’s all beside the point for this particular application anyway. If you fix the pointer size (e.g. the payload, i.e. both the secure and raw pointers, must be exactly 64 bits, no less, no more) then length extension is a non-issue.

(Rebuttal: Some languages have fat pointers or maybe some architectures support 32-bit binaries.. I think having the key tied to the payload size would address the theoretical weakness there? In practice signed pointers are already fat pointers and the contents of fat pointers are not actually addresses in the first place, and neither are supported in the current implementation.)


You are right, SHA2 is fine still, mostly.

But... Combining with HMAC isn’t exactly a seal of approval. HMAC with MD5 is still considered secure because even if you made a hash collision you didn’t have the key.

And yes, SHA3 would be overkill. Most people don’t understand the mechanics of hashing let alone the new keying that SHA3 brings.


AMD offers it across the board on models since Zen 1, which is probably what the article has in mind. It's nice that Intel offers it as an accelerator on their low-end parts, but ideally they would ship the support across the product range.


If you care about speed and are concerned about sha1/2 vulnerabilities, maybe take a look at blake2? Faster than sha1, probably a similar level of security as sha3 (it is a redesign of a finalist for sha3). It's also the default for nacl/sodium.

https://blake2.net/


Nope. ptrauth needs a good hash in hardware. There's only crc32c, aes, sha1 (-160) and sha256 (i.e. sha2-256). Of these, crc32c is entirely insecure and aes is not applicable, so we are left with AMD, Intel laptops (Goldmont), armv8 and Power8. For sure we don't care about any SHA vulnerabilities here, only about speed.

Blake2 is about a factor of 1000 slower. Secure hashes are not really applicable for pointer hashing (i.e. 48-bit). Some need 16-byte alignment (but you can use a stack word for that), some need excessive padding. Here the padding will kill you.


SHA3 is designed to be quite fast in hardware, there just isn't much hardware that accelerates it yet. Blake2 is faster in software, but that software is (as you note) much slower than SHA2's hardware implementations. So for existing systems SHA2 is the best option. Eventually SHA3 will probably be widespread and SHAKE128 with appropriately chosen output size will be the best choice.


SHA2 is by far not the best option for ptrauth. The best option is HW and API dependant. There are dozens of better options. I recently included most secure hashes with tons of variants into https://github.com/rurban/smhasher/blob/master/README.md#smh... SHA3 would be atrocious. Something like fletcher2 or wyhash seem to be best.


Right. Pointer authentication depends on it not being easy to forge a signature given the desired pointer. Given that there's a secret key included in the signature, the requirements on the hash function aren't really very strong. Signatures just need to not leak information about the key that could be used to predict other signatures.


Although nice in theory, contemporary implementations of the same scheme leave a lot to be desired. Take Windows' control flow guard (CFG) as a practical example. The scope of valid indirect branch targets is so vast that even with CFG enabled it's almost always possible for an attacker to craft a ROP chain to achieve arbitrary code execution.

That doesn't invalidate the concept of pointer authentication as a whole, but it does reduce the number of situations in which you should consider applying it. If you have a large codebase, 'bolting it on' will make pointer authentication much less effective. And when you're starting something new, why not use a language that offers stronger memory safety from the start, such as Rust?


The point of PAC is that you can’t create an arbitrary return pointer - it’s not based on ahead-of-time knowledge of valid targets the way CFG is. The return address is signed as a byproduct of making the call in the first place, so to return to a different location you need to ROP through a pointer with a forged signature.


If you don't use the standard library, and you don't need JIT, you can simply not use pointers to callbacks. You can still have something like qsort() but you need to have statically defined:

    typedef int (*callback)(const void *, const void *);
    extern const callback callbacks[256];
and qsort() takes an index instead of a raw pointer to a callback. "Validating" a callback is cheap: Just make sure it's <256 (how many do you need anyway?). If you don't do an unchecked call* or a jmp* then you don't have anything an attacker can exploit, and I find it hard to believe a cached load is going to be slower than something like this.


Neat. But how do you make this modular and still safe? When the code invoking qsort() does not know about how many callbacks there are, can you still deal with it?


The idea is you have some struct that is in peril. Perhaps it will be overflowed, or used after free, etc. It has a function pointer.

    struct danger {
        sort_fn sorter;
        char data[16];
    };
    danger->sorter(danger); /* what if it's a bad pointer? */
You replace that with this.

    int add_sort_fn(sort_fn sorter) {
        sort_fns[num_sorts] = sorter;
        return num_sorts++;
    }
    struct lessdanger {
        int sortidx;
        char data[16];
    };
    danger->sortidx = add_sort_fn(sorter);
    /* could clean this up a bit more */
    sort_fns[danger->sortidx % num_sorts](danger);
There's still the possibility of calling the wrong function, but only one from a finite set of possibilities. There's no direct control over the pointer value.

You can make the sort_fns array resizeable, but in practice there's usually only so many targets.


Our qsort() caller is an application with at most 256 different comparators; let us call that an ABI limit. The addresses are in read-only memory, so they cannot be changed and new addresses simply cannot be introduced at run-time; there is no risk of qsort being tricked into “jumping into the middle of a function”. The implementation of my qsort() does not need to know if there are fewer: the entry will be null and the program will crash.


This is a whole-program transformation though, it doesn't seem to be possible to make it modular. Unless you manage to map some module-specific dispatch table whenever you're running code from that module - in a way that cannot easily be subverted by exploit code! Not sure how to do this however.


Have a section of callback pointers and check against the bounds of the section? G++ uses this mechanism for static constructors, but it's general purpose (Linux uses it to collect lists of drivers to initialize, for instance, with macros like IRQCHIP_DECLARE that populate a section for each type of entity, which then gets scanned at boot time).

You would need a linker script to collect the callbacks into a section and provide a symbol for the end, and define variables something like

    int (*my_sort_callback_ptr)(const void *, const void *) __attribute__((section("sort_callbacks"))) = my_sort_comparator;

You would, of course, use a macro for that.


gcc (ld, actually) makes __start_sort_callbacks and __stop_sort_callbacks for you, so you shouldn't need a custom linker script.


how does that fix ROP


Use CPS and get rid of RET altogether.


There are so many mitigation techniques out there to protect against execution-takeover attacks... I wonder how exploit developers survive through them.

I forecast it will become impossible to hack a personal computer within 10 years. Maybe there will be some vulnerabilities left on IoT devices.


I have to disagree with that. In my experience, attackers have been relying much less on exploits. A software exploit is just one "initial access" tactic: there are those that use targeted exploits, zero-day exploits, and most commonly exploit kits, but setting those aside, phishing is the number one technique. Web drive-by downloads are common too (e.g. "flash update for your pc") - in other words, social engineering attacks. There are also logical flaws, misconfigurations, and bad architecture (e.g. RDP is exposed on the intranet and it so happens another compromised host can reach the computer).

Let's say you have really good security hygiene: apps and sites are whitelisted, no exploits are possible, things can't execute from removable drives, etc. What happens when someone you know sends you an email containing a link to a whitelisted service (say OneDrive, Dropbox, etc.) and that link downloads a zip file with a malicious jar, javascript, mshta, or macro-enabled document - basically anything that uses a whitelisted app to run some code? Let's say your email security is top notch: are you going to ban people from accessing their personal email? Let's say you do; what if a whitelisted site has XSS used to inject JS that tells the user "you need to download and install this font to view this site" (something I have seen)? Even if you whitelist everything there are bypass techniques, code-signing certs get compromised, a new technique to use some existing known app to run code may exist, etc.

I think initial access will get a lot more difficult, but not impossible. Up to the point someone can run code, it will be very difficult to lock down well, but there is a lot being done to harden systems and monitor events to catch when someone does something afterwards.

I personally think endpoint software and technology continue to get more and more complicated. I can see big companies being resilient to many types of attacks, but consumers in general are too defenseless.

Take something as simple as a USB worm: a company might make a calculated decision to block USB execution, but what laptop will ship with that turned off? A Windows shortcut (lnk) running a whitelisted executor will continue to be abused for at least 5 more years, but I dare not speculate as far as 10 years.


Wow. Incredible to see the new heights of complexity that the von Neumann architecture has led to. Corruption of data leading to corruption of control leading to control flow guarantees through the addition of cryptography. Wow.


Is it technically possible to design an MMU that prevents a process from reading or writing a region of memory that doesn't belong to it?


Yes, absolutely. There are a lot of possible approaches in this design space, but recent CPUs from ARM can do this using memory tagging at a much finer granularity than process-wide. I don't think any silicon is actually shipping with this feature yet, unfortunately.


yes: http://millcomputing.com/wiki/Protection

Regarding ROP, that entire class of attack isn't possible if return addresses are stored in hardware that isn't accessible to the program.

> "[Stacks on the Mill architecture] contain no control flow information. In particular, no return addresses. The control stacks are maintained entirely within the hardware, inaccessible from programs. They get saved and restored by the Spiller as needed. This makes several classes of common security exploits simply impossible."


Yes; you're looking for 'capability based MMU'. The Intel i960MX/MC MMU had one, and development tools to support it (mostly in Ada). Unfortunately, Intel stripped/disabled the MMU in the vast majority of i960s it shipped and really only sold the MMU version in the military market.


that is exactly what an MMU is for.


MMUs work on much larger regions than what is useful for many classes of memory safety issues. Luckily, ARMv8.5 adds support for memory tagging at a more granular level.


Aha, a chance to mention my favourite dead processor design, the 432, which had an MMU designed to work at much finer, object-sized, granularity:

https://en.wikipedia.org/wiki/Intel_iAPX_432#Object-oriented...


Here's a presentation from lowrisc.org on their plans for tagged memory on a RISCV chip: https://riscv.org/wp-content/uploads/2017/05/Wed0930riscv201...


Let's resurrect segments!


Does anyone here know how this compares to hwasan?


HWASAN detects spatial and temporal memory errors; e.g. you have a pointer to memory that gets freed and use it again (temporal), or a stack overrun into nearby memory (spatial).

Pointer Authentication is used to sign pointers to give them a kind of provenance, but it's largely used to protect against code reuse/control flow attacks (ROP is much more difficult because you cannot re-use arbitrary gadgets in the executable; the stack pointer is part of the pointer signature, so screwing with it results in termination if the signature doesn't check out.)

They are complementary; you could use both. HWASAN (using memory tagging) and pointer authentication both use the unused upper bits of a virtual address to store their metadata. They are compatible, but this does mean combining them reduces the overall number of bits available for pointer signatures.

There is a recent paper discussing the use of pointer authentication to build more advanced defenses; it looks like it's worth a read, and some comparisons (including HWASAN) are available in Section 8: https://www.usenix.org/system/files/sec19fall_liljestrand_pr...


Hardware Address Sanitizer is intended to protect against memory corruption in general, while pointer authentication helps ensure code flow integrity. (And I think it's mutually incompatible with the implementation that iOS uses because they both use TBI).


They can be compatible. Memory tagging does use the TBI bits. Pointer authentication uses an arbitrary number of bits, and the kernel configures the width and whether the TBI bits are preserved. So you can use both, it just costs you 8 bits of signature.

Moreover, this can be configured independently for code and data pointers. iOS turns off TBI on code pointers to get 8 more bits of signature. That's not a problem for memory tagging because memory tagging isn't particularly useful for code pointers anyway.


> Moreover, this can be configured independently for code and data pointers. iOS turns off TBI on code pointers to get 8 more bits of signature.

Ooh, this is cool. Does iOS currently use different signature sizes? Can I write an application that uses the top bits of data pointers?


> Does iOS currently use different signature sizes?

Code and data live in the same address space, and the address-space needs of the system are the main input to the basic signature width, so the basic signatures widths are currently the same, and the only difference is TBI.

You could imagine a system where code was always loaded into a restricted subset of the address space and so code pointers could use wider signatures.

> Can I write an application that uses the top bits of data pointers?

Apple's ABIs actually consider the top 8 bits of data pointers to be outside the addressable range on all its 64-bit targets, including x86_64. ARM64 TBI just means that you don't need to explicitly mask off those bits before doing loads and stores. But there are caveats:

- ARMv8.5 memory tagging uses bits 56-59, so you should probably stick to just the top four bits in case Apple ever uses memory tagging.

- IIRC the first ARM64 iOS release didn't enable TBI, so if your deployment target goes really far back, you do still need to mask.

- The ABI for pointers expects those bits to be clear on normal ABI boundaries. This means you need to mask before handing pointers off to other code; on the upside, however, you don't need to worry about those bits being set when you receive a pointer.



