# MTE As Tested

Mark Brand

POC 2023



#### About Me



- Security / vulnerability research for ~12 years
  - Researcher at Project Zero for ~8 years

#### About MTE



- On Pixel 8 you can configure sync MTE as the default for most\* apps today!

#### Enabling MTE

markbrand@markbrand\$ adb shell
shiba:/ \$ setprop arm64.memtag.bootctl memtag
shiba:/ \$ setprop persist.arm64.memtag.default sync
shiba:/ \$ setprop persist.arm64.memtag.app\_default sync
shiba:/ \$ reboot

## Background

### 0x4141414141414141

The low 12 bits (`0x41414141414`**`141`**) are the page offset.
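As a rough illustration of the address breakdown on this slide, the C sketch below splits a 64-bit pointer value into its page offset and, looking ahead to MTE, the 4-bit logical tag that Arm places in pointer bits 59:56. The helper names are invented for this example, and the 12-bit page offset assumes 4 KiB pages, which matches the three hex digits highlighted on the slide.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, so the page offset is the low 12 bits. */
#define PAGE_OFFSET_BITS 12u
/* Arm MTE keeps the 4-bit logical tag in pointer bits 59:56. */
#define MTE_TAG_SHIFT 56u

/* Hypothetical helper: offset of the address within its page. */
static inline uint64_t page_offset(uint64_t ptr) {
    return ptr & ((UINT64_C(1) << PAGE_OFFSET_BITS) - 1);
}

/* Hypothetical helper: the 4-bit logical tag carried in the pointer. */
static inline unsigned mte_tag(uint64_t ptr) {
    return (unsigned)((ptr >> MTE_TAG_SHIFT) & 0xF);
}

/* Hypothetical helper: the pointer with its top byte (bits 63:56) cleared. */
static inline uint64_t strip_top_byte(uint64_t ptr) {
    return ptr & ~(UINT64_C(0xFF) << 56);
}
```

For the slide's example value `0x4141414141414141`, `page_offset` gives `0x141` and `mte_tag` gives `0x1`.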












#### MTE and Me: Kernel Heap Protector



#### MTE and Me: Speculative Execution



## MTE as a security "boundary"







### Known-tag attacks vs. Unknown-tag attacks

### Known-tag attacks (SYNC + ASYNC)

```
case '1': // "Alloc"
  idx = ipc_read(in_pipe);
  if (idx < 0) {
    break;
  }
  if (instances[idx]) {
    instances[idx]->vtable->destructor(instances[idx]);
    free(instances[idx]);
  }
  data = ipc_read_string(in_pipe);
  instances[idx] = malloc(sizeof(struct Class));
  fprintf(stderr, "instances[%i] = %p (%p)\n", idx, instances[idx], data);
  Class_constructor(instances[idx], data);
  break;
```

Google Project Zero

```
case '2': // "Free"
  idx = ipc_read(in_pipe);
  if (idx < 0 || !instances[idx]) {
    break;
  }
  if (instances[idx]) {
    instances[idx]->vtable->destructor(instances[idx]);
    free(instances[idx]);
  }
  // Bug: we don't set class to NULL, so we're left with a dangling
  // pointer.
  break;
```

```
case '3': // "Replace"
  if (replacement) {
    replacement->vtable->destructor(replacement);
    free(replacement);
  }
  data = ipc_read_string(in_pipe);
  replacement = malloc(sizeof(struct ReplacementClass));
  fprintf(stderr, "replacement = %p\n", replacement);
  ReplacementClass_constructor(replacement, data);
  data = NULL;
  break;
```

```
case '4': // "Write"
  idx = ipc_read(in_pipe);
  if (idx < 0 || !instances[idx]) {
    break;
  }
  if (ipc_write_ready(out_pipe)) {
    instances[idx]->vtable->write(instances[idx], out_pipe);
  }
  break;
```
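To make the effect of the dangling pointer from the "Free" case concrete, here is a toy software model of an MTE tag check. It is only a sketch: the arena, the helper names, and the way tags are chosen are all invented for illustration and say nothing about how Android's allocator picks tags. The parts taken from the architecture are the 16-byte tag granule and the 4-bit tag in pointer bits 59:56.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GRANULE 16u         /* MTE tags memory in 16-byte granules */
#define TAG_SHIFT 56u       /* logical tag lives in pointer bits 59:56 */
#define ARENA_GRANULES 64u  /* hypothetical toy arena: 64 granules = 1 KiB */

/* Toy model state: one 4-bit allocation tag per granule. */
static uint8_t granule_tag[ARENA_GRANULES];

/* Build a tagged "pointer" from an arena offset and a 4-bit tag. */
static uint64_t make_ptr(uint64_t offset, unsigned tag) {
    return offset | ((uint64_t)(tag & 0xF) << TAG_SHIFT);
}

/* Set the allocation tag of every granule in [offset, offset + size).
 * The caller must keep the range inside the toy arena. */
static void set_tags(uint64_t offset, size_t size, unsigned tag) {
    uint64_t first = offset / GRANULE;
    uint64_t end = (offset + size + GRANULE - 1) / GRANULE;
    for (uint64_t g = first; g < end; g++) {
        granule_tag[g] = (uint8_t)(tag & 0xF);
    }
}

/* Model of the check made on a load or store: the tag in the pointer
 * must equal the tag stored for the granule being accessed. */
static bool tag_check(uint64_t ptr) {
    uint64_t offset = ptr & ((UINT64_C(1) << TAG_SHIFT) - 1);
    unsigned ptr_tag = (unsigned)((ptr >> TAG_SHIFT) & 0xF);
    return granule_tag[offset / GRANULE] == ptr_tag;
}
```

In this model, a pointer created with tag 3 stops passing `tag_check` once the same chunk is re-tagged with 7 for the replacement object, while a pointer forged with the new tag passes. That second case is the known-tag situation: the check only helps if the attacker cannot learn or choose the tag.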







#### Exploit Flow: Use (type-confusion)

[Diagram: stack, heap, and .rodata, showing `instances[0]`, `replacement`, the `ReplacementClass` object (vtable, data), and the `Vtable` and `ReplacementVtable` entries (destructor, method).]

#### Demo: No tagging

shiba:/ \$ cd /data/local/tmp
shiba:/data/local/tmp \$





#### Exploit Flow: Replace

[Diagram: stack, heap, and .rodata, showing `instances[0]`, `instances[1]`, `replacement`, the `ReplacementClass` and `Class` objects (vtable, data), and the `ReplacementVtable` and `Vtable` entries (destructor, method).]



```
case '4': // "Write"
  idx = ipc_read(in_pipe);
  if (idx < 0 || !instances[idx]) {
    break;
  }
  if (ipc_write_ready(out_pipe)) {
    instances[idx]->vtable->write(instances[idx], out_pipe);
  }
  break;
```





```
case '4': // "Write"
  idx = ipc_read(in_pipe);
  if (idx < 0 || !instances[idx]) {
    break;
  }
  // CPU now expects this branch will always be taken.
  if (ipc_write_ready(out_pipe)) {
    instances[idx]->vtable->write(instances[idx], out_pipe);
  }
  // So, if the condition is false, but evaluating it is slow,
  // we'll load vtable speculatively.
  break;
```



#### Exploit Flow: Speculative use [tag mismatch]

#### Exploit Flow: Reload

#### Exploit Flow: Speculative use [tag match]

#### Exploit Flow: Reload



#### Demo: Software tagging

shiba:/ \$ cd /data/local/tmp
shiba:/data/local/tmp \$

#### Demo: Hardware tagging (MTE)

shiba:/ \$ cd /data/local/tmp
shiba:/data/local/tmp \$

#### Speculation window length

- With speculative side-channels, we're (typically) using branch misprediction to speculatively execute some instructions that do not execute architecturally.
- The number of instructions executed depends on how long it takes for the misprediction to resolve.
- However, does the CPU continue to execute if the instructions are nonsense?
- Can a tag-check failure during speculation influence how long speculation continues afterwards?

```
  ldr x0, [x0]          ; this load is slow (*x0 is uncached)
  cbnz x0, speculation  ; this branch is always taken during warmup
  ret

speculation:
  ldr x1, [x1]          ; this load is fast (*x1 is cached)
                        ; the tag-check success or fail will happen on
                        ; this access, but during warmup the tag-check
                        ; will always be a success.

  orr x2, x2, x1        ; this is a no-op (as x1 is always 0) but it
  ... n times ...       ; maintains a data dependency between the
  orr x2, x2, x1        ; loads (and the no-ops), hopefully preventing
                        ; too much re-ordering.

  ldr x2, [x2]          ; *x2 is uncached, if it is cached later then
                        ; this instruction was (probably) executed.
  ret
```

#### Cortex-A510 (Pixel 8 small cores)






#### Cortex-A715 (Pixel 8 middle cores)



#### Cortex-X3 (Pixel 8 large core)

#### The limits of measurement




#### Is a shared memory timer better?
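The slides do not show how the timer was built, so the following is only a sketch of one common construction: a helper thread increments a counter in shared memory as fast as it can, and the measuring thread reads that counter before and after the access it wants to time. All names here are hypothetical, and a real measurement would also need the two threads pinned to suitable cores and barriers around the timed access.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the talk's code. */
static _Atomic uint64_t shared_ticks;  /* the "clock" both threads share */
static atomic_bool timer_running;

/* Helper thread: spin, bumping the shared counter until told to stop. */
static void *timer_thread(void *arg) {
    (void)arg;
    while (atomic_load_explicit(&timer_running, memory_order_relaxed)) {
        atomic_fetch_add_explicit(&shared_ticks, 1, memory_order_relaxed);
    }
    return NULL;
}

/* Start the counting thread; returns 0 on success, like pthread_create. */
static int timer_start(pthread_t *thread) {
    atomic_store(&timer_running, true);
    return pthread_create(thread, NULL, timer_thread, NULL);
}

/* Stop the counting thread and wait for it to exit. */
static void timer_stop(pthread_t thread) {
    atomic_store(&timer_running, false);
    pthread_join(thread, NULL);
}

/* Read the current tick count. */
static uint64_t timer_now(void) {
    return atomic_load_explicit(&shared_ticks, memory_order_relaxed);
}

/* Time a single load in ticks (no barriers: illustration only). */
static uint64_t time_load(const volatile uint64_t *p) {
    uint64_t start = timer_now();
    (void)*p;
    return timer_now() - start;
}
```

The resolution of such a timer depends on how quickly the counting thread's stores become visible to the measuring thread, so it is not automatically better than a system timer.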








#### Realisation

- Measurements on the fastest core are noisy enough that the signal/noise ratio is poor.
- Repeated experiments consistently show (inconsistent) results that **appear** to differentiate tag-check pass and failure.
- Small-scale experiments and eyeballing graphs are not enough to be confident about the behaviour of the fastest core.

- Proving a negative...

### Unknown-tag attacks (Mostly ASYNC)

#### Revisiting signal-safety...

```
// This function runs in a compromised context: see the top of the file.
// Runs on the crashing thread.
// static
void ExceptionHandler::SignalHandler(int sig, siginfo_t* info, void* uc) {
  // Give the first chance handler a chance to recover from this signal
  if (g_first_chance_handler_ != nullptr &&
      g_first_chance_handler_(sig, info, uc)) {
    return;
  }
  // ...
}
```

#### Demo: Initial exploit

shiba:/ \$ cd /data/local/tmp
shiba:/data/local/tmp \$

#### Classic exploit code

// XXX: This is a load-bearing string concatenation.
var do\_not\_remove = "A".concat("B");

#### Demo: Initial exploit reliability

markbrand@markbrand:~/poc2023\$

#### Demo: Optimized exploit reliability

markbrand@markbrand:~/poc2023\$ python3 ./reliability.py ./poc\_demo\_6.js

| Context                        | Mode  | Known-tag bypass                  | Unknown-tag bypass                    |
|--------------------------------|-------|-----------------------------------|---------------------------------------|
| Chrome: Renderer Exploit       | async | Trivial 🛟                         | Likely trivial 🛟                      |
|                                | sync  | Trivial 🛟                         | Bypass techniques should be rare 🛠    |
| Chrome: IPC Sandbox Escape     | async | Likely possible in many cases     | Likely possible in many cases         |
|                                | sync  | Likely possible in many cases     | Bypass techniques should be rare 💸    |
| Android: Binder Sandbox Escape | async | Difficulty will depend on service | Difficulty will depend on service     |
|                                | sync  | Difficulty will depend on service | Bypass techniques should be rare 🛠    |
| Android: Messaging App Oneshot | async | Likely impossible in most cases   | Good enough bugs will be very rare 🐛 * |
|                                | sync  | Likely impossible in most cases   | Bypass techniques should be rare 💸    |

# Questions?