



# Transient Execution Attacks explained to your Grandma

by pietroborrello

inspired by: [A Systematic Evaluation of Transient Execution Attacks and Defenses]







# 1. How do Modern Processors work?

# 2. Let's dive into micro-architectural attacks!































































add qword ptr [rax], rbx

cmp rdx, qword ptr [rax]

jne 0xdeadbeef

[Figure: animation of the instruction sequence above being split into µops and placed in the REORDER BUFFER]

> oh shit RAX was pointing to kernel memory!!

Protection fault!









- What does it mean for a CPU to roll back (undo) an operation?
- You cannot undo a logical operation (it was an electrical signal!)
- But you can hide what you did
- $\Rightarrow$  Behave as if nothing happened
  - Do not commit the result of the operation to the architectural state



# What is the architectural state?



- General Purpose Registers (RAX, RSP, ...)
- Control Registers (RFLAGS, GDTR, IDTR, CR0, CR3, ...)
- Model Specific Registers
- Floating Point Registers
- Memory
- ...

But this doesn't include:

- All Instruction and Data Caches (L1, L2, ...), TLB, ...
- Branch Predictors
- And all the microarchitecture that we just saw...
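To make the split concrete, here is a toy Python model (not how real hardware is built). Registers stand for the architectural state, and a set of cached addresses stands for the microarchitecture. The class `ToyCPU`, its fields and the addresses are all invented for illustration.

```python
class ToyCPU:
    """Toy model: 'regs' is architectural, 'cached' is microarchitectural."""

    def __init__(self, memory):
        self.memory = memory      # address -> byte
        self.regs = {"rax": 0}    # architectural state
        self.cached = set()       # microarchitectural state (cache contents)

    def load(self, reg, addr, faults=False):
        """Execute a load; if it faults, roll back the architectural state only."""
        snapshot = dict(self.regs)
        self.regs[reg] = self.memory[addr]  # the result is produced...
        self.cached.add(addr)               # ...and the line lands in the cache
        if faults:
            self.regs = snapshot            # "behave as if nothing happened"
            return False                    # the load never retires
        return True


cpu = ToyCPU({0x1000: 0x41, 0x2000: 0x42})
cpu.load("rax", 0x1000)                         # legal load: retires normally
retired = cpu.load("rax", 0x2000, faults=True)  # illegal load: rolled back
print(cpu.regs, sorted(cpu.cached), retired)
```

After the rolled-back load, `rax` still holds the old value, but address `0x2000` is still in `cached`: the rollback only covers the architectural half.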














- So we are using data or executing code we shouldn't, and we are exposing it to the microarchitecture!
- But we cannot access the microarchitecture directly
- Directly...
  - 1. Read kernel dword into X
  - 2. if (X == 0xdeadbeef) flush\_entire\_cache (executed only transiently)
- When resuming from SIGSEGV, is the cache flushed? If it is, we have learned that X == 0xdeadbeef without ever reading X architecturally







• Two ways to induce a rollback of a transient execution: exceptions and mispredictions


















• Exceptions are enforced lazily (only when the faulting instruction retires)

⇒ There is a small window in which we can use the result of a faulting instruction, and access data that should be architecturally inaccessible (e.g. kernel memory!)

- What can we do with the results of faulting instructions? How can we read them?
- $\Rightarrow$  Use a micro-architectural covert channel!












- Use the cache as a covert channel: HIT: fast, MISS: slow
- 1. char array[256]
- 2. flush all array cache lines
- 3. read secret byte into X
- 4. tmp = array[X]
- 5. for(i = 0; i < 256; i++) measureTime(array[i])
- $\Rightarrow$  The index with the fastest access corresponds to X





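- Use the cache as a covert channel: HIT: fast, MISS: slow
- Give every possible value of X its own page, so that two values never share a cache line and the prefetcher cannot blur the result
- 1. char array[256 \* 4096]
- 2. flush all array cache lines
- 3. read secret byte into X
- 4. tmp = array[X \* 4096]
- 5. for(i = 0; i < 256; i++) measureTime(array[i \* 4096])
- $\Rightarrow$  The index with the fastest access corresponds to X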
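The sender/receiver steps can be sketched as a toy simulation in Python. This is a model, not a real attack: the cache is just a set of line numbers, the latencies (10 and 200) are made up, and the 64-byte line size is an assumption. The names `ToyCache`, `send` and `receive` are invented for illustration.

```python
LINE = 64  # assumed cache line size in bytes


class ToyCache:
    def __init__(self):
        self.lines = set()

    def flush(self):
        self.lines.clear()

    def access(self, addr):
        """Return a made-up latency and leave the touched line cached."""
        line = addr // LINE
        hit = line in self.lines
        self.lines.add(line)
        return 10 if hit else 200


def send(cache, secret, stride):
    cache.flush()                  # flush all array cache lines
    cache.access(secret * stride)  # tmp = array[X * stride]


def receive(cache, stride):
    """Time every candidate index and return the ones with the fastest access."""
    timings = [cache.access(i * stride) for i in range(256)]
    fastest = min(timings)
    return [i for i, t in enumerate(timings) if t == fastest]


cache = ToyCache()
send(cache, 200, stride=4096)
print(receive(cache, stride=4096))    # a single candidate

cache = ToyCache()
send(cache, 200, stride=1)
print(len(receive(cache, stride=1)))  # many candidates: values share lines
```

With `stride=1` the probing loop itself pulls neighbouring indices into the cache, so the receiver cannot tell which one the sender touched. The toy model has no prefetcher, which on real hardware is the usual reason for spacing the entries a full page apart rather than a single line.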







- Different types of faults can be involved, depending on what I shouldn't read:
  - Kernel Memory
  - Secure Enclave Memory
  - Privileged System Registers
  - FPU Registers of other Processes
  - Unreadable pages, bypassing Protection Keys
  - Out-of-Bound access driven by exceptions (more with Spectre)





- Reading kernel memory from user space raises a fault (a page fault, which the OS delivers to the process as SIGSEGV)
- But we can access the value during transient execution!
- 1. char array[256 \* 4096]
- 2. flush all array cache lines
- 3. read kernel byte into X
- 4. tmp = array[X \* 4096]
- 5. handle SIGSEGV
- 6. for(i = 0; i < 256; i++) measureTime(array[i \* 4096])
- 7. The index with the fastest access corresponds to X
- Repeat for every address: dump the entire kernel memory byte by byte
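As a rough sketch of the byte-by-byte dump, the following Python toy replays the recipe against a simulated kernel. Nothing here touches real memory: `KERNEL_MEMORY`, its made-up base address, the `Segfault` exception and the set standing in for the cache are all assumptions for illustration.

```python
KERNEL_BASE = 0xFFFF0000  # made-up address
KERNEL_MEMORY = {KERNEL_BASE + i: b for i, b in enumerate(b"secret")}


class Segfault(Exception):
    """Stands in for the SIGSEGV delivered after the faulting load."""


def transient_read(addr, cached_pages):
    x = KERNEL_MEMORY[addr]    # the kernel byte is available transiently
    cached_pages.add(x)        # array[X * 4096] gets cached
    raise Segfault(hex(addr))  # the fault arrives late; X is never returned


def leak_byte(addr):
    cached_pages = set()       # flush all array cache lines
    try:
        transient_read(addr, cached_pages)
    except Segfault:
        pass                   # handle SIGSEGV and carry on
    # The only "fast" page index is the secret byte.
    return next(i for i in range(256) if i in cached_pages)


def dump(start, length):
    """Leak `length` bytes starting at `start`, one byte per attempt."""
    return bytes(leak_byte(start + off) for off in range(length))


print(dump(KERNEL_BASE, 6))
```

The point of the sketch is the control flow: the value never comes back through a return value or a register, only through which page ends up cached.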










- A secure enclave (e.g. Intel SGX) is a trusted execution environment, with integrity and confidentiality guarantees
- It is an isolated, hardware-encrypted compartment whose contents are secret even from the kernel
- When code outside the enclave tries to read enclave memory, the data is silently replaced with 0xFF  $\Rightarrow$  No fault!
- 1. Execute the enclave to bring unencrypted data into the L1 cache
- 2. Manually revoke access permission to the enclave memory
- 3. Now accessing enclave memory raises a Page Fault, before the 0xFF substitution takes place

 $\Rightarrow$  Then the same attack!

(This can also be extended to break VM isolation)














- Privileged system registers can be read and written only by the kernel
- They contain private kernel information (e.g. the IA32\_LSTAR MSR contains the address of the fast syscall handler)
- Accessing them from user space raises a General Protection Fault
  - 1. char array[256 \* 4096]
  - 2. flush all array cache lines
  - 3. rdmsr byte into X
  - 4. tmp = array[X \* 4096]
- $\Rightarrow$  Leak the handler address and you have broken KASLR!














- At context switches the kernel saves all the registers of the current process
- Floating Point Unit and SIMD registers are huge!
   So the kernel doesn't save them right away, but marks them as NOT AVAILABLE
- If the next process uses the FPU or SIMD registers, a NOT AVAILABLE exception is raised, and the kernel can save them before the next process can access them
  - 1. char array[256 \* 4096]
  - 2. flush all array cache lines
  - 3. read SIMD byte into X  $\Rightarrow$  EXCEPTION??
  - 4. tmp = array[X \* 4096]
- This exception is enforced lazily too, so step 4 still runs transiently on the previous process's register contents

Can leak SIMD cryptographic computations!





- With the same approach we can bypass memory protection (e.g. execute-only pages) even if it is enforced with Protection Keys
- Additionally, we can perform out-of-bounds transient reads when the bounds check is enforced with the bound instruction, which raises an exception on failure





• The CPU executes predicted instructions transiently

 $\Rightarrow$  After a misprediction, there is a small window of executed instructions that should never have run

- If we manage to control the mispredictions, we may be able to induce a program to execute (transiently) arbitrary code
- $\Rightarrow$  Predictors are shared between processes!







| Predictor                | Example                            | What it guesses                            |
|--------------------------|------------------------------------|--------------------------------------------|
| Pattern History Table    | jne 0xdeadbeef                     | Will it take the branch?                   |
| Branch Target Buffer     | call [rax]                         | Where will it jump?                        |
| Return Stack Buffer      | ret                                | Where will it return?                      |
| Store to Load Forwarding | mov [rax+1], 1<br>mov rdx, [rcx-1] | Does the load depend on the earlier store? |





- if (x < len(array1)) { y = array2[array1[x] \* 4096]; }
  - This is a dangerous branch to mispredict!
  - If the check passes often enough, the Pattern History Table will predict that it passes again, regardless of the value of x
     ⇒ bypass the if check transiently

 $\Rightarrow$  read array1[x] with arbitrary x, and then array2 will act as a covert channel!

• We have an arbitrary out-of-bounds read in the context of a process (e.g. a sandboxed JavaScript program executed on your machine!)
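The mistraining idea can be sketched with a toy 2-bit saturating counter in Python. Real pattern history tables are more elaborate and their details are not public, so treat this purely as a model: `TwoBitCounter`, `victim`, the flat `MEMORY` layout and the secret byte `K` are assumptions for illustration. Here "taken" means "the bounds check passes and the body runs".

```python
ARRAY1_LEN = 4
# array1, followed in memory by a byte the victim never meant to expose
MEMORY = bytes([1, 2, 3, 4]) + b"K"


class TwoBitCounter:
    """States 0-1 predict 'check fails', states 2-3 predict 'check passes'."""

    def __init__(self):
        self.state = 0

    def predict_taken(self):
        return self.state >= 2

    def update(self, taken):
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)


def victim(x, predictor, cached):
    """if (x < len(array1)) { y = array2[array1[x] * 4096]; }"""
    in_bounds = x < ARRAY1_LEN
    # The body runs architecturally when in bounds, transiently when predicted.
    if predictor.predict_taken() or in_bounds:
        cached.add(MEMORY[x] * 4096)  # cache footprint survives the rollback
    predictor.update(in_bounds)
    return in_bounds                  # architectural outcome of the check


predictor, cached = TwoBitCounter(), set()
for i in range(8):
    victim(i % ARRAY1_LEN, predictor, cached)  # train on safe inputs
cached.clear()                                 # flush array2
victim(4, predictor, cached)                   # out-of-bounds x
print([offset // 4096 for offset in cached])
```

After training, the out-of-bounds call leaves exactly one cached offset in `array2`, and dividing it by 4096 gives the byte sitting right after `array1`.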





- if (x < len(array1)) { y = array2[array1[x] \* 4096]; }
  - The predictor can be mistrained from within the same process, by repeatedly executing the branch on safe inputs before attacking
  - But also from another process, with an equivalent branch at the same virtual address, since predictors are indexed by virtual addresses
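Why another process can do the training can be sketched with a toy shared table in Python. The index function (low 10 bits of the branch address, no process identifier) is an assumption chosen for illustration; real predictors use undocumented hashing. `SharedPHT` and the addresses are hypothetical.

```python
class SharedPHT:
    """Toy pattern history table shared by every process on the core."""

    def __init__(self, index_bits=10):  # index width is an assumption
        self.mask = (1 << index_bits) - 1
        self.counters = {}

    def _index(self, branch_addr):
        return branch_addr & self.mask  # note: no process ID involved

    def predict_taken(self, branch_addr):
        return self.counters.get(self._index(branch_addr), 0) >= 2

    def update(self, branch_addr, taken):
        i = self._index(branch_addr)
        c = self.counters.get(i, 0)
        self.counters[i] = min(3, c + 1) if taken else max(0, c - 1)


pht = SharedPHT()
BRANCH = 0x401234  # hypothetical address of the victim's bounds check

# Attacker process: an equivalent branch at the same virtual address.
for _ in range(4):
    pht.update(BRANCH, taken=True)

# Victim process: its first execution is already predicted "taken".
victim_prediction = pht.predict_taken(BRANCH)
other_prediction = pht.predict_taken(BRANCH + 4)
print(victim_prediction, other_prediction)
```

Because the lookup depends only on the address, the victim inherits whatever history the attacker left behind for that entry.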





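Attacker context

| Address | Code               |
|---------|--------------------|
| 0x1000: | \*rdx = 0xdeadbeef |
| 0x1001: | while(true)        |
| 0x1002: | call [rdx]         |

Victim context

| Address     | Code                              |
|-------------|-----------------------------------|
| 0x1002:     | call [rdx]                        |
| 0xdeadbeef: | A = rdi[\*rsi]; (spectre gadget)  |

- The attacker's loop trains the Branch Target Buffer: the call at 0x1002 is predicted to jump to 0xdeadbeef, in the victim too
- Attacker also controls rdi and rsi in the victim context
- Use rsi to read victim memory, and rdi as an oracle buffer for the covert channel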
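A minimal sketch of the poisoning, as a Python toy: the Branch Target Buffer is modelled as a plain dictionary from branch address to last-seen target, which is an assumption (real BTBs are tagged and hashed in undocumented ways). `ToyBTB`, `indirect_call` and the victim's legitimate target `0x4000` are hypothetical.

```python
class ToyBTB:
    """Toy branch target buffer: branch address -> last target seen."""

    def __init__(self):
        self.targets = {}

    def predict(self, branch_addr):
        return self.targets.get(branch_addr)

    def update(self, branch_addr, target):
        self.targets[branch_addr] = target


def indirect_call(btb, branch_addr, real_target):
    """Return the address executed transiently by mistake, or None."""
    predicted = btb.predict(branch_addr)
    btb.update(branch_addr, real_target)  # the BTB learns the real target
    if predicted is None or predicted == real_target:
        return None                       # no misprediction
    return predicted                      # wrong-path code runs transiently


btb = ToyBTB()
for _ in range(3):
    indirect_call(btb, 0x1002, 0xDEADBEEF)  # attacker: while(true) call [rdx]

# Victim: same call address, but its real target is somewhere else.
transient_target = indirect_call(btb, 0x1002, 0x4000)
print(hex(transient_target))
```

The victim ends up at its real target architecturally, but not before the gadget at the poisoned address has run transiently with attacker-chosen rdi and rsi.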





Attacker context

0xdeadbeee: rdx = 0xdeadbeef

0xdeadbeef: call rdx

Victim context

```
0x1002: ret
    spectre gadget
0xdeadbeef: A = rdi[*rsi];
```

- Attacker also controls rdi and rsi in the victim context
- Use rsi to read victim memory, and rdi as an oracle buffer for the covert channel
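The same trick with returns can be sketched as a Python toy. The Return Stack Buffer is modelled as a small stack of predicted return addresses; the depth of 16, the class `ToyRSB` and the victim's real return address `0x2000` are assumptions for illustration, and the entry left by the attacker is simplified to the gadget address itself, as on the slide.

```python
class ToyRSB:
    """Toy return stack buffer: a bounded stack of predicted return addresses."""

    def __init__(self, depth=16):  # depth is an assumption
        self.depth = depth
        self.entries = []

    def on_call(self, return_addr):
        if len(self.entries) == self.depth:
            self.entries.pop(0)    # oldest entry falls off the bottom
        self.entries.append(return_addr)

    def on_ret(self, real_return_addr):
        """Return the address executed transiently by mistake, or None."""
        predicted = self.entries.pop() if self.entries else None
        if predicted is None or predicted == real_return_addr:
            return None
        return predicted


rsb = ToyRSB()
rsb.on_call(0xDEADBEEF)  # attacker: a call that never executes its ret

# Victim: the ret at 0x1002 really returns to 0x2000 (hypothetical).
transient_target = rsb.on_ret(0x2000)
print(hex(transient_target))
```

The unmatched call leaves a stale entry behind, and the next `ret` that consumes it, whoever executes it, is steered to that address for the length of the transient window.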





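Victim context

mov byte [rax], 0xff

movzx r8, byte [rcx]

mov rcx, [rdx + r8\*4096]

- If rax and rcx hold the same address but the CPU predicts that the load does not depend on the store, the load transiently reads the stale value
- The victim may thus inadvertently leak the value that was in memory at [rax] before the store
- Difficult to exploit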
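A toy Python model of the three victim instructions may help. Memory is a dictionary, the cache footprint is a set of touched offsets relative to rdx, and the flag `predicts_no_alias` stands for the CPU's guess that the load does not depend on the store. All names and addresses are hypothetical.

```python
def run_victim(memory, rax, rcx, predicts_no_alias):
    """Return the offsets (relative to rdx) touched in the oracle buffer."""
    touched = set()
    stale = memory[rcx]           # what [rcx] holds before the store lands
    memory[rax] = 0xFF            # mov byte [rax], 0xff
    if predicts_no_alias and rax == rcx:
        # Misprediction: the load ran ahead of the store and saw stale data.
        touched.add(stale * 4096)  # transient mov rcx, [rdx + r8*4096]
    r8 = memory[rcx]              # movzx r8, byte [rcx], correct value
    touched.add(r8 * 4096)        # architectural mov rcx, [rdx + r8*4096]
    return touched


memory = {0x10: 0x42}             # 0x42 is the old value the store overwrites
touched = run_victim(memory, rax=0x10, rcx=0x10, predicts_no_alias=True)
print(sorted(offset // 4096 for offset in touched))
```

With the wrong guess, the buffer shows two footprints: one for 0xff (the correct run) and one for the overwritten byte. With the right guess only the 0xff footprint appears.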









## Questions?

