Let's see if I've got Metldown right

I thought I’d write down the proof-of-concept to see if I got it right.

So the Meltdown paper lists the following steps:

 ; flush cache
 ; rcx = kernel address
 ; rbx = probe array
 retry:
 mov al, byte [rcx]
 shl rax, 0xc
 jz retry
 mov rbx, qword [rbx + rax]
 ; measure which of 256 cachelines were accessed

So the first step is to flush the cache, so that none of the 256 possible cache lines in our “probe array” are in the cache. There are many ways this can be done.

Now pick a byte of secret kernel memory to read. Presumably, we’ll just read all of memory, one byte at a time. The address of this byte is in rcx.

Now execute the instruction:
    mov al, byte [rcx]
This line of code will crash (raise an exception). That’s because [rcx] points to secret kernel memory which we don’t have permission to read. The value of the real al (the low-order byte of rax) will never actually change.

But fear not! Intel is massively out-of-order. That means before the exception happens, it will provisionally and partially execute the following instructions. While Intel has only 16 visible registers, it actually has 100 real registers. It’ll stick the result in a pseudo-rax register. Only at the end of the long execution change, if nothing bad happen, will pseudo-rax register become the visible rax register.

But in the meantime, we can continue (with speculative execution) operate on pseudo-rax. Right now it contains a byte, so we need to make it bigger so that instead of referencing which byte it can now reference which cache-line. (This instruction multiplies by 4096 instead of just 64, to prevent the prefetcher from loading multiple adjacent cache-lines).
 shl rax, 0xc


Now we use pseudo-rax to provisionally load the indicated bytes.
 mov rbx, qword [rbx + rax]

Since we already crashed up top on the first instruction, these results will never be committed to rax and rbx. However, the cache will change. Intel will have provisionally loaded that cache-line into memory.

At this point, it’s simply a matter of stepping through all 256 cache-lines in order to find the one that’s fast (already in the cache) where all the others are slow.


Article Link: http://blog.erratasec.com/2018/01/lets-see-if-ive-got-metldown-right.html