This is part of my series of writeups on the Shabak 2021 CTF challenges. See the complete collection here.
Introduction
The challenge description reads:
This binary is not an elf. So what is it?
Load the .ko file and find out...
Use the image:
http://uec-images.ubuntu.com/releases/focal/release-20201210/
Good luck!
Unlike Windows, the Linux kernel API (the API inside the kernel, not the syscall API) can, and does, change between releases. So if we're going to be reversing kernel modules, we'd best have the correct kernel sources close to hand.
The supplied image is of Ubuntu 20.04, which according to this has kernel version 5.4. Instead of downloading the complete source tree, I found this nice website that can search through the kernel sources.
So, let's get started.
Part I: A kernel of truth
We take loader.ko, throw it into our favorite decompiler, and see what's what:
The stateless_rc4 function hints that perhaps the other binary is encrypted with RC4, and the build_key function probably constructs the decryption key.
There's also load_magen_binary, which looks like the "main" function here. However, it accepts some unknown parameter which likely describes the binary being loaded. So let's table it for now, and instead take a look at init_module.
This function does only one thing: it calls __register_binfmt with the address of the global variable magen_fmt.
This rings a bell: binfmt_misc is the Linux kernel feature that allows us to define custom file formats as executable. Looks like in this case we're dealing with the lower-level variant of the same feature. A quick search yields the structure being passed at registration:
[Listing: the definition of struct linux_binfmt from the kernel sources.]
And, sure enough, the load_binary field points to our old friend load_magen_binary.
We also learn that the function accepts a pointer to linux_binprm as its only parameter. This structure is rather large, but it's worth our time to get it into Ghidra, since we don't know yet what fields load_magen_binary uses.
Get a load of this
We can't put this off any longer. Time to see how the binary is loaded.
The load_magen_binary function begins with what looks like parameter validation.
The interesting bit is here:
[Listing: the cleaned-up Ghidra decompilation of this part of load_magen_binary.]
This is the output from Ghidra; I just cleaned it up a little. So, what's going on here?
In line 1, the code reads a DWORD at offset 5 in the buf field of the linux_binprm structure, and allocates a buffer with this size.
It's a fair guess that this field contains part of the binary file being loaded.
Indeed, let's take a look at our binary:
[Listing: hex dump of the beginning of the binary.]
The first 5 bytes are the magic string MAGEN[1], and so at offset 5 we have the DWORD 0x00000619 == 1561. In fact, this is the exact size of the binary! Great, we're making progress.
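As a quick sanity check, the header layout described above can be parsed in a few lines of Python. This is only a minimal sketch: the function and constant names are mine, the little-endian byte order is an assumption based on the x86-64 target, and the sample bytes are synthetic rather than taken from the real file. The magic check is a convenience of the sketch; as far as I can tell the loader itself doesn't validate it.

```python
import struct

MAGIC = b"MAGEN"   # 5-byte magic at offset 0
SIZE_OFFSET = 5    # DWORD holding the size, right after the magic
HEADER_LEN = 9     # magic (5 bytes) + size field (4 bytes)


def parse_header(data: bytes) -> int:
    """Return the size stored in the header (assumed little-endian)."""
    if data[:len(MAGIC)] != MAGIC:
        raise ValueError("not a MAGEN binary")
    (size,) = struct.unpack_from("<I", data, SIZE_OFFSET)
    return size


# Synthetic header, not the real file: the magic followed by 0x619.
sample = MAGIC + struct.pack("<I", 0x619)
print(parse_header(sample))  # 1561
```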
In line 3 the code allocates a buffer of size 0x12 == 18.
Then, in lines 5-8, there is a call to build the decryption key (presumably). What's being passed here?
- The first argument is a pointer to the input buffer, at offset 9. Looking at the file, we can see that this is the beginning of a section with a bunch of strings.
- The second argument is the 18-byte buffer we just allocated, so presumably this will receive the decryption key.
- The third argument is the size of the file minus 9, so really it's just the size without the magic and size fields.
- The fourth argument points to some local variable, local_44.
Can we deduce what that last local variable is? Later on, it's used in some calculations in lines 10-11, and then the results are passed as arguments to kernel_read. Here's kernel_read's signature:
ssize_t kernel_read(struct file *file, void *buf, size_t count, loff_t *pos);
The fourth parameter is the offset within the file to read from, and we're passing it:
[Listing: the offset expression passed as the fourth argument.]
So local_44 is an offset within the remainder of the file, the remainder being what comes after the magic and size fields.
The third parameter is the number of bytes to read from the file, and we're passing it:
[Listing: the size expression passed as the third argument.]
Which is the size of the block of bytes starting from local_44 + 9 until the end of the file.
Later on, in line 17, the data read from the file is passed to the decryption function, so presumably local_44 + 9 is the offset of the code to be executed within the file.
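To make the arithmetic concrete, here's a tiny Python sketch of the two values handed to kernel_read. The function name and the sample value for local_44 are made up for illustration; only the + 9 header adjustment and the "until the end of the file" size come from the decompilation.

```python
HEADER_LEN = 9  # 5-byte magic + 4-byte size field


def payload_span(file_size: int, local_44: int) -> tuple[int, int]:
    """Return (offset, count) for the read of the encrypted payload."""
    offset = local_44 + HEADER_LEN  # where the payload starts in the file
    count = file_size - offset      # everything from there to end of file
    return offset, count


# Hypothetical value for local_44, purely for illustration.
offset, count = payload_span(file_size=1561, local_44=0x100)
print(hex(offset), count)  # 0x109 1296
```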
Armed with these observations, we can now clean up the decompiled code:
[Listing: the cleaned-up decompilation of load_magen_binary.]
Master builder
Time to see how the decryption key is constructed.
Ghidra does a fairly decent job of decompiling the build_key function, so understanding what it does is mostly a matter of following the code. And since the only other internal function it calls is update_path, there is almost no guessing about the meaning of parameters. There are, however, several gotchas:
- At the very beginning of the function there is a call to __fentry__. AFAICT this is part of some kernel tracing mechanism, and usually this call doesn't do anything. Ghidra, however, trips up on it because it thinks it spoils some registers. The workaround I used is to NOP-out the call instruction in Ghidra: right-click on the instruction in the disassembler and choose "Patch Instruction".
- The big chunk of code in the middle of the function (lines 45-48) is an optimized and/or obfuscated memcpy, so don't get stuck reversing it 😐.
Here is the cleaned-up version of the code:
[Listing: the cleaned-up decompilation of build_key.]
Recall that the "magen" binary contains a fairly long string of filenames, where each filename begins with an R, and the whole list is terminated with a Z:
[Listing: hex dump of the filename list in the binary.]
What build_key does is pass each filename to the update_path function along with a file offset (but technically we don't know it's a file offset yet 😉). For the first file, the offset is 0x5117 (see line 29), and it is incremented by 0x11 for each subsequent file (line 60).
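In other words, the offset for the i-th file is 0x5117 + i * 0x11. Here's a throwaway Python sketch of that progression; the names are mine, and the number of files is whatever the filename list in the binary contains.

```python
FIRST_OFFSET = 0x5117  # offset used for the first file
STRIDE = 0x11          # added for each subsequent file


def file_offsets(count: int) -> list[int]:
    """Offsets passed along with the first `count` filenames."""
    return [FIRST_OFFSET + i * STRIDE for i in range(count)]


print([hex(o) for o in file_offsets(3)])  # ['0x5117', '0x5128', '0x5139']
```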
Cool, so what does update_path do?
This just in
After changing the function signature in Ghidra according to what we learned from build_key, we get a pretty coherent output. Here it is, after some cleanup[2]:
[Listing: the cleaned-up decompilation of update_path.]
So we're reading 18 bytes (which is precisely the key size) from the file and XORing the existing key with these bytes. Cool.
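A rough Python equivalent of that behaviour, as I understand it, might look like the sketch below. The helper names are hypothetical, and I'm assuming the key buffer starts out zeroed, which the decompilation above doesn't spell out.

```python
KEY_LEN = 18  # size of the key buffer allocated by the loader


def read_chunk(path: str, offset: int) -> bytes:
    """Read KEY_LEN bytes from `path`, starting at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(KEY_LEN)


def xor_into_key(key: bytearray, chunk: bytes) -> None:
    """XOR an 18-byte chunk into the key, in place."""
    if len(chunk) != KEY_LEN:
        raise ValueError("expected exactly 18 bytes")
    for i, b in enumerate(chunk):
        key[i] ^= b


# Assumption: the key starts as all zeroes before the first file is mixed in.
key = bytearray(KEY_LEN)
```

Because XOR is its own inverse, mixing the same chunk in twice cancels it out, which is a handy way to sanity-check the helper.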
Cherry-pick
We now know all we need in order to decrypt the binary. Here's how we do it:
- Get all the files needed to construct the key from the image.
- Extract 18 bytes from each file. Start from offset 0x5117 in the first file, and increment the offset by 0x11 for each subsequent file.
- XOR all these arrays to produce an 18-byte key.
- Decrypt the remainder of the binary with this key and the RC4 algorithm[3].
After decryption we get a binary with some strings in it, such as "Good work my friend, go submit the flag", so it looks like we got it right 🥳.
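For anyone who wants to reproduce the last step without extra dependencies, here's a small pure-Python RC4 sketch. It assumes stateless_rc4 is plain textbook RC4 (the successful decryption suggests so), and decrypt_payload with its parameters is a hypothetical helper of mine: the key would be the 18-byte XOR result from the steps above, and payload_offset the start of the encrypted part of the file.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4: key scheduling, then XOR with the keystream."""
    # Key-scheduling algorithm (KSA)
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]

    # Pseudo-random generation algorithm (PRGA)
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


def decrypt_payload(blob: bytes, key: bytes, payload_offset: int) -> bytes:
    """Decrypt everything from payload_offset to the end of the file."""
    return rc4(key, blob[payload_offset:])


# Well-known RC4 test vector, just to check the implementation.
print(rc4(b"Key", b"Plaintext").hex())  # bbf316e8d940af0ad3
```

RC4 is symmetric, so the same function both encrypts and decrypts.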
Part II: The Valley of Fear
We take our decrypted binary, throw it into Ghidra, and set the language to x86:LE:64:default:gcc. Initially, it's a whole lot of nothing:
Assuming that execution begins at the start of the shellcode, we hit the D key and lo and behold: it's a jump deeper down. Here's what we get:
[Listing: the decompiled code at the jump target.]
Not too bad, all things considered. No complicated control flow here, just plain old "call API, exit if it fails". Let's see where this leads us.
Where we're going we don't need libc
Looking at the first function, it's a wrapper around syscall number 0x65:
Now, I have no idea what syscall 0x65 is, but this page does. Apparently, this is ptrace. Combined with the appropriate man page we can set the correct signature for this wrapper function. In fact, most of the functions here are syscall wrappers, and we can make quick work of them.
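If you'd rather not look each number up by hand, a small lookup table does the trick. The sketch below is a partial x86-64 Linux table with only a handful of well-known entries; the function name is mine, and apart from ptrace I'm not claiming which of these the shellcode actually uses.

```python
# Partial x86-64 Linux syscall table: a few well-known entries only.
SYSCALLS = {
    0: "read",
    1: "write",
    2: "open",
    3: "close",
    9: "mmap",
    60: "exit",
    101: "ptrace",
}


def syscall_name(number: int) -> str:
    """Map a syscall number to its name, if it's in the partial table."""
    return SYSCALLS.get(number, f"unknown ({number:#x})")


print(syscall_name(0x65))  # ptrace
```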
Here's what the code looks like now[4]:
[Listing: the decompiled code with the syscall wrappers named and typed.]
The ptrace call is there to prevent attaching to the process with a debugger. Then, the code allocates two buffers, addr and buf. buf is later used to store user input (recall that fd 0 is stdin), and it appears that its contents are compared to those of addr, inside FUN_00000200. A quick look at this function confirms that it performs string comparison.
Onwards, to find the flag!
A skip and a hop
So, what's inside FUN_00000285? Let's take a look:
[Listing: the decompilation of FUN_00000285.]
The function opens /lib/x86_64-linux-gnu/libc.so.6 and reads 0x18 bytes from it, all from different locations. The offsets are calculated on line 18, but what's going on there?
The strange 0x1a0 address is actually within the binary; Ghidra just failed to recognize it as such. In fact, the assembly uses RIP-relative addressing, so there's no explicit reference to 0x1a0 anywhere. It's just a by-product of us loading the binary at address 0 in Ghidra.
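For reference, a RIP-relative operand resolves to the address of the next instruction plus a signed 32-bit displacement. The Python sketch below shows that calculation; the instruction address, length and displacement are hypothetical values I picked so that the result lands on 0x1a0, not ones read from the actual disassembly.

```python
def rip_relative_target(insn_addr: int, insn_len: int, disp32: int) -> int:
    """Effective address = address of the next instruction + signed disp32."""
    if disp32 >= 1 << 31:  # reinterpret the raw 32-bit field as signed
        disp32 -= 1 << 32
    return (insn_addr + insn_len + disp32) & 0xFFFFFFFFFFFFFFFF


# Hypothetical 7-byte instruction at 0x2b0 with a negative displacement.
print(hex(rip_relative_target(0x2B0, 7, 0xFFFFFEE9)))  # 0x1a0
```

With the binary loaded at address 0, the computed target is simply the file offset of the data.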
So, the offsets to read from are stored in an array:
All we have to do now is get /lib/x86_64-linux-gnu/libc.so.6 from the image and read the bytes at the specified offsets.
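A minimal Python sketch of that last step could look like this. The function name is mine, and it assumes one byte is read per offset, which is how I read the "0x18 bytes, all from different locations" behaviour; the offsets themselves have to come from the array in the binary.

```python
def read_bytes_at(path: str, offsets: list[int]) -> bytes:
    """Read one byte at each offset, in the order given."""
    out = bytearray()
    with open(path, "rb") as f:
        for off in offsets:
            f.seek(off)
            b = f.read(1)
            if len(b) != 1:
                raise ValueError(f"offset {off:#x} is past the end of the file")
            out += b
    return bytes(out)


# Usage, with `offsets` holding the 0x18 values taken from the array:
#   flag = read_bytes_at("libc.so.6", offsets)
```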
FIN
[1] Fun fact: AFAICT, this magic is not validated anywhere in loader.ko.
[2] I removed a call to __fentry__ and some stack cookie checks.
[3] If, like me, you're using Python, the PyCryptodome library implements RC4 as ARC4.
[4] The only difference between this and the Ghidra output is the inlining of string references. Originally, they all appear as s_mmap_0000039c and the like, or as raw integers when Ghidra fails to recognize them as addresses in the binary.