-->
In late July, I played GreyCTF 2024 Finals with slight_smile and got 1st place by a really small margin of 29 points!
The challenge archive can be found at https://github.com/NUSGreyhats/greyctf24-challs-public/tree/main/finals.
Everyone rekt the Blob Runner so hard in Greyhats Welcome CTF 2023, I made sure to make it extra secure this time ;)
Author: Elma
Solves: 4
The challenge reads shellcode into an mmaped region at 0x1337000
, and then removes write perms on this region. Before
jumping to the shellcode, it runs the shellcode through a blacklist:
which is meant to block syscall
and int 0x80
.
Actually, the comparison for int 0x80
doesn’t work since char
is signed but the comparison is unsigned. I tried this
for a while but soon realized that the following seccomp not only specifies syscall type, but also architecture (x64
):
So it blocks int 0x80
as well :(
The organizers also took a pic of me thinking I was very smart and showing the “unintended” to Elma :(
Anyway, after the seccomp, it then jumps to the shellcode after executing the following:
All registers are set to 0, presumably to prevent libc or pie leaks. But if we run it in gdb, we actually see the following:
pwndbg> i r
rax 0x13370000 322371584
rbx 0x0 0
rcx 0x0 0
rdx 0x0 0
rsi 0x0 0
rdi 0x0 0
rbp 0x0 0x0
rsp 0x0 0x0
r8 0x0 0
r9 0x0 0
r10 0x0 0
r11 0x0 0
r12 0x0 0
r13 0x0 0
r14 0x0 0
r15 0x0 0
rip 0x13370000 0x13370000
eflags 0x10246 [ PF ZF IF RF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
fs_base 0x7ffff7d8d740 140737351571264
gs_base 0x0 0
In fact, the fs_base
register hasn’t been cleared out and contains a libc address. This points to the fs
struct
which stores certain important values like canary or the key for PTR_MANGLE
.
To get this value in our shellcode, we can simply mov $register, fs_base
and manipulate it from there.
With a libc leak, we can then call syscall
gadgets. The rest is trivial.
Writing challenges are too hard… so I wrote a program in 2 lines of code for you to pwn :/ I can even tell you that there is an obvious buffer overflow there, should be simple right?
Author: Elma
Solves: 2
The challenge source is literally 2 lines long:
And the checksec is as follows:
RELRO: Partial RELRO
Stack: No canary found
NX: NX enabled
PIE: No PIE (0x400000)
Given that there’s no pie and we don’t have libc leak, we can usually do 2 things:
However, we have no syscall gadget here, so SROP is out of the question. Partial RELRO is also a good sign as we can fake resolve libc functions (I don’t think ret2dlresolve is possible on full RELRO?).
In ret2dlresolve, we need to fake 3 structs: STRTAB
, SYMTAB
, and JMPREL
. These 3 structs are placed at an offset
from corresponding sections, and they are identified through indexes from the start of the section. The addresses of
these sections can be found using readelf
:
readelf -d challenge
Dynamic section at offset 0x2e20 contains 24 entries:
Tag Type Name/Value
...
0x0000000000000005 (STRTAB) 0x400450
0x0000000000000006 (SYMTAB) 0x4003d8
...
0x0000000000000017 (JMPREL) 0x400528
...
For example, in the actual resolving of fgets
, this is the helper function that is placed in GOT and called:
pwndbg> tel 0x404010
00:0000│ 0x404010 (_GLOBAL_OFFSET_TABLE_+16) —▸ 0x7ffff7fd9660 (_dl_runtime_resolve_xsavec) ◂— endbr64
So, once we’ve faked all the structs, we need to jump to 0x401020
with the index of our fake JMPREL
struct at top of
stack, which will trigger the loading (and call) of our desired function.
It contains a table of strings of function names to be resolved. This is used later in SYMTAB
.
str_idx
is the distance of fake string from STRTAB
start.
On x64, this is the struct for SYMTAB
:
st_name: str_idx
st_info: 0x12
st_other: 0x0
st_shndx: 0x0
st_value: 0x0
st_size: 0x0
sym_idx
, which is the index of our fake struct from SYMTAB
start, is calculated by dividing the distance by
sizeof(Elf64_Sym)
, which is 0x18.
r_offset: got address to write the resolved function address to
r_info: (sym_idx << 32) | 0x7
Before calling the resolver, we need to put the index of our fake JMPREL
struct at top of stack. The index is also
calculated by dividing the distance by sizeof(Elf64_Rel)
which is 0x18. Afterwards, we can jump straight to 0x401020
and trigger the resolver.
Now that we know how to fake our structs, where do we write them to? As of the first write, fgets
is writing to the
stack, which we don’t know the location of.
Disassembly of main:
However, notice that the buffer that fgets
writes to is relative to rbp
. We can control rbp
when leave
is
executed, as it will pop top of stack into rbp
, which we can overwrite through our buffer overflow. Therefore, we can
carefully pivot the stack to a static region in memory, return to the mov rdx, [rip+0x1ee7]
instruction (which sets
stdin
- if not fgets
won’t read from stdin
and will error out), and continue our exploit from there.
Afterwards, it’s just a matter of faking the structs, and then choosing what functions to resolve.
In my exploit, my approach was slightly more complicated. When we resolve system
, we want rdi
to point to a string
containing sh\x00
. However, after fgets
is called, these are the values in the registers:
*RDI 0x7ffff7f99720 (_IO_stdfile_0_lock) ◂— 0
*RSI 0xa666473
*RDX 0xfbad2288
*RAX 0x7fffffffd99f ◂— 0x7f000a66647361 /* 'asdf\n' */
Firstly, rdi
has been set to the address of _IO_stdfile_0_lock
, which is writable memory. However, this memory
currently doesn’t contain the sh
string we want. Since it’s not pointing to my initial buffer but to this address, I
can either try to set rdi
back to my buffer, or write sh\x00
to this address.
Secondly, rdx
is set to the flags in the stdin
FILE, and not the address of stdin
. This means that jumping
directly to fgets
without first putting the proper stdin
address in rdx
will cause the program to crash.
Thirdly, rax
actually contains the address of our buffer. So maybe we could mov rdi, rax
and hence get our buffer?
Unfortunately, we don’t have a gadget that can let us do that.
$ ROPgadget --binary challenge | grep di
0x00000000004010aa : add dil, dil ; loopne 0x401115 ; nop ; ret
0x00000000004010a5 : je 0x4010b0 ; mov edi, 0x404030 ; jmp rax
0x00000000004010e7 : je 0x4010f0 ; mov edi, 0x404030 ; jmp rax
0x00000000004010a7 : mov edi, 0x404030 ; jmp rax
0x00000000004010a6 : or dword ptr [rdi + 0x404030], edi ; jmp rax
0x0000000000401118 : sbb ebp, dword ptr [rdi] ; add byte ptr [rax], al ; add dword ptr [rbp - 0x3d], ebx ; nop ; ret
0x00000000004010a3 : test eax, eax ; je 0x4010b0 ; mov edi, 0x404030 ; jmp rax
0x00000000004010e5 : test eax, eax ; je 0x4010f0 ; mov edi, 0x404030 ; jmp rax
So trying to mov rdi, rax
is out of the question. But we can still call fgets
again to write to
_IO_stdfile_0_lock
, right?
In the disassembly of main, we can see that rdx
is set before rdi
is. Thus, it’s impossible to set rdx
to the
correct stdin
address without also modifying the value in rax
(and hence rdi
). Note: we can’t set rbp
to the
address of _IO_stdfile_0_lock
since we don’t have libc leak.
Another way to write to _IO_stdfile_0_lock
is to use an input function that doesn’t need stdin
. Then, we can simply
call that function without needing to care about rdx
being set to stdin
. So let’s resolve gets
first!
After gets
is resolved, it will be called with rdi
pointing to _IO_stdfile_0_lock
, and hence write our input into
there. In my exploit, I chose to write the address of gets
to the fgets
GOT, but it doesn’t really matter, because
for all intents and purposes, gets
and fgets
behave identically for us. After gets
is finished, it also ends up
with the same _IO_stdfile_0_lock
in rdi
, which allows us to resolve system
directly after with rdi
pointing to
sh\x00
.
If strcat exists why not memcat? or even memecat?? (Adapted from failed CS1010 exercise)
Author: jro
Solves: 1
RELRO: Full RELRO
Stack: No canary found
NX: NX enabled
PIE: No PIE (0x400000)
Obviously, the pointers are incremented and later freed, which won’t free the original chunks. This seems really trivial right?
Unfortunately, because the program frees all 3 chunks, it’s very difficult to make all the chunks valid, before we can use our arbitrary write to get RCE. The program will most likely crash before it loops back to give us our arbitrary write. We need to be a bit more deliberate in how we fake our chunks.
We need to take into consideration certain restrictions before we start making our fake chunks:
fake_chunk_addr+size+0x8
contains a valid chunk size, or it will crash with free(): invalid next size (fast)
. This means that in our first iteration, c
must go into tcache, since it’s the last chunk to be allocated, and
after it is the forest which we have no control over.b
and c
generally have to be the same (as long as b
is not size 0), since by the
end of the function, c
will be pointing to the same data as in b
(but copied over). Consequently, b
must go
into tcache in our first iteration as well.Now that we know some of the conditions we need to satisfy, let’s also plan our exploit path.
__malloc_hook
(luckily we’re on libc 2.31)Since b
and c
both need to be in tcache in our first iteration, only a
can be used as our unsorted chunk.
Conveniently, since a
is copied into c
, we can also use a
to create the fake size to fulfill free(): invalid next size
check. Then, the very end of a
should contain the fake unsorted chunk that we want to free.
For b
, we just need to fake a tcache chunk. In this case, I will use 0x41, but its quite arbitrary.
c
inherits the same chunk size as b
.
After freeing, this is the state of the bins:
tcachebins
0x40 [ 2]: 0x7017c0 —▸ 0x701ce0 ◂— 0
fastbins
empty
unsortedbin
all: 0x701790 —▸ 0x7f7f99fa7be0 ◂— 0x701790
smallbins
empty
largebins
empty
Our 2 fake tcache chunks are nicely placed in tcache, and the unsorted bin contains our one chunk with the libc leak.
Unsorted bin is used to service small requests when tcache, fastbin, and smallbin don’t have exact fit for the requested
chunk size. In our next iteration, if we allocate a chunk of size 0x0, not only will it allocate from the unsorted bin,
it will obviously also write 0 bytes to the memory at that chunk, which allows us to retain our libc leak (if not,
fgets
will append null terminator and block off our libc leak).
Now a
contains our libc leak and we can read it once c
is written to stdout.
Let’s go ahead and also make b
size 0x0.
Earlier, when we created the unsorted a
in the 1st iteration, this is what the heap looks like after freeing all the
chunks:
x/50gx 0x194d790
┌───────────────────────────────┐
│ │ 0x421 │ <- fake 0x420 chunk (a), freed
│ libc leak │ │
│ │ 0x41 │ <- fake 0x40 chunk (b), freed
│ fd │ │ fd points to ──┐
│ │ 0x41 │<-──────────────┘ (c), freed
│ │ │
...
After we allocate the 2 0x0 (size 0x21) chunks and c
, but before we free, this is what the heap looks like:
x/50gx 0x194d790
┌───────────────────────────────┐
│ │ 0x21 │ <- a
│ libc leak │ │
│ │ 0x21 │ <- b (initially 0x41)
│ │ │
│ │ 0x21 │ <- c (initially 0x41)
│ │ │
│ │ 0x3c1 │ <- rest of unsorted chunk
│ │ │
...
The tcache still contains the 0x41 chunks that we freed earlier. Once c
and b
are freed, the 0x40 and 0x20 tcache
will contain the same chunks, giving us our double free!
tcachebins
0x20 [ 3]: 0x194d7a0 —▸ 0x194d7c0 —▸ 0x194d7e0 ◂— 0
0x40 [ 2]: 0x194d7c0 —▸ 0x194d7e0 ◂— 0
We will need 3 more mallocs to write to our __malloc_hook
. The 1st malloc will clear the first chunk in 0x20 that
isn’t our double free. This chunk still needs to be freed later, so we still need to fake the size (just fake it to
some other tcache that we’re not using).
The 2nd malloc will overwrite the fd pointer of the 0x40 chunk. This chunk too needs to be freed, so we also fake the size.
Now, the 3rd chunk (c
) will be allocated, but in tcache 0x30 (c = malloc(0x10 + 0x10)
). So it won’t use the chunks
from 0x40 tcache. Then we can just allocate the last 2 chunks to get our arbitrary write and shell.
After the success of my heap allocator, I decided to help the NUSmods team out with a super fast data structure
Author: jro
Solves: 0
The moment I saw this challenge, I immediately had nightmares of heapheapheap from quals and closed it.
After the CTF there was a first blood prize of $50 for the first solve, so the Thursday after the CTF I decided to take another look to win more money :D
The challenge is an “AVL tree” implementation that allows us to insert, delete, and search for nodes. Nodes are rebalanced after every insert and delete. Now, as a non-CS student, I have no idea what an AVL tree is, so I just spammed random stuff until I managed to crash the program.
When deleting a node with 2 children, the program will somehow swap the parent node’s data with the child’s, move the child up to the parent’s spot, and then free the original node. This grants us our UAF, and afterwards it should be quite trivial..?
This is the struct of a Node:
And this is the implementation for create/delete:
The challenge is using glibc 2.39, and using calloc to allocate the chunks. This creates a few considerations for us:
Tcache
Tcache is not used by calloc, so we cannot do a simple tcache dup/overwrite.
Fastbin
_int_malloc
checks whether the chunk is in the correct fastbin size before allocating, which means we already need
partial write at our desired target before we can actually allocate to it, making fastbin attack a lot more restricted.
Unsorted/largebin
Either unsorted bin or largebin attack is not possible. (I initially wanted to use this to overwrite global_max_fast
so I can easily do House of Apple with fastbins)
Unsorted bin attack was made obsolete in glibc 2.29, with the addition of these checks, which prevents us from putting
arbitrary values into bk
pointer.
Largebin attack is impossible in this current scenario, because the 0x30 chunks that are allocated after each data chunk is allocated will cause any existing largebin chunks to go back into unsorted bin.
With this info, we can see that only an attack on fastbin is feasible. Before we proceed, let’s get a useful leak first. (Note: I used the debug version to visualize the tree so that I can tell which nodes to free)
With the UAF earlier, it’s easy to leak the heap - simply read the data of the node that inherited the parent’s data chunk.
Within the heap region, we can still do arbitrary writes because we can write the appropriate size to the location we are targeting. It looks something like this:
┌─────────────────────────────────┐
│ │ 0xxx │ actual data chunk
│ │ │
│ │ │
│ │ │
│ │ 0x81 │ fake fastbin size
│ fd │ │ (make sure fd is valid, so after allocating this chunk the fastbin won't be broken)
│ │ 0x31 │ end of data chunk, start of Node chunk
│ hash │ *left │ just use gdb to check and copy these values
│ *right │ *data │ overwrite this pointer
│ │ │
└─────────────────────────────────┘
The crucial parts are ensuring that fd is properly encrypted as 0
with the correct address, and hash, left, and right
are correctly set. (In some instances, these values may need to be tweaked to manually fix the tree)
We can then escalate this into something better (eg. overwrite data pointer for more useful leaks)
As mentioned earlier, we just need to overwrite the data pointer for libc leak.
After this, I initially tried a different approach (which got really close!), and because I think it’s quite cool I’ll add it to the writeup too.
House of Mind is an old attack that used non-main arena logic to write a freed chunk address at an arbitrary location in memory. This attack comes in 2 variants, unsorted and fastbin.
Non-main arena chunks are indicated in the size field by the 0x4 bit. The arena stores information about fastbins, unsorted bins and small/large bins, as well as other stuff like available memory in the heap and the address of top chunk. If a chunk is using the main arena, it will simply read from and write to the main arena, which is found in libc. However, if the chunk isn’t using main arena, it first resolves the address from another struct in the heap, which (for non-main arena heaps) should be placed at the very top of the heap.
From glibc source:
To find the arena for a chunk on such a non-main arena, heap_for_ptr performs a bit mask operation and indirection through the ar_ptr member of the per-heap header heap_info (see arena.c).
arena.c
:
libc-pointer-arith.h
:
malloc.c
:
So for non-main arena chunks, the address of arena is resolved from heap_info
, which is found using the
PTR_ALIGN_DOWN
macro. In other words: heap_info = chunk & ~(0x4000000 - 1)
. At this address, the arena pointer is
stored, which will be passed to _int_free
. _int_free
then does stuff with it (eg. writing to the fastbins in arena).
If we are able to control the pointer at this address, then we can achieve our partial write.
And it’s not hard - we just need to create enough chunks so that the heap expands all the way past our desired address
of heap_info
and write the av
pointer to the location. Then when we create a chunk (fastbin), overwrite its size to
add the non_main_arena
flag, and then free that chunk, it will write the pointer to the arena address we control.
Small caveat - at around +0x880 from the start of our fake arena, there must be a sufficiently large value that goes
into system_mem
, if not we fail this check:
This restriction actually makes our write a lot more restricted, considering a lot of memory in libc is null bytes. This
rules out overwriting global_max_fast
. After a while, I finally found that we can use _IO_2_1_stdin_
. We can target
the _chain
field and set it to our freed chunk, then fake a file according to House of Apple. This file should give us
our shell when it’s closed on exit!
Actually, I almost got this exploit to work perfectly - I was able to fake the file and trigger an exit (on debug version, it exits when it fails the assertion). It didn’t spawn my shell because the fake file needed a bit of tweaking to make some of the necessary pointers valid (it would crash while trying to close my file). However, when I tested on remote, I realized that the exploit takes way too long to setup House of Mind (sends on average 300 chunks) and the Docker jail will kill the process before House of Mind is finished. I’m too lazy to fix this exploit since it wouldn’t work on remote, and it’s not so trivial to exit on the actual challenge binary anyway (without the printing of the tree, the only other assert possible is quite difficult to fail).
After spending a few hours on this, it was almost midnight so I went to sleep.
The next day, I looked for other vectors to convert this almost arbitrary fastbin write to RCE. After searching for a
bit, I found a reliable stack leak from libc (libc_leak + 0x2046e0
), that is always a fixed offset from the saved RIP
when create_mod
is called. Additionally, the size check can be easily passed since we can write the size just above
the stack, when the program asks for the module code. We also need the value of canary, which we can also get from libc.
The steps to leak these are similar to libc leak, so I won’t repeat it again.
With the stack leak and canary, we can prepare to allocate a fastbin chunk there. Using the double free we have, we need to write the address of the fake chunk to our poisoned fastbin chunk before writing the fake size. Then we can allocate our fake chunk, write our ROP chain and win.
For the ROP chain, there’s a small consideration: the address of Tree will be overwritten by our chain, which will crash
create_mod
before it returns. To fix this, we need to put the address back. We can use some other gadget that pops the
address into irrelevant registers to skip over the address and continue our ROP chain.
This challenge was a lot of fun because of how different not having tcache made things (despite having a really simple vuln). I still don’t know how AVL trees work tho…
I also solved IP Cam
, and Malware
together with scuffed (actually they did most of the reversing, I just helped a
little with the libc and decrypting the flag). I’m too lazy to write writeups for them (especially IP Cam
since I
don’t have access to the camera anymore), but still cool challs nonetheless!