Two Bytes is Plenty: FortiGate RCE with CVE-2024-21762
Disclaimer
The exploit described in this post is tailored to the exact version of FortiGate SSL VPN used for testing. It is unlikely the exploit will work on other versions. The purpose of our research is primarily to power our exposure engine. We also publish research to add more colour and help defenders.
We strongly advise all Fortinet customers to apply the Fortinet-provided patch as soon as possible.
Introduction
Early this February, Fortinet released an advisory for an "out-of-bounds write vulnerability" that could lead to remote code execution. The issue affected the SSL VPN component of their FortiGate network appliance and was potentially already being exploited in the wild.
FortiGate is widely deployed and a pre-auth remote code execution vulnerability would have a huge impact. Our security research team immediately began work to ensure that customers of our Attack Surface Management platform were notified if they were affected.
In this post we detail the steps we took to identify the patched vulnerability and produce a working exploit.
We've highlighted the exploit chain below.
Extracting the Binary
Unfortunately, we were only able to obtain two versions of the appliance: 7.2.5 and 7.2.7, which was the latest at the time. This meant the delta was larger than we would have liked, but it would have to do. We set up two VMs, <span class="code_single-line">FGT_VM64-v7.2.5.F-build1517</span> and <span class="code_single-line">FGT_VM64-v7.2.7.M-build1577</span>, and confirmed they worked with trial licenses.
We had worked with FortiGate before and knew that it bundled almost all of its applications into one binary, <span class="code_single-line">/bin/init</span>. To obtain copies of the binaries we mounted the vmdks from our two FortiGate VMs into a third VM. We then decompressed and extracted the <span class="code_single-line">rootfs.gz</span> archive, which contained most of the filesystem.
There was an odd "decompression OK, trailing garbage ignored" message that didn't seem to be a problem, but would cause trouble later.
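That warning indicates the file holds extra bytes after the end of the gzip stream. As a minimal sketch of how such trailing data can be separated out, the Python below uses a small synthetic archive rather than a real <span class="code_single-line">rootfs.gz</span>; the function name and the trailer length of 64 bytes are arbitrary choices for illustration.

```python
import gzip
import zlib


def split_gzip_and_trailer(blob: bytes) -> tuple[bytes, bytes]:
    """Decompress one gzip member and return (payload, trailing bytes)."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    payload = decompressor.decompress(blob)
    # Anything after the end of the gzip stream is left in unused_data.
    return payload, decompressor.unused_data


# Synthetic stand-in for an archive: a gzip stream plus 64 extra bytes.
archive = gzip.compress(b"example filesystem contents") + b"\xaa" * 64
payload, trailer = split_gzip_and_trailer(archive)
print(len(payload), len(trailer))
```

Checking the length of the trailer this way shows exactly how much data sits after the compressed stream.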
Inside the archive the <span class="code_single-line">bin</span> folder is further compressed using custom versions of <span class="code_single-line">ftar</span> and <span class="code_single-line">xz</span>. The modified applications are provided in the <span class="code_single-line">sbin</span> folder and we can use <span class="code_single-line">chroot</span> to run each and extract <span class="code_single-line">bin.tar.xz</span>. This gave us the copies of <span class="code_single-line">/bin/init</span> we needed to compare.
Patch Diffing
We decompiled each <span class="code_single-line">/bin/init</span> binary with Ghidra and used BinDiff to compare. Unfortunately, the version difference was too big and we decided it would be easier to manually look for differences.
We started by looking at the HTTP parsing functionality. Historically, there have been memory corruption issues in this part of the code and so it seemed like a good place to start. We searched for strings of common header names such as <span class="code_single-line">Content-Length</span> and <span class="code_single-line">Transfer-Encoding</span> as well as paths we knew were associated with the SSL VPN component like <span class="code_single-line">/remote/login</span>.
We would look for each of these strings in both versions and then try to line up the functions to see if there were any changes. Function names were stripped, but log messages often included the function name, which proved very helpful. We slowly looked through these functions and where they were called, labelling and comparing where we could.
We found <span class="code_single-line">FUN_01701ee0</span> which appeared to handle parsing HTTP requests that used chunked transfer encoding. The patched version of this function contained some additional length checks and error messages. The relevant original and patched versions are shown below. Comments and function names have been added where possible.
The first check is added when processing the HTTP trailers sent after the chunked body.
The second check is added when decoding the length of a chunk.
<div id="finding-an-endpoint"></div>
Finding an Endpoint
This was promising, but we still didn't know if it was exploitable. We couldn't determine how to reach this function through static analysis. Instead we turned on debug logging and started sending chunked requests to some of the known endpoints. Debug logging was enabled with the following commands.
Every endpoint we tried logged the error <span class="code_single-line">chunked Transfer-Encoding forbidden</span>. Searching for this string we found the function that logged the error. The error was only logged when the function was called and the second argument was 1.
We checked all the call sites for this function and worked backwards from the ones that called it where <span class="code_single-line">param_2</span> was not 1. One of the calling functions contained a helpful log message and the function name, <span class="code_single-line">default_handler</span>. All this time we had been looking for a specific endpoint, but we hadn't considered using no endpoint at all!
<div id="triggering-a-crash"></div>
Triggering a Crash
We knew two checks were added in the patch.
- The amount of data read before getting to the chunk trailers had to be less than 1024 bytes.
- The chunk length string had to be less than 17 characters.
We wrote a Python script to start prodding the endpoint with different chunked requests, focusing on these two aspects. The parsing was surprisingly resilient; the amount of data read was always kept within the allocated buffer. We tried chunk lengths that would decode to negative integers, but these immediately terminated the parsing. Many other malformed requests were also handled gracefully.
Luckily, we did eventually get a crash with the following payload: a zero-length chunk indicating the end of the request body, followed by 89 chunk trailers. Weirdly, neither of these seemed to violate the new checks as we understood them.
<div id="setting-up-a-debugger"></div>
Setting up a Debugger
To investigate the crash we had to set up a debugger. However, the management shell provided can't run system commands or access the filesystem. We would have to backdoor one of the existing binaries. This meant bypassing some integrity checks performed during startup. The checks were performed by the kernel during the boot process and by <span class="code_single-line">/bin/init</span> shortly after. We will start with <span class="code_single-line">/bin/init</span> because the checks there were easier to bypass.
Patching /bin/init
We searched for the string <span class="code_single-line">rootfs.gz</span> and found a function (<span class="code_single-line">FUN_028af770</span>) that loads an RSA key then reads <span class="code_single-line">rootfs.gz</span> and some other files. This was most likely the integrity check we were looking for.
We tried to trace this function call backwards but hit a dead end. Instead, we decided to look from the other end and searched for the string "System is starting" which is printed to the console during startup. Just after "System is starting" we saw a block that Ghidra didn't disassemble.
We forced Ghidra to disassemble this block and found some function calls which led to the integrity check above.
This block also contained <span class="code_single-line">FUN_00451440</span> which was called when the integrity checks failed. <span class="code_single-line">FUN_00451440</span> contained a log message with the function name <span class="code_single-line">do_halt</span>. The decompiled block is shown below with the important calls commented.
Since <span class="code_single-line">do_halt</span> was called multiple times, we patched it to just return immediately. This way we only had to make one change instead of modifying multiple integrity checks.
The <span class="code_single-line">do_halt</span> function was changed from this
to this.
After patching the instruction in Ghidra we used this helpful script to save our changes back to the binary.
Kernel Debugging
The other check we needed to bypass was done by the kernel. Reading <span class="code_single-line">extlinux.conf</span> from our mounted vmdk we could see the kernel boot arguments and the name of the kernel image: <span class="code_single-line">flatkc</span>.
Using vmlinux-to-elf we converted <span class="code_single-line">flatkc</span> to an ELF file and decompiled it.
There were more symbols here, so we searched for functions containing the word <span class="code_single-line">verify</span>. We found <span class="code_single-line">fgt_verify_initrd</span>, which was called by <span class="code_single-line">kernel_init_freeable</span>, which in turn returned the value from <span class="code_single-line">fgt_verify_initrd</span>. This can be seen below.
In <span class="code_single-line">kernel_init</span> we saw that if zero is returned the system boots, otherwise it panics.
Patching this check seemed too difficult. Instead we opted to attach a debugger to the kernel and just change the return value coming back from <span class="code_single-line">fgt_verify_initrd</span>.
To do this we added the following to our VM's vmx file, enabling remote debugging on port 12345.
We then started GDB, set a breakpoint on <span class="code_single-line">fgt_verify_initrd</span> and attached to our VM shortly after starting it.
When we hit <span class="code_single-line">fgt_verify_initrd</span> we exited from the function with <span class="code_single-line">finish</span> and changed the return value in <span class="code_single-line">rax</span> by running <span class="code_single-line">set $rax = 0</span>.
Unfortunately, the system still did not boot. After some debugging, we tracked it down to a function called <span class="code_single-line">populate_rootfs</span>. This function took the data loaded from <span class="code_single-line">rootfs.gz</span> and passed it to <span class="code_single-line">unpack_to_rootfs</span> to be decompressed.
To calculate the length of the data to decompress, <span class="code_single-line">0x100</span> is subtracted. This explained the "trailing garbage ignored" warning we saw earlier!
This meant our repacked archive was not being decompressed correctly because it was 256 bytes shorter than expected. We figured the 256 bytes were probably a signature that we would ignore anyway, so we just padded our modified archive with zeroes.
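The padding step can be sketched in Python as follows. This is only a model under stated assumptions: <span class="code_single-line">pad_repacked_archive</span> and <span class="code_single-line">kernel_view</span> are hypothetical helper names, the archive contents are synthetic, and <span class="code_single-line">kernel_view</span> merely imitates the "subtract 0x100 then decompress" length calculation described above rather than reproducing the kernel code.

```python
import gzip
import zlib

TRAILER_LEN = 0x100  # bytes subtracted before decompression, per the text


def pad_repacked_archive(gz_stream: bytes) -> bytes:
    """Append zero bytes where the original 256-byte trailer would sit."""
    return gz_stream + b"\x00" * TRAILER_LEN


def kernel_view(archive: bytes) -> bytes:
    """Model the length calculation: decompress all but the last 0x100 bytes."""
    return zlib.decompress(archive[:-TRAILER_LEN], 16 + zlib.MAX_WBITS)


repacked = gzip.compress(b"modified rootfs contents")
padded = pad_repacked_archive(repacked)
print(len(padded) - len(repacked))  # 256
```

Without the padding, slicing off the last 0x100 bytes would cut into the compressed stream itself, which matches the failed boot described above.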
We now had the following repacking script which would be run from the unpacked rootfs folder.
We prepared the following backdoor program which would kill <span class="code_single-line">sshd</span> and run <span class="code_single-line">telnetd</span> instead. This would replace <span class="code_single-line">/bin/smartctl</span> and has been used in previous FortiGate vulnerabilities to get easy shell access.
We copied everything we needed into the unpacked rootfs folder as follows.
- <span class="code_single-line">init-patched</span> overwriting <span class="code_single-line">./bin/init</span>
- <span class="code_single-line">smartctl-backdoor</span> overwriting <span class="code_single-line">./bin/smartctl</span>
- <span class="code_single-line">gdb</span> from here to <span class="code_single-line">./bin/gdb</span>
- <span class="code_single-line">busybox</span> statically compiled and copied to <span class="code_single-line">./bin/busybox</span>
We then unlinked <span class="code_single-line">./bin/sh</span> and relinked it to <span class="code_single-line">./bin/busybox</span>.
This was then repacked into <span class="code_single-line">rootfs.gz</span> and copied onto the vmdk.
We booted the VM, modified the return value of <span class="code_single-line">fgt_verify_initrd</span> with GDB and were finally able to log in to the management shell.
The failing integrity checks caused some issues with the saved networking settings. We found running the following commands forced a new DHCP lease and got things working.
We then ran the command that would trigger our <span class="code_single-line">/bin/smartctl</span> program. The <span class="code_single-line">ls</span> and <span class="code_single-line">id</span> command output was printed, which was a good sign.
Lastly, we connected with telnet to the device on port 22 and could start debugging.
<div id="dissecting-the-crash"></div>
Dissecting the Crash
It took a while, but we could now attach a debugger to <span class="code_single-line">/bin/sslvpnd</span> and try to triage the crash we triggered. Looking at the registers we could see <span class="code_single-line">0x0a0d</span> had been written over the low two bytes of the value in <span class="code_single-line">r12</span>, resulting in a segfault when it was dereferenced.
<span class="code_single-line">0x0a0d</span> is the <span class="code_single-line">\r\n</span> terminator used for HTTP headers and trailers, but even if we changed our request to only use <span class="code_single-line">\n</span> we still got this same crash. We set a breakpoint after the call to our potentially vulnerable function <span class="code_single-line">FUN_01701ee0</span>. Inspecting the call stack and registers at this point we could see the clobbered value. However, it was a few stack frames away.
The clobbered value was being popped off the stack into <span class="code_single-line">r12</span> just before returning to <span class="code_single-line">0x182a540</span>. The crash then occurred a few instructions later at <span class="code_single-line">0x182a544</span>.
A buffer on the stack was used to process the chunked request, but this <span class="code_single-line">0x0a0d</span> overwrite was quite a bit past that and also skipped over the stack canaries in between.
After some debugging we found where the <span class="code_single-line">0x0a0d</span> was being written. When processing the trailers in <span class="code_single-line">FUN_01701ee0</span>, <span class="code_single-line">0x0a0d</span> was written to the stack buffer at an offset that incremented each time.
With each trailer encountered the following would happen:
- The trailer was read into the buffer on the stack.
- <span class="code_single-line">0x0a0d</span> was written into the buffer at the offset stored in <span class="code_single-line">field654_0x2d8</span>.
- <span class="code_single-line">field654_0x2d8</span> was incremented by two.
- The buffer was advanced.
- If there was still space in the buffer, another line of input would be read.
The offset used to write <span class="code_single-line">0x0a0d</span> wasn't properly checked against the remaining buffer length and so only <span class="code_single-line">0x0a0d</span> could be written past the buffer. All the incoming data was constrained to be within the buffer.
Interestingly the offset is incremented by two each time and also used to advance the buffer. Because the offset is not reset the following would happen, assuming a buffer size of 15:
Since we are advancing both the buffer and offset, we get a scenario where the buffer is nearly empty and the offset is much larger than the remaining space. This would explain why none of the canaries triggered: we can go past the buffer, but only to write <span class="code_single-line">0x0a0d</span>.
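The interaction between the advancing buffer and the never-reset offset can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the decompiled logic: trailers are treated as empty so that only the terminator bookkeeping is visible, the stop condition is a guess at "still space in the buffer", and <span class="code_single-line">simulate_trailer_writes</span> is a hypothetical name.

```python
def simulate_trailer_writes(buffer_size: int, trailers: int) -> list[int]:
    """Return the absolute position of each two-byte terminator write."""
    base = 0                 # where the current buffer pointer sits
    remaining = buffer_size  # space the parser believes is left
    offset = 0               # models field654_0x2d8; never reset
    positions = []
    for _ in range(trailers):
        if remaining <= 1:
            break            # assumed stop condition: buffer used up
        positions.append(base + offset)  # terminator written at buffer[offset]
        offset += 2
        base += offset       # buffer advanced by the ever-growing offset
        remaining -= offset
    return positions


BUFFER_SIZE = 15
writes = simulate_trailer_writes(BUFFER_SIZE, trailers=10)
out_of_bounds = [p for p in writes if p + 2 > BUFFER_SIZE]
print(writes, out_of_bounds)  # [0, 4, 10, 18] [18]
```

In this model the fourth write lands at position 18, past the end of the 15-byte buffer, even though the parser still believed three bytes of space remained when it started that iteration.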
<div id="a-better-crash"></div>
A Better Crash
Trying to control where we wrote <span class="code_single-line">0x0a0d</span> using this approach was difficult. We decided to track down the starting value of <span class="code_single-line">field654_0x2d8</span>. If we could start with a much higher value, we would need to send fewer trailers and would not have to worry about the incrementing offsets.
The value of <span class="code_single-line">field654_0x2d8</span> was copied from <span class="code_single-line">amount_read</span> just before trailer processing. Looking at <span class="code_single-line">amount_read</span> we found it was set during chunk length processing.
The chunk length preceding the trailer processing always needed to be zero as that was how the parser knew the request body was finished. Looking at the hex decoding function, it started by skipping all leading '0' characters.
This meant we could pad our chunk length with many zeroes, <span class="code_single-line">ap_getline</span> would return a large value for <span class="code_single-line">amount_read</span>, the chunk would still be decoded to zero and trailer processing would begin. We modified our request to the following, replacing the terminator for the chunk length with a null byte which was also allowed by the parser.
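The effect of the zero padding can be shown with a toy decoder. This is a sketch, assuming only what the text states (leading '0' characters are skipped before hex decoding); the function names are hypothetical, and the padding length of 500 is an arbitrary illustrative value rather than the one used in the exploit.

```python
def decode_chunk_length(line: bytes) -> int:
    """Toy hex decoder that skips leading '0' characters first."""
    digits = line.lstrip(b"0")
    return int(digits.decode("ascii"), 16) if digits else 0


def read_chunk_line(line: bytes) -> tuple[int, int]:
    """Return (amount_read, chunk_length) for one chunk-size line."""
    return len(line), decode_chunk_length(line)


amount_read, chunk_length = read_chunk_line(b"0" * 500)
print(amount_read, chunk_length)  # 500 0
```

The line length grows with the padding while the decoded chunk length stays zero, so the amount read and the chunk size can be controlled independently.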
We set a breakpoint where the <span class="code_single-line">0x0d</span> was written when processing the trailers and ran our exploit.
We continued until we returned from the vulnerable function <span class="code_single-line">FUN_01701ee0</span> and saw <span class="code_single-line">0x0a0d</span> written at the offset calculated at breakpoint 4.
With this we could now write <span class="code_single-line">0x0a0d</span> somewhere on the stack. It's not the most powerful write primitive, but it was enough to get us started.
<div id="what-to-do-with-only-two-bytes"></div>
What to Do With Only Two Bytes
We looked at the stack and saw four options for what we could overwrite.
- Return addresses
- Saved base pointers
- Saved locals (miscellaneous values)
- Saved locals (heap pointers)
Option 1 was quickly ruled out. All the return addresses were <span class="code_single-line">0x182xxxx</span> and could only be overwritten to <span class="code_single-line">0x1820a0d</span>, which contained an invalid instruction and immediately faulted.
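The arithmetic behind this is straightforward on a little-endian machine: the bytes <span class="code_single-line">0x0d 0x0a</span> replace the two lowest bytes of whatever 64-bit value they land on. A minimal Python sketch, with a hypothetical helper name and the return address quoted earlier used as the example input:

```python
import struct

CRLF = b"\r\n"  # bytes 0x0d 0x0a, read back as 0x0a0d in little-endian order


def overwrite_low_two_bytes(value: int) -> int:
    """Replace the two lowest bytes of a 64-bit little-endian value with CRLF."""
    raw = bytearray(struct.pack("<Q", value))
    raw[0:2] = CRLF
    return struct.unpack("<Q", bytes(raw))[0]


print(hex(overwrite_low_two_bytes(0x182A540)))  # 0x1820a0d
```

The same calculation applies to any saved pointer on the stack: the upper bytes are preserved and the low 16 bits always become <span class="code_single-line">0x0a0d</span>.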
Option 2 was promising: rewriting the least significant bytes of these pointed them into the stack buffer used to read in the request. However, looking at each function in the call stack, none of them made much use of stack-local variables. Most just kept everything in registers.
Option 3 was tried for a little while, but nothing interesting happened when these values were modified.
Option 4 was all that was left and it was our least favourite, because it meant heap manipulation which had the potential to be very unreliable.
Before starting with option 4, we took a fresh stack dump without overwriting and lined up the heap addresses with the registers they would be popped into. We wanted to verify that controlling these addresses could lead to something useful before spending a lot of time setting up the heap.
We traced each register through its returning function. The <span class="code_single-line">pop r13</span> and return to <span class="code_single-line">0x182a540</span> had the most promise. Looking at the disassembly we see that <span class="code_single-line">r13</span> is used as the first argument to the function we are returning from.
We also saw in the decompilation that this function was called in a loop. We could overwrite <span class="code_single-line">r13</span> in the first pass of the loop, it would then be used as a <span class="code_single-line">param_1</span> in the second pass.
<span class="code_single-line">FUN_01828e10</span> has a lot going on and calls function pointers at multiple locations. One such location is shown below, note that at this stage the <span class="code_single-line">r13</span> value we overwrote has been copied to <span class="code_single-line">rdi</span>. Extraneous instructions have been omitted.
This was really promising. It looked like if we set things up correctly we could jump to an address we controlled. The problem was we needed to perform two pointer dereferences and we wouldn't know the heap address containing our buffer so we couldn't point it at itself.
Instead we could try to call a linked external function. These should already have the appropriate pointers in the PLT and GOT. We chose <span class="code_single-line">system</span> and tried to determine what values we would need to call it.
Working backwards, we searched for references to <span class="code_single-line">system</span> and found a pointer at <span class="code_single-line">0x042c5770</span>.
This was the last dereference, so we had the following, separated into two steps.
We stepped through the code with the debugger and saw <span class="code_single-line">rax</span> was often <span class="code_single-line">0x20</span> at this point, so we could simplify it to the following.
Going back another step we searched all memory blocks for <span class="code_single-line">0x042C5730 (0x042c5770 - 0x40)</span>. We found it in the <span class="code_single-line">.rela.plt</span> section at <span class="code_single-line">0x004337b8</span>.
We now had the following:
And the last step meant we just needed to write <span class="code_single-line">0x00433748</span> at <span class="code_single-line">rdi + 0x298</span>. Which since we controlled where <span class="code_single-line">rdi</span> pointed, should be no problem.
To recap, this was the plan going forward.
- Allocate a heap buffer containing <span class="code_single-line">0x00433748</span> at the right offset.
- Overwrite the lower two bytes of the saved <span class="code_single-line">r13</span> pointer with <span class="code_single-line">0x0a0d</span>, which should hopefully cause it to point somewhere in the above heap allocation.
- <span class="code_single-line">r13</span> is popped and we loop around to call <span class="code_single-line">FUN_01828e10</span> with <span class="code_single-line">rdi</span> set to <span class="code_single-line">r13</span>.
- <span class="code_single-line">FUN_01828e10</span> will dereference <span class="code_single-line">rdi</span> then <span class="code_single-line">r13</span> then <span class="code_single-line">r15</span> leaving <span class="code_single-line">rax</span> with the address of <span class="code_single-line">system</span>.
- <span class="code_single-line">system</span> is called and we get remote code execution.
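The pointer chain in this plan can be checked with a small model of memory. This is a sketch only: the addresses are the ones quoted in the text, but the displacements of <span class="code_single-line">0x70</span> and <span class="code_single-line">0x40</span> are inferred from the differences between those addresses and are assumptions about the omitted disassembly. The value chosen for <span class="code_single-line">rdi</span> is hypothetical.

```python
SYSTEM = "system"  # sentinel standing in for the resolved address of system

# Fixed locations in the binary, using the addresses quoted in the text.
BINARY_MEMORY = {
    0x004337B8: 0x042C5730,  # value found in the .rela.plt section
    0x042C5770: SYSTEM,      # pointer to system
}


def resolve_call_target(rdi: int, heap: dict[int, int]) -> object:
    """Follow the three dereferences and return what would be called."""
    first = heap[rdi + 0x298]              # value placed in our heap buffer
    second = BINARY_MEMORY[first + 0x70]   # assumed displacement (inferred)
    return BINARY_MEMORY[second + 0x40]    # 0x042c5730 + 0x40 = 0x042c5770


fake_rdi = 0x7F0000000A0D  # hypothetical heap pointer ending in 0x0a0d
heap = {fake_rdi + 0x298: 0x00433748}
print(resolve_call_target(fake_rdi, heap))  # system
```

Because every address after the first dereference lives in the binary itself, only the first value needs to be placed in memory we control.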
Controlling the Heap
To get started, we had to understand how the value pointed to by <span class="code_single-line">r13</span> was allocated and if we could get an allocation of our own nearby.
We noticed that <span class="code_single-line">r13</span> was often allocated the same address and so we set a watchpoint on it. The goal was to find where the allocation occurred and what size it was. The watchpoint was hit as soon as we sent through a request and can be seen below along with the stack trace.
We set a breakpoint at <span class="code_single-line">0x18380a6</span> which is the function called for frame #3 in the above output. When this was hit we saw the requested allocation size was <span class="code_single-line">0x730</span> or <span class="code_single-line">1840</span> bytes.
Next we set up some GDB scripts to automatically print calls to <span class="code_single-line">je_malloc</span> and <span class="code_single-line">je_calloc</span> if the allocation size was near <span class="code_single-line">0x730</span>. The scripts would print the start and end addresses of the allocations and their size.
With our crash request we saw just one allocation.
We knew from previous exploits that FortiGate would create individual allocations for each form post parameter when they were parsed. This let us have a very fine-grained control of the allocations. We sent a request with five form parameters, each the same length as our target allocation size.
We could now see lots of allocations being printed. They weren't quite the same size, 32 bytes were added. However, we could just shrink the parameter size if we wanted it to be exact. Many of the allocations were contiguous and appeared to be in <span class="code_single-line">0x800</span> byte blocks.
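The <span class="code_single-line">0x800</span> byte blocks are consistent with how jemalloc usually rounds requests up to a size class. As a rough sketch, assuming the stock jemalloc spacing of four size classes per power-of-two doubling (the allocator configuration on FortiGate was not verified here, and <span class="code_single-line">size_class</span> is a hypothetical helper):

```python
def size_class(request: int) -> int:
    """Round a request up to a jemalloc-style size class."""
    if request <= 128:
        raise ValueError("sketch only covers requests above 128 bytes")
    # Largest power of two strictly below the request size.
    group_base = 1 << ((request - 1).bit_length() - 1)
    spacing = group_base // 4  # four classes per doubling
    return -(-request // spacing) * spacing  # ceiling to a multiple of spacing


target = 0x730
print(hex(size_class(target)), hex(size_class(target + 32)))  # 0x800 0x800
```

Under this assumption both the target allocation and a parameter allocation with 32 extra bytes fall into the same class, so they would be served from the same pool of <span class="code_single-line">0x800</span> byte regions.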
After some back and forth, tweaking the sizes and checking the results we had the following two requests.
We sent the requests and put a breakpoint just after our <span class="code_single-line">0x0a0d</span> overwrite.
With this we could reliably redirect the <span class="code_single-line">r13</span> pointer to a buffer we controlled. Now we just had to fill the buffer with our payload and we should have remote code execution.
<div id="calling-system"></div>
Calling System
We tweaked the form parameter to contain our pointer chain which would call <span class="code_single-line">system</span>. This was done by manually adding and removing padding either side until the value was aligned. We ended with the following request.
We had to change the padding from "A" to "B" because of a check that a specific byte in our buffer ANDed with <span class="code_single-line">0x2</span> was not zero. "A" was <span class="code_single-line">0x41</span> and didn't meet this requirement.
We stepped through the pointer chain up to the call to <span class="code_single-line">system</span> and saw that the first argument, <span class="code_single-line">rdi</span>, already pointed to our buffer.
We wrote in a payload and it worked, but then realised we had made a mistake. <span class="code_single-line">system</span> always runs <span class="code_single-line">/bin/sh</span>, which we had modified. The original <span class="code_single-line">/bin/sh</span> was a custom application that would only run a few commands.
Calling <span class="code_single-line">system</span> wasn't going to get us remote code execution. We would have to try a different approach.
<div id="not-giving-up"></div>
Not Giving Up
While this was quite disheartening, we weren't ready to give up. There were loads of other dynamically linked functions we could call. We looked for any that took a string as the first argument, but found none were that interesting.
Previous FortiGate exploits often overwrote a function pointer in an <span class="code_single-line">SSL</span> struct which would then be triggered by a call to <span class="code_single-line">SSL_do_handshake</span>. We didn't consider this originally because we didn't think we could overwrite this struct with just <span class="code_single-line">0x0a0d</span>.
However, we realised that since <span class="code_single-line">SSL_do_handshake</span> was dynamically linked we could call it ourselves. We controlled the first argument and just had to forge an SSL struct with the function pointer where we wanted it.
First we calculated the start of the PLT/GOT pointer chain to call <span class="code_single-line">SSL_do_handshake</span> as <span class="code_single-line">0x42ce60</span>. We then started stepping through <span class="code_single-line">SSL_do_handshake</span> to see what parts of the SSL struct we needed to set in order to call the function pointer.
Below is a simplified version of <span class="code_single-line">SSL_do_handshake</span>. We wanted to call <span class="code_single-line">handshake_func</span> at the end of the function. It's a short function, but still requires some work, most notably the function pointer call to <span class="code_single-line">ssl_renegotiate_check</span>.
To avoid a segfault on <span class="code_single-line">ssl_renegotiate_check</span> we used the same trick we used to call <span class="code_single-line">SSL_do_handshake</span>. It didn't matter what we called as long as it didn't break anything. The assembly for <span class="code_single-line">s->method->ssl_renegotiate_check(s, 0);</span> is:
So we grabbed the PLT/GOT pointer for an innocuous function, <span class="code_single-line">getcwd</span>, and subtracted <span class="code_single-line">0x60</span> from it, which gave us <span class="code_single-line">0x42c6270</span>. After aligning everything again, we called <span class="code_single-line">SSL_do_handshake</span> and saw the following in the debugger.
Next was <span class="code_single-line">SSL_in_init</span> which was the following:
This was easy to achieve as none of our padding bytes were zero and the check always evaluated to true.
Last was the async job check <span class="code_single-line">sc->mode & SSL_MODE_ASYNC</span>, which was the following assembly.
It checked whether a specific byte somewhere in our buffer had the lowest bit set. This was not a problem because we wanted the check to fail and all our padding bytes were <span class="code_single-line">0x42</span>.
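Taken together, the padding byte has to satisfy three bit-level conditions described so far. The sketch below is a simplification: it assumes each check happens to land on a padding byte, whereas the real checks read specific offsets in the buffer, and <span class="code_single-line">padding_byte_ok</span> is a hypothetical helper.

```python
def padding_byte_ok(value: int) -> bool:
    """Check one padding byte against the three conditions from the text."""
    nonzero = value != 0              # keeps the SSL_in_init check true
    bit1_set = (value & 0x2) != 0     # the earlier check against 0x2
    bit0_clear = (value & 0x1) == 0   # makes the async mode check fail
    return nonzero and bit1_set and bit0_clear


for letter in "AB":
    print(letter, hex(ord(letter)), padding_byte_ok(ord(letter)))
```

Here "A" (<span class="code_single-line">0x41</span>) fails because bit 1 is clear, while "B" (<span class="code_single-line">0x42</span>) passes all three conditions.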
We stepped through to the <span class="code_single-line">handshake_func</span> call and saw we had loaded in an address from our buffer. Now for the first time we could direct execution to an arbitrary address.
<div id="rop-chain-time"></div>
ROP Chain Time
From here it was mostly smooth sailing. We needed to build a ROP chain that would set up and call <span class="code_single-line">execl</span> with the same Node.js reverse shell as previous FortiGate exploits but modified to run <span class="code_single-line">/bin/node</span> instead of <span class="code_single-line">/bin/sh</span>. The <span class="code_single-line">/bin/init</span> binary is huge so there was no shortage of gadgets.
We looked at the registers just before the <span class="code_single-line">jmp rax</span> and saw that <span class="code_single-line">rdi</span> still pointed to our buffer. Using ropr we found a gadget to pivot the stack to our buffer with <span class="code_single-line">push rdi; pop rsp; ret;</span>.
After this pivot, space was tight so we used another stack pivot <span class="code_single-line">add rsp, 0x2a0; pop rbx; pop r12; pop rbp; ret;</span> to advance the stack forward. This gave us plenty of room.
We wanted to set up this call, <span class="code_single-line">execl("/bin/node", "/bin/node", "-e", "..js reverse shell..", 0)</span>, which meant setting the registers as follows:
- <span class="code_single-line">rdi</span> = pointer to "/bin/node"
- <span class="code_single-line">rsi</span> = pointer to "/bin/node"
- <span class="code_single-line">rdx</span> = pointer to "-e"
- <span class="code_single-line">rcx</span> = pointer to "..js reverse shell.."
- <span class="code_single-line">r8</span> = 0
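The register assignment above follows the System V AMD64 calling convention, where the first six integer or pointer arguments go in <span class="code_single-line">rdi</span>, <span class="code_single-line">rsi</span>, <span class="code_single-line">rdx</span>, <span class="code_single-line">rcx</span>, <span class="code_single-line">r8</span> and <span class="code_single-line">r9</span>. A short Python sketch of that mapping, using a hypothetical helper name and the placeholder strings from the call above:

```python
SYSV_ARG_REGISTERS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")


def assign_argument_registers(args: list[object]) -> dict[str, object]:
    """Map positional integer/pointer arguments to System V AMD64 registers."""
    if len(args) > len(SYSV_ARG_REGISTERS):
        raise ValueError("further arguments would be passed on the stack")
    return dict(zip(SYSV_ARG_REGISTERS, args))


# Placeholder strings stand in for pointers to those strings in memory.
execl_args = ["/bin/node", "/bin/node", "-e", "..js reverse shell..", 0]
layout = assign_argument_registers(execl_args)
for register, value in layout.items():
    print(f"{register} = {value!r}")
```

The trailing zero matters because <span class="code_single-line">execl</span> takes a variable-length argument list that must be terminated with a null pointer.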
Starting with <span class="code_single-line">rcx</span>, we created the following gadget chain. This would copy our buffer pointer in <span class="code_single-line">rdi</span> to <span class="code_single-line">rax</span>, shift it back <span class="code_single-line">0x2b8</span> bytes, then OR it into <span class="code_single-line">rcx</span>.
Next was <span class="code_single-line">rdx</span>. After the previous gadget the value of <span class="code_single-line">rax</span> was one, so we shift it left to equal 16, OR <span class="code_single-line">rcx</span> into <span class="code_single-line">rdx</span> and then subtract <span class="code_single-line">rax</span> from <span class="code_single-line">rdx</span>. <span class="code_single-line">rdx</span> and <span class="code_single-line">rax</span> now point to 16 bytes before <span class="code_single-line">rcx</span>, which is plenty of room for "-e".
Next was <span class="code_single-line">rsi</span>. We move <span class="code_single-line">rax</span> back another 16 bytes then copy it to <span class="code_single-line">rsi</span> with an ADD because <span class="code_single-line">rsi</span> is zero at this point.
Lastly <span class="code_single-line">rdi</span> and <span class="code_single-line">r8</span>, copy <span class="code_single-line">rax</span> to <span class="code_single-line">rdi</span>, then set <span class="code_single-line">r8</span> to zero by popping a zero.
Before we can call <span class="code_single-line">execl</span> we need to move the stack pointer again because it is too close to the arguments. Calling <span class="code_single-line">execl</span> will clobber the payload as part of its execution.
We pivot one last time with <span class="code_single-line">add rsp, 0xd90; pop rbx; pop r12; pop rbp; ret;</span> then return to <span class="code_single-line">execl</span> at <span class="code_single-line">0x43c180</span>. It was probably possible to do this third pivot before the start of the argument setup and shift the whole chain, but writing the exploit had already taken long enough.
We ended with the following payload. We found that moving the payload from the form name to the form value helped with heap allocation, but it wasn't required.
We started a netcat listener, ran the exploit and finally caught the reverse shell.
<div id="conclusion"></div>
Conclusion
This was another case of a network / security appliance having a pretty serious memory corruption vulnerability. It's also far from the first for FortiGate. As is often the case with these issues the mitigations are known, it's just whether or not they are applied. Stack canaries were present, but ASLR was not.
It seems like a lot of effort has been spent on preventing access to the filesystem; setting up the debugger was a significant portion of the time spent on this vulnerability. Would that effort be better spent on auditing and hardening the applications themselves?
Not much has been released in terms of IOCs for this vulnerability. However, watching for new Node.js processes may be beneficial as this isn't the first FortiGate exploit where this technique has been useful.
As always, customers of our Attack Surface Management platform were the first to know when this vulnerability affected them. We continue to perform original security research in an effort to inform our customers about zero-day vulnerabilities in their attack surface.