This blog post provides an overview of Dependency Confusion attacks and explains in detail how they can be exploited in the wild, with examples using NPM packages and tips to prevent these vulnerabilities from occurring.
Author: Lucas Morais
In software development, dependencies are the external pieces of software that a program needs in order to work. They typically perform common, necessary tasks and are often developed by entire communities or internally within companies.
This code is kept in a central location from which programmers import it into their projects. Managing packages (as dependencies are also known) can be cumbersome in large projects, so it is common to use package managers.
A package manager is a tool that handles the tasks involved in managing dependencies, such as publishing, installing, and removing packages.
Using third-party code is a common and necessary activity in software development, but how can an attacker exploit this well-established part of the software creation process?
On February 9, 2021, Alex Birsan published Dependency Confusion: How I Hacked Into Apple, Microsoft and Dozens of Other Companies. In it he showed that some dependency-related configurations can put a system at risk, and how he exploited them to gain access to the internal infrastructure of large companies after learning the names of packages they used internally.
How does dependency confusion work?
Dependency confusion arises when a package manager installs a package from a public source rather than the usual private repository. In large companies, it is common to use packages created in-house that can only be used in their infrastructure.
These packages are stored in internal repositories and can only be imported by the company's programmers, so there is no public record of them in the language's package registry.
Each programming language has its own package managers; some of the best known are pip for Python, npm for Node.js, and RubyGems for Ruby.
The problem is that the internal package is not registered in the language's public registry. If an attacker knows the name of a dependency used internally and the environment is not configured securely, the attacker can publish a public package with the same name and a higher version number. When the package is then installed or updated inside the company, the package manager downloads the public version because it is higher, and that version contains the attacker's code.
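The version comparison described above can be sketched in a few lines. This is a simplified model, not the actual resolution logic of npm or pip: the function names, the registry labels, and the package name acme-internal-utils are all hypothetical, and versions are assumed to be plain dotted integers.

```python
def parse_version(text):
    """Turn '1.2.3' into (1, 2, 3) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def naive_resolve(name, registries):
    """Pick the highest version of `name` across every configured registry.

    `registries` maps a registry label to {package name: [versions]}.
    This models a misconfigured client that treats all sources as equal.
    """
    candidates = [
        (parse_version(version), label)
        for label, packages in registries.items()
        for version in packages.get(name, [])
    ]
    if not candidates:
        raise LookupError(f"{name} not found in any registry")
    version, label = max(candidates)
    return ".".join(str(part) for part in version), label


# Hypothetical data: an internal package and an attacker's public copy.
registries = {
    "private": {"acme-internal-utils": ["1.0.0", "1.2.0"]},
    "public": {"acme-internal-utils": ["99.0.0"]},
}

print(naive_resolve("acme-internal-utils", registries))
```

In this model the public copy wins only because its version number is higher; nothing about the package's origin is taken into account.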
An attacker can find dependency names in package.json files. Another file worth adding to a content-discovery wordlist is package-lock.json, which npm generates automatically whenever it modifies the dependency tree.
Application error responses can also leak the names of the modules in use, as this image from a Stack Overflow question shows:
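The same inspection is useful for defenders who want to know which names an exposed manifest would reveal. The sketch below reads a package.json document and lists the declared dependency names, separating out the unscoped ones; the manifest contents and the helper names are hypothetical.

```python
import json

SECTIONS = ("dependencies", "devDependencies", "optionalDependencies", "peerDependencies")


def dependency_names(manifest_text):
    """Return the sorted dependency names declared in a package.json document."""
    manifest = json.loads(manifest_text)
    names = set()
    for section in SECTIONS:
        names.update(manifest.get(section, {}))
    return sorted(names)


def unscoped(names):
    """Keep only names without an @scope/ prefix."""
    return [name for name in names if not name.startswith("@")]


# Hypothetical manifest used only for illustration.
EXAMPLE = """
{
  "name": "acme-webapp",
  "dependencies": {"express": "^4.18.0", "acme-internal-utils": "1.2.0"},
  "devDependencies": {"@acme/lint-rules": "2.0.0"}
}
"""

names = dependency_names(EXAMPLE)
print(names)
print(unscoped(names))
```

Unscoped names that do not exist in the public registry are the ones most exposed to this attack.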
Exploiting Dependency Confusion
To validate the vulnerability, an internal registry was set up with the Verdaccio tool, along with a package that has no public record and can only be used internally.
sudo docker run -it --rm --name verdaccio -p 4873:4873 verdaccio/verdaccio
After discovering the package name, an attacker can check whether it is registered in the public registry of the package manager used by the application, in this case npm.
After confirming that no public package with that name exists, the attacker can create one with the npm init command, giving it a higher version than the one used internally. A badly configured package manager will then pick the higher version when installing or updating the dependency.
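A team can run the same lookup against its own internal package names to see which ones are unclaimed publicly. The sketch below assumes that the public npm registry serves package metadata at https://registry.npmjs.org/<name> and answers 404 for unknown names, and it treats any 404 as "unclaimed", which is a simplification. The helper names are hypothetical, and no request is made until is_unclaimed is called.

```python
import urllib.error
import urllib.parse
import urllib.request

PUBLIC_REGISTRY = "https://registry.npmjs.org"


def metadata_url(name, registry=PUBLIC_REGISTRY):
    """Build the metadata URL for a package; the slash in scoped names is escaped."""
    return f"{registry}/{urllib.parse.quote(name, safe='@')}"


def http_status(url):
    """Return the HTTP status code of a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def is_unclaimed(name, fetch_status=http_status):
    """True when the registry answers 404 for this name."""
    return fetch_status(metadata_url(name)) == 404


# Example call (performs a network request):
# is_unclaimed("acme-internal-utils")
```

The fetch_status parameter exists so the check can be exercised without network access, for instance by passing a stub that returns a fixed status code.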
The package.json file generated for an npm package has a property called scripts. Here we use its preinstall entry to run a command and verify whether remote code execution is possible.
Interactsh, a tool for out-of-band data extraction, was used for this check.
The command entered verifies code execution by sending out the contents of the passwd file from the /etc directory, with the hostname carried in the subdomain.
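Because these scripts run automatically at install time, it can help to review them before installing an unfamiliar package. The sketch below lists the install-time entries declared in a package.json document; the manifest and its placeholder command are hypothetical, and only the preinstall, install, and postinstall entries are considered.

```python
import json

# Script names that npm runs automatically while installing a package.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def install_hooks(manifest_text):
    """Return the install-time scripts declared in a package.json document."""
    scripts = json.loads(manifest_text).get("scripts", {})
    return {name: scripts[name] for name in INSTALL_HOOKS if name in scripts}


# Hypothetical manifest; the command is a harmless placeholder.
EXAMPLE = """
{
  "name": "acme-internal-utils",
  "version": "99.0.0",
  "scripts": {"preinstall": "node collect.js", "test": "jest"}
}
"""

print(install_hooks(EXAMPLE))
```

npm can also be told not to run such scripts at all with npm install --ignore-scripts, at the cost of breaking packages that legitimately rely on them.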
Publishing requires an account at https://www.npmjs.com/, and the package can only be published after the account has been verified by email, as seen in the image below:
This time, when consulting the package on npm, it is possible to verify its existence.
After publication, the attacker has to wait for a developer or a continuous integration (CI) system with an incorrectly configured private registry to install or update the package, at which point the server receives the pingback.
When installing the package, there doesn’t seem to be any problem:
However, the machine that installed it executed the commands entered in the package.json file and communicated with the configured server. At the end of the image, you can see that the hostname blaze-machine was passed as a subdomain.
The contents of the passwd file are also displayed in the request made by the machine that installed the package.
The same vulnerability can be exploited in Python by placing code in the package's setup.py file. When the package is installed with the --extra-index-url parameter, pip also consults the public index and downloads the version published there if it is higher.
Preventing Dependency Confusion
npm allows the creation of scoped and unscoped packages. With scoped packages, only members of the organization can publish in that scope. To create a scoped package, the command npm init --scope=@my-org is used, where my-org refers to the organization.
Mapping the scope to the private registry in .npmrc strengthens the configuration. It is also necessary to configure the private registry on CI systems and to enforce the setting in the .npmrc file. Another important measure is to reference a single private feed rather than multiple feeds.
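A CI job could verify that the scope mapping is present before installing anything. The sketch below parses the key=value lines of an .npmrc file and looks up the registry assigned to a scope, relying on npm's @scope:registry=URL syntax. The host name npm.internal.example and the helper names are placeholders.

```python
def parse_npmrc(text):
    """Parse key=value lines from an .npmrc file, skipping comments and blanks."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def scope_registry(settings, scope):
    """Return the registry a scope is mapped to, or None if it is not mapped."""
    return settings.get(f"{scope}:registry")


# Hypothetical configuration; the host name is a placeholder.
NPMRC = """
; route the organization's scope to the internal registry
@my-org:registry=https://npm.internal.example/
"""

settings = parse_npmrc(NPMRC)
print(scope_registry(settings, "@my-org"))
print(scope_registry(settings, "@other"))
```

A missing mapping (None in this sketch) means packages in that scope would be requested from whatever default registry is configured.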
In Python, install packages with --index-url instead of --extra-index-url. You can also pin the exact version in the package.json or requirements.txt file, avoiding latest and >= version specifiers.
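Loose version specifiers can be flagged automatically. The sketch below is a rough check rather than a full parser: it treats a requirements.txt line as pinned only if it has the form name==version, and an npm dependency as pinned only if its specifier is a plain x.y.z version. The sample files and function names are hypothetical.

```python
import json
import re

EXACT_PIP = re.compile(r"[A-Za-z0-9._-]+(\[[^\]]*\])?\s*==\s*[0-9A-Za-z.!+-]+")
EXACT_NPM = re.compile(r"\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?")


def loose_requirements(text):
    """Return requirements.txt entries that are not pinned with name==version."""
    loose = []
    for raw in text.splitlines():
        # Drop comments and environment markers before checking the specifier.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # blank lines and option lines such as --index-url
        if not EXACT_PIP.fullmatch(line):
            loose.append(line)
    return loose


def loose_npm_dependencies(manifest_text):
    """Return package.json dependencies whose specifier is not an exact version."""
    deps = json.loads(manifest_text).get("dependencies", {})
    return {name: spec for name, spec in deps.items() if not EXACT_NPM.fullmatch(spec)}


# Hypothetical inputs used only for illustration.
REQUIREMENTS = """
--index-url https://pypi.internal.example/simple
requests==2.31.0
acme-internal-utils>=1.0
flask
"""
MANIFEST = '{"dependencies": {"express": "^4.18.0", "acme-internal-utils": "1.2.0", "left-pad": "latest"}}'

print(loose_requirements(REQUIREMENTS))
print(loose_npm_dependencies(MANIFEST))
```

Wildcard pins such as ==1.* are reported as loose as well, since they still allow a newer release to be selected.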
A lockfile is another important measure: it records the exact version of every dependency, so installs that follow it strictly (for example npm ci) do not fetch a newer version from the public registry. This file must be committed with the project.
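A lockfile can also be audited for where each package came from. The sketch below assumes the "packages" layout used by package-lock.json versions 2 and 3, where each entry may carry a "resolved" URL, and reports entries fetched from a host other than the expected one. The lockfile contents, the host npm.internal.example, and the function name are hypothetical.

```python
import json
from urllib.parse import urlparse


def foreign_sources(lockfile_text, allowed_host):
    """List lockfile entries whose tarball was resolved from an unexpected host."""
    packages = json.loads(lockfile_text).get("packages", {})
    findings = {}
    for path, entry in packages.items():
        resolved = entry.get("resolved")
        if resolved and urlparse(resolved).hostname != allowed_host:
            findings[path] = resolved
    return findings


# Hypothetical lockfile excerpt used only for illustration.
LOCKFILE = """
{
  "lockfileVersion": 3,
  "packages": {
    "": {"name": "acme-webapp", "version": "1.0.0"},
    "node_modules/acme-internal-utils": {
      "version": "99.0.0",
      "resolved": "https://registry.npmjs.org/acme-internal-utils/-/acme-internal-utils-99.0.0.tgz"
    },
    "node_modules/@acme/lint-rules": {
      "version": "2.0.0",
      "resolved": "https://npm.internal.example/@acme/lint-rules/-/lint-rules-2.0.0.tgz"
    }
  }
}
"""

print(foreign_sources(LOCKFILE, "npm.internal.example"))
```

In a project that mixes internal and public dependencies, the check would need a list of internal names so that only those are required to come from the private host.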
Package managers can provide further protections against this kind of attack. One of them is pip's Hash-Checking Mode, which checks downloaded packages against local hashes to protect against remote tampering, as described in the documentation: https://pip.pypa.io/en/stable/cli/pip_install/#hash-checking-mode
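The idea behind hash checking can be illustrated with the standard library. This is a conceptual sketch, not pip's implementation: it computes the SHA-256 digest of an artifact and accepts it only if the digest is in a pinned set. The byte strings stand in for package archives and the function names are hypothetical.

```python
import hashlib


def sha256_hex(data):
    """Return the hexadecimal SHA-256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(data, expected_hashes):
    """Accept the artifact only if its SHA-256 digest is in the pinned set."""
    return sha256_hex(data) in expected_hashes


# Placeholder contents standing in for real package archives.
trusted = b"contents of the reviewed package archive"
pinned = {sha256_hex(trusted)}

print(verify_artifact(trusted, pinned))
print(verify_artifact(b"contents of a look-alike archive", pinned))
```

With pip itself, the pinned digests are written next to each requirement in the form --hash=sha256:<digest>, and pip install --require-hashes -r requirements.txt refuses anything that does not match.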
Analyzing a dependency confusion vulnerability makes clear how large its security impact can be if exploited. An attacker who achieves remote code execution, as in this case, can gain access to internal files and make changes that affect the company's operations and cause losses.
This shows the importance of caring for security from the very start of a project, since that is when existing or internally created packages are chosen, best practices for the technologies in use are defined, and new members are introduced to the internal infrastructure and its resources. It also shows the need to raise awareness among developers and those responsible for configuring environments, and to test continuously for such failures.
https://medium.com/@alex.birsan/dependency-confusion-4a5d60fec610 – Dependency Confusion: How I Hacked Into Apple, Microsoft and Dozens of Other Companies
https://developer.mozilla.org/en-US/docs/Learn/Tools_and_testing/Understanding_client-side_tools/Package_management#a_dependency_in_your_project – Package management basics
https://arxiv.org/pdf/1902.09217.pdf – Small World with High Risks: A Study of Security Threats in the npm Ecosystem