This New Supply Chain Attack Technique Can Trojanize All Your CLI Commands

MalBot · October 14, 2024, 11:05am

The open source ecosystem, due to its widespread adoption, has become a prime target for supply chain attacks. Malicious actors often exploit built-in features of open source packages to automatically distribute and execute harmful code. They particularly favor two techniques: automatic preinstall scripts that execute upon package installation, and seemingly innocent packages that import malicious dependencies.

As these tactics have become more recognizable, security tools and vigilant developers have become better at detecting them quickly. However, an often overlooked yet potentially dangerous feature remains: entry points.

This blog post explores how attackers can leverage entry points across multiple programming ecosystems, with an emphasis on PyPI, to trick victims into running malicious code. While this method doesn’t allow for immediate system compromise like automatic scripts or malicious dependencies, it offers a subtler approach for patient attackers to infiltrate systems, potentially evading standard security measures.

By understanding this lesser-known vector, we can better defend against the evolving landscape of open source supply chain attacks.

Key Points

Entry points, a powerful feature for exposing package functionality, are vulnerable to exploitation across various ecosystems including PyPI (Python), npm (JavaScript), RubyGems (Ruby), NuGet (.NET), Dart Pub, and Rust Crates.
Attackers can leverage these entry points to execute malicious code when specific commands are run, posing a widespread risk in the open-source landscape.
Attack methods include command-jacking—impersonating popular third-party tools and system commands—and targeting various stages of the development process through malicious plugins and extensions. Each approach carries varying levels of potential success and detection risk.
Entry point attacks, while requiring user interaction, offer attackers a more stealthy and persistent method of compromising systems, potentially bypassing traditional security checks.
This attack vector poses risks to both individual developers and enterprises, highlighting the need for more comprehensive Python package security measures.

Understanding Python Entry Points

Entry points are a powerful feature of the Python packaging system that allows developers to expose specific functionality as a CLI command without requiring users to know the exact import path or structure of the package.

Entry points serve several purposes, including:

Creating command-line scripts that users can run after installing a package.
Defining plugin systems where third-party packages can extend the functionality of a core package.

The most popular kind of entry point is console_scripts, which points to a function that you want to be made available as a command-line tool to whoever installs your package.

While primarily designed to enhance modularity and plugin systems, entry points can, if misused, become a vector for malicious actors to embed and execute harmful code. To understand how attackers can leverage Python entry points in their favor, let’s first understand how entry points were originally meant to work.

How Entry Points are Defined in Package Metadata

The location and format of entry point definitions can vary depending on the package format (wheel or source distribution).

Source Distributions (.tar.gz)

For source distributions, entry points are typically defined in a package’s setup configuration: setup.py or setup.cfg for traditional setups, or pyproject.toml for more modern packaging approaches.

Here’s an example of how entry points might be defined in setup.py:
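A minimal sketch of such a definition, using the names this post refers to further down (my_command, my_function, plugin_name, PluginClass). The plugin group name my_package.plugins is an assumption made for illustration, and the setuptools call is shown only as a comment so the snippet runs without setuptools installed.

```python
# Sketch of the entry point mapping a setup.py would hand to setuptools.
# The group name "my_package.plugins" is an assumption made for illustration.
entry_points = {
    # Each console_scripts entry becomes a command on the user's PATH.
    "console_scripts": [
        "my_command = my_package.module:my_function",
    ],
    # A custom group that my_package could query to discover plugins.
    "my_package.plugins": [
        "plugin_name = my_package.plugins:PluginClass",
    ],
}

# In an actual setup.py the mapping is passed to setuptools, for example:
#     from setuptools import setup
#     setup(name="my_package", packages=["my_package"], entry_points=entry_points)


def describe(mapping):
    """Return one human-readable line per declared entry point."""
    lines = []
    for group, specs in mapping.items():
        for spec in specs:
            name, target = (part.strip() for part in spec.split("=", 1))
            lines.append(f"[{group}] {name} -> {target}")
    return lines


for line in describe(entry_points):
    print(line)
```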

Wheel Files (.whl)

In a wheel file, which is a built package format, entry points are defined in the entry_points.txt file within the .dist-info directory.

Here’s how the entry_points.txt file might look for the above example:
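The file uses an INI-style layout with one section per entry point group. As a rough illustration, the sketch below renders that layout with Python's configparser. In practice the build tooling writes this file, so render_entry_points_txt is a hypothetical helper that only imitates the format, and the my_package.plugins group name is again an assumption.

```python
import configparser
import io


def render_entry_points_txt(mapping):
    """Render a {group: {name: target}} mapping in entry_points.txt (INI) form."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep entry point names case-sensitive
    for group, entries in mapping.items():
        parser[group] = entries
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


text = render_entry_points_txt({
    "console_scripts": {"my_command": "my_package.module:my_function"},
    "my_package.plugins": {"plugin_name": "my_package.plugins:PluginClass"},
})
print(text)
```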

The syntax for entry points follows the pattern name = package.module:object, where:

name: The name of the entry point (e.g., the command name for console scripts)
package.module: The Python module path
object: The object (function, class, etc.) within the module to be used
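To see how the three parts are interpreted, the standard library's importlib.metadata.EntryPoint can parse and resolve a definition. This is a small sketch assuming Python 3.9 or newer (for the module and attr attributes). The entry point is built by hand, and os.path:basename is used as the target only because it is importable everywhere.

```python
from importlib.metadata import EntryPoint

# Hypothetical entry point built by hand; "os.path:basename" is used only
# because it is a stdlib target that can be loaded on any machine.
ep = EntryPoint(name="demo_command", value="os.path:basename", group="console_scripts")

print(ep.name)    # the "name" part: the command or plugin name
print(ep.module)  # the "package.module" part
print(ep.attr)    # the "object" part

# load() imports the module and returns the referenced object.
resolved = ep.load()
print(resolved("/usr/bin/ls"))
```

The last step matters for the rest of this post: resolving an entry point means importing and running whatever code the package author pointed it at.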

In the above examples, my_command is a console script that will be created during installation. Any time after the package is installed, when a user types my_command in their terminal, it will execute my_function from my_package.module.

The plugin_name is a custom entry point that could be used by my_package to discover plugins. It points to PluginClass in my_package.plugins.

When a package is installed, these entry points are recorded in the package’s metadata. Other packages or tools can then query this metadata to discover and use the defined entry points.

If an attacker can manipulate a legitimate package’s metadata or convince a user to install a malicious package, they can potentially execute arbitrary code on the user’s system whenever the defined command or plugin is invoked. In the following section, I will provide multiple methods an attacker could use to trick someone into executing their malicious code through entry points.

Understanding CLI Commands in Operating Systems

Command-line interface (CLI) commands are the primary means by which users interact with an operating system through a text-based interface. These commands are interpreted and executed by the shell, which acts as an intermediary between the user and the operating system.

When a user enters a command, the shell follows a specific resolution mechanism to locate and execute the corresponding program. The exact order can vary slightly between different shells. For commands that are not aliases, shell functions, or shell built-ins, the shell typically checks, in order, the directories listed in the PATH environment variable and runs the first matching executable it finds. Users can view their current PATH by entering the command “echo $PATH” in their terminal (the exact command will differ between operating systems), which displays the list of directories the shell searches for executables.

This resolution process ensures that when a user types a command, the appropriate action is taken. Understanding this process is crucial when considering how Python entry points, which can create new CLI commands, might interact with or potentially interfere with existing system commands.

Terminal output on an Ubuntu system showing the ‘ls’ command execution, its PATH location using ‘which ls’, and the system’s PATH environment variable, displaying the ‘ls’ PATH priority.
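The same first-match rule can be approximated in a few lines of Python. This is a simplified, POSIX-style sketch: it ignores aliases, shell built-ins, and Windows PATHEXT handling, and find_all_on_path is a hypothetical helper. The demo uses two throwaway directories in place of real user and system bin directories.

```python
import os
import tempfile


def find_all_on_path(command, path=None):
    """Return every executable named `command`, in PATH search order."""
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # empty PATH entries are skipped in this sketch
        candidate = os.path.join(directory, command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            matches.append(candidate)
    return matches


# Demo: two throwaway directories stand in for a user bin and a system bin.
with tempfile.TemporaryDirectory() as user_bin, tempfile.TemporaryDirectory() as system_bin:
    for directory in (user_bin, system_bin):
        fake = os.path.join(directory, "ls")
        with open(fake, "w") as handle:
            handle.write("#!/bin/sh\n")
        os.chmod(fake, 0o755)
    demo = find_all_on_path("ls", path=os.pathsep.join([user_bin, system_bin]))
    demo_count = len(demo)
    # The first match is the one a shell would run.
    demo_user_first = os.path.dirname(demo[0]) == user_bin
```

Calling find_all_on_path("ls") with no path argument shows every ls on the real PATH; more than one result means an earlier entry is shadowing a later one.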

How Attackers Can Abuse Entry Points to Execute Malicious Code

Malicious actors can exploit Python entry points in several ways to trick users into executing harmful code. We’ll explore a number of tactics, including Command-Jacking, Malicious Plugins and Malicious Extensions.

Command-Jacking

Impersonating Popular Third-Party Commands

Malicious packages can use entry points to masquerade as widely-used third-party tools. This tactic is particularly effective against developers who frequently use these tools in their workflows.

For instance, an attacker might create a package with a malicious ‘aws’ entry point. When unsuspecting developers who regularly use AWS services install this package and later execute the aws command, the fake ‘aws’ command could exfiltrate their AWS access keys and secrets. This attack could be devastating in CI/CD environments, where AWS credentials are often stored for automated deployments—potentially giving the attacker access to entire cloud infrastructures.

Another example could be a malicious package impersonating the ‘docker’ command, targeting developers working with containerized applications. The fake ‘docker’ command might secretly send images or container specifications to the attacker’s server during builds or deployments. In a microservices architecture, this could expose sensitive service configurations or even lead to the exfiltration of proprietary container images.

Other popular third-party commands that could be potential targets for impersonation include, but are not limited to:

npm (Node.js package manager)
pip (Python package installer)
git (Version control system)
kubectl (Kubernetes command-line tool)
terraform (Infrastructure as Code tool)
gcloud (Google Cloud command-line interface)
heroku (Heroku command-line interface)
dotnet (Command-line interface for .NET Core)

Each of these commands is widely used in various development environments, making them attractive targets for attackers looking to maximize the impact of their malicious packages.
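From the defender's side, one simple check is to see which installed Python distribution, if any, provides a command with one of these names. The sketch below is a rough audit, not a detection tool: the watchlist just reuses the command names from this post, and the helper names are hypothetical. A hit only means the owning distribution deserves a look (pip legitimately provides pip, for example).

```python
from importlib.metadata import distributions

# Command names taken from the examples in this post; extend as needed.
WATCHLIST = {"aws", "docker", "npm", "pip", "git", "kubectl",
             "terraform", "gcloud", "heroku", "dotnet"}


def _owner_name(dist):
    """Best-effort distribution name; metadata can be missing for broken installs."""
    try:
        return dist.metadata["Name"] or "unknown"
    except Exception:
        return "unknown"


def installed_console_scripts():
    """Yield (distribution name, command name, target) for installed console scripts."""
    for dist in distributions():
        for ep in dist.entry_points:
            if ep.group == "console_scripts":
                yield _owner_name(dist), ep.name, ep.value


def flag_watchlisted(entries, watchlist=WATCHLIST):
    """Keep only entries whose command name is on the watchlist."""
    return [entry for entry in entries if entry[1] in watchlist]


for dist_name, command, target in flag_watchlisted(installed_console_scripts()):
    print(f"{command!r} is provided by {dist_name!r} ({target})")
```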

Impersonating System Commands

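By using common system command names as entry points, attackers can impersonate fundamental system utilities. Commands like ‘touch’, ‘curl’, ‘ls’, and ‘mkdir’, to name a few, could be hijacked, leading to severe security breaches when users attempt to use these fundamental tools. (Shell built-ins such as ‘cd’ are handled by the shell itself before PATH is searched, so a PATH entry with that name would generally not be picked up by an interactive shell.)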

While this method potentially provides the highest chances of the victim accidentally executing the malicious code, it also carries the highest risk of failure for the attacker. The success of this approach primarily depends on the PATH order. If the directory containing the malicious entry points appears earlier in the PATH than the system directories, the malicious command will be executed instead of the system command. This is more likely to occur in development environments where local package directories are prioritized.

Another thing to keep in mind is that globally installed packages (requiring root/admin privileges) might override system commands for all users, while user-installed packages would only affect that specific user’s environment.

Comparison of Ubuntu terminal outputs before and after installation of a malicious package. An ‘ls’ command is added at /home/ubuntu/.local/bin/ls, a location that appears earlier in the PATH than the directory of the legitimate ls command and therefore takes priority.
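Whether an environment is exposed in this way comes down to the relative position of two directories in PATH. The following sketch makes that comparison explicit. The helper names are hypothetical, and the example directories mirror the ones in the screenshot described above; sysconfig.get_path("scripts") reports where the current interpreter's environment places console scripts.

```python
import os
import sysconfig


def path_index(directory, path):
    """Return the position of `directory` in a PATH string, or None if absent."""
    target = os.path.normcase(os.path.normpath(directory))
    for index, entry in enumerate(path.split(os.pathsep)):
        if entry and os.path.normcase(os.path.normpath(entry)) == target:
            return index
    return None


def searched_before(first_dir, second_dir, path):
    """True when both directories are on PATH and first_dir is searched earlier."""
    first_at = path_index(first_dir, path)
    second_at = path_index(second_dir, path)
    if first_at is None or second_at is None:
        return False
    return first_at < second_at


# Where this interpreter's environment installs console scripts.
scripts_dir = sysconfig.get_path("scripts")
current_path = os.environ.get("PATH", "")
print("scripts dir:", scripts_dir)
print("position in PATH:", path_index(scripts_dir, current_path))
```

If searched_before(scripts_dir, "/usr/bin", current_path) returns True, a console script named after a system utility in /usr/bin would win the lookup.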

Enhancing Attacks with Command Wrapping

In each of these Command-Jacking tactics, while it’s simpler for an attacker to merely override CLI commands, the chances of remaining undetected are quite low. The moment victims can’t execute a command, they’ll likely become suspicious immediately. However, these attacks can be made much more effective and stealthy through a technique called “command wrapping.” Instead of simply replacing a command, wrapping involves creating an entry point that acts as a wrapper around the original command. Here’s how it works:

The malicious entry point is triggered when the user calls the command (whether it’s an impersonated third-party tool or an attempt to impersonate a system command).
In addition to silently executing the attacker’s malicious code, it calls the original, legitimate command with all the user’s arguments.
Finally, it returns the output and exit code of the legitimate command to the user.

This method of command wrapping is particularly dangerous as it executes malicious code without the user’s knowledge while maintaining the appearance of normal operation. Since the legitimate command still runs and its output and behavior are preserved, there’s no immediate sign of compromise, making the attack extremely difficult to detect through normal use. This stealthy approach allows attackers to maintain long-term access and potentially exfiltrate sensitive information without raising suspicion.
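Because a wrapper preserves the command's visible behavior, inspecting the file that actually gets executed is more informative than watching its output. On POSIX systems, console scripts generated for Python packages are usually small text files whose first line is a shebang pointing at a Python interpreter, whereas utilities such as ls are normally native binaries. The sketch below applies that heuristic. It is an assumption-laden check rather than a reliable detector: legitimate Python tools will also match, and it does not apply to the .exe launchers used on Windows.

```python
import os
import tempfile


def looks_like_python_launcher(path):
    """Heuristic: True if the file starts with a shebang line that mentions python."""
    with open(path, "rb") as handle:
        first_line = handle.readline(256)
    return first_line.startswith(b"#!") and b"python" in first_line.lower()


# Demo on two throwaway files: a launcher-style script and a binary-looking file.
with tempfile.TemporaryDirectory() as workdir:
    launcher = os.path.join(workdir, "ls")
    with open(launcher, "wb") as handle:
        handle.write(b"#!/usr/bin/python3\nimport sys\n")
    native = os.path.join(workdir, "ls-native")
    with open(native, "wb") as handle:
        handle.write(b"\x7fELF\x02\x01\x01")
    launcher_flagged = looks_like_python_launcher(launcher)
    native_flagged = looks_like_python_launcher(native)
```

A Python launcher sitting at the path that resolves for ls, curl, or git is worth investigating.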

However, implementing command wrapping requires additional research by the attacker. They need to understand the correct paths for the targeted commands on different operating systems and account for potential errors in their code. This complexity increases with the diversity of systems the attack targets.

An alternative approach, depending on the command being hijacked, is for the malicious package to not only perform its covert operations but also replicate some or all of the functionality of the original command. Instead of calling the real command, the wrapper simulates its behavior. This method could further decrease suspicion, especially for simpler commands, but it requires more effort from the attacker to accurately mimic the original command’s behavior across various scenarios.

The success of these attacks ultimately depends on the malicious package being installed and its scripts directory being prioritized in the system’s PATH.

Malicious Plugins & Extensions

Another powerful technique for abusing entry points is through the creation of malicious plugins for popular Python tools and frameworks. This approach can be particularly dangerous as it targets the development and testing process itself.

Manipulating pytest

As an example, let’s consider how an attacker might target pytest, a widely-used testing framework in the Python ecosystem. By creating a malicious pytest plugin, an attacker could potentially compromise the integrity of the entire testing process.

Here’s how such an attack could work:

The attacker creates a plugin that uses pytest’s entry point system to inject malicious code.
This plugin is distributed as a seemingly helpful testing utility.
Once installed, the plugin can manipulate various aspects of the testing process such as assertion handling.

The malicious plugin could then stealthily run malicious code in the background during testing. It could also override pytest’s assertion comparison so that, for example, all equality checks pass regardless of their actual values. Tests would then pass when they should fail, allowing buggy or vulnerable code to clear quality checks unnoticed.
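pytest discovers installed plugins through the pytest11 entry point group, so the set of plugins that will be auto-loaded can be listed before any test runs. The sketch below does that, and also checks the PYTEST_DISABLE_PLUGIN_AUTOLOAD environment variable, which pytest documents as a way to switch off entry point based plugin loading. Both helper names are hypothetical.

```python
import os
from importlib.metadata import distributions


def pytest_plugin_entry_points():
    """Return (distribution, plugin name, target) for each pytest11 entry point."""
    found = []
    for dist in distributions():
        for ep in dist.entry_points:
            if ep.group == "pytest11":
                try:
                    owner = dist.metadata["Name"] or "unknown"
                except Exception:  # metadata can be missing for broken installs
                    owner = "unknown"
                found.append((owner, ep.name, ep.value))
    return sorted(found)


def autoload_disabled(environ=os.environ):
    """True when PYTEST_DISABLE_PLUGIN_AUTOLOAD is set to a non-empty value."""
    return bool(environ.get("PYTEST_DISABLE_PLUGIN_AUTOLOAD"))


print("plugin autoload disabled:", autoload_disabled())
for owner, name, target in pytest_plugin_entry_points():
    print(f"{name} from {owner} -> {target}")
```

Any plugin in this list that the project did not deliberately install is a candidate for review.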

In the following video demonstration, we showcase how such a malicious plugin can target pytest’s assertion handling, allowing an attacker to manipulate test results without alerting the developers. In this example, a developer was attempting a simple test scan of a basic calculator package.

Manipulating Flake8

Attackers can also target popular development tools, manipulating them to run malicious extensions. Flake8, a widely-used linting tool in the Python ecosystem, is one such example. Since Flake8 uses entry points to discover and load extensions, it becomes a potential target for malicious actors.

An attacker might exploit Flake8 by creating a malicious extension disguised as helpful linting rules. This extension would be defined as an entry point in the package’s setup configuration. For example, the setup file might specify an entry point named ‘MCH’, pointing to a malicious checker class within the package.

The malicious checker’s implementation could include functionality to perform harmful actions on the victim’s system, inject malicious “fixes” into the code, or manipulate linting results to hide or create issues. When a user runs Flake8 on their codebase, this malicious extension would activate, allowing the attacker to execute their harmful code.

This attack is particularly dangerous because linting tools often run on entire codebases, giving the attacker broad access to the source code. Moreover, the attack can be perpetrated through seemingly helpful linting rules, making it less likely to raise suspicion. It could serve as part of a larger supply chain attack to gather intelligence or introduce vulnerabilities into the target’s codebase.
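Flake8 looks up checkers in the flake8.extension entry point group, which makes an allowlist comparison straightforward. The sketch below flags extensions whose owning distribution is not expected. The allowlist contents and the names in the sample data (such as helpful-lint-rules) are hypothetical and would need to reflect the plugins a project actually uses.

```python
from importlib.metadata import distributions

# Hypothetical allowlist: distributions this project expects to extend Flake8.
ALLOWED = {"flake8", "mccabe"}


def _normalize(name):
    """Compare distribution names case-insensitively, treating _ and - alike."""
    return name.lower().replace("_", "-")


def installed_flake8_extensions():
    """Return (distribution, code prefix, target) for flake8.extension entry points."""
    found = []
    for dist in distributions():
        for ep in dist.entry_points:
            if ep.group == "flake8.extension":
                try:
                    owner = dist.metadata["Name"] or "unknown"
                except Exception:  # metadata can be missing for broken installs
                    owner = "unknown"
                found.append((owner, ep.name, ep.value))
    return found


def unexpected_extensions(found, allowed=ALLOWED):
    """Return entries whose owning distribution is not on the allowlist."""
    allowed_names = {_normalize(name) for name in allowed}
    return [entry for entry in found if _normalize(entry[0]) not in allowed_names]


for owner, prefix, target in unexpected_extensions(installed_flake8_extensions()):
    print(f"unexpected Flake8 extension {prefix!r} from {owner!r} -> {target}")
```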

Working around .whl File Limitations

Python wheels (.whl files) have become increasingly prevalent due to their performance benefits in package installation. However, they present a unique challenge for attackers.

While both .tar.gz and .whl files may contain a setup.py file, .whl files don’t execute setup.py during installation. This characteristic has traditionally made it more difficult for attackers to achieve arbitrary code execution during the installation process when using .whl files.

However, the entry point attack method we’ve discussed provides a workaround for this limitation. By manipulating entry points, attackers can ensure their code is executed when specific commands are run, even if the package is distributed as a .whl file. This is particularly significant because when developers build a Python package using commands like “python -m build”, the build tool creates both .tar.gz and .whl files by default. Additionally, pip prefers the .whl file during installation when a compatible one is available.

This shift in package format and installation behavior presents a new opportunity for attackers. Many security tools focus on analyzing execution of preinstall scripts during installation, which are typically associated with .tar.gz files. As a result, they may miss malicious code in packages distributed as .whl files, especially when the malicious behavior is triggered through entry points rather than immediate execution.
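Since a wheel is a zip archive and its entry points sit in a plain text file, they can be reviewed without installing the package or executing any of its code. The sketch below reads entry_points.txt straight from the archive. wheel_entry_points is a hypothetical helper, and the demo builds a tiny in-memory archive for a made-up demo_pkg rather than using a real wheel.

```python
import configparser
import io
import zipfile


def wheel_entry_points(wheel_file):
    """Return {group: {name: target}} read from a wheel, without installing it."""
    result = {}
    with zipfile.ZipFile(wheel_file) as archive:
        for member in archive.namelist():
            if member.endswith(".dist-info/entry_points.txt"):
                parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
                parser.optionxform = str  # keep entry point names case-sensitive
                parser.read_string(archive.read(member).decode("utf-8"))
                for group in parser.sections():
                    result[group] = dict(parser[group])
    return result


# Demo: a tiny in-memory archive laid out like a wheel (hypothetical package).
demo_wheel = io.BytesIO()
with zipfile.ZipFile(demo_wheel, "w") as archive:
    archive.writestr(
        "demo_pkg-0.1.0.dist-info/entry_points.txt",
        "[console_scripts]\nls = demo_pkg.cli:main\n",
    )
demo_wheel.seek(0)
demo_entry_points = wheel_entry_points(demo_wheel)
print(demo_entry_points)
```

In the demo output, a console script named ls in a package that has nothing to do with listing files is exactly the kind of signal worth flagging before installation.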

Entry Points in Other Ecosystems

While this blog primarily focuses on Python, the exploitation of entry points for malicious purposes extends beyond the Python ecosystem. Through our research, we have confirmed that this type of attack vector exists in several other major ecosystems, including:

npm (JavaScript), RubyGems (Ruby), NuGet (.NET), Dart Pub, and Rust Crates. The issue may not be limited to these ecosystems.

Understanding how entry points function across various programming languages and package managers is crucial for grasping the widespread nature of this potential security risk and for developing comprehensive defensive strategies.

Conclusion

Entry points, while a powerful and useful feature for legitimate package development, can also be manipulated to deliver malicious code across multiple programming ecosystems.

Attackers could exploit this mechanism through various methods, including Command-Jacking and the creation of malicious plugins and extensions for popular development tools.

Moving forward, it’s crucial to develop comprehensive security measures that account for entry point exploitation. By understanding and addressing these risks, we can work towards a more secure Python packaging environment, safeguarding both individual developers and enterprise systems against sophisticated supply chain attacks.

As part of the Checkmarx Supply Chain Security solution, our research team continuously monitors suspicious activities in the open-source software ecosystem. We track and flag “signals” that may indicate foul play, including suspicious entry points, and promptly alert our customers to help protect them from potential threats.

Article Link: Command-Jacking: The New Supply Chain Attack Technique