Threat-Informed Defense to Secure AI

Written by Tabitha Colter, Shiri Bendelac, Lily Wong, Christina Liaghati, & Keith Manville

The Secure AI research project is a collaborative effort between MITRE ATLAS™ and the Center for Threat-Informed Defense (Center) designed to facilitate rapid communication of evolving vulnerabilities in the AI security space through effective incident sharing. This research effort will boost community knowledge of threats to Artificial Intelligence-enabled systems. With AI technology and adoption advancing exponentially across critical domains, new threat vectors and vulnerabilities are emerging every day, and they require novel security procedures.

In partnership with AttackIQ, BlueRock, Booz Allen Hamilton, CATO Networks, Citigroup, CrowdStrike, FS-ISAC, Fujitsu, HCA Healthcare, HiddenLayer, Intel Corporation, JPMorgan Chase Bank, Microsoft Corporation, Standard Chartered, and Verizon Business, we deployed a system for improved AI incident capture and added new case studies, techniques, and mitigations to the ATLAS knowledge base. These case studies illustrate novel AI attack procedures that organizations should be aware of and defend against.

Rapid Information Sharing for AI Incidents

Organizations across government, industry, academia, and nonprofit sectors continue to incorporate AI components into their software systems. Commensurately, incidents involving these systems will increasingly occur. Standardized and rapid information sharing about these AI incidents will empower the entire community to improve the collective defense of such systems and prevent external harms. Sharing information about the affected AI artifacts, affected system and users, attacker, and incident detection can be vital to improving those defenses. For this reason, we focused on aligning the capture of information for AI incident expression with existing cybersecurity standards, using a STIX 2.1 bundle as our basis.

Incident Sharing Portal Landing Page

This project developed the AI Incident Sharing initiative as a mechanism for a community of trusted contributors to both receive and share protected and anonymized data on real world AI incidents that are occurring across operational AI-enabled systems. Just as MITRE operates CVE for the cyber community or ASIAS for the aviation community, this AI Incident Sharing initiative will serve as the safe space for AI assurance incident sharing at the intersection of the industry, government, and extended community. In capturing and carefully distributing the appropriately sanitized and technically focused AI incident data, this effort aims to enable more data driven risk intelligence and analysis at scale across the community.

The first version of the AI Incident Sharing website launched in September at https://ai-incidents.mitre.org/.

Report an AI Incident

Case Studies About Attacks Against AI-Enabled Systems Expand Community Knowledge

Since AI-enabled systems are susceptible to both traditional cybersecurity vulnerabilities and new attacks that exploit unique characteristics of AI, we knew mapping these new threats would be a crucial aspect of securing organizations against those unique and emergent attack surfaces. For this release, we identified a swath of new case studies based on real-world attacks or realistic red teaming exercises that are designed to inform organizations about the latest threats to AI-enabled systems. We highlight one case study below and the rest are published as part of ATLAS’s most recent update:

  • ShadowRay AI Infrastructure Data Leak: In late 2023, the Ray software team addressed multiple newly discovered vulnerabilities, including a lack of Authorization in the Jobs API that was not included in security scans because of an ongoing dispute about whether it was a feature or a vulnerability. As a result, unknown attackers were able to use the vulnerability over the span of 7 months to invoke arbitrary jobs on the remote host with access to Ray production clusters, allowing for the theft of sensitive information and unauthorized access to compute power to mine cryptocurrency. User cost to pay for hijacked machines and compute time was estimated at almost $1 billion.

Through our research on these case studies, we added the following new techniques to the ATLAS matrix:

  • Acquire Infrastructure: Domains — Tactic: Resource Development
  • Erode Database Integrity — Tactic: Impact
  • Discover LLM Hallucinations
  • Publish Hallucinated Entities

The Secure AI research participants also collaborated to identify other cutting-edge threats against AI-enabled systems such as:

  • Privacy/Membership Inference Attacks: Organizations need to prepare for membership inference attacks. Booz Allen Hamilton gathered resources that illustrate the existing Exfiltration via ML Inference APIL: Infer Training Data Membership technique. These resources highlight the way that adversaries can infer the membership of a data sample within a model’s training set, raising privacy concerns and risking the leak of private information or intellectual property. Another example involved researchers recovering over 10,000 examples of training data from ChatGPT for only $200 in query cost. The membership inference attack was then used to distinguish which examples were hallucinated and which came from ground truth memorized examples, exfiltrating data that contained personally identifiable information.
  • LLM Behavior Modification: Standard Chartered helped identify the potential for attackers to modify the behavior of large language models (LLMs) whose model weights are accessible and modifiable by attackers. Malicious threat actors can then use known techniques to effectively “kill switch” current alignment methods and gain access to powerful models that are now unaligned. Potential uses for this type of unaligned model include assistance with cyber attacks, biological or chemical weapon production, and human sentiment manipulation.
  • LLM Jailbreaking: Verizon showcased an attacker who repeatedly posts demonstrations of jailbreaking the latest Generative AI models including text and image generators. The X user — Pliny the Prompter — has also released a repository of jailbreaking methods that can be used on frontier models on GitHub and further illustrate the LLM Jailbreak technique.
  • Tensor Steganography: Borrowing on the concept of steganography from cybersecurity, tensor steganography allows an attacker to hide either data or malware within a model. In a recent example, researchers at HiddenLayer hid malware that executed quantum ransomware automatically within ResNet18, an open-source image recognition model.

Relevant Mitigations

Identifying novel vulnerabilities and attack methods is an important first step in improving the security of our AI-enabled systems. However, the increased adoption of AI within existing systems means that the mitigation of those vulnerabilities is critical to organizational success across the AI lifecycle. That’s why the ATLAS knowledge base also includes mitigations as a list of security concepts and classes of technologies that can be used to prevent a (sub)technique from being successfully executed against an AI-enabled system. In addition to the identification of new case studies and attack techniques, we also collaborated on the identification of new mitigations that can help minimize or prevent harms within AI-enabled systems. These included:

  • Generative AI Model Alignment: Utilizing techniques to improve model alignment with safety, security, and content policies can be done when training or fine-tuning a model. This can include using techniques like Supervised Fine-Tuning, Reinforcement Learning from Human or AI Feedback, and Targeted Safety Context Distillation to improve the safety and alignment of the model.
  • Guardrails for Generative AI: Based on ongoing efforts by organizations such as NIST and FS-ISAC, guardrails refer to measures within a GenAI model’s structure that limit the model’s output. These guardrails allow the model to adhere to model objectives, content guidelines, and model safety and security. By defining out-of-bounds and unacceptable behaviors and outputs and using real-time output monitoring, organizations can use these guardrails to ensure that generated outputs stay within scope and fulfill the intended purpose.
  • Guidelines for Generative AI: Guidelines are safety controls placed between user-provided input and a generative AI model to direct the model to produce desired outputs and prevent undesired outputs.
  • AI Bill of Materials: Generating an AI bill of materials (AI BOM) containing information about the raw AI model, sub-AI systems, and greater AI-enabled system components, and delivering it to an end-user will allow for improved detection of vulnerable AI artifacts used to create the target ML model such as pre-trained models and datasets.
  • AI Telemetry Logging: Intel emphasized the importance of this mitigation approach that relies on logging to help collect events related to access of AI models and artifacts, including inference API invocation. Monitoring logs can also help to detect security threats and prevent impact.

ATLAS and ATT&CK Integration

As the ground truth of adversarial TTPs for traditional cybersecurity, MITRE ATT&CK® is already widely adopted within the security community. ATLAS is modeled after and complementary to ATT&CK in order to raise the awareness of rapidly evolving vulnerabilities of AI-enabled systems as they extend beyond cyber. To continue facilitating the improved understanding of these vulnerabilities and how they relate to and differ from TTPs seen within ATT&CK, we have synchronized updates between the two knowledge bases. When ATT&CK releases a new version, ATLAS will update in kind.

The ATLAS STIX data has now been updated to include ATT&CK Enterprise v15.1 and the ATLAS matrix has now been expressed as a STIX 2.1 bundle following the ATT&CK data model. That ATLAS STIX 2.1 data has now been combined with the ATT&CK Enterprise data and can be used as domain data within the ATLAS Navigator.

ATLAS Integrated into ATT&CK Navigator

Get Involved

Our collective research has provided actionable documentation of novel threat vectors for AI-enabled systems, steps for mitigating those novel threats, and improved incident capture for AI incidents. But we’re not done. AI security needs evolve as new AI applications are adopted. We welcome community feedback and additional suggestions for other real-world attacks that can be included in future work as we advance community awareness of threats to AI-enabled systems. There are several ways you can get involved with this and other projects to continue advancing AI security and threat-informed defense:

Contact us at [email protected] for any questions about this and future AI security work.

About the Center for Threat-Informed Defense

The Center is a non-profit, privately funded research and development organization operated by MITRE Engenuity. The Center’s mission is to advance the state of the art and the state of the practice in threat-informed defense globally. Comprised of participant organizations from around the globe with highly sophisticated security teams, the Center builds on MITRE ATT&CK®, an important foundation for threat-informed defense used by security teams and vendors in their enterprise security operations. Because the Center operates for the public good, outputs of its research and development are available publicly and for the benefit of all.

About ATLAS

MITRE Adversarial Threat Landscape for AI Systems (ATLAS™) is a globally accessible, living knowledge base of adversary tactics and techniques based on real-world attack observations and realistic demonstrations from artificial intelligence (AI) red teams and security groups. There are a growing number of vulnerabilities in AI-enabled systems as the incorporation of AI increases the attack surfaces of existing systems beyond those of traditional cyberattacks. We developed ATLAS to raise community awareness and readiness for these unique threats, vulnerabilities, and risks in the broader AI assurance landscape. ATLAS is modeled after the MITRE ATT&CK® framework and its tactics, techniques, and procedures (TTPs) are complementary to those in ATT&CK.

© 2024 MITRE Engenuity, LLC. Approved for Public Release. Document number CT0123

Threat-Informed Defense to Secure AI was originally published in MITRE-Engenuity on Medium, where people are continuing the conversation by highlighting and responding to this story.

Article Link: Threat-Informed Defense to Secure AI | MITRE-Engenuity