AI needs transparency: How software supply chain security tools can help secure ML models


Solutions designed to protect the software supply chain can also be used to protect machine learning (ML) models from similar attacks. Two such solutions are the Supply-chain Levels for Software Artifacts (SLSA) framework and Sigstore.

SLSA (pronounced "salsa") is a security framework — a checklist of standards and controls to prevent tampering, improve integrity, and secure packages and infrastructure. Sigstore is an open-source project focused on improving supply chain security by providing a framework and tooling for securely signing and verifying software artifacts, including release files, container images, binaries, and software bills of materials (SBOMs).

Mihai Maruseac, Sarah Meiklejohn, and Mark Lodato argued in a recent post on the Google Security Blog that ML model makers should extend their use of software supply chain security tools to protect ML supply chains from attack. Using Sigstore, ML model builders can sign a model so anyone using it can be confident it's the exact one the builder, or trainer, created. The team noted:

"Signing models discourages model hub owners from swapping models, addresses the issue of a model hub compromise, and can help prevent users from being tricked into using a bad model."

Meanwhile, SLSA, which describes how a software artifact was built and what controls were in place to prevent tampering, can address threats that model signing does not cover, such as a compromised source control system, a compromised training process, or vulnerability injection. The team wrote:

"Our vision is to include specific ML information in a SLSA provenance file, which would help users spot an undertrained model or one trained on bad data. Upon detecting a vulnerability in an ML framework, users can quickly identify which models need to be retrained, thus reducing costs."

While the tools are a great first step for securing AI applications, they're not a complete solution. Here's what your security team needs to know about using SLSA and Sigstore to secure ML models.

Understand the limits of digital signatures

Digital signatures, when used correctly, can ensure that software, including AI platforms, has not been tampered with, said ReversingLabs Field CISO Matt Rose. But signatures are no panacea.

“The problem is that the data of the AI platform is typically not secured in the same way. You need to worry about the supply chain for the software itself and the data it uses to function.”
Matt Rose
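One way to act on Rose's point is to make sure whatever gets signed covers the data as well as the code. The hedged sketch below hashes a model file and its associated data files into a single manifest, so that one signature (produced with Sigstore or any other signing tool) covers everything the platform depends on. The file paths are hypothetical.

```python
# Sketch: build a manifest that covers the model *and* its data files,
# so a single signature protects both. Paths are hypothetical examples.
import hashlib
import json
from pathlib import Path

def sha256_hex(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

artifacts = [
    Path("model.safetensors"),       # the model weights
    Path("data/train.parquet"),      # training data
    Path("data/embeddings.index"),   # retrieval index used at inference time
]

manifest = {str(p): sha256_hex(p) for p in artifacts}
Path("manifest.json").write_text(json.dumps(manifest, indent=2, sort_keys=True))
# The manifest file is what actually gets signed, so tampering with any
# listed data file invalidates the signature.
```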

Steve Wilson, chief product officer for Exabeam, said that by integrating digital signatures into AI development and deployment processes, organizations can significantly enhance the security and trustworthiness of their ML models and the data they are built upon. This, in turn, contributes to the broader goal of ensuring responsible and trustworthy AI systems.

“While digital signatures are a powerful tool for enhancing supply chain security, they are not a panacea and come with certain limitations and challenges, particularly in AI and machine learning models."
Steve Wilson

Wilson cited a number of issues associated with digital signatures and AI, including:

  • Complexity and overhead. Implementing digital signature systems can add complexity and overhead to the development and deployment processes. This includes the need for secure key management, signature verification mechanisms, and the computational resources required for signing and verifying signatures.
  • Key management. Securely managing the cryptographic keys used for digital signatures is a non-trivial task. If keys are compromised or mismanaged, the integrity and authenticity assurances of digital signatures could be undermined.
  • Limited scope. Digital signatures ensure integrity and authenticity but do not address other crucial aspects of supply chain security such as confidentiality, privacy, or availability.
  • False sense of security. There might be a tendency to over-rely on digital signatures, leading to a false sense of security. Digital signatures can verify that a model or dataset has not been altered, but they cannot verify the intrinsic quality, fairness, or safety of the model or dataset.
  • Limited efficacy with training and dynamic data. Digital signatures might provide a level of assurance for on-premise foundation models by verifying their integrity and authenticity. However, they are less likely to address the challenges associated with training data or dynamic data, such as the data used for retrieval-augmented generation (RAG). Training data, crucial for building and fine-tuning models, is often vast and dynamic, making it challenging to ensure its integrity and authenticity through digital signatures.

Similarly, dynamic data that is continuously changing or being updated can pose challenges for digital signature verification, as the signatures may become outdated rapidly. This limitation underscores the need for additional mechanisms and strategies to secure the training data and dynamic data that play a critical role in the performance and behavior of AI models, beyond just the verification of static, foundational model artifacts.
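To make that limitation concrete, the sketch below (reusing the hypothetical manifest.json from the earlier example) re-hashes a dynamic corpus and reports which files no longer match the last signed manifest. For fast-changing data such as a RAG index, signatures go stale quickly, so some re-verification and re-signing process has to run continuously.

```python
# Sketch: detect when dynamic (e.g., RAG) data has drifted from the
# last signed manifest. Continues the hypothetical manifest.json above.
import hashlib
import json
from pathlib import Path

def sha256_hex(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

signed = json.loads(Path("manifest.json").read_text())

stale = []
for name, expected in signed.items():
    p = Path(name)
    if not p.exists() or sha256_hex(p) != expected:
        stale.append(name)

if stale:
    # The old signature no longer describes reality: these files must be
    # re-verified and the manifest re-signed before the model can trust them.
    print("Out-of-date or modified files:", stale)
else:
    print("All files still match the signed manifest.")
```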

Good — but not good enough

Sigstore and SLSA are great at what they were designed to do, which is secure the software supply chain, Rose said. But the problem, he said, is that even if the AI software package itself is not compromised, the data the AI platform uses may still be tainted.

“These approaches need to be extended beyond just securing the software itself.”
Matt Rose

Wilson said that the nuanced nature of ML systems brings a distinct set of challenges and considerations for supply chain security. The SLSA framework serves as a solid foundation, he explained, but adapting it to the unique landscape of AI and large language models (LLMs) requires deeper consideration and, potentially, the evolution of the framework itself.

“While SLSA lays a strong groundwork for supply chain security, the distinctive aspects of AI systems call for a tailored approach. This might involve extending SLSA, integrating it with other standards like ML-BOM, and fostering a broader understanding and community engagement to ensure supply chain security in the rapidly evolving landscape of AI and Large Language Models.”
Steve Wilson
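For reference, ML-BOM here refers to machine-learning bills of materials, such as the ML-BOM additions in CycloneDX 1.5. The sketch below is loosely modeled on that format; the component type and modelCard fields are written from memory as an assumption and should be checked against the actual CycloneDX schema before use.

```python
# Rough sketch of an ML-BOM entry, loosely modeled on CycloneDX's ML-BOM
# additions (spec version 1.5+). Field names here are illustrative and
# should be verified against the CycloneDX schema.
import json

ml_bom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "components": [
        {
            "type": "machine-learning-model",
            "name": "support-ticket-classifier",   # hypothetical model name
            "version": "2.3.0",
            "hashes": [
                {"alg": "SHA-256", "content": "<model file digest>"}
            ],
            "modelCard": {
                "modelParameters": {
                    "task": "text-classification",
                    "architectureFamily": "transformer",
                },
            },
        }
    ],
}

print(json.dumps(ml_bom, indent=2))
```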

Jeremy Newberry, a cybersecurity architect and strategist with Merlin Cyber, said that SLSA and Sigstore are good starts toward the overall requirements, but they do not address how to secure models that continue to grow or improve themselves.

“They feel like a legacy approach to a new problem, and I believe a more modular and adaptive approach needs to be taken."
Jeremy Newberry

True transparency is key

Google’s approach to AI supply chain security is a good first step toward securing ML models, but it is fundamentally flawed, said Merlin Cyber solutions engineer Dean Webb.

“The Foundation Model Transparency Index rated Google’s AI at only 40% transparent, so we need more from the AI vendors than their instructions on how we, the customers, can shoulder the full security load. We need their transparency and cooperation in sharing that security load.”
Dean Webb
