As Google's collaborative project known as the Graph for Understanding Artifact Composition (GUAC) starts to gain steam, the firm is bolstering its investment in dependency mapping by supporting a new project on top of GUAC that is geared toward risk modeling.
Dubbed GUAC-ALYTICS, the new project aims to develop an algorithmic engine that will make it easier for software maintainers and practitioners to predict supply chain dependency risks, even without full visibility into downstream connections or proprietary code.
Sabine Brunswicker, director of the Research Center for Open Digital Innovation at Purdue University and one of the lead researchers for the project, said one of the biggest issues seen with software supply chain security is the "hiddenness of knowledge."
"Nobody actually knows where things are built, how they move through the supply chain. And that comes from the fact that we are lacking data."
GUAC-ALYTICS will use dependency mapping data provided by GUAC and layer in vulnerability and risk data from MITRE to create network views of supply chain risks within open-source supply chain ecosystems. From there, researchers will apply data-science and machine-learning techniques to develop algorithms that should make it easier for practitioners and industry watchers to predict potential supply chain dependency risks in a range of software ecosystems.
Security industry watchers and engineering practitioners are hopeful that the work on GUAC and GUAC-ALYTICS could improve software supply chain risk management. Here's what your team needs to know.
GUAC-ALYTICS: The next generation of GUAC
Santiago Torres-Arias, assistant professor at Purdue and a longtime software supply chain security advocate, developer, and researcher, is the other lead researcher on the project. Torres-Arias runs in-toto, an open metadata standard framework for supply chain attestation, and Sigstore, a standard for signing, verifying, and protecting software.
"You can think of GUAC as a way to index and reason throughout the information that's on Sigstore. The goal is to collect all the software supply chain information and try to use it to better understand software supply chain threats and vulnerable surfaces."
Then, with GUAC-ALYTICS, practitioners will be able to apply leading-edge data-science, ML, and AI techniques to the GUAC graph to better predict and model software supply chain threats and risks, Torres-Arias said.
The project has started by applying analysis to the Debian supply chain as mapped by GUAC and by adding in data from the MITRE ATT&CK framework, Brunswicker explained.
"We have basically represented that supply chain now for the Debian ecosystem, and we're pretty far. Then from there, we're using network analytics to understand the risk in this supply chain first, with the goal to translate it to other supply chains as well."
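To make the idea of a network view of risk concrete, here is a minimal sketch of one kind of question such a graph can answer: given a package with a known vulnerability, which other packages depend on it, directly or indirectly? The package names, the edge list, and the function names are all hypothetical. This is not GUAC's API or the researchers' method, only a plain breadth-first traversal over toy data.

```python
from collections import defaultdict, deque

# Hypothetical (package, dependency) pairs; not real Debian or GUAC data.
DEPENDS_ON = [
    ("app-a", "libfoo"),
    ("app-b", "libfoo"),
    ("libfoo", "libcore"),
    ("app-c", "libbar"),
    ("libbar", "libcore"),
]

# Hypothetical set of packages flagged by a vulnerability feed.
VULNERABLE = {"libcore"}


def reverse_graph(edges):
    """Map each dependency to the set of packages that depend on it directly."""
    dependents = defaultdict(set)
    for package, dependency in edges:
        dependents[dependency].add(package)
    return dependents


def exposed_packages(edges, vulnerable):
    """Return every package that reaches a vulnerable package through its dependencies."""
    dependents = reverse_graph(edges)
    exposed = set()
    queue = deque(vulnerable)
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            # Skip packages already recorded or already in the vulnerable set.
            if parent not in exposed and parent not in vulnerable:
                exposed.add(parent)
                queue.append(parent)
    return exposed


if __name__ == "__main__":
    print(sorted(exposed_packages(DEPENDS_ON, VULNERABLE)))
```

In this toy graph, a single flaw in libcore exposes all five other packages, which is the kind of concentration of risk that network analytics is meant to surface.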
Link prediction modeling could help dev teams
Ideally, the use of data science techniques in network analytics will enable the GUAC-ALYTICS team to do link prediction modeling, which will feed a risk prediction model, Brunswicker said.
Once prediction modeling is in place, teams can use it as an early-warning system to identify packages that are more likely to be attacked, she said.
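Link prediction, in general terms, means scoring pairs of nodes that are not yet connected by how likely they are to be connected. The article does not say which method the GUAC-ALYTICS team uses, so the sketch below swaps in a textbook heuristic, the Jaccard coefficient over shared neighbors, purely for illustration. The edge list and every function name are assumptions.

```python
from itertools import combinations

# Hypothetical undirected relationships between packages; not real data.
EDGES = [
    ("app-a", "libfoo"),
    ("app-a", "libbar"),
    ("app-b", "libfoo"),
    ("app-b", "libbar"),
    ("app-b", "libbaz"),
    ("app-c", "libbaz"),
]


def neighbors(edges):
    """Build an undirected adjacency map from an edge list."""
    adjacency = {}
    for left, right in edges:
        adjacency.setdefault(left, set()).add(right)
        adjacency.setdefault(right, set()).add(left)
    return adjacency


def jaccard_scores(edges):
    """Score each unlinked pair by shared neighbors divided by combined neighbors."""
    adjacency = neighbors(edges)
    scores = {}
    for left, right in combinations(sorted(adjacency), 2):
        if right in adjacency[left]:
            continue  # the pair is already linked, so there is nothing to predict
        shared = adjacency[left] & adjacency[right]
        if shared:
            combined = adjacency[left] | adjacency[right]
            scores[(left, right)] = len(shared) / len(combined)
    return scores


def top_candidates(edges, limit=3):
    """Return the highest-scoring unlinked pairs, ties broken by name."""
    scores = jaccard_scores(edges)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:limit]


if __name__ == "__main__":
    for pair, score in top_candidates(EDGES):
        print(pair, round(score, 2))
```

A production model would likely use richer features and learned weights. The point here is only the shape of the task: rank the missing edges, then hand the highest-ranked ones to a downstream risk model.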
"We want to support the software engineering process with better models and metrics, which they currently don't have. Even if you do have some of it right now in certain metadata that is used and is published, they don't really provide sufficient insight."
Georgia Weidman, security architect for Zimperium, said that while software teams understand intuitively that they are increasing a package's risk profile with each library added, they don’t always understand just how many dependencies a single library may bring along.
"Providing tools that automatically provide that data in a visual form that is easy to generate and easy to understand will help us make better decisions about when and where we take risks as well as what risks we are actually taking."
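Weidman's point about hidden dependencies is easy to demonstrate. The sketch below assumes a hypothetical map of direct dependencies (the package names are invented) and walks it to count everything a single library brings along.

```python
# Hypothetical direct-dependency map; the package names are invented.
DIRECT_DEPS = {
    "web-framework": ["http-client", "template-engine"],
    "http-client": ["tls-lib", "url-parser"],
    "template-engine": ["url-parser"],
    "tls-lib": [],
    "url-parser": [],
}


def transitive_dependencies(package, direct_deps):
    """Return every package reachable from `package` by following dependencies."""
    seen = set()
    stack = list(direct_deps.get(package, []))
    while stack:
        dep = stack.pop()
        if dep in seen:
            continue  # already visited; this also guards against cycles
        seen.add(dep)
        stack.extend(direct_deps.get(dep, []))
    return seen


if __name__ == "__main__":
    direct = DIRECT_DEPS["web-framework"]
    total = transitive_dependencies("web-framework", DIRECT_DEPS)
    print(f"direct: {len(direct)}, total: {len(total)}")
```

Here, adding one library with two direct dependencies pulls in four packages in total. In real ecosystems the gap between the two numbers can be far larger.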
Mike Parkin, senior technical engineer at Vulcan Cyber, said the project sounds fascinating and could do a lot to help with supply chain attacks, but he's curious to watch how researchers navigate some of the bigger challenges.
"Untangling the interdependencies in open-source software can be inordinately complex, and it can be even worse for closed-source projects, where the dependencies may be hidden inside proprietary code. Delivering this in a human-understandable graphical form will be a formidable challenge as well."
Parkin said better clarity and context for the software supply chain, as well as improved security, would be valuable for software teams, "if they're able to pull it off."
Melissa Bischoping, director of endpoint security research for Tanium, said she was hopeful about the data science–intensive approach to this problem because it is likely the only one that's going to stick for the scale and complexity of dependency relationships in the supply chain.
"Understanding, prioritizing, and navigating the wildly complex dependencies and interconnected nature of shared libraries and app components is only possible at scale through automation and modeling. While it won’t prevent vulnerabilities themselves, it is certain to aid our efforts in rapidly detecting vulnerable software components — especially those present in open-source software."
The next steps for GUAC-ALYTICS
The GUAC-ALYTICS team is now working through significant data science validation of the algorithms and engineering a reliable prediction model.
Torres-Arias said the long-term operational benefits of the work will go beyond helping practitioners make better decisions about how they shore up supply chain risks. The work could also provide insights to projects such as the Linux Foundation's OpenSSF Securing Critical Projects initiative and Alpha Omega project, which could use the higher-fidelity data to decide where to dedicate the most resources for efforts to fix components that impact a lot of upstream software.