Memory-safe languages and security by design: Key insights, lessons learned

MalBot · March 21, 2024, 12:05pm

For more than 50 years, software engineers have struggled with memory vulnerabilities, but only recently have serious efforts been undertaken to get a handle on the problem. One of the leaders in memory safety, Google, has released a new technical report containing some valuable lessons distilled from its experience tackling the problem.

Google engineers Alex Rebert and Christoph Kern wrote in the report:

"Like others’, Google’s internal vulnerability data and research show that memory safety bugs are widespread and one of the leading causes of vulnerabilities in memory-unsafe codebases. Those vulnerabilities endanger end users, our industry, and the broader society."

The researchers wrote that, given the prevalence of memory safety issues, they expect that high assurance of memory safety can only be achieved via a security by design approach centered on "comprehensive adoption of languages with rigorous memory safety guarantees."

Here's a deeper look at the issue of memory-safe languages, and what they mean for attaining security by design goals for software — plus the key lessons learned from Google's researchers.


The memory-safety problem turns 50

Michael J. Mehlberg, CEO of Dark Sky Technology, said memory-safety issues in software have persisted for more than 50 years, primarily due to the widespread use of languages such as C and C++, which do not enforce memory safety. He said that while more secure programming languages have been available for decades, less-safe languages have been popular with development teams because of their performance, applicability, and extensibility.

"[C and C++] give programmers low-level control, which is great for performance and flexibility but can lead to errors such as buffer overflows and use-after-free vulnerabilities."
—Michael J. Mehlberg
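To make the contrast concrete, here is a minimal Rust sketch of how a memory-safe language handles an out-of-bounds read (the function names are hypothetical, chosen for illustration). In C, reading `buf[index]` past the end of an array silently returns whatever memory follows it, which is the root of a buffer over-read. In Rust, `slice::get` returns `None` for a bad index, and plain indexing panics instead of reading out of bounds:

```rust
use std::panic;

/// Returns the byte at `index`, or `None` if the index is out of bounds.
/// `slice::get` performs the bounds check that C's `buf[index]` omits.
pub fn read_checked(buf: &[u8], index: usize) -> Option<u8> {
    buf.get(index).copied()
}

/// Plain indexing is bounds-checked too: an out-of-range index panics
/// instead of reading whatever lies past the end of the buffer.
/// Returns true if the access panicked.
pub fn indexing_panics(buf: &[u8], index: usize) -> bool {
    // catch_unwind turns the panic into an Err so the caller can observe it.
    panic::catch_unwind(move || buf[index]).is_err()
}
```

The check costs a comparison per access, which is part of the performance trade-off the experts above describe.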

Dave (Jing) Tian, an assistant professor in the computer science department at Purdue University, explained that the problem is a legacy issue from a different era in computing.

"Most existing system infrastructure software is written in C/C++ because they were probably the best choice in the old days in terms of development efficiency and low performance overhead."
—Dave (Jing) Tian

Joel Marcey, director of technology at the Rust Foundation, which stewards the memory-safe language Rust, said legacy languages prioritized performance and maximum flexibility, including the ability to control how memory is allocated directly.

"There is a staggering amount of code written in languages that don’t provide memory safety guarantees in use today."
—Joel Marcey

When performance trumped security

Over the years, the focus in developing new languages has been on abstractions for improved functionality or performance, not safety, explained Mark Sherman, director for cybersecurity foundations in the CERT Division at Carnegie Mellon University's Software Engineering Institute.

"Particularly with performance, the overheads for many technical approaches were considered unacceptable decades ago when CPU and memory resources were more expensive than today. Similarly, the educational focus has been on how to introduce and maintain functionality or performance, and not on safety."
—Mark Sherman

However, as advantageous as memory-safe languages can be, they may not be appropriate in all situations. And they may not always be possible to use, said Antonio Bianchi, an assistant professor in the computer science department at Purdue University.

"For instance, low-level system code — such as the code used by device drivers or kernel code or code that has strict performance requirements — typically needs to be written in C or C++."
—Antonio Bianchi

Memory-safe practices have been making progress, said Jeff Williams, CTO and co-founder of Contrast Security. Memory safety has improved dramatically in the last 20 years with the advent of address space layout randomization (ASLR) and data execution prevention (DEP), Williams said. ASLR randomizes where key components, such as the stack, heap, and shared libraries, are placed in a process's address space, making it difficult for attackers to predict the addresses they need. DEP marks memory regions meant to hold data, such as the stack and heap, as non-executable, so code an attacker injects there can't run.

But Williams warned they are only good if development teams use them.

"[Memory safety] is still a problem because it's way too easy for developers to write code that doesn't handle memory properly."
—Jeff Williams

Lessons learned from the Google research team

In their report, Google researchers Rebert and Kern discuss Google's experience dealing with memory safety. Here are some key takeaways, with expert insights for deeper understanding.

Don't expect C++ to become a more memory-safe language

"[W]e see no realistic path for an evolution of C++ into a language with rigorous memory safety guarantees that include temporal safety," wrote Rebert and Kern. "At the same time," the Google researchers continued, "a large-scale rewrite of existing C++ code into a different, memory-safe language appears very difficult and will likely remain impractical."
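Temporal safety means a program never touches memory after it has been freed or reallocated. A classic C++ example is holding a reference into a `std::vector` and then calling `push_back`, which may reallocate the buffer and leave the reference dangling. The minimal Rust sketch below (the function name is hypothetical) shows the same pattern: the commented-out version is rejected by the compiler's borrow checker, and the accepted version copies the value out before the vector's buffer can move.

```rust
// Rejected at compile time (error E0502: cannot borrow `v` as mutable
// because it is also borrowed as immutable):
//
//     let first = &v[0];    // reference into v's heap buffer
//     v.push(4);            // may reallocate, so `first` would dangle
//     println!("{first}");  // use after the buffer may have moved
//
// Accepted version: take an owned copy first, so no reference into the
// old buffer survives the push.

/// Pushes `extra` onto `v` and returns the original first element, if any.
pub fn push_and_get_first(v: &mut Vec<i32>, extra: i32) -> Option<i32> {
    let first = v.first().copied(); // an owned copy, not a borrow
    v.push(extra);                  // free to reallocate the buffer now
    first
}
```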

Williams said he agreed that significant changes would be needed to make C++ memory-safe. "Why not just move to a newer, better language like Rust?" he asked.

Chris Romeo, CEO of the threat modeling firm Devici, said that with the evolution of Go and Rust, dependence on C++ will "drift away over time and become nothing."

"We have had the past 20-plus years to evolve C and C++ to eliminate memory challenges, so I don’t see that happening in the future."
—Chris Romeo

Purdue's Tian said he believes all is not lost for C++, however. "C++20 has already introduced a superset of features that tackle the problems of memory safety," he said.

C++ is going through a "subset" phase, he explained, adopting the most useful features and moving legacy code toward memory safety, mostly at compile time. "In my opinion, this is a realistic path for C++," he said.

"Meanwhile, even memory-safe languages sometimes cannot provide a rigorous memory safety guarantee. For instance, Java and C# still have type safety problems, but we still think they are memory-safe."
—Dave (Jing) Tian

Dark Sky's Mehlberg disagreed, saying C++ was designed for system-level programming with direct memory access and manual memory management. "Adding strict memory safety would fundamentally change the nature of the language and could break compatibility with existing C++ codebases, making it impractical for a massive number of legacy applications," he said.

Memory-safety bugs can be prevented through safe coding practices

Classes of problems can be engineered away at scale by eliminating the use of vulnerability-prone coding constructs, Rebert and Kern explained. That can be done with something Google calls safe coding, which disallows unsafe constructs by default and replaces them with safe abstractions, with carefully reviewed exceptions.
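As an illustration of what replacing an unsafe construct with a safe abstraction can look like, consider a length-prefixed record. A classic C bug is trusting the length field and copying that many bytes, reading past the end of the buffer. In the minimal Rust sketch below (the record format and `parse_record` are hypothetical), slices carry their own length, so a length field that overstates the data yields `None` instead of an out-of-bounds read:

```rust
/// Parses a hypothetical record: a 2-byte big-endian length, then that
/// many payload bytes. Returns (payload, remaining bytes), or None if the
/// buffer is too short for the header or for the claimed payload length.
pub fn parse_record(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    let header = buf.get(0..2)?; // None if fewer than 2 bytes
    let len = u16::from_be_bytes([header[0], header[1]]) as usize;
    let rest = &buf[2..]; // in bounds: buf has at least 2 bytes
    let payload = rest.get(..len)?; // None if the length field lies
    Some((payload, &rest[len..])) // in bounds: len <= rest.len()
}
```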

Devici's Romeo said the secure by default aspect of CISA's Secure by Design initiative may point the way to a more secure future.

"Disallowing unsafe constructs results in a more secure product or application. We will experience pain in the short term in developers lamenting their loss of freedom and creativity, and gain in the long run with fewer vulnerabilities."
—Chris Romeo

Secure coding as early as possible in the development cycle is a strategic imperative, said Carnegie Mellon's Sherman. "The easier it is for programmers to avoid mistakes — like disallowing unsafe constructs by default — the more likely one can achieve safer code," he said.

Bianchi said that if unsafe constructs need to be used, "the developer will have to explicitly mention it. Hence, a security review of the code can primarily focus on those code locations that have been marked as potentially unsafe."

Mehlberg praised safe coding practices as "a big step in the right direction" and explained that by setting safe defaults, programmers will be less likely to introduce vulnerabilities into the code inadvertently.

"This might take some time, as it would be a cultural shift and require training and education that might be met with resistance. That said, there is plenty that can be done to produce secure code by default through the tools programmers use, automated security checks, and dependency management. The use of safe languages will help that considerably."
—Michael J. Mehlberg

While safe coding is a good policy, Contrast Security's Williams said it's just one small piece of a much larger puzzle.

"Disallowing unsafe constructs sounds great, but it only really covers a small number of vulnerabilities. Most vulnerabilities aren't solved by simply eliminating dangerous functions. Developers often require powerful functions, and they need help from runtime security tools and training to ensure that they use them safely."
—Jeff Williams

Memory safety issues are rare in languages with garbage collection

Rebert and Kern wrote that in the memory-safety domain, the safe coding approach is embodied by safe languages, which replace unsafe constructs with safe abstractions such as runtime bounds checks, garbage-collected references, or references adorned with statically checked lifetime annotations. "Experience shows that memory safety issues are indeed rare in safe, garbage-collected languages such as Go and Java," they added.
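The last of those abstractions is the one Rust relies on. In this minimal sketch (the function is hypothetical), the lifetime `'a` tells the compiler that the returned string slice borrows from the input, so it can reject, at compile time, any caller that would keep the result alive after the input is freed:

```rust
/// Returns the first whitespace-separated word of `text`.
/// The lifetime 'a ties the result to `text`: it is a view into the
/// caller's string, not a copy, and cannot outlive it.
pub fn first_word<'a>(text: &'a str) -> &'a str {
    text.split_whitespace().next().unwrap_or("")
}

// A caller like this is rejected (error E0597, "`owned` does not live
// long enough"), because `word` would outlive the String it points into:
//
//     let word;
//     {
//         let owned = String::from("hello world");
//         word = first_word(&owned);
//     }
//     println!("{word}");
```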

Mehlberg explained that garbage-collected languages such as Go and Java have fewer memory safety issues because they automate the allocation, deallocation, and bounds-checking of memory. "This significantly reduces the risk of inadvertent programming errors such as memory leaks, use-after-free vulnerabilities, and buffer overflows," he said.

The Rust Foundation's Marcey, however, pointed out that a language need not have garbage collection to be memory safe.

"Rust is a highly memory-secure language and doesn’t have a garbage collector. In essence, with the language managing memory, a developer can be assured that they won’t make those types of memory mistakes since the garbage collector or compiler will guarantee it. Of course, this does not mean that the garbage collector or compiler is mistake free."
—Joel Marcey
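Rust gets there without a garbage collector through ownership: each value has exactly one owner and is freed deterministically when that owner goes out of scope, with the compiler inserting the cleanup. The minimal sketch below (the `Tracked` type and its shared log are hypothetical, for illustration only) records the order in which values are freed:

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// A value that appends its name to a shared log when it is freed.
pub struct Tracked {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Creates values in nested scopes and returns the order they were freed.
pub fn drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _outer = Tracked { name: "outer", log: Rc::clone(&log) };
        {
            let _inner = Tracked { name: "inner", log: Rc::clone(&log) };
        } // `_inner` is freed here, at the end of its scope
        let moved = Tracked { name: "moved", log: Rc::clone(&log) };
        drop(moved); // ownership passes to `drop`, which frees it now
    } // `_outer` is freed here
    let order = log.borrow().clone();
    order
}
```

The free points are fixed by the program's structure at compile time rather than chosen by a collector at runtime.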

Sandboxing is an effective mitigation for memory-safety vulnerabilities

Researchers Rebert and Kern noted that Google commonly uses sandboxing to isolate brittle libraries with a history of vulnerabilities.

Mehlberg said sandboxing is a viable solution for some applications, but it comes with costs and challenges.

"Performance overhead makes it untenable for high-performance or real-time applications. Increased complexity in application deployment can cause developers to think twice. Furthermore, there may be compatibility issues that have to be addressed along with a deep understanding of the application's behavior in the environment in which it will run to ensure that it adheres to security requirements."
—Michael J. Mehlberg

Bianchi said that for sandboxing to be effective, it must be implemented safely and efficiently. That requires hardware and software functionality such as specific CPU features or primitives offered by the operating system, he explained. "In some systems, such as embedded systems or legacy devices, this functionality may not be available."

Tian added that sandboxing uses compiler techniques, such as software fault isolation (SFI) and control-flow integrity (CFI), which impose non-negligible performance overhead. However, he said "the real benefit of these solutions is still hard to quantify."

"Of course, we know it would be harder for attackers to launch exploitations, but how much harder exactly?"
—Dave (Jing) Tian
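To give a sense of what software fault isolation does, here is a toy Rust model of its classic address-masking idea. It is an illustrative assumption, not how any particular production sandbox is built; real SFI rewrites machine code. Every load and store from the untrusted component is masked so that it always lands inside that component's own region, and that extra operation on every access is one source of the overhead Tian describes:

```rust
/// A toy sandboxed memory region whose size is a power of two.
pub struct SfiRegion {
    mem: Vec<u8>,
    mask: usize, // size - 1; AND-ing with it keeps an offset in range
}

impl SfiRegion {
    /// Creates a zeroed region of 2^size_log2 bytes.
    pub fn new(size_log2: u32) -> Self {
        let size = 1usize << size_log2;
        SfiRegion { mem: vec![0; size], mask: size - 1 }
    }

    /// Every store address is masked, so an out-of-range address wraps
    /// inside the region instead of touching memory outside it.
    /// (Rust's own bounds check on `mem` then never fails.)
    pub fn store(&mut self, addr: usize, value: u8) {
        self.mem[addr & self.mask] = value;
    }

    /// Loads are masked the same way.
    pub fn load(&self, addr: usize) -> u8 {
        self.mem[addr & self.mask]
    }
}
```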

Bug finding is an essential part of memory safety, but it does not itself improve security

Static analysis and fuzzing are effective tools for detecting memory safety bugs, Rebert and Kern wrote. They reduce the volume of memory-safety bugs in a codebase as developers fix the detected issues. "However, in our experience, bug finding alone does not achieve an acceptable level of assurance for memory-unsafe languages," the researchers wrote.
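In its simplest form, fuzzing means feeding a function large numbers of generated inputs and checking that it neither crashes nor violates an invariant. The minimal, dependency-free Rust sketch below is only an illustration: the target `decode_run_length`, the xorshift generator, and the iteration count are assumptions, and real projects would use a coverage-guided fuzzer such as libFuzzer. Note that it can only report a failing input; fixing the bug is still left to a developer.

```rust
/// Hypothetical target: decodes (count, byte) pairs into a byte vector.
/// A trailing unpaired byte is ignored.
pub fn decode_run_length(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    for pair in input.chunks_exact(2) {
        out.extend(std::iter::repeat(pair[1]).take(pair[0] as usize));
    }
    out
}

/// Small xorshift64 generator so the sketch needs no external crates.
fn next_rand(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

/// Runs `iterations` random inputs through the target and checks an
/// invariant (output length equals the sum of the counts) on each.
/// Returns the first failing input, if any. A panic in the target
/// propagates out of this function, which is itself a finding.
pub fn fuzz(iterations: usize, seed: u64) -> Option<Vec<u8>> {
    let mut state = seed.max(1); // xorshift must not start at zero
    for _ in 0..iterations {
        let len = (next_rand(&mut state) % 64) as usize;
        let input: Vec<u8> = (0..len).map(|_| next_rand(&mut state) as u8).collect();
        let expected: usize = input.chunks_exact(2).map(|p| p[0] as usize).sum();
        if decode_run_length(&input).len() != expected {
            return Some(input);
        }
    }
    None
}
```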

And finding bugs does not in itself improve security, they continued. The bugs must be fixed and the patches deployed. There is evidence suggesting that bug-finding capabilities are outpacing bug-fixing capacity, said Carnegie Mellon's Sherman.

"Finding a bug can tell you that something is wrong. It does not tell you what the desired behavior should be, how to fix the defect while still resulting in the desired behavior, and often does not tell you the location of the root cause of the error."
—Mark Sherman

Those latter activities are more complex and take more developer time than the bug-finding activities, Sherman said. And often, the act of fixing a defect may accidentally introduce a new defect. "The focus should be on reducing the number of bugs introduced to reduce the total effort of both finding and fixing," Sherman said.

Devici's Romeo said the application security industry projects an unhealthy viewpoint that "bug hunting is the excellent and sexy part of cybersecurity."

"Ask any cybersecurity student what they want to do after graduation, and most will answer, 'Break stuff.' We must refocus on building better applications and products and take the spotlight off breaking things."
—Chris Romeo

Memory safety at scale requires a language to prohibit unsafe constructs by default

To achieve a high degree of assurance that a codebase is free of vulnerabilities, the Google researchers said, they have found it necessary to adopt a model where unsafe constructs are used only by exception, enforced by the compiler.

Unsafe constructs should cause a compile-time error unless a portion of code is explicitly opted into an unsafe subset, they wrote.
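Rust offers one way to express this policy. In the minimal sketch below (module and function names are hypothetical), the built-in `unsafe_code` lint is set to `deny` for a module, so any `unsafe` block inside it becomes a compile-time error, and the single reviewed exception opts back in with `#[allow(unsafe_code)]`. A crate can apply the policy everywhere with `#![deny(unsafe_code)]` at its root, or use `forbid` so that no inner module can opt back in.

```rust
// Everything in `app` is compiled with `unsafe` treated as an error.
#[deny(unsafe_code)]
pub mod app {
    /// Ordinary code: only safe constructs are accepted here.
    pub fn total(xs: &[u32]) -> u32 {
        xs.iter().sum()
    }

    // The one explicitly marked exception, where review effort concentrates.
    #[allow(unsafe_code)]
    pub mod reviewed_unsafe {
        /// Returns the first element, or 0 for an empty slice.
        pub fn first_or_zero(xs: &[u32]) -> u32 {
            if xs.is_empty() {
                return 0;
            }
            // SAFETY: the slice is non-empty, so index 0 is in bounds.
            unsafe { *xs.get_unchecked(0) }
        }
    }
}
```

The `unsafe` block here is deliberately trivial: `xs.first().copied().unwrap_or(0)` does the same thing safely, which is exactly why such exceptions should stay rare and reviewed.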

Sherman explained that Google's solution to the problem is to intensely scrutinize the code in the unsafe subset.

"In principle, one could argue that if the same intense scrutiny were applied to the current code, one could achieve better security. However, the cost of such an analysis is prohibitive, hence the focus on containing the unsafe code to as limited a construct as possible."
—Mark Sherman

Mehlberg said that unsafe constructs pose a threat to memory safety because they allow for the direct manipulation of memory, which can easily lead to unintentional errors, such as buffer overflows and use-after-free vulnerabilities.

"In languages like Rust, unsafe sections of code are explicitly called out and provide guidelines for greater scrutiny. In languages like C and C++, the entire program must be treated as unsafe, exposing potentially any line of code to possible exploit."
—Michael J. Mehlberg

The need for security by design

After 50 years, memory-safety bugs remain some of the most stubborn and dangerous software weaknesses, Google researchers Rebert and Kern wrote. As one of the leading causes of vulnerabilities, they continue to result in significant security risk. "It has become increasingly clear that memory safety is a necessary property of safe software. Consequently, we expect the industry to accelerate the ongoing shift towards memory safety in the coming decade," they wrote.

The researchers wrote that security by design is required for a high assurance of memory safety, "which requires adoption of languages with rigorous memory safety guarantees."

"Given the long timeline involved in a transition to memory-safe languages," they added, "it is also necessary to improve the safety of existing C and C++ codebases to the extent possible, through the elimination of vulnerability classes."

Article Link: Memory-safe languages and security by design: Key insights, lessons learned