Ideas in Action: An Interview with Michael Pfaffenberger — MLOps and LLMs

Authored by Mike Pfaffenberger, Jiazhen Zhu

Photo credit: Jiazhen Zhu

In our TEDx Talk series, Ideas in Action, we have the opportunity to engage with experts and gather valuable insights and ideas from them. In this first installment, we speak with Michael Pfaffenberger.

Can you give a short introduction of yourself and your role at Walmart Global Tech?

Hi, I’m Michael Pfaffenberger and I joined Walmart Global Tech in 2021 as a principal data scientist, mainly working on machine learning engineering, AI/ML governance, and fostering technological innovation. Recently, I transitioned to a new role within Walmart’s Global Investigations Technology division.

What has been your career journey leading up to your current position?

As a teenager, I learned to code out of curiosity about computer graphics and a fascination with the mathematics behind 3D computer games. I majored in computer science in college, focusing on computer graphics and taking electives in artificial intelligence. Later, during graduate school, I worked briefly in a biomedical signal and image processing lab, learning fundamental signal processing techniques and applying them to real-world problems.

In 2013, I joined a small computer research firm in Charlottesville, Virginia, as a data scientist, working primarily on computer vision applications in geospatial domains. I then shifted my focus to natural language processing (NLP) with the emergence of transformer architectures. Over time, my career has evolved from focusing on the “science” to concentrating on the “engineering” of building ML and AI-driven systems.

When deploying your models in real-world situations, particularly LLMs, what are your main concerns when the model is used to make decisions?

In descending order of priority:
a) Human in the loop: Human validation and oversight are crucial for meeting privacy, ethical, bias, and fairness requirements.
b) Cost of inference: LLMs can be expensive to run at web-scale, so this is a significant consideration.
c) Data quality and availability assurance: Viewing AI/ML systems from a supply chain perspective, it’s essential to consider the long-term availability and trustworthiness of the data.

How do you decide between using a custom in-house language model and an off-the-shelf GPT model?

For LLM-specific applications, I recommend starting with the best available model for your task, which is often an OpenAI model like GPT-4. While this may be more expensive initially, the primary goal should be to bring your GenAI solution to market quickly. You can later replace it with a fine-tuned Llama 2 model to save on inference costs.

Regarding ML and AI platforms, Walmart chooses what’s best for the company, but other organizations may prefer commercial-off-the-shelf (COTS) solutions if they have a successful track record of integrating them.

What do you think is the importance of MLOps in the world of LLMs?

The challenging problems in the GenAI space that MLOps needs to address are the management and quantification of risk, and the reconciliation of traditional AI/ML model lifecycles with the unbridled proliferation of GenAI.

As our industry’s legal and regulatory frameworks related to GenAI are still maturing, we need to engineer flexible MLOps solutions to adapt to new governance requirements.

The rate of innovation in GenAI itself further complicates things: the MLOps platforms that exist today may quickly become obsolete compared to what is available at the bleeding edge.

What job-related challenges in the LLM world keep you awake at night?

What keeps me awake is knowing that large companies necessarily move slower than small startups. For example, most large companies access GenAI endpoints through a corporate gateway (a requirement for tabulating GenAI usage by cost center). In that setup, feature teams lag the rest of the industry while they wait for their gateway team to expose newly available GenAI features.

A concrete example: GPT-4 Turbo became available through Azure on November 17th, 2023, while it had been released in preview mode on November 6th for direct OpenAI customers; feature teams then face yet another short wait while the gateway team wires things up. It may not sound like much, a few weeks at most, but speed to market is massively important in this GenAI arms race.

We absolutely must find a way to move fast while also remaining compliant and ensuring that our customers are safe.

In which areas of machine learning do you foresee investments being made over the next few years?

GenAI makes advanced NLP modeling accessible even to engineers with no previous experience in NLP. I believe the original transformer era (BERT) is likely over. The same patterns are beginning to emerge in computer vision domains, and I expect GenAI to break into other areas like time-series forecasting within the next 1–5 years, driven by research firms, academia, and large corporations like Walmart.

Are there any ML or AI aspects, be they risks or opportunities, that people should discuss more and be aware of?

Retrieval-Augmented Generation (RAG) is currently a popular technique that produces impressive results. However, perfecting RAG-based question-answering systems can be challenging, since the quality of the generated answers depends on the retrieval step, and relevancy tuning can be deceptively difficult.
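To make that dependency concrete, here is a minimal RAG sketch in Python. The embedding model, the toy corpus, and the use of the sentence-transformers library and the OpenAI client are illustrative assumptions, not a description of any particular production system.

```python
# A minimal RAG sketch: retrieve the most relevant passage, then generate.
# Assumptions (illustrative, not from the interview): sentence-transformers
# for embeddings, the OpenAI Python client for generation, a toy corpus.
from sentence_transformers import SentenceTransformer, util
from openai import OpenAI

documents = [
    "In-store purchases can be returned at the customer service desk.",
    "Online orders can be tracked from the account page.",
]

# Retrieval step: embed the corpus and the question, pick the best match.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = encoder.encode(documents, convert_to_tensor=True)

question = "Where do I return an item I bought in store?"
q_embedding = encoder.encode(question, convert_to_tensor=True)
best = util.cos_sim(q_embedding, doc_embeddings).argmax().item()

# Generation step: answer quality is bounded by the retrieved context.
client = OpenAI()  # expects OPENAI_API_KEY in the environment
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": f"Answer using only this context:\n{documents[best]}"
                   f"\n\nQuestion: {question}",
    }],
)
print(response.choices[0].message.content)
```

If the retrieval step surfaces the wrong passage, no amount of polish in the generation step can recover the right answer, which is why relevancy tuning matters so much.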

Can you share any real-world examples where MLOps has significantly improved LLM performance or efficiency?

Certainly. In the context of RAG, focusing on the retrieval step, you can collect end-user feedback (e.g., thumbs up or thumbs down) and train a learning-to-rank model on it. Better retrieval leads to better question-answering outcomes.
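As an illustrative sketch of that feedback loop (not any specific production pipeline), one option is XGBoost's XGBRanker; the two features and the tiny dataset below are invented placeholders.

```python
# A hedged sketch of learning to rank from thumbs-up/down feedback.
# Assumptions (illustrative, not from the interview): XGBoost's XGBRanker,
# with two made-up features per (query, passage) pair.
import numpy as np
from xgboost import XGBRanker

# Each row describes one (query, retrieved passage) pair, e.g. the
# retriever's similarity score and a lexical-overlap feature.
X = np.array([
    [0.91, 0.40],  # query 1, passage A
    [0.85, 0.10],  # query 1, passage B
    [0.70, 0.55],  # query 2, passage C
    [0.65, 0.20],  # query 2, passage D
])
# Relevance labels from end-user feedback: 1 = thumbs up, 0 = thumbs down.
y = np.array([1, 0, 1, 0])
# `group` marks how many consecutive rows belong to each query.
group = [2, 2]

ranker = XGBRanker(objective="rank:pairwise", n_estimators=50)
ranker.fit(X, y, group=group)

# At serving time, re-rank a new query's candidates by predicted score
# and feed the top passages to the LLM.
scores = ranker.predict(np.array([[0.80, 0.30], [0.75, 0.60]]))
print(scores)
```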

What are the best practices for continuous model improvement and deployment in the LLM ecosystem?

LoRA (low-rank adaptation) is one of my favorite new techniques in the field of deep learning. Its quantized variant (QLoRA) allows LLMs to be fine-tuned on consumer-grade hardware and can yield close to state-of-the-art results.

The field is moving so quickly that, for all I know, there is already a better fine-tuning method, but I think QLoRA is an incredibly powerful and cheap way to iteratively improve LLMs.
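For illustration, here is a minimal QLoRA fine-tuning sketch using the Hugging Face transformers, peft, and bitsandbytes libraries; the base model and hyperparameters are illustrative assumptions, not a recipe from any specific project.

```python
# A minimal QLoRA setup sketch. Assumptions (illustrative): Hugging Face
# transformers + peft + bitsandbytes, with Llama 2 7B as the base model.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the frozen base model in 4-bit NF4 quantization (the "Q" in QLoRA),
# which is what makes consumer-grade GPUs viable.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Attach small trainable low-rank adapter matrices to the attention
# projections; only these adapters are updated during fine-tuning.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all weights
# From here, train with a standard loop or the transformers Trainer.
```

Because only the adapter weights are trained, checkpoints stay small and cheap to swap, which is what makes iterative improvement practical.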

However, while QLoRA is powerful, if you can accomplish your modeling task without fine-tuning, I would highly recommend doing so.

Prioritize prompt engineering over fine-tuning, and resort to fine-tuning only when prompt engineering alone doesn't live up to performance expectations.

If you’re a Walmart Associate looking for additional details, stay updated by following us. Be the first to get notified about our latest TEDx Talks by contacting us at [email protected].
