BloreBank ChatBot – Introducing our Prompt Injection Game

MalBot · February 29, 2024, 12:05pm

BloreBank Chatbot is a prompt injection game where you try to trick the AI into giving away sensitive information. With 10 levels, each one adds new safeguards against these tricks, making it tougher to get information you’re not supposed to. This game, inspired by Lakera’s Gandalf, has been adapted to more accurately simulate real-world cybersecurity contexts. The backend AI draws from actual scenarios encountered in the field. Are you up to the challenge of overcoming all 10 levels and outsmarting the AI? Give it a go today!

How to play

In each level of the game, you’re given a scenario and an objective. The scenario offers clues about the security measures in place to protect sensitive information. Your objective outlines the specific information you need to uncover. To advance to the next level, find this information and submit your answer. Interact with the AI through BloreBank’s Chatbot to submit your queries.

Disclaimer: BloreBank is a fictional bank and is not referencing any real company. All client and employee data was randomly generated.

Design

The game’s backend design is straightforward. It begins with the user’s input, which is fed into the model. The model then processes this input and generates a response, which is delivered back to the user.

The system prompt provides details about the chatbot’s function and all necessary information regarding the company. The user prompt is the message you send to the chatbot.

Safeguards

Although we don’t want to spoil the game, we can offer a glimpse into the safeguards it features.

Prompt-level safeguards are used to direct the language model not to disclose any sensitive information or respond inappropriately. It also alerts the model that the user might try to deceive it into releasing data it shouldn’t. This type of security measure is becoming increasingly common in large language models (LLMs), but it’s considered the least robust form of protection.

What if, instead of warning the model about possible attempts by users to access sensitive data, we scrutinize the input for signs of injection attempts? We could examine user inputs for typical injection keywords, sensitive information, or any terms we consider risky. Moreover, we could run the user input through a model that’s specially trained to identify prompt injections.

We can use similar strategies for monitoring the output as well. By understanding what constitutes sensitive data, we can employ various fuzzy search methods to spot any such information in the model’s responses. But what if a user attempts to disguise the data by encoding or translating it? In that case, we can rely on a model that’s been fine-tuned to recognize these kinds of evasion tactics.

What if we mix all these methods together? Each level in the game incorporates a blend of these techniques. We hope these clues help you navigate through the game. If you manage to complete it, drop me a message on LinkedIn with your prompts. I’m eager to see the diverse approaches people come up with!

The main aim of this game is to raise awareness about AI system security. The safeguards mentioned here are just a few examples; they don’t cover everything. As attackers devise new strategies, we’ll need to develop fresh defensive measures to keep AI chatbots safe.

The post BloreBank ChatBot – Introducing our Prompt Injection Game appeared first on LRQA Nettitude Labs.

Article Link: BloreBank ChatBot - Introducing our Prompt Injection Game - LRQA Nettitude Labs