Introducing DarkBERT: combating cyber crimes at scale with AI

Author: Changhoon Yoon | Co-founder | Head of R&D

DarkBERT: A Language Model for the Dark Side of the Internet

S2W has proudly been providing a special-purpose data intelligence platform as a service to public and private entities dealing with a variety of threats, ranging from traditional crimes such as drug trafficking to high-tech crimes involving cryptocurrency and ransomware. Our clients use the platform, which is loaded with a wide range of data gathered from purposely hidden data sources, to derive actionable intelligence.

This figure roughly illustrates how we make the data collected from various data sources readily available on our analysis platform. Every component depicted in the figure is vital to a complete threat intelligence production process, so all of them have been developed and are maintained in-house. Because we operate the whole system ourselves, we can produce fresh intelligence and deliver it directly to our clients (farm direct delivery!).

With such a scalable data acquisition platform, we deal with a massive amount of data every day. For all incoming data, our threat data warehouse applies a variety of threat analysis workflows to extract, relate, and enrich information. For example, one of the workflows analyzes all the web data and groups it into clusters of similar websites. The new information generated by this workflow unlocks various features, such as advanced search filters and phishing website detection (The Web Conf 2019). Another workflow enriches the extracted information with cryptocurrency metadata and transactions so that our clients can correlate and investigate cyber crimes involving cryptocurrency transfers (NDSS 2019).
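This post does not describe how the website clustering workflow is implemented. Purely as an illustration of the idea of grouping similar pages, here is a minimal sketch that assumes token-set Jaccard similarity and a greedy single pass. The function names, the 0.6 threshold, and the sample pages are all hypothetical and are not S2W's actual method.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_pages(pages: Dict[str, str], threshold: float = 0.6) -> List[List[str]]:
    """Greedy clustering (illustrative only).

    A page joins the first cluster whose representative (its first member)
    is at least `threshold` similar; otherwise it starts a new cluster.
    """
    clusters: List[List[str]] = []
    representatives: List[Set[str]] = []
    for url, text in pages.items():
        tokens = tokenize(text)
        for i, rep in enumerate(representatives):
            if jaccard(tokens, rep) >= threshold:
                clusters[i].append(url)
                break
        else:
            # No existing cluster was similar enough.
            clusters.append([url])
            representatives.append(tokens)
    return clusters


# Made-up sample pages, for illustration only.
pages = {
    "site-a.example": "Login to your wallet account to verify your identity",
    "site-b.example": "Login to your wallet account to verify your identity now",
    "site-c.example": "Fresh market listings updated daily with escrow support",
}
clusters = cluster_pages(pages)
print(clusters)
```

In this toy input the two near-identical login pages end up in one cluster, which hints at why such clusters are useful for spotting look-alike phishing sites. A production workflow would likely need richer features than raw tokens and a clustering method that scales beyond pairwise comparison.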

But what if a task in a workflow involves threat analysts’ domain knowledge or insights to complete? Can we automate such tasks and eliminate scalability concerns? This is where AI comes into play.

DarkBERT and beyond

Since our company’s establishment, we have put a lot of effort into this special-purpose AI research, and DarkBERT is our second AI/NLP research paper to be accepted for publication at a top-tier conference. (FYI, the first paper introduced our threat topic classification model, which has been deployed to our cyber crime investigation platform, XARVIS.)

Unlike the popular general-purpose AI models available today (e.g., ChatGPT), DarkBERT is a special-purpose AI model that is specifically trained and tuned to work well in the field of cyber crime. DarkBERT is more like a professional analyst with years of experience in cyber crime than Mr. Ask-Me-Anything. Having such special agents in our company will significantly improve the quality of the intelligence we produce, as well as our overall productivity and efficiency. Since the performance and effectiveness of DarkBERT have been officially peer-reviewed, it is now just a matter of how we take advantage of it.

We have already started developing various applications based on DarkBERT, and we would like to introduce one of the use cases.

DarkBERT for Threat Detection

Threat detection is the most vital process in the field of cyber crime and cyber threat intelligence: the earlier a threat is detected, the more quickly an organization can respond to minimize the damage. Despite understanding its importance, many organizations struggle with threat detection because it is fundamentally a complex and challenging task. Cybercriminals are among the earliest adopters of new technologies, eagerly exploiting them to achieve their goals without being noticed, so it is difficult to quickly identify and respond to their new tactics and techniques.

Our threat analysis platform enables threat detection by solving the vast majority of the problems stated above. However, we have observed that organizations get the most out of the platform when they have their own team of threat analysts, and that they share the following concerns:

  • Can small businesses without their own security team still perform threat detection?
  • Should businesses hire more and more threat analysts as they expand and face a greater volume and complexity of security threats?

DarkBERT, as previously mentioned, could be the core AI engine for building a well-trained cybercrime analyst. If such an AI analyst could understand and analyze the information extracted by our threat analysis platform to assist or even replace human analysts, we could eliminate the scalability and cost concerns.

We are currently building and testing such an AI cybercrime analyst, which autonomously analyzes data to answer the following:

  • Does the data contain any information that might pose a threat?
  • If so, what type of threat is it?
  • Who is the potential target / victim?
  • Who is the actor?
  • Which pieces of information are more important than others?
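To make these questions concrete, here is one hypothetical way an analyst's answers could be recorded and prioritized. This is a sketch under stated assumptions, not the DarkBERT application's actual data model: every field name, the `prioritize` helper, the sector filter, and the sample records are invented for illustration, and in a real system the fields would be filled in by a model rather than by hand.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ThreatAssessment:
    """One record per analyzed document. All field names are hypothetical."""
    source_id: str
    is_threat: bool               # Does the data contain a potential threat?
    threat_type: Optional[str]    # If so, what type of threat is it?
    target: Optional[str]         # Who is the potential target / victim?
    target_sector: Optional[str]  # Assumed extra attribute used for filtering
    actor: Optional[str]          # Who is the actor?
    urgency: float                # Relative importance, 0.0 (low) to 1.0 (high)


def prioritize(
    assessments: List[ThreatAssessment],
    threat_type: str,
    target_sector: str,
) -> List[ThreatAssessment]:
    """Keep threats of one type against one sector, most urgent first."""
    matches = [
        a for a in assessments
        if a.is_threat
        and a.threat_type == threat_type
        and a.target_sector == target_sector
    ]
    return sorted(matches, key=lambda a: a.urgency, reverse=True)


# Made-up records standing in for model output.
records = [
    ThreatAssessment("doc-1", True, "data_breach", "Agency A", "public", "actor-x", 0.4),
    ThreatAssessment("doc-2", True, "ransomware", "Agency B", "public", "actor-y", 0.9),
    ThreatAssessment("doc-3", True, "data_breach", "Agency C", "public", None, 0.8),
    ThreatAssessment("doc-4", False, None, None, None, None, 0.0),
    ThreatAssessment("doc-5", True, "data_breach", "Company D", "private", "actor-z", 0.95),
]

ranked = prioritize(records, threat_type="data_breach", target_sector="public")
for a in ranked:
    print(a.source_id, a.target, a.urgency)
```

Keeping the answers in a structured record like this separates the analysis step (filling in the fields) from the reporting step (filtering and ranking), so either can change without affecting the other.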

The following figure is a screenshot of a DarkBERT application built specifically to deal with data breaches. It is designed to answer the questions above and to flag urgent data breaches that users should be aware of. In this example, we have configured the application to detect data breaches affecting the United States public sector.

Besides the use case briefly introduced in this blog post, we have several ongoing research projects that might be of interest, including an undercover chatbot that can communicate with cybercriminals to perform active investigation on behalf of human agents and cybercrime inference engines that can probabilistically predict potential threats.

Please stay tuned for updates!

Introducing DarkBERT: combating cyber crimes at scale with AI was originally published in S2W BLOG on Medium.

Article Link: Introducing DarkBERT: combating cyber crimes at scale with AI | by S2W | S2W BLOG | Jun, 2023 | Medium