Power of AI: Dark Web Monitoring with ChatGPT

The dark web, often shrouded in mystery and intrigue, is a part of the internet that remains hidden from conventional search engines and browsers. It’s a realm where anonymity is paramount and various illegal activities, from drug trafficking to cybercrime, find a haven. With the rapid advancements in artificial intelligence, tools like ChatGPT by OpenAI have emerged, offering a unique opportunity to monitor and analyze these covert activities, thereby aiding organizations in staying ahead of potential threats and vulnerabilities.

But the security community is not the only group interested in these technologies; threat actors may use them as well. A notable example is the emergence of a ChatGPT-style AI tool known as WormGPT, which was marketed on cybercrime forums on the dark web. Unlike ChatGPT, which has built-in protections against misuse, WormGPT is advertised as having “no ethical boundaries or limitations,” making it a potent tool for hackers. Described as a “sophisticated AI model,” it allegedly produces human-like text that can be weaponized in hacking campaigns. Such developments underscore the double-edged nature of AI: while tools like ChatGPT can be used for beneficial purposes, counterparts like WormGPT can be exploited for malicious intent.

Europol, the European Union’s law enforcement agency, has also raised alarms about the potential misuse of Large Language Models (LLMs) like ChatGPT. Their report emphasized that cybercriminals could exploit these models to commit fraud, impersonation, or even sophisticated social engineering attacks. The ability of ChatGPT to draft highly authentic texts makes it an invaluable tool for phishing. Historically, phishing scams were often detectable due to glaring grammatical and spelling errors. However, with the advent of AI tools, even individuals with a basic grasp of the English language can now impersonate organizations or individuals in a highly realistic manner.

As we delve deeper into this article, we will explore how ChatGPT can be harnessed positively to monitor and analyze activities on the dark web, the emergence of models like DarkBERT and malicious ChatGPT clones, and the broader implications of AI on the dark web. The journey ahead will provide insight into the potential, challenges, and ethical considerations of using AI in dark web monitoring.

Potential Role of ChatGPT in Dark Web Monitoring

While notorious for its illicit activities, the dark web is also a treasure trove of information for cybersecurity professionals and threat hunters. This hidden realm of the internet, inaccessible by traditional search engines, requires specialized software like the TOR browser to navigate.

Many established cybersecurity firms, such as SOCRadar, offer Dark Web Monitoring as part of their threat intelligence services. However, the evolution of AI and language models has opened up new possibilities in this domain. ChatGPT’s capabilities extend far beyond its initial purpose, and its strengths in many areas can also benefit dark web analysis.

Using ChatGPT for dark web monitoring offers several advantages:

Efficient and Timely: ChatGPT can likely analyze vast amounts of data swiftly, producing coherent and readable reports. This rapid analysis supports timely detection of potential threats or illicit activities.

Cost-Effective: Building a monitoring tool with ChatGPT can be a cost-effective alternative to purchasing monitoring services, at least for a limited period or for narrowly scoped purposes. It offers a DIY (Do It Yourself) approach for organizations and individuals with technical know-how.

A basic understanding of coding and network technologies, plus familiarity with LLMs and AI, is usually enough to get started.

Flexibility: Given its adaptability, ChatGPT can be tailored to specific monitoring needs, ensuring that the tool remains relevant and responsive to the ever-changing landscape of the dark web.

Reduced Labor Intensity: Manual monitoring can be labor-intensive and time-consuming. Automating this process with ChatGPT reduces the manual effort required, allowing professionals to focus on analysis and response rather than data collection.

If these advantages interest you, we will discuss use cases under the following headings. Before that, however, we will look at other large language models and at hacker tools that mimic ChatGPT.

Another LLM, DarkBERT

The proliferation of Large Language Models (LLMs) like ChatGPT has led to a surge in applications that harness the power of AI. While ChatGPT has been recognized for its potential in various fields, it’s also been identified as a tool that can be used to create advanced malware. As the AI landscape evolves, specialized models are emerging, each tailored to a specific purpose and trained on distinct datasets.

DarkBERT’s logo

DarkBERT is a relatively new model trained on data sourced directly from the dark web. South Korean researchers developed it on top of the RoBERTa architecture, an AI approach that gained traction in 2019. Interestingly, while RoBERTa was initially believed to have reached its peak performance, later findings suggested it was undertrained and had more potential than previously thought.

To create DarkBERT, the researchers crawled the dark web over the anonymizing TOR network. They then processed the raw data, applying techniques such as deduplication, category balancing, and pre-processing to curate a comprehensive dark web corpus. This corpus was used to further train the RoBERTa language model, resulting in DarkBERT. The model’s primary function is to analyze new dark web content, which often includes coded messages and distinctive jargon, and extract valuable information.
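
To give a feel for what this kind of data curation might involve, here is a minimal, hypothetical Python sketch of a corpus-cleaning step (exact deduplication plus simple length filtering). It does not reproduce the DarkBERT authors’ actual pipeline; the function name and threshold are assumptions for illustration only.

import hashlib

def clean_corpus(pages, min_length=200):
    """Deduplicate crawled pages and drop very short ones.

    `pages` is assumed to be a list of raw text strings scraped from
    dark web sites; the length threshold is an arbitrary illustrative value.
    """
    seen_hashes = set()
    cleaned = []
    for text in pages:
        normalized = " ".join(text.split()).lower()
        if len(normalized) < min_length:
            continue  # skip pages too short to be useful training data
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # exact duplicate of a page already kept
        seen_hashes.add(digest)
        cleaned.append(text)
    return cleaned

Real pipelines add steps such as near-duplicate detection and category balancing on top of this, but the basic idea, reducing noise and redundancy before training, stays the same.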

While it’s a misconception that English is the predominant language of the dark web, the dialect used is distinct enough to warrant a specialized LLM. The researchers’ efforts bore fruit, as DarkBERT was shown to outperform other large language models. This enhanced performance could be a boon for security researchers and law enforcement agencies, allowing them to delve deeper into the hidden corners of the web.

However, the journey for DarkBERT does not end here. Like other LLMs, it will undergo further training and refinement to improve its results, and its future applications and the insights it can provide remain a topic of intrigue. In the broader context, the emergence of models like DarkBERT once again underscores the dual nature of AI, as well as the possibility of scraping and analyzing the dark web at scale.

Dark Versions of ChatGPT

The dark web, unindexed by standard search engines and accessible only through specialized software, is a hotbed for cybercriminal activities. Here, the sale of weapons, distribution of illegal drugs, and even child pornography find a haven. The environment is ripe for creating a dark version of ChatGPT, a prospect far from improbable. The model’s ability to produce realistic text can be weaponized by cybercriminals to automate their operations, making them more efficient and effective. Imagine a chatbot impersonating a human to execute phishing scams, social engineering tactics, or other deceptive practices.

Furthermore, an unfiltered version of ChatGPT could be programmed to churn out content that is not only inappropriate but also illegal or immoral, including pornography, hate speech, or extremist propaganda, potentially augmented with AI-generated images or audio. If disseminated widely, such content could cause significant harm to vulnerable individuals or groups in society.

Jailbreaking

The term “jailbreaking” has emerged in online communities to describe circumventing ChatGPT’s ethical rules. It involves manipulating the model into performing tasks it was designed to avoid, such as generating malware or assisting in other illicit activities, typically by strategically crafting queries so the chatbot breaks its programmed rules. It can be described as a brute-force method in which attackers continuously vary their prompts, trying different approaches until they get the desired output. This collaborative effort has given rise to online communities where members share techniques and insights to “jailbreak” ChatGPT.

Evil Twins

However, the genuine concern lies in creating malicious Large Language Models (LLMs). WormGPT, for instance, emerged as a black-hat alternative to GPT models. Designed explicitly for malicious activities like Business Email Compromise (BEC), malware creation, and phishing attacks, WormGPT is marketed on underground forums with the tagline “like ChatGPT but [with] no ethical boundaries or limitations.” This model, trained on data related to cyberattacks, allows hackers to execute large-scale attacks at minimal costs, targeting their victims more precisely than ever before.

Following WormGPT, other products like FraudGPT and EvilGPT have surfaced in the cyber underground. These models, often built on open-source language models, are customized by hackers, repackaged, and rebranded with ominous names. While they might not pose an immediate threat to businesses, their proliferation indicates a shift in the landscape of cyber threats.

EvilGPT is marketed on hacker forums as an alternative to WormGPT.

Experts suggest that traditional security awareness training may not suffice against these next-generation AI cyberweapons and the automation they enable. The specificity and targeted nature of these attacks call for more robust defense mechanisms; defending against AI-driven threats may well require AI-based protections.

ChatGPT for Dark Web Analysis and Monitoring

We have discussed the various uses of ChatGPT and similar AI assistants and LLMs, and it is clear how relevant they are to both cybersecurity and cybercrime. Now let’s look at how a security researcher or practitioner can apply ChatGPT to the dark web.

We will outline use cases built around example web pages. The examples are meant to guide you on what is possible rather than to serve as finished, working tools.

Keyword Searching Tool

Here is a basic Python script created with ChatGPT that simulates searching for specific keywords across a set of web pages. As a side note, a similar approach may also be used for credential and data leak searches.

import requests
from bs4 import BeautifulSoup

def search_keyword_on_websites(keyword, websites):
    """
    Search for a keyword on a list of websites.

    Parameters:
    - keyword (str): The keyword to search for.
    - websites (list): A list of websites to search on.

    Returns:
    - dict: A dictionary with websites as keys and the number of occurrences of the keyword as values.
    """
    results = {}
    for website in websites:
        try:
            response = requests.get(website)
            response.raise_for_status()  # Raise an exception for HTTP errors
            soup = BeautifulSoup(response.text, 'html.parser')
            text = soup.get_text()
            count = text.lower().count(keyword.lower())
            results[website] = count
        except requests.RequestException as e:
            print(f"Error fetching {website}: {e}")
            results[website] = "Error"
    return results

# List of websites to search on
websites = [
    "https://www.example.com",
    "https://www.example2.com",
    # Add more websites as needed
]

# Keyword to search for
keyword = "your_keyword_here"

# Execute the search
search_results = search_keyword_on_websites(keyword, websites)

# Print the results
for website, count in search_results.items():
    print(f"{website}: {count} occurrences of '{keyword}'")
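
The script above fetches pages directly over the clearnet. To reach .onion sites, requests typically need to be routed through a locally running Tor client. Below is a minimal sketch of that adjustment, assuming Tor is listening on its default SOCKS port 9050 and that the requests[socks] extra (PySocks) is installed; the .onion address is a placeholder, not a real site.

import requests

# Assumes a local Tor client exposing a SOCKS5 proxy on 127.0.0.1:9050 (the default).
# The "socks5h" scheme makes DNS resolution happen through Tor, which is
# required for .onion hostnames.
TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}

def fetch_via_tor(url, timeout=60):
    """Fetch a page through Tor; works for .onion and clearnet URLs alike."""
    response = requests.get(url, proxies=TOR_PROXIES, timeout=timeout)
    response.raise_for_status()
    return response.text

# Placeholder onion address for illustration only
html = fetch_via_tor("http://exampleonionaddress.onion")

The same proxies dictionary can be passed to requests.get inside search_keyword_on_websites to point the keyword search at dark web pages.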

Hacker Forum and Marketplace Monitoring

As shown above, you can regularly search for keywords and titles with a Python script, and the same approach can be used for hacker forum monitoring. One illustration is connecting to a forum, scraping its content, and forwarding the results to ChatGPT for analysis. Moreover, we may want it to produce machine-consumable output such as JSON; a short script for wiring this up follows the prompt template below.

An example prompt might look like this:

I have a JSON template such as:

{
  "title": "{TITLE}",
  "publishedDate": "{PUBLISHED_DATE}",
  "sourceURL": "{SOURCE_URL}",
  "tags": [
    {
      "tagName": "{TAG_NAME}",
      "entityType": "{ENTITY_TYPE}"
    }
  ],
  "isCyberThreatIntelligenceRelevant": "{BOOLEAN}",
  "brief": "{BRIEF}"
}

Could you replace the placeholders with the scraped data?
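
As a rough sketch of how this prompt could be wired up programmatically, the snippet below sends scraped forum text together with the template to the same Completion API used elsewhere in this article and attempts to parse the reply as JSON. The forum URL, template string, and truncation limit are illustrative assumptions, and the model’s output is not guaranteed to be valid JSON, which is why the parse is wrapped in a try/except.

import json
import openai
import requests
from bs4 import BeautifulSoup

openai.api_key = 'YOUR_API_KEY'

JSON_TEMPLATE = '''{
  "title": "{TITLE}",
  "publishedDate": "{PUBLISHED_DATE}",
  "sourceURL": "{SOURCE_URL}",
  "tags": [{"tagName": "{TAG_NAME}", "entityType": "{ENTITY_TYPE}"}],
  "isCyberThreatIntelligenceRelevant": "{BOOLEAN}",
  "brief": "{BRIEF}"
}'''

def structure_forum_post(url):
    """Scrape a forum page and ask the model to fill the JSON template."""
    page = requests.get(url)
    page.raise_for_status()
    text = BeautifulSoup(page.text, 'html.parser').get_text(separator='\n')

    prompt = (
        f"I have a JSON template such as:\n{JSON_TEMPLATE}\n\n"
        "Could you replace the placeholders with the scraped data below and "
        "return only valid JSON?\n\n"
        f"{text[:4000]}"  # truncate to stay within prompt size limits
    )
    response = openai.Completion.create(engine="davinci", prompt=prompt, max_tokens=300)
    raw = response.choices[0].text.strip()
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return {"error": "Model did not return valid JSON", "raw": raw}

result = structure_forum_post("https://example.com/forum/thread")  # placeholder URL
print(json.dumps(result, indent=2))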

Alerts and Notifications

Once we manage to search for keywords on these forums, we can use ChatGPT to set up a system that sends alerts to our email, making the workflow more effective. However, an API key is needed to automate this fully.

An example script is:

import openai
import requests
import smtplib
from email.message import EmailMessage

# Initialize OpenAI API (replace 'YOUR_API_KEY' with your actual key)
openai.api_key = 'YOUR_API_KEY'

def monitor_content_for_keywords(content, keywords):
    """
    Use ChatGPT to monitor content for specific keywords.
    """
    prompt = f"Check the following content for these keywords: {', '.join(keywords)}.\n\n{content}"
    response = openai.Completion.create(engine="davinci", prompt=prompt, max_tokens=150)
    return response.choices[0].text.strip()

def send_email_alert(subject, body):
    """
    Send an email alert.
    """
    msg = EmailMessage()
    msg.set_content(body)
    msg['Subject'] = subject
    msg['From'] = 'sender@example.com'
    msg['To'] = 'recipient@example.com'

    # Establish a connection to the email server (e.g., Gmail)
    with smtplib.SMTP_SSL('smtp.example.com', 465) as server:
        server.login('sender@example.com', 'your_password')
        server.send_message(msg)

# Fetch content from the data source (e.g., a forum)
response = requests.get('https://example.com/forum')
content = response.text

# Keywords to monitor
keywords = ['threat', 'hack', 'vulnerability']

# Check content for keywords using ChatGPT
result = monitor_content_for_keywords(content, keywords)

# If the result is not empty, send an email alert
if result:
    send_email_alert('Keyword Alert', f"The following keywords were detected: {result}")

Remember, while ChatGPT is powerful, it’s not infallible. Always verify alerts manually, especially if they’re used in critical decision-making processes.

  • The davinci engine used in the script above is one of OpenAI’s language models, accessed here through the Completion API.
  • When using the OpenAI API, especially with the openai.Completion.create method, keep the temperature parameter low (e.g., 0.2 or 0.1) so the output is more deterministic and focused; an example call is shown below.
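
For illustration, the call in monitor_content_for_keywords could be adjusted as follows; only the sampling parameter changes, the engine and prompt stay the same as above.

response = openai.Completion.create(
    engine="davinci",
    prompt=prompt,
    max_tokens=150,
    temperature=0.2,  # low temperature keeps the answer deterministic and focused
)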

Pattern Recognition

If the previous steps succeed, we can build further tools to go one step deeper and examine the threats themselves. One such tool is pattern recognition to automatically flag likely threats. Suppose you have a collection of scripts in text form scraped from hacker forums, and you want to identify whether any of them contain patterns commonly associated with malware.

An example script is:

import openai

# Initialize OpenAI API (replace 'YOUR_API_KEY' with your actual key)
openai.api_key = 'YOUR_API_KEY'

def check_for_malware_patterns(script):
    """
    Use ChatGPT to check a script for malware coding patterns.
    """
    prompt = f"Analyze the following script for any malware coding patterns:\n\n{script}\n\n"
    prompt += "Output format: {malicious: boolean, certainty: float, language: string, urls: [string]}\n"
    response = openai.Completion.create(engine="davinci", prompt=prompt, max_tokens=200)

    # Parse the response to extract the required information
    response_text = response.choices[0].text.strip()
    response_lines = response_text.split('\n')

    # Extracting the results
    result = {
        "malicious": False,
        "certainty": 0.0,
        "language": "unknown",
        "urls": []
    }
    for line in response_lines:
        if line.startswith("malicious: true"):
            result["malicious"] = True
        elif line.startswith("certainty:"):
            result["certainty"] = float(line.split(":", 1)[1])
        elif line.startswith("language:"):
            result["language"] = line.split(":", 1)[1].strip()
        elif line.startswith("urls:"):
            # Split only on the first colon so URLs containing ':' stay intact
            result["urls"] = [url.strip() for url in line.split(":", 1)[1].split(',')]
    return result

# Sample script (replace with your actual script)
script_content = """
function stealData() {
    var credentials = getCredentials();
    sendToServer(credentials);
}
"""

# Check the script for malware patterns using ChatGPT
result = check_for_malware_patterns(script_content)
print("Malicious:", result["malicious"])
print("Certainty:", result["certainty"])
print("Language:", result["language"])
print("URLs:", result["urls"])

With an example like this, you can capture certain patterns and take your automation one step further.

Sentiment Analysis

Another automated process you can perform is Sentiment Analysis using ChatGPT, which can provide valuable insights into the mood and intent behind dark web discussions. When fine-tuned, large language models like ChatGPT can detect subtle sentiments, helping analysts identify potential threats or gauge community sentiment towards illicit activities or products. By analyzing the tone of forum posts or market listings, organizations can proactively respond to emerging threats or trends; a minimal code sketch follows the example below.

Sentiment Analysis example, key points may also be changed according to needs.
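
As a rough illustration only, reusing the same Completion API as the earlier scripts, a batch sentiment check could look like the sketch below; the prompt wording and the sample posts are assumptions made up for this example.

import openai

openai.api_key = 'YOUR_API_KEY'

def classify_sentiment(posts):
    """Label each dark web forum post as positive, negative, or neutral."""
    labels = []
    for post in posts:
        prompt = (
            "Classify the sentiment of the following dark web forum post as "
            f"positive, negative, or neutral, and answer with one word.\n\n{post}"
        )
        response = openai.Completion.create(
            engine="davinci", prompt=prompt, max_tokens=5, temperature=0.1
        )
        labels.append(response.choices[0].text.strip().lower())
    return labels

# Hypothetical scraped posts for illustration
posts = [
    "This new stealer works great, dumped thousands of logs in a day.",
    "Marketplace exit-scammed again, lost everything.",
]
print(classify_sentiment(posts))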

Entity Analysis

To perform entity analysis with ChatGPT, the process typically involves multiple steps. First, you need to gather and prepare the text data you intend to analyze. This data can come from various sources, such as forums, articles, or any text-based content.

Next, you either prompt GPT with labeled samples or use a Named Entity Recognition (NER) model or library to identify named entities within the text. Popular options include libraries like SpaCy and NLTK, or pre-trained models like BERT. These tools are specifically designed to recognize entities such as persons, organizations, locations, dates, and more in text.

Once the entities are identified, the next step is to extract them from the text. This is crucial for further analysis. At this point, you have a list of recognized entities along with their respective types.
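
As a brief sketch of the library-based route, assuming spaCy and its small English model (en_core_web_sm) are installed, and with an input sentence made up purely for illustration:

import spacy

# Requires: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

text = (
    "A user claims to sell access to Example Corp's VPN, "
    "posted on 12 October on a forum allegedly hosted in Eastern Europe."
)

doc = nlp(text)
for ent in doc.ents:
    # Each entity comes with a label such as PERSON, ORG, GPE, or DATE
    print(ent.text, ent.label_)

The extracted entities and their types can then be fed back into the JSON template shown earlier or stored for further correlation.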

An example of basic Entity Analysis

An example of basic Entity Analysis

And Many More

For those seeking concise updates, ChatGPT can craft daily or weekly summaries of significant activities from designated dark web sources and even generate detailed incident reports when threats or breaches are detected. Given the multilingual nature of the dark web, ChatGPT’s language translation feature becomes indispensable, breaking down language barriers and facilitating comprehension of discussions in unfamiliar languages.

Analysts can also engage in interactive sessions with ChatGPT, posing questions about dark web content and receiving informed responses. Beyond mere content analysis, ChatGPT aids in data categorization, helping differentiate between various threats or rank discussions based on severity. Its adaptability allows it to seamlessly integrate with other cybersecurity tools, potentially collaborating with visualization software for graphical data representation or databases for efficient information storage and retrieval.

In wrapping up, it is worth noting a framework like LangChain for streamlining these processes. With its ChatGPT integrations, LangChain facilitates the development of LLM-driven applications. By leveraging it, you can build an AI-enhanced application that automates many of the tasks highlighted earlier, reducing the manual workload of dark web monitoring.
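
As a minimal sketch only: LangChain’s API has changed across versions, so the import paths and class names below follow an early 0.0.x-style interface and may need adjusting, and the input text is a placeholder for posts collected by the scrapers shown earlier.

from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

llm = OpenAI(temperature=0.2)  # assumes OPENAI_API_KEY is set in the environment

prompt = PromptTemplate(
    input_variables=["posts"],
    template=(
        "You are a threat intelligence assistant. Summarize the following "
        "dark web forum posts into a short daily report, highlighting any "
        "mentions of leaked credentials, exploits, or targeted companies:\n\n{posts}"
    ),
)

chain = LLMChain(llm=llm, prompt=prompt)
daily_report = chain.run(posts="<scraped forum posts collected earlier go here>")
print(daily_report)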

Conclusion

While the dark web remains a complex and often perilous realm, the advent of advanced AI tools like ChatGPT offers a glimmer of hope in navigating its murky waters. These tools provide a unique opportunity to monitor, analyze, and counteract potential threats, ensuring that organizations remain one step ahead of cyber adversaries. However, as with all powerful tools, there’s a dual nature to AI, with the potential for misuse in the wrong hands.

SOCRadar, Dark Web Monitoring

For organizations that may find the task of dark web monitoring daunting, SOCRadar offers a comprehensive Dark & Deep Web Monitoring solution. This platform not only identifies and mitigates threats across various web layers but also fuses automated external cyber intelligence with a dedicated analyst team, ensuring proactive defense mechanisms. By leveraging such solutions, organizations can effectively combat the myriad of challenges posed by the dark web and ensure a safer digital landscape.

Article Link: https://socradar.io/power-of-ai-dark-web-monitoring-with-chatgpt/