Cybersecurity experts develop a dark web-trained AI

Emre Çitak
May 27, 2023
Updated • May 26, 2023

Researchers from the Korea Advanced Institute of Science and Technology (KAIST) and data intelligence organization S2W have introduced DarkBERT, an unprecedented language model specifically trained on data extracted from the dark web.

DarkBERT aims to empower cybersecurity professionals by equipping them with a cutting-edge tool to identify and flag potential threats lurking within the depths of the internet's underbelly.

The model learns from the language used in dark web environments, which improves how well AI tools can understand that content. DarkBERT has the potential to become a valuable asset for cybersecurity professionals and law enforcement agencies.

Cybersecurity experts worked on a dark web-trained AI, DarkBERT - Image courtesy of KAIST

Trained on the Tor network

To ensure optimal adaptation to the language prevalent on the dark web, the research team undertook a meticulous process. They extensively crawled the Tor network, constructing a comprehensive database for DarkBERT.

The team implemented deduplication, data filtering, and thorough pre-processing techniques to address ethical concerns associated with dark web content, which often contains sensitive information.

During the training process, DarkBERT was exposed to two distinct sets of data over a period of 16 days. The pre-processed data underwent careful scrubbing, with redactions made to protect the identities of victim organizations, details regarding leaked data, menacing statements, and illicit images.
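The deduplication, filtering, and redaction steps described above can be illustrated with a small Python sketch. This is not the researchers' pipeline: the function names, the `min_chars` threshold, and the two regular expressions (e-mail addresses and .onion hosts) are assumptions chosen for illustration, since the article does not say exactly which identifiers were masked or how.

```python
import hashlib
import re

# Hypothetical patterns; which identifiers were actually masked is not stated.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
ONION_RE = re.compile(r"\b[a-z2-7]{16,56}\.onion\b")


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash alike."""
    return " ".join(text.lower().split())


def deduplicate(pages: list[str]) -> list[str]:
    """Keep the first copy of each page, comparing by hash of normalized text."""
    seen: set[str] = set()
    unique: list[str] = []
    for page in pages:
        digest = hashlib.sha256(normalize(page).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(page)
    return unique


def redact(text: str) -> str:
    """Replace e-mail addresses and .onion hosts with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return ONION_RE.sub("[ONION]", text)


def preprocess(pages: list[str], min_chars: int = 20) -> list[str]:
    """Filter very short pages, drop duplicates, then redact what remains."""
    kept = [p for p in pages if len(p.strip()) >= min_chars]
    return [redact(p) for p in deduplicate(kept)]
```

Calling `preprocess` on a list of crawled page texts returns the surviving pages with the matched identifiers replaced by `[EMAIL]` and `[ONION]`. Exact-hash deduplication only catches copies that match after normalization; near-duplicate pages would need a fuzzier technique.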

Notably, a significant portion of the data set, comprising over a thousand pages, was categorized as adult entertainment.

How can a dark web-trained AI benefit cybersecurity?

This development paves the way for further exploration and refinement of dark web-trained AI systems, offering promising ways to strengthen digital defenses against cyber threats. By understanding how threat actors operate on the dark web, defenders can better protect themselves and others, even in the darkest places on the internet.

You may find further information about DarkBERT here.

Can you access DarkBERT?

While DarkBERT represents a groundbreaking achievement, the release of this AI model to the public is not currently anticipated due to the potential risks associated with dark web materials. However, academic institutions may request access to DarkBERT for research purposes.

