Natural Language Processing for Detecting Anomalies and Intrusions in Unstructured Cybersecurity Data
Abstract
As cybersecurity systems generate data of increasing volume and variety, leveraging unstructured text has become crucial for detecting anomalies and intrusions. Natural language processing (NLP) offers effective techniques for analyzing unstructured data and identifying threats. This paper gives a comprehensive overview of NLP techniques for cybersecurity applications. First, we present the motivations and challenges of using NLP in cybersecurity. We then give background on the types of unstructured data relevant to cybersecurity and discuss NLP methods, including named entity recognition, sentiment analysis, topic modeling, and document classification. The core of the paper examines how these techniques can be applied to anomaly detection and intrusion detection systems. We propose a taxonomy of NLP-driven approaches and conduct an extensive literature review organized according to this taxonomy. We critically examine the advantages and limitations of current techniques. Based on this analysis, we highlight research gaps and propose an agenda for advancing NLP research for cybersecurity applications. Overall, this paper synthesizes past research and establishes a foundation for applying NLP to pressing cybersecurity challenges involving unstructured data.