Survey of Network Intrusion Detection Methods from the Perspective of the Knowledge Discovery in Databases Process
ABSTRACT :
The identification of network attacks that target information and communication systems has been a focus of the research community for years. Network intrusion detection is a complex problem that presents a diverse set of challenges. Many attacks currently remain undetected, while newer ones emerge due to the proliferation of connected devices and the evolution of communication technology. In this survey, we review the methods that have been applied to network data with the purpose of developing an intrusion detector but, unlike previous reviews in the area, we analyze them from the perspective of the Knowledge Discovery in Databases (KDD) process. As such, we discuss the techniques used for the collection, preprocessing and transformation of the data, as well as the data mining and evaluation methods. We also present the characteristics and motivations behind the use of each of these techniques and propose more adequate and up-to-date taxonomies and definitions for intrusion detectors based on the terminology used in the area of data mining and KDD. Special importance is given to the evaluation procedures followed to assess the detectors, discussing their applicability in current, real networks. Finally, as a result of this literature review, we investigate some open issues which will need to be considered for further research in the area of network security.
EXISTING SYSTEM :
• We can explore the existing links in the dataset pointing to the related entities in other LOD datasets.
• In the next step, various techniques for data consolidation, preprocessing and cleaning are applied, e.g., schema matching, data fusion, value normalization, treatment of missing values and outliers, etc.
• Many approaches have been proposed for extracting the schema of the tables and mapping it to existing ontologies and LOD. Mulwad et al. have made a significant contribution to interpreting tabular data from independent domains using LOD.
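The cleaning steps named above (treatment of missing values, outlier handling and value normalization) can be sketched as follows. This is a minimal illustration under stated assumptions, not a method prescribed by the survey: the function names, the median imputation, the interquartile-range fence and the sample "bytes per flow" values are all hypothetical choices made for the example.

```python
from statistics import median, quantiles


def impute_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def clip_outliers(values, k=1.5):
    """Clip values that fall outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def preprocess(values):
    """Chain the three steps: impute, clip, normalize."""
    return min_max_normalize(clip_outliers(impute_missing(values)))


# Hypothetical bytes-per-flow feature with one missing value and one outlier.
raw = [120, 135, None, 128, 140, 9000, 131]
clean = preprocess(raw)
```

The order of the steps matters in this sketch: imputing first gives the quartile computation a complete column, and clipping before normalization prevents a single extreme flow from compressing all other values towards zero.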
DISADVANTAGE :
• Data preprocessing and transformation phases have received a similar lack of analysis: the surveys covering these tasks are few and limited in content, and they ignore the associated problems and challenges.
• These techniques select a subset of the whole feature set that is assumed to be the most relevant to solve the problem at hand.
• FSS methods reduce the cost of data mining models, help to prevent model overfitting, and enable a better understanding of the data because redundancies are removed.
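As an illustration of feature subset selection, the sketch below ranks features by their absolute Pearson correlation with the class label and skips any feature that is nearly collinear with one already chosen. It is one simple filter-style approach, assumed here for the example; the survey does not single out this method, and the feature names, thresholds and sample values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for constant inputs."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sx = sqrt(sum((a - mean_x) ** 2 for a in x))
    sy = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def select_features(features, labels, k=2, redundancy_threshold=0.95):
    """Pick up to k relevant features, skipping redundant ones."""
    # Relevance: rank by absolute correlation with the label.
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], labels)),
        reverse=True,
    )
    selected = []
    for name in ranked:
        if len(selected) == k:
            break
        # Redundancy: skip features nearly collinear with a selected one.
        redundant = any(
            abs(pearson(features[name], features[s])) >= redundancy_threshold
            for s in selected
        )
        if not redundant:
            selected.append(name)
    return selected


# Hypothetical flow features; labels mark normal (0) and attack (1) flows.
labels = [0, 0, 0, 1, 1, 1]
features = {
    "bytes": [100, 120, 110, 900, 950, 1000],
    "bytes_kb": [0.1, 0.12, 0.11, 0.9, 0.95, 1.0],  # rescaled copy of "bytes"
    "duration": [1.0, 3.0, 2.0, 2.5, 4.0, 3.5],
    "ttl": [64, 64, 64, 64, 64, 64],  # constant, carries no information
}
selected = select_features(features, labels, k=2)
```

In this toy data, only one of the two byte-count features should survive because they are scaled copies of each other, which mirrors the point above that removing redundancies makes the data easier to understand and the model cheaper to train.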
PROPOSED SYSTEM :
• Several approaches and APIs have been proposed for extracting named entities from text documents and linking them to LOD. One of the most used APIs is DBpedia Spotlight, which allows for automatically annotating text documents with DBpedia URIs.
• A similar approach is proposed by Perez et al. with the OntoDataClean framework, which is able to guide the data cleaning process in a distributed environment. The framework uses a preprocessing ontology to store the information about the required transformations.
ADVANTAGE :
• In contrast to in-line mode, mirroring allows traffic from many machines to be captured, but it affects the performance of the device, which degrades as the number of monitored machines and the speed of modern networks grow.
• We review the techniques used in all the steps of this process, paying special attention to the motivation behind their use.
• We propose a taxonomy of NIDS detection methods based on data mining terminology to avoid the ambiguity of previous classification proposals. To that end, we use two different criteria: (a) the detection approach and (b) the learning approach.