Performance Evaluation of a Smart Intrusion Detection System (IDS) Model

DOI: http://dx.doi.org/10.24018/ejers.2021.6.2.2371 Vol 6 | Issue 2 | February 2021

Abstract: The research work titled “Smart Intrusion Detection System Comprised of Machine Learning and Deep Learning” was published in the October edition of the European Journal for Engineering and Technology Research (EJERS) online journal, where a smart IDS model was proposed. In the present work, that IDS model is validated. The KDD Cup'99 intrusion detection dataset was used to build the IDS model. A distinctive method is used to test the performance of the model: training is conducted on the KDD'99 dataset, but testing is done on the NSL-KDD dataset. Testing is conducted in three stages. In the first stage, using the generic 41 features, the accuracy, sensitivity, and FPR of attack detection are 95.240%, 93.103%, and 1.936% respectively for Random Forest, and 87.811%, 90.065%, and 15.168% respectively for MLP. In the second stage, 15 selected features are used, and the accuracy, sensitivity, and FPR of attack detection are 70.808%, 81.992%, and 43.971% respectively for Random Forest, and 67.637%, 87.660%, and 54.266% respectively for MLP. In the third stage, 22 selected features are used, and the accuracy, sensitivity, and FPR of attack detection are 97.001%, 96.643%, and 2.272% respectively for Random Forest, and 85.442%, 82.350%, and 10.472% respectively for MLP. A total of 311,021 records are used for training and 22,544 records for testing. The final accuracy of the model is 95.24%, 70.808%, and 96.988% for 41, 15, and 22 features respectively; the corresponding sensitivity is 93.103%, 87.68%, and 97.233%, and the corresponding FPR is 1.936%, 43.97%, and 3.36%. Therefore, the IDS model is efficient and effective.

I. INTRODUCTION
Computers and large interconnected networks such as the internet have become the most needed facilities of modern day-to-day life. As systems become increasingly large, complex, and interconnected, they also become vulnerable. Network security pertains to the use of technology and policies to assure confidentiality, integrity, and availability through the activities of prevention, detection, and recovery. Intrusion detection is the first layer of network security. Machine learning is an ideal solution to counter the threats and vulnerabilities of a large network that is exposed to many different kinds of cyber-attack [1].
Machine learning is broadly used in the field of security in two ways: pattern recognition and anomaly detection. Spam, malware, and botnet detection clearly fall under the category of pattern recognition. An access control system, the first line of defense, can also be realized through a flexible machine learning method. User authentication and behavior analysis fall between pattern recognition and anomaly detection. An IDS analyses all of the above criteria to ensure the first line of defense [2]. Among the machine learning methods, Random Forest and MLP have proved to be the most suitable for forming an IDS model [3].
The paper is organized as follows. Section II provides some theoretical background, Section III describes the building of the IDS model, and Section IV validates the IDS model. Lastly, Section V concludes the paper.

II. PRELIMINARIES
This section gives a brief background on network intrusion detection, the ML algorithms, and the datasets used in this study.

A. IDS
An IDS works beyond the initial access control barrier to detect attempted or successful breaches of a network. Modern Intrusion Detection Systems are also called Intrusion Detection and Prevention Systems (IDPS). An IDPS can intercept the direct line of communication between the source and the destination, and it also acts on anomalies automatically [IV].

B. KDD'99 & NSL-KDD
KDD'99 is the data set used for The Third International Knowledge Discovery and Data Mining Tools Competition. The competition task was to build a network intrusion detector, a predictive model capable of distinguishing between "bad" connections (intrusions or attacks) and "good" normal connections [4].
NSL-KDD is the updated version of the KDD'99 data set. It was proposed to address some of the inherent problems of the KDD'99 data set. The advantage of this data set is that it contains no duplicate or redundant records.

(I) Random Forest
A random forest consists of a large number of individual decision trees that operate as an ensemble. Each individual tree outputs a class prediction, and the class with the most votes becomes the model's prediction [6].
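The voting step described above can be illustrated with a minimal sketch. It covers only the final majority vote, not tree training; the function name `majority_vote` and the sample votes are hypothetical and are not taken from the model in this paper.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the class label predicted by the largest number of trees.

    Ties are resolved in favor of the label that appears first in the list,
    which is how Counter.most_common orders equal counts.
    """
    counts = Counter(tree_predictions)
    label, _ = counts.most_common(1)[0]
    return label

# Hypothetical outputs of five trees for a single network connection.
votes = ["attack", "normal", "attack", "attack", "normal"]
forest_prediction = majority_vote(votes)
print(forest_prediction)  # attack (3 votes against 2)
```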

(II) Multi-Layer Perceptron
A multi-layer perceptron is a deep learning method in which more than one linear layer (a combination of neurons) is used. In a three-layer network, the first layer is the input layer, the last one is the output layer, and a hidden layer sits in between [7].
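As a rough illustration of the three-layer structure described above, the sketch below runs one forward pass through an input layer, one hidden layer, and an output layer. The sigmoid activation, the layer sizes, the hand-picked weights, and the 0.5 decision threshold are all assumptions made for the example; they are not the configuration of the model in this paper.

```python
import math

def sigmoid(x):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases):
    """One fully connected layer: sigmoid(W x + b), one output per weight row."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def mlp_forward(x, hidden_w, hidden_b, out_w, out_b):
    """Input layer -> hidden layer -> output layer."""
    hidden = dense(x, hidden_w, hidden_b)
    return dense(hidden, out_w, out_b)

# Illustrative values only: 3 inputs, 2 hidden neurons, 1 output neuron.
x = [0.5, -1.0, 2.0]
hidden_w = [[0.1, 0.2, 0.3], [-0.3, 0.1, 0.2]]
hidden_b = [0.0, 0.1]
out_w = [[0.7, -0.5]]
out_b = [0.05]

score = mlp_forward(x, hidden_w, hidden_b, out_w, out_b)[0]
# Assumed decision rule: scores of 0.5 or more are labeled as attack.
print("attack" if score >= 0.5 else "normal")
```

In a trained network the weights would be learned by repeatedly minimizing the prediction error rather than set by hand.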

D. Data Analysis Platform: Jupyter Notebook
Jupyter Notebook is an interactive open-source web application that allows coders to write expressive code. It is a useful tool for Data Science and ML. It makes it possible to present findings and embed the results (visualizations) in the same document as the code [8].

III. MODEL BUILDING & DESCRIPTION
Random Forest and MLP were found to be the most suitable methods for building an intrusion detection model. They perform well in identifying both normal flow and attacks. Furthermore, they are efficient for both generic and selective features. Therefore, these two methods are used to develop the IDS model [9].

A. Description of the IDS Model
The model is built using MLP, a DL method, and Random Forest, a machine learning algorithm. Repeated training to minimize error is used to optimize performance. Because the DL and ML methods are combined, greater detection capability is achieved.
The variety and adaptability of the model allow it to detect intrusions efficiently [10].

A. Overview
To evaluate the performance of the model, a new data set, NSL-KDD, is used. This is an updated version of the KDD'99 data set. The distinctive part of the evaluation is that training is conducted on the KDD'99 data set while testing is conducted on the NSL-KDD data set. For both training and testing, the full data set is used rather than a sample of it. Hence, the actual performance of the model is brought out. To measure the performance of the model using 41 features, a CSV file was prepared from the KDD'99 data set. Finally, the NSL-KDD Test+ data set is used for testing. In this way, a distinctive process is followed to evaluate the performance of the model.
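Training on one data set and testing on another requires both files to be reduced to the same feature columns and the same label encoding. The sketch below shows one way this alignment might be done. It assumes that KDD'99 labels end with a period (for example `normal.`) and that NSL-KDD rows carry an extra trailing difficulty column; both are assumptions about commonly distributed versions of these files, not details stated in this paper. The helper `load_binary` and the three-feature sample rows are hypothetical.

```python
import csv
import io

def load_binary(csv_text, n_features, has_difficulty=False):
    """Parse CSV text into (features, labels) with 0 = normal and 1 = attack."""
    features, labels = [], []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        if has_difficulty:
            row = row[:-1]  # drop the trailing difficulty column (assumed layout)
        *values, label = row
        if len(values) != n_features:
            raise ValueError(f"expected {n_features} features, got {len(values)}")
        features.append(values)
        # Strip the trailing period that KDD'99 labels are assumed to carry.
        labels.append(0 if label.strip().rstrip(".") == "normal" else 1)
    return features, labels

# Made-up rows with 3 features instead of 41, to keep the example short.
train_text = "0,tcp,181,normal.\n0,icmp,1032,smurf.\n"
test_text = "0,tcp,491,normal,20\n0,tcp,0,neptune,19\n"

X_train, y_train = load_binary(train_text, n_features=3)
X_test, y_test = load_binary(test_text, n_features=3, has_difficulty=True)
print(y_train, y_test)  # [0, 1] [0, 1]
```

Checking the feature count on every row helps catch a mismatch between the two file layouts before any model is trained.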

B. Performance Analysis Using Generic 41 Features
The 41 generic features are used first to evaluate the detection performance of the IDS model. For Random Forest, accuracy and sensitivity are found to be 96.983% and 96.103%. FPR is quite low for both normal and attack traffic, which is a sign of a sound system. For MLP, accuracy is 64.123% for the normal flow of data and 90.065% for attacks. Its sensitivity is high, which is a sign of a good system, but its FPR is somewhat high, which degrades the performance of the system.
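The three measures reported throughout this evaluation can be computed from the counts of true and false positives and negatives. The sketch below uses the standard definitions, accuracy = (TP + TN) / total, sensitivity = TP / (TP + FN), and FPR = FP / (FP + TN); the paper does not spell out its formulas, so treating these as the ones used is an assumption. The function name and the sample labels are illustrative.

```python
def detection_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, false positive rate) for binary labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, sensitivity, fpr

# Made-up labels: 1 = attack, 0 = normal.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(detection_metrics(y_true, y_pred))  # (0.75, 0.75, 0.25)
```

Passing `positive=0` gives the same measures with normal traffic treated as the class of interest.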

C. Performance Analysis Using Selective 15 Features
In this stage, 15 selected features are used to evaluate the performance of the system. In this case, the results are moderate. This reflects the difference in quality between the data sets and how it affects the performance of the system. These test results suggest that these 15 features may be exploited by attackers to deceive the IDS model.
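Evaluating on a feature subset amounts to keeping the same named columns in both the training and the test data. This section does not list which 15 features were selected, so the sketch below uses a few KDD'99-style column names purely for illustration; `select_features` is a hypothetical helper.

```python
def select_features(rows, all_names, selected_names):
    """Keep only the selected columns, in the order given by selected_names."""
    index = {name: i for i, name in enumerate(all_names)}
    missing = [name for name in selected_names if name not in index]
    if missing:
        raise KeyError(f"unknown features: {missing}")
    columns = [index[name] for name in selected_names]
    return [[row[c] for c in columns] for row in rows]

# Illustrative column names and rows; not the paper's selected feature set.
all_names = ["duration", "protocol_type", "service", "src_bytes", "dst_bytes"]
selected = ["protocol_type", "src_bytes"]
rows = [
    [0, "tcp", "http", 181, 5450],
    [0, "icmp", "ecr_i", 1032, 0],
]
print(select_features(rows, all_names, selected))
# [['tcp', 181], ['icmp', 1032]]
```

Selecting columns by name rather than by position keeps the training and test sets aligned even if the two files order their columns differently.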

D. Performance Analysis Using Selective 22 Features
In the third stage, 22 selected features are used, following the previous method. Training of the system is conducted on the KDD'99 data set and testing is conducted on the NSL-KDD data set. A total of 311,021 records are used for training and 22,544 records for testing. Accuracy and sensitivity are quite high and FPR is low, except in the case of attack detection by MLP. Overall, the model has shown better performance.

E. Analytical Review
Experimental results show good performance in two cases and only moderate performance in one. Although the 15 selected features give only moderate performance, this result also testifies that the model is genuine. With the generic features and the 22 selected features, the model performed well. Hence, the model can be recognized as an efficient and genuine IDS model.

F. Final Performance Measure of the IDS Model
The performance of the IDS model is found to be efficient after combining all the results. Here, only the intrusion detection rate is considered. It is found to be stable except in two cases: first, attack detection accuracy using 15 features was only 70.808%; second, FPR using 15 features was 43.971%, which is high. All other results were quite impressive, which makes the model efficient.

V. CONCLUSION
In this paper, a model is first prepared from the KDD'99 dataset using MLP, a DL algorithm, and Random Forest, a machine learning algorithm. Performance evaluation is conducted in a distinctive manner: training of the model is conducted on the KDD'99 data set, and testing is carried out on a completely separate data set, NSL-KDD. Of the three types of test, the system worked well in two, using 41 and 22 features, and displayed moderate performance in one, using 15 features. However, this unusually moderate performance also signifies the originality of the model. At the same time, since it scored well in two of the tests, the model can be taken as an efficient one. Further research is expected to address the limitations of these results and further improve the system.