The effect of k-nearest neighbors classifier for Intrusion detection of streaming of Net-flows in the Apache Spark environment

Date

2017-01-01

Department

Information Systems

Program

Information Systems

Rights

This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
Distribution Rights granted to UMBC by the author.

Abstract

An Intrusion Detection System (IDS) is built to distinguish normal packets from attack packets in network traffic data. Because of the enormous amount of data in network traffic, analyzing every individual packet is impractical and increases system performance overhead. To address this problem, packet information is aggregated into flows, which reduces the amount of data to be examined. In addition, IDS efficiency is increased by using the k-nearest neighbors (k-NN) classification algorithm to classify incoming connections as normal or suspicious. Combining the flow-based intrusion detection approach with the k-NN classifier in the Spark Streaming framework yields a system that is able to detect attacks in real time. In this thesis, the KDD-99 data set is used to test the proposed approaches. Experimental results show that Apache Spark Streaming, a modern distributed stream processing system, provides enough throughput to process large volumes of data in a short span of time, which makes it suitable for network traffic monitoring.
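
The two ideas the abstract combines, aggregating packets into flows and classifying each flow with k-NN, can be illustrated with a minimal sketch. This is not the thesis implementation: it is plain Python rather than Spark Streaming, and the packet fields (`src`, `dst`, `sport`, `dport`, `proto`, `size`), the two flow features (packet count and byte count), the function names, and the toy training data are all assumptions made for illustration. They are not the KDD-99 features.

```python
from collections import Counter, defaultdict
import math


def aggregate_flows(packets):
    """Group packets by a hypothetical 5-tuple key and summarize each group.

    Each flow record keeps only a packet count and a byte count, so the
    classifier sees one record per flow instead of one per packet.
    """
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for p in packets:
        key = (p["src"], p["dst"], p["sport"], p["dport"], p["proto"])
        flows[key]["packets"] += 1
        flows[key]["bytes"] += p["size"]
    return dict(flows)


def knn_classify(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy packets (assumed field names): two belong to the same flow.
packets = [
    {"src": "10.0.0.1", "dst": "10.0.0.2", "sport": 1234, "dport": 80, "proto": "tcp", "size": 60},
    {"src": "10.0.0.1", "dst": "10.0.0.2", "sport": 1234, "dport": 80, "proto": "tcp", "size": 90},
    {"src": "10.0.0.3", "dst": "10.0.0.2", "sport": 5555, "dport": 22, "proto": "tcp", "size": 40},
]
flows = aggregate_flows(packets)

# Toy labeled flows as (packet_count, byte_count) feature vectors.
train = [
    ((2, 120), "normal"),
    ((3, 200), "normal"),
    ((4, 300), "normal"),
    ((500, 20000), "attack"),
    ((450, 18000), "attack"),
]

# Classify every aggregated flow.
labels = {
    key: knn_classify(train, (rec["packets"], rec["bytes"]), k=3)
    for key, rec in flows.items()
}
```

In a streaming setting, the same two steps would presumably run on each micro-batch of incoming traffic. In practice the features would also need scaling, since an unscaled byte count dominates the Euclidean distance.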