IP REPUTATION SCORING - A PERSPECTIVE ON CLUSTERING WITH META-FEATURES AUGMENTATION

Author/Creator

Author/Creator ORCID

Date

2018-01-01

Department

Information Systems

Program

Information Systems

Citation of Original Publication

Rights

Distribution Rights granted to UMBC by the author.
Access limited to the UMBC community. Item may be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.
This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.

Abstract

We propose a novel approach to assessing the reputation of an IP address in network usage data by augmenting the network features with meta-features such as geospatial knowledge. While there is abundant literature on geospatial data mining, limited attention has been given to geolocation in cybersecurity applications. We present experimental results that highlight the importance of geospatial knowledge in augmenting the detection of network anomalies, and we compare several traditional clustering methods with a technique called unified clustering, which overcomes the problems of using both continuous and categorical attributes in clustering. The contributions of this paper are threefold. First, we show that combining traditional network observables with geospatial observables yields a more robust and unique IP reputation scoring model. Second, this study provides an empirical validation of applying the unified clustering approach to data with heterogeneous attributes in the cybersecurity domain to obtain better-formed clusters. Third, we devise a reputation scoring model for an IP address by applying unified clustering to a combined dataset that encompasses network and geospatial information. This research has implications for anomaly detection in cybersecurity applications, especially when there is limited information about the network session or a lack of historical data for the network observables.