Clustering Approaches for Anonymizing High-Dimensional Sequential Activity Data

Author/Creator ORCID

Date

2020-01-01

Department

Information Systems

Program

Information Systems

Citation of Original Publication

Rights

Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan through a local library, pending author/copyright holder's permission.
This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu

Abstract

In the current IoT era, the collection of activity data, such as physical and daily activity data, has become ubiquitous. Publishing activity data can facilitate personal and population health management and promote reproducible health care research. However, publishing such data also carries high privacy risks, including re-identification of individuals in the data set. There is therefore a growing need to anonymize the data before publishing. One of the challenges in anonymizing sequential data such as activity data is its high-dimensional nature. Although existing techniques work sufficiently well for cross-sectional data, they suffer from poor run-time performance when applied directly to sequential data. In this research, we propose Multi-level Clustering (MC) based anonymization approaches that apply the k-anonymity, differential privacy, and l-diversity privacy models. The proposed MC step improves the performance of the anonymization approaches by drastically reducing the clustering time. Results show that the proposed approaches, in addition to being more efficient than the existing approaches, preserve the utility of the data as well as the existing approaches do.