A DATA MINING APPROACH TO COMPARE PRIVACY POLICIES

Author/Creator

Author/Creator ORCID

Date

2017-01-01

Department

Information Systems

Program

Information Systems

Citation of Original Publication

Rights

This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
Distribution Rights granted to UMBC by the author.

Abstract

As it becomes easier and less expensive for service providers to store huge amounts of data, the information collected about individuals is growing rapidly. At the same time, individuals' concerns about their privacy are increasing. Although most service providers use privacy policies to explain what information they collect, who will access it, and for what purpose, existing research shows that users often do not read privacy policies or find them difficult to understand. Thus, users may not make the right decisions about securing their privacy. Some studies have proposed tools to enhance the effectiveness of privacy policies and thus facilitate decision making, whereas others have introduced visualization models to increase privacy usability and effectiveness. Nevertheless, relatively little work has been done on providing a comparison model that helps users compare the privacy practices of different companies and make informed decisions. In this study, we first analyze users' awareness of privacy policies and the privacy practices described in them, their privacy concerns, and their privacy needs. Next, we use text mining techniques to extract the information users care about, such as collected information, shared information, and provided controls. Unlike existing techniques, our approach attempts to avoid the use of patterns or rules as much as possible, because the format of privacy policies often changes over time and patterns and rules therefore often become obsolete. We then develop a comparison tool that shows the extracted information side by side, and we conduct a survey to validate the tool and gather users' privacy preferences. Because a side-by-side comparison may not work well when users are comparing a large number of policies, we propose a data-mining-based method to rank privacy policies.
Unlike existing techniques that rely on user ratings, which are often unreliable, our approach relies on pairwise preferences given by users, which are often considerably more reliable.
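
The abstract does not say how pairwise preferences are turned into a ranking. As a minimal illustrative sketch only, and not the method used in this work, the Python snippet below orders policies by the fraction of pairwise comparisons each one wins. The function name `rank_from_pairwise`, the policy labels, and the sample judgments are all hypothetical.

```python
from collections import defaultdict


def rank_from_pairwise(preferences):
    """Rank items by the fraction of pairwise comparisons they win.

    `preferences` is an iterable of (winner, loser) tuples, one per
    user judgment. This simple win-rate scoring is an assumption made
    for illustration; the abstract does not specify the ranking method.
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for winner, loser in preferences:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    # Sort by win rate (descending); break ties by name for determinism.
    return sorted(
        appearances,
        key=lambda item: (-wins[item] / appearances[item], item),
    )


# Hypothetical user judgments: each tuple means "first preferred over second".
prefs = [
    ("policy_a", "policy_b"),
    ("policy_a", "policy_c"),
    ("policy_b", "policy_c"),
    ("policy_a", "policy_b"),
]
ranking = rank_from_pairwise(prefs)
print(ranking)
```

Win-rate scoring is used here only because it is easy to inspect; a probabilistic model fitted to the same pairwise data would be a natural alternative when judgments are sparse or inconsistent.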