Towards Effective Paraphrasing for Information Disguise

dc.contributor: Agarwal, Anmol
dc.contributor.advisor: Gaur, Manas
dc.contributor.author: Agarwal, Anmol
dc.contributor.author: Gupta, Shrey
dc.contributor.author: Bonagiri, Vamshi
dc.contributor.author: Gaur, Manas
dc.contributor.author: Reagle, Joseph
dc.contributor.author: Kumaraguru, Ponnurangam
dc.contributor.department: Computer Science and Electrical Engineering [en_US]
dc.contributor.program: Part of Knowledge-infused AI and Inference Lab at UMBC [en_US]
dc.date.accessioned: 2023-01-14T21:07:45Z
dc.date.available: 2023-01-14T21:07:45Z
dc.date.issued: 2023
dc.description: Work is accepted at 45th European Conference on Information Retrieval (ECIR) 2023 [en_US]
dc.description.abstract: Information Disguise (ID), a part of computational ethics in Natural Language Processing (NLP), is concerned with best practices of textual paraphrasing to prevent the non-consensual use of authors’ posts on the Internet. Research on ID becomes important when authors’ written online communication pertains to sensitive domains, e.g., mental health. Over time, researchers have utilized AI-based automated word spinners (e.g., SpinRewriter, WordAI) for paraphrasing content. However, these tools fail to satisfy the purpose of ID as their paraphrased content still leads to the source when queried on search engines. There is limited prior work on judging the effectiveness of paraphrasing methods for ID on search engines or their proxies, neural retriever (NeurIR) models. We propose a framework where, for a given sentence from an author’s post, we perform iterative perturbation on the sentence in the direction of paraphrasing with an attempt to confuse the search mechanism of a NeurIR system when the sentence is queried on it. Our experiments involve the subreddit “r/AmItheAsshole” as the source of public content and Dense Passage Retriever as a NeurIR system-based proxy for search engines. Our work introduces a novel method of phrase-importance rankings using perplexity scores and involves multi-level phrase substitutions via beam search. Our multi-phrase substitution scheme succeeds in disguising sentences 82% of the time and hence takes an essential step towards enabling researchers to disguise sensitive content effectively before making it public. We also release the code of our approach. [en_US]
dc.description.sponsorship: UMBC Faculty Startup Award [en_US]
dc.description.uri: https://github.com/idecir/idecir-Towards-Effective-Paraphrasing-for-Information-Disguise [en_US]
dc.format.extent: 10 pages [en_US]
dc.genre: conference papers and proceedings [en_US]
dc.identifier: doi:10.13016/m2ep9g-rqrw
dc.identifier.uri: http://hdl.handle.net/11603/26665
dc.language.iso: en_US [en_US]
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof: UMBC Faculty Collection
dc.rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author. [en_US]
dc.rights: Attribution-NonCommercial-ShareAlike 3.0 United States [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/3.0/us/ [*]
dc.subject: Neural Information Retrieval [en_US]
dc.subject: Adversarial Retrieval [en_US]
dc.subject: Paraphrasing [en_US]
dc.subject: Information Disguise [en_US]
dc.subject: Computational Ethics [en_US]
dc.title: Towards Effective Paraphrasing for Information Disguise [en_US]
dc.type: Text [en_US]
dcterms.creator: https://orcid.org/0000-0001-6730-3565 [en_US]
dcterms.creator: https://orcid.org/0000-0002-5160-2226 [en_US]
dcterms.creator: https://orcid.org/0000-0002-5537-1664 [en_US]
dcterms.creator: https://orcid.org/0000-0002-5411-2230 [en_US]
dcterms.creator: https://orcid.org/0000-0003-0650-9097 [en_US]
dcterms.creator: https://orcid.org/0000-0001-5082-2078 [en_US]
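
The abstract describes iteratively perturbing a sentence through multi-level phrase substitutions chosen by beam search, so that a retriever no longer leads back to the source. The sketch below illustrates only that search loop and is not the authors' released code. Everything named in it is an assumption made for illustration: `overlap_score` is a toy word-overlap stand-in for the Dense Passage Retriever, the substitution table is hand-written and hypothetical, and the perplexity-based phrase-importance ranking is omitted.

```python
import re
from typing import Callable, Dict, List, Tuple


def overlap_score(query: str, source: str) -> float:
    """Toy retriever proxy (assumption): share of distinct source words found in the query."""
    q_words = set(query.lower().split())
    s_words = set(source.lower().split())
    return len(q_words & s_words) / len(s_words) if s_words else 0.0


def substitute(text: str, phrase: str, replacement: str) -> str:
    """Replace whole-word occurrences of phrase, so 'ate' does not match inside 'roommate'."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.sub(pattern, lambda _match: replacement, text)


def beam_disguise(
    sentence: str,
    substitutions: Dict[str, List[str]],
    score: Callable[[str, str], float] = overlap_score,
    beam_width: int = 2,
    max_steps: int = 3,
) -> Tuple[float, str]:
    """Beam search over phrase substitutions, keeping the lowest-scoring rewrites.

    A lower score means the rewrite looks less like the source to the proxy scorer.
    Returns the (score, text) pair of the best rewrite found.
    """
    beam = [(score(sentence, sentence), sentence)]
    for _step in range(max_steps):
        # Start from the current beam so a rewrite is never lost between steps.
        candidates = {text for _score, text in beam}
        for _score, text in beam:
            for phrase, options in substitutions.items():
                for option in options:
                    candidates.add(substitute(text, phrase, option))
        # Score every candidate against the original sentence and keep the best few.
        new_beam = sorted((score(c, sentence), c) for c in candidates)[:beam_width]
        if new_beam == beam:
            break  # no substitution improved the beam
        beam = new_beam
    return beam[0]


if __name__ == "__main__":
    # Hypothetical example sentence and substitution table.
    table = {
        "roommate": ["flatmate"],
        "ate": ["finished"],
        "leftover pizza": ["remaining food"],
    }
    print(beam_disguise("my roommate ate my leftover pizza", table))
```

A real system would replace `overlap_score` with the rank or similarity returned by a neural retriever and would also check that each rewrite preserves the meaning of the original sentence, which this sketch does not attempt.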

Files

Original bundle

Name: KAI2_ECIR_Information_Disguise.pdf
Size: 606.42 KB
Format: Adobe Portable Document Format
Description: Main camera-ready article to be presented at the 45th ECIR conference

License bundle

Name: license.txt
Size: 2.56 KB
Format: Item-specific license agreed to upon submission
Description: