Engineering a Simplified 0-Bit Consistent Weighted Sampling

dc.contributor.author: Raff, Edward
dc.contributor.author: Sylvester, Jared
dc.contributor.author: Nicholas, Charles
dc.date.accessioned: 2019-01-29T15:16:05Z
dc.date.available: 2019-01-29T15:16:05Z
dc.date.issued: 2018-10-23
dc.description.abstract: The Min-Hashing approach to sketching has become an important tool in data analysis, information retrieval, and classification. To apply it to real-valued datasets, the ICWS algorithm has become a seminal approach that is widely used and provides state-of-the-art performance for this problem space. However, ICWS suffers a computational burden as the sketch size K increases. We develop a new Simplified approach to the ICWS algorithm that enables us to obtain over 20x speedups compared to the standard algorithm. The veracity of our approach is demonstrated empirically on multiple datasets and scenarios, showing that our new Simplified CWS obtains the same quality of results while being an order of magnitude faster. [en_US]
dc.description.uri: https://arxiv.org/abs/1804.00069 [en_US]
dc.format.extent: 10 pages [en_US]
dc.genre: conference papers and proceedings preprints [en_US]
dc.identifier: doi:10.13016/m2gueu-ztec
dc.identifier.citation: Edward Raff, Jared Sylvester, Charles Nicholas, Engineering a Simplified 0-Bit Consistent Weighted Sampling, In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (2018) 1203-1212, DOI: 10.1145/3269206.3271690 [en_US]
dc.identifier.uri: 10.1145/3269206.3271690
dc.identifier.uri: http://hdl.handle.net/11603/12638
dc.language.iso: en_US [en_US]
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof: UMBC Faculty Collection
dc.rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.subject: min-hashing [en_US]
dc.subject: jaccard similarity [en_US]
dc.subject: consistent weighted sampling [en_US]
dc.title: Engineering a Simplified 0-Bit Consistent Weighted Sampling [en_US]
dc.type: Text [en_US]

Files

Original bundle
Name: engineering-simplified-0.pdf
Size: 744.59 KB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 2.56 KB
Format: Item-specific license agreed upon to submission