Selectively estimation for Boolean queries

dc.contributor.authorChen, Zhiyuan
dc.contributor.authorKoudas, Nick
dc.contributor.authorKorn, Flip
dc.contributor.authorMuthukrishnan, S.
dc.date.accessioned2025-06-05T14:02:55Z
dc.date.available2025-06-05T14:02:55Z
dc.date.issued2000-05-01
dc.descriptionPODS '00: Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
dc.description.abstractIn a variety of applications ranging from optimizing queries on alphanumeric attributes to providing approximate counts of documents containing several query terms, there is an increasing need to quickly and reliably estimate the number of strings (tuples, documents, etc.) matching a Boolean query. Boolean queries in this context consist of substring predicates composed using Boolean operators. While there has been some work in estimating the selectivity of substring queries, the more general problem of estimating the selectivity of Boolean queries over substring predicates has not been studied.Our approach is to extract selectivity estimates from relationships between the substring predicates of the Boolean query. However, storing the correlation between all possible predicates in order to provide an exact answer to such predicates is clearly infeasible, as there is a super-exponential number of possible combinations of these predicates. Instead, our novel idea is to capture correlations in a space-efficient but approximate manner. We employ a Monte Carlo technique called set hashing to succinctly represent the set of strings containing a given substring as a signature vector of hash values. Correlations among substring predicates can then be generated on-the-fly by operating on these signatures.We formalize our approach and propose an algorithm for estimating the selectivity of any Boolean query using the signatures of its substring predicates. We then experimentally demonstrate the superiority of our approach over a straight-forward approach based on the independence assumption wherein correlations are not explicitly captured.
dc.description.urihttps://dl.acm.org/doi/10.1145/335168.335225
dc.format.extent10 pages
dc.genreconference papers and proceedings
dc.genrepostprints
dc.identifierdoi:10.13016/m2smr3-464p
dc.identifier.citationChen, Zhiyuan, Nick Koudas, Flip Korn, and S. Muthukrishnan. “Selectively Estimation for Boolean Queries.” Proceedings of the Nineteenth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS ’00, May 1, 2000, 216–25. https://doi.org/10.1145/335168.335225.
dc.identifier.urihttps://doi.org/10.1145/335168.335225
dc.identifier.urihttp://hdl.handle.net/11603/38622
dc.language.isoen_US
dc.publisherACM
dc.relation.isAvailableAtThe University of Maryland, Baltimore County (UMBC)
dc.relation.ispartofUMBC College of Engineering and Information Technology Dean's Office
dc.relation.ispartofUMBC Information Systems Department
dc.rightsThis item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.subjectUMBC Mobile, Pervasive and Sensor Computing Lab (MPSC Lab)
dc.subjectUMBC Cybersecurity Institute
dc.subjectUMBC Accelerated Cognitive Cybersecurity Laboratory
dc.titleSelectively estimation for Boolean queries
dc.typeText
dcterms.creatorhttps://orcid.org/0000-0002-6984-7248

Files

Original bundle

Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
SelectivelyestimationforBooleanqueries.pdf
Size:
214.51 KB
Format:
Adobe Portable Document Format