Delta TFIDF: an Improved Feature Space for Text Analysis

dc.contributor.author: Martineau, Justin
dc.contributor.author: Finin, Tim
dc.contributor.author: Patel, Shamit
dc.contributor.author: Joshi, Anupam
dc.date.accessioned: 2025-11-21T00:29:46Z
dc.description.abstract: Bag of words text classification techniques represent documents as a collection of words with associated counts. Machine learning algorithms, such as support vector machines, use these feature spaces to classify new documents using their similarity to a set of manually labeled documents. We describe an efficient way to weight these feature scores to improve classification accuracy.
dc.description.uri: https://ebiquity.umbc.edu/_file_directory_/papers/1467.pdf
dc.format.extent: 1 page
dc.genre: posters
dc.identifier: doi:10.13016/m2sk3a-tyx8
dc.identifier.uri: http://hdl.handle.net/11603/40794
dc.language.iso: en
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department
dc.relation.ispartof: UMBC Center for Cybersecurity
dc.relation.ispartof: UMBC Mathematics and Statistics Department
dc.relation.ispartof: UMBC Faculty Collection
dc.relation.ispartof: UMBC Student Collection
dc.rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.subject: UMBC Accelerated Cognitive Cybersecurity Laboratory
dc.subject: UMBC Cybersecurity Institute
dc.title: Delta TFIDF: an Improved Feature Space for Text Analysis
dc.type: Text
dcterms.creator: https://orcid.org/0000-0002-6593-1792
dcterms.creator: https://orcid.org/0000-0002-8641-3193

Files

Original bundle

Name: 1467.pdf
Size: 1.72 MB
Format: Adobe Portable Document Format