Small Effect Sizes in Malware Detection? Make Harder Train/Test Splits!

dc.contributor.author: Patel, Tirth
dc.contributor.author: Lu, Fred
dc.contributor.author: Raff, Edward
dc.contributor.author: Nicholas, Charles
dc.contributor.author: Matuszek, Cynthia
dc.contributor.author: Holt, James
dc.date.accessioned: 2024-01-12T13:11:46Z
dc.date.available: 2024-01-12T13:11:46Z
dc.date.issued: 2023-12-25
dc.description: CAMLIS’23: Conference on Applied Machine Learning for Information Security, October 19–20, 2023, Arlington, VA
dc.description.abstract: Industry practitioners care about small improvements in malware detection accuracy because their models are deployed to hundreds of millions of machines, meaning a 0.1% change can cause an overwhelming number of false positives. However, academic research is often restricted to public datasets on the order of ten thousand samples, which are too small to detect improvements that may be relevant to industry. Working within these constraints, we devise an approach to generate a benchmark of configurable difficulty from a pool of available samples. This is done by leveraging malware family information from tools like AVClass to construct training/test splits that have different generalization rates, as measured by a secondary model. Our experiments demonstrate that using a less accurate secondary model with disparate features is effective at producing benchmarks for a more sophisticated target model that is under evaluation. We also ablate against alternative designs to show the need for our approach.
dc.description.uri: https://arxiv.org/abs/2312.15813
dc.format.extent: 12 pages
dc.genre: conference papers and proceedings
dc.genre: preprints
dc.identifier.uri: https://doi.org/10.48550/arXiv.2312.15813
dc.identifier.uri: http://hdl.handle.net/11603/31274
dc.language.iso: en_US
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Computer Science and Electrical Engineering Department Collection
dc.relation.ispartof: UMBC Faculty Collection
dc.relation.ispartof: UMBC Student Collection
dc.rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.rights: CC BY 4.0 DEED Attribution 4.0 International (en)
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.title: Small Effect Sizes in Malware Detection? Make Harder Train/Test Splits!
dc.type: Text
dcterms.creator: https://orcid.org/0009-0003-3212-8156
dcterms.creator: https://orcid.org/0000-0001-9494-7139
dcterms.creator: https://orcid.org/0000-0003-1383-8120
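
The abstract's core idea, keeping whole malware families (as labeled by a tool like AVClass) on one side of the train/test split and using a cheaper secondary model to gauge how hard a given split is, can be illustrated with a minimal Python sketch. This is not the authors' procedure. The function names (`family_disjoint_split`, `nearest_centroid_accuracy`, `pick_split`), the nearest-centroid secondary model, the random search over candidate splits, and the choice to split benign samples at random are all hypothetical assumptions made for illustration; the string family labels simply stand in for AVClass output.

```python
import random
from collections import defaultdict


def nearest_centroid_accuracy(X, y, train_idx, test_idx):
    """Hypothetical stand-in secondary model: one mean vector per label,
    predict the label of the nearest mean, return test accuracy."""
    sums, counts = {}, defaultdict(int)
    for i in train_idx:
        total = sums.setdefault(y[i], [0.0] * len(X[i]))
        for d, v in enumerate(X[i]):
            total[d] += v
        counts[y[i]] += 1
    centroids = {c: [v / counts[c] for v in total] for c, total in sums.items()}

    def predict(x):
        # Label of the closest centroid under squared Euclidean distance.
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    correct = sum(predict(X[i]) == y[i] for i in test_idx)
    return correct / len(test_idx)


def family_disjoint_split(y, families, test_fraction, rng):
    """Hypothetical split: each malware family (y == 1) lands entirely in train
    or entirely in test; benign samples (y == 0) have no family and are split at random."""
    benign = [i for i, label in enumerate(y) if label == 0]
    by_family = defaultdict(list)
    for i, label in enumerate(y):
        if label == 1:
            by_family[families[i]].append(i)

    rng.shuffle(benign)
    cut = round(test_fraction * len(benign))
    train, test = benign[cut:], benign[:cut]

    fams = list(by_family)
    rng.shuffle(fams)
    n_malware = sum(len(idx) for idx in by_family.values())
    moved = 0
    for fam in fams:
        # Fill the test side with whole families until it holds ~test_fraction of the malware.
        if moved < test_fraction * n_malware:
            test.extend(by_family[fam])
            moved += len(by_family[fam])
        else:
            train.extend(by_family[fam])
    return train, test


def pick_split(X, y, families, target_acc, n_candidates=50, test_fraction=0.3, seed=0):
    """Sample candidate family-disjoint splits and keep the one whose secondary-model
    accuracy is closest to target_acc (a lower target means a harder benchmark)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_candidates):
        train, test = family_disjoint_split(y, families, test_fraction, rng)
        acc = nearest_centroid_accuracy(X, y, train, test)
        if best is None or abs(acc - target_acc) < abs(best[0] - target_acc):
            best = (acc, train, test)
    return best


# Usage on synthetic 2-D data (purely illustrative, not a real malware corpus).
data_rng = random.Random(1)
X, y, families = [], [], []
for _ in range(40):  # benign samples near the origin
    X.append([data_rng.gauss(0, 1), data_rng.gauss(0, 1)])
    y.append(0)
    families.append(None)
for f in range(6):  # six synthetic malware families at different offsets
    for _ in range(8):
        X.append([data_rng.gauss(f, 1), data_rng.gauss(3, 1)])
        y.append(1)
        families.append(f"fam{f}")

easy_acc, _, _ = pick_split(X, y, families, target_acc=1.0)
hard_acc, train, test = pick_split(X, y, families, target_acc=0.5)
print(f"easy split accuracy: {easy_acc:.2f}, hard split accuracy: {hard_acc:.2f}")
```

Because both searches use the same seed, they see the same candidate splits, so the split chosen for the lower target can never score higher than the one chosen for `target_acc=1.0`. In this sketch, lowering the target is what makes the benchmark harder.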

Files

Original bundle

Name: 2312.15813.pdf
Size: 875.64 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.56 KB
Format: Item-specific license agreed upon to submission