PyLZJD: An Easy to Use Tool for Machine Learning
Rights
This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
Attribution 4.0 International (CC BY 4.0)
Abstract
As Machine Learning (ML) becomes more widely known and popular, more new
users from other backgrounds want to apply ML
techniques to their own domains. A difficult prerequisite that often confounds
new users is the feature creation and engineering process. This is especially true
when users attempt to apply ML to domains that have not historically received
attention from the ML community (e.g., outside of text, images, and audio).
The Lempel-Ziv Jaccard Distance (LZJD) is a compression-based technique
that can be used for many machine learning tasks. Because it is built on
compression, users do not need to specify any feature extraction, making it easy
to apply to new domains. We introduce PyLZJD, a library that implements LZJD
in a manner meant to be easy to use and apply for novice practitioners. We
discuss the intuition and high-level mechanics behind LZJD, followed by
examples of how to use it on problems with disparate data types.
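
The high-level idea can be sketched in a few lines of plain Python. This is a
minimal illustration, not the PyLZJD API: the function names `lz_set` and
`lzjd_distance` are hypothetical, and the sketch assumes that the distance is
one minus the Jaccard similarity of the substring sets produced by a
Lempel-Ziv style dictionary pass over the raw bytes. It compares the full sets
exactly and omits any hashing or approximation that a practical implementation
would likely use for speed.

```python
def lz_set(data: bytes) -> set:
    """Collect the distinct substrings found by a Lempel-Ziv style pass.

    Illustrative helper (hypothetical name): grow the current substring one
    byte at a time, and whenever it has not been seen before, record it and
    start a new substring at the next byte.
    """
    seen = set()
    start = 0
    for end in range(1, len(data) + 1):
        piece = data[start:end]
        if piece not in seen:
            seen.add(piece)
            start = end
    return seen


def lzjd_distance(a: bytes, b: bytes) -> float:
    """Return 1 - Jaccard similarity of the two substring sets (assumed form)."""
    set_a, set_b = lz_set(a), lz_set(b)
    union = set_a | set_b
    if not union:
        # Two empty inputs are treated as identical.
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


if __name__ == "__main__":
    doc1 = b"the quick brown fox jumps over the lazy dog"
    doc2 = b"the quick brown fox leaps over the lazy cat"
    doc3 = bytes(range(64))
    print(lzjd_distance(doc1, doc2))  # similar inputs share many substrings
    print(lzjd_distance(doc1, doc3))  # unrelated inputs share few or none
```

Note that no feature extraction step appears anywhere: the inputs are raw byte
strings, which is why the same approach carries over to data types that lack
established feature pipelines.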