Hood College Computer Science and Information Technology

Recent Submissions

  • Item
    Protein-ligand binding affinity prediction using SARS-CoV-2
    (2024-04-25) Anthony Rispoli; Rafael Zamora-Resendiz; Dr. Aijuan Dong; Dr. Dana Lawrence; Hood College Department of Computer Science and Information Technology; Hood College Departmental Honors
    The urgency of the COVID-19 pandemic has accelerated drug discovery efforts, prompting advances in computational methods. This study aims to predict protein-ligand (PL) binding affinities using atomic-resolution structural data from SARS-CoV-2 interactions with ligands. Using data from large-scale ensemble-docking experiments, we trained Multivariate Linear Regression (MLR) and Random Forest (RF) regression models. Despite marginal improvement with RF, both models struggled to produce reliable predictions, highlighting the complexity of PL binding affinity prediction. Future work entails exploring larger RF models, integrating deep learning approaches, and developing novel predictor features for enhanced predictive capability.
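    As a concrete illustration of the two baselines named above, the sketch below fits an MLR and an RF regressor on a placeholder feature matrix. The thesis does not specify its tooling or feature set, so scikit-learn and the synthetic features here are assumptions, not a record of the actual experiments.

    ```python
    # Hedged sketch: MLR vs. RF regression on docking-derived features.
    # X/y are synthetic placeholders; scikit-learn is an assumed toolchain.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 8))   # e.g., docking scores, contact features
    y = rng.normal(size=1000)        # binding affinity labels

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    for name, model in [("MLR", LinearRegression()),
                        ("RF", RandomForestRegressor(n_estimators=500, random_state=0))]:
        model.fit(X_tr, y_tr)
        print(name, "held-out R^2:", round(r2_score(y_te, model.predict(X_te)), 3))
    ```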
  • Item
    Forecasting the Supreme Court: A Comparative Analysis of Machine Learning Algorithms on Petitioner vs. Appellee Outcomes
    (2024-04-25) Benjamin Chase Davids; Dr. George Dimitoglou; Hood College Department of Computer Science and Information Technology; Hood College Departmental Honors
    Since the Court's inception, Supreme Court decisions have shaped American law and life. The ability to predict the high court's decisions, known as quantitative legal prediction, would be of interest to the legal profession and the general public alike. While much research has been conducted on quantitative legal prediction for various foreign high courts, the few experiments that have specifically addressed United States Supreme Court cases are now outdated, have been prone to overfitting, or were based on limited datasets. Our work attempts to predict case outcomes while addressing the shortcomings of past research. We deployed several machine learning algorithms to predict whether the petitioner or appellee will win a Supreme Court case and compared the algorithms by their prediction accuracy. Finally, we set out to identify which case features have the greatest predictive impact on the winner of a case. Using four machine learning algorithms (Random Forest, XGBoost, LightGBM, and Multilayer Perceptron), we trained, evaluated, and tested models on the Washington University School of Law dataset of over 8,000 Supreme Court cases litigated between 1946 and 2016. Success was measured by a model's accuracy, AUROC, and weighted F1 score. Three of the four algorithms achieved accuracy, AUROC, and weighted F1 scores in the mid-0.70s, with LightGBM being the most accurate. The three case features that most influence LightGBM's performance are the reason the Supreme Court granted a petition for certiorari, the category of the appellee, and the category of the petitioner. High-performing algorithms and models such as those we deployed could provide predictive insight to individuals, lawyers, and policymakers affected by Supreme Court decisions. Future research directions include training the algorithms on semantically meaningful textual data or additional case variables.
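    To make the comparison protocol concrete, here is a minimal sketch of training the four named classifiers and reporting accuracy, AUROC, and weighted F1. The dataset's feature encoding is not reproduced; make_classification stands in for the case data, and the library choices (scikit-learn, xgboost, lightgbm) are assumptions rather than the thesis's documented setup.

    ```python
    # Hedged sketch: compare four classifiers on a synthetic stand-in for
    # the ~8,000-case dataset using accuracy, AUROC, and weighted F1.
    from lightgbm import LGBMClassifier
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=8000, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    models = {
        "Random Forest": RandomForestClassifier(random_state=0),
        "XGBoost": XGBClassifier(eval_metric="logloss"),
        "LightGBM": LGBMClassifier(),
        "MLP": MLPClassifier(max_iter=500, random_state=0),
    }
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        proba = model.predict_proba(X_te)[:, 1]
        pred = (proba >= 0.5).astype(int)
        print(f"{name}: acc={accuracy_score(y_te, pred):.3f} "
              f"AUROC={roc_auc_score(y_te, proba):.3f} "
              f"F1w={f1_score(y_te, pred, average='weighted'):.3f}")
    ```

    Feature-importance rankings of the kind reported for LightGBM can be read from model.feature_importances_ after fitting.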
  • Item
    ENHANCING RISK PREDICTION IN FINANCIAL APPLICATIONS USING DATA MINING AND GAME THEORY PRINCIPLES
    (Hood College, 2016-05) Allcheliwi, Turki; Hood College Information Technology
    This thesis examines the potential of applying Game Theory to Data Mining mechanisms to enhance the accuracy of risk prediction in financial settings. There have been many past attempts to enhance Data Mining results using different methods, including Game Theory principles. Despite the promising results of previous work integrating Game Theory and Data Mining, further research is needed to explore the potential of a combined model that can be applied to a range of datasets to successfully enhance risk prediction. We apply a variety of tree-based data mining algorithms to the German Credit Dataset. Then, we propose a combined model that enhances the accuracy of the data mining results using Game Theory principles. Our approach focuses on correcting the errors from incorrectly classified instances using our proposed enhanced game tree model. By using the payoff table derived from our enhanced game tree model together with the binomial distribution, we can determine the percentage of enhancement to the tree-based data mining results. Our results show that applying Game Theory principles to Data Mining techniques in a combined model can improve overall accuracy and enhance decision support systems in financial applications.
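    The abstract's enhancement estimate combines a payoff table with the binomial distribution. The toy sketch below shows that arithmetic shape only: the payoff table, the success probability, and the counts are invented placeholders, since the thesis's actual derivation is not reproduced here.

    ```python
    # Assumption-laden sketch: estimate the accuracy gain from "correcting"
    # misclassified instances. p_correct would, in the thesis, follow from
    # the payoff table of the enhanced game tree model; here it is a guess.
    from scipy.stats import binom

    n_total = 1000                # instances scored by the base tree model
    n_misclassified = 300         # instances the base model got wrong

    # Toy payoff table for the correction game (illustrative values only):
    # rows = correction strategy, columns = whether the base label was wrong.
    payoff = {("flip", "wrong"): 1.0, ("flip", "right"): -1.0,
              ("keep", "wrong"): -1.0, ("keep", "right"): 1.0}

    p_correct = 0.6               # assumed success probability per instance
    expected_fixed = binom.mean(n_misclassified, p_correct)   # = n * p
    print(f"expected corrected instances: {expected_fixed:.0f}")
    print(f"estimated accuracy gain: {expected_fixed / n_total:.1%}")
    ```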
  • Item
    Multi-Stage Pattern Reduction in Lossless Image Compression
    (ProQuest Information and Learning Company, 2007) Newman, Mark; Ford, W. Randolph; Hood College Computer Science; Master of Science
    Lossless image compression is the process of compressing and subsequently decompressing images without the loss of data. Historically, image compression was carried out by treating images as complex text [13]. Only in recent years have images been treated as data collections that can be processed for compression and decompression in a manner unique to images [1]. Even the best modern lossless image compression techniques, however, yield less than desirable results [5]. The biggest drawback of lossless image compression is that images can only be reduced to about one-third of their original size. Lossy image compression algorithms, i.e., techniques that compress images but lose image information upon decompression, are capable of reducing images to one-tenth of their actual size with little or no perceptible loss in image detail. Multi-stage pattern reduction is an emerging approach to encoding data that has recently demonstrated its efficiency in natural-language processing. It relies on the ability to discern small local patterns in a source, recreate a new source using these local patterns, and then reapply the technique over multiple stages. This thesis explores the value of using multi-stage pattern reduction to compress images. Its goal is to create a lossless image compression algorithm employing multi-stage pattern reduction and to determine whether such an approach can provide better compression, on average, than the current major competing algorithms in the field.
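    The description above (find small local patterns, rewrite the source with them, repeat over stages) is reminiscent of byte-pair-style substitution. The sketch below shows one such stage applied repeatedly to a toy symbol sequence; it is an interpretation under that assumption, not the thesis's actual algorithm.

    ```python
    # Minimal sketch of multi-stage pattern reduction, interpreted as
    # repeated replacement of the most frequent adjacent symbol pair with
    # a fresh symbol (byte-pair-encoding style). Recorded rules make the
    # reduction reversible, i.e., lossless.
    from collections import Counter

    def reduce_stage(symbols):
        """One stage: replace the most common adjacent pair with a new symbol."""
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            return symbols, None
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            return symbols, None                 # nothing worth reducing
        new_sym = max(symbols) + 1               # fresh symbol id
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(new_sym)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        return out, (new_sym, pair)

    data = list(b"abababcabab")                  # stand-in for a row of pixels
    rules = []
    for _ in range(4):                           # apply several stages
        data, rule = reduce_stage(data)
        if rule is None:
            break
        rules.append(rule)
    print(data, rules)                           # rules permit exact reconstruction
    ```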