Ding, Tao
Bickel, Warren K.
Pan, Shimei
2025-01-08
2025-01-08
2017-09
Ding, Tao, Warren K. Bickel, and Shimei Pan. “Multi-View Unsupervised User Feature Embedding for Social Media-Based Substance Use Prediction.” In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, edited by Martha Palmer, Rebecca Hwa, and Sebastian Riedel, 2275–84. Copenhagen, Denmark: Association for Computational Linguistics, 2017. https://doi.org/10.18653/v1/D17-1241.
https://doi.org/10.18653/v1/D17-1241
http://hdl.handle.net/11603/37202
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, September, 2017.
In this paper, we demonstrate how the state-of-the-art machine learning and text mining techniques can be used to build effective social media-based substance use detection systems. Since a substance use ground truth is difficult to obtain on a large scale, to maximize system performance, we explore different unsupervised feature learning methods to take advantage of a large amount of unsupervised social media data. We also demonstrate the benefit of using multi-view unsupervised feature learning to combine heterogeneous user information such as Facebook “likes” and “status updates” to enhance system performance. Based on our evaluation, our best models achieved 86% AUC for predicting tobacco use, 81% for alcohol use and 84% for illicit drug use, all of which significantly outperformed existing methods. Our investigation has also uncovered interesting relations between a user's social media behavior (e.g., word usage) and substance use.
10 pages
en-US
Attribution 4.0 International CC BY 4.0
https://creativecommons.org/licenses/by/4.0/
Multi-View Unsupervised User Feature Embedding for Social Media-based Substance Use Prediction
Text