Browsing by Subject "text classification"
Item: A Bayesian Methodology towards Automatic Ontology Mapping (AAAI, 2005-07-09)
Ding, Zhongli; Peng, Yun; Pan, Rong; Yu, Yang
This paper presents our ongoing effort on developing a principled methodology for automatic ontology mapping based on BayesOWL, a probabilistic framework we developed for modeling uncertainty in semantic web. The proposed method includes four components: 1) learning probabilities (priors about concepts, conditionals between subconcepts and superconcepts, and raw semantic similarities between concepts in two different ontologies) using Naive Bayes text classification technique, by explicitly associating a concept with a group of sample documents retrieved and selected automatically from World Wide Web (WWW); 2) representing in OWL the learned probability information concerning the entities and relations in given ontologies; 3) using the BayesOWL framework to automatically translate given ontologies into the Bayesian network (BN) structures and to construct the conditional probability tables (CPTs) of a BN from those learned priors or conditionals, with reasoning services within a single ontology supported by Bayesian inference; and 4) taking a set of learned initial raw similarities as input and finding new mappings between concepts from two different ontologies as an application of our formalized BN mapping theory that is based on evidential reasoning across two BNs.

Item: Blog Link Classification (AAAI, 2008-03-31)
Martineau, Justin; Hurst, Matthew
Blog links raise three key questions: Why did the author make the link, what exactly is he pointing at, and what does he feel about it? In response to these questions we introduce a link model with three fundamental descriptive dimensions where each dimension is designed to answer one question. We believe the answers to these questions can be utilized to improve search engine results for blogs.
While proving this is outside the scope of this paper, we do prove that knowing the rhetorical role of a link helps determine what the author was pointing at and how he feels about it.

Item: PREDICTING LATENT DEMOGRAPHIC ATTRIBUTES OF TWITTER USERS (2016-01-01)
Frolov, Georgiy; Oates, Tim; Computer Science and Electrical Engineering; Computer Science
Social media websites such as Twitter, Facebook, and LinkedIn aggregate large amounts of textual data. A wealth of user information can be inferred from this data, which is potentially useful in advertising, analytics, sentiment analysis, etc. It is estimated that over 60% of people in the US have a Twitter account, and a significant portion of the US population consists of immigrants. As social media have become commonplace, people are willingly posting their personal information such as their name, age, location, alma mater, etc. This makes it possible to use text classification methods to accurately determine demographic profiles. This thesis focuses on extracting latent demographic information from social media data. Previous works have attempted to determine a user's race and ethnicity, while our work focuses on using posts on Twitter (tweets) to determine whether a user is an immigrant or a native US citizen. The method uses ethnic name distribution among immigrant and native populations to find and collect users in the United States, and their tweets, across three race groups: Asian, Latino, and Caucasian/White. We use a supervised machine learning approach to predict the immigration status of a user by examining the textual content of tweets, using Multinomial Naive Bayes, Support Vector Machines, Logistic Regression, k-Nearest Neighbors, and Decision Trees. We investigate methods for improving the performance of the algorithms and determine how the number of features affects the accuracy of the built models.
Additionally, we evaluate which features have more weight in classifying users, and attempt to discover latent topical patterns in the data corpus using Latent Dirichlet Allocation.

Item: Yahoo! as an ontology: using Yahoo! categories to describe documents (ACM, 1999-11-02)
Labrou, Yannis; Finin, Tim
We suggest that one (or a collection) of names of Yahoo! (or any other WWW indexer's) categories can be used to describe the content of a document. Such categories offer a standardized and universal way for referring to or describing the nature of real world objects, activities, documents and so on, and may be used (we suggest) to semantically characterize the content of documents. WWW indices, like Yahoo!, provide a huge hierarchy of categories (topics) that touch every aspect of human endeavors. Such topics can be used as descriptors, similarly to the way librarians use, for example, the Library of Congress cataloging system to annotate and categorize books. In the course of investigating this idea, we address the problem of automatic categorization of webpages in the Yahoo! directory. We use Telltale as our classifier; Telltale uses n-grams to compute the similarity between documents. We experiment with various types of descriptions for the Yahoo! categories and the webpages to be categorized. Our findings suggest that the best results occur when using the very brief descriptions of the Yahoo! categorized entries; these brief descriptions are provided either by the entries' submitters or by the Yahoo! human indexers and accompany most Yahoo!-indexed entries.
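
Note on the last abstract: it says only that Telltale computes document similarity from n-grams and does not describe the algorithm. As a rough illustration of that general idea, and not of Telltale's actual implementation, the sketch below compares character n-gram count vectors with cosine similarity and assigns a page to the most similar category description. The function names, the choice of character trigrams, and the sample category strings are all assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_category(page_text, category_descriptions, n=3):
    """Return the category whose description is most similar to the page text."""
    page_vec = char_ngrams(page_text, n)
    return max(
        category_descriptions,
        key=lambda name: cosine_similarity(
            page_vec, char_ngrams(category_descriptions[name], n)
        ),
    )


# Hypothetical category descriptions, invented for illustration only.
categories = {
    "Computers": "software programming languages operating systems internet",
    "Recreation": "outdoor hiking camping travel sports games",
}
print(best_category("a tutorial on programming in modern languages", categories))
```

Short category descriptions, as in the hypothetical `categories` dictionary, loosely mirror the abstract's finding that brief descriptions worked best, but the sketch makes no claim about reproducing those results.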