Browsing by Author "Raim, Andrew M."
Item: An Approximate Fisher Scoring Algorithm for Finite Mixtures of Multinomials
Raim, Andrew M.; Liu, Minglei; Neerchal, Nagaraj K.; Morel, Jorge G.
Finite mixture distributions arise naturally in many applications, including clustering and classification. Since they usually do not yield closed forms for maximum likelihood estimates (MLEs), numerical methods using the well-known Fisher Scoring or Expectation-Maximization algorithms are considered. In this work, an approximation to the Fisher Information Matrix of an arbitrary mixture of multinomial distributions is introduced. This leads to an Approximate Fisher Scoring algorithm (AFSA), which turns out to be closely related to Expectation-Maximization and is more robust to the choice of initial value than Fisher Scoring iterations. A combination of AFSA and the classical Fisher Scoring iterations provides the best of both computational efficiency and stable convergence properties.
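Fisher scoring itself is easy to state in a few lines of R. The sketch below applies it to logistic regression, a case where the exact Fisher information matrix is cheap to form; the mixture-of-multinomials setting of the paper replaces this exact matrix with the AFSA approximation, but the score-and-information update has the same shape. The simulated data and variable names are illustrative, not taken from the paper.

```r
# Minimal sketch of a Fisher scoring iteration, shown for logistic regression
fisher_scoring <- function(X, y, beta = rep(0, ncol(X)), tol = 1e-8, maxit = 50) {
  for (it in seq_len(maxit)) {
    p <- plogis(X %*% beta)                # fitted success probabilities
    score <- t(X) %*% (y - p)              # score vector U(beta)
    info <- t(X) %*% (X * c(p * (1 - p)))  # Fisher information X' W X
    step <- solve(info, score)
    beta <- beta + step                    # scoring update
    if (sum(abs(step)) < tol) break
  }
  drop(beta)
}

# Recover a known coefficient vector from simulated data
set.seed(1)
X <- cbind(1, rnorm(500))
y <- rbinom(500, 1, plogis(X %*% c(-0.5, 1.2)))
fisher_scoring(X, y)
```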
Item: Assessment of Simple and Alternative Bayesian Ranking Methods Utilizing Parallel Computing (2011)
Raim, Andrew M.; Liu, Minglei; Neerchal, Nagaraj K.; Morel, Jorge G.; Allen, Samantha; Kirlew, Dorothy; Obetz, Neil T.; Wade, Derek; Albertine, April C.; Neerchal, Nagaraj K.; Klein, Martin
The U.S. Census Bureau (USCB) assists the federal government in distributing approximately $400 billion of aid by providing a complete ranking of the states according to certain criteria, such as average poverty level. It is imperative that this ranking be as accurate as possible in order to ensure the fairness of the allocation of funds. Currently, the USCB ranks states based on point estimates of their true poverty level. Dr. Klein and Dr. Wright of the USCB have compared the performance of this method against more sophisticated procedures in simulation trials, but have found that they do not consistently outperform the existing method. We investigate this phenomenon by revisiting some of these procedures, and we expand on this work to produce new ranking algorithms. We utilize parallel programming to expedite Dr. Klein's procedures. In addition, we specify two new prior distributions on the population means, using previous years' census data as well as regression. We discuss the results of our methods in conjunction with Klein and Wright's corresponding simulation results. In our final report, we compare the performance of our techniques to that of the USCB's current method and show the resulting state ranks for each procedure.

Item: A Comparative Evaluation of Matlab, Octave, FreeMat, Scilab, and R on Tara (2012)
Popuri, Sai K.; Raim, Andrew M.; Brewster, Matthew W.; Gobbert, Matthias K.
Matlab is the most popular commercial package for numerical computations in mathematics, statistics, the sciences, engineering, and other fields. Octave, FreeMat, and Scilab are free numerical computational packages that have many of the same features as Matlab. R is a free statistical package. Although R does not belong to the same line of products as Matlab, it is similar to Matlab in its computational capabilities. These packages are available to download for the Linux, Windows, and Mac OS X operating systems. We investigate whether they are viable alternatives to Matlab for use in research and teaching. We run the comparison on the cluster tara in the UMBC High Performance Computing Facility, which has 86 nodes, each with two quad-core Intel Nehalem processors and 24 GB of memory. The tests focused on usability lead us to conclude that the package Octave is the most compatible with Matlab, since it uses the same syntax and has the native capability of running m-files. Both FreeMat and Scilab were hampered by somewhat different syntax or function names and some missing functions. The tests focused on efficiency show that Matlab and Octave are fundamentally able to solve problems of the same size and with equivalent efficiency in absolute times, except in one test dealing with a very large problem. FreeMat and Scilab exhibit significant limitations on the problem size and the efficiency of the problems they can solve in our tests. The syntax of R is significantly different from that of Matlab, Octave, FreeMat, and Scilab. R also exhibited certain limitations on the size of problems it could solve, and its performance was similar to that of FreeMat and Scilab. In summary, we conclude that Octave is the best viable alternative to Matlab because it was not only fully compatible (in terms of syntax) with Matlab in our tests, but it also performed very well.

Item: A Comparative Evaluation of Matlab, Octave, FreeMat, Scilab, R, and IDL on Tara (2012)
Coman, Ecaterina; Brewster, Matthew W.; Popuri, Sai K.; Raim, Andrew M.; Gobbert, Matthias K.
Matlab is the most popular commercial package for numerical computations in mathematics, statistics, the sciences, engineering, and other fields. IDL, a commercial package used for data analysis, along with the free numerical computational packages Octave, FreeMat, and Scilab and the statistical package R, shares many of the same features as Matlab. They are available to download for the Linux, Windows, and Mac OS X operating systems. We investigate whether these packages are viable alternatives to Matlab for use in research and teaching. We run the comparison on the cluster tara in the UMBC High Performance Computing Facility, which has 86 nodes, each with two quad-core Intel Nehalem processors and 24 GB of memory. The tests focused on usability lead us to conclude that the package Octave is the most compatible with Matlab, since it uses the same syntax and has the native capability of running m-files. FreeMat, Scilab, R, and IDL were hampered by somewhat different syntax or function names and some missing functions. The tests focused on efficiency show that Matlab and Octave are fundamentally able to solve problems of the same size and with equivalent efficiency in absolute times, except in one test dealing with a very large problem. IDL also performed equivalently in the case of iterative methods. FreeMat, Scilab, and R exhibit significant limitations on the problem size and the efficiency of the problems they can solve in our tests. The syntax of R and IDL is significantly different from that of Matlab, Octave, FreeMat, and Scilab. In summary, we conclude that Octave is the best viable alternative to Matlab because it was not only fully compatible (in terms of syntax) with Matlab in our tests, but it also performed very well.
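A small illustration of the kind of syntax differences these usability tests measure: the same task, generating and solving a random linear system, written in R with the equivalent Matlab/Octave statements in comments (Octave accepts the Matlab form unchanged). The matrix size and timing calls are illustrative and are not one of the reports' actual benchmark codes.

```r
n <- 1000
A <- diag(n) + matrix(rnorm(n * n), n, n) / n  # Matlab/Octave: A = eye(n) + randn(n)/n;
b <- rnorm(n)                                  # Matlab/Octave: b = randn(n,1);
t0 <- proc.time()                              # Matlab/Octave: tic;
x <- solve(A, b)                               # Matlab/Octave: x = A \ b;
proc.time() - t0                               # Matlab/Octave: toc
max(abs(A %*% x - b))                          # Matlab/Octave: max(abs(A*x - b))
```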
Item: The Graph 500 Benchmark on a Medium-Size Distributed-Memory Cluster with High-Performance Interconnect (2012-12-17)
Angel, Jordan B.; Flores, Amy M.; Heritage, Justine S.; Wardrip, Nathan C.; Raim, Andrew M.; Gobbert, Matthias K.; Murphy, Richard C.; Mountain, David J.
While traditional performance benchmarks for high-performance computers measure the speed of arithmetic operations, memory access time is a more useful performance gauge for many large problems today. The Graph 500 benchmark has been developed to measure a computer's performance in memory retrieval. The Graph 500 implementation considers large, randomly generated graphs, which may be spread across many nodes of a distributed-memory cluster. The benchmark conducts breadth-first searches on these graphs and measures performance in billions of traversed edges per second (GTEPS). We present our experience implementing and running the Graph 500 benchmark on the medium-size distributed-memory cluster tara in the UMBC High Performance Computing Facility (www.umbc.edu/hpcf). The cluster tara has 82 compute nodes, each with two quad-core Intel Nehalem X5550 CPUs and 24 GB of memory, connected by a high-performance quad-data rate InfiniBand interconnect. Results are explained in detail in terms of the machine architecture, which demonstrates that the Graph 500 benchmark indeed provides a measure of memory access as the chief bottleneck for many applications. Our best run to date was of scale 31 using 64 nodes and achieved a GTEPS rate that placed tara at rank 98 on the November 2012 Graph 500 list.

Item: Graph 500 Performance on a Distributed-Memory Cluster (2012)
Gobbert, Matthias K.; Angel, Jordan B.; Flores, Amy M.; Heritage, Justine S.; Wardrip, Nathan C.; Raim, Andrew M.; Murphy, Richard C.; Mountain, David J.
Social network and medical informatics analyses are examples of modern computing problems that involve large data sets. These data can be represented using graphs, which are sets of vertices that are connected by edges. While traditional performance benchmarks for high-performance computers measure the speed of arithmetic operations, memory access time is a more useful performance standard for large graph problems. The Graph 500 benchmark is intended to rank high-performance computers based on speed of memory retrieval. The implementation of this benchmark considers a large, randomly generated graph. It then executes a breadth-first search of the graph starting at one vertex and visiting all other vertices that are connected to the source. The search is executed 64 times, using a different vertex as the root each time. Each search records its rate in billions of traversed edges per second (GTEPS), and the harmonic mean of these measurements establishes the ranking on the Graph 500 list. We implement the Graph 500 benchmark on the distributed-memory cluster tara in the UMBC High Performance Computing Facility (www.umbc.edu/hpcf). The cluster tara has 82 compute nodes, each with two quad-core Intel Nehalem X5550 CPUs and 24 GB of memory, connected by a quad-data rate InfiniBand interconnect. Our best run to date using 64 nodes achieved a GTEPS rate that would put tara at rank 58 on the June 2012 Graph 500 list. We intend to submit an official benchmark run for the next publication of the Graph 500 list in November 2012.
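The benchmark's core measurement is easy to mimic in serial form: run a breadth-first search and divide the number of scanned edges by the elapsed time. The toy sketch below does this in R on an ordinary random graph; the real benchmark uses a specific Kronecker graph generator, runs at vastly larger scale, and distributes the graph with MPI, so every size and name here is illustrative only.

```r
set.seed(1)
nv <- 2^14; ne <- 16 * nv                       # vertices and edge slots
edges <- cbind(sample.int(nv, ne, replace = TRUE),
               sample.int(nv, ne, replace = TRUE))
adj <- split(c(edges[, 2], edges[, 1]),         # undirected adjacency lists,
             c(edges[, 1], edges[, 2]))         # indexed by vertex id as name

bfs_teps <- function(root) {
  visited <- logical(nv); visited[root] <- TRUE
  frontier <- root; traversed <- 0
  t0 <- proc.time()
  while (length(frontier) > 0) {
    nbrs <- unlist(adj[as.character(frontier)], use.names = FALSE)
    traversed <- traversed + length(nbrs)       # every scanned edge counts
    nxt <- unique(nbrs[!visited[nbrs]])
    visited[nxt] <- TRUE
    frontier <- nxt
  }
  traversed / (proc.time() - t0)["elapsed"]     # traversed edges per second
}

# Graph 500 performs 64 searches and ranks by the harmonic mean of the rates;
# the toy uses 8 random roots
rates <- sapply(sample.int(nv, 8), bfs_teps)
1 / mean(1 / rates)
```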
Item: Identifying Nonlinear Correlations in High Dimensional Data with Application to Protein Molecular Dynamics Simulations (2013)
Bailey, William J.; Chambless, Claire A.; Cho, Brandynne M.; Smith, Jesse D.; Raim, Andrew M.; Adragni, Kofi P.; Thorpe, Ian F.
Complex biomolecules such as proteins can respond to changes in their environment through a process called allostery, which plays an important role in regulating the function of these biomolecules. Allostery occurs when an event at a specific location in a macromolecule produces an effect at a location in the molecule some distance away. An important component of allostery is the coupling of protein sites. Such coupling is one mechanism by which allosteric effects can be transmitted over long distances. To understand this phenomenon, molecular dynamics simulations are carried out with a large number of atoms, and the trajectories of these atoms are recorded over time. Simple correlation methods have been used in the literature to identify coupled motions between protein sites. We implement a recently developed statistical method for dimension reduction called principal fitted components (PFC) in the statistical programming language R to identify both linear and nonlinear correlations between protein sites while dealing efficiently with the high dimensionality of the data. PFC models reduce the dimensionality of the data while capturing linear and nonlinear dependencies among predictors (atoms) using a flexible set of basis functions. For faster processing, we implement the PFC algorithm using parallel computing through the Programming with Big Data in R (pbdR) package for R. We demonstrate the method's effectiveness on simulated datasets, and apply the routine to time series data from molecular dynamics (MD) simulations to identify coupled motion among the atoms.

Item: An Implementation of Binomial Method of Option Pricing using Parallel Computing
Popuri, Sai K.; Raim, Andrew M.; Neerchal, Nagaraj K.; Gobbert, Matthias K.
The binomial method of option pricing is based on iterating over discounted option payoffs in a recursive fashion to calculate the present value of an option. Implementing the binomial method to exploit the resources of a parallel computing cluster is non-trivial, as the method is not easily parallelizable. We propose a procedure to transform the method into an "embarrassingly parallel" problem by mapping binomial probabilities to Bernoulli paths. We have used the parallel computing capabilities in R with the Rmpi package to implement the methodology on the cluster tara in the UMBC High Performance Computing Facility, which has 82 compute nodes with two quad-core Intel Nehalem processors and 24 GB of memory on a quad-data rate InfiniBand interconnect. With high-performance clusters and multi-core desktops becoming increasingly accessible, we believe that our method will have practical appeal to financial trading firms.
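As context for the Bernoulli-path idea, here is a minimal serial sketch of the binomial method for a European call under the standard Cox-Ross-Rubinstein parameterization. The backward recursion below is the step-by-step discounting that is hard to parallelize directly; all inputs are illustrative defaults, not values from the report.

```r
binom_price <- function(S0 = 100, K = 100, r = 0.05, sigma = 0.2, T = 1, n = 500) {
  dt <- T / n
  u <- exp(sigma * sqrt(dt)); d <- 1 / u         # up/down factors
  p <- (exp(r * dt) - d) / (u - d)               # risk-neutral up-probability
  V <- pmax(S0 * u^(n:0) * d^(0:n) - K, 0)       # call payoffs at the n+1 leaves
  for (i in n:1)                                 # discount back one level at a time
    V <- exp(-r * dt) * (p * V[1:i] + (1 - p) * V[2:(i + 1)])
  V                                              # present value at the root
}
binom_price()   # about 10.45 for these defaults, near the Black-Scholes value
```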
Item: Intel Concurrent Collections as a Method for Parallel Programming (2011)
Adjogah, Richard; Mckissack, Randal; Sibeudu, Ekene; Raim, Andrew M.; Gobbert, Matthias K.; Craymer, Loring
Computer hardware has become parallel in order to run faster and more efficiently. One of the current standard parallel coding libraries is MPI (Message Passing Interface). The Intel Corporation is developing a new parallel software and translator called CnC (Concurrent Collections) to make programming in parallel easier. When using MPI, the user has to explicitly send and receive messages to and from different processes with multiple function calls. These functions have numerous arguments that need to be passed in; this can be error-prone and a hassle. CnC uses a system of collections comprised of steps, items, and tags to create a graph representation of the algorithm that defines the parallelizable code segments and their dependencies. Instead of manually assigning work to processes like MPI, the user specifies the work to be done and CnC automatically handles parallelization. This, in theory, reduces the amount of work the programmer has to do. Our research evaluates whether this new software is efficient and usable when creating parallel code and converting serial code to parallel. To test the difference between the two methods, we used benchmark codes with both MPI and CnC and compared the results. We started with a prime number generator provided by Intel as sample code that familiarizes programmers with CnC. Then we moved on to a π approximation, for which we used an MPI sample code that uses numerical integration to approximate π. We ran it in MPI first, then stripped it of all MPI, ported it to C++, and added our own CnC code. We then ran performance studies to compare the two. Our last two tests involved doing parameter studies on a variation of the Poisson equation using the finite difference method and a DNA entropy calculation project. We used existing serial code for the two problems and were easily able to create a couple of new files to run the studies using CnC. The studies ran multiple calls to the problem functions in parallel with varying parameters. These last two tests showcase a clear advantage CnC has over MPI in the parallelization of these types of problems. Both the Poisson and DNA problems showed how useful techniques from parallel computing, paired with an intuitive tool such as CnC, can be for application researchers.
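The π test mentioned above is the classic numerical integration exercise: π equals the integral of 4/(1+x²) over [0,1]. A serial R version of the midpoint rule is a few lines; the MPI and CnC versions simply split the subintervals across processes. The value of n is arbitrary.

```r
approx_pi <- function(n = 1e6) {
  x <- (seq_len(n) - 0.5) / n    # midpoints of n equal subintervals of [0,1]
  mean(4 / (1 + x^2))            # midpoint rule for the integral of 4/(1+x^2)
}
approx_pi() - pi                 # error shrinks like 1/n^2
```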
Item: Introduction to Distributed Computing with pbdR at the UMBC High Performance Computing Facility (2013-06-26)
Raim, Andrew M.

Item: ManifoldOptim: An R Interface to the ROPTLIB Library for Riemannian Manifold Optimization (Foundation for Open Access Statistics, 2020-04-18)
Martin, Sean; Raim, Andrew M.; Huang, Wen; Adragni, Kofi P.
Manifold optimization appears in a wide variety of computational problems in the applied sciences. In recent statistical methodologies such as sufficient dimension reduction and regression envelopes, estimation relies on the optimization of likelihood functions over spaces of matrices such as the Stiefel or Grassmann manifolds. Recently, Huang, Absil, Gallivan, and Hand (2016) introduced the library ROPTLIB, which provides a framework and state-of-the-art algorithms to optimize real-valued objective functions over commonly used matrix-valued Riemannian manifolds. This article presents ManifoldOptim, an R package that wraps the C++ library ROPTLIB. ManifoldOptim enables users to access functionality in ROPTLIB through R so that optimization problems can easily be constructed, solved, and integrated into larger R codes. Computationally intensive problems can be programmed with Rcpp and RcppArmadillo, and otherwise accessed through R. We illustrate the practical use of ManifoldOptim through several motivating examples involving dimension reduction and envelope methods in regression.

Item: Maximum-likelihood estimation of the random-clumped multinomial model as a prototype problem for large-scale statistical computing (Taylor and Francis Online, 2012-05-08)
Raim, Andrew M.; Gobbert, Matthias K.; Neerchal, Nagaraj K.; Morel, Jorge G.
Numerical methods are needed to obtain maximum-likelihood estimates (MLEs) in many problems. Computation time can be an issue for some likelihoods even with modern computing power. We consider one such problem where the assumed model is a random-clumped multinomial distribution. We compute MLEs for this model in parallel using the Toolkit for Advanced Optimization software library. The computations are performed on a distributed-memory cluster with low-latency interconnect. We demonstrate that for larger problems, scaling the number of processes improves wall clock time significantly. An illustrative example shows how parallel MLE computation can be useful in a large data analysis. Our experience with a direct numerical approach indicates that more substantial gains may be obtained by making use of the specific structure of the random-clumped model.

Item: Modeling Overdispersion in R (2015)
Raim, Andrew M.; Neerchal, Nagaraj K.; Morel, Jorge G.
The book Overdispersion Models in SAS by Morel and Neerchal (2012) discusses statistical analysis of categorical and count data which exhibit overdispersion, with a focus on computational procedures using SAS. This document retraces some of the ground covered in the book, which we abbreviate throughout as OMSAS, with the objective of carrying out similar analyses in R (R Core Team, 2014). Rather than attempting to cover every example in OMSAS, we will focus on two specific goals: analysis based on binomial/multinomial likelihoods which support extra variation, and model selection with the binomial goodness-of-fit (GOF) test. We will not cover examples based on count data, but extension to those should not be difficult. We will generally not spend much time discussing the data, the justification for the selected models, or the interpretation of the results. The reader should refer to OMSAS for more complete discussions of the examples and statistical models. In several places we will present additional material not found in OMSAS, such as the binomial finite mixture and the recently proposed Mixture Link binomial model.
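As a flavor of the binomial-with-extra-variation analyses that document covers, here is a hedged sketch of fitting a beta-binomial model by direct maximization of its log-likelihood with optim(). The density, the (mu, rho) parameterization with rho as the overdispersion parameter, and the simulated data are written out from standard formulas rather than taken from the document.

```r
# Beta-binomial log-density under the mean/overdispersion parameterization:
# a = mu(1-rho)/rho, b = (1-mu)(1-rho)/rho, so mu is the mean probability
# and rho in (0,1) controls the extra-binomial variation.
ldbetabinom <- function(y, m, mu, rho) {
  a <- mu * (1 - rho) / rho
  b <- (1 - mu) * (1 - rho) / rho
  lchoose(m, y) + lbeta(y + a, m - y + b) - lbeta(a, b)
}

set.seed(1)
m <- 20
y <- rbinom(200, m, rbeta(200, 2, 6))           # overdispersed counts (true rho = 1/9)

fit <- optim(c(0, 0), function(par) {
  mu <- plogis(par[1]); rho <- plogis(par[2])   # map R^2 onto (0,1)^2
  -sum(ldbetabinom(y, m, mu, rho))              # negative log-likelihood
})
plogis(fit$par)   # estimates of (mu, rho); compare to (0.25, 0.111)
```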
Item: Parallel Performance Studies for a Maximum Likelihood Estimation Problem Using TAO (2009)
Raim, Andrew M.; Gobbert, Matthias K.
In this report, we present an application of parallel computing to an estimation procedure in statistics. The method of maximum likelihood estimation (MLE) is based on the ability to perform maximizations of probability functions. In practice, this work is often performed by computer with numerical methods, and may be time consuming for some likelihood functions. We consider one such likelihood function, based on the finite mixture multinomial distribution. We perform estimation for this problem in parallel using the Toolkit for Advanced Optimization (TAO) software library. The computations are performed on a distributed-memory cluster with InfiniBand interconnect in the High Performance Computing Facility at the University of Maryland, Baltimore County (UMBC). We study how the resource requirements change as problem sizes vary, and demonstrate that scaling the number of processes for larger problems decreases wall clock time significantly.

Item: Parallel Performance Studies for a Parabolic Test Problem on the Cluster tara (2010)
Muscedere, Michael; Raim, Andrew M.; Gobbert, Matthias K.
The performance of parallel computer code depends on the intricate interplay of the processors, the architecture of the compute nodes, their interconnect network, the numerical algorithm, and its implementation. The solution of large, sparse, highly structured systems of linear equations by an iterative linear solver that requires communication between the parallel processes at every iteration is an instructive test of this interplay. This note considers a parabolic test problem given by a time-dependent, scalar, linear reaction-diffusion equation in three dimensions, whose time-stepping requires the solution of such a system of linear equations at every time step. The results presented here show excellent performance on the cluster tara with up to 512 parallel processes when using 64 compute nodes. The results support the scheduling policy implemented, since they confirm that it is beneficial to use all eight cores of the two quad-core processors on each node simultaneously, giving us in effect a computer that can run jobs efficiently with up to 656 parallel processes when using all 82 compute nodes. The cluster tara is an IBM System x iDataPlex purchased in 2009 by the UMBC High Performance Computing Facility (www.umbc.edu/hpcf). It is an 86-node distributed-memory cluster comprised of 82 compute, 2 develop, 1 user, and 1 management nodes. Each node features two quad-core Intel Nehalem X5550 processors (2.66 GHz, 8 MB cache), 24 GB memory, and a 120 GB local hard drive. All nodes and the 160 TB central storage are connected by an InfiniBand (QDR) interconnect network.

Item: Parallel Performance Studies for an Elliptic Test Problem on the Cluster tara (2010)
Raim, Andrew M.; Gobbert, Matthias K.
The performance of parallel computer code depends on an intricate interplay of the processors, the architecture of the compute nodes, their interconnect network, the numerical algorithm, and its implementation. The solution of large, sparse, highly structured systems of linear equations by an iterative linear solver that requires communication between the parallel processes at every iteration is an instructive test of this interplay. This note considers the classical elliptic test problem of a Poisson equation with Dirichlet boundary conditions in two spatial dimensions, whose approximation by the finite difference method results in a linear system of this type. Our existing implementation of the conjugate gradient method for the iterative solution of this system is known to have the potential to perform well up to many parallel processes, provided the interconnect network has low latency. Since the algorithm is known to be memory bound, it is also vital for good performance that the architecture of the nodes, in conjunction with the scheduling policy, does not create a bottleneck. The results presented here show excellent performance on the cluster tara with up to 512 parallel processes when using 64 compute nodes. The results support the scheduling policy implemented, since they confirm that it is beneficial to use all eight cores of the two quad-core processors on each node simultaneously, giving us in effect a computer that can run jobs efficiently with up to 656 parallel processes when using all 82 compute nodes. The cluster tara is an IBM System x iDataPlex purchased in 2009 by the UMBC High Performance Computing Facility (www.umbc.edu/hpcf). It is an 86-node distributed-memory cluster comprised of 82 compute, 2 develop, 1 user, and 1 management nodes. Each node features two quad-core Intel Nehalem X5550 processors (2.66 GHz, 8 MB cache), 24 GB memory, and a 120 GB local hard drive. All nodes and the 160 TB central storage are connected by an InfiniBand (QDR) interconnect network.
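The computational kernel in both notes can be sketched in serial R: the conjugate gradient method applied matrix-free to the 5-point finite difference discretization of the Poisson equation with zero Dirichlet data on the unit square. The parallel codes distribute the grid and exchange boundary data each iteration; here the whole grid lives in one N x N matrix, and the grid size and tolerance are arbitrary choices for illustration.

```r
N <- 64; h <- 1 / (N + 1)
apply_A <- function(u) {                 # 5-point stencil for -Laplace(u), matrix-free
  up    <- rbind(u[-1, ], 0); down  <- rbind(0, u[-N, ])
  left  <- cbind(0, u[, -N]); right <- cbind(u[, -1], 0)
  (4 * u - up - down - left - right) / h^2
}
xy <- (1:N) * h                          # interior grid points
f  <- outer(xy, xy, function(x, y) 2 * pi^2 * sin(pi * x) * sin(pi * y))
u  <- matrix(0, N, N); r <- f; p <- r; rs <- sum(r * r)
for (it in 1:1000) {                     # conjugate gradient iteration
  Ap <- apply_A(p); alpha <- rs / sum(p * Ap)
  u <- u + alpha * p; r <- r - alpha * Ap
  rs_new <- sum(r * r)
  if (sqrt(rs_new) < 1e-6) break
  p <- r + (rs_new / rs) * p; rs <- rs_new
}
# the true solution is sin(pi x) sin(pi y); the gap is the discretization error
max(abs(u - outer(xy, xy, function(x, y) sin(pi * x) * sin(pi * y))))
```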
Item: Parallelization of Matrix Factorization for Recommender Systems (2010)
Baum, Julia; Cook, Cynthia; Curtis, Michael; Edgerton, Joshua; Rabidoux, Scott; Raim, Andrew M.; Neerchal, Nagaraj K.; Bell, Robert M.
Recommender systems are emerging as important tools for improving customer satisfaction by mathematically predicting user preferences. Several major corporations, including Amazon.com and Pandora, use these types of systems to suggest additional options based on current or recent purchases. Netflix uses a recommender system to provide its customers with suggestions for movies that they may like, based on their previous ratings. In 2006, Netflix released a large data set to the public and offered one million dollars for significant improvements on their system. In 2009, BellKor's Pragmatic Chaos, a team of seven, won the prize by combining individual methods. Dr. Robert Bell, with whom we collaborated, was a member of the winning team and provided us with the data set used in this project, consisting of a sparse matrix with rows of users and columns of movies. The entries of the matrix are ratings given to movies by certain users. The objective is to obtain a model that predicts future ratings a user might give for a specific movie. This model is known as a collaborative filtering model, which encompasses the average movie rating (mu), the rating bias of the user (b), the overall popularity of a movie (a), and the interaction between user preferences (p) and movie characteristics (q). Two methods, alternating least squares and stochastic gradient descent, were used to estimate each parameter in this non-linear regression model. Each method fits characteristic vectors for movies and users from the existing data. The overall focus of this project is to explore the two methods, and to investigate the suitability of parallel computing utilizing the cluster tara in the UMBC High Performance Computing Facility.

Item: Parallelizing Computation of Expected Values in Recombinant Binomial Trees (Taylor and Francis Online, 2017-11-24)
Popuri, Sai K.; Raim, Andrew M.; Neerchal, Nagaraj K.; Gobbert, Matthias K.
Recombinant binomial trees are binary trees where each non-leaf node has two child nodes, but adjacent parents share a common child node. Such trees arise in option pricing in finance. For example, an option can be valued by evaluating the expected payoffs with respect to random paths in the tree. The cost to exactly compute expected values over random paths grows exponentially in the depth of the tree, rendering a serial computation of one branch at a time impractical. We propose a parallelization method that transforms the calculation of the expected value into an embarrassingly parallel problem by mapping the branches of the binomial tree to the processes in a multiprocessor computing environment. We also discuss a parallel Monte Carlo method and verify the convergence and the variance reduction behavior by a simulation study. Performance results from R and Julia implementations are compared on a distributed computing cluster.
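A hedged sketch of the Monte Carlo side of this idea: draw random paths through the tree, note that a path with k up-moves lands at terminal price S0·u^k·d^(n-k), and average the discounted payoffs. Blocks of paths are independent, so they parallelize trivially; parallel::mclapply stands in below for the Rmpi and Julia machinery of the paper (note that mc.cores > 1 is unavailable on Windows). All parameter values are illustrative, matching the backward-induction sketch given earlier.

```r
library(parallel)

mc_binom_price <- function(S0 = 100, K = 100, r = 0.05, sigma = 0.2,
                           T = 1, n = 500, paths = 1e5, cores = 2) {
  dt <- T / n
  u <- exp(sigma * sqrt(dt)); d <- 1 / u
  p <- (exp(r * dt) - d) / (u - d)            # risk-neutral up-probability
  block <- function(npaths) {                 # payoff total for one block of paths
    ups <- rbinom(npaths, n, p)               # up-move count of each random path
    sum(pmax(S0 * u^ups * d^(n - ups) - K, 0))
  }
  sums <- mclapply(rep(paths / cores, cores), block, mc.cores = cores)
  exp(-r * T) * sum(unlist(sums)) / paths     # discounted average payoff
}

mc_binom_price()   # agrees with backward induction up to Monte Carlo error
```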
Item: Performance Evaluation of Minimum Average Deviance Estimation in High Dimensional Poisson Regression (2015)
Biggs, Ely; Helble, Tessa; Jeffreys, George; Nayak, Amit; Al-Najjar, Elias; Raim, Andrew M.; Adragni, Kofi P.
The second most expensive part of the 2010 Decennial Census was Address Canvassing (AdCan), a field operation to prepare the Master Address File (MAF) for census day. The MAF is a database of households in the United States maintained by the U.S. Census Bureau and is used as a basis for the census and the household surveys that it conducts. Motivated by the importance of the MAF and the cost of a large-scale AdCan operation, there is interest in using statistical methodologies to explain MAF errors discovered during canvassing. Ideally, statistical models could be used to predict future errors and assist with updating of the MAF. A major challenge in constructing a MAF error model is that the important predictor variables associated with MAF errors are not known. Some recent work at the Census Bureau has carried out variable selection using a collection of data sources, treating counts of errors per census block as the outcome. It may be possible to use dimension reduction methodologies to obtain count models with much lower dimensional predictors. Adragni et al. [4] proposed a methodology called Minimum Average Deviance Estimation (MADE), which is based on the concept of local regression and embeds sufficient dimension reduction of the predictors. MADE assumes a forward regression with the response variable following an exponential family distribution, such as Poisson for counts. The goal of this project is to evaluate the performance of MADE on large data sets using simulations. We parallelized several snippets of the MADE source code to improve its performance and compare the speedup of these parallelized snippets with their serial alternatives. Simulated data sets with increasing dimensions are used to evaluate the run time. A limited stress test is performed to determine the extent of problem size that MADE can handle on maya, a high performance computing cluster at UMBC. These tests allow us to evaluate the capabilities of MADE to scale to large data sets, such as the AdCan modeling problem.
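The forward model underlying MADE is easy to simulate. In the sketch below, a Poisson response depends on a ten-dimensional predictor only through one linear combination; with a log-linear link, even plain glm() recovers that direction, which is a convenient sanity check when generating test data of increasing dimension. This only illustrates the model setup, not the MADE estimator itself, which minimizes a locally weighted deviance to handle unknown links.

```r
set.seed(1)
n <- 1000; p <- 10
X <- matrix(rnorm(n * p), n, p)
beta <- c(1, -1, rep(0, p - 2)) / sqrt(2)   # true reduced direction (unit length)
y <- rpois(n, exp(0.5 + X %*% beta))        # counts depend on X only through X beta

fit  <- glm(y ~ X, family = poisson)        # global log-linear fit
bhat <- coef(fit)[-1]
sum(bhat * beta) / sqrt(sum(bhat^2))        # cosine with the true direction, near 1
```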