UMBC Mathematics and Statistics Department

Permanent URI for this collection: http://hdl.handle.net/11603/56

Recent Submissions

Now showing 1 - 20 of 660
  • Item
    Inference about a Common Mean Vector from Several Independent Multinormal Populations with Unequal and Unknown Dispersion Matrices
    (MDPI, 2024-08-31) Kifle, Yehenew Getachew; Moluh, Alain M.; Sinha, Bimal K.
    This paper addresses the problem of making inferences about a common mean vector from several independent multivariate normal populations with unknown and unequal dispersion matrices. We propose an unbiased estimator of the common mean vector, along with its asymptotic estimated variance, which can be used to test hypotheses and construct confidence ellipsoids, both of which are valid for large samples. Additionally, we discuss an approximate method based on generalized p-values. The paper also presents exact test procedures and methods for constructing exact confidence sets for the common mean vector, with a comparison of the local power of these exact tests. The performance of the proposed methods is demonstrated through a simulation study and an application to data from the Current Population Survey (CPS) Annual Social and Economic (ASEC) Supplement 2021 conducted by the U.S. Census Bureau for the Bureau of Labor Statistics.
  • Item
    Determination of the residual efficacy of broflanilide (VECTRON™ T500) insecticide for indoor residual spraying in a semi-field setting in Ethiopia
    (BMC, 2025-02-13) Simma, Eba Alemayehu; Zegeye, Habtamu; Akessa, Geremew Muleta; Kifle, Yehenew Getachew; Zemene, Endalew; Degefa, Teshome; Yewhalaw, Delenasaw
    The rotational use of insecticides with diverse modes of action in indoor residual spraying (IRS) is pivotal for enhancing malaria vector control and addressing insecticide resistance. A key factor in national malaria vector control/elimination programmes is the rate at which these insecticides decay. VECTRON™ T500, with broflanilide as its active ingredient, is a recently developed candidate insecticide formulation which has shown promising results in certain phase II experimental hut trials. However, its residual efficacy across different settings has not been thoroughly investigated. This study evaluated the efficacy of VECTRON™ T500 on various wall surfaces (mud, dung, paint, and cement) and assessed its decay rates over time in Ethiopia.
  • Item
    Semiparametric modeling of time-varying activation and connectivity in task-based fMRI data
    (Elsevier, 2020-10-01) Park, Jun Young; Polzehl, Joerg; Chatterjee, Snigdhansu; Brechmann, André; Fiecas, Mark
    In functional magnetic resonance imaging (fMRI), there is a rise in evidence that time-varying functional connectivity, or dynamic functional connectivity (dFC), which measures changes in the synchronization of brain activity, provides additional information on brain networks not captured by time-invariant (i.e., static) functional connectivity. While there have been many developments for statistical models of dFC in resting-state fMRI, there remains a gap in the literature on how to simultaneously model both dFC and time-varying activation when the study participants are undergoing experimental tasks designed to probe at a cognitive process of interest. A method is proposed to estimate dFC between two regions of interest (ROIs) in task-based fMRI where the activation effects are also allowed to vary over time. The proposed method, called TVAAC (time-varying activation and connectivity), uses penalized splines to model both time-varying activation effects and time-varying functional connectivity and uses the bootstrap for statistical inference. Simulation studies show that TVAAC can estimate both static and time-varying activation and functional connectivity, while ignoring time-varying activation effects would lead to poor estimation of dFC. An empirical illustration is provided by applying TVAAC to analyze two subjects from an event-related fMRI learning experiment.
  • Item
    The influence of decision-making in tree ring-based climate reconstructions
    (Springer Nature, 2021-06-07) Büntgen, Ulf; Allen, Kathy; Anchukaitis, Kevin J.; Arseneault, Dominique; Boucher, Étienne; Bräuning, Achim; Chatterjee, Snigdhansu; Cherubini, Paolo; Churakova (Sidorova), Olga V.; Corona, Christophe; Gennaretti, Fabio; Grießinger, Jussi; Guillet, Sebastian; Guiot, Joel; Gunnarson, Björn; Helama, Samuli; Hochreuther, Philipp; Hughes, Malcolm K.; Huybers, Peter; Kirdyanov, Alexander V.; Krusic, Paul J.; Ludescher, Josef; Meier, Wolfgang J.-H.; Myglan, Vladimir S.; Nicolussi, Kurt; Oppenheimer, Clive; Reinig, Frederick; Salzer, Matthew W.; Seftigen, Kristina; Stine, Alexander R.; Stoffel, Markus; St. George, Scott; Tejedor, Ernesto; Trevino, Aleyda; Trouet, Valerie; Wang, Jianglin; Wilson, Rob; Yang, Bao; Xu, Guobao; Esper, Jan
    Tree-ring chronologies underpin the majority of annually-resolved reconstructions of Common Era climate. However, they are derived using different datasets and techniques, the ramifications of which have hitherto been little explored. Here, we report the results of a double-blind experiment that yielded 15 Northern Hemisphere summer temperature reconstructions from a common network of regional tree-ring width datasets. Taken together as an ensemble, the Common Era reconstruction mean correlates with instrumental temperatures from 1794–2016 CE at 0.79 (p < 0.001), reveals summer cooling in the years following large volcanic eruptions, and exhibits strong warming since the 1980s. Differing in their mean, variance, amplitude, sensitivity, and persistence, the ensemble members demonstrate the influence of subjectivity in the reconstruction process. We therefore recommend the routine use of ensemble reconstruction approaches to provide a more consensual picture of past climate variability.
  • Item
    Physics-guided probabilistic modeling of extreme precipitation under climate change
    (Springer Nature, 2020-06-24) Kodra, Evan; Bhatia, Udit; Chatterjee, Snigdhansu; Chen, Stone; Ganguly, Auroop Ratan
    Earth System Models (ESMs) are the state of the art for projecting the effects of climate change. However, longstanding uncertainties in their ability to simulate regional and local precipitation extremes and related processes inhibit decision making. Existing state-of-the-art approaches for uncertainty quantification use Bayesian methods to weight ESMs based on a balance of historical skills and future consensus. Here we propose an empirical Bayesian model that extends an existing skill and consensus based weighting framework and examine the hypothesis that nontrivial, physics-guided measures of ESM skill can help produce reliable probabilistic characterization of climate extremes. Specifically, the model leverages knowledge of physical relationships between temperature, atmospheric moisture capacity, and extreme precipitation intensity to iteratively weight and combine ESMs and estimate probability distributions of return levels. Out-of-sample validation suggests that the proposed Bayesian method, which incorporates physics-guidance, has the potential to derive reliable precipitation projections, although caveats remain and the gain is not uniform across all cases.
  • Item
    High dimensional, robust, unsupervised record linkage
    (Statistics Poland, 2020) Bera, Sabyasachi; Chatterjee, Snigdhansu
    We develop a technique for record linkage on high dimensional data, where the two datasets may not have any common variable, and there may be no training set available. Our methodology is based on sparse, high dimensional principal components. Since large and high dimensional datasets are often prone to outliers and aberrant observations, we propose a technique for estimating robust, high dimensional principal components. We present theoretical results validating the robust, high dimensional principal component estimation steps, and justifying their use for record linkage. Some numeric results and remarks are also presented.
  • Item
    On weighted multivariate sign functions
    (Elsevier, 2022-05-21) Majumdar, Subhabrata; Chatterjee, Snigdhansu
    Multivariate sign functions are often used for robust estimation and inference. We propose using data dependent weights in association with such functions. The proposed weighted sign functions retain desirable robustness properties, while significantly improving efficiency in estimation and inference compared to unweighted multivariate sign-based methods. Using weighted signs, we demonstrate methods of robust location estimation and robust principal component analysis. We extend the scope of using robust multivariate methods to include robust sufficient dimension reduction and functional outlier detection. Several numerical studies and real data applications demonstrate the efficacy of the proposed methodology.
  • Item
    Probing an auxiliary laser to tune the repetition rate of a soliton microcomb
    (Optica, 2025-02-15) Mahmood, Tanvir; Cahill, James P.; Sykes, Patrick; Courtright, Logan; Wu, Lue; Vahala, Kerry J.; Menyuk, Curtis; Zhou, Weimin
    We demonstrate that it is possible to linearly tune the repetition rate of a bright soliton comb that is generated using an Si3N4 microring resonator by linearly varying the frequency of an auxiliary heater laser. Hence, the auxiliary laser can be utilized as a linear active feedback element for stabilizing the repetition rate. We investigated the potential of the auxiliary laser as an actuator of the soliton repetition rate by varying the auxiliary laser frequency at different modulation rates. Within the modulation bandwidth of the laser, we find that the variation ratio, defined as the ratio of the change in the repetition rate to the change in the laser frequency, remains unchanged. This variation ratio also quantifies the correlation between the frequency drift of the auxiliary laser and the repetition rate phase noise and makes it possible to examine the impact of frequency drift on the attainable phase noise performance of the soliton microcomb. For our setup, we find that the repetition rate phase noise of the microcomb below a 1-kHz offset from the carrier is dominated by the frequency drift of the auxiliary laser, which emphasizes the importance of deploying an inherently low-phase-noise laser when the auxiliary laser heating technique is utilized.
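    The variation ratio defined in this abstract (change in repetition rate divided by change in auxiliary laser frequency) can be illustrated as the slope of a straight-line fit. The sketch below is only an illustration: the numbers are synthetic placeholders rather than measurements from the paper, and the function name `variation_ratio` is hypothetical.

```python
import numpy as np

def variation_ratio(laser_freq_shift_hz, rep_rate_shift_hz):
    """Least-squares slope of repetition-rate shift versus laser-frequency shift."""
    slope, _intercept = np.polyfit(laser_freq_shift_hz, rep_rate_shift_hz, 1)
    return slope

# Synthetic, made-up data: a perfectly linear response with an assumed ratio.
assumed_ratio = 2e-4                           # hypothetical value, dimensionless (Hz per Hz)
laser_shift = np.linspace(-50e6, 50e6, 11)     # auxiliary laser frequency shifts in Hz
rep_shift = assumed_ratio * laser_shift        # resulting repetition-rate shifts in Hz

estimated_ratio = variation_ratio(laser_shift, rep_shift)
```

    On real data the fit would be repeated at each modulation rate; a roughly constant slope across rates would correspond to the "remains unchanged" observation in the abstract.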
  • Item
    Chemotaxis of Drosophila Border Cells is Modulated by Tissue Geometry Through Dispersion of Chemoattractants
    (Elsevier, 2025-02-05) George, Alexander; Akhavan, Naghmeh; Peercy, Bradford; Starz-Gaiano, Michelle
    Migratory cells respond to graded concentrations of diffusible chemoattractants in vitro, but how complex tissue geometries in vivo impact chemotaxis is poorly understood. To address this, we studied the Drosophila border cells. Live-imaged border cells varied in their chemotactic migration speeds, which correlated positionally with distinct architectures. We then developed a reduced mathematical model to determine how chemoattractant distribution is affected by tissue architecture. Larger extracellular volumes locally dampened the chemoattractant gradient and, when coupled with an agent-based motion of the cluster, reduced cell speeds. This suggests that chemoattractant levels vary by tissue architectures, informing cell migration behaviors locally, which we tested in vivo. Genetically elevating chemoattractant levels slowed migration in specific architectural regions, while mutants with spacious tissue structure rescued defects from high chemoattractant levels, promoting punctual migration. Our results highlight the interplay between tissue geometry and the local distribution of signaling molecules to orchestrate cell migration.
  • Item
    Data-Driven Approaches to Classifier and Variable Selection in High-Dimensional Classification
    (2024-01-01) Andalib, Vahid; Baek, Seungchul; Mathematics and Statistics; Statistics
    Classification in high dimensions has gained significant attention over the past two decades since Fisher's linear discriminant analysis (LDA) is not optimal when the sample size n is smaller than the number of variables p, i.e., p > n, mostly due to the singularity of the sample covariance matrix. This dissertation proposes two novel data-driven approaches to address the challenges in high-dimensional classification, both building upon Fisher's LDA. The first approach involves the development of binary classifiers using random partitioning. Rather than modifying how to estimate the sample covariance and sample mean vector in constructing a classifier, we build two types of high-dimensional classifiers using data splitting, i.e., single data splitting (SDS) and multiple data splitting (MDS). We also present a weighted version of the MDS classifier that further improves classification performance. Each of the split data sets has fewer variables than the sample size so that LDA is applicable, and classification results can be combined with respect to minimizing the misclassification rate. We provide theoretical justification backing up our methods by comparing misclassification rates with LDA in high dimensions. The second approach proposes a high-dimensional classifier, which is a two-stage procedure serving variable selection and classification tasks. The variable selection scheme is to select covariates that belong to the discriminative set, and this approach is aimed at obtaining a better classifier, rather than choosing significant variables themselves. In the first stage, we identify discriminative variables by adopting the notion of a mirror statistic, proposed recently in the literature, and an LDA direction vector obtained from a regularized form of the sample covariance matrix and a James-Stein type estimator for the mean vectors. In the second stage, a new classifier is developed using the selected variables, refined with a modified ε-greedy algorithm to enhance the LDA direction vector. Both approaches are extensively validated through simulation studies and real data analysis, including DNA microarray data sets. Our methods demonstrate superior or comparable performance to existing high-dimensional classifiers, offering improved classification accuracy, effective variable selection, and robustness in various scenarios. This dissertation contributes to the field of high-dimensional statistics by providing novel, theoretically grounded, and effective methods for classification in high-dimensional spaces, with potential applications in genomics, machine learning, and other domains facing the challenges of high-dimensional data analysis.
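    The data-splitting idea in the first approach can be sketched roughly as follows: randomly partition the p variables into blocks smaller than the sample size, fit ordinary LDA on each block, and combine the block-wise predictions. This is a minimal sketch, not the dissertation's procedure. The unweighted majority vote, the block size, the simulated data, and all function names are assumptions made for illustration; the dissertation combines results with respect to the misclassification rate and also proposes a weighted variant.

```python
import numpy as np

def fit_lda(X, y):
    """Two-class Fisher LDA (labels 0/1) using the pooled sample covariance."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    pooled = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(X) - 2)
    direction = np.linalg.solve(pooled, m1 - m0)
    threshold = (direction @ (m0 + m1)) / 2.0
    return direction, threshold

def predict_lda(model, X):
    direction, threshold = model
    return (X @ direction > threshold).astype(int)

def fit_split_lda(X, y, block_size, n_splits, rng):
    """Fit one LDA per random block of variables, over n_splits random partitions."""
    p = X.shape[1]
    models = []
    for _ in range(n_splits):
        order = rng.permutation(p)
        for start in range(0, p, block_size):
            cols = order[start:start + block_size]
            models.append((cols, fit_lda(X[:, cols], y)))
    return models

def predict_split_lda(models, X):
    """Combine block-wise predictions by unweighted majority vote (an assumption)."""
    votes = np.mean([predict_lda(m, X[:, cols]) for cols, m in models], axis=0)
    return (votes > 0.5).astype(int)

def simulate(n_per_class, p, shift, rng):
    """Hypothetical Gaussian data: class 1 is shifted by `shift` in every variable."""
    X = rng.normal(size=(2 * n_per_class, p))
    y = np.repeat([0, 1], n_per_class)
    X[y == 1] += shift
    return X, y

rng = np.random.default_rng(0)
X_train, y_train = simulate(20, 100, 1.0, rng)    # n = 40 < p = 100
X_test, y_test = simulate(200, 100, 1.0, rng)
models = fit_split_lda(X_train, y_train, block_size=10, n_splits=5, rng=rng)
accuracy = np.mean(predict_split_lda(models, X_test) == y_test)
```

    Each block has 10 variables against 40 training observations, so the pooled covariance matrix within a block is invertible even though the full 100-variable covariance matrix is singular.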
  • Item
    Using Neural Networks to Sanitize Compton Camera Simulated Data through the BRIDE Pipeline for Improving Gamma Imaging in Proton Therapy on the ada Cluster
    (2024) Chen, Michael O.; Hodge, Julian; Jin, Peter L.; Protz, Ella; Wong, Elizabeth; Obe, Ruth; Shakeri, Ehsan; Cham, Mostafa; Gobbert, Matthias; Barajas, Carlos A.; Jiang, Zhuoran; Sharma, Vijay R.; Ren, Lei; Mossahebi, Sina; Peterson, Stephen W.; Polf, Jerimy C.
    Precision medicine in cancer treatment increasingly relies on advanced radiotherapies, such as proton beam radiotherapy, to enhance efficacy of the treatment. When the proton beam in this treatment interacts with patient matter, the excited nuclei may emit prompt gamma ray interactions that can be captured by a Compton camera. The image reconstruction from this captured data faces the issue of mischaracterizing the sequences of incoming scattering events, leading to excessive background noise. To address this problem, several machine learning models such as Feedforward Neural Networks (FNN) and Recurrent Neural Networks (RNN) were developed in PyTorch to properly characterize the scattering sequences on simulated datasets, including newly-created patient medium data, which were generated by using a pipeline comprised of the GEANT4 and Monte-Carlo Detector Effects (MCDE) software. These models were implemented using the novel ‘Big-data REU Integrated Development and Experimentation’ (BRIDE) platform, a modular pipeline that streamlines preprocessing, feature engineering, and model development and evaluation on parallelized GPU processors. Hyperparameter studies were done on the novel patient data as well as on water phantom datasets used during previous research. Patient data was more difficult than water phantom data to classify for both FNN and RNN models. FNN models had higher accuracy on patient medium data but lower accuracy on water phantom data when compared to RNN models. Previous results on several different datasets were reproduced on BRIDE and multiple new models achieved greater performance than in previous research.
  • Item
    Profile least squares estimation in networks with covariates
    (2024-12-20) Chandna, Swati; Bagozzi, Benjamin; Chatterjee, Snigdhansu
    Many real world networks exhibit edge heterogeneity with different pairs of nodes interacting with different intensities. Further, nodes with similar attributes tend to interact more with each other. Thus, in the presence of observed node attributes (covariates), it is of interest to understand the extent to which these covariates explain interactions between pairs of nodes and to suitably estimate the remaining structure due to unobserved factors. For example, in the study of international relations, the extent to which country-pair specific attributes such as the number of material/verbal conflicts and volume of trade explain military alliances between different countries can lead to valuable insights. We study the model where pairwise edge probabilities are given by the sum of a linear edge covariate term and a residual term to model the remaining heterogeneity from unobserved factors. We approach estimation of the model via profile least squares and show how it leads to a simple algorithm to estimate the linear covariate term and the residual structure that is truly latent in the presence of observed covariates. Our framework lends itself naturally to a bootstrap procedure which is used to draw inference on model parameters, such as to determine significance of the homophily parameter or covariates in explaining the underlying network structure. Application to four real network datasets and comparisons using simulated data illustrate the usefulness of our approach.
  • Item
    Biological and residual activity of candidate larvicide formulation, SumiLarv 2MR, against an exotic invasive mosquito Anopheles stephensi Liston, 1901 (Diptera: Culicidae) in Ethiopia
    (Springer Nature, 2025-01-02) Yewhalaw, Delenasaw; Erena, Ebisa; Degefa, Teshome; Kifle, Yehenew Getachew; Zemene, Endalew; Simma, Eba Alemayehu
    The study evaluated the efficacy and residual activity of SumiLarv 2MR, SumiLarv 0.5G, and Abate 1SG (used as a positive control) against Anopheles stephensi larvae in Awash Subath Kilo, Afar Regional State, Ethiopia, using a semi-field experimental setup. Plastic containers with capacities of 100L and 250L were used to assess the residual efficacy of SumiLarv 2MR. Specifically, four 100L containers were each treated with one disc of SumiLarv 2MR, compared to two untreated controls. Similarly, four 250L containers received one disc each, with two untreated controls. Additionally, eight 250L containers were treated with a half-dose to match one disc per 500L, alongside four untreated controls. For SumiLarv 0.5G and Abate 1SG, four 100L containers were treated with each larvicide, with two untreated controls for each. Each container received 20 third and fourth instar An. stephensi larvae. Observations of adult emergence were conducted until all pupae either emerged or died. Results showed that SumiLarv 2MR demonstrated a nine-month residual efficacy, SumiLarv 0.5G provided seven weeks of efficacy, and Abate 1SG showed a five-week efficacy. Additionally, SumiLarv 2MR discs retained nearly 50% of their initial pyriproxyfen content after nine months, suggesting potential for extended residual activity. This study highlights the long-term effectiveness of SumiLarv 2MR as a larvicide against An. stephensi in Ethiopia.
  • Item
    Well-posedness and Sensitivity Analysis of a Fluid Model for Multiclass Many-Server Queues with Abandonment Under Global FCFS Discipline
    Kang, Weining
    In this paper, under mild conditions on the arrival, service and patience time distributions, we establish the well-posedness of the fluid model of a multiclass many-server queueing model with differentiated service and patience times operated under the global FCFS service discipline. In particular, the well-posedness of the fluid model is established through the study of the existence and uniqueness of fixed points of a certain functional map of Volterra type. In addition, by showing a local Lipschitz property of this functional map as a functional of the initial data to the fluid model, we also perform a sensitivity analysis on the fluid model.
  • Item
    Bayes Estimation of a Common Mean of Several Normal Populations with Unknown Variances
    (University of Rajshahi, 2024-12-23) Mphekgwana, Peter M.; Kifle, Yehenew Getachew; Marange, Chioneso S.
    Combining information from several independent normal populations to estimate a common mean parameter has applications in meta-analysis and is an important statistical problem. For this application, Gregurich and Broemeling (1997) and Tu (2012) concentrated on point estimation employing Bayesian techniques to infer about the common mean of two normal populations with unknown variances. In our study, we expand upon their investigation to encompass k normal populations with a common mean, incorporating a range of objective priors. Through the use of two examples, it is discovered that as the hyperparameter α under a Bayesian framework increases, the performance of the Bayesian technique also improves.
    IJSS, Vol. 24(2) Special, December 2024, pp. 81-94
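    For orientation, the common-mean problem has a classical frequentist point estimator, the Graybill-Deal estimator, which weights each sample mean by n_i / s_i^2. The sketch below shows that baseline only; it is not the Bayesian procedure studied in the paper, and the function name and toy numbers are hypothetical.

```python
import numpy as np

def graybill_deal(samples):
    """Weighted average of sample means with weights n_i / s_i^2 (s_i^2 = sample variance)."""
    means = np.array([np.mean(s) for s in samples])
    weights = np.array([len(s) / np.var(s, ddof=1) for s in samples])
    return float(np.sum(weights * means) / np.sum(weights))

# Toy, made-up samples chosen so the arithmetic is easy to follow by hand.
samples = [
    np.array([1.0, 2.0, 3.0]),        # mean 2, s^2 = 1,    weight 3 / 1 = 3
    np.array([2.0, 4.0, 6.0, 8.0]),   # mean 5, s^2 = 20/3, weight 4 / (20/3) = 0.6
]
estimate = graybill_deal(samples)     # (3 * 2 + 0.6 * 5) / 3.6 = 2.5
```

    The less variable sample dominates the estimate, which is the basic behaviour any common-mean procedure has to reproduce when variances are unequal and unknown.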
  • Item
    Hitting k Primes by Dice Rolls
    Alon, Noga; Malinovsky, Yaakov; Martinez, Lucy; Zeilberger, Doron
  • Item
    Time-Periodic Solutions for Hyperbolic-Parabolic Systems
    (2024-12-25) Mosny, Stanislav; Muha, Boris; Schwarzacher, Sebastian; Webster, Justin
    Time-periodic weak solutions for a coupled hyperbolic-parabolic system are obtained. A linear heat and wave equation are considered on two respective d-dimensional spatial domains that share a common (d − 1)-dimensional interface Γ. The system is only partially damped, leading to an indeterminate case for existing theory (Galdi et al., 2014). We construct periodic solutions by obtaining novel a priori estimates for the coupled system, reconstructing the total energy via the interface Γ. As a byproduct, geometric constraints manifest on the wave domain which are reminiscent of classical boundary control conditions for wave stabilizability. We note a “loss” of regularity between the forcing and solution which is greater than that associated with the heat-wave Cauchy problem. However, we consider a broader class of spatial domains and mitigate this regularity loss by trading time and space differentiations, a feature unique to the periodic setting. This seems to be the first constructive result addressing existence and uniqueness of periodic solutions in the heat-wave context, where no dissipation is present in the wave interior. Our results speak to the open problem of the (non-)emergence of resonance in complex systems, and are readily generalizable to related systems and certain nonlinear cases.
  • Item
    Fluid Model of A Many-Server Queueing Network with Abandonment and Markovian Routing
    (2024) Kang, Weining; Pang, Guodong
    This paper studies a fluid model for a non-Markovian many-server queueing network with abandonment, where externally arrived and internally routed customers are served under the non-idling global First-Come-First-Serve (FCFS) discipline at each station of many parallel servers. The routing follows a Markovian mechanism. Externally arrived and internally routed customers in each queue may have different service time distributions, as well as different patience time distributions, and all these distributions may depend on the station. The fluid model dynamics is described by the fluid contents of externally arrived customers and internally routed customers in each queue (both waiting and receiving service) and a set of four measure-valued processes, tracking the amount of service time each externally arrived customer in service has received, the amount of service time each internally routed customer in service has received, the waiting times of externally arrived customers and the waiting times of internally routed customers in queue. Under mild conditions on the service and patience time distributions, we prove the existence and uniqueness of a solution to the fluid model equations. We then characterize the invariant states of this fluid model when the arrival rates are constant. We also establish the convergence of the properly scaled stochastic evolution dynamics to the fluid model.
  • Item
    Confidence Ellipsoids of a Multivariate Normal Mean Vector Based on Noise Perturbed and Synthetic Data with Applications
    (SSCA, 2024) Basak, Biswajit; Kifle, Yehenew Getachew; Sinha, Bimal K.
    In this paper we address the problem of constructing a confidence ellipsoid of a multivariate normal mean vector based on a random sample from it. The central issue at hand is the sensitivity of the original data and hence the data cannot be directly used/analyzed. We consider a few perturbations of the original data, namely, noise addition and creation of synthetic data based on the plug-in sampling (PIS) method and the posterior predictive sampling (PPS) method. We review some theoretical results under PIS and PPS which are already available based on both frequentist and Bayesian analysis (Klein and Sinha, 2015, 2016; Guin et al., 2023) and derive the necessary results under noise addition. A theoretical comparison of all the methods based on expected volumes of the confidence ellipsoids is provided. A measure of privacy protection (PP) is discussed and its formulas under PIS, PPS and noise addition are derived and the different methods are compared based on PP. Applications include analysis of two multivariate datasets. The first dataset, with p = 2, is obtained from the latest Annual Social and Economic Supplement (ASEC) conducted by the US Census Bureau in 2023. The second dataset, with p = 3, pertains to renal variables obtained from the book by Harris and Boyd (1995). Using a synthetic version of the original data generated through PIS and PPS methods and also the noise added data, we produce and display the confidence ellipsoids for the unknown mean vector under various scenarios. Finally, the privacy protection measure is evaluated for various methods and different features.
  • Item
    Improving Gamma Imaging in Proton Therapy by Sanitizing Compton Camera Simulated Patient Data using Neural Networks through the BRIDE Pipeline
    (IEEE, 2025-01-16) Chen, Michael O.; Hodge, Julian; Jin, Peter L.; Protz, Ella; Wong, Elizabeth; Cham, Mostafa; Gobbert, Matthias; Barajas, Carlos A.
    Precision medicine in cancer treatment increasingly relies on advanced radiotherapies, such as proton beam radiotherapy, to enhance efficacy of the treatment. When the proton beam in this treatment interacts with patient matter, the excited nuclei may emit prompt gamma ray interactions that can be captured by a Compton camera. The image reconstruction from this captured data faces the issue of mischaracterizing the sequences of incoming scattering events, leading to excessive background noise. To address this problem, several machine learning models such as Feedforward Neural Networks (FNN) and Recurrent Neural Networks (RNN) were developed in PyTorch to properly characterize the scattering sequences on simulated datasets, including newly-created patient medium data, which were generated by using a pipeline comprised of the GEANT4 and Monte-Carlo Detector Effects (MCDE) software. These models were implemented using the novel ‘Big-data REU Integrated Development and Experimentation’ (BRIDE) platform, a modular pipeline that streamlines preprocessing, feature engineering, and model development and evaluation on parallelized GPU processors. Hyperparameter studies were done on the novel patient data as well as on water phantom datasets used during previous research. Patient data was more difficult than water phantom data to classify for both FNN and RNN models. FNN models had higher accuracy on patient medium data but lower accuracy on water phantom data when compared to RNN models. Previous results on several different datasets were reproduced on BRIDE and multiple new models achieved greater performance than in previous research.