Large Language Model Driven Analysis of General Coordinates Network (GCN) Circulars

dc.contributor.authorSharma, Vidushi
dc.contributor.authorAgarwala, Ronit
dc.contributor.authorRacusin, Judith L.
dc.contributor.authorSinger, Leo P.
dc.contributor.authorBarna, Tyler
dc.contributor.authorBurns, Eric
dc.contributor.authorCoughlin, Michael W.
dc.contributor.authorDutko, Dakota
dc.contributor.authorElliott, Courey
dc.contributor.authorGupta, Rahul
dc.contributor.authorMahabal, Ashish
dc.contributor.authorMukund, Nikhil
dc.date.accessioned2026-01-22T16:19:13Z
dc.date.issued2025-11-18
dc.description.abstractThe General Coordinates Network (GCN) is NASA's time-domain and multi-messenger alert system. GCN distributes two data products, automated "Notices" and human-generated "Circulars", that report the observations of high-energy and multi-messenger astronomical transients. The flexible and unstructured format of GCN Circulars, comprising more than 40,500 Circulars accumulated over three decades, makes it challenging to manually extract observational information, such as redshift or observed wavebands. In this work, we employ large language models (LLMs) to facilitate the automated parsing of transient reports. We develop a neural topic modeling pipeline with open-source tools for the automatic clustering and summarization of astrophysical topics in the Circulars database. Using neural topic modeling and contrastive fine-tuning, we classify Circulars based on their observation wavebands and messengers. Additionally, we separate gravitational wave (GW) event clusters and their electromagnetic (EM) counterparts from the Circulars database. Finally, using the open-source Mistral model, we implement a system to automatically extract gamma-ray burst (GRB) redshift information from the Circulars archive, without the need for any training. Evaluation against the manually curated Neil Gehrels Swift Observatory GRB table shows that our simple system, with the help of prompt-tuning, output parsing, and retrieval augmented generation (RAG), can achieve an accuracy of 97.2% for redshift-containing Circulars. Our neural search enhanced RAG pipeline accurately retrieved 96.8% of redshift Circulars from the manually curated database. Our study demonstrates the potential of LLMs to automate and enhance astronomical text mining, and provides a foundation for future advances in transient alert analysis.
dc.description.sponsorshipWe thank the anonymous referee for useful comments and suggestions on the manuscript. VS was supported by the National Aeronautics and Space Administration (NASA) through a cooperative agreement with the Center for Research and Exploration in Space Science and Technology II (CRESST II). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the National Aeronautics and Space Administration (NASA) or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein. The GCN team acknowledges support from NASA’s Internal Scientist Funding Model (ISFM) program. This research has made use of data obtained through the General Coordinates Network (GCN) Service, provided by the NASA Goddard Space Flight Center (GSFC), in support of NASA’s High Energy Astrophysics Programs. The authors would also like to thank Daniela Huppenkothen for the insightful discussions. RG was sponsored by the National Aeronautics and Space Administration (NASA) through a contract with ORAU. M.W.C. acknowledges support from the National Science Foundation with grant numbers PHY-2409481, PHY-2308862 and PHY-2117997. NM acknowledges support from the National Science Foundation (NSF) under awards PHY-1764464 and PHY-2309200 to the LIGO Laboratory, under Cooperative Agreement PHY-2019786 (The NSF AI Institute for Artificial Intelligence and Fundamental Interactions, http://iaifi.org/), and from MathWorks, Inc.
dc.description.urihttp://arxiv.org/abs/2511.14858
dc.format.extent61 pages
dc.genrejournal articles
dc.genrepostprints
dc.identifierdoi:10.13016/m215d4-lyg9
dc.identifier.urihttps://doi.org/10.48550/arXiv.2511.14858
dc.identifier.urihttp://hdl.handle.net/11603/41558
dc.language.isoen
dc.relation.isAvailableAtThe University of Maryland, Baltimore County (UMBC)
dc.relation.ispartofUMBC Center for Space Sciences and Technology (CSST) / Center for Research and Exploration in Space Sciences & Technology II (CRESST II)
dc.relation.ispartofUMBC Faculty Collection
dc.rightsThis item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.subjectAstrophysics - Instrumentation and Methods for Astrophysics
dc.subjectAstrophysics - High Energy Astrophysical Phenomena
dc.titleLarge Language Model Driven Analysis of General Coordinates Network (GCN) Circulars
dc.typeText
dcterms.creatorhttps://orcid.org/0000-0002-4394-4138

Files

Original bundle

Name: 2511.14858v1.pdf
Size: 2.68 MB
Format: Adobe Portable Document Format