Mining patents using molecular similarity search

dc.contributor.author: Rhodes, James
dc.contributor.author: Boyer, Stephen
dc.contributor.author: Kreulen, Jeffrey
dc.contributor.author: Chen, Ying
dc.contributor.author: Ordoñez, Patricia
dc.date.accessioned: 2025-06-05T14:02:42Z
dc.date.available: 2025-06-05T14:02:42Z
dc.date.issued: 2006-12
dc.description: Pacific Symposium on Biocomputing 2007
dc.description.abstract: Text analytics is becoming an increasingly important tool used in biomedical research. While advances continue to be made in the core algorithms for entity identification and relation extraction, a need for practical applications of these technologies arises. We developed a system that allows users to explore the US Patent corpus using molecular information. The core of our system contains three main technologies: A high performing chemical annotator which identifies chemical terms and converts them to structures, a similarity search engine based on the emerging IUPAC International Chemical Identifier (InChI) standard, and a set of on demand data mining tools. By leveraging this technology we were able to rapidly identify and index 3,623,248 unique chemical structures from 4,375,036 US Patents and Patent Applications. Using this system a user may go to a web page, draw a molecule, search for related Intellectual Property (IP) and analyze the results. Our results prove that this is a far more effective way for identifying IP than traditional keyword based approaches.
dc.description.uri: https://pubmed.ncbi.nlm.nih.gov/17990501/
dc.format.extent: 12 pages
dc.genre: conference papers and proceedings
dc.identifier: doi:10.13016/m26uyl-tdid
dc.identifier.citation: Rhodes, James, Stephen Boyer, Jeffrey Kreulen, Ying Chen, and Patricia Ordonez. “Mining Patents Using Molecular Similarity Search.” In Biocomputing 2007, 304–15. WORLD SCIENTIFIC, 2006. https://doi.org/10.1142/9789812772435_0029.
dc.identifier.uri: http://hdl.handle.net/11603/38579
dc.language.iso: en_US
dc.publisher: Pacific Symposium on Biocomputing
dc.relation.isAvailableAt: The University of Maryland, Baltimore County (UMBC)
dc.relation.ispartof: UMBC Information Systems Department
dc.rights: This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.
dc.subject: Chemical Similarity
dc.subject: Patents
dc.subject: InChI
dc.subject: Search Engine
dc.subject: Data Mining
dc.title: Mining patents using molecular similarity search
dc.type: Text