TDLR: Top (Semantic)-Down (Syntactic) Language Representation

Date

2022

Rights

This item is likely protected under Title 17 of the U.S. Copyright Law. Unless on a Creative Commons license, for uses protected by Copyright Law, contact the copyright holder or the author.

Abstract

Language understanding involves processing text with both the grammatical and common-sense contexts of the text fragments. The text "I went to the grocery store and brought home a car" requires both the grammatical (syntactic) context and the common-sense (semantic) context to capture the oddity in the sentence. Contextualized text representations learned by Language Models (LMs) are expected to capture a variety of syntactic and semantic contexts from large training corpora. Recent work such as ERNIE has shown that infusing knowledge contexts, where available, into LMs results in significant performance gains on General Language Understanding Evaluation (GLUE) benchmark tasks. However, to our knowledge, no knowledge-aware model has attempted to infuse knowledge through top-down, semantics-driven syntactic processing (e.g., common-sense to grammatical) or to operate directly on the attention mechanism that LMs leverage to learn the data context. We propose a learning framework, Top-Down Language Representation (TDLR), to infuse common-sense semantics into LMs. In our implementation, we build on BERT for its rich syntactic knowledge and use the knowledge graphs ConceptNet and WordNet to infuse semantic knowledge.
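
The abstract does not spell out how the attention mechanism is modified. One common way to picture "operating directly on attention" is to add a knowledge-graph-derived bias term to the usual scaled dot-product scores; the minimal sketch below illustrates that idea. The function name, the additive-bias formulation, and the `kg_bias` matrix are illustrative assumptions, not TDLR's actual method.

```python
# Illustrative sketch (not the TDLR implementation): bias self-attention
# scores with precomputed knowledge-graph relatedness between tokens.
import torch
import torch.nn.functional as F

def knowledge_biased_attention(q, k, v, kg_bias, alpha=1.0):
    """Scaled dot-product attention with an additive semantic bias.

    q, k, v:  (batch, seq_len, d) query/key/value projections
    kg_bias:  (seq_len, seq_len) pairwise relatedness scores, e.g. derived
              offline from ConceptNet/WordNet links between token concepts
    alpha:    weight of the semantic (top-down) signal
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5  # data-driven (syntactic) term
    scores = scores + alpha * kg_bias            # knowledge-driven (semantic) term
    return F.softmax(scores, dim=-1) @ v

# Toy usage: two tokens whose concepts are weakly related in the graph
# (as "grocery store" and "car" would be) get a negative attention bias.
q = k = v = torch.randn(1, 2, 4)
kg_bias = torch.tensor([[0.0, -2.0],
                        [-2.0, 0.0]])
out = knowledge_biased_attention(q, k, v, kg_bias)
print(out.shape)  # torch.Size([1, 2, 4])
```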