Enabling easier translation of building code specifications using sentence similarity methods

Date

2021-01-01

Department

Computer Science and Electrical Engineering

Program

Computer Science

Rights

This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
Distribution Rights granted to UMBC by the author.
Access limited to the UMBC community. Item may be obtainable via Interlibrary Loan through a local library, pending the author's or copyright holder's permission.

Abstract

Construction projects must comply with various regulations specified in building codes. Architecture today has become very complex, but the process of checking compliance is still manual, costly, slow, and error-prone. The fragmented and diverse nature of the industry, complex networks of stakeholders, competitiveness, declining productivity, and a lack of motivation to adopt new technology have all contributed to this [4]. As technology has advanced, many research efforts have aimed to automate compliance checking through machine learning. To train, these machine learning systems need a dataset that pairs the building code with its machine-interpretable representations. Building code documents are extensive, and creating the dataset requires a lot of manual work because each building rule must be converted to its machine-interpretable representation. This process can be simplified by assigning a similarity score to all the rules in the building code document and applying a clustering algorithm to find coherent rule clusters. Because the rules in a coherent cluster are analogous, they can be converted to machine-interpretable representations using templates, with less effort than converting each rule individually. In this thesis, we explore five sentence similarity methods to find analogous building code specifications and use the K-means algorithm to cluster them into coherent sets. We offer a description and analysis of each of the methods, discuss the method used for clustering, and closely examine the performance of each method on the building code document. We apply each method to the International Building Code (IBC) document and calculate the similarity scores on it.
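
The score-then-cluster idea described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions: the abstract does not name the five similarity methods, so bag-of-words cosine similarity is used here purely as a stand-in; the K-means routine is a simplified version that takes the first k vectors as initial centroids; and the four example rules are hypothetical sentences, not text quoted from the IBC. All function names are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def vectorize(sentences):
    """Turn each sentence into a unit-length bag-of-words vector."""
    vocab = sorted({tok for s in sentences for tok in tokenize(s)})
    vectors = []
    for s in sentences:
        counts = Counter(tokenize(s))
        vec = [float(counts[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def cosine(u, v):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(a * b for a, b in zip(u, v))


def squared_distance(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, iterations=20):
    """Simplified K-means; the first k vectors seed the centroids (an assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical rules written for illustration only (not quoted from the IBC).
rules = [
    "Exit doors shall be at least 32 inches wide.",
    "Fire alarms shall be tested every 12 months.",
    "Exit doors shall be at least 80 inches high.",
    "Fire alarms shall be inspected every 6 months.",
]

vectors = vectorize(rules)
labels = kmeans(vectors, k=2)

for label, rule in zip(labels, rules):
    print(label, rule)
```

In this toy input the two door rules share most of their wording, as do the two alarm rules, so each pair lands in its own cluster and could plausibly share one conversion template. Real building code text would likely need a stronger similarity method and a more careful choice of k and of initial centroids.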