Building Practical, Efficient, and Reliable Fault-tolerant Distributed Systems

Author/Creator

Author/Creator ORCID

Date

2021-01-01

Department

Computer Science and Electrical Engineering

Program

Computer Science

Citation of Original Publication

Rights

This item may be protected under Title 17 of the U.S. Copyright Law. It is made available by UMBC for non-commercial research and education. For permission to publish or reproduce, please see http://aok.lib.umbc.edu/specoll/repro.php or contact Special Collections at speccoll(at)umbc.edu
Distribution Rights granted to UMBC by the author.
Access limited to the UMBC community. Item may possibly be obtained via Interlibrary Loan thorugh a local library, pending author/copyright holder's permission.

Abstract

Building a real-world distributed system should consider a range of fundamental design objectives, including fault tolerance, reliability, performance, and scalability. Modern distributed systems require tolerance to any kind of service disruption, whethercaused by a simple hardware fault or by a large-scale disaster. Famous systems like ZooKeeper, Google file system (GFS) and Bigtable are designed to tolerate benign faults. Byzantine fault-tolerant (BFT) state machine replication (SMR) is regarded as an ideal candidate that can tolerate arbitrary faulty behaviors, and can be applied to multiple real-world systems. BFT SMR as one of consensus protocols is the core of blockchain to provide agreement services, and its efficiency highly affects the performance of a blockchain system. This dissertations presents three fault-tolerant distributed systems by leveraging novel BFT protocols, practical cryptographic schemes and libraries, efficient and scalable system designs, modern programming language, and complete and detailed evaluations and deployments. Besides fault tolerance, this dissertations also presents different approaches to build distributed systems toward high performance, scalability, and usability. In this work, we first focus on constructing distributed systems with asynchronous BFT protocol. Asynchronous BFT protocols such as HoneyBadgerBFT and BEAT can achieve only static security. Unfortunately, the weaker static model of security does not capture the power of several types of attackers. We develop two protocols EPIC and HALE to defend against adaptive corruption, where the adversary can corrupt the replicas at any moment during the execution of the protocol. Meanwhile, when there is no contention or contention is rare among correct replicas, it is necessary for correct replicas to terminate fast so that performance can be improved. We develop MiB, a novel andefficient asynchronous BFT framework using new distributed system constructions as building blocks. We also systematically carried out experiments for asynchronous BFT protocols with failures and evaluated their performance in various failure scenarios. Finally, we present Chios, an intrusion-tolerant publish/subscribe system which protects against Byzantine failures. Our publish/subscribe system achieves decentralized confidentiality with fine-grained access control and strong publication order guarantees.