Fault Tolerant Approaches for Distributed Real-Time and Embedded Systems

Citation: Paul Rubel, Matthew Gillen, Joseph Loyall, Aniruddha Gokhale, Jaiganesh Balasubramanian, Priya Narasimhan, and Aaron Paulos. Fault Tolerant Approaches for Distributed Real-Time and Embedded Systems. Military Communications Conference (MILCOM), Orlando, Florida, October 29-31, 2007.

Formats: PDF

Abstract

Fault tolerance (FT) is a crucial design consideration for mission-critical distributed real-time and embedded (DRE) systems, which combine the real-time characteristics of embedded platforms with the dynamic characteristics of distributed platforms. Traditional FT approaches do not address features that are common in DRE systems, such as scale, heterogeneity, and real-time requirements. Most previous R&D efforts in FT have focused on client-server object systems, whereas DRE systems are increasingly based on component-oriented architectures, which support more complex interaction patterns, such as peer-to-peer. This paper describes our current applied R&D efforts to develop FT technology for DRE systems. First, we describe three enhanced FT techniques that support the needs of DRE systems: a transparent approach to mixed-mode communication, auto-configuration of dynamic systems, and duplicate management for peer-to-peer interactions. Second, we describe an integrated FT capability for a real-world component-based DRE system that uses off-the-shelf FT middleware integrated with our enhanced FT techniques. We present experimental results showing that our integrated FT capability meets the DRE system's real-time performance requirements, both for responsive failure recovery and for minimal overhead in the fault-free case.

© 2010 BBN Technologies