Adding Fault-Tolerance to a Hierarchical DRE System
Citation: Paul Rubel, Joseph Loyall, Richard Schantz, Matthew Gillen. LNCS 4025/2006, pp. 303-308. (Proceedings of Distributed Applications and Interoperable Systems: 6th IFIP WG 6.1 International Conference, DAIS 2006, Bologna, Italy, June 14-16, 2006.)
Formats: PDF
Abstract: Dynamic resource management is a crucial part of the infrastructure for emerging mission-critical distributed real-time embedded systems. Because of this, the resource manager must be fault-tolerant, with nearly continuous operation. This paper describes an ongoing effort to develop a fault-tolerant multi-layer dynamic resource management capability and the challenges we have encountered, including multi-tiered structure, rapid recovery, the characteristics of component middleware, and the co-existence of replicated and non-replicated elements. While some of these have been investigated before, this work exhibits all of these characteristics simultaneously, presenting a significant fault-tolerance research challenge.