
A Survey of Various Fault Tolerance Checkpointing Algorithms in Distributed System


Affiliations
1 Department of Computer Science, Amity University, Haryana, India
 

A distributed system is a collection of independent entities that cooperate to solve a problem that none of them can solve individually. Checkpointing is a fault-tolerance technique: a checkpoint is the saved state of a process, taken during failure-free execution, from which the process can restart after a failure. This reduces the amount of lost work, since the computation does not have to be repeated from the beginning. The process of restoring from a previously checkpointed state is known as rollback recovery. A checkpoint can be saved on either stable storage or volatile storage, depending on the failure scenarios to be tolerated. Checkpointing is a major challenge in mobile ad hoc networks. A mobile ad hoc network consists of a set of self-configuring mobile hosts (MHs) capable of communicating with each other without the assistance of base stations, with some of the processes running on mobile hosts. The main issues in this environment are insufficient power and limited storage capacity. This paper surveys the checkpointing algorithms reported in the literature for distributed systems as well as mobile distributed systems.
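
To make the checkpoint and rollback-recovery idea concrete, the following is a minimal single-process sketch in Python. It is not one of the surveyed algorithms: it ignores inter-process messages and consistency between processes, and it only illustrates saving state to stable storage and restarting from it. The function names, the file-based storage, and the `fail_at` parameter are assumptions introduced purely for illustration.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write state to stable storage (a file) via temp file plus rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # Replace the old checkpoint in one step so a crash never leaves
    # a half-written checkpoint behind.
    os.replace(tmp_path, path)


def load_checkpoint(path, initial_state):
    """Rollback recovery: return the last saved state, else the initial one."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return dict(initial_state)


def run(path, total_steps, checkpoint_every, fail_at=None):
    """Sum the integers 0..total_steps-1, checkpointing periodically.

    `fail_at` is a hypothetical knob that simulates a crash at that step.
    """
    state = load_checkpoint(path, {"step": 0, "acc": 0})
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated failure")
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state["acc"]
```

If a run that checkpoints every 3 steps fails at step 7, calling `run` again with the same path resumes from the checkpoint taken at step 6, so only one step of work is repeated rather than the whole computation. Keeping the checkpoint in a file models stable storage; holding it only in memory would model volatile storage, which does not survive a crash of the process itself.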

Keywords

Checkpointing, Distributed Systems, Fault Tolerance, Mobile Computing System, Rollback Recovery.


Authors

Sudha
Department of Computer Science, Amity University, Haryana, India
Nisha
Department of Computer Science, Amity University, Haryana, India
