A Survey of Database Crash Recovery Methods circa ARIES

by Jed Lau

March 4, 1999

COMS 632, Advanced Database Systems

Survey Paper

Contents

Introduction

Background

Framework

The Undo/Redo Algorithms (Bernstein et al.) and the Write-Ahead Logging Protocol Introduced

System R Crash Recovery (Gray et al.)

Physiological Logging (Gray and Reuter) and Compensation Log Records (CLR's) Introduced

ARIES (Mohan et al.)

Client-Server ARIES in EXODUS (Franklin et al.)

Conclusion

Sources

References

Introduction

A survey of database crash recovery development from the late 1970’s to the present reveals a gradual but progressive evolution in design that has culminated in today’s acceptance of the ARIES algorithm as the method of choice. ARIES (Algorithms for Recovery and Isolation Exploiting Semantics) was developed at IBM in the early 1990’s out of a need for greater support for fine granularity locking, partial rollbacks, high concurrency, and recovery that repeats history. This review of the crash recovery methods that preceded and followed ARIES demonstrates how changing industry requirements and the problems of yesterday’s recovery methods motivate the solutions offered by tomorrow’s. In particular, we consider in detail the Undo/Redo algorithms presented by Bernstein et al., the shadowing technique of the System R recovery manager, the physiological logging technique of Gray and Reuter, the original ARIES method as described by Mohan et al., and the client-server implementation of ARIES for EXODUS. An examination of the history and implementations of ARIES suggests the direction for future work in the area of database crash recovery design.

Background

The need for crash recovery has always been a part of development work on transaction processing systems and database management systems. Systems crash; consequently, they must be capable of recovering themselves to a state that enables them to continue functioning on some level. But for what kinds of failures must a recovery method be prepared, and what notion of consistency must be achieved by crash recovery?

Types of failures: Canonically, recovery methods have addressed three types of failures—transaction failures, system failures, and media failures. A transaction failure describes the event in which a transaction experiences a problem that requires the transaction to abort. Problems such as bad input or consistency violations may be detected by the transaction itself. Other problems, such as timeout or deadlock, may be detected by the system in which the transaction is running. A system failure is a failure of the larger operating system, database system, or hardware. A media failure is a loss of secondary storage that holds the database. In his paper on the System R recovery manager (Gray-SysR 226), Gray recognizes that there are tradeoffs associated with handling these different types of failures. Transaction failures occur most frequently (several per minute), followed by system failures (several per month), then media failures (several per year). However, transaction failures require the least amount of recovery time (milliseconds), followed by system failures (seconds), then media failures (minutes). Restart from these failures must be idempotent, meaning that the effect of many restarts is the same as the effect of just one restart. This property ensures that failures during a restart can be handled just like any other failure.

Transactions and the ACID properties: The responsibility of the recovery method is typically understood in the context of transactions. A transaction is defined as a sequence of interactions with a database system that represents one meaningful activity in the user’s environment (Haerder 236). This notion—that database system activities can be isolated from each other both semantically as well as functionally—gives rise to four properties that characterize a transaction, and ultimately, that define the responsibilities of a crash recovery method. The four properties are atomicity, consistency, isolation, and durability (ACID).

• Atomicity is the property that the transaction is either all or nothing; either it happens, or it does not.

• Consistency is the property that when a transaction ends successfully, the changes that it makes to the database system are only legal changes that preserve the integrity of the data in the database. Concurrency control maintains consistency while concurrent transactions run in the system.

• Isolation is the idea already described that transactions are transparent to other transactions that might be running concurrently in the same database system. And finally,

• Durability ensures that the results of a committed transaction (a transaction that has completed successfully and that has saved its results to the database) persist in the face of crashes and actions by other transactions.

These four properties of a transaction must be preserved when a system recovers from a crash, hence the ultimate responsibility of the recovery method. Typically, following a system crash, the effects of transactions that committed (completed successfully) prior to the crash are redone, while the effects of transactions that did not commit prior to the crash (called loser transactions) are undone. Loser transactions must be undone since state information (i.e. dirty pages in memory) may have been lost for these transactions during the crash.

Logging: Crash recovery methods often keep a log to record the history of actions performed by a database system. The log is a file of log records stored on stable storage, although recently generated log records may reside in memory. As will be shown, certain heuristics (such as the Write-Ahead Logging Protocol, or WAL) must be employed to ensure that in-memory log records are written to stable storage at appropriate and necessary times. The log records in the log file are each assigned a unique Log Sequence Number (LSN), which monotonically increases as more log records are added. The log records themselves may be of various types:

• An Update log record indicates a data update. Data updates can be represented in one of two ways: physical logging or logical logging. Physical logging copies the old and new values affected by an update into the log record. Logical logging produces a log record that merely describes the change being made (e.g. the record at a certain page was incremented by a certain amount). Redo information or undo information (or both) may be contained in an Update log record.

• An Abort log record indicates that a transaction has aborted. Similarly,

• A Commit log record indicates that a transaction has committed. Finally,

• A Checkpoint log record stores information that can be used by a recovery method to begin system restart. A checkpoint enables recovery to begin from the most recent Checkpoint log record rather than from the beginning of the log, thus reducing the amount of work that needs to be exerted for recovery.

Note that while not all logging techniques use or support these same log types, the semantics (i.e. what the log records represent) are often similar.
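To make these structures concrete, the following is a minimal sketch (in Python, not drawn from any of the surveyed papers) of a log of typed records with monotonically increasing LSN's. The field and method names are illustrative assumptions; flush stands in for a force to stable storage.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class LogRecord:
        lsn: int                        # unique, monotonically increasing
        txn_id: int                     # transaction that wrote the record
        kind: str                       # "update", "abort", "commit", "checkpoint"
        page_id: Optional[int] = None   # page touched (Update records only)
        before: Optional[bytes] = None  # undo information (physical logging)
        after: Optional[bytes] = None   # redo information (physical logging)

    class Log:
        """An append-only log whose tail may sit in memory until flushed."""
        def __init__(self):
            self.records = []
            self.flushed_lsn = -1       # highest LSN known to be on stable storage

        def append(self, txn_id, kind, **fields):
            rec = LogRecord(len(self.records), txn_id, kind, **fields)
            self.records.append(rec)
            return rec

        def flush(self, up_to_lsn):
            # A real system would force records up to this LSN to stable storage.
            self.flushed_lsn = max(self.flushed_lsn, up_to_lsn)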

Locking: Transactions often acquire locks on data objects in order to obtain exclusive access to those data objects as part of concurrency control. Locking is thus used to ensure mutual exclusion among transactions. There are various locking protocols that can be implemented (the descriptions of which are largely outside the scope of this survey), but for the purposes of crash recovery, the main concern is the granularity of locks that are permitted. That is, does the crash recovery method limit locks to page-level granularity, or can finer granularity locks—such as record-level locks that operate within a page—be acquired? Indeed, the answer involves a number of components—in particular, the nature of the events that require locks, the manner in which data is used by the system as a whole, and the way in which data updates are logged.

Framework

Support of transactional ACID properties and the ability to respond to the three failure types represent the baseline of database system crash recovery. However, many other components—often extensions of this baseline—are part of a “good” crash recovery method. First, flexibility with respect to concurrency control, buffer management, and transaction maintenance (e.g. transaction rollback) is desired. A “flexible” recovery method is one that imposes few restrictions on each of these areas. The space and processing costs of the recovery method may be a second issue. Specifically, how much work does the algorithm have to do to recover from a crash? Third, in the spirit that simplicity leads to greater effectiveness, the complexity of the recovery method may be considered.

Some of the salient features offered by ARIES are used to comprise a framework by which recovery methods can be compared. Inevitably, a consideration of the early recovery methods shows how these features have evolved through time. The following features are examined:

• Flexible buffer management: A recovery method may impose restrictions on the buffer manager. An example would be a recovery method dictating the page replacement policy that is used by the buffer manager. Is there reasonable freedom to choose and implement buffer management separate from crash recovery?

• Minimal space overhead: Large amounts of space can be consumed by a recovery method due to the nature of the algorithm. As a measure of efficiency, minimal space overhead is desired.

• Checkpointing: Is checkpointing an option? How much work needs to go into checkpointing? What are the implications of checkpointing on the performance of the system?

• Support for partial rollback: Partial rollback describes the ability of the system to partially undo the effects of a transaction without aborting it entirely—that is, to roll the transaction back to an earlier state. Some of the later recovery methods support this feature.

• Bounded transaction rollback: The introduction of the Compensation Log Record (CLR) allows a recovery method to bound the amount of time that is required to roll back a transaction—an important feature for system restart in the face of repeated failures.

• Fine granularity locking: Is less than page-level locking supported by the recovery method? How often are locks generated for various changes to the data?

We begin with the Undo/Redo algorithms of Bernstein et al. from the 1980’s. Methods developed near the advent of ARIES in the early 1990’s are presented next. Then, ARIES itself is presented, and we conclude with a review of a recent extension to the algorithm.

The Undo/Redo Algorithms (Bernstein et al.) and the Write-Ahead Logging Protocol Introduced

The scene before ARIES seems to be dominated by the shadowing technique, first described by Lorie in 1977 (Lorie), and four Undo/Redo algorithms that are outlined by Bernstein (Bernstein). The shadowing technique, which performs recovery by falling back to an old, “shadow” version of data in addition to the current version, will be described in the later section on System R crash recovery.

The Undo/Redo algorithms described by Bernstein are as follows: Undo/Redo, Undo/No-Redo, No-Undo/Redo, and No-Undo/No-Redo. The Undo/Redo algorithm (otherwise known as the Do-Undo-Redo protocol) proceeds first with a backward scan of the log file, undoing the actions of loser transactions. The backward scan is followed by a forward scan of the log file during which the actions of committed transactions are redone. In short, the database system’s Recovery Manager (RM), working closely with the Cache Manager (CM, which resembles a buffer management component), keeps track of an Active Transactions List, a Committed Transactions List, and an Aborted Transactions List via Commit and Abort log records. For each update that a transaction makes to a page in memory, a log record containing undo and redo information is written directly to stable storage.

When a crash occurs, the system restarts, and the memory is cleared. The RM scans backwards from the last entry of the log file, rebuilding the Active Transactions List, Committed Transactions List, and Aborted Transactions List according to the Commit and Abort log records that the RM sees. If an Update log record is encountered, the update is undone if the transaction was active at the time of the crash. How are such transactions identified? If the transaction is currently in the Active Transactions List or if the RM has never seen the transaction before (i.e. the transaction is not in any of the lists, and thus needs to be added to the Active Transactions List), then the transaction was active at the time of the crash. A transaction is removed from the Active Transactions List when the RM encounters (and undoes) the first Update log record for the transaction. (Because each Update log record has a pointer to the previous Update log record for the same transaction, the first Update log record has a previous pointer set to null.) The backward scan stops when the Active Transactions List is empty. Then, a forward scan redoes only the Update log records for transactions in the Committed Transactions List. A sketch of this restart logic follows.
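The following is a simplified sketch of the Undo/Redo restart scan, using the toy LogRecord objects sketched earlier. For brevity it scans the entire log rather than stopping when the Active Transactions List empties; apply_undo and apply_redo are assumed callbacks (not Bernstein's names) that reinstall before- and after-images.

    def restart(log_records, apply_undo, apply_redo):
        """Backward scan undoing losers, then forward scan redoing winners."""
        active, committed, aborted = set(), set(), set()

        # Backward scan: classify transactions and undo loser updates.
        for rec in reversed(log_records):
            t = rec.txn_id
            if rec.kind == "commit":
                committed.add(t)
            elif rec.kind == "abort":
                aborted.add(t)
            elif rec.kind == "update":
                if t not in committed and t not in aborted:
                    active.add(t)    # no Commit/Abort record seen: a loser
                    apply_undo(rec)  # reinstall the before-image

        # Forward scan: redo only the updates of committed transactions.
        for rec in log_records:
            if rec.kind == "update" and rec.txn_id in committed:
                apply_redo(rec)      # reinstall the after-image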

The other three algorithms eliminate either the Undo phase or the Redo phase, or both. The Undo/No-Redo algorithm eliminates Redo by requiring all pages dirtied by a committing transaction to be flushed before the transaction commits. Redo thus becomes unnecessary because the effects of the transaction are already reflected on stable storage. The No-Undo/Redo algorithm eliminates Undo by delaying application of page updates until transaction commitment. While a transaction is active, the page updates that the transaction wants to apply to the database are stored in an “intentions list” of log records, one list per transaction. Finally, the No-Undo/No-Redo algorithm eliminates both Undo and Redo by using a modified implementation of the shadowing technique. An active transaction’s updates are made to a directory of pages on stable storage. When the transaction commits, the larger directory that defines the database’s committed state is simply made to point to the transaction’s directory, an atomic action. During system restart, no work needs to be done because the database’s directory already reflects the actions of only the committed transactions.

Write-Ahead Logging Protocol (WAL): Where logging is applicable in these algorithms, the Write-Ahead Logging Protocol (WAL) is followed. WAL is a logging technique that consists of two parts. First, the protocol ensures that a log record describing an update to a page is written to stable storage before the page can be saved to disk. Second, the protocol ensures that when a transaction commits, all of the transaction’s log records are flushed to stable storage before the Commit log record is appended to the log and flushed to stable storage as well. (Bernstein calls these two parts of WAL the Undo Rule and the Redo Rule, respectively. Bernstein 177-178.) The effect of this protocol is that all changes made to the database are recorded before the changes are applied. Thus, in case of a crash, it is never the case that an update is applied to the database—and specifically, to stable storage—without first being recorded safely as a log record (on stable storage). Bernstein mentions WAL as an aside (actually, as a footnote) when he describes these four algorithms, yet later recovery methods will give greater recognition to WAL as an efficient and effective logging technique.
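A minimal sketch of how the two WAL rules constrain the buffer manager and the commit path, again under the toy Log abstraction above; the disk_write callback and the page's page_lsn field are illustrative assumptions.

    def write_page_to_disk(log, page, disk_write):
        # Undo Rule: every log record describing an update to this page must
        # reach stable storage before the page itself does.
        log.flush(up_to_lsn=page.page_lsn)   # page_lsn: last record to touch it
        disk_write(page)

    def commit(log, txn_id):
        # Redo Rule: force all of the transaction's log records, then the
        # Commit record itself, before declaring the transaction committed.
        rec = log.append(txn_id, "commit")
        log.flush(up_to_lsn=rec.lsn)         # flushes everything before it too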

Analysis: Three of the four algorithms represent attempts to eliminate one or both of the Undo/Redo phases as a means of more “efficient” crash recovery (Bernstein 180). However, flexibility, algorithmic simplicity, and space costs are compromised as a result. In particular, these three algorithms have significant implications for the architecture and behavior of the CM (buffer management). It is apparent that Bernstein’s database model allows for a closely integrated relationship between buffer management and crash recovery; modern database models would call for greater functional independence of the two components. And, while the Undo/Redo algorithm offers the most flexibility with respect to buffer management, the logic of the algorithm (i.e. beginning with a backward scan of the log records, rather than a forward scan) contrasts sharply with the algorithms that will follow it. With respect to flexibility, the most significant problem of these four algorithms is the restrictions and requirements that they impose on buffer management. The algorithms are not very flexible with respect to transaction maintenance either. A recovery method that has fewer implications for buffer management is desired. There may also be room for a simpler algorithm in which recovery efficiency does not come with such high space and processing costs.

|Features                     |Undo/Redo Algorithms |
|Flexible buffer management   |                     |
|Minimal space overhead       |                     |
|Checkpointing                |Undo/Redo only       |
|Support for partial rollback |                     |
|Bounded transaction rollback |                     |
|Fine granularity locking     |                     |

System R Crash Recovery (Gray et al.)

The RM of the System R Database Manager uses the shadowing technique to perform crash recovery. Recovery applies only to files that are characterized as shadowed. (Nonshadowed files are excluded from crash recovery. That is, the user is wholly responsible for making and storing redundant copies of the files as backup.) Two online versions of a shadowed file—a current version and a shadow version—are maintained by the database system. The current version keeps track of changes to a file before the file has been “saved”. The shadow version represents the state of the file when it was last saved. Initially, page reads are drawn from the shadow version. However, once an update is made to a page, a new page frame in the current version is allocated on disk, and the update is reflected in the current version. In this manner, the shadow version is never updated. A shadow version of a file is updated via the SAVE operation. When SAVE is called, the current version of a file becomes the shadow version. Because the buffer pool can contain dirty pages of the current version that have yet to be flushed to disk, a system failure leaves the current version of a file in an inconsistent state. For this reason, the file is restored by returning to the shadow version of the file, via the RESTORE operation.

More interesting is the manner in which the effects of committed transactions are redone following a crash, given that the shadowing technique operates on files rather than transactions. The following quotation from Gray’s paper on System R’s RM introduces the role of the log:

Unhappily, this [shadowing] technique does not seem to generalize to concurrent transactions on a shared file. If several transactions concurrently alter a file, file save or restore is inappropriate because it commits or aborts the updates of all transactions to the file… We were unable to architect a transaction mechanism based solely on shadows which supported multiple users and save points. Instead, the shadow mechanism is combined with an incremental log of all the actions a transaction performs. (Gray-SysR 231)

The passage reveals an important difference between Bernstein’s No-Undo/No-Redo implementation and the System R implementation of the shadowing technique: Whereas the No-Undo/No-Redo implementation shadows (where “shadows” is used as a verb) transactions, the System R implementation shadows files. However, as Gray indicates, the shadowing of files does not enable the system to distinguish the actions of different transactions. For this reason, a log is used. The log contains undo and redo information of a transaction’s actions. The system ensures that committed transactions can be redone and loser transactions can be undone by WAL.

System R supports action-consistent checkpoints. Action consistency means that checkpoints are taken at times during which the system is in a quiesced state, with no transactions running. In this way, transaction atomicity is ensured; there is no transaction in the middle of execution. To reach this quiesced state, the System R RM periodically action-quiesces the system, which means that all existing transactions are allowed to complete while new transactions are not allowed to begin. Once the checkpoint is taken, new transactions may start. In addition to checkpoints, System R supports another “log marker” known as a savepoint. A savepoint allows for partial rollback of a transaction, typically by specifying the LSN of a log record to which the transaction should be rolled back. If the transaction gets into trouble (e.g. deadlock or bad input), the effects of the transaction can be undone to its last savepoint. The advantage, of course, is that the transaction does not need to be completely aborted.

Upon system restart, shadowed files are reset to their shadow versions, and the log—beginning from the most recent checkpoint—is used to redo the effects of transactions that were committed at crash time while undoing the effects of transactions that were not committed at crash time. How are these “loser transactions” identified? By categorizing transactions based on the Commit and Abort log records that it sees in the log, the RM is able to identify the loser transactions and subsequently undo their effects. Once restart is done, another checkpoint is written to the log. Note that System R’s recovery consists first of an Undo phase followed by a Redo phase.

Analysis: The System R RM allows for much greater flexibility than the Undo/Redo algorithms with respect to buffer management. However, the shadowing technique for large files consumes space and processing time prodigally, and checkpoints are complicated to implement because they must reference both current and shadow versions of files. In addition, checkpointing is a bottleneck because the system must be action-quiesced periodically. In terms of complexity, shadowing complicates directory maintenance, checkpointing, and deadlock detection, among other grievances. While fine granularity locking is provided by System R, an enigmatic deadlock detection and resolution scheme is required. (In this scheme, transactions that are rolling back are marked as “golden” and must acquire a special lock in exclusive mode for every undo action.) The need for this special lock makes fine granularity locking in System R relatively inefficient.

Finally, the use of WAL to support concurrent transactions makes the shadowing technique redundant, if not burdensome. Gray admits that the adoption of the shadowing technique is largely historical (Gray-SysR 240). WAL enables the large directories that keep track of the current and shadow versions of files to be replaced by a much smaller set of file descriptors, eliminates the need for a directory buffer pool (which was needed to handle the large directories), and simplifies checkpointing. On the bright side, writes Gray, “the performance of shadows is not unacceptable: In fact for small databases (fewer than 10 megabytes) shadows have excellent performance” (Gray-SysR 240). Gray concludes that a good database system should support both shadows for private files (used by one user at a time) and log-based recovery for shared files. As noted, using WAL exclusively can simplify crash recovery considerably. A recovery method that is less costly than the shadowing technique is desired. Gray suggests using WAL, but the implementation of such a method is questionable since it has not been done before.

|Features                     |System R    |
|Flexible buffer management   |√           |
|Minimal space overhead       |            |
|Checkpointing                |inefficient |
|Support for partial rollback |√           |
|Bounded transaction rollback |            |
|Fine granularity locking     |inefficient |

Physiological Logging (Gray and Reuter) and Compensation Log Records (CLR’s) Introduced

Some ten years after his paper on the System R RM, Gray proposes (in his book with Reuter) a logging technique called physiological logging. Indeed, physiological logging seems to represent Gray’s response to the problems that he identifies in the System R implementation of the shadowing technique—costly shadow pages and an action-quiesce bottleneck for checkpointing, in particular. The method he proposes is WAL at its core, but it allows for greater logging flexibility.

The motivation for physiological logging is that small updates to a page, updates which change just a few records within the page, generate inefficient log records if the entire page must be saved for undo and redo information. Recall that either physical logging or logical logging can be used in a log. Physical logging is idempotent, whereas logical logging is not. However, logical log records have the ability to be much smaller than physical log records. Gray seeks to reap the benefits of both types of logging with physiological logging. He proposes a log record that applies to a physical page, yet whose record body can designate a state transformation of the page either physically or logically. That is, a physiological record is physical logging to a page, and either logical or physical logging within the page.
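The following illustrates the idea: the record is addressed physically to one page, while the change within that page is described logically. The operation names and fields here are illustrative assumptions, not Gray and Reuter's.

    from dataclasses import dataclass

    @dataclass
    class PhysiologicalRecord:
        lsn: int
        txn_id: int
        page_id: int   # physical: the record applies to exactly this page
        op: str        # logical within the page, e.g. "insert_slot"
        slot: int      # which record (slot) within the page is affected
        args: tuple    # operation arguments, e.g. the bytes being inserted

    def redo_physiological(page, rec):
        # Redo replays the logical operation against the one named page,
        # so full before/after page images need not be logged.
        if rec.op == "insert_slot":
            page.slots.insert(rec.slot, rec.args[0])
        elif rec.op == "delete_slot":
            del page.slots[rec.slot]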

Perhaps more important than its record structure, the physiological logging protocol represents an implementation of WAL that closely resembles that which is later used in ARIES. During system restart, the RM scans the log forward from the most recent checkpoint to the end, redoing the effects of the log records in a manner similar to restart in System R (just without the shadow pages). When the RM reaches the end of the log, the effects of loser transactions are undone.

Compensation Log Records (CLR’s): Here, the use of compensation log records (CLR’s) is also seen for the first time. CLR’s are log records that are written to the end of the log—just like any other log record—by the RM when Update log records are undone during restart. When an Update log record is undone, the effect of the undo is recorded in a CLR. Because an undo consists of copying the before-image of the Update log record to memory, the CLR simply records this before-image. In addition, the CLR stores a reference to the next Update log record that must be undone for this transaction (called the UndoNextLSN). When transactions are redone during restart, the undo actions recorded by CLR’s are redone as well. In this manner, the CLR bounds the amount of work needed to undo the effects of a transaction that did not commit before a crash. The number of CLR’s that are written during the rollback of a transaction is no more than the number of Update log records for the transaction.

How could this work be unbounded? Consider a RM that is attempting to undo such a transaction during restart. Assume the system crashes during restart. In the absence of CLR’s, the RM will have to redo all of the updates made by this transaction, then undo all of the updates. If the system crashes again, the RM again will have to start all over, redoing the updates and then undoing them. Thus, if the system crashes continuously, the amount of work that the RM has to perform to roll back this transaction is unbounded. We can only hope that, at some point, the RM is given enough time between crashes to roll back the transaction, but rollback is ultimately not guaranteed.

The use of CLR’s provides a guarantee in this regard. Undo actions can be moved into the Redo phase via CLR’s. During the Redo phase of restart, the RM redoes the undo actions recorded in CLR’s that may have been written previously for a loser transaction during rollback. During the Undo phase, however, the RM looks at the CLR’s UndoNextLSN and begins undoing the transaction from that point. Thus, as long as a checkpoint is taken at the end of a Redo phase, undo actions for a loser transaction do not have to be repeated in case the system crashes during restart. In this manner, multiple crashes cannot prevent the transaction from being rolled back in a bounded amount of time.
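A sketch of rollback with CLR's follows, with log records as plain dicts and the log as a list indexed by LSN. The prev_lsn field chaining a transaction's records and the apply_undo callback are assumptions for illustration.

    def rollback(log, last_lsn, apply_undo):
        """Undo one transaction. At most one CLR is written per Update record,
        so total rollback work stays bounded even across repeated crashes."""
        lsn = last_lsn
        while lsn is not None:
            rec = log[lsn]
            if rec["kind"] == "clr":
                lsn = rec["undo_next_lsn"]   # resume past work already undone
            elif rec["kind"] == "update":
                apply_undo(rec)              # reinstall the before-image
                log.append({"kind": "clr", "txn": rec["txn"],
                            "lsn": len(log), "before": rec["before"],
                            "undo_next_lsn": rec["prev_lsn"]})
                lsn = rec["prev_lsn"]
            else:
                lsn = rec.get("prev_lsn")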

Analysis: Physiological logging demonstrates an attempt to improve overall crash recovery by focusing on improvements that can be made to the logging technique. While this technique does not entirely solve the need of database systems for multiple granularities of logging and locking (page-wise as well as record-wise; such locking issues are not considered in the paper), the algorithm uses some logging elements, such as CLR’s, that will play important roles in ARIES.

|Features                     |Physiological Logging |
|Flexible buffer management   |√                     |
|Minimal space overhead       |√                     |
|Checkpointing                |√                     |
|Support for partial rollback |√                     |
|Bounded transaction rollback |√                     |
|Fine granularity locking     |                      |

ARIES (Mohan et al.)

The genesis of a highly effective crash recovery method is predicted by the experiences of the recovery methods that preceded ARIES, including the Undo/Redo algorithms, the System R RM, and the physiological logging technique described above. The Undo/Redo algorithms validate the utility of a good logging technique, but at the same time, demonstrate the problems encountered when crash recovery and buffer management are too dependent upon each other. Moreover, the algorithms suggest that it is easier to make a recovery method more “efficient” by focusing on the logging technique rather than on the elimination of a crash recovery phase (such as Undo or Redo). The System R RM demonstrates that the shadowing technique is a poor path to follow when concurrent transactions need to operate on large databases, yet reaffirms the utility of WAL. Finally, physiological logging represents an attempt to offer greater flexibility in concurrency control through a combination of physical and logical logging. Compensation log records are introduced as a means to bound the rollback time of a transaction, suggesting that greater flexibility might be offered in the future for transaction rollbacks. Around the time that Gray publishes his book with Reuter, in which physiological logging is described, Mohan et al. release their paper on ARIES. “We present a simple and efficient method,” writes Mohan, “which supports partial rollbacks of transactions, fine-granularity (e.g., record) locking and recovery using write-ahead logging (WAL)” (Mohan 251).

ARIES operates according to three fundamental principles:

• WAL: ARIES uses WAL to ensure that updates are reflected in the log before they are applied to stable storage.

• Repeating history: During Redo, ARIES repeats all of the actions performed by all of the transactions prior to the crash. The actions of loser transactions are then undone.

• Logging changes during undo: Undoing an Update log record generates a CLR, which is used to bound rollback time of a transaction.

During normal operation, ARIES generates Update, Commit, Abort, and Checkpoint log records. Following a crash, a three-pass algorithm is used for restart recovery. First, the Analysis phase processes the log from the most recent checkpoint to identify the pages that were dirty and the transactions that were active at the time of the crash (i.e. builds the Active Transactions List and a list of dirty pages, called the Dirty Page Table). Second, the Redo phase repeats history from the earliest log record that could require redo. (This log record is the earliest log record to add a page to the Dirty Page Table.) Finally, in the Undo phase, ARIES proceeds backwards from the end of the log, undoing the effects of loser transactions while generating CLR’s for these undo actions. Normal operation and these three recovery phases are now considered in greater detail, with a presentation of the benefits and features of each.

Normal operation: During normal operation, ARIES maintains two structures, the Dirty Page Table and the Transaction Table (which we saw in previous algorithms as the Active Transactions List). The Dirty Page Table contains an entry for each dirty page, plus the LSN of the earliest log record that caused the page to become dirty. The Transaction Table contains a list of all active transactions, plus the LSN of the last log record written for a given active transaction. For each transaction, the Transaction Table optionally may contain the LSN of a log record that represents the savepoint of a transaction. If a partial rollback is initiated on a transaction, the transaction is undone by the Undo phase up to the savepoint.

Four types of log records are written to the log during normal operation. Update log records are written to stable storage before updated pages are written to stable storage. A Commit log record is written to stable storage, but only after all log records before it have been written to stable storage first. An Abort log record is written to stable storage before the Undo phase is initiated on the aborting transaction. Finally, a Checkpoint log record saves the Dirty Page Table and Transaction Table to the log. Checkpointing does not require quiescing the system or flushing any pages. (Note that transaction rollback does not require any special log record type; the Undo phase is simply invoked on the transaction being rolled back.) Actually, ARIES allows for considerable logging flexibility. For instance, checkpoint information or undo/redo information that cannot be contained in a single Checkpoint or Update log record can be recorded in several log records. These log records may be held together in several ways, such as using log record markers (to mark the beginning and the end of a series of linked log records) or adding an LSN field that allows one log record to refer to the next log record in a series.
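Below is a minimal sketch of the two tables and of an ARIES-style fuzzy checkpoint, with log records again as plain dicts; names such as on_update and the dict fields are illustrative assumptions, not Mohan's.

    log = []                 # in-memory tail of the log
    dirty_page_table = {}    # page_id -> recLSN (earliest LSN that dirtied it)
    transaction_table = {}   # txn_id  -> lastLSN (last record written by the txn)

    def append(rec):
        rec["lsn"] = len(log)
        log.append(rec)
        return rec["lsn"]

    def on_update(txn_id, page_id, before, after):
        lsn = append({"kind": "update", "txn": txn_id, "page": page_id,
                      "prev_lsn": transaction_table.get(txn_id),
                      "before": before, "after": after})
        transaction_table[txn_id] = lsn
        dirty_page_table.setdefault(page_id, lsn)   # keep the earliest LSN

    def checkpoint():
        # Fuzzy: no quiescing, no page flushing; just snapshot the two tables.
        append({"kind": "checkpoint", "txn": None,
                "dpt": dict(dirty_page_table), "tt": dict(transaction_table)})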

Analysis Phase: As the first phase in system restart to execute after a crash, the Analysis phase determines (1) the point in the log at which the Redo phase should begin, (2) the pages that were dirty at crash time, and (3) the transactions that were active at crash time. Analysis begins at the most recent checkpoint in the log, loading the Dirty Page Table and Transaction Table recorded at the checkpoint. As the phase moves forward in the log, the pages affected by Update log records are looked up in the Dirty Page Table. If an updated page is not in the Dirty Page Table, it is added to the table. Similarly, the Analysis phase monitors the list of active transactions. If an Update log record was generated by a transaction that is not in the Transaction Table, the transaction is added to the table. An Abort or Commit log record removes a transaction from the Transaction Table. By the time the Analysis phase reaches the end of the log, the Dirty Page Table and the Transaction Table as they stood prior to the crash have been regenerated. The point at which the Redo phase begins is now determined by the earliest log record to dirty a page in the Dirty Page Table. Mohan notes that ARIES permits checkpointing at the end of an Analysis phase in order to save work if a failure were to occur during system restart (Mohan 269). By checkpointing, the Dirty Page Table and Transaction Table would not need to be regenerated.
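The Analysis pass can be sketched as follows, continuing the dict-based log above; how the index of the most recent Checkpoint record is located is left out here.

    def analysis(log, checkpoint_index):
        ckpt = log[checkpoint_index]
        dpt = dict(ckpt["dpt"])                  # page_id -> recLSN
        tt = dict(ckpt["tt"])                    # txn_id  -> lastLSN

        for rec in log[checkpoint_index + 1:]:
            if rec["kind"] == "update":
                tt[rec["txn"]] = rec["lsn"]
                dpt.setdefault(rec["page"], rec["lsn"])
            elif rec["kind"] in ("commit", "abort"):
                tt.pop(rec["txn"], None)         # no longer in the table

        # Redo begins at the earliest recLSN in the reconstructed table.
        redo_start = min(dpt.values()) if dpt else None
        return dpt, tt, redo_start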

Redo Phase: During the Redo phase, history is repeated by redoing all of the actions of all transactions that have log records, beginning from the earliest log record that dirties a page in the Dirty Page Table. For each Update log record or CLR encountered, Redo determines whether the action must be redone. An action is redone unless one of the following three conditions holds:

• The affected page is not in the Dirty Page Table, meaning that the update was already saved to stable storage before the crash.

• The affected page is in the Dirty Page Table, but the LSN in the Dirty Page Table is greater than the LSN of the log record. This condition means the page was added to the Dirty Page Table by a later log record, making a redo of this action irrelevant.

• The LSN stored on the page is greater than or equal to the LSN of the log record. This condition means that this update (or a later one) was already saved to stable storage, so a redo of this action is irrelevant. (This third condition is checked last since a page fetch from disk may be required.)

If an action is redone, the LSN stored on the page is set to the LSN of the redone log record. Mohan suggests exploiting parallelism in the Redo phase with such optimizations as in-memory queues of log records or asynchronous I/Os to read pages (Mohan 267).
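A sketch of this decision for a single record, continuing the dict conventions above; the fetch_page and apply_redo callbacks, and a page represented as a dict carrying its page_lsn, are illustrative assumptions.

    def maybe_redo(rec, dpt, fetch_page, apply_redo):
        page_id = rec["page"]
        if page_id not in dpt:                  # 1: flushed before the crash
            return
        if dpt[page_id] > rec["lsn"]:           # 2: page re-entered the table later
            return
        page = fetch_page(page_id)              # may go to disk, so tested last
        if page["page_lsn"] >= rec["lsn"]:      # 3: update already on the page
            return
        apply_redo(page, rec)
        page["page_lsn"] = rec["lsn"]           # stamp the page with the redone LSN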

Undo Phase: In the final phase of recovery, the effects of loser transactions (i.e. the transactions that remain in the Transaction Table) are undone. The LSN’s in the Transaction Table are used to determine the order by which actions are undone. To undo multiple transactions, the Undo phase requires simply a backward scan of the log—repeatedly selecting the greatest LSN from the Transaction Table, undoing the action specified at that LSN, and changing the transaction’s LSN in the Transaction Table to the next log record that must be undone for that transaction. As undo’s are performed, CLR’s are generated at the end of the log. The CLR describes the undo action being performed and contains the UndoNextLSN field, which points to the next log record that needs to be undone for that transaction. If Undo encounters a CLR (vs. generating a CLR), no undo action is taken. Instead, the Transaction Table’s LSN for the transaction that generated the CLR (perhaps during a previous Undo) is updated to refer to the next undo action that needs to take place for this transaction, indicated by the UndoNextLSN of the CLR. Undo stops when there are no more actions to be undone for any of the transactions; at this point, the transactions have been completely rolled back.

Note that the Undo phase is also called as part of transaction abort or rollback. In either of these two cases, only one transaction is undone. Moreover, in the case of a partial rollback, Undo stops at the savepoint designated for the transaction. Mohan states that optimizations through parallelism can also be realized for the Undo phase by using multiple processes, one process for each transaction (Mohan 267). Additionally, he notes that the Undo phase of ARIES is flexible enough to allow loser transactions to continue running (Mohan 268). Rather than being rolled back entirely, a loser transaction could be rolled back to its savepoint, from which point it could continue running when recovery is finished. A sketch of the multiple-transaction case follows.
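Continuing the dict-based conventions above, this sketches the Undo pass over multiple losers; the heap merely implements "repeatedly select the greatest LSN", and apply_undo is an assumed callback.

    import heapq

    def undo(log, tt, apply_undo):
        """tt maps each loser transaction to its lastLSN (from Analysis)."""
        heap = [(-lsn, txn) for txn, lsn in tt.items()]
        heapq.heapify(heap)
        while heap:
            neg_lsn, txn = heapq.heappop(heap)
            rec = log[-neg_lsn]
            if rec["kind"] == "update":
                apply_undo(rec)                  # reinstall the before-image
                log.append({"kind": "clr", "txn": txn, "lsn": len(log),
                            "undo_next_lsn": rec["prev_lsn"]})
                nxt = rec["prev_lsn"]
            elif rec["kind"] == "clr":
                nxt = rec["undo_next_lsn"]       # skip work already undone
            else:
                nxt = rec.get("prev_lsn")
            if nxt is not None:
                heapq.heappush(heap, (-nxt, txn))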

Analysis: ARIES offers a number of advantages over the recovery methods that came before it. In particular, Mohan shows in his paper that the System R paradigm of an Undo phase preceding a Redo phase is incorrect when WAL and fine granularity locking are part of the system (Mohan 272). Among the properties that have been examined in this survey, ARIES provides the following:

• Flexible buffer management: As long as WAL is followed, the buffer manager can use any page replacement policy.

• Minimal space overhead: Logs, checkpoints, and data structures (such as the Transaction Table and Dirty Page Table) have low memory costs.

• Efficient checkpoints: Checkpoints do not require the system to acquiesce. Moreover, a small but useful amount of information is saved at a checkpoint.

• Support for partial transaction rollback and continuation of loser transactions: Savepoints can be used to roll back a transaction to a previous state, which at times may be more desirable than aborting the (loser) transaction entirely.

• Bounded transaction rollback in spite of repeated failures: CLR’s bound the amount of time to roll back a transaction by saving undo actions that can be redone in the Redo phase.

• Fine granularity locking: Whereas System R provides fine granularity locking only through a complex relationship between the shadowing technique and the locking protocol (recall the “golden” transactions), ARIES provides fine granularity locking with ease. In fact, ARIES is capable of supporting multiple granularities of locking while eliminating the need for a special case to handle deadlock during transaction rollback.

• Opportunities to optimize restart: Mohan describes ways in which parallelism can be exploited to speed restart.

• Simplicity: Compared to the shadowing technique of System R, ARIES is a much simpler industrial-strength recovery algorithm.

Many other features have been omitted from this survey for lack of space. As the next section will show, however, ARIES does not itself represent a panacea to crash recovery. The implementation of ARIES on a client-server architecture illustrates how the algorithm must be revised to solve a new problem with a new set of assumptions.

|Features                     |ARIES |
|Flexible buffer management   |√     |
|Minimal space overhead       |√     |
|Checkpointing                |√     |
|Support for partial rollback |√     |
|Bounded transaction rollback |√     |
|Fine granularity locking     |√     |

Client-Server ARIES in EXODUS (Franklin et al.)

In 1992, Franklin et al. implement ARIES in the EXODUS client-server database system. As Franklin relates, using ARIES in a client-server architecture necessitates significant changes to the algorithm described by Mohan: “… The algorithm as specified in [Moha90] cannot be directly implemented in a page-server architecture because the architecture violates some of the explicit and implicit assumptions upon which the original algorithm is based” (Franklin 2). Inevitably, it is the separation of data residing on the clients and data residing on the server that introduces problems.

In client-server EXODUS, the server functions as the main repository for the database and the log. It provides for lock management, memory allocation/deallocation, and recovery/rollback. The client processes one transaction at a time using a “working copy” of data that it holds in its own locally managed buffer pool. The relationship between the client and the server is best described as a “data-shipping system,” in which a client sends requests to the server, which then responds to those specific requests; the server cannot initiate contact with a client. Communication between a client and a server is relatively expensive. Finally, because the server must be capable of responding quickly to numerous clients concurrently, the server tends to have higher hardware capabilities, reliability characteristics, and performance levels than clients.

During a transaction, a client caches data in its buffer pool, making changes locally. Before any updated pages are sent to the server, however, the client first sends log records to the server—hence, the client-server implementation of WAL. Log records are actually grouped into pages (an individual log record is never greater than a page in size) and sent to the server, one page at a time. The client is responsible for buffering these log records as they are generated, yet is capable of sending them to the server at any time during a transaction. Log records are ultimately kept on the server, however, for two reasons. First, it is undesirable to lose access to data as a result of client failure. That is, the server should not have to rely on the functionality of clients for server-side responsibilities, namely recovery. Second, it is not economical to require clients to have log disks.

Given this framework, Franklin identifies two problems associated with implementing ARIES on EXODUS. First, the expense of communication between clients and the server makes the assignment of unique and monotonically increasing LSN’s difficult. Because log records are being sent to the server from a number of different clients, there needs to be an inexpensive means to resolve LSN assignment without considerable message passing. (The naïve solution of allowing clients to generate LSN’s locally, and then letting the server determine the correct mapping to global LSN’s, is unacceptable due to the amount of server processing that would be required.) Second, the separation of data between client and server makes rollback and restart problematic. During rollback, ARIES can assume that the effects of log records are manifested either in stable storage or in the buffer pool. In EXODUS, however, the server can have Update log records for which it does not have the affected database pages, since clients send log records to the server before sending the updated pages. Restart poses a similar problem with the use of the Dirty Page Table. The server may not be aware of all dirty pages, since some may be sitting at clients.

To solve the problem of LSN assignment, a new log record enumerator, called the Log Record Counter (LRC), is introduced. LRC’s, like LSN’s, are monotonically increasing numbers. The difference is that the LRC is associated—not with a log record, like the LSN—but with a page. Together, the page and the LRC uniquely identify an update that was applied to the page. The LRC is included with an updated page when it is sent to the server. The server compares the LRC of the updated page with the LRC of the page that it has in the stable database. If the LRC of the updated page is greater than the LRC of the page in the stable database, the server knows that an update was made, and consequently, is able to generate an LSN for the Update log record. The LRC is then stored within this log record. The one drawback to this scheme, as Franklin notes, is that LRC’s do not reference actual log records, as LSN’s do. Additional care must be taken to ensure that a page and its LRC refer correctly to an LSN.
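A sketch of how LRC's might stand in for LSN's at the server, under the dict conventions used earlier; the function names and the stable_lrc_of lookup are illustrative assumptions, not EXODUS's interfaces.

    def receive_log_records(server_log, client_records):
        # Only the server assigns the globally increasing LSN's; each record
        # keeps the (page, LRC) pair that was stamped at the client.
        for rec in client_records:
            rec["lsn"] = len(server_log)
            server_log.append(rec)

    def needs_redo(update_rec, stable_lrc_of):
        # At restart, LRC's rather than LSN's order a page against its log
        # records: the update is missing iff its LRC is newer than the page's.
        return update_rec["lrc"] > stable_lrc_of(update_rec["page"])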

To solve the problems associated with system restart, the Analysis, Undo, and Redo phases of EXODUS’s ARIES implementation are modified to correctly identify the pages that require redo and undo work. During Analysis, the LSN’s of log records that add pages to the Dirty Page Table are tracked to isolate the uncommitted transactions that have caused pages to become dirty. During Redo, LRC’s rather than LSN’s are used to compare log records and pages, since the LRC of a page in the stable database now determines when an update to the page was last made. And finally, during Undo, a page undo is performed conditionally, only if the log records that reside in the server’s log have already been applied to the server’s stable database. This condition is determined by comparing the LRC stored in the Update log record against the LRC of the page in the stable database.

Analysis: The implementation of ARIES on EXODUS demonstrates that some of the basic assumptions made by the ARIES algorithm need to be revised in order to function in a client-server database environment. The core advantages of ARIES are still preserved, however. Franklin reports that the performance of the logging technique used in EXODUS was acceptable, from both a space and a processing point of view. He suggests that further improvement could be realized by reducing the amount of logged information and conducting the writing of log pages in parallel with other activities.

|Features                     |Client-Server ARIES |
|Flexible buffer management   |√                   |
|Minimal space overhead       |√                   |
|Checkpointing                |√                   |
|Support for partial rollback |√                   |
|Bounded transaction rollback |√                   |
|Fine granularity locking     |√                   |

Conclusion

By following the development of salient crash recovery features, it is possible to recognize an evolution of recovery methods that has resulted in the contemporary adoption of ARIES as the industry standard. Moreover, it is evident that each recovery method offers solutions not only to issues that were unresolved in an earlier recovery method, but also to new requirements that develop as a result of a changing computing environment. The table below summarizes the sample of features that have been tracked for the five recovery methods reviewed in this survey—Bernstein’s Undo/Redo algorithms, the System R recovery method, the physiological logging technique, ARIES, and client-server ARIES. The increase in the number of features offered by a recovery method through time (left to right across the table) reveals a progression towards a simpler, more flexible, and more powerful recovery algorithm.

While the future of crash recovery development for database systems is unclear, it is likely that the field will continue to evolve to meet the changing needs of the computing environment, which follows an evolutionary path of its own. For instance, recent industrial database systems have touted in advertisements their abilities to detect crashes before they occur (Herring). Whether these systems actually prevent crashes from occurring or merely prepare for the crashes by saving vital data is unclear. Yet the notion that the event of a crash can be handled through either crash avoidance or crash recovery seems to be relatively new. (This idea is analogous to the way a system can handle deadlock, either through deadlock avoidance or through deadlock detection.) It is safe to predict, however, that ARIES will continue to serve as a foundation for future database system recovery methods.

|Features                     |Undo/Redo Algorithms |System R    |Physiological Logging |ARIES |Client-Server ARIES |
|Flexible buffer management   |                     |√           |√                     |√     |√                   |
|Minimal space overhead       |                     |            |√                     |√     |√                   |
|Checkpointing                |Undo/Redo only       |inefficient |√                     |√     |√                   |
|Support for partial rollback |                     |√           |√                     |√     |√                   |
|Bounded transaction rollback |                     |            |√                     |√     |√                   |
|Fine granularity locking     |                     |inefficient |√                     |√     |√                   |

Sources

Bernstein, Philip A., Hadzilacos, Vassos, Goodman, Nathan. Concurrency Control and Recovery in Database Systems. Reading, MA: Addison-Wesley Publishing Company, 1987.

Franklin, Michael J., Zwilling, Michael J., Tan, C.K., Carey, Michael J., DeWitt, David J. “Crash Recovery in Client-Server EXODUS”. Computer Sciences Technical Report #1081, March 1992, Computer Sciences Department, University of Wisconsin-Madison.

Gray-Psych: Gray, Jim, Reuter, Andreas. Transaction Processing: Concepts and Techniques. San Francisco, CA: Morgan Kaufmann Publishers, 1993.

Gray-SysR: Gray, Jim, McJones, Paul, Blasgen, Mike, Lindsay, Bruce, Lorie, Raymond, Price, Tom, Putzolu, Franco, Traiger, Irving. “The Recovery Manager of the System R Database Manager”. ACM, 1981.

Ramakrishnan, Raghu. Database Management Systems. Boston, MA: WCB McGraw-Hill, 1998.

The following references are from the “Red Book”, Stonebraker, Michael and Hellerstein, Joseph M. Readings in Database Systems, Third Edition. San Francisco, CA: Morgan Kaufmann Publishers, 1998. Page numbers refer to page numbers in the Red Book.

Haerder, Theo, and Reuter, Andreas. “Principles of Transaction-Oriented Database Recovery”. ACM, 1983.

Mohan, C., Haderle, Don, Lindsay, Bruce, Pirahesh, Hamid, Schwarz, Peter. “ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging”. IBM Research Report RJ6649, IBM Almaden Research Center, November 1990.

References

Herring: Red Herring, March 1999 issue, p. 30. “Neural Networks: Computer Associates’ new technology speaks for itself.”

Lorie, R.A. “Physical Integrity in a Large Segmented Database”. ACM Transactions on Database Systems 2, 1 (Mar. 1977), 91-104.
