A Simple and Efficient Implementation for Small Databases

A Simple and Efficient Implementation for Small Databases

Andrew D. Birrell Michael B. Jones Edward P. Wobber

11 August 1987

d i g i t a l Systems Research Center 130 Lytton Avenue Palo Alto, CA 94301

A Simple and Efficient Implementation for Small Databases

Andrew D. Birrell Michael B. Jones Edward P. Wobber*

ABSTRACT: This paper describes a technique for implementing the sort of small databases that frequently occur in the design of operating systems and distributed systems. We take advantage of the existence of very large virtual memories, and quite large real memories, to make the technique feasible. We maintain the database as a strongly typed data structure in virtual memory, record updates incrementally on disk in a log and occasionally make a checkpoint of the entire database. We recover from crashes by restoring the database from an old checkpoint then replaying the log. We use existing packages to convert between strongly typed data objects and their disk representations, and to communicate strongly typed data across the network (using remote procedure calls). Our memory is managed entirely by a general purpose allocator and garbage collector.This scheme has been used to implement a name server for a distributed system. The resulting implementation has the desirable property of being simultaneously simple, efficient and reliable.

* Andrew Birrell and Edward Wobber are at the DEC Systems Research Center, 130 Lytton Avenue, Palo Alto, CA 94301. Michael Jones is at Carnegie-Mellon University, Pittsburgh, PA.

1

A SIMPLE AND EFFICIENT IMPLEMENTATION FOR SMALL DATABASES

1

1. THE PROBLEM

There are many situations in the design of an operating system or of a distributed system where we desire to manage data that is structured and that must persist across system restarts. Although we often call these "databases", they are quite different from commercial databases such as those used by a bank or an insurance company. The databases we are considering have the following distinguishing characteristics. They are relatively small -- up to about 10 megabytes. They have a moderate rate of updates -- a burst rate of up to 10 transactions per second, and a long term rate of up to 10000 transactions per day. The update operations provided by these databases are single-shot transactions. In other words, there are no update transactions composed of multiple client actions, and the database implementation does not make any intermediate state visible to its clients. Examples of these operating system databases include records of user accounts, network name servers, network configuration information and file directories.

The purpose of this paper is to offer an implementation technique suitable for this class of databases. Our design is based on technology well known to the database community -- a main memory database with checkpoints and a re-do log [10,11]. To this we add technology well known to the operating systems community -- high level language data structures, remote procedure calls and strongly typed access to backing store. The outcome appears to be novel -- an implementation of operating system databases that is simultaneously very simple, very efficient and very reliable. The other implementations we have seen for such databases suffer by lacking one (or more) of these properties.

Throughout the paper we illustrate the design with the example of a name server that we have built for our distributed computing environment. That name server design has other interesting aspects [6], but the present paper concentrates only on the data storage aspects of the name server.

Typically, we implement such databases by compiling into the implementation an understanding of the particular data types and record organizations required by the database. This contrasts with the flexibility of the data models offered by general purpose database systems; we will not consider this particular aspect any further in the present paper.

................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download