Binary Data and Buffer Pool

[Pages:6]CS 2604 Project 2

Binary Data and Buffer Pool

Spring 2002

For this project you will implement a simple database program that will support search and modify operations on a file containing simple records of the following form:

? an unsigned short integer, followed by ? six arbitrary characters (not tabs), followed by ? two integers. There is no particular context for these records; i.e., the specification won't attach any particular meaning to the last three of the data fields that belong to one of these records.

The project assumes that the file of records may be too large to store all the records in primary memory at once. Therefore, you will also implement a general-purpose buffer pool that will mediate the disk operations and ideally reduce the number of individual disk accesses that must be performed.

Program Invocation:

Your program must take the names of the input and output files from the command line -- failure to do this will irritate the person for whom you will demo your project. The program will be invoked as:

BinaryBP

If either of the specified input files does not exist, the program should print an appropriate error message and either exit or prompt the user for a correction.

The Buffer Pool:

The buffer pool mediates all reading and writing of the binary database file. No other portion of the system is allowed to access that file, except possibly to open the file on program startup and close the file on program termination.

The buffer pool will manage a collection of individual buffers, each capable of storing a fixed size block of data. Common terminology is that each buffer is stored in a slot or position in some organizing container. The type of container used to store the buffers is usually determined by the replacement strategy; in this project a FIFO strategy will determine which stored buffer to replace and that indicates that the buffers should be organized in a queue.

Each buffer will store N bytes of data, where N will be specified in the header of the command file described below. In any case, N will always be a multiple of the record size, so logically each buffer will store an integer number of records. And so we may also view the database file as a sequence of logical blocks. Common terminology is that a buffer stores one block of the file. In essence, a buffer is just a char array of dimension N, although it may be useful to also store the file offset of the block held in the buffer. The maximum number of buffers will also be specified in the command file.

The buffer pool will service get and put requests triggered when a record is needed from the database file or when a record is to be updated. These requests will consist of a sequence number for the requested record and a char* to which the requested data is to be served or from which it is to be acquired.

The buffer pool must determine if the requested record is contained in a block that is currently in the pool. If the record is not present (a miss) then the buffer pool must read the block containing that record from the database file and store it in a buffer. If all the slots are occupied, the buffer pool must select one to be replaced; there are a number of good strategies for this but in this case you will use FIFO replacement. That is, replace the buffered block that has been in the pool longest. To complicate matters, if the block to be replaced has been modified since it was read from disk it must be written back to disk so those changes won't be lost. This is sometimes referred to as a lazy writeback. Note that servicing a put request only writes the relevant data to the appropriate buffer, not to the database file on disk.

The buffer pool will not know anything about the contents of the data records. From its perspective, each transaction will simply involve copying some number of bytes from a buffer or writing some number of bytes to a buffer.

To emphasize, the buffer pool stores raw, uninterpreted bytes of data. Whatever you call your data type for the records described above, that type name should never occur within your buffer pool implementation.

Page 1 of 6

CS 2604 Project 2

Spring 2002

When servicing a get or put request, the buffer pool should log which buffer slot held the record, and whether a hit or miss occurred. If a buffer had to be replaced, the buffer pool should log which file block was replaced, and whether a writeback was necessary.

The buffer pool should also maintain a count of the number of blocks that have been read or written, the number of hits and misses, and the number of writebacks.

The buffer pool must also provide the means to log its contents, usefully formatted, on demand.

Other System Elements:

There must be a controller that receives data and commands from the command file and triggers the appropriate responses in the other system elements. The controller may be purely procedural. The controller may take responsibility for verifying the existence of the input files, and opening and closing the various file streams. The controller should log the initial system configuration (described below). The controller should create the data manager and buffer pool objects.

There must be a data manager class, separate from the controller, which is responsible for managing the execution of the show and update commands described below. The data manager will deal with record objects, not with raw data. The data manager should log results from processing show and update commands.

The data records should be encapsulated as objects when being processed by the data manager. There should also be a file manager object that handles reading the command file, and stripping out comments.

It may be useful to have a translator class that handles the data conversions that must take place when the data manager and buffer pool communicate, but this is not required. It may also be useful for each buffer to be an object.

The Binary Database File:

The database (dB) file used in this project will be in binary format. The dB file will consist of a sequence of records, as described above. Each record will consist of the four sections described above, in that order. There will not be any header data at the beginning of the file, or any extra data at the end. A hex dump of a sample binary dB file is included later in this document. Here is a hex display of the first record in that file:

Pos 00 01 02 03 04 05 06 07

08 09 0A 0B 0C 0D 0E 0F

------------------------------------------------------------------

Value 00 00 68 61 70 6A 69 61

AF 00 00 00 90 01 00 00

Sequence Number

6 characters

Integer

Integer

A complete binary data file is available on the course website.

Logically the dB file can be viewed as a sequence of records, and each record can be specified by giving its sequence number within the file. We will do so, numbering the records starting at 0. Note that since the records are all of the same size, knowing the sequence number of a record R allows you to calculate the file offset (byte count) at which that record begins in the file. For convenience in testing, the unsigned short integer in each record will be set to the sequence number of that record within the dB file.

Note that you are absolutely forbidden to simply read and store the entire file in memory. Your implementation must make use of a buffer pool, as described above.

Page 2 of 6

CS 2604 Project 2

Spring 2002

Command File:

The execution of the program will be driven by a script file, as in the first project. As before, lines beginning with a semicolon character (';') are comments and should be ignored.

The command file will start with a header that may include comments and will definitely include a line specifying the buffer pool "geometry":

buffer

After that header, each non-comment line of the command file will specify one of the commands described below. Each line consists of a sequence of "tokens" which will be separated by single tab characters. A newline character will immediately follow the final "token" on each line. The command file is guaranteed to conform to this specification, so you don't need to worry about error-checking when reading it.

The following commands must be supported:

show Log the values of all of the data fields of the indicated record. These should be interpreted, not written in raw binary form. If the sequence number corresponds to a non-existent record, log an error message.

This command will result in a transaction with the buffer pool, which must determine whether the record is in memory or not, load the appropriate file block if necessary, and then return the record data for display.

update Replace the data for the indicated record with the given data. Log a message confirming the update. If the sequence number corresponds to a non-existent record, log an error message.

This command will also result in a transaction with the buffer pool. If the targeted record is not in memory, the buffer pool must load the appropriate file block. Once the targeted record is in memory, the buffer pool must overwrite its data (in memory) with the supplied data (in binary format).

debug Log the current contents of the buffer pool. The display should be neatly formatted. For each file block stored in the buffer pool, log the file offset at which the block begins and the bytes stored for that block, formatted as pairs of hex digits. (Code that can be adapted for this purpose will be posted along with this specification.)

exit Terminate program execution. The buffer pool should perform any necessary writebacks before the dB file is closed. Summary statistics, described below, should be logged. All dynamic memory should be properly deallocated.

A sample command script is included later in this document.

Log File Description:

Since this assignment will be graded by TAs, rather than the Curator, the format of the output is left up to you. Of course, your output should be clear, concise, well labeled, and correct. The first two lines should contain your name, section specification (e.g., CS 2604 11:15 MWF), and project title. The next section of the log file should contain some initialization information:

? the names of the dB, command, and log files ? the buffer pool configuration including the number of slots and the size of each buffer

Page 3 of 6

CS 2604 Project 2

Spring 2002

The remainder of the log file output should come directly from your processing of the command file. You are required to echo each command that you process to the log file so that it's easy to determine which command each section of your output corresponds to. Each command should be numbered, starting with 1, and the output from each command should be well formatted, and delimited from the output resulting from processing other commands.

A complete sample log is available on the course website.

Submitting Your Program:

You will submit a gzipped tar file containing your project to the Curator System (read the Student Guide), and it will be archived until you demo it for one of the GTAs. Instructions for submitting are contained in the Student Guide. You will

find a list of the required contents for the zipped file on the course website. Follow the instructions there carefully; it is very

common for students to suffer a loss of points (often major) because they failed to include the specified items.

Be very careful to include all the necessary source code files. It is amazingly common for students to omit required header or cpp files, or to submit the wrong version of their program. In such a case, it is obviously impossible to perform a test of the submitted program unless the student is allowed to supply the missing files. When that happens, to be fair to other students, we must assess the late penalty that would apply at the time of the demo.

To avoid such problems, once you've prepared your gzipped tar file for upload, copy it to a new location, unarchive it, build an executable and test that executable. If you do that you can at least be sure you're not submitting an old, incomplete version.

You will be allowed up to five submissions for this assignment, in case you need to correct mistakes. Test your program thoroughly before submitting it. If you discover an error you may fix it and make another submission. Your last submission will be graded, so fixing an error after the due date will result in a late penalty.

The submission client can be found at:



Programming Standards:

The GTAs will be carefully evaluating your source code on this assignment for programming style, so you should observe good practice. See the Programming Standards page on the course website for specific requirements that should be observed in this course. As always, you should practice good object-centered design and implementation.

Evaluation:

You will schedule a demo with your assigned GTA. At the demo, the TA will supply your submitted project, and you will perform a build and run your program on the supplied test data. The GTA will evaluate the correctness of your results. In addition, the GTA will evaluate your project for good internal documentation and software engineering practice.

Pledge:

Each of your program submissions must be pledged to conform to the Honor Code requirements for this course. Specifically, you must include the pledge statement provided with the earlier project specifications in the header comment for your main source code file.

Page 4 of 6

CS 2604 Project 2

Sample command script:

; Simple buffering with no updates or writebacks.

;

; Buffer pool configuration:

buffer 16 10

;

; Display a few data records and check the buffer pool:

show

17

debug

show

23

debug

show

18

debug

;

; Display one that should already be loaded:

show

23

;

; Load a bunch of blocks to fill the buffer pool:

show

22

show

31

show

3

show

4

show

5

show

13

show

9

debug

;

; None of these should force replacements:

show

4

show

31

show

3

show

5

show

13

show

9

show

22

;

; This should force a replacement but no writeback:

show

41

;

; Each of these should also force replacements w/o writebacks:

show

72

show

49

;

; This should re-load a block:

show

17

debug

;

; Push the boundaries of the file:

show

0

show

99

;

debug

;

; Stop:

exit

Spring 2002

Page 5 of 6

CS 2604 Project 2

Hex dump of sample binary data file:

00 01 02 03 04 05 06 07

08 09 0A 0B 0C 0D 0E 0F

------------------------------------------------------------------

00 00 68 61 70 6A 69 61

AF 00 00 00 90 01 00 00

01 00 6E 79 64 62 77 68

C1 3D 00 00 E9 06 00 00

02 00 7A 75 70 63 75 78

C4 52 00 00 90 35 00 00

03 00 73 68 6E 78 77 6A

E6 23 00 00 80 1B 00 00

04 00 63 7A 62 6D 64 64

D8 70 00 00 25 44 00 00

05 00 72 6F 6D 74 76 74

4D 6B 00 00 E3 74 00 00

06 00 67 70 64 6B 62 6B

7F 5D 00 00 90 39 00 00

07 00 64 74 67 66 75 6A

A9 60 00 00 F6 40 00 00

08 00 74 70 66 66 6D 73

8E 23 00 00 52 09 00 00

09 00 72 64 66 74 6F 64

F1 40 00 00 D7 5C 00 00

0A 00 6C 65 65 61 6C 63

1A 26 00 00 2B 1E 00 00

. . .

61 00 66 69 74 73 74 7A

8A 34 00 00 69 6E 00 00

62 00 72 61 79 73 78 6E

A9 6B 00 00 3E 1C 00 00

63 00 61 73 69 75 71 62

62 78 00 00 E8 24 00 00

------------------------------------------------------------------

00 01 02 03 04 05 06 07

08 09 0A 0B 0C 0D 0E 0F

Spring 2002

Page 6 of 6

................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download