CS-4513 Project #3 (Simple Web Server)



CS-4513 Distributed Systems WPI, D-Term 2007

Hugh C. Lauer Project 3 (100 points)

Assigned: Friday, April 13, 2007 Due: Friday, April 27, 2007

Introduction

In this assignment, you will build a simple but real, multi-threaded web server capable of serving real Web pages. It will respond to actual HTTP requests and reply with appropriate responses. You can test your server using a standard web browser. You will also build a simple web client for testing your server and seeing what it is doing.

Your web server should be able to compile and run on CCC machines at WPI. Some sample socket code is provided below as a starting point.

For extra credit, you may extend your web server to support persistent connections.

General Requirements

Your server will be started from a command line as follows:–

% webserver port# directoryName [verbose|basic]

For obvious reasons, your server will not be able to use the standard HTTP port. Therefore, the first argument is the port number that your server will listen to. The second argument is the name of the directory where the server will look for web pages. The last argument is optional; it specifies whether you should do verbose logging, basic logging, or no logging at all.
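As an illustration only, the following sketch shows one way to parse the command line, assuming the argument order described above (port, then directory, then the optional logging keyword). The names parse_args, struct server_config, and the LOG_ constants are made up for this example and are not part of the provided sample code.

```c
#include <stdlib.h>
#include <string.h>

/* Logging levels named after the optional last argument. */
enum log_level { LOG_NONE, LOG_BASIC, LOG_VERBOSE };

/* Hypothetical container for the parsed command line. */
struct server_config {
    int port;
    const char *doc_root;
    enum log_level level;
};

/* Returns 0 on success, -1 if the arguments are malformed. */
int parse_args(int argc, char *argv[], struct server_config *cfg)
{
    char *end;
    long port;

    if (argc < 3 || argc > 4)
        return -1;

    /* The port must be a plain decimal number in the valid TCP range. */
    port = strtol(argv[1], &end, 10);
    if (*argv[1] == '\0' || *end != '\0' || port < 1 || port > 65535)
        return -1;

    cfg->port = (int)port;
    cfg->doc_root = argv[2];
    cfg->level = LOG_NONE;      /* no third argument means no logging */

    if (argc == 4) {
        if (strcmp(argv[3], "verbose") == 0)
            cfg->level = LOG_VERBOSE;
        else if (strcmp(argv[3], "basic") == 0)
            cfg->level = LOG_BASIC;
        else
            return -1;
    }
    return 0;
}
```

A main() function would typically call parse_args(argc, argv, &cfg) first and print a usage message if it returns -1.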

The server should first allocate a socket, bind() it to the specified port, and start listening using listen(). It should then go into a simple loop as follows:–

1. Wait for and accept() a connection request from a client

2. Spawn a thread to handle this connection

3. Go back to step 1 and wait for another connection request.

Meanwhile, the newly spawned thread should

1. Read the client’s request and respond to it;

2. Close the accepted connection; and

3. Terminate the thread.

You must be able to handle multiple connection requests in parallel using multiple threads.
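The two loops above might be sketched with POSIX threads as follows. This is a minimal outline under stated assumptions, not the required design: serve_request() is a placeholder that only sends a fixed status line, and the names spawn_handler and serve_forever are invented for this example.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Placeholder for the real work: read the request and respond to it. */
static void serve_request(int conn_fd)
{
    const char *msg = "HTTP/1.0 404 Not Found\r\n\r\n";
    ssize_t n = write(conn_fd, msg, strlen(msg));
    (void)n;    /* a real server would check for short or failed writes */
}

/* Thread body: handle one connection, close it, then terminate. */
static void *connection_thread(void *arg)
{
    int conn_fd = *(int *)arg;
    free(arg);
    serve_request(conn_fd);
    close(conn_fd);
    return NULL;
}

/* Hand one accepted connection to a new detached thread. Returns 0 or -1. */
int spawn_handler(int conn_fd)
{
    pthread_t tid;
    int *arg = malloc(sizeof *arg);   /* each thread gets its own copy */

    if (arg == NULL)
        return -1;
    *arg = conn_fd;
    if (pthread_create(&tid, NULL, connection_thread, arg) != 0) {
        free(arg);
        return -1;
    }
    pthread_detach(tid);              /* nobody joins; resources free themselves */
    return 0;
}

/* Main loop: accept, spawn, repeat. listen_fd must already be listening. */
void serve_forever(int listen_fd)
{
    for (;;) {
        int conn_fd = accept(listen_fd, NULL, NULL);
        if (conn_fd < 0) {
            if (errno == EINTR)
                continue;             /* interrupted by a signal; try again */
            break;
        }
        if (spawn_handler(conn_fd) != 0)
            close(conn_fd);
    }
}
```

Passing the descriptor through a heap-allocated int avoids a race in which the main loop overwrites a shared variable before the new thread has read it.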

The web browser or client will send an HTTP GET request specifying the web page that it wants. If you can find the web page, you will send it back to the client; otherwise, you must respond with an error. Since HTTP version 1.0 is stateless, you should then close the connection and terminate the thread.

Since your server will not be using the standard HTTP port, your client or browser must explicitly specify the port that your server is serving. For example, if your server is running on CCC4 and your port is 4242, then the URL for accessing the WPI Admissions page would be



There are two HTTP rules that you must implement (and many others that you may ignore). First, each requested web page must be prefixed with the directoryName argument of the server command line. For example, WPI’s web pages are stored in the directory

/www/docs

If this is specified on your command line, then you would look for the file

/www/docs/admissions.html

in order to serve the URL above. (Try this one; it seems to work.)

Second, if the requested web page either ends with a “/” character, or if it resolves to the name of a directory, you must add “index.html” to the path name and search for that file. For example, if the client or browser specifies either of the URLs

or

your server should respectively serve the pages

/www/docs/News/index.html

/www/docs/News/Features/index.html

For satisfactory completion of this project, you are only responsible for serving web pages that actually map to files. Some web pages invoke scripts – for example, WPI’s home page at

/www/docs/index.html

After you can successfully handle ordinary html files, you should have a look at this web page and see what it does.

You should return an error for all requests that do not map to regular files after following these two mapping rules.
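The two mapping rules and the regular-file check could be sketched as follows. The function names map_request and is_regular_file are assumptions made for this example. The sketch uses stat() to recognize directories and does not attempt to guard against names containing "..", which a careful server would also want to consider.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Returns 1 if path names an existing directory, 0 otherwise. */
static int is_directory(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/* Returns 1 if path names an existing regular file, 0 otherwise. */
int is_regular_file(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 && S_ISREG(st.st_mode);
}

/*
 * Apply the two mapping rules: prefix the requested name with doc_root,
 * then append index.html if the result ends in '/' or names a directory.
 * Returns 0 on success, -1 if the result does not fit in out.
 */
int map_request(const char *doc_root, const char *name, char *out, size_t outlen)
{
    const char *suffix = NULL;
    int n;

    while (*name == '/')        /* the separator is added explicitly below */
        name++;
    n = snprintf(out, outlen, "%s/%s", doc_root, name);
    if (n < 0 || (size_t)n >= outlen)
        return -1;

    if (out[n - 1] == '/')
        suffix = "index.html";
    else if (is_directory(out))
        suffix = "/index.html";

    if (suffix != NULL) {
        if ((size_t)n + strlen(suffix) >= outlen)
            return -1;
        strcat(out, suffix);
    }
    return 0;
}
```

A request handler would call map_request() and then treat the request as an error unless is_regular_file() returns 1 for the mapped path.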

Note that Unix and Linux have a rule about programs that bind sockets to ports, namely that port numbers may not be re-used in rapid succession. I.e., if your program binds to port #4242, then after it terminates, you cannot immediately rerun it and bind to the same port again. This rule is instituted to allow time for stale references to the port to flush themselves from the network.

One other thing you have to do is to figure out a way of exiting from your server cleanly — for example, by responding appropriately to a KILL command from the keyboard. Also, make sure that you are intelligent with respect to “/” characters. It does not hurt to have too many, but it does hurt to have too few.

Note: For this project, you will implement HTTP v1.0. Although there should be sufficient information in this assignment description to complete the project, students often find it useful to research the HTTP protocol on the Web. The following link contains an interesting and helpful tutorial:– .

Implementation – HTTP Requests

To implement the basic socket functions, you may use as a starting point the sample code on



The relevant socket functions are socket() to create the socket, bind() to bind the socket to a port, listen() to create a request queue and to start listening for requests, and accept() to accept a connection and create a new socket on which to reply to that connection.
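For orientation, the first three calls might be combined as in the sketch below. It is not the provided sample code; the function name make_listener and its backlog parameter are assumptions made for this example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket listening on the given port. Returns the fd, or -1. */
int make_listener(unsigned short port, int backlog)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);   /* any local interface */
    addr.sin_port = htons(port);                /* network byte order */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The descriptor returned here is the one passed to accept(); each call to accept() then yields a separate descriptor for one client connection.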

Once you have accepted a connection, your server needs to read and interpret an HTTP request. Below is an example generated by a browser for the page index-t.html located on ccc1.wpi.edu at port #4242.

GET /index-t.html HTTP/1.0
Connection: Keep-Alive
User-Agent: Mozilla/4.7 [en] (X11; U; SunOS 5.7 sun4u)
Host: ccc1.wpi.edu:4242
Accept: image/gif, image/x-xbitmap, image/jpeg, image/png, */*
Accept-Encoding: gzip
Accept-Language: en
Accept-Charset: iso-8859-1,*,utf-8

The first line of the request contains the type of request. For this assignment, you will only need to recognize and handle the GET request. Following the GET request is the name of the object being requested and the HTTP version number that the requestor is using. You must extract the name of the object from the line. For the time being, you may ignore the HTTP version.
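Extracting the object name from the first line could look like the sketch below, which uses sscanf() with explicit field widths. The function name parse_request_line and the MAX_NAME limit are assumptions made for this example. The sketch requires all three fields to be present, and it reads the version field without interpreting it.

```c
#include <stdio.h>
#include <string.h>

#define MAX_NAME 1024   /* assumed upper bound on the object name */

/*
 * Extract the object name from a request line such as
 * "GET /index-t.html HTTP/1.0\r\n" into name[MAX_NAME].
 * Returns 0 on success, -1 if the line does not have three fields
 * or the request type is not GET.
 */
int parse_request_line(const char *line, char *name)
{
    char method[8];
    char version[16];

    /* Field widths keep sscanf from overrunning the buffers. */
    if (sscanf(line, "%7s %1023s %15s", method, name, version) != 3)
        return -1;
    if (strcmp(method, "GET") != 0)
        return -1;
    return 0;
}
```

For the example request above, the function stores "/index-t.html" in name and returns 0.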

The remaining lines in the HTTP request header specify what the web client is capable of receiving. You may ignore them, but you still need to read them. (However, you will need to recognize the “Connection: Keep-Alive” line for the extra credit part below.) Your server should keep reading header lines until it encounters a blank line.

To aid you in reading the HTTP request a line at a time, the routine sockreadline() has been provided. You may find this at



This routine receives a character at a time from a given socket and stores these characters in a null-terminated character buffer. It returns when the newline (\n) character is reached. The HTTP specification expects that all lines are terminated with a CR (carriage return) character followed by a LF (line-feed) character. In C and C++, these are represented as “\r” and “\n”, respectively, and they are referred to in the text below as “CR/LF”.

Note: A null line is represented by a standalone CR/LF. This is sometimes confusing, because the previous line was also terminated by a CR/LF. Therefore, you may often see two CR/LF combinations in a row – one to terminate the previous line and one to indicate a null line.
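The header-reading loop might be sketched as follows. Because the exact interface of the provided sockreadline() is not reproduced here, the sketch uses a hypothetical stand-in named read_line() that behaves as described above; is_blank_line() and skip_headers() are likewise names invented for this example.

```c
#include <string.h>
#include <unistd.h>

/*
 * Stand-in for a line reader: read one character at a time until '\n'
 * or until the buffer is full, and null-terminate the result.
 * Returns the number of characters stored, 0 at end of input, -1 on error.
 */
ssize_t read_line(int fd, char *buf, size_t size)
{
    size_t used = 0;

    if (size == 0)
        return -1;
    while (used + 1 < size) {
        char c;
        ssize_t n = read(fd, &c, 1);
        if (n < 0)
            return -1;
        if (n == 0)
            break;              /* peer closed the connection */
        buf[used++] = c;
        if (c == '\n')
            break;
    }
    buf[used] = '\0';
    return (ssize_t)used;
}

/* A null line is a bare CR/LF (a bare LF is tolerated as well). */
int is_blank_line(const char *line)
{
    return strcmp(line, "\r\n") == 0 || strcmp(line, "\n") == 0;
}

/*
 * Read and discard header lines up to and including the blank line.
 * Returns the number of lines skipped, or -1 on error or early end of
 * input. A header longer than the buffer is counted in pieces.
 */
int skip_headers(int fd)
{
    char line[1024];
    int count = 0;

    for (;;) {
        ssize_t n = read_line(fd, line, sizeof line);
        if (n <= 0)
            return -1;
        if (is_blank_line(line))
            return count;
        count++;
    }
}
```

After reading the request line, a handler would call skip_headers() on the same descriptor before sending its response.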

Implementation – HTTP Responses

There are many HTTP response codes, but for this project you need to use only two:– 200 and 404. If you receive a GET request for an object that can be successfully mapped to a file, then you should open the file for reading using the system call open(). If you can successfully open the file, then your server should first send the HTTP response

HTTP/1.0 200 OK\r\n\r\n

This indicates success followed by a blank (i.e., null) line. Subsequently, you should use the read() function to read the contents of the file and send it on to the connection using send() or write(). (The reason for using read() rather than text-based I/O routines is that not all of the content is guaranteed to be text.) When you have completed reading, use close() to close the file. Then close the socket connection using the close() function again. Your server thread is now done handling the request, and it may exit.

If the request is not valid or the object cannot be mapped to a file and successfully opened, then your server should send back to the client the response

HTTP/1.0 404 Not Found\r\n\r\n

This indicates failure. You should then close the socket connection and terminate the thread.
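The success and failure paths might be combined as in the following sketch. The names send_response and write_all are assumptions made for this example. Note that open() also succeeds on a directory, so the caller is expected to have confirmed beforehand that the path names a regular file.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write the whole buffer, retrying after short writes. Returns 0 or -1. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Send the file at path on conn_fd, preceded by the 200 status line, or
 * send the 404 status line if the file cannot be opened.
 * Returns the status code sent, or -1 if reading or writing failed.
 */
int send_response(int conn_fd, const char *path)
{
    const char *ok = "HTTP/1.0 200 OK\r\n\r\n";
    const char *not_found = "HTTP/1.0 404 Not Found\r\n\r\n";
    char buf[4096];
    ssize_t n;
    int file_fd = open(path, O_RDONLY);

    if (file_fd < 0)
        return write_all(conn_fd, not_found, strlen(not_found)) == 0 ? 404 : -1;

    if (write_all(conn_fd, ok, strlen(ok)) != 0) {
        close(file_fd);
        return -1;
    }
    /* read() rather than text I/O: the content may be binary. */
    while ((n = read(file_fd, buf, sizeof buf)) > 0) {
        if (write_all(conn_fd, buf, (size_t)n) != 0) {
            close(file_fd);
            return -1;
        }
    }
    close(file_fd);
    return n < 0 ? -1 : 200;
}
```

The caller remains responsible for closing conn_fd afterwards. The loop in write_all() matters because write() and send() may accept fewer bytes than requested.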

Testing

To test your server, you can use a standard web browser. However, to make testing easier and to be able to see the response headers, you should create a simple Web client. This client should connect to a given port on a given host, send an HTTP request string, and print the response.

You may use command line arguments to control your client. For example:–

% webclient ccc1 4242 /News/index.html

can be used to request the object from port 4242 on the machine CCC1.

Your simple webclient will need to connect to the port and send the GET line, patterned after the request in the Implementation – HTTP Requests section above. Be sure to end it with CR/LF. It should then follow the request with a blank line — i.e., another CR/LF. You may use “HTTP/1.0” as the version. Your webclient should then receive back the response headers and content from the server and print them to the standard output stream.
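A minimal sketch of these client steps is given below. It is not the sample client; the names build_request, connect_to, and fetch are invented for this example, and the sketch uses getaddrinfo() for name lookup, which is one of several possible choices.

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Build the GET line followed by the blank line. Returns the length, or -1. */
int build_request(const char *name, char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "GET %s HTTP/1.0\r\n\r\n", name);
    return (n < 0 || (size_t)n >= outlen) ? -1 : n;
}

/* Connect to host:port over TCP. Returns a socket descriptor, or -1. */
int connect_to(const char *host, const char *port)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;
    for (ai = res; ai != NULL; ai = ai->ai_next) {   /* try each address */
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}

/* Send one GET request and copy the whole response to standard output. */
int fetch(const char *host, const char *port, const char *name)
{
    char buf[4096];
    ssize_t n;
    int fd;
    int len = build_request(name, buf, sizeof buf);

    if (len < 0 || (fd = connect_to(host, port)) < 0)
        return -1;
    if (write(fd, buf, (size_t)len) != len) {
        close(fd);
        return -1;
    }
    while ((n = read(fd, buf, sizeof buf)) > 0)
        fwrite(buf, 1, (size_t)n, stdout);
    close(fd);
    return n < 0 ? -1 : 0;
}
```

With this outline, the example command line above would correspond to the call fetch("ccc1", "4242", "/News/index.html").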

The webclient lets you see what the response from the server looks like. To see what the request from any client looks like, you should use the optional verbose or basic arguments in the webserver command line. Use the basic argument to print a summary of the request and the response. Use the verbose argument to print out the entire client HTTP request so that you can see what arrived over the socket, followed by your entire HTTP response lines, prior to the file itself. When using either the basic or verbose arguments, be sure to include the thread number or other identification on each line so that you can tell which thread is printing.
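One way to tag every log line with an identifier is sketched below. It assumes that each connection is given a small integer id when it is accepted; that id, and the function name log_line, are inventions for this example. Producing each entry with a single fprintf() call keeps the lines of different threads from being interleaved in the middle of a line.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Print one log entry of the form "[conn N] message" on its own line.
 * The message is cut at the first CR or LF so that an entry never
 * spans lines. Returns 0 on success, -1 on an output error.
 */
int log_line(FILE *out, int conn_id, const char *fmt, ...)
{
    char msg[1024];
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(msg, sizeof msg, fmt, ap);
    va_end(ap);

    msg[strcspn(msg, "\r\n")] = '\0';
    return fprintf(out, "[conn %d] %s\n", conn_id, msg) < 0 ? -1 : 0;
}
```

In verbose mode a thread could call log_line(stdout, id, "request: %s", line) for every line it reads, while basic mode would log only the request line and the response code.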

Example: Try serving the WPI home page or the CS Department Home page and see what happens. For example, how many separate HTTP requests are made just for that one page?

Beware of requesting images with your simple client, because the content will likely not print very well. In this case, you may want to print only the first few lines of the content. You may test your web client with any standard web server by sending a request to the well-known web server port 80.

For your reference, a sample web client can be found on



This does not do exactly what is requested, but it should serve as guidance for how to build your webclient.

To test your multi-threaded implementation, modify or adapt your webclient to have more than one separate HTTP request open at a time. For example, you might extend the argument list of webclient to specify an arbitrary number of files (up to some reasonable limit). It might then spawn threads for the arguments, synchronizing them so that they all send their web requests as close to the same time as possible. Watch the server debugging output to see what happens.
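The synchronization step might be sketched with a start gate built from a mutex and a condition variable, as below. All names here are assumptions made for this example, MAX_REQUESTS is an arbitrary limit, and fake_request() is only a stand-in for code that would connect and send a real request.

```c
#include <pthread.h>

#define MAX_REQUESTS 16   /* assumed "reasonable limit" */

/* Start gate shared by all request threads. */
struct gate {
    pthread_mutex_t mu;
    pthread_cond_t cv;
    int open;
};

/* Hypothetical per-thread job: run fn(arg) once the gate opens. */
struct job {
    struct gate *gate;
    void (*fn)(void *);
    void *arg;
};

/* Stand-in for a real request; it only marks its slot as done. */
void fake_request(void *arg)
{
    *(int *)arg = 1;
}

static void *job_thread(void *p)
{
    struct job *j = p;

    pthread_mutex_lock(&j->gate->mu);
    while (!j->gate->open)                 /* wait until main opens the gate */
        pthread_cond_wait(&j->gate->cv, &j->gate->mu);
    pthread_mutex_unlock(&j->gate->mu);
    j->fn(j->arg);
    return NULL;
}

/*
 * Start n threads, release them together, and wait for all of them.
 * Returns 0 if every thread was started, -1 otherwise.
 */
int run_together(int n, void (*fn)(void *), void *args[])
{
    struct gate g;
    pthread_t tids[MAX_REQUESTS];
    struct job jobs[MAX_REQUESTS];
    int i, started = 0;

    if (n < 1 || n > MAX_REQUESTS)
        return -1;
    pthread_mutex_init(&g.mu, NULL);
    pthread_cond_init(&g.cv, NULL);
    g.open = 0;

    for (i = 0; i < n; i++) {
        jobs[i].gate = &g;
        jobs[i].fn = fn;
        jobs[i].arg = args[i];
        if (pthread_create(&tids[i], NULL, job_thread, &jobs[i]) != 0)
            break;
        started++;
    }
    /* Open the gate: every waiting thread proceeds at about the same time. */
    pthread_mutex_lock(&g.mu);
    g.open = 1;
    pthread_cond_broadcast(&g.cv);
    pthread_mutex_unlock(&g.mu);

    for (i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    pthread_cond_destroy(&g.cv);
    pthread_mutex_destroy(&g.mu);
    return started == n ? 0 : -1;
}
```

The gate is opened even if some thread could not be created, so the threads that did start are never left waiting.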

Extra Credit: Persistent Connections

For extra credit, extend your server implementation to support connections that stay alive beyond a single HTTP request. This may be done by recognizing the line

Connection: Keep-Alive

in an HTTP request header, by implementing a subset of HTTP v1.1, or both. In this case, the thread does not close the connection after it satisfies the request, but it simply keeps reading from the socket input for more requests. The same thread continues to serve the same client so long as the client continues to ask. The thread terminates and closes the connection if either

1. the connection is idle for a reasonable period of time (e.g., a few minutes), or

2. an HTTP request header contains a line of the form

Connection: close
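Two pieces of this extension can be sketched briefly: recognizing the Connection header, and bounding the idle time. The names below are invented for this example. Using the SO_RCVTIMEO socket option for the idle limit is one possible choice among several; select() or poll() with a timeout would serve equally well.

```c
#include <string.h>
#include <strings.h>
#include <sys/socket.h>
#include <sys/time.h>

enum conn_pref { CONN_UNSPECIFIED, CONN_KEEP_ALIVE, CONN_CLOSE };

/* Classify one header line; field name and value are compared without regard to case. */
enum conn_pref connection_pref(const char *line)
{
    const char *field = "Connection:";
    const char *v;
    size_t len;

    if (strncasecmp(line, field, strlen(field)) != 0)
        return CONN_UNSPECIFIED;

    v = line + strlen(field);
    v += strspn(v, " \t");              /* skip spaces after the colon */
    len = strcspn(v, " \t\r\n");        /* length of the value token */

    if (len == 10 && strncasecmp(v, "keep-alive", 10) == 0)
        return CONN_KEEP_ALIVE;
    if (len == 5 && strncasecmp(v, "close", 5) == 0)
        return CONN_CLOSE;
    return CONN_UNSPECIFIED;
}

/*
 * Ask the kernel to make a blocked read on conn_fd give up after
 * idle_seconds without data. Returns 0 on success, -1 on error.
 */
int set_idle_timeout(int conn_fd, int idle_seconds)
{
    struct timeval tv;

    tv.tv_sec = idle_seconds;
    tv.tv_usec = 0;
    return setsockopt(conn_fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}
```

Once the timeout is set, a read that waits too long fails with errno set to EAGAIN or EWOULDBLOCK, which the thread can treat as the signal to close the connection and terminate.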

Modify your use of the verbose and basic arguments to show when the webserver opens and closes a connection. Demonstrate that it is capable of supporting several independent connections at a time.

Submission of Project

This project is NOT a team project; it is to be done individually. However, you may discuss among yourselves the subtleties of the HTTP protocol, and you may take general advice from others, provided that you share it with everyone in class by e-mail.

As always, you should not assume that the user or the user’s browser submits correct input. All code must be clearly commented. All output and printouts must be easy to understand and cleanly formatted.

When you do later parts of the project, be sure that you do not corrupt earlier, previously working parts. You may do this by making a copy of the code before developing the later part or by retesting the original code on the earlier part.

Submit your project via myWPI, as with previous projects. Zip all of your files together into a single zip file named “e-mailID-Project3.zip”, where “e-mailID” is replaced by your WPI e-mail ID. Do not make subfolders in your zip file; everything should be at the top level. If you make a mistake, you may resubmit with the same file name.

It is in your interest to make the TA’s life easy. He/she will download your submission, compile it, run it with your test cases, and run it with our own test cases.

• Do not make us figure out how to do this! We would like to simply unzip the file and then execute a simple make command to build your entire application.

• Programs should compile without warnings! You may not use the “-Wno-deprecated” switch without prior permission from the instructor or TA.

Your submission should include

1. Source code for all programs and header files of this assignment

2. The makefiles for building the executable programs

3. Output files showing how your webserver and webclient programs run. In particular, be sure to show that your server supports concurrent requests.

4. A write-up describing what you did and how you did it. Brevity is a virtue; details should be documented as comments in your code.

Grading

Here are the grading guidelines for this project:–

• Successful submission of project via myWPI — 5 %

• Successful build (using make) with no errors or warnings on a CCC machine — 5%

• Clear, cogent write-up describing your work — 15%

• Successful operation of your webserver to fetch assorted web pages from WPI web site using a standard browser — 25%

• Successful operation of your webclient to fetch pages from the WPI web server — 25%

• Successful demonstration by your webclient and by logs generated by the basic or verbose argument that your webserver supports more than one concurrent HTTP request at a time — 25%

Extra credit:–

• Modify your webserver to support multiple, concurrent, persistent connections and successfully demonstrate that it does — 50%.
