Project 5: Base64


CS 200 - 20 Points Total - Due Wednesday, March 24, 2017

Objectives

Write a program or programs that allow a message to be converted back and forth between plain text and base64 encoding.

Practice programming with bitwise operations.

Overview

SMTP (Simple Mail Transfer Protocol) is the communication protocol computers use to pass email messages to one another. This system has been in place for decades, and as such has some significant (although understandable) limitations.

The most significant limitation is that only printable ASCII characters between ASCII 32 (space) and ASCII 126 (tilde) can be communicated via email. This means that every email must internally consist of only the 95 different bytes in that range.

This makes it impossible to directly transmit or attach "binary" data such as pictures and music files, since those files consist of bytes with values between 0 and 255.
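To make the restriction concrete, here is a small C++ sketch that checks whether a block of data could be sent as-is under the simplified rule above (every byte from 32 to 126). The name is_printable_ascii is made up for illustration, and the check ignores details of real SMTP such as line breaks.

```cpp
#include <string>

// Hypothetical helper: returns true if every byte of `data` lies in the
// printable ASCII range 32 (space) through 126 (tilde), inclusive.
// This models the simplified email restriction described above; it does
// not account for line breaks or other details of real SMTP transport.
bool is_printable_ascii(const std::string& data) {
    for (unsigned char c : data) {  // read each byte as 0-255, not as a signed char
        if (c < 32 || c > 126) {
            return false;
        }
    }
    return true;
}
```

Text typed at the keyboard will usually pass this check; the bytes of a picture or music file almost never will.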

To work around this problem, binary files are converted from a base 256 numbering system (256 values per byte) into a base 64 numbering system (64 values per character, or 6 bits). Because 256 and 64 are both powers of 2 (2^8 and 2^6), it is easy to convert between the two using bitwise operators, in the same way that it's easy to convert between hexadecimal and octal.
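As a sketch of why powers of 2 make this easy: three bytes form one 24-bit number (three base 256 digits), and that same number is exactly four base 64 digits. The C++ below computes those four digits two ways, once with ordinary division and remainder and once with shifts and masks, and both give the same answer. The helper names are hypothetical.

```cpp
#include <array>
#include <cstdint>

// Treat three bytes as one 24-bit number: a 3-digit number in base 256.
std::uint32_t pack_three_bytes(unsigned char b0, unsigned char b1, unsigned char b2) {
    return (std::uint32_t(b0) << 16) | (std::uint32_t(b1) << 8) | std::uint32_t(b2);
}

// Rewrite that number as four base 64 digits, most significant first,
// using division and remainder like any other change of number base.
std::array<int, 4> digits_by_division(std::uint32_t value) {
    std::array<int, 4> d{};
    for (int i = 3; i >= 0; --i) {
        d[i] = static_cast<int>(value % 64);
        value /= 64;
    }
    return d;
}

// The same four digits using only shifts and masks: since 64 == 2^6,
// each base 64 digit is just the next 6-bit slice of the 24-bit value.
std::array<int, 4> digits_by_shifting(std::uint32_t value) {
    return {static_cast<int>((value >> 18) & 0x3F),
            static_cast<int>((value >> 12) & 0x3F),
            static_cast<int>((value >> 6) & 0x3F),
            static_cast<int>(value & 0x3F)};
}
```

For 'Z', 'o', 'r' (the first three bytes of the example later in this assignment), both functions give 22, 38, 61, 50.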

Base 64 notation uses 64 of the 95 printable characters to represent its digits. It makes sense to use the characters 0-9, A-Z, and a-z - but that only adds up to 62 characters total. Two more characters must be arbitrarily chosen to give 64 symbols total.

MIME Base64

To be able to distinguish between emails with plain text data and those with encoded binary data, another layer of functionality was added on top of SMTP called MIME (Multipurpose Internet Mail Extensions). The MIME standard guarantees that all binary data is converted to printable ASCII characters for actual transmission, marking the MIME-Type of each data block so it may be correctly converted back on the other end.

As mentioned, deciding which symbols represent a base 64 numbering system is somewhat arbitrary - in fact, there are several variations of base 64 encoding schemes that use different symbol sets. The MIME base 64 standard is called "base64" - one word. Other base 64 systems, such as "uuencode" and "binhex", are never referred to as "base64".

The base64 symbols representing the digit values 0 to 63 are, in order:

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/

Oddly enough, the value 0 is represented by 'A' rather than '0' - a '0' actually represents the value 52. 63 is represented by '/'.
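One way to capture this mapping, assuming you keep the symbols in a single string as the tips later suggest: the symbol for a value is simply the character at that index. The names kBase64Alphabet and value_to_symbol are made up for this sketch.

```cpp
#include <cstddef>
#include <string>

// The MIME base64 alphabet in digit order: the symbol for value v is kBase64Alphabet[v].
const std::string kBase64Alphabet =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: map a digit value 0-63 to its base64 symbol.
// at() throws std::out_of_range for anything outside 0-63
// (a negative value converts to a huge index and is rejected too).
char value_to_symbol(int value) {
    return kBase64Alphabet.at(static_cast<std::size_t>(value));
}
```

With this table, value_to_symbol(0) is 'A', value_to_symbol(52) is '0', and value_to_symbol(63) is '/'.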

There is one more character used with base64: the equals sign '='. Since every 3 data bytes are converted to exactly 4 base64 characters (3 * 8 bits == 4 * 6 bits == 24 bits), '=' signs are used to pad the end of a base64 sequence so that its total length is a multiple of 4 characters. There will be zero, one, or two equals signs at the end, which means that the final four characters encode three bytes, two bytes, or one byte, respectively.
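The padding rule boils down to two small formulas. This sketch (the helper names are hypothetical) computes the encoded length and the number of '=' signs for n input bytes.

```cpp
#include <cstddef>

// Number of base64 characters produced for n input bytes:
// every started group of 3 bytes becomes a full group of 4 characters.
std::size_t encoded_length(std::size_t n) {
    return 4 * ((n + 2) / 3);
}

// Number of '=' signs at the end: 0 if n is a multiple of 3,
// 2 if one byte is left over, 1 if two bytes are left over.
std::size_t padding_count(std::size_t n) {
    return (3 - n % 3) % 3;
}
```

For the 4-byte example "Zork" below, encoded_length(4) is 8 and padding_count(4) is 2, matching the "Wm9yaw==" output.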

Here is a sample of base64 encoding. Your code does not have to produce this output; I only present it to show you how the conversion works. Your code only needs to accept a string and display the converted output.

Original data: Zork

Character:    Z         o         r         k
ASCII code:   $5A       $6F       $72       $6B
In binary:    01011010  01101111  01110010  01101011

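Regroup as sets of 6 bits and convert each group to the appropriate base64 digit. Bits marked x are missing: a partly filled group is completed with 0 bits (so 11xxxx is read as 110000), and a group with no data bits at all becomes an '=' pad.

Groups of 8:    01011010 01101111 01110010 01101011
Groups of 6:    010110 100110 111101 110010 011010 11xxxx xxxxxx xxxxxx
In base 10:     22     38     61     50     26     48     N/A    N/A
Base64 digit:   W      m      9      y      a      w      =      =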

Base64 Output: Wm9yaw==
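The regrouping above can be reproduced with a bit buffer: feed in 8 bits at a time, take out 6 at a time, and zero-fill the last partial group. This C++ sketch (the name six_bit_groups is hypothetical) returns the base 10 values from the table and leaves the '=' padding to the caller.

```cpp
#include <string>
#include <vector>

// Split bytes into 6-bit groups, most significant bits first, exactly as in
// the worked example. A final partial group is completed with 0 bits
// ("11xxxx" becomes 110000 = 48). Pure padding groups are not produced.
std::vector<int> six_bit_groups(const std::string& data) {
    std::vector<int> groups;
    unsigned int buffer = 0;  // bits read but not yet emitted sit in the low end
    int bits = 0;             // how many of those low bits are valid
    for (unsigned char c : data) {
        buffer = (buffer << 8) | c;  // append 8 new bits
        bits += 8;
        while (bits >= 6) {          // emit every complete 6-bit group
            bits -= 6;
            groups.push_back(static_cast<int>((buffer >> bits) & 0x3F));
        }
    }
    if (bits > 0) {  // 2 or 4 bits left: shift them up and zero-fill the rest
        groups.push_back(static_cast<int>((buffer << (6 - bits)) & 0x3F));
    }
    return groups;
}
```

Here six_bit_groups("Zork") returns {22, 38, 61, 50, 26, 48}; since "Zork" is 4 bytes, two '=' signs complete the output.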

Requirements

Write one or two C/C++ programs that allow a message to be typed in and converted to base64 and back. Converting to base64 takes two major steps: first extract the four 6-bit numbers from each 3-byte set, then look up which symbols they correspond to. To convert from base64, determine the value of each base64 symbol and recombine each set of four 6-bit values into three bytes.
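Here is a minimal sketch of both encoding steps in C++, assuming the input is held in a std::string and that no line wrapping is needed (MIME normally breaks long base64 text into lines of at most 76 characters, which this sketch does not do). The name to_base64 is hypothetical; this is one possible approach, not the required structure for your program.

```cpp
#include <cstddef>
#include <string>

// Encode arbitrary bytes as base64 text (no line wrapping).
std::string to_base64(const std::string& input) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    std::size_t i = 0;
    // Every complete 3-byte set: pack 24 bits, then emit four 6-bit symbols.
    for (; i + 3 <= input.size(); i += 3) {
        unsigned int triple = (static_cast<unsigned char>(input[i]) << 16) |
                              (static_cast<unsigned char>(input[i + 1]) << 8) |
                              static_cast<unsigned char>(input[i + 2]);
        out += kAlphabet[(triple >> 18) & 0x3F];
        out += kAlphabet[(triple >> 12) & 0x3F];
        out += kAlphabet[(triple >> 6) & 0x3F];
        out += kAlphabet[triple & 0x3F];
    }
    // One or two leftover bytes: missing bits are 0, missing symbols are '='.
    std::size_t rest = input.size() - i;
    if (rest > 0) {
        unsigned int triple = static_cast<unsigned char>(input[i]) << 16;
        if (rest == 2) {
            triple |= static_cast<unsigned char>(input[i + 1]) << 8;
        }
        out += kAlphabet[(triple >> 18) & 0x3F];
        out += kAlphabet[(triple >> 12) & 0x3F];
        out += (rest == 2) ? kAlphabet[(triple >> 6) & 0x3F] : '=';
        out += '=';
    }
    return out;
}
```

With this sketch, to_base64("Zork") returns "Wm9yaw==" and to_base64("Man") returns "TWFu".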

Tips:

Determine the bitwise operations necessary to extract or recombine a single set of values, then reapply that operation to every set. For example, you might write a function that takes a string of 3 unencoded characters and places them into an array of 4 encoded characters, and a function that takes 4 encoded characters and places them into an array of 3 unencoded characters.
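A sketch of the pair of group functions described in this tip. To keep the bit manipulation in view, these versions work with the 6-bit values (0-63) rather than the symbols, and encode_group takes an extra count argument saying how many of the three input bytes are real. Both names and the count parameter are assumptions, not part of the starter code.

```cpp
// Hypothetical helper: 3 bytes -> four 6-bit values.
// `count` (1-3) is how many input bytes are real; missing bytes count as 0,
// so the partial group is zero-filled as in the worked example.
void encode_group(const unsigned char in[3], int count, int out[4]) {
    unsigned int b0 = in[0];
    unsigned int b1 = count > 1 ? in[1] : 0;
    unsigned int b2 = count > 2 ? in[2] : 0;
    out[0] = b0 >> 2;                         // top 6 bits of byte 0
    out[1] = ((b0 & 0x03) << 4) | (b1 >> 4);  // low 2 of byte 0, top 4 of byte 1
    out[2] = ((b1 & 0x0F) << 2) | (b2 >> 6);  // low 4 of byte 1, top 2 of byte 2
    out[3] = b2 & 0x3F;                       // low 6 bits of byte 2
}

// Hypothetical helper: four 6-bit values -> 3 bytes (the inverse of encode_group).
void decode_group(const int in[4], unsigned char out[3]) {
    out[0] = static_cast<unsigned char>((in[0] << 2) | (in[1] >> 4));
    out[1] = static_cast<unsigned char>(((in[1] & 0x0F) << 4) | (in[2] >> 2));
    out[2] = static_cast<unsigned char>(((in[2] & 0x03) << 6) | in[3]);
}
```

When count is 1 or 2, write '=' in place of the last 3 - count values before printing. Feeding decode_group the values 22, 38, 61, 50 gives back 'Z', 'o', 'r'.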

To convert a number between 0-63 into its proper symbol, use the number as an index into a string containing all the symbols in order.

To convert a base64 symbol back into a number 0-63, find the index it's located at in a string containing all the symbols in order.
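The reverse lookup can be sketched with std::string::find, which returns std::string::npos when the character is not present. Returning -1 for anything that is not one of the 64 symbols (the name symbol_to_value and the -1 convention are assumptions) makes it easy to skip stray characters when decoding.

```cpp
#include <string>

// Hypothetical helper: the value 0-63 of a base64 symbol is its position
// in the alphabet string; -1 means "not a base64 symbol" (e.g. '=', '\n').
int symbol_to_value(char symbol) {
    static const std::string kAlphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string::size_type pos = kAlphabet.find(symbol);
    return pos == std::string::npos ? -1 : static_cast<int>(pos);
}
```

A linear search over 64 characters is fast enough for this assignment; a 256-entry lookup table indexed by the character is the usual faster alternative.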

When converting to base64, encode every character of the input. When converting from base64, you can ignore any character other than the 64 valid symbols and the equals sign.
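Combining the decoding tips above, here is a hedged C++ sketch (from_base64 is a hypothetical name) that skips characters outside the alphabet, such as newlines, and stops at the first '=', on the assumption that '=' only ever appears as padding at the end of the data.

```cpp
#include <string>

// Decode base64 text back to bytes. Characters that are not base64 symbols
// are skipped; the first '=' ends the data because it is only padding.
std::string from_base64(const std::string& text) {
    static const std::string kAlphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    unsigned int buffer = 0;  // accumulated 6-bit values in the low bits
    int bits = 0;             // number of valid bits not yet output
    for (char c : text) {
        if (c == '=') {
            break;  // padding: nothing more to decode
        }
        std::string::size_type value = kAlphabet.find(c);
        if (value == std::string::npos) {
            continue;  // ignore newlines, spaces, and other stray characters
        }
        buffer = (buffer << 6) | static_cast<unsigned int>(value);
        bits += 6;
        if (bits >= 8) {  // a full byte is available
            bits -= 8;
            out += static_cast<char>((buffer >> bits) & 0xFF);
        }
    }
    // Any 2 or 4 leftover bits are the zero fill added during encoding.
    return out;
}
```

In this sketch from_base64("Wm9yaw==") returns "Zork", and so does the same text with a line break in the middle.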

See the entry for "base64" on Wikipedia for more examples. You're welcome to use the code at the bottom of this assignment as a starting point for your program. It will compile and run; all it does is take input, split it into groups of three characters, reverse each group, and add a '-' after it. You only need to modify the 'encode' function to take groups of three characters and encode them into groups of four base64 characters. Don't forget to handle the special case where you only get one or two characters instead of three, in which case you need to pad with '=' signs appropriately. Once you have encoding working, you can take the same starter code and modify it to do decoding. You'll have to modify the code in 'main' to divide input into groups of four and receive output in groups of three, and then write a version of 'encode' that decodes instead.

It's OK with me if you combine both programs into a single one, but you don't have to. Be sure to tell me in your report how to run your programs, though, so I can test them. I won't guarantee to take the time to figure it out from your source code, so you might lose points if I can't test your program.

Project Report

The final step of this assignment is to create a report consisting of a cover page, an overview of the project, sample output, and the source code. See Assignment Policies on either the class website or Bb Learn.

Starter Framework (in C++)

//=============================================================================
// Base64 Encoding Starter Framework
// 2011.02.16 by Abe Pralle
//
// Reads a string of text and prints a resulting string of text where every
// 3 original characters have been transformed into 4 result characters
// consisting of the first three characters reversed followed by a hyphen.
//
// Example output:
// Enter text: ABCDEFGHIJKLMNOPQRSTUVWXYZ
// You typed in "ABCDEFGHIJKLMNOPQRSTUVWXYZ" (26 characters)
// Encoded value: CBA-FED-IHG-LKJ-ONM-RQP-UTS-XWV-
//=============================================================================
#include <iostream>
using namespace std;

void encode( unsigned char* src, unsigned char* dest );

int main()
{
    // Declare arrays to store original and encoded strings.
    unsigned char st[80];
    unsigned char encoded[120];

    // Read in original string.
    cout ................
................
