CSCE 2014 – Programming Project 3

Midpoint Due Date – 02/24/2021 at 11:59pm

Final Due Date – 03/03/2021 at 11:59pm

1. Problem Statement:

The goal of this programming assignment is to gain experience with recursive

binary search, and also reading and processing text files. Your task is to write a

program that can take in an English book and create an abridged version of the book

written using only the 1000 most common English words, proper names, and basic

punctuation marks. All other words in the original book should be removed.

You will be given a data file "top1000.txt" with the top 1000 most commonly used

English words based on frequency analysis of a collection of novels. Each line in the

data file consists of an integer rank (1 = most common, 1000 = least common)

followed by the word. To simplify your analysis, all of the upper case letters have

been converted to lower case, and the file is sorted in alphabetical order.
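
As a rough illustration of the file format described above, one line of the data
file could be split into its rank and its word as shown here. This is only a sketch:
the helper name parse_entry is hypothetical, and it assumes the rank and the word
are separated by whitespace.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not part of the provided sample code): splits one
// line of the data file, assumed to look like "<rank> <word>", into its
// two fields. Returns false if the line does not match that shape.
bool parse_entry(const std::string &line, int &rank, std::string &word)
{
    std::istringstream in(line);
    return static_cast<bool>(in >> rank >> word);
}
```

A loop that calls a helper like this once per line, storing each rank and word at
the same index of two arrays, would keep the words in the alphabetical order that
binary search requires.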

To create the abridged version of an input document, you must read the words one

at a time, convert the word to lower case, and use binary search to look the word up

in your dictionary data structure to see if it falls in the top 1000. If it does, the

original word should be printed in the abridged book.

In order to determine if a word is a proper name, you should look at the first letter

in the word, and if it is a capital letter and the word is not the first word in a

sentence, then you can assume the word is a proper name, and you should print the

proper name in your abridged book. This approach may not be perfect, but it should

work most of the time.
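
The proper name rule above might be sketched as follows. The function name
is_proper_name and the starts_sentence flag are assumptions made for illustration.
The handout does not say how to detect the start of a sentence, so this sketch
leaves that decision to the caller.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: true when `word` begins with a capital letter and
// is not the first word of a sentence. The caller is assumed to track
// `starts_sentence`, for example by setting it to true for the first word
// of the file and for any word that follows '.', '?' or '!'.
bool is_proper_name(const std::string &word, bool starts_sentence)
{
    if (word.empty())
        return false;
    bool capitalized = std::isupper(static_cast<unsigned char>(word[0])) != 0;
    return capitalized && !starts_sentence;
}
```

With this rule, a capitalized word such as "Anne" in the middle of a sentence is
kept, while the same word at the start of a sentence is kept only if its lower case
form is found in the dictionary.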

Once you have the top 1000 words and the proper names printed properly, you can

extend your program to print the five most common punctuation marks (period,

comma, semicolon, question mark, and exclamation point). You can ignore all other

ASCII characters, and print a space in place of each one in the abridged book.

To test your program, you will be given five samples of text taken from the

beginnings of five well-known public domain books. You can use these short

documents to create your abridged books. Hopefully we will be able to recognize

and understand the abridged book created by your program.

book1.txt from Anne of Green Gables.

book2.txt from David Copperfield.

book3.txt from Adventures of Huckleberry Finn.

book4.txt from The Time Machine.

book5.txt from The Jungle Book.

2. Design:

There are two essential problems you need to solve in order to complete your

program. First, you need some way to read and store all 1000 words and their

corresponding ranks, and some way to search this data structure to look up a word

and find its rank. The most natural way to do this would be to create a "Dictionary"

class that has a "read_file" method to load words and their ranks into private arrays,

and a "binary_search" method to recursively search these arrays to look up the rank

of a word. To test and debug this class, you can write a tiny main program that

prompts the user for words, and prints out their corresponding ranks. See the

programs "dictionary.cpp" and "numbers2.cpp" in the source directory for some

sample code.
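
For reference, the recursive lookup described above could look roughly like the
sketch below. It is written as a free function over two parallel arrays rather than
as a method of the "Dictionary" class, and the name lookup_rank, the parameter list,
and the use of -1 to mean "not found" are all assumptions, not requirements of the
project.

```cpp
#include <string>

// Recursive binary search over words[low..high], which must be sorted in
// ascending (alphabetical) order. ranks[i] is assumed to hold the rank of
// words[i]. Returns the rank of `target`, or -1 if it is not present.
int lookup_rank(const std::string words[], const int ranks[],
                const std::string &target, int low, int high)
{
    if (low > high)
        return -1;                     // empty range: word is not in the list

    int mid = low + (high - low) / 2;  // midpoint of the current range

    if (target == words[mid])
        return ranks[mid];             // found the word
    if (target < words[mid])
        return lookup_rank(words, ranks, target, low, mid - 1);   // left half
    return lookup_rank(words, ranks, target, mid + 1, high);      // right half
}
```

The first call would pass 0 and count - 1 as the range, where count is the number
of words that were read from the data file. Each recursive call halves the range,
so a list of 1000 words needs at most about 10 comparisons per lookup.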

The second problem you must address is the reading and processing of the input

document. At one level, this is relatively easy because you just need to read an input

file one word at a time until the end of file is reached, and look up each word using

your Dictionary class above to see if the word is in the top 1000 or not. The tricky

part of this process is dealing with upper case letters, numbers, and other characters

in the input file. You need to convert upper case letters [A..Z] to lower case letters

[a..z], and remove all other characters from the word before you look the word up in

the dictionary. If the character that was removed was one of the five punctuation

marks listed above, you should print this character after the word is processed.
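
One possible way to do this cleanup is sketched below. The helper name clean_word
and its output parameter are hypothetical, and the sketch assumes the default "C"
locale, in which only [A..Z] and [a..z] count as letters.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: returns `token` with its letters converted to lower
// case and every other character removed. Any of the five punctuation
// marks that were removed are stored in `punctuation`, in the order they
// appeared, so the caller can print them after the word.
std::string clean_word(const std::string &token, std::string &punctuation)
{
    const std::string marks = ".,;?!";
    std::string cleaned;
    punctuation.clear();

    for (char ch : token)
    {
        unsigned char uch = static_cast<unsigned char>(ch);
        if (std::isalpha(uch))
            cleaned += static_cast<char>(std::tolower(uch));
        else if (marks.find(ch) != std::string::npos)
            punctuation += ch;
    }
    return cleaned;
}
```

The main loop would still need the original token, since the abridged book prints
the word as it appeared in the input and the proper name test looks at the original
first letter.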

3. Implementation:

To implement your project, you should break your code down into multiple files

using techniques discussed in lab and in class. You are welcome to look at programs

on the class website for sample code to assist in the implementation of this project.

As always, it would be a good idea to start with "skeleton methods" to get something

to compile, and then add the desired code to each method incrementally, writing

comments, adding code, compiling, and debugging a little bit at a time. Once you have

the methods implemented, you can create a main program with a simple menu

interface that calls these methods to complete your project.

Remember to use good programming style when creating your program (good

names for variables and constants, proper indenting for loops and conditionals,

clear comments). Be sure to save backup copies of your program somewhere safe.

Otherwise, you may end up retyping your whole program if something goes wrong.

4. Testing:

Once your program is fully debugged, copy your source code and test files to turing

using FileZilla or a similar tool. Then log in to turing and do the following to

complete your program testing:

• Type "script" to start recording your testing session.

• Type "g++ -Wall *.cpp -o project3" to compile your project.

• Type "./project3" to run your program.

• Type in the name of the book file you want to process.

• Your program should print the abridged book to the screen.

• Run your program again to process a second book.

• Type "exit" to finish recording your testing session.

• Copy the file "typescript" from turing onto your local computer.

• Include "typescript" with your code and project report when you upload the project into Blackboard.

5. Documentation:

When you have completed your C++ program, write a short report using the project

report template describing what the objectives were, what you did, and the status of

the program. Does it work properly for all test cases? Are there any known

problems? Save this report to be submitted electronically.

6. Project Submission:

In this class, we will be using electronic project submission to make sure that all

students hand in their programming projects and labs on time, and to perform

automatic plagiarism analysis of all programs that are submitted.

When you have completed the tasks above, copy all of your source code, your

"typescript" file, and your project documentation into a folder called "project3".

Compress this directory into a single ZIP file called "project3.zip", and upload this

ZIP file into Blackboard. The GTAs will download and unzip your ZIP file and

compile your code using "g++ -Wall *.cpp" and test it to verify correctness.

The dates on your electronic submission will be used to verify that you met the due

date above. All late projects will receive reduced credit:

• 10% off if less than 1 day late,

• 20% off if less than 2 days late,

• 30% off if less than 3 days late,

• no credit if more than 3 days late.

You will receive partial credit for all programs that compile even if they do not meet

all program requirements, so handing projects in on time is highly recommended.

7. Academic Honesty Statement:

Students are expected to submit their own work on all programming projects,

unless group projects have been explicitly assigned. Students are NOT allowed to

distribute code to each other, or copy code from another individual or website.

Students ARE allowed to use any materials on the class website, or in the textbook,

or ask the instructor and/or GTAs for assistance.

This course will be using highly effective program comparison software to calculate

the similarity of all programs to each other, and to homework assignments from

previous semesters. Please do not be tempted to plagiarize from another student.

Violations of the policies above will be reported to the Provost's office and may

result in a ZERO on the programming project, an F in the class, or suspension from

the university, depending on the severity of the violation and any history of prior

violations.
