Homework Assignment #4 - Strings and Files



CS-1004, Introduction to Programming for Non-majors, D-term 2014
Assigned: September 18, 2014
Due: September 25, 2014, 6:00 PM

Homework 4: Strings and Files

Objectives

  - Become familiar with using strings, lists, and files
  - Write a program in more than one module

Assignment: Create a List of Distinct Words in a Set of Text Documents

Write a program that opens and reads one or more English text files, scans each one to accumulate a list of the individual words appearing in that file, and then, once all input files have been scanned, writes to another file the list of unique words appearing in the entire set of input files, in alphabetical order, along with counts denoting the number of occurrences of each word.

Your program should ignore the case of words. For example, ‘This’ and ‘this’ are the same word. On the other hand, singular and plural words are usually different (for example, ‘file’ and ‘files’), as are different verb forms such as ‘eat’ and ‘eats.’

A sample output file should look like the following:

     166 a
      25 and
      11 as
      15 each
       2 file
       4 files
     109 in
       4 input
      98 it
      99 of
       3 open
       6 program
      18 read
     152 the
      41 this
       3 under
      30 would
    -------------
      17 Total number of different words

To allow for very long input files, the field width of the number of occurrences of each word should be at least four decimal digits. The counts need to be right-justified in the output. You should also total and print the number of distinct words.
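
The formatting requirement above (counts right-justified in a field at least four digits wide, followed by a total line) might be met along the following lines. This is only a sketch, not the required solution: the function names format_count_line and format_report are hypothetical, the (word, count) pair layout is an assumption, and the 13-dash separator is simply copied from the sample output.

```python
def format_count_line(count, word, width=4):
    """Return one output line: the count right-justified in `width`
    columns, a space, then the word. (Hypothetical helper name.)"""
    # The nested field {1} supplies the field width to the format spec.
    return "{0:>{1}d} {2}".format(count, width, word)


def format_report(word_counts, width=4):
    """Build the whole report from a list of (word, count) pairs.
    Assumes the pairs are already in alphabetical order."""
    lines = [format_count_line(count, word, width)
             for word, count in word_counts]
    lines.append("-" * 13)  # separator length copied from the sample output
    lines.append(format_count_line(len(word_counts),
                                   "Total number of different words",
                                   width))
    return "\n".join(lines) + "\n"


# Small demonstration with made-up counts.
sample = [("a", 166), ("and", 25), ("as", 11)]
print(format_report(sample), end="")
```

A count with more digits than the field width is not truncated; format() simply widens the field for that line, so a larger default width may be preferable for very long inputs.
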
All of the words should be in lower case.

For the basic problem, you may treat hyphenated words as distinct words; for example, ‘deep-seated’ would be treated as ‘deep’ and ‘seated.’ You may also ignore possessives and contractions such as “don’t” and “Bob’s” by throwing away the apostrophes.

For extra credit, after you get the rest of the program working, you may go back and handle these so that they show up as a user might expect.

For debugging, you may use the following text files:

For fun, you may use any of Shakespeare’s plays, which can be downloaded from the Internet.

Structure and organization

Structure this homework in three or more separate .py modules. No individual function in any file may be more than one screen in length.

  - One module should handle the opening and scanning of files.
  - A separate module should handle the accumulation of words, duplicate elimination, and counting.
  - Yet a third module should organize and write out the output information.

In addition, it is suggested that you have a separate “wrapper” module for invoking your program with test cases. During development, you will want the wrapper to prompt you for simple test cases. However, when the program is finished, it should read the names of files from the command line (in Macintosh and Linux) or from a command prompt (in Windows). The format of the command line should be

    python HW4.py outputFile inputFile1 inputFile2 ...

That is, HW4.py is the name of the Python program to be executed. The next command line argument is the name of the output file to be created or overwritten. Following that are one or more input file names, each one of which is to be read and scanned for text.

Suggestions

A useful approach is to break apart the input text into individual words, strip out the punctuation, convert the words to lower case, and then store them in a list. Once the list has been accumulated, sort it using the list method sort().
This will, of course, result in a lot of duplicate words, but they will now be in order.

In a separate function, go through the list. For each word, count the duplicates, and then add the word and its count to a new list. At the end of this process, the second list will contain exactly the information you need to write to the output file.

The output file needs to be formatted correctly, with the word counts right-justified on each line. It is suggested that you use the string method format(), described in §5.8.2 of the textbook.

Command line arguments

Most programs in real life take commands from a command line or a command prompt. The general format of such a command is

    commandName argument1 argument2 argument3 ...

The command name is, by convention, argument0 and is, in general, the name of the file containing the program that implements the command. Each argument is a string. Also by convention, the operating system loads the file containing the command, prepares a list of the arguments along with a count, and then calls the function named main(), passing the list and count to it. The main() function then picks apart the list of arguments, readies the program, and runs it, passing the argument strings to its various internal functions.

Command line arguments are supported by Windows, Macintosh, Linux, and most other systems. Graphical user interfaces (GUIs) typically associate specific file types with specific programs. For example, .docx is associated with Microsoft Word, .pdf is associated with Adobe Acrobat, and .py is associated with Python. In most cases, “opening” a file in the GUI causes the operating system to create a command line containing the associated program name and the file name and then to “run” that command as if it had been typed into a command shell or command prompt.

In some installations, Python programs can be run this way, by simply double-clicking or opening them in the GUI. But in many cases, Python is not sufficiently integrated with the GUI.
Therefore, it has to be run separately from a command line or command prompt. In this case, you would run the program as follows:

    python3 HW4.py argument1 argument2 argument3 ...

The number of arguments is, of course, variable and depends upon the program.

To get access to these arguments in Python, import the sys module. The arguments are then available in a list named sys.argv. (“argv” is a traditional name used in Unix and Linux meaning “argument vector.”) Note that sys.argv[0] is the file name of your “main” Python program, i.e., the file that is “run” by the Python interpreter. The length of the list is len(sys.argv).

Grading

This project is to be carried out in two-person teams. Both team members will receive the same project grade.

This assignment is worth 100 points, allocated as follows:

  - Logical organization into at least three separate .py files – 10 points
  - Wrapper for testing, prompting user – 10 points
  - Input – 30 points
      - Opening file for reading (7)
      - Scanning for words, insertion into list (20)
      - Closing file and handling multiple input files (8)
  - Output – 50 points
      - Combining lists from individual inputs (5)
      - Sorting (10)
      - Removing duplicates and counting (15)
      - Formatting individual words using string method format() (10)
      - Writing output, closing the file, cleaning up (10)

Extra Credit

  - Handling contractions and possessives, including enough different test cases to show that this works – 10 points
  - Treating hyphenated words as one, including test cases to show that this works – 5 points
  - Implementing command line arguments – 10 points

Penalties

  - Penalty for any function longer than 50 lines – 10 points per function
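
For the optional command line handling, the description of sys.argv in the section on command line arguments could be turned into something like the sketch below. It assumes the argument order given in the assignment (output file first, then one or more input files). The function name parse_arguments and the usage message are hypothetical, not part of the assignment.

```python
import sys


def parse_arguments(argv):
    """Split an argument list shaped like sys.argv into
    (output_name, input_names). (Hypothetical helper name.)

    Expected layout, following the assignment's command line:
        [program_name, output_file, input_file_1, input_file_2, ...]
    Returns None when the output file or all input files are missing.
    """
    if len(argv) < 3:
        return None
    output_name = argv[1]   # argv[0] is the program file itself
    input_names = argv[2:]  # every remaining argument is an input file
    return output_name, input_names


if __name__ == "__main__":
    parsed = parse_arguments(sys.argv)
    if parsed is None:
        print("usage: python HW4.py outputFile inputFile1 inputFile2 ...")
    else:
        output_name, input_names = parsed
        print("output file:", output_name)
        print("input files:", input_names)
```

Passing the list in as a parameter, rather than reading sys.argv inside the function, lets a wrapper module call parse_arguments with hand-built test lists during development.
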