


Unix Lab Assignment 8

CSC332

UNIX Operating System

Name _________________________________

This lab will discuss the sort, diff, uniq, and grep commands.

Log on to your UNIX account. Type in:

            pwd

What was the response to this command? ______________________________________________

You should have the "absolute path name" from the root to your home directory. In the directory structure in Lab 3, the person's home directory was listed as /home/srp. In this example, the "/home" is the partition or group name and the "/srp" is the login name.

You will need the absolute path name that you received when you typed in the pwd above to copy the file employee to another directory. First, make a new directory. Type in:

            mkdir lab8

Change your working directory to lab8 by typing in:

            cd lab8

Verify that you are in subdirectory lab8. What command did you use to do this? _____________________________________________

Use vi or another editor to create a new file named employee with the following contents:

mgt Cooper John 06151995 66000

mgt Davidson Darla 04151992 69500

mgt MacDonald George 06151985 70000

act Smith Thomas 04102002 56000

act Smith Alecia 04121991 65000

mis MacLeod Janice 01021977 90000

mis Mack Joe 02252003 85000

mis Winslow Sarah 02151995 58000

adm Smith Dexter 01021975 100000

mis Bennett Joan 08152001 79000

mgt Neason Elizabeth 10251998 65500

act NeSmith Donald 11301966 99500

Then close the file and type in:

            ls

The file employee contains information on employees. The column titles and table borders below are shown only for your information and do not appear in the file. The file is laid out as follows:

|Dept |Last Name |First Name |Date Hired |Salary |

|mgt |Cooper |John |06151995 |66000 |

|mgt |Davidson |Darla |04151992 |69500 |

|mgt |MacDonald |George |06151985 |70000 |

|act |Smith |Thomas |04102002 |56000 |

|act |Smith |Alecia |04121991 |65000 |

|mis |MacLeod |Janice |01021977 |90000 |

|mis |Mack |Joe |02252003 |85000 |

|mis |Winslow |Sarah |02151995 |58000 |

|adm |Smith |Dexter |01021975 |100000 |

|mis |Bennett |Joan |08152001 |79000 |

|mgt |Neason |Elizabeth |10251998 |65500 |

|act |NeSmith |Donald |11301966 |99500 |

In order to see what the file looks like, type in:

        cat employee

Sort Command

The sort command sorts a file on a line-by-line basis. If the first characters on two lines are the same, sort looks at the second characters to determine the proper order. This process continues until sort finds a character that differs between the lines. If two lines are identical, it does not matter which one sort puts first. Sort uses the machine collating sequence, which in this case is ASCII code. Two important features of ASCII code are that capital letters sort before lowercase letters and that digits sort before letters. A copy of the ASCII code chart is attached to the end of this lab for your reference.

An important point to remember is that sort is a filter: it does not change the contents of the input file. It reads the specified input and writes it out in sorted order. If the output is not redirected to a file, it goes to stdout, which means the terminal screen, and the sorted output is not saved.
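The ordering rules above can be tried on a tiny file before you work with employee. This is only a sketch: the file name words and its contents are made up for illustration, and LC_ALL=C is used on the assumption that your system might otherwise sort by the rules of its language setting instead of plain ASCII.

```shell
# Hypothetical scratch directory and file, used only for this illustration.
tmp=$(mktemp -d)
printf '%s\n' banana Apple 42 apple > "$tmp/words"

# LC_ALL=C asks for plain ASCII (byte) ordering for this one command.
sorted=$(LC_ALL=C sort "$tmp/words")
echo "$sorted"        # digits first, then capitals, then lowercase

# sort is a filter: the input file still has its original order.
first_line=$(head -n 1 "$tmp/words")
echo "$first_line"    # still "banana"
```

To keep the sorted result, you would redirect it to a file, for example by adding > sorted_words to the end of the sort command.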

First, type the following command:

            sort employee

What is the order that employee is sorted in? ___________________________________________

Give a brief description of how the file is sorted. _____________________________________________________________________________________________________________________________________________________________________________________________________

The first sort puts the file in alphabetical order by the Dept field. If no options are specified, sort compares lines from their first character using the machine's collating sequence (ASCII sequence on the SUN system). Listed below are some options that you will use to control the way in which the sort command works. Other options are explained in the man page for sort.

-b     ignore leading blanks - Blanks (TAB and SPACE characters) are normally field delimiters in the input file. Unless you use this option, sort also considers leading blanks to be part of the field they precede.

-f     fold lowercase into uppercase - This considers all lowercase letters to be uppercase letters. Use this option when you are sorting a file that contains both uppercase and lowercase text.

-n     numeric sort - When you use this option, minus signs and decimal points take on their arithmetic meaning and the -b option is implied. The sort utility does not order lines or order sort fields in the machine collating sequence but in arithmetic order.

-r     reverse - Reverses the order of the sort. If it is a numerical sort, the output will be in descending order.

You can sort on the different line fields. There are five line fields (dept, last, first, date hired, and salary). These fields are sequences of characters bounded by blanks or by a blank and the beginning or end of a line. You can use these line fields to define a sort field, and you can instruct sort to skip several fields if you wish. Field numbers start at zero: the first field is +0, the second field is +1, the third is +2, and so on. Think of this number as the number of fields to skip before beginning the sort. If you wish to sort on the first name field, you would use sort +2 employee.
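The +N notation is the older way of naming a sort field, and some newer versions of sort may not accept it. The standard -k option names the same field but counts from 1 instead of 0, so +2 corresponds to -k 3. The sketch below uses a made-up three-line file named staff in the same five-field layout as employee.

```shell
# Hypothetical file in the same layout as employee (dept last first date salary).
tmp=$(mktemp -d)
printf '%s\n' \
  'mgt Young Carl 03011990 50000' \
  'act Adams Zoe 07041985 60000' \
  'mis Baker Amy 12252000 55000' > "$tmp/staff"

# -k 3 starts the sort field at the third field (first name),
# which is the same idea as skipping two fields with +2.
by_first=$(LC_ALL=C sort -k 3 "$tmp/staff")
echo "$by_first"
```

If sort +2 employee gives an error on your machine, sort -k 3 employee is the form to try.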

Now, sort on the field for last name.

        sort +1 employee

Look at the sorted file. Are all the names sorted in alphabetical order? ______________________

Give a brief description of the output.______________________________________________________________________________________________________________________________________________________________________________________________________________

There is a problem because MacLeod comes before Mack. Sort put "L" before "k" because it arranges lines in the order of ASCII character codes. See the last page of this lab to see the ASCII values associated with the alpha characters.  Also NeSmith comes before Neason.  

In this ordering, uppercase letters come before lowercase letters. You can use the -f option to have the sort command ignore this and sort alphabetically. This option will fold the lowercase letters into uppercase letters and correct the problem from the sort above. The option is placed before the field number that is to be used in the sort. Sort the file again using the following command:

            sort -f +1 employee

What happens when you sorted it this time? ________________________________________________________________________________________________________________________________________________________________________________________________________
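The effect of -f can also be seen on two made-up names that differ only in the case of one letter. This is a sketch with a hypothetical file called names; it uses the -k form of the field number and LC_ALL=C so that the plain sort follows ASCII order.

```shell
# Hypothetical two-line file: dept, last name, first name.
tmp=$(mktemp -d)
printf '%s\n' 'act Dean Ruth' 'act DeWitt Omar' > "$tmp/names"

# Plain ASCII order: capital W (code 87) sorts before lowercase a (code 97),
# so DeWitt lands ahead of Dean.
plain=$(LC_ALL=C sort -k 2 "$tmp/names")

# With -f, lowercase is folded to uppercase before comparing: A comes before W.
folded=$(LC_ALL=C sort -f -k 2 "$tmp/names")

echo "$plain"
echo "$folded"
```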

The next sort will be on the "date hired" field. Also save these next three sort routines into files by using the redirect symbol. Remember that the redirect symbol saves, to a file, output that would normally go to the screen (stdout). Type in:

            sort +3 employee > hired1

Use the cat command to list out the file hired1 to see the results. Are the hire dates sorted in order? _______________  

If not, what has happened? _____________________________________________________________________________________________________________________________________________

This sort did not put the numbers in order but put the shortest name first in the sorted list and longest name last. With the +3, sort skips the first three line fields and counts the spaces or blanks after the third line field as part of the sort field. The ASCII value of a space character is less than that of any other printable character, so sort puts the date hired that is preceded by the greatest number of spaces first.

One solution to this problem is to eliminate the leading blanks by using the -b option. However, since date hired is a numeric field, you can use the -n option instead, which also ignores leading blanks.
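The leading-blank problem is easy to reproduce on a small scale. In this sketch the file padded is made up, and its first line has extra blanks in front of the number. The -k 2,2 form means "sort on field 2 only".

```shell
# Hypothetical file: the first line has three blanks before its number.
tmp=$(mktemp -d)
printf '%s\n' 'a   2' 'bb 1' > "$tmp/padded"

# Leading blanks count as part of the field, and a blank sorts before a digit.
with_blanks=$(LC_ALL=C sort -k 2,2 "$tmp/padded")

# -b ignores the leading blanks, so the fields compare as "2" and "1".
no_blanks=$(LC_ALL=C sort -b -k 2,2 "$tmp/padded")

# -n compares the fields as numbers, which also skips the leading blanks.
numeric=$(LC_ALL=C sort -n -k 2,2 "$tmp/padded")

echo "$with_blanks"
echo "$no_blanks"
echo "$numeric"
```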

Type in:

            sort -n +3 employee > hired2

What is the result of the sort? _______________________________________________________________________________________________________________________________________________

When you sorted on the date hired, it basically sorted on the month hired. The numeric sort treats this date field as a single number and sorts in true numeric order. Hence, month 01 (Jan) comes before month 02 (Feb) regardless of the day and year of hire. You may want to sort on the year hired instead. This can be done because you can skip characters within a line field as well as whole line fields. The +3.4 skips three line fields and then skips four characters (the month and day) before it sorts. Remember that you must take care of the blank spaces also, which is what the b option does. Type in:

            sort -nb +3.4 employee > hired3

What was the result?   ________________________________________________________________________________________________________________________________________________

Briefly explain what happened. ____________________________________________________________________________________________________________________________________________

Unfortunately, sort works differently on different machines. Be careful with sort until you understand how it behaves on the machine you are using.
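On a machine whose sort does not accept +3.4, the same idea can be written with -k, where both fields and characters are counted from 1. The sketch below assumes a made-up file named staff with the date in MMDDYYYY form, so the year is characters 5 through 8 of the fourth field.

```shell
# Hypothetical file in the same layout as employee.
tmp=$(mktemp -d)
printf '%s\n' \
  'mgt Young Carl 03011990 50000' \
  'act Adams Zoe 07041985 60000' \
  'mis Baker Amy 12252000 55000' > "$tmp/staff"

# Field 4, characters 5 through 8 (the year), compared as a number.
# -b makes the character count start at the first nonblank of the field.
by_year=$(LC_ALL=C sort -b -n -k 4.5,4.8 "$tmp/staff")
echo "$by_year"
```

Here -k 4.5 plays the part of +3.4: skip three fields, then skip four characters.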

Sorting on more than one field

UNIX also allows you to sort on more than one field. One example would be to sort on the department and on salary. We can see if people in one department are getting paid more than other departments. Of course there are other things that may be factors on salary such as years on the job. Type in:

            sort +0 +4n employee

What was the result? ________________________________________________________________________________________________________________________________________________

Were both columns sorted? __________________________

As you can see, there are complications! First of all, note that the n was used after the 4. That is because the first field is alphabetic and we do not want to sort the entire file with a numeric sort. So in this case, the n is used after the field number.

Next, +0 with no ending position tells sort that the first sort field starts at the first field and runs to the end of the line. Look at the first field and second field in the first two lines. The sort routine compares the first field and, when the departments match, goes on to the second field. The names differ, so the order is settled there and sort never reaches the salary field. In order to stop the sort routine from going past the +0 field, you need to define where the first sort field ends. In this case, it will be -1 (the digit one), which stops the first sort field before field +1. Sort then goes on to the salary field to order lines that have the same department. Type in the next command.

            sort +0 -1 +4n employee

What were the results of this command? Was the file sorted on both the department and the salary field?
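The difference between an open-ended sort field and a bounded one can be sketched with the -k form, where -k 1,1 means "field 1 only" and -k 5,5n means "field 5 only, numeric". The file staff and its contents are made up for this illustration.

```shell
# Hypothetical file: two departments, two people in each.
tmp=$(mktemp -d)
printf '%s\n' \
  'mis Baker Amy 12252000 90000' \
  'act Adams Zoe 07041985 60000' \
  'mis Cole Ben 05051999 8000' \
  'act Diaz Eva 01011991 100000' > "$tmp/staff"

# Open-ended first field: it runs to the end of the line, so ties are
# broken by the names and the salary field is never consulted.
unbounded=$(LC_ALL=C sort -k 1 -k 5n "$tmp/staff")

# Bounded first field: department only, then salary within each department.
bounded=$(LC_ALL=C sort -k 1,1 -k 5,5n "$tmp/staff")

echo "$unbounded"
echo "$bounded"
```

In the bounded result, Cole (8000) comes before Baker (90000) within mis, which shows that the salary field was used.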

Sorting data with more than one word in a field

In the data set example above, each new field was started after a blank space. Sometimes you want to have a field that contains two or more words. Examples would be names of books or their authors. The file books contains fields with multiple words in them.

Copy the file books from the directory /tmp/csc3321 to your lab8 directory by using the following command:

        cp /tmp/csc3321/books books

                     or

        cp /tmp/csc3321/books .

The file books contains the following information:

|Subject |Book Title |Author's Last Name |Author's First Name |Pub. Date |Price |

|UNIX: |Introduction to UNIX: |Wrightson: |Kate: |2003: |45.00: |

|UNIX: |Just Enough UNIX: |Anderson: |Paul: |2003: |39.00: |

|UNIX: |Bulletproof UNIX: |Gottleber: |Timothy |2002: |48.00: |

|UNIX: |Learning the Korn Shell: |Rosenblatt: |Bill: |1994: |35.95: |

|UNIX: |A Student's Guide to UNIX: |Hahn: |Harley: |1993: |24.50: |

|UNIX: |Unix Shells by Example: |Quigley: |Ellie: |1997: |49.95: |

|UNIX: |UNIX and Shell Programming: |Forouzan: |Behrouz: |2002: |80.00: |

|UNIX: |UNIX for Programmers and Users: |Glass: |Graham: |1993: |50.00: |

|SAS: |SAS Software Solutions: |Miron: |Thomas: |1993: |25.95: |

|SAS: |The Little SAS Book, A Primer: |Delwiche: |Lora: |1998: |35.00: |

|SAS: |Painless Windows for SAS Users: |Gilmore: |Jodie: |1999: |40.00: |

|SAS: |Getting Started with SAS Learning: |Smith: |Ashley: |2003: |99.00: |

|SAS: |The How to for SAS/GRAPH Software: |Miron: |Thomas: |1995: |45.00: |

|SAS: |The Output Delivery System: |Haworth: |Lauren: |2001: |48.00: |

|SAS: |Proc Tabulate by Example: |Haworth: |Lauren: |1999: |42.00: |

|SAS: |SAS Application Programming: |Dilorio: |Frank: |1991: |35.00: |

|SAS: |Applied Statistics & SAS Programming: |Cody: |Ronald: |1991: |29.50: |

Notice that you have book titles that contain more than one word. The names of the books have spaces in the titles. In this case, the entire title is one field. In order to sort a file on fields of this type, you need to add field delimiters. When you enter the data into a data set, you use some character to mark where one field ends and the next one begins. In this case the delimiter character is the : (colon). It could be some other character. However, the delimiter must be a character that never appears inside any of the fields. You must use the -t option when sorting this file. This option is:

-tx    set field delimiter - When you use this option, replace the x with the character that is the field delimiter in the input file. This character will be interpreted as an end of field during a sort.
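As a small illustration of -t, the sketch below sorts a made-up colon-delimited file named titles. The titles, authors, and years are hypothetical, and the field numbers use the -k form (counted from 1).

```shell
# Hypothetical colon-delimited file: title:author:year
tmp=$(mktemp -d)
printf '%s\n' \
  'Shell Basics:Lee:2003' \
  'A Tour of vi:Park:1994' \
  'Sorting Text:Ng:1999' > "$tmp/titles"

# With -t: the blanks inside a title no longer split it into several fields.
by_year=$(LC_ALL=C sort -t: -n -k 3,3 "$tmp/titles")
by_title=$(LC_ALL=C sort -t: -k 1,1 "$tmp/titles")

echo "$by_year"
echo "$by_title"
```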

In order to sort this file on the publish date,  issue the command:

        sort -n -t: +4 books

What is the result? ________________________________________________________________________________________________________________________________________________

Try another sort using the books file. Sort on the price field in reverse. Type in the following:

        sort -nr -t: +5 books

What was the result? _______________________________________________________________________________________________________________________________________________

Try one more sort, this time saving the sort to a file. This sort will be on two fields. Put it into a new file called newbooks. Type in:

        sort -t: +0 +1 books > newbooks

Look at the file, newbooks. What does the sorted file look like now?

__________________________________________________________________________________________________________________________________________________________________

Use the vi editor to view and edit  newbooks. Add your name to the top of the file. Save the file.

***************************************************************

                Print out the file  "newbooks" and attach it to this lab.

***************************************************************

Diff command

The diff command displays differences between two files on a line-by-line basis. It displays the differences as instructions that you can use to edit one of the files (using the vi editor) to make it the same as the other. When you use diff, it produces a series of lines containing Append (a), Delete (d), and Change (c) instructions. Each of these lines is followed by the lines from the file that you need to append, delete, or change. A less than symbol (<) precedes lines from file1. A greater than symbol (>) precedes lines from file2.
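The instruction format can be previewed with two tiny files before you work with the telnos files. The names file1 and file2 and their contents are made up for this sketch.

```shell
# Hypothetical files: file2 is file1 with the middle line removed.
tmp=$(mktemp -d)
printf '%s\n' alpha beta gamma > "$tmp/file1"
printf '%s\n' alpha gamma > "$tmp/file2"

# diff returns a nonzero exit status when the files differ, so "|| true"
# keeps a script that stops on errors from ending here.
delete_case=$(diff "$tmp/file1" "$tmp/file2" || true)
append_case=$(diff "$tmp/file2" "$tmp/file1" || true)

echo "$delete_case"   # 2d1 then "< beta": delete line 2 of the first file
echo "$append_case"   # 1a2 then "> beta": append after line 1 of the first file
```

Swapping the two file names turns the delete instruction into an append instruction, because diff always describes how to change the first file named into the second.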

You will now need four files.  These are telnos, telnos2, telnos3, telnos4.  These files are all short files that contain names,  departments, and telephone numbers. This is what they look like.

|telnos |telnos2  |

|Hale Elizabeth Bot   744-6892 |Hale Elizabeth Bot   744-6892 |

|Harris Thomas  Stat  744-7623 |Harris Thomas  Stat  744-7623 |

|Davis Paulette Phys  744-9579 |Davis Paulette Phys  744-9579 |

|Cross Joseph   MS    744-0320 |Holland Tod    A&S   744-8368 |

|Holland Tod    A&S   744-8368 | |

|telnos3 |telnos4 |

|Hale Elizabeth Bot   744-6892 |Hale Elizabeth Bot   744-6892 |

|Harris Thomas  Stat  744-7623 |Smith John     Comsc  744-4444 |

|Smith John     Comsc 744-4444 |Davis Paulette Phys  744-9579 |

|Davis Paulette Phys  744-9579 |Cross Joseph   MS    744-0320 |

|Cross Joseph   MS    744-0320 |Holland Tod    A&S   744-8368 |

|Holland Tod    A&S   744-8368 | |

To make it easier to copy you can use the * (wildcard) to copy these files. Type in the command:

                cp /tmp/csc3321/telnos* .  

Remember that the . (period) means the current directory. This command copies all of the telnos files at one time and gives the copies the same names that they have in the instructor's directory.

In order to see how diff works, type in:

                diff telnos telnos2

What was the result?

________________________________________________________________________________________________________________________________________________________________________________________________________________________

The difference between these two files (telnos and telnos2) is that the 4th line in telnos is missing from telnos2. The first line that diff displays (4d3) indicates that you need to delete the 4th line from file telnos to make the two files match. The 4 is the line number and the d means delete. The line number to the left of each of the a, c, or d instructions always pertains to file1. Numbers to the right of the instructions apply to file2. The diff command assumes that you are going to change file1 to match file2. The next line that diff displays starts with a less than sign (<), which means that the extra line is in file1. When a line starts with a greater than sign (>) instead, the extra line is in file2. For example, diff telnos telnos3 reports 2a3 followed by a line starting with >. The a means you must append a line to the file telnos after line 2 to make it match telnos3. Append here means to add the line after the line number given. Next is an example of the change feature. Type in the following command:

                diff telnos telnos4

What was the result? __________________________________________________________

What lines do you need to change in order to make the two files alike? ________________________________________________________________________________________________________________________________________________

Notice that the three hyphens separate the lines in the first file that need to be changed from the lines in the second file that replace them. Next, copy telnos to telnos5. What command did you use to do this? ________________________________________________________________________

Next, type in:

                diff telnos5 telnos2

What was the answer that you received? _________________________________________________________________________________________________________________________________________________________________________________________________________________

Use the vi editor to change telnos5 to match the file telnos2.   Then check to see if they are now alike.

What command did you use? _________________________________________________________

What was the result? __________________________________________________

What is the output of the diff command when the files match?

_______________________________________________________________________________

When the two files are alike, there is no response.  Unfortunately, Unix is not always user-friendly. 
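Although diff prints nothing when the files match, it does report the result through its exit status: 0 when the files are the same and 1 when they differ. A minimal sketch, using two made-up files named one and two:

```shell
# Hypothetical files: two is an exact copy of one.
tmp=$(mktemp -d)
printf '%s\n' alpha beta > "$tmp/one"
cp "$tmp/one" "$tmp/two"

# The output is discarded; only the exit status is tested.
if diff "$tmp/one" "$tmp/two" > /dev/null; then
  result="files match"
else
  result="files differ"
fi
echo "$result"
```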

Uniq Command

The uniq command displays a file, removing all but one copy of successive repeated lines. If the file has been sorted, uniq ensures that no two lines that it displays are the same. Sort telnos and telnos3 and then send them to a new file called tel2. Type in the following:

                sort telnos telnos3 > tel2

Next look at the file, tel2. What does it contain?___________________________________________________________________________________________________________________________________________________________________________________________________

Next issue the command:

                uniq tel2

What was the result?__________________________________________________________________________________________________________________________________________________________________________________________________________________

Three useful options of the uniq command are:

-c   Causes uniq to precede each line with the number of occurrences of the line in the input file

-d   Displays only lines that are repeated

-u   Displays only lines that are not repeated
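Because uniq only compares lines that are next to each other, it matters whether the input has been sorted. The sketch below uses a made-up file named colors to show the difference, along with the -d and -u options.

```shell
# Hypothetical file: "red" appears three times, but only two are adjacent.
tmp=$(mktemp -d)
printf '%s\n' red blue red red green > "$tmp/colors"

# Unsorted input: only the two adjacent "red" lines collapse into one.
unsorted=$(uniq "$tmp/colors")

# Sorted input: all duplicates are adjacent, so each line appears once.
deduped=$(LC_ALL=C sort "$tmp/colors" | uniq)

# -d keeps only the repeated lines; -u keeps only the lines that occur once.
repeated=$(LC_ALL=C sort "$tmp/colors" | uniq -d)
singles=$(LC_ALL=C sort "$tmp/colors" | uniq -u)

echo "$unsorted"
echo "$deduped"
echo "$repeated"
echo "$singles"
```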

Issue the command:

                uniq -u tel2

What was the result?__________________________________________________________________________________________________________________________________________________________________________________________________________________

Next, issue the command:

                uniq -d tel2

What was the result?__________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________

Grep Command

The grep command searches one or more files for a specified pattern. Normally each matching line is  copied to the standard  output.  Two options that can be used with grep are:

                -i             ignore case of alphabetic characters

                -n             precede each line printed by its relative line number in the input file
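The two options can be tried on a small made-up file before you use them on telnos. In this sketch the file phone and its three lines are hypothetical.

```shell
# Hypothetical file with a mix of uppercase and lowercase letters.
tmp=$(mktemp -d)
printf '%s\n' 'Hale Bot' 'harris Stat' 'Davis Phys' > "$tmp/phone"

# Capital H only: just the first line matches. -n adds the line number.
exact=$(grep -n H "$tmp/phone")

# -i ignores case, so the h in "harris" and in "Phys" also matches.
anycase=$(grep -n -i h "$tmp/phone")

echo "$exact"
echo "$anycase"
```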

Use the line numbers in the output of grep to answer the questions in the following sections.  Issue the command:

               grep -n H telnos

What was printed? ________________________________________________________________________________________________________________________________________________________________________________________________________________________

Issue the command:

              grep -ni m telnos

What was printed this time?___________________________________________________________________________________________________________________________________________________________________________________________________________________

These are the files that should be in lab8. 

employee   hired1     hired2     hired3  

books  newbooks   telnos   telnos2  telnos3  telnos4 telnos5 tel2

Please indicate any areas of this lab that were difficult to understand.

________________________________________________________________________________________________________________________________________________________________________


A copy of the ASCII chart is included here so you can refer to it to understand the priority order used by the computer to read certain characters.

ASCII CHART

|Left Digit \ Right Digit |0 |1 |2 |3 |4 |5 |6 |7 |8 |9 |

|3 | | | |! |" |# |$ |% |& |' |

|4 |( |) |* |+ |, |- |. |/ |0 |1 |

|5 |2 |3 |4 |5 |6 |7 |8 |9 |: |; |

|6 |< |= |> |? |@ |A |B |C |D |E |

|7 |F |G |H |I |J |K |L |M |N |O |

|8 |P |Q |R |S |T |U |V |W |X |Y |

|9 |Z |[ |\ |] |^ |_ |` |a |b |c |

|10 |d |e |f |g |h |i |j |k |l |m |

|11 |n |o |p |q |r |s |t |u |v |w |

|12 |x |y |z |{ || |} |~ | | | |

Codes 00-31 and 127 are nonprintable control characters. Code 32 is the blank character. It has precedence over all other characters.
