Vectors and Vector Operations



3.3 Hashing

Hashing is a method for storing and retrieving information. It is often very fast, much faster than linear search and even faster than binary search. Suppose we have n items. To search for one of these using linear search requires an amount of time that is O(n). To search using binary search requires time O(log n). However, if properly implemented, searching using hashing requires time O(1). Thus the amount of time is constant, provided n remains in bounds. In a sense this would be true for linear and binary search, but practically speaking hashing is usually better.

We illustrate hashing by storing and retrieving names in an array. Suppose the array is called Customer. For simplicity we shall make it not too large. Suppose it has 111 elements with indices starting out at 0 and going to 110.

We want to store some names in this array. For example,

Williams

Smith

Johnson

However, instead of putting them in the first 3 locations, what we do is compute from each name a number which is the location in the array where it will be stored. We compute the location where the name will be stored using a function called a hashing function. There are many possible hashing functions. Let’s look at one.

It is convenient to take advantage of how names are commonly stored on a computer. The letters of the name are usually stored sequentially in a computer’s memory with each letter being stored in 1 byte (8 bits) with the letter represented by its ASCII code. At the end of this section is an ASCII table with the bit representation of each letter. For example, a capital A is represented by 41 (hex) or 65 (decimal) with the other capital letters following in sequence. A lower case a is represented by 61 (hex) or 97 (decimal) with the other lower case letters following in sequence. Thus Williams is represented using 8 bytes as follows. We give the contents in hex.

The idea of the hashing function is to take some of the characters, treat them as a number and divide by the length of the array and take the remainder. The remainder is where the name will be stored in the array.

To simplify matters, let’s take the first 4 letters of the name and treat that as an integer. If there are fewer than 4 letters, we take however many there are. For example, with the name Williams we would just take Will.

At this point we need to look at one little quirk of the processors in personal computers. When the CPU takes 4 bytes of memory and treats it as an integer, it reverses the order of the bytes when treating it as an integer. Thus the first byte is the low order byte of the integer and the fourth byte is the high order byte of the integer. In the example of Will, the W is the low order byte and the right most l is the high order byte. See table at right where we also include the decimal representation of each letter.

If we want to see what number this represents in decimal we must compute

108(2563 + 108(2562 + 105(256 + 87

= (108)(16,777,216) + (108)((65,536) + 26,880 + 87

= 1,811,939,328 + 7,077,888 + 26,880 + 87

= 1,819,044,183

In the above example the array length is 111, so we need to mod 1,819,044,183 by 111.

1,819,044,183 mod 111 = 48

So Williams is stored in location 48 of the array.

We can use the algebra of mod to reduce the size of the numbers we are working with at each step of the calculations. In particular, we can then do the computations entirely by hand without a calculator. The main idea is that in sequence of additions, subtractions and multiplications followed by a mod, we can replace any number by another number that is the same when one mod's by 111.

[108(2563 + 108(2562 + 105(256 + 87] mod 111

= [ 108((256 mod 111)3 + 108((256 mod 111)2 + 105((256 mod 111) + 87] mod 111

= [ (- 3)((34)3 + (- 3)((34)2 + (- 6)((34) - 24] mod 111

= [ - 102((34)2 - 102(34 - 2(102 - 24] mod 111 = [ 9((34)2 + 9(34 + 2(9 - 24] mod 111

= [ 3(3(34(34 + 3(3(34 + 2(9 - 24] mod 111 = [ 102(102 + 3(102 + 18 - 24] mod 111

= [ (-9)((-9) + 3((-9) - 6] mod 111 = [ 81 - 27 - 6] mod 111 = 48 mod 111 = 48

Remark: Most calculators have a way to do mod. On the HP 48G to do n mod p, do the following. (1) MTH (2) REAL (3) n (4) p (5) MOD. Some calculators allow one to do calculations in hex. On the HP 48G to convert a number from hex to decimal do the following (1) MTH (2) BASE (3) |( # (4) 6 (C 6 (C 6957 ((| h (5) ((| h (6) B ( R.

Specifing a hashing function in mathematical notation differs somewhat from person to person. Here is an example to illustrate one method.

Example 1. Specify a function that takes a character string s, treats the first four characters as an integer with the first character as the low order digit and then mod's by 111.

Solution. Let

s = a character string

p = Length of s

sj = (j+1)st character of s. So s0 = first character, s1 = second character, … sp-1 = last character of s.

aj =

h(s) = [a0 + 256a1 + 2562 a2 + 2563 a3] mod 111

Here is a C program which illustrates the hashing method we have been discussing.

#include

#include

#include

using namespace std;

int main()

{ char Name[20], Name4[4];

int i, NameEnd, p;

unsigned long int *n;

// Read a name.

cout 3E 62 ^ 5E 94 ~ 7E 126

? 3F 63 _ 5F 95 Del 7F 127

-----------------------

Customer

|0 | |

|1 | |

|2 | |

|3 | |

| | |

| | |

| | |

| | |

| | |

|110 | |

|W |i |l |l |i |a |m |s |

|57 |69 |6C |6C |69 |61 |6D |73 |

|W |i |l |l |

|57 |69 |6C |6C |

|l |l |i |W |

|6C |6C |69 |57 |

|108 |108 |105 |87 |

Customer

|0 | |

|1 | |

|2 | |

|3 | |

| | |

| | |

|48 |Williams |

| | |

| | |

|110 | |

................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download