Introduction
The idea of a HashTable is to allow many of the different possible keys that might occur to be mapped to the same location in an array under the action of the index function. Then there will be a possibility that two records will want to be in the same place, but if the number of records that actually occur is small relative to the size of the array, then this possibility will cause little loss of time. Even when most entries in the array are occupied, hash methods can be an effective means of information retrieval.
Hash Function
The two principal criteria in selecting a hash function are as follows:
- It should be easy and quick to compute.
- It should achieve an even distribution fo the keys that actually occur across the range of indices.
Modular Arithmetic
Convert the key to an integer, divide by the size of the index range, and take the remainder as the result.
Collision Resolution
Hash funciton may map several different key to the same location in an array, which cause a collision. There are two main solutions to this problem: Open Addressing and Chaining.
Open Addressing
1. Linear Probing
The simplest method to cope with a collision is to start with the hash adress and do a sequential search throught the array for the desired key or an empty location. Noted that the table should be built circular so that when the last location is reached, the search proceeds to the first location of the array.
2. Quadratic Probing
Although linear probing is easy to take out, there is a major drawback, that is, as the table becomes about half full, there is a tendency toward clustering, which make it less efficient to find an empty position. So we may use an increment function to reharsh the index. Quadratic Probing is an effective version of increment function. It probes the table at locations h+1, h+4, h+9, … .
Quadratic probing substantially reduces clustering, but it is not obvious that it will probe all locations in the table, and in fact it does not. When the hsh_size is a prime number, quadratic probing reaches half the locations in the table.
3. Random Probing
Random probing is also a version of increment funciton. We use a pseudorandom number generator to obtain the increment. The random number generator used should be one that always generates the same sequence provided it starts with the same seed. The seed could be specified as some function of the key. This method is excellent in avoiding clustering, but is likely to be slower.
Implementation
First we give the implementation of Key and Record.
1 | class Key: public String { |
The method key_letter(int position) extract the character in a particular position of a key. And there is a vonversion operator that provides the Key of a Record. Here we may not give the details about how to implement these methods.
Then I give the definition of HashTable.(HashTable.hpp)
1 | class HashTable { |
HashTable()
precondition: None.
postcondition: The HashTable has been created and initialized to be empty.
1 | HashTable::HashTable() {} |
clear()
precondition: None.
postcondition: The HashTable has been cleared and is empty.
1 | void HashTable::clear() { |
insert(const Record &new_entry)
precondition: None.
postcondition: If the HashTable is full, a code of OverFlow is returned. If the table already contains an item with the key of new_entry a code of the duplicate_error is returned. Otherwise: The Record new_entry is inserted into the HashTable and Success is returned.
1 | Error_code HashTable::insert(const Record &new_entry) { |
retrieve(const Key &target, Record &found)
precondition: None.
postcondition: If an entry in the HashTable has key equal to target, then found takes on the value of such an entry, and Success is returned. Otherwise, not_present is returned.
1 | Error_code HashTable::retrieve(const Key &target, Record &found) const { |
Up to now, I have said nothing about deletion. Because the way we deal with collision is to probe the table until an empty position is found. So you can never tell that an entry is whether the target you are supposed to find or it is just an entry that has a diferent key. Consequently, I’d like to introduce another way to solve this problem–Chaining.
We can take the hash table itself as an array of linked lists, so that we can randomly access the keys in the table and insert or delete for any entry that has the same key.
Here I’ll just give the definition of the HashTable. I believe it’s easy for you to implement it by yourself.
1 | class HashTable { |
You can insert an entry by locating its position with its key and push_back() to the list.
You can delete an entry by locating its position with its key and remove() from the list.
Analysis of Retrieval Methods
In terns of the retrieval methods, I’d like to give a general comparison among three methods–Sequential Search, Binary Search, Hash-table Retrieval.
For a contiguous storage that contains n entries, there observations for retrieval is listed as the table below.
Methods | Observations |
---|---|
Sequential Search | O(n) |
Binary Search | O(log n) |
HashTable Retrieval | O(1) |
As we can see, Hash-table is significantly efficient in information retrieval. So make good use of it! Thank you!