Hashing

Keywords
            Collision, Bucket, hash function, key

Hash tables can be used  to implement  the insert  and find  operations in constant average time
Its an array with searching of time o(1)
Collision – two keys mapped to same slot
Complexity :
insertions, deletions, and  finds in constant average time

Applications :
      Implement set in constant time
implement dictionaries, online spelling checkers.  If  misspelling detection  (as opposed to correction) is important, an entire dictionary can be prehashed and words can be checked in constant  time. Hash  tables are well suited  for  this purpose because the words do not have  to be alphabetized.
Compilers use hash  tables  to keep track  of declared variables  in source code. The data structure is called a symbol  table.



Application
Action
        Key        
Value
phone book
look up phone number
person's name
phone number
dictionary
look up word
word
Definition
Internet DNS
Look up website by IP address
website
IP address
Reverse DNS
Look up IP address by web site
IP address
Website
genomics
amino acid dictionary
codon
amino acid
Java compiler
Find properties of variable
Variable name
Value and type
stock quote
Look up price of stock
stock symbol
Price
file share
find song to download
song name
Machine
file system
find file on hard drive
file name
location on hard drive











































Hash functions –  to map keys into index of buckets in table
Good hash function satisfies uniform hashing – each key equally hash(distributes) to slots
For example, a key that is a character string can be interpreted as an integer expressed in suitable radix notation. Thus, the identifier pt might be interpreted as the pair of decimal integers (112,116), since p = 112 and t = 116 in the ASCII character set; then, expressed as a radix-128 integer, pt becomes (112  128) + 116 = 14452. It is usually straightforward in any given application to devise some such simple method for interpreting each key as a (possibly large) natural number.
(1)    Division method :
      h(k) = k mod m .
Good values for m are primes not too close to exact powers of 2. For example, suppose we wish to allocate a hash table, with collisions resolved by chaining, to hold roughly n = 2000 character strings, where a character has 8 bits. We don't mind examining an average of 3 elements in an unsuccessful search, so we allocate a hash table of size m = 701. The number 701 is chosen because it is a prime near  = 2000/3 but not near any power of 2. Treating each key k as an integer, our hash function would be h(k) = k mod 701
(2)    Multiplication method
(3)    Universal hashing

Collision by chaining –
Separate chaining hashing is a space efficient alternative  to quadratic probing in which an array of linked lists is maintained




•    In chaining, we put all the elements that hash to the same slot in a linked list,
•    The worst-case behavior of hashing with chaining is terrible: all n keys hash to the same slot, creating a list of length n(like linked list). The worst-case time for searching is thus (n) plus the time to compute the hash function--no better  than if we used one linked list for all the elements.

Collisions by open addressing - 
•    In open addressing, all elements are stored in the hash table itself. That is, each table entry contains either an element of the dynamic set or NIL. When searching for an element, we systematically examine table slots until the desired element is found or it is clear that the element is not in the table. There are no lists and no elements stored outside the table, as there are in chaining
•    To perform insertion using open addressing, we successively examine, or probe, the hash table until we find an empty slot in which to put the key.
•    Three techniques are commonly used to compute the probe sequences required for open addressing: linear probing, quadratic probing, and double hashing.


Linear Probing

In linearprobing, collisions are resolved by sequentially  scanning table (with wraparound) until an empty cell is  found. For example, if  there  is only one free cell left in  the table, we may have to search the entire table to  find it

The  find  algorithm merely  follows  the same path  as the  insert algorithm. If  it reaches an empty slot, the item we are searching for is not  found; otherwise, it  finds the match eventually.

Standard deletion cannot be performed because, as with a binary  search tree,  an  item  in  the hash table not only represents  itself, but  it also connects other items by serving as a placeholder during collision resolution. Thus, if we removed  89 from the hash  table, virtually  all the remaining  find opera- tions would fail. Consequently, we implement  lazy deletion, or marking items as deleted rather  than physically  removing  them from the table. This information  is recorded  in an extra data member. Each  item  is either active or deleted.





Quadratic Probing :

Quadratic probing examines cells 1,4,9, and so on, away from the original probe point.
  if  the hash  function evaluates to H and a search in cell H is inconclusive, we try cells H + 1  *, H  + 22.  H  + 3*, .  . .  , H +  i2 (employing wraparound) in  sequence

Double hashing is one of the best methods available for open addressing
h(k, i) = (h1(k) + ih2(k)) mod m,

Perfect hashing –
•    Two level  hashing scheme with universal hashing at each level
•    First  level same as hashing with chaining , instead of making a list of keys hashing to slot J, use a secondary hash table with associated hash function.