So if I have to choose between a hash table or a prefix tree what are the discriminating factors that would lead me to choose one over the other. From my own naive point of view it seems as though using a trie has some extra overhead since it isn't stored as an array but that in terms of run time (assuming the longest key is the longest english word) it can be essentially O(1) (in relation to the upper bound). Maybe the longest english word is 50 characters? Hash tables are instant look up once you get the index. Hashing the key to get the index however seems like it could easily take near 50 steps. Can someone provide me a more experienced perspective on this? Thanks!  
add comment 
Advantages of tries: The basics:
New operations:
Advantages of linked structure:
Advantages of hashtables:
 

It all depends on what problem you're trying to solve. If all you need to do is insertions and lookups, go with a hash table. If you need to solve more complex problems such as prefixrelated queries, then a trie might be the better solution.
 
add comment 
Use a tree:
 
add comment 
Everyone knows hash table and its uses but it is not exactly constant look up time , it depends on how big the hash table is , the computational complexity of the hash function. Creating huge hash tables for efficient lookup is not an elegant solution in most of the industrial scenarios where even small latency/scalability matters (e.g.: high frequency trading). You have to care about the data structures to be optimized for space it takes up in memory too to reduce cache miss. A very good example where trie better suits the requirements is messaging middleware . You have a million subscribers and publishers of messages to various categories (in JMS terms  Topics or exchanges) , in such cases if you want to filter out messages based on topics (which are actually strings) , you definitely do not want create hash table for the million subscriptions with million topics . A better approach is store the topics in trie , so when filtering is done based on topic match , its complexity is independent of number of topics/subscriptions/publishers (only depends on the length of string). I like it because you can be creative with this data structure to optimize space requirements and hence have lower cache miss.
 
Some (usually embedded, realtime) applications require that the processing time be independent of the data. In that case, a hash table can guarantee a known execution time, while a trie varies based on the data.
 

评论