注册 登录  
 加关注
   显示下一条  |  关闭
温馨提示!由于新浪微博认证机制调整,您的新浪微博帐号绑定已过期,请重新绑定!立即重新绑定新浪微博》  |  关闭

和申的个人主页

专注于java开发,1985wanggang

 
 
 

日志

 
 

How Do I Choose Between a Hash Table and a Trie (Prefix Tree)?  

2014-05-15 11:35:22|  分类: 算法 |  标签: |举报 |字号 订阅

  下载LOFTER 我的照片书  |


So if I have to choose between a hash table or a prefix tree what are the discriminating factors that would lead me to choose one over the other. From my own naive point of view it seems as though using a trie has some extra overhead since it isn't stored as an array but that in terms of run time (assuming the longest key is the longest english word) it can be essentially O(1) (in relation to the upper bound). Maybe the longest english word is 50 characters?

Hash tables are instant look up once you get the index. Hashing the key to get the index however seems like it could easily take near 50 steps.

Can someone provide me a more experienced perspective on this? Thanks!

add comment

Advantages of tries:

The basics:

  • Predictable O(k) lookup time where k is the size of the key
  • Lookup can take less than k time if it's not there
  • Supports ordered traversal
  • No need for a hash function
  • Deletion is straightforward

New operations:

  • You can quickly look up prefixes of keys, enumerate all entries with a given prefix, etc.

Advantages of linked structure:

  • If there are many common prefixes, the space they require is shared.
  • Immutable tries can share structure. Instead of updating a trie in place, you can build a new one that's different only along one branch, elsewhere pointing into the old trie. This can be useful for concurrency, multiple simultaneous versions of a table, etc.
  • An immutable trie is compressible. That is, it can share structure on the suffixes as well, by hash-consing.

Advantages of hashtables:

  • Everyone knows hashtables, right? Your system will already have a nice well-optimized implementation, faster than tries for most purposes.
  • Your keys need not have any special structure.
  • More space-efficient than the obvious linked trie structure
community wiki

2 
can not quite agree with "More space-efficient than the obvious linked trie structure" -- in a general hash table implementation, it occupies a much larger space to contain keys, while in tries, each node represents a word. In this sense, tries are more space-efficient. –  galactica Aug 14 '13 at 18:12
   
how about accesing data from one structure vs the other? I'm thinking cache and location –  Richard LenoirApr 14 at 22:38
add comment
How Do I Choose Between a Hash Table and a Trie (Prefix Tree)? - 和申 - 和申的个人主页

It all depends on what problem you're trying to solve. If all you need to do is insertions and lookups, go with a hash table. If you need to solve more complex problems such as prefix-related queries, then a trie might be the better solution.

community wiki

add comment

Use a tree:

  1. If you need auto complete feature
  2. Find all words beginning with 'a' or 'axe' so on.
  3. A suffix tree is a special form of a tree. Suffix trees have a whole list of advantages that hash cannot cover.
add comment

Everyone knows hash table and its uses but it is not exactly constant look up time , it depends on how big the hash table is , the computational complexity of the hash function.

Creating huge hash tables for efficient lookup is not an elegant solution in most of the industrial scenarios where even small latency/scalability matters (e.g.: high frequency trading). You have to care about the data structures to be optimized for space it takes up in memory too to reduce cache miss.

A very good example where trie better suits the requirements is messaging middleware . You have a million subscribers and publishers of messages to various categories (in JMS terms - Topics or exchanges) , in such cases if you want to filter out messages based on topics (which are actually strings) , you definitely do not want create hash table for the million subscriptions with million topics . A better approach is store the topics in trie , so when filtering is done based on topic match , its complexity is independent of number of topics/subscriptions/publishers (only depends on the length of string). I like it because you can be creative with this data structure to optimize space requirements and hence have lower cache miss.

community wiki

   
nice example :) –  hqt May 7 '13 at 19:06
add comment

Some (usually embedded, real-time) applications require that the processing time be independent of the data. In that case, a hash table can guarantee a known execution time, while a trie varies based on the data.

community wiki

2 
Most hash tables don't guarantee a known execution time - the worst case is O(n), if every element collides and gets chained –  Adam Rosenfield Oct 29 '08 at 5:38
2 
For any data set, you can compute a perfect hash function that will guarantee O(1) lookups for that data. Of course, computing the perfect hash ain't free. –  George V. Reilly Oct 29 '08 at 6:21
3 
Also, chaining is not the only way to handle collisions; there are all sorts of interesting, clever ways to handle this—cuckoo hashing (en.wikipedia.org/wiki/Cuckoo_hashing) for one—and the best choice depends on the needs of the client code. –  Hank Gay Oct 29 '08 at 12:11
   
didn't know about cuckoo hashing and its relation to the bloom filter, will make for an interesting read, thanks!–  Richard Lenoir Apr 14 at 22:42

http://stackoverflow.com/questions/245878/how-do-i-choose-between-a-hash-table-and-a-trie-prefix-tree
  评论这张
 
阅读(868)| 评论(0)
推荐 转载

历史上的今天

在LOFTER的更多文章

评论

<#--最新日志,群博日志--> <#--推荐日志--> <#--引用记录--> <#--博主推荐--> <#--随机阅读--> <#--首页推荐--> <#--历史上的今天--> <#--被推荐日志--> <#--上一篇,下一篇--> <#-- 热度 --> <#-- 网易新闻广告 --> <#--右边模块结构--> <#--评论模块结构--> <#--引用模块结构--> <#--博主发起的投票-->
 
 
 
 
 
 
 
 
 
 
 
 
 
 

页脚

网易公司版权所有 ©1997-2016