Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Last time I needed really fast hashing I used FNV. How does it compare to xxhash?


Fnv is fast because the code is tiny and hashing a few bytes is fast. xxhash uses multiple parallel streams and has very high throughput. They are fast hashes, each in their end of the spectrum, Fnv for 1-16 byte keys and xxhash for long data. It's better to call xxhash a checksum, since that's its purpose.


There are hash functions that are as fast or faster than FNV and stastically stronger (and faster) than xxhash. There are multiple hash function families that should be used before either of the above in modern applications unless you need backward compatibility (like the CRC case in the article). This is an active research area and both of the above, while adequate for many casual use cases, should not be used for checksums for the same reason CRC32 should not be used for checksums in large systems.

As an example from one of my hash function research projects, MetroHash64 will outperform xxhash both for speed and stastical robustness (which is quasi-cryptographic). If you only need 32 bits, truncate larger hashes; if they have very strong stastical properties, that works well.

Lots of people working on this stuff.


Comparing xxHash with FNV is a strawman, it's apples and oranges in terms of quality.

You are discounting xxHash considerably. xxHash is one of the best, if not the best, all factors considered.

SMHasher:

  xxHash, speed=5.4 GB/s, quality=10
  MurmurHash 3a, speed=2.7 GB/s, quality=10
  SBox, speed=1.4 GB/s, quality=9
  Lookup3, speed=1.2 GB/s, quality=9
  CityHash64, speed=1.05 GB/s, quality=10
  FNV, speed=0.55 GB/s, quality=5
  CRC32, speed=0.43 GB/s, quality=9	
  MD5-32, speed=0.33 GB/s, quality=10
  SHA1-32, speed=0.28 GB/s, quality=10


With all due respect, xxHash is a decent hash function but you are comparing it against some relatively weak or slow hash functions. As was pointed out last time this came up, the Metro64 hash example I offered is simultaneously faster and higher quality than xxHash. There are hash functions that are faster than xxHash and stronger than some of the cryptographic hashes, and xxHash definitely has much more bias than even MD5.

Getting a perfect score with SMHasher is table stakes. The default settings on SMHasher are much too loose for current research; I know it well, I have hacked it extensively. A state-of-the-art non-cryptographic hash function today generally has the following properties:

- statistical quality greater than or equal to MD5, a cryptographic hash. xxHash is not close to MD5 in quality.

- faster than xxHash on large keys, and for really good hashes, memory bandwidth bound. (Metro64 is, again, faster than xxHash, though not memory bandwidth bound.)

- faster than xxHash on small keys (Metro64, to use that example, is almost 2x faster)

There are a lot of good hash functions being developed by researchers. But empirically, xxHash is nowhere near the state-of-the-art any way you slice it. It is a decent hash function but there are many functions produced by many researchers that are both faster and higher quality. It isn't personal, people can measure it for themselves.


Yes, I remember reading your post on Metro64 at the time:

http://www.jandrewrogers.com/2015/05/27/metrohash/

Interesting that you based your results there on SMHasher. Are there better benchmarks available?


Never mind - it's covered in the other reply link here. Looks like xxhash is the goto hash algorithm now.


That's a misunderstanding. If you need a hash or checksum for very long data (megabytes, gigabytes), use xxhash. It's not the best for hash tables.


Good to know. I was using FNV to hash spam signatures which are typically small. Glad I chose well.


If your spam signatures are small (around 32 bytes) and fixed size, you might also want to look into a pure XOR tabulation hash. It's excellent for generating fixed-size keys for hash tables:

http://www2.imm.dtu.dk/projects/thrash-workshop/slides/thoru...

https://en.wikipedia.org/wiki/Tabulation_hashing

I use it all the time when I need a hash table index. It's simple to understand, safe and fast. Even though it can generate more assembly than other hashes, it's actually often faster because it has no multiply operation, only XOR. If the tables are small enough to fit into cache (i.e. small keys), it's brilliant. It has properties suitable for triangular probing, where other hashes sometimes don't have enough independence. And if you need a rolling hash, i.e. for Rabin Karp, it's also perfect.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: