C++Now 2018: You Can Do Better than std::unordered_map: New Improvements to Hash Table Performance

  Рет қаралды 57,482

CppNow

CppNow

Күн бұрын

Пікірлер: 50
@willw2596
@willw2596 8 ай бұрын
I'm 15 minutes in and find this to be one of the best technical talks I've ever seen.
@BoostCon
@BoostCon 8 ай бұрын
Thank you so much for your appreciative comment!
@OmarChida
@OmarChida 3 жыл бұрын
One of the best talks I have ever saw. I quite like his scientific approach.
@conradomartinezp
@conradomartinezp 2 жыл бұрын
As long as you have a way to solve ties in Robin Hood which only depends on the identity of the keys, e.g., the smaller key stays at its slot, the larger continues probing, the layout of the hash table after N inserts will be exactly the same, independent of the order of insertion. If ties are solved by the time of insertion then a cluster of synonims (equal hash value) will be ordered by time of insertion. Robin Hood does not reduce the average time for successful search but it makes a bit reduction of the variance (even in almost full tables) and it also reduces the expected longest probe sequence from O(log n) to O(log log n). It's nice to see that it also does well in practice 😉
@dennisrkb
@dennisrkb 5 жыл бұрын
you explained google's map much better than they did themselves
@skepticmoderate5790
@skepticmoderate5790 5 жыл бұрын
They explained it pretty well...
@TheOnlyFeanor
@TheOnlyFeanor 6 жыл бұрын
This was an excellent talk, thank you.
@lenkite
@lenkite 4 жыл бұрын
Wonderful and enlightening talk especially the zooms at various N levels. So std::map is great at < 400 elements!. I only wish the size of the element was also shown.
@leleo53000
@leleo53000 4 жыл бұрын
it's using `std::map`, the size of the element is probably 4
@excitedbox5705
@excitedbox5705 4 жыл бұрын
@@leleo53000 I think he said 8 later on but that might not apply to that.
@amitm1157
@amitm1157 5 жыл бұрын
Nice explanation about cache hits and misses while using hash tables.
@KilgoreTroutAsf
@KilgoreTroutAsf 5 жыл бұрын
300 nanoseconds amortized sounds atrocious for a cache miss on modern hardware. could it actually be 300 cpu cycles instead? This would fit the typical 50-80 ns latency for ddr4 and 3~4 Ghz cpu frequency either that or each miss results in O*(4) accesses to ram, with the cost of cpu instructions being too small ( ~0.25 ns per cycle) to matter
@Voy2378
@Voy2378 6 жыл бұрын
Benchmarking only on int keys only is deceptive. If you have std::string or any other type with expensive operator == ridiculously high load factor will kill you.
@KilgoreTroutAsf
@KilgoreTroutAsf 5 жыл бұрын
A good implementation for large-sized or variable-length keys should never store keys and hashes together, because it completely negates cache locality and memory alignment trickery. The better solution is to use two-level indirection and use the hash table to store (hash,pointer) or (hash,offset) pairs, where the offsets point to a flat array or other memory pool for keys, and are just linearly incremented after each addition. Also, the whole point of hash tables is to use a good universal hash function which digests the object into a fixed-length string of enough bits to guarantee very low collision probability. The final test for true equality should only happen more than once very rarely.
@excitedbox5705
@excitedbox5705 4 жыл бұрын
insert your strings into an array and use the keys as your INT. Then when you retrieve the key you can use stringArray[key] which only adds minor overhead.
@andrewgrant788
@andrewgrant788 2 жыл бұрын
Agreed. The comparison between std::map() and std::unordered_map() performance for small values of N are very misleading when benchmarked with int keys. Because a std::map is a tree, std::map::find() does a lot of < comparisons of keys which is very cheap for ints and much more expensive for strings or more complex keys. unorded_map first hashes the key to determine the bucket and then compares the keys in the same hash bucket. Keys can be pre-hashed to further optimized performance.
@ДаниилМатвеев-ю5т
@ДаниилМатвеев-ю5т 7 ай бұрын
Thank you! Best talk I have seen on this topic😊
@BoostCon
@BoostCon 7 ай бұрын
Thank you! So pleased to hear that you found the presentation informative.
@origamibulldoser1618
@origamibulldoser1618 11 ай бұрын
This is mad fascinating.
@77oxf
@77oxf 4 жыл бұрын
Why is std::map faster for fewer elements?
@TheJGAdams
@TheJGAdams 5 ай бұрын
They're O(log(n))
@ratchet1freak
@ratchet1freak 6 жыл бұрын
I'm pretty sure that you can significantly improve the robinhood adaptation of the google's map implementation is different, If you treat the 16 element bucket as a bucket that each are targets of the hash, instead of the individual elements of the buckets. Then the robin hood variant can use just 2 bits for the distance meta data (0-2 bucket distance + 3 for empty) and keep 6 bits for the hash pre-filter.
@TheLavaBlock
@TheLavaBlock 6 жыл бұрын
Great talk! Thanks
@wiadroman
@wiadroman 5 жыл бұрын
Great material, thanks a lot.
@Sychonut
@Sychonut 2 жыл бұрын
The content was great and much appreciated, but damn it felt at several points throughout that it's a nerds fest with people trying to show off to one another how smart they are. It's just not a good look.
@roman9509
@roman9509 6 жыл бұрын
cool
@ABaumstumpf
@ABaumstumpf 5 жыл бұрын
If only the real world was that simple - a hasmap for int-int, gosh how it would like that. instead we have things like.... normally we have maps of maps of vectors and such things.........
@blarghblargh
@blarghblargh 4 жыл бұрын
his benchmarks are open source. go prove how his map works worse for those, and show us you actually know what you're talking about
@muonline2067
@muonline2067 2 жыл бұрын
@@blarghblargh my question is: where can I download his unordered_map for testing man?
@joestevenson5568
@joestevenson5568 6 ай бұрын
Great talk, terrible audience
@zyxyuv1650
@zyxyuv1650 6 жыл бұрын
Obvious German
@seditt5146
@seditt5146 5 жыл бұрын
Really? Because I thought he sounded as though he had a slight Irish accent not German.
@pruibiebehastoet1914
@pruibiebehastoet1914 3 жыл бұрын
Because of the gründlichkeit ?
@mrmaniac9905
@mrmaniac9905 5 ай бұрын
Or you just use C.
Миллионер | 3 - серия
36:09
Million Show
Рет қаралды 2,2 МЛН
УДИВИЛ ВСЕХ СВОИМ УХОДОМ!😳 #shorts
00:49
Smart Sigma Kid #funny #sigma
00:33
CRAZY GREAPA
Рет қаралды 27 МЛН
Hash Table in C
2:11:31
Tsoding Daily
Рет қаралды 70 М.
Faster than Rust and C++: the PERFECT hash table
33:52
strager
Рет қаралды 608 М.
C++ cache locality and branch predictability
10:43
mCoding
Рет қаралды 85 М.
CppCon 2016: Jason Turner “Practical Performance Practices"
1:00:29