The cost of Hash Tables | The Backend Engineering Show

  Рет қаралды 34,315

Hussein Nasser

Hussein Nasser

Күн бұрын

Hash tables are effective for caching, database joins, sets to check if something is in a list and even load balancing, partitioning and sharding and many other applications. Most programming languages support hash tables. However they don’t come without their cost and limitations lets discuss this.
0:00 Intro
1:50 Arrays
3:50 CPU Cost (NUMA/M1 Ultra)
6:50 Hash Tables
10:00 Hash Join
16:00 Cost of Hash Tables
20:00 Remapping Cost Hash Tables
22:30 Consistent hashing
Fundamentals of Database Engineering udemy course (link redirects to udemy with coupon)
database.husseinnasser.com
Introduction to NGINX (link redirects to udemy with coupon)
nginx.husseinnasser.com
Python on the Backend (link redirects to udemy with coupon)
nginx.husseinnasser.com
Become a Member on KZbin
/ @hnasr
🔥 Members Only Content
• Members-only videos
🏭 Backend Engineering Videos in Order
backend.husseinnasser.com
💾 Database Engineering Videos
• Database Engineering
🎙️Listen to the Backend Engineering Podcast
husseinnasser.com/podcast
Gears and tools used on the Channel (affiliates)
🖼️ Slides and Thumbnail Design
Canva
partner.canva.com/c/2766475/6...
Stay Awesome,
Hussein

Пікірлер: 58
@hnasr
@hnasr 2 жыл бұрын
Head to database.husseinnasser.com for a discount coupon to my Introduction to Database Engineering course . Link redirects to udemy with coupon applied.
@andydataguy
@andydataguy 2 жыл бұрын
I love these long-form audio breakdowns. Very natural. Please keep it up. Inch wide mile deep conversations! Just got a new job and going to subscribe to your channel as soon as I get my first check 🙏🏾🙌🏾
@sfsf285
@sfsf285 2 жыл бұрын
Your videos are always helpful, refreshing and brain opening, thanks a lot man i swear whenever im financially able ill buy all of your courses and memberships
@EngSamieh
@EngSamieh 2 жыл бұрын
Thanks for talking about hash tables. I think the answer to your question about extending hash table beyond system memory is memory paging (virtual memory), the OS does that on behalf of you (handle page fault exception), and swap memory pages between ram and disk. One another thing, DBMS uses hash tables in desk (indexing), and it is very primitive syscall to random access a position in file on desk using fseek syscall.
@arjundureja
@arjundureja 2 жыл бұрын
Very interesting. I'd love to hear more about limitations associated with specific data structures in the real-world
@johnrush607
@johnrush607 2 жыл бұрын
Awesome talk!! Would love to hear you talk more about consistent hashing and the origins of it in various databases :)
@botenjohn1752
@botenjohn1752 2 жыл бұрын
Yes!! this is really a topic I need refresh on . Thank you!
@techienomadiso8970
@techienomadiso8970 2 жыл бұрын
Great content. Would be so awesome to see some visuals...as you explain this interesting stuff.
@SohomBhattacharjee
@SohomBhattacharjee 2 жыл бұрын
this is my fav youtube channel !! Thank you
@CjqNslXUcM
@CjqNslXUcM 2 жыл бұрын
I love your mannerisms and presentation.
@gyroninjamodder
@gyroninjamodder 2 жыл бұрын
18:02 isn't that just the idea of swap. You advertise that a bunch of memory is available but if you access something not actually in memory it will have to load the page into memory.
@lucamantova3070
@lucamantova3070 2 жыл бұрын
Good point!
@longtranbao4177
@longtranbao4177 2 жыл бұрын
Loved it! Waiting for consistent hashing
@teunohooijer6788
@teunohooijer6788 2 жыл бұрын
there are now array databases which put thete data on disk or S3 (cheaper storage). there useful for (sparse) multidimensional data. An example of the database is TileDB.
@susmitvengurlekar
@susmitvengurlekar 2 жыл бұрын
Thought about disk based direct access which avoids index scan by getting / constructing the file path. But file locks when writing when it is being read by many is a tricky part.
@thesyd1982
@thesyd1982 2 жыл бұрын
Mr you surprise me every episiode :D, thanks fo all , allah ya7afedk
@CraneArmy
@CraneArmy 2 жыл бұрын
I remember reading something a while back, about one of intel's attempts at non-volitile RAM (I think this was optane, before they dumped the tech and used the branding for something else). The write up was about the thermodynamic limitations of non-volitile storage at ram speeds, probably given some voltage requirement (i dont remember, but it seems right). Basically stated that non-volitile memory at ram speeds (probably given the other specs of the product announcement) would need to be like an order of magnitude better than the theoretical performance limit required to store data. I didnt do the work to verify it, but it seemed convincing enough for someone who doesnt eat and breathe in electrical engineering. Since then, ive been pretty cynical of anyone claiming they are going to do it. Everyone seems to agree its the holy grail among a certain subset of smart people, they all want to do it, and the announcements seem to come and go never delivering a product. Might be an interesting topic for a video too, I'd watch to get an update.
@mfbalgo
@mfbalgo 2 жыл бұрын
We love you Hussein
@hjups
@hjups 2 жыл бұрын
When you said that disks are not byte addressable, were you referring to the restriction of the API / file system? Internally, flash is byte addressable (and technically so is a spinning disk). The caveat is that the access time is substantially higher for NVM verses volatile memory. Also, memory indirection via pointers can still be very expensive (I wouldn't put so much faith in "optimizations" by the hardware vendors). For example, if the pointer exists on a different CPU socket, or if you are skipping between L3 and DRAM, or even a swapped page (to disk) and DRAM. There's some tradeoff in performance between embedded fields (within the table) and table size, which also comes into play with some of these specialized NVM key-value pair accelerators (because of the access latency).
@anon_y_mousse
@anon_y_mousse 2 жыл бұрын
This is why the lessons on hash tables need to be seriously rethought. The only part of a hash table that actually needs to be an array is the index table itself, but each entry in the table can be a bucket, or as needed point to a new server if you're using it for that purpose. Most databases, at least as far as I am aware actually use trees and things are already sorted in by each relational piece of data. This type of tree is generally called a graph. The implementation of hash tables is actually pretty open, and I've implemented several versions where the data is stored in a tree and the index table just helps to jump to a subtree faster. For trees I'll generally use a deque to allocate decent sized blocks of nodes and memory management is pretty easy that way. I arrange the individual deques in two different trees themselves. On the subject of implementing them as arrays, since in that form they don't need to be sorted, if you store the unmasked hash key, as I do, you can swap the element to delete with the last element and patch one index then decrement the count. TLDR: hash tables don't have to be simple arrays anymore with merely one set of characteristics.
@timtreichel3161
@timtreichel3161 2 жыл бұрын
Okay, this was kinda interesting. I don't really know anything about data base programming so the insights there were cool. Personally the point of adding/deleting entries was explained pretty poorly. I don't know what strategies are normally used for handling collisions in data base programming, but when you have a hash table you probably have a strategy to handle collisions. So let's say you create a hash table of size 100 and insert 100 elements, there probably will already be collisions. At some point the collisions get so many, that the access time is becoming slow and you want to increase the size of the hash table. And when you do this, you don't want to increase the size from 100 to 101 but probably double it to 200 if not more. At least this it the common approach for dynamic arrays and you already said, that hash tables are array's. Of course it is true that you have to rehash everything with the new size of the hash table, which sadly should be slower as a normal dynamic array resize and copy.
@sohansingh2022
@sohansingh2022 8 ай бұрын
Thanks 😮
@TheHighclick
@TheHighclick 2 жыл бұрын
Hi Hussein. I started studying Front-End Development, after 5 months of school i managed to land a Jr job. I am now in a situation where I work fulltime, study frontend full time, but I would like to learn backend as I find it a lot more interesting. What backend project would you recommend if I want to get into backend engineering? Thanks for your content
@varunvishwakarma2901
@varunvishwakarma2901 2 жыл бұрын
Today's best half an hour spent.
@rsroyalrahul5
@rsroyalrahul5 2 жыл бұрын
Video on consistent hashing please
@prakanshuray7595
@prakanshuray7595 Жыл бұрын
If it's a talk , it could go as a podcast as well.
@night-bird1788
@night-bird1788 Жыл бұрын
For collision Point, there are many usefal approcahes one of these is chain each node or address with a linked list. appearntly the more collision the more time it takes to track the value, but if the entired data is large enough then the best way is to the store using the pair concept it consums more memory but O(1) to get or insert. so why are there collisions to avoid large arrays as the addition process takes time so reduce the primary table size and then add sublists to the main table then the addition process consumes less EX:) 100+100=200 takes 2 seconds but (50+50)+(50+50)=200 takes less time.
@nikilragav
@nikilragav 2 жыл бұрын
Wow that array is just a form of a hash table is super interesting
@nilanjanmukhopadhyay8369
@nilanjanmukhopadhyay8369 2 жыл бұрын
No array isn't a form of Hash Table. To build hash table you need an array and a hash function at least.
@realabodyify
@realabodyify 2 жыл бұрын
Sometime when my boss asks me to generate some type of report, I use hashtable for more efficiency. Without hashtable I need to loop 1 million time trying to filter the array to get the result I want, but with hashtable it is going to faster and way way less looping.
@SSJ1Igor
@SSJ1Igor 2 жыл бұрын
Where can I learn about all the different backend related concepts that you mention in the begining like caching, partitioning, database joins, sharding and so forth? is there somewhere a list of all the different concepts that a backend engineer should be familiar with? Sorry for the offtopic question but i need to know and thank u for ur videos i love your channel !
@SSJ1Igor
@SSJ1Igor 2 жыл бұрын
i just need like a list of these so i can start researching them one by one and trying out
@RyanKOnk
@RyanKOnk 2 жыл бұрын
Checkout this channel's playlists. Hussein has a lot of content categorised,so, you may find what you are looking for.
@SSJ1Igor
@SSJ1Igor 2 жыл бұрын
@@RyanKOnk thank you boo boo
@kaushikkumarbora
@kaushikkumarbora Жыл бұрын
@hnasr Arrays are Contiguous and not continuous.
@InsightByte
@InsightByte 2 жыл бұрын
would be gr8 if you would introduce some visuals in your "lectures" - you cover a lot and is hard to keep up
@tridao1273
@tridao1273 2 жыл бұрын
This is a podcast, not a "video"
@pemessh
@pemessh 2 жыл бұрын
i second this !
@dabbopabblo
@dabbopabblo 2 жыл бұрын
Why, you should be programming while listening, not wasting time watching
@arod3295
@arod3295 2 жыл бұрын
Visuals please!!!
@amans6504
@amans6504 Жыл бұрын
I had this doubt from long and now i got clear. So basically computers literally doesnt know anything, while using arrays or hash tables, we are just doing the operational cost to retrive the first element of the array and it just adds the index so it exactly knows the address of the element we are looking for, this is literally same as just accessing a primitive value.
@hnasr
@hnasr Жыл бұрын
Exactly right
@mohitrathore8808
@mohitrathore8808 Жыл бұрын
It would be more easier to understand if you include more animations visuals. Rest is awesome
@naveedhassan7399
@naveedhassan7399 2 жыл бұрын
HashTables are glorified arrays. Amazing insight.
@arod3295
@arod3295 2 жыл бұрын
Visuals please!!!
@arod3295
@arod3295 2 жыл бұрын
So many gems
@nikilragav
@nikilragav 2 жыл бұрын
Does anyone implement arraylists as a hash table wherein the address is dependent on the value so it doesn't need to be a consecutive block anymore?
@snake1625b
@snake1625b 2 жыл бұрын
I thought databases use trees For the index, not hash tables
@kwakubiney5175
@kwakubiney5175 2 жыл бұрын
Trees solved certain problems the hash tables were facing, for example, the hash tables had to be stored in main memory so that’s a constraint.
@nikilragav
@nikilragav 2 жыл бұрын
@@kwakubiney5175 doesn't the tree also need to be? It has n nodes for n items in db plus it has a bunch of extra pointers
@kwakubiney5175
@kwakubiney5175 2 жыл бұрын
@@nikilragav Writes and reads are done on pages and these pages don’t exist in main memory but on disk.
@nikilragav
@nikilragav 2 жыл бұрын
@@kwakubiney5175 ok yes, so what does that have to do with the tree? Are you saying that each node of the tree represents a page? And that you load one page at a time in memory?
@nikilragav
@nikilragav 2 жыл бұрын
I don't think that directly addresses what Hussain was saying. He was saying it is possible that the hashtable itself is too big to fit in memory. If that is the case, then a tree is not going to be any smaller...
@TehPwnerer
@TehPwnerer 2 жыл бұрын
Disk = slow memory
@richard-social8125
@richard-social8125 2 жыл бұрын
Long way of saying you know too much: "We can probably make a whole other video about this topic I don't wanna get into right now."
Caching is hard
20:31
Hussein Nasser
Рет қаралды 24 М.
Redis In-Memory Database Crash Course
50:01
Hussein Nasser
Рет қаралды 53 М.
How To Choose Ramen Date Night 🍜
00:58
Jojo Sim
Рет қаралды 60 МЛН
YouTube's Biggest Mistake..
00:34
Stokes Twins
Рет қаралды 76 МЛН
Inside a Google data center
5:28
Google Workspace
Рет қаралды 21 МЛН
Hash Table in C
2:11:31
Tsoding Daily
Рет қаралды 55 М.
Threads and Connections | The Backend Engineering Show
49:30
Hussein Nasser
Рет қаралды 62 М.
CS50 2019 - Lecture 5 - Hash Table
5:15
CS50
Рет қаралды 8 М.
How Discord Stores Trillions of Messages | Deep Dive
1:08:33
Hussein Nasser
Рет қаралды 172 М.
GraphQL Crash Course
57:24
Hussein Nasser
Рет қаралды 29 М.
Why EVERYONE Is Struggling to Become a Data Analyst
8:03
Learn with Lukas
Рет қаралды 244 М.
Hash Tables and Hash Functions
13:56
Computer Science
Рет қаралды 1,5 МЛН
How Neuralink Works 🧠
0:28
Zack D. Films
Рет қаралды 30 МЛН
How much charging is in your phone right now? 📱➡️ 🔋VS 🪫
0:11
Эффект Карбонаро и бумажный телефон
1:01
История одного вокалиста
Рет қаралды 2,4 МЛН
Готовый миниПК от Intel (но от китайцев)
36:25
Ремонтяш
Рет қаралды 438 М.
Which Phone Unlock Code Will You Choose? 🤔️
0:14
Game9bit
Рет қаралды 10 МЛН