A better hash table (in C)

  Рет қаралды 29,601

Jacob Sorber

Jacob Sorber

Күн бұрын

Patreon ➤ / jacobsorber
Courses ➤ jacobsorber.th...
Website ➤ www.jacobsorbe...
---
A better hash table (in C) // I've been meaning to revisit hash tables for a few years. Well, today's the day, we look at making our original hash table (circa 2020) into a general one that we can store just about anything in.
Related Videos:
Hash Table: • Understanding and impl...
Libraries: • How to Load Libraries ...
Makefiles: • Learn make in 60 seconds.
***
Welcome! I post videos that help you learn to program and become a more confident software developer. I cover beginner-to-advanced systems topics ranging from network programming, threads, processes, operating systems, embedded systems and others. My goal is to help you get under-the-hood and better understand how computers work and how you can use them to become stronger students and more capable professional developers.
About me: I'm a computer scientist, electrical engineer, researcher, and teacher. I specialize in embedded systems, mobile computing, sensor networks, and the Internet of Things. I teach systems and networking courses at Clemson University, where I also lead the PERSIST research lab.
More about me and what I do:
www.jacobsorbe...
people.cs.clem...
persist.cs.clem...
To Support the Channel:
+ like, subscribe, spread the word
+ contribute via Patreon --- [ / jacobsorber ]
Source code is also available to Patreon supporters. --- [jsorber-youtub...]

Пікірлер: 63
@SimGunther
@SimGunther Жыл бұрын
strager had a whole video on hash tables and it turns out that a better hash function based on the understanding of the keys going into the table equals a MUCH faster hash table! 🎉
@marcossidoruk8033
@marcossidoruk8033 Жыл бұрын
That video is completely miselading or at the very least it seems it has mislead you. Whats being implemented in this video is a general purpose hash table, what the video you mentioned shows is a perfect hash table that only works with a predefined, hardcoded set of words because he needed that for a JavaScript compiler. Those are two completely different problems and tbh his solution is quite dumb because for such a specific problem and such a limited set of keywords if you really want the highest performance the best option is to do a giant switch statement over the first letter of each word and inside that more switch statements over the second letter and so on, wich is ugly as heck but much faster than a hash table.
@tommasobonvicini7114
@tommasobonvicini7114 Жыл бұрын
Folks, look at the number of thumbs up SimGunther received, then look at marcossidoruk ones: welcome to the software industry.
@godnyx117
@godnyx117 Жыл бұрын
@@marcossidoruk8033 Is it really faster tho? If that is the case, then a genera purpose language with good mata-programming features (like D) can easily create a library that does that!
@MaxCoplan
@MaxCoplan Жыл бұрын
it’s pretty obvious strager didn’t really know what he was talking about and just made the video for the clickbait. Did you see the thumbnail? On his stream today he said he didn’t even go to college! How anybody can take software engineering advice from him is beyond me.
@strager_
@strager_ Жыл бұрын
> if you really want the highest performance the best option is to do a giant switch statement over the first letter of each word and inside that more switch statements over the second letter and so on, wich is ugly as heck but much faster than a hash table. You should leave a comment on my video with your suggestion.
@randomscribblings
@randomscribblings Жыл бұрын
strdup() == malloc() + strcpy()
@zxuiji
@zxuiji Жыл бұрын
10:25, you already typedef'd it, you don't need another typedef, rather that's just asking for compile time errors
@aniritri8635
@aniritri8635 Жыл бұрын
Have you checked Zig yet ? Seems like a nice language to overview and compare to C.
@djazz0
@djazz0 Жыл бұрын
And Nim! :)
@sanderbos4243
@sanderbos4243 Жыл бұрын
What I really enjoyed programming and found incredibly useful during my 1.5 years of C assignments was to write my own vector implementation. A basic one is only about 50 lines of code. Because my uni also requires us to free() *all* allocated memory manually, I was then able to write void *my_malloc(size_t count, size_t size, char *description): a malloc() wrapper that stores the new address in one of those vectors. I could then call print_allocations() and free_allocations() at the end of my main()! Very nice during debugging.
@Urre5
@Urre5 Жыл бұрын
Did they say why you had to free stuff at the end of main
@sanderbos4243
@sanderbos4243 Жыл бұрын
@@Urre5 I presume it is because in most of our projects we don't have any loops that would force us to free memory. So they just want to make sure we are aware of how to use free() properly. On some systems the OS might not do it for us at the end of the program too.
@Urre5
@Urre5 Жыл бұрын
@@sanderbos4243 yeah I was hoping it's because of the latter part, but they should be explicit, and even in particular teach you not to free on systems which will clean things up, because otherwise you'll waste the users time when exiting the program. Then again if it's a nice arena or something where you have all your allocations it shouldn't take too long anyway
@sanderbos4243
@sanderbos4243 Жыл бұрын
@@Urre5 Totally agree, we pretty much learn to use malloc() and free() however we like, as long as we don't have leaks. We're not told basic performance stuff like it maybe being braindead to use malloc() and free() unnecessarily all over the place, and without telling us about stuff like big O. Every exercise is a PDF, and our school (look up Codam or 42 school) deliberately doesn't have any teachers nor books we have to read, so everyone helps each other, and we spend a ton of time reading up online. It's incredibly freeing since the school is open 24/7 and you aren't required to be there for that many hours per week, but it isn't for everyone, since it's your own responsibility to become an awesome programmer. Oh, and it's completely free. :)
@johanngambolputty5351
@johanngambolputty5351 Жыл бұрын
Just to be cheeky, the thing is, you don't have to type text from video anyway, you can use optical character recognition, I like to do `spectacle -r -b -o /tmp/screenshot.png && tesseract /tmp/screenshot.png stdout --psm 6 | xclip -sel clip` set to a keybind, then you can just paste into your favourite editor ;)
@RobertaPROTO
@RobertaPROTO 3 күн бұрын
Hi there, for a School project i do have to use a double linked list to record frequent items ( for each node i ll put the density and the time it was updated) however the problem is that i have to "organize my double linked list as a hash table using a function H" How is it possible? It also said that i do have to make the pointer to points to the next in term of time of updating
@svenvandevelde1
@svenvandevelde1 13 күн бұрын
Just know that malloc and calloc are implemented using a heap structure, which is much more complex than a hash table. Why not creating this case study without malloc and calloc. Through static memory allocation. The usage of malloc and calloc slow down the logic dramatically. Also, if your hash table size and the structure size can be binary calculated through bit shifts, the key calculation can be made using a rotating random binary calculation. Which will result in blazingly fast key calculation.
@JaccovanSchaik
@JaccovanSchaik Жыл бұрын
33:12 strdup()!
@JacobSorber
@JacobSorber Жыл бұрын
Very true. Thanks
@69k_gold
@69k_gold Жыл бұрын
Please make a video about terminal (termios.h) and different terminal modes etc
@sortof3337
@sortof3337 Жыл бұрын
I thought you stopped syaing without further ado. hahah. very good video. The reason I am good at c is because I have all your videos I can reference. :D
@austinraney
@austinraney Жыл бұрын
Is the calloc call at 13:16 not incorrect? I thought it was number of elements then size of t. Right, it still will allocate the same amount of memory, I just would have expected that you would need to cast to make the compiler happy. What am I missing?
@austinraney
@austinraney Жыл бұрын
Having thought about it for a second, I guess the calls would be functionally equivalent. Are there ever cases when they aren’t?
@Uerdue
@Uerdue Жыл бұрын
@@austinraney From the manpage, I cannot find any evidence that swapping the arguments could ever mess things up. I would however argue that it could potentially hurt performance, because the `calloc` implementation might use that extra bit of information you provide by specifying what's the amount of items and what's the size. For example, it might try to align the memory such that no single item in the array will cross a page boundary. For this, it would need to know what's the element size. Interestingly, I haven seen more people supplying the arguments in the wrong order than doing it correctly.
@Uerdue
@Uerdue Жыл бұрын
Update: I checked the libc implementation on my machine, and found that it doesn't care: It multiplies the values, makes sure no overflow happened, and then just goes on to allocate a block of memory as large as the result of the multiplication. Other libc implementations might differ.
@austinraney
@austinraney Жыл бұрын
@@Uerdue thanks for doing some digging and sharing! It’s much appreciated! I was curious in particular about potential page alignment problems like you mentioned.
@JacobSorber
@JacobSorber Жыл бұрын
Yeah, calloc is typically just a multiply, a malloc call, and a memset (or the equivalent). Sorry if I caused any confusion.
@ahmadhadwan
@ahmadhadwan Жыл бұрын
Very interesting video dr. Jacob, I'm glad you decided to expand on the last video.
@tiramihai1152
@tiramihai1152 Жыл бұрын
A 41 minute Jacob Sorber video? I'm in for a ride
@adambishop328
@adambishop328 Жыл бұрын
wow thank you for strcspn, i've been looping through my character arrays for a long time to try and format them into null-terminated strings without any return or newlines. Sweet function
@Ido-Levy
@Ido-Levy Жыл бұрын
Hey, thank you for putting out these videos! I'm learning a lot from you :) Why aren't you checking for memory allocation failures?
@mytriumph
@mytriumph 9 ай бұрын
in my experience, it generally isn't necissary on modern computers. The odds that so much of your computer's total memory is being taken up by other processes, so much so that the program fails to allocate a comparatively small amount of memory, is small enough on modern computers that you can reasonably rule it out. Now, is it good practice to check anyway? Yes, absolutely. But it ultimately doesn't end up making that big of a difference
@surters
@surters 6 ай бұрын
If you want to make a generic hash table, you need a lot of helper functions that knows the type that you would need to pass along, that gives a lot of extra parameter. Or you could just pass along a point to a struct with all those extra functions, each of them function pointers for that type. Some of the extra pointers could be print_obj, destroy_obj, initialize_obj, copy_obj, assign_obj etc.
@russelwestbrick3023
@russelwestbrick3023 Жыл бұрын
wonderful teaching!!
@greg4367
@greg4367 Жыл бұрын
Looking forward to part 2
@FelixNielsen
@FelixNielsen 11 ай бұрын
I have a question may be relevant or entirely unrelated. I'm not actually sure. In short, I have a problem, the solution to which could well be a hash table. My keys are known to be unique and equal length, that is to say in terms of bytes no more than a few, or not strings, but rather integers, if you so desire. The question the becomes, is there a special category of hash functions (or other method), which can convert these keys into values in a given range, naturally ranging from 0-(n-1) for n items? Mind you I can think of other solutions for doing what I need to do, which is basically a runtime defined and/or modified switch case like functionality, but none I can think of are entirely well suited. Thanks for your efforts.
@mr.erikchun5863
@mr.erikchun5863 Жыл бұрын
Thank you Jacob for making these videos.
@CodePagesNet
@CodePagesNet 2 ай бұрын
Thank you for the helpful video and C videos in general. I encourage and promote the understanding that C has advantages over OO, even though people may not understand that yet (OO is merely a code format, and an inflexible one at that).
@thomaswillson1107
@thomaswillson1107 Жыл бұрын
Hi, can you show us how did you custom your vscode (comparaison operators like `!=`, etc...), thx for the video !
@strager_
@strager_ Жыл бұрын
Those look like ligatures. You need a font with code-oriented ligatures, and you need an editor which supports ligatures. I don't know what font Sorber uses, but Fira Code is a popular font which has ligatures.
@jvp5000
@jvp5000 2 ай бұрын
@@strager_ thanks
@Nohope__
@Nohope__ 6 ай бұрын
I'm going to have to watch this 10 times. (TYSM for the amazing material < 3)
@erbenton07
@erbenton07 Жыл бұрын
points-- for unnecessary use of feof Jacob, long video's are fine
@TheSulross
@TheSulross 10 ай бұрын
Just had to implement an open addressing hash table using linear probing and and double hading to reduce clustering - and I validated that, yes, double hasing does reduce clustering and the second has function can be very cheap and practically no cost. In my case the has table is allicated up front to some size and does not have to be increased in size over operational life time. Only keys are stored in the has table so a lookup returns an index. So the data resolved to is kept in a separate array that is of the same max entries size as the hash table itself. So the very same hash table can be used to lookup different data structure values depending on context - something that is the case in my domain.
@greg4367
@greg4367 Жыл бұрын
Jacob, the Subscribe button on your WEB page is inop.
@zxuiji
@zxuiji Жыл бұрын
What I'd like to see is a vid on the new lattice based encryption algorithm, be one I'd definitely save for later and I'm sure a number of peops here would end up using it in future jobs or existing ones if they have them.
@zxuiji
@zxuiji Жыл бұрын
Considering how you use the allocations should've really just used calloc everywhere to avoid runtime issues
@Ido-Levy
@Ido-Levy Жыл бұрын
Also, why are you using uint32_t instead of just int?
@soniablanche5672
@soniablanche5672 9 ай бұрын
uint32_t is always gonna be unsigned 32 bit, int size will depend on your machine
@randomscribblings
@randomscribblings Жыл бұрын
strdup() again
@_veikkomies
@_veikkomies Жыл бұрын
When you write "tmp = tmp->next" (e.g. lookup function), don't you have to define what "tmp->next" means? Where was that done?
@IBelieveInCode
@IBelieveInCode Жыл бұрын
"next" is a field of the struct "entry". You probably missed that 🙂
@_veikkomies
@_veikkomies Жыл бұрын
@@IBelieveInCode Ahh yeah, probably. Thanks
@IBelieveInCode
@IBelieveInCode Жыл бұрын
Good Game 🙂
@IBelieveInCode
@IBelieveInCode Жыл бұрын
I've just written my own C "Hash Table" module. It's on my channel. Without sound. My english is badly written, but it's worse when I try to speak.
@randomscribblings
@randomscribblings Жыл бұрын
In delete you're leaking the key memory.
@JacobSorber
@JacobSorber Жыл бұрын
Did you watch the video? 😀 Yeah, I know. The example is unfinished. See you next week.
@randomscribblings
@randomscribblings Жыл бұрын
@@JacobSorber Yeah... the comment was made as I was watching.
@anon_y_mousse
@anon_y_mousse Жыл бұрын
It's not a bad start, but it would help if you made it slightly more generic. You could use _Generic and specialize on a few known types, or you could use a union and associate traits with whatever user defined data gets passed in. It would help to have the user pass in a hashing function and a comparison function and have flags to determine if data should be copied or merely pointed to.
Build a Realtime Chat App in React Native (tutorial for beginners) 🔴
3:49:50
Faster than Rust and C++: the PERFECT hash table
33:52
strager
Рет қаралды 575 М.
小天使和小丑太会演了!#小丑#天使#家庭#搞笑
00:25
家庭搞笑日记
Рет қаралды 31 МЛН
ДЕНЬ УЧИТЕЛЯ В ШКОЛЕ
01:00
SIDELNIKOVVV
Рет қаралды 2,4 МЛН
LIFEHACK😳 Rate our backpacks 1-10 😜🔥🎒
00:13
Diana Belitskay
Рет қаралды 3,9 МЛН
when you have plan B 😂
00:11
Andrey Grechka
Рет қаралды 67 МЛН
Fixing our "better" hash table's memory leaks (in c)
10:04
Jacob Sorber
Рет қаралды 6 М.
Pulling Back the Curtain on the Heap
21:38
Jacob Sorber
Рет қаралды 37 М.
Hash Tables - CS50 Shorts
18:47
CS50
Рет қаралды 146 М.
Hash Table in C
2:11:31
Tsoding Daily
Рет қаралды 65 М.
Hash Tables - Data Structures and Algorithms
20:16
Caleb Curry
Рет қаралды 38 М.
How to make memory read-only in your C programs.
12:57
Jacob Sorber
Рет қаралды 20 М.
31 nooby C++ habits you need to ditch
16:18
mCoding
Рет қаралды 789 М.
小天使和小丑太会演了!#小丑#天使#家庭#搞笑
00:25
家庭搞笑日记
Рет қаралды 31 МЛН