Great content Jordan! Really enjoyed it. I kindly request you to also cover an online bidding platform (like eBay) in one of your videos. Thank you!
@sgulugulu19 6 hours ago
Thanks for the videos, and especially for making the 2.0 playlists. Been a follower since Dec last year and was able to get offers from Amazon and Google. Would have liked to thank you in person if you were still at G. 😄 Keep up the great stuff man!
@jordanhasnolife5163 6 hours ago
Congrats man!! You're an absolute legend! And have fun at Google!
@khushalsingh576 12 hours ago
Great video, and the line at 07:54 ("Fortunately there are engineers who have no life..." 😂😂) added a practical touch
@user-ee7oi8qv7f 16 hours ago
So basically we can have two implementations of a hash map index: one in memory (with space constraints) and one on disk, but neither supports range queries, and that's why we need B-trees. B-trees are disk-based, since the data could be huge; they solve the range query issue, but the downside is that writes can be expensive if we have to perform a (possibly recursive) B-tree split, right? That's what I understood so far; let me know if my understanding is correct. Also, thanks a lot for this series, really helpful. One more question: do we only maintain a WAL when the index lives in memory, or do we maintain the WAL regardless?
@jordanhasnolife5163 7 hours ago
I think just about all DB indexes will use some sort of write-ahead log. I think what you said is basically correct, but the main reason B-tree indexes are good for range queries is that similar keys are next to each other on disk. Also, in reality, there's a lot of in-memory caching of B-tree pages.
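To make the range-scan point concrete, here's a toy Python sketch (not from the video; names and data are illustrative) contrasting an unordered hash index with a sorted key layout like a B-tree's leaf level:

```python
import bisect

# Hash index: O(1) point lookups, but keys are unordered,
# so a range query would have to scan every entry.
hash_index = {"cherry": 2, "apple": 0, "banana": 1}

# Sorted layout (stand-in for B-tree leaf order): similar keys
# are adjacent, so a range scan is a binary search plus a walk.
sorted_keys = sorted(hash_index)

def range_query(lo, hi):
    """Return all keys in [lo, hi] via binary search on the sorted keys."""
    start = bisect.bisect_left(sorted_keys, lo)
    end = bisect.bisect_right(sorted_keys, hi)
    return sorted_keys[start:end]

print(range_query("apple", "banana"))  # ['apple', 'banana']
```

The same adjacency is why a real B-tree can serve a range with a few sequential page reads instead of scattered random ones.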
@VIPINKUMAR-dr7vu a day ago
Can we implement a product clustering approach using Natural Language Processing (NLP) techniques to group similar types of products together on the same node, thereby optimizing search query performance?
@jordanhasnolife5163 a day ago
Yep!
@sankalpsharma1755 a day ago
Learned about binary search trees in 2016, and I'm learning about their real use cases in 2024 :O And yes, I'm old :/
@MultiCckk a day ago
Takes me 3 hours to understand and complete a 45 min video, rip 😂
@akibali7123 a day ago
Hi, you are delivering high-quality content; it's very unique and hits the core problems of each and every design. Can you please make a video on a collaborative editing tool like Excalidraw?
@jordanhasnolife5163 a day ago
Thanks! Any reason in particular that you think Excalidraw is challenging? Unlike Google Docs, I imagine there aren't enough concurrent edits to make a single leader infeasible here
@chaitanyatanwar8151 a day ago
Thanks!
@daisuke.ryomen 2 days ago
Just started watching this series, till now it has been a lot of fun + a lot of learning!
@msebrahim-007 2 days ago
Question about adding elements after they have been removed (14:04): If a user adds "ham" 5 times to the set on the same node, what is preventing the set from containing 5 different instances of "ham" with unique IDs?
@jordanhasnolife5163 2 days ago
Nothing. You have multiple instances of ham now. On the front end though, we just tell the user that we have one instance of ham.
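What's described here resembles an observed-remove set (OR-Set) CRDT: each add gets a unique tag internally, and the "front end" view collapses live tags down to one visible element. A hedged toy sketch of that idea, with all names invented for illustration:

```python
import uuid

class ORSet:
    """Toy observed-remove set: adding 'ham' 5 times yields 5
    uniquely tagged instances internally, but one visible element."""
    def __init__(self):
        self.adds = {}      # element -> set of unique add tags
        self.removes = {}   # element -> set of removed tags

    def add(self, elem):
        self.adds.setdefault(elem, set()).add(uuid.uuid4().hex)

    def remove(self, elem):
        # Remove only the add tags observed so far; a concurrent
        # add with a fresh tag would survive this remove.
        observed = self.adds.get(elem, set())
        self.removes.setdefault(elem, set()).update(observed)

    def contents(self):
        # The user-facing view: one entry per element with a live tag.
        return {e for e, tags in self.adds.items()
                if tags - self.removes.get(e, set())}

s = ORSet()
for _ in range(5):
    s.add("ham")
print(s.contents())  # {'ham'} — five internal tags, one visible item
```

This is just a sketch of the tagging idea, not the exact scheme from the video.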
@tunepa4418 2 days ago
Thanks for the video. How do we get the top k counts from a count-min sketch? I assume the count-min sketch is just for counting, not for getting the top k
@jordanhasnolife5163 a day ago
In theory you can maintain an in-memory heap of size k with the counts of the top k elements, updating it as you compute each element's count using the count-min sketch
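As a rough illustration of this heap-plus-sketch idea (a minimal sketch, not production code; the width, depth, and names are made up):

```python
import hashlib
import heapq

class CountMinSketch:
    """Minimal count-min sketch: d hash rows of width w; an item's
    estimate is the minimum of its counters (it never undercounts)."""
    def __init__(self, w=1000, d=5):
        self.w, self.d = w, d
        self.table = [[0] * w for _ in range(d)]

    def _cells(self, item):
        for row in range(self.d):
            digest = hashlib.sha256(f"{row}:{item}".encode()).hexdigest()
            yield row, int(digest, 16) % self.w

    def add(self, item):
        for row, col in self._cells(item):
            self.table[row][col] += 1

    def estimate(self, item):
        return min(self.table[row][col] for row, col in self._cells(item))

def top_k(stream, k):
    """Stream items through the sketch while keeping a size-k min-heap
    of (estimated_count, item), as suggested in the reply above."""
    cms, heap = CountMinSketch(), []
    for item in stream:
        cms.add(item)
        est = cms.estimate(item)
        if item in {i for _, i in heap}:
            # refresh the tracked count for an item already in the heap
            heap = [(est, i) if i == item else (c, i) for c, i in heap]
            heapq.heapify(heap)
        elif len(heap) < k:
            heapq.heappush(heap, (est, item))
        elif est > heap[0][0]:
            heapq.heapreplace(heap, (est, item))
    return sorted(heap, reverse=True)

stream = ["a"] * 5 + ["b"] * 3 + ["c"]
print(top_k(stream, 2))  # highest estimated counts first
```

The sketch keeps memory bounded for counting; the heap keeps memory bounded for ranking, at the cost of possibly missing items whose early estimates were too low.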
@medaliboulaamail6491 2 days ago
Hahahahaha, deserved for flying on Delta (I have never set foot inside an airplane)
@jordanhasnolife5163 2 days ago
C'mon man, it wasn't even Spirit or Frontier!
@medaliboulaamail6491 2 days ago
@@jordanhasnolife5163 If it was Spirit they would seat you on the jet engine. Nonetheless, pick your poison
@aniketpandey1913 2 days ago
Hey Jordan, I'm not able to understand the use of CDNs here. Are we going to store that 1 sec chunk in the CDN? And why do we need to store metadata? Can you please clarify these doubts of mine
@jordanhasnolife5163 2 days ago
Metadata basically just stores the link of the clip in S3 as well as its sequence number and resolution. The CDN is used as a globally distributed cache of the video clips.
@truptijoshi2535 2 days ago
Hi Jordan, can CDC have a single point of failure? If yes, how do we avoid it? Also, does CDC add extra latency?
@jordanhasnolife5163 2 days ago
I mean, in theory Kafka, but I tend to imply that our Kafka cluster has replicas. CDC does make things slower, but I suppose in the cases where I use it I don't actually care (hence why I use it)
@seifeddinedridi4898 2 days ago
Great video Jordan. It became a habit of mine 😄 to watch your content and study your system designs. Thanks mate for your work, I appreciate what you're doing.
@manishasharma-hy5mj 2 days ago
Hi Jordan, while performing writes in a B-tree, they are first written to the WAL, and then applied from the WAL into the B-tree-like structure. Is that why writes are slower, since this must be a synchronous operation? Am I right?
@jordanhasnolife5163 2 days ago
Not sure what you mean by synchronous, but basically a write must be fully committed in the WAL for it to count; otherwise we throw it out
@SonOfTheSoil_1 2 days ago
Webcam is fine, I really don't care about video quality. Amazing content though, thanks.
@huguesbouvier3821 3 days ago
Great video! Thank you! One comment: the encoding server will have to write to 4 different places. Either:
- 2PC: bad
- Write into the metadata cache last: better
And we could have a CRON job that cleans up failed jobs in the background?
@jordanhasnolife5163 3 days ago
So I'd say it really only has to write to two places (since the caches should be content that is pulled in, not pushed in). To avoid two phase commit, we can just write the metadata row once the S3 clip is uploaded. If for some reason there are some orphaned S3 files that's no biggie
@jalaj6253 3 days ago
You have used the same DB type - columnar DB - but different underlying DBs for chunk metadata and chatDB. I guess the reason behind it is that you want chunk metadata to have a consistent DB (HBase), while for chatDB you want availability more (Cassandra). Is this the right understanding, or are there other reasons as well?
@jordanhasnolife5163 3 days ago
Hey, I tried to explain this one more in the video. Cassandra: about as fast write throughput as you're going to get (other than maybe something like Kafka and just using a log, which, now that I think about it, actually isn't the worst solution here). HBase: fairly fast for writes, but important for reads since we only want to access a couple of columns of our data at once. You call both of these columnar DBs. To my knowledge, they both use column formats, but HBase uses column-oriented storage, which is a significant difference.
@mystica7284 3 days ago
toe reveal stream when?! great video btw!!
@jalaj6253 3 days ago
Your videos are one of the best sources to learn about system design. Appreciate your effort and consistency :)
@emenikeanigbogu9368 3 days ago
start streaming on twitch!!!
@jordanhasnolife5163 3 days ago
Feet incoming?!
@foxedex447 3 days ago
ill keep watchin ur videos till i die or u die
@jordanhasnolife5163 3 days ago
I appreciate that man! I already have no life though, so you've paid your debt. In all seriousness, once you get that job, no need to keep watching; go have fun and socialize :)
@foxedex447 3 days ago
@@jordanhasnolife5163 i kinda watch em for fun at this point XDD
@InfiniteRabbitHole 3 days ago
Oh. So. Cool.
@jorgealonsogastelumgonzale2870 3 days ago
Amazing video!
@scottmangiapane 3 days ago
Babe wake up, new system design just dropped
@jordanhasnolife5163 3 days ago
She may still be sleeping
@scottmangiapane 3 days ago
@@jordanhasnolife5163 That's OK, the system is fault tolerant. I will eventually achieve consistency via an air horn
@jordanhasnolife5163 3 days ago
@@scottmangiapane lol, I see you're the primary in your relationship then and you're asynchronously replicating your sleep status to her
@scottmangiapane 3 days ago
@@jordanhasnolife5163 Oh yeah. And if she's still not up, I'll implement sharting I mean sharding ;)
@devops_junkie9203 3 days ago
Ah, this is amazing. I have some junior developers that I am training on MS architecture; we are now at the integration point and wanted to see which is better for our case. It seems we might be using both options. Thanks
@shobhitarya1637 3 days ago
Do NoSQL databases use the same mechanism, i.e., a WAL or logical replication log, to replicate data to other nodes, or is that just applicable to SQL databases?
@jordanhasnolife5163 3 days ago
I imagine this would be database-specific, but I don't see why they wouldn't
@sergiuchiuchiu6692 3 days ago
@2:20 Your information is wrong there. I think you meant to say that there are partition keys (at least one) and clustering keys (0 or more); together they form the primary key. Please review the video, as it is misinforming thousands of people.
@jordanhasnolife5163 3 days ago
Oops, typo on my part. If this were a bigger deal I'd revise the video, but I don't think anyone is losing their job due to using the wrong terminology for Cassandra key names.
@ganesansanthanam-5896 3 days ago
Please accept me as your mentee
@jordanhasnolife5163 3 days ago
I'm sorry man, I'm pretty pressed for time these days; perhaps you could find one amongst my other gigachad viewers or go asking on Blind/LinkedIn
@tunepa4418 4 days ago
Why does a rider need to be connected to a matching service close to its location using geohash load balancing? I am quite confused. Can you please clarify
@jordanhasnolife5163 3 days ago
The server itself doesn't need to be physically near its location, but it should be responsible for the area of the map that the potential rider and all nearby drivers could connect to
@muven2776 4 days ago
Good Video - Too honest - Great content - Injecting the content like a slow poison
@meenalgoyal8933 4 days ago
Hey Jordan! Thanks for the video. I have a question about the part where the Flink consumer sends the message to other users in a group chat. The consumer checks with the load balancer to learn which hosts the other users are connected to, and then sends the message to those hosts. But how does the communication between the Flink consumer and the chat service host occur, so that the consumer can pass those messages to the host to send them on to users?
@jordanhasnolife5163 4 days ago
The chat server can just expose an HTTP endpoint that the Flink node can hit
@thenamestails7152 4 days ago
How about: Are you by any chance a C++ program? Cuz

#include <iostream>
#include "Rizz.h"
using namespace std;
int main() {
    if (Rizz::wannaDate == true) {
        Rizz::isPregnant = true;
    } else {
        // there is no else
    }
    return -1; // yeah, I'm a bad guy :sunglasses:
}
@thenamestails7152 4 days ago
"1+1 equals 10, if you know what I mean 😏😏"
@harshchiki7796 4 days ago
Did not follow the range query across geohashes, in the slide where we were calculating the nearest points (the slide with the diagram of boxes). Geohashes don't have a mutual ordering - or do they? (If you don't mind, can you share a bit on that?)
@jordanhasnolife5163 4 days ago
You're right that there's no direct ordering, but for a given geohash I do know the 8 hashes of the surrounding boxes, so I can just check all of those with a Pythagorean equation to confirm things are within the correct distance of my focal point
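A hedged sketch of that neighbor-cells-plus-Pythagorean-filter idea, using a simple fixed-size grid in place of real geohashes (the cell size, point data, and function names are all invented for illustration):

```python
import math

CELL = 1.0  # degrees per grid cell (toy stand-in for geohash precision)

def cell_of(lat, lng):
    """Bucket a point into a grid cell, like a geohash prefix would."""
    return (math.floor(lat / CELL), math.floor(lng / CELL))

def neighbors(cell):
    """The focal cell plus its 8 surrounding cells."""
    r, c = cell
    return [(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]

def nearby(points, focal, radius):
    # Index points by cell, gather candidates from the 9 cells,
    # then apply the Pythagorean (straight-line) distance check.
    index = {}
    for p in points:
        index.setdefault(cell_of(*p), []).append(p)
    candidates = []
    for cell in neighbors(cell_of(*focal)):
        candidates.extend(index.get(cell, []))
    return [p for p in candidates if math.dist(p, focal) <= radius]

pts = [(0.1, 0.1), (0.4, 0.4), (5.0, 5.0)]
print(nearby(pts, (0.0, 0.0), 0.5))  # [(0.1, 0.1)]
```

Treating lat/lng as a flat plane is fine for a toy; a real system would use haversine distance and actual geohash neighbor computation.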
@msebrahim-007 4 days ago
Question about the example at 9:55 where both leaders now have (Key: Jordan, Value: cute | scary). If a user reading from leader 1 is prompted to choose and picks "cute", whereas another user reading from leader 2 is prompted and chooses "scary", then we end up in a scenario where leader 1 has "cute" and leader 2 has "scary". Hence, there is a conflict. My question is: when a user is prompted and chooses a value, does the version vector increment for that leader?
- If so, I presume that when the leaders share their version vectors again, we end up back in the same situation where we started and store the siblings.
- If not, how do we go about resolving this conflict? Or perhaps this situation doesn't happen at all and I'm overthinking this?
@jordanhasnolife5163 3 days ago
1) Yes, let's increment the version vector. 2) Yes, we'd get siblings again lol, hopefully this doesn't happen too frequently and the two leaders have time to synchronize.
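The sibling detection being discussed comes from comparing version vectors: siblings are stored exactly when neither vector dominates the other. A minimal sketch (leader names are assumed for illustration):

```python
def compare(v1, v2):
    """Compare two version vectors: 'before', 'after', 'equal',
    or 'concurrent' (concurrent writes become siblings)."""
    keys = set(v1) | set(v2)
    less = any(v1.get(k, 0) < v2.get(k, 0) for k in keys)
    more = any(v1.get(k, 0) > v2.get(k, 0) for k in keys)
    if less and more:
        return "concurrent"
    if less:
        return "before"
    if more:
        return "after"
    return "equal"

# Leader 1 accepts "cute", leader 2 accepts "scary": each bumps
# only its own counter, so the vectors are concurrent -> siblings.
l1 = {"leader1": 2, "leader2": 1}
l2 = {"leader1": 1, "leader2": 2}
print(compare(l1, l2))  # concurrent
```

This shows why resolving siblings on each leader independently just recreates the concurrent state, as the reply says.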
@slover4384 4 days ago
"LSM tree" is the name the industry gives to the entire structure (the in-memory BST plus the different SSTables). The in-memory BST part is called the "memtable". That is, memtable + SSTables == LSM tree.
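A toy illustration of that terminology (a sketch only, with a made-up flush threshold): writes land in the memtable, which is flushed to immutable sorted SSTables; reads check the memtable, then SSTables newest-first.

```python
import bisect

class TinyLSM:
    """Toy LSM tree per the comment: an in-memory 'memtable' plus
    a list of flushed, sorted 'SSTables' (searched newest-first)."""
    def __init__(self, memtable_limit=2):
        self.memtable = {}   # stand-in for the in-memory BST
        self.sstables = []   # each: a sorted list of (key, value)
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            # flush to an immutable sorted SSTable
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable first
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None

db = TinyLSM()
db.put("a", 1); db.put("b", 2)   # second put triggers a flush
db.put("a", 9)                   # newer value lives in the memtable
print(db.get("a"), db.get("b"))  # 9 2
```

Real engines also compact SSTables in the background; that part is omitted here.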
@FaisalAnees46 5 days ago
Awesome video man! Super quick question - towards the end, in your secondary global index, where the shards are partitioned by height, why would Dwyane's 6'3" height get hashed to the second shard? Isn't the hashing happening on the value of the heights, and because of that, wouldn't Dwyane hash to shard 1?
@jordanhasnolife5163 3 days ago
So the point is that the height index is the secondary one. His primary hash puts him on 2, but his secondary puts him on 1; now we need a two-phase commit.
@slover4384 5 days ago
If you are using a hash index to track the location of key-values on disk, why do you need a write-ahead log too? That's 3 writes each time (in memory to the hash index, on disk twice). I don't know of any database that does this.
You can always recover the hash index for the active segment of the database on restart by reading the actual segment of key-values from disk. For the older, inactive (i.e., read-only) segments, the database stores a snapshot of the index for that segment onto disk when the segment goes into read-only mode.
The write-ahead log is useful if you are storing actual data values in memory only, without the on-disk key/values. In general, the write-ahead log for any database, even relational databases, stores changes that were made in memory that >were not< made to disk yet. Similar for LSM commit logs. If a change was made to disk, there is little value in also tracking this in a WAL or commit log.
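The recover-on-restart idea in this comment can be sketched roughly as follows (a toy, using an in-memory byte buffer in place of a real segment file; all names are invented):

```python
import io

def append(segment, index, key, value):
    """Append a 'key,value' record and index its byte offset."""
    index[key] = segment.tell()
    segment.write(f"{key},{value}\n".encode())

def rebuild_index(segment):
    """Recover the hash index by rescanning the segment - no WAL needed,
    because every record is already durable on 'disk'."""
    index, offset = {}, 0
    segment.seek(0)
    for line in segment:
        key = line.decode().split(",", 1)[0]
        index[key] = offset        # later records shadow earlier ones
        offset += len(line)
    return index

def read(segment, index, key):
    segment.seek(index[key])
    return segment.readline().decode().strip().split(",", 1)[1]

seg, idx = io.BytesIO(), {}
append(seg, idx, "ham", "5")
append(seg, idx, "ham", "6")     # newer record shadows the old one
recovered = rebuild_index(seg)
print(read(seg, recovered, "ham"))  # 6
```

This mirrors the Bitcask-style design the comment alludes to: the segment itself is the source of truth, and the in-memory index is derived state.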
@jordanhasnolife5163 3 days ago
What about atomicity of transactions...
@slover4384 20 hours ago
I don't know of a database like we are discussing that has atomic transactions - though there may be some. I mean a database that immediately stores the key-value on disk with an in-memory hash index pointing to offsets on disk.
(Note: there are some well-known databases that store sorted key-value pairs on disk as SSTables and provide ACID transactions, but I don't mean those. Note: Cassandra is a database that uses SSTables but doesn't/didn't support typical transactions. But since it buffers entries in memory first before disk, it has a subset of a WAL that is just the "redo" entries. Some call that log file a commit log to make it clear that it isn't a full-blown WAL, but I usually call it a redo log. If Cassandra started supporting ACID transactions, it could possibly repurpose the commit logs and make them complete WALs. So my initial comment to you really should have said I was referring to the redo part of a WAL (which is what we were discussing, I think, in the video). That isn't really needed if you are going to persist all changes to disk immediately.)
If such a database were created, where hashed key-values are immediately written to disk instead of being buffered in memory for a while... and you wanted to implement atomic transactions -> you could use a stripped-down version of a WAL that just has the undo entries. I would call this an undo log. That's still a subset of a typical WAL, but most people would not object to an undo log being called a WAL so long as context makes it clear. You made a great point above though, which is forcing me to write these thoughts down.
Now, _if_ we were to introduce a WAL (i.e., with redo entries in it) to a database with immediately persisted hashed key/value pairs, the logical step is to actually not write things to disk immediately. That is, you want to spend some time buffering the hashed key/value pairs in memory, with redo entries going into the WAL first in case the power gets cut.
TMI: A WAL is really a combined redo log and an undo log.
1) Redo logs provide the durability in ACID. They are not in general needed if writes are persisted to disk immediately, because we already get durability when we persist writes. Adding a redo log allows us to program performance-boosting features into the database, which amount to buffering recently modified items in memory.
2) Undo logs provide the atomicity in ACID. These are used to aid in aborting or rolling back partially committed transactions.
@jordanhasnolife5163 7 hours ago
@@slover4384 Thanks for your detailed response, I appreciate it!
@ganesansanthanam-5896 5 days ago
I would love to be mentored by you. I am an international student who's struggling to find a job
@jordanhasnolife5163 3 days ago
Hey man! Wishing you the best - unfortunately, unlike my time at Google, I'm now a bit short on time, so I don't think I have the availability to mentor at the moment :(
@ganesansanthanam-5896 5 days ago
I would love to be mentored by you. Hey, is there a Discord or some community? My brain is melting and I am struggling... feel like an idiot
@ganesansanthanam-5896 5 days ago
I am an F1 student, got laid off, and would love to connect with you and get some guidance on how to improve. I would love to be mentored by you...
@soumik76 5 days ago
Hi Jordan, if a DAG update isn't needed (as in, if it's a simple cron job), then does the executor directly update the schedules table, since there won't be CDC in this case?
@jordanhasnolife5163 3 days ago
Seems reasonable to me
@collinmonahan3428 5 days ago
I wanted to know more about your beef with Tech Lead. 😀
@jordanhasnolife5163 5 days ago
I actually found tech lead pretty entertaining until I realized he wasn't being ironic