I hope you loved the video. If you did, do share a word on social media. It would mean the world to me ❤
@uday_berad 2 months ago
Can you share the notes for GFS?
@tsghosh9006 2 months ago
Thank you.
@pranjalagnihotri6072 10 months ago
It's 4 in the morning and I just finished watching this amazing video on GFS. Really impressive decisions they took in 2003. I've got a few questions, but first I will try to look them up on my own. Hats off to your teaching skills 🤩
@AsliEngineering 10 months ago
Thank you Pranjal :)
@PrashantChoudhary6868 2 months ago
I am a beginner in system design, and this paper is brilliant: so many lines in it gave me a whole new perspective on how things work and how I can make them more efficient. I could imagine applying those ideas to other situations in computer science and was amazed by how well it turns out. Thanks for such an easy explanation, bhaiya. Loved the video 👌
@lunadas1217 10 months ago
I can't express my gratitude adequately for creating such excellent content
@BhaswantInguva 5 months ago
Great explanation. Thank you! Of the multiple times I felt "wow", two stand out: 1. How heartbeats are used efficiently to avoid persisting the chunk-to-location mapping. 2. How write failures are handled with two-step acks from the primary replica.
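The first point can be sketched roughly as below. This is only an illustration, not GFS code: the `Master` class, `on_heartbeat`, and `lookup` are hypothetical names, and it assumes for simplicity that every heartbeat carries the chunkserver's full chunk list. The idea is that the chunk-to-location map lives only in memory and is rebuilt from whatever the chunkservers report, so it never has to be persisted.

```python
from collections import defaultdict


class Master:
    """Toy master that keeps chunk locations only in memory (hypothetical)."""

    def __init__(self):
        # chunk handle -> set of chunkserver ids; never written to disk
        self.locations = defaultdict(set)

    def on_heartbeat(self, server_id, chunk_handles):
        """Replace what we believed about this server with its latest report."""
        reported = set(chunk_handles)
        # Forget chunks this server no longer claims to hold.
        for handle, servers in self.locations.items():
            if handle not in reported:
                servers.discard(server_id)
        # Record every chunk it does claim to hold.
        for handle in reported:
            self.locations[handle].add(server_id)

    def lookup(self, handle):
        """Return the servers currently believed to hold the chunk."""
        return sorted(self.locations.get(handle, ()))


master = Master()
master.on_heartbeat("cs1", ["c1", "c2"])
master.on_heartbeat("cs2", ["c2"])
print(master.lookup("c2"))
```

If the master restarts, the map starts out empty and fills up again as heartbeats arrive, which is why there is nothing to keep consistent on disk.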
@mritunjayyadav3788 9 months ago
What timing to drop this gem! I was also reading the GFS white paper and came here to watch the video after finishing it. Even though I had understood the paper, watching your video made me fall in love with GFS even more 😊
@AsliEngineering 9 months ago
Thanks man! Means a ton ✨
@devdevjb 10 months ago
More paper reviews! I like how you explain
@nirajpandey3856 2 months ago
The way you explained it is just awesome.
@JohnbelMahautiere 3 months ago
Thank you for supplying detailed information regarding the Google File System, as outlined and explained in the referenced paper.
@mydivinelife4774 3 months ago
Love it, thanks for sharing. I did not learn this much in the last 10 years.
@SumitGouthaman 9 months ago
Great video! Really made me want to go back and read the source paper 😃 Around 1:13:00, I was slightly confused as to how the master detects corruption of chunks on a specific chunk server (doesn’t seem scalable for the master to keep checking the checksums regularly). After skimming the paper it seems the master is informed about potential corruption by the chunkserver with the corrupted chunk rather than the master detecting it proactively.
@AsliEngineering 9 months ago
Thank you 🙌
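The chunkserver-side detection described in the comment above can be sketched as follows. The paper has chunkservers checksum each chunk in 64 KB blocks with a 32-bit checksum; using CRC32 from `zlib` here is an assumption, and `block_checksums` and `corrupted_blocks` are hypothetical helper names, not GFS functions.

```python
import zlib

BLOCK_SIZE = 64 * 1024  # chunks are checksummed in 64 KB blocks


def block_checksums(data: bytes) -> list[int]:
    """Compute one 32-bit checksum per 64 KB block (CRC32 is a stand-in)."""
    return [
        zlib.crc32(data[i:i + BLOCK_SIZE])
        for i in range(0, len(data), BLOCK_SIZE)
    ]


def corrupted_blocks(data: bytes, expected: list[int]) -> list[int]:
    """Return the indices of blocks whose checksum no longer matches."""
    actual = block_checksums(data)
    return [i for i, (got, want) in enumerate(zip(actual, expected)) if got != want]


chunk = bytes(3 * BLOCK_SIZE)           # a small stand-in for a chunk
stored = block_checksums(chunk)         # checksums recorded at write time

damaged = bytearray(chunk)
damaged[BLOCK_SIZE + 5] ^= 0xFF         # flip one byte inside the second block
print(corrupted_blocks(bytes(damaged), stored))
```

A chunkserver that finds a mismatch like this can refuse the read and report the bad chunk to the master, so the master never has to scan checksums itself.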
@JardaniJovonovich192 10 months ago
Great video. The Primary replica concept reminds me of the leader replica concept of a Kafka topic across brokers.
@gswapnil777 3 months ago
Thanks for such a detailed video. More power to you!
@AsliEngineering 3 months ago
Thanks :)
@ntnmnk2009 9 months ago
Nice explanation! This was covered in our fourth year. Hadoop was inspired by GFS, and once Yahoo open-sourced it, the big data revolution started.
@tell5g 9 months ago
What's so special about it? Doesn't a torrent work the same way? It just looks like Google renaming a BitTorrent client to Google File System.
@parthibdhal8187 13 days ago
Great explanation! Thanks a lot!
@cyriacgc 5 months ago
I want someone to say to me the way Arpit says "such a beautiful software" 😊 Jokes apart, what a video! I'm really glad that I stumbled on this. Thank you
@AsliEngineering 5 months ago
Hahhaha 😅
@39_ganesh_ghodke98 7 months ago
Your teaching skills are great👍
@AsliEngineering 7 months ago
Thanks Ganesh!
@suhanijain5026 7 months ago
This is great, would love to see the paper on Bigtable as well
@siteshp 28 days ago
Thank you so much Arpit for the whole video. I had a question about why we would need a global mutation order, but then I went through the paper and rewound some parts of the video to understand that multiple clients would be trying to write to the same chunk, and hence a global mutation order is very important. Thanks for instilling curiosity :-). Means a ton to me too.
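The global mutation order mentioned above can be sketched as follows. This is a toy model under stated assumptions, not the actual protocol: `Primary`, `Secondary`, `mutate`, and `apply` are hypothetical names, and failures, leases, and the separate data push are all left out. The point is only that one replica picks a single serial order and every replica applies mutations in that order, no matter which client sent them.

```python
import itertools


class Secondary:
    """Toy secondary replica: applies mutations in the order it is told."""

    def __init__(self):
        self.log = []

    def apply(self, serial, client_id, payload):
        self.log.append((serial, client_id, payload))


class Primary:
    """Toy primary replica: assigns one serial order to concurrent mutations."""

    def __init__(self, secondaries):
        self._serial = itertools.count(1)  # next serial number to hand out
        self.secondaries = secondaries
        self.log = []

    def mutate(self, client_id, payload):
        serial = next(self._serial)
        self.log.append((serial, client_id, payload))
        # Forward the same serial number so every replica applies the
        # mutation at the same position in its own log.
        for replica in self.secondaries:
            replica.apply(serial, client_id, payload)
        return serial


s1, s2 = Secondary(), Secondary()
primary = Primary([s1, s2])
primary.mutate("client-A", b"x")
primary.mutate("client-B", b"y")
print(s1.log == s2.log == primary.log)
```

Without the shared serial numbers, two clients writing to the same chunk could be applied in different orders on different replicas, leaving the replicas with different bytes.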
@sanjeevrajora7335 9 months ago
Thanks for making such amazing content, big fan of your work ❤. Just a small doubt, if I am not missing something: at 1:01:19 you said that the client has to write to each replica itself. Don't you think it would be better if the client wrote only to the primary replica, and from there it was the primary replica's duty to write to all secondary replicas? It would help in 2 ways: 1) the client doesn't have to send the confirmation that the write to all replicas is completed; all this responsibility can be encapsulated in the primary replica. 2) the network overhead of writing from the client to each replica can be reduced by the primary server, since it can use Google's very high-bandwidth connectivity; also, if the replicas are near each other or in the same physical zone, we save the overhead of passing the data across hundreds of miles. Waiting for your response.
@koteshwarraomaripudi1080 9 months ago
Kafka does the same thing you are suggesting. My best guess for why GFS took this call is the network bandwidth available in data centers in 2003.
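On this question, the paper describes decoupling data flow from control flow: the write request goes through the primary, while the data itself is pushed along a chain of chunkservers, each forwarding to the closest machine that has not yet received it. A minimal sketch of picking such a chain is below. `forwarding_chain` and the `distance` callback are hypothetical, and the one-dimensional "positions" are an invented stand-in for network topology.

```python
def forwarding_chain(client, replicas, distance):
    """Greedily order replicas so each hop goes to the nearest machine
    that has not received the data yet (ties broken by name)."""
    chain = []
    current = client
    remaining = set(replicas)
    while remaining:
        nearest = min(remaining, key=lambda r: (distance(current, r), r))
        chain.append(nearest)
        remaining.remove(nearest)
        current = nearest  # the next hop is measured from the last receiver
    return chain


# Invented positions on a line, standing in for network distance.
position = {"client": 0, "s1": 5, "s2": 1, "s3": 2}


def line_distance(a, b):
    return abs(position[a] - position[b])


print(forwarding_chain("client", ["s1", "s2", "s3"], line_distance))
```

In this model the client sends the data once, to its nearest replica, and the replicas relay it among themselves, which is close to what the comment above is asking for.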
@ahmedanwer1767 2 months ago
Excellent explanation. I'm at 40:41 and it's totally amazing. Please make more explanation videos of research papers.
@AsliEngineering 2 months ago
I have a playlist of 6 paper dissections on my channel.
@ItsMeIshir 10 months ago
Nailed it.. Thank you very much for the detailed video Arpit. Stay curious :)
@RahulPal-iz4ev 10 months ago
Amazing explanation!! Loved it!!
@maheshgvp7735 10 months ago
This has been awesome so far, I am halfway through the video. If this is not too much to ask for, it would be great if the notes that are presented could be shared 🙏
@AyushBakshi 10 months ago
Good video. Not really into file management systems or computer science (I'm a 3d artist) but still watched.
@uday_berad 2 months ago
Can you share the notes?
@aniljuneja1752 6 days ago
What happens to the older chunks? I mean, if I want to overwrite some chunk, GFS creates a new chunk and changes the sequence in the op log (as in, this file has these chunks in this sequence). But how does it identify and handle the older chunks?
@arnabbir001 8 months ago
As a sequel, can you please add another video on the whitepaper MapReduce: Simplified Data Processing on Large Clusters? Also will it be possible to have a video covering Colossus? I know there are not enough resources available for that.
@pawansarma8730 9 months ago
Very well explained and really helped me understand… please do Bigtable next😃
@akshayxml 9 months ago
Can you share these notes?
@arjunmenon157 4 months ago
Hi Arpit, thanks for all the great content you are making. I had a doubt: at 17:22 you talk about how the client translates (file, offset) -> (file, chunk, offset). But how would the client know which chunk the data is on? Wouldn't that information reside in the master?
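On the translation question above: because the chunk size is fixed, the client can work out the chunk index with plain arithmetic and needs the master only to turn (file name, chunk index) into a chunk handle and replica locations. A minimal sketch, where `to_chunk_request` is a hypothetical helper name and the file path is made up:

```python
CHUNK_SIZE = 64 * 1024 * 1024  # fixed 64 MB chunks


def to_chunk_request(filename, byte_offset):
    """Translate (file, byte offset) into (file, chunk index, offset in chunk).

    No master lookup is needed for this step; it is integer division.
    """
    chunk_index, offset_in_chunk = divmod(byte_offset, CHUNK_SIZE)
    return filename, chunk_index, offset_in_chunk


# A read at byte 200 MB falls in the fourth chunk (index 3), 8 MB into it.
print(to_chunk_request("/logs/web.log", 200 * 1024 * 1024))
```

The client then sends the file name and chunk index to the master, which is the part that does live on the master: it replies with the chunk handle and the chunkservers holding that chunk.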
@ritikjain4811 10 months ago
Would love to have a similar walkthrough for Aurora DB
@gauravkhandelwal8377 1 month ago
@arpit bhayani - Can you share the slides or notes?
@pradipacharjee4915 2 months ago
Hi @Arpit, in this video you mostly talked about what one GFS cluster looks like. But I have a doubt: if there are multiple GFS clusters, how does the routing happen? I mean, how does the system store which cluster a particular file is in?
@AsliEngineering 2 months ago
You can add a router in front of it and apply any routing strategy like hash/range/static, etc.
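The hash option from the reply above could look roughly like this. It is only a sketch of the suggestion, not how Google routes between clusters: `pick_cluster` and the cluster names are invented for illustration.

```python
import hashlib


def pick_cluster(path, clusters):
    """Map a file path to one cluster with a stable hash of the path."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer bucket selector.
    bucket = int.from_bytes(digest[:8], "big") % len(clusters)
    return clusters[bucket]


clusters = ["gfs-a", "gfs-b", "gfs-c"]  # hypothetical cluster names
print(pick_cluster("/logs/web.log", clusters))
```

The same path always maps to the same cluster as long as the cluster list does not change; adding a cluster reshuffles most paths, which is where range or static routing tables become attractive.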
@Singh-rt1zq 4 months ago
Thanks for the great explanation! I have one doubt: How is the checkpoint in a compact B-tree format? Isn't it more like an append-only log?
@snlagr 1 month ago
Can someone help me with the calculation? If a 64 MB chunk requires 64 bytes of metadata, then how much data could 1 GB of metadata hold? Arpit said 10^6, but shouldn't it be 10^4? Also, how do chunkservers talk to each other? Do they also keep a mapping of the other servers locally?
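The arithmetic in the question above can be checked directly. The sketch assumes exactly 64 bytes of metadata per 64 MB chunk and binary units (1 GB = 2^30 bytes); the variable names are made up for illustration.

```python
CHUNK_SIZE = 64 * 2**20        # 64 MB of data per chunk
METADATA_PER_CHUNK = 64        # bytes of master metadata per chunk (assumed exact)
METADATA_BUDGET = 2**30        # 1 GB of master memory

chunks = METADATA_BUDGET // METADATA_PER_CHUNK   # chunks the master can track
addressable = chunks * CHUNK_SIZE                # bytes of file data covered
ratio = CHUNK_SIZE // METADATA_PER_CHUNK         # data bytes per metadata byte

print(chunks, addressable, ratio)
```

So 1 GB of metadata covers 2^24 (about 16.7 million) chunks, which is 2^50 bytes, or 1 PB of data. The data-to-metadata ratio is 2^20, roughly 10^6, which matches the 10^6 figure if it is read as a ratio rather than as a chunk count.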
@visweshm9928 1 month ago
Hi @AsliEngineering, when the global order is applied across chunkservers, how would the primary chunkserver know which chunkservers are secondaries, so that it can send them the order that needs to be persisted?
@shubhamjagtap108 2 months ago
Hello Arpit, thank you for this video. I came across your channel today, amazing. Can you please create a similar video on Hadoop File System (I think it will be more or less similar to this one) or on How PySpark uses and processes data on Hadoop. Please?
@anweshchatterjee9882 10 months ago
Does HDFS have the same primary replication scenario, and how do you set the primary replication for a file?
@dinesh.p8642 10 months ago
Where can I find the notes that are being displayed in the video?
@RamBhakt__ 10 months ago
Wow, a great way to explain. Thanks!
@nitinagrawal6637 1 month ago
A good one to watch, but I wonder where commodity hardware is being used for such distributed systems.
@nitinagrawal6637 1 month ago
Files are split into chunks, but how does that make migration easy? I think migration of the whole system will generally be done when required. I also think chunking complicates the system, with certain trade-offs.
@nitinagrawal6637 1 month ago
If the chunkserver sends the list of chunks it holds on every heartbeat, is that good? What if the checksum is not written correctly to disk? I don't think the write path is that efficient. Google does this on its own customized infra, where it will work well, but on a general network it would be quite inefficient. Also, I think the GFS client will take a lot of network resources, as it sends the data to all replicas as well. Once I get the replicas holding a particular chunk, do I need to check with the master when accessing it again? I can keep hitting the same replica for the same chunk.
@tell5g 9 months ago
Doesn't a torrent work the same way? I didn't find any difference other than renaming BitTorrent clients to Google File System.
@akshitgarg09 4 months ago
great video!
@ishnjn2001 10 months ago
Will watch it again after completing my ongoing assignment 🫶🫡
@kailash._11. 10 months ago
Arpit, as you mentioned, the GFS client talks directly to the chunkservers to get particular chunk data, and the ACL is maintained on the master server. Does it mean that there is no access-rights checking on the chunkservers? Can anyone with the details (metadata obtained from the master server) of a chunkserver access the data on that particular chunkserver?
@akashshirale1927 7 months ago
Loved this video and also implemented a hello world of GFS. Can you also make a similar video on the Kafka paper?
@saivivekvalluru1369 10 months ago
Great video, loved it
@growwithriu 6 months ago
Writes: 48:05
@davendrasingh518 9 months ago
How do we handle atomic or contentious operations here? If two clients write to the same chunk, are they going to acquire a lock on LRU?
@AsliEngineering 9 months ago
This is not a transactional system like a database, so contention is not common. But in case there is, pessimistic locking is a simple solution.
@samyakjain2486 5 months ago
Maybe 64 KB is the offset used to ensure data integrity
@suryanshsingh6906 5 months ago
Just wow ❤
@sv_n 10 months ago
Hi Arpit, would it be feasible to implement this in Golang? What are your thoughts?
@AsliEngineering 10 months ago
Yes, totally. Pretty easy to implement a quick prototype.
@linc008 9 months ago
What font are you using in the video?
@AsliEngineering 9 months ago
These are handwritten notes. But glad you thought it was a font.
@linc008 9 months ago
@AsliEngineering You have very good-looking handwriting!
@sagar-tt4ub 10 months ago
How do you make the diagrams?
@AsliEngineering 10 months ago
They are all hand-drawn in my GoodNotes app.
@dinesh.p8642 10 months ago
Thank you, ji.
@apratimgupta1570 9 months ago
Hey sir, in case of hot spots, won't writes be further affected?
@parth2439 10 months ago
Gold content, such a nice presentation and handwriting. Thanks for all the efforts 🫶🫡
@ishnjn2001 10 months ago
This was a paper I read earlier than you hehe 😁
@asranand7 10 months ago
Great discussion Arpit, please make more such long videos. I have one question from the discussion. Why is the client sending the same write to all the chunkservers? Why not just to the primary replica, with the primary replica then sending the write to the other replicas? (Similar to Kafka, where the leader broker sends the writes to the other brokers and waits for the acks from the replica brokers.)
@rustyOsss 10 months ago
Exactly. Why is the client waiting for acknowledgement from the majority? Isn't the primary's acknowledgement enough? The system will take care of the replication then.
@shankarbanerjee8746 10 months ago
Maybe to protect the primary replica's bandwidth. Primary replica sending the writes to the other 2 will eat up its bandwidth, instead they chose to preserve it and use up the client's bandwidth. That's my hunch
@SanjayJayasankar 10 months ago
Same thought.. The bandwidth would often be faster between chunk servers.
@hiteshbitscs 9 months ago
That's a great question. My take is that the client writes to the primary, and then the primary writes to at least half of the replicas. Say the replication factor is 3: it waits for at least one more replica to ACK, so 2 writes in total are confirmed. Then they update the master and return to the client. This ensures at least N/2 replicas are consistent in the system. Also, my take is that this is less about saving the chunkserver's bandwidth: writes from client to server are very unreliable and need to cross the entire internet, hence they are slow. I mostly believe the chunkservers would copy the data internally and ensure consistency. Client -> broadcast to all replicas: I AM NOT IN FAVOUR.
@GauravSinghvi-yu8jd 10 months ago
I am in the final year of undergrad and have never read anything in system design. Will I be able to understand this? Please be honest, sir.
@AsliEngineering 10 months ago
Go through it and find out. If I were you and in college, I would have gone through such videos more often than not. Do not doubt your abilities before even trying. To be honest, being a student you will be able to learn quicker than the experienced folks. Do not let others tell you what you can learn and what you cannot.
@GauravSinghvi-yu8jd 10 months ago
@AsliEngineering Thanks a lot, sir, for your answer. I will surely go by your advice.
@pushkarratnaparkhi2205 10 months ago
💯💯👍🏻👍🏻
@nishantketu2040 10 months ago
👍👍
@niwanshumaheshwari4534 9 months ago
Thank you for this awesome video. Here the master is working more like a coordinator node, isn't it? ZooKeeper also does the same thing, right? It handles the information about which replica resides on which node. I think we are using multi-master replication here: every replica acts like a master for one chunk and the others act like followers.