System Design Interview: Design Dropbox or Google Drive w/ a Ex-Meta Staff Engineer

Views: 93,561

Hello Interview - SWE Interview Preparation

A day ago

Comments: 292

@abhijeet8710 7 months ago

"Have you done any System Design course ? How are you so good with this subject ?" - These were the words of my interviewer. I had a High Level + Low Level system design round with a start-up recently. Surprisingly, the question was to design a file sharing system such as Google Drive, as described in this video, with some additional features. I explained the HLD with the diagram, using the concepts I had learned from this video. After the HLD was over, the interviewer told me that I had created a very robust & elegant system. He further said he was so satisfied with the HLD that he no longer wanted to go into the LLD. Folks, these videos are absolutely everything you will ever need to ace a system design interview. Do remember to learn the fundamentals used in the system. A huge thanks to #Hello Interview for putting out the best content out there.

@JohnVandivier 7 months ago

"he was so satisfied with the HLD, that he no longer wants to go into the LLD. " GOALS! kudos and congrats

@hello_interview 7 months ago

This is epic!

@charan775 3 months ago

which startup bro?

@abhijit-sarkar a month ago

These videos are undoubtedly great, but your interviewing experience at some start up doesn't prove that. Interviewing is taught at FAANG companies, and some dude at a company that opened 6 months ago wouldn't even come within 9 miles of a FAANG interviewer.

@YeetYeetYe 5 months ago

Simply amazing. I don't mean to throw shade to other channels, but this is by FAR the best system design interview prep. So many other channels are just people with a couple of months of experience at FAANG and it really shows the difference between junior FAANG engineers and Staff FAANG engineers. Extremely high quality work.

@hello_interview 5 months ago

So glad you like them!

@KiritiSai93 3 months ago

You guys remind me of the "Acquired" podcast hosts. No click-baits or cringe posts, just sheer passion about the subject and high-quality in-depth analysis of things. Kudos and hope you continue the great work!

@hello_interview 3 months ago

That’s the idea. Pure value no BS 🫡

@draugno7 2 months ago

I also loved the jokes and an occasional reassurance in the Uber video, looking forward to more! Ddinngdding (that driver's phone after Taylor Swift concert in a badly designed system). This channel is simply amazing because it ties together all of the concepts I learned and even elaborates on different DSs and DBs. Someone said 'no shade to other youtubers' but I say 'yes shade' because they usually confuse and frustrate people who watch with incomplete diagrams and explanations.

@EamonLinskey 8 months ago

These are the best System Design videos I have found. Great framework for approaching problems, clear explanations, helpful diagrams. And I really appreciate the notes about how different seniority levels might approach specific parts.

@andjelaarsic9217 7 months ago

My mind is absolutely blown by how beautifully everything is explained. I love how you understand what would be possible questions/confusions from people watching and you address them by explaining pros and cons. Thank you so much for the content! Your walkthroughs are by far the most useful and interesting.

@hello_interview 7 months ago

High praise! Appreciate you taking the time to share this 😊

@ghadialhajj 17 hours ago

I like how you brought up the concept of reconciliation in addition to the more "real-time" path, which is very similar to what you've explained in the ad click aggregator, because it shows how learning the concepts and becoming able to transfer them across seemingly different problems is more valuable and proper for an SDI than memorizing some architecture and trying to reproduce it during the interview. Thanks for the content.

@Wololowizz 5 months ago

I must say that this is the best system design video I've seen so far. You covered the problem and solution step by step, while other videos just throw a bunch of ideas at you right away. Sometimes I feel overwhelmed watching other videos, thinking it's impossible to know all of that, but watching this video we can see what the expectation is for each level and the most important thought: you don't need to know everything. And that's gold.

@hello_interview 5 months ago

Glad you liked it! Check out our others if you haven’t already. Same format :)

@GauravGupta-op8ol 9 months ago

With my systems design interview coming up, I was looking forward to your video. It's great as always.

@madhurnsit 5 months ago

This is the best content I have come across on System Design interviews. Wish I had landed here sooner. Thank you so much!

@lorddel 9 months ago

One more comment on this: compared to the written content on hellointerview, this one seems more well-rounded and better thought out (mainly regarding using S3 notifications on chunk upload completion, which won't work). Would be cool to see it reflected there on the platform! Good job

@hello_interview 9 months ago

Good feedback! I'll try to get that updated, particularly by adding sync which I just last minute decided to throw into the video.

@md_dm490 9 months ago

This channel has the best system design content on youtube. Keep up the good work.

@parashar1505 26 days ago

There are many system design courses - both paid and free - and I have bought and seen many. I have rarely seen someone so organised, so methodical, so all-encompassing as you are in creating a flow in the design. This just shows what a great thinker one needs to be to be able to create such a framework and flow. You would make everyone a bit of a better thinker than they are with your videos. Many thanks!

@alexandergordon9286 8 months ago

It's pure gold! Especially the parts where you stop the debates about which DB to choose or whether the calculations are needed. The deep dives are the best part. No one goes that deep, and that's actually what matters in an interview.

@levimatheri7682 7 months ago

Wow, by far the best system design videos anywhere. I love how simple you make it, and the invaluable tips!

@anuragtiwari3032 8 months ago

I don't comment much, but for this kind of explanation I've got to give it to you. Hands down the best explanation on YouTube. Please continue making these kinds of videos. This channel will blow up.

@hello_interview 8 months ago

♥️

@prasidmitra6859 7 months ago

These are like a gift from God. The best SD resources I've found in the last 3 years.

@jagrit07 3 months ago

Watched 20 minutes of the video so far and This is the 3rd resource I am watching regarding Dropbox design, I have read Alex's book, read Grokking book and now watching this just for fun and I think Evan King is actually the King lol. Amazing video, Please keep on adding more content. Yesterday, I commented on Tinder's Design video and now here. I think I might have to comment on all the videos once I watch those because this is really good stuff and we viewers should appreciate it and hence I will keep adding comments lol :D

@sumanthperuri6579 13 days ago

Best video I have come across on the Dropbox system design question, and it changed my perspective on how to answer the question when asked in an interview.

@JShaker 6 months ago

I'm so grateful for all of your videos. I've been practicing using the Hello Interview AI interviews, booked one mock with one of your interviewers, watched all the videos. The quality is so far beyond any other content out there, and I've successfully passed 5 system design interviews. Keep up the good content, your YouTube channel deserves to blow up and your website too #wouldinvest

@batusun717 4 months ago

Please upload more stuff like this. This is literally the BEST on YouTube. Very much appreciate all the great efforts!

@cidwiththreeeyes 2 months ago

Thank you for another great video! Honestly, I don’t have any constructive criticism, it’s pretty much a perfect format for these videos-practical, concise, insightful. Other creators’ videos like this are good, but they feel like they’re just going through memorized recipes. Your videos are actually teaching system design theory. Really hope you have more of these as I make my way through your catalog.

@tushargoyal554 4 months ago

This is the best channel for learning system design. I've gone through a lot of explanations but found them talking things in isolation making it very hard to connect to get a full picture. The popular system design interview book also doesn't help much due to very discrete and sometimes inconsistent sharing of knowledge.

@Gamble396 3 months ago

One of the best System Design channels. Please keep uploading.

@noobu 9 months ago

Great stuff again! Not only good for interviews but also for daily work: 1) Clear and concise structure. 2) Weighs trade-offs rigorously and explains the final decision clearly. Every single component is well thought out with real-world considerations.

@adeeshacharya7520 8 months ago

This is really good. Irrespective of whether we are interviewing or not, any person looking at this level of explanation and detail would try to picture software differently. Thanks for making such videos, would love to see some more.

@yankomirov4290 3 months ago

You added a systematic (pardon the pun) approach to such an open-ended kind of interview. This was a game changer for me! I really appreciate it. I went ahead and bought the Guided Practice, which is also amazing and is my main practice tool. Thank you so much!

@mehdisaffar 7 months ago

I love the content. It has been frustrating to watch some other system design videos where they just brush off over important details and act like everything is straightforward and easy, and just make 10s of services and never really explain the nitty-gritty details of how those things would work and IF they would actually work/be efficient etc. Thank you!

@mehdisaffar 7 months ago

I wish you had mentioned the challenges of 2-way syncing in this context. Because this is akin to master-master replication, in case of network partition (for example user makes changes to remote, hops on another offline device, makes changes, then comes back online) there is a chance of inconsistencies (user makes different changes on device 1 vs device 2). There would probably need to be a way to offer merging changes together or have the user choose between version 1 or 2.

@mehdisaffar 7 months ago

I think I talked too fast! You did mention reconciliation

@chongxiaocao5737 8 months ago

One of the best system design preparation videos I have seen online.

@EranM 22 days ago

49:19 If a chunk has changed, its fingerprint has changed. Isn't this enough to notice the change? What if it was compressed? Did you open the compression in S3 and locally?

@aldogutierrezalcala3047 4 months ago

Bro, again me, just had a system design interview using your framework, still don't have the result but definitely this framework is basically pure gold to lead a conversation that i would keep using even in a daily job.

@hello_interview 4 months ago

Hell yes!! So glad it went well 💪

@Ptbcpr 2 months ago

did you end up getting the job?

@anmolgangwal9236 3 months ago

bro we are ready to pay just enable the join icon in your channel, this content is too good to be free

@crackITTechieTalks 7 months ago

This is the best system design video I have watched!! Especially the deep dives, you nailed it!! Looking forward to watching your videos.

@AncientArtist7 3 months ago

Your content is great and really easy to follow through each step of the process. Please continue to make more system design videos. It is extremely helpful !

@pragatimodi950 7 months ago

Hi Evan, this is my first time giving system design interviews. Really glad I found this channel to learn from. Most of my prior feedback from mocks and system design interviews has been about the framework I use when I explain my design. This really helps with that, and I think even at work this is a really good approach to follow for most things. Awesome content, thanks a lot!!!

@VahidOnTheMove 7 months ago

Thanks for the videos. 47:45 I would like to know your opinion on push approach? By push approach I meant when the File service knows there is a change in a chunk, Sync service will let the client know. And, then the client will send a request to sync/download the chunk.

@dctc42 21 days ago

Very cool... It would be nice to do a follow-up that covers file versioning. I've been racking my brain on the best ways to do this. Keep up the good work! Minor callout on using the chunk fingerprint as the ID: you could get hash collisions for chunks that end up having the same content.

@ashutoshrana9998 7 months ago

Will be the best system design interview channel for sure. Neat content. Keep up with the quality Man!

@galashrenik3404 3 months ago

One suggestion I have is that when designing APIs, your videos often highlight the importance of handling partial data, which is typically expected of senior or staff engineers. In my view, API versioning carries a similar level of significance.

@DMA-I 2 months ago

I believe there is a slight flaw in the sync-files-from-remote-server feature (24:16). I believe we need to keep records in the DB of which device/client has synced up to what updated time or version; otherwise getChanges will loop endlessly (getChanges will always return files that need to be updated, even though they might have just been updated).

@Jadeish01 29 days ago

Thank you for breaking it down so elegantly, this was super helpful

@indreshgahoi7103 9 months ago

Hey Evan, thank you so much for providing the great content. I really love the way you organize and put content across the board. ❤

@allenputich4192 5 months ago

You do an amazing job of explaining the thought process, technical details, and growth opportunities!

@guitarMartial a month ago

49:09 - Time is a weird commodity in distributed systems, with clock drift et al. Wouldn't vector clocks be a better solution instead? This way we can detect write conflicts pretty well too.

@hello_interview a month ago

Yes :)
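
The vector-clock idea raised above can be illustrated with a minimal sketch. It assumes each device keeps a counter of its own edits to a file and that replicas exchange these maps; the names `VersionVector` and `compare` are made up for this example and are not taken from the video.

```python
from typing import Dict

# Hypothetical representation: device id -> number of edits seen from that device.
VersionVector = Dict[str, int]


def compare(a: VersionVector, b: VersionVector) -> str:
    """Return 'equal', 'a_newer', 'b_newer', or 'conflict'."""
    devices = set(a) | set(b)
    a_ahead = any(a.get(d, 0) > b.get(d, 0) for d in devices)
    b_ahead = any(b.get(d, 0) > a.get(d, 0) for d in devices)
    if a_ahead and b_ahead:
        # Each side has seen an edit the other has not: concurrent writes.
        return "conflict"
    if a_ahead:
        return "a_newer"
    if b_ahead:
        return "b_newer"
    return "equal"
```

Unlike a wall-clock `updatedAt`, this comparison does not depend on synchronized clocks, and a "conflict" result is the signal to merge or ask the user which version to keep.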

@guitarMartial a month ago

@@hello_interview Come to think of it - maybe even a Merkle tree here might be powerful. You are storing all the hashes already just build a local merkle tree and use anti-entropy to figure out delta periodically. Really wild thought - merkle tree + version vectors. One helps quickly figure out anti entropy as we can compare hashes the other helps with write conflict detection. Couple this with Kafka as you showed and you have a pretty amazing scaling solution.

@guitarMartial a month ago

55:31 - Merkle trees et al are giving me flashbacks to Torrenting days. Indeed the files were broken up in different chunks whose shas were used to perform comparisons for the sake of completion.
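
A rough sketch of the Merkle-tree suggestion in this thread, assuming chunk fingerprints are SHA-256 hex digests. The function name `merkle_root` and the choice to duplicate the last node on odd levels are assumptions for illustration, not something the video specifies.

```python
import hashlib
from typing import List


def merkle_root(chunk_hashes: List[str]) -> str:
    """Fold a list of hex chunk fingerprints into a single root hash."""
    if not chunk_hashes:
        return hashlib.sha256(b"").hexdigest()
    level = [bytes.fromhex(h) for h in chunk_hashes]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()
```

Two replicas with the same root need no further comparison; when the roots differ, walking down the tree narrows the mismatch to specific chunks without comparing every fingerprint.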

@EngineeringBootCamp a month ago

Another great video. Some questions that came up in my mind after watching this video is - 1) How does local chunking work, do I literally break the files into parts and keep that in some other system or temp folder, and upload the files from there? 2) After I have uploaded the file, do I get rid of the chunks? 3) If we had a delta change in a remote file, you talked about comparing the fingerprints on all chunks and comparing locally, to only download ones that changed, implying we still keep these chunks locally somewhere? And even if I downloaded a modified chunk, how do I go ahead and stitch the chunks together to create the unified file in the main folder? [A little more clarity on those questions would be really beneficial.]

@TechieTech-gx2kd a month ago

1. Chunking is a virtual concept rather than a physical one. The file is still stored as ordinary bytes on disk, but the Dropbox client maintains a local table, chunks, that records which byte range of the physical file each chunk covers.

Schema for the chunks table (column name, data type, description):
- chunk_hash, TEXT (Primary Key): the unique hash of the chunk (e.g., SHA-256).
- ref_count, INTEGER: number of files referencing this chunk.
- file_path, TEXT: file path where this chunk resides.
- start_byte, INTEGER: start byte position of the chunk in the file.
- end_byte, INTEGER: end byte position of the chunk in the file.

Similarly, Dropbox has a file table that tracks metadata about files, including their chunk composition (column name, data type, description):
- file_id, TEXT (Primary Key): a unique identifier for the file (e.g., UUID).
- file_name, TEXT: the name of the file.
- file_path, TEXT: full path to the file on the local disk.
- chunk_hashes, TEXT: comma-separated list of chunk hashes in order.

When you add a new file, the application layer creates the chunks and calculates the hash of each one, then tries to commit those chunks to the Dropbox MetaService. The metadata service reports whether a chunk is already available and, if so, won't ask you to upload it to the BlobService.

2. As there are no physical chunks, there is no need to get rid of chunks. On local storage we always deal with files, not chunks.

3. No, you are not keeping any chunks; instead you deal with hashes (chunk hashes, to be precise). As soon as you receive a notification that there is a remote change, you ask about the chunks and their hashes. To dive a little deeper, the MetaService maintains the server_file_journal, which keeps append-only logs for each namespace and tells you, for a particular namespace, which changes are available on the server. You download only the chunks you don't have locally, based on their hashes.

Once you have the chunks available, you directly replace the bytes of the modified file on disk without needing to re-create the file, so you are working with byte ranges here via the start and end offsets. Do let me know if you need more detail.
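
To make the "virtual chunk" idea concrete, here is a small sketch that indexes byte ranges of an in-memory buffer and patches one range in place. It borrows the `chunk_hash`, `start_byte` and `end_byte` names from the schema in the comment; `ChunkRef`, `index_chunks`, `patch_chunk`, the fixed chunk size and the exclusive `end_byte` are assumptions of this example. A real client would seek within the file on disk instead of holding it in memory.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ChunkRef:
    chunk_hash: str
    start_byte: int
    end_byte: int  # exclusive in this sketch (an assumption)


def index_chunks(data: bytes, chunk_size: int) -> List[ChunkRef]:
    """Describe fixed-size chunks as byte ranges; no chunk files are written."""
    refs = []
    for start in range(0, len(data), chunk_size):
        end = min(start + chunk_size, len(data))
        digest = hashlib.sha256(data[start:end]).hexdigest()
        refs.append(ChunkRef(digest, start, end))
    return refs


def patch_chunk(data: bytes, ref: ChunkRef, new_bytes: bytes) -> bytes:
    """Replace the byte range of one chunk with downloaded content."""
    return data[:ref.start_byte] + new_bytes + data[ref.end_byte:]
```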

@VarunVermaUSC a month ago

@@TechieTech-gx2kd Thank you so much, for taking the time out and sharing those details!

@pradeepbhat1363 26 days ago

@@TechieTech-gx2kd Thanks for the details. So, if a new byte is added to the beginning of the file, the fingerprints will change for all the chunks and will it trigger a full file upload ?

@TechieTech-gx2kd 24 days ago

@@pradeepbhat1363 Hey, your interpretation is right for fixed-length chunks! Dropbox actually solved this issue by implementing content-defined chunking instead. With that, adding a byte at the beginning won't trigger a full file upload, and that's the beauty of content-defined chunking! I've implemented this in Java to demonstrate: github.com/neerajjain92/DropboxRabinChunker

When you add a byte at the start, only the first chunk changes because:
- The 48-byte sliding window quickly moves past the modified area.
- Once the window contains only unmodified content, it generates the same fingerprints.
- The same fingerprints create identical chunk boundaries.

So Dropbox would only need to upload the first few modified chunks, while all other chunks remain unchanged and can be reused from the server. This makes sync super efficient for small changes in large files. Check out the implementation: it shows how the chunks resynchronize after the modified region using Rabin fingerprinting.
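
The resynchronization behaviour described here can be sketched in a few lines. This is not Rabin fingerprinting and not the linked Java implementation; it swaps in a plain polynomial rolling hash to keep the example short. `WINDOW` mirrors the 48-byte window mentioned above, while `BASE`, `MOD`, `MASK` and the function name `cdc_chunks` are assumptions of this example. Real chunkers also enforce minimum and maximum chunk sizes, which are omitted.

```python
from typing import List

WINDOW = 48      # sliding window size, as in the comment above
BASE = 257       # assumed polynomial base
MOD = 1 << 32    # assumed modulus
MASK = 0x3F      # cut when the low 6 bits are zero (about 1 in 64 positions)


def cdc_chunks(data: bytes) -> List[bytes]:
    """Split data at positions chosen by the content of the last WINDOW bytes."""
    chunks = []
    start = 0
    h = 0
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                 # add the newest byte
        if i >= WINDOW - 1 and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last boundary
    return chunks
```

Because each cut depends only on the bytes inside the window, prepending a byte shifts every later boundary by one position but leaves the content of every chunk after the first untouched, so their fingerprints still match what the server already has.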

@OneSanddman a month ago

I really love your video series. Just a slight problem to point out here. 50gbs uploaded with 100 mbs should take less than 10 minutes, not an hour 12 minutes.

@KingstonFortune 20 days ago

I would agree with you but then we would both be wrong 😉 Evan’s calculation in the video is actually correct: first you have to convert gigabytes to gigabits (50 GB = 50 x 8 = 400 Gb), then divide by the upload speed (400 Gb / 100 Mbps = 4,000 seconds), then convert the seconds to minutes (4,000 / 60 = 66.67 minutes), and finally to hours (66.67 / 60 = 1.11 hrs) 😇
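
The arithmetic in this thread is easy to check with a short script. It assumes decimal units (1 GB = 1,000 MB) and an upload speed given in megabits per second; the function names are made up for the example.

```python
def upload_seconds(size_gb: float, link_mbps: float) -> float:
    """Time to push size_gb gigabytes over a link_mbps megabit-per-second link."""
    size_megabits = size_gb * 1000 * 8  # GB -> MB -> megabits
    return size_megabits / link_mbps


def format_duration(seconds: float) -> str:
    total = int(round(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


seconds = upload_seconds(50, 100)
print(seconds, format_duration(seconds))
```

The result is 4,000 seconds, i.e. 1.11 hours or roughly 1 hour 7 minutes, which is where the "1 hour 12 minutes" reading of 1.11 hours goes wrong.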

@adityaagarwal5348 3 months ago

At 50:08, the delta sync approach might work in case of downloading updated chunk from s3 using range-bytes query and then updating file on the local system but it won't work other way around specifically because of s3. S3 objects are immutable so there will never be a case where a chunk will be updated. So if this questions come up in the interview, should we just mention that we won't sync files > some GBs or we should further divide the storage into blob and file-system (s3 and EFS) based on file size and handle the complexity on server?

@groovymidnight 8 months ago

I really like the 5-step structure, it's the best I've seen and it effectively helps me think through the designs in a methodical way.

@hello_interview 8 months ago

Right on! So glad it’s useful

@3rd_iimpact 9 months ago

I just finished reading the article on this lol. I’ll check out the video as well.

@aslgomes 6 months ago

Hey Stefan, awesome video, congrats! I've got a quick question though. Around the 49:46 mark, you mention adding an "updatedAt" to a chunk at a specific id/fingerprint. If a chunk changes, its fingerprint/hash/checksum would change too, right? So that id wouldn't really match the changed chunk anymore, would it? Doesn't that mean the old chunk gets "invalidated" and a new chunk id appears? Sorry if I'm missing something obvious here.

@hello_interview 6 months ago

No this is spot on, good call out. I was loose here. If the fingerprint is the ID, then an updatedAt does not make sense. If the fingerprint is not the ID, then it of course does. Trade off here of whether you want to keep old chunks around for versioning.
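
When the fingerprint is the chunk ID, as discussed in this exchange, change detection reduces to a set difference and no `updatedAt` is needed. A minimal sketch, assuming SHA-256 fingerprints; `fingerprint` and `chunks_to_upload` are hypothetical names.

```python
import hashlib
from typing import List


def fingerprint(chunk: bytes) -> str:
    """Content-derived chunk ID: the same bytes always give the same ID."""
    return hashlib.sha256(chunk).hexdigest()


def chunks_to_upload(local: List[str], remote: List[str]) -> List[str]:
    """Fingerprints present locally but unknown to the server, in file order."""
    known = set(remote)
    missing = []
    for fp in local:
        if fp not in known:
            known.add(fp)  # avoid listing a repeated chunk twice
            missing.append(fp)
    return missing
```

An edited chunk simply shows up as a new fingerprint. Whether the old one is kept around then becomes a versioning or garbage-collection decision, which is the trade-off mentioned in the reply.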

@AlbaraaAlHiyari 8 months ago

I truly appreciate all the effort you've put into making these amazing videos. Please keep them coming. One insignificant (not important) nitpick. 50 GB @ 100Mbps = ~ 1hr 7min. I think you just forgot to convert the decimal to minutes. You have it correct in the write up, as in 1.11 hours (0.11 * 60 = 6.6 minutes).

@hello_interview 8 months ago

Mental math is hard 😛

@AlbaraaAlHiyari 8 months ago

@@hello_interview tell me about it... Also not fun under the pressure of an interview 🤣

@krishnabirla16 3 months ago

You did not talk about version inconsistency? If two clients keep changing their local folders, they will be in a loop of pushing their own sync and pulling the other client's sync. There has to be a timestamp/version based conflict resolution. Maybe a follow up please?

@phavelar 8 months ago

one can argue that "supporting 50gb upload file size" is a functional requirement (you placed it under non-functional requirement) - just a call out. great video!

@vaibhavsharma1653 7 months ago

Amazing. Some Notes: DeepDive: Chunking CDNs Adaptive Polling with only updated chunks Compression.

@faruni8299 15 days ago

Wow the best design video out there! Just wow.

@satyajeetkumar2588 3 months ago

Awesome , so simple and elegant . It would have been great if you would have mentioned about checksum implementation to maintain data integrity as you have mentioned in the non functional requirements just to mention not the actual implementation.

@smalladi78 7 months ago

Thanks for posting these! Great interview as always! I am learning a lot from these interviews. I found it interesting that you jumped ahead in order for the non-functional requirements since you knew the large file upload requirement would impact the design enough that doing the other ones first was not beneficial since they would become irrelevant. Obviously, this comes with actual experience of working on the job. May I suggest doing a follow up that uses the final design from this interview and consider how it may change if you piled on a more advanced feature like syncing only a partial set of folders or sharing folders with other people.

@pradeepbhat1363 27 days ago

Great video man ! very useful for preparing for system design interview.

@jimitshah7636 7 months ago

Great video for system design preparation. Methodology, the way he approached the question was good. 5 steps. Pretty good

@JyotiKundani05 2 months ago

This video was really helpful. Amazing work of putting this together and your explanation was on point. Much appreciated!

@hello_interview 2 months ago

Glad you liked it! 🙂

@suri4Musiq 9 months ago

Loved this resource, thank you so much! But I just wanted to point out that in my interview I was asked about sharing files with other users, and I feel like this design concentrated more on just syncing files across multiple devices. For the former, I think we can talk a little more about CDN and other approaches, which were hand-waved here.

@hello_interview 9 months ago

Check out the write-up I linked! I go into sharing there.

@venkatamunnangi1287 9 months ago

Thanks for the effort and videos. Easily one of the best in business for mocks and educational material.

@deathbombs 8 months ago

45:45 I wonder how syncing would change if instead of folder status, it's for database writes with many writers

@evangeloskostopoulos8173 9 months ago

This is really awesome, thank you. Please keep them coming!

@vijaykhurana8766 9 months ago

Great content. Thank you for posting. One of the best system design videos I have come across for this design.

@dashofdope a month ago

For the chunking -how many parallel calls would we do? Maybe it doesn't matter?

@god_of_blunder 5 months ago

these are the best Design videos i ever found, Thanks and Kudos.

@hello_interview 5 months ago

❤️

@jherreria 7 months ago

I really appreciate your help in this topic. I'm learning a lot! Keep the videos coming!

@adityaagarwal5348 3 months ago

At 27:24 For determining which files are already available on the local system, can we store a client to files mapping on the server based on client id and then getChanges API uses that data + file metadata to calculate which files needs to be transferred to the client? I know there can be issues when there is a sync gap b/w local and remote like file is deleted on the local but anyway system is eventual consistent. Keeping lots of data on the client will grow the app size.

@TechieTech-gx2kd a month ago

What Dropbox implements is pretty amazing: it maintains a server_file_journal, an append-only log per namespace_id, which keeps storing any changes made to a particular file. Imagine a text file on which you do CRUD; all these operations are stored in that server_file_journal. The client simply asks, for a given nsId, what is new after a specific checkpoint, a pointer named journalId (which each client maintains for its namespace). When it asks what happened after this journal ID, the server returns the chunk details (probably a different hash) and the client simply downloads them. "Keeping lots of data on the client will grow the app size." It's not the app size, it's the user data: it's what you want to keep on your machine for quick access, while also being able to access it from remote machines. What you are referring to is something different, which iCloud offers: optimizing storage by keeping only minimal photo/video thumbnails on the iPhone and fetching the high-definition file when the user requests it.
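
The journal-plus-cursor pattern described in this reply can be sketched in memory as follows. This is an illustration of the idea only, not Dropbox's actual schema or API; `FileJournal`, `JournalEntry`, `append` and `changes_since` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class JournalEntry:
    journal_id: int                # monotonically increasing per namespace
    path: str
    chunk_hashes: Tuple[str, ...]  # empty tuple is used here to mean "deleted"


class FileJournal:
    """Append-only log of file changes, keyed by namespace id."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[JournalEntry]] = {}

    def append(self, namespace_id: str, path: str, chunk_hashes: Tuple[str, ...]) -> int:
        log = self._entries.setdefault(namespace_id, [])
        entry = JournalEntry(len(log) + 1, path, chunk_hashes)
        log.append(entry)
        return entry.journal_id

    def changes_since(self, namespace_id: str, cursor: int) -> List[JournalEntry]:
        """Everything recorded after the client's last seen journal_id."""
        return self._entries.get(namespace_id, [])[cursor:]
```

Each client stores only its last seen `journal_id`, so the server does not need a per-device record of which files have been synced.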

@ahmedkhan25 7 months ago

Excellent sys design interviews - I like the informative tone and clear approach - thanks

@jeremyklein953 8 months ago

Really good approach. I love how you build up to the full solution. It makes a lot of sense to me and helps me reason these complex systems as well

@mindrust203 9 months ago

Hey Evan, this content is fantastic, thank you! I have a question regarding your solution to chunking around the 39 minute mark When we ask S3 to fetch us a pre-signed URL, do we do that for all our chunks as well? Does this happen on initial request to upload the file (metadata)? The way the File Metadata entity schema is described, it looks like we have a top-level S3Link, but also chunk-level S3 links embedded in the file metadata, so the upload flow is a little unclear to me

@hello_interview 9 months ago

Good question, you're right to be a little confused here. As I alluded to, S3 offers an API called multi-part upload. It requires just 1 presigned URL, but multi-part upload re-stitches the chunks back into a single file in S3, so it does not allow us to send over chunk deltas for syncing. As a result, we have to upload the chunks manually without relying on multi-part upload. So, long answer, but yes, you'd actually need to request a presigned URL for each chunk. I should have made that clearer, but tbh I was not sure in the moment whether multi-part upload could be configured to not re-stitch the file, so I omitted it :)

@KITTU1623 9 months ago

Thank you very much for the videos. One small nit pick. DynamoDB supports a maximum of 400KB per item and if we are storing all the chunk metadata in the item, for a 50GB file with 5 MB chunk size, assuming we need 100Bytes per chunk metadata, our item size would be around 1MB.

@hello_interview 9 months ago

Good catch! True
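
The sizing concern in this exchange works out as follows. The 100 bytes of metadata per chunk is the commenter's assumption, 400 KB is DynamoDB's documented maximum item size, and decimal units are assumed for the file and chunk sizes.

```python
FILE_SIZE_BYTES = 50 * 1000**3        # 50 GB
CHUNK_SIZE_BYTES = 5 * 1000**2        # 5 MB
CHUNK_METADATA_BYTES = 100            # commenter's per-chunk estimate
DYNAMODB_ITEM_LIMIT_BYTES = 400 * 1024

# Ceiling division without floating point.
chunk_count = -(-FILE_SIZE_BYTES // CHUNK_SIZE_BYTES)
item_size_bytes = chunk_count * CHUNK_METADATA_BYTES
fits_in_one_item = item_size_bytes <= DYNAMODB_ITEM_LIMIT_BYTES
max_chunks_per_item = DYNAMODB_ITEM_LIMIT_BYTES // CHUNK_METADATA_BYTES

print(chunk_count, item_size_bytes, fits_in_one_item, max_chunks_per_item)
```

With these numbers the chunk list alone is about 1 MB, so a single item could hold roughly 4,000 chunk records at most. Storing chunks as separate items or rows is one way around the limit.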

@stashitt a day ago

Thanks for this amazing video, I bought guided practices which have been incredible. I have one question, For a 50 gigabyte file we are storing an array of 10000 chunks in chunks, is it feasible ?

@Marcus-yc3ib 3 months ago

Please keep uploading these kinds of videos. Thank you very much.

@Ynno2 9 months ago

Do you suggest a different delivery framework for system design interviews which aren't necessarily "product"?

@hello_interview 9 months ago

Topical! Was chatting about updating the site with that soon. I’d recommend very similar, but core entities and api are what may change as they could be less relevant. Instead I’d frame it as focusing on the inputs and outputs of the system more generally. And then still thinking about the data persisted

@hello_interview 9 months ago

I’ll do a pure infra question next

@rushio8673 a month ago

I think uploading speed up using chunks was clearly explained, but how do we speed up a download using the chunk wasn't clear, only brought up the point of whether to use the CDN or not, but if not using CDN then what ?

@hameeeed5992 16 days ago

You first request the meta data from the backend and then download each chunk in the array from s3 sequentially.

@59sharmanalin 3 months ago

We didn't outline the file sharing feature. Is it because of time constraints?

@hello_interview 3 months ago

Went with syncing in the video instead since people asked for that in the comments

@VyasaVaniGranth 6 months ago

First - please continue making and sharing these videos, this is incredible. Very few high quality sources available out there and this is probably the best one in my eyes. Second - how realistic is it that the download and upload happen directly b/w client and S3? Are there security concerns with this approach that should be considered? For reference, there's a Dropbox engineer's talk where uploads go through an intermediate service - this does mean additional copies of the data meaning more memory / compute but seems more realistic. In general, for any design that has media upload (eg. newsfeed), would you recommend direct upload to S3?

@hello_interview 6 months ago

Yeah, it's a good point; most major systems don't do this for a number of reasons. While it is largely academically correct and optimal, at YouTube/Dropbox/etc. scale they prefer more control, so they're rolling their own systems here.

@krishnabirla16 3 months ago

Do you not do web socket based design videos intentionally? Can you do some chat apps and video call apps?

@bqrkhn 4 months ago

Very nice video. A question: You added a updatedAt at each chunk. But chunks are identified with their ID which is calculated from a finger print. When the file changes, the finger print changes, how do we update the updatedAt? Possible Answer: From client we send both old and new chunk IDs and then update both id and updatedAt. Is this the correct strategy?

@fragrancias972 4 months ago

Same question here.

@bqrkhn 4 months ago

@@fragrancias972 what do you think about my possible answer ?

@insofcury 2 months ago

@@bqrkhn +1 I think this definitely solves the problem.

@MrSnackysmorez 5 months ago

I love the videos and these are some of the best explanations. I love the flow and how everything builds on each other. It makes it much more manageable to do these problems. However you are driving and dictating this and this is so much harder to do when the interviewer wants to constantly interrupt and ask questions while you are doing these steps without first letting you explain what you are doing. I have this happen pretty often. How can you tell them to just chill and let you proceed? Appreciate these videos!

@haixiongwang4608 2 months ago

Will version management of files be out of scope for a 35-45 minute interview discussion? I just want to get a high-level understanding of the scope of the current SD. Thanks

@amitb2921 7 months ago

Thanks for the great content, especially the Deep Dive part, which people generally do not discuss. I have one question about storing the chunks as a list in the DB. For a 50 GB file and 5 MB chunks, there will be 10K chunks created. So the chunks list will have 10K entries. Updating one chunk-list column for every chunk status change could be quite challenging. Would it be better to have a separate table for chunks instead? Also, when you match chunks against the fingerprint, you need to check 10K entries from the local DB (with a separate, indexed table) vs. 10K entries in the chunk list (in a single table column), and the former is more efficient. Kindly let me know your thoughts on the above points.

@hello_interview 6 months ago

Sounds reasonable to me! Good call out

@amitb2921 6 months ago

@hello_interview Thanks a ton for the response. I have modified my comment above to be a bit more clear.
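The separate-table idea from this thread could look something like the sketch below. The table and column names are hypothetical and SQLite stands in for whatever metadata store is actually used; the point is only that each chunk becomes its own row, so a status change touches one row instead of rewriting a 10K-entry list column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE chunks (
        file_id     TEXT    NOT NULL,
        chunk_index INTEGER NOT NULL,
        fingerprint TEXT    NOT NULL,
        status      TEXT    NOT NULL DEFAULT 'pending',
        PRIMARY KEY (file_id, chunk_index)
    );
    -- Lets the server look up a chunk by fingerprint without scanning the file.
    CREATE INDEX idx_chunks_fingerprint ON chunks (file_id, fingerprint);
    """
)

FILE_SIZE = 50 * 10**9   # 50 GB (decimal units, as in the comment)
CHUNK_SIZE = 5 * 10**6   # 5 MB
num_chunks = -(-FILE_SIZE // CHUNK_SIZE)  # ceiling division: 10,000 chunks

conn.executemany(
    "INSERT INTO chunks (file_id, chunk_index, fingerprint) VALUES (?, ?, ?)",
    [("file-1", i, f"fp-{i}") for i in range(num_chunks)],
)

# Marking one chunk as uploaded is a single indexed row update.
conn.execute(
    "UPDATE chunks SET status = 'uploaded' WHERE file_id = ? AND fingerprint = ?",
    ("file-1", "fp-42"),
)
uploaded = conn.execute(
    "SELECT COUNT(*) FROM chunks WHERE file_id = ? AND status = 'uploaded'",
    ("file-1",),
).fetchone()[0]
```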

@Vancez-z2h 1 month ago

As for syncing updated files on other clients, how does the client know the files are updated?
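One common way to answer this is for each client to remember a cursor and periodically ask the server for everything that changed after it. The sketch below is only an illustration of that polling idea; `ChangeLog`, `record`, and `changes_since` are hypothetical names, and a real system might push a notification over a persistent connection instead of (or in addition to) polling.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeLog:
    """In-memory stand-in for a server-side, per-user change log."""

    entries: list = field(default_factory=list)  # (sequence_number, file_id)

    def record(self, file_id: str) -> int:
        """Append a change and return its sequence number."""
        seq = len(self.entries) + 1
        self.entries.append((seq, file_id))
        return seq

    def changes_since(self, cursor: int):
        """Return (changed_file_ids, new_cursor) for entries after `cursor`."""
        changed = [fid for seq, fid in self.entries if seq > cursor]
        new_cursor = self.entries[-1][0] if self.entries else cursor
        return changed, new_cursor


log = ChangeLog()
log.record("a.txt")
log.record("b.txt")

# A fresh client starts at cursor 0 and sees both changes.
changed, cursor = log.changes_since(0)

# Later, another device edits a.txt; the next poll returns only that change.
log.record("a.txt")
changed_again, cursor = log.changes_since(cursor)
```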

@ndubuezeprecious391 3 months ago

Great stuff. This is the best I've seen so far. Which app are you using for the whiteboarding? It looks really sleek.

@hello_interview 3 months ago

Excalidraw

@kkfun1 1 month ago

Does the File service create a signed URL for every file chunk to be uploaded?
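To make the question concrete, here is a rough sketch of a service issuing one signed URL per chunk. It is deliberately not S3's real signing scheme: the HMAC format, the `storage.example.com` host, the `SECRET`, and both helper functions are invented for illustration. The idea it shows is that each URL is bound to one part number and one expiry, so a URL issued for part 1 cannot be reused for part 2.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical key shared by the signer and the storage layer


def sign_chunk_url(file_id: str, part_number: int, expires_at: int) -> str:
    """Build an upload URL whose signature covers the method, path, and expiry."""
    path = f"/upload/{file_id}/{part_number}"
    message = f"PUT\n{path}\n{expires_at}".encode()
    sig = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    query = urlencode({"expires": expires_at, "sig": sig})
    return f"https://storage.example.com{path}?{query}"


def verify(url: str, now: int) -> bool:
    """Recompute the signature from the URL and check that it has not expired."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    expires_at = int(query["expires"][0])
    message = f"PUT\n{parsed.path}\n{expires_at}".encode()
    expected = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    return now < expires_at and hmac.compare_digest(expected, query["sig"][0])


# One URL per chunk (parts 1..3), all expiring at t=1000.
urls = [sign_chunk_url("file-1", n, 1000) for n in range(1, 4)]
```

Whether the service hands out all URLs up front or one batch at a time is a design choice; smaller batches keep expiry windows short for very large files.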

@pujamishra1475 8 months ago

I have a product architecture interview coming up. I was really looking for some good product architecture/design examples and then came across this. This is very helpful because you talk about the client, user experience, and malicious users, and relate them to the design decisions made. Thank you! One question: for a product architecture interview, should we go into more detail about the APIs, like explicitly writing out requests, responses, and failure/success codes, or is the amount of discussion you did on APIs enough for senior level? Can you also tell me what topics/points you would add to the discussion in this video if this was asked in a product architecture design round? Thanks again!

@surojitsantra7627 8 months ago

One of the best and most detailed explanations. Thank you so much for this content. Please upload more such videos.

@hello_interview 8 months ago

New one later today!

@IshaZaka 9 months ago

Hi Evan, thank you so much for providing this type of content. Please make a system design video on a payment system.

@BlunderMunchkin 6 months ago

Huh. I would have prioritized consistency over availability. So much so, in fact, that I didn't even think it was a question. Some of the biggest headaches I've experienced as a developer have been caused by having an out-of-date file. I would much rather be temporarily unable to retrieve a file than to be fooled into thinking that the file I retrieved is the correct version.

@GabrielAnyaele 1 month ago

I really love your videos. I have a question though: are the chunk IDs constant (most likely so)? You mentioned that the chunk IDs are a hash of the bytes of the chunks, so what happens when the chunks are updated? Do we still keep the initial IDs? You put out amazing content, I appreciate it once again.

@mahdidi96 1 month ago

Very important question, but I just noticed: there is no persistence of folder structure, i.e. how the files are organized locally. Say you set up Dropbox on a new device. I can see this system syncing the files themselves, but you would lose your folder structure. The new device would essentially contain all the files in the same directory. How would you address this? (unless you did and I missed it)
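One way folder structure is often persisted is to store a parent pointer on every metadata entry and treat folders as entries too. The sketch below assumes a hypothetical `parent_id` field and `full_path` helper that are not part of the schema shown in the video; a new device could rebuild the tree by walking these pointers.

```python
# Hypothetical metadata rows: folders and files share one table,
# and each row points at its parent folder (None for the root).
entries = {
    "root":   {"name": "",        "parent_id": None,     "is_folder": True},
    "photos": {"name": "photos",  "parent_id": "root",   "is_folder": True},
    "y2024":  {"name": "2024",    "parent_id": "photos", "is_folder": True},
    "img":    {"name": "cat.jpg", "parent_id": "y2024",  "is_folder": False},
}


def full_path(entry_id: str, table: dict) -> str:
    """Walk parent pointers up to the root and join the names into a path."""
    parts = []
    current = entry_id
    while current is not None:
        entry = table[current]
        if entry["name"]:
            parts.append(entry["name"])
        current = entry["parent_id"]
    return "/" + "/".join(reversed(parts))


path = full_path("img", entries)
```

Moving or renaming a folder then only touches one row, since children reference the folder by ID rather than by path.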

@caesar5555 2 months ago

Thank you! This is awesome! At Meta, is your interviewer going to be on the hiring committee, or will they just send their notes?

@hello_interview 2 months ago

Depends on level and situation. Most likely they won't be there.

@caesar5555 2 months ago

Thank you for the answer, @hello_interview. Staff+. So the interviewer is just a medium for taking in information...

@hello_interview 2 months ago

@caesar5555 In some sense. They provide a judgement, and the loop (the collection of interviewers, together with the recruiter) makes a call on whether to put the package forward to the hiring committee. That group (usually a couple of directors) does not have enough time to review every detail of the notes, so they use some heuristics to see where the major risks are and decide whether to move forward.

@kamalsmusic 6 months ago

For the client to know how to stitch together chunks, doesn't it need to know the starting offset & length for each one?
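A small sketch may help here, under the assumption (not confirmed by the video) that chunks are fixed-size and stored as an ordered list. In that case the offset of a chunk is implied by its position, and only the last chunk can be shorter, so explicit offsets and lengths are not strictly needed. With variable-size (content-defined) chunking, the client would need them. The names and the tiny `CHUNK_SIZE` are for illustration only.

```python
CHUNK_SIZE = 4  # tiny on purpose so the example is easy to follow


def split(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Cut data into fixed-size chunks; only the last one may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def offset_of(index: int, size: int = CHUNK_SIZE) -> int:
    """With fixed-size chunks, the byte offset follows from the position."""
    return index * size


def stitch(chunks: list) -> bytes:
    """Reassemble the file by concatenating chunks in list order."""
    return b"".join(chunks)


original = b"hello dropbox!"
chunks = split(original)
rebuilt = stitch(chunks)
```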

@viveksharma-tt5nj 2 months ago

Simply amazing!! Thanks a lot for such a clear and concise explanation!

@nobodyknows228 6 months ago

1. How can we handle write conflicts when we have a folder which is supposed to be consistent across multiple devices? 2. Also, when two devices are disconnected from the internet and users update some files, how does the sync happen when they come back online and both try to write changes to the same file path at the same time? I am not sure if these solutions work, but I think: 1. We can use a Redis lock for writes, with a TTL equal to (or a little more than) the timeout of the pre-signed URL. If the connection fails partway, we can just resume the upload once reconnected. But this might be a problem when a user is uploading big files with long timeouts, since other users might have to wait until the current upload is done. 2. When the user comes back online, we should probably first fetch all the changes that were made on the device, raise conflicts with the user asking what action to perform (similar to git), and acquire the lock to write if required.
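As a point of comparison with the lock-based idea above, here is a sketch of a different technique: an optimistic version check. Each client remembers the version it last synced, and the server rejects a write whose base version is stale, which is the moment to surface a conflict to the user. `ServerFile`, `Conflict`, and `commit` are hypothetical names, and this is a swapped-in alternative, not what the comment or the video proposes.

```python
from dataclasses import dataclass


@dataclass
class ServerFile:
    version: int
    content: str


class Conflict(Exception):
    """Raised when a write is based on a version the server has moved past."""


def commit(server: ServerFile, base_version: int, new_content: str) -> int:
    """Accept the write only if the client edited the latest version."""
    if base_version != server.version:
        raise Conflict(f"based on v{base_version}, server is at v{server.version}")
    server.version += 1
    server.content = new_content
    return server.version


server = ServerFile(version=1, content="draft")

# Both devices synced v1 before going offline, then edited locally.
new_version = commit(server, base_version=1, new_content="edit from laptop")

conflicted = False
try:
    commit(server, base_version=1, new_content="edit from phone")
except Conflict:
    conflicted = True  # the phone must fetch v2 and resolve before retrying
```

Unlike a lock held for the duration of a large upload, nothing blocks other writers here; the cost is that the losing client has to resolve and retry.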

@jmms49 8 months ago

Great videos, thanks for uploading these. Easily the best content about system design interviews I've found. I would probably suggest using Merkle trees for the sync functionality, it seems like a natural way to diff and sync large file systems.
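The Merkle tree suggestion can be illustrated with a short sketch. It hashes each chunk into a leaf, hashes pairs upward to a single root, and then diffs two trees by descending only into subtrees whose hashes differ. The helper names are made up for the example, the odd node at each level is paired with itself (one of several conventions), and the diff assumes both sides have the same number of chunks.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Return every level of the tree: levels[0] is leaf hashes, levels[-1] the root."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # pair an odd node with itself
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def diff(a: list[list[bytes]], b: list[list[bytes]]) -> list[int]:
    """Indices of differing leaves; assumes both trees have the same leaf count."""
    if a[-1] == b[-1]:
        return []  # identical roots: nothing to sync
    suspects = [0]
    for depth in range(len(a) - 1, 0, -1):
        below_a, below_b = a[depth - 1], b[depth - 1]
        suspects = [
            child
            for node in suspects
            for child in (2 * node, 2 * node + 1)
            if child < len(below_a) and below_a[child] != below_b[child]
        ]
    return suspects


local = [b"c0", b"c1", b"c2", b"c3", b"c4"]
remote = [b"c0", b"c1", b"CHANGED", b"c3", b"c4"]
changed = diff(build_levels(local), build_levels(remote))
```

When most of a large file tree is unchanged, the two sides only exchange hashes along the paths that differ instead of comparing every chunk.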

@charan775 3 months ago

How do you handle nested folders in your schema? Also, chunks could be kept in a separate table at the user ID level, so that we can reuse chunks across different files.

@jayshah234 8 months ago

Hi Evan, thanks for the detailed explanation! Very helpful! At 40:50, you mentioned that S3 exposes a multipart upload API. Does that mean that on the client end we don't have to handle chunking and fingerprinting, given that we use S3 multipart? Thanks!

@hello_interview 8 months ago

You’ll still do the chunking but it will handle fingerprint checks

@SunilKumar-jl6dl 5 months ago

Hey there, I have some questions. Would be great to get your thoughts: 1. S3 supports multipart upload, and all the chunks get reconstructed into a single file in S3. Isn't this correct? If yes, then storing the file chunks in the database would be redundant, right? Or would S3 always keep the chunks and make them available for download at the client end? 2. At the client end, should we know how the updated/deleted chunks of a previously uploaded file get stitched back together? 3. Would folder sharing with other users be a possible follow-up question? Like what Google Drive offers.