Design Google Drive or Dropbox (Cloud File Sharing Service)

Design Google Drive or Dropbox (Cloud File Sharing Service) | System Design Interview Prep

85,462 views

Interview Pen

A day ago

Comments: 85

@interviewpen a year ago

Thanks for watching! Visit interviewpen.com/? for more great Data Structures & Algorithms + System Design content 🧎

@shemleong7571 a year ago

Great overview. There are a few tweaks I would make: 1) Have the ingest service return a presigned URL so the client can upload directly to S3. This offloads the bandwidth problem to the client. 2) Tap into event triggers to handle the post-upload activities. 3) Instead of that first queue, rate limiting or API throttling might be a more appropriate way to manage the load.

@interviewpen a year ago

Agreed, using presigned S3 URLs is a great solution to manage load on the ingest API. That would also potentially eliminate the need for the queue in front of that API. Thanks for watching, stay tuned for more!

@meprateek24 10 months ago

If the file is uploaded directly to S3, we might face the same issue that was explained at the beginning of the video: if the connection breaks, the whole upload has to start again. Will the S3 upload also happen in chunks?

@drhdev 8 months ago

@@meprateek24 Not our problem

@justinchan4810 3 months ago

@@meprateek24 It's safe to assume the file upload can also be done in chunks, especially since the design stores the file in chunks in S3 (shown at 7:58)
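
For the rate limiting suggested at the top of this thread, one common approach is a token bucket. The following is a minimal sketch, not anything shown in the video; the class name `TokenBucket`, its parameters, and the injectable `clock` are all illustrative assumptions.

```python
import time


class TokenBucket:
    """Hypothetical limiter: allows bursts of up to `capacity` requests,
    refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock            # injectable so the limiter can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self, cost=1.0):
        """Return True and spend `cost` tokens if the request may proceed."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

An ingest API could keep one bucket per user or per API key and reject (or delay) a request whenever `allow()` returns False.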

@firezdog a year ago

Security was not mentioned at all, nor anything about concurrent writes. That being said, this is much better than anything I could have done.

@interviewpen a year ago

There’s a limited amount of information we can convey in one video, but yes, security and concurrency are both super important things to consider here! Thanks for watching.

@yxawp a year ago

IOPS: (1M)(100) / 86,400 is ~ 1150/sec. Not clear why it was calculated to 115,000/sec in Handling Subscriptions section.

@interviewpen a year ago

Good catch, thanks!

@mecury007 5 months ago

Yeah I lost 20 minutes trying to understand why 1M * 100 = 10B and not 100M
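
The arithmetic in this thread can be checked in a few lines. This is just a sketch using the figures quoted in the comments (1M daily active users, 100 edits per user per day); the function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def writes_per_second(daily_active_users, edits_per_user_per_day):
    """Average write rate, assuming edits are spread evenly over the day."""
    return daily_active_users * edits_per_user_per_day / SECONDS_PER_DAY


edits_per_day = 1_000_000 * 100
print(edits_per_day)                              # 100000000 (100M, not 10B)
print(round(writes_per_second(1_000_000, 100)))   # 1157
```

Real traffic is rarely uniform, so the peak rate would be higher than this average.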

@Oz1111 11 months ago

These system design vids are great. Given your expertise and how well you cover these topics, can you do a basics of system design explaining different services and common parts of system design? I know there are other channels that do this but I'd love to have you do one as your content is super clear and easy to follow. Thanks.

@interviewpen 11 months ago

Thanks! If you're looking for a full course, check out interviewpen.com!

@hackaholic01 a year ago

For the storage usage validation, you can remove all of that overhead: the client has the file stats, so it can request the user metadata and check whether any storage is available before uploading the file.

@interviewpen a year ago

Thanks for watching. I might be misunderstanding, but it sounds like you're suggesting having the client itself validate whether it has bought enough storage. You're right that doing this could reduce some overhead, but it would defeat the entire purpose of that step since the client could simply lie to the service about how much storage it has when uploading a file. It's important to make sure logic like this happens on the server side since clients are inherently untrusted.
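
A server-side check along the lines described in this reply might look like the sketch below. The function and exception names are hypothetical; the assumption is that `used_bytes` and `quota_bytes` come from the service's own metadata store and `upload_bytes` is measured by the server, not reported by the client.

```python
class QuotaExceeded(Exception):
    """Raised when an upload would push a user past their storage quota."""


def reserve_storage(used_bytes, quota_bytes, upload_bytes):
    """Return the user's new usage if the upload fits, else raise QuotaExceeded.

    All three values are assumed to be server-side facts, never numbers
    supplied by the client.
    """
    if upload_bytes < 0:
        raise ValueError("upload size cannot be negative")
    if used_bytes + upload_bytes > quota_bytes:
        raise QuotaExceeded(
            f"need {upload_bytes} bytes, only {quota_bytes - used_bytes} free"
        )
    return used_bytes + upload_bytes
```

A client-side check can still be useful as a fast failure path for honest clients, as long as the server repeats it.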

@harshraj22_ a year ago

Assuming by Queue you meant the message queue, I would like to know your thoughts about using kafka instead of queue for notification service, with their pros and cons. Btw, great video :)

@interviewpen a year ago

We never specified what platform we would use for queues, but Kafka is a great choice for a system like this. The distributed nature of Kafka queues means they can be horizontally scaled to handle an extremely high load, and that would enable the system to handle the high traffic requirements. Glad you enjoyed the video!

@hfspace a year ago

One thing that was not touched on, and that comes to mind immediately, is that the way chunking is handled here has room for improvement. What if someone changes a file in the middle and adds loads of data to it (which would result in multiple new chunks in multiple different locations in the file)? Then you could either re-upload the complete file or, I guess, implement some more complex indexing for the chunks and do a reindex operation.

@interviewpen a year ago

Yes, this is correct. We just skimmed over it and said "do chunking", but chunking is a mini research paper in itself. We find this is the case with a lot of concepts we cover, so we just try our best to hit the major details! Thanks for watching - more coming!!!
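
To make the problem raised in this thread concrete, here is a small sketch that assumes fixed-size chunks compared by hash (the video's actual scheme may differ). The chunk size is shrunk to 4 bytes for the demo, and the helper names are illustrative.

```python
import hashlib

CHUNK_SIZE = 4  # tiny on purpose; a real system would use something like 1 MB


def chunk_hashes(data, chunk_size=CHUNK_SIZE):
    """SHA-256 digest of each fixed-size chunk of `data`."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def changed_chunks(old, new, chunk_size=CHUNK_SIZE):
    """Indexes of chunks in `new` that are new or differ from `old`."""
    old_h = chunk_hashes(old, chunk_size)
    new_h = chunk_hashes(new, chunk_size)
    return [i for i, h in enumerate(new_h) if i >= len(old_h) or h != old_h[i]]


original = b"aaaabbbbccccdddd"
overwritten = b"aaaaXXXXccccdddd"   # in-place edit of the second chunk
inserted = b"aaaaZbbbbccccdddd"     # one byte inserted after the first chunk

print(changed_chunks(original, overwritten))  # [1]
print(changed_chunks(original, inserted))     # [1, 2, 3, 4]
```

An in-place overwrite touches one chunk, but a single inserted byte shifts every later boundary, so every later chunk has to be re-uploaded. Content-defined chunking, where boundaries are derived from the data itself, is one known way to limit that effect.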

@buntysingh7315 a year ago

thanks for taking the effort!

@interviewpen a year ago

sure!

@gordonli4946 11 months ago

At 18:55, can the client write directly into a queue? Shouldn't it upload chunks to a service first, which then processes them with S3/the DB? I'm wondering what that queue in front of the backend service is. And at 23:00, why won't user ID + file ID have the same scatter-gather issue as file ID alone? Each user has lots of file IDs/chunk IDs, and we need at least a table for user/fid/cid anyway.

@interviewpen 11 months ago

For the first point, you're absolutely right. We'd need some sort of interface between the client and our queue to enable this. For the second, we need to shard on both user ID and file ID separately (user ID could be a global index), enabling us to query on the field we're looking for. Thanks!

@Wei-up2jn 5 months ago

Great Video! Thanks for sharing! One question about the sharding: if we are sharding by UserId + FileId, doesn't it mean we still have to do scatter-gather if we want to get the full file list of a user?

@interviewpen 5 months ago

Yes, that is correct. There's never a perfect way to shard a DB, so that's the tradeoff with this approach--we'd have to fetch files from every node to get all of a user's files.
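
The tradeoff described here can be illustrated with simple hash-based sharding. This is only a sketch: the shard count, the key format, and the `shard_for` helper are assumptions for the example, not details from the video.

```python
import hashlib

NUM_SHARDS = 8  # arbitrary shard count for the demo


def shard_for(key, num_shards=NUM_SHARDS):
    """Map a shard key to a shard number with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


user_id = "user-42"
file_ids = [f"file-{n}" for n in range(100)]

# Sharding on user ID alone: every row for this user maps to one shard.
by_user = {shard_for(user_id) for _ in file_ids}

# Sharding on user ID + file ID: the same user's files spread across shards.
by_user_and_file = {shard_for(f"{user_id}:{fid}") for fid in file_ids}

print(len(by_user))            # 1
print(len(by_user_and_file))   # more than 1, so listing files is scatter-gather
```

Sharding on user ID alone keeps the file listing on one node, but at the cost of hot shards for very heavy users, which is the other side of the same tradeoff.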

@vinayak6564 3 months ago

I feel it doesn't make sense to put chunks in a queue; giving the client direct access to a message queue system is not a good idea in practice from a security perspective. Also, it doesn't reduce load in any way, since the message queues need to be scaled instead of the ingestion servers, so it just adds an extra layer for the sake of adding one. Correct me if I am wrong.

@vinayak6564 3 months ago

Only messaging queue for notification service makes sense.

@interviewpen 3 months ago

The idea behind this was that if there are bursts of load, it wouldn't slow down users uploading their data. But I fully agree with you that it doesn't make sense for a client to have direct access, so it's not a very useful solution in this case. A better solution might be to use a tiered storage system behind our BLOB store which can provide very fast reads and writes for frequently accessed data while moving older data to cheaper storage mediums. Thanks for watching!

@vinayak6564 3 months ago

@@interviewpen Thanks for the prompt response and answer! Great content btw finished watching blob storage system design after this.

@ravikant-hi8mz a year ago

What software do you use, including the grey board thing you draw on? Please suggest what you are using 🙂

@interviewpen a year ago

GoodNotes. Thanks for watching - more coming.

@ebu7 a year ago

Please make a video about NAS(Network Attached Storage) system design.

@interviewpen a year ago

We'll add it to the list. Thanks for watching, more content is on the way!

@semenivanoff8615 a year ago

NAS is storage accessible over IP (CIFS or NFS); what is so special about it? Or do you mean a specific model of storage system, like NetApp?

@AdarshMadrecha a year ago

Good insights

@interviewpen a year ago

thanks for watching!

@amirafshari1613 a year ago

@interviewpen What do you think of mentioning managed solutions instead? For example, instead of a manually sharded DB, a Cosmos DB managed Postgres that auto-shards, or a Citus distributed SQL cluster that auto-shards?

@interviewpen a year ago

Totally! There's usually managed solutions for most of the services that we discuss in these designs, but we try to keep the videos general so you can understand the concepts regardless of how they're deployed. Thanks for watching!

@ashiquehoque762 a year ago

Could you please share "how QR CODE WORKS?"

@interviewpen a year ago

Thanks for watching! We'll add that to the list of things to cover. But from a basic perspective, a QR code reader looks for predefined patterns in the image; then it reads the black/white squares in a specific order. Each square is read as a bit, 1 or 0, and all together they form a binary representation of a URL or other message.

@pradgarimella 3 months ago

Too much emphasis on calculations. In a real system design interview, candidates will spend 2 minutes max on calculations. Any more and you are screwed.

@interviewpen 3 months ago

I don’t agree. One of the most important parts of the system design interview is showing that you can translate product requirements into a solution that fits the use case. This means understanding the load that will be placed on each part of the system. Thanks for watching!

@marcusaurelius6607 a year ago

good attempt. but you would not pass our interview with _that_ level of understanding systems. cheers from DB =)

@interviewpen a year ago

Cool - any specific suggestions on where we could go deeper with our content? Let us know!

@avi7278 a year ago

You just take this troll at his word that he is from Dropbox?

@biswajitsingh8790 a year ago

@@avi7278 😂😂😂😂

@robl39 a year ago

Please explain what you’d expect

@entx8491 a year ago

@@avi7278 Nothing suggests he did. It's still a valid question, which would in turn make his statement valid.

@sivam5204 4 months ago

Chunk concept could be explained more.:)

@gxo-mt5vo a year ago

Useful video but focused too much on back of envelope calculations, and we have 100 mil writes per day, not 10 bil

@interviewpen a year ago

There's 100 million users, each performing 100 edits per day => 10B edits per day. The back of the envelope math might seem grueling, but it's really important to make sure we choose the right solutions to scale the system. Thanks for watching, and for the feedback!

@vinaychavadi7411 9 months ago

@@interviewpen DAU is 1 million users, with 100 edits per day per user => 100 million edits per day.

@Pebblejo 10 months ago

If you use "user+fileID" as the shard key, doesn't that mean you still need to query multiple nodes to retrieve the info for all the files that belong to the same user? How is that better than using only the fileID?

@interviewpen 10 months ago

Yep, since file IDs are already unique, adding the user ID to the shard key has very little effect. Thanks for watching!

@aadill77 3 months ago

very bad explanation. and the architecture is also not crisp. too naive

@teetanrobotics5363 a year ago

I hope this message finds you well. I wanted to take a moment to express my sincere gratitude for the exceptional content you've been sharing on your KZbin channel. Your recent series of five top-notch and in-depth system design videos have been an absolute treasure trove of knowledge. The clarity and depth with which you explain complex concepts are truly commendable. Your videos have been instrumental in helping me grasp the intricacies of system design and architecture. The practical examples you provide, along with your lucid explanations, have made learning a pleasure. I want to encourage you to continue creating such invaluable content. Your unique ability to break down complex topics into understandable components is a true gift. If possible, I would love to see more of these insightful system design videos from you in the future. Additionally, it would be fantastic if you could consider curating these videos into a playlist. Having them organized in one place would be tremendously helpful for both newcomers and those looking to revisit certain concepts. Once again, thank you for your dedication and hard work in sharing your expertise. Your contribution to the learning community is truly appreciated. I eagerly await more of your enlightening videos.

@interviewpen a year ago

Yes, we do have a “System Design” playlist on this KZbin channel, as well as more videos on interviewpen.com. Thanks for the kind words & thanks for watching 👍

@dd-qz2rh 8 months ago

bro went straight ahead and utilized that sweet chatgpt power

@zuowang5185 4 months ago

Is this prep for a new grad level?

@interviewpen 4 months ago

System design questions are more likely to be asked for more senior level interviews, but companies are more and more starting to ask these types of questions in more junior roles as well! Either way, it's a good idea to have some understanding of these concepts for any role.

@fatcat22able a year ago

I feel kind of dumb - what is meant by "edit" in this context? Great video!

@interviewpen a year ago

I'm not sure which part of the video you're referring to specifically, but an edit is just a single change to a file that triggers a chunk of data to be updated in the system. Thanks!

@fatcat22able a year ago

@@interviewpen Thank you for the response! I guess I'm having trouble understanding how a file would be changed in the context of this application? My immediate thought was that a change to a file would entail a full reupload. But I could understand it if the service were such that, if I've uploaded an image to the service, and then I make a change to that image locally, then those changes would be uploaded as chunks in order to update the image in the system as opposed to reuploading & replacing the full image, correct? And this change is what we call an edit? Please let me know if I'm understanding this correctly. Thank you!

@kumar_gautam24 a year ago

Thanks, great content

@interviewpen a year ago

Glad you liked it, more content is on the way!

@sagarmantri4743 9 months ago

At 28:39, the IOPS calculation seems wrong. (1M)(100)/86400 => 115000/sec? It should be roughly 1e6 * 100 / 1e5 = 1000/sec. Am I missing something?

@interviewpen 8 months ago

Yes, you're right. Should've been 1150, not 115000. Good catch :)

@Tony-dp1rl 11 months ago

With the latency and buffering inherent in the queue usage and file IO and user notifications, I doubt there is a need to shard the database at all, and if there was due to load, then SQL isn't a good choice, but storage-backed Redis would be much better. SQL is a terrible choice for generic metadata.

@interviewpen 11 months ago

Well, we'd likely have pretty high error rates if we tried to send that many writes to a single shard. On the second point, you're right that SQL isn't ideal in many use cases; it's hard to shard due to its relational model. There's tons of options for NoSQL sharded databases that could be used in this system. Thanks!

@nvskiran a year ago

S3 already provides an option to upload in chunks. Why are you not using that?

@interviewpen a year ago

Yes, manually chunking our files gives us some more control (especially around updating pieces of the file), but multipart uploads could certainly work in this same design. Thanks for watching!

@khanhtoanle8396 a year ago

Nice video!

@interviewpen a year ago

sure!

@PritamDas-g7d1y a year ago

Thanks for the video love it

@interviewpen a year ago

Thanks for watching!

@nealpan a year ago

Great

@interviewpen a year ago

Thanks!

@islamicmedia.c-s3g 8 months ago

@semenivanoff8615 a year ago

How do you update a zip archive by chunks? Or an encrypted file? The DB is sharded, OK. Why isn't S3 sharded and geo-replicated? Also, you rely on S3 provided by AWS and will be paying for a virtual capacity of 10 PB, when it could be more practical and cheaper to have your own servers and storage colocated in several DCs that do compression and deduplication, which can provide a lot more virtual capacity and have lower running costs in 2-3 years. But that is arguable. You mentioned a queue to manage chunks, but those are 1 MB chunks. Which queue can handle messages of that size? Or should it be a custom-built queue?

@interviewpen a year ago

Yep, chunking absolutely breaks down with certain file types...but for others it can be very helpful. That said, we can still upload zips, etc. in chunks, even if we do have to upload every chunk. Under the hood, S3 is absolutely distributed and georeplicated. With 10PB of storage, S3 would cost $210k/month...so an on-prem object store would likely be a better option at this scale. Good point! Kafka can technically manage 1MB messages...but it's a bit of an anti-pattern, so there might be better ways to manage congestion in this system (perhaps something custom developed). Thanks for watching!
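
The cost figure in this reply can be sanity-checked with quick arithmetic. The sketch below derives the per-GB rate implied by the comment's own numbers (10 PB, $210k/month) instead of quoting any published price list, and it assumes decimal units (1 PB = 1,000,000 GB); the function name is illustrative.

```python
GB_PER_PB = 1_000_000  # decimal units, an assumption for this estimate


def monthly_storage_cost(petabytes, usd_per_gb_month):
    """Flat-rate storage cost per month; ignores request and transfer fees."""
    return petabytes * GB_PER_PB * usd_per_gb_month


# Per-GB rate implied by the figures in the comment above.
implied_rate = 210_000 / (10 * GB_PER_PB)
print(round(implied_rate, 3))                   # 0.021 USD per GB-month
print(round(monthly_storage_cost(10, 0.021)))   # 210000
```

Actual pricing is tiered and changes over time, so this only confirms that the quoted total is consistent with a rate of roughly two cents per GB-month.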