Brief Outline
00:01:04 Problem Requirements
00:01:46 Capacity Estimates
00:02:52 Video Streaming Intro
00:04:00 Video Chunking
00:05:40 Chunking Advantages
00:07:09 Database Tables - Subscribers
00:09:39 Database Tables - User Videos, Users, Video Comments
00:11:33 Database Tables - Video Chunks
00:12:45 Database Choices
00:14:45 Video Uploads
00:15:57 Video Uploading - Broker
00:16:46 Video Uploading - Broker
00:18:51 Video Uploading - Chunks
00:20:27 Video Uploading - Chunk Storage
00:22:32 Video Uploading - Aggregation
00:26:41 Video Uploading - Streaming Datamodels
00:28:37 Video Uploading - Flink
00:31:15 Video Uploading - Flink Continued
00:33:53 Video Uploading - Search
00:34:59 Search Index - Partitioning
00:37:17 Search Index - Partitioning Continued
00:38:57 Search Index Uploads
00:40:21 Final Diagram - Netflix/YouTube
Thanks, Jordan~
@laserbam 1 year ago
Thanks for doing this series! A few days ago, I signed my L5 offer at Google, so your system design videos (and slide decks) came in clutch
@jordanhasnolife5163 1 year ago
Hell yes dude, extremely proud of you, keep killing it!!
@sauravsingh5663 9 months ago
This is exactly what I was looking for. Love how you uncover the right level of detail where it is necessary. Great work !!
@dosya6601 6 months ago
+
@allenxxx184 9 months ago
Your channel deserves at least 1M subscribers. The highest-quality system design videos!!!
@jordanhasnolife5163 9 months ago
Best designs, best ass
@gawadeninad 5 months ago
This is an underrated channel. It should have more views and subscribers. I liked how you deep-dive into the main scenarios rather than just covering everything at a high level.
@MithunSasidharan1989 1 year ago
Thank you for continuing to do this. It's a goldmine for engineers preparing for interviews :)
@KratosProton 9 months ago
42:36 Jordan, man, it's been a long way... from your super wobbly handwriting in the 1st concepts video to this super beautiful, amazing handwriting. And as always, quality content!!!
@jordanhasnolife5163 9 months ago
Lmao making big moves out here
@arvindgb7995 19 days ago
25:00 I have an issue with the RabbitMQ delete command. AFAIK RabbitMQ doesn't have an API to delete a specific message. You consume it and ack to remove the message, or you nack it so that it requeues or goes to a dead-letter exchange.
@jordanhasnolife5163 15 days ago
Yeah sorry when I say that I'm just referring to acking a message = removing it from the queue
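As an illustration of those ack semantics, a minimal Python sketch using the pika client; the broker address, queue name, and process_chunk() are hypothetical:

    import pika

    def process_chunk(body):
        pass  # stand-in for the actual transcoding/handling step

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="video_chunks", durable=True)

    def on_message(ch, method, properties, body):
        process_chunk(body)
        # "Deleting" the message really means acking it off the queue:
        ch.basic_ack(delivery_tag=method.delivery_tag)
        # On failure, ch.basic_nack(delivery_tag=..., requeue=True) requeues it,
        # or with a dead-letter exchange configured, requeue=False routes it there.

    channel.basic_consume(queue="video_chunks", on_message_callback=on_message)
    channel.start_consuming()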
@knightbird00 1 month ago
Bookmarks
Introduction
2:55 Video streaming - formats, chunks, resolutions, and ABR (adaptive bitrate streaming)
7:40 Data schema - all standard tables (partition and index)
11:35 Chunks table
Write path
15:00 Video uploads - multipart with checksums
16:00 Video processing - can use Kafka for fanout to multiple consumer groups [encoder types]
20:30 Video chunk storage - [HDFS, S3]
22:30 Video chunk processing - aggregation, idempotency
Read path
30:00 CDN logic based on subscribers
Diagram: 40:26
@vkchgc 6 months ago
You really are doing the best system design videos I’ve ever seen ! Keep up the great work
@rahulnath9655 1 year ago
This one is so dense and detailed, thanks man. I feel like I really understand these systems now.
@jordanhasnolife5163 1 year ago
Woooo! Thanks Rahul!
@santos-samuel 1 month ago
Hey Jordan, thanks for the video. I have a question about your database tables and partitioning (11:00). You say that for the UserVideos table you would shard on the user id so that all the videos from a user stay on the same node. That makes a lot of sense. But if we wanted to retrieve a video given its id, how would we do it? I see two options:
1 - We create a global index on video id to be able to find a video by id instantly, and we keep the original table sharded on user id to find all the videos from a user. We could shard this index on videoId if needed.
2 - We compute the id of videos as a combination of userId + videoId, and when querying we extract the user id from the videoId to find the right shard and then retrieve the video.
The second option doesn't seem to be the one actually used in practice, so I guess we would need a global index. Does this make sense?
@jordanhasnolife5163 1 month ago
The video ID can just be the combination of the userId + the timestamp at which it was posted. If you're looking for examples of this in practice, see Cassandra and how it represents its primary keys.
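A minimal sketch of that composite-ID idea (names and format are made up, not from the video): the shard key (userId) is recoverable from the video ID itself, so no global index is needed.

    import time

    def make_video_id(user_id: str) -> str:
        # userId + upload timestamp in millis, joined into one ID
        return f"{user_id}:{int(time.time() * 1000)}"

    def shard_for(video_id: str, num_shards: int) -> int:
        user_id = video_id.split(":", 1)[0]   # extract the userId portion
        return hash(user_id) % num_shards     # route to the shard holding that user's videos

    vid = make_video_id("channel_42")
    print(vid, shard_for(vid, 16))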
@Luzkan 10 months ago
Congratz on 21k Jordan! It's the 5th video for me so far and I'm amazed every single time by the details you manage to delve into. For how long on average do you think about the whole system before starting the video itself (let's say without refining it into something presentable, just mapping the thoughts out)?
14:39 / 41:40 - (In my design channel_id is the same thing as user_id.) I'm wondering why you suggest sharding on channel_id + video_id rather than just video_id? I don't see how having comments from a given user's (channel's) other videos close by is helpful. 🤔
24:49 - What happens if RabbitMQ dies after a successful upload to S3, just after messages with metadata have been put on the queue? (I know there is an option for durable queues and persistent messages, but is that the way to go?)
Btw, do you know how Discord handled causal dependencies (relationships between messages, like message-to-message replies) with Cassandra?
@jordanhasnolife5163 10 months ago
Hey! I'm basically remaking all of these videos right now, so I don't have to think about them for too long. I mainly just re-watch my old video on it and then try to decide if what I did last time was stupid haha.
14:39 - Yup, typo on my part, nice catch.
24:49 - Ideally we would have multiple replicas of RabbitMQ so that if the leader dies the follower can take over and we can proceed as normal.
I do not know the answer regarding Discord! Maybe version vectors, maybe they always write to the same leader for a given parent comment ID, maybe quorums! I'd have to look into it.
@muven2776 6 months ago
This is a great video, f***ing indeed! Got an instant high and confidence going through these videos. To understand Jordan's videos, my suggestion is to go through Jordan's System Design 2.0 playlist:
0. DB fundamentals (videos 0 to 15)
1. Replication (videos 16 to 24)
2. Stream processing and Flink (videos 42 to 45)
After understanding the above, the system design videos are a cakewalk. Note down the terms you come across, like ZooKeeper and Elasticsearch, go back to the 2.0 playlist, and then come back to this series. Note down the technical terms he mentions, like "split brain", "read repair", "anti-entropy". Keep using these terms in the interview to show that you know distributed systems :D
@jordanhasnolife5163 6 months ago
Nice! I like it!
@wensongliu5058 7 months ago
Much appreciation to you, Jordan. This video covers so many detailed components and processes, going back and forth. I've already watched it many times and it's really helpful!
@xRuneGunx 11 months ago
At 41:31 you mentioned using Cassandra increases write throughput. However, doesn't Cassandra use a leaderless replication model, such that write availability is increased? I was under the impression that multi-leader replication increases write throughput due to its nature of processing events in parallel. Can you clear up my confusion? Thanks for the video.
@jordanhasnolife5163 11 months ago
Yes, sorry, and good catch here. Cassandra can be run in multiple different configurations: one with quorum consistency, and another where writes just need to hit one node. I'm mainly referring to the latter, which is effectively multi-leader replication.
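For concreteness, a sketch of the two configurations with the DataStax Python driver; the contact point, keyspace, and table are hypothetical. ConsistencyLevel.ONE acks after a single replica (fast, leaderless-style writes), while QUORUM trades latency for consistency:

    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement
    from cassandra import ConsistencyLevel

    session = Cluster(["127.0.0.1"]).connect("videos")

    # Fast write: acknowledged once one replica has it
    fast_write = SimpleStatement(
        "INSERT INTO comments (video_id, ts, user_id, body) VALUES (%s, %s, %s, %s)",
        consistency_level=ConsistencyLevel.ONE,
    )
    session.execute(fast_write, ("vid123", 1700000000000, "user9", "nice video"))

    # Quorum read: waits for a majority of replicas
    safe_read = SimpleStatement(
        "SELECT * FROM comments WHERE video_id = %s",
        consistency_level=ConsistencyLevel.QUORUM,
    )
    rows = session.execute(safe_read, ("vid123",))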
@lelandrb 3 months ago
I'm more on the frontend side but dang are these high quality. I wish there were similarly great sys design resources on my stack, but I still took a lot from this. Thanks so much
@rishabhgoyal8110 13 days ago
Awesome content! I assume the "likes" functionality would be implemented the same way the "clicks" analytics functionality was in Tiny URL?
@jordanhasnolife5163 13 days ago
I think either that or you could do it synchronously, there's probably not much contention! The one difference with likes here is that you want to keep track of who has liked a video, so that they can unlike it. Maybe do CDC on a "likes" table
@rishabhgoyal8110 12 days ago
@@jordanhasnolife5163 thanks!
@footballizlyfe 9 months ago
Video timestamp: 10:18
Part 1: For the User Videos table, we can omit the timestamp, as userId + videoId make a unique pair; when you get the videos from the table, you get the timestamp and then sort them to display the videos a user has uploaded. Correct me if I am wrong.
Part 2: Also, in the Video Comments table, videoId will be unique, so why are we using the timestamp along with it? Does this help in getting output in sorted order? Thanks :)
Edit: Added video timestamp.
@jordanhasnolife5163 9 months ago
1: Definitely doable, however it is easier to keep things pre-sorted by timestamp in the metadata database so that you don't have to sort them on the fly for each read. 2: You answered your own question :). Having a timestamp for comments allows us to easily fetch comments in a pre sorted order, as we can index those comments on timestamp per video.
@footballizlyfe 9 months ago
@@jordanhasnolife5163 Thank you :) I love this series and System Design 2.0. This got me thinking of starting my own series on System Design topics. Maybe one day for sure :)
@footballizlyfe 9 months ago
@@jordanhasnolife5163 Thanks got it. This series and System Design 2.0 are gold. I might even start making videos on similar topics sometime sooner :)
@jordanhasnolife5163 9 months ago
@@footballizlyfe Just don't take too many of my viewers away from me it's all I've got ;)
@footballizlyfe 9 months ago
@@jordanhasnolife5163 haha, I'll try not to take the viewers ;)
@maizhouyuan4940 2 months ago
Hi Jordan, THANK YOU for all of your videos, your channel is literally the channel of the month for me lol. In the searching part you mentioned you're making a more in-depth video on Elasticsearch; can you please specify in which problem you dove deeper into that? I'd like to learn more about it, thank you so much!!
@jordanhasnolife5163 1 month ago
I want to say it was the Twitter one (ep 2). If not, I certainly have a 1.0 video about it.
@dmitrigekhtman1082 11 months ago
The upload and processing pipeline could include lots of different jobs with complicated interdependencies, with the S3 upload stage as one of the first steps. Possibly, a general-purpose workflow orchestration framework (something like Temporal, maybe?) could help coordinate all of it.
@jordanhasnolife5163 11 months ago
Agreed, and I imagine that IRL they do probably have something like this!
@DmitriGekhtman 11 months ago
You should do a video on workflow orchestration :D
@lagneslagnes 4 months ago
What dependencies? I personally cannot think of any. The chunks can be processed in any order, and the metadata about total chunks can also be updated out of order, as Jordan explained. With additional requirements (beyond the scope of the interview?) we might have dependencies, but I don't think we have any with the requirements Jordan worked off of.
@dmitrigekhtman1082 4 months ago
The processing of each chunk would typically be a multistep process, with different steps perhaps executing on different compute nodes. Imagine, for example, we wanted to run some ML-based object recognition on GPU nodes. Meanwhile, there’s a parallel job generating subtitles. Then the results of the object recognition and subtitles are combined for all of the chunks to feed into a recommendation system… you get the point - video processing involves a lot of systems and you need to coordinate the actions of those systems.
@lagneslagnes 4 months ago
@@dmitrigekhtman1082 Yeah, true. It's just not part of his requirements.
@charan775 2 months ago
13:30 - The comments database choice is a bit unclear to me. Since a B-tree makes reads faster compared to an LSM tree, why not go with a relational database? Write ingestion on popular videos is one argument, but on popular videos the read traffic will be even higher, right?
@jordanhasnolife5163 2 months ago
Yeah, good point. In retrospect that's probably the better choice here.
@HarperChen-b5n 1 day ago
Hi Jordan, I have a question regarding the final design graph: In the middle of the graph, Kafka is sharded by channel ID, whereas Flink is sharded by user ID. I thought Kafka and Flink needed to shard on the same key for consistency. Is it okay if they use different sharding keys?
@jordanhasnolife5163 1 day ago
Channel ID and user ID are the same in the case of YouTube.
@iurnah 12 days ago
15:58 Message broker for encoding or resolution conversion (Kafka vs. an in-memory message queue like Amazon SQS or RabbitMQ): prefer the in-memory message queue, for the reason that we don't care about state, such as relations between different messages. Work is handed out in round-robin fashion.
@iurnah 12 days ago
"however it is worth noting that in practice two big things come up"
@iurnah 12 days ago
Bookmark: Mon Dec 16: 34:00 Search
@nirajvora9314 1 year ago
Don't stop making videos bro. Your content is unique and effective.
@jordanhasnolife5163 1 year ago
Not planning on it brother
@weijiachen2850 8 months ago
How does this guy know all this as a junior engineer? He should be promoted to a staff engineer.
@jordanhasnolife5163 8 months ago
Very unclear if I have what it takes for that
@Amin-wd4du 3 months ago
It's not so much about knowing all the technologies to succeed. It's more about influence.
@vetiarvind 1 month ago
@@jordanhasnolife5163 hahahah you are overqualified
@rahilsharma7122 12 days ago
The design did not incorporate how a user will view a video. Is it trivial and doesn't need to be mentioned in an interview?
@rishabhgoyal8110 11 days ago
-> Add a bunch of "VideoServing" service instances, partitioned by video_id, behind a load balancer.
-> Users can establish a WebSocket connection (why bidirectional communication is needed is explained at the end) with the respective server as per the video_id (load balancer -> ZooKeeper -> consistent hashing ring).
-> The video service goes to the VideoChunks table and streams the requested chunk as per the client's network quality and encoding requirements.
-> Why is bidirectional communication needed? From server to client: to transport the video chunks. From client to server: to provide the latest network details / the user's video quality choice for dynamic chunk serving.
@jordanhasnolife5163 correct me or improve my approach if required
@jordanhasnolife5163 10 days ago
I think I said it at some point, but you're just polling each chunk in order (asynchronously) after reading the chunk metadata for your video, and using the lower resolution ones if you begin falling behind
@jordanhasnolife5163 10 days ago
I think your approach is generally fine here, but I would just load all the metadata client side and make those decisions locally so that you can go straight to S3/a cdn, maybe your approach is more accurate than mine though!
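A toy sketch of that client-side logic; the URL, the chunk metadata shape, and the buffer_is_draining()/play() helpers are all hypothetical. The client loads the chunk metadata once, then pulls chunks in order straight from S3/the CDN, stepping down in resolution when it falls behind:

    import requests

    RESOLUTIONS = ["1080p", "720p", "480p", "240p"]

    def buffer_is_draining() -> bool:
        return False  # stand-in for a real playback-buffer check

    def play(data: bytes):
        pass          # stand-in for handing bytes to the decoder

    def stream(video_id: str):
        meta = requests.get(f"https://api.example.com/videos/{video_id}/chunks").json()
        level = 0  # start at the highest resolution
        for chunk in meta["chunks"]:                  # ordered chunk descriptors
            url = chunk["urls"][RESOLUTIONS[level]]   # per-resolution S3/CDN link
            data = requests.get(url, timeout=2).content
            if buffer_is_draining():                  # falling behind: reduce quality
                level = min(level + 1, len(RESOLUTIONS) - 1)
            play(data)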
@Anonymous-ym6st 4 months ago
"Modern systems are more CPU bound instead of network bound" -> not sure I understand this correctly. If it is about latency, the network definitely takes more time. QPS-wise, being CPU bound can be solved by adding more nodes, but network bandwidth is fixed at whatever it is? (Open to discussion, I don't have any experience with storage myself.)
@jordanhasnolife5163 4 months ago
It basically means that in something like AWS, if we want to perform a large analytical query, the main thing slowing us down is the ability of CPUs to parse through the data, as opposed to actually moving data from host to host over the network in order to parse it.
@ishaangupta4125 6 days ago
I don't think we should upload chunks ourselves. The service should be responsible for uploading to S3 internally as chunks and passing on the URLs for those chunks for processing. What do you think?
@jordanhasnolife5163 3 days ago
I think if the chunking process is pretty expensive that makes sense to me, otherwise look into signed URLs
@charan775 2 months ago
Seems like you flinking love Flink so much, it's used in every video.
@jordanhasnolife5163 2 months ago
Have you read my channel description?
@8610803eternal 2 months ago
Instead of having a separate queue to report the number of chunks per video to the Flink consumer, could we just write to a table and have the Flink consumer read off of it? Thanks for the videos!
@jordanhasnolife5163 2 months ago
Well you'd have to use change data capture between the table and flink, but that seems super reasonable for sure!
@madhuj6912 1 month ago
Thanks for the detailed explanation. I see the user uploads chunks directly into S3; is it the user's responsibility to upload the video in different formats and chunks, or is it handled by the system?
@jordanhasnolife5163 1 month ago
I guess depends on your implementation, but I could see us saving a bit of time here having the user upload it directly
@dkcjenx 3 months ago
Hi Jordan! I actually encountered this stream processing aggregation problem at my work. I think your solution is much more elegant than ours, but I have a question: are you going to shard the Flink servers that track video chunk processing status? Otherwise every server instance will need to contain all the video upload information (in memory, I assume) and won't be scalable. Assuming you do shard by videoId, are you going to use consistent hashing to make sure the same videoId goes to the same Flink server? Also, how would you guarantee persistence in case the server goes down and data in memory is lost (use a write-ahead log?). Or alternatively, are you going to store the chunk upload status in a DB table?
@jordanhasnolife5163 3 months ago
1) Yes, I'd shard by video Id here (I believe that's what we do in the video). I don't think that you technically need "consistent hashing" here as opposed to just normal hashing, but that totally works too! As for persistence, Flink does check pointing and kafka messages are replayable.
@dkcjenx 3 months ago
@@jordanhasnolife5163 Hi Jordan, isn't normal hashing the same as 'consistent hashing' in the context of load balancing? What difference do you imply here?
@jordanhasnolife5163 3 months ago
@@dkcjenx Not entirely, as normal hashing means hash mod n. But that means we change the node that everyone is routed to if n changes (a new node is added or removed).
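An illustrative sketch of the difference (hash mod n vs. a hash ring), not tied to any particular library:

    import bisect
    import hashlib

    def h(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    # Plain hashing: if n changes, almost every key is rerouted.
    def mod_n_node(key: str, nodes: list[str]) -> str:
        return nodes[h(key) % len(nodes)]

    # Consistent hashing: nodes sit on a ring; a key maps to the next node
    # clockwise, so adding/removing a node only moves keys adjacent to it.
    class Ring:
        def __init__(self, nodes: list[str]):
            self.points = sorted((h(n), n) for n in nodes)

        def node_for(self, key: str) -> str:
            i = bisect.bisect(self.points, (h(key), "")) % len(self.points)
            return self.points[i][1]

    ring = Ring(["flink-1", "flink-2", "flink-3"])
    print(ring.node_for("video_123"))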
@rakeshreddytheeghala9397 2 months ago
The Video Chunk and User Videos tables are updated after processing. The Chunks table requires a videoId. However, before pushing the video into the User Videos table, we need to insert it into the Chunks table. So, while inserting into the Chunks table, where do we get the videoId? (Is videoId a client UUID?)
@jordanhasnolife5163 2 months ago
Yeah I don't see why not - it could also be the channel ID + the timestamp it was posted
@Anonymous-ym6st 4 months ago
I am curious whether it is common to use two types of DB in a real use case. Of course for a big company like YouTube it's worth it, but if we are designing for a single team or org, maybe optimizing on top of MySQL, rather than adopting Cassandra, would be more like the real case?
@jordanhasnolife5163 4 months ago
Fair enough, consistency of DB choice can be a real draw in some places.
@jordiesteve8693 5 months ago
Great video as always, really liked the aggregation part. One thing about the search index: if I got it right, you propose to shard on a given term and ideally have all the (userId, videoId)s for it within the same shard. That's not possible: in Elasticsearch, documents (descriptions here) are sharded by documentId, and you end up running a distributed query for each search query.
Why is that? Well, the algorithm that has powered the search space until recently is BM25. There are two main components in the BM25 formula: one is how frequent each query term is in each document (the more the better), and the other is how "popular" the query term is across the corpus of documents (the less popular, the better). For example, if the query terms are [a, b], the scoring formula would look like f(a, D) / popularity(a) + f(b, D) / popularity(b).
Running BM25 on a single node is easy. However, when going to a distributed system, unexpected things can happen (business as usual). In a distributed setting, we HAVE TO keep f and popularity consistent within a single node (otherwise you are not running BM25 anymore), which is why ES doesn't let you control what you partition on. Basically, when we run a query, we run a distributed query to all nodes, each one runs BM25, and a coordinator/aggregator node (can't recall, there are different setups) gets the results and returns the top k.
Now, this has several implications, an important one being that BM25 runs within each shard, considering only the statistics (f and popularity) on that shard. That means if doc1 and doc2 are identical but live in different shards, it could be that bm25(query, doc1) != bm25(query, doc2). Because of this, we need to make sure we don't over-partition and don't mess up the data distributions. Argh, hope people have followed this far!
I really like inverted indexes, but the new kid on the block (well, not that new) is vector databases. The idea is you encode the text using deep neural networks or whatever you have at hand to encode information into a vector, index it in a vector database, and run approximate similarity searches.
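For reference, the standard BM25 formula that formalizes the f/popularity intuition above:

    score(D, Q) = \sum_{q_i \in Q} IDF(q_i) \cdot \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{avgdl}\right)}

where f(q_i, D) is the term frequency of q_i in document D, IDF(q_i) is the inverse document frequency (the "popularity" penalty), |D| is the document length, avgdl is the corpus's average document length (these corpus statistics are exactly what must stay consistent within a shard), and k_1 and b are tuning constants.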
@jordanhasnolife5163 5 months ago
That makes a lot of sense to me! BM25 sounds like TF-IDF. I suppose we could do term partitioning without using elasticsearch if we didn't care about scoring, but of course scoring is super relevant. I mention that actually sharding by term in practice is pretty impossible more in my twitter search 1.0 video, and that the document based sharding is used IRL. Also makes for faster ingestions :)
@jordiesteve8693 5 months ago
Yep! TF-IDF is very similar; ES supports both afaik, but BM25 is more tailored to the search problem.
@ravi72munde 11 months ago
For processing chunks, would it be possible to use Kafka + Spark, so each Spark job handles a single video but processes its chunks on multiple workers, and at the end marks the job completed when all chunks are processed? That makes keeping state for the video's chunks redundant.
@jordanhasnolife5163 11 months ago
A couple of concerns here that you'd have to address: 1) how do we know when to trigger the spark job? 2) You're triggering a lot of spark jobs haha In practice, this may work! I think we'd have to try it out.
@ravi72munde 11 months ago
Good point! How about using a Kafka queue to queue jobs? A message would just contain the videoID whose chunks are ready to process. A consumer could act as a Spark streaming (master) node: it picks an available message, fetches all the chunk_ids/file URLs for that video, and distributes the chunks to worker nodes. Once all chunks are processed, the master node knows and marks the video as complete. As an advantage, it'll be easier to track which video failed rather than which chunks.
@lagneslagnes 4 months ago
@@ravi72munde Your solution is the one I would go with in an interview; I see nothing wrong with it. For interviews, I prefer solutions that abstract away a lot of details, simply in favour of time. Jordan's solution is super cool though, for a longer, more relaxed discussion. Most system design interviews I've had are a mad rush with no real dwell time to think in too much detail.
@ravi72munde 4 months ago
I did and I got the job 🥳
@lagneslagnes 4 months ago
@@ravi72munde Congrats!
@vetiarvind 11 days ago
Yo Jordan, what about the read portion? Connect via WebSocket, measure the client's bandwidth, and then delegate the video chunk link to the CDN from our server?
@jordanhasnolife5163 10 days ago
I'd say something like that sounds right! Wonder if we even need a websocket as opposed to just using normal polling here
@thunderzeus8706 5 months ago
Hi Jordan, I have dumb questions about the user-video table slide. Please correct me if I am wrong.
1. The proposed design uses MySQL and assigns (userId, videoId, timestamp) as the primary key, using userId for partitioning.
2. You mentioned maybe a secondary index / sort key on videoId and timestamp.
1) Since you eventually chose MySQL, what is a "sort key" in that case? On the other hand, shouldn't we use (userId, timestamp, videoId) instead of (userId, videoId, timestamp) as the primary key, so that records are in timestamp order within each userId?
2) If it's a secondary index, it will be a global secondary index. Won't the use case "look up by videoId" be slow, because you essentially need to find the videoId in logarithmic time and then possibly go to a different partition to get the metadata?
Thanks (for bearing with me🤯)!
@jordanhasnolife5163 5 months ago
2.1) I'd probably just use some combined index of userId and timestamp (which basically is the same as a video id, who needs a videoid anyways) 2.2) I suppose this answers your questions. If userId and timestamp make up the "video id", we can easily look things up by that combination.
@Randomguu 1 year ago
Wonderful series, cannot stop watching. Just one question on something which is bugging me. I heard this suggestion in a few of the other videos as well: how do you decide that a SQL DB will be better when we have a read-heavy system? I understand the B-tree vs. LSM-tree point, but NoSQL scales better and hence will have less lock contention on a single SQL node (even with master-slave for reads, it still scales poorly, no?). I think LSM vs. B-tree is merely a theoretical discussion rather than having practical application here.
@jordanhasnolife5163 11 months ago
You say "NoSQL" scales better - what makes you say this? That's really only the case when we're running a bunch of distributed joins, which we aren't doing in any of these reads.
@lagneslagnes 4 months ago
@@jordanhasnolife5163 I think what @Randomguu is saying is that if you set Cassandra's read quorum to 1, the reads are ultra fast because different reads can land on different replicas in parallel. So saying SQL is faster for reads is not always true. Having said that, I think you are still spot on in using Cassandra _only_ for comments, and an ACID SQL DB for metadata like users, etc., for reasons beyond write vs. read throughput.
@jordanhasnolife5163 4 months ago
@@lagneslagnes That's true you can configure cassandra to be very fast for reads and still maintain quorum consistency :)
@shobhaagarwal6958 2 months ago
Hi Jordan, Thanks for the video.. Which component in this system is responsible for converting the large video file into chunks and how does it do that?
@jordanhasnolife5163 2 months ago
The client itself
@capriworld 3 months ago
First of all, I have watched multiple channels and finally landed here; I'm continuing with this one and referring people to it too. Thanks a lot for helping me out. Could we not use a job scheduler that creates a task for checking whether all the chunks are done and processed? As you said, the client informs the microservice about the upload metadata; that could initiate a task (instead of Kafka/Flink?), and then we remove the task once done. Basically both are the same, but from a technologies perspective I thought this might be more aligned. Thanks again.
@jordanhasnolife5163 3 months ago
Yeah I mean we may as well have re-built a job scheduler, I think that's just the core piece of the problem so I didn't want to abstract it away. I think the difference here is that job schedulers are nice for scheduling one job at a time, but how does it alert us when they're all done? Some do, but they probably use some sort of polling or stream processing like what we do under the hood.
@9527-ljc 11 months ago
Thanks, this is great content. For an entry-level SDE, which parts should we focus on more in a system design interview?
@jordanhasnolife5163 11 months ago
If you're looking for junior roles, I'd honestly just keep grinding leetcode haha. Otherwise, I'd say that the whole video is still relevant. Can't hurt to learn!
@siddharthgupta6162 11 months ago
Thanks for the video, Jordan. Awesome content as always. Is there any difference between streaming vs chunking? I read somewhere that streaming is an error-prone process so one should prefer chunking over it - but there was no explanation on it. Any thoughts on this?
@jordanhasnolife5163 11 months ago
Yeah to tell you the truth no clue - sounds like some guy spewing some bs as per usual with 99% of systems design videos lol
@siddharthgupta6162 11 months ago
@@jordanhasnolife5163 lol sounds about right
@MuhammadUmarHayat-b2d 6 months ago
QQ: is the client responsible for chunking the video and uploading it to S3? Or should there be a mechanism to upload the video directly to S3 and have some dedicated backend workers chunk it in an async fashion?
@jordanhasnolife5163 5 months ago
Typically you'd want the client doing chunking to avoid having to retry uploading the full video in the event of some failure.
@MuhammadUmarHayat-b2d 5 months ago
Makes sense. I also checked: S3 does provide support for multipart upload of fixed-size chunks, which would be handy here.
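A minimal boto3 sketch of that multipart flow; the bucket, key, and part size are hypothetical, and S3 reassembles the parts into a single object on completion:

    import boto3

    s3 = boto3.client("s3")
    upload = s3.create_multipart_upload(Bucket="raw-videos", Key="channel_42/vid.mp4")

    parts = []
    with open("vid.mp4", "rb") as f:
        # 8 MB parts; every part except the last must be at least 5 MB
        for i, part in enumerate(iter(lambda: f.read(8 * 1024 * 1024), b""), start=1):
            resp = s3.upload_part(
                Bucket="raw-videos", Key="channel_42/vid.mp4",
                PartNumber=i, UploadId=upload["UploadId"], Body=part,
            )
            parts.append({"PartNumber": i, "ETag": resp["ETag"]})  # needed to finalize

    s3.complete_multipart_upload(
        Bucket="raw-videos", Key="channel_42/vid.mp4",
        UploadId=upload["UploadId"],
        MultipartUpload={"Parts": parts},
    )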
@shubhgupta8993 3 months ago
I think the channelId that you mentioned in some places is nothing but a userId?
@jordanhasnolife5163 3 months ago
Yeah same idea
@jieguo6666 6 months ago
Hey Jordan! Thanks for the video! If we use DDB we can use DDB's GSI, so it seems we don't need CDC. I'm curious whether Cassandra + CDC is better than DDB, or is it a personal preference thing?
@jordanhasnolife5163 6 months ago
If it's an eventually consistent global secondary index, then I'd say personal preference. If it needs a two-phase commit to stay completely consistent with the primary, that seems like a pretty big difference then.
@raaamu0007 4 months ago
How do you actually join chunk messages in Apache Flink? I do not have much knowledge of it. So for a given tumbling window of, say, 6 hours, you expect all the chunks to be processed from RabbitMQ, and once all the individual messages are received in Flink from RabbitMQ, you join them with the Kafka message that was already received, and publish a "video upload complete" event when the total number of chunks from the RabbitMQ messages equals the count in the Kafka message?
@jordanhasnolife5163 4 months ago
Yeah, basically you just key by videoId and use a hashmap of the chunkIds you've seen.
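Not actual Flink code, but a minimal sketch of what that keyed state would track; the two callbacks stand in for the RabbitMQ-completion and Kafka-total streams, and arrival order doesn't matter:

    from collections import defaultdict

    completed = defaultdict(set)   # videoId -> set of finished chunkIds
    expected = {}                  # videoId -> total chunk count (from the metadata stream)

    def on_chunk_done(video_id: str, chunk_id: str):
        completed[video_id].add(chunk_id)   # idempotent: duplicate deliveries are no-ops
        check(video_id)

    def on_total(video_id: str, total: int):
        expected[video_id] = total
        check(video_id)

    def check(video_id: str):
        if video_id in expected and len(completed[video_id]) == expected[video_id]:
            print(f"{video_id}: all chunks processed, marking upload complete")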
@indraneelghosh6607 11 months ago
Hi Jordan. Had a few questions related to the video upload flow. Could you please explain why you chose RabbitMQ over Kafka while uploading the metadata? Also, there may be times when there is a spike in the number of videos being uploaded, particularly in the case of a YouTube-like system. I would expect video uploading on YouTube to have a rather irregular traffic pattern compared to a streaming platform like Netflix. Any ideas on how to tackle these spikes without manual intervention?
@jordanhasnolife5163 11 months ago
To be honest, I do think that uploading on YouTube would be more regular than you think. You've got people in every timezone. But yeah, I guess the way you'd do it is to have your consumers that are doing the encoding be part of some Hadoop cluster that is also performing other work in the meantime, and as more upload jobs come in you can kill whatever jobs those nodes are currently doing and use them for uploads. For your first question, RabbitMQ is going to allow me to use a fan-out design such that I don't need a bunch of different partitions (one per consumer) as I would with Kafka. I don't care about message ordering at all here, so a fan-out is fine.
@lagneslagnes 4 months ago
@@jordanhasnolife5163 I don't get the fan-out comment. In Kafka, you can fan-out to hundreds of consumers even with a single partition by putting each consumer in a different consumer group.
@jordanhasnolife5163 4 months ago
@@lagneslagnes That's fair - your suggestion is basically just partitioning the topic and having a bunch of nodes read from each partition, as opposed to using a single JMS broker which doesn't rely on ordered delivery.
@ariali2067 9 months ago
Again, sorry, the same question catches me again and again. Is the search index basically a new table, or basically a secondary index on top of the existing user videos table? I had already convinced myself that it's a secondary index on top of existing tables, but in this video it seems that we are creating a new table (with some denormalized data from the user videos table). If this is the case (a new table), why do we need (user id, video id) as the partition key here? Why can't we use the term as the partition key, such that for a given term all the search results are on the same node for faster reads? This really bothers me. I'd really appreciate it if you could help clear up my confusion here, thanks again!
@jordanhasnolife5163 9 months ago
1) New table. 2) Too much data for a given term, typically; imagine for "Donald Trump".
@meenalgoyal8933 9 months ago
Hey Jordan, I am wondering how the design might change for an audio streaming service like Spotify. I think a lot might remain the same as YouTube, but 2 major things:
1. Do you think we need to break the audio file into chunks? Sure, we can benefit from parallel uploading and getting one chunk at a time for streaming, but audio files are lighter than video.
2. What kind of processing might be required for each audio file chunk?
@jordanhasnolife5163 9 months ago
Hey! I think 99% of it is probably going to be the same. You'd probably have different bit rates for streaming the audio if you have a worse connection, which is the processing involved. Maybe you wouldn't need chunking since as you mentioned the files are much smaller in size.
@systemdesignlearner 3 months ago
2 questions: 1) Why don't we always read from the CDN instead of the chunk table? 2) Why is it better if all the queries for a userId go to the same node?
@jordanhasnolife5163 3 months ago
1) The CDN and chunk table are not replacements for one another. The chunk table is for metadata and the CDN is for actual video file data. 2) Sorry which queries are you referring to?
@systemdesignlearner 3 months ago
@@jordanhasnolife5163 1) At 43:53 you say we read from the video chunks table, which tells us what to read from S3, and if the user is a popular user, they read from the CDN. 2) The queries at 8:35 (Subscribers) and 10:45 (Users).
@dinar.mingaliev 1 year ago
Hi Jordan, thank you so much for keeping us educated and sharing your ideas on system design. Short question: don't we also need to add a chunk processor? Once a user uploads a video into temporary S3 or DFS, the service splits it into chunks. And meanwhile one more question: if we have single-leader replication + partitions in Cassandra, it will work with comment editing, right? And also we need a service to create a user feed :)
@dinar.mingaliev 1 year ago
Also, I guess insert, update and delete operations on a single row are atomic, isolated and durable in Cassandra, and assuming that the same user edits their own comments, there should not be a problem with eventual consistency. What do you think, man? :)
@jordanhasnolife5163 1 year ago
Thanks! I had envisioned the user's client breaking the file into chunks. Secondly, I'd agree that edits of comments are no issue if we use single leader replication, but for multi leader replication they definitely could be!
@Ryan-g7h 4 months ago
Jordan, quick question: are we storing both unprocessed and processed chunks in the same S3?
@jordanhasnolife5163 4 months ago
Yeah
@Ryan-g7h 4 months ago
@@jordanhasnolife5163 ty
@RS7-123 1 month ago
Why partition by channel id and video id instead of just the video id, since that will be unique anyway? What am I missing?
@jordanhasnolife5163 1 month ago
It's a common use case to want to find all of the videos posted by a particular channel, like when you go to my page for example
@RS7-123 1 month ago
Sorry, I should have been more explicit: this is about the comments table specifically. 1) Normally one reads comments for a video, so partitioning just by video id, which is already unique, should be alright. Wdyt? 2) I think the example you gave makes me think partitioning by channel makes the most sense, so all your videos would be on a single partition. I don't know if you meant video id to be used as a sort key instead of a partitioning key.
@jordanhasnolife5163 1 month ago
@@RS7-123 ah yeah agreed
@niapuchun 10 months ago
On the slide at the 2:10 mark, the last line should say 1 million videos, shouldn't it?
@jordanhasnolife5163 9 months ago
yep typo
@PrabhuMarappan7 5 months ago
Hey Jordan, great videos for system design. I was just wondering: will the client upload the whole file to S3, or upload it in chunks? (If the whole file, the backend process or job runner could essentially break it into chunks as it wants.) Also, do you think splitting and uploading chunks will be more work on the client (the browser itself)?
@jordanhasnolife5163 5 months ago
Hey, it's possible that the chunks themselves are first routed via an intermediate server, but I think the actual chunking will itself first happen on the client, or else we lose some of the benefits of chunking. I agree that this is more load on the browser.
@lagneslagnes 4 months ago
@@jordanhasnolife5163 Current blob/object store services provided by the main public clouds (wasn't true in past) have great support for chunking, streaming uploads, parallel uploads and resumable uploads. That is, even when we upload a single file, the service will do all that internally using their client SDKs and backend servers. i.e., it will chunk and reassemble the chunks to a single file in the blob store. For a video/audio file, I don't think we gain a lot by literally managing the chunks in the client, unless I'm missing something?
@jordanhasnolife5163 4 months ago
@@lagneslagnes If the SDK does it for us, fantastic! I'd also then like to confirm however that loading these files can be done similarly in chunks, so that we can adjust our bitrate/resolution as we load them. Hence the reason for me wanting to store them as conceptually separate files.
@lagneslagnes 4 months ago
@@jordanhasnolife5163 So, the big difference between uploads in Netflix and, say, Dropbox++ is that the chunks are more of an internal implementation detail of the streaming download protocol.
For YouTube/Netflix (see your requirements) we do not need the client nor the backend to know and track the chunks after the transcoding/packaging of chunks is done. The machines in the processing pipeline will break the uploaded file into chunks, and during the final packaging step will put the metadata for the chunks into the streaming protocol's manifest file (e.g. something like a specially named XML file that sits in a folder with all the separate chunk media files for the overall media item). The backend database(s) do not really need to keep tracking the chunks after that point, because there is no requirement to allow updates of parts of an audio/video file.
So uploading to an object store, via the object store's regular upload SDK/protocol, >might< secretly chunk and re-assemble for us. But overall that is transparent to us, so it just looks like a single file upload. Later processing steps run on that raw file, break it into chunks, convert to various formats/resolutions, then reassemble everything into a package that includes a manifest file with details for all the chunks/formats/resolutions. Later, when a client of the streaming download protocol downloads the file, it grabs the manifest file, which includes pointers (URLs) to the chunks within that same folder (for example). There is no need for chunk information to be requested from our actual backend databases; it's all an internal implementation detail of streaming protocols.
Summary: managing chunks with "our" design's metadata DBs is only really useful when we do require updates to happen at the chunk level (i.e., not when the semantics are always to re-upload the entire file).
Minor point btw: your video was full of good content, and showed a great way of creating a new streaming protocol system that just happens to manage chunks less transparently.
@jordanhasnolife5163 4 months ago
@@lagneslagnes Thanks! Yeah, was aware this was the case for RTC streaming, wasn't aware S3 does it under the hood. Appreciate the info!
@xiaoyinqi7296 10 months ago
Thanks for the video, Jordan, very impressive. I want to understand the reason for using Flink here. I know Flink is a stream processing tool; I believe we want to confirm whether the transcoding of all the chunks is done. My thought is to use a chunk DB table to mark each chunk's status.
@jordanhasnolife5163 9 months ago
You can definitely use a chunk db. However, note that this means: 1) You need to make an additional network request to the chunk db every time 2) That request can fail, how do you ensure that we eventually write it there?
@tysonliu2833 3 months ago
What do you mean by "no state required" when choosing a message queue?
@jordanhasnolife5163 3 months ago
Can you give me a timestamp?
@asian1599 3 months ago
@@jordanhasnolife5163 I believe he's referring to 17:39
@ankitagarwal4022 8 months ago
@jordanhasnolife5163 Hi Jordan, I have just one question. Your processor transforms each video chunk into a list of transformed chunks, which depends on the number of encodings * resolutions. Let's say, for example, we have 10 encodings and 4 resolutions; that makes 40. So we have to transform 1 chunk into 40 and upload all 40 to S3. I assume transforming one chunk into another is itself a heavy process. Can you suggest some optimization here, so that if our event processing fails we don't have to transform every chunk from the beginning?
@jordanhasnolife5163 8 months ago
I'm pretty confused what you mean here - each resolution/encoding is processed independently in tandem already, so if one fails the rest do not fail, feel free to elaborate!
@ankitagarwal4022 8 months ago
@@jordanhasnolife5163 What I understand about the flow of data:
1. First we upload chunks to S3, let's say (c1, c2, c3, ...).
2. We add the chunk details to the broker (RabbitMQ).
3. The processor consumes chunk details from the broker, let's say C1, and puts the list of transformed videos (C1R1E1, C1R1E2, C1R1E3, C1R2E1, C1R2E2, C1R2E3) into S3, considering resolutions (R) = 2 and encodings (E) = 3. The processor also puts the list details into Flink.
@jordanhasnolife5163 8 months ago
@@ankitagarwal4022 The only transformation of one chunk to another that we're doing right at the start is creating the list of all of the metadata that we will eventually need to create. So that can all go into rabbit mq, and once it does we can be fairly confident that the chunk will eventually be created downstream because it will only get removed from rabbit mq once the consumer puts the completion message in kafka
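A hypothetical sketch of that metadata fan-out step, where each (chunk, resolution, encoding) combination becomes its own retryable job, so one failure only retries that single combination rather than the whole chunk:

    import itertools
    import json

    RESOLUTIONS = ["480p", "720p"]          # R = 2
    ENCODINGS = ["h264", "vp9", "av1"]      # E = 3 -> 6 jobs per chunk

    def enqueue_jobs(video_id: str, chunk_id: str, publish):
        for res, enc in itertools.product(RESOLUTIONS, ENCODINGS):
            publish(json.dumps({
                "videoId": video_id,
                "chunkId": chunk_id,
                "resolution": res,
                "encoding": enc,
            }))  # publish() would hand this to RabbitMQ in the design above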
@kword1337 1 year ago
Thanks for another banger dude! For complicated stuff like video aggregation, are you getting your ideas from white papers? That level of design seems beyond Designing Data-Intensive Applications.
@jordanhasnolife5163 1 year ago
Well I don't feel like DDIA is ever super opinionated on how to design things in particular. That being said, real time aggregation using stream processing seems to be something used across many systems and it also handles pretty much all failure scenarios for us, hence the reason I keep abusing it haha
@vigneshraghuraman 6 months ago
Once the chunks are uploaded by the user to S3, how does the upload service know which chunks to put on RabbitMQ? Is this done via S3 notifications to the upload service?
@jordanhasnolife5163 6 months ago
The client will upload chunks based on which ones are "new". Then they all go into rabbit mq.
@Anonymous-ym6st 4 months ago
If we use video id + ts as the index for comments, could it be the case that some comments are posted at exactly the same ts?
@jordanhasnolife5163 4 months ago
I mean you can always add the user id of the comment poster if you're afraid that duplicates are going to overwrite one another.
@isaacneale8421 6 months ago
I like your idea of data locality in a DB for each of the processed chunks. But I don't know if I understand how it works.
When thinking about a single machine (say a personal laptop) reading from disk, the video ought to be stored as contiguously as possible to ensure good data locality and no disk jumping. Makes sense. But when talking about a distributed service, I can't see how this helps. As I understand a disk, it can only be reading from one location at once. There might be multiple physical hard drives on one machine though.
Anyway, let's say I am watching a YouTube video and I grab Chunk1 from the DB. Great. Chunk2 is next, which I'll request in 5-10 seconds. But what happens if someone else is watching a video partitioned on the same DB shard, and they request their chunkXYZ? The disk jumps to their spot, then back to mine when I request Chunk2. So it seems like the distributed DB's data locality can break down quite easily with concurrent requests.
Hopefully, however, most videos are read from the CDN, which would be much faster since the cache is in memory. But that's a lot of expensive memory for caching all videos, so maybe that is partially on disk too, which I guess would have the same problem.
Any thoughts? I suppose good data locality doesn't hurt in the case where my sequential reads are not the system's sequential reads. So you might as well try to have good data locality.
@isaacneale8421 6 months ago
Oops. I just rewatched and realized that you had an S3 location in this DDB. The data locality was for range queries to fetch the next X many chunk locations while buffering. This makes a lot of sense.
@jordanhasnolife5163 6 months ago
Yup, range queries is what you're looking for!
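A sketch of what that read-path range query could look like, assuming a chunk metadata table keyed on (video_id, chunk_index); the table and column names are made up:

    # With metadata keyed on (video_id, chunk_index), the next few chunks for a
    # video are one contiguous range query instead of N point lookups.
    NEXT_CHUNKS_SQL = """
        SELECT chunk_index, s3_url, resolution
        FROM video_chunks
        WHERE video_id = %s AND chunk_index BETWEEN %s AND %s
        ORDER BY chunk_index
    """

    def next_chunk_urls(cursor, video_id: str, current: int, lookahead: int = 5):
        cursor.execute(NEXT_CHUNKS_SQL, (video_id, current + 1, current + lookahead))
        return cursor.fetchall()   # the buffering client fetches these S3 locations next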
@reddy5095 4 months ago
Since the user stores the entire video in S3, they will have the list of chunks and their links; then they can make one API call to store the details in the chunk table, right? Why are we using the video chunks RabbitMQ instead?
@jordanhasnolife5163 4 months ago
Also doable, you could run CDC on an event like that, but we want to make sure to process the s3 files (one for each chunk) in the background, so we need something to go into rabbitmq to tell us where those video files live.
@college7290 11 months ago
A real treasure! Thank you. What resources did you use to learn these concepts? I know your knowledge is not out of books, but based on years of hard work and experience. How can I start learning these concepts myself? What can I do to be as knowledgeable as you in the next 5~10 years?
@jordanhasnolife5163 11 months ago
Just reading haha, I'm nothing special! You'd be surprised how much you can learn by looking at "Uber system design" from reputable sources (their site, not YouTubers).
@saurabhmittal6947 6 months ago
Hey Jordan, I have one question: how is the client able to uniquely generate the chunk-id and video-id? You're showing that the client uploads to S3 and then sends that data to the upload service, but who is assigning unique IDs to all these entities flowing through our system?
@jordanhasnolife5163 6 months ago
The video id can just be some userId + a hash or something. The chunk ID is also basically a hash and just needs to be unique per video id
@shetysify 5 months ago
Need your blessings, going for an interview!!! You should do a DSA course next. Thank you!!
@truptijoshi2535 7 months ago
Hi Jordan, can CDC have a single point of failure? If yes, how do we avoid it? Also, does CDC add extra latency?
@jordanhasnolife5163 7 months ago
I mean in theory kafka, but I tend to imply that our Kafka cluster has replicas. CDC does make things slower, but I suppose in the cases where I use it I don't actually care (hence why I use it)
@VidulVerma 11 months ago
Awesome design 🙇
@roshankumar0911 1 year ago
I recently cleared my system design round after watching your videos. They're so compact and precise. Thank you for making such videos. Can you please mention your LinkedIn ID?
@jordanhasnolife5163 1 year ago
Glad to hear!! Congrats! www.linkedin.com/in/jordan-epstein-69b017177? If you don't mind, just don't tag me in stuff so that I don't lose my job haha
@roshankumar0911 1 year ago
@@jordanhasnolife5163 Sure, thanks :)
@chalamrani1923 2 months ago
great video
@alberdgdj1 1 year ago
Hi Jordan, thanks for your videos, they are of huge value. I wonder if you could do a video about calculating Big O complexity with some exercises; that would be really helpful. Thanks mate!
@jordanhasnolife5163 1 year ago
I appreciate that! I can do this, however it realistically would be a while before I get to it, just due to the fact that I'm mainly trying to focus on systems design. That being said, there are many good resources on the internet for how to calculate this type of thing!
@joemiller1057 5 months ago
I would not chunk on the clients. If you want to change that logic you would have to update all the clients, and I feel like it could also introduce device-specific bugs. Upload the file whole and process it server-side, splitting it up there.
@jordanhasnolife5163 5 months ago
I can see the argument moreso if we have many upload servers that are geographically distributed
@vorandrew 1 year ago
Chunking question... Why would you want to store chunks anywhere except in a cache? Let's say a video is 50 MB; you want to permanently save transcodes in 3-4 resolutions x 1-2 formats? A petabyte here, a petabyte there, and we are talking about big numbers... If you can always re-create them, there's no need to store transcodes for a video that was last viewed 3 years ago... cache them with a last-access timeout set to 1 week, for example... At most, maybe you want to store the first chunk for fast access.
@jordanhasnolife5163 1 year ago
Would appreciate if you could elaborate here! While it's true that we could store the entire video file and never deal with any chunks, assuming we originally upload chunks to S3 when first uploading the file we'll always need at least some chunk metadata in our database to load them
@vorandrew 1 year ago
@@jordanhasnolife5163 My guess is like this: we receive the file at its original resolution -> chunk it by 2 seconds -> long-term storage. Transcode the first chunk into 144, 240, 360, 480, etc. resolutions (don't store them) -> CDN expiration = 1 year since last access (just for a fast-start experience). Whenever somebody starts to watch a video, we transcode the necessary resolution on the fly from the original chunks in parallel and store it in the CDN with expiration = 1 week. I'm sure the aggregate transcode speed will be faster than the viewing speed, so viewing will be seamless. Regarding metadata: as you said, during upload we can store all the necessary chunking info in some NoSQL DB.
@vorandrew 1 year ago
Thank you for your videos! ❤ After viewing some, I can see your designs tend to give out disk space like the Fed is printing money 😂
@jordanhasnolife5163 1 year ago
@@vorandrew Ah I see what you're saying here, I think it's one of those things that we'd have to actually try out and see if the latencies would be low enough. We do care a lot more about lowering read latencies here, so I wonder if this would work in practice but it's an interesting thought!
@jordanhasnolife5163 1 year ago
@@vorandrew Haha yeah - my personal philosophy here is to use as much disk space as needed, we could always optimize for cost saving measures in the future! At least for the interview I don't know how often it would come up, but it's possible!
@vetiarvind 24 days ago
I talked to 4 engineers at my startup company and all of them knew about you. Kind of crazy considering that you have "only" 58K subs, but I guess most smart engineers find and recognize your skill.
@jordanhasnolife5163 23 days ago
It's my pervasive onlyfans presence
@davidabu3170 9 months ago
You forgot the userId in the table; it is quite important.
@jordanhasnolife5163 9 months ago
who needs users anyways?
@jashmerchant5121 5 months ago
Your tutorials feel relatively complex and fast-paced compared to others'.
@jordanhasnolife5163 5 months ago
My girlfriend used to say this about our sex life
@adithyabhat4770 11 months ago
Thanks Jordan!
@JulianA-rm4ry 8 months ago
Thank you Jordan
@JulianA-rm4ry 8 months ago
Now I'm only 1/2 screwed
@rakeshvarma8091 8 months ago
You Are Awesome Bro!!
@jordanhasnolife5163 8 months ago
So are you!
@aforty1 8 months ago
Liked and commented for the algo! Thank you!
@zhonglin5985 8 months ago
At kzbin.info/www/bejne/amTFc2qliNNkb5I, another queue is needed to stream the total chunk count to Flink. This looks a bit redundant to me. Why don't we just include the total chunk count as an extra field on the events that are sent to RabbitMQ?
@jordanhasnolife5163 8 months ago
Totally doable as well, I considered this approach too. I mainly assumed there'd be a lot of other metadata around and didn't wanna bloat the messages.
@calvincruzada1016 1 year ago
Awesome
@chaitanyatanwar8151 1 month ago
Thanks
@imutkarshy 1 year ago
Your obsession with Flink 😅
@jordanhasnolife5163 1 year ago
They should be paying me. Oh wait, it's open source.
@imutkarshy 1 year ago
@@jordanhasnolife5163 Wait till they open a company like Confluent from this.
@sauravkumarsharma6812 8 months ago
@@jordanhasnolife5163 😂
@ankitagarwal4022 8 months ago
@jordanhasnolife5163 thank you for your video. love your content