Distributed Job Scheduler Design Deep Dive with Google SWE! | Systems Design Interview Question 25

28,602 views

Jordan has no life

Comments: 91
@shrutimistry2086 2 years ago
Some visual diagrams would be very useful, to better follow along with your explanation
@jordanhasnolife5163 2 years ago
Yep, am trying to be better about visualizations in my new series
@shivanshjagga255 1 year ago
4:43 DB schema: [jobID, s3URL, status, retryTimestamp]; status is an ENUM (NOT_STARTED / STARTED / CLAIMED / DONE).
7:00 Querying the DB. ACID compliance. Index on the timestamp. Query: select the tasks that are NOT_STARTED where timestamp < current_time.
8:50 Failure during a job run: MQ failure, node failure.
9:44 New query: select the tasks that are NOT_STARTED where timestamp < current_time, AND the tasks that are STARTED where timestamp + enqueuing time + heartbeat < current_time.
10:46 Messaging queue choice.
12:14 Claim service / DB + ZooKeeper. ZooKeeper checks whether a node is down or not; then we can write in the metadata DB that it's a retryable error.
14:54 A node dies, comes back up, and tries the job again = 2 nodes running the job. Distributed lock.
Ending note: how to schedule jobs at a fixed rate (WEEKLY / MONTHLY). The task-runner service itself writes to the DB the next time the task should run. Ex: for a BI-WEEKLY schedule, it adds the next time it has to run.
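A minimal sketch of the schema and the 9:44 polling query summarized above, using an in-memory SQLite table purely for illustration; the table/column names, the ENUM-as-CHECK trick, and the latency/heartbeat constants are assumptions rather than values from the video.

```python
import sqlite3
import time

# In-memory stand-in for the metadata DB described above.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE jobs (
        job_id          TEXT PRIMARY KEY,
        s3_url          TEXT NOT NULL,
        status          TEXT NOT NULL
            CHECK (status IN ('NOT_STARTED', 'STARTED', 'CLAIMED', 'DONE')),
        retry_timestamp REAL NOT NULL   -- epoch seconds the job should run at
    )
""")
conn.execute("CREATE INDEX idx_jobs_status_time ON jobs (status, retry_timestamp)")

ENQUEUE_LATENCY_S = 60    # assumed worst-case time for a job to reach a worker
HEARTBEAT_TIMEOUT_S = 30  # assumed heartbeat window

def jobs_to_enqueue(now):
    """The 9:44 query: jobs that are due, plus STARTED jobs whose worker looks dead."""
    return conn.execute(
        """
        SELECT job_id, s3_url FROM jobs
        WHERE (status = 'NOT_STARTED' AND retry_timestamp < ?)
           OR (status = 'STARTED' AND retry_timestamp + ? + ? < ?)
        """,
        (now, ENQUEUE_LATENCY_S, HEARTBEAT_TIMEOUT_S, now),
    ).fetchall()

print(jobs_to_enqueue(time.time()))   # [] until rows are inserted
```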
@shivujagga 1 year ago
18:25 The whole flow:
1. Client uploads a job -> the payload goes to S3 and the job gets stored in the DB with its schedule.
2. The enqueue service (1 machine) polls the DB every minute for all jobs matching the query mentioned at 9:44.
3. It batches the jobs and sends them to the MQ (sketched below).
4. The MQ delivers them to multiple workers, which send heartbeats to ZooKeeper. (ZooKeeper was used for distributed locking of the jobs being run.)
5. The worker updates the STATUS depending on whether the job completed or not.
I have one question that's not addressed though @Jordan has no life: what if the worker completes the job but fails right before updating the job's STATUS to COMPLETED in the DB?
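Here is a rough sketch of steps 2-3 of that flow: a single enqueue service polling, batching, and handing work to the broker. `fetch_due_jobs`, `mark_enqueued`, and `queue.send_batch` are hypothetical stand-ins; the DB query would be the one sketched above, and the broker client would wrap SQS/RabbitMQ/etc.

```python
import time

POLL_INTERVAL_S = 60   # "polls the DB every minute" from the flow above
BATCH_SIZE = 100       # assumed batch size

def run_enqueue_service(fetch_due_jobs, mark_enqueued, queue):
    """Steps 2-3: poll for due jobs, batch them, and push them to the broker."""
    while True:
        due = fetch_due_jobs(time.time())          # list of (job_id, s3_url) rows
        for i in range(0, len(due), BATCH_SIZE):
            batch = due[i:i + BATCH_SIZE]
            queue.send_batch(batch)                # hand the batch to SQS/RabbitMQ/etc.
            mark_enqueued([job_id for job_id, _ in batch])  # so the next poll skips them
        time.sleep(POLL_INTERVAL_S)
```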
@silentsword9518 1 year ago
This video on Job Scheduler is by far the best I've come across on YouTube. Thank you for creating it! I have a question though: it seems a lot of effort is made here to ensure "exactly-once" semantics, by doing retries and having ZooKeeper as well as the claim service. Would that work be eased a bit if we used Kafka? My understanding is that Kafka has better support for "exactly-once" and also uses ZooKeeper internally.
@jordanhasnolife5163 1 year ago
Yeah definitely, I think though that maybe for the sake of the interview it's worth breaking that down
@rajatbansal112 1 year ago
I think the data schema can be better. We can have a job table which contains jobId, name, cron expression, etc. There would also be another table, a job_execution table, which maintains every execution of a job.
@jordanhasnolife5163 1 year ago
Seems reasonable to me
@ArifSiddiquee 9 months ago
Thanks for the excellent video. I have a couple of questions: How are job IDs created? Are they globally unique? When a recurring job gets another entry in the metadata DB, does it get a different ID? How does a client get the status of recurring jobs? Should there be a different DB to store the statuses of previous runs?
@jordanhasnolife5163 9 months ago
Yeah I think just creating a particular job run with a UUID is fine. Somebody else in the comments here suggested using a "JobExecutions" table which tracks the status of completed jobs as opposed to scheduled ones, I think that would work nicely here.
@cambriandot3665 2 years ago
12:05 Job run: distributed locks, heartbeats, retries, fencing tokens
15:30 More-than-once runs
16:28 Recurring jobs
@shivujagga 1 year ago
So helpful!!!
@xmnemonic 1 year ago
easy to listen to and follow, thanks for making this
@tavvayaswanth 2 years ago
If we maintain some state in the database like SUBMITTED, QUEUED, RUNNING, SUCCESS, and FAILED, we don't need any distributed lock on a job. Your enqueuing service would only poll for jobs whose state is SUBMITTED, RUNNING for too long (say), or FAILED, and all of it can be done at a serializable isolation level in MySQL, since we opted for it in the first place.
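A minimal sketch of the state-based claim this comment describes, run against the SQLite table from the earlier sketch as a stand-in for MySQL at a serializable isolation level: the claim is a single conditional UPDATE, so only one concurrent caller can win.

```python
def try_claim(conn, job_id):
    """Claim a job by flipping its status; only one concurrent caller can win."""
    with conn:   # sqlite3 connection as a context manager: commit or roll back
        cur = conn.execute(
            "UPDATE jobs SET status = 'CLAIMED' "
            "WHERE job_id = ? AND status = 'NOT_STARTED'",
            (job_id,),
        )
    return cur.rowcount == 1   # 1 => we won the claim; 0 => someone else did
```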
@jordanhasnolife5163 2 years ago
While I agree that the majority of the time, this ought to work, ACID properties aren't enough because our SQL database could go down, and unless it has strong consistency (not recommended for performance reasons/network partition tolerance), it may be possible that a claimed job may not seem claimed in the database replicas. Ultimately, we will need some sort of consensus here.
@tavvayaswanth 2 years ago
@jordanhasnolife5163 Agreed on the database-could-go-down part, but this is where many master-slave systems (HBase, for example) use consensus to elect the right master, and hence we get strong consistency. Theoretically both of our solutions have to use consensus anyway; it's just that you have a separate distributed lock service. Got it. By the way, your videos are great. Way to go!
@vitaliizinchenko556 7 months ago
Thank you for the content. One question: what if we want to schedule jobs based on a job's resource consumption requirements and the availability of resources on workers? How would you change your design?
@jordanhasnolife5163 7 months ago
I think that the message broker could itself maintain some internal state (or have consumers go through a proxy) which keeps track of how many jobs each has run and perhaps their hardware capabilities (maybe stored in zookeeper). Essentially a load balancer lol.
@geekwithabs 4 months ago
At this point, based on that rad intro, I have to ask: Have you considered a role in Hollywood? 😉
@jordanhasnolife5163 4 months ago
They send me linkedin DMs sometimes asking me to be an underwear model
@aritraroy3493 2 years ago
I didn't know you were chill like that either 😫
@jordanhasnolife5163 2 years ago
Listen bro if you didn't know I'm pretty damn chill
@aritraroy3493 2 years ago
@@jordanhasnolife5163 Left the freezer open 😨
@sanampreet3045 2 years ago
Great video! Just a small question: when a consumer node dies (stops sending heartbeats), how do we mark the job status as failed? Is ZooKeeper holding info about which consumer node is running which job ID?
@jordanhasnolife5163 2 years ago
Yes because to start claiming a job a consumer must grab the corresponding job lock in zookeeper.
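For concreteness, here is one way the per-job ZooKeeper lock Jordan describes might look with the kazoo client library; the znode path layout (/jobs/locks/<job_id>) is an assumption. Because the lock is backed by an ephemeral znode, it is released automatically if the worker's session (its heartbeats) dies.

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

def run_with_job_lock(job_id: str, worker_id: str, run_job):
    # The lock is backed by an ephemeral znode, so if this worker's session
    # dies (heartbeats stop), ZooKeeper releases the lock automatically.
    lock = zk.Lock(f"/jobs/locks/{job_id}", identifier=worker_id)
    with lock:              # blocks until this worker holds the per-job lock
        run_job(job_id)
```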
@pawandeepb5967 2 years ago
Very nice videos! Awesome work!
@julianosanm 10 months ago
How would we differentiate whether the job timed out or is just taking a long time to execute? How can we prevent it from running twice, or even indefinitely? Would it make more sense to use a log-based queue and let it take care of retries?
@jordanhasnolife5163 10 months ago
To be honest, the challenging part of distributed computing is that you can never truly know. Networks aren't perfect, so nothing is certain; in theory, jobs can complete years later. But as long as you set a reasonable timeout and make your jobs idempotent, it's OK! Using a log-based queue is totally fine too, but it would still have to use timeouts somewhere.
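A small sketch of the "make your jobs idempotent" advice, reusing the assumed schema from earlier: completion is recorded with a conditional UPDATE, so a duplicate delivery (or a very late retry) that finds the job already DONE becomes a no-op. This also relates to the earlier question about a worker that finishes but dies before writing DONE: the job simply runs again, and idempotency makes the repeat harmless, assuming the work itself (`do_work`) is also safe to repeat.

```python
def run_idempotently(conn, job_id, do_work):
    """Skip the job if some earlier run already finished it, and record
    completion with a conditional UPDATE so duplicate deliveries are no-ops."""
    row = conn.execute(
        "SELECT status FROM jobs WHERE job_id = ?", (job_id,)
    ).fetchone()
    if row is None or row[0] == "DONE":
        return                    # already finished (or unknown): nothing to do
    do_work(job_id)               # the work itself must also be safe to repeat
    with conn:
        conn.execute(
            "UPDATE jobs SET status = 'DONE' "
            "WHERE job_id = ? AND status != 'DONE'",
            (job_id,),
        )
```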
@zhonglin5985 6 months ago
How does the Job Claim Service communicate with ZK? Does it poll ZK once in a while, get all the running jobs' statuses, and then update our JobStatusTable?
@jordanhasnolife5163 6 months ago
You can put something called a "watch" in zookeeper which will notify you when it changes
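With kazoo, such a watch might look like this: the claim service registers a ChildrenWatch on the znode directory where workers keep ephemeral nodes, and gets a callback whenever a worker's session expires. The /workers path layout is an assumption.

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()
zk.ensure_path("/workers")   # workers register ephemeral child znodes here (assumed layout)

known_workers = set()

@zk.ChildrenWatch("/workers")
def on_workers_changed(children):
    # kazoo invokes this whenever the child list changes, e.g. a worker's
    # ephemeral znode vanishes because its session (heartbeats) expired.
    global known_workers
    for worker in known_workers - set(children):
        print(f"worker {worker} lost its session; re-check the jobs it had claimed")
    known_workers = set(children)
```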
@niranjhankantharaj6161 1 year ago
Thanks for the great video! If ZooKeeper stops receiving heartbeats, "we can go ahead and update the metadata DB." Curious: who would update the metadata DB? Is it a) ZooKeeper that goes ahead and updates the metadata DB? If so, is it feasible, given ZooKeeper's capabilities, for us to add such custom logic? Or b) does ZooKeeper perform failover, where it creates another worker node and has it restart this job? Also, since ZooKeeper will help the claim service acquire distributed locks using fencing tokens, why do we still need the ACID properties of a SQL DB? Why should we not use NoSQL for the metadata DB?
@jordanhasnolife5163 1 year ago
Fair point! I think a couple of servers that poll zookeeper for outages and restart their jobs would do it
@niranjhankantharaj6161 1 year ago
@jordanhasnolife5163 Any example design or literature that shows this (polling ZooKeeper for outages and implementing custom failover logic)? I believe this is very critical, and if left unaddressed it leaves fault tolerance unsolved.
@niranjhankantharaj6161 1 year ago
Looks like Apache Curator has some "recipes" for when persistent nodes fail that could be used here. Also, Curator can be used as a ZooKeeper client to acquire distributed locks.
@jordanhasnolife5163 1 year ago
@niranjhankantharaj6161 I'll do a better job addressing this in the remake. You have many options though - for example, a cron job on the status table that sets the job status back to "not started" if the job still hasn't completed after a certain amount of time. It's certainly not trivial, but it's not overly complex either
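A sketch of that cron-style sweep, again against the assumed SQLite schema: periodically flip stale CLAIMED/STARTED rows back to NOT_STARTED so the next enqueue-service poll picks them up. The 15-minute staleness window is an assumption, and using retry_timestamp as a proxy for how long the job has been outstanding is a simplification (a real table might track a separate claimed_at column).

```python
import time

STALE_AFTER_S = 15 * 60   # assumed staleness window

def reset_stale_jobs(conn):
    """Hand stale, unfinished jobs back to the scheduler by resetting their status."""
    cutoff = time.time() - STALE_AFTER_S
    with conn:
        cur = conn.execute(
            "UPDATE jobs SET status = 'NOT_STARTED' "
            "WHERE status IN ('CLAIMED', 'STARTED') AND retry_timestamp < ?",
            (cutoff,),
        )
    return cur.rowcount   # how many jobs will be picked up by the next poll
```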
@jordanhasnolife5163 1 year ago
@@niranjhankantharaj6161 Good to know, I'll take a look into curator!
@abhishekmishraji11 2 years ago
Hey Jordan, can you please make a video on collaborative editing tools like CoderPad, Google Docs, and Google Sheets? Actually, I guess CoderPad would be a superset of Google Docs, so you could choose CoderPad over Google Docs while designing. Thanks,
@jordanhasnolife5163 2 years ago
Did that already
@abhishekmishraji11 2 years ago
@@jordanhasnolife5163 Thanks!
@champboy 1 year ago
What if these jobs had different priorities and we had to change the priority of a job at any point? (Mainly concerned about when the priority changes while it's in the queue.) For longer-running jobs, staying in the old priority queue might not be an option.
@jordanhasnolife5163 1 year ago
A bit confused here when you say "the queue". We could index our SQL table by priority, or we could shard multiple tables by priority. Once it's in the queue it's going to be run, more or less - perhaps you could do some weird type of in-memory heap, but that seems a bit extra.
@zy3394 4 months ago
Why does ZooKeeper have no arrow outwards? Shouldn't it be notifying the DB of the tasks' status changes, like updating a task's status to complete/not complete, etc.?
@jordanhasnolife5163 4 months ago
Probably just because I forgot to include it in the diagram.
@allo1579 2 years ago
Hey Jordan! I didn't get why we need a lock here. If we enqueue a task into SQS, only one consumer will pick it up anyway (I think SQS takes care of concurrency here), and for the duration of the execution we can hide the task in the queue. Also, what happens to a task in the queue? Does the worker remove it from the queue or make it invisible for the duration of execution?
@jordanhasnolife5163 2 years ago
Locks are important because tasks may be put in the queue again if the system thinks that they failed to execute (e.g. a timeout is exceeded). Yes, once a task is removed from a queue it won't be removed again; however, like I mentioned, it could be re-enqueued if we mistakenly think that it has failed.
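The video's outline (12:05) also mentions fencing tokens for exactly this re-enqueue case. Here is a toy illustration of the idea: each claim carries a monotonically increasing token, and whatever the job writes to rejects writes with stale tokens, so a "zombie" worker whose job was re-enqueued and re-claimed can't clobber the newer run. In practice the token might come from the ZooKeeper lock (e.g. a sequence number), but that wiring is omitted here.

```python
class FencedStore:
    """Toy stand-in for whatever the job writes to (DB, blob store, ...)."""

    def __init__(self):
        self.highest_token = {}   # job_id -> largest fencing token accepted
        self.results = {}

    def write_result(self, job_id, token, result):
        if token < self.highest_token.get(job_id, -1):
            return False          # stale claim: the job was re-enqueued and re-claimed
        self.highest_token[job_id] = token
        self.results[job_id] = result
        return True


store = FencedStore()
assert store.write_result("job-1", token=1, result="v1")       # original worker
assert store.write_result("job-1", token=2, result="v2")       # re-enqueued claim
assert not store.write_result("job-1", token=1, result="old")  # zombie rejected
```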
@allo1579 2 years ago
@jordanhasnolife5163 Oh, that makes sense! And what about a task in the queue? A task can take very long to execute, so I assume making it invisible in the queue is not really an option? Does the executor remove it from the queue? In that case, if it dies, who re-queues the task?
@prateekaggarwal3305 2 years ago
Hi Jordan, how often will job schedules be polled from the DB? Is it every second, every minute? Do we also need to define an SLA for picking a job from the table?
@jordanhasnolife5163 2 years ago
I think that's based on the SLA like you said; personally I think something like every 10 seconds is probably reasonable.
@akshay-kumar-007 1 year ago
Can you elaborate on how the SLA would work in this scenario for scheduling a job?
@SoniaRana-o2b 1 year ago
Hey @jordan, thanks for the great video on scheduler design. I have a small query: what will happen if we run multiple consumers for the service that polls data from the DB and pushes it to the queue? For scalability we may need to run multiple consumers, and there is a probability that jobs will get duplicated in the queue.
@jordanhasnolife5163 1 year ago
If our database uses transactions we wouldn't have to worry about this: each consumer could just mark a row as "being uploaded to the queue" before it attempts to upload it, and other consumers won't touch the row after that happens.
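One common way to implement what Jordan describes when several enqueuers poll the same table is SELECT ... FOR UPDATE SKIP LOCKED, grabbing a batch and marking it inside the same transaction. Sketched with psycopg and PostgreSQL since SQLite has no row-level locks; reusing "CLAIMED" as the "being uploaded to the queue" marker, and the table/column names, are assumptions.

```python
import psycopg

def claim_batch_for_enqueue(conninfo: str, now: float, limit: int = 100):
    # Grab up to `limit` due jobs, skipping rows another enqueuer has locked,
    # and mark them CLAIMED in the same transaction so they aren't re-sent.
    with psycopg.connect(conninfo) as conn, conn.cursor() as cur:
        cur.execute(
            """
            SELECT job_id, s3_url FROM jobs
            WHERE status = 'NOT_STARTED' AND retry_timestamp < %s
            ORDER BY retry_timestamp
            LIMIT %s
            FOR UPDATE SKIP LOCKED
            """,
            (now, limit),
        )
        batch = cur.fetchall()
        if batch:
            cur.execute(
                "UPDATE jobs SET status = 'CLAIMED' WHERE job_id = ANY(%s)",
                ([job_id for job_id, _ in batch],),
            )
        return batch  # the transaction commits when the with-block exits cleanly
```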
@valty3727 1 year ago
6:10 What is it about the message queue that doesn't allow us to get any information about the job other than "run" or "not run"? Admittedly my knowledge of message queues is kind of shaky, but couldn't we configure a log-based message broker to give us info other than "run" or "not run"? Also, if you want another video idea, a system design of a DoorDash/Grubhub-type app would be pretty cool!
@jordanhasnolife5163 1 year ago
I'm a bit confused about what you mean here - we're just placing the jobs themselves in the message queues. We keep track of the status of each job in a database so that we can request the status from a variety of other components. Sure, a message broker knows which jobs were sent to consumers, but that doesn't mean they were run successfully, and the message broker has no way of knowing this. As for the DoorDash point, I'd just check out my design of Uber; they're basically the same :)
@valty3727 1 year ago
@@jordanhasnolife5163 got it, thanks!
@dind7926 1 year ago
Hey Jordan, great video as always. Have a couple of questions:
- Instead of using an enqueuing service which polls for jobs every minute, could we just add an event stream on the DB and do filtering within the stream, where we only look at the jobs that need to be run?
- Not sure I got the argument for using an in-memory queue. Could you add more context on why we decided to do that instead of a log-based queue?
@jordanhasnolife5163 1 year ago
1) We could, but that's effectively just polling and I think it defeats the purpose of using the stream. 2) We don't care about the order in which jobs are run and want to maximize throughput, so an in-memory queue with many consumers is more useful to us than a log-based queue with a single consumer per partition.
@jayshah5695 2 years ago
Are there any open source or commercial examples that solve this problem? It helps to understand the problem better.
@jordanhasnolife5163 2 years ago
Look up Dropbox ATF
@Shufjskkskf 1 year ago
It looks to me like there's overlapping work between the job claim service and ZooKeeper. Can ZooKeeper also do the job the claim service does?
@jordanhasnolife5163 1 year ago
Assuming that you mean the distributed locking part, then yeah I think so
@desltiny2884 1 year ago
FIRST 30 SECONDS HAHAHA THE BEST
@akarshgajbhiye1289 8 months ago
Jordan is clearly a man of culture,
@nikkinic112 1 year ago
Why MySQL for the job scheduler? Why not NoSQL?
@jordanhasnolife5163 1 year ago
We need transactions in our db table or else we could have write conflicts on a single node and jobs will get lost
@rishindrasharma7278 2 years ago
7:04 nice job ;)
@tunepa4418 2 years ago
Good intro lol
@jordanhasnolife5163 2 years ago
Why thank you, it was certainly out there
@wil2200 5 months ago
Solid side job (id =14)
@user-se9zv8hq9r 2 years ago
song? in b4 darude - sandstorm
@jordanhasnolife5163 2 years ago
Lol it's some no copyright edm bs I gotta go find it haha
@erythsea 2 years ago
that intro tho
@andreystolbovsky 2 years ago
We don't care about order of the jobs and we want an in-memory broker, so let's pick Kafka. Wat. Wat a strange statement in an otherwise interesting video.
@jordanhasnolife5163 2 years ago
Probably a misstatement on my part - meant sqs or rabbit mq
@jordanhasnolife5163 2 years ago
Actually it seems at 11:10 I said to not use kafka
@andreystolbovsky 2 years ago
Listened to that again - you’re right, I’m wrong. I felt it!
@user-se9zv8hq9r 2 years ago
love the farting part. are you going to start selling your farts anytime soon?
@jordanhasnolife5163 2 years ago
Should I make a Patreon or an only fans?
@mnchester 2 years ago
Only Farts
@jordanhasnolife5163 2 years ago
@@mnchester brb building that
@zhonglin5985 6 months ago
At kzbin.info/www/bejne/jYXbeGhubZV4fpo, why long polling instead of regular polling?
@jordanhasnolife5163 6 months ago
You end up putting a lot of load on your system that you may not necessarily need to.
@Lantos1618 2 years ago
jordan make a discord channel baka >,
@jordanhasnolife5163 2 years ago
Definitely something I'm considering, I'm stretched a little too thin to be on there consistently atm so will let you know if I change my mind!
@justicedoesntexist1919 1 year ago
How crass is this man? Such people pass the Googleyness round and get into Google? Do people really like to work with such people of questionable character?
@jordanhasnolife5163 1 year ago
Nope they all hate me! I'm literally incapable of cursing during the interview round!
@justicedoesntexist1919 8 months ago
So basically, the interview process at Google is broken and there are false positives all the time. Got it! @jordanhasnolife5163
@utkarshgupta2909 1 year ago
Jordan, don't you think we should have a queue between the job submission service and the SQL DB?
@jordanhasnolife5163 1 year ago
I don't think it's necessary since a job submission is just adding one row to the database.
@utkarshgupta2909 1 year ago
@jordanhasnolife5163 At what scale should we have a queue there? I mean, at what transactions per second does SQL need a queue?
@jordanhasnolife5163 1 year ago
@@utkarshgupta2909 Can't speak to exact TPS, but I think a good rule of thumb for a queue is when something that is being uploaded needs to be sent to multiple places or there is a lot of processing that eventually has to be done on it