A Bug in GitHub's Rate Limiter

16,935 views

Arpit Bhayani

A day ago

Comments: 38
@nehagour6928 4 months ago
Hey Arpit, I'm having some trouble finding the reference blog in the info card. Can you please help me? One thing that really caught my attention is how GitHub, despite being such a large organization, was using a single global memcache server until they ran into actual issues. It's pretty impressive that they didn't chase the distributed systems trend for its own sake but stuck to the basic, simple setup until they truly needed to change. Also, regarding the Lua script, I recall (though not very clearly) that in another snippet you mentioned commands like EVAL can write to a replica in Redis, and that it actually writes to the master. Please correct me if I'm mistaken.
@AsliEngineering 4 months ago
1. Click the i-card on the video (top right), or scroll to the very end of the description.
2. I made that mistake once and said "server" instead of cluster. They had one global Memcached cluster (not server).
3. Yes. I could not catch this, but if you fire EVAL on a replica, Redis redirects it to the master. GitHub did not reveal their exact flow, but in the blog they did mention that the reads were happening from the replica. My guess is that the operation is broken into two parts: a read, and, if all is good, an update. The read can still go to the replica (through EVAL_RO instead of EVAL), and if the request is within the limit, the count reduction can go to the master through EVAL. I will be honest, this is my guess.

This was a nudge for me to think of it as the following pseudocode:
1. Get and check the limit [from the replica using EVAL_RO]
2. If exhausted: return error
3. Else:
3.1 Update the rate limiting counters in Redis using EVAL
3.2 Create a new response object and set the rate limiting headers
3.3 Perform the operation
3.4 If there is an error in execution, return the error response
3.5 Else, return the response

This is my best guess as per the behaviour I have observed and the blog, as I see two scripts attached at the end of it. Once again, thanks for this question; it made me think harder about my understanding.
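A minimal sketch of that guessed two-phase flow, assuming a Python service using redis-py with separate replica and master connections and Redis 7+ (which supports EVAL_RO). The host names, key names, limits, and Lua bodies below are illustrative, not GitHub's actual scripts:

```python
import redis

# Hypothetical hosts; in practice these would come from service discovery.
replica = redis.Redis(host="redis-replica", port=6379)  # read-only replica
master = redis.Redis(host="redis-master", port=6379)    # primary

# Phase 1: read-only check of the remaining budget, safe to run on a replica.
CHECK_SCRIPT = """
return tonumber(redis.call('GET', KEYS[1]) or ARGV[1])
"""

# Phase 2: initialise the window if needed and consume one unit, on the master.
CONSUME_SCRIPT = """
if redis.call('EXISTS', KEYS[1]) == 0 then
  redis.call('SET', KEYS[1], ARGV[1], 'EX', ARGV[2])
end
return redis.call('DECR', KEYS[1])
"""

LIMIT, WINDOW_SECONDS = 5000, 3600
key = "rate:user:42"  # illustrative key name

# 1. Get and check the limit from the replica using EVAL_RO.
remaining = int(replica.execute_command("EVAL_RO", CHECK_SCRIPT, 1, key, LIMIT))
if remaining <= 0:
    raise RuntimeError("429 Too Many Requests")  # 2. exhausted: return error

# 3.1 Update the rate limiting counter on the master using EVAL.
remaining = master.eval(CONSUME_SCRIPT, 1, key, LIMIT, WINDOW_SECONDS)

# 3.2 Set the rate limiting headers, then perform the operation.
headers = {"x-ratelimit-remaining": str(max(remaining, 0))}
```

The split keeps the cheap check on the replica and routes only the counter mutation to the master, which would line up with the read-from-replica behaviour mentioned in the blog.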
@nehagour6928 4 months ago
@@AsliEngineering Thank you so much, Arpit. I really appreciate your explanation of EVAL; it is clear and easy to understand. Thanks again!
@rjarora 5 months ago
It was a pretty bad design, tbh, right from the start. Why would they calculate a static field at runtime? It's an anti-pattern and counterintuitive.
@imhiteshgarg 5 months ago
Out of all the issues that GitHub has faced, this was the silliest given their engineering standards!
@obaid5761 5 months ago
They should hire you bro. They'd never have another outage in the future 🎉
@SanjayB-vy4gx 5 months ago
@@obaid5761 🤣
@Polly10189 5 months ago
Bugs are always silly, just like you!
@LogsofaCodingNomad-ns9us 4 months ago
So basically the real impact is when a user of the GitHub API has a use case where they need to make several calls: the reset time keeps drifting, resulting in their valid calls getting rate limited and hence causing frustration.
@rahul10615 5 months ago
Thank you for sharing, Arpit. Your content is awesome.
@PriestCoder 4 months ago
Sir, I want to know how you gained such in-depth knowledge of these topics, and also what you learned in your early days while exploring engineering. I only find ultra-advanced topics here, ones I had never heard of before landing on your channel.
@himeshrupareliya7477 4 months ago
Non-tech question: what app/device do you use to make these notes and do the markups?
@adityakirankorlepara4500 5 months ago
They could have also stored this static value in Redis, right? Why store it in a DB?
@AsliEngineering 4 months ago
That is what they are doing; I mentioned it in the video as well. The reset-at timestamp is stored as another key in Redis for every rate limiting key.
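A rough sketch of how that could look, assuming a Python client with redis-py; the key names are illustrative and this is one reading of the fix, not GitHub's exact script:

```python
import time
import redis

r = redis.Redis()
key, reset_key = "rate:user:42", "rate:user:42:reset_at"  # illustrative names
LIMIT, WINDOW_SECONDS = 5000, 3600

now = int(time.time())
# NX means only the first request of a window writes these keys, so the
# absolute reset timestamp is computed exactly once and then reused.
r.set(key, LIMIT, nx=True, ex=WINDOW_SECONDS)
r.set(reset_key, now + WINDOW_SECONDS, nx=True, ex=WINDOW_SECONDS)

remaining = r.decr(key)
reset_at = int(r.get(reset_key))  # same value for the whole window

# The header is read back from Redis instead of being recomputed as now + TTL,
# so it no longer wobbles with network latency.
print(f"x-ratelimit-reset: {reset_at}")
```

In practice the two writes would live inside one Lua script so the counter and its reset timestamp are created atomically; they are split out here only for readability.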
@adarshshete7987 4 months ago
@aditya Redis is nothing but a DB in this case; I think you confused DB with a traditional relational DB.
@dhruvsolanki4473 4 months ago
Time wobbling, seems like Interstellar. 😅
@anycat16 5 months ago
great explanation
@anycat16 5 months ago
great quality video
@ArnabBasu-r5m 3 months ago
Hi, maybe a stupid question: what if the network transfer time was subtracted at the API server before sending back the reset time?

reset_time = api_server.current_time() - network_transfer_time (between the API and the Redis server) + TTL

This way the extra storage footprint of the reset time can be avoided. Your thoughts?
@shaikmastan1780 9 days ago
It would work only if all the timestamps and time durations were of the same precision. However, we cannot control the precision of the network latency, and due to this difference in precision you can always come up with an edge case.
@ArnabBasu-r5m 9 days ago
Agreed!
@rupeshagarwal6487 15 hours ago
There will be an edge case here as well. Consider the same example: if the request reaches the API server at 1001.9 and takes 100 ms to reach the Redis server, the timestamp when it arrives at Redis will be 1002, and the TTL calculated at that point will be 3 seconds. If it then takes another 100 ms to return, the time at our server will be 1002.1. Subtracting the network latency gives us 1002.1 - 0.2 = 1001.9, and the calculated header will be 1001.9 + 3 = 1004.9, which still differs from the actual reset at 1005.
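Running the same numbers in integer milliseconds (to keep the floats honest) reproduces the drift; the 100 ms hops and the whole-second TTL are the assumptions from the comment above:

```python
# All times in milliseconds; epoch-like values shortened for readability.
true_reset_ms = 1_005_000                     # actual expiry tracked in Redis
arrive_api_ms = 1_001_900                     # request hits the API server
arrive_redis_ms = arrive_api_ms + 100         # 100 ms to reach Redis -> 1,002,000
ttl_s = (true_reset_ms - arrive_redis_ms) // 1000   # whole-second TTL -> 3
back_at_api_ms = arrive_redis_ms + 100        # 100 ms back -> 1,002,100
adjusted_ms = back_at_api_ms - 200            # subtract measured round trip -> 1,001,900

header = (adjusted_ms + ttl_s * 1000) / 1000
print(header)  # 1004.9, still not the true reset at 1005.0
```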
@mohitvachhani1133 5 months ago
Chances of this getting missed in integration testing...
@prasenjitgiri919 4 months ago
I did not get the 1005 and 1006 part. Time has passed, so what is the challenge?
@AsliEngineering 4 months ago
The value should have remained the same throughout but it did not.
@prasenjitgiri919 4 months ago
@@AsliEngineering No, I get that, and I appreciate your time and effort, but something is off in my understanding. Is it because time is decaying (I mean the Redis TTL), so no matter how high the latency is, the overall value should be lower than on the previous trip? I hope I understood that, but somehow it is still foggy. I've a lot of reading ahead of me. Thanks again, sir!
@vuleanh7647 4 months ago
@@prasenjitgiri919 There will be an issue if you are downstream and need to use the `reset` field, but for a regular user it's not a problem.
@nikhilmugganawar 4 months ago
Any reference link for readout?
@AsliEngineering 4 months ago
Check the i-card on the video.
@nikhilmugganawar 4 months ago
Unfortunately I am not able to find the i-card, or perhaps I don't know where to find it. Could you please share it here or add it to the description?
@AsliEngineering 4 months ago
@@nikhilmugganawar The i-card pops up at the end; not sure why it is not loading. You can find it below the description as well: open the description and scroll to the bottom. The link is attached there.
@Polly10189 5 months ago
Thanks @Arpit for the video. However, what actually was the impact of this bug? What problem did it actually create?
@AsliEngineering 4 months ago
Clients (API users of GitHub) were getting inconsistent values, and their downstream systems were getting affected. Not a large set of users, but people who handle throttling smartly, by honouring the reset value instead of adding a random sleep, were affected by this wobble.
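For context, a sketch of the kind of client described here, one that honours GitHub's x-ratelimit-* response headers instead of sleeping for a random interval; the repository URL is just an example:

```python
import time
import requests

resp = requests.get(
    "https://api.github.com/repos/octocat/Hello-World",
    headers={"Accept": "application/vnd.github+json"},
)

remaining = int(resp.headers.get("x-ratelimit-remaining", "1"))
reset_at = int(resp.headers.get("x-ratelimit-reset", "0"))  # Unix epoch seconds

if remaining == 0:
    # Sleep exactly until the advertised reset instead of a random backoff.
    # A wobbling reset value makes this sleep too short or too long.
    time.sleep(max(reset_at - time.time(), 0))
```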
@gradbharath 5 months ago
pls speed up explanations
@J0Y22 5 months ago
You can speed up the video playback from your end <3
@AsliEngineering 4 months ago
After a long time I spoke slowly to make the explanations natural. People commented the opposite on other videos 😅 But yes, you can always increase the playback speed.
@anycat16 5 months ago
damnnn