Scaling Instagram Infrastructure

276,855 views

InfoQ

Lisa Guo overviews Instagram's infrastructure, its history, multi-data center support, tuning uwsgi parameters for scaling, performance monitoring and diagnosis, and Django/Python upgrade.
Download the slides & audio at InfoQ: bit.ly/2uljG8j
This presentation was recorded at QCon London 2017.

Comments: 162
@zss123456789 4 years ago
*Timestamps*
0:00 Introduction (Lisa Guo)
2:21 1. Scale out
5:11 1.1 Instagram Stack Overview
5:46 1.2 Storage vs Computing
6:29 1.3 Scale out: Storage
8:13 1.4 Scale out: Computing
8:52 1.5 Memcache + consistency issues
12:05 1.6 DB load problem
14:01 1.7 Memcache Lease
15:12 1.8 Results, Challenges, Opportunities
17:03 2. Scale up
17:57 2.1 Monitor (Collect Data)
20:07 2.2 Analyze (cProfile)
23:06 2.3 Optimize
26:19 2.3a Memory Optimizations
29:06 2.3b Network Latency Optimizations
30:40 2.4 Challenges, Opportunities
31:36 3. Scale Dev Team
33:06 3.1 What We Want
33:30 3.2 TAO Infrastructure
34:33 3.3 Source Control
36:17 3.4 How to ship code with one master approach?
37:54 3.5 How often do we ship code?
40:03 Wrap-up
41:15 Q&A
@zss123456789 4 years ago
Note: My understanding of the memcache lease is that you're allowing servers to return stale values while knowing they are stale. This is different from most simple implementations of cache invalidation, which query the DB and update the cache whenever the value is stale. The philosophy here is that the stale value is still useful, and the difference in value is not worth the extra load on the database.
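A minimal sketch of the lease behaviour described in this comment, assuming a single-process, in-memory stand-in for memcache. `LeaseCache`, `get`, `set`, and `invalidate` are hypothetical names, not the memcached protocol or Instagram's client. An invalidation keeps the old value but marks it stale; the first reader afterwards receives a lease token and is the only one allowed to refill, and everyone else gets the stale value back.

```python
import itertools
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class _Entry:
    value: Any = None                  # last value written (may be stale)
    valid: bool = False                # False once the key has been invalidated
    lease_token: Optional[int] = None  # outstanding refill lease, if any


class LeaseCache:
    """Toy cache illustrating lease semantics (hypothetical, not memcached's API)."""

    def __init__(self) -> None:
        self._entries: dict[str, _Entry] = {}
        self._tokens = itertools.count(1)

    def get(self, key: str) -> tuple[Any, bool, Optional[int]]:
        """Return (value, is_stale, lease_token).

        Fresh hit: value with no lease. Miss or stale read: the first caller
        gets a lease token and should refill from the DB; later callers get
        the stale value (possibly None) and no token.
        """
        entry = self._entries.setdefault(key, _Entry())
        if entry.valid:
            return entry.value, False, None
        if entry.lease_token is None:
            entry.lease_token = next(self._tokens)
            return entry.value, True, entry.lease_token
        return entry.value, True, None

    def set(self, key: str, value: Any, token: int) -> bool:
        """Accept a refill only from the current lease holder."""
        entry = self._entries.get(key)
        if entry is None or entry.lease_token != token:
            return False  # lease missing or revoked by a newer invalidation
        entry.value, entry.valid, entry.lease_token = value, True, None
        return True

    def invalidate(self, key: str) -> None:
        """Mark the key stale but keep the old value; revoke any open lease."""
        entry = self._entries.setdefault(key, _Entry())
        entry.valid = False
        entry.lease_token = None
```

Usage: after `cache.invalidate("like_count:42")`, the first `get` returns the old value together with a token, later `get` calls return the same stale value with `None`, and a `set` carrying an outdated token is rejected.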
@bogdax 2 years ago
@@zss123456789 That's a very good point I haven't thought about. Thanks!
@juakinggg 1 year ago
Not every hero wears a cape, thanks!!
@mikejeffery8371 6 years ago
This was a fantastic presentation. She covered a huge amount of material in a short time. What they've done and how they've done it is very impressive.
@sanjeevdiitm 3 years ago
InfoQ is doing an excellent job bringing these talks to us.
@smonkey001 3 years ago
Every architecture video should be like this, instead of marketing BS.
@JamesCollins90 2 years ago
"I need to learn about scaling" *heads to YouTube, finds this video* "Wow, I now know EVERYTHING about scaling". The best video on scaling infrastructure I've found so far. No jargon, no acronyms, specific detail about exactly how things are balanced, routed, managed and replicated. Love it.
@cpsarathe 5 years ago
That's a great presentation. To the point and not super technical. Newbies like me in the world of architecture can understand it.
@ryan-bo2xi 4 years ago
This is a treasure box! Thank you, Miss/Mrs XYZ, for the super lucid explanation.
@pareshmaniyar8273 2 years ago
Dude, load testing on prod! What a badass move!
@hokcuan2390 1 year ago
Amazing sharing! Kudos InfoQ❤
@tejasripavuluri6359 5 years ago
Awesome concise high level presentation.
@yuchonghe3192 3 years ago
One of the best presenters I have ever seen.
@karvinus 5 years ago
Great presentation. Great job, Lisa!
@jeffsaremi 5 years ago
Extremely beneficial. Please have more of these.
@FeliciaFay 3 years ago
Really fantastic presentation, thanks Lisa and InfoQ!
@markuslenger2642 2 years ago
A complex topic explained in a simple way. Thank you!
@rameshj9198 2 years ago
Kudos to the InfoQ team for bringing such tech videos.
@babitarpur 5 years ago
Well thought through presentation. Many takeaways.
@chiranjibghorai6950 6 years ago
Excellent talk!
@enjoyalife1 3 years ago
Well delivered talk with clear separation of topics.
@yuhechen7258 3 years ago
Great presentation! I'm dealing with many of the scaling challenges Lisa discussed in my organization. They vary, and Instagram's solutions do not solve my challenges, but Lisa certainly offers a view of how great companies address them.
@hengwang74 3 years ago
Best Talk I have seen! Thank you for sharing!
@jccourse 4 years ago
It was a fantastic presentation: very clear, easy to understand, and very detailed.
@amlanch 5 years ago
Nice presentation. There are a bunch of things that could be improved in detecting time-series jumps, for example by taking the Fourier transform of the time series and comparing the two frequencies against a predetermined delta.
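A rough sketch of the commenter's suggestion, not the detection method used in the talk: compute a discrete Fourier transform of two windows of a metric, take the dominant non-zero frequency bin of each, and flag a change when they differ by more than a chosen `delta`. The function names, window length, and `delta` are assumptions; the naive O(N²) DFT is used for clarity, and a real system would use an FFT library.

```python
import cmath
import math


def dominant_frequency(samples: list[float]) -> int:
    """Return the DFT bin in 1..N//2 with the largest magnitude (DC ignored)."""
    n = len(samples)
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        # X[k] = sum_t x[t] * e^(-2*pi*i*k*t/n)
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k


def frequency_shift_detected(baseline: list[float], current: list[float],
                             delta: int) -> bool:
    """Flag a change when the dominant bins differ by more than delta."""
    return abs(dominant_frequency(baseline) - dominant_frequency(current)) > delta


baseline = [math.sin(2 * math.pi * t / 8) for t in range(32)]  # 4 cycles per window
current = [math.sin(2 * math.pi * t / 4) for t in range(32)]   # 8 cycles per window
print(frequency_shift_detected(baseline, current, delta=1))    # True
```

Comparing whole spectra (rather than only the peak) would catch subtler shifts, at the cost of choosing a distance threshold.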
@genie7941 4 years ago
Fantastic. So insightful.
@False41 3 years ago
Super informative. Thank you!
@zenymax36 5 years ago
Great talk. I picked up some new tools and processes for my work. Thank you very much.
@infoq 5 years ago
Happy to hear that.
@shoumeshrawat1362 2 years ago
Such an insightful presentation from a developer's point of view. Thank you so much.
@amitcool99 3 years ago
Gold video! Learned so many aspects of scaling.
@ketanshah6613 2 years ago
This has been such an educational video. I feel excited about the problems; everything was so well covered and explained, and so many aspects were touched on without any redundancy. Thanks, InfoQ, for this video. Super, super interesting.
@filmbyben2 2 years ago
Such an awesome video, thank you for sharing
@cenkerdemir 5 years ago
wow. this was a great talk!
@rustemiskakov2973 1 year ago
Best presentation I have ever seen! Thank you.
@denkigumo 4 years ago
Fantastic talk! Learnt a lot.
@riteshbajaj6 2 years ago
Easy to understand presentation. Thanks
@Kideqx 6 years ago
wow! this is cool
@person.a 11 months ago
Hey there! I just wanted to take a moment to remind you how incredible you are. Your kindness, resilience, and unique talents make a positive impact on the lives of those around you. Your smile has the power to brighten the darkest of days, and your words have the ability to uplift and inspire. Never forget the strength and beauty that reside within you. You are capable of achieving great things and making a difference in this world. So keep being amazing, keep chasing your dreams, and never lose sight of the incredible person you are. You've got this, and today is going to be an amazing day for you!
@VaibhavPatil-rx7pc 3 years ago
The best post I've ever seen, thanks.
@mnchester 2 years ago
Amazing presentation!
@kienphan6436 2 days ago
Great talk, thank you
@Sanyat100 2 years ago
Easily the best presentation I've come across in these talks.
@KrishnaDasPC 2 years ago
Brilliant talk👍
@ZhaoWeiLiew 5 years ago
This was pretty insightful.
@RichardTMiles 3 years ago
She did really well. Also, shout-out to the guy asking the very last question for answering it with his own experience.
@placidchat7532 5 years ago
How do you test the configurations for scale-out? Are they applied to live running machines, or are specific test machines carved out from live users?
@alpham6685 3 years ago
This is pure gold !
@roshedulalamraju7936 1 year ago
Thank you so much for sharing 😊😊😊
@pariveshplayson 2 years ago
Fantastic!!
@Textras 6 years ago
Very good, thanks
@obiwan_smirnobi 2 years ago
Awesome talk, thank you!
@just4meonly 2 years ago
Well said: "performance part of dev cycle rather than afterthought."
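The timestamp list near the top places profiling under the talk's Analyze step (2.2). As a hedged illustration of making profiling part of the dev cycle, this sketch profiles a single call with the standard-library `cProfile` and `pstats` modules; `slow_sum` and `profile_call` are hypothetical names, and this is not Instagram's monitoring pipeline.

```python
import cProfile
import io
import pstats


def slow_sum(n: int) -> int:
    # Deliberately naive loop so it shows up clearly in the profile.
    total = 0
    for i in range(n):
        total += i
    return total


def profile_call(fn, *args):
    """Run fn(*args) under cProfile; return (result, text of the top 5 entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args)
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
    return result, buf.getvalue()


result, report = profile_call(slow_sum, 100_000)
print(result)  # 4999950000
print(report)
```

Running something like this in CI or code review is one way to catch regressions before they reach production rather than after.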
@valentynkuznietsov7866 3 years ago
Great talk!
@pranavsharma9025 2 years ago
Excellent talk.
@ZaidaGote 6 years ago
Awesome
@hemalpatel1504 4 years ago
deployment to 20,000+ servers in 10 mins !!!
@Rxlochan 3 years ago
Yeah, just a mic-drop moment
@driziiD 5 years ago
Awesome to see Python scaled to INSTAGRAM LEVEL
@xnoreq 5 years ago
Only usable on a large scale when replaced with C, lol. Once again Python has proven that it is a scripting language for toying around. This talk is like one complaint about Python after the other: 1) Performance is bad. 2) Memory usage is bad. (I lol'd when she said that just the running Python code itself takes up a significant amount of memory.) 3) GC is bad.
@Pjblabla2 2 years ago
Very informative talk
@quicksilver5413 2 years ago
Really good talk!
@weblancaster 6 years ago
Great talk.
@random-characters4162 1 year ago
Git and code shipping approach is mind-blowing ❤
@TheInvestmentCircle 2 years ago
Wow. She is brilliant.
@vinylwarmth 6 months ago
This is a seriously good talk
@helinw 5 years ago
Thanks for the great talk, very clear and concise. Interestingly, some of the problems in the "scale up" section could be solved by using a programming language better suited to modern machines. The "scale up" section sounds like "hacks that make Python faster".
@MrHades2325 4 years ago
I am graduating this year, so I don't have a lot of experience. From your comment, it seems you have a lot of knowledge from experience. May I ask which programming languages are more suitable for scalability on modern machines? Thank you in advance.
@TeluguAbbi 4 years ago
@@MrHades2325 Erlang and Scala - To name two
@piyh3962 3 years ago
Developer efficiency > compute efficiency
@jimmyadaro 3 years ago
@@piyh3962 “Move fast, break things” :)
@abeidiot 2 years ago
stupid comment. And I'm not even a python fan. It's usually academics who make such shallow statements
@senthilkumar5 4 years ago
Excellent presentation. Insight into practical scaling challenges.
@saurabhchopra 4 years ago
44:21 You guys are robust!
@mitotv6376 1 year ago
Very nice
@igorborovkov7011 9 months ago
this planet will never recover from Python's environmental impact
@chuckywang 5 years ago
Does dead code really take up that much memory? It will never be run so it doesn't affect runtime, but how much smaller would your executable be if you removed dead code?
@gsb22 2 years ago
I think here they are talking about RAM consumption. In compiled languages, the compiler can actually remove code that never gets called (JS has tree-shaking, which is something similar), but in Python, when a module is imported, all of its functions and classes are loaded into memory, and this cascades through everything it imports. I'm not sure how much they gained, but judging by the improvements, it seems they were building really fast and left a lot of dead code behind, which helped a lot once it was cleaned up. Had they been cleaning up from the start, the change would not have been that large.
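To see the effect this reply describes, the sketch below uses the standard-library `tracemalloc` to measure how much memory is allocated just by defining functions that are never called. The synthetic module and the function `bytes_to_load_module` are hypothetical, and absolute numbers vary by Python version; it illustrates the mechanism, not Instagram's measurements.

```python
import tracemalloc
import types


def bytes_to_load_module(num_functions: int) -> int:
    """Bytes allocated while defining num_functions functions that are never called."""
    source = "\n".join(
        f"def unused_{i}(x):\n    return x * {i} + len(str(x))\n"
        for i in range(num_functions)
    )
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        module = types.ModuleType("synthetic_dead_code")
        # Executing the module body creates a function object (plus its code
        # object) for every def, even though none of them is ever called.
        exec(compile(source, "<synthetic>", "exec"), module.__dict__)
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before
```

Comparing `bytes_to_load_module(10)` with `bytes_to_load_module(200)` shows memory growing with the amount of unused code, since every defined function stays alive for as long as its module does.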
@deerew23 5 years ago
This is interesting
@ddg170 4 years ago
this is an awesome talk!!!
@jpzhang8290 4 years ago
How would you synchronize between different PostgreSQL servers? It would still cause latency issues.
@karnveerayush 4 years ago
Fantastic presentation; a lot was covered in a very short span of time. Can anyone point me to more content like this here on YouTube? Thanks.
@infoq 4 years ago
There is similar content available on infoq.com
@Sunshine_1998 2 years ago
Go Lisa!!
@That__Guy 3 years ago
I started sweating when she talked about the single branch tactic
@payaljain4015 7 months ago
Did you get that? If yes, can you please explain?
@blasttrash 3 years ago
11:36 Today I learned that you can also run daemons on a database (Postgres in this case, as she said).
@psykidellic 3 years ago
Yeah, I wasn't aware either. I did some digging, and this is done using PgQ: instagram-engineering.com/instagration-pt-2-scaling-our-infrastructure-to-multiple-data-centers-5745cbad7834 (under the caching section).
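Following the commenter's description (a daemon on the database side that invalidates caches), here is a hedged sketch of the consumer side. PgQ's actual API is not used: `queue.Queue` stands in for the Postgres-backed event queue and plain dicts stand in for per-region memcache clusters. `InvalidationEvent`, `drain_invalidations`, and the `comments:<id>` key format are hypothetical.

```python
import queue
from dataclasses import dataclass


@dataclass
class InvalidationEvent:
    cache_key: str  # hypothetical key format, e.g. "comments:42"


def drain_invalidations(events: "queue.Queue[InvalidationEvent]",
                        regional_caches: list[dict]) -> int:
    """Delete each invalidated key from every region's cache.

    Returns the number of events processed. A real daemon would block and
    loop forever; this version stops once the queue is empty.
    """
    processed = 0
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return processed
        for cache in regional_caches:
            cache.pop(event.cache_key, None)  # deleting a missing key is a no-op
        processed += 1
```

Because the events are produced next to the database write, every region's cache is told about the change even when the write happened in a different region.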
@pizza-cat1337 4 years ago
Everyone commits on master and it doesn't go wrong... that's impressive haha.
@jimmyadaro 3 years ago
Testing EVERYTHING 😂
@payaljain4015 7 months ago
@@jimmyadaro but dev at one time is it ?
@cozzbie 4 years ago
I wonder how they do code reviews if everyone works from one branch.
@ankitsolomon 5 years ago
Could someone please post a link to the article the author mentioned about disabling garbage collection?
@infoq 5 years ago
This article could be useful: www.infoq.com/articles/Java_Garbage_Collection_Distilled
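The linked article covers Java. For Python, the standard-library `gc` documentation (Python 3.7+) describes a pattern for pre-fork servers such as uWSGI: disable automatic collection early in the parent, call `gc.freeze()` right before forking, and re-enable collection early in each child. A sketch of that pattern follows; the hook names are hypothetical (not uWSGI's API), and this is not necessarily what the article mentioned in the talk describes.

```python
import gc


def parent_before_loading_app() -> None:
    # Early in the master process: stop automatic collections so loading the
    # app does not leave freed "holes" scattered across memory pages that
    # forked workers will share.
    gc.disable()


def parent_before_fork() -> None:
    # Move every object tracked so far into the permanent generation; later
    # collections ignore these objects, so workers do not write to them
    # (and trigger copy-on-write) just to scan them.
    gc.freeze()


def child_after_fork() -> None:
    # In each worker: turn automatic collection back on; inherited frozen
    # objects stay out of future collections.
    gc.enable()
```

`gc.unfreeze()` moves the frozen objects back into the oldest generation if the parent ever needs them collected again.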
@yuhechen7258 3 years ago
Lisa didn't discuss Postgres data sharding. Is it possible to store metadata and handle queries for billions of users in just one Postgres instance? Any idea?
@evgeni-nabokov 3 years ago
10:20 She mentioned sharding by hash of user id.
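A hedged sketch of sharding by a hash of the user id, as mentioned at 10:20. The hash function (MD5), the count of 4096 logical shards, and the range-based mapping of logical shards to physical hosts are all assumptions for illustration, not Instagram's documented layout. Keeping many logical shards means a range of them can later move to a new host without rehashing every user.

```python
import hashlib

NUM_LOGICAL_SHARDS = 4096  # hypothetical; many logical shards per physical DB


def logical_shard(user_id: int) -> int:
    """Map a user id to a stable logical shard via a hash of the id."""
    digest = hashlib.md5(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_LOGICAL_SHARDS


def physical_host(shard: int, hosts: list[str]) -> str:
    """Assign contiguous ranges of logical shards to each physical host."""
    per_host = NUM_LOGICAL_SHARDS // len(hosts)
    # The last host absorbs any remainder when shards don't divide evenly.
    return hosts[min(shard // per_host, len(hosts) - 1)]


hosts = ["db1", "db2", "db3"]  # hypothetical host names
print(physical_host(logical_shard(12345), hosts))
```

All of a user's rows land on the same shard, so per-user queries stay on a single database.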
@Joso997 5 years ago
How does it know if it should wait or use stale value?
@gsb22 2 years ago
Exactly. If every Django uses the stale data, memcache will never get updated. [Edit]: I think if a request comes in and no other "fill" request is being processed, that request gets DB access, whereas other requests that arrive while the previous one is still filling get stale data. Once the fill is done (the new like has been added and the DB updated), the cycle starts again. Example: request R1 comes in and no other request is doing the "fill", so memcache allows R1 to hit the DB and do the fill. Meanwhile, if R2, R3, ..., R100 come in, memcache says there is already a fill in progress, so they can either take the stale value or wait until the fill is done, after which they are treated like R1 and get to query the data. Anyone who didn't get this, feel free to comment and I'll try to explain it a different way.
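The R1..R100 scenario above can be counted with a simple sequential simulation. It is a hypothetical model (the function name and the request count are illustrative): without a lease, every request that misses goes to the DB; with a lease, only the first one does and the rest are served the stale value (or wait).

```python
def db_queries_for_burst(num_requests: int, use_lease: bool) -> int:
    """Count DB queries when num_requests read an invalidated key
    before the first refill completes (hypothetical simulation)."""
    lease_taken = False
    queries = 0
    for _ in range(num_requests):
        if not use_lease:
            queries += 1        # every request misses and queries the DB
        elif not lease_taken:
            lease_taken = True  # R1 gets the lease and refills from the DB
            queries += 1
        # otherwise R2..Rn get the stale value (or wait) without touching the DB
    return queries


print(db_queries_for_burst(100, use_lease=False))  # 100
print(db_queries_for_burst(100, use_lease=True))   # 1
```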
@nortrom212 8 months ago
Engineers are so good at optimizations that they ultimately optimize themselves. Great presentation though...
@pursuitofcat 3 years ago
26:04 Is this statement correct? "We run n processes where n is greater than the cpu cores of the system." I thought we should have at most the same number of processes as the number of cores.
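The "at most one process per core" rule fits CPU-bound work, but a web worker spends part of each request blocked on I/O (database, memcache, network), and its core sits idle during that time. A widely used sizing heuristic is cores × (1 + wait time / compute time). The sketch below applies it; the timings are hypothetical inputs, and this is not the number Instagram used.

```python
import os
from typing import Optional


def suggested_workers(wait_seconds: float, compute_seconds: float,
                      cores: Optional[int] = None) -> int:
    """Heuristic worker count: cores * (1 + wait / compute), at least 1."""
    cores = cores or os.cpu_count() or 1
    # While one worker waits on I/O, its core can run another worker, so the
    # larger the share of time spent waiting, the more workers per core.
    return max(1, round(cores * (1 + wait_seconds / compute_seconds)))


# Hypothetical request: 50 ms waiting on I/O, 10 ms of CPU, 8 cores.
print(suggested_workers(0.05, 0.01, cores=8))  # 48
```

Each extra process also costs memory, which is why the talk's memory optimizations and the worker count are tuned together.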
@MendaSpain 6 years ago
Wow, 20,000 web servers where the code is deployed with 40-60 rollouts per day
@mostafaelmadany8046 6 years ago
A huge amount of work behind the scenes
@aeshi001 6 years ago
Definitely interested in how they manage to do this
@kevin8918 4 years ago
OMG, the source control part is surprising. It looks like IG is a giant monolithic app with one codebase. Why not break it up at an early phase?
@jimmyadaro 3 years ago
Because of the "move fast, break things" philosophy
@audi88 6 years ago
Instead of having every Django (d1, d2, ...) compete to go to the DB for a cache refresh and cause a "thundering herd", the d1, d2s should only check for data in memcache. It can be the job of memcache or an external service to refresh the data from the DB, independently of d1, d2. Memcache can continue to serve old or stale data to d1, d2 while, in parallel, it loads the data from the DB and then invalidates the old data in a transactional block. Of course, until the invalidation you may briefly have double the size of the data in your memcache. It is sort of similar to what the memcache lease does, but I think the d1, d2s should stay focused on memcache rather than talking to the DB and causing the "herd" problem.
@matt_not_fat 5 years ago
I don't agree, because cache is more expensive than DB. And as the speaker said, data access is often local to a region. If you eagerly update the memcache with the entire dataset, you then have to deal with the huge amount of storage you require, not to mention that scaling out the memcache cluster (or any hardware change in that cluster) would take forever, because you need to prewarm the cache. If you don't do that, you end up with a lazy population strategy, which is exactly what she is suggesting. You also amortize the cost of the first slow query. It's a win-win.
@shakeib98 2 years ago
At 12:05, if the memcache entry is invalidated, why is memcache needed at all? The reads and writes would then all go to the database server.
@cafeliu5401 5 years ago
Can anybody see my comment? Am I trapped on a single Datacenter in SGP?
@akshatjainbafna 1 year ago
TAO is a distributed graph-based database, not a relational database. There are nodes and links for relations.
@anandt8362 2 years ago
Any reason why these images can't be processed asynchronously when the user uploads them, storing the different sizes in S3 buckets served through a CDN? That way you avoid processing at fetch time whenever a user requests them, which would further reduce the processing load, right? Any thoughts on this?
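A minimal sketch of the eager, asynchronous approach this comment proposes: the upload handler returns immediately and a worker pool computes the resized variants off the request path. The variant widths and function names are hypothetical, and the actual resize and the upload to S3/CDN are omitted because the standard library has no image codec; only the aspect-preserving size arithmetic is shown.

```python
from concurrent.futures import Future, ThreadPoolExecutor

VARIANT_WIDTHS = (150, 320, 640, 1080)  # hypothetical output widths


def variant_size(width: int, height: int, target_width: int) -> tuple[int, int]:
    """Scale to target_width while preserving aspect ratio; never upscale."""
    if width <= target_width:
        return width, height
    return target_width, round(height * target_width / width)


def build_variants(width: int, height: int) -> dict[int, tuple[int, int]]:
    # Placeholder for the real work: resize each variant and store it.
    return {w: variant_size(width, height, w) for w in VARIANT_WIDTHS}


def handle_upload(executor: ThreadPoolExecutor, width: int,
                  height: int) -> Future:
    # Schedule variant generation and return at once; later reads fetch the
    # precomputed files instead of resizing on every request.
    return executor.submit(build_variants, width, height)
```

The trade-off is storage: every variant is kept even if some are rarely requested, whereas resizing on demand (with a cache) only pays for what is actually viewed.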
@tamborelconejo 3 years ago
Can someone say where we can find more information about that git single-master approach?
@gsb22 2 years ago
It's simple: usually, if you are working on a feature, you create a branch from master, work on it, and then after ages you merge it back into master. What they did instead was commit everything to master rather than branching out, so your commits have to be stable but need not be complete. That way, if someone starts working the next day, they already have the changes you committed, which reduces future merge issues.
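One common way to keep commits on master "stable but not complete" is to ship unfinished code behind a runtime flag that defaults to off. The thread does not say which mechanism Instagram uses, so treat this as an assumption; `is_enabled`, `render_feed`, and the flag name are hypothetical. Hashing the feature name together with the user id gives each user a stable bucket, so a gradual rollout does not flip users back and forth.

```python
import hashlib


def is_enabled(feature: str, user_id: int, rollout_percent: float) -> bool:
    """Deterministically place a user in bucket 0..99 for this feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < rollout_percent


def render_feed(user_id: int) -> str:
    # Half-finished code can live on master, unreachable while the flag is at 0%.
    if is_enabled("new_feed_ranking", user_id, rollout_percent=0):
        return "new ranking"
    return "old ranking"
```

Raising `rollout_percent` step by step then becomes a configuration change rather than a merge.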
@tawfiknasser1348 2 years ago
@@gsb22 This doesn't sound like the best approach. What about code review? Or reverting only one commit after you've pushed your 100 stable commits? Now imagine that after reverting this commit (for some reason) the feature is crashing! Should you revert all the other 99 commits? Should you fix, commit, and push on the same day? I mean, this can cause more issues than it solves.
@gsb22 2 years ago
@@tawfiknasser1348 You can cherry-pick to revert a commit. And yes, this method has problems, but this is the tradeoff they went with.
@hammad8053 3 years ago
"Don't count the servers, make the servers count"
@jimmyadaro 3 years ago
That's easy when you have a multimillion-dollar contract with a cloud computing provider (and/or have your own bare-metal servers).
@gsb22 2 years ago
@@jimmyadaro I think what it meant was: don't say "we have 10k servers, so the load will get handled"; make sure every server is running at 100% efficiency.
@jimmyadaro 2 years ago
@@gsb22 Sure, that makes sense, but still, they can afford to pay for really high-scale servers.
@kevintran6102 4 years ago
How can they handle conflicts when using a single branch?
@gsb22 2 years ago
They push frequently, so merge conflicts are small and easy to fix. If two branches are merged after a month of development on them, that's a shitstorm, whereas if they are regularly updated with master, there are fewer conflicts.
@arunsatyarth9097 3 years ago
Very nice presentation. But I wish she wouldn't use Data Centre and Region interchangeably.
@zeroows 1 year ago
Use Rust :)
@jaywelborn 11 months ago
20k servers updated in 10 minutes. I need another talk about just that
@ucretsiztakipci6612 1 year ago
36:00
@PankajMishra-ey3yh 6 years ago
I am fairly new to Django and don't know whether I will ever be able to understand all this complex stuff :-(
@cpsarathe 5 years ago
Pankaj Mishra, with experience and repeated watching, you will.
@zolongOne 4 years ago
You will
@jimmyadaro 3 years ago
Were you able to do so? lol
@PankajMishra-ey3yh 3 years ago
@@jimmyadaro I switched to the data science domain later lol
@hardikmahant7353 3 years ago
In Instagram, Requests = Djangos? @15:02
@jayleejw1801 11 months ago
This convinces me that even Python can be scaled as a global distributed system. Stop saying Python sucks, guys.
@jayleejw1801 11 months ago
Python the best.
@Roshen_Nair 2 years ago
Bookmark: 12:00
@monukumar-du1mv 5 years ago
11:56 I think the solution where the cache is invalidated is not even a solution, because memcache will be empty all the time: whenever a comment is posted on a post, the cache is invalidated. Could anyone clarify?
@fabianoenglerneto129 5 years ago
It's not the entire cache that is invalidated, just the cache entry for the specific object, for example only the comments of a single post.
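The reply's point can be sketched as cache-aside with one key per object: a new comment drops only that post's entry, and the next read refills it lazily from the database. `CommentCache`, the `comments:{post_id}` key scheme, and the `load_from_db` callback are hypothetical; a plain dict stands in for memcache.

```python
from typing import Callable


class CommentCache:
    """Cache-aside with one key per post, so invalidation is per object."""

    def __init__(self, load_from_db: Callable[[int], list[str]]) -> None:
        self._store: dict[str, list[str]] = {}
        self._load = load_from_db
        self.db_reads = 0  # counts how often we fell through to the DB

    @staticmethod
    def key(post_id: int) -> str:
        return f"comments:{post_id}"  # hypothetical key scheme

    def get(self, post_id: int) -> list[str]:
        k = self.key(post_id)
        if k not in self._store:  # lazy fill on the first read after a miss
            self.db_reads += 1
            self._store[k] = self._load(post_id)
        return self._store[k]

    def on_new_comment(self, post_id: int) -> None:
        # Only this post's entry is dropped; every other post stays cached.
        self._store.pop(self.key(post_id), None)
```

So memcache is only "empty" for the one post that just changed, and only until its next read.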
@monukumar-du1mv 5 years ago
@@fabianoenglerneto129 Thanks!