How We've Scaled Dropbox

  331,484 views

Stanford

(February 22, 2012) Kevin Modzelewski talks about Dropbox and its history. He describes the technological issues Dropbox has faced and the actions the team has had to take to keep improving the service.
Stanford University:
www.stanford.edu/
Stanford School of Engineering:
soe.stanford.edu/
Stanford Computer Systems Colloquium:
www.stanford.edu/class/ee380/
Stanford University Channel on KZbin:
/ stanford

Comments: 70
@SomeInfo-ib3wz 9 years ago
This is a legendary talk.
@zss123456789 4 years ago
Timestamps
0:00 Intro (Kevin Modzelewski from Dropbox Server Team)
1:28 Agenda
2:10 1. What is this talk?
3:22 1.1 Why is this interesting? (summary: how to do it with few resources)
4:11 2. Background (What is Dropbox)
5:59 2.1 Challenge 1: Write volume (nearly equal to read volume, orders of magnitude above industry average)
7:25 2.2 Challenge 2: ACID (en.wikipedia.org/wiki/ACID)
10:11 3. Examples (how have we evolved?)
10:30 3.1 Example 1: High-level architecture
30:08 3.2 Example 1 questions
44:30 3.3 Example 2: Database for metadata
52:42 3.4 Example 2 questions
56:55 4. Wrap up
1:00:45 5. Final questions
More info in replies
@zss123456789 4 years ago
Example 1 Questions pt. 2
11. "Costs of running on Amazon compared to DIY?" (ans: [DIY] costs more)
12. "How many operations people do you have on your side?" (ans: 7 including network guy)
13. "Is your customer base worldwide? [because Amazon Cloud is located in Virginia]" (ans: everything file related is in Virginia, everything metadata related is in San Jose, the majority of customers are international, 65%)
14. "How much cloud-based data do you store per user in S3?" (ans: Amazon takes care of replication, we just upload it once)
15. "S3 went down recently, did you get hammered?" (ans: Amazon is pretty competent over there, but it is interesting to see things on our side)
16. "Do you know how much of S3 is used up by you?" (ans: some guy in the audience knew, but the camera is on)
17. "Evolution of instrumentation?" (ans: at the beginning it's easy, right now the server is pretty regular, you build up a good intuition of what's going wrong. We went for a long time without data visualization, but we have all that now, which is better.)
18. "What metrics do you watch?" (ans: we watch all the servers' load, requests/sec, breakdown of the time that went into a single request, bandwidth as measured by users, etc.)
19. "What do you do for security?" (ans: I can't talk too much about specific things that have happened, but we take security very seriously)
@zss123456789 4 years ago
Example 2 Summary:
How it started out:
- metadata is stored as a log of all the edits (server file journal)
- fields: id | filename | casepath | latest | ns_id (namespace id)
- primary key: id (meaning things are appended in id order)
Changes:
- getting rid of "casepath" (probably has to do with case sensitivity things, but that has moved elsewhere)
- to get file edit history, needed to add "prev_rev" (previous revision)
- primary key changed to ( ns_id | latest | id )
- changed varchar(260) to varchar(255) (255 is optimized because only 1 byte is needed for representing string length)
- removed 'latest' from the compound primary key (optimizes for writes, but more expensive in reads)
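The end state of the schema changes listed in these notes can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` rather than MySQL, the table name `server_file_journal` and the helpers `create_journal`, `append_edit`, and `history` are hypothetical, and the column types are guesses. Only the field names and the `(ns_id, id)` key come from the notes.

```python
import sqlite3


def create_journal(conn: sqlite3.Connection) -> None:
    """Create a toy journal table: casepath dropped, prev_rev added,
    filename capped at 255, primary key (ns_id, id) without 'latest'."""
    conn.execute(
        """
        CREATE TABLE server_file_journal (
            ns_id    INTEGER NOT NULL,
            id       INTEGER NOT NULL,
            filename VARCHAR(255) NOT NULL,
            latest   INTEGER NOT NULL,
            prev_rev INTEGER,
            PRIMARY KEY (ns_id, id)
        )
        """
    )


def append_edit(conn: sqlite3.Connection, ns_id: int, filename: str) -> int:
    """Append a new revision of filename and return its per-namespace id."""
    row = conn.execute(
        "SELECT id FROM server_file_journal "
        "WHERE ns_id = ? AND filename = ? AND latest = 1",
        (ns_id, filename),
    ).fetchone()
    prev_rev = row[0] if row else None
    # ids are per namespace, so the next id is computed within ns_id only.
    next_id = conn.execute(
        "SELECT COALESCE(MAX(id), 0) + 1 FROM server_file_journal WHERE ns_id = ?",
        (ns_id,),
    ).fetchone()[0]
    if prev_rev is not None:
        # 'latest' is not part of the key, so flipping it does not move the row.
        conn.execute(
            "UPDATE server_file_journal SET latest = 0 WHERE ns_id = ? AND id = ?",
            (ns_id, prev_rev),
        )
    conn.execute(
        "INSERT INTO server_file_journal (ns_id, id, filename, latest, prev_rev) "
        "VALUES (?, ?, ?, 1, ?)",
        (ns_id, next_id, filename, prev_rev),
    )
    return next_id


def history(conn: sqlite3.Connection, ns_id: int, filename: str) -> list:
    """Follow prev_rev links from the latest revision back to the first."""
    row = conn.execute(
        "SELECT id, prev_rev FROM server_file_journal "
        "WHERE ns_id = ? AND filename = ? AND latest = 1",
        (ns_id, filename),
    ).fetchone()
    ids = []
    while row is not None:
        ids.append(row[0])
        if row[1] is None:
            break
        row = conn.execute(
            "SELECT id, prev_rev FROM server_file_journal WHERE ns_id = ? AND id = ?",
            (ns_id, row[1]),
        ).fetchone()
    return ids
```

With `latest` outside the key, marking an old revision as superseded is an in-place update, which fits the "optimizes for writes" remark; finding the current revision of a file then needs a filtered lookup, which is the more expensive read.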
@zss123456789 4 years ago
Example 2 Questions:
1. "When you really want to delete something, do you delete old data or does [the log] just grow?" (ans: in normal cases it just grows, I personally don't know [any cases where this is not true])
2. "Did you have to change the size of id at some point?" (ans: ids are per namespace, we haven't had an issue, they went from unique to not unique at one point, [when the primary key became a compound key])
3. "How did you measure whether these changes make a difference?" (ans: it's extremely hard to test, because it's hard to generate a realistic workload)
4. "You do A/B testing for new builds?" (ans: yeah, we are also increasing our ability to do operational changes incrementally)
@zss123456789 4 years ago
Final Questions:
1. "What are the next big challenges [as a company]?" (ans: we always want to get bigger and appeal to more people)
2. "Are you discouraging people from using this as a backup service?" (ans: I don't know how many people do, but I'm happy that people found productive ways to use Dropbox)
3. "How are you [solving the Megaupload problem]?" (en.wikipedia.org/wiki/Megaupload#2012_indictments_by_the_United_States) (ans: we do explicitly prohibit this stuff, and we do take it very seriously)
4. "People paying for this?" (ans by guy in audience: you can get a small account for free, but if you want to REALLY use it you have to pay)
5. "I think [Dropbox] has 2 great advantages, one is you don't get privacy issues because you're not selling things to advertisers, the other is the spammers aren't going to pay for it when they can find a free one somewhere else" (ans: [paid nature] of our service doesn't stop them from trying, but we do try to detect abuse, etc.)
6. "[difference in user experience in different locations]" (ans: we don't have a whole lot of metrics divided by geography, but client behavior is more tolerant of latency; the requirements on back-end architecture will be less latency tolerant over time.)
7. "Who are the main competitors, how do you think about them?" (ans: Box.net... as a whole we are just trying to build the best service we can, and not get distracted)
@zss123456789 4 years ago
This is indeed a legendary talk; hopefully my notes will help people digest it.
@SumeetKhandelwalbpd 9 years ago
Good insight about large scale storage underneath Dropbox architecture. Thanks!!!
@purushothamankrishnamurthy6436 2 years ago
Great talk about architecture and the challenges. Learnt a lot from this lecture.
@vishalsh1624 3 years ago
Really enjoyed the guessing part! Gold talk!
@orochinagi1111 9 years ago
loved the technical insight
@manasdalai3934 4 years ago
Really nice presentation.
@bg-rz7vd 6 years ago
definition of a pragmatic programmer.
@linyan6800 10 years ago
Eager to know what they continued to talk about after the camera was turned off!
@gonkula 6 years ago
Loved this video. Great speaker and fantastic insight into a fairly simple approach to distributed systems and its evolution within Dropbox across the years. There was just one small question that concerned me. We are told Dropbox splits a file into blocks of 4 MB, which are hashed; if the hash already exists in storage, they avoid storing a second copy and instead create a mapping to the block already in storage. This is a fairly standard approach to de-duplication. My concern is that, at the scale of files Dropbox is handling, the possibility of several of these chunks colliding increases. So I am secretly hoping that, in addition to checking that the hash of the block matches an existing one, the actual contents are compared byte by byte.
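The de-duplication scheme this comment describes can be sketched in a few lines. This is a minimal in-memory illustration, not Dropbox's implementation: the `BlockStore` class and its methods are hypothetical, SHA-256 is assumed as the hash, and the byte-by-byte check is the extra safeguard the commenter hopes for (the talk does not say whether it is done).

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB blocks, the size mentioned in the talk


class BlockStore:
    """Toy content-addressed store: each distinct block is kept once."""

    def __init__(self, block_size: int = BLOCK_SIZE) -> None:
        self.block_size = block_size
        self.blocks = {}  # hex digest -> block bytes

    def put_file(self, data: bytes) -> list:
        """Split data into blocks, store unseen blocks, return the blocklist."""
        blocklist = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            existing = self.blocks.get(digest)
            if existing is None:
                self.blocks[digest] = block  # first time this content is seen
            elif existing != block:
                # Same hash, different bytes: a genuine collision.
                raise ValueError("hash collision for block " + digest)
            blocklist.append(digest)
        return blocklist

    def get_file(self, blocklist: list) -> bytes:
        """Reassemble a file from its list of block hashes."""
        return b"".join(self.blocks[digest] for digest in blocklist)
```

A file is then just an ordered list of block hashes, so two files (or two regions of one file) that share a block reference the same stored bytes.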
@Lifelightning 6 years ago
I think Dropbox doesn't have to worry about collisions for two reasons: 1. They probably have metadata about the owner and filename of these files associated with the hashes. In this way, in order for two 4 MB chunk hashes to collide, it would have to be under the same owner, or even within the same file, which would be highly infeasible with a solid hashing algorithm. 2. With a sufficiently strong hashing algorithm, it's still pretty infeasible that two 4 MB chunks anywhere within Dropbox collide. Because a collision is so improbable, skipping the check is a far better trade than comparing every file byte by byte, as those comparisons would be prohibitively slow.
@ascendingone 6 years ago
SHA-256 is practically collision resistant
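To put a rough number on that claim, the standard birthday bound says that among n random hashes of b bits, the chance of any collision is at most n(n-1)/2 divided by 2^b. The sketch below computes that bound; the function name `collision_bound` is made up for illustration, and the block count in the usage note is an arbitrary assumption, not a Dropbox figure.

```python
def collision_bound(num_blocks: int, hash_bits: int = 256) -> float:
    """Upper bound on the probability that any two of num_blocks
    uniformly random hash_bits-bit hashes collide (birthday bound)."""
    pairs = num_blocks * (num_blocks - 1) // 2  # number of distinct pairs
    return min(1.0, pairs / 2 ** hash_bits)
```

For example, `collision_bound(10**15)` comes out far below 1e-40, which is why accidental collisions are usually ignored for a 256-bit hash. The bound only covers accidental collisions between random-looking digests; it says nothing about a hash function that has been broken.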
@humbleguy9891 6 months ago
@Lifelightning I think exactly the opposite of your point number 1.
@RodyDavis 10 months ago
Amazing talk 🔥
@TheDecrypted 4 years ago
Nice talk!
@dedipyamandas3735 7 years ago
This was awesome. Thank you Dropbox.
@jmitesh01 4 years ago
Informative talk.. enjoyed the guessing game!
@sukeeshv 6 years ago
Awesome!
@OmarQunsul 2 years ago
It's worth noting that Dropbox is now also using AWS for storing their metadata, or a big part of it, using DynamoDB and AWS S3. And if I am not mistaken, they are not using S3 anymore for file storage. So it's the other way around.
@grandhirahul 2 years ago
Wait, where are they storing the files? In DynamoDB?
@OmarQunsul 2 years ago
@grandhirahul No, in their own data centers; only metadata is on AWS. At least that's my last updated info.
@techmind9608 3 years ago
AND HERE I AM WATCHING IT IN 2021
@kingofwebguru 2 years ago
Is there a doc version of this video, or similar, e.g. slides, webpages?
@orochinagi1111 9 years ago
great video!!!!
@svhhjfhdcvbg3802 7 years ago
🔛🔛🔜
@rodgetech 7 years ago
Svhhjf Hdcvbg @@@
@prashantdhiru 8 years ago
the prof. got some swag as seen at 1:03 :p
@ooamiworld5888 5 years ago
gavin belson @30:34
@anurag14080 4 years ago
Where does the Notserver get its data from? How do you maintain the storage for the Notserver?
@KalpeshPatel80 10 years ago
Very good presentation
@soulasoula9594 7 years ago
Kalpesh Patel il2oq 1
@anirudhrowjee1378 3 years ago
"if you don't use dropbox, welcome to silicon valley...you will soon" This.. this is the guy I'm scared of
@nickhoang6473 3 years ago
Any idea how Dropbox stores the blocklist in the SFJ, given that MySQL doesn't support a list data type?
@rahulsharma5030 3 years ago
@44, why would the block server talk to the metadata server?
@anatoliistepaniuk8217 7 years ago
2012 but looks far older! Stanford can't afford an HD camera?
@kimchi_taco 5 years ago
Looks like my daddy's home video
@chang8106 4 years ago
I think they did it on purpose. But the content quality is the thing that matters
@MetalSlugSV 3 years ago
@chang8106 Why would you deliberately make your video look bad? lol
@varunmankal4654 6 years ago
If the block server, which is in North Virginia, calls the load balancer, wouldn't it cause latency? It is similar to calling the database, which is in Texas, as described in the lecture.
@uyuo2 6 years ago
varun mankal yes, but latency is not a problem for Dropbox - asynchronous
@RahulSathe.07 5 years ago
True, but isn't latency what he mentions as the reason why they moved away from direct DB calls from the block server to MySQL? Can you clarify this please?
@RahulSathe.07 5 years ago
Agreed. I don't know how latency was avoided just by putting an LB in front.
@dijoxx 3 years ago
No. Load balancer latency is negligible. It's not more than what an extra switch or router along the network path would cause.
@goverdhank 6 years ago
good talk. nothing extraordinary -- but, it teaches you that great things do not have to be complex. pretty neat and simple architecture
@a55tech 7 years ago
What does each server type do? Notserver, Metaserver, Blockserver?
@Delohat 6 years ago
The Notserver (notification server) pings the clients every time there is a change, the Metaserver keeps track of metadata in the database, and the Blockserver handles upload and download of the data.
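The division of labor in this reply can be pictured with a toy model. Everything here is a hypothetical sketch: the class and method names are invented, storage is plain in-memory dictionaries, and the assumption that the Metaserver is what tells the Notserver about a change is mine, not something stated in the comment.

```python
class Blockserver:
    """Handles upload and download of file data, keyed by block hash."""

    def __init__(self) -> None:
        self.blocks = {}

    def upload(self, block_hash: str, data: bytes) -> None:
        self.blocks[block_hash] = data

    def download(self, block_hash: str) -> bytes:
        return self.blocks[block_hash]


class Notserver:
    """Pings subscribed clients whenever a namespace changes."""

    def __init__(self) -> None:
        self.subscribers = {}  # ns_id -> list of client callbacks

    def subscribe(self, ns_id: int, callback) -> None:
        self.subscribers.setdefault(ns_id, []).append(callback)

    def notify(self, ns_id: int) -> None:
        for callback in self.subscribers.get(ns_id, []):
            callback(ns_id)


class Metaserver:
    """Keeps track of metadata: which file maps to which blocks."""

    def __init__(self, notserver: Notserver) -> None:
        self.journal = {}  # ns_id -> list of (filename, blocklist)
        self.notserver = notserver

    def commit(self, ns_id: int, filename: str, blocklist: list) -> None:
        self.journal.setdefault(ns_id, []).append((filename, blocklist))
        # Assumed wiring: a committed change triggers a client notification.
        self.notserver.notify(ns_id)
```

In this model a client would upload blocks to the Blockserver, commit the file's blocklist to the Metaserver, and other clients subscribed to the same namespace would hear about the change from the Notserver.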
@haochen9635 4 years ago
How does the deduplication get done assuming each client's data is encrypted under its own key?
@StudyWithRishiP 3 years ago
In that case, I think deduplication will find that there is no other copy of the client's data in server storage, so it will store that client's encrypted copy as well.
@qqqqqqqqqqqqqqq67 2 years ago
The underlying data is not relevant, because a set of bits decrypted with a key will always give the same result, and if you change the key it will give you a different result, which will be equally correct. So you are deduplicating the encrypted data, not the file uploaded.
@CS-eh8eo 1 year ago
Has much changed in the 10 years since this? Kubernetes obviously has entered the scene
@jwang3417 6 years ago
One Notserver handling 1M connections is impressive. But one load balancer that cannot handle multiple requests does not sound good.
@jwang3417 6 years ago
Also not sure what the namespace (ns_id) is used for?
@ayushraj-zb6sv 3 years ago
I am interested in system design, but I am wondering if it is worth watching this in 2021?
@alpacino3989 2 years ago
I wish that in India, too, industry people like this were allowed to give lectures.
@khyatiashah 2 years ago
Indian engineering schools (except IIT) have the worst professors
@xdisruptor6630 4 years ago
Am I the only one who thinks this guy is talking in the same style as Elon Musk?
@pankajr141 9 years ago
Scale out with time..
@ascendingone 6 years ago
You can trade money for time.
@goodwish1543 2 years ago
Good talk, the content is a little old.
@RandomShowerThoughts 1 year ago
lol why does this video look like it was shot in the 60s
@PrateekOjhaOfficial 4 years ago
Who downvotes these videos?
@cristenawashington4002 7 years ago
6mhyob
@rameshbabuy9254 5 years ago
video looks too old.