How Discord Stores TRILLIONS of Messages

  Рет қаралды 171,022

ByteByteGo

ByteByteGo

Күн бұрын

Пікірлер: 120
@geob7o
@geob7o Жыл бұрын
Hats off for the on call team during these processes 😅
@RakeshBitling
@RakeshBitling Жыл бұрын
This channel is underrated.. actually a lot of useful videos are available
@areeburrehmankhan1166
@areeburrehmankhan1166 Жыл бұрын
Yes dude he has taught me so much like wow .
@wilfredv1930
@wilfredv1930 Жыл бұрын
It is not underrated, for the subject and 500k subs is pretty well rated actually. It is a very well known resource.
@areeburrehmankhan1166
@areeburrehmankhan1166 Жыл бұрын
Amazing content man . Also your animation are really helpful in making things more understandable. Keep up the great work.
@yadav117uday
@yadav117uday Жыл бұрын
you should have discussed how they moved from mongodb to cassandra, its was a bigger engineering challenge
@gradientO
@gradientO Жыл бұрын
Can you brief about why nosql makes sense in this context instead of relational dbs?
@terencepan2232
@terencepan2232 Жыл бұрын
​@@gradientOscaling writes in relational db is hard or not as possible
@flygonfiasco9751
@flygonfiasco9751 Жыл бұрын
@@gradientOthere’s a lot of overhead in maintaining the relationships in a relational database and there’s overhead in ensuring strong consistency
@truevelvett
@truevelvett Жыл бұрын
@@terencepan2232 You can definitely shard but the operational complexity is insane
@kamal-xd7id
@kamal-xd7id Жыл бұрын
Because its not even written in original blog post from Discord.
@ariganeri
@ariganeri Жыл бұрын
Really good content. Just one point - writes go to both the local SSDs and the persistent disks at the same time. Write-mostly in Linux md means that reads go to the other disks, so in this case the local SSDs.
@locfuho3899
@locfuho3899 Жыл бұрын
I don't think that is needed in case of sending and receiving messages, as long as data sync from the write disk come in order, we only need eventual consistency. For example, in a group chat, every body sending and receiving messages, right? As long as the messages stored correctly in the write DB, they can eventually be synced to the read DBs. I mean there is no hard consistency here.
@ivanlee172
@ivanlee172 Жыл бұрын
One thing missing here is how they setup the monitoring system to have such analysis from the running system, which is also an important part. Would like to hear more about this.
@_jeh.k
@_jeh.k 2 ай бұрын
Those animations look sick!!🔥🔥🔥🔥
@king0s
@king0s Жыл бұрын
my takeaways and a question: ScyllaDB with C++ under the hood and no GC was a part of it. Request coalescing is another part. Selecting and deciding on optimal Google Cloud services i.e Persistent disk is another part. The two layered RAID setup is another part. Modifying the Linux kernel to do write on the Persistent disks and reads from the local SSDs was another part(But I'm wondering how they made the changes to the Linux kernel of the Google VMs? anyone help me out here, do they bake the changes to a VM image and deploy the image to the Google VMs?) And finally for the migration using a data migrator written in memory safe and highly performant Rust was another great decision.
@piyushkatariya1040
@piyushkatariya1040 Жыл бұрын
The real beauty of ScyllDB lies in Seastar framework which bypasses OS kernel to achieve close to metal speed. Also, Cassandra 5.0 which will release in few months from now won't have this issues and will much higher throughput as it will be able to compile against Java 17 SDK and might also probably run Java 21 which has Generational ZGC garbage collector which gives you sub milliseconds latency. Cassandra also have the advantages of defining aggregate function in Java or JS and run in Casandra DB instance itself rather than fetching all data in application server which is super expensive.
@bruterasta
@bruterasta Жыл бұрын
On the blog post it was pretty clear, that they were tired of GC alltogheter.
@piyushkatariya1040
@piyushkatariya1040 Жыл бұрын
@@bruterasta Generational ZGC is auto tunable.
@NapoleonPosada
@NapoleonPosada Жыл бұрын
Amazing. I have read about the features of Scylla but this a definitive proof that is an excellent database.
@ankeshkapil3129
@ankeshkapil3129 Жыл бұрын
But costly to run
@hagaiak
@hagaiak 11 ай бұрын
@@ankeshkapil3129 How is a more efficient database more costly to run? The more performant a DB is, the cheaper it is to run, even if you're talking about small workloads
@0xmmn
@0xmmn Жыл бұрын
kudos to make a video to the point and easy to grasp not like other channels doing long videos to hack the youtube algorithm.
@sharmilak5109
@sharmilak5109 Жыл бұрын
Hi your contents are worthy, it was such a meticulous way of explanation of concepts,thank you
@zl7289
@zl7289 Жыл бұрын
I’m just curious, what tools are you using to make this beautiful diagram and fancy effects 😮❤
@flatdinos
@flatdinos Жыл бұрын
next month I'll migrating one of my database, it's so painful to planning and flow designing of migration, but this video help a lot.
@zitronenlolli1
@zitronenlolli1 Жыл бұрын
great content!!
@nareshgb1
@nareshgb1 Жыл бұрын
"discord found themselves in a bit of a pickle"....good one :)
@javisartdesign
@javisartdesign Жыл бұрын
Cool, interesting to watch! learning all the time!
@BenThatOneGuy
@BenThatOneGuy Жыл бұрын
Quality content as always, love this channel
@tubenzr
@tubenzr Жыл бұрын
what an Excellent team to be able pull off that humongous data 🥶
@SimarMannSingh
@SimarMannSingh Жыл бұрын
QUICK QUESTION: I've seen this style of YT video somewhere else too, somewhere in a tech video I remember. Is there a tool you're using to make these kind of videos? Or did you (or someone you hired) edited the video yourself?
@khiariyoussef3226
@khiariyoussef3226 Жыл бұрын
I may not have enough experience with such tasks, but how do they make sure they pick the right database that fits their performance requirements (at that scale) ?
@peterpeter9230
@peterpeter9230 Жыл бұрын
Sometimes these companies have experts (maybe one of the developers of such DB) but at such a scale it's mostly these companies, who test if these DBs can widthstand such a hight load. So I'd call it precision guess work. Compare existing solutions in terms of performance etc., maybe run a extensive tests and then just give it a go.
@NachitenRemix
@NachitenRemix Жыл бұрын
I guess they have TONS of metrics which cann give them the info they exactly need on what they need and don't need to optimizr
@MaulikParmar210
@MaulikParmar210 Жыл бұрын
They didn't - there are Petabyte scale deployments used by many organisations that are larger than Discord uses. Your DB is going to be as good as its integration with the system and use case. This case study only shows discord lacks expertise or is reluctant to take in advisory from experienced people. This is true for Twitter and airbnb, too, when they switched DBs, thinking it would help but didn't. It requires a lot of experience and architectural knowledge of DB itself to make the rught call. In reality tho who make right call are usually not in like light or advertise about their milestones. So it's not an open book that can be simply explained with diagrams while implementation details would vary a lot. P.S. Even large orgs make mistakes or lack skills. They sometimes need external help, and when they dont, they often run into performance issues that are talked about in the public domain. How they pick it? It's mostly biased towards teams experience and decision makers bias towards tech they know until they are ready to move on and do all migrations to new tech, it's always moving, changing so there's no specific reciepie for that.
@georgesmith9178
@georgesmith9178 Жыл бұрын
This actually follows some money logic. First you pick a tool that is good enough and FREE, like Cassandra. If the business develops slowly, then you can get by with this initial solution just fine. But, when you have a booming growth, as is the case with Discord, the free solution is not designed for that type of scale, so you start to think about high-availability, low-latency, and so on, and usually end up with some commercial solution, that may be even be based on your original free solution, but with enterprise features on top. In the case of Discord, they just picked a DB with better engine, and they leveraged a different storage solution in the cloud, ergo a better performing storage. Of course, the explosive growth guarantees money is no longer an issue - you can afford the enterprise solution and the better performing storage because customers are paying :). There is you logic :).
@alooooooola
@alooooooola Жыл бұрын
it think the fact that they changed the behavior of read and write of an SSD make the different. I dont really think changing the DB make that much different, yes it may be better but not so much without the technic of override SSD behavior. And since this is Discord, they migrated from a garbage collector language and move to a non-garbage collector language before (Go to Rust), it could also affect their choice in this case.
@georgesmith9178
@georgesmith9178 Жыл бұрын
Awesome content. Please, tell me what you use for the graphics and animations. They are so fluid.
@rhaba
@rhaba Жыл бұрын
Is it possible that there's a typo at 4:48 and following? There are two md0 devices.
@kiransingh2935
@kiransingh2935 Жыл бұрын
I have literally never seen a company be happy with choosing Cassandra. I know of engineering orgs who have an entire pod of SREs dedicated to nothing but keeping the Cassandra alive.
@JacobSamro
@JacobSamro Жыл бұрын
Scylla is a saviour ❤
@RahulGupta_Grahul
@RahulGupta_Grahul Жыл бұрын
Nice explainer. Can you also please link the OG blogpost in the description?
@gus473
@gus473 Жыл бұрын
💯 Great episode! Interesting takes on this 🐘 project! 😎✌️
@beofonemind
@beofonemind Жыл бұрын
Thanks for this, very helpful.
@strtale
@strtale 4 ай бұрын
Discord doing the backend: 👹 Discord doing the UI/UX: 🥺
@ketaminefairy
@ketaminefairy Жыл бұрын
someone please explain more on the RAID 0 setup? What is the safety net under that? Am I missing/not understanding something?
@leomysky
@leomysky Жыл бұрын
Thank you for the video!
@aunghtayoo337
@aunghtayoo337 Жыл бұрын
"of course. Migration production data is no joke!". Thanks for the quality contents.
@gigakoresh
@gigakoresh Жыл бұрын
Inspiring story. It would be really cool to see how Telegram solves their backend challenges. They have an even greater scale, smaller team with less funding and fastest latency of any messenger. I wish they shared more of their backend, that system is engineering marvel, just like Discord.
@pegasusgemini6541
@pegasusgemini6541 Жыл бұрын
Me too, i'm really impressed on how Telegram is very responsive and optimized
@bjugdbjk
@bjugdbjk Жыл бұрын
Could you make a video on the Data services part which written in Rust ? That will be a quite interesting to many folks !!
@soorkie
@soorkie Жыл бұрын
I love your content. Do we have a discord server for this channel and this community? I would really love to engage more.
@dougphilips8807
@dougphilips8807 Жыл бұрын
I am confused the difference between "Request coalescing" and what your other videos call caching? Since "Request coalescing" is part of the success here I'd like to understand the difference better, thanks!
@touchwithbabu
@touchwithbabu Жыл бұрын
Love your content and respect for your efforts
@punkerIII
@punkerIII Жыл бұрын
Thank you for the video.
@jerryking4777
@jerryking4777 Жыл бұрын
wow, can i ask what after effect template is used for presentation? I would like to present my work in school.
@willianrocha8615
@willianrocha8615 Жыл бұрын
Only 9 days what a huge achievement
@MinhHoang-fu7tt
@MinhHoang-fu7tt Жыл бұрын
Amazing topic!!!
@sagiajaj17
@sagiajaj17 Жыл бұрын
How is their solution different from RAID10? It is essentially raid1 array mirroring a raid0 set. Am i missing something?
@ayotomiwasalau6373
@ayotomiwasalau6373 Жыл бұрын
What pipeline tool did they use for the data migration from Cassandra to Scylla?
@thememmer
@thememmer Жыл бұрын
What tools do you use to create these videos
@RahulGupta_Grahul
@RahulGupta_Grahul Жыл бұрын
I think the boxes at kzbin.info/www/bejne/hWSzqKiweNt0oKssi=3YXYT_VTw2QqriyG&t=343 are a bit misleading. First, probably scylla uses both persistent disks for writes and NVMe(s) for reads, so, really one box should contain both with arrows for reads and writes. Secondly, the message service is partitioned, therefore, different writes go on different instances of scylla instead of a single scylla master node or central cluster which the triangular placement of scylla in diagram suggest.
@DK-ox7ze
@DK-ox7ze Жыл бұрын
If the writes were going to the persistent disk with high latency, then how were they made immediately available (with low latency) to all the intended members of that messaging group?
@daviduzumaki
@daviduzumaki Жыл бұрын
I'm guessing data was also stored in an LRU cache when it was written to the persistent disk
@DK-ox7ze
@DK-ox7ze Жыл бұрын
@@daviduzumaki In that case they might have to again deal with data loss issues which they were facing, and the reason why they introduced persistent storage in the first place.
@blasttrash
@blasttrash Жыл бұрын
3:28 is that like a cache on query parameters?
@nexus888
@nexus888 5 ай бұрын
If Discord would only create a good UX. Having may servers is a nightmare to go through..
@carlosm.1233
@carlosm.1233 Жыл бұрын
What software do you use to create such presentation?
@EdwinFairchild
@EdwinFairchild Жыл бұрын
is discord the only paltform having to store trillions of messages? or data for that matter? how have other companies solved it?
@123mrfarid
@123mrfarid 3 ай бұрын
I think quite a few like meta companies products : whatsapp, insta, fb. Google : google fpm, gmail, google ads, youtube
@bjugdbjk
@bjugdbjk Жыл бұрын
Rust is a superstar !!
@mz7640
@mz7640 Жыл бұрын
Can't they use AWS dynamoDB?
@palkollar7739
@palkollar7739 Жыл бұрын
what do you use for these animations?
@jasonguo7596
@jasonguo7596 Жыл бұрын
Is it ok to move older messages to a separate database periodically? So the main database would not be carrying so much data all the time. And have dedicated service to read messages from that database with historical messages.
@RisalFajar
@RisalFajar Жыл бұрын
When there's a new message or edited old message, how do the app receive the changes? Is it by listening to data on the DB (like a Web Socket)?
@jerkmeo
@jerkmeo Жыл бұрын
That's just awesome sharing. Thanks!
@delucabruno
@delucabruno Жыл бұрын
kudos to the Discord team 👏
@pm71241
@pm71241 Жыл бұрын
Hmm ... As a user of ScyllaDB. This doesn't seem like the most daunting DB migration scenario, I could imagine. I mean... Had it been any other database than Cassandra, the task would have been magnitudes larger.
@spiewajit4ncz
@spiewajit4ncz Жыл бұрын
Why?
@quang.luu.179
@quang.luu.179 Жыл бұрын
excellent conten.
@shinmini99
@shinmini99 Жыл бұрын
thanks for content :)
@luckylove72
@luckylove72 Жыл бұрын
So they didn’t plan to have data warehouse?
@seeball
@seeball Жыл бұрын
That's incredible
@AymanDevops
@AymanDevops Жыл бұрын
An amazing video thanks man, I just have a question about the Data service part, isn't similar to redis or CDN or anyother cache mechanism? I'm just wondered why they called this name and what it does exactly
@lord_nikon_010
@lord_nikon_010 Жыл бұрын
I believe is not like a CDN or Redis. I understand that is an API acting as an interface between the Monolith and the database, decoupling one form another - which allowed to use a language (Rust) more efficient to handle the high-performance data operations.
@5p4rk3r
@5p4rk3r Жыл бұрын
how to run scylla db in the cloud??
@vaddimka
@vaddimka Жыл бұрын
Well it doesn't actually tell HOW it stores trillions of messages (like partitioning, cache organization, hot path issues), just a high level overview of the migration and dbms
@abhishekkumar-ei5hl
@abhishekkumar-ei5hl Жыл бұрын
Please make video in realtime application like figma, Google docs, or multiplayer game
@TheSelectmax
@TheSelectmax Жыл бұрын
Thanks for the content. A little introduction was missing what kind of database this is and why it is better, except for C++ under the hood
@jianruan5491
@jianruan5491 Жыл бұрын
cool,I want to try in my work
@dirac7233
@dirac7233 Жыл бұрын
Scylla is basically a C++ version of Cassandra. This story is one more proof of java's inferiority. Why are people still using Java ?? 🤷🏽‍♂️
@sergiocoder
@sergiocoder 7 ай бұрын
Because they can't learn C++ :)
@carlosharris2681
@carlosharris2681 Жыл бұрын
Friendly programming noob here: Can someone explain why Garbage Collection-free gives Scylla an advantage over Cassandra? Thanks in advanced.
@sergiocoder
@sergiocoder 7 ай бұрын
To my understanding, without a garbage collector there are no stop-the-world pauses to collect the garbage in the background and therefore more computing resources are spent on the actual work of DB.
@nishant_singh
@nishant_singh 11 ай бұрын
Can you make me a pro at syatem design ?? I just love this subject...
@repairstudio4940
@repairstudio4940 Жыл бұрын
To the Discord devs 🥂
@sriharsha580
@sriharsha580 Жыл бұрын
What is a garbage collector, how it is useful for DB’s?
@cherubin7th
@cherubin7th Жыл бұрын
Cassandra is written in Java, so the garbage collector is automatically part of it.
@RicardoSilvaTripcall
@RicardoSilvaTripcall Жыл бұрын
The Garbage Collector is responsible for checking which variables aren't more needed in the program and removing than from memory, the problem is, the Garbage Collector has to run every now and then, and this process requires processing power and will add latency to the main application running, because for the most of the time, the whole application or parts of it has to be stopped and wait for the Garbage Collector to do it's job, even though this process takes a few milliseconds, in a high throughput application this can add up and slow down the whole system. Languages like C++ and Rust have a "manual" memory allocation and deallocation, the programmer is responsible to take care of it before hand, so you won't need a garbage collector at runtime, what results in faster executions ...
@zakk6182
@zakk6182 Жыл бұрын
@@RicardoSilvaTripcallappreciate this
@imMavenGuy
@imMavenGuy Жыл бұрын
9 days - are you kidding me!
@2xKTfc
@2xKTfc Жыл бұрын
So much effort for messages that nobody can ever find again anyway 😂Seriously, Discord scrollback might as well be write-only it's so unusable. :(
@pegasusgemini6541
@pegasusgemini6541 Жыл бұрын
Waouh!
@chrishabgood8900
@chrishabgood8900 Жыл бұрын
Joy != rust
@joaoguilherme-or1ud
@joaoguilherme-or1ud Жыл бұрын
Show!!!!
@RohanDas23
@RohanDas23 Жыл бұрын
How do they store TRILLIONS of messages? Like everyone else, they use DATABASE. Hows that any different of any other services?
@shpluk
@shpluk Жыл бұрын
I can agree it is cool technology wise, but 99.9% of discord messages are garbage, and who will ever scroll back to see any of them? I admire the engineering effort it just seems to me kind of wasted effort
@bluebird3131
@bluebird3131 Жыл бұрын
Not true at all. If you have message garbage on Discord, it your server choice you have to question, not the tool or the others... 🥱
@rolina.azmitiamedrano5137
@rolina.azmitiamedrano5137 Жыл бұрын
👀👌
@nareshb5
@nareshb5 Жыл бұрын
1st viewer😂
@leonchen8317
@leonchen8317 Жыл бұрын
Brother, I love your content, but please, your tone is too flat. It makes me sleepy
How Discord Stores TRILLIONS of Messages
13:06
Coding with Lewis
Рет қаралды 668 М.
Top 6 Most Popular API Architecture Styles
4:21
ByteByteGo
Рет қаралды 999 М.
UFC 310 : Рахмонов VS Мачадо Гэрри
05:00
Setanta Sports UFC
Рет қаралды 1,2 МЛН
Session Vs JWT: The Differences You May Not Know!
7:00
ByteByteGo
Рет қаралды 324 М.
Simon Sinek's Advice Will Leave You SPEECHLESS 2.0 (MUST WATCH)
20:43
Alpha Leaders
Рет қаралды 2,5 МЛН
«Битва брокеров сообщений: Kafka, RabbitMQ, SQS»
1:57:07
Яндекс Практикум
Рет қаралды 13 М.
Top 7 Ways to 10x Your API Performance
6:05
ByteByteGo
Рет қаралды 357 М.
How To Choose The Right Database?
6:58
ByteByteGo
Рет қаралды 344 М.
Top 5 Most-Used Deployment Strategies
10:00
ByteByteGo
Рет қаралды 291 М.
System Design: Why is Kafka fast?
5:02
ByteByteGo
Рет қаралды 1,1 МЛН
Discord Stores BILLIONS of messages using this database
11:41
Coding with Lewis
Рет қаралды 186 М.
Scalability Simply Explained in 10 Minutes
9:20
ByteByteGo
Рет қаралды 55 М.
HTTP 1 Vs HTTP 2 Vs HTTP 3!
7:37
ByteByteGo
Рет қаралды 361 М.