How Disney Hotstar Captures One Billion Emojis!

105,505 views

ByteByteGo

3 months ago

Get a Free System Design PDF with 158 pages by subscribing to our weekly newsletter: bit.ly/bytebytegoytTopic
This video is based on the engineering blog post: blog.hotstar.com/capturing-a-...
Animation tools: Adobe Illustrator and After Effects.
Check out our bestselling System Design Interview books:
Volume 1: amzn.to/3Ou7gkd
Volume 2: amzn.to/3HqGozy
The digital version of System Design Interview books: bit.ly/3mlDSk9
ABOUT US:
Covering topics and trends in large-scale system design, from the authors of the best-selling System Design Interview series.

Comments: 120
@mechwarrior4793 3 months ago
1969: Put a man on the moon with the power of a calculator. 2024: Process emojis with the power of multiple tech stacks, complex algorithms, and the processing power of multiple data centers.
@ReadThisOnly 3 months ago
crazy
@riyaddecoder 3 months ago
It's all about the data size
@pabss3193 3 months ago
this
@user-tj2we8ke8p 3 months ago
This is my favourite video so far, please do more of these case studies!!
@drhdev 3 months ago
I just want to thank you guys for the content you create. Your newsletter has invaluable information that is hard to find elsewhere. Keep it up!
@eduardoluissantos1359 2 months ago
Love how you can share that much value in less than 5 mins!
@redhood7105 3 months ago
Consistently interesting content, consistently! Always learning tons from your content. Thank you!
@pieter5466 3 months ago
One of my favorite videos, particularly since I started a side project to learn Golang and Kafka!
@juanitoMint 2 months ago
What have you learned so far? Repo?
@ToastRusk 3 months ago
All these technologies serving emojis LMAO
@LawZist 3 months ago
They serve high load
@tengahhidup 3 months ago
It's not emojis, it's one billion emojis
@mm345-0 3 months ago
The value clearly isn't in the emojis, but in the contextual data (not to mention the engagement). If advertisers can get real-time feedback on what happened and what the immediate sentiment is, it provides context that other channels (e.g. purchases, forums, tweets, etc.) can't. Individually, one emoji is pretty worthless, but aggregated it seems like it would be invaluable. I'd imagine the feedback loop for users would be valuable too.
@b-u-n 3 months ago
This seems like a really expensive problem that sampling could've made cheaper
@b-u-n 3 months ago
You can't display every emoji, so there is a % chance that someone will not see their emoji displayed. This means you can move that % to the frontend, reject similar submissions, and save 90% on your processing and 70% on your development costs.
@shanehanna 3 months ago
Cool and all, but you could just sample a small percentage of the input to produce the output in a single service, and nobody watching the emoji stream would be able to tell the difference. The vast majority of the reaction clicks could be dropped on the floor client side and the user experience would feel the same. Unless the gathered reaction data is the product you are selling to advertisers or something, but even then, just state it's sampled.
@ricardoamendoeira3800 3 months ago
This comment is the difference between intelligence and wisdom. Your solution would be much cheaper and simpler to run, maybe with a dynamic toggle to stop sampling if the number of events was low enough, such as a match with low viewership.
@biaozhang1643 3 months ago
Hotstar needs to calculate how "hot" the game is. Sampling wouldn't work.
@ismbks 3 months ago
@biaozhang1643 why not?
@borstenpinsel 3 months ago
@biaozhang1643 Sampling works for TV. They say "x million watched the match last night" but they have no way to tell. At least that's how it works here in Germany. Selected homes have a box that recognises the channel and sends it over a radio signal. You get paid a few bucks for your voluntary participation.
@seannewell397 3 months ago
I have a feeling they would've done that if it were clear and easy. I think there's something to Spark being able to do those time slices for them. That being said, of course in general you build a simple one-service thing and make it do the job, then see what breaks where, because it's probably the data model or data handling, and if you can tune the data structure and/or algorithm you can run very, very far with just one Go process or a cluster of Go nodes, for example (still all one logical web server). idk, I wasn't on the team though.
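The sampling idea discussed in this thread can be sketched in a few lines of Python. This is only an illustration under assumptions that do not come from the video or the Hotstar blog post: the 10% sample rate, the emoji names, and the function names `sample_reactions` and `estimate_totals` are all hypothetical. Each reaction is kept independently with a fixed probability, and the sampled counts are scaled back up to estimate the true totals.

```python
import random
from collections import Counter

SAMPLE_RATE = 0.1  # assumed value: keep roughly 1 in 10 reactions


def sample_reactions(reactions, rate=SAMPLE_RATE, rng=None):
    """Keep each reaction independently with probability `rate`."""
    rng = rng or random.Random()
    return [r for r in reactions if rng.random() < rate]


def estimate_totals(sampled, rate=SAMPLE_RATE):
    """Scale the sampled counts back up to estimate the true totals."""
    counts = Counter(sampled)
    return {emoji: round(n / rate) for emoji, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical traffic mix, used only to exercise the functions.
    reactions = ["smile"] * 70_000 + ["angry"] * 20_000 + ["sad"] * 10_000
    kept = sample_reactions(reactions, rng=random.Random(42))
    print(len(kept), estimate_totals(kept))
```

With large volumes the estimates stay close to the true proportions, which is the argument the commenters make; whether that accuracy is good enough for a "how hot is the game" metric is the open question raised in the replies.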
@abhilashkr1175 3 months ago
Spark has a lot of operational overhead
@SreenathV 3 months ago
In addition to Spark's overhead: we could use Kafka Streams to calculate the emoji count (a simple use case) and thereby have one less service in the architecture. For complex computation, Apache Spark is the better fit, as it is highly performant and clustered, with stateful stats, windowing features, etc.
@seannewell397 3 months ago
But the time slice is VERY good for this: it basically aggregates for you over a sliding time window to do sentiment analysis.
@faizandhami 3 months ago
Great understanding
@MilanVVVVV 3 months ago
Honestly a weird stack with a lot of unnecessary overhead/complexity in multiple places. Also don't know why they opted for such different technologies across the stack, i.e. literally a single Elixir/Erlang cluster w/ built-in batch processing (if necessary) could do the exact same job, only simpler and more resiliently.
@dietsodalite3716 3 months ago
I kinda feel the same way
@ketaminefairy 3 months ago
IMO it's not even the fact that it's so complex that makes it bad, it's the fact that they put so many engineer hours into building such a complex solution for a feature that at the end of the day pretty much doesn't make a difference to your users
@dietsodalite3716 3 months ago
@ketaminefairy like the big value add seems to be AWS EMR processing
@dputra 3 months ago
@ketaminefairy not a big value for users, but it provides invaluable data for companies about user sentiment. We are talking about 1 billion emojis here 😊
@ketaminefairy 3 months ago
@dputra You are right about that. I personally wouldn't think much of the data, but I can see how they would love to gather it. That being said, their infra was built to feed this data back in real time to end users. If they only wanted the data for analysis, the Go API + Kafka and some processor like Flink would have been more than enough IMO.
@paulotcj 3 months ago
This seems to be overcomplicated for its intended purpose. Additionally, while capturing all available data would be highly desirable, missing reactions/emojis due to server bottlenecks would be barely noticeable.
@juanitoMint 2 months ago
Now that I've seen the vid, I would implement it using a leaky bucket approach. WDYT?
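For readers unfamiliar with the leaky bucket approach mentioned above, here is a minimal Python sketch of the "leaky bucket as a meter" variant. It is not taken from the video or from Hotstar's design; the class name, the `capacity` and `leak_rate_per_sec` parameters, and the injectable `clock` are assumptions made for illustration. The bucket drains at a fixed rate, each accepted reaction adds one unit, and reactions that would overflow the bucket are dropped.

```python
import time


class LeakyBucket:
    """Leaky bucket used as a meter: drains at a fixed rate, rejects on overflow."""

    def __init__(self, capacity, leak_rate_per_sec, clock=time.monotonic):
        self.capacity = capacity            # maximum burst the bucket can hold
        self.leak_rate = leak_rate_per_sec  # units drained per second
        self.clock = clock                  # injectable so tests can fake time
        self.level = 0.0
        self.last = clock()

    def allow(self):
        """Return True if one more reaction fits in the bucket, else False."""
        now = self.clock()
        # Drain the bucket for the time elapsed since the previous call.
        self.level = max(0.0, self.level - (now - self.last) * self.leak_rate)
        self.last = now
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False  # bucket is full: drop this reaction
```

Placed in front of the ingest API, one bucket per node (or per user) would cap the request rate that downstream services ever see, at the cost of silently discarding reactions during bursts.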
@VaibhavShewale 3 months ago
That was really awesome!
@grkuntzmd 3 months ago
Go Go! I learned Go in early 2016 and instantly fell in love with it. One of the languages that I learned very early in my software engineering career was C, and Go felt like C with many modern additions.
@Kats0unam1 3 months ago
Go is the way indeed.
@markxavior 3 months ago
@Kats0unam1 Rust and Zig are coming
@Kats0unam1 3 months ago
@markxavior I like Zig too.
@artursvancans9702 3 months ago
Any reason why HTTP requests and not WebSockets? Also, why a Python consumer and not Golang if you already have a Golang ingress service in the front?
@truthprevails899 9 days ago
@ByteByeGo 1:49 If the emojis are written in batches, then the order will be lost when showing them, right? I.e. the emojis won't be displayed in the same order they were received, right?
@akshaykumar-yk2ri 1 month ago
The topic of this video was asked to me as an HLD interview question
@sampath2054 3 months ago
What is the use of Amazon S3 here? Are we storing the emojis?
@Su_Has 3 months ago
Why can't we just use Kafka itself in the last pub/sub part? How is MQTT different?
@soumalya 3 months ago
I want a video on how Hotstar handles massive live viewership on their platform
@girishhalemani5947 3 months ago
Has this been added to the ByteByteGo weekly newsletter?
@pavelpikat8950 3 months ago
Why run different vendors for messaging? If you already use and operate Kafka, why introduce MQTT as a separate pub/sub?
@JvK20008 3 months ago
Probably because it's designed for low-bandwidth, low-processing-power devices. It's fine to have a Go service pick up an HTTP POST, because that's on infrastructure they control, but for sending sentiments back to unknown devices, it's probably a little more reliable
@HarshalRinge 3 months ago
MQTT is used here as a replacement for WebSockets
@AvinashRaj 3 months ago
How will you create the topics in Kafka?
@chigozie123 3 months ago
A message queue would make more sense than Kafka. I would also get rid of Spark and the last Kafka.
@shadabbahadara 3 months ago
Which software can we use to make such presentations?
@HarishIndia123 3 months ago
Awesome
@LawZist 3 months ago
When is vol. 3 coming out?
@_man.on.strings__ 3 months ago
Create a video on how Hotstar works and handles loads of requests and viewers with no lag
@pennyether8433 3 months ago
Assuming there are 1M emojis submitted per second, is Spark getting sent this many updates? It seems like a rather simple problem, given that the number of emojis is extremely limited. The main issue is handling the concurrency of millions of users, but beyond that, aggregation is simple. I don't see why there is so much plumbing just to do a simple aggregation, especially one that is restricted to a window of time. You have to load balance anyway, so why not have each load balancing node do a simple aggregation across 100ms windows, then send its result (e.g. a simple JSON of {timestart: , timeend: , smile: 1500, anger: 1000, sad: 500, ... }) to a single main aggregator capable of handling N requests per 100ms (where N is the number of load balancing nodes)? From there it can send results to consumers, and also write to a persistent database... any simple DB could handle 10 (small) writes per second. I don't understand why it has to be as complicated as this is.
@jyotiprakash6932 3 months ago
You might have to replicate the batch processing layer in every node (the application layer under the LB); why not do it centrally instead? There might be more metadata associated with each emoji, which you could aggregate in a central layer.
@atharvbhadange6871 3 months ago
Well, in my opinion, giving the LB additional tasks apart from routing requests can hamper its throughput.
@pennyether8433 3 months ago
@atharvbhadange6871 Disagree. The additional overhead is negligible: incrementing a count in a hashmap (of size 20 or so), and shooting out a single message every 100ms. Even if you have to add another 10% more LB nodes, which would be among the absolute worst cases here, it still saves you from the rest of the inflated architecture that was described.
@DavidHarned1 3 months ago
Why wouldn't they just semi-fake it? It's not like they're going to render millions of emojis client-side per second. It's not mission critical to make sure no emojis are missed server side. Just make it so that client side it always animates the emoji you've sent, so that it *feels* like it actually sent. Then limit the backend to only accept a maximum number of requests per second.
@pennyether8433 3 months ago
@DavidHarned1 Excellent point! A relatively small random sampling would be sufficient.
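The per-node aggregation proposed in this thread can be sketched in Python as follows. This is a rough illustration of the commenter's idea, not Hotstar's implementation: the names `NodeAggregator`, `window_start`, and `merge_summaries` are hypothetical, and the summary layout (a nested `counts` field alongside `timestart` and `timeend`) is an assumption loosely modelled on the JSON in the comment. Each ingest node counts emojis per 100ms tumbling window in a small hashmap, flushes closed windows as summaries, and a central aggregator sums the summaries that share a window.

```python
from collections import Counter, defaultdict

WINDOW_MS = 100  # window length taken from the comment above


def window_start(timestamp_ms, window_ms=WINDOW_MS):
    """Align a timestamp to the start of its tumbling window."""
    return timestamp_ms - (timestamp_ms % window_ms)


class NodeAggregator:
    """Runs on each ingest node: counts emojis per window in a small hashmap."""

    def __init__(self):
        self.windows = defaultdict(Counter)

    def record(self, timestamp_ms, emoji):
        self.windows[window_start(timestamp_ms)][emoji] += 1

    def flush(self, now_ms):
        """Return and drop summaries for windows that have fully closed."""
        closed = [w for w in self.windows if w + WINDOW_MS <= now_ms]
        return [
            {"timestart": w, "timeend": w + WINDOW_MS,
             "counts": dict(self.windows.pop(w))}
            for w in sorted(closed)
        ]


def merge_summaries(summaries):
    """Central aggregator: sum per-node summaries that share a window."""
    merged = defaultdict(Counter)
    for s in summaries:
        merged[s["timestart"]].update(s["counts"])
    return {start: dict(counts) for start, counts in merged.items()}
```

In this sketch the central aggregator receives one small message per node per window, so its load depends on the number of nodes rather than the number of viewers. It leaves out concerns the replies raise, such as per-event metadata, late-arriving events, and node failures.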
@cakitomakito3979 3 months ago
How can they reach that performance using Python and not another language? Can someone explain, please? Is there any specific reason to choose Python?
@chansamonejk5379 3 months ago
I like your presentation. I'd like a copy of your slides, please.
@bforbiggy 3 months ago
"System outages are a thing of the past" I like the optimism but... yeah we both know that's not true
@catcoder12 3 months ago
Things are a lot better now, though. I don't remember the last time YouTube was down, even after massively popular live streams or video releases
@venkatamunnangi1287 3 months ago
Why use Spark over Flink?
@varshard0 3 months ago
I think it's because Spark is built for batch processing while Flink is for stream processing. No need to process individual events as a stream. IMO, they are interchangeable in this case, with Spark running micro-batches.
@QckSGaming 3 months ago
@varshard0 Also, it's highly likely they had everything listed here already running, so they retrofitted existing solutions.
@thecastiel69 3 months ago
What is the use case of S3 here? 4:09
@steephengeorge 3 months ago
Data store for sentiment analysis
@zoravursingh5617 3 months ago
They could just capture 1/10th of the emojis and generate 10x similar traffic with some noise added. No one would be able to tell.
@Dr_Larken 3 months ago
Free cookies! And you don't need cache!
@AshuSinghIN 3 months ago
You had me at Kafka, lost me at Spark
@mrluthfianto 3 months ago
Why not just use simple queues and tail the latest 100 emojis to the client, while aggregating the rest in the background?
@airliners321 3 months ago
Editing problem at 0:38?
@MAtukulis 3 months ago
Same at 0:00: it says "day" instead of "today"
@saffanalvy 3 months ago
Dude! Why are you showing the picture of my Bangladesh cricket captain getting stumped by the Sri Lankan wicketkeeper? Lol. This is awkward. Nice video btw. A question though: what will happen if the Golang API service is replaced by a Rust API service? I know Rust is fast, but how will it affect the performance of the entire pipeline? I'm assuming it'll increase. What's your opinion on this?
@npcs 3 months ago
I don't understand why they're using both Kafka and pub/sub here
@buzzprime93 3 months ago
Make one on how Instagram handles uploading millions of photos
@sharinganuser1539 3 months ago
I felt there were a lot of over-engineered aspects, but at a low level there can be nuances that explain the architecture... but then we are just talking about a high volume of emojis, so idk.
@mohali4338 3 months ago
Seems like the most inefficient design I have seen recently. Just put in Redis and a pub/sub like IoT Core or similar. Kafka, Spark, and EMR are good for analytics only
@ankeshkapil3129 3 months ago
I don't think this is very difficult to do, as you don't need to care about data consistency, and nobody cares even if you lose track of a few emojis
@morthim 3 months ago
I've never heard of Hotstar before. oh god oh dear god why?
@_soundwave_ 3 months ago
Is Kafka free?
@aakarshan4644 3 months ago
It's open source under Apache
@user-gz6ml8ts3f 3 months ago
Free as in? You would obviously need the infra to host it. If you want to self-host it, you should have a team with the expertise to ensure the availability of the cluster. There are cloud-hosted services such as MSK on AWS, and Confluent provides its own set of clusters and solutions, but it's way too costly lol
@dtvind 3 months ago
Yes for local dev, not for production or cloud
@user-gz6ml8ts3f 3 months ago
@dtvind You can self-host a Kafka cluster in any production environment. It's open source. LoL
@vikingthedude 3 months ago
Free as in freedom
@jww0007 3 months ago
Guys, look: you haven't sat down with the problem to think, you haven't discussed it with a team, and you don't know their constraints (time, money, devs). Even if it's over-engineered, you don't know the org structures or incentives that led to that; maybe it's a promotion thing. You're missing too much context. Don't act like you can watch a 5-minute video and critique it, no matter how intuitive your suggestion is.
@user-kw9cu 3 months ago
Golang 💪
@UmeshKumar-iw7bx 3 months ago
How does Reddit work????
@andru5054 3 months ago
they should use lmao-lang
@jazeem10 3 months ago
I think it's a perfect example of how to over-engineer
@skyhappy 3 months ago
Why did they use Python instead of Go? They could use one language for everything
@ricardoamendoeira3800 3 months ago
Conway's law is probably at play here.
@LSC132 3 months ago
@ricardoamendoeira3800 Could you explain a bit more? How does that apply here?
@OmanDev 3 months ago
One simple word: *libraries*
@skyhappy 3 months ago
@OmanDev you're making baseless assumptions
@margs_aviation2153 3 months ago
Probably it's easier to connect to the pub/sub using Python
@MahmoudSeoudiCEO 3 months ago
🥲
@window.location 3 months ago
Is this a joke?
@dietsodalite3716 3 months ago
Why design a Python consumer that publishes to pub/sub?
@Daniel-xi2eb 3 months ago
I have the same problem
@igboman2860 3 months ago
So why not use Go throughout the architecture? Why did they choose Python to read the aggregation topics?