Amazing content as always, dude. Love how in-depth you go in all of your videos! My favorite channel by far! Have recommended this to several friends.
@jordanhasnolife5163 2 years ago
Thanks Snehil!
@Ms452123 2 years ago
Sshhh, man's been hiding the gun show this whole time. Giga Chad on the low
@jordanhasnolife5163 2 years ago
Gotta do it to compensate for my minuscule peen
@cc-to2jn a year ago
dude, you along with neetcode are my go-tos. Great content and clear explanations.
@jordanhasnolife5163 a year ago
I appreciate it!!
@mickeyp1291 7 months ago
as always, great videos - 18:00 forgot all that stuff about ES's caching, so thanks for the reminder, gonna reread that part in the ES docs. Great job knowing about Lucene - most of my applicants have no clue about ES, and definitely not that Lucene is not a DB but a search engine (hate the JSON syntax, but what can you do). Again, super fun to listen to your vids and watch this content
@jordanhasnolife5163 7 months ago
Thanks Mickey!
@shivamsinha642 2 years ago
liked solely for the description
@SwapnilSuhane 3 months ago
great depth on the core search design, discussed with a bit of comedy ;)
@RandomShowerThoughts a year ago
16:00 exactly, I was thinking the same thing. Typically you write to the source of truth and use a queue to send it out to the various locations
@mickeyp1291 7 months ago
these days you'd assume the queue is the source of truth, then spill into S3
@RandomShowerThoughts a year ago
16:00 we can also use debezium (for certain databases), which would write to Kafka, and we'd listen on that topic
@jordanhasnolife5163 a year ago
I'll have to look into this! Haven't had the privilege of using Kafka during my career, so I haven't heard of debezium
@RandomShowerThoughts a year ago
@jordanhasnolife5163 it's pretty cool, I used it at my last company. We used Debezium to capture changes from the database using the WAL; it would then write to a Kafka topic and we could read off it. The one downside is that it writes all the messages into a single topic and a single partition to ensure ordering. So the approach you mentioned of writing directly to Kafka would allow us to write to multiple partitions if needed (allowing more parallelization)
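For readers who haven't seen it, a minimal sketch of applying Debezium-style change events to an index. The envelope fields (`op`, `before`, `after`) follow Debezium's documented event shape, but the plain dict standing in for the search index and all function names are invented for illustration; in the setup described above these events would arrive in order on a single Kafka partition.

```python
import json

# Apply one Debezium-style change event to an in-memory "index".
# "op" codes: "c" = create, "u" = update, "d" = delete.
def apply_change_event(index, raw_event):
    payload = json.loads(raw_event)["payload"]
    op = payload["op"]
    if op in ("c", "u"):
        doc = payload["after"]       # new row image
        index[doc["id"]] = doc
    elif op == "d":
        index.pop(payload["before"]["id"], None)  # old row image

# Hand-built sample events in the same envelope shape
create = json.dumps({"payload": {"op": "c", "before": None,
                                 "after": {"id": 1, "text": "hello"}}})
delete = json.dumps({"payload": {"op": "d", "before": {"id": 1},
                                 "after": None}})
```

Since a single partition preserves WAL order, applying events in consumption order keeps the index consistent with the database.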
@kyabia2333 10 months ago
amazing, very helpful
@yashagarwal8249 6 months ago
Will the Search Service pull the actual documents from the DB once it receives the document IDs from the cache/search index?
@jordanhasnolife5163 6 months ago
Yep!
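A tiny sketch of that two-step read: the inverted index returns doc IDs, and the service then hydrates the full documents from the primary store. All names and sample data are made up for the example.

```python
# Step 1: look up matching doc IDs in the inverted index.
# Step 2: hydrate full documents from the primary document store.
def search(inverted_index, doc_store, term):
    doc_ids = inverted_index.get(term, [])
    return [doc_store[d] for d in doc_ids if d in doc_store]

doc_store = {1: {"id": 1, "text": "giga chad"}, 2: {"id": 2, "text": "chad"}}
inverted_index = {"chad": [1, 2], "giga": [1]}
results = search(inverted_index, doc_store, "giga")
```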
@maxmanzhos8411 6 months ago
Wrote a long comment about how a posting list (the documents containing a term) is implemented as a skip list + encoding, per the apache/lucene GitHub repo's Lucene99PostingsFormat, as I was wondering why we can't use a similar idea for follower/following list storage in the news feed problem (from System Design 2). But it's only viable if you either store the data in Lucene (I guess no one does that with this purpose in mind) or have full control over the DB code so that you can do such advanced customization over a column (also not practical). nice guns
@jordanhasnolife5163 6 months ago
Interesting, I haven't heard of that data structure, but I'd agree it may be an overoptimization. Thanks, I work hard on the guns haha
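For the curious, a toy version of the skip-pointer idea from the comment above: store the sorted posting list as deltas between consecutive doc IDs, plus a small skip table of (absolute ID, position) pairs so membership checks can jump ahead instead of scanning from the start. Lucene's actual Lucene99PostingsFormat is far more elaborate (block-based, bit-packed); this is just the shape of the idea, with all names invented.

```python
SKIP = 2  # record a skip entry every SKIP-th posting

def encode(doc_ids):
    deltas, skips, prev = [], [], 0
    for i, d in enumerate(sorted(doc_ids)):
        deltas.append(d - prev)      # store gap from previous doc id
        prev = d
        if i % SKIP == 0:
            skips.append((d, i))     # (absolute doc id, position in deltas)
    return deltas, skips

def contains(deltas, skips, target):
    """Membership test that uses the skip table to avoid a full scan."""
    if not deltas or target < skips[0][0]:
        return False
    start_val, start_idx = skips[0]
    for val, idx in skips:           # last skip entry <= target
        if val <= target:
            start_val, start_idx = val, idx
    current = start_val
    if current == target:
        return True
    for i in range(start_idx + 1, len(deltas)):
        current += deltas[i]         # decode forward from the skip point
        if current >= target:
            return current == target
    return False

deltas, skips = encode([3, 7, 11, 20, 21])
```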
@idobleicher a year ago
I liked your videos, new sub!
@anupamdey4893 a year ago
Love your content! Keep up the good work!!
@jordanhasnolife5163 a year ago
Thanks Anupam!!
@AmolGautam 6 months ago
Thanks giga bro
@jordanhasnolife5163 6 months ago
np gigachad
@neethielizabethjoseph 3 months ago
Don't we need a parser/lexer service between Kafka and the search index that parses the tweets and hashes them to the correct partitions of the search index?
@jordanhasnolife5163 3 months ago
Something like Elasticsearch will do this for us, hence why I don't explicitly include it.
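A rough sketch of what that implicit step amounts to: analyze (lowercase + tokenize) the tweet text, then route each document to an index partition by hashing its ID. The tokenizer regex, partition count, and all names are arbitrary choices for the example, not what Elasticsearch actually does internally; real ES analyzers are configurable chains of char filters, tokenizers, and token filters.

```python
import re

NUM_PARTITIONS = 4  # illustrative shard count

def analyze(text):
    """Lowercase and tokenize, keeping hashtags/mentions intact."""
    return re.findall(r"[a-z0-9#@]+", text.lower())

def route(doc_id):
    """Pick the index partition for a document by hashing its ID."""
    return hash(doc_id) % NUM_PARTITIONS

tokens = analyze("Giga Chad #gains")
```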
@raj_kundalia 6 months ago
thank you!
@kamalsmusic 2 years ago
If we use the local index (meaning each node stores term -> [doc id's] and multiple nodes can reference the same term), does this mean we need to query all the nodes to answer a search query? How do we know which nodes have the term we are interested in if we are not partitioning by term?
@jordanhasnolife5163 2 years ago
Yes, you have to query them all and aggregate. It's unfortunate, but there's typically too much data to shard by term as opposed to document.
@axings1 10 months ago
@jordanhasnolife5163 could we first partition by term, then further partition into multiple shards if a single term has too much data?
@user-zc7os7on7k 9 months ago
I don't understand most of the things, but thanks for the video.
@jordanhasnolife5163 9 months ago
feel free to elaborate
@FarhanKhan-wu3fq a year ago
Did you really just "NOPQRS" your way to figuring out what comes after P?
@jordanhasnolife5163 a year ago
I am dumb
@neek6327 2 years ago
Hey man, qq. Do you think it would be important in an interview to mention how we know which machine holds which partition? I was thinking we could have a distributed search/index service that maintains the mapping between partition -> machine, and that mapping could be made consistent across the "search/index service" nodes via a consensus algo, or maybe ZK. Does this make sense at all, or am I missing something? Maybe it's the local secondary indexes that take care of the problem I'm describing and I just don't understand 🤷‍♂️
@neek6327 2 years ago
Like, rather than relying on the local index, if we knew which machine held which partition, couldn't we just go directly to the correct shard and perform a binary search?
@jordanhasnolife5163 2 years ago
Yes, you would use ZooKeeper or a gossip protocol to keep track of which docs are held on which partition. Though this shouldn't really matter, since we have to query each partition anyways.
@neek6327 2 years ago
Hmm sorry, maybe this is going over my head. Why is it that we need to query each partition if we know exactly which partition contains the word we're looking for? Like, say someone searches the word "gigachad" and we know that machine 1 holds the partition range with that word in it. Couldn't we go directly to machine 1 and perform a binary search there rather than querying all the shards? Maybe my understanding is off?
@jordanhasnolife5163 2 years ago
@neek6327 We aren't partitioning that way here - we're partitioning by groups of document IDs, not by term. While in theory partitioning by term is optimal, in reality there are often too many document IDs associated with one term to fit on a given machine, so we have no real choice but to use local indexes over a group of documents.
@neek6327 2 years ago
Got it, that makes sense. Thanks 🙏
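The scheme this thread settles on can be sketched in a few lines: each shard keeps a local term -> [doc IDs] index over its own slice of documents (partitioned by doc ID), so a query scatters to every shard and gathers the merged partial results. All class and function names here are illustrative.

```python
from collections import defaultdict

class Shard:
    """One node's local inverted index over its slice of documents."""
    def __init__(self):
        self.index = defaultdict(list)  # term -> doc ids on this shard

    def add(self, doc_id, text):
        for term in set(text.lower().split()):
            self.index[term].append(doc_id)

    def search(self, term):
        return self.index.get(term, [])

def build(docs, num_shards):
    shards = [Shard() for _ in range(num_shards)]
    for doc_id, text in docs.items():
        shards[doc_id % num_shards].add(doc_id, text)  # partition by doc id
    return shards

def search_all(shards, term):
    # Scatter the query to every shard, then gather and merge the results.
    return sorted(d for s in shards for d in s.search(term))

shards = build({1: "giga chad", 2: "chad only", 3: "hello world"}, 2)
```

Note the trade-off the thread describes: every query touches every shard, which is the price of not partitioning by term.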
@RandomShowerThoughts a year ago
Grokking the system design sucks at this question, ngl - searched for a solution right after reading it
@eudaimonian9473 2 years ago
Gigachad42 in da house
@jordanhasnolife5163 2 years ago
I've actually evolved to gigachad43 now
@art4eigen93 2 years ago
Interviewee: API design is going to be pretty tiny
Interviewer: How tiny?
Interviewee: You know....
@jordanhasnolife5163 2 years ago
This guy gets it 😙
@RandomShowerThoughts a year ago
00:40 lmaooooo the "day in my life as a software engineer" videos are cringey af