What's ElasticSearch Used For? | Search Indexes | Systems Design Interview 0 to 1 with Ex-Google SWE

17,259 views

Jordan has no life

A day ago

Comments: 34

@slow2steady a year ago

This is like a complete show… you learn and laugh

@adrian333dev 7 months ago

My average experience watching Jordan's content: he starts explaining, and within the first minute of the video he mentions something from a previous video. I pause and start watching that video, he again mentions something from his past content, and this loop goes on until I say enough is enough.

@jordanhasnolife5163 7 months ago

My average answer: start from #1

@adrian333dev 7 months ago

@@jordanhasnolife5163 Saw this coming, but starting a 60-video series after finishing two system design courses on Udemy feels intimidating

@pakkunhatake 6 months ago

same

@shreesharao7261 a month ago

@@jordanhasnolife5163 You should include a quick recap, like TV shows do, so that every video is self-contained. Unless your intention is to boost views across all your videos ;)

@jordanhasnolife5163 a month ago

@@shreesharao7261 haha it depends, there can be a lot to recap

@Unstoppable_gaur a year ago

Great content, I'd like to see more of this kind. I appreciate the effort and dedication you put into making these system design videos; they are really helpful. They made me fall in love with system design, and now I keep reading blogs and looking out for your new videos to learn more.

@jordanhasnolife5163 a year ago

Thanks Joseph, means a lot!

@mayankchhabra3070 12 days ago

If we take the example of building search on top of chats and partition by chat_id, won't that lead to an uneven distribution of data? Elasticsearch tries to distribute data evenly across its shards, but if we explicitly route our data to a specific shard (using chat_id in our example), some shards can end up much larger than others, since one chat might be very active while another is dormant. Just thinking out loud about how we would solve this :P (Probably distribute it evenly using some composite key, but that would defeat the purpose of searching a chat from just one partition.)

@jordanhasnolife5163 8 days ago

Using many small partitions and balancing them appropriately tends to be the preferred approach here, I believe.
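A minimal sketch of that reply, assuming a hypothetical setup: every message is routed by a stable hash of its chat_id into one of many more logical partitions than there are nodes, and partitions are then placed greedily (largest first, onto the least-loaded node). `NUM_PARTITIONS`, `partition_for`, `balance`, and `node_loads` are illustrative names, not Elasticsearch APIs. A hot chat still lives in a single partition, but because partitions are small and numerous, node loads can stay roughly even.

```python
import heapq
import zlib
from collections import Counter

NUM_PARTITIONS = 16  # hypothetical: many more logical partitions than nodes


def partition_for(chat_id: str) -> int:
    """Route every message of a chat to the same partition via a stable hash."""
    return zlib.crc32(chat_id.encode("utf-8")) % NUM_PARTITIONS


def balance(partition_sizes: dict[int, int], nodes: list[str]) -> dict[int, str]:
    """Greedy placement: largest partitions first, each onto the least-loaded node."""
    heap = [(0, node) for node in nodes]  # (current load, node name)
    heapq.heapify(heap)
    assignment: dict[int, str] = {}
    for pid, size in sorted(partition_sizes.items(), key=lambda kv: kv[1], reverse=True):
        load, node = heapq.heappop(heap)  # least-loaded node so far
        assignment[pid] = node
        heapq.heappush(heap, (load + size, node))
    return assignment


def node_loads(partition_sizes: dict[int, int], assignment: dict[int, str]) -> Counter:
    """Total size of the partitions placed on each node."""
    loads: Counter = Counter()
    for pid, node in assignment.items():
        loads[node] += partition_sizes[pid]
    return loads
```

For example, `balance({0: 100, 1: 10, 2: 10, 3: 10}, ["a", "b"])` puts the large (hot-chat) partition alone on one node and the three small ones together on the other. Elasticsearch does accept a custom `routing` value on index and search requests, but how it allocates and rebalances shards is its own logic, not this sketch.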

@andyborch9886 4 months ago

Man, I normally don't laugh at your jokes, but this one actually made me laugh. I think it was mainly your stare at the end of the intro! 😆

@NghiaPham-o7x 11 months ago

Hi Jordan, great job, I've learned a lot from you! One thing I don't understand: you mention that a global index might be inefficient because we might need to send the document to many partitions. Why would we need to send the document to many partitions? What I'm picturing is that when a query comes in, the node handling it gathers the document lists from the indexes, merges them into a set of document ids, and then fetches those documents from their partitions. Or am I missing something?

@jordanhasnolife5163 11 months ago

Hey! I'm saying that when we upload a document, we have to write to multiple partitions. This is because the document has many words in it!

@NghiaPham-o7x 11 months ago

@@jordanhasnolife5163 I see, so you're talking about the write path. Out of curiosity, I'd like to discuss the options a bit more, since with a global index we could take other approaches:

1. Write the same document to multiple partitions. As you said, that makes partitioning meaningless.
2. Save the document in one partition and update the global index on the other partitions with a distributed transaction (e.g. 2PC).

Is there any flaw in the second approach, or could it be used in a real system? It's slow on writes, so it might be bad for a write-heavy system like logging, but I think it would work well for a read-heavy, write-light system. What do you think?

@jordanhasnolife5163 11 months ago

@@NghiaPham-o7x The second approach seems doable, but just consider what happens when we have to write the document to 10 partitions instead of just two haha
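To make the write-path point in this exchange concrete, here is a small sketch of a term-partitioned (global) index, assuming postings are assigned to partitions by hashing the term. `NUM_TERM_PARTITIONS`, `term_partition`, and `write_fanout` are hypothetical names for illustration only. The function groups the postings a single document produces by the partition that owns each term: a document with many distinct words touches many partitions, and every one of them would have to take part in a 2PC-style write.

```python
import zlib

NUM_TERM_PARTITIONS = 8  # hypothetical number of term partitions


def term_partition(term: str) -> int:
    """In a global index, a term's posting list lives on exactly one partition."""
    return zlib.crc32(term.encode("utf-8")) % NUM_TERM_PARTITIONS


def write_fanout(doc_id: int, text: str) -> dict[int, list[tuple[str, int]]]:
    """Group the (term, doc_id) postings of one document by owning partition."""
    writes: dict[int, list[tuple[str, int]]] = {}
    for term in sorted(set(text.lower().split())):  # one posting per distinct term
        writes.setdefault(term_partition(term), []).append((term, doc_id))
    return writes
```

The number of keys in the returned dict is the number of partitions a single upload has to update; for a one-word document it is 1, while a long document can touch nearly all of them.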

@msebrahim-007 3 months ago

I'm not really understanding the difference between using a local index and a global index. It sounds like the reason not to use a global index is that a document may be duplicated to multiple partitions, so a local index is used instead, with a pointer to a document in memory. This is where my confusion lies: a local index doesn't seem to address the issue of the document being duplicated onto multiple partitions; it just references the document locally with a pointer (while the document is potentially still on multiple partitions). If in both cases the document is duplicated to multiple partitions, why not just use a pointer in the global index case? That way no scatter/gather would be required for a particular word.

@jordanhasnolife5163 3 months ago

To be clear, we're denormalizing the documents. It's not a pointer to the document in the local index case, you're actually storing the document itself there. Otherwise I'd agree with you.

@sahilguleria6976 3 months ago

@@jordanhasnolife5163 Can you please explain what denormalizing the documents means here?

@jordanhasnolife5163 3 months ago

@@sahilguleria6976 I'm not just holding a document id in the search index, I'm holding a decent amount of document data in it

@sahilguleria6976 3 months ago

Elasticsearch partitioning section: in partition 1 we have cherry: 47, 39, so this partition holds those documents in memory. Do documents 47 and 39 stay only in partition 1? If so, is that how we prevent duplication? Also, do all the other tokens in 47 and 39 reside in the same partition?

@jordanhasnolife5163 3 months ago

Yes, those documents just stay in that partition, as do the other tokens in 47 and 39. I'm confused about what you mean, though: the same document will always be hashed to the same partition, ideally.
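A minimal sketch of the local (document-partitioned) index described in this thread, under simplifying assumptions: each document is hashed by id to exactly one partition, that partition keeps an inverted index over only its own documents plus the document text itself (the denormalization mentioned above), and a query has to scatter to every partition and gather the results. The class and method names are hypothetical; this is not how Elasticsearch is implemented internally.

```python
import zlib
from collections import defaultdict


class LocalPartition:
    """One partition: an inverted index over its own docs, plus the docs themselves."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)  # term -> local doc ids
        self.docs: dict[int, str] = {}  # doc id -> full (denormalized) document text

    def index(self, doc_id: int, text: str) -> None:
        self.docs[doc_id] = text
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, term: str) -> list[tuple[int, str]]:
        # .get() avoids inserting empty entries into the defaultdict
        return [(d, self.docs[d]) for d in sorted(self.postings.get(term, ()))]


class LocalIndexCluster:
    def __init__(self, num_partitions: int) -> None:
        self.partitions = [LocalPartition() for _ in range(num_partitions)]

    def partition_for(self, doc_id: int) -> int:
        # The same document always hashes to the same partition.
        return zlib.crc32(str(doc_id).encode("utf-8")) % len(self.partitions)

    def index(self, doc_id: int, text: str) -> None:
        # One write to one partition, no matter how many words the document has.
        self.partitions[self.partition_for(doc_id)].index(doc_id, text)

    def search(self, term: str) -> list[tuple[int, str]]:
        # Scatter the query to every partition, then gather and merge the hits.
        hits: list[tuple[int, str]] = []
        for partition in self.partitions:
            hits.extend(partition.search(term))
        return sorted(hits)
```

Indexing document 39 is a single write to a single partition, and its other tokens live alongside it; the price is paid at read time, since a search for "cherry" must ask every partition.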

@theblobinc a month ago

I too have no life, that's probably why I find myself here learning about Elasticsearch...

@GANJIMAN123 4 months ago

Not clear: does Elasticsearch use a local index or a global index?

@jordanhasnolife5163 4 months ago

Local

@ryan-bo2xi a year ago

Great job sir !!

@Summer-qs7rq 10 months ago

Amazing video, thanks for these informative videos. I have a question about Elasticsearch, though. Given that scatter/gather is hard to avoid in Elasticsearch, how much data can it scale to? For example, if I wanted to build search for Twitter, where the data grows at a rapid pace, would it be okay to store all the tweets in Elasticsearch? Or, if we need a retention policy, what happens to searches for tweets that are no longer stored in Elasticsearch? Could you please help answer these questions?

@jordanhasnolife5163 10 months ago

I think that for Twitter, for example, they would index data by timestamp. That way, when you search for something in Elasticsearch, the query mainly hits the indexes for the last couple of days of data. Those indexes hold fewer posts, so there is less to scatter/gather over. You basically just have to be clever about how you shard your data.

@Summer-qs7rq 10 months ago

@@jordanhasnolife5163 With timestamps, are you suggesting we search for the keyword in the latest time index first, and if it's not found, look in the older time indexes? Wouldn't that be more time-consuming?
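One way to picture the time-based indexing idea and this follow-up question, assuming a hypothetical per-day index represented as a dict from date to post texts (`search_time_indexes` is an illustrative name): search the newest index first and stop as soon as enough hits are found. Queries for recent, popular terms are usually satisfied by the first index or two, while a rare term can still force a pass over every index, which is exactly the extra cost the question points at.

```python
from datetime import date


def search_time_indexes(
    indexes: dict[date, list[str]], term: str, limit: int
) -> tuple[list[tuple[date, str]], int]:
    """Search per-day indexes newest first, stopping once `limit` hits are found.

    Returns (hits, number_of_indexes_searched).
    """
    hits: list[tuple[date, str]] = []
    searched = 0
    for day in sorted(indexes, reverse=True):  # newest day first
        searched += 1
        for post in indexes[day]:
            if term in post.lower().split():
                hits.append((day, post))
                if len(hits) >= limit:
                    return hits, searched  # early stop: older indexes never touched
    return hits, searched  # ran out of indexes (e.g. a rare or missing term)
```

The design trade-off is that the common case gets cheaper at the expense of the worst case; a real deployment might also query several recent indexes in parallel rather than strictly one after another.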