A Gentle Introduction to Topic Modelling, Latent Semantic Analysis and Latent Dirichlet Allocation

Views: 3,412

Joshgun Sirajzade

Day ago

Comments: 11

@bilmezonlar 2 years ago

One of my favorite topics, thanks.

@joshgun_sirajzade 2 years ago

😂😂 That is great to hear! Tell me, what is your background? I mean, what do/did you study? Are you interested more from a humanities point of view or from a computer science one?

@joeybasile545 7 months ago

Great video. The music is groovy and actually not distracting once I got into the video.

@ozycozy9706 2 years ago

Just wondering whether publishing companies are using this technique for taxonomy.

@joshgun_sirajzade 2 years ago

Well, I guess it depends on the particular publishing company. However, I would definitely say yes in the broader sense, and not only publishing companies but also many others, such as internet companies, law firms and courts. The thing is that the phenomenon that gives us the topics can be used for many other things, like search, clustering and classification. For example, as I say in my video about the document-term matrix, it can be used to identify both similar documents and similar words. Believe me when I say that modern algorithms like BERT, which is used in the Google search engine, leverage the same or a similar phenomenon, although with a different technique (here, deep learning). So the question comes down to which specific algorithm and for which purpose. Presenting topics as words (or terms) in a cloud is just the tip of the iceberg and is more or less well known; however, some may view it critically and prefer a more standard taxonomy like "field names", "disciplines" and so on. You can also find some information in my newest publication: www.springerprofessional.de/deep-mining-covid-19-literature/23606068
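
To make the point about the document-term matrix concrete, here is a minimal sketch in plain Python. The three toy documents and the helper names (`cosine`, `doc_similarity`, `word_similarity`) are made up for illustration and do not come from the video; cosine similarity is just one common choice of measure. Comparing rows of the matrix gives similar documents, and comparing columns gives similar words.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration only.
docs = [
    "topic models find topics in text",
    "search engines rank text documents",
    "topic models cluster text documents",
]

tokenized = [d.split() for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})
counts = [Counter(doc) for doc in tokenized]

# Document-term matrix: one row per document, one column per vocabulary word.
dtm = [[c[w] for w in vocab] for c in counts]


def cosine(a, b):
    """Cosine similarity of two equal-length count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def doc_similarity(i, j):
    """Compare two documents by comparing their rows."""
    return cosine(dtm[i], dtm[j])


def word_similarity(w1, w2):
    """Compare two words by comparing their columns."""
    col1 = [row[vocab.index(w1)] for row in dtm]
    col2 = [row[vocab.index(w2)] for row in dtm]
    return cosine(col1, col2)


if __name__ == "__main__":
    print(doc_similarity(0, 2), doc_similarity(0, 1))
    print(word_similarity("topic", "models"), word_similarity("topic", "search"))
```

In this toy corpus the two documents about topic models come out closer to each other than to the search document, and "topic" and "models" get identical columns because they always occur together.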

@ozycozy9706 2 years ago

@@joshgun_sirajzade Thanks for sharing the link. I really like springprof books. I am a big fan of SVD; as I remember, it is used in NLP too. Need to remember those :)

@ah1548 2 years ago

Ok, it's very, very gentle - fine. But please, for the love of God, don't use background music!

@joshgun_sirajzade 2 years ago

Thank you so much for the feedback! I did not know or consider that the music could be distracting. 🤣 I will try to make the next videos without it...

@carneirouece 2 years ago

:)))

@joeybasile545 7 months ago

I'd say that the order of the words in a document does matter for topic generation... that is, for precision. Sure, we can generalize and do so accurately. We can get the bins that are categories, but precision involves the subsets, yeah? And the order of the words can orchestrate particular meaning. This is how my mind works, at least. Yet, when we are attempting to automate this process, it may seem unnecessary. But I think the machine's inference/categorization capability is of course increased when word order is considered, as, for example, it could point out the incoherence of a document even if it has key words that would place it in a particular topic/multiple topics (depending on what the hell you're doing, I suppose). Please do let me know your thoughts. I'm wanting to learn more about this space.

@joshgun_sirajzade 7 months ago

Thank you for the nice comment! Your thoughts are absolutely correct. I think that even today there is no consensus on a definition of "topics", neither in Computer Science nor in the Humanities (especially in linguistics, for example, where there are much more precise definitions of "words", "sentences" or "text"). The most common idea is that words in a text express topics. The order of words has always been a matter of debate. While it can give a topic a more precise shape (for example, which words you start and end with might be relevant), ignoring it generalizes more and creates fewer topics, which might be handy if someone has to look at the topics, or for inference, as you correctly pointed out. Moreover, the latest algorithms like Transformers (I might create a video about that, too), which are used in chatbots, do indeed make great use of word order, which is encoded there as positional embeddings. That is why, when chatting, they can find the exact topic you are talking to them about. So I guess it would be more beneficial to consider word order. Another point, as you might have already guessed, is that topic modeling is intimately related to other techniques of text mining and language modeling. Considering them all together might make such questions easier to answer. For that, my video about the document-term matrix can be helpful. Thank you again for a great question, and don't hesitate to ask new ones.
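
A small sketch of what ignoring word order means in practice, with two invented example sentences and hypothetical helper names (`bag_of_words`, `bigrams`). A bag-of-words count, which is what a document-term matrix stores, cannot tell the two sentences apart, while counting adjacent word pairs can. Note that bigrams are only a simple stand-in for order-aware features here; they are not the positional embeddings used in Transformers.

```python
from collections import Counter


def bag_of_words(text):
    """Word counts only; the order of the words is discarded."""
    return Counter(text.lower().split())


def bigrams(text):
    """Counts of adjacent word pairs; this keeps local word order."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


# Two sentences with the same words but different meanings.
a = "the dog bites the man"
b = "the man bites the dog"

same_bag = bag_of_words(a) == bag_of_words(b)
same_bigrams = bigrams(a) == bigrams(b)

if __name__ == "__main__":
    print("identical as bags of words:", same_bag)
    print("identical as bigram counts:", same_bigrams)
```

The two sentences are identical as bags of words but differ as bigram counts, which is the trade-off discussed above: dropping order generalizes more, keeping it preserves distinctions in meaning.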