I am brand new to DAWs and software like this - these tutorials are excellent and very helpful for getting someone like me up and running. Appreciate it!
@rvian4 1 year ago
Wow, the features this BERT approach provides really improve the explanation of topic models.
@dubey_ji 1 year ago
I found your channel today and, man, I must say thank you. Very good content.
@python-programming 1 year ago
Thanks so much!! =)
@xevenau 1 year ago
Do you happen to have a tutorial that explains how to turn articles into a dataset for topic modeling? Thanks!
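For anyone wondering the same thing, a minimal sketch of one way to turn a folder of article text files into the list of document strings a topic model expects (the folder layout and file names here are hypothetical):

    from pathlib import Path

    # Hypothetical layout: one plain-text article per file in ./articles
    article_dir = Path("articles")
    docs = [p.read_text(encoding="utf-8") for p in sorted(article_dir.glob("*.txt"))]

    # BERTopic and similar libraries just need a list of document strings
    print(len(docs), "articles loaded")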
@sarasharick5209 2 years ago
Great video. I experimented with Top2Vec after that video, so looking forward to experimenting with BERTopic too.
@mmishrafaculty 1 year ago
Awesome. That was so informative. And explained so clearly. Thank you so much.
@python-programming 1 year ago
Thanks so much! I am planning a new video on BERTopic soon to cover its new features.
@bentobenack2 2 years ago
This is incredible. I subscribed to your channel today while looking for topic modeling content and found very good material. However, I also wanted to find something on BERTopic, and a few minutes after subscribing I received a notification from YouTube for your channel, and I said, it can't be true! Thanks a lot!
@python-programming 2 years ago
Haha! That is so perfect! Hope this video helps!!
@bentobenack2 2 years ago
@@python-programming Definitely helped!
@danieleriahe-him4693 1 year ago
Thanks so much for the high quality content you have published so far; your playlists are a gold mine for beginners and enthusiasts in the AI field. Have you ever considered making a video explaining the principles of creating an efficient dataset for text summarization, or for other specific tasks? Many thanks in advance for your consideration!
@wasgeht2409 2 years ago
Two questions :) 1) Could I write a sentence and have the model give me, after training, the probability for each topic? 2) Could I use, for example, customer requests for training? In this case you would be using unstructured data. I hope you understand my questions :D
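A rough sketch of what both questions describe, assuming the standard BERTopic API; the customer-request strings are made up, and a real corpus would need many more documents for the clustering to work:

    from bertopic import BERTopic

    # 2) Unstructured text such as customer requests can be used directly as training documents
    requests = [
        "My order arrived damaged, what do I do?",
        "How do I reset my password?",
        # ... many more real requests needed in practice
    ]
    topic_model = BERTopic(calculate_probabilities=True)
    topics, probs = topic_model.fit_transform(requests)

    # 1) After training, a new sentence gets a topic plus its probability distribution
    new_topics, new_probs = topic_model.transform(["Where is my refund?"])
    print(new_topics, new_probs)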
@hankzhong 2 years ago
Great intro, but the default produces too many topics to be useful for human understanding. Is there a way to reduce the number of topics naturally? Also, can we measure the perplexity and coherence of these topics like with LDA? Thanks
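On the first question, a hedged sketch: BERTopic has an nr_topics setting and a reduce_topics step that merge similar topics after fitting (exact signatures vary a little between versions, so check the docs). On the second, coherence can be computed from the top words per topic with external tools such as gensim's CoherenceModel, while perplexity is specific to probabilistic models like LDA and does not carry over directly. Here docs is an assumed list of document strings:

    from bertopic import BERTopic

    # Option A: let BERTopic merge similar topics automatically
    topic_model = BERTopic(nr_topics="auto")
    topics, probs = topic_model.fit_transform(docs)

    # Option B: fit first, then reduce to a fixed number (recent-version signature)
    topic_model = BERTopic()
    topics, probs = topic_model.fit_transform(docs)
    topic_model.reduce_topics(docs, nr_topics=20)
    print(topic_model.get_topic_info())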
@DoreenGyamfi-i7k 1 year ago
This was so informative, thank you.
@python-programming 1 year ago
I am so glad it was helpful!
@raziehfadaei4801 10 months ago
Thank you for your good video. Does BERTopic need any preprocessing like lemmatization or tokenization, as LDA does?
@TheArnold2002 2 years ago
Best video on topic modeling I've seen so far. Can I get all documents related to a topic, instead of just the top 3?
@python-programming 2 years ago
Thanks! Indeed you can. BERTopic has changed a bit since I made this video, so I will have to check the docs, but I am certain you can.
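A small sketch of one way to do that with the topic assignments fit_transform already returns; newer releases also have a get_document_info() helper, but check the docs for your version. docs and topic_model are assumed to exist already:

    topics, probs = topic_model.fit_transform(docs)

    # Collect every document assigned to a given topic id (e.g. topic 5)
    topic_id = 5
    docs_in_topic = [doc for doc, t in zip(docs, topics) if t == topic_id]
    print(len(docs_in_topic), "documents in topic", topic_id)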
@andreasheiner3426 2 years ago
Thanks, great tutorial. A question: what's your experience with the quality of the model versus sentence length? Short sentences don't really work (too little semantics), and long ones won't work either (too "much" semantics). Thoughts?
@python-programming 2 years ago
Thanks! And great question. If you are looking for an off-the-shelf solution, try Top2Vec, but I think you may run into similar issues. What language are your docs? Also, how varied are they in size? A more custom solution may be necessary.
@andreasheiner3426 2 years ago
@@python-programming I have standard English websites, from product reviews to travel reports. Generally a page contains some 10 paragraphs. Content on a page is highly correlated (as you'd expect), so the page content is defined by a few paragraphs. The topic of a paragraph is mostly in a single sentence; the rest is "glue". This turns out to be a reasonable assumption (eyeballing). BERTopic supports these observations, especially if you remove paragraphs whose probability for the most dominant topic is below some cutoff (say 0.6; the reasoning being that, worst case, another topic is present with probability at most 0.4). From experience you're left with 3% unallocated documents, and each allocated document has at most 3 topics. This is all nice, assuming BERTopic gives good results for both long and short paragraphs with the same hyperparameters. If my assumption is incorrect I have a problem :( So, thoughts?
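For readers who want to reproduce the cutoff described above, a rough sketch using calculate_probabilities=True; the 0.6 threshold is the commenter's, and the paragraphs variable is an assumed list of paragraph strings:

    from bertopic import BERTopic

    topic_model = BERTopic(calculate_probabilities=True)
    topics, probs = topic_model.fit_transform(paragraphs)  # probs has shape (n_docs, n_topics)

    # Keep only paragraphs whose dominant topic reaches the cutoff
    cutoff = 0.6
    keep = probs.max(axis=1) >= cutoff
    kept_paragraphs = [p for p, k in zip(paragraphs, keep) if k]
    print(f"kept {keep.sum()} of {len(paragraphs)} paragraphs")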
@kennethgomes4727 1 year ago
Please can you explain why you didn't use UMAP, HDBSCAN and c-TF-IDF for this?
@python-programming 1 year ago
Thanks for the question! You absolutely can. I have a whole other tutorial that walks through each of those steps. I think BERTopic, LeetTopic (my library), and Top2Vec provide a simpler solution for those who may not be familiar with a custom UMAP/HDBSCAN workflow. I try to make tutorials for users at all levels, and I think these other libraries address the needs of those newer to Python/ML.
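For anyone who does want control over those pieces, BERTopic accepts custom UMAP, HDBSCAN, and vectorizer components; a hedged sketch with illustrative hyperparameters (docs is an assumed list of document strings):

    from bertopic import BERTopic
    from umap import UMAP
    from hdbscan import HDBSCAN
    from sklearn.feature_extraction.text import CountVectorizer

    umap_model = UMAP(n_neighbors=15, n_components=5, min_dist=0.0, metric="cosine")
    hdbscan_model = HDBSCAN(min_cluster_size=15, metric="euclidean", prediction_data=True)
    vectorizer_model = CountVectorizer(stop_words="english")  # feeds the c-TF-IDF step

    topic_model = BERTopic(
        umap_model=umap_model,
        hdbscan_model=hdbscan_model,
        vectorizer_model=vectorizer_model,
    )
    topics, probs = topic_model.fit_transform(docs)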
@somewhereovertherainbow9550 5 months ago
Thanks!!! Very helpful!
@hosseinahmadi1855 1 year ago
Greeeeeeeeat! Thanks. Another useful video.
@grgr1467 2 months ago
Hi! Where can I find the source file you used?
@suhasp2385 2 years ago
I just put in the code and it works! Thanks!
@yashjain2841 3 months ago
How do I run it on a dataset with more than 12k rows? It is showing a "correct_alternative_cosine" error. Please help.
@KR-good 11 months ago
Great presentation.
@johnny_silverhand 2 years ago
Fantastic explanation
@luiztauffer8513 2 years ago
Thanks for the amazing content! Do you know if BERTopic could be used to train a model to identify similarity to custom, pre-defined topics?
@python-programming 2 years ago
Thanks! I would not use BERTopic for that, but rather spaCy for text classification. You could use BERTopic to gather data for easy labeling.
@luiztauffer8513 2 years ago
@@python-programming Thanks a lot! I actually went on to search for it and found another one of your videos explaining EXACTLY what I wanted. For reference, it's this one: "The EASIEST! way to do Text Classification with spaCy and Classy Classification". Thanks again!
@python-programming 2 years ago
@@luiztauffer8513 haha! Perfect! No problem!
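As an alternative to the spaCy route discussed above, a library-agnostic sketch of scoring similarity against pre-defined topics with sentence-transformers; the topic descriptions, model choice, and example text are all illustrative:

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("all-MiniLM-L6-v2")

    # Hypothetical pre-defined topics, each described in a short phrase
    topic_labels = {
        "billing": "invoices, payments, and refunds",
        "technical support": "bugs, crashes, and error messages",
    }
    topic_emb = model.encode(list(topic_labels.values()), convert_to_tensor=True)

    doc = "The app keeps crashing when I open the settings page"
    doc_emb = model.encode(doc, convert_to_tensor=True)

    # Cosine similarity between the document and each topic description
    scores = util.cos_sim(doc_emb, topic_emb)[0]
    best = scores.argmax().item()
    print(list(topic_labels)[best], float(scores[best]))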
@sohinisarkar1935 3 months ago
Is it possible to define the number of topics here?
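A short, hedged sketch of how this is usually done: pass nr_topics when constructing the model (BERTopic then merges similar topics down toward that number after clustering), or call reduce_topics after fitting as shown earlier. docs is an assumed list of document strings:

    from bertopic import BERTopic

    # Merge similar topics down to roughly 20 after clustering
    topic_model = BERTopic(nr_topics=20)
    topics, probs = topic_model.fit_transform(docs)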
@BillVoisine 6 months ago
Thank you!!
@mrtn5882 2 years ago
Nice tutorial, thank you! If I follow the video correctly, about 25% of your documents are marked as outliers. Is that normal? Can you maybe talk about this a bit in a future video?
@python-programming 2 years ago
Yeah, that is fairly normal with BERTopic. I plan to do another video that compares different topic modeling approaches, and that will be a key feature.
@mrtn5882 2 years ago
@@python-programming Great, I’m looking forward to that video! 😊
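For anyone hitting the same thing: topic -1 is BERTopic's outlier bucket, and newer releases add a reduce_outliers helper that reassigns those documents to nearby topics. A hedged sketch (docs and topic_model assumed, and the exact API may differ by version):

    topics, probs = topic_model.fit_transform(docs)

    # Topic -1 collects the outliers
    n_outliers = sum(1 for t in topics if t == -1)
    print(f"{n_outliers / len(docs):.0%} of documents are outliers")

    # Newer BERTopic versions can reassign outliers to the nearest topic
    new_topics = topic_model.reduce_outliers(docs, topics)
    topic_model.update_topics(docs, topics=new_topics)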
@emekaobiefuna4509 1 year ago
Great info!
@wasgeht2409 2 years ago
Wow
@johnny_silverhand 2 years ago
Best topic model to use for modelling 3,000 documents, each having 3 pages of text?
@adambenari3944 2 years ago
BERTopic or Top2Vec will both work, but you'll need to reduce your corpus to shorter texts. You can use an introduction or conclusion as your text, or perform some summarization before you start modelling.
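A minimal sketch of the "shorter text" suggestion: split each long document into paragraphs and model those instead. long_docs and topic_model are assumed, and the paragraph-splitting rule is purely illustrative:

    # Split each ~3-page document into paragraphs and model the paragraphs
    paragraphs, doc_ids = [], []
    for i, doc in enumerate(long_docs):
        for para in doc.split("\n\n"):
            if len(para.split()) > 20:          # skip tiny fragments
                paragraphs.append(para.strip())
                doc_ids.append(i)               # remember which document each paragraph came from

    topics, probs = topic_model.fit_transform(paragraphs)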
@tantzer6113 2 years ago
Does this work for Arabic documents?
@python-programming 2 years ago
As long as there is a BERT model for Arabic, yes. I know there is an NEH-funded project for this, but I am not sure if it is available yet. There is a lot of research in Arabic NLP, so I would be surprised if another does not already exist. I do not know Arabic, though, so I cannot validate the results.
@tantzer6113 2 years ago
@@python-programming Thank you for answering.
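A hedged sketch of the multilingual route: BERTopic can be pointed at a multilingual sentence-transformer, or at any Arabic-capable embedding model. The model name below is one common multilingual option, not something validated on Arabic here; arabic_docs is an assumed list of document strings:

    from bertopic import BERTopic

    # Either use the built-in multilingual setting...
    topic_model = BERTopic(language="multilingual")

    # ...or pass an embedding model that covers Arabic explicitly
    topic_model = BERTopic(embedding_model="paraphrase-multilingual-MiniLM-L12-v2")

    topics, probs = topic_model.fit_transform(arabic_docs)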
@flosrv3194 8 months ago
No way to install this shit; errors pop up from everywhere, and when I resolve them, three others appear. Unusable crap.
@LearnProfessional1 2 years ago
Is this ASMR?
@niflag 2 months ago
So quiet
@olucasharp 1 year ago
Commenting to say thanks and to support this absolutely awesome channel 🪩 Huge thanks, and this is sooo clearly explained. Good luck ⚡
@python-programming 1 year ago
Thank you so much for your support and this wonderful comment!