The EASIEST! way to do Text Classification with spaCy and Classy Classification

17,595 views

Python Tutorials for Digital Humanities


Comments: 44
@python-programming 2 years ago
Repo: github.com/wjbmattingly/fewshot-text
@kosemekars 2 years ago
Best text-related ML channel on YouTube
@python-programming 2 years ago
Thank you so much!
@ОлегШенкер-з8ш 5 months ago
Oh, really? Did you manage to see something else?
@wdonno 2 years ago
You are reading my mind! Looking forward to this!
@python-programming 2 years ago
Awesome! I hope you like it!
@VitthalGusinge 2 years ago
I have been searching for the best NER algorithms for my use case for the last two days. Can't wait to see what you have here.
@python-programming 2 years ago
This won't focus on NER, but there is a few-shot NER from the same company called concise_concepts. I have tested it and found it good for some labels and bad for others.
@giantdutchviking 1 year ago
Thanks for making this vid, been learning Python for a bit and this stuff makes Python shine!
@nguyenngochai6245 2 years ago
Thank you very much for sharing! Love it. May I ask whether it would be possible to add more classes to the data? It would be even more awesome if it could be done for non-English language models.
@python-programming 2 years ago
Yes, it is possible to add other classes, and you can use any language model on Hugging Face.
@nguyenngochai6245 2 years ago
@python-programming Thank you for your instant reply! I have successfully tried it with the "ja_core_news_lg" model, but I could not get a satisfactory result out of the Japanese sentence-transformers model. Do you have any tips for choosing an appropriate model?
@python-programming 2 years ago
@nguyenngochai6245 No problem! I will test it out today.
@shahidmahmood7252 2 years ago
Good knowledge, shared wonderfully. Looks like a great module. Now thinking of all the applications in works of English literature. Thanks!
@python-programming 2 years ago
Thanks!
@transflux-us 2 years ago
I was trying to identify "local indicators of climate change impacts" (what changes people observe in their environment... not city people :D) in a database of scientific articles. The results are OK. It's hard, but it might be useful as a pre-scan.
@python-programming 2 years ago
That is really interesting!
@luiztauffer8513 2 years ago
This is gold material, thanks so much for putting this out in such a comprehensive way! @Python Tutorials for Digital Humanities In one of your videos you mentioned you do research in History, is that right? I'm curious to know how people are using text classification methods such as this in History research. Do you have any material you could point me to?
@python-programming 2 years ago
Thanks!! Yes, my background is a PhD in medieval history, but I mostly work with archival material at the Smithsonian and USHMM. A lot of the publications you can find in history with text classification deal with sentiment analysis. You can find articles in Digital Humanities Quarterly and the Oxford Digital Humanities journal.
@Hypothermia1337 2 years ago
Hello Dr. Mattingly, do you know if it's possible to fine-tune a pre-trained model? I'm really not familiar with that, but I need to tweak a model with a few exceptions. Yours sincerely
@python-programming 2 years ago
It is! If you want to fine-tune a language model, that can be done via Gensim or the Transformers library from Hugging Face. If you want to fine-tune NER you will have some problems, namely catastrophic forgetting.
@victordeleon9988 2 years ago
Great video, thanks a lot. Do you recommend any models in Spanish besides those already available in spaCy?
@python-programming 2 years ago
No problem! It depends on what you are trying to do. There are some great BERT models for Spanish. You can find them on Hugging Face's website.
@victordeleon9988 2 years ago
@python-programming Great, thanks a lot, your channel is awesome.
@python-programming 2 years ago
@victordeleon9988 Thanks!!
@ezrakassa3472 2 years ago
Can't wait. Is it multi-class or binary classification, though? I am hoping there will be a multi-class one, since you already did a detailed video on binary classification.
@python-programming 2 years ago
This will be binary, but it works for multi-class just as well. Remember that when you use few-shot classification, you are not doing traditional supervised learning. Instead, you are using the vectors of a support set (not a training set) to automatically identify sentences with similar vectors. The similarities are then scored so that you know how strongly something belongs to a certain category. The more classes you have, the more support samples you need. I recommend using it to get a quick sense of your data and to generate a starting data set quickly, which you can then use to train a new model via supervised learning. This video is meant to serve as my transition into multi-class classification on this channel =), so those videos should be coming out shortly. We will use spaCy (simpler) and Keras (more advanced). Multi-class text classification will also receive a whole chapter in my forthcoming book on spaCy ML.
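Editor's sketch of the idea described in the reply above: a query vector is compared with the support vectors of each class and the similarities are turned into scores. This is a minimal illustration, not the library's actual implementation, which may use a different classifier on top of the embeddings. The three-dimensional vectors and the labels "furniture" and "kitchen" are made-up toy data standing in for real sentence embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def score(query, support):
    """Score a query vector against the centroid of each class's support vectors."""
    sims = {label: cosine(query, centroid(vecs)) for label, vecs in support.items()}
    # Softmax so the per-class scores are positive and sum to 1.
    exps = {label: math.exp(sim) for label, sim in sims.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


# Toy support set: two hypothetical classes with two support vectors each.
support = {
    "furniture": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "kitchen": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}

scores = score([0.85, 0.15, 0.05], support)
best = max(scores, key=scores.get)
print(best, scores)
```

With more classes, each class needs enough support vectors for its centroid to be representative, which is one way to read the advice that more classes call for more support samples.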
@DK-rl1sf 2 years ago
Thank you for this tutorial. I tried saving the trained model using nlp.to_disk('D:/ABC'). But when I load it back using spacy.load('D:/ABC') in a fresh Jupyter Notebook, I get the error "[E002] Can't find factory for 'text_categorizer' for language English (en). This usually happens when spaCy calls `nlp.create_pipe` with a custom component name that's not registered on the current language class. ...". I am still in the same conda environment so I can't be missing dependencies. What is causing this problem?
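Editor's sketch for the question above. spaCy can only rebuild a custom component on load if the factory that created it has been registered in the current Python session, and registration normally happens when the package that defines the factory is imported. A fresh notebook that calls spacy.load without importing that package first would therefore hit E002 even in the same environment. The helper below assumes that importing classy_classification registers the 'text_categorizer' factory named in the error message; the helper name load_pipeline and its factory_modules parameter are hypothetical, and whether the saved component then restores cleanly has not been verified here.

```python
import importlib


def load_pipeline(path, factory_modules=("classy_classification",)):
    """Load a saved spaCy pipeline after importing the packages that register its factories."""
    for module_name in factory_modules:
        # Importing the package runs its @Language.factory registrations.
        importlib.import_module(module_name)
    import spacy  # deferred so that defining this helper needs no dependencies

    return spacy.load(path)
```

In a notebook, the equivalent is to run `import classy_classification` in a cell before `spacy.load('D:/ABC')`.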
@Filipkasic 2 years ago
Is there a way to utilize this model without having to define what the keywords are but simply to provide a list of them without any definition?
@youTanod 2 years ago
Thank you very much for this useful video. This is exactly what I need. I tried it with real data, but I get this warning message, what should I do? UserWarning: The least populated class in y has only 1 members, which is less than n_splits=2.
@python-programming 2 years ago
Can you paste what your support data dictionary looks like?
@youTanod 2 years ago
@python-programming drive.google.com/file/d/1WcXuI2a7x_EvTreG5GWOE3lyj3Y9CAPc/view?usp=sharing
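Editor's sketch for the warning quoted in this thread. The message text matches the one scikit-learn emits when a stratified cross-validation split is requested with more folds than the smallest class has examples, so a likely cause is a class in the support data with a single example. The check below is a plain-Python way to spot such classes before adding the pipe. The function name underpopulated_classes, the default n_splits=2 (taken from the warning), and the sample data are all illustrative assumptions.

```python
def underpopulated_classes(support, n_splits=2):
    """Return {label: example_count} for classes with fewer than n_splits examples."""
    return {
        label: len(examples)
        for label, examples in support.items()
        if len(examples) < n_splits
    }


# Hypothetical support data: "negative" has only one example.
support = {
    "positive": ["I love this", "Great product"],
    "negative": ["Terrible"],
}

too_small = underpopulated_classes(support)
print(too_small)
```

Adding at least one more example to every class listed in the result should make each class large enough for a two-way split.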
@gangs0846 1 year ago
Is this still relevant compared to using GPT for classification?
@python-programming 1 year ago
That is a great question. Yes, though GPT-4 is better at few-shot classification than this approach. I still think this is useful for getting a quick classifier up and running locally to help with annotating.
@gangs0846 1 year ago
@python-programming Thank you, sir.
@CoreyMalcom 2 years ago
This is a really good tutorial, thank you! I have not been able to get it running so far. When I attempt to call "nlp.add_pipe( )" on the text_categorizer, the kernel crashes and restarts. Any clue as to why this would be happening? I have a fresh environment with spaCy and classy_classification newly installed.
@python-programming 2 years ago
Thanks! Hmmm that is odd. What is your OS? Mind DMing me on Twitter with some pics?
@CoreyMalcom 2 years ago
@python-programming Sent. Thanks for looking at this. Will be really helpful.
@python-programming 2 years ago
@CoreyMalcom No problem! I am in the middle of traveling. Will try to respond tomorrow.
@maxwellmandela 2 years ago
great stuff!
@python-programming 2 years ago
Thanks!
@szachynakubie4955 2 years ago
thank you
@trashyAIguy 1 year ago
Cool! I'll use it in my trashy ai to make it less trashy 🤣 to make it understand intentions
@lisagilyarovskaya5593 2 years ago
Thank you very much for this video, I was looking for something exactly like this! I was wondering if there is any way to save the model config to disk once the pipe with support samples has been added. Do you have any ideas on that?