Check out the textbook for this series: ner.pythonhumanities.com/intro.html
Playlist on NER: kzbin.info/aero/PL2VXyKi-KpYs1bSnT8bfMFyGS-wMcjesM
Playlist on spaCy: kzbin.info/aero/PL2VXyKi-KpYvuOdPwXR-FZfmZ0hjoNSUo
@shivadhanush3131 · 3 years ago
I can see the patience you have in trying to make everyone understand bit by bit. Great, much appreciated.
@python-programming · 3 years ago
Thanks!
@kiranmore29 · 1 year ago
The way you teach is really good. I have been following the videos from the beginning, but could you please share the training data you used in this video (camp_training_data.json)? It's not in the given Git repo either. (Please reply ASAP.)
@wiktoria5162 · 9 months ago
Hi @kiranmore29, did you get it?
@jeremyzhang8603 · 2 years ago
Sir, this saved me a lot of time! Great videos, thanks!
@python-programming · 2 years ago
Awesome!! So happy it helped!
@kehteraho · 3 years ago
Hi, thanks a lot for this series; I really appreciate the effort you are making. Could you please share some more insight into how the annotations were done, and whether there is any automated or smarter way of doing them?
@maliha_abroad · 2 years ago
Do you have a git repository where you share the datasets?
@afroman1611 · 2 months ago
If you no longer have the file, could you please share or tell us the data from which you created the training data?
@mandaravh7541 · 3 years ago
Can you please share how to train the model in spaCy version 3? This code doesn't work with v3.
@python-programming · 3 years ago
Absolutely! It is in the works!
@asfandkhan6206 · 8 months ago
Could you share your JSON file (camp_training_data.json) with us?
@esooghazy · 3 years ago
Thanks for your great videos! :) I just have a request. Could you please make a video on the changes to custom training, specifically the training code in v3? I can see you wrote in the comments two months ago that it is in the works, but it would be much appreciated if you could just tell us how to fix the nlp.update error, because when I follow the documentation and use Example and so on, it doesn't work. I also watched all the new videos, but I wasn't able to find the solution in any of them. Thanks again! :)
@python-programming · 3 years ago
Hi! Indeed. I already have a few videos on this subject; check out the bottom of this playlist: kzbin.info/aero/PL2VXyKi-KpYvuOdPwXR-FZfmZ0hjoNSUo
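For the nlp.update question above, here is a minimal sketch of a spaCy v3 training loop, assuming spaCy 3.x is installed. The two sentences are made-up toy data (not the video's camp_training_data.json), and `train_ner` is a hypothetical helper name. The key v3 changes are that `nlp.add_pipe("ner")` replaces the v2 `create_pipe` pattern and that `nlp.update` takes a batch of `Example` objects instead of separate text and annotation lists.

```python
import random

import spacy
from spacy.training import Example
from spacy.util import fix_random_seed, minibatch

# Hypothetical toy data in spaCy's (text, {"entities": [(start, end, label)]})
# format; offsets are character positions with an exclusive end.
TRAIN_DATA = [
    ("He was deported to Auschwitz in 1944.", {"entities": [(19, 28, "CONC_CAMP")]}),
    ("She survived Dachau.", {"entities": [(13, 19, "CONC_CAMP")]}),
]


def train_ner(train_data, label, n_iter=20, seed=0):
    """Train a blank English pipeline with a single NER label (spaCy v3 API)."""
    fix_random_seed(seed)  # seeds Python's random module and the model backend
    nlp = spacy.blank("en")
    # v3: add_pipe takes the registered factory name and returns the component,
    # replacing v2's nlp.create_pipe("ner") followed by nlp.add_pipe(ner).
    ner = nlp.add_pipe("ner")
    ner.add_label(label)
    # Each (text, annotations) pair becomes an Example that pairs an
    # unannotated Doc with the gold entity offsets.
    examples = [Example.from_dict(nlp.make_doc(text), ann) for text, ann in train_data]
    optimizer = nlp.initialize(lambda: examples)
    for _ in range(n_iter):
        random.shuffle(examples)
        losses = {}
        for batch in minibatch(examples, size=2):
            # v3: update takes a batch of Examples, not ([text], [annotations]).
            nlp.update(batch, drop=0.2, sgd=optimizer, losses=losses)
    return nlp


if __name__ == "__main__":
    trained = train_ner(TRAIN_DATA, "CONC_CAMP")
    doc = trained("They were sent to Dachau.")
    print([(ent.text, ent.label_) for ent in doc.ents])
```

With only two examples the predictions are not meaningful; the sketch shows the API shape, not a usable model.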
@StefanoBoyanov · 11 months ago
@python-programming Could you share your JSON file (camp_training_data.json) with us? We cannot go any further in the video without it unless we already have prepared data beforehand, which is not my case. Thank you in advance.
@python-programming · 11 months ago
Oh no! I will try and find that file. I'm not sure why it's not in the repository. Sorry about that.
@hichembouricha6328 · 3 years ago
The code doesn't work anymore with create_pipe and add_label. Can you show us an alternative, please?
@python-programming · 3 years ago
Thanks! I made a new video for spaCy 3, but I forgot to change these titles to spaCy 2.
@hichembouricha6328 · 3 years ago
@@python-programming OK, thanks a lot 🙏 I will check them later.
@parthrangarajan3241 · 3 years ago
Is there a way to annotate custom data other than manually performing such a cumbersome task?
@python-programming · 3 years ago
Great question. In other ML applications, you have a lot of liberty with synthetic data or data augmentation to generate a dataset or rapidly expand an existing small one. Unfortunately, text raises a lot of issues with both, because it is difficult to produce good synthetic data and data augmentation methods are language- and domain-specific.
@pablovitale6058 · 3 years ago
Hi, great series, thank you! What are the usual good practices for NER with custom entities? Do I need to create a new NER pipe from scratch and add my custom labels, or do I only need to add custom labels to a spaCy pretrained model?
@python-programming · 3 years ago
This is a great question and one I get a lot. I will make a video explaining this in more detail, but the short answer is that training a custom model is better than adding labels to a pretrained model. If you try to train new labels into an existing model, you will experience catastrophic forgetting, where the pretrained model quickly forgets its earlier training.
@michaelmohen1500 · 8 months ago
Love these videos! Any chance you could send me the camp_training_data.json file? I'd really like to finish this series of videos.
@aniketchatterjee2440 · 3 years ago
Hi, I have my training data in ".spacy" format. How do I load it and train on it? Thank you.
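A minimal sketch of reading a .spacy file, assuming spaCy 3.x: a .spacy file is a serialized `DocBin`, so it can be loaded with `DocBin().from_disk` and each stored Doc wrapped in an `Example` for a training loop. The helper name `load_examples` is hypothetical. For config-based training, the more usual route is to pass the files to the CLI instead, e.g. `python -m spacy train config.cfg --paths.train ./train.spacy --paths.dev ./dev.spacy`.

```python
import spacy
from spacy.tokens import DocBin
from spacy.training import Example


def load_examples(path, nlp):
    """Read a .spacy file (a serialized DocBin) and wrap each Doc in an Example."""
    doc_bin = DocBin().from_disk(path)
    examples = []
    for reference in doc_bin.get_docs(nlp.vocab):
        # The stored Doc carries the gold annotations; the predicted side is a
        # fresh, unannotated Doc built from the same text.
        predicted = nlp.make_doc(reference.text)
        examples.append(Example(predicted, reference))
    return examples


if __name__ == "__main__":
    nlp = spacy.blank("en")
    # Hypothetical file name; point this at your own training data.
    train_examples = load_examples("train.spacy", nlp)
    print(len(train_examples), "examples loaded")
```

The resulting list can be fed to `nlp.initialize(lambda: examples)` and `nlp.update(batch, ...)` in the same way as examples built from (text, annotations) pairs.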
@vinsmokearifka · 3 years ago
Prof, if I train a custom NER model, is dep_ included? I mean, is there no need to customize dep_ as well? Thank you.
@raymondforce138 · 1 year ago
Hi, when you train the new NER component, is it initialized based on the transformer or tok2vec? When I train the NER on its own and append it to the new pipeline, the accuracy is horrible. I see how you do this with internal training, but how do you do it with the config system?
@python-programming · 1 year ago
This is because you need to make sure the vectors of your NER model align with the vectors of the rest of the spaCy pipeline. In the config file, you can point the vectors to the same pipeline you are using to ensure they align.
@dec13666 · 3 years ago
So by doing this, won't spaCy's "default" model "forget" the previous entities (e.g., 'DATE', 'PERSON'), and start "labeling everything as my own customized entity 'CONC_CAMP'"?
@python-programming · 3 years ago
Indeed it will.
@dec13666 · 3 years ago
@@python-programming Wow! That was a quick response! Lol. Seriously, though: what I am doing right now is quite similar to what you did in this video (in my case, my model tags only "JOB_SKILLS"), and it seemed very straightforward; however, so far it is labeling pretty much every single word as a "JOB_SKILL". I saw at the end of your video that you just "pulled a magic, better model out of your sleeve" and boom! It worked... But I'd love to know HOW we could get from our first model version to that "best" model. I have already increased the size of my dataset and made sure the number of words is balanced (as that also seems to affect the performance of your model), so those points are already checked. I was reading some blogs on the web about how I could improve my model (e.g., www.machinelearningplus.com/nlp/training-custom-ner-model-in-spacy/), and what I have learnt is that spaCy apparently forgets older entities, which MIGHT cause what's going on in my case. In that blog, the author "generates some other labeled examples with other entities and ADDS them to your own customized dataset", then trains and tests, and shows quite decent results. Sadly, however, that and other blogs are written for spaCy v2.0, and I don't know how to do something similar in spaCy v3.0. Is there any video in your NLP series worth considering, or some other material to keep in mind? Thank you very much, and keep up the innovative material (and the quick responses! ;) ).
@python-programming · 3 years ago
Have you seen my free textbook? It answers all these questions from beginning to end, and most of it is now in spaCy 3: Ner.pythonhumanities.com
@dec13666 · 3 years ago
@@python-programming Thank you very much Dr. Mattingly, I'll make sure to check it 😀👍.
@python-programming · 3 years ago
No problem! Sorry for my short reply. I am currently cleaning a pool. =)
@FRUXT · 2 years ago
I don't understand what we are training here. We give a list of the camps, so why do we need to train? It just has to spot the names in the text to return the entities...
@python-programming · 2 years ago
A list will not work with variant spellings, typos, poor OCR, etc. An ML model can account for those, especially a BERT model or other models that leverage subword embeddings.
@FRUXT · 2 years ago
@@python-programming Ah, OK, I understand. It will catch different typos, but will it also catch, for example, other camps that were not in the training data, based on the context?
@python-programming · 2 years ago
Precisely! This is also really useful because people sometimes use unique names for camps in testimonies that are not in any list.
@FRUXT · 2 years ago
@@python-programming Thanks for your answer ^^ At my school I have a project on this kind of problem. The thing is, I don't understand how you create your training set; it seems like a hard task, doesn't it?
@python-programming · 2 years ago
@@FRUXT It can be. A good way to start is to come up with a set of rules to autogenerate a dataset. Use that to train a first model. Next, load that model into something like Prodigy from the spaCy team and annotate a gold-standard dataset.
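Here is a minimal sketch of the rule-based first step described above, in plain Python with no spaCy required: match a seed list of names with a regular expression and emit spaCy-style training pairs. The name list, the `CONC_CAMP` label, and the `autolabel` helper are hypothetical. Exact matching has the same limitation discussed earlier (it misses variant spellings and typos), which is why its output is only a bootstrap for training a first model whose predictions are then corrected by hand.

```python
import re

# Hypothetical seed list; in practice this would come from a curated gazetteer.
CAMPS = ["Auschwitz", "Dachau", "Bergen-Belsen"]


def autolabel(texts, names, label="CONC_CAMP"):
    """Turn raw sentences into spaCy-style (text, {"entities": [...]}) pairs
    by matching a list of names with a regular expression."""
    # Sort longest names first so a longer name wins over a shorter prefix.
    alternatives = [re.escape(n) for n in sorted(names, key=len, reverse=True)]
    pattern = re.compile(r"\b(?:" + "|".join(alternatives) + r")\b")
    dataset = []
    for text in texts:
        # Character offsets with an exclusive end, as spaCy's format expects.
        entities = [(m.start(), m.end(), label) for m in pattern.finditer(text)]
        # Sentences with no match are kept as negative examples.
        dataset.append((text, {"entities": entities}))
    return dataset


if __name__ == "__main__":
    for pair in autolabel(["She survived Bergen-Belsen and Dachau."], CAMPS):
        print(pair)
```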
@vinsmokearifka · 3 years ago
Prof, what is the workflow if I'm using a language other than English?
@python-programming · 3 years ago
Same workflow, but if your proper nouns decline, you need to think about that. Check out my video on Classical Latin NER; I go through the whole workflow there. It is helpful even if you do not know Latin.
@vinsmokearifka · 3 years ago
@@python-programming thank you Prof
@python-programming · 3 years ago
No problem!
@brucechang5068 · 1 year ago
Hi, what IDE is this?
@python-programming · 1 year ago
This was back in the good ole' days of Atom, before Microsoft stopped supporting it. I use VS Code now. It is much better than Atom in many ways, but I still miss that IDE.
@brucechang5068 · 1 year ago
@@python-programming Thank you for the reply. I am working on a project that can be converted into a customized NER task. I have no experience in NLP, and the research environment has limited tools to use (luckily we have spaCy). I've been watching your tutorials these days, and they are super helpful. Thank you for working on this!
@python-programming · 1 year ago
@@brucechang5068 I am so happy to hear that! No problem! Glad you are finding them useful! Good luck on your NLP journey!
@ronchristino12 · 3 years ago
Is the code available in a GitHub repo?
@python-programming · 3 years ago
Thanks for reminding me! I just added it to the repo for this series: github.com/wjbmattingly/ner_youtube
Also, keep a lookout, because I am currently preparing the Jupyter Notebooks for this series as well: nbviewer.jupyter.org/github/wjbmattingly/holocaust_ner_lessons/tree/main/
@ronchristino12 · 3 years ago
Also, just one more question: are the start and end indexes of the entities in the training data token-level or character-level?
@python-programming · 3 years ago
@@ronchristino12 Great question! They are the start and end character positions of the entity in the string, so no, they do not use the token indices from the spaCy Doc. I've often wondered why spaCy does it this way, but I suspect it allows for a universal training-data structure across all languages (because not all languages tokenize and index the same way). Does that answer your question?
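To make the character-offset convention concrete: the end offset is exclusive, so `text[start:end]` returns the entity. The plain-Python sketch below uses a hypothetical helper and assumes whitespace tokenization purely for illustration (spaCy's own tokenizer also splits off punctuation); it converts a token-index span into the character offsets the training format expects. In spaCy itself, `Doc.char_span(start, end, label=...)` goes the other way, mapping character offsets onto tokens, and returns `None` when the offsets don't fall on token boundaries.

```python
def token_span_to_char_offsets(text, start_token, end_token):
    """Convert a [start_token, end_token) span over whitespace-separated tokens
    into (start_char, end_char) offsets, with the end exclusive."""
    offsets = []
    pos = 0
    for token in text.split():
        # Search from the end of the previous token so repeated words map correctly.
        start = text.index(token, pos)
        offsets.append((start, start + len(token)))
        pos = start + len(token)
    return offsets[start_token][0], offsets[end_token - 1][1]


if __name__ == "__main__":
    sentence = "He was deported to Auschwitz in 1944."
    start, end = token_span_to_char_offsets(sentence, 4, 5)
    print(start, end, sentence[start:end])
```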
@ronchristino12 · 3 years ago
@@python-programming yeah it does. Thank you so much.
@python-programming · 3 years ago
@@ronchristino12 No problem! If you ever have any other questions, feel free to continue leaving comments. It helps me figure out what I need to include/discuss in more detail in my videos.
@ricardocalleja · 3 years ago
Everything is fine until line 29. I think the problem is in the nlp.update method:
ValueError: too many values to unpack (expected 2)
for text, annotations in TRAIN_DATA:
    nlp.update([text],
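The traceback above is cut off, so the exact cause can't be pinned down from here. One common cause of this particular message (an assumption, not a diagnosis of this code) is that TRAIN_DATA is not a list of two-item [text, annotations] pairs: for example, the JSON file's top level is an object, so iterating yields string keys, or records carry extra fields. The hypothetical helper below checks that structure before training. Separately, under spaCy v3 `nlp.update` no longer accepts `([text], [annotations])` at all and expects a batch of `Example` objects instead.

```python
def check_training_data(records):
    """Return (text, {"entities": [...]}) pairs, raising a readable error for any
    record that would break `for text, annotations in TRAIN_DATA`."""
    if isinstance(records, dict):
        # Iterating a dict yields its keys (strings), and unpacking a string
        # longer than two characters raises "too many values to unpack".
        raise TypeError("Expected a list of [text, annotations] pairs, got a dict")
    pairs = []
    for i, record in enumerate(records):
        if not isinstance(record, (list, tuple)) or len(record) != 2:
            raise ValueError(f"Record {i} is not a [text, annotations] pair: {record!r}")
        text, annotations = record
        if not isinstance(text, str) or not isinstance(annotations, dict):
            raise ValueError(f"Record {i} should be [str, dict]")
        entities = []
        for ent in annotations.get("entities", []):
            if len(ent) != 3:
                raise ValueError(f"Record {i}: entity {ent!r} is not (start, end, label)")
            start, end, label = ent
            # JSON has no tuples, so entities load as lists; normalize them.
            entities.append((int(start), int(end), label))
        # Only the "entities" key is kept in the normalized output.
        pairs.append((text, {"entities": entities}))
    return pairs
```

For example, calling `check_training_data(json.load(f))` on the opened JSON file either returns clean pairs or reports which record is malformed.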