Check out the Textbook for this series: ner.pythonhumanities.com/intro.html
@bruinjoe · 2 years ago
Thank you for these fantastic videos. The generate_rules function needs to be updated for spaCy 3. Comment out these lines:

    ruler = EntityRuler(nlp)
    nlp.add_pipe(ruler)

and replace them with this line:

    ruler = nlp.add_pipe("entity_ruler")
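For reference, here is a minimal spaCy 3 sketch of the whole function with that change applied. It assumes patterns is a list of {"label": ..., "pattern": ...} dicts, the format EntityRuler.add_patterns accepts. The output_dir parameter is a hypothetical addition so the function can write somewhere other than hp_ner; a throwaway folder is used in the usage lines.

```python
import tempfile

import spacy
from spacy.lang.en import English


def generate_rules(patterns, output_dir="hp_ner"):
    """Build a blank English pipeline whose only component is an entity ruler, then save it."""
    nlp = English()
    # spaCy 3: add_pipe takes the registered factory name and returns the new component
    ruler = nlp.add_pipe("entity_ruler")
    ruler.add_patterns(patterns)
    nlp.to_disk(output_dir)
    return nlp


# Usage: save to a temporary folder and load it back like any other pipeline
with tempfile.TemporaryDirectory() as tmp_dir:
    generate_rules([{"label": "PERSON", "pattern": "Harry Potter"}], tmp_dir)
    reloaded = spacy.load(tmp_dir)
    found = [(ent.text, ent.label_) for ent in reloaded("Harry Potter went to Hogwarts.").ents]
```

Because the ruler is created through add_pipe, it is already part of the pipeline, so there is no separate nlp.add_pipe(ruler) step left to fail.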
@TwitchTheHamster · 3 years ago
This line:

    item = item.replace("The", "").replace("the", "").replace("and", "").replace("And", "")

will also mangle names like "Andrew" or "Theodore". It might be better to include the space after "The" or "And" in the replace pattern.
@python-programming · 3 years ago
Great catch! You are absolutely right.
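A word-boundary regular expression is one way to apply that fix; it is a variant of adding the trailing space that also catches "the" or "and" at the end of a name. clean_name and names_to_patterns are hypothetical helpers for illustration, not the tutorial's own functions, and the pattern dicts follow the {"label", "pattern"} shape the EntityRuler accepts.

```python
import re

# Match "the"/"and" only as whole words, in any case, so "Andrew" and "Theodore" survive
_STOPWORDS = re.compile(r"\b(?:the|and)\b", flags=re.IGNORECASE)


def clean_name(name):
    """Drop standalone 'the'/'and' and collapse the leftover whitespace."""
    return " ".join(_STOPWORDS.sub("", name).split())


def names_to_patterns(names, label):
    """Turn raw names into EntityRuler-style pattern dicts, skipping empties and repeats."""
    patterns, seen = [], set()
    for name in names:
        cleaned = clean_name(name)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            patterns.append({"label": label, "pattern": cleaned})
    return patterns
```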
@berrodriquez26 · 2 years ago
This guy is a legend.
@ricardocalleja · 3 years ago
In line 80, nlp.add_pipe(ruler) gave me an error, so I changed the whole function to:

    import spacy
    from spacy.lang.en import English
    from spacy.pipeline import EntityRuler

    def generate_rules(patterns):
        nlp = English()
        source_nlp = spacy.load("en_core_web_sm")
        nlp.add_pipe("ner", source=source_nlp)
        ruler = EntityRuler(nlp)
        ruler.add_patterns(patterns)
        nlp.to_disk("hp_ner")
@python-programming · 3 years ago
I need to update these videos to spaCy 3. That was the old spaCy 2 syntax. Glad you figured it out!
@rasmuseberley · a month ago
But this just loads the normal NER with its default categories, right? I just tried it and am not sure it's working as intended. Thanks so much anyway; I have already spent so much time trying to follow this and get it to work.
@rasmuseberley · a month ago
Based on the update video, the function should look like this:

    from spacy.lang.en import English

    def generate_rules(patterns):
        nlp = English()
        ruler = nlp.add_pipe("entity_ruler")
        ruler.add_patterns(patterns)  # type: ignore  # VS Code somehow can't find this method, although it works
        nlp.to_disk("hp_ner")

Link to the update video: kzbin.info/www/bejne/Z2e4m5aXncRgnpI
@rajeshwarsehdev2318 · 3 years ago
Great tutorials! :) Why can't we use BERT models for NER extraction? Is there a specific reason to use spaCy instead? Please help me understand.
@python-programming · 3 years ago
You absolutely can! BERT is a trade-off: it is expensive to train, but it scores higher on accuracy metrics. In spaCy 3, you can train BERT-based NER models. This series will get to that soon.
@mohamedelmeziani4012 · 3 years ago
Thank you so much for this great tutorial. I want to ask: why did you use the nlp model instead of the name of the model you created in the testing function?
@python-programming · 3 years ago
No problem! I am not sure I understand your question. What time in the video are you referencing?
@ShubhamKumar-pn7qx · 2 years ago
In the function test_model, why doesn't doc = nlp(text) give an error when nlp is not passed as a parameter, and what is the point of passing model to the same function?
@takamatoga · a year ago
Same question.
@dhairyaumrania4986 · 2 years ago
Thanks for this tutorial, very helpful!
@python-programming · 2 years ago
So glad you found it helpful!!
@shmouel4747 · a year ago
Thanks for the brilliant tutorial! I would like to add patterns multiple times. However, if I only use the ruler.add_patterns function, it returns "name 'ruler' is not defined", and if I execute ruler = nlp.add_pipe("entity_ruler"), it returns "entity_ruler already exists in pipeline".
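The second error appears because add_pipe always tries to create a new component, and a pipeline cannot hold two components with the same name. The first is a plain Python NameError, most likely because ruler was a local variable inside a function or was defined in a cell that did not run. One way around both is to fetch the existing component with nlp.get_pipe and only fall back to add_pipe when it is missing. get_or_add_ruler is a hypothetical helper, sketched under those assumptions:

```python
import spacy


def get_or_add_ruler(nlp, name="entity_ruler"):
    """Return the pipeline's existing entity ruler, adding one only if it is missing."""
    if name in nlp.pipe_names:
        return nlp.get_pipe(name)
    return nlp.add_pipe("entity_ruler", name=name)


nlp = spacy.blank("en")
get_or_add_ruler(nlp).add_patterns([{"label": "PERSON", "pattern": "Harry"}])
# A second call reuses the same component instead of raising "already exists"
get_or_add_ruler(nlp).add_patterns([{"label": "PERSON", "pattern": "Luna"}])

found = [(ent.text, ent.label_) for ent in nlp("Harry met Luna").ents]
```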
@abdelhomi836 · 3 years ago
Thank you for this great tutorial. Would it be possible to create a sub-entity pattern based on the main ruler patterns?
@DeepakChauhan-mn5jw · 3 years ago
While browsing your playlists, I noticed that the 6 videos in Neural Networks for Digital Humanities (DH) and Machine Learning for Digital Humanities (DH) are the same. Maybe the whole playlist is duplicated.
@python-programming · 3 years ago
Thanks for letting me know! I originally intended for the two to go in different directions. I need to get back to that.
@abedatascience3840 · 3 years ago
How did you get all the names of people as a knowledge base/dictionary of names at the beginning? Did you just use a ready-made dictionary, or something else? Thanks.
@python-programming · 3 years ago
Good question. I may need to explain that a bit more clearly. Sorry about that. I gathered the names from Wikipedia using BeautifulSoup and Requests (en.wikipedia.org/wiki/List_of_Harry_Potter_characters). That was the original knowledge base.
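The scraping step itself used Requests and BeautifulSoup. To keep this sketch dependency-free and offline, it swaps in Python's built-in html.parser and parses a small made-up HTML snippet instead of the downloaded Wikipedia page, then saves the names as a JSON list. ListItemParser, extract_names, and characters.json are illustrative names; the real page would need its own selection logic.

```python
import json
import os
import tempfile
from html.parser import HTMLParser


class ListItemParser(HTMLParser):
    """Collect the visible text of every <li> element (no nested-list handling)."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_li = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_li = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "li" and self._in_li:
            text = "".join(self._buffer).strip()
            if text:
                self.items.append(text)
            self._in_li = False

    def handle_data(self, data):
        if self._in_li:
            self._buffer.append(data)


def extract_names(html):
    parser = ListItemParser()
    parser.feed(html)
    parser.close()
    return parser.items


# Made-up markup standing in for the downloaded page
SAMPLE_HTML = '<ul><li><a href="#">Harry Potter</a></li><li>Hermione Granger</li></ul>'

names = extract_names(SAMPLE_HTML)
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "characters.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(names, f, ensure_ascii=False, indent=2)
    with open(path, encoding="utf-8") as f:
        reloaded = json.load(f)
```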
@abedatascience3840 · 3 years ago
@python-programming Thanks, that makes sense.
@1qazxsw2010 · 3 years ago
Thank you for these videos. I'm following along using my own data, from which I want to retrieve objects or nouns instead of names, but I noticed that this method is case-sensitive. I tried converting both my text and my JSON list to uppercase and got wrong results (when my JSON list had matching letter case, the results were perfect). How can we make spaCy case-insensitive?
@python-programming · 3 years ago
Thanks for the comment! I am glad you are finding these videos useful. A standard way to do this is data augmentation: create an entity ruler that has upper-case, lower-case, capitalized, and non-capitalized versions of each word. Alternatively, and less computationally expensive, use only lowercase patterns and make sure to lowercase your texts before running the entity ruler over them. Or you can use a pattern matcher in spaCy that does this automatically if you pass a pattern that uses the LOWER attribute. If that does not help, let me know and I will give a better response when I get to a computer on Tuesday.
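Both of the spaCy-side options can be sketched with the EntityRuler itself: setting its phrase_matcher_attr config to "LOWER" makes plain string patterns case-insensitive, and token patterns can match on each token's LOWER attribute directly. The OBJECT label and the example phrases are made up for illustration.

```python
import spacy

nlp = spacy.blank("en")

# Option 1: compare string (phrase) patterns against each token's lowercase form
ruler = nlp.add_pipe("entity_ruler", config={"phrase_matcher_attr": "LOWER"})
ruler.add_patterns([
    {"label": "OBJECT", "pattern": "invisibility cloak"},
    # Option 2: a token pattern that matches on the LOWER attribute explicitly
    {"label": "OBJECT", "pattern": [{"LOWER": "elder"}, {"LOWER": "wand"}]},
])

doc = nlp("The INVISIBILITY CLOAK and the Elder Wand")
found = [(ent.text, ent.label_) for ent in doc.ents]
```

Either way, neither the patterns nor the input text need to be converted to one case first.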
@1qazxsw2010 · 3 years ago
@python-programming Great, thank you for clarifying. I just tried it with lowercase and the same thing happened. Then I realized I was doing it wrong. I was doing it like this:

    for item in data:
        new = item.upper()

This was wrong. I tried this instead:

    for i in range(len(data)):
        data[i] = data[i].upper()

And it worked! Thank you for making me try it another way. I also tried it with lowercase and did not see much of an issue with performance. Those workflows are pretty interesting; I'd love to see them in action! hehehe
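The reason the first loop had no effect: item.upper() returns a new string (Python strings are immutable), and assigning it to new never touches the list. A small sketch of that behavior and two ways to actually update the data; the list contents are made up.

```python
data = ["Wand", "cloak", "Stone"]

# Rebinding: item.upper() builds a new string and `new` just points at it,
# so the list itself never changes
for item in data:
    new = item.upper()
after_rebinding = list(data)  # still ["Wand", "cloak", "Stone"]

# Option 1: build a new list
upper_data = [item.upper() for item in data]

# Option 2: overwrite each element in place, like the index-based loop in the comment
for i, item in enumerate(data):
    data[i] = item.upper()
```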
@vinitaverma5676 · 3 years ago
Hey, your video is top-notch. You've used a JSON file here; could we use an Excel sheet instead?
@python-programming · 3 years ago
You absolutely could. I would recommend using pandas to import the data. If you do not know how to use pandas, I have a playlist and a book on it: Pandas.pythonhumanities.com
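A sketch of that route, assuming the sheet has one column of names. With a real workbook you would start from something like pd.read_excel("characters.xlsx"), which needs the openpyxl package for .xlsx files; here an inline DataFrame stands in so the example runs on its own. The file name, the name column, and patterns_from_dataframe are hypothetical.

```python
import pandas as pd


def patterns_from_dataframe(df, column, label):
    """Turn one spreadsheet column into EntityRuler-style pattern dicts."""
    names = df[column].dropna().astype(str).str.strip()
    names = names[names != ""].drop_duplicates()
    return [{"label": label, "pattern": name} for name in names]


# Stand-in for pd.read_excel("characters.xlsx")
df = pd.DataFrame({"name": ["Harry Potter", " Hermione Granger ", None, "Harry Potter"]})
patterns = patterns_from_dataframe(df, "name", "PERSON")
```

The resulting list has the same shape as the patterns built from the JSON file, so it can be passed to ruler.add_patterns unchanged.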
@vinitaverma5676 · 3 years ago
@python-programming Thanks for replying. One more thing: in one of the videos, Train a spaCy NER model, you used hp_training_data.json. Could you provide that JSON file? I'm getting an error: ValueError: not enough values to unpack (expected 2).
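Without the file this cannot be confirmed, but that ValueError usually means a loop unpacks each item as text, annotations while some items are not 2-item pairs. A sketch of the (text, {"entities": [(start, end, label)]}) shape used by spaCy 2-style NER training loops; make_example is a hypothetical helper and the sentence is made up.

```python
def make_example(text, entity_texts, label):
    """Build one (text, {"entities": [(start, end, label)]}) pair from entity strings."""
    entities = []
    for ent in entity_texts:
        start = text.find(ent)  # first occurrence only; a simplification
        if start != -1:
            entities.append((start, start + len(ent), label))
    return (text, {"entities": entities})


TRAIN_DATA = [make_example("Harry Potter met Ron.", ["Harry Potter", "Ron"], "PERSON")]

# Every item must be a 2-item (text, annotations) pair for this unpacking to work;
# an item with a different shape raises "not enough values to unpack"
for text, annotations in TRAIN_DATA:
    print(text, annotations["entities"])
```

When such data is saved to JSON, the tuples become lists, which still unpack fine as long as each item has exactly two elements.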
@miladjurablu9032 · 2 years ago
Thank you for this great video. I have a question: how can I use the EntityRuler for the Persian (Farsi) language?
@rasmuseberley · a month ago
For anyone trying this with v3: based on the update video, the function should look like this:

    from spacy.lang.en import English

    def generate_rules(patterns):
        nlp = English()
        ruler = nlp.add_pipe("entity_ruler")
        ruler.add_patterns(patterns)  # type: ignore  # VS Code somehow can't find this method, although it works
        nlp.to_disk("hp_ner")

Link to the update video: kzbin.info/www/bejne/Z2e4m5aXncRgnpI
@gokulgupta1021 · 3 years ago
Hi, this is a really nice way to explain every single thing; I simply loved it. When I try to implement the generate_rules function, I get an error:

    ValueError: [E966] `nlp.add_pipe` now takes the string name of the registered component factory, not a callable component. Expected string, but got (name: 'None').

    - If you created your component with `nlp.create_pipe('name')`: remove nlp.create_pipe and call `nlp.add_pipe('name')` instead.
    - If you passed in a component like `TextCategorizer()`: call `nlp.add_pipe` with the string name instead, e.g. `nlp.add_pipe('textcat')`.
    - If you're using a custom component: Add the decorator `@Language.component` (for function components) or `@Language.factory` (for class components / factories) to your custom component and assign it a name, e.g. `@Language.component('your_name')`. You can then run `nlp.add_pipe('your_name')` to add it to the pipeline.
@surajgupta6962 · a year ago
Use this code:

    from spacy.lang.en import English

    def generate_rules(patterns):
        nlp = English()
        ruler = nlp.add_pipe("entity_ruler")
        # ruler = EntityRuler(nlp)
        ruler.add_patterns(patterns)
        # nlp.add_pipe(ruler)
        nlp.to_disk("hp_ner")

    patterns = create_training_data('hp_character.json', 'PERSON')
    generate_rules(patterns)
@memsbdm9125 · a year ago
@surajgupta6962 u dropped this 👑
@gurkanyesilyurt4461 · a year ago
I got this error: OSError: [E050] Can't find model 'hp_ner'. It doesn't seem to be a Python package or a valid path to a data directory.
@sabririhab9383 · 3 years ago
I have a question: can the same custom-trained model work on multiple languages if my dataset has words from different Latin-script languages using the same tags?
@python-programming · 3 years ago
Good question. Yes and no. It will work, but the model needs to have encountered the other languages in training to learn their context. You can use an entity ruler, which would be language-agnostic.
@sabririhab9383 · 3 years ago
@python-programming Thank you for responding! Alright, so if I am using both English and French, that should be fine.
@python-programming · 3 years ago
That should be fine.
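Because an entity ruler only matches the patterns it is given, one ruler can hold patterns from several languages at once. A minimal sketch, assuming spaCy's multi-language "xx" blank pipeline is an acceptable tokenizer for the texts; the GPE patterns are illustrative. This covers the rule-based side only; a trained NER still needs examples in each language, as noted above.

```python
import spacy

# "xx" is spaCy's multi-language class; a ruler needs no trained weights
nlp = spacy.blank("xx")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "GPE", "pattern": "London"},   # English spelling
    {"label": "GPE", "pattern": "Londres"},  # French spelling
])

english = [(ent.text, ent.label_) for ent in nlp("She moved to London.").ents]
french = [(ent.text, ent.label_) for ent in nlp("Elle habite à Londres.").ents]
```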
@haniehmaroofi9452 · 3 years ago
Thanks for the videos. I am pretty new to spaCy. Trying to run the code from your textbook, I encountered this error:

    ValueError: [E966] `nlp.add_pipe` now takes the string name of the registered component factory, not a callable component. Expected string, but got (name: 'None').

    - If you created your component with `nlp.create_pipe('name')`: remove nlp.create_pipe and call `nlp.add_pipe('name')` instead.
    - If you passed in a component like `TextCategorizer()`: call `nlp.add_pipe` with the string name instead, e.g. `nlp.add_pipe('textcat')`.
    - If you're using a custom component: Add the decorator `@Language.component` (for function components) or `@Language.factory` (for class components / factories) to your custom component and assign it a name, e.g. `@Language.component('your_name')`. You can then run `nlp.add_pipe('your_name')` to add it to the pipeline.

Then I replaced nlp.create_pipe("entity_ruler") with nlp.add_pipe("textcat"), as the error suggested, and got this new error:

    AttributeError: 'TextCategorizer' object has no attribute 'add_patterns'

Do you have any idea what could be wrong?
@python-programming · 3 years ago
These videos are for spaCy 2.0. They upgraded to 3.0 earlier this month, and I am working on updating the notebook and videos. If you pip install spaCy 2.0, the code will work.
@egomalego · 3 years ago
@python-programming What would I need to change in order to use my patterns in the model with v3?
@egomalego · 3 years ago
This actually worked for me (found here: spacy.io/api/entityruler):

    entity_ruler = nlp.add_pipe("entity_ruler")
    entity_ruler.initialize(lambda: [], nlp=nlp, patterns=patterns)
@python-programming · 3 years ago
@egomalego Thanks for sharing this! I was just about to link to my new spaCy 3.0 text classification training video, which introduces viewers to the new config system.
@haniehmaroofi9452 · 3 years ago
@egomalego Thank you very much!
@kevinclements7305 · 3 years ago
Is there a spaCy 3.0 version of this code showing the differences around nlp.add?
@rahuldey6369 · 3 years ago
This entity extraction task is somewhat like a multiclass classification task. Is that right?
@python-programming · 3 years ago
That is how I look at it.
@rahuldey6369 · 3 years ago
@python-programming OK, thank you.
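One way to see the analogy: each token gets exactly one class from a fixed set, B-<label> (beginning of an entity), I-<label> (inside one), or O (outside any entity). A minimal pure-Python sketch of that per-token encoding; bio_tags is a hypothetical helper and the token spans are hand-picked. spaCy itself uses the closely related BILUO scheme (see spacy.training.offsets_to_biluo_tags), and its NER predicts entities with a transition-based model, so this is an analogy rather than its exact mechanics.

```python
def bio_tags(num_tokens, spans):
    """Give every token one class: B-<label>, I-<label>, or O (outside any entity).

    spans holds (start_token, end_token_exclusive, label) triples.
    """
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


tokens = ["Harry", "Potter", "visited", "Hogwarts", "."]
print(list(zip(tokens, bio_tags(len(tokens), [(0, 2, "PERSON"), (3, 4, "LOC")]))))
```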
@PC-my3hl · 3 years ago
How can I add more patterns to an existing model? For example, if I want to add more patterns to hp_ner, how can I do it?
@python-programming · 3 years ago
I am responding on my phone. I can give a better response Monday, but essentially you open your saved model, use the get_pipe method to grab the entity ruler, and add patterns to it the same way you did initially. You then save the new model, and the new patterns will be saved with it. Alternatively, you can open the JSONL file that contains the patterns and add new ones on new lines.
@PC-my3hl · 3 years ago
@python-programming Thank you, can you give a more specific explanation?
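A more specific version of those steps, as a sketch: load the saved pipeline, get its entity ruler with get_pipe, add patterns, and save it back. add_patterns_to_saved_model is a hypothetical helper, and a throwaway folder created on the spot stands in for hp_ner. In spaCy 3 the saved ruler keeps its patterns in entity_ruler/patterns.jsonl inside the model folder, one JSON object per line, which is presumably the JSONL file mentioned above.

```python
import tempfile

import spacy


def add_patterns_to_saved_model(model_dir, new_patterns):
    """Load a saved pipeline, extend its entity ruler, and write it back to disk."""
    nlp = spacy.load(model_dir)
    ruler = nlp.get_pipe("entity_ruler")  # the component that was saved with the model
    ruler.add_patterns(new_patterns)
    nlp.to_disk(model_dir)


with tempfile.TemporaryDirectory() as model_dir:
    # Stand-in for the hp_ner folder: a blank pipeline with one pattern
    base = spacy.blank("en")
    base.add_pipe("entity_ruler").add_patterns([{"label": "PERSON", "pattern": "Harry"}])
    base.to_disk(model_dir)

    add_patterns_to_saved_model(model_dir, [{"label": "PERSON", "pattern": "Draco"}])
    doc = spacy.load(model_dir)("Harry argued with Draco.")
    entities = [(ent.text, ent.label_) for ent in doc.ents]
```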
@Dreamer-xj3ms · 2 years ago
How do I create a dialog flow management model?
@dd_annonymous3333 · 3 years ago
What if we want to add this to an existing model such as en_core_web_lg?
@python-programming · 3 years ago
No problem. Load the lg model and then add the entity ruler pipe (I would suggest adding it before the NER pipe).
@dd_annonymous3333 · 3 years ago
@python-programming Where would that be done in your code, for example? I'm assuming in the generate_rules function?
@python-programming · 3 years ago
I have two videos on how to do it, one for spaCy 2 and one for spaCy 3. Here is the spaCy 3 video: kzbin.info/www/bejne/fHSapp2dr5KLrdk
@dd_annonymous3333 · 3 years ago
@python-programming You have been a great help! Thank you. Love the videos, and I gave you a follow on Twitter lol.
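A sketch of the "before the NER pipe" suggestion using add_pipe's before argument. To stay runnable without downloading en_core_web_lg, it uses a blank pipeline with an untrained "ner" component as a stand-in, so only the component order is demonstrated; with the real model you would start from spacy.load("en_core_web_lg") instead. Per spaCy's documentation, an NER placed after the ruler respects the entity spans the ruler has already set.

```python
import spacy

# Stand-in for spacy.load("en_core_web_lg"): a blank pipeline with an untrained "ner"
# component, so only the component order is shown here (the pipeline is never run)
nlp = spacy.blank("en")
nlp.add_pipe("ner")

# before="ner" makes the ruler run first, so its matches are in place when NER runs
ruler = nlp.add_pipe("entity_ruler", before="ner")
ruler.add_patterns([{"label": "PERSON", "pattern": "Hermione Granger"}])

print(nlp.pipe_names)  # ['entity_ruler', 'ner']
```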