How to Use spaCy's EntityRuler (Named Entity Recognition for DH 04 | Part 01)

  29,462 views

Python Tutorials for Digital Humanities

Days ago

Comments: 62
@python-programming · 3 years ago
Check out the Textbook for this series: ner.pythonhumanities.com/intro.html
@bruinjoe · 2 years ago
Thank you for these fantastic videos. The generate_rules function needs to be updated for spaCy 3. Comment out the lines ruler = EntityRuler(nlp) and nlp.add_pipe(ruler), and replace them with this line: ruler = nlp.add_pipe("entity_ruler")
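Here is a minimal sketch of generate_rules rewritten for spaCy 3, as the comment above describes. It assumes spaCy 3.x is installed. The output_dir parameter and the sample patterns are illustrative additions; the video saves to the fixed path "hp_ner". The key change is that nlp.add_pipe("entity_ruler") both creates the ruler and returns it, so no EntityRuler(nlp) object is built by hand.

```python
from pathlib import Path
import tempfile

import spacy
from spacy.lang.en import English


def generate_rules(patterns, output_dir):
    """Build a blank English pipeline whose only component is an EntityRuler, then save it."""
    nlp = English()
    # spaCy 3: add_pipe takes the registered factory name and returns the new component
    ruler = nlp.add_pipe("entity_ruler")
    ruler.add_patterns(patterns)
    nlp.to_disk(output_dir)
    return nlp


# Illustrative patterns in the {"label": ..., "pattern": ...} format used by the EntityRuler
patterns = [
    {"label": "PERSON", "pattern": "Harry Potter"},
    {"label": "PERSON", "pattern": "Hermione Granger"},
]

with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "hp_ner"
    generate_rules(patterns, model_dir)
    nlp = spacy.load(model_dir)  # reload the saved pipeline from disk
    doc = nlp("Harry Potter and Hermione Granger went to Hogwarts.")
    found = [(ent.text, ent.label_) for ent in doc.ents]

print(found)
```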
@TwitchTheHamster · 3 years ago
This line: _item = item.replace("The", "").replace("the", "").replace("and", "").replace("And", "")_ will affect names like "Andrew" or "Theodore" as well. It might be better to include the space after "The" or "And" in the replace pattern.
@python-programming · 3 years ago
Great catch! You are absolutely right.
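This sketch shows the fix suggested in this thread. It uses a hypothetical helper, strip_fillers, which stands in for the replace chain and is not the video's exact code. A regular expression with word boundaries (\b) removes "the" and "and" only when they are whole words, in any capitalization. Names such as "Andrew" and "Theodore" are therefore left intact:

```python
import re

# "the" or "and" as whole words only, case-insensitive
FILLER = re.compile(r"\b(?:the|and)\b", flags=re.IGNORECASE)


def strip_fillers(name: str) -> str:
    """Remove standalone 'the'/'and' and collapse the leftover whitespace."""
    cleaned = FILLER.sub("", name)
    return " ".join(cleaned.split())


names = ["The Boy Who Lived", "Andrew Kirke", "Theodore Nott", "Fred and George"]
print([strip_fillers(n) for n in names])

# The plain str.replace approach damages names that merely start with "And"
print("Andrew Kirke".replace("And", ""))
```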
@berrodriquez26 · 2 years ago
this guy is a legend
@ricardocalleja · 3 years ago
In line 80, nlp.add_pipe(ruler) gave me an error, so I changed the whole function to:
    def generate_rules(patterns):
        nlp = English()
        source_nlp = spacy.load("en_core_web_sm")
        nlp.add_pipe("ner", source=source_nlp)
        ruler = EntityRuler(nlp)
        ruler.add_patterns(patterns)
        nlp.to_disk("hp_ner")
@python-programming · 3 years ago
I need to update these videos to spaCy 3. That was the old spaCy 2 syntax. Glad you figured it out!
@rasmuseberley · a month ago
But this just loads the normal NER with its default categories, right? I just tried it and am not sure it's working as intended. Thanks so much anyway; I already spent so much time trying to follow this and get it to work.
@rasmuseberley · a month ago
Based on the update video, the function should look like this:
    def generate_rules(patterns):
        nlp = English()
        ruler = nlp.add_pipe("entity_ruler")
        ruler.add_patterns(patterns)  # type: ignore (VSCode somehow can't find this method, although it works)
        nlp.to_disk("hp_ner")
Link to the update video: kzbin.info/www/bejne/Z2e4m5aXncRgnpI
@rajeshwarsehdev2318 · 3 years ago
Great tutorials! :) Why can't we use BERT models for NER extraction? Is there a specific reason for when to use spaCy? Please help me understand.
@python-programming · 3 years ago
You absolutely can! BERT is a trade-off: it is expensive to train, but it has higher accuracy metrics. In spaCy 3, you can train BERT NER models. This series will be getting to that soon.
@mohamedelmeziani4012 · 3 years ago
Thank you so much for this great tutorial. I want to ask why you used the nlp model instead of the name of the created model in the testing function.
@python-programming · 3 years ago
No problem! I am not sure I understand your question. What is the time in the video you are referencing?
@ShubhamKumar-pn7qx · 2 years ago
In the function test_model, why doesn't doc = nlp(text) give an error when nlp is not passed as a parameter, and what's the use of passing model to the same function?
@takamatoga · a year ago
same
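On the question above: inside a function, Python looks up a name that is not a parameter or a local variable in the module's global scope. So doc = nlp(text) runs without error whenever a global nlp exists, and the model argument is simply ignored. The sketch below uses a hypothetical helper, apply_model, in place of test_model (the exact original signature is assumed). It runs whichever pipeline is passed in, so two different pipelines give two different results:

```python
from spacy.lang.en import English


def apply_model(model, text):
    """Run the pipeline passed in as `model` over `text`; return (text, label) pairs."""
    doc = model(text)  # use the argument, not a global `nlp`
    return [(ent.text, ent.label_) for ent in doc.ents]


hp_nlp = English()
hp_nlp.add_pipe("entity_ruler").add_patterns([{"label": "PERSON", "pattern": "Ron Weasley"}])
blank_nlp = English()  # same language, but no entity ruler

text = "Ron Weasley missed the train."
print(apply_model(hp_nlp, text))
print(apply_model(blank_nlp, text))
```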
@dhairyaumrania4986 · 2 years ago
Thanks for this tutorial, very helpful!
@python-programming · 2 years ago
So glad you found it helpful!!
@shmouel4747 · a year ago
Thanks for the brilliant tutorial! I would like to add patterns multiple times. However, if I only use the ruler.add_patterns function, it returns "name ruler is not defined", and if I execute ruler = nlp.add_pipe("entity_ruler"), it returns "entity ruler already exists in pipeline".
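One way to avoid both errors is sketched below, assuming spaCy 3. A pipeline can hold only one component named "entity_ruler", so the ruler should be created once and fetched with nlp.get_pipe afterwards, for example in a later notebook cell where the ruler variable no longer exists. The helper get_or_add_ruler is hypothetical:

```python
from spacy.lang.en import English


def get_or_add_ruler(nlp):
    """Return the pipeline's existing entity_ruler, adding one only if it is absent."""
    if "entity_ruler" in nlp.pipe_names:
        return nlp.get_pipe("entity_ruler")
    return nlp.add_pipe("entity_ruler")


nlp = English()
ruler = get_or_add_ruler(nlp)
ruler.add_patterns([{"label": "PERSON", "pattern": "Harry"}])

# Later: fetch the same component again instead of adding a second one
ruler = get_or_add_ruler(nlp)
ruler.add_patterns([{"label": "PERSON", "pattern": "Ron"}])

print(nlp.pipe_names)
print([(e.text, e.label_) for e in nlp("Harry met Ron.").ents])
```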
@abdelhomi836 · 3 years ago
Thank you for this great tutorial. Would it be possible to create a sub-entity pattern based on the main ruler patterns?
@DeepakChauhan-mn5jw · 3 years ago
While browsing through your playlist videos I noticed that the 6 videos in Neural Networks for Digital Humanities (DH) and Machine Learning for Digital Humanities (DH) are the same. Maybe the whole playlist is duplicated.
@python-programming · 3 years ago
Thanks for letting me know! I originally intended for the two to go in different directions. I need to get back to that.
@abedatascience3840 · 3 years ago
How did you get the names of all the people as a knowledge base/dictionary of names at the beginning? Did you just use a ready-made dictionary, or something else? Thanks
@python-programming · 3 years ago
Good question. I may need to explain that a bit more clearly. Sorry about that. I gathered the names from Wikipedia using BeautifulSoup and Requests (en.wikipedia.org/wiki/List_of_Harry_Potter_characters). That was the original knowledge base.
@abedatascience3840 · 3 years ago
@python-programming Thanks, makes sense
@1qazxsw2010 · 3 years ago
Thank you for these videos. I'm following along with my own data, from which I want to retrieve objects or NOUNS instead of names, but I noticed that this method is case-sensitive. So I tried converting both my text and my JSON list to uppercase, and I got wrong results (when my JSON list had matching case, the results were perfect). How can we make spaCy case-insensitive?
@python-programming · 3 years ago
Thanks for the comment! I am glad you are finding these videos useful. One standard way to do this is data augmentation: create an entity ruler that has uppercase, lowercase, capitalized, and non-capitalized versions of the words. Or, and this is less computationally expensive, use only lowercase patterns and make sure to lowercase your texts before running the entity ruler over them. Or you can use a pattern matcher in spaCy that will do that automatically if you pass a pattern with LOWER attached as a parameter. If that does not help, let me know and I will give a better response when I get to a computer on Tuesday.
@1qazxsw2010 · 3 years ago
@python-programming Great, thank you for clarifying. I just tried with lowercase and the same thing happened. Then I realized I was doing it wrong. I was doing it like this:
    for item in data:
        new = item.upper()
This was wrong. I tried this instead:
    for i in range(len(data)):
        data[i] = data[i].upper()
And it worked! Thank you for making me try it another way. I also tried it with lowercase, but I did not see much of an issue with the performance. Those workflows are pretty interesting; I'd love to see them in action! hehehe
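This is a minimal sketch of the "LOWER" approach mentioned in this thread, assuming spaCy 3. Each name becomes a token pattern that compares every token's LOWER attribute, so matching ignores case and the input text does not need to be changed. The names are illustrative. The second part shows why the first loop in the comment above had no effect: assigning to the loop variable never modifies the list.

```python
from spacy.lang.en import English

nlp = English()
names = ["Harry Potter", "Hermione Granger"]

# One token pattern per name, matched on each token's LOWER attribute.
# make_doc tokenizes the name the same way the pipeline tokenizes the text.
patterns = [
    {"label": "PERSON", "pattern": [{"LOWER": tok.lower_} for tok in nlp.make_doc(name)]}
    for name in names
]
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns(patterns)

doc = nlp("HARRY POTTER wrote to hermione granger.")
ents = [(e.text, e.label_) for e in doc.ents]
print(ents)

# The list issue from the thread: rebinding the loop variable leaves `data` unchanged
data = ["Harry", "Ron"]
for item in data:
    new = item.upper()  # a new string is created and then discarded
unchanged = list(data)
data = [item.upper() for item in data]  # build a new list instead
print(unchanged, data)
```

As an alternative, the EntityRuler's phrase_matcher_attr setting (for example nlp.add_pipe("entity_ruler", config={"phrase_matcher_attr": "LOWER"})) makes plain string patterns compare lowercase forms. The token-pattern form above makes the behavior explicit in each pattern.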
@vinitaverma5676 · 3 years ago
Hey, your video is top-notch. You've used a JSON file in this; can we use an Excel sheet instead?
@python-programming · 3 years ago
You absolutely could. I would recommend using pandas to import the data. If you do not know how to use pandas, I have a playlist and a book on it: Pandas.pythonhumanities.com
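This sketch shows the pandas route suggested above. It turns one spreadsheet column into EntityRuler patterns. The column name "name", the label, and the helper patterns_from_frame are assumptions for illustration. With a real workbook you would build the DataFrame with pd.read_excel(...), which needs an Excel engine such as openpyxl installed. Here the DataFrame is built in memory so the example runs on its own.

```python
import pandas as pd


def patterns_from_frame(df, column, label):
    """Turn one spreadsheet column into EntityRuler patterns, skipping blanks and duplicates."""
    names = df[column].dropna().astype(str).str.strip()
    names = names[names != ""].drop_duplicates()
    return [{"label": label, "pattern": name} for name in names]


# Stand-in for something like pd.read_excel("characters.xlsx")
df = pd.DataFrame({"name": ["Harry Potter", "Ron Weasley", None, "Harry Potter", "  "]})
patterns = patterns_from_frame(df, "name", "PERSON")
print(patterns)
```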
@vinitaverma5676 · 3 years ago
@python-programming Thanks for replying. One more thing: in one of the videos, Train a spaCy NER model, you used hp_training_data.json. Could you provide that JSON file? I'm getting an error: ValueError: not enough values to unpack (expected 2).
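That ValueError is what Python raises when a loop such as for text, annotations in TRAIN_DATA meets an item that is not a two-element pair. The checker below assumes the spaCy 2-style training format [text, {"entities": [[start, end, label], ...]}] and a training loop that unpacks each item that way; neither is confirmed by the thread. The helper check_training_data is hypothetical. It reports which items would fail:

```python
import json


def check_training_data(data):
    """Return indexes of items that are not [text, {"entities": [...]}] pairs."""
    bad = []
    for i, item in enumerate(data):
        ok = (
            isinstance(item, (list, tuple))
            and len(item) == 2
            and isinstance(item[0], str)
            and isinstance(item[1], dict)
            and "entities" in item[1]
        )
        if not ok:
            bad.append(i)
    return bad


raw = """[
  ["Harry Potter went to Hogwarts.", {"entities": [[0, 12, "PERSON"]]}],
  ["Ron Weasley"]
]"""
data = json.loads(raw)
print(check_training_data(data))

# Reproduce the error: the second item has only one element
message = None
try:
    for text, annotations in data:
        pass
except ValueError as err:
    message = str(err)
print(message)
```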
@miladjurablu9032 · 2 years ago
Thank you for this great video. I have a question: how can I use the EntityRuler for the Persian (Farsi) language?
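A sketch for the Persian question, assuming spaCy 3 with its built-in Persian ("fa") language data. The EntityRuler matches tokens and does not depend on a trained model, so a blank Persian pipeline is enough to start. The sentence ("I live in Tehran") and the GPE label for "Tehran" are illustrative:

```python
import spacy

nlp = spacy.blank("fa")  # blank Persian pipeline: tokenizer only, no trained model needed
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "GPE", "pattern": "تهران"}])

doc = nlp("من در تهران زندگی می کنم")
found = [(ent.text, ent.label_) for ent in doc.ents]
print(found)
```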
@rasmuseberley · a month ago
For anyone trying this with v3: based on the update video, the function should look like this:
    def generate_rules(patterns):
        nlp = English()
        ruler = nlp.add_pipe("entity_ruler")
        ruler.add_patterns(patterns)  # type: ignore (VSCode somehow can't find this method, although it works)
        nlp.to_disk("hp_ner")
Link to the update video: kzbin.info/www/bejne/Z2e4m5aXncRgnpI
@gokulgupta1021 · 3 years ago
Hi, a really nice way to explain every single thing; I simply loved it. When I try to implement the generate_rules function, I get this error:
    ValueError: [E966] `nlp.add_pipe` now takes the string name of the registered component factory, not a callable component. Expected string, but got (name: 'None').
    - If you created your component with `nlp.create_pipe('name')`: remove nlp.create_pipe and call `nlp.add_pipe('name')` instead.
    - If you passed in a component like `TextCategorizer()`: call `nlp.add_pipe` with the string name instead, e.g. `nlp.add_pipe('textcat')`.
    - If you're using a custom component: Add the decorator `@Language.component` (for function components) or `@Language.factory` (for class components / factories) to your custom component and assign it a name, e.g. `@Language.component('your_name')`. You can then run `nlp.add_pipe('your_name')` to add it to the pipeline.
@surajgupta6962 · a year ago
    def generate_rules(patterns):
        nlp = English()
        ruler = nlp.add_pipe("entity_ruler")
        # ruler = EntityRuler(nlp)
        ruler.add_patterns(patterns)
        # nlp.add_pipe(ruler)
        nlp.to_disk("hp_ner")

    patterns = create_training_data('hp_character.json', 'PERSON')
    generate_rules(patterns)
Use this code.
@memsbdm9125 · a year ago
@surajgupta6962 u dropped this 👑
@gurkanyesilyurt4461 · a year ago
I got this error: OSError: [E050] Can't find model 'hp_ner'. It doesn't seem to be a Python package or a valid path to a data directory.
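E050 usually means that spacy.load was given a name that is neither an installed package nor an existing directory. With a relative path like "hp_ner", spaCy looks for the directory relative to the current working directory, so the directory is not found if generate_rules was never run or was run from a different folder. The hypothetical helper below resolves the path and checks for the config.cfg file that spaCy 3's nlp.to_disk writes before loading, so the error message states where it looked:

```python
from pathlib import Path
import tempfile

import spacy
from spacy.lang.en import English


def load_pipeline(path):
    """Load a pipeline saved with nlp.to_disk(), with a clearer error if it is missing."""
    path = Path(path).resolve()  # relative paths resolve against the current working directory
    if not (path / "config.cfg").is_file():
        raise FileNotFoundError(
            f"No saved spaCy pipeline at {path}. Run generate_rules() first, "
            "or pass the directory where hp_ner was actually saved."
        )
    return spacy.load(path)


with tempfile.TemporaryDirectory() as tmp:
    saved = Path(tmp) / "hp_ner"
    nlp = English()
    nlp.add_pipe("entity_ruler").add_patterns([{"label": "PERSON", "pattern": "Harry"}])
    nlp.to_disk(saved)

    loaded = load_pipeline(saved)
    missing_error = None
    try:
        load_pipeline(Path(tmp) / "does_not_exist")
    except FileNotFoundError as err:
        missing_error = err

print(loaded.pipe_names)
print(missing_error)
```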
@sabririhab9383 · 3 years ago
I have a question: can the same custom-trained model work on multiple languages if my dataset has words from different Latin languages using the same tags?
@python-programming · 3 years ago
Good question. Yes and no. It will work, but it needs to have encountered the other languages in training to learn context. You can use an entity ruler, which would be language-agnostic.
@sabririhab9383 · 3 years ago
@python-programming Thank you for responding! Alright, so if I am using both English and French, that should be fine.
@python-programming · 3 years ago
That should be fine.
@haniehmaroofi9452 · 3 years ago
Thanks for the videos. I am pretty new to spaCy. While trying to run the code from your textbook, I encountered this error:
    ValueError: [E966] `nlp.add_pipe` now takes the string name of the registered component factory, not a callable component. Expected string, but got (name: 'None').
    - If you created your component with `nlp.create_pipe('name')`: remove nlp.create_pipe and call `nlp.add_pipe('name')` instead.
    - If you passed in a component like `TextCategorizer()`: call `nlp.add_pipe` with the string name instead, e.g. `nlp.add_pipe('textcat')`.
    - If you're using a custom component: Add the decorator `@Language.component` (for function components) or `@Language.factory` (for class components / factories) to your custom component and assign it a name, e.g. `@Language.component('your_name')`. You can then run `nlp.add_pipe('your_name')` to add it to the pipeline.
Then I replaced nlp.create_pipe("entity_ruler") with nlp.add_pipe("textcat"), as the error suggested, and got this new error:
    AttributeError: 'TextCategorizer' object has no attribute 'add_patterns'
Do you have any idea what could be wrong?
@python-programming · 3 years ago
These videos are for spaCy 2.0. They upgraded to 3.0 earlier this month. I am working on updating the notebooks and videos. If you pip install spaCy 2.0, the code will work.
@egomalego · 3 years ago
@python-programming What would I need to change to use my patterns in the model with v3?
@egomalego · 3 years ago
This actually worked for me (found here: spacy.io/api/entityruler):
    entity_ruler = nlp.add_pipe("entity_ruler")
    entity_ruler.initialize(lambda: [], nlp=nlp, patterns=patterns)
@python-programming · 3 years ago
@egomalego Thanks for sharing this! I was just about to link to my new spaCy 3.0 text classification training that introduces viewers to the new config system.
@haniehmaroofi9452 · 3 years ago
@egomalego Thank you very much!
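A self-contained version of the snippet that worked for @egomalego above is sketched here, assuming spaCy 3. The component's initialize method takes a get_examples callable (lambda: [] when there are no training examples) and accepts the patterns as a keyword argument. The pattern itself is illustrative:

```python
from spacy.lang.en import English

patterns = [{"label": "PERSON", "pattern": "Albus Dumbledore"}]

nlp = English()
entity_ruler = nlp.add_pipe("entity_ruler")
# Load the patterns through the component's initialize() method
entity_ruler.initialize(lambda: [], nlp=nlp, patterns=patterns)

doc = nlp("Albus Dumbledore is the headmaster.")
ents = [(e.text, e.label_) for e in doc.ents]
print(ents)
```

For a one-off script, entity_ruler.add_patterns(patterns) is the simpler call. The initialize route is the hook spaCy uses when a pipeline is set up from a config file.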
@kevinclements7305 · 3 years ago
Is there a spaCy 3.0 version of this code showing the differences around nlp.add?
@rahuldey6369 · 3 years ago
This entity extraction task is somewhat like a multiclass classification task. Is that right?
@python-programming · 3 years ago
That is how I look at it.
@rahuldey6369 · 3 years ago
@python-programming OK, thank you
@PC-my3hl · 3 years ago
How can I add more patterns to an existing model? For example, if I want to add more patterns to hp_ner, how can I do it?
@python-programming · 3 years ago
I am responding on my phone. I can give a better response on Monday, but essentially you open your saved model, use the get_pipe method to grab the entity ruler, and add patterns to it the same way you did initially. You then save the new model, and you should have those patterns saved. Alternatively, you can open the JSONL file that contains the patterns and add the new ones on new lines.
@PC-my3hl · 3 years ago
@python-programming Thank you. Can you give a more specific explanation?
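Here is the get_pipe workflow described in the reply above, written out as a sketch for spaCy 3. A small stand-in pipeline is saved to a temporary hp_ner directory first so the example runs on its own. With your real model, start from spacy.load("hp_ner"). The added pattern is illustrative.

```python
from pathlib import Path
import tempfile

import spacy
from spacy.lang.en import English

with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "hp_ner"

    # Stand-in for the hp_ner pipeline saved earlier in the series
    nlp = English()
    nlp.add_pipe("entity_ruler").add_patterns([{"label": "PERSON", "pattern": "Harry Potter"}])
    nlp.to_disk(model_dir)

    # 1. open the saved model, 2. grab the ruler, 3. add patterns, 4. save again
    nlp = spacy.load(model_dir)
    ruler = nlp.get_pipe("entity_ruler")
    ruler.add_patterns([{"label": "PERSON", "pattern": "Luna Lovegood"}])
    nlp.to_disk(model_dir)

    reloaded = spacy.load(model_dir)
    ents = [(e.text, e.label_) for e in reloaded("Harry Potter met Luna Lovegood.").ents]

print(ents)
```

For the JSONL alternative, spaCy 3 normally stores the ruler's patterns in a patterns.jsonl file inside the saved pipeline's entity_ruler folder, one JSON object per line. Check your own hp_ner directory to confirm the location before editing it by hand.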
@Dreamer-xj3ms · 2 years ago
How do I create a dialog flow management model?
@dd_annonymous3333 · 3 years ago
What if we want to add this to an existing model such as en_core_web_lg?
@python-programming · 3 years ago
No problem. Load the lg model and then add the entity ruler pipe (I would suggest putting it before the NER pipe).
@dd_annonymous3333 · 3 years ago
@python-programming Where would that be done in your code, for example? I'm assuming in the generate_rules function?
@python-programming · 3 years ago
I have two videos on how to do it for spaCy 2 and 3. Here is the spaCy 3 video: kzbin.info/www/bejne/fHSapp2dr5KLrdk
@dd_annonymous3333 · 3 years ago
@python-programming You have been a great help! Thank you. Love the videos, and I gave you a follow on Twitter lol
@python-programming · 3 years ago
Thanks!
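The suggestion in this thread (add the ruler before the NER pipe) is sketched below for spaCy 3. Loading en_core_web_lg requires the model to be downloaded, so a blank English pipeline with an untrained "ner" component stands in for it here only to show the ordering. With the real model, replace the first two lines with nlp = spacy.load("en_core_web_lg"). spaCy's documentation notes that when the ruler runs before the entity recognizer, the recognizer takes the entities the ruler has already set into account.

```python
from spacy.lang.en import English

# Stand-in for spacy.load("en_core_web_lg"): just enough to show the pipe order
nlp = English()
nlp.add_pipe("ner")

# before="ner" places the ruler ahead of the statistical entity recognizer
ruler = nlp.add_pipe("entity_ruler", before="ner")
ruler.add_patterns([{"label": "PERSON", "pattern": "Harry Potter"}])

print(nlp.pipe_names)
```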
@insanecbrotha · a year ago
cool series but also serious spaghetti code lol.
@rodiaz1566 · a year ago
Holy shit dude you lost me