Training a spaCy SpanCat Model to Annotate in Texts more quickly in Prodigy | SpanCat 03

Views: 2,425

Python Tutorials for Digital Humanities

A day ago

Comments: 21

@GrahamAndersonis a year ago

Thanks! What video do you recommend after this spancat 03? Feels like there is more to know here

@python-programming a year ago

Thank you so much for this!! 🎉

@python-programming a year ago

There should be an 04. It looks like it never uploaded to YouTube. I will be sure to fix that soon!

@BSP77 a year ago

This is a wonderful series! I look forward to the next video, thank you!

@jsr7599 10 months ago

Does part 4 not exist because you ran into issues with it not predicting anything? I'm confused about why there are no docs online about finishing this spancat process, but a lot of posts online about it not predicting correctly.

@python-programming 10 months ago

Thanks for the comment! No, it works fine. I lost the video footage and I need to re-record it. I'm trying to get it done ASAP. I use it on a lot of projects and spancat does work well. You need more training data for it, typically.

@davidrussell9662 9 months ago

@python-programming Please do. I was looking forward to it.

@JonasWindey 2 months ago

Bump! Been waiting for the next video for a while now :)

@GrahamAndersonis a year ago

For spancat, is it better to treat a section of sentences as a single doc for tagging, or is it better to tag sentence by sentence? In my case, there are sentences, external doc references, tables, figures, code, and other stuff that describe a section.

@python-programming a year ago

It depends on how much context is needed to accurately predict a span. If it relies on larger context, go larger (up to 250 tokens or so).

@GrahamAndersonis a year ago

@python-programming If a section is longer than 250 tokens, do you simply split it into section 1a and section 1b? In my case I have some control over where I divide the section.

@python-programming a year ago

@GrahamAndersonis spaCy will automatically handle the chunking of the text for you when you run the model. The 250-token guideline is just for training the model. If you have some control, then yes, just find a natural breaking point (such as a paragraph) and separate there.
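
A minimal sketch of the splitting approach described in this reply, under stated assumptions: paragraphs are separated by blank lines, whitespace-separated words stand in for spaCy tokens (real token counts will be somewhat higher), and the 250-token budget is the rough figure mentioned above. The function name `chunk_paragraphs` is hypothetical and is not part of spaCy or Prodigy.

```python
def chunk_paragraphs(text, max_tokens=250):
    """Group paragraphs into chunks of roughly max_tokens or fewer.

    Assumption: whitespace-separated words approximate tokens. Swap in a
    real tokenizer if you need exact counts.
    """
    # Split on blank lines and drop empty paragraphs.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]

    chunks = []
    current = []       # paragraphs collected for the chunk being built
    current_len = 0    # approximate token count of the current chunk

    for para in paragraphs:
        n_tokens = len(para.split())
        # Close the current chunk if this paragraph would exceed the budget.
        if current and current_len + n_tokens > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(para)
        current_len += n_tokens

    # Flush whatever is left over.
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Note that a single paragraph longer than the budget is kept whole as its own oversized chunk, since this sketch only breaks at paragraph boundaries. Each returned chunk could then be written out as one training example for annotation.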

@GrahamAndersonis a year ago

@python-programming For future reference, do you consult and/or have a Discord?

@python-programming a year ago

@GrahamAndersonis I do! You can reach me via the form on my site: wjbmattingly.com/

@pcxxy a year ago

Super helpful video. Looking forward to video 04. Keep up the great work!

@dariaglushkina2036 a year ago

Hello! Thanks a lot for your tutorials! Could you please make a new video on how to correctly create and modify config files? I've tried to train a spancat model on top of the en_core_web_lg and en_core_web_trf models (I want to have both ner and spancat), but it did not work because of some errors in the config files. I think this topic would also be very useful for others. Thank you again.

@python-programming a year ago

Absolutely! I will try to do that soon! Thanks for the idea!

@paulmiller591 a year ago

Very helpful. Cheers!

@python-programming a year ago

I'm so happy to hear it helped!

@shawnmarcy4413 a year ago

🎉🎉🎉

@judithnathanail3742 8 months ago

Enjoyed the video. Would love to see a video using the Prodigy PDF plugin (Prodigy_pdf) to annotate some PDFs in Prodigy and then train a model in spaCy (or something else), followed by applying the created model to some unseen PDFs. Lots of humanities materials are PDFs. There is a nice video on annotating papers (kzbin.info/www/bejne/qKjcq5hqbtOYbqs), but to be useful, we need to use the annotated output to train a model.