Training a spaCy SpanCat Model to Annotate in Texts more quickly in Prodigy | SpanCat 03

2,425 views

Python Tutorials for Digital Humanities

Days ago

Comments: 21
@GrahamAndersonis 1 year ago
Thanks! Which video do you recommend after this SpanCat 03? It feels like there is more to learn here.
@python-programming 1 year ago
Thank you so much for this!! 🎉
@python-programming 1 year ago
There should be an 04. It looks like it never got uploaded to YouTube. I will be sure to fix that soon!
@BSP77 1 year ago
This is a wonderful series! I look forward to the next video, thank you!
@jsr7599 10 months ago
Does part 4 not exist because you ran into issues with the model not predicting anything? I'm confused about why there are no docs online about finishing this spancat process, but lots of posts online about it not predicting correctly.
@python-programming 10 months ago
Thanks for the comment! No, it works fine. I lost the video footage and I need to re-record it. I'm trying to get it done ASAP. I use it on a lot of projects and spancat does work well. You need more training data for it, typically.
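The reply above says spancat typically needs more training data. Below is a minimal sketch for checking how much you have per label, assuming the annotations were exported as Prodigy-style JSONL (for example with `prodigy db-out`) where each record has a `spans` list whose items carry a `label`, plus an `answer` field. The helper name `count_accepted_span_labels` is hypothetical, not part of spaCy or Prodigy.

```python
import json
from collections import Counter
from typing import Iterable


def count_accepted_span_labels(jsonl_lines: Iterable[str]) -> Counter:
    """Count span labels across accepted annotations (hypothetical helper).

    Assumes Prodigy-style records such as
    {"text": ..., "spans": [{"start": 0, "end": 5, "label": "PLACE"}], "answer": "accept"}.
    Rejected and ignored examples are skipped.
    """
    counts: Counter = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the export
        record = json.loads(line)
        if record.get("answer") != "accept":
            continue  # only accepted annotations count as training data
        for span in record.get("spans", []):
            counts[span["label"]] += 1
    return counts


if __name__ == "__main__":
    # Tiny inline example; in practice, pass an open .jsonl file instead.
    demo = [
        '{"text": "a", "spans": [{"start": 0, "end": 1, "label": "PLACE"}], "answer": "accept"}',
        '{"text": "b", "spans": [{"start": 0, "end": 1, "label": "PERSON"}], "answer": "reject"}',
    ]
    for label, n in count_accepted_span_labels(demo).most_common():
        print(f"{label}\t{n}")
```

Because an open file iterates line by line, the same function works on `open("annotations.jsonl", encoding="utf-8")`. Labels with only a handful of accepted spans are the natural first candidates for more annotation.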
@davidrussell9662 9 months ago
@@python-programming Please do. I was looking forward to it.
@JonasWindey 2 months ago
Bump! Been waiting for the next video for a while now :)
@GrahamAndersonis 1 year ago
For spancat, is it better to treat a section of several sentences as a single doc for tagging, or to tag sentence by sentence? In my case, there are sentences, external doc references, tables, figures, code, and other material that describe a section.
@python-programming 1 year ago
It depends on how much context is needed to accurately predict a span. If it relies on larger context, go larger (up to 250 tokens or so).
@GrahamAndersonis 1 year ago
@@python-programming If the section is longer than 250 tokens, do you simply split it into section 1a and section 1b? In my case, I have some control over where I divide the section.
@python-programming 1 year ago
@@GrahamAndersonis spaCy will automatically handle the chunking of the text for you when you run the model; this is just for training the model. If you have some control, then yes, just find a natural breaking point (such as a paragraph) and split there.
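A minimal sketch of the paragraph-based splitting suggested above, for preparing training texts. It assumes blank lines separate paragraphs and uses whitespace-separated words as a rough stand-in for spaCy tokens (spaCy also splits off punctuation, so its counts run somewhat higher). The names `chunk_paragraphs` and `to_prodigy_jsonl` are hypothetical helpers, not spaCy or Prodigy APIs; the output is one `{"text": ...}` object per line, the plain JSONL input format Prodigy reads.

```python
import json
import re


def chunk_paragraphs(text: str, max_tokens: int = 250) -> list[str]:
    """Group consecutive paragraphs into chunks of at most max_tokens words.

    Hypothetical helper: paragraphs are assumed to be separated by blank
    lines, and whitespace-separated words approximate spaCy tokens.
    A single paragraph longer than max_tokens becomes its own chunk
    instead of being cut mid-paragraph.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for para in paragraphs:
        n_words = len(para.split())
        # Close the current chunk if this paragraph would push it over budget.
        if current and current_len + n_words > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(para)
        current_len += n_words
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def to_prodigy_jsonl(chunks: list[str]) -> str:
    """Serialize chunks as one {"text": ...} JSON object per line."""
    return "\n".join(json.dumps({"text": c}, ensure_ascii=False) for c in chunks)


if __name__ == "__main__":
    sample = "First paragraph here.\n\nSecond one.\n\nA third, longer paragraph of text."
    print(to_prodigy_jsonl(chunk_paragraphs(sample, max_tokens=5)))
```

Splitting only at paragraph boundaries keeps each chunk's context intact, at the cost of occasionally exceeding the budget when a single paragraph is very long.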
@GrahamAndersonis 1 year ago
@@python-programming For future reference, do you consult and/or have a Discord?
@python-programming 1 year ago
@@GrahamAndersonis I do! You can reach me via the form on my site: wjbmattingly.com/
@pcxxy 1 year ago
Super helpful video, looking forward to video 04. Keep up the great work!
@dariaglushkina2036 1 year ago
Hello! Thanks a lot for your tutorials! Could you please make a new video on how to correctly create and modify config files? I've tried to train a spancat model on top of the en_core_web_lg and en_core_web_trf models (I want to have both NER and spancat), but it did not work because of errors in the config files. I think this topic would be very useful for others as well. Thank you again.
@python-programming 1 year ago
Absolutely! I will try to do that soon! Thanks for the idea!
@paulmiller591 1 year ago
Very helpful, cheers!
@python-programming 1 year ago
I'm so happy to hear it helped!
@shawnmarcy4413 1 year ago
🎉🎉🎉
@judithnathanail3742 8 months ago
Enjoyed the video. I would love to see a video using the Prodigy PDF plugin (Prodigy_pdf) to annotate some PDFs in Prodigy and then train a model in spaCy (or something else), followed by applying the trained model to some new, unannotated PDFs. Lots of humanities materials are PDFs. There is a nice video on annotating papers (kzbin.info/www/bejne/qKjcq5hqbtOYbqs), but to be useful, we need to use the annotated output to train a model.