Check out the Textbook for this series: ner.pythonhumanities.com/intro.html
@chessketeer 2 years ago
Thank you, Dr. Mattingly, for sharing your knowledge and for doing this in such an unbelievably learner-oriented way.
@python-programming 2 years ago
You are very welcome! Thanks for the comment!
@amandaahringer7466 2 years ago
So happy to have found your channel! Very much looking forward to the rest of the series! Thank you for taking the time to create this!!!
@python-programming 2 years ago
Awesome! So happy to help.
@xirongcui7864 3 years ago
This kind of video is exactly what I need. I am currently working on building a Knowledge Graph of the thermal power industry. The first problem we face is NER and Relationship Extraction in a closed domain. It helped me a lot.
@python-programming 3 years ago
Glad it helped!
@nitindabadghav 2 years ago
This playlist is just fantastic. More power to you!! Thank you very much!!!
@python-programming 2 years ago
Glad you like it!
@saadatkamaei8829 3 years ago
Thank you Doctor. This is exactly what I need! 😊
@vincent_hall 7 months ago
Criticism isn't a bad thing, it highlights places for improvement and places where things are already good. Criticism is good.
@agni8840 1 year ago
I love these videos so much, thank you.
@python-programming 1 year ago
No problem!
@arungade2 3 years ago
I just started this playlist, and I was wondering whether I could learn to extract entities from song lyrics, maybe in later videos as well.
@python-programming 3 years ago
Yes, you can. If you follow my steps in this series, you will be able to do that.
@nomisjooon9241 4 years ago
Thank you very much for this video! I'm also using spaCy for Information Extraction and it works quite well. However, documents include not only free text but also tables, e.g., in your domain, the number of concentration camps per country. Do you have any suggestions for combining information extracted from free text with cell-specific information from tabular data?
@python-programming 4 years ago
Thanks for the feedback! It is most appreciated. Great question. There are a few different routes to solve that problem, and the choice largely depends on the state of the tables. Have they been OCR'ed? Tesseract and Tabula are both good candidates, depending on the situation. Is it already in an Excel spreadsheet? If so, then the csv module or xlrd is the way to go. Let me know the state of the data and I can advise more precisely. Again, thanks!
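As a minimal sketch of the csv route mentioned above, assuming the spreadsheet has already been exported to CSV: the column names and the "XXX" values below are placeholders invented for illustration, not data from any real document.

```python
import csv
import io

# Placeholder table standing in for a sheet exported to CSV.
# The column names and values are made up for illustration.
SAMPLE_CSV = """country,city,count
Germany,Berlin,XXX
Poland,Warsaw,XXX
"""


def read_table(handle):
    """Return the table as a list of dicts keyed by the header row."""
    return list(csv.DictReader(handle))


# io.StringIO stands in for an open file; with a real export you would
# pass open("your_file.csv", newline="") instead.
rows = read_table(io.StringIO(SAMPLE_CSV))
for row in rows:
    print(row["country"], row["city"], row["count"])
```

Reading each row as a dict keeps the header attached to every cell, which makes it easier to line table values up with entities found in the free text later.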
@nomisjooon9241 4 years ago
@@python-programming Thank you so much for responding that fast! I'm dealing with PDF files that contain test procedures. Each document has various tests, which are specified in different sections. When I started this project I wasn't aware of how many problems arise with PDFs and that most of them still aren't adequately solved. That's why, for now, I've decided to manually extract each file's sections for the training. So it's up to me in which file format the free text and the tables of each section are stored. Of course it would be nicer to have a fully automated process starting with the actual PDF file, but right now I'm OK with only showing the general applicability of NER and IE in the industrial context. So far, I've only found use cases within the biomedical and legal AI areas. I've already developed my own labels (e.g. test_specification, test_value,...) and trained the model with spaCy. It works quite well for the free text, but I don't even know where to start with the tabular data. The data stored in tables mostly contains additional information that is needed to perform the IE adequately.
@python-programming 4 years ago
Not a problem at all! Always happy to help. Okay, from the sound of things I would go the Tesseract or Tabula-py route. Those are usually the top contenders for this situation. You may have to save each PDF page as an image, then use OpenCV to adjust the brightness/contrast to get better results. I had to do this in my own research on multiple occasions. Once the image is adjusted, Tesseract and Tabula both perform well. Those are definitely the routes to pursue. I've got both slated for video series next year, but maybe I'll move them closer to January.
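To make the brightness/contrast step concrete, here is a small pure-Python sketch of the usual linear adjustment, new = alpha * old + beta, clamped to the 0-255 range. The function names, the tiny 2x2 "page", and the alpha/beta values are assumptions for illustration only. On a real page image, OpenCV's cv2.convertScaleAbs applies a similar scale-and-shift to the whole array.

```python
def adjust_pixel(value, alpha, beta):
    """Scale by alpha (contrast), shift by beta (brightness), clamp to 0-255."""
    return max(0, min(255, round(alpha * value + beta)))


def adjust_image(pixels, alpha=1.5, beta=10):
    """Apply the adjustment to every pixel of a row-major grayscale image."""
    return [[adjust_pixel(v, alpha, beta) for v in row] for row in pixels]


# Hypothetical 2x2 grayscale "page"; a real scan would be far larger.
page = [[0, 100], [200, 255]]
adjusted = adjust_image(page)
print(adjusted)
```

Values of alpha above 1 stretch the contrast and a positive beta brightens the image; the right settings depend on the scan, so they usually need some trial and error before OCR results improve.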
@nomisjooon9241 4 years ago
@@python-programming Thank you so much for your suggestions! Let's assume Tesseract and Tabula solve the problem of automatically extracting the tables of a document with several sections about concentration camps in 1. Germany, 2. Poland, 3. Czech Republic. And let's assume that each of these sections contains tables with numbers of people in different cities of the corresponding country. My main issue here would be that I don't know how to use IE in a way that lets the algorithm understand which section/country a table refers to, and how to get the content of a specific cell (e.g. Berlin XXX people). I can't find any solutions combining NER and IE with tabular data.
@python-programming 4 years ago
@@nomisjooon9241 I understand better now. It will likely need to be a custom solution to your problem: either a custom neural net (a very simple one should do it) to identify the type of data, combined with or in lieu of a rules-based solution. DM me on Twitter with some pics of the PDF structure. I am away from my computer until next week, but I will see if I can write some code for you and help you out.
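One possible shape for the rules-based side, sketched under heavy assumptions: the sections and tables have already been extracted, each section heading names its country, and the first row of each table is a header. The Section class, KNOWN_COUNTRIES, and the "XXX" values are hypothetical and only mirror the example in the question above.

```python
from dataclasses import dataclass


@dataclass
class Section:
    heading: str  # e.g. "1. Germany"
    table: list   # rows as extracted; the first row is the header


# Assumed lookup list; in practice this could come from an NER pass.
KNOWN_COUNTRIES = ("Germany", "Poland", "Czech Republic")


def country_of(heading):
    """Rule: the section's country is the first known name in its heading."""
    for name in KNOWN_COUNTRIES:
        if name.lower() in heading.lower():
            return name
    return None


def flatten(sections):
    """Tag every table cell with its section's country and column header."""
    records = []
    for section in sections:
        if not section.table:
            continue  # nothing to flatten for an empty table
        country = country_of(section.heading)
        header, *body = section.table
        for row in body:
            for column, cell in zip(header[1:], row[1:]):
                records.append(
                    {"country": country, "row": row[0],
                     "column": column, "value": cell}
                )
    return records


def lookup(records, country, row, column):
    """Return the first matching cell value, or None if there is no match."""
    for rec in records:
        if (rec["country"], rec["row"], rec["column"]) == (country, row, column):
            return rec["value"]
    return None


sections = [
    Section("1. Germany", [["city", "people"], ["Berlin", "XXX"]]),
    Section("2. Poland", [["city", "people"], ["Warsaw", "XXX"]]),
]
records = flatten(sections)
print(lookup(records, "Germany", "Berlin", "people"))
```

Flattening every cell into a record that carries its section context means a later step only has to match entities from the free text against the country, row, and column fields.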
@programmingworld9751 2 years ago
Thanks for the nice videos. One request, please. I can't find the complete playlist of all the videos you mentioned above. The playlist for spaCy is huge and I still can't find video 05 of the above series. Could you please point me to the playlist for this series? Thanks
@python-programming 2 years ago
This has been on my to do list for a month. I will try and do it first thing tomorrow morning.
@programmingworld9751 2 years ago
@@python-programming Thank you so much. Also, could you please suggest good universities for pursuing a PhD in Stats in the UK? I really like your way of teaching and your tutorials. I am presently working on demand planning using time series. Is there any document or tutorial you can suggest? Thanks