This guy. You teach amazingly well. Gifted communicator, looking forward to future content!
@dataschool (7 years ago)
Thanks!
@brendensong8000 (4 years ago)
Oldies, but goodies!!!! It's awesome!!! Thank you!
@dataschool (3 years ago)
Thank you so much!
@shashankpulijala3378 (7 years ago)
Thank you Kevin. I really liked the in-depth explanation of the concept. Teachers like you inspire me a lot....
@dataschool (7 years ago)
You're very welcome! Thanks for your kind comment!
@hpchen5402 (8 years ago)
Another great video by Kevin! Thanks a lot for kindly sharing.
@dataschool (8 years ago)
You're very welcome!
@ccchang0111 (7 years ago)
Thank you Kevin! Such an informative tutorial, with great detail but no redundancy!
@dataschool (7 years ago)
You're very welcome! Thanks for your kind words!
@ciurkut2 (5 years ago)
Another very good, easy-to-follow tutorial ^^
@dataschool (5 years ago)
Thanks!
@7810 (6 years ago)
Amazing course! Thanks.
@dataschool (5 years ago)
You're welcome!
@mkarthekeyan (7 years ago)
Thank you very much for sharing the lecture. It's one of the best and most clearly explained lectures on this topic for beginners like me.
@dataschool (7 years ago)
Great to hear!
@datascienceds7965 (6 years ago)
Please do a video on sentiment analysis.
@dataschool (6 years ago)
Thanks for your suggestion!
@parambole8671 (8 years ago)
Awesome lecture as always
@dataschool (8 years ago)
Thanks!
@shunmugaprabhusiddharthan2678 (7 years ago)
Excellent explanation! Thanks a lot, Kevin.
@dataschool (7 years ago)
You're very welcome!
@musasall5740 (5 years ago)
Excellent!
@dataschool (5 years ago)
Thank you!
@hatrer2244 (5 years ago)
So clear! Big ups.
@dataschool (5 years ago)
Thanks!
@Ankurkumar14680 (5 years ago)
Great video, amazing teaching skills...thanks a ton :)
@dataschool (5 years ago)
You're welcome! Thanks for your kind words 😄
@rahulkulkarni7224 (8 years ago)
Hi Kevin, I follow your tutorials and recommend them to my friends. I have a question about this particular video: why did you choose the scikit-learn package for feature extraction, text cleaning, etc.? What advantages does it have over NLTK? NLTK can easily interact with scikit-learn for ML classification algorithms. I believe NLTK is more mature and has a thousand other features that might not be required for basic text mining. I am just trying to understand the differences with respect to performance, ease of use, and interaction with scikit-learn for ML.
@dataschool (8 years ago)
If your focus is machine learning, scikit-learn is a far better choice than NLTK. scikit-learn is built for machine learning, whereas NLTK includes machine learning as a small feature. scikit-learn supports the entire machine learning pipeline, from preprocessing through evaluation and even ensembling. I have some more thoughts about scikit-learn in this video: kzbin.info/www/bejne/f6S7iZ-Pi6enZ68 If your focus is Natural Language Processing, and machine learning is a secondary concern, then NLTK may be a good choice. But for higher performance NLP, with a cleaner API and a simpler workflow, many people these days are choosing spaCy: spacy.io/ NLTK is not optimized for performance, whereas scikit-learn and spaCy are. Hope that helps!
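A minimal sketch of the workflow the reply above describes: vectorize the text, train a model, then evaluate it. The messages and labels are toy data invented for illustration, and `MultinomialNB` is only one reasonable choice of classifier.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

# Toy data, invented for illustration (1 = spam, 0 = ham)
train_text = [
    "win a free prize now",
    "free cash call now",
    "are we meeting for lunch",
    "see you at home tonight",
]
train_label = [1, 1, 0, 0]
test_text = ["free prize call now", "lunch at home"]
test_label = [1, 0]

# Learn the vocabulary from the training text only, then reuse it for the test text
vect = CountVectorizer()
X_train = vect.fit_transform(train_text)
X_test = vect.transform(test_text)

# Train a classifier on the document-term matrix and evaluate on held-out text
model = MultinomialNB()
model.fit(X_train, train_label)
pred = model.predict(X_test)
accuracy = accuracy_score(test_label, pred)
print(X_train.shape, list(pred), accuracy)
```

Note that the vectorizer is fit on the training text only; the test text is transformed with the vocabulary learned from training.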
@ianyang8799 (7 years ago)
Hi Kevin, I hope you can make videos on deep learning, such as CNNs, RNNs, and LSTMs.
@dataschool (7 years ago)
Thanks for your suggestion! I'll consider it for the future!
@phuccoiinkorea3341 (8 years ago)
Good job! Thanks so much!
@dataschool (7 years ago)
You're very welcome!
@ravisamal3533 (5 years ago)
It was a great session.
@dataschool (5 years ago)
Glad you liked it!
@CarlosRomeroconnect (8 years ago)
Easy and informative!
@dataschool (8 years ago)
Thanks!
@salamatburj9502 (5 years ago)
Hi, Kevin! I have a question about using CountVectorizer with cross-validation. Can we just transform the data before splitting it into folds and training the model? In principle, it should not learn features that are not in the training set. Can you please elaborate on this? Thank you!
@dataschool (5 years ago)
That's far beyond what I can cover in a YouTube comment, I'm sorry! This is explained in-depth during my course, however: www.dataschool.io/learn/
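For readers with the same question, here is a hedged sketch of one common pattern (not taken from the course mentioned above): put the vectorizer and the model in a single `Pipeline` and pass the raw text to `cross_val_score`. The vectorizer is then refit on the training portion of each fold, so the vocabulary never includes words seen only in the held-out fold. The texts and labels are toy data invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy data, invented for illustration (1 = spam, 0 = ham)
texts = [
    "win a free prize now",
    "free cash call now",
    "claim your free prize",
    "call now to win cash",
    "are we meeting for lunch",
    "see you at home tonight",
    "lunch at noon tomorrow",
    "call me when you get home",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# The pipeline refits CountVectorizer inside every fold, on that fold's training text only
pipe = make_pipeline(CountVectorizer(), MultinomialNB())

# Pass the raw text (not a pre-built document-term matrix) to cross-validation
scores = cross_val_score(pipe, texts, labels, cv=4, scoring="accuracy")
print(scores.mean())
```

Vectorizing the full dataset before splitting would instead let the vocabulary be learned from text that is later used for testing, which the pipeline approach avoids.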
@radouane5591 (8 years ago)
Thanks, I downloaded all your videos. Although I did my capstone project in NLP (analyzing tweets), I think your self-paced class will be beneficial to me.
@dataschool (8 years ago)
Excellent! Here's a link to learn more about the course: www.dataschool.io/learn/ Feel free to email me if you have any questions. My email address is on that page.
@radouane5591 (8 years ago)
Do you have a Black Friday deal for the class?
@dataschool (8 years ago)
I'm sorry, but I don't. Worth asking though! :)
@hafizhassaan9263 (4 years ago)
Hello Sir! I'd like to ask how to convert a journal title to its journal abbreviation using NLP, or a method that is easier than NLP. Please guide me; I'm waiting for your kind response. Thanks in anticipation.
@semanticgeek3072 (8 years ago)
Great video Kevin. Is this process an alternative to using NLTK?
@dataschool (8 years ago)
NLTK is a library focused on Natural Language Processing (NLP), whereas scikit-learn is focused on machine learning. Some machine learning tasks can be done in NLTK, while others cannot. I choose to use scikit-learn for machine learning with text because it's a far better tool than NLTK for machine learning. However, if your focus is NLP, then NLTK (or spaCy) may be a good choice.
@Rijndhadu (8 years ago)
Did you give a similar lecture at any other Python conference, or is it completely different?
@dataschool (8 years ago)
This is a shorter version of the tutorial that I delivered at PyCon 2016. I made some improvements to the lesson for PyData DC and there are lots of good audience questions, but it may not be worth your time if you have already watched the lesson from PyCon. Thanks for asking!
@guohuashen599 (7 years ago)
Thank you Kevin for your great video! I have a question: how can I combine the plural and singular words or verbs with different tenses together and just keep one of them (I don't want to differentiate them)?
@dataschool (7 years ago)
That's beyond the scope of what you can do natively with CountVectorizer. However, the tasks you are proposing may be of limited value, if your goal is predictive accuracy.
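As the reply says, `CountVectorizer` has no built-in stemming or lemmatization, but it does accept a callable through its `analyzer` parameter. The sketch below wraps the default analyzer with `naive_stem`, a hypothetical toy suffix-stripper written only for this illustration; a real project would more likely plug in a stemmer or lemmatizer from a library such as NLTK or spaCy.

```python
from sklearn.feature_extraction.text import CountVectorizer


def naive_stem(token):
    """Hypothetical toy rule: strip a few common English suffixes."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters remain
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


# Reuse the default lowercasing and tokenization, then normalize each token
base_analyzer = CountVectorizer().build_analyzer()


def stemmed_analyzer(doc):
    return [naive_stem(token) for token in base_analyzer(doc)]


vect = CountVectorizer(analyzer=stemmed_analyzer)
dtm = vect.fit_transform(["cats walked", "cat walking", "the cat walks"])
vocab = sorted(vect.vocabulary_)
print(vocab)
```

With this toy rule, "cats" and "cat" share one column, as do "walked", "walking", and "walks". Whether that merging improves predictive accuracy is something to test rather than assume.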
@warrock-5489 (8 years ago)
Hi Kevin, much appreciation for the tutorial! I have a question about how to merge other features with the vectorized features. For example, when we pass a 'text' column to TfidfVectorizer (after fit and transform), how do we properly add other features, such as a 'subject' feature, to each of the train and test instances? Thanks in advance. :)
@warrock-5489 (8 years ago)
I tried using a pipeline, but it's very confusing. I store the training data in a pandas DataFrame of 2-tuples (text, subject), and this causes an error when fitting the pipeline.
@dataschool (8 years ago)
You would use a FeatureUnion: scikit-learn.org/stable/modules/pipeline.html I cover this in module 5 of my online course: www.dataschool.io/learn/ Hope that helps!
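The reply above names `FeatureUnion`. The sketch below swaps in `ColumnTransformer`, which was added to scikit-learn later (version 0.20) and selects DataFrame columns directly, so it is a different tool applied to the same problem. The column names follow the question; the data, the `label` column, and the choice of `LogisticRegression` are assumptions made for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy training data, invented for illustration
train = pd.DataFrame({
    "text": [
        "cheap flights to rome",
        "quarterly budget review",
        "book cheap hotel deals",
        "budget meeting notes",
    ],
    "subject": ["travel", "work", "travel", "work"],
    "label": [1, 0, 1, 0],
})

features = ColumnTransformer([
    # A text vectorizer needs a 1-D column, so the column is given as a plain string
    ("text", TfidfVectorizer(), "text"),
    # An encoder needs a 2-D selection, so the column is given inside a list
    ("subject", OneHotEncoder(handle_unknown="ignore"), ["subject"]),
])

# The two feature blocks are stacked side by side before reaching the model
pipe = make_pipeline(features, LogisticRegression())
pipe.fit(train[["text", "subject"]], train["label"])

test = pd.DataFrame({"text": ["cheap hotel in rome"], "subject": ["travel"]})
pred = pipe.predict(test)
print(pred)
```

Because both transformers live inside the pipeline, the same fitted vocabulary and category encoding are applied to new rows at prediction time.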
@khizaraman386 (2 years ago)
I used Colab and it didn't show as much of a description for CountVectorizer as is shown in Jupyter. Could you please tell me the difference between Colab and Jupyter? Which one is better?
@dataschool (2 years ago)
Neither is better, they are just different! This might help: www.dataschool.io/cloud-services-for-jupyter-notebook/
@khizaraman386 (6 months ago)
@@dataschool Thanks a lot!! I am back at revising ML...so refreshing to be back!!!
@takbirhossaintushar7290 (7 years ago)
Sir, in the model we are just telling the machine whether a message is desperate or not. If we want to use more classes, say to predict whether a comment is positive, negative, or neutral, which scikit-learn commands should we use, and how do we implement this?
@dataschool (7 years ago)
It sounds like you are just describing a 3-class problem instead of a 2-class problem. The basic scikit-learn code is exactly the same for classification problems, regardless of the number of classes. However, note that the relevant evaluation metrics are different when there are more than 2 classes. Hope that helps!
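To illustrate the point in the reply above, here is a small sketch where the code is the same as in the two-class case and only the labels change. The comments and their labels are toy data invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import confusion_matrix
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy comments, invented for illustration, with three possible labels
texts = [
    "great product love it",
    "love this great phone",
    "terrible awful service",
    "awful terrible quality",
    "it arrived on monday",
    "package arrived monday morning",
]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

# Same pipeline as a two-class problem; the classifier infers the classes from the labels
pipe = make_pipeline(CountVectorizer(), MultinomialNB())
pipe.fit(texts, labels)

pred = pipe.predict([
    "love this great product",
    "terrible awful quality",
    "arrived monday",
])
print(list(pred))

# With three classes, the confusion matrix is 3x3 instead of 2x2
class_order = ["negative", "neutral", "positive"]
cm = confusion_matrix(labels, pipe.predict(texts), labels=class_order)
print(cm.shape)
```

Metrics defined for two classes, such as a single precision or recall value, need an averaging choice (for example macro or weighted) once there are three or more classes.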
@kikiisboy4911 (7 years ago)
With an imbalanced multi-class text dataset, should I normalize the data with TF-IDF weights or not?
@dataschool (7 years ago)
You could experiment to see whether TF-IDF is useful. There's no easy way to know in advance whether or not it will be better!
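One way to run the experiment suggested above is to cross-validate the same model with each vectorizer and compare the scores. This is only a sketch: the imbalanced three-class dataset is invented for illustration, and macro-averaged F1 is an assumed choice of metric, not something the reply specifies.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy imbalanced dataset, invented for illustration: 6 sports, 3 tech, 3 food
texts = [
    "the team won the match",
    "a late goal won the game",
    "the coach praised the team",
    "fans cheered the goal",
    "the match ended in a draw",
    "the striker scored twice",
    "the new phone has a fast chip",
    "the laptop battery lasts longer",
    "the chip powers the new laptop",
    "the soup needs more salt",
    "bake the bread until golden",
    "add salt to the bread dough",
]
labels = ["sports"] * 6 + ["tech"] * 3 + ["food"] * 3

candidates = {"counts": CountVectorizer(), "tfidf": TfidfVectorizer()}
results = {}
for name, vect in candidates.items():
    # Identical model and folds; only the vectorizer differs
    pipe = make_pipeline(vect, MultinomialNB())
    scores = cross_val_score(pipe, texts, labels, cv=3, scoring="f1_macro")
    results[name] = scores.mean()

print(results)
```

On a dataset this small the scores are noisy, so the comparison only becomes meaningful with realistic amounts of data.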
@mdmasumbillah1796 (5 years ago)
Thank You.
@dataschool (5 years ago)
You're welcome!
@ranjanpatel5146 (8 years ago)
How can I classify an email as a positive or negative response?
@dataschool (8 years ago)
I would frame this as a classification problem. The response value is "positive" or "negative", the features are the text (as well as any other engineered features), and the email messages are the observations. The main challenge will be obtaining labeled training data, which is training data that has been labeled with the true response value (so that you can train your model). Hope that helps!
@srikantachaitanya6561 (8 years ago)
Thank you...
@dataschool (8 years ago)
You're welcome! I hope the video is helpful to you.
@shamsuddinjunaid30 (5 years ago)
It’s sklearn.model_selection instead of sklearn.cross_validation
@dataschool (4 years ago)
That's correct, the scikit-learn API changed since this video was recorded.
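For anyone following along with a current scikit-learn release, the updated import looks like this. The `sklearn.cross_validation` module was deprecated in version 0.18 and removed in 0.20; the data below is a placeholder invented for illustration.

```python
# Old import (removed in scikit-learn 0.20):
#   from sklearn.cross_validation import train_test_split
from sklearn.model_selection import train_test_split

# Placeholder data, invented for illustration
X = ["msg one", "msg two", "msg three", "msg four",
     "msg five", "msg six", "msg seven", "msg eight"]
y = [0, 1, 0, 1, 0, 1, 0, 1]

# The function itself is called the same way as before; only the module moved
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
print(len(X_train), len(X_test))
```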
@royklaassebos2400 (7 years ago)
Again, another great tutorial! Recently I've been watching a lot (i.e. almost all haha) of your videos; really love how you explain code line by line so that we can understand the "why" in addition to the "how". There are only a handful of people I've found online that have a similar teaching style. Honestly, I think your content is almost too good to give away for free. Maybe you should consider publishing your videos on Udemy (e.g. Kirill Eremenko is an excellent teacher as well and reaches a huge audience on Udemy - www.udemy.com/user/kirilleremenko/ ). Anyway, thanks again really appreciate it!
@dataschool (7 years ago)
Thanks so much for your kind words! I really appreciate it! I will be releasing paid courses in the future, but I also enjoy giving away a lot of material for free so that everyone can access it :)
@rohitnagal3704 (6 years ago)
If I have 100 articles, do I have to create 100 corpora for them, or something else?
@dataschool (6 years ago)
Generally, each article would be a row in your dataset. Does that help?
@rohitnagal3704 (6 years ago)
If I have an article of 30 lines, do I convert it into one single vector?
@dataschool (6 years ago)
Yes
@rohitnagal3704 (6 years ago)
@@dataschool thanks
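As a rough illustration of the exchange above: each article is passed to the vectorizer as one string, however many lines it contains, and comes back as one row of the document-term matrix. The articles here are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Three toy articles, invented for illustration; the first one spans two lines
articles = [
    "Line one of the article.\nLine two of the article.",
    "A second, shorter article.",
    "The third article.",
]

vect = CountVectorizer()
dtm = vect.fit_transform(articles)

# One row per article, one column per vocabulary term
print(dtm.shape)

# The multi-line article is still a single row vector
first_article_vector = dtm[0]
print(first_article_vector.shape)
```

So 100 articles would form one corpus of 100 strings, giving a matrix with 100 rows.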
@gourusai101 (6 years ago)
Can anyone explain max_df and min_df clearly?
@dataschool (6 years ago)
My answer here should help: stackoverflow.com/questions/27697766/understanding-min-df-and-max-df-in-scikit-countvectorizer
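A short sketch to complement the linked answer, using four toy documents invented for illustration. A float `max_df` drops terms that appear in more than that proportion of documents, and an integer `min_df` drops terms that appear in fewer than that many documents.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy documents, invented for illustration.
# Document frequencies: apple = 4, banana = 2, cherry = 1, date = 1
docs = [
    "apple banana",
    "apple cherry",
    "apple banana date",
    "apple",
]

# No pruning: every term is kept
all_terms = sorted(CountVectorizer().fit(docs).vocabulary_)

# max_df=0.75: ignore terms found in more than 75% of documents (drops "apple")
# min_df=2: ignore terms found in fewer than 2 documents (drops "cherry" and "date")
pruned = CountVectorizer(max_df=0.75, min_df=2).fit(docs)
kept_terms = sorted(pruned.vocabulary_)

print(all_terms)
print(kept_terms)
```

In short, `max_df` removes terms that are too common to be informative, and `min_df` removes terms that are too rare to generalize.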