FAKE NEWS CLASSIFIER WITH MACHINE LEARNING ALGORITHMS USING Natural Language Processing- PART 1

Views: 99,003


Comments: 111

@krishnaik06 4 years ago

Hello all, this video was for members only, but many of you requested it, so I have uploaded it for everyone. It has also been added to the NLP playlist. Happy learning!!

@keshavbansal5148 4 years ago

Bomb video 💥💥💥, 💞💞💞

@iNSane4224 4 years ago

Thank you sir

@anujvyas9493 4 years ago

Thank you so much, sir 👍

@siddhantranjan1607 4 years ago

Sir, why have you used only the title column of the dataset for predictions? There is also a text column; what about it?

@stackexchange7353 4 years ago

Could you use sklearn for the feature extraction portion and Keras for the model portion?

@parthraghuwanshi2980 1 year ago

People like you are gems: after working hard all day in the office, you still take time to post quality content, and selflessly too. I can truly understand how good a person's values must be.

@suresherriboyina 4 years ago

Please upload Part 2. We have so much time now, so please keep uploading new project videos.

@alexanderbalasky6174 4 years ago

Great walkthrough but look into your audio!

@abhishekpurohit3442 4 years ago

Just got 97% accuracy by combining both title and text, using the passive aggressive classifier.
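Editor's note: one way the title and text columns could be combined before vectorizing is sketched below. The toy DataFrame and the `content` column name are assumptions made for illustration; only the `title` and `text` column names come from the comments on this page.

```python
import pandas as pd

# Toy stand-in for the dataset (assumption); only the column names
# "title" and "text" are taken from the comment thread.
df = pd.DataFrame({
    "title": ["Senate passes budget bill", None],
    "text": ["Lawmakers voted late on Friday.", "You won't believe this trick."],
})

# Fill missing values first, otherwise NaN + str yields NaN for that row.
df["content"] = (df["title"].fillna("") + " " + df["text"].fillna("")).str.strip()

print(df["content"].tolist())
```

The combined `content` column can then be cleaned and vectorized the same way the title alone would be; whether it reproduces the 97% reported above depends on the model and preprocessing.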

@tejashshah5202 3 years ago

At 10:48, CountVectorizer() should not be fitted before train_test_split(). Doing so leaks information from the test rows into the vocabulary, which is data leakage. The correct way is to call fit_transform() on the train data and transform() on the test data.

@kofistale5261 3 years ago

Can you please share code showing how to do that?
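Editor's note: a minimal sketch of the order described in this thread, splitting first and fitting the vectorizer on the training rows only. The toy corpus, the labels and the `max_features` value are illustrative assumptions, not the video's actual data or settings.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Toy corpus and labels (assumptions for illustration only).
corpus = [
    "miracle cure shocks doctors", "aliens endorse candidate",
    "secret diet banned everywhere", "celebrity clone spotted again",
    "parliament approves annual budget", "central bank holds rates",
    "city council opens new library", "court upholds earlier ruling",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# 1) Split the raw text first.
X_train_text, X_test_text, y_train, y_test = train_test_split(
    corpus, labels, test_size=0.25, random_state=0
)

# 2) Learn the vocabulary from the training rows only.
cv = CountVectorizer(max_features=5000)
X_train = cv.fit_transform(X_train_text)

# 3) Reuse that vocabulary for the test rows; unseen words are ignored.
X_test = cv.transform(X_test_text)

print(X_train.shape, X_test.shape)
```

Wrapping the vectorizer and classifier in a scikit-learn `Pipeline` would enforce the same order automatically.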

@tusharsinghsoam2508 1 year ago

such an underrated comment....you are on point man

@thechaoticneuron 4 years ago

Hello Krish, this is a great video. I started learning NLP with it. A small question: I followed the entire procedure and implemented Logistic Regression at the end. It gave me a higher accuracy of 94%, with fewer false positives and false negatives as well. I'm keen to know why the PassiveAggressive Classifier is said to be better for NLP applications and why a simple logistic regression cannot be used. Thank you :)

@vasanthrohith4564 1 year ago

It's super useful🤩.... thanks for teaching ❤

@samirpaul5499 2 years ago

Hello Krish - This is an amazing video. I have been watching your videos and have learned many things. It is a wonderful contribution for aspiring Machine Learning engineers. I have one question that I would request you to clarify. After the Bag-of-Words/TF-IDF model is built, how do we construct the input features for new sentences (to be passed to the predict function of the model)? If such an explanation exists in any other video, please point me to it; otherwise I would request you to make a short video on this, as it would be immensely helpful. Thank you again. - Samir Paul
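Editor's note: a small sketch of one common answer to this question. The already-fitted vectorizer is reused, and only its `transform()` method is called on new sentences. The training texts and labels are toy assumptions, and `MultinomialNB` stands in for whichever classifier was trained.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy training data (assumption): 1 = fake, 0 = real.
train_texts = [
    "aliens endorse candidate shock claim",
    "miracle pill cures everything overnight",
    "parliament approves annual budget",
    "central bank holds interest rates",
]
train_labels = [1, 1, 0, 0]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)   # learns vocabulary and IDF weights
model = MultinomialNB().fit(X_train, train_labels)

# New sentences: apply the same preprocessing, then transform() only.
# Calling fit_transform() here would build a different, incompatible vocabulary.
new_sentences = ["miracle pill endorsed by aliens"]
X_new = vectorizer.transform(new_sentences)
prediction = model.predict(X_new)

print(X_new.shape, prediction)
```

Words that were never seen during training are simply dropped by `transform()`, so the new feature matrix always has the same number of columns as the training matrix.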

@2500204 4 years ago

I am a machine learning engineer. I like that you post such videos, but the problem with real data sets is that there is no training data: you have to collect and create your own. People watching this video don't know what is about to hit them once they enter this field. It's not plug and play. I spend 80% of my time creating and processing data and only 20% actually doing ML. In one anomaly detection project we had to use DBSCAN to find noise in the data; we then marked the noise data points as anomalous and the clustered data points as non-anomalous, and used that data to train our ANN.

@johnyjose3941 4 years ago

Hi Krish, at 8:28, why did you take messages['title']? I think we should take messages['text'].

@pec8377 1 year ago

Hi @Krish, shouldn't you create the bag of words on X_train instead of the full dataset? Otherwise the accuracy will not be the same when providing a new sentence.

@abhishekpurohit3442 4 years ago

Sir, you have not uploaded the video on the passive aggressive classifier. Please upload it, sir!!

@subhamacharya7084 4 years ago

Sir, kindly make some basic videos on Pandas.

@sachinborgave8094 4 years ago

Thanks Krish, please upload the GBM in-depth intuition video.

@krishnaik06 4 years ago

Yes, GBM is planned.

@ronaksengupta6174 4 years ago

Sir, after this please make a video on how pandemics impact the financial markets, or anything on COVID-19 dataset analysis.

@arjyabasu1311 4 years ago

Great idea

@vishalvanpariya1466 4 years ago

Great video, but I have one query: why do you use only the title feature in modelling? You should use all the features.

@vishaldas6346 4 years ago

Hello Krish, I think there is a mistake in the explanation of True Positive and True Negative. Correct me if I'm wrong.

@karanshethia3560 2 years ago

Hey Krish. Great video. Can you let me know how we can build a predictive system once we have tried out different models and selected the one that is most accurate/efficient?

@mdenamulhaque7589 4 years ago

Dear Krish, to be a data scientist, do we need to learn SQL or something like it? If it is needed, why is it absent from your data science playlist, and do you have any plans for it in the future? I'm very confused. Please let me know. Thanks.

@rayyanamir8560 2 years ago

Yes, but you need to know only the basic queries, like GROUP BY, joins, ORDER BY, SELECT, WHERE, etc.

@tanishbothra5044 4 years ago

Hello, suppose we need to add more features to our X which are not text. That is, suppose we get a sparse matrix from the count vectorizer and we have one more feature, length, and we want both. How do we combine them?
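Editor's note: one possible way to do this is to stack the extra column onto the sparse matrix with `scipy.sparse.hstack`. The titles and the character-length feature below are illustrative assumptions.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer

titles = ["breaking news today", "you will not believe this story at all"]

cv = CountVectorizer()
X_text = cv.fit_transform(titles)          # sparse bag-of-words matrix

# Extra numeric feature: title length in characters, as a column vector.
lengths = np.array([len(t) for t in titles]).reshape(-1, 1)

# Stack column-wise; the result stays sparse, with the length as the last column.
X = hstack([X_text, csr_matrix(lengths)]).tocsr()

print(X_text.shape, X.shape)
```

Because raw lengths are on a very different scale from word counts, scaling the extra column may matter for some models. scikit-learn's `ColumnTransformer` is an alternative that handles mixed column types inside a pipeline.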

@surajthallapalli4227 4 years ago

Hi Sir, I guess there is a data leakage problem. We need to split into train and test first and apply CountVectorizer afterwards, right? In the video, CountVectorizer is applied first and the train/test split is done later. Please clarify this.

@tejashshah5202 3 years ago

You are right.

@junaidyousaf4602 3 years ago

Sir, after this please make a video on how to detect hate speech.

@themightylion5147 4 years ago

Sir, at 8:28, why did you take messages['title']? I think we should take messages['text'].

@lokbharatendu7063 4 years ago

Hi Krish, first of all, thanks for such good videos! I am very new to data science, so my question might sound very basic. In this video and in some of the other videos in the playlist, you mention that algorithms like Naive Bayes/MultinomialNB work very well with text data. In all the samples we convert sentences to words and then to features (with values 0 and 1 in the case of BOW). After this conversion, aren't we just dealing with numeric data rather than textual data, since all the text has been converted to independent features with numeric values? Can't we just use any classification algorithm? If yes, then why do we say Naive Bayes works well with text data?

@jainilpatel1173 3 years ago

I got an error, please answer. In the passive aggressive classifier algorithm: unexpected keyword argument n_iter (50).
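Editor's note: newer scikit-learn releases no longer accept `n_iter` for this estimator; the corresponding parameter is `max_iter`. A minimal sketch with toy data (the texts and labels are assumptions):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier

# Toy data (assumption): 1 = fake, 0 = real.
texts = [
    "miracle cure shocks doctors",
    "aliens endorse candidate",
    "parliament approves budget",
    "bank holds interest rates",
]
labels = [1, 1, 0, 0]

X = CountVectorizer().fit_transform(texts)

# max_iter replaces the old n_iter keyword.
clf = PassiveAggressiveClassifier(max_iter=50, random_state=0)
clf.fit(X, labels)

print(clf.predict(X))
```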

@mohammedzia1015 2 years ago

Hi Krish, it was a nice video, but I have one question. The condition "If score = previous_score" will be satisfied every time, right, since you have set the value of previous_score to zero? So what is the use of it? Don't we have to assign the score to previous_score, like "previous_score = score", after the IF condition?
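Editor's note: a sketch of the pattern this comment is asking about, where the best score is updated inside the `if` so that later candidates are compared against the best result so far. The alpha grid, the toy data and the variable names are assumptions, not the video's exact code.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

# Toy data (assumption): 1 = fake, 0 = real.
train_texts = [
    "miracle cure shocks doctors", "aliens endorse candidate",
    "secret cure banned by doctors", "parliament approves budget",
    "bank holds interest rates", "parliament vote on budget delayed",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = ["doctors hide miracle cure", "budget vote in parliament"]
test_labels = [1, 0]

cv = CountVectorizer()
X_train = cv.fit_transform(train_texts)
X_test = cv.transform(test_texts)

best_score = 0.0
best_alpha = None
classifier = None
for alpha in np.arange(0.1, 1.1, 0.1):
    candidate = MultinomialNB(alpha=alpha).fit(X_train, train_labels)
    score = accuracy_score(test_labels, candidate.predict(X_test))
    if score > best_score:
        best_score = score      # without this update every candidate "wins"
        best_alpha = alpha
        classifier = candidate

print(best_alpha, best_score)
```

Selecting a hyperparameter on the test set makes the reported accuracy optimistic; a separate validation split or cross-validation (for example `GridSearchCV`) would be the cleaner choice.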

@neeleshnayak4375 2 years ago

Hey, did you notice that the test and train data have a lot of overlap in this problem? Removing the overlap will lead to poorer predictions by the model.

@mbmathematicsacademic7038 3 months ago

Thank you Krish

@rajarshidgp2003 2 years ago

At 15:40: you are not able to see it because you had called reset_index, so the original index numbers have been lost.

@kalppanwala6439 4 years ago

Krish, can you make videos on BERT? I can't find any good explanations of it.

@krishnaik06 4 years ago

Yes, videos are planned.

@preethisetty4309 3 years ago

Can you explain fake online review detection using the passive aggressive classifier?

@1pmcoffee 4 years ago

@Krish Naik Sir, do you provide paid personal consultation on an hourly basis? Is there any way I can connect with you?

@tirumalaparise9474 3 years ago

Sir, how can this model predict whether news is fake or real when some other, external news item is given, preprocessed in the same way as x[title]? Does it work in real life, or only on the testing data?

@thepresistence5935 2 years ago

I waited 2 hours, but it did not finish executing on my laptop, so I took the first 1000 rows and then it worked.

@adarshgupta9952 2 years ago

Hello, can you share that csv file? When I took the first 100 or 200 rows, it included other columns as well, named "unnamed 5, unnamed 6 ---- and unnamed 685". Furthermore, when I drop NaN values, all the rows are dropped and I am left with 0 rows. Or please tell me how you took the first 1000 rows. Anything may help.

@adarshgupta9952 2 years ago

Never mind, I got it. Tip for others: I was using MS Excel to remove data. Use any csv editor to remove data, not Excel.

@RadomName3457 2 years ago

Hi guys, could anyone explain to me what the coefficients of the models are. Why do we have those numbers?

@saisubramanyam3243 3 years ago

Sir, how do I predict the label of test data instances, whether they are fake or real?

@sabafarheen4918 3 years ago

Sir, with just a single word, how can we say it's fake? Please answer.

@subarnasamanta4945 4 years ago

I am trying this project on Kaggle with the GPU enabled, but the GPU is not working; it shows 0% usage. Can you tell me why?

@adarshgupta9952 2 years ago

I have exported this model as a ".sav" file using Pickle. Now, how can I test this model? I want to write a news statement and want to predict if it is true or not. Please help anyone!
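Editor's note: a pickled classifier on its own cannot score raw text, because it expects the same feature columns it was trained on. One hedged approach is to pickle the fitted vectorizer together with the model and apply both after loading. The sketch below keeps everything in memory with `pickle.dumps`/`pickle.loads`; `pickle.dump`/`pickle.load` on a ".sav" file behave the same way. The toy data and dictionary keys are assumptions.

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy training data (assumption): 1 = fake, 0 = real.
texts = [
    "miracle cure shocks doctors", "miracle diet revealed",
    "budget vote today", "budget talks continue",
]
labels = [1, 1, 0, 0]

cv = CountVectorizer()
clf = MultinomialNB().fit(cv.fit_transform(texts), labels)

# Save the vectorizer AND the model together.
blob = pickle.dumps({"vectorizer": cv, "model": clf})

# Later, possibly in another script: load both and score a new statement.
restored = pickle.loads(blob)
news = ["miracle cure revealed"]
features = restored["vectorizer"].transform(news)
label = restored["model"].predict(features)[0]

print(label)
```

Any text cleaning used during training (lowercasing, stemming, stopword removal) has to be applied to the new statement as well before calling `transform()`.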

@urvashisingh3329 4 years ago

Sir, I need help, it's very urgent. I want the code for speaker age and gender classification. Please help, sir, I really need it.

@sandipansarkar9211 4 years ago

Superb video. But while practising the code I get stuck at the corpus step, and everything from there onwards is stuck. I have tried a number of times, but to no avail. Thanks

@adarshgupta9952 2 years ago

You have to reduce the number of rows in the csv file. Use a csv editor to do that, not MS Excel (it will give an error).

@varunpusarla 4 years ago

Why do we use Naive Bayes for NLP problems ?

@omkarpatil2854 4 years ago

Hello Krish, about the for loop which generates the corpus: yours finished in a few minutes, but on my laptop (i5 7th gen, 1050tx 4gb graphics) it took more than half an hour. Is there anything I need to configure?

@adarshgupta9952 2 years ago

You have to reduce the number of rows in the csv file. Use a csv editor to do that, not MS Excel (it will give an error).

@avanishsingh8518 4 years ago

Hi sir, sorry, but I have a question. In the video you said that you were going to use the text column for CountVectorizer, but you are taking the title column. Why?

@ushirranjan6713 3 years ago

Sir, when I try to import and read the data in Colab, it fails with an error, so I had to make these changes to read the data:

df=pd.read_csv('train.csv',engine='python', encoding='utf-8',error_bad_lines=False)
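Editor's note: in newer pandas releases the `error_bad_lines` keyword is no longer available; `on_bad_lines="skip"` is the equivalent option. A self-contained sketch using an in-memory CSV (the sample rows are made up for illustration):

```python
import io

import pandas as pd

# In-memory stand-in for train.csv (assumption); the second data row has too many fields.
raw = (
    "id,title\n"
    "1,Budget approved\n"
    "2,Bad,row,with,extra,fields\n"
    "3,Rates unchanged\n"
)

# Skip malformed rows instead of raising a parser error.
df = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")

print(df)
```

Skipping rows silently discards data, so it is worth checking how many rows were dropped before training on the result.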

@ushirranjan6713 3 years ago

@K.S_5723, I will have to search for it, but you can download it from Krish's folder and make the same change that I made; it will work. If I find it, I will send it to you.

@johannachristy7515 4 years ago

Please also tell us how to implement this in a web application.

@aravindnaidu1286 4 years ago

Sir, I have a doubt: why haven't you used any parameters other than title? And we are still getting an accuracy of 94 percent, I am just shocked!!!!! Please reply.

@shauryananda207 4 years ago

I am not able to implement the Passive Aggressive Classifier. The argument 'n_iter' is unexpected.

@vipindube5439 4 years ago

Hello Krish sir, your voice is getting low; please speak louder.

@sahityakandru6134 4 years ago

While running the ipynb file you provided, I get this error: "Unable to allocate 698. MiB for an array with shape (18285, 5000) and data type int64". Can you please explain this?

@amansingh3347 3 years ago

Increase your RAM or use Google Colab.
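Editor's note: the 698 MiB figure matches the size of a dense int64 array with that shape, which is what calling `.toarray()` on the vectorizer output produces. A hedged alternative to adding RAM is to keep the matrix sparse, since estimators such as `MultinomialNB` accept sparse input directly. The toy corpus below is an assumption.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Dense size from the error message: rows * columns * 8 bytes per int64 value.
dense_mib = 18285 * 5000 * 8 / 2**20
print(f"dense array: {dense_mib:.1f} MiB")

# Toy corpus (assumption): 1 = fake, 0 = real.
corpus = [
    "miracle cure shocks doctors", "aliens endorse candidate",
    "parliament approves budget", "bank holds interest rates",
]
labels = [1, 1, 0, 0]

# fit_transform returns a SciPy sparse matrix; skipping .toarray() avoids the
# large dense allocation because only the non-zero counts are stored.
X = CountVectorizer(max_features=5000).fit_transform(corpus)
clf = MultinomialNB().fit(X, labels)

print(type(X).__name__, X.shape, "stored values:", X.nnz)
```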

@011_mohdanwar2 4 years ago

Dear sir, I want to ask you: can we work on either the title or the text? And what is the process for removing the stopwords? Can you explain it to me?

@011_mohdanwar2 4 years ago

It's urgent, sir.

@tusharpangare2468 4 years ago

@011_mohdanwar2 Are you working on the same project, buddy?

@ganeshhegde8972 4 years ago

Nice sir

@ishwarjagdishashar9096 4 years ago

I have followed all the code. I am getting an error in re.sub(): "NameError: name 're' is not defined". Do I have to install a library to run the re.sub() function?

@pinkalshah5237 4 years ago

Adding "import re" is enough; re is part of the Python standard library, so nothing needs to be installed.

@ebrahimkutty1491 4 years ago

CountVectorizer can remove stop words.
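Editor's note: a short illustration of this point, using CountVectorizer's built-in English stop-word list via the `stop_words` parameter. The sample sentence is made up.

```python
from sklearn.feature_extraction.text import CountVectorizer

# stop_words="english" drops common words such as "the", "is" and "on".
cv = CountVectorizer(stop_words="english")
cv.fit(["the senate is voting on the budget"])

vocab = sorted(cv.vocabulary_)
print(vocab)
```

This only removes stop words; it does not stem or lemmatize, so a manual cleaning loop is still needed if stemming is wanted.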

@tusharpangare2468 4 years ago

messages.reset_index(inplace=True)

I'm getting an error like this:

AttributeError Traceback (most recent call last)
in
----> 1 messages.reset_index(inplace=True)
AttributeError: 'function' object has no attribute 'reset_index'

Can someone help me?

@thunder440v3 4 years ago

Wow!

@avibitm 3 years ago

Hi Krish, can you help us make an API for this?

@sivarajasekharyannam9398 3 years ago

Hello sir, please send the document for this project.

@CasualGamer669 3 years ago

Can you make it into a simulator?

@mansikumari9533 3 years ago

I am getting a parser error while loading the dataset. Please help me solve it.

@mansikumari9533 3 years ago

Also, what is "re" in line 127? It gives the error "re is not defined" when I try it on a different dataset.

@harikrishnanm5109 4 years ago

It was really helpful. Can you make videos on grammar correction using rule-based methods, language models and classifiers? It's really hard to understand otherwise.

@sgrsgr5663 4 years ago

Krish, the voice in this video is not very clear.

@sonalgarg5628 4 years ago

review = re.sub('[^a-zA-Z]'," ", messages['title'][i])

I get this error on this line: expected string or bytes-like object. Please solve this.

@manishwadhwani5860 4 years ago

Just apply messages['title'] = messages['title'].apply(str)

@sainathpatil844 4 years ago

I am getting the same issue. If you find the solution, please let me know.

@sainathpatil844 4 years ago

@manishwadhwani5860 Not working.

@sainathpatil844 4 years ago

Just import re there

@sahityakandru6134 4 years ago

Just import re before running that line
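Editor's note: "expected string or bytes-like object" is what `re.sub` raises when it receives a non-string value, which typically happens when a title cell is missing (NaN). Importing `re` fixes the separate NameError but not this one. The sketch below drops rows with a missing title and resets the index before the cleaning loop. The toy DataFrame is an assumption, and the stemming and stop-word steps of the full tutorial are omitted.

```python
import re

import pandas as pd

# Toy stand-in for the dataset (assumption); the second title is missing.
messages = pd.DataFrame({
    "title": ["Budget approved!", float("nan"), "Rates: unchanged 2020"],
})

# Drop rows without a title, then renumber so messages["title"][i] works for every i.
messages = messages.dropna(subset=["title"]).reset_index(drop=True)

corpus = []
for i in range(len(messages)):
    review = re.sub("[^a-zA-Z]", " ", messages["title"][i])  # keep letters only
    review = review.lower().split()
    corpus.append(" ".join(review))

print(corpus)
```

Converting the column with `.apply(str)` also avoids the exception, but it turns missing titles into the literal text "nan", which then ends up in the corpus.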

@faryaltahseen7197 1 year ago

Sir! I am very new to NLP. Thank you so much for this playlist; I am learning so many things from you. But please tell me, how can I fix this error?

     12 from unicodedata import normalize
     14 if normalize:
---> 15     cm=cm.astype('float')/cm.sum(axis=1)[:,np.newaxis]
     16     print("Normalized Confusion Matrix")
     17 else:
AttributeError: module 'matplotlib.cm' has no attribute 'astype'
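Editor's note: the traceback suggests that, at the failing line, the name `cm` refers to the `matplotlib.cm` module rather than to a confusion-matrix array. The stray `from unicodedata import normalize` also appears to replace the `normalize` flag with a function, which is always truthy. A hedged sketch of the normalization step on its own, using a different variable name to avoid the clash (the labels are toy values):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels (assumption).
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 0]

# conf_mat is a NumPy array, so .astype() is available.
conf_mat = confusion_matrix(y_true, y_pred)

# Divide each row by its total so every row sums to 1.
conf_mat_norm = conf_mat.astype("float") / conf_mat.sum(axis=1)[:, np.newaxis]

print(conf_mat)
print(conf_mat_norm)
```

In the plotting helper, passing the array returned by `confusion_matrix` as the first argument and removing the `unicodedata` import would be the first things to check.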

@rohitbaisane6712 3 years ago

How does your classifier detect fake news? Does it work on any news, or only on the news in the dataset?

@siddhartharaja9413 3 years ago

It's for that dataset.

@piyushvyas2475 4 years ago

Can you please add the F1, recall and precision scores for the algorithms used in this tutorial?

@monicainapakolla7148 4 years ago

Try this:

from sklearn.metrics import classification_report
target_names = ['FAKE', 'REAL']
print(classification_report(y_test, pred, target_names=target_names))
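Editor's note: a self-contained version of the snippet above that can be run without the notebook. The `y_test` and `pred` values are toy assumptions. Passing `labels` explicitly fixes the row order, since `target_names` are otherwise matched to the sorted label values.

```python
from sklearn.metrics import classification_report

# Toy ground truth and predictions (assumptions).
y_test = ["FAKE", "FAKE", "REAL", "REAL", "REAL", "FAKE"]
pred = ["FAKE", "REAL", "REAL", "REAL", "FAKE", "FAKE"]

# Reports precision, recall, F1-score and support per class.
report = classification_report(
    y_test, pred, labels=["FAKE", "REAL"], target_names=["FAKE", "REAL"]
)
print(report)
```

With numeric 0/1 labels, which value means fake depends on the dataset, so the order of `target_names` should be checked against it.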

@mainuddinali9561 2 years ago

I am not able to download the dataset; without practice, it is a waste.

@jonashero5054 3 years ago

Where is part 2?

@nagarajannethi 3 years ago

kzbin.info/www/bejne/e2rKh5-bntt1bK8

@malik_msn 4 years ago

I would have enjoyed it, but the poor audio made a mess of it.

@rubabvlogs1843 4 years ago

The Kaggle fake news icon is Trump ...:)

@MuhammadAbdullah-gx2ou 1 year ago

Dear sir, I am facing this error here:

TypeError Traceback (most recent call last)
in ()
      4 corpus = []
      5 for i in range (0, len(messages)):
----> 6     review = re.sub('[^a-zA-Z]', ' ', messages['title'][i])
      7     review = review.lower()
      8     review = review.split()
/usr/lib/python3.10/re.py in sub(pattern, repl, string, count, flags)
    207     a callable, it's passed the Match object and must return
    208     a replacement string to be used."""
--> 209     return _compile(pattern, flags).sub(repl, string, count)
    210
    211 def subn(pattern, repl, string, count=0, flags=0):
TypeError: expected string or bytes-like object