Sentiment Analysis in Python with TextBlob and VADER Sentiment (also Dash p.6)

Views: 74,256

sentdex

Comments: 95

@MarkJay 6 years ago

I'm a simple man, I see new sentdex video, I like

@pavan540 5 years ago

I am getting the following error.
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
in ()
      1 from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
      2
----> 3 analyzer = SentimentIntensityAnalyzer()
      4 vs = analyzer.polarity_scores("VADER Sentiment looks interesting, I have high hopes!")
      5 print(vs)
/Applications/anaconda2/lib/python2.7/site-packages/vaderSentiment/vaderSentiment.pyc in __init__(self, lexicon_file, emoji_lexicon)
    210 _this_module_file_path_ = os.path.abspath(getsourcefile(lambda: 0))
    211 lexicon_full_filepath = os.path.join(os.path.dirname(_this_module_file_path_), lexicon_file)
--> 212 with open(lexicon_full_filepath, encoding='utf-8') as f:
    213 self.lexicon_full_filepath = f.read()
    214 self.lexicon = self.make_lex_dict()

@SirFloIII 6 years ago

In the later TextBlob analysis (19:17 onwards) you made a mistake. You only took the samples with a polarity greater than 0.5 and then asked whether the polarity was greater than 0, and of course it was, since 0.5 > 0 and > is transitive. Of course you were getting 100% "accuracy", but that's cheating: you classify correctly on exactly those samples where you classify correctly.
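A minimal sketch of the point above, assuming a list of polarity scores for lines from a known-positive file. The scores here are made up for illustration; in practice each one would come from something like `TextBlob(line).sentiment.polarity`. The function names are hypothetical and are not taken from the video's code.

```python
def filtered_accuracy(polarities, threshold=0.5):
    """Accuracy measured only on samples that already passed the threshold."""
    # Keep only the samples scored above the threshold ...
    kept = [p for p in polarities if p > threshold]
    # ... then "check" that they are positive, which the filter already guarantees.
    correct = sum(1 for p in kept if p > 0)
    accuracy = 100.0 * correct / len(kept) if kept else 0.0
    return accuracy, len(kept)


def overall_accuracy(polarities):
    """Share of ALL samples from the positive file that scored above zero."""
    correct = sum(1 for p in polarities if p > 0)
    return 100.0 * correct / len(polarities) if polarities else 0.0


# Hypothetical polarity scores for six lines of a known-positive file.
scores = [0.8, 0.6, 0.2, 0.0, -0.3, 0.55]

print(filtered_accuracy(scores))  # accuracy on the filtered subset, plus its size
print(overall_accuracy(scores))   # accuracy over every line in the file
```

The filtered figure can only ever be 100% (or 0% when nothing passes), so the informative number in that run is the sample count, not the accuracy.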

@MetodNovak 6 years ago

Hi, at 7:09, when you mention positive and negative accuracy, that should actually be precision.
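For reference, a small sketch of how per-file counts map onto the standard definitions of precision and recall for the positive class. The counts are invented for illustration and the function name is hypothetical. Under these definitions, the share of positive-file lines labelled positive is the recall of the positive class, while precision also needs the negative-file lines that were labelled positive.

```python
def precision_recall(true_pos, false_neg, false_pos):
    """Precision and recall for the positive class from raw counts."""
    predicted_pos = true_pos + false_pos   # everything the classifier called positive
    actual_pos = true_pos + false_neg      # everything that really is positive
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall


# Hypothetical counts, not results from the video:
pos_file_labelled_pos = 60     # true positives
pos_file_labelled_other = 40   # false negatives
neg_file_labelled_pos = 20     # false positives

precision, recall = precision_recall(
    pos_file_labelled_pos, pos_file_labelled_other, neg_file_labelled_pos
)
print(precision, recall)
```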

@CristiNeagu 6 years ago

Also, it's obvious TextBlob is "slightly better", since your condition for counting a sentence as valid is that it should have (for positive accuracy) a polarity greater than or equal to 0.0001. It's then obvious that any sentence that passes that test will have a polarity greater than 0. You are counting them twice. You will always get 100% accuracy.

@sentdex 6 years ago

The reason I used the thresholds was to filter out degrees of uncertainty. The 100.0% accuracy is a certainty, but what I was more so trying to look for was the sample count at that point.

@andrear1989 6 years ago

Ok, but still you cannot conclude that TextBlob performs better than the other library, as you are not really comparing the two methods.

@CristiNeagu 6 years ago

Andrea Ramazzina I think he can. Since every single line of the file is set to be either positive or negative, in accordance with the file, you can compare directly the two methods by seeing how many positive results they report.

@CristiNeagu 6 years ago

sentdex To be honest, you could have been a bit clearer about what the two sample files contain :)

@ElliV87 5 years ago

Hi! Thank you for this video. I have a question about the compound scores. The VADERSentiment documentation states that the thresholds are: positive sentiment: compound score >= 0.05; neutral sentiment: (compound score > -0.05) and (compound score < 0.05); negative sentiment: compound score <= -0.05

@PatrickBateman12420 5 years ago

Agreed. He got that wrong in the video. It should have been < -0.10 in his example.
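A minimal sketch of a three-way label with symmetric cut-offs, assuming the ±0.05 values that the VADER README gives as typical thresholds for the compound score. The function name and the `cutoff` parameter are hypothetical; the compound value itself would come from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`.

```python
def label_from_compound(compound, cutoff=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= cutoff:
        return "positive"
    if compound <= -cutoff:   # note the negative sign on the lower bound
        return "negative"
    return "neutral"


# Hypothetical compound scores for illustration.
for score in (0.6, 0.05, 0.0, -0.05, -0.4):
    print(score, label_from_compound(score))
```

Passing `cutoff=0.1` gives the stricter ±0.10 band discussed in this thread; the key point is that the negative test compares against the negated cut-off.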

@GaxtonOkobah 5 months ago

Python keeps complaining that "ModuleNotFoundError: No module named 'vaderSentiment'". This happens while using Spyder. Kindly help out
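One common cause of this error is that the package was installed into a different Python interpreter than the one the IDE runs. The sketch below is only a diagnostic under that assumption: run it in the same console that raises the error. The helper name `module_available` is hypothetical.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the current interpreter can find a top-level module."""
    return importlib.util.find_spec(name) is not None


# The interpreter that is actually executing this code.
print(sys.executable)

# If this prints False, the package is not installed for that interpreter.
# Installing with "<path printed above> -m pip install vaderSentiment"
# targets exactly this interpreter.
print(module_available("vaderSentiment"))
```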

@hedleypanama 6 years ago

#HoldIt! I speak Spanish and the translation seems accurate

@antoine109 6 years ago

Big fan of your work, sentdex. I'm a Master's degree student in NLP (your series on NLTK was very useful for that), and it's always a pleasure to learn from your videos. Currently I'm very interested in word embeddings. Will you make a video or a series about word embeddings, word2vec, etc.?

@bamber101 6 years ago

If anyone could potentially help me with the issue in the link below, I'd be very appreciative. stackoverflow.com/questions/51398378/python-nlp-code-not-functioning-as-should

@panoss4149 6 years ago

I really like your videos, but if it's possible, use a Chrome plugin to invert the white color of webpages to something darker (e.g. the CareYourEyes plugin). Going from dark IDE (alt-tab) to white web page (alt-tab) to dark IDE blows my eyes.

@guruappapadasali7106 5 years ago

Hi, Line 22 of the code where we are comparing the polarity to a factor. Should it be - 0.1 instead of + 0.1? Just a thought.

@ThePellski 6 years ago

Does anyone know if the sentiment scores provided by VADER can be improved using word stemming techniques?

@asriomar11 6 years ago

dataset original source www.cs.cornell.edu/people/pabo/movie-review-data/

@nithiyashrees3456 1 year ago

print(analysis.translate(from_lang="en",to='ta'))

@sellen2u 6 years ago

It would be interesting to see a comparison with a dataset of tweets or Reddit comments. VADER claims it's "specifically attuned to sentiments expressed in social media". Emoticons, slang, all caps, initialisms, acronyms, etc.

@dandogamer 6 years ago

There are research papers out there, I believe, but yes, VADER performs better than something like SentiWordNet for social media.

@TradeWithMVR 6 years ago

The one where you get 100% accuracy, you are considering the polarity either to be negative or positive and checking the same in the next statement. if x < -0.001 then it is definitely less than 0 and if x > 0.0001 then it is clearly greater than 0. How is this logic even working? I am lost in this. Can you explain it clearly ?

@sentdex 6 years ago

At this step, I am purely trying to get the sample count up. It will be 100% accuracy since it passes the first check.

@TradeWithMVR 6 years ago

I get it. Thanks

@plamenyankov8476 3 years ago

Thanks, it is very useful. Could I ask - do I need to define: for line in f.read().split(' '): analysis = TextBlob(line) if analysis.sentiment.polarity

@yuanyuanfan6666 3 years ago

You made a mistake when using textblob the second time, which is why you kept getting 100% accuracy; it should not be 100% accuracy.

@chrisvlachos 6 years ago

I have installed VADER using pip, but I cannot get it to work. I have an excel file with tweets that I want to analyse... Can someone please help me?

@TheBluCypher 5 years ago

Using your final example with TextBlob, the number of negative samples TextBlob identifies is only 2072 out of a total of 5332. If we use the negative.txt file to test for positive sentiment instead, we get 2345, which is higher than the negative count. So TextBlob is actually saying that our negative sentiment file contains MORE positive sentiment than negative sentiment, lol, which is entirely wrong for our case. Is there a better way of representing negative sentiment? Because if we were to go in blind, not knowing our dataset contains negative sentiment, we would actually end up with polarity leaning towards the positive end. (First time doing sentiment analysis, so I'm trying to wrap my head around this, aha.)
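The counts quoted above (2072 negative and 2345 positive out of 5332) leave 915 lines in neither bucket. A three-way tally makes that visible. This is a sketch under assumptions: the scores below are invented, the `tally` helper and its `cutoff` parameter are hypothetical, and each score would in practice come from `TextBlob(line).sentiment.polarity`.

```python
from collections import Counter


def tally(polarities, cutoff=0.0):
    """Count samples scored positive, negative, or neutral (inside the band)."""
    counts = Counter()
    for p in polarities:
        if p > cutoff:
            counts["positive"] += 1
        elif p < -cutoff:
            counts["negative"] += 1
        else:
            counts["neutral"] += 1
    return counts


# Hypothetical polarity scores for five lines of a known-negative file.
scores = [0.5, -0.2, 0.0, 0.0, 0.1]
counts = tally(scores)
print(dict(counts))
```

Reporting all three buckets for each file shows how much of the apparent disagreement comes from lines the scorer treats as neutral rather than from outright misclassification.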

@yusufbaysal7796 6 years ago

Could you suggest a library for Turkish-language sentiment analysis? I am working with fastText and would like to hear your opinion.

@reemawangkheirakpam8165 6 years ago

hi sentdex!! is it not possible to fetch tweets older than a week or so????

@acid123ist 4 years ago

cannot install vaderSentiment on Anaconda.

@mvaldes 6 years ago

mexican here, translation looks legit. one word off but the rest is good.

@siddharthabiswas2147 6 years ago

Do you know any tool that classifies tweets on basis of emotions? or just identifies the emotions in that tweet ? I need it for a project

@RnDcompany 2 years ago

You found one? :D would be also interested..

@debjitchattopadhyay7627 4 years ago

does anyone know the translate to language codes in TextBlob().translate()?

@ashishagrawal5483 6 years ago

Will this code work on any dataset? I have the Reddit comment dataset, where each line is an individual comment or a reply. How accurately will it predict if I set the polarity thresholds to 0.0001 and -0.0001? Please reply soon!

@JuggernautProducts 5 years ago

Hey, so you definitely have a lot more experience with this than I do. When using your methods to make Vader Sentiment more accurate along with Text Blob while using the sample texts provided, I keep running into major accuracy issues. For example, on the negative text file, both Vader and Text Blob end up classifying the text as roughly 50/50 positive and negative, which is no better than flipping a coin. Do you know why this might be?

@MaxLatif 4 years ago

Brilliant video dude !!!!! thanks a million (keep them coming) I just ran the code on a Dutch corpus and its doing fine!

@skorpimish 6 years ago

18:27; line 27; why here you can leave a = sign? in this case there will be pos = neg and both are less than 0.1, which can not be treated as neg, and therefore increase the counter

@nomanssky09 6 years ago

Finally you have started using an IDE :D

@sentdex 6 years ago

IDLE is an IDE

@AshishKumar-tu7id 4 years ago

@@sentdex They don't understand that.

@justinhouck1245 6 years ago

If you are getting this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4645: ordinal not in range(128)
try:
with open("positive.txt", "r", encoding='utf-8') as f:

@justinhouck1245 6 years ago

Also watch how you save your .txt files. Make sure you save them as UTF-8.
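A short Python 3 sketch of why the error appears and how the suggested fix avoids it. The sample text and the temporary file are stand-ins for the video's `positive.txt`, not the real data. The byte 0xc3 in the error message is the first byte of many accented characters in UTF-8, which an ASCII decoder cannot handle.

```python
import os
import tempfile

# "é" is stored as the two bytes 0xc3 0xa9 in UTF-8.
raw = "a caf\u00e9 scene\n".encode("utf-8")

# Decoding those bytes as ASCII fails, which is the error from the comment.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Writing the bytes to a file and reading it back with an explicit encoding works.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "positive.txt")  # stand-in for the sample file
    with open(path, "wb") as f:
        f.write(raw)
    with open(path, "r", encoding="utf-8") as f:
        lines = f.read().splitlines()

print(ascii_ok)
print(lines)
```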

@mrooney9596 6 years ago

PLEASE someone help! I want to follow along so badly and spent the whole day trying to fix this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 6573: ordinal not in range(128)
It happens in both VADER and TextBlob. I can't find any fix for it; it won't even run this simple example:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()
vs = analyzer.polarity_scores("VADER Sentiment looks interesting, I have high hopes!")
print(vs)
Please, to save my sanity and so I can follow along. Thanks.

@MrFinnagle 5 years ago

You may have found an answer by now. The solution I found for this specific problem was to use the open function from the io package in python. I use that open function because you can then specify encoding. That solves the problem for at least TextBlob because one of the lines in the provided texts is causing the problem.
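A minimal sketch of the `io.open` approach described above. On Python 2 the built-in `open` does not accept an `encoding` argument, while `io.open` does; on Python 3 the two are the same function. The helper name `read_lines` and the temporary sample file are hypothetical stand-ins.

```python
import io
import os
import tempfile


def read_lines(path, encoding="utf-8"):
    """Read a text file with an explicit encoding and return its lines."""
    with io.open(path, "r", encoding=encoding) as f:
        return f.read().splitlines()


# Build a small UTF-8 sample file to stand in for the provided text files.
folder = tempfile.mkdtemp()
sample = os.path.join(folder, "sample.txt")
with io.open(sample, "w", encoding="utf-8") as f:
    f.write(u"na\u00efve but charming\n")

print(read_lines(sample))
```

Each returned line can then be passed to `TextBlob(line)` or to VADER's `polarity_scores` as in the video.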

@step7steveX 5 years ago

Hello. Just a quick question off topic. I am using Python35 with Sublime as the IDE. I am trying every possible command to comment out lines but they do not seem to be working. Can anyone suggest a solution? Thanks. I also tried ctrl + / but no luck.

@step7steveX 5 years ago

Never mind, I managed to install the necessary package and customize the theme in open resources.

@FuZZbaLLbee 6 years ago

How do these compare to the sentiment analysis built with TensorFlow?

@sentdex 6 years ago

I haven't found any really good sentiment analysis with tensorflow, other than via some API. Despite that, this would almost certainly be far more lightweight than running a neural network. Not sure on the speed. I doubt TF would go faster in classifications, but speed could be comparable. A neural network could conceivably be far more accurate, but at certain costs.

@FuZZbaLLbee 6 years ago

I remember looking at this one. It claims to be 79% accurate ahmedbesbes.com/sentiment-analysis-on-twitter-using-word2vec-and-keras.html#comment-3548018958

@PatrickBateman12420 5 years ago

@@sentdex Typically you use a 1D CNN with FastText to do sentiment classification e.g. using Keras. A Deep Learning model requires a huge dataset. With a small dataset, you'll get most likely a much higher variance than with a hard-coded algorithm such as VADER.

@muhammadusmanakram406 5 years ago

Can we use any of these packages for a review-based project?

@eragonritter6436 6 years ago

4:22 The translation feature seems to be the old Google translate, so not really good...

@pysuhayb15 6 years ago

Thank you, sentdex. Can we please get a new series on Kivy for Android?

@smadgulkar 6 years ago

+1 for the move to sublime! what made you move? great video as usual, keep em coming!

@Eurley66 6 years ago

I just freaked out when I saw Sublime on the thumbnail...

@jnscollier 6 years ago

Why

@mailistfajar7518 2 years ago

this is the greatest video i've ever seen

@edelciojunior3917 4 years ago

Is there a playlist for this series?

@kirisko3067 3 years ago

does this work with emojis?

@tuvshuutuvshuu4447 2 years ago

vader does

@bashisobsolete.pythonismyn6321 6 years ago

eats shoots and leaves.

@CristiNeagu 6 years ago

You are making the assumption that every single line in the text you're analyzing is positive or negative, respectively. A review can be a mix of positive, neutral, and negative statements. As such, your accuracy metrics are irrelevant.

@sentdex 6 years ago

In the case of my sample data, the text I am analyzing *is* either positive or negative. In reality, not everything is, but, in this case....it is :P

@CristiNeagu 6 years ago

Ok, fair enough. I thought that each sample text is a review, when it looks like they are sentences picked from reviews.

@sentdex 6 years ago

Yep, one file is all "positive" reviews and the other is all "negative" reviews. If you read them though, you'd probably suggest not all of them are clearly one way or the other. That's why I like using this set: it's a fairly realistic set that is quite challenging, and maybe even a bit noisy. Good for testing a classifier and its confidence in scoring.

@ashutoshpatole6262 5 years ago

How about voice sentiment analysis @sentdex

@sentdex 5 years ago

That's actually a really cool idea! I'll have to ponder on that

@PatrickBateman12420 5 years ago

@@sentdex Ever watched the TV series Lie to Me? They used it in profiling ...

@asdfasdfuhf 6 years ago

Where'd u get that shirt?

@sentdex 6 years ago

pythonprogramming.net/store/

@wolfisraging 6 years ago

I am glad u r finally using sublime... Big fan always 😊

@CristiNeagu 6 years ago

Ikr? Now one more step until he starts using VS Code with a proper debugger and programming environment.

@beckettman42 6 years ago

The text editor wars never cease. I've been trying to convert to VS Code but it just doesn't *feel* as comfortable as my sublime setup.

@CristiNeagu 6 years ago

Michael Beckett I like it cause I have all my tools in one place, and I found Sublime too clunky to set up. But I understand what you mean. I used to use Visual Studio Community a lot, but when I started doing scripting at my work place I had to stop using it due to the obvious licensing issues. So I did spend a lot of time trying to find an IDE that does what I need it to do. I tried IDLE, Spyder, Sublime. No luck with them, only frustration. And when VS Code came out, it was perfect. Although, to be honest, I should have given PyCharm a try.

@beckettman42 6 years ago

I probably will convert in the next few weeks based on people's recommendations. I think I just feel a certain loyalty to my sublime setup since it took so long to get the way I like. I still need to get my VS Code to 'see' my folders and change the font...maybe after my latest 'golang' project. :)

@wolfisraging 6 years ago

Well as such Atom is really good and open source..... My recommendations are sublime and atom... VS code is good and perfect, but when I do extensive 'for loop' coding like neural networks, then the vscode lags sooooooooo much

@sandeepvk 6 years ago

You should start by saying what it is all about. "Sentiment Analysis in 4 Minutes" by Siraj is a better video.

@sentdex 6 years ago

No thanks.

@cruso2711 6 years ago

Hey, great video! I'd be interested in the implementation of some unsupervised generative models, like GANs.

@johnnyboss1561 6 years ago

Please update the GTA V bot!!!

@dulalsandip7950 6 years ago

Good one, bro. If possible, make a video on a Raspberry Pi coded with Python, where a camera detects an object and sends you a message by email.

@sentdex 6 years ago

Yes.

@iwannawatchDavid 6 years ago

Awesome more dash

@ankushsharma-gu7co 6 years ago

Thanks bro

@thorodinson7467 6 years ago

notification squaaad

@xxXXCarbon6XXxx 6 years ago

Thanks for the comparison. I have used TextBlob on call centre data and thought it was OK, but wondered if there were alternatives. I had never heard of VADER, only NLTK. Given TextBlob's ability to do sentiment analysis, text classification and tokenisation, I think I'll stick to the Blob. BTW, it is interesting to use Matplotlib to scatter chart out sentiment vs polarity to see how your test data looks.