Introduction to Text Analytics with R Part 1 | Overview

  153,283 views

Data Science Dojo



Comments: 105
@aachaves 5 years ago
Dave, I've done several trainings during my career, both online and in-person, and I can assure you that your teaching style is the best I've ever known. Congratulations, you have the gift. Well done!
@ahmadalamer59 4 years ago
Thank you for your contribution to the world, David! You are amazing and your parents are proud of you.
@VijayKumar-pd8mu 7 years ago
Excellent explanation that gives a clear understanding of the topic. If anyone wants to learn text analytics, this is the best one.
@ambatista1982 7 years ago
Great video. I'm from Brazil, and my level of English is beginner, but his teaching is very good, and I understood the idea and examples well. Thank you. I subscribed to the channel and left my like! A "hello" from Brazil to you!
@AnalyticsMaster 7 years ago
Really appreciate your style of teaching Dave..... u are a super cooooooool teacher....
@Datasciencedojo 7 years ago
@Anand Subramanian - Thank you for the compliment, always great to get confirmation! Glad you liked the video!
@dreznik 6 years ago
length(which(!complete.cases(df))) can be written as sum(!complete.cases(df))
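A minimal sketch of why the two forms agree, using a small made-up data frame (`df` and its columns are illustrative, not the course data). `complete.cases()` returns a logical vector, and summing a logical vector counts its TRUE values:

```r
# Hypothetical data frame with two incomplete rows (not the spam data set).
df <- data.frame(
  Label = c("ham", "spam", NA, "ham"),
  Text  = c("Ok lar...", NA, "Free entry", "Nah"),
  stringsAsFactors = FALSE
)

incomplete <- !complete.cases(df)     # TRUE where a row contains any NA
n_which <- length(which(incomplete))  # count via the indices of TRUE values
n_sum   <- sum(incomplete)            # TRUE counts as 1, FALSE as 0
```

Both counts refer to the same rows; the `sum()` form simply skips building the index vector.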
@bexleymike 7 years ago
Great video, Dave! This is just what I've been looking for.
@Datasciencedojo 7 years ago
@Bexleymike - Glad you like the video!
@mAcCoLo666 7 years ago
Wonderful content. Just what I need to get up to speed for my university project about text mining :D
@sophiej4605 4 years ago
Thank you so much! If anyone hits errors while replicating the lecture, use the code and dataset the speaker uploaded. The link is above.
@vipulgupta7062 5 years ago
Great video...to start with..nice job
@gezahagnnegash9740 3 years ago
Thanks for sharing. It's really helpful for me.
@Datasciencedojo 3 years ago
Glad you liked it, Gezahagn.
@kristyburns2363 5 years ago
Love love love the way you teach! Thank you 🙏
@goodmanshawnhuang 4 years ago
Great work, thanks David for the wonderful explanation.
@OpalCrossCoaching 3 years ago
This is great content on text mining in R. I also have a channel that discusses text mining in R on data from the web, PDF documents and data frames.
@163ii 3 years ago
Well articulated and clear. Thanks so much for this video.
@kobeoncount 11 months ago
Dave, thank you for the brilliant series. Can you please tell me whether the code in this series would be applicable to a project that aims to make predictions on 3 categories (positive/neutral/negative)? Is there any important detail I should know if I want to go for 3 categories? :)
@TheShekhar91 7 years ago
Hi @Dave, while executing the code below I get an error. Can you please comment on this?
> library(ggplot2)
> ggplot(spam.raw, aes(x = TextLength, fill = Label)) + theme_bw()
> + geom_histogram(binwidth = 5) + labs(y = "Text Count", x = "Length of Text",
+ title = "Distribution of Text Length with Class Lebels")
Error in +geom_histogram(binwidth = 5) : invalid argument to unary operator
Thanks, Shekhar
@Datasciencedojo 7 years ago
@Shekhar Tanwar - This modified code works:
ggplot(spam.raw, aes(x = TextLength, fill = Label)) +
  theme_bw() +
  geom_histogram(binwidth = 5) +
  labs(y = "Text Count", x = "Length of Text", title = "Distribution of Text Length with Class Lebels")
HTH, Dave
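The likely cause of the error in this exchange is the position of the `+`: R treats a line that is already a complete expression as finished, so a following line that starts with `+` is read as a unary plus. A minimal base R sketch of that parsing behaviour, where `a()` and `b()` are placeholder calls standing in for the ggplot2 layers:

```r
# A line that is already a complete expression ends the statement, so a
# following line that starts with `+` is parsed as a separate unary plus.
leading  <- parse(text = "a()\n+ b()")    # `+` at the start of line 2
trailing <- parse(text = "a() +\n  b()")  # `+` at the end of line 1

n_leading  <- length(leading)   # number of separate expressions found
n_trailing <- length(trailing)

# Unary plus on a non-numeric value raises the same kind of error
# that was reported for +geom_histogram(...).
msg <- tryCatch(+"geom", error = function(e) conditionMessage(e))
```

Ending each line with `+` keeps the whole plot as a single expression, which is what the modified code does.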
@jonimatix 7 years ago
Great video, thanks for this. Keep them coming!
@Datasciencedojo 7 years ago
@jonimatix - Glad you liked the video!
@Tracks777 7 years ago
Nice content! Keep it up!
@Datasciencedojo 7 years ago
@MisterBassBoost - Glad you like the video. We will be producing new videos each week. Stay tuned!
@rajkumar-hh2hg 5 years ago
Excellent video, thanks for this. Can you make some video with a multi-label classification problem?
@AdityaRaj-cu7jm 4 years ago
Hey @Dave, I'm getting this error: package or namespace load failed for ‘quanteda’ in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]): namespace ‘rlang’ 0.4.2 is already loaded, but >= 0.4.3 is required. Please help me out here.
@mohammedasadi 5 years ago
Very helpful, thank you so much!
@nnennaumelloh8834 3 years ago
This is great! Thank you!
@haraldurkarlsson1147 2 years ago
David, The code for creating a new variable called spam.raw$Text.length
@مشاعل-ت7ظ 4 years ago
Hi. What should I learn first: natural language processing or text analysis?
@karthik777777 7 years ago
Wonderful video, sir. Do you have any lecture or material on unsupervised text analytics as well? By unsupervised I mean that I have lots of server logs and want to make some sense out of them.
@alexandregouveac 6 years ago
Thanks a lot for sharing your knowledge!
@davidcurrie2528 6 years ago
Kaggle has a download link for the spam.csv file - GitHub doesn't seem to have a download option. Part 1 is great.
@TheShekhar91 7 years ago
Hey @Dave, It started working, thanks anyway :)
@mhjrt 6 years ago
Great video, thanks!
@midwest042003 5 years ago
Please, I am getting this error "Error in socketConnection(port = port, server = TRUE, blocking = TRUE, : cannot open the connection" when I run "cl
@kebman 3 years ago
I've always been curious about the usage of Neo4j and graph databases in conjunction with text analytics. Of course, I'm a complete noob in this field, but it nevertheless fascinates me. So how would you do that?
@mattRRgraham1996 4 years ago
HOW TO GET AROUND ERROR PRESENTED AT ~ @24:00 spam.raw$TextLength
@shagunsarraf2312 10 months ago
Thank you so much.
@pradeepvelavali9321 6 years ago
Nice explanation. But one small question: how does it differentiate spam and ham data, given that everything we took is raw data and all of it is just messages? Thanks in advance.
@evry1loveronica 6 years ago
What's the difference between text count and length of text? Thank you so much for the awesome tutorial.
@marcoanelli2045 6 years ago
Very, very good. If they can teach a 58-year-old physician, they can teach anybody... :-)
@kebman 3 years ago
Lol I have never used R. Let's hope for the best, guys!
@sonjawap5044 7 years ago
R won't separate rows that contain quotation marks when I use read.csv. How do I solve this?
@rohitnagal3704 6 years ago
How do we deal with the corpus here? And is the IDF negative in this case as well?
@shwetapatil5682 6 years ago
It's really amazing!! thank u so much.
@مشاعل-ت7ظ 4 years ago
What if I want to exclude stop words from the stop_words() list? How can I do it? I tried to make custom stopwords but it didn't work.
@gilltim5711 2 years ago
I'm running this in RStudio Cloud version, and when I run this line of code: spam.raw$TextLength
@93jackjoe 6 years ago
Thank u so much! It's very much helpful!
@shunpeng3995 5 years ago
this is a great video for me!
@vanshjauhari3671 4 years ago
Hi Dave, my data is showing 2 missing values??
@cauliflower78 7 years ago
I am getting an error while reading the file: Error in make.names(col.names, unique = TRUE) : invalid multibyte string at 'v'
@Datasciencedojo 7 years ago
@db-engineering - A couple of troubleshooting questions: 1 - Are you using the .CSV and .R files from the GitHub? 2 - What OS and version of R are you using?
@cauliflower78 7 years ago
Actually I was following the video and not using the code on GitHub. After looking at the code on GitHub and adding the file encoding, I am not getting any error. Thank you very much for the awesome tutorial.
@Datasciencedojo 7 years ago
@db-engineering - Glad you liked the tutorial and that you are now unblocked. Dave
@cauliflower78 7 years ago
On my laptop I have R version 3.3.2 and everything is working fine (but it's slow). I have R version 3.3.1 on a powerful workstation, where I do not have permission to upgrade R. I am trying to run it with 3.3.1 but I am having a hard time installing quanteda. I am getting the following error:
Error: package ‘RcppArmadillo’ 0.6.100.0.0 was found, but >= 0.7.600.1.0 is required by ‘quanteda’
* removing ‘/home/ruser_usapkota/R/x86_64-redhat-linux-gnu-library/3.3/quanteda’
Is there some workaround to successfully install ‘quanteda’ in R version 3.3.1?
@Datasciencedojo 7 years ago
@db-engineering - My apologies on this, but I would strongly advise looking into updating R to the latest bits rather than try to deal with the network of varying package dependencies. In fact, I would suggest updating your laptop to the latest R version (i.e., v3.4.1 at the time of this writing). HTH, Dave
@jamesxiang1031 7 years ago
Hi Dave, I absolutely like your video, it helps me a lot with data analysis! By the way, what does spam.raw$Text do? I didn't see you create it earlier, so I guess it is a command/function that R understands? Thanks, James
@Datasciencedojo 7 years ago
@James Xiang - If I understand your question correctly you can interpret the R code along the lines of, "R, I would like you to access the Text variable of the spam.raw data frame." HTH, Dave
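To make the `$` operator concrete, here is a small sketch on a made-up two-row data frame. The name `spam.raw` and the columns `Label`, `Text` and `TextLength` mirror the ones mentioned in this thread, but the rows are invented for illustration and the `nchar()` step is an assumption about how the length column is built:

```r
# Toy stand-in for the spam data frame (rows invented for illustration).
spam.raw <- data.frame(
  Label = c("ham", "spam"),
  Text  = c("Ok lar...", "Free entry"),
  stringsAsFactors = FALSE
)

texts <- spam.raw$Text                       # read the Text column as a vector
spam.raw$TextLength <- nchar(spam.raw$Text)  # `$` on the left creates a column
```

So `spam.raw$Text` is not a function call; it names a column that already exists once the CSV has been read into `spam.raw`.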
@atelesteles4022 6 years ago
Hi, if I want to search for one word in one location on Twitter, how do I do it? I used this code: cand
@ShubhamRai06 6 years ago
Like your style: don't hesitate... Good one
@kunpu585 7 years ago
Hi, when I try the code in line 64 on my computer, it says "Error in nchar(spam.raw$Text) : invalid multibyte string, element 634". Can you please help me deal with this problem?
@Datasciencedojo 7 years ago
@Kun Pu - This is likely due to the encoding of the SMS text message data file. I would suggest getting the code and data file from the GitHub to see if that unblocks you. HTH, Dave
@roadofskyluis 7 years ago
Hi Dave, I got this when I ran line 34:
Warning messages:
1: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : invalid input found on input connection 'spam.csv'
2: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : EOF within quoted string
@Datasciencedojo 7 years ago
@林以凡 - Have you downloaded the .R and .CSV files from the GitHub to see if this addresses your problem? Most folks find that getting the GitHub files allows them to troubleshoot issues they encounter. Let us know if you continue to run into issues with the GitHub files. HTH, Dave
@roadofskyluis 7 years ago
I still have the issue. I downloaded your code and the spam file from GitHub, but this time the issue happens even earlier. I run the read.csv function and see these warnings, and the spam object only includes 20 obs. of 5 variables.
1: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : invalid input found on input connection 'spam.csv'
2: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : EOF within quoted string
Here are the first 6 rows of the spam object:
  v1 v2 X X.1 X.2
1 ham Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat... NA NA NA
2 ham Ok lar... Joking wif u oni... NA NA NA
3 spam Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive entry question(std txt rate)T&C's apply 08452810075over18's NA NA NA
4 ham U dun say so early hor... U c already then say... NA NA NA
5 ham Nah I don't think he goes to usf, he lives around here though NA NA NA
6 spam FreeMsg Hey there darling it's been 3 week's now and no word back! I'd like some fun you up for it still? Tb ok! XxX std chgs to send, a£1.50 to rcv NA NA NA
@roadofskyluis 7 years ago
spam.raw
@othman82637 7 years ago
Hello Dave, salute you for your great initiative, very effective and organised, would it be possible to guide us on the following command : > spam.raw$TextLength
@othman82637 7 years ago
I managed it: iconv(spam.raw$Text, "ISO-8859-1", "UTF-8") and it works
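A rough sketch of what that `iconv()` call does, using a hand-built three-byte string rather than the real SMS file (the bytes and variable names are made up for illustration). The pound sign is a single byte, 0xA3, in ISO-8859-1, which is not valid UTF-8 on its own and is the kind of input that can trigger "invalid multibyte string" errors:

```r
# Hypothetical message stored as Latin-1 bytes: "a", pound sign (0xA3), "1".
raw_text <- rawToChar(as.raw(c(0x61, 0xA3, 0x31)))

valid_before <- validUTF8(raw_text)  # a lone 0xA3 byte is not valid UTF-8

# Re-encode from ISO-8859-1 to UTF-8, as in the comment above.
fixed <- iconv(raw_text, from = "ISO-8859-1", to = "UTF-8")

valid_after <- validUTF8(fixed)
n_chars <- nchar(fixed)              # counts characters, not bytes
```

Other commenters in this thread report that passing fileEncoding = "latin1" to read.csv also resolves it at read time; which one is needed seems to depend on the platform.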
@钱其玮 6 years ago
Just want to mention that if you change fileEncoding = "UTF-16" to "mac", then this command works: spam.raw$TextLength
@MrJfernandes7 7 years ago
Dave, I saw your post on linkedin and came to check it out. I cloned the Github repository and when I ran this code I got the following error. >>> spam.raw$TextLength
@Datasciencedojo 7 years ago
@Jorge Fernandes - Interesting, I was able to replicate on my Mac. I have updated all the files in the GitHub to reflect the fix. Apologies for the bug! Dave
@yoka8118 7 years ago
UTF-8 encoding still wouldn't work. Latin1 would. "spam.raw = read.csv("spam.csv", stringsAsFactors = F, fileEncoding = "latin1")"
@nobody-in-spe 7 years ago
thanks Yo Ka
@avijitnandy6662 6 years ago
Hello Sir, I have a request. Can I use your code and the learning material, demonstrate the whole thing in the Bengali language, and upload it? Am I allowed to do that? There are a lot of people who use this language and it might help them to understand. As you know, helping someone to understand in their mother tongue is the best way to teach. Thank you
@bedantamadhabgogoi 5 years ago
Thank you !!
@daudkhan8642 6 years ago
1. install.packages(c("gglot2","e1071","caret","quantenda","irlba","randomForest")) 2. spam.raw
@mortonwakeland3809 7 years ago
OK, so what is R, say vs Voyant? or is R something else. Coming into this cold, not apparent. Thanks.
@desertrose00 6 years ago
everything stops working as soon as I run: "spam.raw
@UpDownMichelle 6 years ago
This might be a silly question, but if you've installed ggplot or dplyr or any other package in a previous analysis (on the same machine), do you have to reinstall it EVERY time you want to use it? Or can you just install a bunch of packages once and then never have to do it again? Thanks for your videos, btw. I landed a pretty significant interview by watching these!
@MrShivam24 6 years ago
install everytime after you open R studio
@JuhiPandeyTiwari 6 years ago
No need to install every time. You just need to load the packages using the library("package name") command.
@arunabhlala 5 years ago
@MrShivam24 Not required
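A short sketch of the install-once, load-each-session pattern described in the replies above. The helper name `needs_install` is hypothetical, and the packages checked here are ones that ship with R so the snippet runs without downloading anything:

```r
# Return the packages from `pkgs` that are not yet installed.
# `needs_install` is a hypothetical helper name used only for this sketch.
needs_install <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- needs_install(c("stats", "utils"))  # both ship with R

# One-time step per machine (left commented out so nothing is downloaded):
# if (length(to_install) > 0) install.packages(to_install)

library(stats)  # loading is the part that is repeated in each new session
```

For the course itself, the same check could be run on the package names used in the videos, such as "ggplot2" or "quanteda".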
@yaoxie7139 7 years ago
Hi Dave, Thanks for the video. When I run the code spam.raw$TextLength
@yaoxie7139 7 years ago
@David Langer Yes, Dave. @Anubhav Dhiman Thank you!
@Datasciencedojo 7 years ago
@Yao Xie - Did the solution from @Anubhav Dhiman address the problem?
@Datasciencedojo 7 years ago
@Anubhav Dhiman - Thank you for the fix. I have updated the code in the GitHub to reflect this and tested on both Windows and Mac OS X under R v3.4. Dave
@Dhrittinagpal 7 years ago
Thanks David, this really saved my day. But can you explain what fileEncoding="latin1" means?
@seamansun8435 4 years ago
When I run the code "spam.raw$TextLength
@tmpcox 4 years ago
Same problem... Did you find the answer?
@seamansun8435 4 years ago
@tmpcox I did not find the answer
@tmpcox 4 years ago
@seamansun8435 Thx! Hmm, I have just followed the rest of the intro course without that part, and so far no problems at all 👍
@arunabhlala 5 years ago
Does this series of videos give me the path to learn about extracting complex data in PDF file and then analysing them? Sir, please do reply
@rohitnagal3704 6 years ago
If we have to deal with 1 lakh (100,000) articles, is TF-IDF still relevant? Basically, I am working on a question answering algorithm.
@slkslk7841 4 years ago
To the MALAYALI data scientist who noticed something at 23:42
@smritikalra3948 7 years ago
Hi I am facing an issue. When I pass this command, spam.raw$TestLength
@kylenash4112 6 years ago
the GitHub is not working. thanks Dave
@Datasciencedojo 6 years ago
Hi Kyle, thanks for pointing that out. Both links work now!