This is truly so helpful as I am working on my Master's Thesis. Please let us know if there is a place we can donate! You've put so much work into these videos!
@marinstatlectures 3 years ago
thanks for thinking of us, we appreciate it :) you can donate here (using PayPal): statslectures.com/support-us or, here (through GoFundMe): www.gofundme.com/f/free-statistics-for-all?rcid=r01-156512981924-6ae2c490de764123&pc=ot_co_campmgmt_w
@ariel479 3 years ago
ur vids > my university course classes, tutorials, & textbook
@princezidan3228 2 years ago
the attach function is a life saver, thank you
@easydatascience2508 1 year ago
You can watch mine. Playlists for Python and R give most of the fundamental tutorials, with detailed source files downloadable.
@marinstatlectures 6 years ago
When working with a large data set, you might only be interested in a small portion of it. How do you sort through all the variables and observations and extract the data that you need? R has several ways of sorting and selecting data in a process that is called “subsetting.” Brackets in R let you select, or subset, data from a vector, matrix, array, list or data frame. Watch this video to learn how to use square brackets in R. Use the dataset to practice along (bit.ly/2rOfgEJ)! Like to support us? You can Donate statslectures.com/support-us or Share the Videos.
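A minimal sketch of the bracket syntax described above. The small data frame `lung` is a made-up stand-in for the video's data set (its values and the object names `ages` and `older` are assumptions for illustration only):

```r
# Toy stand-in for the video's data set; all values are invented.
lung <- data.frame(
  LungCap = c(6.5, 10.1, 9.6, 11.1, 4.8, 8.3),
  Age     = c(6, 18, 16, 14, 5, 17),
  Gender  = c("male", "female", "female", "male", "male", "female")
)

ages <- lung$Age                 # a plain vector
ages[2]                          # second element
ages[c(1, 3)]                    # first and third elements
ages[ages > 15]                  # elements that satisfy a condition

lung[2, ]                        # row 2, all columns
lung[1:3, c("Age", "Gender")]    # rows 1 to 3, two named columns
older <- lung[lung$Age > 15, ]   # rows where Age exceeds 15, all columns
```

For a data frame the part before the comma selects rows and the part after it selects columns; leaving either side blank means "all".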
@parthibanvk6637 9 years ago
All your videos are simple and awesome. These are simply awesome.. Thanks a lot Marin.
@marinstatlectures 9 years ago
you're welcome parthiban vk , we appreciate you saying that!
@inshort6831 6 years ago
You sir, are saving my degree right now
@DarkCrow02 9 years ago
This series of videos is very helpful. I'm taking a course at college for statistics, and this is the program we are using. I'm learning so much that in other courses that require statistics I prefer to use this program instead of any other program they recommend because of how easy I find it.
@marinstatlectures 9 years ago
that's great to hear DarkCrow02 ! yeah, R can have a bit of a learning curve compared to others, but it's a much better software once you get comfortable with it.
@rameshjoshi4092 4 years ago
Sir Marin, What a teacher you are! Thanks a lot!
@MultiBilly85 8 years ago
Searched online for a long time going through countless R tutorials. So far, yours is the best! Clear audio and video and easy to understand. Thank you MarinStats! Subscribed!
@marinstatlectures 8 years ago
good to hear +Bill Waqa , we're glad you're enjoying our videos!
@anmoljawa8367 7 years ago
appreciate the hard work that you have put in to these videos , very straight forward and easy to grasp
@hasanulgonisohayeb2699 7 years ago
I'm from Bangladesh. I'm really very much grateful to you sir...... May god bless you....
@gagegolish9306 5 years ago
Thank you! This series is by far the best R tutorial I've come across on YouTube.
@marinstatlectures 5 years ago
i agree ;)
@marinstatlectures 9 years ago
Hi +Andy Alikberov , you have your settings so that i am not able to reply to your comment, so i hope you see my answer here. I'm not sure what it is you are trying to accomplish with the command *FemSmoke 15*, but what this IS doing is asking R if each row/observation of the data is a female who is also over the age of 15. for a female older than 15, the variable *FemSmoke* will take on the value "TRUE" and for any observation that is not a female who is older than 15, the *FemSmoke* variable you are creating will take on a value of "FALSE". so this ends up being just a vector/variable with one column, and so that's why you get an error when you ask for certain rows and columns of FemSmoke...it's because it does not have multiple columns... for reference, i have copy/pasted your question below (as your account settings don't allow me to reply directly to your comment): COMMENT FROM ANDY ALIKBEROV: Hey Mike, Could you say why when I type FemSmoke 15 and then > FemSmoke [2:23, ] it says: Error in FemSmoke[2:23, ] : incorrect number of dimensions (used your data source) Thank you very much! Your videos r amazing!
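The "incorrect number of dimensions" error can be reproduced with a short sketch. The original command is garbled in the thread, so the comparison below is only a guess at its shape; the data frame `lung` and the names `FemOver15` and `FemData` are hypothetical:

```r
# Toy data; values are invented for illustration.
lung <- data.frame(
  Age    = c(12, 17, 19, 16, 14),
  Gender = c("female", "male", "female", "female", "male")
)

# A comparison returns a logical vector (one TRUE/FALSE per row), not a data frame.
FemOver15 <- lung$Gender == "female" & lung$Age > 15
FemOver15
dim(FemOver15)      # NULL: a vector has no rows and columns

FemOver15[2:4]      # a vector takes a single index, so this works
# FemOver15[2:4, ]  # a row/column index here raises "incorrect number of dimensions"

# To get the matching rows of the data, use the logical vector as the row index.
FemData <- lung[FemOver15, ]
```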
@kapishupadhyay763 8 years ago
Thank you so much for explaining R in the easiest way. I knew nothing about R, but after watching your videos and practicing I am confident I can learn R. A big thank you for sharing your knowledge.
@marinstatlectures 8 years ago
you're welcome kapish upadhyay , happy to hear that you've found our videos helpful!
@elizabethblears2919 7 years ago
This is so wonderful! You are a life-saver. As soon as I get my first pay-check as a graduate student, I am donating! Thank you again so much!
@marinstatlectures 7 years ago
Thanks +Elizabeth Blears , we appreciate that :)
@mphat10 5 years ago
this is the only useful channel on YouTube
@marinstatlectures 5 years ago
thanks minh phat nguyen! glad you enjoyed the videos!
@hassanlangat 3 years ago
Following from Kenya, Africa. I am learning a lot.
@hasanulgonisohayeb2699 7 years ago
After a long search, I have got best video series to learn R....
@spvpapsp123 9 years ago
So organised !! Great job !! Excellent tutor !! Subscribed :)
@Cchamac0Oo 5 years ago
I really like this tutorial; it helps me understand how to work with data, thank you
@marinstatlectures 5 years ago
you're welcome, great to hear!
@turkialjrees9692 9 years ago
Thank you my friend .... amazing, and very useful. I have been following your class for 6 months and because of your videos I am very good at R ...... I am an R geek
@marinstatlectures 9 years ago
thanks +Turki Aljrees , that's great to hear!
@Kafelnokov 4 years ago
Thank you, thank you for clarifying subsetting in R
@vishakvijayan942 3 years ago
Excellent presentation sir
@dpippatan4291 9 years ago
i love your videos, learnt so much!! thank you so much!
@dhwalker86 8 years ago
I really appreciate your work. thank you.
@jangukochai 6 years ago
Dude you are a lifesaver. Thanks so much.
@smrazaabidi1495 6 years ago
Marin, you are awesome, buddy. Such a nice and neat explanation in just under 10 minutes. Amazing ...
@lyssamkoa 10 years ago
Thanks Mike! Your videos are incredibly helpful!
@marinstatlectures 10 years ago
You're welcome Parapanda !
@jiangtao133 6 years ago
the subset function is very useful.
@jhonalejandroacostadavila8480 7 years ago
Excellent tutor. Thank you!!
@John-ej7wn 10 years ago
Hello Marin, I am loving the tutorials. They are helping me to understand more clearly than reading through documentation and saving tons of time. I had one question. 1) At home, on a very similar data.frame I have to use MaleOver15 15,] . Why do I have to add the extra LungCapData$ when it worked without it in your example? Error in `[.data.frame`(LungCapData, Gender == 'male', ) : object 'Gender' not found
@marinstatlectures 10 years ago
Hi John Danson , thanks! it's always tough to troubleshoot a coding error from a distance, but it is most likely because you have not attached the data, using: *attach(LungCapData)*. have you attached it? it is not recognizing the variable Gender, and this is likely why. when you type *LungCapData$Gender*, this tells R that it can find the variable Gender inside of the LungCapData. it's a bit weird, as if you have not attached the data, it should also have a problem with Age, and require you to type *LungCapData$Age*. anyway, hope that helps...
@ujwaldey4270 7 years ago
Great video marin.
@charlesa8707 6 years ago
I did a summary for male only,and 85 caesarean. I check the data line 11 and 7 check yes for caesarean. Well, The lessons are very good.
@sinha100000 9 years ago
awesome video MIKE!!
@marinstatlectures 9 years ago
Thanks Aman Sinha !
@Sannito1 8 years ago
martinStats Thnx for these videos, I just love them
@mattpetitt5426 8 years ago
Thanks Mike for putting these together. They are a big help to anyone who is interested in learning R in a hands on manner. I have a question on how to create subsets (i.e. FemData and MaleData) within the dataset without using the attach function. For example using the command: mean(lungcapdata$Age[lungcapdata$Gender=="male"]) will return the average age of all males within the data set, but I'm not sure what the corresponding command to create the MaleData subset would be without using the attach command. Any input would be appreciated. Thanks!
@marinstatlectures 8 years ago
Hi +Matt Petitt , thanks. the command as you entered it would be for if you did not attach the data. if you attach the data, then you don't need to use the $. (i.e.) if you attach the data, then you could just use *mean(Age[Gender=="male"])*. and if you wanted to create a separate set of data for only the males (if you've attached the data) then you can just enter *MaleData
@mattpetitt5426 8 years ago
+MarinStatsLectures , thanks for the prompt response. To clarify my question do you know of any way to create these subsets (i.e. FemData and MaleData) from within the original lungcapdata set WITHOUT using the "attach" command?
@marinstatlectures 8 years ago
Hi Matt Petitt , sure it's pretty much the same, except since you won't have attached the data, you will need to tell R where the variables are, by using *LungCapData$variable.name* . to create a subset of just the male data, you can use *MaleData
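The reply above is cut off in the thread, so here is a hedged sketch of what subsetting without `attach` can look like. The toy `LungCapData` below is invented, and `MaleData2` is a hypothetical name used only to show an equivalent form:

```r
# Toy stand-in for LungCapData; values are invented.
LungCapData <- data.frame(
  Age    = c(10, 16, 18, 12),
  Gender = c("male", "female", "male", "female")
)

# Without attach(), name the data frame explicitly on both sides of the bracket.
MaleData <- LungCapData[LungCapData$Gender == "male", ]
FemData  <- LungCapData[LungCapData$Gender == "female", ]

# The same idea for a summary of one variable within a group.
male_mean_age <- mean(LungCapData$Age[LungCapData$Gender == "male"])

# with() looks up column names inside the data frame for one expression only,
# which avoids both attach() and the repeated "LungCapData$".
MaleData2 <- with(LungCapData, LungCapData[Gender == "male", ])
```

Spelling out `LungCapData$` is more typing than `attach`, but it avoids the masking warnings that come up elsewhere in this thread.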
@rahulchaudhary-zs7co 7 years ago
Hi, thanks for the easy to understand video. I have one question though, is it possible to create QCC with subset data. If so how to go about it? Thanks
@drm1404 5 years ago
Hi I have a data frame with > 20 variables. there is one variable with 2 levels [yes and no], but "no" has much more cases than "yes": yes: 300 observations no: 12,000 observations for this variable, I want to extract: 300 yes 300 no and save it a new data set which include 600 observations [300 yes, 300 no] + all other variables for those 600 how to do that? Thank you
@Randyminder 8 years ago
Hello. Great videos. Learning a lot. I'm curious why you didn't discuss the subset command in this video? Or do you cover this command in some other video?
@marinstatlectures 8 years ago
Hi +Randy Minder , no reason in particular...there's lots of ways to get the same thing done in R, and i chose to present this one. i think the [ ] are a bit more flexible in what they can do, but you can easily accomplish the same thing using the *subset* command.
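For comparison, a small sketch of the same row selection written with square brackets and with base R's `subset()`. The toy data and the object names `by_brackets` and `by_subset` are assumptions for illustration:

```r
# Toy stand-in for LungCapData; values are invented.
LungCapData <- data.frame(
  Age    = c(10, 16, 18, 12, 17),
  Gender = c("male", "female", "male", "female", "male")
)

# Square brackets: the condition must name the data frame (or the data must be attached).
by_brackets <- LungCapData[LungCapData$Gender == "male" & LungCapData$Age > 15, ]

# subset(): column names are looked up inside the data frame automatically.
by_subset <- subset(LungCapData, Gender == "male" & Age > 15)

all.equal(by_brackets, by_subset)
```

One difference worth knowing: `subset()` drops rows where the condition is `NA`, whereas square brackets return an `NA`-filled row for them.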
@Logictemplates 8 years ago
Best ripper upload.
@arvindkumar-wv2wf 3 years ago
Hi, your video is nice. In the csv file you have given, Smoke and the other such variables are read in as character. When I practice I get an error, and even after I convert them to factors they still give the error. How can I solve this problem?
@saptarshilahiri9573 4 years ago
Btw if any of you wish to select more than one type of variable -- say, for example you had Yes, No & Unknown in Cesarean variable column -- how would you select only for values Yes & No? LungCapData[Caesarean=="yes" | Caesarean =="no", ] would be a way to do it. The | operator means "or".
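A short sketch of the "or" selection described above, plus the `%in%` operator, which does the same job when there are several values to keep. The data frame `dat` and its values are made up:

```r
# Toy data with a three-valued Caesarean column; values are invented.
dat <- data.frame(
  id        = 1:5,
  Caesarean = c("yes", "no", "unknown", "no", "yes")
)

# "|" means "or": keep rows where either comparison is TRUE.
yes_or_no <- dat[dat$Caesarean == "yes" | dat$Caesarean == "no", ]

# %in% tests membership in a set of values, which scales better to long lists.
yes_or_no2 <- dat[dat$Caesarean %in% c("yes", "no"), ]
```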
@aliceusa383 7 years ago
Awesome! What if I have an "NA" value in age or a missing value? When I input mean(age), R gave me a results of "NA". How to assign all the NA value into 0??? And how to let R skip the value of NA??
@bhavesh655 3 years ago
superb!
@romanderafael3609 8 years ago
THANK YOU SO MUCH
@marinstatlectures 7 years ago
you're welcome +Roman de Rafael !
@martinchacon2597 7 years ago
Hello. Why don't you need to specify the source of the variable? For example, you use brackets directly on the variable Gender[XX], instead of having to write LungCapData$Gender[XX]. If I try to execute functions directly on the variable, it does not let me.
@drvalenpori 5 years ago
Thank you!
@marinstatlectures 5 years ago
you're welcome
@karamccormack4720 8 years ago
Hi! I ran into a little issue.. before I saw that you gave the link to download LungCapData, I made my own LungCapData in excel, with random data and with the same column names. Then I downloaded the actual LungCapData and I'm running into an error message that some objects are masked. Can you tell me how to start from scratch and remove my old data set names? Thanks!
@marinstatlectures 8 years ago
Hi +kara mccormack , sure, it is because you've *attached* both sets of data, and they have the same variable names. what you should do is *detach* the data, e.g. if you've called it "LungCapData" like i've done, then *detach(LungCapData)*. make sure to detach both sets of data. i'd suggest to enter the command a few times, as you may have attached it more than once. if it gives you an error message when detaching, then it means that the data is no longer 'attached' (that is a good thing). once you do this, then clear the workspace. you can do this manually in RStudio by clicking on the "broom" icon. then you can re-import the data, and you should be good from there.
@kousik535 8 years ago
You have created a a subset of all the Male data.Then you would like to find all the data which is having Gender to be male and age over 15.For this u write MalOver15=LungCapData[Gender=="male" & Age>15,].I tried doing MalOver15=Male[Age>15,] Logically both must work and must produce the same result but it does not. Can you explain why?
@marinstatlectures 8 years ago
Hi +Kousik Krishnan , the two will not produce the same result, and are not equivalent statements. i've answered this question a few times in the comments below, so i will copy/paste the answer i've given before. it's also worth mentioning that you've written "Male" when you should use "MaleData", as that is the name of the object i've created in the video that contains only the male data. ok, here's an explanation of why that won't work... **** here's what's happening there. the command you mention first, *MaleOver15 = LungCapData[Gender == "male" & Age > 15, ]*, takes the data and pulls out the rows where age>15 and gender is male, all columns. you are trying to reproduce these same results in a different way. first you have used *MaleData = LungCapData[Gender=="male", ]* to create a subset of the data that is only the males, and all columns. the next command you type is the following: *MaleOver15 = MaleData[Age>15, ]*, which you are hoping will extract only those over 15 from the MaleData...*BUT IT'S NOT DOING THAT*...it is probably giving you 177 rows, not the 89 rows that you are expecting. THE REASON: is the following....the Age>15 you have inside the square brackets is actually referring to the Age variable itself (the one in the data, for all genders), and not the Age of only the males. so, this statement is identifying rows of the full data set where ages are greater than 15, and then trying to extract those row positions from the MaleData. in the full data set, there are 177 individuals with ages over 15, and you are identifying the row positions of those individuals. it then extracts those positions from the object "MaleData". MaleData has only 367 rows, so a position like 368 does not exist in it. R does not recycle MaleData in that case; for every requested position past row 367 it returns a row filled with NA. and the positions that do fall within rows 1 to 367 return whichever male happens to sit in that row of MaleData, whatever his age is.
THE SOLUTION: in the square brackets, you must identify that it is the Age that is stored in MaleData that you are referring to (as just Age itself will default to the entire variable Age). you can use the following, and it will produce the desired results *MaleOver15 = MaleData[MaleData$Age > 15, ]* This command will ask it to extract the rows of MaleData, only for rows where the Age that is stored in MaleData (MaleData$Age) is greater than 15.
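The pitfall and the fix can be reproduced on a tiny made-up data set. In this sketch `full_age` plays the role of the attached `Age` variable (it refers to the full data), and `wrong` and `right` are hypothetical names. In base R, positions that do not exist in `MaleData` come back as rows filled with `NA`:

```r
# Toy stand-in for LungCapData; values are invented.
LungCapData <- data.frame(
  Age    = c(17, 10, 16, 12, 18, 19),
  Gender = c("female", "male", "male", "female", "male", "female")
)
MaleData <- LungCapData[LungCapData$Gender == "male", ]   # 3 rows: ages 10, 16, 18

# What the attached "Age" refers to: the ages of ALL six rows.
full_age <- LungCapData$Age

# Wrong: a 6-element condition applied to a 3-row data frame.
wrong <- MaleData[full_age > 15, ]
wrong   # includes the 10-year-old, plus NA-filled rows for positions 5 and 6

# Right: the condition is built from the ages stored in MaleData itself.
right <- MaleData[MaleData$Age > 15, ]
right   # only the males aged 16 and 18
```

The condition inside the brackets should always have one value per row of the object being subset, which is what `MaleData$Age` guarantees.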
@olsjonbaxhija6720 9 years ago
Hey, Mike. Quick question, if you are able to help. If I have a table with a column that has both positive and negative integers, how can I remove the rows with negative integers. The data looks a bit like this: NAME LUCKY NUMBER bob -9 mike 3 jim -18 steve 3.4 I would like it to only yield the following: NAME LUCKY NUMBER bob -9 jim -18 Thank you again for the videos.
@olsjonbaxhija6720 9 years ago
Nvm. I figured it out. Thank you though!
@marinstatlectures 9 years ago
Olsjon Baxhija good to hear you solved it. and since I'm here replying anyway, i might as well mention how you can get that done, even though you've figured it out. just type in *newdata 0, ]* . this will store in a new object called "newdata", only the rows of "data" where LUCKY is greater than 0. the blank after the comma tells R to include all of the columns
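The command in the reply above is garbled in the thread, so the following is only a sketch of the idea it describes, using the four rows from the question. The object names `dat`, `newdata` and `negdata` and the column name `LUCKY` are assumptions:

```r
# The example table from the question, typed in by hand.
dat <- data.frame(
  NAME  = c("bob", "mike", "jim", "steve"),
  LUCKY = c(-9, 3, -18, 3.4)
)

# Keep only rows where LUCKY is greater than 0; the blank after the comma keeps all columns.
newdata <- dat[dat$LUCKY > 0, ]

# The question's expected output actually lists the negative rows; flipping the
# comparison gives that result instead.
negdata <- dat[dat$LUCKY < 0, ]
```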
@amitverma-ef1mb 4 years ago
When I am using attach command it's showing: objects are masked from lungcapdata and then the name of variables... How can I solve it. And whenever I need to find the mean of Age it always asks me choose the file... Why it is happening... Please help...
@smooo32 9 years ago
Great explanation! How can I calculate missing values in one column? ?I want to practice more on R in website or app, do you have any recommendation
@marinstatlectures 9 years ago
Hi +Norah Moh , thanks! I'm not sure exactly what you mean by "calculate missing values in one column". consider a variable "X". if you use the command *summary(X)*, R will return a summary, including the number of NAs for the variable. if you use *mean(X, na.rm=T)*, R will remove the NAs before calculating the mean. if you type *help(na.action)* in R, you will get a help menu for different actions you can take for NAs. you can also use *is.na(X)* and R will return a vector/column of TRUE/FALSE answering whether or not each value is NA. and if you use *table(is.na(X))*, you will be returned a table, listing the number of non-NAs (FALSE) and the number of NAs (TRUE). hopefully, one of those has helped solve your issue.
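The NA-related commands mentioned above, gathered into one runnable sketch on a made-up vector `X` (the names `n_missing` and `X_complete` are hypothetical):

```r
# A made-up variable with two missing values.
X <- c(4.2, NA, 7.1, 5.5, NA)

summary(X)                 # the summary includes a count of NAs
mean(X)                    # NA: one missing value makes the mean missing
mean(X, na.rm = TRUE)      # drops the NAs first, then averages the rest

is.na(X)                   # TRUE/FALSE for each value
table(is.na(X))            # counts of non-missing (FALSE) and missing (TRUE)
n_missing <- sum(is.na(X)) # the number of missing values as a single number

X_complete <- X[!is.na(X)] # keep only the non-missing values
```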
@smooo32 9 years ago
Thank you so much. I got the answer that I wanted by 2 methods: summary(x) and table(is.na(x))
@marinstatlectures 9 years ago
good to hear, glad i could help Norah Moh
@kennethlimosnero2131 8 years ago
It would be nice if you gave us the link where we can download the raw data for interactive learning. Thanks!
@marinstatlectures 8 years ago
Hi +Kenneth Limosnero , just click on "SHOW MORE" below the video, and you will see a link there to download the data. there, you will also see a list of the topics covered in the video, and the time at which they are covered.
@mmebella4297 7 years ago
Hello Mike, I am facing a problem with class and levels: class(Smoke) returns "character" and levels(Smoke) returns "NULL". Can you please help me see where I am going wrong?
@TheEverydayAnalyst 4 years ago
When initializing a dataframe, pass stringsAsFactors = T in the initialization eg. dataFrame
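The example in the comment above is cut off, so here is a hedged sketch of the character-versus-factor issue several commenters report. Since R 4.0.0 the default is `stringsAsFactors = FALSE`, so text columns arrive as character unless told otherwise. The object names `df_chr` and `df_fac` are made up:

```r
# Text stored as character: there are no levels, so levels() returns NULL.
df_chr <- data.frame(Smoke = c("no", "yes", "no"), stringsAsFactors = FALSE)
class(df_chr$Smoke)
levels(df_chr$Smoke)

# Option 1: ask for factors when the data frame is created
# (read.table() and read.csv() accept the same argument).
df_fac <- data.frame(Smoke = c("no", "yes", "no"), stringsAsFactors = TRUE)
levels(df_fac$Smoke)

# Option 2: convert an existing character column afterwards.
df_chr$Smoke <- factor(df_chr$Smoke)
summary(df_chr$Smoke)   # now reports a count per category
```

If the data were attached before the conversion, the attached copy is not updated; detaching and re-attaching (or using `df$column` directly) picks up the factor version.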
@ncg89 10 years ago
I typed this into R studio: quarter1
@marinstatlectures 10 years ago
Hi ncg89 , I'm sorry but it's difficult to trouble shoot your problem from a distance.
@andrebieler7906 8 years ago
Nice videos. The indexing by variable name seems kind of inverted to me though. Why is it LungCapData[gender=="male",] instead of LungCapData[,gender=="male"]. When selecting by indexes I would need to do LungCapData[,5] to get the gender column. Is there a special reason for this or simply inconsistency?
@andrebieler7906 8 years ago
Ok forget about it. It makes perfect sense :)
@jonathannewsted4161 5 years ago
Hello, I made two subsets, one with adults and one with children, and I am trying to add a variable to the subset of kids. I am trying to multiply age and height for those under 18. But I am not sure how to go about creating a new variable for the children subset without it affecting the original data set and the adult data set.
@rahulgujar1615 6 years ago
hey Marin, thanks for the video ,,in my data set i tried the length command but it is showing me object not found..why so ?
@waswa143 9 years ago
Hi Mike, Please tell me what is wrong in this code: " maleover15 15, ]" or why it is wrong
@marinstatlectures 9 years ago
Hi Wasim Tarique , there's a few things it may be. you've probably taken care of these, but just to make sure...1) you have to have the data "attached" to call on variables by name, like using Age, and not data$Age. 2) you have to have already created the object "maledata", which would be the data with only the males. assuming you've done those two, the most likely error is as follows: 3) within the square brackets, you have "Age", so this is going to find the rows where Age>15 in the entire data set (as the Age variable you've specified is the Age for the entire dataset, not the Age for only the males), and it will extract the corresponding rows from maledata. what you will need to type is *maleover15 <- maledata[maledata$Age > 15, ]*. you can notice the difference between the code you wrote and the code i wrote is the use of *maledata$Age* instead of just *Age*. this code will look at the Ages contained in "maledata", and not the entire data set.
@waswa143 9 years ago
Hi MarinStatsLectures Thank you so much, I got your point.
@aman-xk2mk 8 years ago
hi mike.. i was encountering one problem.. whenever i am reading the data in my console and displaying some of it's contents then always it is producing headers of its own like V1,V2,V3 and it is treating the headers Lungcap, Height, Age and likewise to be contents under those headers. Moreover it is treating all the numeric content in the lungcap data set to be factors.. how to deal with this?? By the way your lectures are proving very helpful for me .. thanks for your contribution..
@marinstatlectures 8 years ago
Hi +amandeep singh , when reading in the data, you should include *header=T* within the *read.table* command. R doesn't know that the first row of your data is the variable names, and so it is giving them the generic names "V1", "V2",... and then the variable names are being put as the first observation for the variables. and since they're written as characters, R sees the variables as having text in them, and assigns them as factors. just make sure to let R know that the first row is the variable names (using header=TRUE), and it should solve the problem.
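The effect of `header = TRUE` can be seen without a file by passing a small made-up text sample to `read.table` through its `text` argument (the sample and the object names `raw`, `no_header` and `with_header` are assumptions):

```r
# A made-up three-row sample whose first line holds the variable names.
raw <- "LungCap Age Gender
6.5 6 male
10.1 18 female
9.6 16 female"

# Without header = TRUE the names are treated as data: columns become V1, V2, V3
# and every column is read as text because of that first row.
no_header <- read.table(text = raw)
names(no_header)
str(no_header)

# With header = TRUE the first row supplies the names and numbers stay numeric.
with_header <- read.table(text = raw, header = TRUE)
names(with_header)
str(with_header)
```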
@indeeed0 3 years ago
Hi ! thank you for your videos, its help a lot. I am wondering, how could i find the same result of nrow(subset(dataframe,age >= 20 & age
@mashsyed9808 10 years ago
Mike, could I use logic statements to construct a random variable? I have income data (age, race, gender, education) and I need to come up with a random variable. So I was thinking I could do: Let P(X = the number of ppl > 40 yrs old with income < $50k and a BS degree or higher)? What is the best way to create random variables in R?
@marinstatlectures 10 years ago
Hi Mashhood Syed , you can use logic statements to create yes/no (1/0) variables. suppose you wanted a variable that was an indicator for having Age>40 and Income
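The reply above is cut off, so this is only a sketch of how a logic statement can become a yes/no (1/0) variable. The data frame `people`, its values, the $50k cut-off taken from the question, and the column name `Over40LowIncome` are all assumptions:

```r
# Made-up income data.
people <- data.frame(
  Age    = c(35, 52, 44, 61),
  Income = c(42000, 48000, 75000, 30000)
)

# The logical condition is TRUE/FALSE per row; as.numeric() turns it into 1/0.
people$Over40LowIncome <- as.numeric(people$Age > 40 & people$Income < 50000)

# Summing the indicator counts how many people meet both conditions.
n_over40_low <- sum(people$Over40LowIncome)
```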
@mashsyed9808 10 years ago
Thank you Mike this is really helpful. I guess based on my understanding of random variables, I never considered as part of the definition that its no longer random since you have the observations. So is it correct to say that you first define the random variable and then you run the experiment? Just want to make sure I understand the concept correctly.
@marinstatlectures 10 years ago
Hi Mashhood Syed , that is correct. we call it a random variable before the experiment. thats why in probability theory (like the binomial distribution, etc) they are referred to as random variables, and when talking about summarizing data, etc, they are referred to as variables (because they have been observed)
@DRR_daily 6 years ago
Thank Marin It's really awesome tutorials for learning R But I face some problem I can't extract data from this command maledata = LungCapData[Gender=="male"] Error: Length of logical index vector must be 1 or 6 (the number of rows), not 725
@aratoud5756 9 years ago
Hi, I'm curious to know why we can use *mean(age[gender == "female"])* with attached data, but without doing it, *mean(data1$age[data$gender == "female"])* is invalid?
@aratoud5756 9 years ago
+Aratoud $ can't be used inside the functions, *$ operator is invalid for atomic vectors* , what's the solution?
@marinstatlectures 9 years ago
Hi +Aratoud, the error is most likely coming from a typo. in one place you use the name "data1" and in another place you use "data". correct that, and you should be fine. one of those two is a vector/column of values, not a matrix/dataframe, and so it is giving you that error, letting you know that you can not extract things from a vector/column using the $.
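A small sketch of where that error message comes from. `data1` is a made-up data frame and `v` a plain vector standing in for whatever the mistyped name pointed to; both names are assumptions:

```r
# $ works on a data frame, including inside a function call.
data1 <- data.frame(
  age    = c(20, 30, 40),
  gender = c("female", "male", "female")
)
female_mean <- mean(data1$age[data1$gender == "female"])

# $ does not work on a plain (atomic) vector, which is what the error reports.
v <- c(1, 2, 3)
msg <- tryCatch(v$age, error = function(e) conditionMessage(e))
msg
```

Checking `class()` or `str()` on each object named in the command is a quick way to find which one is not the data frame it was expected to be.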
@aratoud5756 9 years ago
+MarinStatsLectures Oh my bad, tq
@akhilp7390 8 years ago
hi mike , i am using edgeR for gene expression. but i am getting an error of "Length of 'group' must equal number of columns in 'counts'" even though they are equal. what should i do ?? please help me in it
@marinstatlectures 8 years ago
Hi +Akhil Pampana , it can be quite difficult to troubleshoot errors from a distance, and i'm not sure on this one. i'd start with trying things like *length(group)* and *dim(data)* to confirm how long the "groups" variable is, and how many rows/columns are in your data, etc. sorry i can't be of better help, but without being able to play around with the data, i can't really tell what the issue is.
@akhilp7390 8 years ago
can i send you data so that u can work on it and let me know in a day ??
@marinstatlectures 8 years ago
Hi Akhil Pampana , i'm sorry but I'm extremely busy at work over the next few weeks, and don't have time to work on this for you. if you explore things a bit you should be able to figure out where you are going wrong with this. good luck!
@akhilp7390 8 years ago
thank you
@malluk4127 7 years ago
Does anyone know why I get the following: levels(Smoke) NULL
@Ikarus2985 8 years ago
When you extract the MaleOver15 data by using: MaleOver15=LungCapData[Gender=="male" & Age>15, ] I thought: Why not using the previous reduced dataset (MaleData) and extract from it Age>15 ? So I did: MaleOver15=MaleData[Age>15, ] What I got, was 177 obs. 6 var. where after row 90 (i=717) the dataset goes on with NA, NA.1, NA.2 ...etc - no case data in any var. In the first 90 obs. that still include cases with data there still seem to be males from all available ages and not +15. I can reproduce this problem but I do not know why it occurs. Your code works fine. Can you give me a hint why this difference occurs? Greetings from TU Darmstadt.
@marinstatlectures 8 years ago
Hi +Ikarus2985 , i have answered this question many times in the comment section, so i've just copied/pasted a detailed answer i've provided previously...here it is: here's what's happening there. the command you mention first, MaleOver15 = LungCapData[Gender == "male" & Age > 15, ], takes the data and pulls out the rows where age>15 and gender is male, all columns. you are trying to reproduce these same results in a different way. first you have used MaleData = LungCapData[Gender=="male", ] to create a subset of the data that is only the males, and all columns. the next command you type is the following: MaleOver15 = MaleData[Age>15, ], which you are hoping will extract only those over 15 from the MaleData...BUT IT'S NOT DOING THAT...it is probably giving you 177 rows, not the 89 rows that you are expecting. THE REASON: is the following....the Age>15 you have inside the square brackets is actually referring to the Age variable itself (the one in the data, for all genders), and not the Age of only the males. so, this statement is identifying rows of the full data set where ages are greater than 15, and then trying to extract those row positions from the MaleData. in the full data set, there are 177 individuals with ages over 15, and you are identifying the row positions of those individuals. it then extracts those positions from the object "MaleData". MaleData has only 367 rows, so a position like 368 does not exist in it. R does not recycle MaleData in that case; for every requested position past row 367 it returns a row filled with NA, which is why you see the NA, NA.1, NA.2 rows. and the positions that do fall within rows 1 to 367 return whichever male happens to sit in that row of MaleData, whatever his age is. THE SOLUTION: in the square brackets, you must identify that it is the Age that is stored in MaleData that you are referring to (as just Age itself will default to the entire variable Age). 
you can use the following, and it will produce the desired results MaleOver15 = MaleData[MaleData$Age > 15, ] This command will ask it to extract the rows of MaleData, only for rows where the Age that is stored in MaleData (MaleData$Age) is greater than 15.
@Ikarus2985 8 years ago
Haha, mindblowing. I would never have thought of that. But I remember you mentioned this issue in one of the videos. I did a short search on the topic in the comments but could not identify the object. Well, thank you for the plausible and comprehensive answer :)
@marinstatlectures 8 years ago
no problem, happy to help ;)
@chenxiqiu8800 9 years ago
Hi Mike, Thanks for the great video! Just a quick question: I tried MaleData
@marinstatlectures 9 years ago
Hi Chenxi Qiu , its hard to troubleshoot errors like this from a distance, but i notice that in the square brackets you have male spelt with a lowercase m, and it should be a capital M, so it may be related to that.
@chenxiqiu8800 9 years ago
Hi MarinStatsLectures , Thanks for the quick response! However, it doesn't look like this is the case. The data I downloaded has "Gender" with capital "G" but "male" with low case "m". I also tried the command with lower case male, and it didn't work (I found the MaleData with no observations). I understand that this is hard to troubleshoot from a distance, I think I should google it by myself. Thanks anyways! Best tutorial I have found on R.
@marinstatlectures 9 years ago
sorry about that, i haven't worked with this data for a bit, and thought it was with a M, not m, but you're right. one other thing i can think of is that a single "=" is used to assign things/values to an object, while a double "==" is used to mean equality in the mathematical sense. (e.g.) x=5 will assign the value 5 to x, while x==5 can be used to ask if the object x has a value of 5 or not? so, you must be careful when using equal signs. good luck working through the issue, and thanks for the compliment!
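The difference between `=` and `==` in a few lines. The vector `Gender` is a made-up example and `is_male` a hypothetical name:

```r
x = 5          # single "=": assigns the value 5 to x (same effect as x <- 5)
x == 5         # double "==": asks whether x equals 5, returns TRUE
x == 3         # returns FALSE

# Inside square brackets a comparison is needed, so "==" is the one to use.
Gender  <- c("male", "female", "male")
is_male <- Gender == "male"    # TRUE/FALSE for each element
Gender[is_male]                # the elements that passed the test
```

Comparisons on text are also case-sensitive, so `"Male"` and `"male"` are different values.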
@waswa143 9 years ago
Hi +Chenxi Qiu, see serial number before 725 I'm sure 724 is not there. Because that one is "Female". So as per formula it is showing 367 rows, which are "male"
@chenxiqiu8800 9 years ago
Wasim Tarique Thanks for the response Wasim! However, this doesn't seem to be the case for me, since my output has 725 rows, and every row was shown as "male". It was kind of weird though
@tiisetsomashele1494 7 years ago
whenever I use the file.choose() command, the result is incorrect: R reads it as n observations with only one variable, even if there are more that variable. I've tried retyping the dataset but no success. What is wrong?
@tiisetsomashele1494 7 years ago
'more than one variable' I meant
@brittnifoster8660 6 years ago
If some of my data includes "NA" how do I exclude those values from a subset?
@JeanPierre70ju 6 years ago
Hello, I have a question on the 3:12, I typed summary(Gender) and it shows like this: Length Class Mode 725 character character I don't know why it happens as I did the exactly same as you did. Thank you for videos btw, It helps me a lot, appreciate it.
@JeanPierre70ju 6 years ago
Also, as yours shows that it gives the order number of Maleover15 such as 11th 12th 23th 40th when you type Maleover15[1:4, ]. But mine only shows 1 2 3 4, Why is it different? Is it a matter of version?
@marinstatlectures 6 years ago
R is seeing "Gender" as a 'character' and not as a 'factor'. you can change it to a factor (a categorical variable) using *Gender
@JeanPierre70ju6 жыл бұрын
Ah. thanks a lot!
@nawaidkhan61664 жыл бұрын
hey im getting this error Warning message: In Sex == "male" & Age > 20 : longer object length is not a multiple of shorter object length Can you please tell me whats wrong .
@syremusic_4 жыл бұрын
For some reason, I'm not able to filter using the default square brackets, and have to instead use the filter() command from dplyr. Anyone else have the same issue?
@andywolf71409 жыл бұрын
Hey Mike, Could you say why when I type FemSmoke 15 and then > FemSmoke [2:23, ] it says: Error in FemSmoke[2:23, ] : incorrect number of dimensions (used your data source) Thank you very much! Your videos r amazing!
@ipwoman267 жыл бұрын
Do you still have this question? I can answer if you do.
@lakmalmudalige5756 8 years ago
@Mike. Dear Mike, thank you for your easy-to-follow videos. Having worked through your data set and videos, I have moved on to my own data. However, I am having trouble with subsetting. I have a data frame with 201 observations and 219 variables. I am able to get a summary of the variable (haemoglobin); however, when I want to find the mean haemoglobin for females I type the command mean(dataframe[Sex=="Female"] and I get the output NaN. I don't have missing data as far as I am aware. What am I doing wrong?
@lakmalmudalige5756 8 years ago
Is it because I have so many columns? I read somewhere that RStudio has a limited number of columns.
@marinstatlectures 8 years ago
Hi +Lakmal Mudalige , it will not have to do with the number of columns; 219 variables actually isn't very many in terms of what would be considered "big". There is most likely an error in the syntax. I would suggest starting by checking (e.g.) whether your dataset is really named "dataframe". If so, this may be causing the confusion, as that name is nearly identical to data.frame, the name of a function in R. For example, if you named a variable "mean", this could cause confusion, because when you type in mean it is no longer obvious whether you are referring to that variable or to the function that calculates a mean. So I'm guessing that this is what is causing the problem: if you've named your data "dataframe", change that, and then you will probably be ok. If it's not that, I'd also suggest checking that the variable is named "Sex" and not "sex", or that the category/level is "Female" and not "female", and so forth. But I really think it has to do with the use of "dataframe".
@lakmalmudalige5756 8 years ago
Thank you, I think the problem was that I had a column named mean, which was confusing R. Having sorted that out, I am able to get through :) May I ask another question? I have a factor column called "Type of Stroke" with 5 options. Is there a way to subset the data to all rows without the level "Ischaemic stroke"? I can subset for "Ischaemic stroke"; is there a reciprocal, inverse, or not-equal-to function?
@marinstatlectures 8 years ago
Hi +Lakmal Mudalige , sure you can do that...in R "!=" means "not equal", so you would enter something like (depending on the exact variable names) *NewData
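The reply above is cut off after "NewData". A minimal sketch of how "!=" can be used for this kind of subset, assuming a hypothetical data frame StrokeData with a factor column Type (both names and all values are invented for illustration, not taken from the original reply):

```r
# Hypothetical data; the column names and levels are assumptions
StrokeData <- data.frame(
  Type = factor(c("Ischaemic stroke", "Haemorrhagic stroke",
                  "Ischaemic stroke", "TIA")),
  Age  = c(70, 65, 81, 59)
)

# Keep every row whose Type is NOT "Ischaemic stroke" (all columns)
NewData <- StrokeData[StrokeData$Type != "Ischaemic stroke", ]

# Optional: drop the now-unused level from the factor
NewData$Type <- droplevels(NewData$Type)

nrow(NewData)          # 2
levels(NewData$Type)   # "Haemorrhagic stroke" "TIA"
```

The level must match the stored spelling and capitalisation exactly; levels(StrokeData$Type) shows what R has stored.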
@lakmalmudalige5756 8 years ago
Once again, many thanks for your kind instructions, worked like a charm!
@treyhannam3806 4 years ago
I get NULL as the levels for the string variables (Smoke, Gender, C)
@ord2exordlife 10 years ago
Hey Marin, thank you for posting the videos. I have a question about the statement " MaleOver15 = LungCapData[Gender == "Male" & Age > 15, ]". I tried MaleOver15 = MaleData[Age>15, ] but that never worked. Can you explain why?
@marinstatlectures 10 years ago
Hi venomnert , good question! Here's what's happening. The command you mention first, *MaleOver15 = LungCapData[Gender == "Male" & Age > 15, ]*, takes the data and pulls out the rows where Age > 15 and Gender is male, with all columns. You are trying to reproduce the same result in a different way. First you used *MaleData = LungCapData[Gender=="male", ]* to create a subset of the data that contains only the males, with all columns. The next command you typed is *MaleOver15 = MaleData[Age>15, ]*, which you hoped would extract only those over 15 from MaleData, BUT IT'S NOT DOING THAT: it is giving you 177 rows, not the 89 rows that you are expecting. THE REASON: the *Age>15* inside the square brackets refers to the Age variable itself (the attached one from the full data, for all genders), and not to the Age of only the males. So this statement identifies the positions in the full data set where Age is greater than 15 (there are 177 such individuals) and then tries to extract the rows at those positions from the object "MaleData". Those positions do not line up with the rows of MaleData, so you get the wrong individuals. And because MaleData has only 367 rows, any position past the end (e.g. row 368) does not exist in MaleData, so R returns a row of NA values for it rather than real data. THE SOLUTION: in the square brackets, you must identify that it is the Age stored in *MaleData* that you are referring to (as just Age on its own will default to the entire variable Age). You can use the following, and it will produce the desired result: *MaleOver15 = MaleData[MaleData$Age > 15, ]* This command asks R to extract the rows of MaleData only where the *Age that is stored in MaleData (MaleData$Age)* is greater than 15.
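The difference between the two commands can be reproduced on a small scale. This is a sketch with made-up data standing in for LungCapData; the stand-alone Age vector mimics what attach(LungCapData) makes visible.

```r
# Made-up stand-in for LungCapData (8 rows instead of 725)
LungCapData <- data.frame(
  Age    = c(10, 16, 17, 12, 18, 9, 16, 14),
  Gender = c("male", "female", "male", "male",
             "female", "female", "male", "female")
)

# Mimics the full-length Age variable that attach() makes visible
Age <- LungCapData$Age

# Subset of males only: 4 rows, with ages 10, 17, 12, 16
MaleData <- LungCapData[LungCapData$Gender == "male", ]

# Wrong: the condition has 8 values, but MaleData has only 4 rows,
# so the TRUE/FALSE positions do not line up with MaleData's rows
Wrong <- MaleData[Age > 15, ]

# Right: the condition is built from the Age column inside MaleData
Right <- MaleData[MaleData$Age > 15, ]

nrow(Wrong)   # 4: one row per TRUE in the full-length condition
nrow(Right)   # 2: the males aged 17 and 16
```

Printing Wrong shows that it contains a male aged 12, which is the same symptom as finding individuals under 15 in the subset.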
@ord2exordlife 10 years ago
MarinStatsLectures Well explained sir! I understand now. Also, one more question, if you don't mind: can you recommend a comprehensive book on R for a beginner?
@narangaman6 9 years ago
Thanks Mike !! I was facing exactly the same Query :-)
@marinstatlectures 9 years ago
great to hear aman narang ...two birds with one stone ;)
@TheEverydayAnalyst 4 years ago
@@marinstatlectures Bird three is here after 5 years.
@leosizaret4104 8 years ago
Amazing set of videos! I am confused though: making a variable MaleOver15_2 = MaleData[Age>15, ] is different from MaleOver15 = LungCapData[Gender=="male" & Age>15, ]. Why is that? I find that many obs in MaleOver15_2 are under age 15, which means I definitely did something wrong! Any idea? :) Thank you!
@marinstatlectures 8 years ago
Hi +Leo Sizaret , here's what's happening. The command you mention first, MaleOver15 = LungCapData[Gender == "Male" & Age > 15, ], takes the data and pulls out the rows where Age > 15 and Gender is male, with all columns. You are trying to reproduce the same result in a different way. First you used MaleData = LungCapData[Gender=="male", ] to create a subset of the data that contains only the males, with all columns. The next command you typed is MaleOver15 = MaleData[Age>15, ], which you hoped would extract only those over 15 from MaleData, BUT IT'S NOT DOING THAT: it is probably giving you 177 rows, not the 89 rows that you are expecting. THE REASON: the Age>15 inside the square brackets refers to the Age variable itself (the attached one from the full data, for all genders), and not to the Age of only the males. So this statement identifies the positions in the full data set where Age is greater than 15 (there are 177 such individuals) and then tries to extract the rows at those positions from the object "MaleData". Those positions do not line up with the rows of MaleData, which is why you see individuals under 15. And because MaleData has only 367 rows, any position past the end (e.g. row 368) does not exist in MaleData, so R returns a row of NA values for it rather than real data. THE SOLUTION: in the square brackets, you must identify that it is the Age stored in MaleData that you are referring to (as just Age on its own will default to the entire variable Age). You can use the following, and it will produce the desired result: MaleOver15 = MaleData[MaleData$Age > 15, ] This command asks R to extract the rows of MaleData only where the Age that is stored in MaleData (MaleData$Age) is greater than 15.
@leosizaret4104 8 years ago
Thank you for the detailed & fast answer Marin! I never would have thought of that, I'm quietly fascinated by how you figured that out :D
@marinstatlectures 8 years ago
it's knowledge that comes from years of using R +Leo Sizaret ;)
@bokkieyeung504 6 years ago
why the command 'MaleOver15 15, ]' and even 'MaleOver15 15, ]' can extract the correct subset of the data? the logic should be: MaleData is the subset data from LungCapData, which adds the constraint that Gender=='male', then 'MaleOver15 15, ]' should extract the sub-subset data from MaleData, which adds the constraint that Age>15, but it does not work, why?
@marinstatlectures 6 years ago
Take a look at the comment that is pinned at the top of the comments section for this video; I've answered in detail there.
@manashigogoi9587 4 years ago
mean(Age[Gender=="male"]) Error in mean(Age[Gender == "male"]) : object 'Age' not found can you tell me why??
@nizomov4874 4 years ago
First u have to attach the data
@nizomov4874 4 years ago
attach(LungCapData)
@japeprasad 10 years ago
Hello Marin.. I love your videos.. your step by step method is really good.. I was trying to put out data where sales is above mean. Tried different variation, however, it is working only when I am using $ to point particular column (variable).. CDM = Dataset Name, Sales = variable, rows = 36 OverMean 200000, ] ----- This is only working OverMean 200000, ] ----- Not Working OverMean mean(CDM$Sales), ] --- Not Working Any help is appreciated !
@marinstatlectures 10 years ago
Hi Prasad Jape , thanks for the comments! The other two are not working for the following reason: the variable "Sales" is stored in the object "CDM", and so R will not recognize "Sales" on its own. If you type "CDM$Sales", then you are asking R to look into the object CDM for a variable named Sales, and R finds this. If you use the R command *attach(CDM)*, then all of the variables in the object (the data) will be visible to R, and you will not need to use the dollar sign. If you work with data attached, you should always detach it using *detach(CDM)* when you are done working, so that the variable names are removed from R's memory.
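As a minimal sketch of the point above, here are three ways to select the rows where Sales is above its mean without attaching the data. CDM and Sales are the names from the question; the Region column and all values are invented for illustration.

```r
# Invented example data; only the names CDM and Sales come from the question
CDM <- data.frame(
  Region = c("A", "B", "C", "D"),
  Sales  = c(150000, 250000, 90000, 310000)
)

# 1. Point R at the column explicitly with $
OverMean <- CDM[CDM$Sales > mean(CDM$Sales), ]

# 2. with() evaluates the condition inside CDM, so Sales is found
OverMean2 <- CDM[with(CDM, Sales > mean(Sales)), ]

# 3. subset() also looks up Sales inside CDM
OverMean3 <- subset(CDM, Sales > mean(Sales))

OverMean$Sales   # 250000 310000 (the mean of Sales is 200000)
```

A bare CDM[Sales > 200000, ] only works when a variable called Sales is visible on its own, for example after attach(CDM).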
@japeprasad 10 years ago
MarinStatsLectures Thank you for the prompt response. Yes, I learned the attach and detach commands from a previous video. CDM[CDM$Sales] should be right, with CDM written twice. In LungCapData, when we pull out data by Gender, we write LungCapData only once. Did we attach the data in RStudio? I remember attaching it in a previous video, but not in this video.
@marinstatlectures 10 years ago
Hi Prasad Jape , yes, in all of the videos, I work with the data attached. Usually I've already loaded the data into R and attached it, prior to starting the video (you can see the R code there to read it in and attach it). At the start of the videos, I usually just say "I've already imported the data into R and attached it". You can work either way, by attaching the data, or by using the $ to extract the variables from the data itself. Most programmers will tell you they prefer to NOT attach the data, and this is for good reason. Attaching runs the risk of over-writing variables if you don't detach the data, and then attach a different set of data with the same variable name. I prefer working with the data attached as it shortens the code I have to type, and I just make it a habit to detach data every time I stop working in R, and make sure to clear my workspace regularly.
@japeprasad 10 years ago
MarinStatsLectures Thank you very much Marin. That was so much helpful... :) :)
@marinstatlectures 10 years ago
You're welcome Prasad Jape !
@superstar3179 6 years ago
If we have to find the mean age of female and if we introduce transgender in it what will the code for it. If we have the ages of these two at the same time without the male of age then what is the code for it ?
@stallionheart139 9 years ago
MaleOver15 = MaleData[Age > 15, ] Why does this not give the males greater than 15 years? We've already created MaleData object so why again we are searching through LungCapData and entering Gender and Age conditions?
@marinstatlectures 9 years ago
Hi +Siddharth Ranadive , the error is that the "Age" you use in the second command is looking at the Age variable, not the age from the subset of only the male data. You would have to use: *MaleOver15 <- MaleData[MaleData$Age > 15 , ]* This command will look at the "Age" contained in the male data, whereas what you typed looks at the Age variable for all of the data: the ages of all people, not the subset of the male data. Hope that made sense. I answered the same question in a thread below in a bit more detail, so you can check that out too.
@stallionheart139 9 years ago
+MarinStatsLectures Thanks for the prompt reply!
@sindhubiswas6539 6 years ago
> t=mean(data1$Age[data1$Gender=="female"]) > t [1] NaN > sir why is it showing not a number...any help please
@muaadh_abdoalsabri1941 6 years ago
Hi everyone, and thanks a lot for this explanation, but I have a problem with 1. levels(Gender) and 2. mean(Age[Gender=="femal"]): the first one comes back NULL, the second NaN.
@siddharthmagadum16 4 years ago
I'm not able to filter the dataset with Gender=='male' using the same syntax
@marinstatlectures 4 years ago
I’m sorry, I’m not able to provide any help or suggestions with that limited info. Providing a bit more info, I may be able to solve the issue for you
@siddharthmagadum16 4 years ago
@@marinstatlectures Thanks for replying, in my dataset, the class(Gender) is returning "character" . I want to convert it into "factor" class type. How to do that?
@Manishjain-qs9fz 6 years ago
data1 = maledata[Age==15,] maledata is subset of lungcapdata of male but when im subsetting maledata with condition to get age of only 15 why it is not creating data of my use
@chanuyou 6 years ago
This is because the Age variable that you have used in the square brackets refers to the original LungCapData table and not to the maledata table that you have extracted. For this to work, you need to write maledata$Age==15 so that R knows that the Age variable you are referring to belongs to the maledata table and not to the main data table.
@deepikajain4619 8 years ago
Please tell me how I can sum the digits of a number in RStudio...
@marinstatlectures 8 years ago
Hi +deepika jain , I'm not completely clear on your question, but here are a few things that will likely help. To sum two numbers, just use the addition sign, (e.g.) *5 + 2*, and R will return a value of 7. If you want to sum the values in a column, etc., you can use the sum command, (e.g.) *sum(x)*, and R will sum up all of the values stored in the vector "x". Hope that helps...
@deepikajain4619 8 years ago
Sir, I am asking this: I have a four-digit number, for example 1234. Now I want the sum of the digits of this number, such as 1+2+3+4=10. So I am asking what I should use for this.
@iacaldrod 7 years ago
In my view, you could save the number as a string, iterate over its parts and sum them as integers. Maybe a task better achieved with a for loop.
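The string approach suggested above can be sketched as follows. The function name digit_sum is made up for illustration, and the sketch assumes a non-negative whole number that R prints without scientific notation. Because sum() is vectorised, no explicit for loop is needed.

```r
# Hypothetical helper: sum the decimal digits of a non-negative whole number
digit_sum <- function(n) {
  # Turn the number into text, then split it into single characters
  digits <- strsplit(as.character(n), "")[[1]]
  # Convert each character back to an integer and add them up
  sum(as.integer(digits))
}

digit_sum(1234)   # 10, i.e. 1 + 2 + 3 + 4
```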
@malluk4127 7 years ago
When I ask for the summary data, here's what I get: summary(Gender) Length Class Mode 725 character character platform x86_64-w64-mingw32 arch x86_64 os mingw32 system x86_64, mingw32 status major 3 minor 4.2 year 2017 month 09 day 28 svn rev 73368 language R version.string R version 3.4.2 (2017-09-28) nickname Short Summer
@gauravgregrath6606 7 years ago
using this command doesnt give me the desired result plz help maleover10 10 , ] it still gives me all the ages
@marinstatlectures 7 years ago
Hi, I've answered this question (but for age 15) many times, so just take a look through the comments/replies, and you'll find a lengthy explanation of why this is happening...
@michaelguzman4136 5 years ago
I need to find the average earnings of people whose height is at most 67. How do I do that?
@marinstatlectures 5 years ago
Hi, you can use use: *mean(earnings[height
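The reply above is cut off after "height". A minimal sketch of one way to compute such a conditional mean, reading "at most 67" as height <= 67. The data frame name SurveyData and all values are invented; only the variable names earnings and height come from the thread.

```r
# Invented data; SurveyData is a hypothetical name
SurveyData <- data.frame(
  height   = c(62, 67, 70, 65, 72),
  earnings = c(30000, 42000, 55000, 38000, 61000)
)

# Mean earnings for people whose height is at most 67
MeanEarnings <- mean(SurveyData$earnings[SurveyData$height <= 67])

MeanEarnings   # mean of 30000, 42000 and 38000
```

If earnings contains missing values, mean(..., na.rm = TRUE) leaves them out of the calculation.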
@CntrlAltUniverse 4 years ago
03:56 2 factors
@MyExSports 8 years ago
# Funny thing that happened and I can't think why! > mean(Age[gender=="female"]) [1] 12.35359 > mean(Age[Gender=="female"]) [1] 12.44972 Any ideas? Cause I think the first one just shouldn't work!?
@marinstatlectures 8 years ago
Hi +MyExSports , you're right in that it shouldn't work. It's hard to troubleshoot these things from a distance, but it may be that you have another variable in R named "gender", and that's causing confusion. If you enter the command *ls()* into R, it will return a list of all things in its memory (you can also see this list in RStudio, if you look in the top-right corner, where you see "environment"). There, you can see if you have another variable named "gender". But beyond that, I'm not sure where the error is coming from.
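A small sketch of the check described above. R is case-sensitive, so gender and Gender are two different objects, and a stray lowercase one left in the workspace would explain why the first command ran at all. The two vectors below are made up to reproduce that situation.

```r
# Made-up objects: the intended variable and a stray lowercase copy
Gender <- c("male", "female", "female")
gender <- c("female", "female", "male")

ls()               # lists the objects in the workspace, including both names
exists("gender")   # TRUE while the stray object is still there

# Remove the stray object so only the intended variable remains
rm(gender)
exists("gender")   # FALSE, unless another attached data set provides one
```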
@ajafterparty 8 years ago
Fantastic I like the intro I'd like to invite you to consider some of my recording videos or listen to any of my music. If you'd like to know how to make a certain sound, mix, produce or make a recorded instrument better let me know in the comments. Cheers
@manashigogoi9587 4 years ago
mean(Age[Gender=="female"]) Error in mean(Age[Gender == "female"]) : object 'Age' not found can you tell me why??
@张源逢 4 years ago
Did you load your dataset? Age is an attribute of the dataset, so you must load your dataset first and then attach it before referring to 'Age'.
@张源逢 4 years ago
You could use mean(LungCapData1$Age[LungCapData1$Gender == 'female']) to get the same mean.