Changing Numeric Variable to Categorical in R | R Tutorial 5.4 | MarinStatsLectures

178,251 views

MarinStatsLectures-R Programming & Statistics


Changing Numeric Variable to Categorical (Transforming Data) in R: How to convert numeric data to categories or factors in R, deal with nonlinearity in linear regression, and more. Free Practice Dataset (LungCapData) here: statslectures.com; More Statistics & R Programming Videos: goo.gl/4vDQzT
►► Like to support us? You can Donate (bit.ly/2CWxnP2), Share our Videos, Leave us a Comment and Give us a Thumbs up! Either way We Thank You!
In this R video tutorial we will learn to create a categorical variable (a factor or qualitative variable) from a numeric variable in R using the "cut" command (function). You will also learn how to label the categories and how to make the intervals left-closed and right-open using the "labels" and "right" arguments.
Transforming data by converting a continuous variable into a categorical variable (factor) is useful for making cross-tabulations for a variable, for fitting a regression model when the linearity assumption does not hold for the variable, and more.
This video is a tutorial for programming in R Statistical Software for beginners, using RStudio.
■ Table of Contents:
0:00:08 Why should we convert a numeric variable into categorical variable or factors
0:00:44 How to use the "cut" command in R to convert a continuous variable into a categorical variable
0:00:53 How to access the help menu for the “cut” function in R programming language
0:01:21 How to specify the break points for the new categorical variable in R
0:01:51 How does R treat the border observations in a categorical variable
0:02:08 How to name or label the categories that we created in R using the “labels” argument
0:02:57 How to change the way R treats the border observations in a categorical variable so that the intervals are left-closed and right-open, using the "right" argument within the "cut" command
0:03:33 How to label the categories in a categorical variable in R programming language
0:04:43 How to tell R to create a certain number of categories or levels rather than specifying the break points ourselves
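The steps listed above can be sketched as follows. This is a minimal illustration on a small made-up height vector rather than the LungCapData file; the values, break points, and labels are assumptions chosen only to show how border observations are treated.

```r
# Hypothetical heights (not taken from LungCapData)
height <- c(48, 50, 55.2, 60, 64.9, 70, 72.3, 80)

# Default (right = TRUE): intervals are open on the left and closed
# on the right, e.g. (50,60]
cat_default <- cut(height, breaks = c(0, 50, 60, 70, 100))

# right = FALSE: intervals are closed on the left and open on the right,
# e.g. [50,60); "labels" names the four categories
cat_left <- cut(height, breaks = c(0, 50, 60, 70, 100), right = FALSE,
                labels = c("A", "B", "C", "D"))

# A border value such as 50 lands in a different category in each version
table(cat_default)
table(cat_left)

# Ask for a number of equal-width categories instead of giving break points
cat_four <- cut(height, breaks = 4)
levels(cat_four)
```

Comparing the two tables shows where the border values 50, 60 and 70 end up under each setting of "right".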
► ► Watch More:
► Intro to Statistics Course: bit.ly/2SQOxDH
► Data Science with R: bit.ly/1A1Pixc
► Getting Started with R (Series 1): bit.ly/2PkTneg
► Graphs and Descriptive Statistics in R (Series 2): bit.ly/2PkTneg
► Probability distributions in R (Series 3): bit.ly/2AT3wpI
► Bivariate analysis in R (Series 4): bit.ly/2SXvcRi
► Linear Regression in R (Series 5): bit.ly/1iytAtm
► ANOVA Concept and with R: bit.ly/2zBwjgL
► Hypothesis Testing: bit.ly/2Ff3J9e
► Linear Regression Concept and with R Lectures: bit.ly/2z8fXg1
Follow MarinStatsLectures
Subscribe: goo.gl/4vDQzT
website: statslectures.com
Facebook: goo.gl/qYQavS
Twitter: goo.gl/393AQG
Instagram: goo.gl/fdPiDn
Our Team:
Content Creator: Mike Marin (B.Sc., MSc.) Senior Instructor at UBC.
Producer and Creative Manager: Ladan Hamadani (B.Sc., BA., MPH)
These videos are created by #marinstatslectures to support some courses at The University of British Columbia (UBC) (#IntroductoryStatistics and #RVideoTutorials for Health Science Research), although we make all videos available to everyone everywhere for free.
Thanks for watching! Have fun and remember that statistics is almost as beautiful as a unicorn!

Comments: 93
@shannondiener5372 7 years ago
You have no idea how you have saved my life with your videos! I am using R to analyse my data for my M.Sc. I had no prior knowledge of R at all. I have watched your videos from the very first one to this one and now I understand how to use R to analyze my data. Such a milestone has been conquered because of you. Thank you
@jagdixit 9 years ago
One of the best R tutorial courses
@marinstatlectures 9 years ago
thanks jagdixit !
@meshackamimo1945 9 years ago
Wow, I love this. I will use it for geospatial modeling of different aquifer type zones in northern Kenya. Keep up, Marin! May God grant you the grace to carry on with this noble mission.
@cigan77 9 years ago
Thanks! The world needs people like you!
@marinstatlectures 9 years ago
thanks Angelo Cignarelli , we appreciate that!
@kathyvolke8622 4 years ago
Excellent! Thank you for showing and explaining in sufficient detail without skipping steps.
@AyobamiAdeyeye 5 years ago
You have saved me too many times, I wish I could hug you right now. Thank you
@marinstatlectures 5 years ago
you're welcome... ((hug))
@user-jd8fr9bj6v 9 years ago
@MarinStatsLectures most easy to grasp and useful tutorial ever encountered!!! thank you !!
@marinstatlectures 9 years ago
thanks +gaurav patle , we appreciate that!
@Jinalchoksey 5 years ago
Hi Marin!!! Thank you so much, your videos have been a real blessing in understanding R. One of the simplest and most uncomplicated ways of coding in R. Such a relief.....!!!!!
@marinstatlectures 5 years ago
you're welcome :)
@esperanzazagal7241 4 years ago
You are a good man. This function saved me.
@smrazaabidi1495 7 years ago
Mike, we are blessed to have an exclusive instructor/teacher like you. I really do not have any words to praise you. I have a question: in this video you showed how to convert a numeric variable to categorical, but could you please tell me how to convert a categorical variable to numeric in R? Very grateful to you. Thanks & regards, Abidi.
@marinstatlectures 7 years ago
thanks! you can use *as.numeric(variable.name)* to convert a variable to numeric in R...although, generally, it won't make sense to convert something categorical into a numeric variable, (e.g.) "sex", "country of birth", "ethnicity", etc, don't really make sense when numeric. what R will end up doing is it will make the first category into "1" and the second category into "2" and so on. there are, of course, times where this may make sense to do...like a likert scale (rate from 1-5) which is technically an ordinal-categorical variable, but could be treated as numeric at times
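As a small illustration of the behaviour described in this reply, the sketch below applies as.numeric() to a made-up ordered factor (the variable name and levels are hypothetical). Each category is replaced by the position of its level.

```r
# Hypothetical Likert-style responses stored as an ordered factor
rating <- factor(c("low", "high", "medium", "low"),
                 levels = c("low", "medium", "high"), ordered = TRUE)

# as.numeric() returns the position of each level:
# low = 1, medium = 2, high = 3
rating_num <- as.numeric(rating)
rating_num
```

The numbers only carry meaning when the level order does, which is why this is mostly reserved for ordinal variables.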
@smrazaabidi1495 7 years ago
Great ! got it. One more question is that when you are coming up with logistic regression in R tutorial or other machine learning algorithm tutorials? As I am PhD Scholar (Year-3) in Shanghai, China, basically from Pakistan and working on application of machine learning/data mining/learning analytics in education. Thanks & regards, Abidi.
@marinstatlectures 7 years ago
we would like to make more tutorials as we have time (especially covering different regression models), part of the challenge is that it takes a lot of time to create them, and difficult to squeeze in between our jobs and taking care of our child. we make very little money from these videos (other than the very occasional donation we receive), so they end up getting prioritized below our paid work (Vancouver is an unbelievably expensive city to live in :) )
@smrazaabidi1495 7 years ago
Yes, indeed, Vancouver, British Columbia is very expensive, like Shanghai, China. I really appreciate your work, your dedication and your style of imparting complex information to audiences. I warmly wish you success in your future endeavors and prosperity for your family. I will surely do my part, run a campaign and promote your mission to every student and educator in China and collect as many donations as I can. It's my dream to get the opportunity for a postdoctoral position at the University of British Columbia, where I could exhibit my expertise and collaborate with you to accomplish your mission. :) )
@gurumahadev5702 5 years ago
The cut function is superb. Thanks a lot for making this video.
@marinstatlectures 5 years ago
you're welcome :)
@joseluisbeltramone599 4 years ago
I’m learning a lot with your videos, Mike. Thanks a lot!
@SaaSGuuy 9 years ago
One of the best R tutorial videos online I have ever watched. You should do this on Coursera!!
@marinstatlectures 9 years ago
thanks Palm Phuwarat ! once we have built up our video library a bit more, we probably will do something like that.
@eatdemonsforlunch 2 years ago
This was so incredibly helpful! I appreciate you taking the time to make these!!
@azmal6158 5 years ago
Huge help for my fisheries thesis data, thanks!
@marinstatlectures 5 years ago
great to hear! you're welcome :)
@astroari 4 years ago
You have explained it nicely. Thanks
@robinmclachlan 2 years ago
Thanks! Super simple explanation.
@wasafisafi612 2 years ago
Thank you so much for wonderful videos. You made my day
@CanDoSo_org 2 years ago
Thank you so much! This is what I am looking for.
@houdahosna5573 9 years ago
wooow thank you !!! please never stop doing videos
@marinstatlectures 9 years ago
you're welcome +houda hosna ! we plan to keep making videos for a loooong time ;)
@noahrubin375 3 years ago
I like the point you made about using this to help adhere to the linearity assumption
@MrDamon76 9 years ago
I love your video. It is so helpful for understanding statistical concepts using R
@marinstatlectures 9 years ago
Thanks +MrDamon76 , we appreciate you saying that!
@nadjahorn3767 7 years ago
Thanks a lot, you saved my day!
@faritzy 4 years ago
Thank you so much! This is really useful.
@AnirudhJas 7 years ago
Very useful video! Thanks a lot!
@marcoventura9451 4 years ago
CAT function explained well! :-)
@nez01 2 years ago
this finally makes sense! omg you have no idea how annoying it was to find a simple explanation like this about the cut function and how to use it! P.S. If you don't mind, could you help me out on another thing: I'm trying to figure out why the vif function in R is not recognized? I have downloaded the correct package (Google said it would be the car package) but R still doesn't recognize it...
@marinstatlectures 2 years ago
Have you loaded the library? After installing it you have to enter library(car) to load the car package in R. You have to load the library in every new R session where you want to use the package, but you only need to do this once per session
@pankajkhajouria1810 6 years ago
If someone knows how to create magic with R, it's you my guru.
@marinstatlectures 9 years ago
Hi +Villy C , you have your settings set so that i can't reply to your comment, so i hope you see my reply here. you can not convert a categorical variable into a numeric one. for example, if you have age categories of 0-10, 10-20, 20-30, and so on, you can't get back to the numerical ages, as you only have the category they are in (i.e.) you don't know the actual age of someone in age category 20-30, for instance. you can assign the mid-points of the interval as their age, but generally speaking, this is not a good idea.
@kmahim82 5 years ago
i have a count dependent variable which i want to convert to equidistant ordered categories… so that i can perform ordered logit regression…. can u suggest r syntax for the same?
@sreegeetha7683 7 years ago
Hi, I am a beginner to R and stats. Your videos are very nice and helpful for people like me. Can I follow the same command if I have to create a categorical variable from an integer?
@akjosue9891 7 years ago
so useful thanks
@bananna445 5 years ago
Hi Marin! Thank you so much, these have been very helpful. One question: when you are typing a new command, I notice you are able to quickly scan between commands you ran previously and select one. How do you do this?!
@marinstatlectures 4 years ago
Thanks! You can use the arrow up and arrow down keys to move between previously entered commands
@virginia6622 9 years ago
Thank you for these explanations, which are very useful!!! Excuse me, I have a question: how do you do the inverse process, that is, convert a categorical variable to a numerical variable?
@TheBatanoni 5 years ago
Stats-wise, no, you cannot
@shrinivasdharma6377 6 years ago
Thanks for this lucid explanation. One question: how do we find out what intervals R has decided on if we only give the command breaks=4?
@marinstatlectures 6 years ago
Hi, you can use *hist(x)$breaks* to have R return to you the "break-points" that were used. you can also use *attributes(hist(x))* to see all of the "attributes" that can be extracted from the histogram. this command is quite useful, as it allows you to see what is stored in certain objects, and the names that can be used to extract them. (e.g.) when fitting a linear regression model: *model
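For intervals created by cut() itself, one option is to read them off the levels of the resulting factor. The sketch below uses made-up data (the vector x is hypothetical). Note that hist() picks its own "pretty" break points, so they need not match the equal-width intervals that cut(x, breaks = 4) produces.

```r
# Hypothetical data
x <- c(1, 3, 5, 7, 9, 11, 13, 15, 17)

# Let cut() choose 4 equal-width intervals
groups <- cut(x, breaks = 4)

levels(groups)  # the interval labels cut() decided on
table(groups)   # how many observations fall in each interval
```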
@raghavendras5331 6 years ago
Great video, thank you. Can you please clarify how to bin height (0-20 and 50-60) into group "A", (30-40 and 61-70) into group "B" and the rest into group "C"? I am using an ifelse statement, which is more verbose. thank you
@marinstatlectures 6 years ago
Yeah, the ifelse is more flexible. The vertical line “|” (not an i or L, but the special key that is a vertical line) represents “or” in R. So you want an ifelse statement with the first condition “or” the second condition: ifelse(condition1 | condition2, ..., ...). Worth noting is that “&” is used for “and”: (condition1 & condition2)
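A minimal sketch of this approach for the groups asked about above, using nested ifelse() calls on made-up heights (the values are hypothetical, and treating the range end points as inclusive is an assumption):

```r
# Hypothetical heights
height <- c(10, 25, 35, 55, 65, 80)

# "|" is the element-wise "or" and "&" the element-wise "and"
group <- ifelse((height >= 0 & height <= 20) | (height >= 50 & height <= 60), "A",
         ifelse((height >= 30 & height <= 40) | (height >= 61 & height <= 70), "B",
                "C"))

# Store the result as a categorical variable
group <- factor(group)
table(group)
```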
@saniasuneeth9 2 months ago
hi can we use the factor function, instead of cut here
@imrulhasan8719 9 years ago
Thanks for your video..Is there any way to perform supervised binning (Entropy based binning) in R?
@2012Rtist 8 years ago
Hi, how can I deal with letter grades (A, B+, B, C..) if I want to calculate the mean and find the correlation between grades and other variables?
@carlajorge6874 6 years ago
Hello! What if you would like to create a new variable based on the values of multiple variables within the same dataset? For example you want to create a variable named "status". The value will be based on the variables: alpha, a continuous variable beta, a continuous variable charlie, a categorical variable (0,1) status will be categorized as "A" if alpha
@28SlatesCo 8 years ago
These tutorials have helped me greatly in understanding R. I actually stumbled upon the reverse problem. I have a .csv source that R reads erroneously. The dataset has two variables, both numerical in the spreadsheet. But when I load it up in R, it recognizes one variable as numerical but treats the second variable as a factor. I haven't found a good method to transform the factor into numerical that worked for me. I should add that the spreadsheet is straight out of Google Analytics (maybe there's something wrong in the encoding). I was wondering if you stumbled upon the same problem. Thanks
@marinstatlectures 8 years ago
Hi +w x , i haven't worked with Google Analytics data before, so i can't comment on their formatting, etc...but it sounds like there may be a character in one of the values, or maybe a white space. basically, when R sees only numbers (even if they are coding for categories) it will read that in as numeric (or integer) and when it sees characters/letters it will usually read that in as a factor (or maybe a string). since R is taking this numeric variable and reading it in as a factor, it seems likely that there is a character somewhere in there, or maybe some white space after a value (for example "1234 ") with a space after the last digit, and this may be causing R to see it as a factor. if there were only numbers in there, you could use *variable
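The reply above is cut off in this copy, so the following is not necessarily what it went on to suggest. It is a commonly used sketch for turning a factor that holds numbers back into a numeric variable, shown on a made-up column (the name visits and its values are hypothetical):

```r
# Hypothetical column read in as a factor because of stray white space
visits <- factor(c("1234 ", "56", " 789"))

# as.numeric(visits) would return the level codes, not the values,
# so convert to character first, trim white space, then convert to numeric
visits_num <- as.numeric(trimws(as.character(visits)))
visits_num
```

Any entry that still contains non-numeric characters after trimming would become NA with a warning, which is one way to locate the offending values.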
@28SlatesCo 8 years ago
+MarinStatsLectures thanks a lot for the suggestions, yea there was something wrong in the encoding of the dataset from GA
@marinstatlectures 8 years ago
good to hear it's all sorted out w x !
@aesthetic_being-m 3 years ago
Very nice explanation.. I have a column of INJ_BODY_PART, it consists of data like KNEE/PATELLA, FINGER(S)/THUMB, BACK (MUSCLES/SPINE/S-CORD/TAILBONE), MULTIPLE PARTS (MORE THAN ONE MAJOR), LOWERLEG/TIBIA/FIBULA, WRIST, CHEST (RIBS/BREAST BONE/CHEST ORGNS) and so on.. kindly help me with line of code.. how can I categorise them ..
@shubhamshukla5093 5 years ago
In my RStudio, even after giving the labels, R still uses the default labels... how to solve this?
@marinstatlectures 5 years ago
hi, if you can provide more detail i may be able to help. some sample code or a reproducible example would help me understand what the exact issue is.
@alfheidurstella 5 years ago
Is it possible to make an interval that contains both the highest and lowest number from the set of values? For example, I am working with months and I need my intervals to be 12-2, 3-5, 6-9, 9-11, so one interval is from 12-2.
@davidricardoguzmanmora8198 6 years ago
How to define optimal break points?
@hengkihengki417 4 years ago
hi, i want to know how to convert numeric data (total score of the respondent, knowledge variable) into categorical (low and high)
@vinnyloid 8 years ago
Nice video just want to know what is the difference between cut and discretize? Though i find cut simple and easy to understand
@marinstatlectures 8 years ago
Hi +vinnyloid , they both accomplish the same thing, and aren't very different. the "discretize" function has a few more/different arguments, but for the most part they are 2 slightly different ways of accomplishing the same goal
@vinnyloid 8 years ago
Nice thanks
@TheCooPeer 6 years ago
How do I change Strings to Categorical Variables?
@marinstatlectures 6 years ago
you can use *variable
@aesthetic_being-m 3 years ago
Please tell me how to change timings to categorical in R.. i want to categorise time into early morning, morning, mid-day, afternoon, evening. please give me a line of code for it
@droostale2520 5 years ago
I want to create a function that uses the cut function to spit out a phrase that depends on the output of the lubridate::now(), how should I implement the cut function for this?
@droostale2520 5 years ago
I actually got it to work
@manjushapatildesai7267 3 years ago
Great , I have in my research age as 13,14,15,16 and years as 1,2,3,4 how i convert that to categorical it would help me a lot
@forambarot2317 3 years ago
My linearity assumption wasn't valid so I converted into categorical. Now what can I do next?
@TheCooPeer 5 years ago
How would you proceed if I had intervals from let's say 100-999, 1500-1799, 2000-3999 and so on. Would my breaks command look like this: breaks=c(100,999,1500,2000,4000)?
@marinstatlectures 5 years ago
that would mostly do it...the command would create categories of: 100-999, 999-1500, 1500-2000, 2000-4000. values outside the break points (100 or below, or above 4000) are set to NA
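Running the command from the question on a few made-up values shows which categories cut() actually creates. The vector x is hypothetical, and dig.lab = 4 is added only so the interval labels print without scientific notation.

```r
# Hypothetical values, including some outside the break points
x <- c(50, 100, 500, 999, 1200, 1600, 3000, 4500)

g <- cut(x, breaks = c(100, 999, 1500, 2000, 4000), dig.lab = 4)

levels(g)          # one category per pair of neighbouring break points
data.frame(x, g)   # 50 and 4500 fall outside the breaks and become NA;
                   # so does 100, since default intervals exclude the left end
```

Adding 0 and a large upper value (or -Inf and Inf) to the break points is one way to give the outside values categories of their own.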
@sfundomabaso3200 3 years ago
How do you specify a label for 0
@sfundomabaso3200 3 years ago
This is incomplete. So you have 7 categories (including 0 and 100) and 6 labels. This is causing a problem on my end since I have a label for 0 and R gives an error that the two vectors are not equal (lengths of 'breaks' and 'labels' differ)
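The length mismatch described here usually comes from counting break points instead of intervals: k break points define k - 1 intervals, so cut() expects k - 1 labels. A minimal sketch on made-up scores (the break points and label names are hypothetical), with include.lowest = TRUE so that a value equal to the lowest break is kept in the first interval:

```r
# Hypothetical scores, including the two end points 0 and 100
score <- c(0, 10, 35, 60, 100)

# 5 break points define 4 intervals, so 4 labels are needed
grade <- cut(score, breaks = c(0, 25, 50, 75, 100),
             labels = c("Q1", "Q2", "Q3", "Q4"),
             include.lowest = TRUE)  # keeps 0 inside the first interval
grade
```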
@motylanoga5705 8 years ago
Seems that R uses [a,b) (4:32) intervals, not (a,b] (2:00)
@marinstatlectures 8 years ago
R uses (a,b] by default (as shown at 2:00), and you can change to use [a,b) by setting the argument *right=FALSE* (as shown at 2:57)
@motylanoga5705 8 years ago
Thank you, I didn't notice it :) Btw I've already watched all your tutorials, so thank you for all once again!
@marinstatlectures 8 years ago
good to hear dupa blada !
@yousif_alyousifi 3 years ago
How to handle the missing values and then Change Categorical Variable to Numeric in R?
@MarcosRodriguez-xz3nt 5 years ago
I am trying to change 1 to "female" and 2 to "male" in my gender column. How do I do that?
@marinstatlectures 5 years ago
there are a few ways. before importing, you can highlight columns and use a find/replace....find "1" and replace with "female". once the data is in R, you can use the "dplyr" package and make use of the "recode" command to relabel the values of the variable.
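Another option, once the data is in R, is base R's factor() with the "levels" and "labels" arguments. This is a sketch on a made-up column and is not one of the methods named in the reply; the coding 1 = female, 2 = male is taken from the question.

```r
# Hypothetical coded gender column: 1 = female, 2 = male
gender <- c(1, 2, 2, 1)

# Map each code to a label and store the result as a factor
gender <- factor(gender, levels = c(1, 2), labels = c("female", "male"))
table(gender)
```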