How to Identify and Treat Outliers in Stata | Stata Tutorial

35,698 views

The Data Hall

In this video we explain methods of identifying multivariate outliers in Stata. The methods covered range from simple sorting of the variable to the extremes command (from SSC), box plots, histograms, spike plots and z-scores.
Download exercise files:
payhip.com/b/ihtfj
Download the code (do and data file) from following link:
thedatahall.com/how-to-identi...
We also cover two methods of treating outliers, winsorization and trimming, and show how to winsorize or trim a variable in Stata. For both purposes we use the winsor2 command, which winsorizes or trims data at specified percentiles.
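The identification and treatment methods listed in the description can be sketched roughly as follows. This is a minimal illustration, not the code from the video: it assumes Stata's bundled auto dataset, uses price purely as an example variable, and picks a 3-standard-deviation z-score rule and 1st/99th percentile cut-offs as common conventions. The extremes and winsor2 commands are community-contributed and need an internet connection to install from SSC.

```stata
* Illustrative sketch only: auto.dta and the variable price are stand-ins
sysuse auto, clear

* 1. Sorting: inspect the smallest and largest values
sort price
list make price in 1/5
list make price in -5/l

* 2. Box plot
graph box price

* 3. extremes (install once from SSC)
ssc install extremes
extremes price

* 4. Histogram
histogram price

* 5. Spike plot
spikeplot price

* 6. z-scores: flag observations more than 3 SDs from the mean
*    (the threshold of 3 is an assumed convention)
egen z_price = std(price)
list make price z_price if abs(z_price) > 3 & !missing(z_price)

* Treatment with winsor2 (install once from SSC)
ssc install winsor2
* Winsorize at the 1st and 99th percentiles; creates price_w
winsor2 price, cuts(1 99) suffix(_w)
* Trim instead: values outside the cut-offs become missing in price_tr
winsor2 price, cuts(1 99) trim suffix(_tr)
```

Writing the results to new variables with suffix() keeps the original price untouched, so the winsorized and trimmed versions can be compared side by side.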
How to Create A Histogram in Stata: thedatahall.com/how-to-create...
Introduction to Stata Interface: thedatahall.com/introduction-...
How to write Stata commands: • How to Write Stata Com...
How to unzip files in Stata: • How to Unzip Files in ...
How to add comments in Stata: • How to Add Comments in...
Best 10 Introductory Econometrics Books
thedatahall.com/best-10-intro...
Regression Models for Categorical Dependent Variables Using Stata, by J. Scott Long and Jeremy Freese
amzn.to/3NDoFtC
STATA Guide for Introductory Econometrics for Finance by Chris Brooks
amzn.to/3LzONmB
Introduction to Time Series Using Stata, by Sean Becketti
amzn.to/3AT064f
Discovering Structural Equation Modeling Using Stata, by Alan C. Acock
amzn.to/3LW8FSr
Website: thedatahall.com
Disclaimer: Some links are affiliate links that help the channel at no cost to you.

Comments: 57
@Mat-mt8pk 3 years ago
Methods of finding outliers
1:14 #1. Sorting
2:52 #2. Box Plot
6:04 #3. Extremes
10:05 #4. Histogram
10:50 #5. Spike Plot
11:42 #6. Zscore
Treatment
13:07 #1. Keep outliers
13:42 #2. Correct error
14:23 #3. Winsorization
19:06 #4. Trimming
@thedatahall 3 years ago
Thanks for the effort.
@gonout8402 2 years ago
You have explained everything that my professor taught me in 2 months in just 20 minutes, and it is much more understandable and useful. Thank you very much.
@thedatahall 2 years ago
😄
@rouniktalukdar872 2 years ago
Among the nicest video lectures that I have come across on this topic. Thanks a lot. Please keep uploading more content on Stata.
@thedatahall 2 years ago
Thanks for the appreciation.
@wilsonahinful5127 2 years ago
This is all that I have been looking for, thanks very much indeed.
@addisugetahun1441 2 years ago
Thank you for your nice and clear lecture on identifying and treating outliers.
@jemalhassen2841 3 days ago
It is a very helpful video. Thank you!
@thedatahall 3 days ago
Thanks. Keep sharing.
@alphadie2012 2 years ago
Clear and concise explanation. Thank you.
@yilebesaddisu5314 3 years ago
Thank you dear, very helpful!!
@jibrilyero2263 5 months ago
Great job 🎉
@tomaxow 2 years ago
Really well done and explained.
@thedatahall 2 years ago
Thanks
@korneliuslanggason5477 3 years ago
Thank you for the explanation.
@thedatahall 3 years ago
Thanks
@danishjunaid1659 2 years ago
Very well explained.
@thedatahall 2 years ago
Thanks
@isaacasante4060 1 year ago
Awesome video. Could you please do a similar one using panel data?
@thedatahall 1 year ago
Sure, I will make a video on that.
@lottet1945 3 years ago
Thank you for this clear explanation! Do you have a video on Cook's distance and Mahalanobis distance in Stata by any chance?
@thedatahall 3 years ago
Thanks for watching the video. Unfortunately, I currently don't have a video on this. I will see whether I can add one in the future. If you are interested in SPSS, there are videos on YouTube.
@shafiqullahyousafzai15 3 years ago
Thanks from Afghanistan
@badiahahmed2085 3 years ago
Thank you for your great video. I have a question, please: after winsorizing, can I take the logarithm of some variables? Thank you.
@thedatahall 3 years ago
Yes, you can take the log after winsorization. Be advised, though, that after taking the log the interpretation of the coefficient changes to a percentage change. I am soon going to make a video on functional forms, so if you are not familiar with interpretation after taking logs, that video will help.
@badiahahmed2085 3 years ago
@thedatahall Thank you for your response, that will be great. MANY THANKS
@AhaNYS 3 years ago
Thank you for the video! I have a question: I want to use extremes (from SSC) within subcategories. How can I apply extremes to every subcategory?
@thedatahall 3 years ago
You can try something like: bys category: extremes varname
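A sketch of the per-category idea, assuming the bundled auto dataset with foreign standing in for the hypothetical category variable. The first form is the by-prefix the reply suggests, and it relies on extremes accepting that prefix. The loop is an alternative that only needs an if qualifier.

```stata
* Illustrative sketch: foreign plays the role of "category", price of the variable
sysuse auto, clear

* Option 1: the by-prefix from the reply (assumes extremes supports by:)
bysort foreign: extremes price

* Option 2: loop over the category values and restrict with if
levelsof foreign, local(groups)
foreach g of local groups {
    display "foreign == `g'"
    extremes price if foreign == `g'
}
```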
@aibannongspung1765 2 years ago
Thank you so much for this insightful video! Suppose I want to trim the top and bottom 0.1% of the distribution. How do I write the command?
@thedatahall 2 years ago
I have never tried it with decimals, but the command would look like: winsor2 variablename, trim cut(0.1 99.9)
@thedatahall 2 years ago
Let me know if it works.
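One way to check the suggestion is to run it on a small example and count how many observations were trimmed. The sketch below assumes that winsor2's cuts() option accepts non-integer percentiles, and it uses the auto dataset and price only for illustration.

```stata
* Illustrative check: trim the bottom and top 0.1% of price
sysuse auto, clear

* Writes the trimmed values to price_tr and leaves price unchanged
winsor2 price, cuts(0.1 99.9) trim suffix(_tr)

* Observations set to missing by the trim
count if missing(price_tr) & !missing(price)
```

In a dataset this small, very few or no observations fall outside such extreme percentiles, so the count is mainly a sanity check that the command ran.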
@shrinjoy1234 3 years ago
How do we use the winsor command if we want to replace outliers with Q3 + 1.5 IQR? Can we use the winsor command to handle outliers in multiple columns in one go? Please advise.
@thedatahall 3 years ago
Replacing outliers with Q3 + 1.5 IQR is not possible using the winsor or winsor2 command; you will have to write code for it. One way is to create a variable that stores the value of Q3 + 1.5 IQR and then use it to replace values in your main variable.
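The approach described in the reply might look like the sketch below, which loops over several variables so they are handled in one go. The variable names come from the auto dataset and are only examples. The lower fence (Q1 - 1.5 IQR) is an addition not mentioned in the question and can be dropped.

```stata
* Illustrative sketch: cap values at the box-plot fences for several variables
sysuse auto, clear

foreach v of varlist price mpg weight {
    * summarize, detail returns the quartiles in r(p25) and r(p75)
    quietly summarize `v', detail
    local iqr   = r(p75) - r(p25)
    local upper = r(p75) + 1.5 * `iqr'
    local lower = r(p25) - 1.5 * `iqr'

    * Work on a copy so the original variable is preserved
    generate `v'_cap = `v'
    replace `v'_cap = `upper' if `v' > `upper' & !missing(`v')
    * Lower fence: an assumed extension beyond the original question
    replace `v'_cap = `lower' if `v' < `lower'
}
```

The !missing() condition matters on the upper side because Stata treats missing values as larger than any number.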
@atiyaabdulkarim716 3 years ago
A quick question: if we use the sort function, will it align the observations in the other variables as well? For example, if we sort by price but also have variables for age, education and ID number, will sorting by price keep age and education matched to the right ID, or will only the one variable be sorted? That could create problems, no?
@thedatahall 3 years ago
In Stata the sort command keeps track of all variables and sorts them together. The whole row moves, not just the price column.
@thedatahall 3 years ago
sort only sorts in ascending order. There is another command, gsort: with gsort -price the data are sorted in descending order.
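A short sketch of both commands on the bundled auto dataset (an assumed example) shows that whole observations move together:

```stata
* Illustrative sketch: sorting moves entire rows, not a single column
sysuse auto, clear

* Ascending order: the cheapest cars come first, with their own make and mpg
sort price
list make price mpg in 1/3

* Descending order with gsort and a minus sign
gsort -price
list make price mpg in 1/3
```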
@user-zn4rv5wb3k 6 months ago
Hi, hope you are doing great. Can you share the link to the video on multivariate outliers? I am not able to find it.
@thedatahall 6 months ago
Thanks for your kind words. Unfortunately, we haven't made a video on multivariate outliers yet. I will add it to my to-do list.
@user-zn4rv5wb3k 6 months ago
@thedatahall It would be highly appreciated.
@tranglephuong1896 1 year ago
Can you give me the dataset you used in the video?
@thedatahall 1 year ago
Unfortunately, I have misplaced the data and do-file for this specific video.
@alfinasintiya7477 2 years ago
I cannot use "extremes". Is there a solution?
@thedatahall 2 years ago
I just used extremes and it is working fine for me. What error are you getting?
@atiyaabdulkarim716 3 years ago
Can you take us through the calculator functions in Stata (syntax for exponents and complex functions)?
@thedatahall 3 years ago
Sure. Do you want me to make a video on arithmetic and similar functions in Stata?
@atiyaabdulkarim716 3 years ago
@thedatahall Thank you for getting back to me. I am a medical student and I have to use the calculate function in Stata to generate a new variable. My problem is that some components are used in exponent form: if you look at the MDRD equation used to define chronic kidney disease, or the CKD-EPI equation, you will see that serum creatinine level and age are entered in the formula. My specific question is how to do this using variables in my dataset. I tried the exponent function, but my calculations appear to be incorrect and it seems I am not following the right steps. I would highly appreciate it if you could make a video or give me some feedback.
@thedatahall 3 years ago
What command did you use? If you used the exp() function, that inverts a log (it computes e raised to the given value). If you email me the equation at info@thedatahall.com, along with some sample data or the command you used, I will look into it. If you want to raise a number to a power, e.g. square it, use gen newvariable=oldvariable^2
@thedatahall 3 years ago
I searched for the MDRD equation, but I am not sure I found the right one.
@atiyaabdulkarim716 3 years ago
@thedatahall Thank you for getting back to me. Here is the link: patient.info/doctor/estimated-glomerular-filtration-rate-gfr-calculator Normal creatinine values range between 0.6 and 1.2 mg/dl, so one can use values at the higher end, or perhaps an older age, and see what the filtration rate is.
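The difference between the ^ operator and exp() can be sketched as below. The data and the variable names (age, scr, female) are hypothetical. The eGFR line uses the commonly published four-variable MDRD form with coefficients 175, -1.154, -0.203 and 0.742 for women; it leaves out the race coefficient for simplicity and should be checked against the linked source before any real use.

```stata
* Illustrative sketch with made-up data: age in years, scr = serum creatinine in mg/dl
clear
input age scr female
60 1.0 0
70 1.2 1
end

* Raising to a power uses the ^ operator
generate scr_sq = scr^2

* exp() is e raised to its argument, so it undoes ln()
generate ln_scr = ln(scr)
generate scr_back = exp(ln_scr)

* Negative, non-integer exponents work the same way (assumed MDRD form)
generate egfr = 175 * scr^(-1.154) * age^(-0.203)
replace egfr = egfr * 0.742 if female == 1

list
```

Wrapping negative exponents in parentheses, as in scr^(-1.154), avoids any ambiguity about operator precedence.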