Linear Regression vs Decision Trees

4,245 views

Dimitri Bianco

Comments: 30
@chymoney1 2 years ago
This was great, Dimitri. You should do more technical stuff.
@philipnye5947 2 years ago
Great video! I’d love to see more content discussing model selection.
@DimitriBianco 2 years ago
I'll see if I can create a few. Videos like this one often come from questions someone asked me.
@willgriffin9793 2 years ago
Great video, hope we see more model comparisons in the future.
@fahmyboy1 2 years ago
LOVED this video! I’m all for anything related to practical modeling.
@manuelangelsuarezalvarez3355 2 years ago
Love these more practical videos. Thank you very much for sharing this! Greetings from Spain.
@Apuryo 1 year ago
I am new to this, but I have a question. Shouldn't x^2 be a parabola? Why does the plot look like x = y^2? Isn't the plot showing y = x^(1/2)?
@georgez.7278 2 years ago
Very nice and helpful video, thank you.
@daanialahmad1759 2 years ago
Great video, Dimitri.
@DimitriBianco 2 years ago
Thanks!
@andresrossi9 2 years ago
OK, I'm going to trade off some precious sleep to see this ❤️
@charlesmcdowell9436 2 years ago
Great video.
@DimitriBianco 2 years ago
Thanks!
@rosaevee274 2 years ago
I would be interested to see a comparison between trees and polynomial regression. Or between trees and lasso/ridge.
@desaint9469 2 years ago
Great video, sir!!
@Isaiah_McIntosh 2 years ago
Not on the topic of the video at all, but if I am making any methodology mistakes, any advice would be appreciated, as would advice on whether it's possible to deal with the serial correlation and instability at the end without changing the variables. I'm taking my first econometrics class, which is currently covering OLS, but my international trade lecturer wants me to examine the effects of the competitive export threat from China, Balassa-Samuelson, REER, and natural resource rents on the domestic manufacturing sector. I'm trying to determine the impact of the export threat (China) on our domestic manufacturing sector, with the controls being the productivity ratio (Balassa-Samuelson) and natural resource reliance/Dutch disease (oil and gas revenues). Below is my reasoning so far... honestly, I'm out of ideas besides changing the variables.
Given that manufacturing value added (as a percentage of GDP) depends on lags of itself and of other variables in the model, we're looking at autoregressive models: VAR, VECM, ARDL, or ECM. First, when testing all variables for stationarity using the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test, we found a mix of I(0) and I(1) data, which encouraged the use of an ARDL model. KPSS was used instead of the standard ADF or PP tests because the standard methods gave clearly incorrect stationarity results for the Balassa-Samuelson data, which the KPSS test did not. The data had previously been logged to remove an I(2) result. We then checked the lag order selection criteria; all of them recommended a lag length of 1. A long-run form and bounds test was then estimated using log(manufacturing value added) as the dependent variable; this confirmed no cointegration, as the F-statistic was less than the I(1) bound. An ARDL model was then estimated at lag length 1, using manufacturing value added as the dependent variable. Tests were then performed for serial correlation and stability.
We found serial correlation and apparent evidence of a structural break. I thought it might be possible to deal with the serial correlation by increasing the lag length to 2, but I don't have a theoretical basis for that change, so my tutor denied it as an option. I'm completely out of ideas short of returning to variable selection, but I am hoping it's just a model specification error that I can fix. The variables currently being used are: log(manufacturing value added, % of GDP) = f(log(real effective exchange rate), log(Balassa-Samuelson), log(static index of competitive threat with respect to manufacturing), log(oil and natural gas revenues)).
@DimitriBianco 2 years ago
With any model, the residuals will tell you what's wrong. Have you run ACF and PACF tests on the residuals of your current model? This will tell you whether there are serial correlation issues that have not been correctly addressed. I'm certain you'll have serial correlation remaining. Are you specifying any AR terms? If so, try using a seasonal lag, meaning just that lag. For example, use Y_{t-2} instead of lags 1 and 2 for an AR(2). Also plot the cross-correlation between the dependent and independent variables. Sometimes you need to lag your independent variables for them to have significance.
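The residual checks suggested in the reply above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the model from the thread: the helper names `autocorrelation` and `cross_correlation`, the simulated AR(1) residuals, and the two-period delay between `x` and `y` are all assumptions made for the example. The partial autocorrelation (PACF) is left out to keep it short.

```python
import math
import random


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a non-negative `lag`."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    numer = sum((series[t] - mean) * (series[t - lag] - mean)
                for t in range(lag, n))
    return numer / denom


def cross_correlation(y, x, lag):
    """Sample correlation between y[t] and x[t - lag] (x leads y by `lag`)."""
    n = len(y)
    mean_y = sum(y) / n
    mean_x = sum(x) / n
    numer = sum((y[t] - mean_y) * (x[t - lag] - mean_x) for t in range(lag, n))
    denom = math.sqrt(sum((v - mean_y) ** 2 for v in y)
                      * sum((v - mean_x) ** 2 for v in x))
    return numer / denom


random.seed(0)  # fixed seed so the illustration is repeatable

# Hypothetical residuals with leftover AR(1) structure
# (the coefficient 0.8 is an assumption for the example).
residuals = [0.0]
for _ in range(199):
    residuals.append(0.8 * residuals[-1] + random.gauss(0.0, 1.0))

# Rough 95% band for the autocorrelations of white noise.
band = 1.96 / math.sqrt(len(residuals))
for lag in range(1, 4):
    r = autocorrelation(residuals, lag)
    flag = "outside" if abs(r) > band else "inside"
    print(f"ACF lag {lag}: {r:+.3f} ({flag} the +/-{band:.3f} band)")

# Hypothetical regressor whose effect on y arrives two periods late.
x = [random.gauss(0.0, 1.0) for _ in range(200)]
y = [0.0, 0.0] + x[:-2]
for lag in range(0, 4):
    print(f"corr(y[t], x[t-{lag}]): {cross_correlation(y, x, lag):+.3f}")
```

Autocorrelations well outside the band suggest serial correlation is still left in the residuals. A cross-correlation that peaks at a positive lag suggests entering that regressor with the corresponding lag.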
@Isaiah_McIntosh 2 years ago
@DimitriBianco I managed to get a serviceable model. I needed to drop the Balassa-Samuelson exchange rate variable; this was the I(0) variable, so I was able to run a VAR after removing it. I also dropped the real effective exchange rate, which was highly correlated with oil and natural gas revenues, leading to multicollinearity issues, and which showed serial correlation on its own. I can capture the effect I wanted to investigate from those variables more readily in a separate model. I also had to replace the static index of competitive threat with the dynamic index of competitive threat. I Box-Cox transformed my remaining variables, rather than blindly logging as I was doing before, to improve the normality of the residuals. Between the transformation and the variable reselection, I managed to get a model with normal residuals according to the Jarque-Bera test, no serial correlation according to the LM/Breusch-Godfrey test, and AR roots all within the unit circle, so no stability/structural break issues. I really hope I didn't botch this and won't have to start over again. Real-life data is a headache.
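For readers unfamiliar with the two tools named in the comment above, here is a minimal sketch of the Box-Cox transform and the Jarque-Bera statistic in plain Python. The function names `box_cox` and `jarque_bera` and the skewed example series are assumptions for illustration. The thread applies the test to model residuals, whereas this sketch applies it directly to a series to keep things short.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]  # limiting case: plain log
    return [(v ** lam - 1.0) / lam for v in values]


def jarque_bera(sample):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4)."""
    n = len(sample)
    mean = sum(sample) / n
    m2 = sum((v - mean) ** 2 for v in sample) / n
    m3 = sum((v - mean) ** 3 for v in sample) / n
    m4 = sum((v - mean) ** 4 for v in sample) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


# Hypothetical right-skewed series (not data from the thread).
raw = [math.exp(0.05 * i) for i in range(1, 61)]
logged = box_cox(raw, 0)    # lambda = 0 is the plain log transform
root = box_cox(raw, 0.5)    # lambda = 0.5 is a shifted, scaled square root

for name, data in [("raw", raw), ("lambda=0", logged), ("lambda=0.5", root)]:
    print(f"{name:>10}: JB = {jarque_bera(data):.2f}")
```

Under normality the statistic is approximately chi-squared with 2 degrees of freedom, so values above roughly 5.99 reject normality at the 5% level. That approximation is asymptotic and can be rough in small samples. Trying several values of lambda, rather than fixing lambda = 0, is the difference between a Box-Cox transform and "blindly logging".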
@umanggarg970 2 years ago
Really, statistics videos will help a lot!
@mimi-kv6qu 2 years ago
Great content as always. Do you think it would be a good idea to make a long video on how to do algorithmic trading, covering the programming, the data processing, the trading logic, and the backtest? I think it would be very valuable for showing the bigger picture of the work involved in quant finance!
@DimitriBianco 2 years ago
It is a hard area for me to cover, especially because I'm not involved in it. To do it right you also need a lot of data, which others don't have.
@mimi-kv6qu 2 years ago
@DimitriBianco I thought quants understand trading so that they can build trading strategies with math models. Am I misunderstanding the quant job? Thank you.
@DimitriBianco 2 years ago
@mimi-kv6qu Quants build models for finance as a whole industry. For example, some price stocks (trading) and others price loans (banking). We also build a wide range of models to solve problems such as optimizing portfolios, detecting fraud in credit card transactions, modeling volatility, determining how often to call customers, predicting counterparty risk, or predicting GDP.
@DimitriBianco 2 years ago
For me, I just like building models to solve interesting problems.
@daanialahmad1759 2 years ago
Dimitri, if the nodes are large, decision trees can overfit.
@DimitriBianco 2 years ago
This is true.
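The overfitting point in this exchange can be illustrated with a small sketch. It assumes a hand-rolled, one-feature regression tree (the helpers `fit_tree`, `predict`, and `mse` are hypothetical, not from the video) fitted to noisy samples of y = x^2, once with a depth limit of 2 and once grown until every training point sits in its own leaf.

```python
import random


def sse(ys):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def fit_tree(points, max_depth):
    """Greedy one-feature regression tree on (x, y) pairs sorted by x.

    A leaf is a float (the mean of y); a split is (threshold, left, right).
    """
    ys = [y for _, y in points]
    leaf = sum(ys) / len(ys)
    if max_depth == 0 or len(points) < 2:
        return leaf
    best_cost, best_i = None, None
    for i in range(1, len(points)):
        if points[i - 1][0] == points[i][0]:
            continue  # cannot split between identical x values
        cost = sse(ys[:i]) + sse(ys[i:])
        if best_cost is None or cost < best_cost:
            best_cost, best_i = cost, i
    if best_i is None:
        return leaf
    threshold = (points[best_i - 1][0] + points[best_i][0]) / 2.0
    return (threshold,
            fit_tree(points[:best_i], max_depth - 1),
            fit_tree(points[best_i:], max_depth - 1))


def predict(tree, x):
    """Walk down the tree until a leaf (a float) is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x < threshold else right
    return tree


def mse(tree, points):
    """Mean squared prediction error of `tree` on (x, y) pairs."""
    return sum((predict(tree, x) - y) ** 2 for x, y in points) / len(points)


random.seed(1)  # fixed seed so the illustration is repeatable


def noisy_square(x):
    """Assumed data-generating process: y = x^2 plus unit-variance noise."""
    return x ** 2 + random.gauss(0.0, 1.0)


train = [(0.05 * i, noisy_square(0.05 * i)) for i in range(60)]
test = [(0.05 * i + 0.025, noisy_square(0.05 * i + 0.025)) for i in range(60)]

shallow = fit_tree(train, max_depth=2)        # at most 4 leaves
deep = fit_tree(train, max_depth=len(train))  # one training point per leaf

for name, tree in [("shallow", shallow), ("deep", deep)]:
    print(f"{name:>7}: train MSE = {mse(tree, train):.3f}, "
          f"test MSE = {mse(tree, test):.3f}")
```

The unrestricted tree reproduces the training data exactly, so its training error is zero by construction. Its error on fresh points is typically no better than the noise allows and often worse than a moderately sized tree, which is the usual signature of overfitting.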
@sentralorigin 2 years ago
Why are decision trees specifically mentioned here as opposed to other techniques such as neural networks, SVMs, Bayes, nearest neighbor, etc.?
@DimitriBianco 2 years ago
Because the video needs to be short and concise. There are many other statistical methods that could also have been used.
@sentralorigin 2 years ago
@DimitriBianco Ah OK, I thought there was a specific reason for the choice, like some special relationship between linear regression and decision trees.
@vfxvision723 1 year ago