Hello. Thank you for such amazing lectures. Could you clear up my doubt about why we didn't standardize this dataset using StandardScaler?
@SebastianRaschka 6 months ago
That's a good question. That's because decision trees always split on one feature at a time and don't consider feature combinations at a given split. Each split is just a threshold on a single feature, so rescaling that feature moves the threshold but not the resulting partition of the examples. So, in that case the decision tree splitting does not require feature scaling. It's been some time since I recorded these, but I may have explained it in one of the earlier videos.
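A minimal sketch of the point above, assuming scikit-learn and its bundled Iris data (neither the dataset nor the max_depth and random_state values come from the lecture). It fits the same tree on raw and on standardized features and checks that both trees split on the same features and make the same predictions:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Illustrative dataset; any numeric feature matrix would do.
X, y = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Same hyperparameters and seed, so feature scaling is the only difference.
tree_raw = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X, y)
tree_scaled = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X_scaled, y)

# Feature index used at every node (leaves carry a negative placeholder).
same_features = np.array_equal(tree_raw.tree_.feature, tree_scaled.tree_.feature)
# Predictions on the data each tree was fitted on.
same_predictions = np.array_equal(tree_raw.predict(X), tree_scaled.predict(X_scaled))

print("same split features:", same_features)
print("same predictions:", same_predictions)
```

The thresholds stored in tree_.threshold do differ between the two trees, since they are expressed in the units of the (scaled or unscaled) feature; only the partition of the examples stays the same.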
@jagritsinghal5939 5 months ago
Thank you for clearing up my doubt. I think I need to rewatch the scaling video.
@FluteStream 3 years ago
Hey, thanks for the lectures. Could you please also post the homework and the solutions?
@SebastianRaschka 3 years ago
Sorry, I don't want to post them online because I may use them in some form in future semesters.
@FluteStream 3 years ago
@@SebastianRaschka No probs. Can you recommend a book that goes through the implementation of the mentioned ML algorithms in Python from scratch?
@SebastianRaschka 3 years ago
@@FluteStream Sorry, I am not aware of such a book :(
@hikmetsezen155 3 years ago
Hi Sebastian, with export_graphviz and plot_tree, can we only make a tree graph of the fitted (training) dataset, or is there a way to make a tree graph of a predicted dataset from the trained model? Thanks!
@SebastianRaschka 3 years ago
Hi Hikmet. As far as I know, that's not possible with the provided utility functions. Those viz. functions take the fitted tree and show the number of training examples at each node. For an independent test set / prediction set, the structure of the tree would still be the same, but my guess is that you are interested in showing where (i.e., in which leaf nodes) these predicted instances end up. As far as I know, it's not possible to do that easily with the given API.
@hikmetsezen155 3 years ago
@@SebastianRaschka Thank you for the answer. It is interesting that this is not possible, at least not with an easy trick. I believe it is important to have a full visualization of the trained model's performance on unseen test data, to see which split(s) contribute to possible errors, along with information such as the Gini and entropy values at the nodes.
@SebastianRaschka 3 years ago
@@hikmetsezen155 The underlying tree can be accessed via the .tree_ attribute. I think you could probably override some values there (e.g., check help(tree.tree_), where tree is your fitted tree) and replot it. I think it can be a bit tedious, though, and would require some tinkering to get the test set examples into that tree. Maybe someone has already done it, and maybe there is a solution that can be found somewhere on GitHub or elsewhere on the internet.
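A possible starting point for that tinkering, sketched under assumptions (Iris data, a 70/30 split, and max_depth=3 are illustrative choices, not from the lecture). It does not replot the tree; it only tabulates, per leaf, how many test examples land there and how many are misclassified, using the estimator's apply method, which returns the leaf index for each example, together with the per-node impurities stored in .tree_:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

clf = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X_train, y_train)

# Index of the leaf node that each test example ends up in.
leaf_ids = clf.apply(X_test)
# Boolean mask of misclassified test examples.
wrong = clf.predict(X_test) != y_test

for leaf in np.unique(leaf_ids):
    in_leaf = leaf_ids == leaf
    print(
        f"leaf {leaf}: {in_leaf.sum()} test examples, "
        f"{wrong[in_leaf].sum()} misclassified, "
        f"training impurity {clf.tree_.impurity[leaf]:.3f}"
    )
```

The node indices printed here match the ones shown by plot_tree(clf, node_ids=True), so the table can be read side by side with the plotted training tree. clf.decision_path(X_test) gives the full root-to-leaf path if the internal splits are of interest as well.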
@Moviesslider 3 years ago
Hello Sebastian, could you please let me know how we can find the right algorithm to use for a given problem? If possible, please make a video on this. Waiting for your response.
@SebastianRaschka 3 years ago
It really depends on the situation and trade-offs. (1) Some algorithms perform better on certain datasets than others, (2) some models are faster than others, and (3) some are more interpretable than others. For (1), I covered some algorithm and model evaluation techniques in later videos and in arxiv.org/abs/1811.12808. For (2), it depends on the algorithm but also on the library and implementation. For a discussion of this topic and GPU acceleration, we recently wrote an article here: www.mdpi.com/2078-2489/11/4/193. For (3), I recommend Christoph Molnar's Interpretable ML book: christophm.github.io/interpretable-ml-book/.
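As a rough illustration of trade-offs (1) and (2), here is a minimal sketch that compares a few candidate classifiers by cross-validated accuracy and fit time. The dataset, the three candidate models, and the 5-fold setting are assumptions chosen for the example; a proper comparison would follow the evaluation techniques in the references above:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hypothetical shortlist of candidate models.
candidates = {
    "decision tree": DecisionTreeClassifier(max_depth=3, random_state=1),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "logistic regression": LogisticRegression(max_iter=1000),
}

results = {}
for name, model in candidates.items():
    # 5-fold cross-validation; returns per-fold test scores and fit times.
    cv = cross_validate(model, X, y, cv=5)
    results[name] = (cv["test_score"].mean(), cv["fit_time"].mean())
    print(f"{name}: accuracy {results[name][0]:.3f}, fit time {results[name][1]:.4f} s")
```

On a dataset this small the timings are mostly noise; they only become informative on larger problems.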