💡 Here's my FREE 7-step guide to help you consistently design great software: arjancodes.com/designguide.
@deez_gainz 3 years ago
I think it's the data science, natural science and non-IT engineering people who would actually benefit the most from your software-design-centric videos. I'm one of them, and we literally code spaghetti on a daily basis without ever being taught the SOLID principles =). Thanks, and you're making those who listen better!
@althayrL 3 years ago
I'm a professional data scientist and I've been following the channel since the beginning. It has been essential in helping me become a better software engineer, even though that is not my main job requirement but rather my everyday tool...
@selimrbd 3 years ago
Same here, data scientist greatly benefitting from this channel
@TheMightyOprah 3 years ago
Agreed - working as a data scientist who is proficient in data wrangling, ML, etc., but definitely lacking in solid software development principles, more videos like these would help me a ton!
@ArjanCodes 3 years ago
Thanks! It’s definitely an area I’d like to do more videos on in the future.
@Jordan-bi4tn 3 years ago
Same, very happy to see Arjan covering this topic as it’s what I was looking for a few months ago when I first discovered his channel
@ZaneSelvans 3 years ago
Yes PLEASE do more videos like this at the intersection of data science / ETL pipelines and software engineering. It's extremely helpful for those of us who have come into building software from another adjacent field and are now struggling with big messes of our own making :)
@ArjanCodes 3 years ago
Thank you Zane, will do!
@kevon217 1 year ago
I second this request!
@tunapedia 3 years ago
I am a senior data scientist, and I benefit from all your videos. Building architecture, productionizing and scaling up ML models is challenging. It requires good software engineering practices and a good understanding of the full software development stack. Good work as usual Arjan.
@ArjanCodes 3 years ago
Thank you, glad you liked it!
@DanielTobi00 11 months ago
Hello Tunapedia, I came across your insightful comments on this video. I'm currently deepening my skills in data science and recently secured second place in an NLP competition on Zindi. I admire your expertise and would appreciate any guidance or insights you can provide on potential job opportunities in the field. Thank you.
@mhmdjouni3669 2 years ago
I'm a data scientist and machine learning researcher, and looking into code design and refactoring from your perspective is very helpful for me in terms of coding! Thanks a lot
@joaopedrorocha5693 2 years ago
This helper function to compose is a gold nugget. I think it should go into the functools module so we could simply import it. The idea is so intuitive that it wouldn't be a problem if it wasn't explicitly defined in the codebase.
@loumote 3 years ago
The "Unsatisfying cliffhanger" is me realizing I now have to go through a lot of refactoring because I've done these lazy single-variable function chains waaay too much... Great job as always, thank you Arjan!
@anelm.5127 3 years ago
I learned the most from your refactoring videos. Really enjoy them. Seeing the SOLID principles in practice especially made them super easy to understand.
@ArjanCodes 3 years ago
Great to hear, thanks!
@sai1921 3 years ago
I'm a simple man. I see Arjan post, I hit like button. As a DS student, this actually helps a bunch. Thanks brother!
@gregorybutcher2647 2 years ago
How on earth does this man not have more subscribers? I mean, most people would benefit; it's their problem if they don't watch these lmao. I'm just glad I'm one of the first to hear his wisdom.
@shopsmartin5851 3 years ago
All data science programming I’ve ever seen is usually written for a one-off experiment with very few principles applied, whether SOLID or reproducibility. The code is often not object oriented and is more functional - and written in declarative linear steps in one script. Even this code you are starting with is in better shape. I’ll be watching for sure to see these software development principles applied to that sort of programming style.
@aliwelchoo 3 years ago
As a data scientist that was already watching your content, definitely looking forward to this series!
@ArjanCodes 3 years ago
Thanks!
@alchemication 3 years ago
This is actually what I do at work - working in a Data Science team as a Software Engineer with some prior ML knowledge. I have to tell you that the code you received for refactoring here is actually what I would consider a state-of-the-art design ;-) No offence to Data Scientists, I totally understand how complex their world is!! Hopefully as the discipline matures a bit more, and sadly more projects fail due to quick & dirty solutions - we will all be in a better place. Thank you for your work.
@ArjanCodes 3 years ago
You're most welcome and I absolutely agree with you - data science is a very complex field and it makes total sense that data science education programs have to spend all their time on data science concepts, leaving little room for software engineering practices!
@sdar1988 3 years ago
I always used coding as a tool to test my hypotheses. Your videos put into perspective why and how writing code is much more than that. I am not a trained software engineer but, professionally, a data scientist. I feel your videos are really helping me fill glaring gaps in the software design process while conceiving my data projects, and this is important for the data science community as most of us are not from a software engineering background. Please make more videos in this series. Godspeed.
@ArjanCodes 3 years ago
Hi Arjun, thank you, I'll definitely continue in this direction. I think there are a lot of things to cover, so stay tuned!
@DrPizza92 3 years ago
I’m a JS guy but have learned so much from watching your videos. Thanks!
@1oglop1 3 years ago
I love this, this video and the comments save me a lot of time returning code reviews to data people over and over! Now I can just send them here to explain what is not spaghetti!
@michaelt6922 3 years ago
Thank you for your content Arjan, I have intermediate python skills but have been learning a lot from your refactoring videos. Moving to OOP for my projects has been a steep but rewarding curve. Thanks again!
@anzei331 3 years ago
Great vid, was looking forward to this for a while since you mentioned on Reddit that you had plans to get into ML/DS from software engineering perspective. Much better to refactor a project which is a real world scenario, rather than simple hypothetical examples which are abundant.
@Michallote 2 years ago
Arjan, I'm in awe of your ease at reworking things just by looking at them. And it works every time! I just recently followed all your advice in a program I'm developing and it took me a day just to get the thing running again in the new format. We are incredibly lucky to have you teaching us this stuff. Most courses will repeat the design principles over and over, but getting to see them applied so naturally really makes them stick. Thank you so much
@VikasGuptacherie 3 years ago
I really liked this novel method of "Code Refactoring" & "Code-Roast" to look at things from a software best-practices perspective and see how to correct these common mistakes. I would like to see more such videos.
@visualapproach7155 2 years ago
I love these refactoring series. So informative. Thanks, not only to Arjan, but to the people who submit their code to literally be picked apart and rebuilt.
@niklase5901 3 years ago
I am really interested in design for data science applications. I used to be a programmer, but did other stuff for a lot of years; the reason I am back in programming is data science. But I find that the practices I am used to from programming applications are lacking in the world of data science. So this is a great one!
@AbhirupMishra 3 years ago
I really loved this video. I work in Quantitative Finance, where we have to write a lot of code (usually in a scientific programming language, namely Python), and I've benefited a lot from these videos. A lot of the code that I've encountered is spaghetti code, and just starting to think about solving problems from good design principles has really helped increase the flexibility, maintainability and readability of my code. I always look forward to watching these videos! Hopefully, you'll cover more advanced topics of Python and designing systems in the future.
@ArjanCodes 3 years ago
Thanks, I'll definitely do more videos like this in the future!
@Tobbzn 3 years ago
Some feedback: While seeing your face is always a bright point of any day, I still felt that you would often cut to a fullscreen camera view of yourself while talking about the code you just cut away from, which made it a bit hard to follow the structure of the code. Like, at 3:10 you said "You can see this happening here" during a cut where we literally can't see it happening, which caused a weird disconnect in my brain where I felt like I had to switch gears with each cut, trying to take in as much information as possible before the next cut would interrupt the reading. It's an interesting video, but these cuts made it hard to follow.
@cristopherfreitas762 3 years ago
I totally agree with this.
@ArjanCodes 3 years ago
Yes, I also noticed this a bit too late. Will make sure this is better in the next videos.
@BBB-zy6er 3 years ago
@@ArjanCodes Your other videos, editing-wise, have excellent pace and I don't notice the cuts at all, making it easy to follow along. This one felt like the cat was standing on the "cut" key.
@ArjanCodes 3 years ago
Haha, I did start working with a cat (read: video editor ;) ) a few weeks ago. It’s clear we still need to fix a few things in the process, but I’m on it.
@leestoddart7014 3 years ago
Absolutely - this was really stopping me from understanding the process. Stay in the small box if you are talking about the specific code
@pawelkubik 3 years ago
It's worth pointing out that those single-variable function calls are often preferred, because network composition is rarely purely sequential. In general, it is a DAG. For experimenting it's important to be able to quickly access intermediate results of the network, and a chain of calls makes that much easier. In practice it's more important to detect repeatable and meaningful patterns in the network and split them into separate classes, e.g. a network may consist of a sequence of 12 layers, but it could be conceptually easier to view it as a sequence of 4 blocks - 3 layers each. tl;dr - don't refactor out all single-variable function calls right away
@ArjanCodes 3 years ago
Good to know, thanks!
@pawelkubik 3 years ago
In my experience, almost every new ML engineer starts the journey by solving a very simple problem like classification and implements a kind of "Trainer" object. There is a lot of inversion of control to adjust certain parts of the experiments. It seems like a stable framework, but it collapses pretty quickly when they try to do something more complicated.
@pawelkubik 3 years ago
There are a few popular frameworks that approach this a bit more maturely. I think it would be interesting to see an analysis and comparison of libraries like Keras, Ignite and PyTorch Lightning from the perspective of an experienced programmer. They all invent some kind of callback or hook mechanism to control data loading and model training.
@leif_p 3 years ago
Worth pointing out that both sklearn's Pipeline and torch's Sequential compose _classes_ satisfying certain interfaces and return _classes_ (with possibly different capabilities). Which is a bit more complicated than function composition, but usually necessary in real-world situations where the aggregate process needs more capabilities than just being Callable.
@jessehalliday2948 3 years ago
I just love watching you delete lines of code, keep up the great and informative videos
@astronemir 3 years ago
Hi Arjan, I’m an astronomer learning to code more properly, and I work exactly with code like this often. This was so unbelievably helpful. Thank you for starting this series and I’m looking forward to more like it. It’s difficult to prototype things in a Jupyter notebook, get it running, then refactor to something shareable and useable and understandable by others that may need to work with it. You’re teaching me a lot, keep it up!
@joaopedrorocha5693 2 years ago
I'm a proto-astronomer, going through the same process as you :D
@jeancerrien3016 3 years ago
Wonderful video! 🙏 Among many other things, you've shown me three nice ways to compose a sequence of functions:
1) with a torch network
2) with a scikit-learn pipeline
3) with functools.reduce
I agree the third is very attractive. Some may find it a bit strange that the order of the functions switches, but that's not a defect in my eyes.
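For readers who have not seen the third option, here is a minimal sketch of function composition with functools.reduce. It is not necessarily the exact code from the video: the ComposableFunction alias is an assumption, and add_three / mul_two are just toy functions borrowed from another comment in this thread.

```python
import functools
from typing import Callable

# Assumed alias for illustration: any function mapping a float to a float.
ComposableFunction = Callable[[float], float]


def compose(*functions: ComposableFunction) -> ComposableFunction:
    """Chain the given functions so that they are applied left to right."""
    return functools.reduce(lambda f, g: lambda x: g(f(x)), functions)


def add_three(x: float) -> float:
    return x + 3


def mul_two(x: float) -> float:
    return x * 2


# Nested form: the function that runs last is written first.
nested = mul_two(mul_two(add_three(add_three(12))))

# Composed form: functions are listed in the order in which they run.
pipeline = compose(add_three, add_three, mul_two, mul_two)
composed = pipeline(12)
```

The "switched order" mentioned above is visible here: the nested call reads inside-out, while compose lists the steps in execution order. Note that this sketch requires at least one function, since reduce has no initial value.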
@programmertheory 2 years ago
I remember dealing with MNIST data sets in college when I was learning Machine Learning. I was taking an OOP course at the same time and my first ML (Machine Learning) assignment was a single-layered neural network with 10 perceptrons. Even though I went object-oriented with the assignment, it took forever to go through the training data and testing data, 12+ hours in total in runtime. It wasn't that accurate either, like 75-80%. However, I redid the assignment, abandoning most, if not all, OOP principles and going towards something more procedural and mathematical (linear algebra to be precise). There was a huge difference in my experience. The code was easier to read, easier to understand, and a lot faster: it went through the training and testing data in less than 1 second and reached 92-96% accuracy.
@red_cape. 3 years ago
I'm a newb in Python, and being experienced in other languages it is hard to flip the switch to a new one. Arjan's videos have been crucial to my understanding of the "Pythonic" way. Thanks man! Keep em coming ... I don't know if it is your focus here but would love to see you talk about a project using PyQt5 ;)
@ArjanCodes 3 years ago
Thank you, glad you like the videos and good topic suggestion!
@drhilm 3 years ago
I wish I had seen this video two years ago. I write this kind of project all the time. I learned the hard way to do it like that.
@xxshogunflames 3 years ago
Looking forward to part two! Learned a lot and will be rewatching
@ArjanCodes 3 years ago
Thanks Jonathan, glad you liked it!
@MateuszModrzejewski 2 years ago
Fantastic video, I'm eager to watch the two next parts. From my PhD studies in AI I can tell the majority of research code in ML and AI is terribly written and barely readable, even with published works. The guidelines for clean ML code are just starting to emerge and at times I feel there's even more confusing ML config / scheduling / architecture tools released every day than confusing JS frontend tools (and there's a JS framework released almost every day lol). Good to see plain old good design being used in this context. Content like this is VERY valuable, hope to see more ML refactoring videos! All the best!
@ArjanCodes 2 years ago
Thanks and glad to hear you enjoyed the video! Let me know what you think of the other two. I'll certainly revisit more data science oriented content focused on design. Doing this miniseries was a lot of fun.
@MateuszModrzejewski 2 years ago
@@ArjanCodes so I've already watched the other two and really enjoyed them as well. Very clean, understandable and applicable approach, and I think your channel really nicely fills a gap in intermediate to advanced programming topics. I really appreciate the references to Dijkstra, Hoare, SOLID, GRASP etc. - super rare to see that on YT. I've also watched your Hydra video and I really like how it complements this miniseries - Hydra is getting lots of interest in the community these days. Another tool that's growing in popularity and could also be interesting for a future video is PyTorch Lightning - it introduces an opinionated design into PyTorch and also aims to clean up some of the clutter which can be found in 90% of AI code.
@brunosompreee 2 years ago
Thanks! I'm a Data Engineer and this helps a lot!
@ArjanCodes 2 years ago
Thanks so much Bruno, glad it was helpful!
@MCRuCr 3 years ago
You shouldn't make pure data science/machine learning content, because there is already plenty of that. A sort of "Software design for data scientists [Dummies]" could be a great contribution!
@TheMightyOprah 3 years ago
100% agree with a series on Software Design for Data Scientists!
@ArjanCodes 3 years ago
I agree - I also wouldn't feel very comfortable doing pure data science / ML stuff since that's not my main area of expertise. But I'll definitely think more about how design principles and patterns can be used in this setting!
@sergeiparshin9488 3 years ago
@@peterdowdy174 Probably Kedro could be useful to combine notebook and code itself. P.S. Kedro - open-source Python framework for creating reproducible, maintainable and modular data science code
@alchemication 3 years ago
@@peterdowdy174 Hey Peter, I have been struggling with this topic for a few years and ended up here: Notebooks are great for local/quick/dirty experiments, but not for proper, production-grade code. For many, many reasons... Once I accepted this - my life is a happier place ;) Greetings and all the best!
@alonyariv8999 3 years ago
Yes please, that is such important content to have
@ShaderKite 3 years ago
I'm loving it! Please continue doing videos like this one :D I'm learning a lot from it - your videos are one of the most valuable/useful ones I've seen for Python or software design in general
@ArjanCodes 3 years ago
Glad to hear it, thank you!
@MichaelTVickers 3 years ago
I’ve been hunting for a nice way to do function composition in standard-library Python for a while and this version with type hints is 👍
@Bakobiibizo 1 year ago
maybe not when this came out, but now is a helluva time to start doing data science material
@amir3515 3 years ago
Very stimulating and educational video. Love the pace. Thank you.
@AdeelEjaz 3 years ago
Really good video, very well explained, and I can see in comments below you have noted the jump cuts away from code. Really will make the video perfect! Thank you
@garrywreck4291 3 years ago
Great video! IMHO, a simple loop over a list of functions is much easier to read:
x = 12
for func in (add_three, add_three, mul_two, mul_two):
    x = func(x)
@coert 3 years ago
Once again, excellent stuff Arjan. Definitely going to work with the function composition!
@ArjanCodes 3 years ago
Thanks so much Coert! :)
@igordemetriusalencar5861 3 years ago
The most important thing I've learned (I'm still learning) about writing good, cleaner, and reproducible data science code is the "functional programming paradigm". R (with the tidyverse and tidymodels approach) and the Julia programming language made me code almost like I was using Bertalanffy's "General System Theory" (ins -> transformations -> outs). With this approach, I can change the ins without breaking all the code, or I can change the functions (transformations, each one with its own rule) without breaking all the code logic. Since I use Python only for NLP tasks I do not use a functional programming paradigm with it, but I know it is possible, maybe easier in Python (function composition was good to know about). The OO paradigm for Data Science that some data scientists use does not make any sense to me; of course, I am not a professional programmer, so maybe I think that way because I have no grounding in computer science. By the way, I'm learning a lot with you! Thank you very much!!!
@ArjanCodes 3 years ago
Thanks Igor, glad you like the content! Using pure functions is certainly a great starting point. What OO programming brings to the table is that it provides a nice mechanism for structuring data representations via (data)classes and collection objects such as lists, dicts, and so on. Ideally, you'd have a marriage of both that provides a clear structure of the data, and has data manipulation pipelines with very limited coupling and side effects.
@igordemetriusalencar5861 3 years ago
@@ArjanCodes Thank you! I will try to apply this approach to my NLP study codes, I know I have a lot to learn to be able to understand OO stuff, classes, dataclasses, but your videos are helping me a lot.
@benjaminthorand9569 1 year ago
PLEASE give us more from just this very content! Awesome videos, going to spread the word! : ]
@ArjanCodes 1 year ago
Thanks! Will do!
@iliqnew 3 years ago
Once more. A very useful and nice video! Thank you!
@ArjanCodes 3 years ago
Glad it was helpful!
@kobebyrant9483 2 years ago
Function composition is really cool and makes the code very concise and clean. However, I feel like we achieve it at the cost of readability, and it additionally makes it hard to debug intermediate calculations/steps if you suspect something is wrong (in reality this happens very often when there is too much math involved in the code). Some (picky) managers might not like it during code review/pull requests for the reasons stated
@greatfate 2 years ago
Exactly what I was thinking
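One possible middle ground for the debugging concern, sketched under assumptions: compose_with_trace is a hypothetical helper (it is not from the video or from any library) that applies the functions in order and also records every intermediate value, so a suspicious step can still be inspected.

```python
from typing import Callable, List, Tuple

Step = Callable[[float], float]
Trace = List[Tuple[str, float]]


def compose_with_trace(*functions: Step) -> Callable[[float], Tuple[float, Trace]]:
    """Hypothetical helper: apply functions left to right, keeping each intermediate value."""

    def run(x: float) -> Tuple[float, Trace]:
        trace: Trace = []
        for func in functions:
            x = func(x)
            # Record which step produced which value.
            trace.append((func.__name__, x))
        return x, trace

    return run


def add_three(x: float) -> float:
    return x + 3


def mul_two(x: float) -> float:
    return x * 2


result, trace = compose_with_trace(add_three, mul_two)(12)
# trace holds one (function name, value) pair per step and can be printed or logged.
```

This trades a little of the conciseness of a pure compose for visibility; whether that trade is worth it probably depends on how often the intermediate math needs checking.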
@vladimirtchuiev2218 1 year ago
This looks more like a deep-learning project than a data-science one (using Torch and Tensorboard to follow the network training, instead of something like Pandas), which is actually exactly what I need right now. I work a lot with PyTorch and PyTorch Lightning and I'm looking to improve my code. The issue that I have with torch.nn.Sequential is that it's annoying to debug when you have an error in your network-building lego, but if you're sure that the lego is correct, it is cleaner to use Sequential.
@Astana1337 3 years ago
I like to use multiple inheritance for string Enum classes. For example:
class MyEnum(str, Enum):
    RED = 'RED'
    BLUE = 'BLUE'
    GREEN = 'GREEN'
*Make sure the str comes first. Then you can use the class like normal, MyEnum.RED, and you can also use a string literal. It avoids the need to use the 'name' attribute. Lastly, you also get equality if you are comparing the enum to a string literal.
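A runnable sketch of the str-mixin idea described above, using the commenter's MyEnum. One caveat worth hedging: as far as I know, how such a member renders inside an f-string changed in Python 3.11, so the sketch relies on comparison and .value rather than on formatting.

```python
from enum import Enum


class MyEnum(str, Enum):
    # str must come before Enum in the base list.
    RED = "RED"
    BLUE = "BLUE"
    GREEN = "GREEN"


# Members compare equal to plain string literals.
is_red = MyEnum.RED == "RED"

# A string can be turned back into the corresponding member.
blue = MyEnum("BLUE")

# Members are real str instances, so str methods work on the value.
lowered = MyEnum.GREEN.lower()

# .value gives the plain string regardless of Python version.
plain = MyEnum.RED.value
```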
@DistortedV12 3 years ago
Okay this video is gonna blow up imo
@justfoundit 3 years ago
Using Sequential is one way, and it works nicely when the model has a linear flow. However, if you want to build a model with, for example, 2 outputs that sit on different levels of the model, you need to use the non-sequential way, and then the X for every intermediate stage starts to make sense :)
@ArjanCodes 3 years ago
In this case I would prefer to have a class for defining a directed acyclic graph. Perhaps PyTorch also has this... I didn't check.
@kevon217 1 year ago
Really cool compose function. Going to use that.
@matthewtaruno 3 years ago
One point to consider from a data scientist: a lot of the time we like quick and dirty iterations on our exploratory and predictive insights. Many times (especially under time constraints) quick and dirty is better than slow and beautiful. That's why I personally love notebooks. As long as it is idempotent (notebook runs from start to end without issues) and the environment is containerized, it is reproducible. But I see the merit of both. There is a lot of power in writing scalable and reusable code in this space to organize complex pipelines that supercharge society's solutions. This is why, over time, I have learned to use a hybrid of both - but maybe not in the most optimal or well-principled way. Which leads to my suggestion! Would you be able to make a video on how you would use Jupyter Notebooks/Kaggle Kernel Notebooks/Google Colab Notebooks in tandem with an internal packaged-up repository, as you have it in the video, for DS projects? Maybe this means just maintaining your current directory structure as shown in this video but adding a "notebooks" folder to the root folder where all that type of analysis is done, since we can call your modules from that notebooks folder (not sure how this would be manifested, you probably have a better idea). You use .py scripts for most things, and you can install these scripts as modules for use in other scripts or even notebooks; that is what I have been doing to keep my notebooks cleaner. But I am sure your perspective on how to have fast iteration times to high-value insights, maintain a scalable pipeline, yet keep everything reusable in doing this kind of work - even maybe some sort of generalized approach shown through a video example - would be invaluable. I think this would be a game changer for myself and a lot of people in DS and ML.
As for this video, your other content has been useful, but seeing it directly applied to the type of work I do on a regular basis brings your concepts to life for me. Please keep these software design principles applied to DS crossover content coming! Thank you for what you do :)
@ArjanCodes 3 years ago
Thanks and great suggestion regarding the combination of notebooks with running python scripts in a repository. I'll look into it!
@marwensallem1397 3 years ago
Nice video 😊 Hope it reaches all my data scientist colleagues. There are many similarities between machine learning projects, which makes me wonder why there are no custom design patterns for ML projects.
@ArjanCodes 3 years ago
Thanks! I'll try to come up with a few ideas for this and cover that in future videos.
@sergioquijanorey7426 3 years ago
Really nice video. When working with ML / DS problems, I always end up using ugly designs / hacks that get the job done. And then refactoring is such a pain. Thank you for this advice :D
@ArjanCodes 3 years ago
Thank you Sergio, glad you liked it!
@BjarneThorsted 3 years ago
Next time, you should definitely do a tensorflow/keras project. Would love to see how you would go about cleaning up the code in a project like that. full disclosure: I've written a very convoluted DL project with tf.keras and I'm 100% positive it can be written better
@ArjanCodes 3 years ago
Great suggestion! Feel free to submit your code as a Code Roast, and I'd be happy to take a look if it's something I can cover on the channel.
@BjarneThorsted 3 years ago
@@ArjanCodes I will try and see if I can package it up in a meaningful way. Right now it is split across two private github repos and trains on a rather large and proprietary image dataset
@nicolabombace2004 3 years ago
As always a great video! The only suggestion I would add is maybe to turn off Intellisense for the video, because all the red squiggly lines are a bit overwhelming and actually useless because the code works!
@ArjanCodes 3 years ago
Thanks for the tip! I might do that for future refactorings (at least in the beginning :) ).
@SupernovaGiacomo 3 years ago
Wow thanks Senpai! Will definitely share on my linkedin and with my data engineering team
@ArjanCodes 3 years ago
Thank you, happy you like it!
@TheGagman2000 3 years ago
Reiterating the others, very useful video for data scientists! I liked the idea of replacing the nested call with the compose function, but what about an "apply" function instead?
def apply_composition(x, *functions):
    for func in functions:
        x = func(x)
    return x
For me, this seems easier to read than the functools solution... and it's similar to the idea of a torch.nn.ModuleList container in PyTorch
@jessicameneguel4954 3 years ago
This way you are replacing x as f(x) in the same fashion as the original implementation.
@ingovb6155 2 years ago
Thanks for making this (and similar) videos. They are very helpful and insightful
@ArjanCodes 2 years ago
Thank you Ingo, glad you liked the video!
@canvasbagfight 3 years ago
I’ve written a lot of spaghetti code to process scientific data. It’s usually so bad that it just stays as a notebook that’s copied over and laboriously edited for each new time I repurpose it. Really think this is useful content. More please.
@iliqnew 3 years ago
Yes please! More of these
@sombrero7935 3 years ago
The one issue I have with this design is that it is based solely on PyTorch, so if you want to move to another framework such as TensorFlow, this will require quite a bit of refactoring (without taking into account the new framework's own code), thus most likely introducing breaking changes for consumers of the project
@ArjanCodes 3 years ago
In general, this is a really hard problem to solve. Especially since most frameworks like Pytorch, TensorFlow, etc. ask you to "marry" the framework and use their data types all over the place, which then makes it hard to replace the framework with something else. I'll look into this and try to come up with some ideas to do a video about this.
@supratikchowdhury2107 3 years ago
Yes to more Data Science!
@mhFFFFFF 2 years ago
Maybe already answered, but does Pandas have function composition (aka network or sequential)? IMO this is a huge benefit of using the R tidyverse, the %>% command is called a “pipe” but it seems to work exactly like function composition and is extremely well-supported and flexible.
@doublegdog 3 years ago
Great video. What do you think of folder refactoring? In some repos, I have seen people putting files/classes in a separate folder called "commons" for utility files that are used agnostically across the project. I think this would be a great idea to touch on in a future video. Nonetheless, the best python videos on youtube hands down! Keep up the great content!
@TimGrob 3 years ago
Overriding the 'forward' function in the Torch model and updating the state (tensor) of the neural network at each step is actually the way PyTorch recommends doing it.
@jimogren6306 3 years ago
Great video! One thing that I did not quite understand: when you changed the ExperimentTracker from an abstract base class into a protocol then the TensorboardExperiment no longer inherits from ExperimentTracker. I do not see the connection between the two classes anymore. After the refactor, to me ExperimentTracker seems like an unused class. Or am I missing something?
@ArjanCodes 3 years ago
After changing the ExperimentTracker to a Protocol class, the inheritance relationship between it and TensorboardExperiment is indeed gone. However, ExperimentTracker is used in the Runner class where it defines the interface that is expected for connecting the Runner with the experiment tracker. The result is that you can now create other experiment tracking classes that integrate seamlessly with the Runner class, as long as they implement the methods defined in ExperimentTracker.
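The structural relationship described in this reply can be sketched as follows. ExperimentTracker and Runner are names from the discussion, but the log_metric method and the InMemoryTracker class are assumptions made up for illustration, not the project's actual interface.

```python
from typing import List, Protocol, Tuple


class ExperimentTracker(Protocol):
    """Interface the runner expects; the method name here is hypothetical."""

    def log_metric(self, name: str, value: float, step: int) -> None:
        ...


class InMemoryTracker:
    """Hypothetical tracker: no inheritance, but it matches the protocol structurally."""

    def __init__(self) -> None:
        self.records: List[Tuple[str, float, int]] = []

    def log_metric(self, name: str, value: float, step: int) -> None:
        self.records.append((name, value, step))


class Runner:
    """Depends only on the protocol, not on any concrete tracker class."""

    def __init__(self, tracker: ExperimentTracker) -> None:
        self.tracker = tracker

    def run(self, losses: List[float]) -> None:
        for step, loss in enumerate(losses):
            self.tracker.log_metric("loss", loss, step)


tracker = InMemoryTracker()
Runner(tracker).run([0.9, 0.5])
```

A static type checker would accept InMemoryTracker wherever ExperimentTracker is expected because the method signatures line up. A class may still list the protocol as an explicit base if making the intent visible to readers matters more than loose coupling.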
@gustavojuantorena 2 years ago
Awesome! I think there are few tutorials about software design topics for data science.
@kazmkazm9676 2 years ago
Thanks for your great content. However, I didn't find your custom composition function useful; PyTorch's Sequential or scikit-learn's Pipeline seem more appropriate.
@esteenbrink 2 years ago
At 14:25 you decide to remove the protocol inheritance, making it implicit. There is no difference to the working of the code, though it does make life harder for anyone needing to change and understand this class, for it is not clear anymore that it should adhere to the protocol.
@songokussj4cz 3 years ago
Hi Arjan. Love your stuff. Would you be able to create a comprehensive video about "How to structure a bigger project"? I've got a task to create a PySide2 application with at least 3 windows (Main, Settings, Results) and I'm not sure how to structure it so it's not all inside one file, because that's just too chaotic. Which signals should connect to which functions, and where should those be written? Should each window's code be an individual file? How do I connect everything, and how do I pass variables from one window to another?
@RichardVodden1 3 years ago
Would you ever consider overriding `__str__` on an Enum to return `self.name`? That would avoid having to add `stage.name` in all those f-strings. Feels neat to me from a code repetition perspective, but it does violate the "Explicit is better than Implicit" guidance of the Zen of Python. I'd be really interested in your opinion.
@ArjanCodes 3 years ago
Great suggestion, and I think it works really well in this particular case.
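A minimal sketch of the `__str__` suggestion. The Stage enum and its members are assumptions for illustration (the thread only mentions a `stage` variable), so treat this as one possible shape rather than the project's code.

```python
from enum import Enum, auto


class Stage(Enum):
    # Hypothetical members; the real project may define different ones.
    TRAIN = auto()
    TEST = auto()
    VAL = auto()

    def __str__(self) -> str:
        # Render only the member name, so f-strings no longer need `.name`.
        return self.name


stage = Stage.TRAIN
message = f"{stage} loss"
explicit = f"{stage.name} loss"
```

Both strings come out the same here; repr(stage) is unaffected, so debugging output still shows the full member.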
@_shikh4r_ 3 years ago
I'm taking notes 📝
@felipealvarez19822 жыл бұрын
I would love to know about the vscode keyboard shortcuts you love the most
@hudabdulwahab2499 Жыл бұрын
this video is amazing - can we please get another data science / ml pipeline refactor?
@ilyaster423 жыл бұрын
That's a great video! Thanks a lot!
@tehdusto2 жыл бұрын
27:07 yo dog I heard you like lambda functions, so I put a lambda function in your lambda function so you can function while you function. ...but really this function composition business is actually breaking my mind. I'll need to practice this one.
@esteenbrink2 жыл бұрын
Sponsored by 'basically'. Just kidding, great content. Keep it up.
@EW-mb1ih2 жыл бұрын
Except for using Protocol instead of ABC, your video is nice :) Protocol makes things less clear. Silly question: why do we need to avoid storing intermediate results in the same variable?
@jakobullmann7586 Жыл бұрын
It’s an interesting video, but I think it’s actually misguided advice for Data Science/ML projects. Data Science projects have different dynamics from software engineering projects, hence the need for MLOps platforms. Tracking is needed in the experimentation stage, when things change quickly, and writing abstractions to become independent of a particular experiment tracking platform is not creating value for anyone. What’s actually important is that the experimentation code is decoupled from the model code (which is why TensorFlow and LightGBM use callbacks… PyTorch doesn’t, but PyTorch Lightning does, which is why I would always use PyTorch Lightning and not raw PyTorch). Moreover, where I feel abstractions are really powerful is for the model itself, because in order to do model selection I may have to apply a fair evaluation to models that utilize different frameworks (e.g. PyTorch vs LightGBM) or even different problem framings. The first point is what MLflow Models tries to accomplish.
@tonyli70143 жыл бұрын
Great topic!
3 жыл бұрын
I loved this video. It was the best moment to apply the SOLID design principles to data science, because I work with it on a daily basis. Could you apply SOLID principles to the pandas library, since it is the most used library for data processing? Again, thank you very much!!
@atillakoseoglu40892 жыл бұрын
Dear Arjan, I am a rookie with 3 months of Python (learned classes, functions, basics, etc.) and I'm interested in data work, not development 🙀 Do you think that is a problem? I mean for finding a job and career-wise. Thanks for your kind answers and advice 🙏
@Glitchiz573 жыл бұрын
Great video Thanks ! See you next week
@ArjanCodes3 жыл бұрын
Thanks, glad you liked it!
@rshelansky3 жыл бұрын
Thanks for these videos, they have been fun to watch. I see the benefit of function composition. However, in practice (data science), when composing functions I have never not had a whole slew of unique parameters and contexts to pass to each function along the chain. Is there an equally elegant solution to this problem?
@ArjanCodes3 жыл бұрын
Hi Robert, good question. I like using either closures or partial functions (from functools) for this. With closures, you define a function (taking the parameters, contexts, etc.) that returns another function, and that returned function is what you pass to the composition. In terms of the example at the end of this video, you could do the following, where n is an extra parameter and add_n is a function that returns a closure:
def add_n(n: int):
    def add(x: int):
        return x + n
    return add
...
compose(add_n(5), add_n(12), multiplyByTwo, ...)
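A runnable sketch of both options mentioned in this reply, closures and `functools.partial`, might look like this. The `compose` helper is one common reduce-based way to write it and is an assumption here, as are the `multiply` function and its `factor` parameter.

```python
from functools import partial, reduce
from typing import Callable

ComposableFunction = Callable[[float], float]


def compose(*functions: ComposableFunction) -> ComposableFunction:
    """Chain functions left to right: compose(f, g)(x) == g(f(x))."""
    return reduce(lambda f, g: lambda x: g(f(x)), functions)


def add_n(n: float) -> ComposableFunction:
    """Closure: captures n and returns a one-argument function."""

    def add(x: float) -> float:
        return x + n

    return add


def multiply(x: float, factor: float) -> float:
    """Two-argument function; partial() fixes `factor` before composing."""
    return x * factor


pipeline = compose(add_n(5), add_n(12), partial(multiply, factor=2))
result = pipeline(3)  # (3 + 5 + 12) * 2
print(result)
```

Either way, every function handed to `compose` ends up taking a single argument, which is what keeps the chain uniform.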
@mathmo2 жыл бұрын
@@ArjanCodes Robert, not sure whether @ArjanCodes would approve of this, but you could define a callable ABC base class for your functions with a __rmul__ (or something like that) method that implements function composition on top of the __call__ methods, and initialize the instances with whatever parameters you want that are not part of the functional input data. And if you make the __call__ method accept and return a dict, you can also compose functions of different arities.
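A rough sketch of the callable-base-class idea from this comment follows. It uses `__mul__` rather than `__rmul__`, and the class names (`Step`, `AddN`, `Scale`) and the dict keys are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict

Data = Dict[str, Any]


class Step(ABC):
    """Callable base class; instances compose with the * operator."""

    @abstractmethod
    def __call__(self, data: Data) -> Data:
        ...

    def __mul__(self, other: "Step") -> "Step":
        # (f * g)(x) == f(g(x)), mirroring mathematical composition.
        return _Composed(first=other, second=self)


class _Composed(Step):
    def __init__(self, first: Step, second: Step) -> None:
        self.first = first
        self.second = second

    def __call__(self, data: Data) -> Data:
        return self.second(self.first(data))


class AddN(Step):
    """Parameters that are not input data are set at construction time."""

    def __init__(self, key: str, n: float) -> None:
        self.key = key
        self.n = n

    def __call__(self, data: Data) -> Data:
        return {**data, self.key: data[self.key] + self.n}


class Scale(Step):
    def __init__(self, key: str, factor: float) -> None:
        self.key = key
        self.factor = factor

    def __call__(self, data: Data) -> Data:
        return {**data, self.key: data[self.key] * self.factor}


pipeline = Scale("x", 2) * AddN("x", 5)  # AddN runs first, then Scale
output = pipeline({"x": 3, "label": "demo"})
print(output)
```

Because every step takes and returns a dict, steps that need different inputs can still be chained; each one reads only the keys it cares about.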
@zeki75402 жыл бұрын
Thanks Arjan!!
@ArjanCodes2 жыл бұрын
You're welcome Zeki, glad you liked the video!
@mnsosa3 жыл бұрын
Where can I learn how to design professional machine learning projects? All I've found is Jupyter notebooks, but I want to work more professionally.
@gercius2 жыл бұрын
You are the Bob Ross of coding
@ArjanCodes2 жыл бұрын
Thanks Gercius, happy you’re enjoying the content!
@some848843 жыл бұрын
Debugging function composition is painful. It's much better to have variables with unique names between calls.
@vlplbl853 жыл бұрын
Great stuff
@ArjanCodes3 жыл бұрын
Thank you Vladimir!
@smalltimer6663 жыл бұрын
Hi Arjan, I write a lot of models and I wanted to ask if you have tips regarding what I imagine is a very simple issue. Version hell. I write code on multiple machines, using multiple styles: jupyter notebooks, org buffers, and of course scripts. Everything is almost always contained in a pipenv environment. But when I try to pipenv install on different machines I keep getting all sorts of version-related errors. I think I am missing some key insight here. There is no way python has such a sloppy design :D Any tips will be really appreciated!
@cajmrn13 жыл бұрын
DVC, MLflow, and/or Kedro will change your life. They changed mine :).
@christiencodes30863 жыл бұрын
Do you have Kite installed for autocomplete ?
@davidoh63422 жыл бұрын
How do you handle errors if one of the composed functions raises an error?
@_veikkomies3 жыл бұрын
How can Tensorboard do anything using the experiment tracker class since you removed the inheritance and I can't see how the two classes are linked any more. What's the point of the experiment tracker class now?
@ArjanCodes3 жыл бұрын
That’s the whole idea of protocols. The relationship no longer exists between superclasses and subclasses, but you use protocols to define the interface at the place where it’s needed and Python’s structural typing system then does the type checks. So in this example, the goal of the experiment tracker protocol class is not to act as a superclass, but to act as an interface of the part of the code that uses it, here that’s the main file and the Runner class.
@_veikkomies3 жыл бұрын
@@ArjanCodes Ahh thank you
@ravenecho24103 жыл бұрын
okay catching up on vids 😋
@carlosg15353 жыл бұрын
9:10 Why do you think abstract base classes should only have abstract methods and not attributes?
@ArjanCodes3 жыл бұрын
Overall, I find this gives more flexibility and offers a better separation of responsibilities. In this case, there are several responsibilities of the original abstract class: defining what the interface is between the experiment tracking and the rest of the code, keeping track of the experiment stage, and providing helper methods. I prefer to keep the single responsibility of the abstract class to define the interface and then use either inheritance or composition to provide the other features you need. For example here, I moved the set_stage implementation to the Tensorboard experiment tracker. Alternatively, if you want to be able to reuse the basic implementation of handling the experiment stage, you could create a subclass "BasicExperimentTracker" that provides that implementation, and then your more specific experiment trackers could inherit from that class.
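The layering described in this reply could be sketched as follows: an abstract class that only defines the interface, a `BasicExperimentTracker` that adds reusable stage handling, and a concrete tracker on top. The `Stage` members, the `add_epoch_metric` method, and `ListExperimentTracker` are assumptions for illustration.

```python
from abc import ABC, abstractmethod
from enum import Enum, auto
from typing import List, Tuple


class Stage(Enum):
    TRAIN = auto()
    TEST = auto()
    VAL = auto()


class ExperimentTracker(ABC):
    """Single responsibility: define the interface, nothing else."""

    @abstractmethod
    def set_stage(self, stage: Stage) -> None:
        ...

    @abstractmethod
    def add_epoch_metric(self, name: str, value: float, step: int) -> None:
        ...


class BasicExperimentTracker(ExperimentTracker):
    """Reusable stage handling; still abstract (no metric logging yet)."""

    def __init__(self) -> None:
        self.stage = Stage.TRAIN

    def set_stage(self, stage: Stage) -> None:
        self.stage = stage


class ListExperimentTracker(BasicExperimentTracker):
    """Hypothetical concrete tracker that stores metrics in a list."""

    def __init__(self) -> None:
        super().__init__()
        self.records: List[Tuple[str, str, float, int]] = []

    def add_epoch_metric(self, name: str, value: float, step: int) -> None:
        self.records.append((self.stage.name, name, value, step))


tracker = ListExperimentTracker()
tracker.set_stage(Stage.VAL)
tracker.add_epoch_metric("accuracy", 0.9, 1)
print(tracker.records)
```

`BasicExperimentTracker` cannot be instantiated on its own because `add_epoch_metric` is still abstract, so it serves purely as shared implementation for more specific trackers.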
@Booyah Жыл бұрын
Why do you switch from showing the code you're discussing, to showing yourself full screen and removing the code from view?
@aj35lightning3 жыл бұрын
The fullscreen camera cuts to you were kinda distracting because I want to see the code you're describing. Right now it's harder because I have to listen to you, remember what you said until the code comes back up, and replay it in my head while you continue explaining things. Hope this isn't too critical. Overall your content has helped me lots.
@ArjanCodes3 жыл бұрын
Yes, we went a bit overboard with the cuts on this one. In the future I'll pay extra attention that this doesn't happen.
@aj35lightning3 жыл бұрын
@@ArjanCodes I'm a fan of how the Kevin Powell channel does it. He sits in a masked circle in the corner and then when he wants to emphasize what he's saying the circle gets bigger to bring the focus to him. I just noticed it in his latest css Subgrid video