MLOps Tutorial #1: Intro to Continuous Integration for ML

  70,103 views

DVCorg

Comments: 109
@dvcorg8370 2 years ago
Please note we have deprecated the dvcorg/cml-py3 container image. You can get the same results with:
- container: docker://dvcorg/cml-py3:latest
+ steps:
+   - uses: actions/checkout@v3
+   - uses: iterative/setup-tools@v1
@nagarjunavarikoti8160 4 years ago
You made a complex topic sound very simple with your easy walkthrough steps! Please keep up the good work.
@dvcorg8370 4 years ago
really appreciate it, Nagarjuna! Always feel free to let us know if there's a topic you'd like to see :)
@sakshamgulati1578 3 years ago
@dvcorg8370 could you please make a video on how to make unit tests for models in MLOps?
@malcolmdecuire7529 3 years ago
Starting in ML from a non-CS background was already hard enough, but Elle came thru and just made me smile and feel better about this complex subject. I'm rewatching this entire series again. After looking at udemy, coursera, and even a few other websites there isn't someone talking about how to go from making ML projects on ur laptop to production environment. Honestly, I'm grateful for the inspiration and I'm more committed to this self-learning route.
@BudiArsana 1 year ago
That diff report in pull request is awesome, thank you for sharing. I will try to use this technique in the future.
@phanikirans4728 3 years ago
I doff my hat to you Elle...for a very crisp,easy to understand and uncluttered explanation of MLOps...
@sayakpaul3152 4 years ago
Excellent walkthrough! Would be cool to incorporate experiment tracking tools like Weights and Biases to automatically report metrics. But for starters, this is really a job well done!
@091carl 3 years ago
Wow, incredible clarity in your presentations. Thanks for all the great work, Elle!
@DaredevilGotU 4 years ago
This is so cool. I Loved it. We can use this for writing test cases in PRs. Thank you.
@MLOps 4 years ago
Soo cool to see this Elle! thank you for sharing and teaching us a thing or two in the community!
@t.ganesh1692 4 years ago
Thank you for the excellent tutorial Elle and @DVCorg!
@itsravimalhotra3 3 years ago
Wow. This was soo good. She made it so easy to understand.
@יהונתןאיזנשטיין 2 years ago
Great tutorial. Thank you!
@dvcorg8370 1 year ago
Glad it was helpful!
@mayurlohana 3 years ago
You are defining things in rightful manner and things are understood easily. AMAZING 🤩
@dvcorg8370 3 years ago
Thanks so much, Mayur! The kind words are really appreciated :)
@AleksandrBlekh 4 years ago
Excellent tutorial. Keep it up!
@dvcorg8370 4 years ago
Thanks Aleksandr! Much appreciated :)
@AleksandrBlekh 4 years ago
@dvcorg8370 It's my pleasure! :-)
@stopznak86 8 months ago
Great stuff, I'm learning
@johannesallgaier5722 3 years ago
Great video! Such precise and clear explanations! Thank you for sharing.
@Kommalapatin 3 years ago
pretty to explain the topics about the MLOps..keep it up.good work elle.
@bhagwatchate7511 3 years ago
Great explanation
@dvcorg8370 3 years ago
Glad you think so!
@danielbaena4691 3 years ago
Thank you so much for this video and all your work, it is just amazing!
@dvcorg8370 2 years ago
You're very welcome!
@DataScienceGarage 3 years ago
Hi! That's the tutorial I was searching for. Thanks a lot!
@ris2043 1 year ago
Excellent
@dvcorg8370 1 year ago
Thank you! Cheers!
@hyattBaker 3 years ago
Thank you that was very helpful!
@regularSenseAppeal 4 years ago
Very good thank you. Superbly explained.
@philiperiskallaleal6010 4 years ago
Awesome presentation. Thank you for your great work
@dvcorg8370 3 years ago
Thanks Phillipe!
@shroukmansour7642 3 years ago
What is special about github actions and CML so I use them instead of using something like jenkins for example??
@gdibble 2 years ago
🔥🔥🔥
@philiperiskallaleal6010 4 years ago
Dear Elle, what would be the required changes for implementing CML in GitLab? Does GitLab have some type of "GitHub Actions" functionality? If so, where can I check for it?
@dvcorg8370 3 years ago
Good q- GitLab has something called GitLab CI, which is extremely similar and gives you most of the same functionality! There are a few subtle differences in how you set up things like environment variables/secrets, but it's not too bad. We have some docs here: dvc.org/doc/cml/start-gitlab
@MohammedBakheet 4 years ago
Very nice explanation indeed, thank you so much, keep it up
@tanim980 1 year ago
you are just amusing!
@Chevignay 1 year ago
Really great video thank you
@jackbauer322 4 years ago
What's the main difference from DVC? How do they fit together, or do they not? Thanks again!
@dmitrypetrov3542 4 years ago
DVC and CML complement each other. CML was created by the DVC team - see cml.dev A bit more tech details: DVC is usually used to transfer data to CI/CD (CML) runners.
@jackbauer322 4 years ago
@dmitrypetrov3542 OK! So from my understanding DVC is for experiment tracking and CML is more for CI/CD MLOps?
@dmitrypetrov3542 4 years ago
@jackbauer322 exactly. DVC - data & ML experiments. CML - team collaboration & ML training.
@soumantadas8564 4 years ago
This is extremely helpful Elle and DVCorg. Had a follow-up question - if I wanted to generate multiple metric files and residual plots from the train.py script (say because I am running a loop varying max_depth over [5,10,15] or varying some other hyperparameters), what would be the best way to modify the workflow so that I can see all the data and viz in one commit? A crude way could be to store the metrics and plots with different names in train.py and add them separately to report.md in the cml.yml file. However, as the number of loops increases, this wouldn't be a scalable method.
@dvcorg8370 4 years ago
So what if you were to write out your metrics in one file using long form? So for example:
max_depth | accuracy
5. | 87
10. | 90
15. | 92
And likewise, put all your plots on one axis, so like, many lines of different colors, using your favorite plotting library. Then you'd be able to print your table and your summary plot in your CML report with only one line of code each, no matter how long your loop is.
@soumantadas8564 4 years ago
@dvcorg8370 Ahh yes, a very nice workaround. Thanks.
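The long-form idea from the reply above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `results` list stands in for whatever the training loop produces, the file name `metrics.md` and the helper `to_markdown_table` are hypothetical, and the plotting half of the suggestion is left out to keep the sketch dependency-free.

```python
# Hypothetical results from a hyperparameter loop: (max_depth, accuracy).
# The values are taken from the example table in the reply above.
results = [(5, 87), (10, 90), (15, 92)]


def to_markdown_table(rows):
    """Render long-form (max_depth, accuracy) rows as a Markdown table."""
    lines = ["| max_depth | accuracy |", "| --- | --- |"]
    for max_depth, accuracy in rows:
        lines.append(f"| {max_depth} | {accuracy} |")
    return "\n".join(lines) + "\n"


table = to_markdown_table(results)

# Write one file (assumed name). The workflow can append it to report.md
# in a single step, however many loop iterations produced rows.
with open("metrics.md", "w", encoding="utf-8") as handle:
    handle.write(table)
```

Because every loop iteration only adds a row, the workflow file does not need to change when the hyperparameter grid grows.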
@OmarHisham1 2 years ago
15:08 - I made an amazing model cat in the background : Yaaa
@dvcorg8370 1 year ago
Congrats!
@jjpp3301 3 years ago
this is great! thanks for sharing
@rostyslavbryiovskyi4591 3 years ago
Hi, thanks for comprehensive explanation!) But I have one more question. Can I use CML with Azure TFS ?
@dvcorg8370 3 years ago
Yes you can! See these docs: cml.dev/doc/cml-with-dvc. And please join us in our Discord server if you have more questions! discord.gg/rpgRdvfyAf
@iPondrio 7 months ago
Do you have any video showing how to configure the token ? I’m having a hard time with that config
@anikethdeshpande8336 4 years ago
Awesome tutorial!
@dvcorg8370 4 years ago
Thanks Aniketh!
@IrtizaKaleem 3 years ago
Hi Elle, can you shed some light if I can do the same, but with a different docker image, such as continuum/anaconda3, so I can do the same for a conda environment? Other than the docker image link, what else would I need to change?
@shaunirwin2016 4 years ago
Very nice tutorial! I really like this concept of integrating into the normal software stack. How would one handle the situation of adding new metrics over time? E.g. If you begin a project only displaying F1 score, but as you train more models you realise you are also interested in seeing and comparing the precision. Could this be catered for using CML?
@dmitrypetrov3542 4 years ago
Yep, using the existing software stack for ML is one of the ideas behind CML. That's a really good question. The flow relies on Git a lot. So, if the scores were stored/committed then you can derive F1 as well as precision. However, if the scores were not stored/committed you might need to go back and create another experiment just to get the right scores to compare. How do you do that with the other tools or approaches? One relevant discussion - github.com/iterative/dvc/issues/4210
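To make the "store the scores, derive metrics later" point concrete, here is a minimal Python sketch. It assumes binary labels and stores raw confusion counts as JSON; the labels are made up and the function names are hypothetical, not part of CML or DVC.

```python
import json


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (0 or 1)."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, guess in zip(y_true, y_pred):
        if truth == 1 and guess == 1:
            counts["tp"] += 1
        elif truth == 0 and guess == 1:
            counts["fp"] += 1
        elif truth == 1 and guess == 0:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def precision(counts):
    """Precision = TP / (TP + FP), or 0.0 when nothing was predicted positive."""
    predicted_positive = counts["tp"] + counts["fp"]
    return counts["tp"] / predicted_positive if predicted_positive else 0.0


def f1_score(counts):
    """F1 = 2TP / (2TP + FP + FN), or 0.0 when the denominator is zero."""
    denominator = 2 * counts["tp"] + counts["fp"] + counts["fn"]
    return 2 * counts["tp"] / denominator if denominator else 0.0


# Made-up labels, purely for illustration.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

counts = confusion_counts(y_true, y_pred)

# This JSON string is what would be written to a file and committed.
stored = json.dumps(counts, sort_keys=True)

# Later, a newly interesting metric is derived from the stored counts
# without retraining the model.
restored = json.loads(stored)
```

If only the final F1 value had been committed, precision could not be recovered from it, which is the situation described in the reply.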
@shaunirwin2016 4 years ago
@dmitrypetrov3542 thanks very much for the reply! Yes, I thought the solution might be something along those lines. For database approaches such as MLFlow one can log metrics later on to previous experiments/runs. I suppose with a git-based system of storing metrics one could manually add an extra commit with the new scores? Or of course rerun the experiment in the normal way with the new scores included, as you suggest. Although for long training times that could be a problem, if you are actually just wanting to do scoring, not training.
@dmitrypetrov3542 4 years ago
@shaunirwin2016 yes, an additional commit is one of the solutions. Re long-running experiments - you are right, but the same happens with logging tools like mlflow - you need to retrain to get the metrics. The only difference is that the commit is not needed.
@mehrdat 8 months ago
Thank you very much. But why do I get errors? I couldn't run it after the first commit. I tried nearly everything. It comes from the line with the importance plot. What could it be?
@philiperiskallaleal6010 3 years ago
Dear Elle, would you be so kind as to show/describe how one can implement a dvc pull request that is meant to be run by a .github/workflows "yaml"'s file, so that it is only run on the git remote repository? An approach through which would be possible to "gitignore" the dvc data, while allowing the git remote a temporary access to the data to properly test the CML commited. Perhaps use some kind of data cache by the git remote repository, and later an automatic deletion of this cached data?
@dvcorg8370 3 years ago
One approach is using a local DVC config file, which lets you have a different data remote/different credentials for when you're working locally than what's in your CI/CD system. That means you can still have a DVC config file that gets pushed to your Git repo, but you'll have a local version that gets used when you're developing in your workspace. Docs here: dvc.org/doc/command-reference/remote#example-add-a-default-local-remote Another thought that comes to mind is that you could make the credentials to pull from the DVC remote only available to the runner (via secrets). You might then write a control flow statement: if those environment variables are present, then run dvc pull; else, don't. If you want to discuss this in more detail, stop by the CML channel on our Discord: discordapp.com/invite/dvwXA2N
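The control flow described in that reply could look roughly like the Python sketch below. The variable names in `REQUIRED_VARS` are an assumption (they would fit an S3-backed remote); substitute whatever secrets your runner actually receives. `maybe_pull` is a hypothetical helper, and the only external command it runs is `dvc pull`.

```python
import os
import subprocess

# Assumed variable names for an S3-backed DVC remote; replace them with
# the secrets your CI system really exposes to the runner.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def has_remote_credentials(env):
    """Return True only if every required variable is present and non-empty."""
    return all(bool(env.get(name)) for name in REQUIRED_VARS)


def maybe_pull(env=None):
    """Run `dvc pull` when credentials are available; otherwise skip it."""
    env = os.environ if env is None else env
    if not has_remote_credentials(env):
        print("No DVC remote credentials found; skipping dvc pull.")
        return False
    # check=True makes a failed pull stop the job instead of passing silently.
    subprocess.run(["dvc", "pull"], check=True)
    return True
```

On a laptop without the secrets the function simply returns False, while on the runner the same script pulls the data before training.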
@toilinginobscurity3091 2 years ago
Let's say we have a couple of commits in the experiment branch and we want to merge the branch with squashed option. What would happen then? All the reports would be combined?
@muhammadfarjadaliraza4546 3 years ago
Awesome video, want to know how to use tpu and gpu ?
@SheeceGardazi 3 years ago
thanks for sharing the talk
@fabianpena2776 4 years ago
Thx. The tutorial is amazing. In comments, I am not able to see the PNG files, only the links. Do I need to configure something more?
@dvcorg8370 4 years ago
Hm, that sounds like you might be missing a flag in your cml-publish function. Do you have `cml-publish --show-md >> report.md`? If you don't have the `--show-md` flag, you'll get a link to your image instead of an embedded picture.
@fabianpena2776 4 years ago
@dvcorg8370 Thank you again! Now, it works for me :)
@vishal-rana 4 years ago
Beautiful.
@derekcorcoran5129 4 years ago
Hello Elle, this looks great, it seems that it works for Python only? I develop Machine Learning tools in R, and I would love to help integrate this if possible
@dvcorg8370 4 years ago
The tools we're using here (GitHub Actions and CML) work with any language! Here's a blog about a project using R: mribeirodantas.xyz/blog/index.php/2020/08/10/continuous-machine-learning/ There's a GitHub Action for getting R on your runner, too: github.com/r-lib/actions
@derekcorcoran5129 4 years ago
DVCorg thanks, you are doing an amazing job
@sayakpaul3152 4 years ago
One thing I figured that the actions do not always trigger upon a new commit to a branch. Is there a way to prevent it?
@dmitrypetrov3542 4 years ago
They trigger on push events. If you make several local commits and then a single push, the workflow runs only for the last one. So, you need to push after each commit.
@mirmohammadjaber2676 4 years ago
Have you deleted the experiment branch from the repository?
@dvcorg8370 4 years ago
Yes, but you can see the closed PR and browse the branches at previous points in time github.com/andronovhopf/wine/pull/2
@carloslopez7204 4 years ago
How can I set a secret token in GitHub Actions? My program is calling an API so I need to provide the secret token, but I don't know if it's correct to write it in cml.yaml because it's going to be public.
@dvcorg8370 4 years ago
You can add the secret to your GitHub repository, which will give the runner access to it via an environment variable. You can set it so the variable will be hidden even in logs- check out their docs! docs.github.com/en/actions/reference/encrypted-secrets
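On the Python side, reading such a secret might look like the sketch below. It assumes the workflow maps the repository secret to an environment variable named `API_TOKEN`; that name and the helper `get_api_token` are hypothetical and chosen only for illustration.

```python
import os


def get_api_token(env=None, name="API_TOKEN"):
    """Fetch the token from the environment instead of hard-coding it.

    `API_TOKEN` is an assumed variable name; use whichever name your
    workflow maps the repository secret to.
    """
    env = os.environ if env is None else env
    token = env.get(name)
    if not token:
        # Fail early with a clear message rather than calling the API
        # with an empty credential.
        raise RuntimeError(f"{name} is not set; add it as a repository secret")
    return token
```

Nothing sensitive ends up in cml.yaml or in the script this way, so the repository can stay public.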
@jordieclive 4 years ago
what can this CML tool do that circleci Continous Integration can't do?
@dvcorg8370 4 years ago
To be clear, CML isn't a competitor to Circle CI. Circle CI is more analogous to GitHub Actions or GitLab CI; it's a continuous integration system. CML is a toolkit that works with a continuous integration system to 1) provide big data management (via DVC & cloud storage), 2) help you write model metrics and data viz to comments in GitHub/Lab, and 3) orchestrate cloud resources for model training and testing. Currently, CML is only available for GitHub Actions and GitLab CI. But it could in the future integrate with Circle CI (i.e., as an Orb).
@jordieclive 4 years ago
@dvcorg8370 thanks for the detailed reply. I've got it clear in my head now 😃, I watched the other vids in the series and you explain very clearly. I look forward to videos setting up cloud workflows with CML and versioning, like S3, GCP. I'm not sure if you are planning to do DL content. As a suggestion, I would love to see PyTorch workflows on cloud with, say, multiple GPUs, and basic training tests in a CML workflow, like a sanity check: fitting/evaluation on a single batch etc. Please keep up the tutorials!
@dvcorg8370 4 years ago
@jordieclive No problem! Let us know any other questions you have :)
@hamdikhaled6955 4 years ago
Thanks a lot
@jwc7663 4 years ago
Scenario: Need NN model and want to test in using GPU. Is it possible as well?
@dvcorg8370 4 years ago
Yes! We'll be covering that use case in a video soon. For now we have an example project to browse: github.com/iterative/cml_cloud_case
@jwc7663 4 years ago
@dvcorg8370 That looks good. Will it support local machine (not cloud) as well?
@dvcorg8370 4 years ago
@jwc7663 Yes- you can set GitHub Actions (& GitLab CI, too) to use self-hosted runners, which can be a local machine. Check out the docs here: docs.github.com/en/actions/hosting-your-own-runners/about-self-hosted-runners
@efels_com 4 years ago
@dvcorg8370 I would love to see the self hosted GPU flow with the ability to compare the results from the model that is in the master branch repo. And using dvc to roll the data set back to the data set that was used to train the model in master branch. So we could compare both models, on new and old data.
@dvcorg8370 4 years ago
@efels_com We can do this! Adding this to the list of to-dos.
@leilainigodelacruz3648 4 years ago
Hi, thanks for your very useful video. I have a question, because I was trying to replicate this example in my own repo and it failed in this part of the cml.yaml:
steps:
  - uses: actions/checkout@v2
  - name: train_model
    env:
      repo_token: ${{ secrets.GITHUB_TOKEN }}
By GITHUB_TOKEN, do you mean a secret key that I assign in the Settings/Secrets tab of the repo, which is a private key? If this is true, I don't know why it doesn't work if I put my own private key name :(
@dvcorg8370 4 years ago
Hi Leila! You don't have to assign any value to GITHUB_TOKEN- it is assigned by default in a GitHub repository. Please delete any secrets you might have added and try again. If it doesn't work, stop by our Discord channel where we can do more hands-on troubleshooting :) discord.gg/bzA6uY7
@leilainigodelacruz3648 4 years ago
@dvcorg8370 Thanks! It did work!
@davidbalakirev5963 3 years ago
Hands up if you also had an espresso while watching this.
@jalaj1 4 years ago
Hi can you make video on mlcertific.com It is providing free certification on MLOps
@drm8164 1 year ago
i love u
@dvcorg8370 1 year ago
🦉 We love you too!
@jackbauer322 4 years ago
How would mlflow come in here?
@dvcorg8370 4 years ago
Good question- you can integrate lots of tools with CML. For example, you can use it with Tensorboard to get a link to your Tensorboard in a PR whenever the model trains. Check out this use case: github.com/iterative/cml_tensorboard_case/pull/3 We haven't tried with MLFlow in particular yet, but expect there could be a similar approach.
@jackbauer322 4 years ago
@dvcorg8370 Thanks! Can't wait for the next videos :)
@jeremykusnadi5148 4 months ago
how do you get around the " `GLIBC_2.28' not found " error?
@dvcorg8370 3 months ago
This error typically occurs when trying to run a program that was compiled with a newer version of the GNU C Library (GLIBC) than what's installed on your system. Check that version requirements match up and you should be all set!
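One quick way to check the version mismatch mentioned in that reply is from Python's standard library. This is a rough sketch: `platform.libc_ver()` is a real standard-library call, but the helper names are hypothetical, and the required version "2.28" is simply the one from the error message.

```python
import platform


def parse_version(text):
    """Turn a dotted version such as '2.28' into a comparable tuple (2, 28)."""
    return tuple(int(part) for part in text.split("."))


def glibc_satisfies(required, installed=None):
    """Return True/False, or None when no glibc version can be detected."""
    if installed is None:
        lib, installed = platform.libc_ver()
        if lib != "glibc" or not installed:
            return None
    # Compare as integer tuples so that 2.9 sorts below 2.28.
    return parse_version(installed) >= parse_version(required)


# Report whether this machine's glibc is new enough for a 2.28 binary.
print("glibc >= 2.28:", glibc_satisfies("2.28"))
```

If the result is False on a CI runner, using a runner image with a newer base OS is usually simpler than trying to upgrade glibc in place.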