MLOps Tutorial #6: Behavioral tests for models with GitHub Actions

8,558 views

DVCorg


Comments: 20
@dvcorg8370 2 years ago
Please note we have deprecated the dvcorg/cml-py3 container image. You can get the same results with:

- container: docker://dvcorg/cml-py3:latest
+ steps:
+   - uses: actions/checkout@v3
+   - uses: iterative/setup-tools@v1
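For context, those replacement lines live under a job's `steps:` key rather than the old `container:` key. Below is a rough sketch of a migrated job; the runner image, Python setup, and script name are assumptions, and the setup action is named exactly as in the note above:

```yaml
# Hypothetical sketch of a job migrated off the deprecated dvcorg/cml-py3 image.
name: train
on: [push]
jobs:
  train:
    runs-on: ubuntu-latest               # plain hosted runner instead of the container image
    steps:
      - uses: actions/checkout@v3        # check out the repository
      - uses: actions/setup-python@v4    # provide a Python interpreter
      - uses: iterative/setup-tools@v1   # action name as given in the deprecation note above
      - run: |
          pip install -r requirements.txt   # hypothetical dependency file
          python train.py                   # hypothetical training script
```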
@trevormiller931 3 years ago
This channel is a goldmine. So glad I ran into it.
@DistortedV12 3 years ago
So cool. Been working with large NLP models recently, and testing is a super important topic. Glad you are covering it.
@gopietz 4 years ago
Something I'm still looking for after watching all these videos: Validating a bunch of models after a data change or on a scheduled event, automatically pushing the best config to master and saving the best model file. Is something like this possible? Great job on the videos.
@dvcorg8370 4 years ago
Definitely possible- putting this on the video list!
@mturewicz 4 years ago
Thanks for doing this series, dvc and cml look interesting. What about the artifacts, deployment, monitoring and re-training aspects of MLOps?
@dvcorg8370 4 years ago
Good question- for artifacts, you can handle them a few ways:
- When you use DVC pipelines + cloud storage, you can `dvc push` the artifacts generated during your training run (like a model, or transformed dataset) into cloud storage before shutting off the runner. That way, they're saved and connected to the code that created them (see the sketch below).
- In GitLab, you can permanently save artifacts, and there's a flag in the `cml-publish` command that lets you use their artifact hosting service. (GitHub has something similar, but because it's not permanent we don't recommend it as much.)
- You can also push artifacts to cloud storage without DVC and use whatever convention you like to keep track of them.

Right now, DVC and CML don't explicitly handle deployment or monitoring. We focus mostly on the development aspect of the ML lifecycle. As for retraining, we have a big discussion about how experiments are handled (including model retraining) here: github.com/iterative/dvc/issues/2799. Some features that haven't been officially released yet may help with retraining, so look for some announcements soon :)
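As a rough illustration of the first option, here is a minimal sketch of a training job that pushes its outputs with DVC before the runner shuts down. It assumes the repository already defines a DVC pipeline (`dvc.yaml`) with a default S3 remote, and that the remote's credentials live in repository secrets; the secret names and trigger are assumptions:

```yaml
# Hypothetical sketch: reproduce the DVC pipeline, then push artifacts to the remote.
name: train-and-push
on: [push]
jobs:
  train:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
      - run: pip install "dvc[s3]"          # assumes an S3-backed DVC remote
      - run: dvc repro                      # run the pipeline; produces the model / transformed data
      - run: dvc push                       # upload those artifacts to cloud storage before the runner exits
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}          # assumed secret names
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
```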
@yonigottesman 3 years ago
Really nice! I noticed that in the PR the new test_score.json file is not committed. Wouldn't we want the PR to change that file?
@dvcorg8370 3 years ago
Very observant- yes, you're right that `test_score.json` is not committed! Some people don't like to commit during CI runs, but you totally can if you want with an autocommit (see: github.com/marketplace/actions/git-auto-commit). Another workflow is not to commit, and then once you've merged, re-run the pipeline on main and commit the updated `test_score.json` file manually (to main). It's all about your personal comfort with committing during CI.
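For the auto-commit route, here is a rough sketch of a step that commits the regenerated metrics file back to the branch using plain git commands rather than the marketplace action linked above. It assumes the job starts with `actions/checkout` and that the workflow token is allowed to push to the branch:

```yaml
      # Hypothetical step, added at the end of the job that regenerates test_score.json.
      - name: Commit updated metrics
        run: |
          git config user.name  "github-actions"
          git config user.email "github-actions@users.noreply.github.com"
          git add test_score.json
          git commit -m "Update test_score.json from CI run" || echo "Nothing to commit"
          git push
```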
@sorenchannel42 2 years ago
Where is the data magically coming from? When I run the workflow, the sanity check fails.
@mathguy198 4 years ago
In MLOps Tutorial #3, the diff command used was:
`dvc metrics diff --show-md main > report.md`
For this tutorial, the command used was:
`dvc metrics diff main --targets test.json --show-md >> report.md`
What is the difference between these two commands, and which one should be used in which situation?
@dvcorg8370 4 years ago
Good question- in Tutorial #3, we were using a DVC pipeline, so DVC implicitly "knows" which metric files to compare when doing a diff. In this video, we haven't declared a DVC pipeline, so we have to "tell" DVC which file is our metric. That's why I'm using the flag `--targets` to point to `test.json`. Does that help?
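As a concrete sketch of the second form, the report step in this video's setup (no `dvc.yaml` pipeline) might look roughly like this; it assumes DVC and the CML CLI are already installed on the runner and that a token for commenting is configured:

```yaml
      # Hypothetical report step: diff the metric file explicitly, then post the report.
      - name: Compare metrics and comment on the PR
        run: |
          # No DVC pipeline here, so point DVC at the metric file with --targets:
          dvc metrics diff main --targets test.json --show-md >> report.md
          # With a full pipeline (as in Tutorial #3), the shorter form works instead:
          #   dvc metrics diff --show-md main > report.md
          cml-send-comment report.md   # CML command assumed available; posts report.md on the pull request
```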
@mathguy198 4 years ago
@@dvcorg8370 Thanks, that surely helped clear my doubt ☺️. Thanks for such wonderful tutorials; looking forward to more from dvcorg ❤️
@jackbauer322 4 years ago
Hi, I'm confused about the original and perturbed results and the whole thing, in fact... Can you explain what it does? I see model confidence... about what? What is Hugging Face???? Sorry, I'm lost.
@dvcorg8370 4 years ago
Let me see if I can help! The idea is that we have an NLP model that does sentiment classification: is a sentence positive or negative? HuggingFace is a library of powerful pre-trained models you can use for this task.

The idea of the perturbation test is that we want to make sure our classifier is robust to typos in a sentence. For example, if I have the sentence "I hate this ice cream shop", my model will (hopefully) say this has a NEGATIVE sentiment. Now if I randomly insert a typo- "I hate this ice cream ahop"- the model should still say the sentiment is NEGATIVE. We can run this test (does the classifier make the same prediction with and without typos?) on thousands of sentences and report the score. It gives a rough measure of the robustness of the model to noise.

In this video, we use GitHub Actions to make sure this test is run anytime a model is checked in to the project via a Pull Request. Does that help?
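To make that concrete, here is a rough sketch of how such a check can be wired to pull requests. The script name `perturbation_test.py`, the requirements file, the report file, and the token variable are hypothetical stand-ins for whatever the project actually uses:

```yaml
# Hypothetical sketch: run the robustness (perturbation) test on every pull request.
name: behavioral-tests
on: [pull_request]
jobs:
  perturbation-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
      - uses: iterative/setup-cml@v1          # assumed: installs the CML CLI for PR comments
      - run: pip install -r requirements.txt  # assumed to include the model dependencies (e.g. transformers)
      - run: |
          # Hypothetical script: classifies each sentence with and without injected typos
          # and appends the agreement score to report.md
          python perturbation_test.py >> report.md
          cml-send-comment report.md          # post the robustness score on the pull request
        env:
          repo_token: ${{ secrets.GITHUB_TOKEN }}   # assumed token variable for CML
```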
@jackbauer322 4 years ago
@@dvcorg8370 It does!!! Clearer now!!! Thank you! I have another question... Can GitHub Actions be used to roll out a Spark MLlib model on a Spark cluster in GCP?
@dvcorg8370 4 years ago
@@jackbauer322 Ooh, we haven't tried- as long as the cluster can be configured as a GitHub self-hosted runner, it's possible. But you'd have to see if it's supported: docs.github.com/en/free-pro-team@latest/actions/hosting-your-own-runners/about-self-hosted-runners
@jackbauer322 4 years ago
@@dvcorg8370 Thanks for the hint! I'll have a look :) Keep up the good work and your smile :)
@1etcetera1 4 years ago
Could you please make your Chrome zoom 150% and your terminal 120%, or larger? Thanks for your videos! Good luck!
@dvcorg8370 4 years ago
Thanks for the suggestion- we can probably do that in future videos!
Behavioral Testing of ML Models (Unit tests for machine learning)
12:22
MLOps Tutorial #4: GitHub Actions with your own GPUs
13:35
MLOps Tutorial #1: Intro to Continuous Integration for ML
17:45
What is MLOps, Why do you need it, and Where do you begin
23:19
How to Test your Machine Learning models - Goku Mohandas
18:21
Jesper Dramsch – Non-hype Machine Learning
1.7K views
Talk - Carlos Kidman: Testing Machine Learning Models
31:12
GitHub Actions Tutorial - Basic Concepts and CI/CD Pipeline with Docker
32:31
TechWorld with Nana
1.6M views