Please note we have deprecated the dvcorg/cml-py3 container image. You can get the same results with:

```diff
- container: docker://dvcorg/cml-py3:latest
+ steps:
+ - uses: actions/checkout@v3
+ - uses: iterative/setup-tools@v1
```
@trevormiller931 3 years ago
This channel is a goldmine. So glad I ran into it.
@DistortedV12 3 years ago
So cool. Been working with large NLP models recently, and testing is a super important topic. Glad you are covering it.
@gopietz 4 years ago
Something I'm still looking for after watching all these videos: Validating a bunch of models after a data change or on a scheduled event, automatically pushing the best config to master and saving the best model file. Is something like this possible? Great job on the videos.
@dvcorg8370 4 years ago
Definitely possible- putting this on the video list!
@mturewicz 4 years ago
Thanks for doing this series, dvc and cml look interesting. What about the artifacts, deployment, monitoring and re-training aspects of MLOps?
@dvcorg8370 4 years ago
Good question- for artifacts, you can handle them a few ways:
- When you use DVC pipelines + cloud storage, you can `dvc push` the artifacts generated during your training run (like a model or a transformed dataset) into cloud storage before shutting off the runner. That way, they're saved and connected to the code that created them (see the sketch below).
- In GitLab, you can permanently save artifacts, and there's a flag in the `cml-publish` command that lets you use their artifact hosting service. (GitHub has something similar, but because it's not permanent we don't recommend it as much.)
- You can also push artifacts to cloud storage without DVC and use whatever convention you like to keep track of them.

Right now, DVC and CML don't explicitly handle deployment or monitoring. We focus mostly on the development side of the ML lifecycle. As for retraining, we have a big discussion about how experiments are handled (including model retraining) here: github.com/iterative/dvc/issues/2799
Some features that haven't been officially released yet may help with retraining, so look for some announcements soon :)
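A rough sketch of what that end-of-run push might look like as shell commands in a CI job; the remote name "storage" is a placeholder, not something from the tutorial:

```bash
# Reproduce the pipeline, then push the DVC-tracked artifacts (model,
# transformed dataset, ...) to remote storage before the runner shuts down.
dvc repro                  # run the pipeline stages that produce the artifacts
dvc push --remote storage  # upload the tracked outputs to cloud storage
```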
@yonigottesman 3 years ago
Really nice! I noticed that in the PR the new test_score.json file is not committed. Wouldn't we want the PR to change that file?
@dvcorg8370 3 years ago
Very observant- yes, you're right that `test_score.json` is not committed! Some people don't like to commit during CI runs, but you totally can if you want with an auto-commit (see: github.com/marketplace/actions/git-auto-commit).

Another workflow is not to commit during CI, and then once you've merged, re-run the pipeline on main and commit the updated `test_score.json` file manually (to main). It's all about your personal comfort with committing during CI.
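For example, a minimal sketch of an auto-commit step using plain git commands (the linked marketplace action wraps a similar idea; the bot identity and commit message are placeholders, and it assumes the checkout step left credentials that can push to the PR branch):

```bash
# Commit the regenerated metrics file back to the branch from inside the CI run.
git config user.name "github-actions[bot]"
git config user.email "github-actions[bot]@users.noreply.github.com"
git add test_score.json
git commit -m "Update test_score.json from CI" || echo "nothing to commit"
git push
```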
@sorenchannel42 2 years ago
Where is the data coming from magically? When I run the workflow, the sanity check fails.
@mathguy198 4 years ago
In MLOps Tutorial #3 the diff command used was:
dvc metrics diff --show-md main > report.md
For this tutorial the command used was:
dvc metrics diff main --targets test.json --show-md >> report.md
What is the difference between these two commands, and which one should be used under what conditions?
@dvcorg8370 4 years ago
Good question- in Tutorial #3, we were using a DVC pipeline, so DVC implicitly "knows" which metric files to compare when doing a diff. In this video, we haven't declared a DVC pipeline, so we have to "tell" DVC which file is our metric. That's why I'm using the flag `--targets` to point to `test.json`. Does that help?
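Putting the two commands from the question side by side (same commands as above, with comments on when each applies; the `>` vs `>>` difference is just whether the report is overwritten or appended to):

```bash
# Tutorial #3: a DVC pipeline (dvc.yaml) declares the metrics files,
# so DVC already knows what to diff; this compares the workspace to main
# and writes a fresh Markdown report.
dvc metrics diff --show-md main > report.md

# This tutorial: no pipeline is declared, so --targets names the metrics
# file explicitly; >> appends to a report.md that already has content.
dvc metrics diff main --targets test.json --show-md >> report.md
```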
@mathguy198 4 years ago
@@dvcorg8370 Thanks, that surely helped clear my doubt ☺️. Thanks for such wonderful tutorials; looking forward to more from dvcorg ❤️
@jackbauer322 4 years ago
Hi, I'm confused about the "original" and "perturbed" columns, and the whole thing in fact... Can you explain what it does? I see model confidence... confidence about what? What is Hugging Face? Sorry, I'm lost.
@dvcorg8370 4 years ago
Let me see if I can help! The idea is that we have an NLP model that does sentiment classification- is a sentence positive or negative? HuggingFace is a library of powerful pre-trained models you can use for this task.

The idea of the perturbation test is that we want to make sure our classifier is robust to typos in a sentence. For example, if I have the sentence, "I hate this ice cream shop", my model will (hopefully) say this has a NEGATIVE sentiment. Now if I randomly insert a typo- "I hate this ice cream ahop"- the model should still say the sentiment is NEGATIVE.

We can run this test (does the classifier make the same prediction with and without typos?) on thousands of sentences and report the score. It gives a rough measure of the robustness of the model to noise.

In this video, we use GitHub Actions to make sure this test is run anytime a model is checked in to the project via a Pull Request. Does that help?
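To make that concrete, here's a minimal sketch in Python using the Hugging Face `transformers` pipeline. The typo function, the sentences, and the scoring loop are made up for illustration; this is not the tutorial's actual test code:

```python
import random
from transformers import pipeline  # Hugging Face pre-trained models

def add_typo(sentence: str) -> str:
    """Replace one random character with a random letter (illustration only)."""
    i = random.randrange(len(sentence))
    return sentence[:i] + random.choice("abcdefghijklmnopqrstuvwxyz") + sentence[i + 1:]

# Placeholder sentences; the real test would run over many more examples.
sentences = ["I hate this ice cream shop", "What a wonderful movie"]

classifier = pipeline("sentiment-analysis")  # downloads a default pre-trained model

unchanged = 0
for s in sentences:
    original = classifier(s)[0]["label"]            # e.g. "NEGATIVE"
    perturbed = classifier(add_typo(s))[0]["label"]  # prediction on the typo'd sentence
    unchanged += int(original == perturbed)

# Robustness score: fraction of sentences whose prediction survives the typo.
print(f"robustness: {unchanged / len(sentences):.2f}")
```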
@jackbauer322 4 years ago
@@dvcorg8370 It does! Much clearer now! Thank you! I have another question... Can GitHub Actions be used to roll out a Spark MLlib model on a Spark cluster in GCP?
@dvcorg8370 4 years ago
@@jackbauer322 Ooh, we haven't tried- as long as the cluster can be configured as a GitHub self-hosted runner, it's possible. But you'd have to check whether that's supported: docs.github.com/en/free-pro-team@latest/actions/hosting-your-own-runners/about-self-hosted-runners
@jackbauer322 4 years ago
@@dvcorg8370 Thanks for the hint! I'll have a look :) Keep up the good work and your smile :)
@1etcetera1 4 years ago
Could you please zoom your Chrome to 150% and your terminal to 120%, or larger? Thanks for your videos! Good luck!
@dvcorg8370 4 years ago
Thanks for the suggestion- we can probably do that in future videos!