Machine Learning Pipelines with DVC (Hands-On Tutorial!)

  21,530 views

DVCorg

A day ago

Comments: 30
@MLOps 4 years ago
I love learning with you Elle!
@dvcorg8370 4 years ago
awwwwwww thank you :) :) :)
@btylerparker 4 years ago
Just wanted to say thank you for the DVC tooling and for the quality video content coming out on this channel. I recently got back into a personal ML project and wanted it to be easy to run and iterate on, so I can make the most of the little sporadic time I can dedicate to it. Finding DVC while looking around for tooling was a real boon: it cleanly and thoughtfully addressed a lot of my pain points, concerns, and planned features (for instance, I was already developing a DAG data-dependency execution pipeline). Thanks again to the team for the time and care put into this software, patreon'd.
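As a rough illustration of the DAG idea mentioned above (which DVC automates through dvc.yaml), the sketch below orders hypothetical stages so that each runs after its dependencies, using Python's standard `graphlib` module (Python 3.9+). The stage names loosely follow the tutorial's prepare, featurization and training scripts; the graph itself is an assumption, and this is a simplified model, not DVC's implementation.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each stage maps to the set of stages it depends on.
PIPELINE = {
    "prepare": set(),
    "featurize": {"prepare"},
    "train": {"featurize"},
}


def execution_order(pipeline: dict[str, set[str]]) -> list[str]:
    """Return the stages in an order where every dependency comes first."""
    # static_order() raises graphlib.CycleError if the graph contains a cycle.
    return list(TopologicalSorter(pipeline).static_order())


if __name__ == "__main__":
    print(execution_order(PIPELINE))  # ['prepare', 'featurize', 'train']
```

A cycle check comes for free here, which is one reason pipeline tools model stages as a DAG rather than a plain list.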
@dvcorg8370 4 years ago
This is amazing to hear, Tyler! Thank you so much for letting us know
@sagnikroy6405 2 years ago
I like how she explains things, just like a college senior would. Thank you.
@umeshtiwari9249 1 year ago
Nice tutorial, it makes things easy to understand.
@dvcorg8370 1 year ago
Glad you think so!
@yaledioma 4 years ago
Best channel and best instructor
@dvcorg8370 4 years ago
Aw thanks! :D Makes my day.
@RedShipsofSpainAgain 4 years ago
2:57 For viewers following along: when you run this dvc run command, make sure you LEAVE NO SPACES BETWEEN THE PARAMETERS passed to the -p flag, or DVC will throw an error. Double-check that you type -p prepare.seed,prepare.split (no space after the comma).
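Why the space matters: the shell splits the command line on whitespace before DVC sees it, so "prepare.seed, prepare.split" arrives as two separate arguments and only the first (with a trailing comma) is attached to -p. Python's standard `shlex.split`, which follows POSIX shell-like splitting rules, makes this visible. The command strings below are only an illustration.

```python
import shlex

# The same command with and without a space after the comma in the -p value.
GOOD = "dvc run -n prepare -p prepare.seed,prepare.split -d src/prepare.py -o data/prepared python src/prepare.py data/data.xml"
BAD = "dvc run -n prepare -p prepare.seed, prepare.split -d src/prepare.py -o data/prepared python src/prepare.py data/data.xml"


def value_after_flag(command: str, flag: str) -> str:
    """Return the single token that follows `flag`, split as a POSIX shell would."""
    tokens = shlex.split(command)
    return tokens[tokens.index(flag) + 1]


if __name__ == "__main__":
    print(value_after_flag(GOOD, "-p"))  # prepare.seed,prepare.split
    print(value_after_flag(BAD, "-p"))   # prepare.seed,  (prepare.split becomes a stray argument)
```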
@mrinaldesai8867 2 years ago
This video saved my life
@andrezx10r 1 year ago
Awesome tutorial!! I built my own little pipeline in a small test project to practice, and it works great!!!
@dvcorg8370 1 year ago
Nice work!
@AchinAbhi 3 years ago
I got this error at the dvc repro step: "ERROR: failed to reproduce 'dvc.yaml': Parameters 'train.n_estimators' are missing from 'params.yaml'." It seems params.yaml defines train.n_est, not train.n_estimators. Changing dvc.yaml to match resolves this.
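The error is a mismatch between the dotted parameter names listed in dvc.yaml and the keys actually present in params.yaml. The sketch below reproduces that kind of check on plain Python dictionaries that stand in for the parsed YAML (so no YAML library is needed); the helper names and sample values are hypothetical.

```python
def has_key(params: dict, dotted_key: str) -> bool:
    """Return True if a dotted key such as 'train.n_estimators' exists in nested params."""
    node = params
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return False
        node = node[part]
    return True


def missing_params(params: dict, required: list[str]) -> list[str]:
    """List the dotted keys from `required` that are absent from `params`."""
    return [key for key in required if not has_key(params, key)]


if __name__ == "__main__":
    # Made-up stand-in for a parsed params.yaml that uses the short name n_est.
    params = {"prepare": {"seed": 42, "split": 0.2}, "train": {"n_est": 50}}
    print(missing_params(params, ["prepare.seed", "train.n_estimators"]))
    # ['train.n_estimators']
```

Renaming either side (dvc.yaml or params.yaml) so the names agree makes the list empty, which matches the fix described in the comment.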
@ankitraj_23 3 years ago
I am getting this error:
Traceback (most recent call last):
  File "src/featurization.py", line 59, in
    train_words = np.array(df_train.text.str.lower().values.astype("U"))
numpy.core._exceptions.MemoryError: Unable to allocate 1.85 GiB for an array with shape (20017,) and data type
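A likely cause, offered as a hedged diagnosis: astype("U") produces a fixed-width NumPy Unicode array in which every element reserves 4 bytes per character of the longest string. One very long document therefore inflates all 20,017 rows; 1.85 GiB spread over 20,017 elements is roughly 99,000 bytes, or about 25,000 characters, per element. The stdlib-only sketch below estimates that footprint before allocating anything (the function name is hypothetical). A common workaround, assuming the next step is a scikit-learn text vectorizer (these accept any iterable of strings), is to pass a plain list such as df_train.text.str.lower().tolist() instead of a "U" array.

```python
def unicode_array_bytes(texts: list[str]) -> int:
    """Estimate the size of np.array(texts).astype("U") without NumPy.

    NumPy's "U" dtype is fixed-width UCS-4: every element takes 4 bytes per
    character of the longest string in the array.
    """
    if not texts:
        return 0
    longest = max(len(t) for t in texts)
    return len(texts) * longest * 4


if __name__ == "__main__":
    short_docs = ["a short post"] * 20017
    one_long_doc = short_docs[:-1] + ["x" * 25_000]
    print(unicode_array_bytes(short_docs))            # 960816 (20017 * 12 * 4)
    print(unicode_array_bytes(one_long_doc) / 2**30)  # about 1.86 GiB
```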
@elenaponomareva3200 4 years ago
Thank you very much for the tutorials! Very inspiring and helpful :) I have a question about hyperparameter tuning with DVC (and maybe CML). Is it possible? I have a DVC pipeline and I want to test a bunch of parameter values (I am also using params.yaml), for example learning rates of [0.1, 0.01, 0.001, 0.0001]. It looks like I need to manually create a new branch and run 'dvc repro' for each learning rate, and then somehow compare these branches? Is there a more efficient way?
@dvcorg8370 4 years ago
Hi Elena, good question: you can use DVC for hyperparameter tuning. Right now, you'd want one commit per experiment (they don't have to be on different branches to use `dvc metrics diff`; you can also compare several commits on one branch!). However, we're working on a new feature for running lightweight experiments without committing each time. Check out the feature in progress here: github.com/iterative/dvc/wiki/Experiments
@prraoable 4 years ago
@dvcorg8370 Hi Elle, that was another very nice tutorial, and this is exactly the question I had as well: when building models, it's unlikely that we'd want to commit each time (I'd typically run some form of hyperparameter optimization using an external tool rather than tuning by hand). So lightweight experiments without individual commits sound super useful!
@elenaponomareva3200 4 years ago
@dvcorg8370 Thank you very much! Looks great!
@dvcorg8370 4 years ago
@prraoable Yep, this makes perfect sense. Right now, you could make one commit per automatic hyperparameter search. In a sense, the parameters of your search (say, grid size and density, or priors for a Bayesian tool) could be the experiment. But ultimately, a lighter way of exploring parameter space is needed! The model we're aiming for is: explore locally, then commit your favorite(s) at the end of the day.
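For the learning-rate grid in Elena's question: the experiments feature discussed in this thread later shipped as dvc exp run, which in DVC 2.0 and later accepts --set-param (-S) to override a params.yaml value for a single run without editing the file or committing. The sketch below only builds the command lines and prints them by default; the parameter name train.lr is hypothetical, and the real calls must run inside a DVC project.

```python
import subprocess

LEARNING_RATES = [0.1, 0.01, 0.001, 0.0001]


def grid_commands(param: str, values: list[float]) -> list[list[str]]:
    """Build one `dvc exp run` command per value, overriding `param` for that run only."""
    return [["dvc", "exp", "run", "--set-param", f"{param}={value}"] for value in values]


def run_grid(param: str, values: list[float], dry_run: bool = True) -> None:
    for cmd in grid_commands(param, values):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the grid at the first failing experiment.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_grid("train.lr", LEARNING_RATES)  # hypothetical parameter name
```

Afterwards, dvc exp show lists the experiments and their metrics side by side, and only the run you want to keep needs to be committed.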
@basicmachines 7 months ago
It says in the description that the dvc run command has been replaced with 'dvc stage add', but as far as I can see, stage add does not actually run the new pipeline stage. Would 'dvc exp run -n' work, or is the current procedure to run 'dvc stage add -n' followed by 'dvc exp run'?
@yeyerrd 3 years ago
Hi, that was a great video! Could you also give access to the data so that we can follow the whole tutorial?
@dormarcovitch8801 1 year ago
When running 'dvc repro' on the prepare stage in a new or cloned project, do I still need to run 'dvc pull' first, or does it do that behind the scenes?
@dvcorg8370 1 year ago
Hi @dormarcovitch8801! dvc repro will pull the data as defined in the prepare stage if it has not been pulled before or if it has changed. As you make changes to the following stages, if nothing changes with the data in the prepare stage, dvc repro will skip that portion of your pipeline.
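The skipping behaviour described in this reply rests on content hashes: DVC records hashes of each stage's dependencies and outputs in dvc.lock and reruns a stage only when they no longer match. The sketch below is a much simplified model of that idea using MD5 from the standard library; the in-memory lock dictionary and the function names are hypothetical, not DVC's actual code.

```python
import hashlib


def digest(data: bytes) -> str:
    """Content hash of one dependency (DVC's lock file also records MD5 hashes)."""
    return hashlib.md5(data).hexdigest()


def stage_changed(current: dict[str, bytes], locked: dict[str, str]) -> bool:
    """Return True if any dependency was added, removed, or modified since the lock."""
    current_hashes = {path: digest(data) for path, data in current.items()}
    return current_hashes != locked


if __name__ == "__main__":
    deps = {"data/data.xml": b"<posts/>", "src/prepare.py": b"print('prepare')"}
    lock = {path: digest(data) for path, data in deps.items()}
    print(stage_changed(deps, lock))  # False -> stage is skipped
    deps["src/prepare.py"] = b"print('prepare v2')"
    print(stage_changed(deps, lock))  # True -> stage reruns
```

Hashing contents rather than comparing timestamps means a fresh clone with identical data is still recognized as up to date.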
@DevSense19 4 years ago
Watch a video on DVC and CML integration for ML pipelines, with a demo: kzbin.info/www/bejne/fmK5c6aBbL2cgdU
@philiperiskallaleal6010 2 years ago
Dear all, I would like to know how I can pass the correct Conda Python environment to the dvc repro command. I have a situation (common to most Python users) where my machine has several different Python environments, all managed by Conda. In this case, I need DVC to run its pipeline (defined in the dvc.yaml file) under one specific environment. How can I do that?
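Two common approaches: activate the environment (conda activate <env>) in the shell before calling dvc repro, since stage commands inherit that shell's PATH, or write the stage's command itself as conda run -n <env> python .... As an extra safeguard, a stage script can check at startup that it runs in the expected environment. The sketch assumes conda activate exports CONDA_DEFAULT_ENV (it does in standard Conda setups); the environment name dvc-tutorial is hypothetical.

```python
import os
import sys
from typing import Mapping, Optional

EXPECTED_ENV = "dvc-tutorial"  # hypothetical Conda environment name


def in_expected_env(expected: str, environ: Optional[Mapping[str, str]] = None) -> bool:
    """True if the active Conda environment (CONDA_DEFAULT_ENV) matches `expected`."""
    env = os.environ if environ is None else environ
    return env.get("CONDA_DEFAULT_ENV") == expected


def require_env(expected: str, environ: Optional[Mapping[str, str]] = None) -> None:
    """Exit with a clear message instead of running under the wrong interpreter."""
    if not in_expected_env(expected, environ):
        raise SystemExit(f"Expected Conda env {expected!r}; running {sys.executable}")
```

Calling require_env(EXPECTED_ENV) at the top of a stage script such as src/prepare.py turns a confusing mid-pipeline failure into an immediate, explicit one.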
@davidaliaga4708 11 months ago
It seems dvc run is no longer supported, is that right?
@dvcorg8370 11 months ago
@davidaliaga4708 You are correct sir! We love an astute viewer! ❤️ dvc run was deprecated and replaced with dvc stage add to set up your stages with dependencies and outputs. You can find the documentation here: dvc.org/doc/start/data-management/data-pipelines Once your pipeline is set up, you can run dvc repro to run only the stages that have changed!
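Put together, the prepare stage from this tutorial could be declared with dvc stage add and then executed with dvc repro. The sketch assembles both commands as argument lists using the paths and parameters quoted in this thread (src/prepare.py, data/data.xml, data/prepared, prepare.seed,prepare.split); the helper names are hypothetical, and add_and_reproduce only works inside an initialized DVC repository.

```python
import shlex
import subprocess

# Paths and parameter names as quoted in this thread; the stage layout is assumed.
PREPARE = {
    "name": "prepare",
    "deps": ["src/prepare.py", "data/data.xml"],
    "outs": ["data/prepared"],
    "params": ["prepare.seed", "prepare.split"],
    "cmd": ["python", "src/prepare.py", "data/data.xml"],
}


def stage_add_command(name: str, deps: list[str], outs: list[str],
                      params: list[str], cmd: list[str]) -> list[str]:
    """Build the argv for `dvc stage add` (hypothetical helper)."""
    argv = ["dvc", "stage", "add", "-n", name]
    for dep in deps:
        argv += ["-d", dep]
    for out in outs:
        argv += ["-o", out]
    if params:
        # One comma-separated value with no spaces between parameter names.
        argv += ["-p", ",".join(params)]
    return argv + cmd


def add_and_reproduce(stage: dict) -> None:
    """Declare the stage in dvc.yaml, then run only the stages that changed."""
    subprocess.run(stage_add_command(**stage), check=True)
    subprocess.run(["dvc", "repro"], check=True)


if __name__ == "__main__":
    print(shlex.join(stage_add_command(**PREPARE)))
```

Splitting declaration (stage add) from execution (repro) is what lets repro skip stages whose inputs are unchanged.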
@fatmazehraortak2960 3 years ago
When I ran the prepare command, I got this error: ERROR: failed to run: prepare.dvc -d src/prepare.py -d data/data.xml -o data/prepared python src/prepare.py data/data.xml, exited with 127. I found a solution: you need to activate Conda with the command "conda activate".
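Some context for this fix: exit status 127 is what a POSIX shell returns when it cannot find the command to run, plausibly python (or an expected interpreter) not being on PATH until the Conda environment was activated. The stdlib sketch below reproduces the status with a deliberately nonexistent command name (hypothetical) and is meant for POSIX systems.

```python
import shutil
import subprocess


def shell_status(command: str) -> int:
    """Run `command` through /bin/sh and return its exit status."""
    return subprocess.run(command, shell=True, capture_output=True).returncode


def explain(command: str) -> str:
    status = shell_status(command)
    if status == 127:
        return f"{command!r}: command not found (exit 127); check PATH or activate your env"
    return f"{command!r}: exit {status}"


if __name__ == "__main__":
    # shutil.which() shows where (or whether) an executable is found on PATH.
    print("python resolves to:", shutil.which("python"))
    print(explain("surely-not-a-real-command-xyz"))
```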
@hemanntth6894 3 years ago
I'm not able to run the command unzip code.zip... what should I do???
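If the unzip utility is not installed (common on Windows and on minimal Linux images), Python's standard zipfile module can extract the archive instead. The sketch assumes code.zip sits in the current directory; the function name is hypothetical.

```python
import zipfile
from pathlib import Path


def extract(archive: str, dest: str = ".") -> list[str]:
    """Extract every member of `archive` into `dest` and return the member names."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        return zf.namelist()


if __name__ == "__main__":
    if Path("code.zip").exists():
        print(extract("code.zip"))
```

The same works from a terminal without writing any code: python -m zipfile -e code.zip .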