Amazing! GPU utilization tracking? That is so useful; now I can increase the batch size much more easily without fussing over nvidia-smi and so on!
@carterfendley3145 3 years ago
The "log_freq=10" made my training loop unbearably slow (on a different model than video). Granted, by most DL standards I have a slow computer. Love your stuff! Hope this saves someone a minute.
@raphaelhyde2335 1 year ago
Great video and walkthrough; I really like how you explain the details and steps, Charles!
@brandomiranda6703 3 years ago
Now I can track the gradients without any hassle? No extra get-gradients functions... nice!
@maxxrenn 1 year ago
Great Knight Rider reference: "Evil Charles with a goatee".
@vladimirfomenko489 3 years ago
Great tutorial, Charles, thanks for sharing!
@kanybekasanbekov2955 3 years ago
Does wandb support PyTorch DistributedDataParallel training? I cannot make it work...
@WeightsBiases 1 year ago
Yep, here are some docs: docs.wandb.ai/guides/track/advanced/distributed-training
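A minimal sketch of the simplest pattern, logging only from rank 0 (the project name and metric value are placeholders):

import os
import wandb

# Under a torchrun/DDP launch, each process gets a RANK env variable.
rank = int(os.environ.get("RANK", 0))

# Initialize wandb only on the main process; other ranks skip logging.
run = wandb.init(project="ddp-demo") if rank == 0 else None

# ... inside the training loop ...
if run is not None:
    run.log({"loss": 0.42})  # dummy value; log your real metrics here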
@JsoProductionChannel 5 months ago
My NN is not learning even though I have the optimizer step in my def train(model, config). Does anyone have the same problem?
@maciej12345678 1 year ago
I have a connection problem in wandb: "wandb: Network error (ConnectionError), entering retry loop." Windows 10. How do I resolve this issue?
@jakob3267 3 years ago
Awesome work, thanks for sharing!
@oluwaseuncardoso8150 11 months ago
I don't understand what "log_freq=10" means. Does it mean logging the parameters every 10 epochs, batches, or steps?
@HettyPatel 1 year ago
THIS IS AMAZING!
@brandomiranda6703 3 years ago
How does one achieve high disk utilization in PyTorch? Large batch size and num_workers?
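Something like this is what I have in mind (the numbers are guesses, not recommendations):

import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset standing in for a real disk-bound Dataset.
dataset = TensorDataset(torch.randn(1000, 32), torch.zeros(1000))

loader = DataLoader(
    dataset,
    batch_size=256,           # larger batches amortize per-batch overhead
    num_workers=8,            # workers read data in parallel with GPU compute
    pin_memory=True,          # faster host-to-device copies
    prefetch_factor=4,        # batches each worker preloads (needs workers > 0)
    persistent_workers=True,  # avoid re-spawning workers every epoch
)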
@brandomiranda6703 3 years ago
What happens if we don't call .join() or .finish()? E.g., there is a bug in the middle and it crashes... what will wandb do? Will the wandb process be closed on its own?
@WeightsBiases 3 years ago
In the case of a bug or crash somewhere in the user script, the wandb process will be closed on its own, and as part of the cleanup it will sync all information logged up to that point. If that crashes (e.g. because the issue is at the OS level or things are otherwise very on fire), the information won't be synchronized to the cloud service but it will be on disk. You can sync it later with wandb sync. Docs for that command: docs.wandb.ai/ref/cli/wandb-sync If you have more questions like these, check out the Technical FAQ of our docs: docs.wandb.ai/guides/technical-faq
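One pattern that makes the cleanup explicit is using the run as a context manager; a minimal sketch (the project name and metric are placeholders):

import wandb

# Entering wandb.init as a context manager guarantees the run is
# finished (and synced) even if the body raises an exception.
with wandb.init(project="crash-safe-demo") as run:
    for step in range(100):
        run.log({"loss": 1.0 / (step + 1)})  # dummy metric
    # a crash anywhere in this block still triggers run.finish()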
@Oliver-cn5xx 2 years ago
The gradients are enumerated like module x1.x2; what do x1 and x2 refer to?
@brucemurdock5358 6 months ago
Americans are so imprecise in their vocabulary. I understand you're trying to make the explanations more palatable, but I personally prefer someone calm, collected, and precise in their vocabulary and choice of sentences. Many academics may prefer this too. Besides that, thanks for the video.
@brandomiranda6703 3 years ago
How do things change if I am using DDP (e.g., distributed training where a bunch of different processes are running)? Do I only log from one process? That is what I usually do.
@WeightsBiases 3 years ago
There are two ways to handle it: logging from only one process is simpler, but you sacrifice the ability to see what's happening on all GPUs (which is good for debugging). Explanatory docs here: docs.wandb.ai/guides/track/advanced/distributed-training
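Here's a rough sketch of the second approach, with every process logging its own run into a shared group (the project, group, and metric names are placeholders):

import os
import wandb

rank = int(os.environ.get("RANK", 0))  # set by torchrun in a DDP launch

# Each process creates its own run; the shared group ties them together
# in the UI so you can inspect per-GPU behavior side by side.
run = wandb.init(
    project="ddp-demo",    # placeholder project name
    group="experiment-1",  # same group string for all processes
    name=f"rank-{rank}",   # distinguish the runs per GPU
)
run.log({"gpu_rank": rank})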
@asmirmuminovic5420 2 years ago
Great!
@user-jl4wk5ms4f 3 years ago
How do you count the number of classes in each image?
@user-zs1sy5nr6h 2 years ago
Are these clips based on deep learning articles?
@HarishNarayanan 3 years ago
wand-bee
@user-or7ji5hv8y 3 years ago
Fonts are so small
@WeightsBiases 3 years ago
Thanks for the feedback! We're making sure that future tutorials don't have this issue.
@FeddFGC 3 years ago
Go 720p or higher; that should do the trick. It's already perfectly readable at 720p.
@amitozazad1584 3 years ago
@FeddFGC I second this; it works at high resolution.