Can't thank you enough for your tremendous work and for helping us create massive solutions in a fraction of the time.
@orjichima6068 · 3 months ago
This is quite insightful. Could you please share the VS Code files? Thank you.
@APCMasteryPath · 2 months ago
Many thanks for your comment. Here is the link to my code with a thorough explanation: drive.google.com/file/d/1_SdKJX1t4g2s5sC1qtRZ3sShEXwBwE49/view?usp=drive_link. Hope you find it useful for your use case.
@Czarlsen · 4 months ago
Hi, thanks for the video. Is there any difference between pushing a model to the Hub as safetensors versus as pytorch_model.bin? My original model on Hugging Face was in safetensors format, but I see that after fine-tuning the model was pushed in pytorch_model.bin format. Does it make any difference?
@APCMasteryPath · 4 months ago
Many thanks for your comment. I do not think there is going to be much of a difference. Here is a link to a useful discussion about converting between safetensors and bin formats for your preferred use case: discuss.huggingface.co/t/bin-to-safetensors-without-publishing-it-on-hub/39956. Hope you find it useful. I have also just released a new video walking through the latest conversational chat template for Llama 3.1 and how to use it in two use cases. Here is the video link: kzbin.info/www/bejne/pajJqXl3lLFonZY
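For readers who want to do the conversion locally rather than via the Hub, here is a minimal sketch, assuming the checkpoint is a plain PyTorch state dict and the torch and safetensors packages are installed; the filenames are illustrative:

```python
import torch
from safetensors.torch import save_file

# Load the .bin checkpoint (a pickled state dict) on CPU.
state_dict = torch.load("pytorch_model.bin", map_location="cpu")

# safetensors rejects tensors that share storage and expects a standard
# memory layout, so detach/copy each tensor before saving.
state_dict = {k: v.clone().contiguous() for k, v in state_dict.items()}

save_file(state_dict, "model.safetensors")
```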
@Czarlsen · 4 months ago
@@APCMasteryPath Setting safe_serialization=None in model.push_to_hub_merged() solved this issue, thanks. I have another question: what do you mean at 24:09 in the video when you say "it's not read by chatbot"? I have seen a couple of chat/instruct models with only the safetensors format, and they work correctly.
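For reference, a sketch of the call being described, assuming `model` and `tokenizer` come from an Unsloth fine-tuning run; the repo id, save_method, and token are placeholders, not values from the video:

```python
# Sketch only: merge the LoRA adapters and push the merged weights to the Hub.
model.push_to_hub_merged(
    "your-username/your-model",   # hypothetical Hub repo id
    tokenizer,
    save_method="merged_16bit",   # merge adapters into 16-bit weights
    safe_serialization=None,      # the workaround mentioned in the comment above
    token="hf_...",               # Hugging Face write token (placeholder)
)
```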
@Czarlsen · 4 months ago
Also, why use max_steps instead of num_train_epochs? Shouldn't we loop over all of the training data?
@APCMasteryPath · 4 months ago
@@Czarlsen Many thanks for your comment and contribution. I am glad you sorted out your issue. The main issue I used to face was that the majority of Hugging Face models that could be loaded into chatbots through the Ollama Modelfile method had a GGUF file within the repository. When I tried to use models without an embedded GGUF, I got an error. If you have links to models where this worked without one, I would be more than happy to update my information. Many thanks again for your comments.
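As a sketch of the GGUF route being described, assuming Unsloth's GGUF export helper is available on the fine-tuned model; the output directory and quantization method are illustrative:

```python
# Sketch only: export the fine-tuned model to GGUF so Ollama can load it.
model.save_pretrained_gguf(
    "model_gguf",                  # illustrative output directory
    tokenizer,
    quantization_method="q4_k_m",  # a common 4-bit GGUF quantization
)
# The produced .gguf file can then be referenced from an Ollama Modelfile,
# e.g. a `FROM ./model_gguf/<name>.gguf` line (filename illustrative).
```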
@APCMasteryPath · 4 months ago
@@Czarlsen This is quite an important topic nowadays. You can picture it this way: if you have 1,000 rows of data to fine-tune your model, looping through all the rows once takes less than 5 minutes with Unsloth, and you might need to loop over them multiple times to make sure the model has learnt properly, which you can check in the training-loss figures. Looping once or several times is easy if you have a proper GPU and a low number of rows.

If you have a large number of rows, such as the Alpaca dataset (100k rows) or one of the use cases I released in my latest video (bills-of-quantities codification, 41k rows), one epoch can equal thousands of steps. This can be quite unfeasible if you just want a working prototype and do not have the right infrastructure for the job.

To give you a real-life example: I have an Nvidia RTX 3090 with 24 GB of memory and I run everything locally. One of the use cases was to train Llama 3.1 using the conversational chat template method on a large dataset of bills-of-quantities data (41k rows). I chose 500 steps to get the ball rolling and have a working prototype as fast and efficiently as possible. The fine-tuning time was 41 minutes, and those 500 steps were equal to circa 0.1 epoch. This means that looping once over the data would take about 410 minutes (roughly 6 hours and 50 minutes) to get a working prototype. Ideally, if I wanted to go to production, I would use a larger model and loop multiple times over the data.

Hope this gives you some insight into what is happening in the background. The link to my latest video, which sheds light on some of these aspects, can be found here: kzbin.info/www/bejne/pajJqXl3lLFonZY
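To make the arithmetic in that example concrete, here is a back-of-the-envelope sketch; the effective batch size of 8 is an assumption (a per-device batch size of 2 with gradient accumulation of 4, common Unsloth defaults), not a value stated above:

```python
# How many optimizer steps make up one epoch over the dataset?
rows = 41_000                    # bills-of-quantities dataset size from above
per_device_batch = 2             # assumed (common Unsloth default)
grad_accum = 4                   # assumed (common Unsloth default)

effective_batch = per_device_batch * grad_accum   # 8 rows per step
steps_per_epoch = rows / effective_batch          # 5,125 steps per epoch
epochs_at_500_steps = 500 / steps_per_epoch       # ~0.098, i.e. ~0.1 epoch

print(f"1 epoch ≈ {steps_per_epoch:.0f} steps; "
      f"500 steps ≈ {epochs_at_500_steps:.2f} epochs")
```

This is consistent with the figures quoted above: 500 steps covering roughly a tenth of the data, so a full pass would take roughly ten times the 41-minute run.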