Code here, including a short explanation of how to get the dataset: github.com/adidror005/youtube-videos/blob/main/LLAMA_3_Fine_Tuning_for_Sequence_Classification_Actual_Video.ipynb
@andrewbritt3117 4 months ago
Hello, thanks for the really informative walkthrough. I was looking to go back through your notebook for further review, but the notebook is no longer available at the link.
Thank you, finally a proper tutorial on classification on YouTube 🥇🥇🥇🥇
@MLAlgoTrader 2 months ago
Thanks for the kind words! Feel free to share the video if you can, lol, since the YouTube algorithm hates me. I'll do more LLM stuff.
@Daniel-fm7yh 23 days ago
Finally a legit document classification tutorial - thank you! What a legend!
@MLAlgoTrader 23 days ago
Wow, thanks for the nice comments! Share the video lol 🤣 I want to make more videos like this but am looking for the right topic. I am thinking RLHF or something; open to suggestions.
@divanabilaramdani8818 3 months ago
Thank you so much, I've watched the whole video and it helped me a lot.
@MLAlgoTrader 3 months ago
Thanks a lot lol, it is my best video haha 😂 Please share it if you can.
@am7-p 4 months ago
Once again, thank you for the informative channel and for sharing this video.
@MLAlgoTrader 4 months ago
Thanks, I thought you guys on average didn't like LLM videos lol. My click-through rate is low, so it makes me happy that you say that.
@am7-p 4 months ago
@@MLAlgoTrader What is click-through rate?
@am7-p 4 months ago
@@MLAlgoTrader Also, please consider that knowing what you are working on helps me plan the next steps of my development. Currently I use and pay for the OpenAI API, but I plan to implement a Llama model in my home lab. Once I start to learn and practice Llama, I will go through your videos again.
@MLAlgoTrader 4 months ago
This was is small like
@MLAlgoTrader 4 months ago
Honestly, it is completely random. My next videos are on sequential bootstrap, implementing a gap trading strategy with both stocks and options, the dangers of backtesting, and then I also plan to do ib_insync for beginners. ...I think Llama 3 8B works on the free version of Colab for a bit, until you get kicked off the GPU. There is also an API I used; I think you get quite a bit for free at first: docs.llama-api.com/quickstart
@최용빈-g3k a month ago
Thank you so much :) I have a question! When fine-tuning with a sequence classification head, why don't you use apply_chat_template? Is there a reason? 👀
@MLAlgoTrader a month ago
Hey, sorry I didn't respond; I didn't have time since I was moving. Let me try to remember what on earth I did and get back to you lol.
@michelejoshuamaggini3822 3 months ago
Thank you for your video! I have the following question for you: when you make predictions before fine-tuning the model, are you evaluating the model's zero-shot capabilities?
@MLAlgoTrader 3 months ago
Not exactly. A linear classification head gets added on top of the model, and those weights are not yet trained.
@michelejoshuamaggini3822 3 months ago
@@MLAlgoTrader In that case, how could I implement this classification head in your code? I'm interested in comparing zero-shot and few-shot learning with this model.
@MLAlgoTrader 3 months ago
@@michelejoshuamaggini3822 It adds these layers automatically. To be precise: "The LLaMA Model transformer with a sequence classification head on top (linear layer). LlamaForSequenceClassification uses the last token in order to do the classification, as other causal models (e.g. GPT-2) do. Since it does classification on the last token, it requires knowing the position of the last token. If a pad_token_id is defined in the configuration, it finds the last token that is not a padding token in each row. If no pad_token_id is defined, it simply takes the last value in each row of the batch. Since it cannot guess the padding tokens when inputs_embeds are passed instead of input_ids, it does the same (takes the last value in each row of the batch)." See huggingface.co/docs/transformers/main/en/model_doc/llama2. I don't have time now, but I can show you in code sometime.
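The last-non-padding-token selection described in that quote can be sketched in plain Python. This is a conceptual illustration only, not the actual Hugging Face implementation; the function name and pad id here are made up:

```python
# Conceptual sketch: how a causal-LM sequence-classification head picks
# the position whose hidden state feeds the linear classifier,
# assuming pad_token_id=0 (a placeholder value).
def last_non_pad_index(input_ids, pad_token_id=0):
    """Return the index of the last token that is not padding."""
    for i in range(len(input_ids) - 1, -1, -1):
        if input_ids[i] != pad_token_id:
            return i
    return len(input_ids) - 1  # all padding: fall back to the last position

# With right padding, the classifier reads the hidden state at index 3 here:
print(last_non_pad_index([5, 17, 42, 9, 0, 0]))  # -> 3
```

This is why the pad token must be defined when fine-tuning Llama for classification: without it, the head would read the hidden state of a padding position.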
@MLAlgoTrader 3 months ago
@@michelejoshuamaggini3822 As for zero-shot classification, that is something like: "a car is something with 4 wheels and a motorcycle is something with 2; can you please classify car or motorcycle?" What I do here is like zero-shot classification for sentiment analysis: kzbin.info/www/bejne/pH6uiqiJdrGgpsU
@AfekShusterman-x3f 4 months ago
Great video, mate! Loved it.
@MLAlgoTrader 4 months ago
Glad you enjoyed it!
@R8man012 2 months ago
@MLAlgoTrader Hi, may I ask you a question, please? I am trying to use the Llama 3 8B model for text classification. I have about 170k records and 11 categories. The maximum accuracy I was able to achieve was 68%. The data is properly preprocessed; I also used, for example, the BERT and RoBERTa models, and both had over 90% accuracy. I would expect better results from a model like Llama 3 8B. I used both 8-bit and 4-bit quantization (both give similar results) and LoRA. I also played with different hyperparameters, but the results were hardly different. Do you think that, in short, this model might be a bad choice for text classification? Could we discuss this somewhere in a private message with more details? Thank you.
@MLAlgoTrader 2 months ago
Hey, thanks for the message. I just don't have time since I need to move apartments. RoBERTa is more directly suited for classification, so it isn't that surprising. I will try to get back to you sometime; I just don't have time at all now.
@md.abusayed6789 2 months ago
@R8man012 @MLAlgoTrader I am working on a classification problem using Llama 3 with QLoRA. On 10k rows of data its performance is around 98% accuracy. The issue I am facing: for 10k rows it takes 1 hour (trainable params: 13,639,680 || all params: 7,518,572,544 || trainable%: 0.1814). How do I make it faster so it works on the whole dataset (2.4 million rows) in a reasonable amount of time?
@MLAlgoTrader 2 months ago
@@md.abusayed6789 A faster/stronger GPU, or multiple GPUs like the big boys have :)
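A back-of-the-envelope estimate, assuming training time scales roughly linearly with row count (an assumption; real throughput depends on sequence lengths, batch size, and hardware):

```python
# If 10k rows take about 1 hour on a single GPU, linear scaling suggests
# the full 2.4M-row dataset needs hundreds of GPU-hours per epoch, so
# more/faster GPUs or a smaller training sample are the main levers.
rows_small, hours_small = 10_000, 1.0
rows_full = 2_400_000

hours_full = hours_small * rows_full / rows_small
print(hours_full)      # 240.0 hours per epoch on the same hardware
print(hours_full / 8)  # 30.0 hours if the work split cleanly across 8 GPUs
```

In practice, subsampling the 2.4M rows (accuracy often saturates well below the full dataset size) is usually cheaper than scaling hardware.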
@licraig7652 3 months ago
Thank you so much for the tutorial, it's so clear. I'm wondering if I can add some context to each training text, such as an explanation of how to classify the different sentiments. I don't know if it works, but LLMs like Llama have the ability to understand context, so maybe it would help. What's your opinion?
@MLAlgoTrader 3 months ago
Do you mean describing more clearly in the prompt what you want for positive, neutral, and bearish?
@licraig7652 3 months ago
@@MLAlgoTrader Yes, I would add the prompt at the beginning of each text, something like: "Classify the text messages as 1. positive, explanation: xxxxxxx, example: xxxxx; 2. negative, explanation: xxxx, example: xxxxxx. The message is: 'Tesla's market cap soared to over $1 trillion ...'"
@MLAlgoTrader 3 months ago
For some LLMs it does better that way even before fine-tuning, but fine-tuning makes it less necessary. Check out the deeplearning.ai course on LlamaIndex; he does something similar to what you suggest.
@licraig7652 3 months ago
@@MLAlgoTrader Thank you so much. ❤
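The idea discussed in this thread, prepending label definitions (and optionally examples) to each message before classification, might be sketched like this. The label names and descriptions below are invented for illustration, not taken from the video:

```python
# Hypothetical prompt template: each training/inference text gets the label
# definitions prepended, so the model sees the classification criteria.
LABELS = {
    "positive": "the message expresses optimism about the asset",
    "negative": "the message expresses pessimism about the asset",
    "neutral":  "the message states facts without clear sentiment",
}

def build_prompt(message: str) -> str:
    lines = ["Classify the following message into one of these labels:"]
    for name, desc in LABELS.items():
        lines.append(f"- {name}: {desc}")
    lines.append(f'Message: "{message}"')
    lines.append("Label:")
    return "\n".join(lines)

print(build_prompt("Tesla's market cap soared to over $1 trillion"))
```

Whether this helps after fine-tuning is an empirical question; as noted above, fine-tuning tends to reduce the benefit of such in-prompt instructions.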
@rizmira 2 months ago
Hello, thank you so much for this video. Could you please explain how to load the model once it is saved?
@MLAlgoTrader 2 months ago
Hey, thanks for the nice words! This was so long ago I forget where I have it. Their documentation is poor on this; I'll try to find my example, but it might take me a few days.
@MLAlgoTrader 2 months ago
Hey, sorry I said I would get back to you, but I'm moving apartments so it might take longer than I expected, sorry.
@rizmira 2 months ago
@@MLAlgoTrader No problem, I'll wait! Thank you so much for taking the time to look for it.
@rizmira a month ago
@@MLAlgoTrader Hello! I hope everything is going well! I'm coming back to ask how soon you might find this; my internship ends in a few days, but I still can't load my saved model. Your help would really help me a lot. Thanks again!
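A hedged sketch of how a LoRA/QLoRA adapter saved with `save_pretrained` is commonly reloaded with `peft` — this is not verified against the video's notebook, and "my-adapter-dir", the base model id, and the label count are placeholders:

```python
# Sketch only: reload a fine-tuned classification adapter on top of the
# base model, then (optionally) merge the LoRA weights into it.
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import PeftModel

base = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",  # must match the base used for training
    num_labels=3,                  # must match the training configuration
)
base.config.pad_token_id = base.config.eos_token_id  # same pad setup as training

# Attach the saved adapter weights on top of the freshly loaded base model.
model = PeftModel.from_pretrained(base, "my-adapter-dir")
model = model.merge_and_unload()  # optional: fold LoRA weights into the base

tokenizer = AutoTokenizer.from_pretrained("my-adapter-dir")
```

The key gotcha is that the adapter directory holds only the LoRA deltas, so the base model must be re-instantiated with the same head configuration before the adapter is attached.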
@salmakhaled-hn6gw 4 months ago
Thank you so much, it is very informative. Could I ask when you will provide the notebook you worked on?
@MLAlgoTrader 4 months ago
Yes, the delay is because I need a notebook to explain how to get the data.
@MLAlgoTrader 4 months ago
So I literally was about to share the video, but I had a bug so I needed to restart. I must wait 24 hours due to an API limit, so I'll send it 25 hours from now lol!
@@salmakhaled-hn6gw No problem. There are a few more things I left out; hopefully we can cover them in another video, like loading the model and merging the QLoRA weights. Does the part about getting the data make sense? You need that to run the notebook!
@dariyanagashi8958 4 months ago
Hello! Thank you so much for your tutorial; it is very helpful and easy to follow. I started applying it on my custom binary dataset but stumbled at the training step. I get an error on this line of code: labels = inputs.pop("labels").long() → KeyError: 'labels'. My inputs look like this: ['input_ids', 'attention_mask'], and I don't understand which "labels" you are referring to in that line. If it is not difficult for you, could you explain what it means? I would be most grateful! UPD: I renamed the columns of my dataset to "text" and "labels", and it solved the issue! 😀
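The workaround in the UPD can be sketched in plain Python: the training loop pops a column literally named "labels", so custom column names must be renamed first. With the Hugging Face `datasets` library you would call `dataset.rename_column(...)` instead; the column names below are examples:

```python
# Plain-dict illustration of the fix: map custom column names to the
# "text"/"labels" names the training code expects.
def rename_columns(rows, mapping):
    """Rename dict keys per `mapping`, e.g. {"sentence": "text", "category": "labels"}."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in rows]

rows = [{"sentence": "Stocks soared today", "category": 1}]
fixed = rename_columns(rows, {"sentence": "text", "category": "labels"})
print(fixed[0])  # {'text': 'Stocks soared today', 'labels': 1}
```

This explains the KeyError: the tokenized batch only carried 'input_ids' and 'attention_mask' because no column named "labels" existed to pass through.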
@MLAlgoTrader 4 months ago
I will get back to you.
@MLAlgoTrader 4 months ago
Hey, sorry I haven't gotten to this. I haven't forgotten; I will look sometime this week, just overwhelmed.
@dariyanagashi8958 4 months ago
@@MLAlgoTrader Hi! I actually updated my comment: I found a workaround for that issue, although I still only vaguely understand how it helped. I need to read more documentation, I guess. Anyway, thank you for your tutorial; it helped me with my thesis 😊
@MLAlgoTrader 4 months ago
Wow, very happy to hear!!!
@MLAlgoTrader 4 months ago
Your comment made my day. I'll do more videos related to NLP/LLM/RAG etc. soon, I hope.
@khachapuri_ 4 months ago
Is there a way to remove the causal attention mask from Llama-3 to turn it into a giant BERT (an encoder-only transformer)?
@MLAlgoTrader 4 months ago
Being on 0 sleep, I'll quote ChatGPT and get back to answering you later lol... Turning Llama-3 into an encoder-only transformer like BERT by removing the attention mask is theoretically possible but involves more than just altering the attention mechanism. Here are the steps and considerations:
1. Modify the attention mechanism: In Llama-3, an autoregressive transformer like GPT-3, each token can only attend to previous tokens. To make it behave like BERT, you need to allow each token to attend to all other tokens in the sequence. This means changing the attention mask settings in the transformer's layers.
2. Change the training objective: BERT uses a masked language model (MLM) objective, where some percentage of the input tokens are masked and the model predicts them. You would need to implement this objective for the modified Llama-3.
3. Adjust the tokenizer and inputs: BERT is trained with pairs of sentences as inputs (for tasks like next-sentence prediction) and uses special tokens (like [CLS] and [SEP]) to distinguish between sentences. You would need to adapt the tokenizer and preprocessing accordingly.
4. Retrain the model: Even after these modifications, the model would need to be retrained from scratch or fine-tuned extensively, because the pre-existing weights were optimized for a different architecture and objective.
5. Software and implementation: Make sure the transformer library you're using supports these customizations. Libraries like Hugging Face Transformers are quite flexible and might be useful here.
This transformation essentially creates a new model, leveraging the architecture of Llama-3 but fundamentally changing its operation and purpose. Such a project would be substantial and complex, but interesting from a research and development perspective.
@khachapuri_ 4 months ago
@@MLAlgoTrader Thank you so much, I appreciate the response! Since it's a classification task, it makes sense to remove the mask (make it encoder-only) and retrain the model on another objective function. I was just wondering, technically, how would you remove the mask from Llama-3? And maybe also add a feedforward layer? Is it possible to edit the architecture like that?
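One way to picture the change being asked about, in plain Python: a causal (decoder) mask lets token i attend only to positions ≤ i, while an encoder mask lets every token attend to every other. Actually patching Llama-3 would mean changing how this mask is constructed inside each attention layer, which Hugging Face does not expose as a simple switch:

```python
# Conceptual sketch of the two mask shapes (1 = may attend, 0 = blocked).
def causal_mask(n):
    """Decoder-style mask: token i attends to tokens 0..i only."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

def bidirectional_mask(n):
    """Encoder-style (BERT-like) mask: every token attends to every token."""
    return [[1] * n for _ in range(n)]

print(causal_mask(3))         # [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
print(bidirectional_mask(3))  # [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
```

Swapping the first mask for the second is the mechanical part; as the quoted reply notes, the pretrained weights were optimized under the causal mask, so the model would need substantial retraining to be useful bidirectionally.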
@amitocamitoc2294 4 months ago
Interesting!
@MLAlgoTrader 4 months ago
Glad you think so!
@azkarathore4355 3 months ago
Can you make a video on fine-tuning Llama 3 for a machine translation task?
@MLAlgoTrader 3 months ago
Always down for a new idea, but I don't know if I can get to that soon. I had an idea to do text summarization, which can be done with a similar architecture to machine translation, but with different metrics of course.
@MLAlgoTrader 3 months ago
What I mean is I am willing, but I probably can't do it in the next two months. What languages were you thinking of, just wondering?
@azkarathore4355 3 months ago
@@MLAlgoTrader English to Urdu.
@aibutsimple 4 months ago
Please provide the notebook, sir.
@MLAlgoTrader 4 months ago
It will be available later today. It is just useless if you can't get the data, and I can't get the data till evening.