Code here, including a short explanation of how to get the dataset: github.com/adidror005/youtube-videos/blob/main/LLAMA_3_Fine_Tuning_for_Sequence_Classification_Actual_Video.ipynb
@andrewbritt3117 4 months ago
Hello, thanks for the really informative walkthrough. I was looking to go back through your notebook for further review, but the notebook is no longer available at the link.
Thank you, finally a proper tutorial on classification on YouTube 🥇🥇🥇🥇
@MLAlgoTrader 2 months ago
Thanks for the kind words! Feel free to share the video if you can, lol, since the YouTube algorithm hates me. I'll do more LLM stuff.
@Daniel-fm7yh 23 days ago
Finally a legit document classification tutorial - thank you! What a legend!
@MLAlgoTrader 23 days ago
Wow, thanks for the nice comments! Share the video lol 🤣 I want to make more videos like this but am looking for the right topic. I am thinking RLHF or something; open to suggestions.
@divanabilaramdani8818 3 months ago
Thank you so much, I've watched the whole video and it helped me a lot.
@MLAlgoTrader 3 months ago
Thanks a lot lol, it is my best video haha 😂 Please share it if you can.
@am7-p 4 months ago
Once again, thank you for the informative channel and for sharing this video.
@MLAlgoTrader 4 months ago
Thanks, I thought you guys on average didn't like LLM videos lol. My click-through rate is low, so it makes me happy that you say that.
@am7-p 4 months ago
@@MLAlgoTrader What is click-through rate?
@am7-p 4 months ago
@@MLAlgoTrader Also, please consider that knowing what you are working on helps me plan the next steps of my development. Currently I use and pay for the OpenAI API, but I plan to implement a Llama model in my home lab. Once I start to learn and practice Llama, I will go through your videos again.
@MLAlgoTrader 4 months ago
This was is small like
@MLAlgoTrader 4 months ago
Honestly, it is completely random. My next videos are on sequential bootstrap, implementing a gap trading strategy with both stocks and options, the dangers of backtesting, and then I also plan to do ib_insync for beginners. ...I think Llama 3 8B works on the free version of Colab for a bit, until you get kicked off the GPU. There is also an API I used; I think you get quite a bit for free at first: docs.llama-api.com/quickstart
@최용빈-g3k a month ago
Thank you so much :) I have a question! When fine-tuning with a sequence classification head, why don't you use apply_chat_template? Is there a reason? 👀
@MLAlgoTrader a month ago
Hey, sorry I didn't respond; I didn't have time since I was moving. Let me try to remember what on earth I did and get back to you lol.
@michelejoshuamaggini3822 3 months ago
Thank you for your video! I have the following question for you: when you make predictions before fine-tuning the model, are you evaluating the model's zero-shot capabilities?
@MLAlgoTrader 3 months ago
Not exactly. A linear classification head gets added on top of the model, and those weights are not yet trained.
@michelejoshuamaggini3822 3 months ago
@@MLAlgoTrader In that case, how could I implement this classification head in your code? I'm interested in comparing zero-shot and few-shot learning with this model.
@MLAlgoTrader 3 months ago
@@michelejoshuamaggini3822 It adds these layers automatically. To be precise: "The LLaMA Model transformer with a sequence classification head on top (linear layer). LlamaForSequenceClassification uses the last token in order to do the classification, as other causal models (e.g. GPT-2) do. Since it does classification on the last token, it requires knowing the position of the last token. If a pad_token_id is defined in the configuration, it finds the last token that is not a padding token in each row. If no pad_token_id is defined, it simply takes the last value in each row of the batch. Since it cannot guess the padding tokens when inputs_embeds are passed instead of input_ids, it does the same (takes the last value in each row of the batch)." See huggingface.co/docs/transformers/main/en/model_doc/llama2. I don't have time now, but I can show you in code sometime.
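The last-non-padding-token selection described in that quote can be sketched in plain Python. This is a conceptual illustration only, not the actual Hugging Face implementation; the function name and pad id here are made up:

```python
# Conceptual sketch: how a causal-LM sequence-classification head picks
# the position whose hidden state feeds the linear classifier,
# assuming pad_token_id=0 (a placeholder value).
def last_non_pad_index(input_ids, pad_token_id=0):
    """Return the index of the last token that is not padding."""
    for i in range(len(input_ids) - 1, -1, -1):
        if input_ids[i] != pad_token_id:
            return i
    return len(input_ids) - 1  # all padding: fall back to the last position

# With right padding, the classifier reads the hidden state at index 3 here:
print(last_non_pad_index([5, 17, 42, 9, 0, 0]))  # -> 3
```

This is why the pad token must be defined when fine-tuning Llama for classification: without it, the head would read the hidden state of a padding position.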
@MLAlgoTrader 3 months ago
@@michelejoshuamaggini3822 As for zero-shot classification, that is something like: "a car is something with 4 wheels and a motorcycle is something with 2; can you please classify car or motorcycle?" What I do here is like zero-shot classification for sentiment analysis: kzbin.info/www/bejne/pH6uiqiJdrGgpsU
@AfekShusterman-x3f 4 months ago
Great video, mate! Loved it.
@MLAlgoTrader 4 months ago
Glad you enjoyed it!
@R8man012 2 months ago
@MLAlgoTrader Hi, may I ask you a question, please? I am trying to use the Llama 3 8B model for text classification. I have about 170k records and 11 categories. The maximum accuracy I was able to achieve was 68%. The data is properly preprocessed; I also used, for example, the BERT and RoBERTa models, and both had over 90% accuracy. I would expect better results from a model like Llama 3 8B. I used both 8-bit and 4-bit quantization (both give similar results) and LoRA. I also played with different hyperparameters, but the results were hardly different. Do you think that, in short, this model might be a bad choice for text classification? Could we discuss this somewhere in a private message with more details? Thank you.
@MLAlgoTrader 2 months ago
Hey, thanks for the message. I just don't have time since I need to move apartments. RoBERTa is more directly suited for classification, so it isn't that surprising. I will try to get back to you sometime; I just don't have time at all now.
@md.abusayed6789 2 months ago
@R8man012 @MLAlgoTrader I am working on a classification problem using Llama 3 with QLoRA. On 10k rows of data its performance is around 98% accuracy. The issue I am facing: for 10k rows it takes 1 hour (trainable params: 13,639,680 || all params: 7,518,572,544 || trainable%: 0.1814). How do I make it faster so it works on the whole dataset (2.4 million rows) in a reasonable amount of time?
@MLAlgoTrader 2 months ago
@@md.abusayed6789 A faster/stronger GPU, or multiple GPUs like the big boys have :)
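A back-of-the-envelope estimate, assuming training time scales roughly linearly with row count (an assumption; real throughput depends on sequence lengths, batch size, and hardware):

```python
# If 10k rows take about 1 hour on a single GPU, linear scaling suggests
# the full 2.4M-row dataset needs hundreds of GPU-hours per epoch, so
# more/faster GPUs or a smaller training sample are the main levers.
rows_small, hours_small = 10_000, 1.0
rows_full = 2_400_000

hours_full = hours_small * rows_full / rows_small
print(hours_full)      # 240.0 hours per epoch on the same hardware
print(hours_full / 8)  # 30.0 hours if the work split cleanly across 8 GPUs
```

In practice, subsampling the 2.4M rows (accuracy often saturates well below the full dataset size) is usually cheaper than scaling hardware.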
@licraig7652 3 months ago
Thank you so much for the tutorial, it's so clear. I'm wondering if I can add some context to each training text, such as an explanation of how to classify the different sentiments. I don't know if it works, but LLMs like Llama have the ability to understand context, so maybe it would help. What's your opinion?
@MLAlgoTrader 3 months ago
Do you mean describing more clearly in the prompt what you want for positive, neutral, and bearish?
@licraig7652 3 months ago
@@MLAlgoTrader Yes, I would add the prompt at the beginning of each text, something like: "Classify the text messages as 1. positive, explanation: xxxxxxx, example: xxxxx; 2. negative, explanation: xxxx, example: xxxxxx. The message is: 'Tesla's market cap soared to over $1 trillion ...'"
@MLAlgoTrader 3 months ago
For some LLMs it does better that way even before fine-tuning, but fine-tuning makes it less necessary. Check out the deeplearning.ai course on LlamaIndex; he does something similar to what you suggest.
@licraig7652 3 months ago
@@MLAlgoTrader Thank you so much. ❤
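The idea discussed in this thread, prepending label definitions (and optionally examples) to each message before classification, might be sketched like this. The label names and descriptions below are invented for illustration, not taken from the video:

```python
# Hypothetical prompt template: each training/inference text gets the label
# definitions prepended, so the model sees the classification criteria.
LABELS = {
    "positive": "the message expresses optimism about the asset",
    "negative": "the message expresses pessimism about the asset",
    "neutral":  "the message states facts without clear sentiment",
}

def build_prompt(message: str) -> str:
    lines = ["Classify the following message into one of these labels:"]
    for name, desc in LABELS.items():
        lines.append(f"- {name}: {desc}")
    lines.append(f'Message: "{message}"')
    lines.append("Label:")
    return "\n".join(lines)

print(build_prompt("Tesla's market cap soared to over $1 trillion"))
```

Whether this helps after fine-tuning is an empirical question; as noted above, fine-tuning tends to reduce the benefit of such in-prompt instructions.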
@rizmira 2 months ago
Hello, thank you so much for this video. Could you please explain how to load the model once it is saved?
@MLAlgoTrader 2 months ago
Hey, thanks for the nice words! This was so long ago I forget where I have it. Their documentation is poor on this; I'll try to find my example, but it might take me a few days.
@MLAlgoTrader 2 months ago
Hey, sorry I said I would get back to you, but I'm moving apartments so it might take longer than I expected, sorry.
@rizmira 2 months ago
@@MLAlgoTrader No problem, I'll wait! Thank you so much for taking the time to look for it.
@rizmira a month ago
@@MLAlgoTrader Hello! I hope everything is going well! I'm coming back to ask how soon you might find this; my internship ends in a few days, but I still can't load my saved model. Your help would really help me a lot. Thanks again!
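A hedged sketch of how a LoRA/QLoRA adapter saved with `save_pretrained` is commonly reloaded with `peft` — this is not verified against the video's notebook, and "my-adapter-dir", the base model id, and the label count are placeholders:

```python
# Sketch only: reload a fine-tuned classification adapter on top of the
# base model, then (optionally) merge the LoRA weights into it.
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import PeftModel

base = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",  # must match the base used for training
    num_labels=3,                  # must match the training configuration
)
base.config.pad_token_id = base.config.eos_token_id  # same pad setup as training

# Attach the saved adapter weights on top of the freshly loaded base model.
model = PeftModel.from_pretrained(base, "my-adapter-dir")
model = model.merge_and_unload()  # optional: fold LoRA weights into the base

tokenizer = AutoTokenizer.from_pretrained("my-adapter-dir")
```

The key gotcha is that the adapter directory holds only the LoRA deltas, so the base model must be re-instantiated with the same head configuration before the adapter is attached.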
@salmakhaled-hn6gw 4 months ago
Thank you so much, it is very informative. Could I ask when you will provide the notebook you worked on?
@MLAlgoTrader 4 months ago
Yes, the delay is because I need a notebook to explain how to get the data.
@MLAlgoTrader 4 months ago
So I literally was about to share the video, but I had a bug so I needed to restart. I must wait 24 hours due to an API limit, so I'll send it 25 hours from now lol!
@@salmakhaled-hn6gw No problem. There are a few more things I left out; hopefully we can cover them in another video, like loading the model and merging the QLoRA weights. Does the part about getting the data make sense? You need that to run the notebook!
@dariyanagashi8958 4 months ago
Hello! Thank you so much for your tutorial; it is very helpful and easy to follow. I started applying it on my custom binary dataset but stumbled at the training step. I get an error on this line of code: labels = inputs.pop("labels").long() → KeyError: 'labels'. My inputs look like this: ['input_ids', 'attention_mask'], and I don't understand which "labels" you are referring to in that line. If it is not difficult for you, could you explain what it means? I would be most grateful! UPD: I renamed the columns of my dataset to "text" and "labels", and it solved the issue! 😀
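The workaround in the UPD can be sketched in plain Python: the training loop pops a column literally named "labels", so custom column names must be renamed first. With the Hugging Face `datasets` library you would call `dataset.rename_column(...)` instead; the column names below are examples:

```python
# Plain-dict illustration of the fix: map custom column names to the
# "text"/"labels" names the training code expects.
def rename_columns(rows, mapping):
    """Rename dict keys per `mapping`, e.g. {"sentence": "text", "category": "labels"}."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in rows]

rows = [{"sentence": "Stocks soared today", "category": 1}]
fixed = rename_columns(rows, {"sentence": "text", "category": "labels"})
print(fixed[0])  # {'text': 'Stocks soared today', 'labels': 1}
```

This explains the KeyError: the tokenized batch only carried 'input_ids' and 'attention_mask' because no column named "labels" existed to pass through.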
@MLAlgoTrader 4 months ago
I will get back to you.
@MLAlgoTrader 4 months ago
Hey, sorry I haven't gotten to this. I haven't forgotten; I will look sometime this week, just overwhelmed.
@dariyanagashi8958 4 months ago
@@MLAlgoTrader Hi! I actually updated my comment: I found a workaround for that issue, although I still only vaguely understand how it helped. I need to read more documentation, I guess. Anyway, thank you for your tutorial; it helped me with my thesis 😊
@MLAlgoTrader 4 months ago
Wow, very happy to hear!!!
@MLAlgoTrader 4 months ago
Your comment made my day. I'll do more videos related to NLP/LLM/RAG etc. soon, I hope.
@khachapuri_ 4 months ago
Is there a way to remove the causal attention mask from Llama-3 to turn it into a giant BERT (an encoder-only transformer)?
@MLAlgoTrader 4 months ago
Being on 0 sleep, I'll quote ChatGPT and get back to answering you later lol... Turning Llama-3 into an encoder-only transformer like BERT by removing the attention mask is theoretically possible but involves more than just altering the attention mechanism. Here are the steps and considerations:
1. Modify the attention mechanism: In Llama-3, an autoregressive transformer like GPT-3, each token can only attend to previous tokens. To make it behave like BERT, you need to allow each token to attend to all other tokens in the sequence. This means changing the attention mask settings in the transformer's layers.
2. Change the training objective: BERT uses a masked language model (MLM) objective, where some percentage of the input tokens are masked and the model predicts them. You would need to implement this objective for the modified Llama-3.
3. Adjust the tokenizer and inputs: BERT is trained with pairs of sentences as inputs (for tasks like next-sentence prediction) and uses special tokens (like [CLS] and [SEP]) to distinguish between sentences. You would need to adapt the tokenizer and preprocessing accordingly.
4. Retrain the model: Even after these modifications, the model would need to be retrained from scratch or fine-tuned extensively, because the pre-existing weights were optimized for a different architecture and objective.
5. Software and implementation: Make sure the transformer library you're using supports these customizations. Libraries like Hugging Face Transformers are quite flexible and might be useful here.
This transformation essentially creates a new model, leveraging the architecture of Llama-3 but fundamentally changing its operation and purpose. Such a project would be substantial and complex, but interesting from a research and development perspective.
@khachapuri_ 4 months ago
@@MLAlgoTrader Thank you so much, I appreciate the response! Since it's a classification task, it makes sense to remove the mask (make it encoder-only) and retrain the model on another objective function. I was just wondering, technically, how would you remove the mask from Llama-3? And maybe also add a feedforward layer? Is it possible to edit the architecture like that?
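One way to picture the change being asked about, in plain Python: a causal (decoder) mask lets token i attend only to positions ≤ i, while an encoder mask lets every token attend to every other. Actually patching Llama-3 would mean changing how this mask is constructed inside each attention layer, which Hugging Face does not expose as a simple switch:

```python
# Conceptual sketch of the two mask shapes (1 = may attend, 0 = blocked).
def causal_mask(n):
    """Decoder-style mask: token i attends to tokens 0..i only."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

def bidirectional_mask(n):
    """Encoder-style (BERT-like) mask: every token attends to every token."""
    return [[1] * n for _ in range(n)]

print(causal_mask(3))         # [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
print(bidirectional_mask(3))  # [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
```

Swapping the first mask for the second is the mechanical part; as the quoted reply notes, the pretrained weights were optimized under the causal mask, so the model would need substantial retraining to be useful bidirectionally.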
@amitocamitoc2294 4 months ago
Interesting!
@MLAlgoTrader 4 months ago
Glad you think so!
@azkarathore4355 3 months ago
Can you make a video on fine-tuning Llama 3 for a machine translation task?
@MLAlgoTrader 3 months ago
Always down for a new idea, but I don't know if I can get to that soon. I had an idea to do text summarization, which can be done with a similar architecture to machine translation, but with different metrics of course.
@MLAlgoTrader 3 months ago
What I mean is I am willing, but I probably can't do it in the next two months. What languages were you thinking of, just wondering?
@azkarathore4355 3 months ago
@@MLAlgoTrader English to Urdu.
@aibutsimple 4 months ago
Please provide the notebook, sir.
@MLAlgoTrader 4 months ago
It will be available later today. It is just useless if you can't get the data, and I can't get the data till evening.