WARNING: Bad News for LLM Fine-Tuning

6,473 views

1littlecoder

20 days ago

🔗 Links 🔗
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? Gekhman et al.!
arxiv.org/pdf/2405.05904v2
❤️ If you want to support the channel ❤️
Support here:
Patreon - / 1littlecoder
Ko-Fi - ko-fi.com/1littlecoder
🧭 Follow me on 🧭
Twitter - / 1littlecoder
Linkedin - / amrrs

Comments: 64
@unclecode 18 days ago
Hey, as usual, such a good paper you brought up. I read this paper a few days ago, and tbh I'm a bit skeptical. Let me share my points; I'd like to know yours. First, I totally agree with what fine-tuning is really about: it's not about teaching the model new facts, but helping it access its existing knowledge more efficiently or in specific ways. Like if we want the model to always respond in JSON format, we'd fine-tune for that; we're not teaching it new info, just tweaking how it presents what it knows.

Now, I've got three main concerns with this study:

1/ They didn't mention how much of the model they actually fine-tuned. If they used something like LoRA, which is common, they're only training a tiny fraction of a massive model. That's a major flaw in their methodology, because fine-tuning a small portion of the model on unknown knowledge could just be adding noise to the model's activations, leading to hallucinations. This could invalidate their whole claim.

2/ They only tested on a huge model like PaLM 2-M, which probably has over 340 billion parameters (if I'm not wrong). What if the results are totally different for smaller models, like a 7B one? We really need to see this tested on a range of model sizes to draw any meaningful conclusions.

3/ What if they fine-tuned most of the model, like 80%? That'd be more like pre-training, and the results could be way different. They didn't explore this scenario at all.

These gaps make me skeptical about how useful or generalizable their findings really are. It feels like they missed some crucial aspects of how fine-tuning actually works in practice. I couldn't find these details in their paper. To be honest, I didn't go through it in detail; perhaps I have to check it again. Kudos to your taste in selecting papers for your channel.
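To put point 1 in numbers, here's a minimal sketch (assuming the Hugging Face transformers and peft libraries; the model name is just an example) of how small a fraction of a model LoRA actually trains:

```python
# Minimal sketch: how little of a model LoRA actually trains.
# Assumes Hugging Face `transformers` + `peft`; model name is illustrative.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, config)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} of {total:,} ({100 * trainable / total:.3f}%)")
# With r=8 on only the q/v projections of a 7B model, this lands well
# under 0.1% of the weights -- the scale the concern above is about.
```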
@delojikuhlii1 18 days ago
What do you think about prompt tuning/learning as a solution? It showed good results without fine-tuning the whole model. Is there a similar study for this approach?
@davidlepold 18 days ago
@delojikuhlii1 Prompt tuning? You mean "simple" prompt optimization?
@mirek190 18 days ago
You just described teaching the model to follow instructions better. That's a different learning method.
@unclecode 16 days ago
@delojikuhlii1 Very good point. In-context learning and prompt tuning are fabulous, with a really interesting impact. I suggest you check out Anthropic's documentation; they've done a lot of intriguing research, and their new 3.5 model uses this approach effectively. One community member deciphered the system prompt, showing how much we can improve a model with in-context learning.

The best approach is to start with in-context learning and experiment. You might think you're at the edge of the model's ability, but often a new trick works better. If in-context learning hits its limit, then play around with RAG, which is another form of in-context learning, injecting facts and knowledge. When you see the model's issue isn't about knowledge but the response style, it's time for fine-tuning. Many times, a small training dataset is enough to improve the model without causing confusion or hallucinations. LLMs are trained to be chatty and provide answers no matter what.

So: start with in-context learning, move to RAG, and then fine-tuning. In fine-tuning, consider the kind of data, the amount, and how much of the model's parameters you want to fine-tune. Decide which layers to freeze and which to make trainable. There's a lot to discuss, and it's really fun. I suggest everyone explore this, as it helps you understand how these models think and act.
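For the "decide which layers to freeze and which to make trainable" part, a minimal sketch, assuming a LLaMA-style transformers model (the attribute paths differ across architectures):

```python
# Sketch: freeze everything, then unfreeze only the last two transformer
# blocks (and the output head) for fine-tuning. Assumes a LLaMA-style
# model where the blocks live under `model.model.layers`.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

for param in model.parameters():
    param.requires_grad = False            # freeze the whole network

for block in model.model.layers[-2:]:      # make the last 2 blocks trainable
    for param in block.parameters():
        param.requires_grad = True

for param in model.lm_head.parameters():   # keep the output head trainable
    param.requires_grad = True
```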
@delojikuhlii1 1 day ago
@mirek190 In the end, isn't fine-tuning also adapting the model to do the task you want, just better? What do you understand its goal to be?
@Macorelppa 18 days ago
Love your videos 😊
@nocturnomedieval 18 days ago
Great. There's also a late-June paper that appeared in Nature, applying semantic entropy to detect hallucinations. You can keep it in your backlog for calmer weeks.
@fhsp17 18 days ago
Thumbnail: STOP FINE-TUNING. Opens the video. It's discussing a Google paper with a clickbait title. They state way more than this paper entitles them to, as if it were a general answer for every use case and every method. It's just for their own controlled closed-book setup, using precisely whatever method they used to fine-tune (which should be meticulously described in the paper for validation, because it's the only one the results are useful for). No more. Lol.
@dlyog 18 days ago
Great work and completely agree
@marcfruchtman9473 17 days ago
I have to say I'm not convinced. We would need to see more examples where this adversely affects different models. Also, my guess is that the method used to fine-tune will have a different effect. Another issue is that I'm not seeing much in the way of specifics. I would like to see the full set of questions with answers (without fine-tuning) versus the hallucinated responses from the fine-tuned model, to see how it correlates with their definitions of hallucinations.
@1littlecoder 17 days ago
There was another paper about fine-tuning and knowledge forgetting, let me see if I can find it!
@aks8285 18 days ago
I can correlate this with my experience with vision models; they also behave similarly under fine-tuning, like you said.
@noorahmadharal 18 days ago
How do you find new papers on LLMs?
@testales 18 days ago
I wonder what the implications of this are for fine-tuning diffusion models, or whether that is a completely different story.
@DB-Barrelmaker 18 days ago
I've thought since last year that the miracle of LLMs was that they managed to understand referencing, a.k.a. linguistic pointers. The increase in hallucination upon fine-tuning clearly points to a negative on that front. That means the door is open!
@medirobot96 18 days ago
How do we know whether the data we use in fine-tuning is unknown to the LLM or not?
@mirek190 18 days ago
Ask the model?
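That is, in fact, roughly what the paper does more systematically: sample the model several times per question before fine-tuning, and call a fact "known" only if the right answer ever shows up. A rough sketch of the idea (the pipeline model and the question/answer pair are placeholders, not from the paper):

```python
# Rough sketch of probing whether a fact is already "known" to a model:
# sample several answers and check if the expected answer ever appears.
# The model name and Q/A pair are placeholders, not from the paper.
from transformers import pipeline

generate = pipeline("text-generation", model="gpt2")

def seems_known(question: str, answer: str, n_samples: int = 8) -> bool:
    outputs = generate(
        question,
        do_sample=True,
        temperature=0.7,
        max_new_tokens=16,
        num_return_sequences=n_samples,
    )
    return any(answer.lower() in out["generated_text"].lower() for out in outputs)

print(seems_known("Q: What is the capital of France?\nA:", "paris"))
```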
@KevinKreger 17 days ago
I can enhance hallucinations with one ICL example if there is a near void in that space.
@Tony-cw6om 18 days ago
Where can we find similar papers, to know what's happening and learn new things?
@supreme4256 18 days ago
I wonder too. How can we know this is the most up-to-date thing we should know about?
@MichaelBarry-gz9xl 18 days ago
Hugging Face Papers. arXiv.
@MichaelBarry-gz9xl 18 days ago
Papers with Code.
@Tony-cw6om 18 days ago
Thanks, I'll look at them. Let me know if there are other websites/sources as well.
@SonGoku-pc7jl 18 days ago
Yes, the best example of fine-tuning I've seen is for style of speech: making the model sound like somebody by fine-tuning on, for example, a lot of interview transcripts. As you said, it serves the style.
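A tiny illustration of that use case: hypothetical style-only training samples in a chat fine-tuning format, where the answers carry no new facts and only the interviewee's voice changes (the contents below are made up):

```python
# Hypothetical style-tuning data in a chat fine-tuning format: the answers
# teach the model no new facts, only the interview-style voice we want.
style_samples = [
    {"messages": [
        {"role": "user", "content": "What got you into machine learning?"},
        {"role": "assistant", "content": "You know, it's funny you ask that. "
         "Back in the day I just wanted to automate a spreadsheet..."},
    ]},
    {"messages": [
        {"role": "user", "content": "What's your advice for beginners?"},
        {"role": "assistant", "content": "Honestly? Ship something small. "
         "I always tell people the first project teaches you the most."},
    ]},
]
```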
@Basant5911 18 days ago
Fine-tuning creates misalignment in the weights, so do it with caution.
@neffex-purpose 17 days ago
@1littlecoder Could you please post a video on DSPy?
@__________________________6910 18 days ago
Ohhh Noooo
@therobotocracy 17 days ago
How about diffusion models? Fine-tuning is night and day!
@elon-69-musk 18 days ago
👍
@freeideas 18 days ago
I find this disturbing. How, then, do we give an LLM new knowledge? RAG makes the prompt quite a bit larger and more expensive, and there are a few pieces of information that will be fed to the LLM in the prompt over and over; it seems way more efficient to teach the LLM. One example: baby otters are very good swimmers, but they can't dive because too much air is trapped in their fur. This is too obscure for most LLMs to know, but this information will dramatically affect the quality of reasoning about the lives of baby otters. Do I need to feed that, plus 1,000 other obscure truths, into an LLM's prompt every time the LLM is used? Apologies if the answer is already in the video, but it was not clear to my simple mind. :)
@MichaelBarry-gz9xl 18 days ago
Continued pretraining imbues it with new knowledge. Fine-tuning only affects the "style", i.e. the way it expresses what it already knows. That being said, a mixture of RAG and FT is about as good as you'll get, unless you've got a small fortune to spend.
@freeideas 18 days ago
So let's say I want to teach an LLM all about the Star Trek show. If I fine-tune it on the Star Trek Wikipedia pages, the Klingon dictionary, and the transcript of every Star Trek episode and movie, can I at least depend on that model to get Star Trek questions correct most of the time, perhaps at the expense of real-world knowledge? Using RAG for this purpose would probably not work well for questions like "which two species would be most likely to ally against Humans?", because, given a large number of species, too many vectors would be pulled to feasibly compare every pair. But good background knowledge might lead to an easy, plausible answer.
@MichaelBarry-gz9xl 18 days ago
First, check to see if the knowledge already exists in the LLM. If so, you're good to go. But if it's not already in there, it's going to hallucinate like crazy. It depends on whether the new data is in or out of distribution of the previous data (think Venn diagrams): the wider the gap between the "circles", the greater the hallucinations. There's a good chance models like Llama 3 already have what you want, considering the sheer volume of pretraining data. But let's pretend it knows nothing about Star Trek. Then you implement a vector database and use FT to teach it how to pull the data out of the vector database. It will be stuck with the reasoning abilities it already has, and you're effectively specialising it into one task at the expense of others, but it will do the trick. It won't have "depth of understanding"; it will basically be Google on steroids. But if the data is in distribution (overlapping Venn diagrams), then it will have a deep "depth of knowledge". That's the difference between pretraining (deeply inter-connected neurons) and fine-tuning (a shallow layer of loosely connected neurons).
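For the vector-database half of that suggestion, a minimal retrieval sketch, assuming the sentence-transformers library (the documents and query are illustrative):

```python
# Minimal RAG sketch: embed documents, retrieve the closest one for a
# query, and prepend it to the prompt. Assumes `sentence-transformers`;
# the documents and query are illustrative.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")
docs = [
    "The Klingons and Romulans formed a short-lived alliance.",
    "Vulcans value logic above emotion.",
]
doc_vecs = embedder.encode(docs, convert_to_tensor=True)

query = "Which two species would be most likely to ally against Humans?"
query_vec = embedder.encode(query, convert_to_tensor=True)
best = util.semantic_search(query_vec, doc_vecs, top_k=1)[0][0]

prompt = f"Context: {docs[best['corpus_id']]}\n\nQuestion: {query}\nAnswer:"
print(prompt)
```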
@MichaelBarry-gz9xl 18 days ago
@freeideas Also, the best data by far would be everything you mentioned PLUS scraping every website you can find where people talk about Star Trek, i.e. chat logs. You may think those chat logs and forums are low quality, but believe me, unstructured data is far superior to structured, wiki-like data. To have both is even better.
@freeideas 18 days ago
@MichaelBarry-gz9xl Yes, that's good enough; at least I can teach an LLM knowledge about Star Trek, possibly causing it to forget knowledge about Star Wars and the real world. Of course, we are both theorizing and speculating until I actually try it, but at least you made me feel better about trying. :)
@ChristianNode 18 days ago
Just fully retrain the model on the new data.
@Cat-vs7rc 18 days ago
Fine-tuning is just additional pre-training.
@MichaelBarry-gz9xl 18 days ago
No, far from it.
@Cat-vs7rc 17 days ago
@MichaelBarry-gz9xl Why is it far from it?
@MichaelBarry-gz9xl 17 days ago
@Cat-vs7rc They're completely different. After pretraining, the model is imbued with knowledge, but it spits out random garbage. Fine-tuning is for showing it how to format its outputs correctly. Otherwise it just goes off on a tangent.
@Cat-vs7rc 16 days ago
@MichaelBarry-gz9xl But the video says don't fine-tune ;)
@MichaelBarry-gz9xl 16 days ago
@Cat-vs7rc At this point I can't remember what the video says in its entirety without rewatching. I doubt he says not to fine-tune; rather, I think he is expressing what fine-tuning is and what it isn't, and when to use it and when not.
@user-yg2qv4kf4r 18 days ago
We don't have a proper way to build generative AI.
@ArtisanTony 18 days ago
My experience is that the data you fine-tune with is amalgamated (blended), so you cannot get the exact responses you want.
@AA-wp8pp 18 days ago
I'm gonna start writing these shitty papers LMFAO... It never happened with my models. Don't fine-tune it for A and then ask it B.
@AA-wp8pp 18 days ago
Also, hallucination is worse with RAG... there are also papers about that, so no need to write one myself lol.
@1littlecoder 18 days ago
Should we do author 😁😁😁
@msokokokokokok 18 days ago
This is a shitty paper. Fine-tuning only works to re-orchestrate prior skills. New skills cannot be learned in fine-tuning. Try getting answers in French from an English pre-trained model: it will not only screw up French but also English.
@1littlecoder 17 days ago
The paper says exactly what you said; why do you think it's not a good paper?
@AbderrahmaneDiop-f4k 18 days ago
first
@TensorTom 18 days ago
second
@JoanApita 18 days ago
third, I'm not hallucinating
@manojtiwari7754 18 days ago
Dude, change your clickbaity, scammy title.
@1littlecoder 18 days ago
Explain what's scammy about this?
@figs3284 18 days ago
@1littlecoder Nothing wrong with your title, bro.