New Trick for Fine-Tuning LLMs

2,460 views

code_your_own_AI

1 month ago

The study by Google Research investigates the impact of fine-tuning large language models (LLMs) with new factual knowledge and its potential to induce hallucinations. Specifically, it explores whether introducing new, previously unknown information during the fine-tuning process leads to the generation of factually incorrect responses by the models. The concern is that LLMs might learn to generate information that is not grounded in their pre-existing knowledge, increasing the likelihood of hallucinations.
A. Methodology
The researchers designed a controlled setup focused on closed-book question answering (QA) to study the effect of new knowledge. They varied the proportion of fine-tuning examples that introduced new knowledge (Unknown examples) versus those consistent with the model's pre-existing knowledge (Known examples). The methodology involved:
Dataset Construction: Using ENTITYQUESTIONS, which consists of factual triplets from Wikidata converted into QA pairs.
Categorization: Introducing a hierarchical system (SliCK) to classify fine-tuning examples into four categories based on how well the model already knows them: HighlyKnown, MaybeKnown, WeaklyKnown, and Unknown (a rough sketch of this categorization follows the list).
Evaluation: Measuring the model's performance on test sets with varying proportions of Unknown examples, and analyzing the impact on hallucinations and knowledge integration.
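As an illustration only, here is a minimal Python sketch of how a SliCK-style categorization could be computed. The sample_answers helper, the sample count, the temperature value, and the exact-match scoring are assumptions made for this sketch; they are not taken from the paper's code.

# Minimal sketch of a SliCK-style knowledge categorization.
# Assumption: sample_answers(question, temperature, n) is a hypothetical helper
# that returns n model answers for the question under a fixed few-shot prompt.

def categorize_example(question, gold_answer, sample_answers, n_samples=16):
    # Greedy (temperature 0) attempts: does the model reliably know the fact?
    greedy = sample_answers(question, temperature=0.0, n=n_samples)
    # Sampled (temperature > 0) attempts: can the model ever produce the fact?
    sampled = sample_answers(question, temperature=0.5, n=n_samples)

    p_greedy = sum(a.strip() == gold_answer for a in greedy) / n_samples
    p_sampled = sum(a.strip() == gold_answer for a in sampled) / n_samples

    if p_greedy == 1.0:
        return "HighlyKnown"   # always correct with greedy decoding
    if p_greedy > 0.0:
        return "MaybeKnown"    # sometimes correct with greedy decoding
    if p_sampled > 0.0:
        return "WeaklyKnown"   # only correct when sampling
    return "Unknown"           # never produces the correct answer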
B. Main Results and Insights
The study yielded several significant findings:
Integration of New Knowledge: LLMs struggle to integrate new factual knowledge through fine-tuning. Unknown examples are learned significantly slower than Known examples, indicating difficulty in incorporating new information.
Induction of Hallucinations: As the model learns new knowledge through fine-tuning, there is a linear increase in its tendency to hallucinate. This suggests that exposure to new knowledge can indeed encourage the generation of factually incorrect responses.
Role of Early Stopping: Implementing early stopping during fine-tuning minimizes the risk of hallucinations by preventing the model from overfitting to the Unknown examples, which are primarily responsible for inducing them (a rough sketch follows the list).
Importance of MaybeKnown Examples: Fine-tuning with a mix of Known categories, particularly MaybeKnown examples, enhances the model's ability to utilize its pre-existing knowledge effectively. This balanced approach yields better overall performance compared to fine-tuning solely on HighlyKnown examples.
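To make the early-stopping idea concrete, here is a minimal sketch of a fine-tuning loop that stops once held-out closed-book QA accuracy stops improving. The train_one_epoch and evaluate helpers, the patience value, and the PyTorch-style state_dict handling are assumptions for illustration, not the paper's training code.

# Sketch: early stopping keyed to dev-set QA accuracy during fine-tuning.
# Assumption: model is a PyTorch-style module; train_one_epoch() and evaluate()
# are hypothetical helpers supplied by the caller.
import copy

def finetune_with_early_stopping(model, train_set, dev_set,
                                 train_one_epoch, evaluate,
                                 max_epochs=20, patience=2):
    best_acc, best_state, epochs_without_gain = 0.0, None, 0
    for epoch in range(max_epochs):
        train_one_epoch(model, train_set)
        acc = evaluate(model, dev_set)  # closed-book QA accuracy on held-out data
        if acc > best_acc:
            best_acc = acc
            best_state = copy.deepcopy(model.state_dict())  # snapshot best weights
            epochs_without_gain = 0
        else:
            # Dev accuracy has plateaued: further epochs mostly fit the
            # slow-to-learn Unknown examples, where hallucinations increase.
            epochs_without_gain += 1
            if epochs_without_gain >= patience:
                break
    if best_state is not None:
        model.load_state_dict(best_state)
    return model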
C. Insights
The study provides crucial insights into the fine-tuning process of LLMs:
Risk Management: Introducing new factual knowledge during fine-tuning carries the risk of increased hallucinations. Strategies such as early stopping and filtering out Unknown examples can mitigate this (see the filtering sketch after the list).
Knowledge Utilization: LLMs primarily acquire factual knowledge during pre-training, while fine-tuning is more effective for optimizing the use of this knowledge rather than integrating new facts.
Practical Implications: For practical applications, it is essential to carefully design fine-tuning datasets and monitor training dynamics to balance the benefits of new knowledge with the risks of hallucinations.
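For the dataset-design point, here is a minimal sketch of filtering Unknown examples out of a fine-tuning set while keeping a mix of Known categories. The categorize argument is assumed to behave like the hypothetical categorize_example sketched above.

# Sketch: build a fine-tuning set that drops Unknown examples and keeps a mix
# of Known categories (HighlyKnown, MaybeKnown, WeaklyKnown).

def build_finetuning_set(examples, categorize):
    kept = []
    for ex in examples:
        category = categorize(ex["question"], ex["answer"])
        if category == "Unknown":
            continue  # primary driver of hallucinations in the study: drop it
        kept.append({**ex, "category": category})
    return kept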
In summary, the study highlights the challenges and risks associated with fine-tuning LLMs on new knowledge, emphasizing the need for careful management of the fine-tuning process to maintain model accuracy and reliability.
[text generated by GPT4o]
All rights w/ authors:
arxiv.org/pdf/2405.05904
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
#airesearch
#newtech
#insights

Comments: 27
@wdonno
@wdonno 29 days ago
Ha!!! You have always been telling us to blend ‘new’ data with some ‘old’ data when conducting SFT! Your intuition was spot on. You also always reminded us to mind the format of the new data, to match as much as possible the format of the original training data.
@borisguarisma8810
@borisguarisma8810 29 days ago
Wow! I need to watch this again while reading the paper...thank you!
@code4AI
@code4AI 29 days ago
Glad it was helpful!
@AdamBrusselback
@AdamBrusselback 29 days ago
This is interesting. When I've been doing SFT on new tasks, I was originally having problems getting models to learn the output with just an input + output example. I noticed much, much better performance on the final task when I augmented the training data to include answering questions about the input format, pulling out specific data points from it, generating intermediate representations, etc.
@code4AI
@code4AI 29 days ago
Great observation. However, with closed LLMs, or so-called "open" ones without any transparency about what their pre-training dataset included (global corporations being afraid of legal implications from copyright violations)... we have no chance of really optimizing for a coherent fine-tuning dataset. Damn it.
@i_accept_all_cookies
@i_accept_all_cookies 29 days ago
I've been fine-tuning SLMs like TinyLlama, Phi 2, and Gemma 2b. This might explain some of the accuracy variance I've been seeing.
@desmur36
@desmur36 29 days ago
If this holds, it implies we need to sequence our training data in a format that scaffolds the model from lightly known to known knowledge. Intuitively this makes sense. Most students learn by building on known concepts that are easy to grasp, then expanding to more advanced topics using that base knowledge as a foundation. It also raises the question: what was the sequence in the pre-training dataset? Was it carefully curated? And how would you organize the internet from fundamental to advanced concepts? I think we got lucky with research papers, because they always follow this sequence of known to new knowledge.
@kimcosmos
@kimcosmos 1 day ago
Not in physics. There the general learning is always deceptively simplified, and every year you unlearn the previous year's simplifications.
@kimcosmos
@kimcosmos 1 day ago
Can't you use a domain-specific causal-reasoning dataset, with LLM hallucinations filtered out of the CoT, for your SFT?
@milindgaharwar827
@milindgaharwar827 28 days ago
It seems generally reasonable that 'new data + conceptually related known data' should lead to fewer hallucinations compared to only new data, or new data + conceptually unrelated known data. It would probably not make a big difference IF there were a mechanism in the model architecture itself to find common patterns across different learnt concepts. Please do share if you are aware of any such research direction.
@zekehobbs7738
@zekehobbs7738 27 days ago
At what volume of new tokens does this break down? I.e. 1k, 10k, 100k, 1M, 10M, etc.
@kishoretvk
@kishoretvk 29 days ago
So pre-training of an LLM? Can we do it with a 7B or 8B model? Can we further fine-tune a pre-trained LLM and avoid this?
@code4AI
@code4AI 29 days ago
Some argue that fine-tuning is just continued pre-training, kind of. IF we have an open-source LLM where we know all the pre-training datasets, formats, and complexities... then we might have a chance to create an additional coherent fine-tuning dataset. With closed LLMs, however... no chance.
@gileneusz
@gileneusz 29 days ago
23:15 ICL is compute-intensive... with longer prompts you will get slow prompt processing...
@code4AI
@code4AI 29 days ago
Not with parallel processing like ring attention.
@proterotype
@proterotype 29 days ago
I wonder if combining fine-tuning with RAG would solve this.
@code4AI
@code4AI 29 days ago
No. We need our fine-tuned LLMs within our active agents when complex external info (RAx) is returned to the LLM.
@proterotype
@proterotype 29 days ago
@@xspydazx interesting stuff! I understand you may not have personally trained an LLM on embeddings, but have you done the type of workflow in your first, longer comment? If so, how well have you seen it working? That is, how accurate are the results of the method you outline there?
@gileneusz
@gileneusz 29 days ago
I think you missed the point that this paper is about dense models like Llama 3, which is trained on a huge number of tokens; this will not appear as much for models that are not as dense as Llama 3.
@code4AI
@code4AI 29 days ago
Smile. The formulation "it will not appear as much..." is hopeful, but do we have any validated data on this?! Why should MoE be immune, and if "maybe", to what degree?
@gileneusz
@gileneusz 29 days ago
@@code4AI I have no idea, all this fine-tuning stuff is just purely experimental. Which is good, we are still learning.
@gileneusz
@gileneusz 29 days ago
25:38 Pre-training is too expensive, but if you split your knowledge across many AI models, you can train smaller models and it would be much cheaper...
@code4AI
@code4AI 29 days ago
Look at the performance of Snowflake Arctic 128x3.66B. Any questions left?
@gileneusz
@gileneusz 29 days ago
@@code4AI that's the opposite end of the spectrum. Snowflake suffers because the 3.66B experts are just undertrained there.
@propeacemindfortress
@propeacemindfortress 29 days ago
🤔