Parameter-Efficient Fine-Tuning Methods

936 views

Apply AI like a Pro


1 day ago

Comments: 4
@xspydazx · 8 months ago
I have found that tuning is actually an art form. We need to remember that the targeted layer is replaced by the new PEFT layer — or rather, proportions of the layer weights are replaced by the generated PEFT weights. With the LoRA config, various modules are targeted. With this understanding in hand we can retrain a layer, or target specific layers in the model to work on (useful when you evaluate the output generated at various layers, since you can also target specific parts of the model that are "storing" unwanted data — "storing" is not strictly correct; in truth the layer holds these words and sequences as higher probabilities in a word-to-word matrix, which is what each layer encodes). So when training, it is important to train against different targets, or you will essentially be overwriting the model and never making progress across fine-tuning sessions (which do need to be performed regularly). Data is probabilistic, so even repeated data is kind to the model: in life some data is heavily embedded, such as truisms. If a truism is sparse it will never surface and other, more negative falsehoods will rise higher — hence match positive and negative samples.

A model can be trained for multiple purposes, and different models trained for different purposes can be merged into a single model. They will need to be realigned to their training prompts and data — they will be quite off at first, but after a few steps the loss travels towards a steady line, i.e. not dropping unless the learning rate is changed (hence good convergence into the model). These can be considered multi-fit models: they are fit to multiple tasks, from dictionary tasks ("what is a ...") to extracting the entities from a text, to generating a function in some language to produce a result. The same structured prompt is not used — i.e. NOT instruct or chat, as these are merely placeholders for multi-purpose, task-based queries. So we craft customized prompts and give examples of how they should be completed — hence completions. For example, "Define Cat:" and the model would generate a description, a URL source, a picture, and a sound; given the correct data, the formatted response would be generated by the model. In fact we are merely showing an input sequence mapping to an output sequence, as well as collecting a bag of words that pertain to a specific arrangement (the output). The placeholder we leave will be mask-filled or generated by the model: when the model recognizes the pattern "Define > Target" it will automatically recall the template it was trained with and generate the data from the training set, or generate the most probable answer. Hence these prompts can be considered tasks, and these tasks become hidden within the model. The lesson to be learned is that ALL structured data should be entered this way, so the model can use the examples to make predictions in the future. For instance, a collection of time-over-distance calculations, with each variable as a plug-in value but the calculation structured correctly in the prompt: this will be used as a THOUGHT template to solve similarly shaped data. To enable full access to the task, a masked variant should also be used which requests the missing variables from the user, to plug into the calculation in the next turn of a response chain (NOT a one-shot) — even a response chain can be a single completion.

Hence, in training, a COMPLEX PROMPT is important — unless you're just dumping data into the model in the hope that it will be used later in some text-generation or summarization task. So the importance of randomizing or focusing the PEFT settings for different training tasks is that it enables embedding tasks within the model, as well as giving the model the tools to calculate or generate correct and formatted predictions — designing the chain of thoughts you wish the model to take. I have personally trained a model with these methods, and the model even self-corrects and discusses the response internally; hence the response may sometimes be quite slow, because each variable in the conversation has some mapping to a hidden function that may not be displayed without the explicit prompt template (i.e. thoughts). I have also framed many conversations with "self" inside this space. Using a chat template does not mean the model does not do these calculations UNSEEN — now we can visualize our preprogrammed processes: LeroyDyer/Mixtral_AI_MasterTron
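The completion-style "Define Cat:" format described above can be sketched as a small data-building step. This is only an illustration — the `build_example` helper, the field names, and the template layout are hypothetical, not a fixed standard:

```python
# Sketch of a completion-style training record for a custom "Define" task.
# The template and field layout are illustrative, not the commenter's exact format.
def build_example(task: str, target: str, completion: str) -> dict:
    """Join a task template and its expected completion into one training text."""
    prompt = f"{task} {target}:\n"
    return {"prompt": prompt, "text": prompt + completion}

example = build_example(
    "Define",
    "Cat",
    "Description: a small domesticated carnivorous mammal.\n"
    "Source: <url>\n",
)
print(example["text"])
```

The idea is that every record shares the same "Task Target:" shape, so the model learns the template as a task rather than as free-form chat.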
@xspydazx · 8 months ago
When using parameter-efficient fine-tuning, you DO select the target modules to be overwritten. The PEFT adapter is a copy of those modules which you will replace, so although the model weights are frozen, you are actually tuning weights that will be merged back into those frozen layers. Hence if you target all the hidden layers, the adapter is a copy of those layers, and you are fine-tuning the last-known positions of those weights, because the PEFT creation COPIES the targeted layers. As you will note, generating 1B parameters takes a large amount of memory, so this process is reduced by simply copying the target layers into a new model — the PEFT model! Hence this is what is trained on top of the frozen model, and afterwards merged into the parent; but it can even be used separately (with the base model).
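The frozen-base-plus-trainable-adapter-then-merge idea above can be sketched in plain NumPy. This is a minimal illustration of the LoRA-style mechanism, with made-up dimensions — not the exact bookkeeping any particular library performs:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 64, 8                # illustrative dimensions, rank r << d

W = rng.normal(size=(d_out, d_in))        # frozen base weight (never updated)
A = rng.normal(size=(r, d_in)) * 0.01     # trainable low-rank down-projection
B = np.zeros((d_out, r))                  # trainable up-projection, zero-initialized

x = rng.normal(size=(d_in,))
y = W @ x + B @ (A @ x)                   # adapter starts as a no-op (B is zero)

# After training A and B, the low-rank update is folded back into the base:
W_merged = W + B @ A
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly; only the small `A`/`B` matrices receive gradients, which is where the memory saving comes from, and `W + B @ A` is the merge step back into the parent weights.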
@xspydazx · 8 months ago
e.g. LoRA rank 7/14 ≈ 18,350,000 parameters; 6/12 ≈ 15,728,000 parameters (with all Q, K, V, O projection layers targeted). Hence for each session, changing these settings touches different tensors and biases. After you exhaust your best options — i.e. you should be training a few models for different tasks side by side — you can then merge the collection, enabling retargeting of the same selection for the next set of training stages. This is important to retain information and transfer skills, as well as to develop a model such as ChatGPT, or a RAG-less model! It is important to understand that we are training methodologies (tasks) into the model and teaching it examples of how the tasks should be performed, as well as how they should be calculated (chains of thought or chains of functions) — hence the model should hold these functions internally. For entity detection, after receiving the input, its chain of internal thoughts should be: 1. tokenize the text; 2. create a list of entities from the tokens using a set of entity lists (generated internally); then return the detected entities. It would need to generate a function to tokenize the text, as well as generate entity lists based on the topic of the text, then push the text through those functions to produce an output. If the model is given 1000k pretrained examples and overfit on the task (to 0.2 loss), then when testing on an unseen dataset you will see the model generating functions and attempting to utilize them internally to answer these questions. If the model makes mistakes (obviously), it will mean we need to give the model 100k examples, loosely fit at a loss of 0.9–1.2, enabling the model to gather the examples and later recall them successfully using its known methods. Later, smaller datasets rephrased from the 100k should also be used to train the model to get closer to the data (always use a dataset of ~1k to align the model).

Hence, merge with another model and align! This methodology is called task embedding (it works well), so when training you may find yourself with many 1k alignment datasets, used after merging — it's like a check to see whether the model lost past knowledge — giving it the same old examples to remember, so even these become frequent memories. So we discovered we can create FREQUENT MEMORIES with alignments and overfitting — hence the art form of merging methodologies! There are things you may ask ChatGPT to do where you would expect them to be performed internally, but they are not (the system is a scam): the front end manages intents discovered from the input query and produces an agent to perform the task; these tasks are basically plugins! (A neural net needs training, and extensions are essentially a RAG of skills.) All of it can be internal!
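The parameter counts quoted above can be reproduced approximately: for each targeted weight of shape (d_out, d_in), LoRA at rank r adds r·(d_in + d_out) trainable parameters. A sketch — the hidden size and layer count here are assumptions for illustration, not the commenter's actual model:

```python
def lora_param_count(rank: int, shapes: list[tuple[int, int]]) -> int:
    """Trainable parameters LoRA adds over the given targeted weight shapes."""
    # Each (d_out, d_in) weight gets A: (rank, d_in) and B: (d_out, rank).
    return sum(rank * (d_in + d_out) for d_out, d_in in shapes)

hidden, layers = 4096, 32                  # assumed model dimensions
qkvo = [(hidden, hidden)] * 4 * layers     # Q, K, V, O projections per layer
print(lora_param_count(8, qkvo))           # 8,388,608 trainable params at rank 8
```

Doubling the rank doubles the adapter size, which is why changing rank and target-module selection per session (as described above) trades memory against how much of the model each stage can touch.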