I love it, I am using the GGUF version and it's super fast and coherent.
@onoff5604 1 year ago
Very helpful! Fascinating detail that DPO, while improving skills in some areas, may have detracted from others (logic/math/reasoning?).
@clray123 1 year ago
I haven't looked at this model in particular, but I believe the most important metric should be the kind of mistakes the model makes in the worst case. For example, as impressive as it is that it can generate coherent long stories, a simple "adversarial" test in which it claims some extremely illogical statement is a-ok basically disqualifies it as "intelligent" in any common sense of the word. The same goes for inconsistency checks, in which two different propositions are given to the model to evaluate and the results are entirely contradictory (such as claiming both yes and no for the same proposition formulated differently). These are the sorts of benchmarks we should be applying to the models if we really want to check their "intelligence", since you would certainly not call someone smart, much less entrust them with any important task, if they made catastrophic mistakes on very simple tasks every now and then (the famous "GPT-4 Vision can't count 12 apples" test case). The pivotal issue here is that if a model is able to produce such wrong outputs, it cannot be entrusted to check for their occurrence either. It's like hiring a babysitter who is known to have killed some babies occasionally (to watch over another babysitter). That, and not some sci-fi fairy tales about AIs taking over the world, is the real "inconvenient truth" about AI safety that we (actually, the purveyors of this tech) should be worried about.
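The inconsistency check described above can be sketched as a tiny harness. This is a minimal illustration, not a real benchmark; `ask_model` is a hypothetical stand-in for however you call the model under test:

```python
def is_consistent(ask_model, proposition, negation):
    """Ask a model to judge a proposition and its negation as yes/no.

    A coherent judge should never affirm both statements; returning
    False flags exactly the kind of contradiction the comment describes.
    """
    first = ask_model(f"True or false (answer yes/no): {proposition}").strip().lower()
    second = ask_model(f"True or false (answer yes/no): {negation}").strip().lower()
    # Affirming both a proposition and its negation is a contradiction.
    return not (first == "yes" and second == "yes")
```

Run against a stubbed model, a judge that affirms everything fails the check, while one that rejects the negated form passes.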
@thetabletopskirmisher 8 months ago
Very true. I keep asking the AI to revise the scene it just gave me, and it keeps regurgitating the same scene in the present tense even after repeated prompts... and it keeps going until you actually start the first sentence in the past tense yourself. Oy. AI is a tool. "Machine learning" is probably the best description for these systems right now. No intelligence.
@clray123 8 months ago
@@thetabletopskirmisher I've come to believe that the models actually do what their training algorithm told them to do: classify a sequence of symbols based on which symbol was most frequent as the next one in the training data set, using a (trained) similarity comparison of individual symbols (in context). There is not really any "quantum leap" or magic generalization into "reasoning" happening; our perception of these generators'/classifiers' intelligence is just an illusion, and how well that illusion is maintained (which all vendors admit) is a function of storage capacity (i.e. parameter count), but most importantly of the training data. In that regard the models are still really like the toy "makemore" tutorial by Andrej Karpathy: they make more of what they have been exposed to, and not much else. This also makes sense given the time complexity of generating the next token: they do it in a single pass, with no room for the iteration and recursion found in e.g. traditional planning algorithms (and it is hard to argue that reasoning does not involve planning in some way). We know that the transformer architecture is not even Turing complete, so why should we expect models to "somehow" acquire skills beyond those theoretical bounds just on the grounds of having been calibrated with more and more data?
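The "classify by most frequent next symbol" idea above can be made concrete with a makemore-style bigram counter. This is deliberately a toy sketch of the statistical intuition, not how a transformer actually works internally:

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    # Count, for each token, how often each other token follows it.
    counts = defaultdict(Counter)
    for seq in corpus:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return counts

def next_token(counts, token):
    # Greedy "generation": emit the successor seen most often in training.
    followers = counts.get(token)
    return followers.most_common(1)[0][0] if followers else None
```

Such a model can only ever "make more" of the transitions it was trained on, which is the point the comment is making about scale alone.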
@PankajDoharey 1 year ago
The overfitting is causing the model to go haywire on math tasks but makes it good at writing tasks.
@geoffreygordonashbrook1683 1 year ago
If you have any insights on how custom-training and storing models for Zephyr might look, that would be amazing. (E.g. do the unique ways the model was made make it more difficult to customize?)
@andrewdang3401 1 year ago
Do you have a script to make a synthetic dataset in DPO format using RAG?
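For reference, a DPO-format preference record is just a prompt plus a preferred and a rejected completion, however the pairs are generated. A minimal sketch, assuming the common `prompt`/`chosen`/`rejected` field convention used by e.g. TRL's `DPOTrainer` (the RAG step that would produce the completions is out of scope here):

```python
import json

def dpo_record(prompt, chosen, rejected):
    # One preference pair: "chosen" is the completion preferred over "rejected".
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

def write_dpo_jsonl(records, path):
    # DPO training sets are commonly stored as one JSON object per line (JSONL).
    with open(path, "w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
```

In a synthetic pipeline, `chosen` might come from a strong model answering with retrieved context and `rejected` from a weaker model or from answering without the context.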
@hisyamudin 1 year ago
It still fails at answering this: "You are in a dark room with a candle, a wood stove and a gas lamp. You only have one match, so what do you light first?"
@matikaevur6299 1 year ago
Depends on what your most pressing need is... I'd light the wood stove first; forget the light, warmth is probably the first priority, at least where I live ;) Unless it's a trick question... then the answer would be "the match".
@Nick_With_A_Stick 1 year ago
How does this licensing work? It's released as open source, but it was created using OpenAI models, and when you use those you agree not to train a model on their outputs. So who would be breaking the law here: the model creator, or the model users? Or neither, since the Supreme Court has recently suggested that generative model outputs can't be copyrighted?
@Nick_With_A_Stick 1 year ago
@@Tony-eo8zz Isn't the ToS something you agree to, though? Or is it kinda like a store: if you walk in and break the rules, they can only kick you out, not press charges or anything, unless you actually broke the law rather than just the rules?
@TOASTEngineer 1 year ago
@@Nick_With_A_Stick In theory it's the former; in practice it's... fuzzy. They can try to sue you, but it's expensive and time-consuming, they might not win, and they can't sue everyone. If Meta tried to train a model on GPT-4 outputs they'd probably get sued, but if individual people do it, the worst that can happen is OpenAI deletes your account, and they don't seem to be doing even that.
@Nick_With_A_Stick 1 year ago
@@TOASTEngineer Thank you! :) That was perfectly summed up.
@Nick_With_A_Stick 1 year ago
@@Tony-eo8zz Gotcha!
@samwitteveenai 1 year ago
It certainly is very fuzzy. I know that OpenAI staff have encouraged corporate users to use their models to make datasets (for internal use). I think the ToS actually says you can't use it to compete with them. Also, if you are using a dataset made by someone else, then technically you didn't agree to the ToS to get that data. So far I don't know of anyone who has been sued by OpenAI over this. It would certainly be an interesting test case.
@angstrom1058 1 year ago
This model keeps asking its own questions after answering. Also, with RAG over procedural documents (Step 1 do this, Step 2 do that), it seems to create wrong or confabulated answers. Does anyone have suggestions for running these locally with reasonable speed on an NVIDIA GTX 1650 Ti?
@GAGONMYCOREY 1 year ago
A 1650 Ti is a pretty old graphics card at this point. You might never see reasonable inference speeds on a 16XX card, even with performance advancements.
@angstrom1058 1 year ago
@@GAGONMYCOREY OK, thanks for the reply! I have a Razer with a 3070 and 8GB of VRAM. That will probably work a bit better.
@samwitteveenai 1 year ago
You should be able to run the Ollama version with your 3070.