What Happens When You Use Prolog to Enhance LLMs?

1,963 views

Future Is Amazing

1 day ago

Comments: 36
@SteveRowe
@SteveRowe 1 month ago
I'm glad you did the experiments with Prolog. Good first-principles research. Publish and keep up the good work.
@FutureIsAmazing569
@FutureIsAmazing569 1 month ago
I'm not in academia right now, my publishing days are long forgotten, but thanks!
@juandesalgado
@juandesalgado 28 days ago
These are great ideas, I hope you can continue developing them further. Planning problems are a possible follow-up, though those tend to combinatorially explode when treated as search problems, which is what Prolog code would probably do.
@FutureIsAmazing569
@FutureIsAmazing569 28 days ago
Oh, that's exactly what I have planned for one of my next videos: I use ASP to solve word problems (though it's not very exciting at the moment, so I'm not sure it will see the light of day). And yep, word problems (or something like Sudoku) are doable in Prolog, but the search quickly explodes.
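To make that explosion concrete, here is a minimal generate-and-test sketch in Prolog; the predicates and the dependency rule are made up for illustration and are not taken from the video. The candidate orderings grow factorially before the constraint check gets a chance to prune anything.

```prolog
% Naive planning as search: permutation/2 enumerates up to N! orderings,
% and only then does the (hypothetical) dependency check reject most of them.
plan(Tasks, Plan) :-
    permutation(Tasks, Plan),
    respects_deps(Plan).

% Hypothetical constraint: task a must be scheduled before task b.
respects_deps(Plan) :-
    nth0(IA, Plan, a),
    nth0(IB, Plan, b),
    IA < IB.

% ?- plan([a,b,c,d], P).
% Finds a valid ordering, but only by blind enumeration.
```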
@ioannischrysochos7737
@ioannischrysochos7737 1 month ago
LLMs are much better at producing error-free Prolog than code in other languages. The drive is to combine LLMs with symbolic logic: the chain of thought can call out to an external symbolic reasoner. We should expect to see such things in the future.
@Salveenee
@Salveenee 1 month ago
100% agreed
@FutureIsAmazing569
@FutureIsAmazing569 1 month ago
@@ioannischrysochos7737 It does feel like the future you're talking about has already arrived in the form of o1; something like this might already be built into it.
@ubit123
@ubit123 1 month ago
@@FutureIsAmazing569 There is a difference between statistics and formal logic. In some cases you need to be sure that the answer is correct; in most cases 99% correctness will suffice.
@adrianojordao4634
@adrianojordao4634 1 month ago
Prolog is more exciting than LLMs. But nobody knows Prolog, or logic. Wrong time. But it's definitely a part of AGI, whatever that is.
@vitalyl1327
@vitalyl1327 1 month ago
@@adrianojordao4634 Prolog works well with LLMs both ways: not just Prolog generated by LLMs, but Prolog execution traces explained to LLMs in a way they can understand. There are some potentially interesting "explainable Prolog" attempts out there; check out the pyexpert Python package, for example.
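As an aside, one simple way to get such a trace out of plain Prolog is a vanilla meta-interpreter that records the proof steps. The sketch below only illustrates that idea (it is not the pyexpert package mentioned above) and assumes the traced clauses are ordinary user-defined predicates; declare them dynamic if your Prolog restricts clause/2.

```prolog
% Meta-interpreter that proves a goal and collects the proof steps,
% which could then be serialised into text for an LLM prompt.
prove(true, []) :- !.
prove((A, B), Trace) :- !,
    prove(A, TraceA),
    prove(B, TraceB),
    append(TraceA, TraceB, Trace).
prove(Goal, [builtin(Goal)]) :-
    predicate_property(Goal, built_in), !,
    call(Goal).
prove(Goal, [rule(Goal) | Trace]) :-
    clause(Goal, Body),
    prove(Body, Trace).

% Example knowledge base:
%   parent(tom, bob).
%   ancestor(X, Y) :- parent(X, Y).
% ?- prove(ancestor(tom, bob), Trace).
% Trace = [rule(ancestor(tom, bob)), rule(parent(tom, bob))].
```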
@MarcoServetto
@MarcoServetto 1 month ago
One approach I've found interesting when I ask it to write code is the following. Generate a version of the code, then in separate chats ask it to discuss:
(a) why this code is right;
(b) we know for a fact that there is a mistake in this code on line 1, explain why;
(c) we know for a fact that there is a mistake in this code on line 2, explain why;
...and so on (lines with no meaningful code can be skipped). Then:
(AA) here is a bunch of discussion about this code, rank the arguments and list the ones that are most correct;
(BB) here is some code and a discussion of why it is wrong, fix the code.
Rinse and repeat. Of course, if you use a language with a type system, you can also compile the code and add the error messages into the mix.
@FutureIsAmazing569
@FutureIsAmazing569 1 month ago
This works wonderfully when you can keep your finger on the pulse and have clear objectives. Doing it inside automated steps is a bit more challenging, so my intuition is to always try to get away with zero-shot or (if available) one-shot prompting, just so I can automate things more easily down the line. But you're right: especially for code generation that might not be enough, and we have to resort to more elaborate tactics like the one you've described.
@Dron008
@Dron008 1 month ago
Wow, that is a really interesting idea; I think it could be put to use somehow.
@KCM25NJL
@KCM25NJL 1 month ago
I tried the initial problem you gave at the start of the video with 4o, o1-mini and then o1-preview. The first two stated the exact same thing: essentially they concluded that Alice was included in the count of sisters instead of adding 1. o1-preview, on the other hand, got the correct answer. When I asked in the same context, across all three attempts, why the first two were wrong (asking o1-mini), it suggested it should have checked the problem statement for ambiguity before giving an unambiguous response. Only when I pointed out that the ambiguity lay in its errant interpretation, and that the problem statement itself had no grammatical ambiguity, did o1-mini acquiesce and admit its fault. It would seem that even with CoT and reflection built in, the scaling laws still apply for accuracy.
@FutureIsAmazing569
@FutureIsAmazing569 1 month ago
This is explored in Wei et al.'s by-now-old 2022 paper "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" (arxiv.org/pdf/2201.11903): "That is, chain-of-thought prompting does not positively impact performance for small models, and only yields performance gains when used with models of ∼100B parameters." o1-mini is reportedly around 100B parameters, while o1-preview is around 300B, so you're absolutely right, scaling laws do apply.
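For readers who haven't seen the video, the Prolog that this kind of pipeline asks the model to produce for the sisters puzzle can be tiny. The sketch below uses made-up sibling counts purely for illustration; it is not the exact code from the video.

```prolog
% Hedged sketch of an LLM-generated encoding of the "Alice and her
% brothers/sisters" style puzzle; the numbers are illustrative only.
alice_brothers(3).
alice_sisters(6).

% A brother of Alice has all of Alice's sisters as sisters, plus Alice
% herself, which is exactly the "+1" the smaller models missed.
brother_sister_count(N) :-
    alice_sisters(S),
    N is S + 1.

% ?- brother_sister_count(N).
% N = 7.
```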
@VictorGallagherCarvings
@VictorGallagherCarvings 1 month ago
What a great idea! Could this approach be used with smaller models?
@FutureIsAmazing569
@FutureIsAmazing569 1 month ago
I did not try Prolog with smaller models, but I suspect they would be good at it. Great idea to try later, thanks!
@franzwollang
@franzwollang 1 month ago
I've thought for many years now that the eventual *true* union between programming and AI will be reached when AI models are somehow built as complex sets of fuzzy predicates and can thus seamlessly merge their internal fuzzy-logic representations with statements in a logic programming language (e.g. Prolog), creating a generally homoiconic system. This would give them a way to apply complex, fuzzy pattern matching where that is beneficial or efficient, and strict pattern matching where precision matters. And best of all, everything the AI system does or thinks becomes automatically interpretable, because the fuzzy atoms could be mapped to specific localized regions (by definition of what an atom is) of the approximate data manifold the system learns while ingesting data, identifying the atoms, and distilling predicates.

It would be even more elegant if we could then build the logic programming language as a layer on top of a functional language to implement any imperative logic required, build the functional language on top of a low-level systems language to implement the abstract data types, the mapping to various hardware idiosyncrasies, and the hardware optimizations, and preserve the ability at each layer to reach down to lower layers for more control when necessary. More elegant still would be to build the functional and low-level languages with techniques that expose facets of those languages in a form that can be transformed into fuzzy logic (i.e. vectorizing the call graph using graph sketches, and exposing the mapping from the low-level language's AST to assembly code so that the AI could run a guided evolutionary optimization algorithm to adapt and optimize itself to new hardware automatically, which becomes especially important as hardware grows insanely complex, with tons of non-linear self-interactions, and/or incorporates biological elements).

Ok, sorry for the rant. I like your idea of mixing Prolog with an LLM! It is a very good intuition.
@FutureIsAmazing569
@FutureIsAmazing569 1 month ago
Thanks a lot for sharing your rant :) Blending AI models with fuzzy logic, plus integrating them with logical languages, is an amazing new area to study. Hey, if we sprinkle some qubits in there, consciousness is guaranteed!
@franzwollang
@franzwollang 1 month ago
@@FutureIsAmazing569 Never go full Deepak Chopra, my friend! Quantum computing can only (ever?) speed up specific algorithms by a quadratic factor. Quantum processing units (QPUs?) will be like GPUs or TPUs or hardware noise samplers in computers, that is, task-specific accelerators. Thanks for your video!
@FutureIsAmazing569
@FutureIsAmazing569 1 month ago
But I'm not going full Deepak Chopra :) Just a bit of Sir Roger Penrose!
@johanndirry
@johanndirry 1 month ago
Not sure if Prolog code is the best approach, since it is very limited in the kinds of problems it can solve. I was experimenting with GPT-4 restating the problem as a graph and solving it in Python using graph algorithms. However, o1-preview made that approach obsolete too.
@FutureIsAmazing569
@FutureIsAmazing569 1 month ago
I agree: even simple word puzzles are quite difficult to do in Prolog (unless you're using a special library, like bibmm.pl). Something like MiniZinc is way better at it. I chose Prolog for this project since I found GPT-4o is quite good at writing Prolog code. But yep, you're right, o1-preview makes almost every logic enhancement obsolete.
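That said, plain SWI-Prolog gets closer to the MiniZinc experience once library(clpfd) is loaded. Here is a hedged sketch using the classic SEND+MORE=MONEY cryptarithm as a stand-in; it is not one of the puzzles from the video.

```prolog
:- use_module(library(clpfd)).

% Classic SEND + MORE = MONEY cryptarithm. With library(clpfd) the
% constraints are declared up front and label/1 performs the pruned search,
% rather than naive generate-and-test.
send_more_money([S,E,N,D,M,O,R,Y]) :-
    Vars = [S,E,N,D,M,O,R,Y],
    Vars ins 0..9,
    all_different(Vars),
    S #\= 0, M #\= 0,
              1000*S + 100*E + 10*N + D
            + 1000*M + 100*O + 10*R + E
    #= 10000*M + 1000*O + 100*N + 10*E + Y,
    label(Vars).

% ?- send_more_money(Ds).
% Ds = [9, 5, 6, 7, 1, 0, 8, 2].
```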
@andychristianson490
@andychristianson490 1 month ago
Can you do a video on doing something similar, but with SAT solvers? E.g. generating Alloy.
@FutureIsAmazing569
@FutureIsAmazing569 1 month ago
Yep, I thought of doing a video specifically on logic puzzles with Z3. But from what I've already tried, LLMs are way worse at generating Z3 (I also tried ASP) compared to Prolog. I think that might be due to the sheer amount of training data available in the wild that LLMs were exposed to. I did not try Alloy; maybe I'll try aggregating various reasoning systems in one video. I also have an idea to pick one and fine-tune Llama 3 on it to the max.
@vitalyl1327
@vitalyl1327 1 month ago
@@FutureIsAmazing569 They're OK at generating Z3 code if you do it step by step and through a code-analysis feedback loop, like you should with any other language.
@FutureIsAmazing569
@FutureIsAmazing569 1 month ago
You're right, but anything beyond one-shot prompting would have introduced additional complications to an already complicated multi-step process, while Prolog seems quite fine even one-shot.
@vitalyl1327
@vitalyl1327 1 month ago
@@FutureIsAmazing569 I'm mostly using small local models, so many-shot is the default even with Prolog and Datalog. It's not too hard, and having an unlimited feedback loop improves model performance many-fold, so it's a good idea in general to do it with any tool. Another nice thing with local models is that you can do inference harnessing: nudge the model to select only tokens that form correct syntax, and provide a very tight feedback loop for tool usage. Even if you're getting OK Prolog most of the time with one-shot, it's never guaranteed to be OK in all cases, so a feedback loop is needed even for the very large and powerful models anyway.
@FutureIsAmazing569
@FutureIsAmazing569 1 month ago
@@vitalyl1327 Thanks for the insight. The monster PC I use to run local models has been sitting idle, since it's been quite hot lately. I should get back to working with local models in a week! I agree that a feedback loop should be the default for such tasks.
@timseguine2
@timseguine2 1 month ago
I don't see a reason why you couldn't use gpt-o1 as the base model for this approach, considering the model is apparently also better at coding. It seems like it might then also be able to generate correct Prolog code for more complex problems.
@FutureIsAmazing569
@FutureIsAmazing569 1 month ago
gpt-o1 would certainly perform well with this approach. The only problem I had is that o1's chain-of-thought reasoning was already beating the Alice in Wonderland+ problems, so there was no point in improving on it. But you're absolutely right, the approach is still valid. As soon as the next paper comes out that poses a problem o1 can't solve, I will be back at it!
@szebike
@szebike 1 month ago
Maybe OpenAI were "inspired" by users like you. I assume they take a lot of liberty in interpreting "observing user chat logs for safety".
@FutureIsAmazing569
@FutureIsAmazing569 1 month ago
I would not go that far :) I think they are pretty competent at what they do. But even if they did benefit from it, I think I would be fine with that. Whatever it takes to advance this amazing new tech!
@szebike
@szebike 1 month ago
@@FutureIsAmazing569 Well, you said you hoped to make a buck or two with your approach, and you can't tell where they get their ideas from (in my experience, people from a high-academia background are smart but usually very uncreative). So if you want to make money, keep your important ideas to yourself until they are market-ready (you can use local models to help you). Given that S. Altman is their CEO, I would be more cautious, considering how he behaved towards very poor people with his cryptocurrency back then. (The short version: he "bought" biometric eyeball scans from those people without *informed consent*, paying a bit of cryptocurrency per eye scan, until the Kenyan government halted it. OpenAI also used very low-paid Kenyan workers to create training data not long ago.)
@vitalyl1327
@vitalyl1327 1 month ago
Now use Prolog and SMT solvers with o1 to enhance it further. On my tasks, Llama 3.1 with Prolog still outperforms o1 anyway.
@FutureIsAmazing569
@FutureIsAmazing569 1 month ago
Yes, I haven't been diligent enough about finding tasks that still can't be solved with o1. I am sure there are plenty, though.