Yup, QwQ is CRACKED: Prompt Chaining with Qwen and QwQ reasoning model (Ollama + LLM)

21,106 views

IndyDevDan

1 day ago

Comments: 69
@vedantaggarwal1092 · a month ago
what are your device's specs?
@patrickbrady535 · a month ago
2023 M2 Macbook Pro with 64 GB RAM and 12 cores. Shows up at 3:39 in the video.
@blackwhite-3607 · a month ago
@@patrickbrady535 Wow, a MacBook Pro can run QwQ 32B on Ollama?
@Scarsuna · a month ago
@@blackwhite-3607 Thanks to the chip design, the ML Compute framework can use system RAM as VRAM. Imagine if your video card had 64 GB of RAM! Now you can see how a 20 GB model runs easily on a MacBook Pro.
@edwardduda4222 · 8 days ago
@@blackwhite-3607 Apple Silicon has unified RAM with up to 128 GB and 16 neural cores for LLM inference. I recently got the M4 with 48 GB, and it's amazing being able to both build and run very large models. It doesn't beat an Nvidia rig, but it's great for a laptop.
@stonedizzleful · a month ago
Bro, you are down the rabbit hole on this stuff; it's so impressive. Some of the best-quality AI content on YouTube!
@indydevdan · a month ago
Thanks for the kind words. The rabbit hole is so deep. Meta prompting + o1 + OpenAI 12-day launch content in the works. The things we can do with this tech are mind-boggling.
@Scarsuna · a month ago
@@indydevdan I am not a coder at all and run Ollama/OpenWebUI via WSL2. However, I quickly understood I needed a better system for prompting and just found your channel. I've got a lot to learn! Thanks to your suggestions in this video, I asked Grok to generate an XML format for specific instructions to edit a particular piece of text, and it actually worked on the first try! Normally, I have to ask the AI 2-3 times before it understands the directions.
@sebastianmalcolm3597 · a month ago
Super insightful prompt chaining examples! Thank you, IndyDevDan. 5:42
@ashgtd · a month ago
Amazing video. I love how you are pushing the tech to do as much as it can. I'll definitely try this out myself.
@cannyaii · a month ago
Let's just call it Qwik, ironic but easy to say
@notnotandrew · a month ago
One might say that it's quick to say.
@stanislavtrifan96 · a month ago
John Qwik
@Maisonier · a month ago
Wow dude, this is amazing! Liked and subscribed.
@plasmax · a month ago
Love the reasoner-extractor pattern. Prompt chaining seems very useful, especially for agents with tool use: you could have the reasoner decide what step to take next, then an extractor, then a verifier with inspection tools that goes back to the reasoner with new information in case something went wrong with the reasoning.
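The reasoner-extractor chain described here can be sketched in a few lines of Python. This is a hedged illustration, not the video's actual code: the model call is injected as a function so the same chain works against a local Ollama server or a stub, and the `{{previous_output}}` placeholder and both prompts are illustrative assumptions.

```python
# Minimal reasoner -> extractor prompt chain, with the model call injected
# so it can be backed by Ollama (e.g. ollama.generate) or any stand-in.

def chain(prompts, call_model):
    """Run prompts in order, feeding each model output into the next prompt."""
    output = ""
    for prompt in prompts:
        # {{previous_output}} is a hypothetical dynamic-variable placeholder.
        output = call_model(prompt.replace("{{previous_output}}", output))
    return output

REASONER_PROMPT = "Think step by step and answer: what is 12 * 9?"
EXTRACTOR_PROMPT = (
    "Extract only the final numeric answer from the text below, "
    "with no other words:\n{{previous_output}}"
)

if __name__ == "__main__":
    # Stub standing in for a real call such as ollama.generate(model="qwq", ...).
    def fake_model(prompt):
        if "step by step" in prompt:
            return "First, 12 * 9 = 108. So the answer is 108."
        return prompt.rsplit(" ", 1)[-1].rstrip(".")

    print(chain([REASONER_PROMPT, EXTRACTOR_PROMPT], fake_model))  # 108
```

The verifier step the comment suggests would just be a third prompt in the list, with its output routed back into the reasoner on failure.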
@jtjames79 · a month ago
I would really like to preserve all that thinking process, merge them all into one giant file, and turn it into a knowledge graph. I like to talk about philosophy and futurism, so the details are often very important. I don't care how long it takes. If the AI needs to get back to me tomorrow, that's what I'm used to anyway with people.
@johang1293 · a month ago
Sounds like you should look at GraphRAG.
@jtjames79 · a month ago
@johang1293 The database part is still Greek to me. Most of the tutorials are like "draw a circle, now draw the rest of the owl". 🤷‍♂️
@alexjensen990 · a month ago
Dude! I don't know how I haven't watched one of your videos yet. Assuming all your videos are like this, I have found my new favorite channel. I sometimes feel like I'm the only one I know who is really nerding out on prompt engineering in a more complicated (and better-performing) manner. To be honest, I feel like I am the only one I know who is really into using generative AI. In any event, you've got a new regular sub here on YT.
@alexjensen990 · a month ago
PS - Ironically, I already follow you on GitHub somehow. I don't recall checking out your repos, but I look forward to following your work, man. Cheers.
@samson3523 · a month ago
Yep, he's fucking spot on, always. I thought I was the only one on my IT team on the cutting edge (I am), but at least we've got IndyDevDan.
@ABUNDANCEandBEYONDATHLETE · a month ago
There are millions, dude... Most people don't talk about it. It almost feels like having an intern as an assistant; can't wait until PhD level. Never making a template from scratch again, lol.
@JimMendenhall · a month ago
This is a very nice approach. Thank you for sharing. I'm wondering if QwQ will output the final answer in a specified tag, similar to how some other models do. That could help with the extraction for sure.
@MaelSimonApprenTyr · a month ago
Yeah, then you can extract it with a simple regex 😀
@wSevenDays · 24 days ago
Thank you very much for this prompt guide!
@Gunnerpigs · a month ago
After I watched your previous video about prompt levels, I wondered how I should implement dynamic variables. Is the way you do it in your 5:47 example the best way, or are there better ways of utilizing dynamic variables?
@RuneX_ai · a month ago
Thank you for reviewing!
@sd5853 · a month ago
I don't always get everything from your vids, as I'm a fkin noob with limited follow-through, but fk me, I love your vids and I'm getting so much value out of them. Thank you for putting it out there.
@MarxOrx · a month ago
This is cool, but you can also just use structured outputs in Ollama and force the model to put the chain of thought in one key and the final result in another. Then you don't need the second LLM pass at all.
@TravisChristopher · a month ago
Is that an eGPU for Mac on your desktop? Would love to hear a bit about your setup...
@iainmckenzie1 · a month ago
Curious to know if you've done any work with DSPy, Dan? We've just started piloting it in my organisation, so we'll be generating preliminary results soon, but it's an interesting concept. Would be cool to hear your thoughts.
@jacquesdupontd · a month ago
Very interesting. Is this script for prompt chaining possible in an IDE like Windsurf, with each of the prompts given to an agent like Sonnet 3.5? Thanks for your work.
@GjentiG4 · a month ago
Quick question: are you adding the subtitles automatically? If so, what tool are you using?
@paulyflynn · a month ago
Crushing it! Can I call you Mr. Hands?
@Truth_Unleashed · a month ago
There can be only one!
@NLPprompter · a month ago
I disagree, he's bigger than that, mate. He's a hands wizard; look at how his hands handle the keyboard, invoking strokes. Sounds like a keyboard wizard with Vim shortcuts on his OS!
@aaagaming2023 · a month ago
Would you say prompt chaining like this is as efficient as a framework like LangGraph in a production context?
@perelmanych · 21 days ago
You can achieve almost the same output quality just by using a follow-up prompt like: "Use the previous answer to formulate the final answer in JSON format."
@supermold · a month ago
I mean, it is slow, but in the time it took me to watch this video, it wrote a snake game on my RTX 3060. Technically, I only asked it for an outline, but it decided phase three of outlining how to code a snake game was showing me the code to run it, so I don't know; I guess it failed successfully. Considering I just gave it some generic and poorly worded instructions in a state of sleep deprivation, I'm pretty impressed and excited to see what else it's capable of.
@ThoughtFission · a month ago
And here I had hoped we mere mortals could get something useful out of this. Not for those who don't program, I guess ;o)
@daryladhityahenry · a month ago
Wait... how do you fit the context length for extraction? I mean, it's so long... and your RAM is only 64 GB and you use a 32B model. Hmm. I really wonder why it fits and works so well.
@canelj · a month ago
Thanks for sharing this! 🤩
@Sam-kj8ew · a month ago
Nice. What IDE is that?
@JohnLewis-old · a month ago
Can you do a multi-run test to see how many of those chained outputs fail? Are we talking 70% or 99%?
@TheAIBlueprint · a month ago
Is the model fully open, or is it through an API? Does any info, data, or metadata go back to anyone's servers, or is it 100% local?
@marcshawn · a month ago
What are your MacBook's specs?
@magnusquest · a month ago
Using this to run a local aider --architect Qwen + QwQ stack :D
@MaelSimonApprenTyr · a month ago
Are you using this stack, or is it an idea? If you are using it, I'm really curious about the performance! I want to switch from Sonnet 3.5 to something local to reduce my climate impact.
@magnusquest · a month ago
@@MaelSimonApprenTyr Currently running QwQ > output to file > input to aider. As Dan mentions, the pitfall of this reasoning model is that it outputs its whole thought process, so it would take super long for architect mode to run efficiently. Extracting the specific steps and details with prompt chaining is best here, but it still takes quite a bit longer than using something like OpenAI's o1.
@MaelSimonApprenTyr · a month ago
@@magnusquest I'll have to check this out in detail. Thanks, man! 😊
@KennylexLuckless · a month ago
That it speaks Chinese would indicate that it is a well-trained model. Mandarin Chinese is the most spoken language in the world; English only becomes the largest if you count those who speak it as a second language. As a Swede, I often encounter U.S. bias in AI responses, where it uses feet and inches even though the metric system is the most widely used. I have to use a system prompt to make it use metric, but it often leaves the conversion in the answer, which I have to remove later.
@Scarsuna · a month ago
Europeans have a hard time creating innovative projects of their own due to draconian censorship laws in the EU. That's the problem!
@bukitsorrento · a month ago
MCP x Ada/Agent OS. I haven't seen any channel covering the memory server; also check the awesome-MCP servers repo. Can't wait.
@indydevdan · a month ago
100% feels big - I'm still digesting MCP. Massive OAI releases incoming.
@alexjensen990 · a month ago
TL;DR - I am not convinced that Qwen is all that great. Admittedly, I haven't put the newest one through its paces yet due to time constraints, but I intend to do my due diligence when time permits. I will elaborate below on why Qwen, and for that matter a good number of recent models and papers, all seem at best not to be really moving the industry forward; at worst, they seem like an intentional grift or, at a minimum, plagiarism, due to their inability to correctly answer characters lacking certain identifying aspects. I'm sorry if that sounds too general, but in an effort to keep this short I will simply say that if you use o1 and o1-mini; Claude, Haiku, and Opus; Llama 70B and 405B and many of their variants; Gemini Pro; and to some degree Mistral (though I find that Mistral has fewer, more focused strengths), what you find is that they, at least in English, are already able to generalize and abstract in mind-blowing ways, particularly OpenAI's and Anthropic's models. Despite their flaws, Gemini, Llama, Mistral, and Grok all have an amazing ability to infer from a query what the next words should factually be, given the effectiveness of the prompt, and often even with ineffective prompts. However, Qwen out of the box has not, as of yet, shown me anything but party tricks. Previous versions of Qwen performed absolutely horribly when I tested their ability to make certain connections between ideas that every other model, even the 6B-8B models, tends to make without missing a beat. I have several theories as to why, and as a scientist, enthusiast, and member of the human race, it bums me out. I will leave those theories to your imagination. With all that said, I look forward to putting this new Qwen through its paces and seeing if it finds a place in my stack. Believe it or not, this was by far the TL;DR version, because I could go on for some time about this topic. Anyway... great video. I look forward to following your content moving forward. Cheers.
@daburritoda2255 · a month ago
What are the specs of your MacBook Pro? I'm impressed by how quickly you can generate tokens on QwQ.
@daburritoda2255 · a month ago
I have an M4 Pro MBP, but with only 24 GB of RAM, so I can't run the model locally.
@haroldpierre1726 · a month ago
@@daburritoda2255 At 3:36 he has a description on screen of his MacBook's specs. He has an M2 Max with 64 GB RAM.
@box4soumendu4ever · a month ago
Great 👍👍🥰🥰 Thanks.
@pensiveintrovert4318 · a month ago
It is also NOT as smart as o1-preview. o1-preview was the first model ever that was able to solve my puzzle, while QwQ made stupid logic errors.
@indydevdan · a month ago
Totally. It's nowhere near o1-preview or o1 (just released). For local reasoning, though, it's a massive step forward.
@goldenglowitsolutions · a month ago
This looks amazing! But I don't think my Intel Core i7, 16 GB DDR4 RAM, RTX 3050 4 GB (Acer Nitro 5) laptop will be able to handle it.
@ozuromo · a month ago
Nice video.
@balqaasem · a month ago
Call it Quick
@shuntera · a month ago
Hard to use??? I'm thinking nahhhhh; I'd immediately disregard it, as there are other models that are easier to use.
@sephirothcloud3953 · a month ago
DeepSeek R1 is better than QwQ; sadly, they haven't released the model or API yet.
@thenextension9160 · a month ago
No Patreon? Come on, man. Your content is way too important for you not to be getting memberships.