"We were right" - How to use o1-preview and o1-mini REASONING models

24,786 views

IndyDevDan

1 day ago

Comments: 81
@WalterKeenan 3 months ago
High quality content with a very absorbing style of delivery. Every time I watch one of your videos, I pick up at least one (and possibly several) new tricks. Thanks and keep up the great work.
@vincentjean6756 3 months ago
Can't wait for the competition and the price drops coming in the upcoming weeks. What a great time to be alive, I LOVE it.
@drlordbasil 3 months ago
Mimicked thinking depth and time with llama 3.1 using Groq - hella fast, hella smart! Love that you put "WE" were right; we all work as a hive mind, finding what works and what doesn't from each other, even following leads from the closed companies. Love this space, love this time we are in. Thanks for another great video to watch while working.
@deltagamma1442 3 months ago
Have you used claude 3.5 sonnet? Do you find llama 3.1 better? Is your use case coding?
@drlordbasil 3 months ago
@@deltagamma1442 I've used llama 3.1 mainly as it's free for my research; my preference is definitely Claude 3.5 Sonnet. Use cases vary as I have ADHD and love coding new projects. I have done most automation possible online with LLM agents or NN/RL/meta agents.
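For reference, here is a minimal sketch of the two-pass "thinking" chain described above - one call asks only for extended reasoning, a second call turns that reasoning into a final answer. It assumes the Groq Python SDK and a llama 3.1 model id (both assumptions; any OpenAI-compatible client works the same way):

```python
# Sketch: mimic o1-style "thinking time" with a two-pass prompt chain on Groq.
# Assumes the `groq` package is installed and GROQ_API_KEY is set; the model
# id below is an example and may need updating.
from groq import Groq

client = Groq()
MODEL = "llama-3.1-70b-versatile"  # example model id

def reasoned_answer(question: str) -> str:
    # Pass 1: ask only for detailed reasoning, not the final answer.
    thoughts = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": "Think step by step. Produce detailed reasoning only, no final answer."},
            {"role": "user", "content": question},
        ],
    ).choices[0].message.content

    # Pass 2: feed the reasoning back in and ask for a concise final answer.
    return client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": "Use the provided reasoning to give a short, final answer."},
            {"role": "user", "content": f"Question: {question}\n\nReasoning:\n{thoughts}"},
        ],
    ).choices[0].message.content

print(reasoned_answer("A bat and a ball cost $1.10 total; the bat costs $1 more than the ball. How much is the ball?"))
```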
@nsdalton 3 months ago
I like the AI Coding Meta part. Recently I tried to build out an app with quite a lot of files in the frontend and I ran out of tokens. But then I had an instance of Claude that had extensive knowledge about my app and its functionality create a series of prompts, each focused on a different area of the app. It made sure that the app context and architecture were kept intact across the app. Came out to about 60 prompts, but it saved me so much time and it was surprisingly accurate.
@i2Sekc4U 3 months ago
Hi, could you please show how to do this! This is impressive 😊
@indydevdan 3 months ago
Great engineering work @nsdalton. A big mistake I see the LLM ecosystem making is going too broad when going narrow is how you get real value - today. Having explicit prompts with knowledge about your app is a great instance of this.
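A rough sketch of the pattern @nsdalton describes above - one context-loaded call generates a series of focused prompts, which are then run one at a time so no single call blows the token budget. It assumes the OpenAI Python SDK, and the file names and model are placeholders, not the exact setup from the comment:

```python
# Sketch: have one context-rich call generate focused per-area prompts,
# then run each prompt separately to stay within token limits.
# Assumes the `openai` package and OPENAI_API_KEY; file names are made up.
from openai import OpenAI

client = OpenAI()
app_context = open("docs/app_overview.md").read()  # hypothetical architecture/context doc

plan = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You know this app's architecture:\n" + app_context},
        {"role": "user", "content": "Write a series of self-contained prompts, one per area of the app "
                                     "(auth, routing, state, UI, ...), each restating the relevant context "
                                     "so it can be run on its own. Separate prompts with blank lines."},
    ],
).choices[0].message.content

# Naive split on blank lines; in practice you'd ask for JSON output instead.
prompts = [p.strip() for p in plan.split("\n\n") if p.strip()]

for prompt in prompts:
    result = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content
    print(result[:200], "...\n")
```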
@WenRolland 3 months ago
Just for kicks, here is a test chapter list I created with a custom GPT I'm working on:
00:00 Introduction: Why Prompt Chaining is Key
01:05 Understanding the o1 Series Model Update
01:57 YouTube Chapter Generation: o1 vs. Claude 3.5
03:06 Using Simon W's CLI LLM Tool for Chapter Generation
04:29 Comparing Results: o1 Preview vs. Claude 3.5
05:58 The Advantage of o1's Instruction Following
07:55 AI Coding Review: o1's Superior Performance
10:24 Simon W's File-to-Prompt Library for Code Review
12:01 Running o1 Preview for AI Coding Solutions
14:54 Key Learnings: Instruction Following in the o1 Models
16:38 Sentiment Analysis: Testing on Hacker News
19:16 Iterating with Large Token Prompts
21:37 Final Results: Detailed Sentiment Analysis with o1
27:52 What's Next: The Future with Reasoning Models
@indydevdan 3 months ago
Not bad at all. Definitely more detailed.
@WenRolland 3 months ago
@@indydevdan Like in your prompt, I asked it to optimize for SEO but also to identify significant subject changes that would be of interest to the viewer.
@---Oracle--- 3 months ago
Hello Dan. I want to say that the coherence, elegance and clarity with which you present, articulate and code are profound and unique. We all want to see you succeed beyond your wildest dreams. Amazing content, pioneer🎉
@Truth_Unleashed 3 months ago
Great video, another example of why you're my new fav AI dev channel! Thanks!
@KS-tj6fc 3 months ago
5:45 Suggestion - Have o1-preview create:
## Chapters
### Section 1 (00:00-08:44)
#### 00:01
#### 01:35
#### 03:45
#### 05:18
### Section 2 (08:45-12:59)
#### 08:45
Then list the keywords for the sections, allowing you to select which keywords to keep/prioritize (GUI with +/-), the number of times each keyword is listed in its section, and the TOTAL number of ####. So if there are 5 ####, you suggest 3-4 ####, or 3 #### headings, and have it reconfigure just Section 2 - perhaps not have AIDER in all 4 of the ####, maybe 3 times maximum. My thought process here was your small 6 words into an expanded prompt into an image. This is tweaking the output via basic and efficient HITL review to then nudge/guide an iteration by o1-preview to take its better-than-Sonnet output and perfect it. Ok - back to the video!
@fups8222 3 months ago
Another amazing video Dan! Keep up the great work 👍
@mikew2883 3 months ago
Hey Dan. I did not see the XML formatted prompt examples in the libraries you listed. Can you possibly guide us to where to find them? Thanks!
@lydedreamoz 3 months ago
Nice video as always. I would love you to focus more on o1-mini for coding in your next video, because it was supposedly optimized for coding and it's far less expensive!
@riley539 3 months ago
Not going to lie, as a sophomore Computer Science student, this video kind of opened my eyes to the possibilities of LLMs.
@davidjohnson4063 3 months ago
Job = gone, give it 2 years.
@riley539 3 months ago
@@davidjohnson4063 I think that the "internet of things" will evolve into the "AI of things" until AGI appears. In the meantime, most computer science jobs are not replaceable (except management). Regardless, chain prompting is revolutionizing LLM use - although I still believe there is a ceiling for LLM applications.
@akhilsharma2712 3 months ago
@@riley539 lol but junior jobs are replaceable aka yours (in the future)
@ben2660 3 months ago
yeah ur cooked, switch to data science and build the AIs
@riley539 3 months ago
@@ben2660 This take is not very bright. Computer science jobs will always exist, but market specialization is more important now than ever. Luckily I am also a career-changer and have a decade of experience in the energy generation and distribution industry, which I plan to return to in a tech role.
@JimMendenhall 3 months ago
Are you a tier-5 OpenAI user? How are you getting API access to these models?
@KS-tj6fc 3 months ago
Assuming this is the case, what are the per-1M-token API costs for o1-preview and o1-mini?
@JoshDingus 3 months ago
OpenRouter provides access, and o1 is very expensive.
@mikew2883 3 months ago
The new models are actually available through the OpenRouter API.
@andydataguy 3 months ago
OpenRouter offers the models at a 5% upcharge: $3 / $12 for mini, $15 / $60 for preview. Guessing o1 will be $75 / $300 (allegedly will be released EoM).
@KS-tj6fc 3 months ago
@@andydataguy Crazy prices! I thought SOTA LLMs were supposed to move towards instant inference, unlimited context windows and ever-decreasing costs, per a top-level guy at Anthropic during the Engineering World's Fair just a month ago: kzbin.info/www/bejne/e6amYnqNnbaXgac
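For anyone pricing this out, here is a back-of-the-envelope helper using the per-million-token rates quoted above (the 5% upcharge and the prices themselves are taken from the comment, not verified):

```python
# Sketch: estimate API cost from the per-1M-token rates quoted above.
PRICES = {                       # (input $/1M tok, output $/1M tok) - from the comment, unverified
    "o1-mini": (3.0, 12.0),
    "o1-preview": (15.0, 60.0),
}
UPCHARGE = 1.05                  # OpenRouter's quoted 5% markup

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_rate, out_rate = PRICES[model]
    raw = input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate
    return raw * UPCHARGE

# e.g. a 30k-token codebase review with a 5k-token answer on o1-preview:
print(f"${estimate_cost('o1-preview', 30_000, 5_000):.2f}")  # about $0.79
```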
@SimonNgai-d3u 4 days ago
No wayyy, I can't wait for o3-mini to be released. It should be a whole new level!
@i2Sekc4U 3 months ago
Can you put the resources you refer to in all your videos somewhere? Or just in the description of the video?
@techfren 3 months ago
Amazing video. Lots of great nuggets of info
@rluijk 3 months ago
Great video! Thanks for all the value given!
@MaJetiGizzle 3 months ago
Have you tried YAML as a file format for AI prompting? It uses far fewer tokens while still creating the necessary delimitation versus XML or JSON.
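One quick way to sanity-check that claim yourself: render the same prompt sections as XML and as YAML and compare token counts with tiktoken. The encoding choice and the size of any savings are assumptions to verify against your own content, not a guarantee:

```python
# Sketch: compare token counts for the same prompt delimited as XML vs. YAML.
# Assumes `tiktoken` is installed; actual savings depend on your content.
import tiktoken

sections = {
    "purpose": "Summarize the video transcript into SEO-friendly chapters.",
    "instructions": "Use timestamps, keep titles under 8 words, optimize for search.",
    "transcript": "…",  # your actual transcript here
}

xml_prompt = "\n".join(f"<{k}>\n{v}\n</{k}>" for k, v in sections.items())
yaml_prompt = "\n".join(f"{k}: |\n  {v}" for k, v in sections.items())

enc = tiktoken.get_encoding("o200k_base")  # assumption: encoding used by recent OpenAI models
print("xml tokens: ", len(enc.encode(xml_prompt)))
print("yaml tokens:", len(enc.encode(yaml_prompt)))
```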
@CostaReall 3 months ago
That's a beautiful thumbnail! How did you prompt that?
@silvansoeters 3 months ago
Great stuff! Learned a lot from this video.
@sambarjunk 3 months ago
Great video, can you share the XML prompts you used in this video?
@IdPreferNot1 3 months ago
Are you tier 5 for API access or is there a workaround?
@MichaelLikvidator 3 months ago
Which plugin calculates the token count in the bottom right?
@MariuszWoloszyn 3 months ago
Can you share the prompt files used in the video?
@faisalhijazi9782 3 months ago
Great content as usual 👏
@techfren 3 months ago
You are my favourite 🔥🔥
@JoshDingus 3 months ago
Same here, let's get a community going, indydevdan!
@techfren 3 months ago
@@JoshDingus For sure! We stan IndyDevDan in my Discord community too.
@tomaszzielinski4521 3 months ago
The ability to clean up JSONs still remains valuable, as the tokens wasted on useless data here must have cost a lot :P
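That cleanup step is easy to automate. A small sketch that keeps only the fields you care about before stuffing scraped JSON (e.g. Hacker News comments) into a prompt; the field and file names are hypothetical:

```python
# Sketch: strip a scraped JSON payload down to the fields the prompt actually
# needs, so you don't pay for boilerplate tokens. Field names are hypothetical.
import json

KEEP = {"author", "text", "score"}

def prune(item: dict) -> dict:
    return {k: v for k, v in item.items() if k in KEEP}

with open("comments_raw.json") as f:          # hypothetical input file
    raw = json.load(f)

slim = [prune(c) for c in raw]
prompt_payload = json.dumps(slim, ensure_ascii=False, separators=(",", ":"))
print(f"{len(json.dumps(raw))} -> {len(prompt_payload)} chars")
```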
@pawsjaws 3 months ago
It's not so much prompt chaining; the Q*-type RL stuff is the key - tuning the model with the right optimized reasoning routes. Prompting is legit and chaining it certainly works, but in no way is this only prompt chaining. They're even claiming it's one single model (which shocked me too).
@Deadshotas9845 3 months ago
Please test o1-mini for content generation as well as coding.
@indydevdan 3 months ago
Next vid we focus on AI coding with o1-mini. Stay tuned.
@ПотужнийНезламізм 3 months ago
I don't think code review is possible for a larger codebase, where you need to add 20 files and a 2k-line diff to analyze; that requires some vector DB and running ChatGPT against it somehow.
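For what it's worth, the retrieval half of that is not much code. A minimal sketch with chromadb that indexes files and pulls only the most relevant ones into the review prompt; the library choice, source tree, and whole-file indexing (no chunking or diff filtering) are simplifying assumptions:

```python
# Sketch: index source files in a local vector store, then retrieve only the
# files relevant to a review question instead of pasting the whole repo.
# Assumes `chromadb` is installed; indexing whole files is a simplification.
from pathlib import Path
import chromadb

client = chromadb.Client()
collection = client.create_collection("codebase")

files = list(Path("src").rglob("*.py"))        # hypothetical source tree
collection.add(
    ids=[str(p) for p in files],
    documents=[p.read_text() for p in files],
)

hits = collection.query(
    query_texts=["Where is retry/backoff handled for API calls?"],
    n_results=5,
)
relevant = "\n\n".join(hits["documents"][0])    # paste these into the review prompt
print(hits["ids"][0])
```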
@andydataguy 3 months ago
Thanks for sharing!
@aresaurelian 3 months ago
The base model should know when it needs to infer or not, and thus tell us if it must infer to reach a better result, and ask us if we are willing to pay the extra token cost for it. We want convenience, agency, and a system that is capable and able to do actual work. Verb. Action, doing, producing. The less we must tinker with prompts and models ourselves, the better for the general end user. User must be synonymous with agent, and thus users can be AI agents, doing real work, and vice versa.
@internetperson2 3 months ago
You are describing precognition
@aresaurelian 3 months ago
@@internetperson2 A mini model could recognize whether the prompt seems complex enough that an inference model would handle it better. A larger search model should also realize when there is no obvious result matching the specific problem, and recommend an inference model.
@internetperson2 3 months ago
@@aresaurelian This is wishful thinking imo; you can't trust a mini model's gut about assessing the level of compute required to arrive at a satisfactory result for a given problem. I'm not saying such a tool is infeasible, but I am of the mind it would suck.
@aresaurelian 3 months ago
@@internetperson2 It could be optional. When the customer/user/agent is displeased, the model would learn to behave in a manner suiting them.
@DemetriusZhomir 3 months ago
You build your prompts quite wisely - that's what most people don't do, especially while benchmarking. They miss the whole potential of LLMs, yet still draw their conclusions 🤦‍♂️
@filmonyoha7134 2 months ago
I have also observed this among developers: they say it's trash, but they keep giving it completely different requirements within a single context when they know LLMs rely on past context.
@DemetriusZhomir 2 months ago
@@filmonyoha7134 Yeah, and this is when we understand that prompting education is actually not a bad idea.
@filmonyoha7134 2 months ago
@@DemetriusZhomir By the way, I did my bachelor's in data science, but seeing how my job will become obsolete with data analysis being much easier with AI, do you recommend going back and doing computer engineering?
@DemetriusZhomir 2 months ago
@@filmonyoha7134 Follow your passion. Humans will always be reviewing AI outputs - that requires us to be experts, in my opinion. But we gotta be agile to adapt if we end up being wrong about the future. In time, you can learn something else.
@App-Generator-PRO 1 month ago
And where is Claude 3.5 Haiku? :(
@youriwatson 3 months ago
14:02 hahaha I liked and subbed
@user-eg2oe7pv2i 3 months ago
Best way? Always do a pre-test run - a dummy run, like a gamer in WoW hitting a dummy for DPS eval. And tell it when the pre-test is over and the real test starts.
@fieldcommandermarshall 3 months ago
👑👑👑
@KyleFES 3 months ago
LFG 🔥!!
@carkawalakhatulistiwa 3 months ago
It would be better if they had just called o1 "GPT-4.5".
@toddschavey6736 3 months ago
So we are finally going to have software engineers --write-- down their requirements and use cases... because you can feed them to AI agents to implement, test, and review. Finally.
@Stevenpwalsh 3 months ago
Technically Tree-of-Thought, not Chain-of-Thought.
@Catdevzsh01 3 months ago
meow [nice] :D
@amandamate9117 3 months ago
That's a nice demo, but who's gonna wait minutes to get sentiment analysis for a couple of comments? Way too slow.
@MustardGamings 3 months ago
What do you do when you think - do you instantly figure things out, or do you ponder and think?
@Trendilien69 3 months ago
This constant noise of you typing on the keyboard is distracting and annoying.
@retratosariel 3 months ago
Deal with it
@lexscarlet 3 months ago
"If you're subscribed to the channel you know what we're about." Yeah, but I'm not, so I don't, so like, maybe make an introduction about what you're about? You have 19k subs (rounding up) over 2 years; clearly the content isn't selling itself.
@internetperson2 3 months ago
It's a pretty good bleeding-edge meta AI channel focused on extracting the most value out of the best tools depending on your use case.
@indydevdan 3 months ago
As a new viewer that makes sense; your first point is solid feedback. As for subs, I think you're mistaking large numbers for impact. Realistically this content is for a subset of a subset of engineers that are or want to be near the edge of AI. If this channel gains subs fast, it means I've done something wrong.