A tip: start a new chat for each of the questions. It will likely respond better then, as it uses all of the previous questions as context, and quite heavily so.
@DevAndDoc · 4 months ago
Thanks, you are correct! In this case we experimented quite a bit before the recording, but for ease of presentation and input we were happy giving additional context to o1, as we didn't feel it would significantly affect the demonstration. For scientific rigor we should have done zero-shot prompting for each one separately.
@40bombala · 4 months ago
Great talk. Interesting to see how AI is helping the wider medtech industry. Just a small tip: always try to use fresh sessions when asking unrelated questions. We humans have a remarkable ability to ignore the past and move on to the next problem in the set, but LLMs will analyse the entire history prior to marking it as irrelevant (even with the initial message indicating that it's a quiz). As a result, accuracy and precision drop the deeper you go into the conversation.
@ShpanMan · 4 months ago
The full model should be arriving next month, would be interesting to give it even harder tests.
@Manwith6secondmemory · 4 months ago
I created a simple website with the Anthropic API; took me a couple of hours. You enter a patient's information, their history, and symptoms, and it returns possible diagnoses and a patient-specific treatment plan. My cousin, who is in med school, stress tested it and she was like, omg how did you make this, it's amazing, and I was like, it's just a wrapper haha
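For anyone curious, a "wrapper" like this really is a few dozen lines. Here's a minimal sketch under stated assumptions: the prompt wording, field names, and model name are illustrative guesses (not the commenter's actual code), and of course the output is informational only, not medical advice.

```python
def build_prompt(info: str, history: str, symptoms: str) -> str:
    """Combine the website's form fields into one structured prompt."""
    return (
        "You are assisting a clinician. Based on the details below, list "
        "possible diagnoses with brief reasoning, then a suggested work-up.\n\n"
        f"Patient information: {info}\n"
        f"History: {history}\n"
        f"Symptoms: {symptoms}"
    )

def get_assessment(info: str, history: str, symptoms: str) -> str:
    """Send the prompt to the Anthropic Messages API and return the reply text."""
    import anthropic  # imported here so the prompt helper works without the SDK
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    message = client.messages.create(
        model="claude-3-5-sonnet-latest",  # assumed model name
        max_tokens=1024,
        messages=[{"role": "user", "content": build_prompt(info, history, symptoms)}],
    )
    return message.content[0].text
```

A web frontend then just needs a form that posts the three fields and renders the returned text.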
@flickwtchr · 3 months ago
Oh goodie......... a new generation of doctors that will know how to lean on a chatbot. Trying to get a doctor's actual attention is already hard enough. Now we will have to question what they are being advised to do or not to do by their AI, but these AI systems, the ANI ones already in place, are protected by "proprietary information" legalese.
@Manwith6secondmemory · 3 months ago
@@flickwtchr Doctors make mistakes ALL THE TIME. Instead of relying on a doctor, you can now interact with a chatbot anywhere (instead of forking over $1500 for a checkup) and give it your health information and symptoms and it can diagnose/recommend treatment options. This is a good thing, it opens up the field, now doctors are no longer sacred guardians of information.
@jd_real1 · 4 months ago
Great video! I'm also excited for o1. I gave it 350 records to sort and analyze, and it did in 20 seconds what would have taken me 3 hours in Excel. Very impressive.
@LucaCrisciOfficial · 4 months ago
The problem with the ARC puzzle is that it is substantially a visual reasoning task. When you translate it into a matrix you are not testing the same thing as for humans. I think LLMs will only get better at this task by improving vision capabilities, not only reasoning ones. And with this I won 1 million dollars :-)
@sevilnatas · 4 months ago
Also, many people were accomplishing this "reasoning" by using RAG processes and making multiple API calls, both to hold the model's hand through the reasoning process and as a way to confirm results. Supposedly, much of this won't be necessary if it delivers on its promises. I'd like to see the model come back with requests for clarification or additional information.
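The multi-call "hand-holding" pattern described above can be sketched as a small orchestration loop. This is an illustrative sketch, not any particular library's API: `call_model` is a stand-in for whatever chat-completion function you wire in, and the step prompts and yes/no verification are assumptions.

```python
from typing import Callable, List

def guided_answer(question: str, steps: List[str],
                  call_model: Callable[[str], str]) -> str:
    """Walk a model through reasoning one API call per step, then verify."""
    context = f"Question: {question}"
    for step in steps:
        # One call per reasoning step, feeding all prior work back in.
        context += "\n" + call_model(f"{context}\n\nNow: {step}")
    answer = call_model(f"{context}\n\nGive the final answer only.")
    # A separate confirmation call before accepting the result.
    verdict = call_model(
        f"Question: {question}\nAnswer: {answer}\nIs this correct? Reply yes or no."
    )
    return answer if verdict.strip().lower().startswith("yes") else "NEEDS REVIEW"
```

With o1-style models, much of this loop effectively moves inside the model itself, which is the point the comment is making.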
@pmiddlet72 · 3 months ago
Another doc here. Glad to see someone's talking more about this area. But maybe don't stray so hard into the benchmarks and the stuff tangential to medicine, and more into the state of AI. Trying to understand how "the computer" thinks about a diagnosis vs a clinician would be a great deep dive here (but requires some extensive study). Published benchmarks are a bit of a slippery issue. So, things that are problematic with this overall model of using LLMs in addition to toolchains to try and infer a diagnosis for some patient event: a) multimodality (or lack thereof): the most *important* component in generating the most likely diagnosis is a solid H&P, *before anything else*, provided you're able to get it. For something that's, say, a skin infection, using CV with better reasoning, o1 appears to do reasonably well. However, the confidence in one diagnosis may override a solid differential that arguably such a model should be using to challenge its initial assumptions (it should be a bit skeptical that its initial thought is always right). b) use case: the things the transformer can reason about functionally are mostly 2D projections. For these, such as reading x-rays, CT, and MRIs, o1 appears to do well too (it does better, like any radiologist, if there's some context given to it). There are others, but I'll stop here. Case in point to exemplify the H&P and multimodal need: when it comes to discriminating between similar presentations of two different skin infections, for which you can upload photos, I found the model can really falter, e.g. discriminating between one of various presentations of MRSA and Serratia sp. in my more recent experiment (they can look quite similar, but differ quite a bit in presentation). I'm in neurosurg, but my focus is translational medicine.
My example is really "wound/external infection management", which comes more under wound surgeons, but we are all responsible in this regard, so I thought it was something reasonable to test. The photos we used were controlled in areas such as distance, color, focal distance, and resolution, but closely appearing presentations it diagnosed wrong around 65% of the time. I have some thoughts about what would improve the rate of correct inference in these matters, but that will be a much deeper dive. I also think this is less about scaling and more about a need for a new architecture that deals with more generalized approaches to this domain. Where I'd really want to see AI be a "true assistant" would be in the OR. For example: to help guide endovascular neurologists in complicated embolization procedures where DSA/fluoroscopy is the norm for visual information during the procedure (i.e. what might be the best route for embolization given a person's vascular anatomy, what vessels might be too weak or small to inject an embolic liquid and reach a larger "bird's nest" of dAVFs, what would have been the most likely cause of a CVST given what is known about the patient: H&P, images, etc.).
@djayjp · 4 months ago
I don't get what the apparent error was in the last case (?). It stated it as "approximate" after all 🤔
@DevAndDoc · 4 months ago
It used a wrong / loose conversion. There is also a phenomenon called cross-tolerance, where you could be less tolerant / more sensitive to a new opioid that targets slightly different sub-receptors. In short, a clinician should account for these variables and approach with caution; you'd rather underdose than overdose in conversions!
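The arithmetic behind what the reply describes, an equianalgesic conversion followed by a dose reduction for incomplete cross-tolerance, can be sketched as below. The ratio and reduction values are illustrative placeholders only, not clinical figures to rely on; real conversions must come from a clinician using current reference tables.

```python
def convert_opioid(dose_mg: float, equianalgesic_ratio: float,
                   cross_tolerance_reduction: float = 0.5) -> float:
    """Return a cautious starting dose of the new opioid in mg.

    The naive equianalgesic dose is then reduced (deliberately
    underdosed) to allow for incomplete cross-tolerance.
    """
    equivalent = dose_mg / equianalgesic_ratio
    return equivalent * (1 - cross_tolerance_reduction)

# e.g. 60 mg/day of drug A with an ASSUMED 1.5:1 ratio and a 50% reduction:
# 60 / 1.5 = 40 mg equivalent, halved to a 20 mg cautious starting dose.
cautious_dose = convert_opioid(60, equianalgesic_ratio=1.5,
                               cross_tolerance_reduction=0.5)
```

The point of the reply is that a loose conversion skips the reduction step entirely, which is exactly the direction of error (overdosing) a clinician tries to avoid.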
@djayjp · 4 months ago
@@DevAndDoc Ah I see gotcha! Thx for explaining and keep up the good work!
@Tomolopolis · 4 months ago
Great episode guys! 🔥
@DevAndDoc · 4 months ago
Thank you for the support!
@chickendinner6456 · 4 months ago
It is nice to see real experts test this model and not rely on OpenAI's internal testing or random tech YouTubers.
@DevAndDoc · 4 months ago
Thank you, our aim is to have people who actually live and breathe AI and healthcare dissecting these topics for other clinicians and the wider world. If there is anything you'd like to see in the future please let me know :) - Doc
@AlfarrisiMuammar · 4 months ago
Still waiting for full version of gpt o1
@andreaskrbyravn855 · 3 months ago
Audio wouldn't work for these models, unless people want to wait for an answer.
@HrvojeSpoljar · 3 months ago
It would, but same as with a human response: if the question is complex it would take more time to provide an answer, possibly some time to think before responding. GPT would probably be able to start responding with some steps, kind of "thinking out loud", before it gets down to the final answer.
@michaelhartjen3214 · 4 months ago
Wait till next year; this is just the start.
@sevilnatas · 4 months ago
Proprietary and open source are not supposed to exist at the same time. OpenAI originated as a non-profit that was supposed to be open source, safe, and beneficial to society. Seems that is all out the window now.
@HrvojeSpoljar · 3 months ago
They were never open source. "Open" in the name was just a ploy.
@sevilnatas · 3 months ago
@@HrvojeSpoljar I was under the impression that they had said that they would eventually be open sourced, but that may be my mistake.
@human_shaped · 4 months ago
Minor pedantic correction: it isn't OpenAI GPT o1, it's just OpenAI o1. Sam doesn't like the name GPT. The o1 series is a fresh start without the GPT.
@djayjp · 4 months ago
Actually even OpenAI refer to it as "ChatGPT o-1 Preview" upon opening a chat window with it.
@MrC0MPUT3R · 3 months ago
@@djayjp Maybe they were referring to the API, which calls the model 'o1-preview'. Makes sense, because ChatGPT is the web product and not the model itself.
@flickwtchr · 3 months ago
Well if Sam doesn't like it, I'll make certain to keep referring to it as GPT.
@bauch16 · 4 months ago
It will never be as bad as now
@tzardelasuerte · 4 months ago
Nope. And full o1 is done; it's just that the inference cost is too high, which is why we got the preview. But my point is that it's not "what if", it's "when": when o1 comes out, it's going to be even better than this. And GPT-5 is right behind that. And it doesn't just stop after that; it continues over and over.
@onlythetruth7 · 4 months ago
@@tzardelasuerte That is if they don't reach a wall, be it lack of data (especially if synthetic data doesn't work that well) or architecture limitations that scaling can't fix; they'll still advance, but at a way slower pace past a certain level. I hope they find other architectures if that's the case, or are able to make it more efficient somehow.
@Boopy357 · 4 months ago
Nothing regresses to worse in any sort of technological situation. That doesn't mean everything scales infinitely, though.
@tzardelasuerte · 4 months ago
@@onlythetruth7 That's if everything stays the same and this theoretical wall you talk about comes to be. You don't take into account that nothing is staying the same: new models keep coming out, new methods of training keep coming out, and new methods of inference keep coming out. Add to that that AI is helping build, research, and even train these models, and the theoretical wall becomes even less important.
@onlythetruth7 · 4 months ago
@@tzardelasuerte Still using the same decade-old architecture to train, though. Hopefully new stuff comes out fast and you're right.
@EamonnMooney · 4 months ago
Slight correction: it's not called GPT o1, just OpenAI o1. But great content, and very scary.
@djayjp · 4 months ago
Even OpenAI lists it as "ChatGPT o-1 Preview" when you open a chat window with it.
@standingbear998 · 4 months ago
It will aid the depop movement greatly. Get your heads out of the sand.
@xXstevilleXx · 4 months ago
Well, I don't know what to say, other than I am glad I have no children. I feel sorry for those who do. Looks like a cold and dark "future" for them