A tip: start a new chat for each of the questions. It will likely respond better then, as it uses all of the previous questions as context, and quite heavily so.
@DevAndDoc · 4 months ago
Thanks, you are correct! In this case we experimented quite a bit before the recording, but for ease of presentation and input we were happy giving additional context to o1, as we didn't feel it would significantly affect the demonstration. For scientific rigor we should have done zero-shot prompting for each one separately.
@40bombala · 4 months ago
Great talk. Interesting to see how AI is helping the wider medtech industry. Just a small tip: always try to use fresh sessions when asking unrelated questions. We humans have a remarkable ability to ignore the past and move on to the next problem in the set, but LLMs will analyse the entire history prior to marking it as irrelevant (even with the initial message indicating that it's a quiz). As a result, accuracy and precision drop the deeper you go into the conversation.
@ShpanMan · 4 months ago
The full model should be arriving next month, would be interesting to give it even harder tests.
@Manwith6secondmemory · 4 months ago
I created a simple website with the Anthropic API; took me a couple of hours. You enter a patient's information, their history, and symptoms, and it returns possible diagnoses and a patient-specific treatment plan. My cousin, who is in med school, stress tested it and she was like, omg how did you make this, it's amazing, and I was like, it's just a wrapper haha
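For anyone curious, a "wrapper" like this really is a few dozen lines. Here's a minimal sketch under stated assumptions: the prompt wording, field names, and model name are illustrative guesses (not the commenter's actual code), and of course the output is informational only, not medical advice.

```python
def build_prompt(info: str, history: str, symptoms: str) -> str:
    """Combine the website's form fields into one structured prompt."""
    return (
        "You are assisting a clinician. Based on the details below, list "
        "possible diagnoses with brief reasoning, then a suggested work-up.\n\n"
        f"Patient information: {info}\n"
        f"History: {history}\n"
        f"Symptoms: {symptoms}"
    )

def get_assessment(info: str, history: str, symptoms: str) -> str:
    """Send the prompt to the Anthropic Messages API and return the reply text."""
    import anthropic  # imported here so the prompt helper works without the SDK
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    message = client.messages.create(
        model="claude-3-5-sonnet-latest",  # assumed model name
        max_tokens=1024,
        messages=[{"role": "user", "content": build_prompt(info, history, symptoms)}],
    )
    return message.content[0].text
```

A web frontend then just needs a form that posts the three fields and renders the returned text.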
@flickwtchr · 3 months ago
Oh goodie......... a new generation of doctors that will know how to lean on a chatbot. Trying to get a doctor's actual attention is already hard enough. Now we will have to question what they are being advised to do or not to do by their AI, but these AI systems, the ANI ones already in place, are protected by "proprietary information" legalese.
@Manwith6secondmemory · 3 months ago
@@flickwtchr Doctors make mistakes ALL THE TIME. Instead of relying on a doctor, you can now interact with a chatbot anywhere (instead of forking over $1500 for a checkup) and give it your health information and symptoms and it can diagnose/recommend treatment options. This is a good thing, it opens up the field, now doctors are no longer sacred guardians of information.
@jd_real1 · 4 months ago
Great video! I'm also excited for o1. I gave it 350 records to sort and analyze, and it did in 20 seconds what would have taken me 3 hours in Excel. Very impressive.
@LucaCrisciOfficial · 4 months ago
The problem with the ARC puzzle is that it is substantially a visual reasoning task. When you translate it into a matrix you are not testing the same thing as for humans. I think LLMs will only get better at this task by improving vision capabilities, not only reasoning ones. And with this I won 1 million dollars :-)
@sevilnatas · 4 months ago
Also, many people were accomplishing this "reasoning" by using RAG processes and making multiple API calls, both to hold the model's hand through the reasoning process and as a way to confirm results. Supposedly, much of this won't be necessary if it delivers on its promises. I'd like to see the model come back with requests for clarification or additional information.
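The multi-call "hand-holding" pattern described above can be sketched as a small orchestration loop. This is an illustrative sketch, not any particular library's API: `call_model` is a stand-in for whatever chat-completion function you wire in, and the step prompts and yes/no verification are assumptions.

```python
from typing import Callable, List

def guided_answer(question: str, steps: List[str],
                  call_model: Callable[[str], str]) -> str:
    """Walk a model through reasoning one API call per step, then verify."""
    context = f"Question: {question}"
    for step in steps:
        # One call per reasoning step, feeding all prior work back in.
        context += "\n" + call_model(f"{context}\n\nNow: {step}")
    answer = call_model(f"{context}\n\nGive the final answer only.")
    # A separate confirmation call before accepting the result.
    verdict = call_model(
        f"Question: {question}\nAnswer: {answer}\nIs this correct? Reply yes or no."
    )
    return answer if verdict.strip().lower().startswith("yes") else "NEEDS REVIEW"
```

With o1-style models, much of this loop effectively moves inside the model itself, which is the point the comment is making.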
@pmiddlet72 · 3 months ago
Another doc here. Glad to see someone's talking more about this area. But maybe don't stray so hard into the benchmarks and the stuff tangential to medicine, and more into the state of AI. Trying to understand how "the computer" thinks about a diagnosis vs a clinician would be a great deep dive here (but requires some extensive study). Published benchmarks are a bit of a slippery issue. So, things that are problematic with this overall model of using LLMs in addition to toolchains to try and infer a diagnosis for some patient event: a) multimodality (or lack thereof): the most *important* component in generating the most likely diagnosis is a solid H&P, *before anything else*, provided you're able to get it. For something that's, say, a skin infection, using CV with better reasoning, o1 appears to do reasonably well. However, the confidence in one diagnosis may override a solid differential that arguably such a model should be using to challenge its initial assumptions (it should be a bit skeptical that its initial thought is always right). b) use case: the things the transformer can reason about functionally are mostly 2D projections. For these, such as reading x-rays, CT, and MRIs, o1 appears to do well too (it does better, like any radiologist, if there's some context given to it). There are others, but I'll stop here. Case in point to exemplify the H&P and multimodal need: when it comes to discriminating between similar presentations of two different skin infections, for which you can upload photos, I found the model can really falter, e.g. discriminating between one of various presentations of MRSA and Serratia sp. in my more recent experiment (they can look quite similar, but differ quite a bit in presentation). I'm in neurosurg, but my focus is translational medicine.
My example is really "wound/external infection management", which comes more under wound surgeons, but we are all responsible in this regard, so I thought it was something reasonable to test. The photos we used were controlled in areas such as distance, color, focal distance, and resolution, but closely appearing presentations it diagnosed wrong around 65% of the time. I have some thoughts about what would improve the rate of correct inference in these matters, but that will be a much deeper dive. I also think this is less about scaling and more about a need for a new architecture that deals with more generalized approaches to this domain. Where I'd really want to see AI be a "true assistant" would be in the OR. For example: to help guide endovascular neurologists in complicated embolization procedures where DSA/fluoroscopy is the norm for visual information during the procedure (i.e. what might be the best route for embolization given a person's vascular anatomy, what vessels might be too weak or small to inject an embolic liquid and reach a larger "bird's nest" of dAVFs, what would have been the most likely cause of a CVST given what is known about the patient: H&P, images, etc.).
@djayjp · 4 months ago
I don't get what the apparent error was in the last case (?). It stated it as "approximate" after all 🤔
@DevAndDoc · 4 months ago
It used a wrong / loose conversion. There is also a phenomenon called cross-tolerance, where you could be less tolerant / more sensitive to a new opioid that targets slightly different sub-receptors. In short, a clinician should account for these variables and approach with caution; you'd rather underdose than overdose in conversions!
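The arithmetic behind what the reply describes, an equianalgesic conversion followed by a dose reduction for incomplete cross-tolerance, can be sketched as below. The ratio and reduction values are illustrative placeholders only, not clinical figures to rely on; real conversions must come from a clinician using current reference tables.

```python
def convert_opioid(dose_mg: float, equianalgesic_ratio: float,
                   cross_tolerance_reduction: float = 0.5) -> float:
    """Return a cautious starting dose of the new opioid in mg.

    The naive equianalgesic dose is then reduced (deliberately
    underdosed) to allow for incomplete cross-tolerance.
    """
    equivalent = dose_mg / equianalgesic_ratio
    return equivalent * (1 - cross_tolerance_reduction)

# e.g. 60 mg/day of drug A with an ASSUMED 1.5:1 ratio and a 50% reduction:
# 60 / 1.5 = 40 mg equivalent, halved to a 20 mg cautious starting dose.
cautious_dose = convert_opioid(60, equianalgesic_ratio=1.5,
                               cross_tolerance_reduction=0.5)
```

The point of the reply is that a loose conversion skips the reduction step entirely, which is exactly the direction of error (overdosing) a clinician tries to avoid.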
@djayjp · 4 months ago
@@DevAndDoc Ah I see gotcha! Thx for explaining and keep up the good work!
@Tomolopolis · 4 months ago
Great episode guys! 🔥
@DevAndDoc · 4 months ago
Thank you for the support!
@chickendinner6456 · 4 months ago
It is nice to see real experts test this model and not rely on OpenAI's internal testing or random tech YouTubers.
@DevAndDoc · 4 months ago
Thank you, our aim is to have people who actually live and breathe AI and healthcare dissecting these topics for other clinicians and the wider world. If there is anything you'd like to see in the future please let me know :) - Doc
@AlfarrisiMuammar · 4 months ago
Still waiting for full version of gpt o1
@andreaskrbyravn855 · 3 months ago
Audio wouldn't work for these models, unless people want to wait for an answer.
@HrvojeSpoljar · 3 months ago
It would, but same as with a human response: if the question is complex it would take more time to provide an answer, possibly some time to think before responding. GPT would probably be able to start responding with some steps, kind of "thinking out loud", before it gets down to the final answer.
@michaelhartjen3214 · 4 months ago
Wait till next year; this is just the start.
@sevilnatas · 4 months ago
Proprietary and open source are not supposed to exist at the same time. OpenAI originated as a non-profit that was supposed to be open source, safe, and beneficial to society. Seems that is all out the window now.
@HrvojeSpoljar · 3 months ago
They were never open source. "Open" in the name was just a ploy.
@sevilnatas · 3 months ago
@@HrvojeSpoljar I was under the impression that they had said that they would eventually be open sourced, but that may be my mistake.
@human_shaped · 4 months ago
Minor pedantic correction: it isn't OpenAI GPT o1, it's just OpenAI o1. Sam doesn't like the name GPT. The o1 series is a fresh start without the GPT.
@djayjp · 4 months ago
Actually even OpenAI refer to it as "ChatGPT o-1 Preview" upon opening a chat window with it.
@MrC0MPUT3R · 3 months ago
@@djayjp Maybe they were referring to the API, which calls the model 'o1-preview'. Makes sense, because ChatGPT is the web product and not the model itself.
@flickwtchr · 3 months ago
Well if Sam doesn't like it, I'll make certain to keep referring to it as GPT.
@bauch16 · 4 months ago
It will never be as bad as now
@tzardelasuerte · 4 months ago
Nope. And full o1 is done; it's just that the inference cost is too high, which is why we got the preview. But my point is that it's not "what if", it's "when": when o1 comes out, it's going to be even better than this. And GPT-5 is right behind that. And it doesn't just stop after that; it continues over and over.
@onlythetruth7 · 4 months ago
@@tzardelasuerte That is if they don't reach a wall, be it lack of data (especially if synthetic data doesn't work that well) or architecture limitations that scaling can't fix; they'll still advance, but at a way slower pace past a certain level. I hope they find other architectures if that's the case, or are able to make it more efficient somehow.
@Boopy357 · 4 months ago
Nothing regresses to worse in any sort of technological situation. That doesn't mean everything scales infinitely, though.
@tzardelasuerte · 4 months ago
@@onlythetruth7 That's if everything stays the same and this theoretical wall you talk about comes to be. You don't take into account that nothing is staying the same: new models keep coming out, new methods of training keep coming out, and new methods of inference keep coming out. Add to that that AI is helping build, research, and even train these models, and the theoretical wall becomes even less important.
@onlythetruth7 · 4 months ago
@@tzardelasuerte Still using the same decade-old architecture to train, though. Hopefully new stuff comes out fast and you're right.
@EamonnMooney · 4 months ago
Slight correction: it's not called GPT o1, just OpenAI o1. But great content, and very scary.
@djayjp · 4 months ago
Even OpenAI lists it as "ChatGPT o-1 Preview" when you open a chat window with it.
@standingbear998 · 4 months ago
It will aid the depop movement greatly. Get your heads out of the sand.
@xXstevilleXx · 4 months ago
Well, I don't know what to say, other than I am glad I have no children. I feel sorry for those who do. Looks like a cold and dark "future" for them