I Asked OpenAI and Gemini Deep Research to Write About My PhD Topic...

Views: 21,703

Days ago

With the release of OpenAI's "Deep Research" feature, I decided to test it alongside Google's Gemini 1.5 Deep Research feature by having both write a literature review on my PhD dissertation topic.
Kyle's Dissertation: escholarship.o...

Comments: 138

@stepthut 6 days ago

Only around 5 years ago, the complaint with GPTs was that they sometimes used the wrong sentence structure: nouns and verbs were confused, tense was wrong, and the sentences didn't quite make sense. Jump forward and the complaint was that the GPT didn't quite understand logic or reasoning sometimes. Now we are complaining that the GPT doesn't always reference the appropriate galaxy when making its argument, a mistake buried deep within its research paper. It is mind-boggling how fast it's all going.

@SethGrantham-k1x 7 days ago

Finally, a new AI product / service that is actually useful and will lead to positive results instead of just spam and laughs. This one is great, and will help tons of people, including myself.

@vaolin1703 7 days ago

I'm using it to research how some weird / unusual APIs work. For one specific thing it managed to find more information in 6 minutes than I could in 1.5 hours. I will keep testing it.

@jeffwads 7 days ago

AI has produced more than just spam and laughs for a while now.

@starmorpheus 7 days ago

Clearly a skill issue on your end. ChatGPT’s release in 2022 changed the game as soon as it dropped, from productivity to personal work. If it took until now for you to find a use, then you haven’t even bothered using what we already had available.

@Justashortcomment 6 days ago

That’s a fair comment. But the bottom line here is that it relies on o3 which is “the next level”. To set up a system like this you need a really smart foundation model. 2 months down the line, it will probably be even smarter. ;)

@pareak 7 days ago

Say, how many people are aware of the fact that our expectations just go higher and higher? Like, this is the first iteration and it can already do something 99% of people these days would not be able to! Sometimes I remind myself of the fact that just 2 1/2 years ago, SOTA was GPT-3.5 with which you could have a good conversation and that's it! Where are we going to be in 3 more years...?

@Hardcore10 7 days ago

It’s really interesting. We’re at the point in AI where, if it doesn’t give us a thesis like a graduate student would, it doesn’t impress us anymore.

@anta-zj3bw 7 days ago

It took him 5 years to develop his final thesis. Deep Research, in its infantile state, developed a research starting point in about as much time as it took him to finish brushing his teeth. And we are not too impressed by that. Wow.

@damondragon324 7 days ago

Great video, I was already waiting for it :) The number of citations is probably a compute limitation per task. I still remember when GPT-3.5 and GPT-4 would always cut off in the middle of code output and I always had to prompt them to continue. The number of tokens the new model can output is insane in comparison.

@carlkim2577 7 days ago

This is a very high bar to pass. A very niche topic graded by one of the top experts. I'm impressed it even got this close.

@randomuser5237 7 days ago

It doesn't actually have access to paywalled journals, only what's publicly available on the internet. They said they were going to work on that later (perhaps using individual access). So this would only work for disciplines where many journals are available on arXiv or similar sources (and you have to tell it to look in those sources). I think you can also upload PDFs, so that would help too. It's also very prompt-dependent, which is why it asks an initial set of questions. The better you specify what you want from the report (language style, in-depth description, double-checking citations, etc.), the better the results will be.

@Fatman305 6 days ago

You can likely feed it its own output and tell it "you made some slight errors in this sentence, you need to mention 2x the galaxies", basically giving clues without revealing much, and have it redo the report substantially better. Of course, super users would feed it prescraped data and have it use that optimized data, plus let it browse for related/similar data to enhance it. In other words, he should give it his actual PhD paper and have it take it to the next level...

@ttul 5 days ago

Working in the context of business analysis, I’ve found Deep Research really helpful in pulling together reasonably lightweight briefings on a topic and also reasonably good at getting it “right” and not missing too many of the big concepts that I might have included had I written the briefing myself. In its current form, it’s already incredibly helpful and I’ve put together about ten documents with its help that I edited and was able to make use of. That’s so vastly superior to my level of output previously that it has to be taken very seriously. And this is the _worst_ it’s ever going to be… Consider what another year of improvement on this product and the underlying models will yield. What a time to be alive.

@architech5940 7 days ago

I wrote a thesis for my bachelor's with the assistance of ClaudeAI. Claude 100% can create graphs using the data you give it. You can pass in a data file and ask it to create a [homogeneous category] graph with the attached data file, annotate the plot, add a legend, and summarize the illustration.

@Axio-Flex 6 days ago

OpenAI's 4o and o1 models can do that, but they limit the interconnectivity and functionality of the latest models and work them in a little at a time. Currently Deep Research might not have code execution in its skill set. That doesn't mean it lacks the ability; it is just brand new and not fully integrated yet.

@oimrqs1691 7 days ago

Great analysis!

@Dr.UldenWascht 7 days ago

Fantastic. Precisely the sort of review I was looking for.

@KyleKabasares_PhD 7 days ago

Thanks for watching, glad you found it helpful!

@pc_screen5478 7 days ago

I think the reason it doesn't cite more sources is that it reaches the limit of its context window. o3's context window is 200k tokens and even a short paper will have more than 10k tokens, not to mention the reasoning it does for each source it finds plus the tokens in the final report. Gemini Deep Research is shallower in terms of reasoning and has a 2M context window, so it can get away with hundreds of sources.
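The arithmetic in the comment above can be made concrete with a rough back-of-the-envelope sketch. The 200k and 2M context sizes and the 10k-tokens-per-paper figure come from the comment; the function name `max_sources` and the per-source reasoning and report budgets are purely illustrative assumptions, not anything OpenAI or Google has published.

```python
def max_sources(context_tokens, tokens_per_source,
                reasoning_per_source=0, report_tokens=0):
    """Estimate how many full sources fit in one context window.

    All inputs are rough token counts. reasoning_per_source and
    report_tokens are hypothetical budgets used only for illustration.
    """
    # Reserve room for the final report before counting sources.
    usable = context_tokens - report_tokens
    if usable <= 0:
        return 0
    # Each source costs its own text plus the reasoning spent on it.
    return usable // (tokens_per_source + reasoning_per_source)


# Figures taken from the comment: 200k vs. 2M windows, ~10k tokens per paper.
small_window = max_sources(200_000, 10_000)
large_window = max_sources(2_000_000, 10_000)

# Same 200k window with assumed overheads: 2k reasoning tokens per source
# and an 8k-token final report.
with_overhead = max_sources(200_000, 10_000,
                            reasoning_per_source=2_000,
                            report_tokens=8_000)

print(small_window, large_window, with_overhead)
```

Under these assumptions the 200k window holds on the order of 20 short papers at best, and fewer once reasoning and the report are counted, while a 2M window holds roughly ten times as many. Real systems may summarize or discard sources as they go, so this is only an upper-bound style estimate.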

@LoveLifePD25 7 days ago

We must have AI doctors in everyone's pocket ready to prescribe medication to reduce hospital loads by 50%

@Kulimar 7 days ago

Google really needs to step up their rollout / marketing game.

@dwiii1635 7 days ago

I wonder if you had prompted it to go deeper and acquire x amount more citations if it would have presented a more thorough paper? Might have to prod the AI to go deeper similar to when coding with it.

@TechnoMageCreator 7 days ago

AI perfectly reflects the user's awareness, knowledge, wisdom and experience. The effort you put into it, all the way up to personal biases and hallucinations, will be reflected back. Yes, if you put in 2 minutes and ask for an answer, you get a beginner's answer. Now take that article, make all the corrections and add more ways to expand and clarify. Guide the AI properly and the possibilities are limitless. The only limit is our own minds.

@ryzikx 6 days ago

@TechnoMageCreator 6 days ago

@ryzikx with that attitude I bet you think it doesn't, and that still makes me correct. I would say talk about this topic with an AI and you might find new patterns, but that also might be useless since your vocabulary is composed of the word no. Enjoy your "no AI" I guess

@ChrlzMaraz 6 days ago

It would be interesting to see you run the prompt again but account for its previous shortcomings and address those in the prompt crafting.

@philipscott6203 7 days ago

Great review! I think we might be seeing the result of a context-window limitation in this model as to why it does not cite more sources. As it is, I think they must be playing some very interesting games with the context to be able to find and retain all of the information that was found in a comprehensive way... If we look at a 200k token context window, I feel like you would fill that up immediately looking through, oh, say ~10 research papers?

@patruff 6 days ago

Thanks for this, I feel like there's so much hype about this, good to see someone actually using it instead of just claiming AGI HARD TAKEOFF

@Axio-Flex 6 days ago

Thank you very much for this report. It's refreshing to get an actual expert weighing in on the accuracy and complexity of these AIs' so-called PhD-level intelligence. Too many of the videos on this topic are made by tech bros with too much hype and not enough critical analysis skill.

@GNARGNARHEAD 7 days ago

how wild is it that it's performing anywhere near this level though 😆

@davegordon6233 7 days ago

I assume that the first result is just a template. You may ask to add data to the tables, rephrase specific chapters etc

@CiaoKizomba 7 days ago

This was highly informative and honest.

@KyleKabasares_PhD 7 days ago

Thank you for your comment! I’m glad you watched it and found it so :)

@deter3 7 days ago

I worked on a similar report last April using the Claude API and a very complicated workflow. The weaknesses of such reports are: 1. Hallucinations: either you read it carefully and click every citation, in which case you might find them, or you will not. 2. It is quite informative but lacks perspective; as you stated, it reads like a graduate student's work, with no insights being discussed. It's an information collector.

@sebkeccu4546 4 days ago

Incredible for a first pass in 15 min. Don't forget that a student does dozens of passes or more before arriving at your final paper. If you had it improve the draft, it might have reached your level in a second pass, and that would still only require 30 min while you were drinking something :-) And let's not forget this is also the very first version. I wonder how it would have turned out if you had let it do one or more additional passes, giving small hints based on what you noticed could be improved.

@ksinha88 7 days ago

You could just ask it to reference more sources, like 20 to 30 references. Also, you could ask it to proofread itself, verify things against the sources, and check if there are any hallucinations. Pretty impressive that we can do in 15 minutes work that took you five years to do.

@eliotcougar 7 days ago

The limit of 10 sources seems to be artificial... I saw it produce 10 sources in many public examples... It looks like the model stops looking for new sources as soon as it finds 10 valid sources... It's entirely a load-limiting measure... Technically, one could let it run for hours to get a much more comprehensive review, but we're not there yet...

@imthinkingthoughts 7 days ago

yeah this exactly is what i was thinking

@Brent-ob8km 7 days ago

we will be soon when we can run it locally

@gunnar_langemark 7 days ago

It's a little disappointing that you did not bother to actually compare the two Deep Research versions. I've been using the Gemini version for a month, and was curious about OpenAI's version.

@stickman1695 7 days ago

it's not even close, google gets absolutely crushed. even r1 with search is better than it

@KyleKabasares_PhD 7 days ago

Thanks for the feedback, I’ll probably do a more serious comparison between the two in the future in other domains that I’m familiar with.

@AlfarrisiMuammar 7 days ago

We are only 1 year away from LLM replacing the majority of white collar jobs. 😂

@wonmoreminute 7 days ago

Curious why that’s funny?

@AlfarrisiMuammar 7 days ago

Because I am a farmer.

@akfortyfo7024 7 days ago

That's pretty laughable. Companies being created right now using an AI-first mindset likely won't need as many white collar workers, but you're telling me all of these existing companies are currently and successfully architecting agents to replace people? Nope. I work for a Fortune 500. The current systems are too complex and unmaintained, and tasks are too nuanced for that to happen within a year. 3 years, maybe.

@byronfriesen7647 7 days ago

great contribution to our understanding

@KyleKabasares_PhD 7 days ago

@byronfriesen7647 thank you!!

@rRobertSmith 6 days ago

You should have given it a second pass and insisted that it use the other data points. Rewrite the paper with more confidence!? would have been a good 2nd pass.

@citogrid 7 days ago

Progress, but it only becomes impressive when it adds to your PhD thesis, something that is still very far away. What grade would you give it compared to your original work?

@CosmicCells 7 days ago

I think it already can add to your PhD thesis. Not so much at the end, when you have acquired all this knowledge and are writing it up, but at the beginning, when you have to work your way into different novel topics and do lots of literature review. Obviously AI tools like SciSpace and Elicit can help you there as well. Once you get your results, you can possibly cross-reference them with other literature. Overall, if I were writing my PhD today, I think o1, o3, DeepSeek, and all these literature research and summarization tools would have saved me soooo much time and would have made many things much more fun and easy... So I don't think the era where these tools are "adding" to your PhD thesis is far away at all. It also depends on your competency. If you are an Einstein these models might not do much for you, but for an average PhD student these tools can definitely nudge you in the right direction and act as guidance in case your supervisor is not that helpful (which I have heard can happen...). So from my view, we aren't far away at all. We are already there.

@danielmartinmonge4054 7 days ago

What a random moment to start getting impressed. I’ve been impressed since GPT-3, and this is HUGE. I don’t know any researcher who doesn’t already find these tools useful. Workers see a 15-20% productivity boost in almost any office work you can think of. And when it comes to contributing to a PhD, this new model can extract relevant information from a massive bibliography in minutes, instead of the months it would take manually. The only thing is that you still have to double-check every result for hallucinations. But even this problem is becoming less and less frequent.

@VaneNickOke 7 days ago

Very well said.

@citogrid 7 days ago

@ I wonder if the next "internal OpenAI" step would be to assign a whole lot of Deep Research bots to do Deep Research on Artificial Intelligence and try to add to the knowledge base on the road to AGI... Certainly exciting times to come.

@electricpaper269 7 days ago

The reasoning models desperately need the ability to accept PDF uploads. How is that not a thing? I want to ask questions about specific books and articles. DeepSeek does that, but o3 is a lot better.

@djayjp 7 days ago

Pretty sure you can do that with o1.

@mirek190 7 days ago

Bro... did you get stuck in 2023?

@bottymcbotface007 7 days ago

He's right. I can only upload images now, not PDFs. Maybe there are different upload permissions in different regions? 🤷🏻‍♂️

@mirek190 6 days ago

@ you can upload pdf ....

@AustinThomasPhD 7 days ago

OpenAI Deep Research hallucinates way too much. Gemini Deep Research may not be as thorough and nuanced, but it is way better at admitting when it can't find sources in support of a statement (or even when it finds evidence refuting it). OpenAI will presumably get there eventually, but this is a little undercooked. Not as undercooked as Tasks, but still a bit undercooked. I don't need it to write a whole review paper; I just need solid sources and a place to start from. In my experience (very different field), Gemini does pick up on a lot of sources. To me, that is more useful.

@Kalvy01 7 days ago

Yes, but that’s by design. They are breadcrumbing product releases as planned. They have MUCH better models internally.

@Antiposmoderno 7 days ago

As a philosopher, I would love it if an AI could give me citations with references for anything that I ask.

@PseudoProphet 7 days ago

Google is charging $20 for unlimited use. OpenAI $200 for 100 questions. 😂😂

@Vaprium 7 days ago

I don't think people usually write 100 essays, so I think it's enough. There's also quite a big difference between Google's AI and OpenAI's.

@elawchess 6 days ago

@Vaprium 100 is "enough" but you missed the bit where it's $200

@Vaprium 6 days ago

@elawchess How much do Plus users get?

@elawchess 6 days ago

@Vaprium I don't think Plus users have access yet, but on Twitter I think Sam was saying they will get 10 per month of this expensive version. They will then develop a cheaper model and give everyone more usage.

@alex_316 7 days ago

Thank you so much, enough for me to wait several more iterations and then probably pay for it, but not now

@SimplyApollo 7 days ago

3:48 do you see right there it referenced you? "Kabasares et al"

@NemosYouTube 7 days ago

Good review Kyle!

@stephenrodwell 7 days ago

Thanks, helpful video. 🙏🏼

@SteveGamesOnline 7 days ago

0:23 this is a really misleading benchmark, especially when none of the models other than OpenAI Deep Research had access to the internet.

@DjamelKramcha 7 days ago

Very true, but in the end it doesn't matter.

@VladamirOffPutin 7 days ago

The point is the promise of adding “tools” to “reasoning”. The surface has barely been scratched. Just imagine this “tool” and others soon to come combined with a locally stored model in a year or 5.

@SteveGamesOnline 7 days ago

@DjamelKramcha what do you mean by that?

@lio1234234 7 days ago

@SteveGamesOnline In the sense that capability is what matters, so whether it got a better score because they fine-tuned it for this use or not doesn't effectively matter, because it still achieves it autonomously.

@SPOOKEXE 6 days ago

I just noticed on the "Humanity's Last Exam" at the start of the video, they hadn't shared the tools that deep research used with the other models, no browser or python tools. Unfair benchmark?

@Miguel_Noether 5 days ago

You have to read their motto first

@ewallt 6 days ago

Did it suggest you look into your dissertation for more information?

@lolerie 6 days ago

I will just tell you it did index a lot of dissertations, some from 2012 e.g.

@markupton1417 7 days ago

Better prompting, better results....

@BlindedByLogic 7 days ago

Seems like it created a so-so PhD-level paper in 15 minutes.... In a few months' time, the paper will likely be a bit better than so-so, and it will only continue to get better, to where PhD-level research papers that traditionally may have taken around 4-7 years can be done in an hour.

@ellielikesmath 7 days ago

i can tell you the majority of time spent writing these papers is not spent on the writing stage. the vast majority, 99%+, is spent thinking and discussing what to do, writing code, checking the code, running the code, thinking some more, discussing some more, etc. if it's just compiling things for a review, that might be different, tho, tbf.

@chrisnash7678 7 days ago

What a time to be alive.

@chrisnash7678 7 days ago

@ellielikesmath right but now you can do all the thinking with none of the research.

@chastetree 7 days ago

The majority of the time for science PhDs is spent designing and running experiments, as well as compiling and analyzing data. The writing of the final thesis is generally done on the side and is not the main activity.

@maxk3062 6 days ago

A live stream with OpenAI Deep Research would be great!

@LuizFernando-xy6fo 5 days ago

Is it possible to do this using playground or is it only for subscribers? Excuse my English, it's not really my native language.

@DickyKurniawan-u3m 3 days ago

For GPT, you must pay $200 to use Deep Research. But for Gemini you can use the trial and get 3 months free.

@GabrielGarcía6-12 5 days ago

For searches currently Perplexity is better

@Gamerhero45 5 days ago

If you had literally just pumped the transcript of how you reacted to its first draft into ChatGPT as a response to its first generation, it would have fixed everything for you. News flash: this model is coming after your research work, so do not discount it.

@Miguel_Noether 5 days ago

A scientific paper is not just collecting old information, it's about generating new verifiable results (at least physics and math)

@JavedAlam-ce4mu 7 days ago

read the asterisk - it says "with browsing + python tools" - you can't compare its performance to other LLMs on that test if they didn't have the same tools. It literally was able to use the internet to find the answers.

@DraganAlves 7 days ago

So it's still like a mid-wit undergrad student?

@Recuper8 7 days ago

He said more like an unconfident graduate student (Masters degree student).

@lipinglin1994 7 days ago

I tried it and it is not that impressive. Give them time to tune it, though, and PhDs are dead.

@LucaCrisciOfficial 7 days ago

Wait, but generally a research review involves the work of many graduates/PhDs/experts for weeks or even months. It's obvious that it can't be at this level yet.

@DraganAlves 7 days ago

What's "obvious"? What do you base that on? Nothing's obvious, and that's why we test each new model that comes out.

@LucaCrisciOfficial 7 days ago

@DraganAlves I base it on my knowledge of the actual capabilities of AI systems. They are great but not yet capable of doing autonomous research at "light speed", as far as I know (they will be, but not yet).

@Lucasbrlvk 7 days ago

❤

@charles120001 7 days ago

Most of the sources it cites do not exist; it hallucinates. Hence, you can't trust what it says. This is because ChatGPT still has no access to Scholarly databases, unlike Gemini and Perplexity. I'd double-check if I were you.

@ricksanchez4659 7 days ago

are you sure you are talking about Deep research, and not just vanilla 4o?

@aguspuig6615 6 days ago

Deep Research does have access, I'm pretty sure.

@Nasser-bp6qf 6 days ago

the sources are fetched from online searches, so they are actually 100% real

@ttul 5 days ago

No doubt ChatGPT has access to Anna’s Archive…

@charles120001 5 days ago

@ttul What's Anna's Archive? No, no, I don't use that : )

@shanep2879 7 days ago

I am glad you did not give us the information to your research paper. I made this mistake recently and I’m watching it all over the planet right now and I think I regret it, but I really am not sure. It was on dark energy. I wrote five also wrote nine papers on reasoning applications for AI.

@riksstaden4927 6 days ago

What's the point of doing science if it's not shared? If the research paper matches the reality of how the world works, then it should stand to scrutiny. Additionally, if one person has a fact that matches what reality is really like, then if only that person knows about it, then it's the same effect on the world as if no one knows about it.

@Sergey-z5l1g 6 days ago

It's not super impressive, but I find it helpful for researching how some obscure APIs work. Saves me a lot of time.

@martinmitrov788 5 days ago

What if it was 2022 :P Wouldn't you be impressed then?

@Miguel_Noether 5 days ago

Can 99% of the people of the world replicate the same research in 15 minutes? Even the 1% in that research area ?

@missoats8731 4 days ago

@Miguel_Noether That's a valid question. But you have to keep in mind that the outcome has to be of at least some value, or it doesn't matter how fast it is produced. If it writes a research paper in 15 minutes but half of it isn't quite correct, or it forgets to take important things into consideration, you can't really do anything with it. It's a bit similar to what's happening with video generation. The outputs get more and more impressive, and a human would have to put in a lot more effort for the same result. But as long as you can't tell a coherent story with it, it's almost useless. That being said, I find Deep Research quite impressive, and I'm sure it will get a lot better very fast and will create a lot of value.

@nguyenvu8262 4 days ago

If you can, please try a research task in a domain that you are not familiar with. In coding, it's reported that AI is not that helpful because the time and psychological load it takes to fix the code outweigh the benefit of having it write the code in the first place. If it could randomly get something wrong anywhere, you have to check ALL of it to make sure it's okay. That may be worse than writing it yourself.

@thebluriam 7 days ago

These AI systems will not advance in reasoning until they have their own ability to write and update their own notes and work on solving problems in isolation from other problems, and then to compare the results against their hypotheses, update their conception of the world, and rework the ideas they want to test until they get somewhere novel. Right now, they’re nowhere near this kind of capability, and I really don’t know why systems like ChatGPT and Gemini don’t already have it. Creating a tree of documents is trivial, and maintaining a tree of metadata about what is in the documents is trivial.

@holykim4352 7 days ago

You lack a fundamental understanding of AI models. You've essentially just described reinforcement learning, but to your first point, a model is a very large set of vectors and points that are used to compute statistical probabilities. At the end of the day, these models are activated by a file calling on the output based on the input. That's not to say they're not improving in reasoning, though; there are attention layers, among other layers, that essentially change the weights of vectors to return better outputs.

@angloland4539 7 days ago

@sizwemsomi239 7 days ago

Honestly the combination of perplexity and deepseek R1 in pro...is far more advanced...

@damondragon324 7 days ago

@sizwemsomi239 I like DeepSeek, but if I look at the comparison from "AI Explained", OpenAI's version performed better. You can check it out yourself if you want to see the comparison.

@danielmartinmonge4054 7 days ago

I haven’t tried it, but it is supposedly running on the O3 model, which is WAY more powerful than DeepSeek, which performs more or less like O1 but is cheaper. Perplexity, last time I checked, does not have the tools to go back and forth with reasoners, and it is just a bit better than the regular web search that every model has. How did you come to this conclusion? Did you find any papers or benchmarks? It would be amazing if you could get even close to a model thousands of times larger while using one that performs tens of times worse, without the final fine-tuning that Perplexity applies. That would be a huge breakthrough no one is talking about.

@sizwemsomi239 7 days ago

@danielmartinmonge4054 perplexity has deepseek R1 in their pro search...please try it..

@Rami_Zaki-k2b 7 days ago

Everybody knows you are God. No one expects AI Agents to beat God ... That is why people are more interested in an OpenAI Deep Research VS a Google Deep Research comparison ... Can God answer my prayers ?! 😁

@xorqwerty8276 6 days ago

It’s not that good, probably as good as a young intern

@michaelspoden1694 6 days ago

It was just released like 3 days ago, and saying it's probably as good as a young intern is actually a great compliment, even though you said it's not that good. Imagine what it will be in 3 or 6 months, or if you ran it a second time, feeding it that information and telling it what you want to add.

@Nasser-bp6qf 6 days ago

I've seen many smart engineers and researchers on X say that research that used to take them weeks, Deep Research does in hours.

@xorqwerty8276 6 days ago

@ agree it’s a stepping stone. I think it’s being advertised as phd research that could move humanity forward but at this stage that’s not the case in my opinion. However yea in 6 months it will probably be better than all of us. Can’t wait!

@Miguel_Noether 5 days ago

If you're not in the area, how could you possibly do the same thing in 15 minutes? Can you do it?

@xorqwerty8276 4 days ago

@ how much good research is done by people that don’t know the area?