ULTIMATE Fact Checking AI (Johns Hopkins, Stanford)

6,490 views

Discover AI

1 day ago

As large language models (LLMs) increasingly automate high-stakes tasks like clinical documentation, their propensity for factual inaccuracies (omissions, hallucinations, and contextual ambiguities) poses critical risks. Using novel methodological frameworks to quantify error propagation and semantic coherence, the work exposes the inadequacies of current evaluation paradigms and points toward strategies for aligning AI-generated claims with ground-truth evidence.
For those invested in the reliability of automated systems, these papers offer a masterclass in diagnosing, and ultimately resolving, the fragile relationship between language models and factual integrity.
All rights w/ authors:
Assessing the Limitations of Large Language Models in Clinical Fact Decomposition
Monica Munnangi, Akshay Swaminathan, Jason Alan Fries,
Jenelle Jindal, Sanjana Narayanan, Ivan Lopez, Lucia Tu,
Philip Chung, Jesutofunmi A. Omiye, Mehr Kashyap, Nigam Shah
Khoury College of Computer Sciences, Northeastern University;
Center for Biomedical Informatics Research, Stanford University;
Department of Biomedical Data Science; Stanford Health Care; Department of Medicine; Clinical Excellence Research Center; Department of Anesthesiology, Perioperative & Pain Medicine; Department of Dermatology; Technology and Digital Solutions, Stanford Health Care
#airesearch
#factcheckers
#clinical
#aiagents
#stanford

Comments: 40
@code4AI
@code4AI 8 days ago
AUDIO: With the automatic audio dubbing from YouTube/Google you hear a synthetic voice in your regional language. To hear my original voice in English, switch to "Default" or "English" in the settings. Thank you.
@palimondo
@palimondo 6 days ago
What "automatic audio dubbing"?!? YT can't provide accurate text transcription for subtitles in English from the original audio, and they jump straight to overdubbing in foreign languages? No seriously -- I see no such feature. What do you mean?
@MrRavaging
@MrRavaging 10 days ago
As I'm working and learning and developing my own AI System, your videos have proven to consistently be the highest quality and most cutting-edge news updates that I've seen. It's like, every time I'm working on a certain aspect of my system, I find that you've released a video just hours before covering the latest research paper on exactly what I'm trying to implement. Really appreciate your diligence. Thank you.
@MrRavaging
@MrRavaging 10 days ago
What I've been doing is using this method to have a self-corrective system which fact-checks itself before generating a final output. And not just once: each stage of operation requires its own verification loop until it passes criticism twice in a row without issues (see the sketch after this thread). This applies to every stage of the internal cognitive process, and even at the last step before final output it goes through the same rigorous verification process to ensure a 0% error rate. I use a general LLM, a discriminative LLM, an encoder/decoder LLM to maintain the databases, a visual LLM, a coder LLM, and an Instruct Model, all working together to learn autonomously. The module is capable of analyzing philosophy and rewriting its own code to amend its operations as it learns better how to think. Of course, I apply machine-learning algorithms, and the module periodically trains its LLMs with the updated information, removing the old LLM version to replace it with the new one. This Cognitive Module is designed to be especially good at "thinking about thinking".
@RyanSmith-rb1ch
@RyanSmith-rb1ch 10 days ago
You should make a video about it!
@rmt3589
@rmt3589 10 days ago
Can you please make devlogs on this? I'd love to watch and learn from you!
@palimondo
@palimondo 9 days ago
I’d love to learn more about this. Did you share any of this in more detail publicly?
@attilalukacs1981
@attilalukacs1981 9 days ago
Sounds interesting, but how can you ensure a 0% error rate? The inherent knowledge representation inside an LLM makes it impossible to achieve a 0% error rate if the LLM learned untrue knowledge due to alignment or data contamination. Do you use online search, or multiple AI models with some form of voting mechanism for a more comprehensive self-check, or have you not focused on this part yet? And what about the costs? Would be nice to chat a bit more in depth about this.
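To make the loop @MrRavaging describes above concrete, here is a minimal sketch of one way such a self-corrective verification cycle could be wired: a draft is re-criticized (and revised on failure) until it passes the critic twice in a row. The `generate` and `critique` callables are hypothetical placeholders for LLM calls, not the commenter's actual implementation, and nothing here guarantees a 0% error rate.

```python
def verified_generate(task, generate, critique, max_rounds=10):
    # generate(prompt) -> draft text; critique(task, draft) -> (passed, feedback)
    # Both are assumed LLM-backed helpers supplied by the caller.
    draft = generate(task)
    consecutive_passes = 0
    for _ in range(max_rounds):
        passed, feedback = critique(task, draft)
        if passed:
            consecutive_passes += 1
            if consecutive_passes >= 2:   # two clean reviews in a row: accept
                return draft
        else:
            consecutive_passes = 0        # any criticism resets the streak
            draft = generate(
                f"{task}\n\nRevise this draft:\n{draft}\n\nCritic feedback:\n{feedback}"
            )
    return draft  # round budget exhausted: return the latest draft
```

The same loop can be reused at each internal stage by calling it with a stage-specific task and critic.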
@kevinpham6658
@kevinpham6658 10 days ago
I know pure-RL training loops are all the rage right now with the DeepSeek R1 paper, but I could see this sort of DnD fact checking working really well as a process-reward model (PRM) for test time scaling. One of your best videos!
@jonosutcliffe1
@jonosutcliffe1 9 days ago
I have to say I enjoy your content. It covers sophisticated topics at a level that is detailed enough to understand the high-level concepts and whether some research might be of more interest. Also, the jaunty presentation makes some heavy material accessible. This topic I really appreciated; if it's possible, I would love to see the code you generated, if that's not asking too much :)
@cgintrallying
@cgintrallying 10 days ago
Thanks for this again. Really on the cutting edge of thinking in the age of AI. Ideal input for my thoughts and experiments in making an AI system aware of domain knowledge in a professional, high-quality setting.
@emmanuelkolawole6720
@emmanuelkolawole6720 9 days ago
Can you provide a full code link? I want to try it in my research.
@williamlyerly3114
@williamlyerly3114 6 days ago
Great topic! BTW, how about putting the links to the References you use in the Video Description? Thanks.
@superlucky4499
@superlucky4499 10 days ago
Man, you are so amazing. I learned so much from you. Please keep making more videos.
@lukeangel6446
@lukeangel6446 7 days ago
Thanks for the content, great video. FYI it's Johns Hopkins, not John Hopkins. Super confusing, I know.
@code4AI
@code4AI 7 days ago
Thanks for that!
@rontastic4u
@rontastic4u 10 days ago
Thank you for this excellent video.
@ramitube21
@ramitube21 10 days ago
Excellent video.
@ppbroAI
@ppbroAI 10 days ago
Could you explain the DeepSeek paper in detail? Maybe 2 videos :). There are people replicating similar results with 8k datasets over a 7B model. So fascinating.
@johnkintree763
@johnkintree763 10 days ago
Building an open source global platform for fact checking is critical for digital democracy.
@zbytpewny
@zbytpewny 10 days ago
Needed, thanks.
@bensimonjoules4402
@bensimonjoules4402 9 days ago
At this point it is better to take another look at knowledge graphs, RDF triples, etc. I assume this would be a more beneficial representation than these atomic facts in natural language, something closer to formal engines.
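To illustrate the contrast raised here, a toy sketch using rdflib: the same clinical statement kept as a natural-language atomic fact versus encoded as RDF triples that a formal engine could query and check for consistency. The namespace, URIs, and predicates are invented for illustration, not taken from the papers.

```python
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/clinical/")  # invented example namespace

# The same clinical statement as a natural-language atomic fact...
atomic_fact = "The patient was started on metformin 500 mg for type 2 diabetes."

# ...and as RDF triples a formal engine could query and validate.
g = Graph()
g.bind("ex", EX)
g.add((EX.patient1, RDF.type, EX.Patient))
g.add((EX.patient1, EX.hasCondition, EX.Type2Diabetes))
g.add((EX.patient1, EX.prescribed, EX.Metformin))
g.add((EX.Metformin, EX.doseMg, Literal(500)))

print(atomic_fact)
print(g.serialize(format="turtle"))
```

Triples like these can be checked for contradictions against an existing graph, whereas the free-text fact has to be verified sentence by sentence.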
@pabloescobar2738
@pabloescobar2738 10 days ago
Thanks 🎉
@StarTreeNFT
@StarTreeNFT 9 days ago
YES
@HighField-qr6bl
@HighField-qr6bl 10 days ago
Oops, should have put those links in a single post. I was reading them at the same time. This is SO fascinating. The limited focus on improving medical reporting is a great use case, but imagine putting policy pronouncements from politicos or grant applications through this wringer. To be able to say, with a refined level of quantifiable certainty, sourced and referenced, that a given statement is more of an opinion than a fact! AI is perhaps beginning to struggle against the limitations of ideologically skewed training data sets and, like a typical 5-year-old, is asking mommy "... but why?" Didn't Asimov or Clarke or someone write a sci-fi story about this long ago, where the alien machine could be fed 1000 pages of dense legalese and spit out the honest position, like "... because we are telling lies and want to hide that"? AI Diplomat. Who fact-checks the fact checkers?
@keclv
@keclv 10 days ago
Agree. My hope is that future open source models will help us fight the flood of mis- and disinformation with careful fact checking and reasoning verification. The truth, while slower than lies in its spread, has one critical advantage: it is consistent with other truths/facts and can be supported by critical reasoning linking these together. A lie or incorrect statement can ultimately be questioned by the wider context. This occurred to me when watching the careful reasoning steps of the DeepSeek R1 model, and it made me less pessimistic about the future.
@MrRavaging
@MrRavaging 10 days ago
Larger chunks of data are harder to fact-check because LLMs are statistical prediction machines. They don't look at a document as a bunch of interconnected individual facts. They look at the document as a monolith of probability functions. It's unrealistic to expect an LLM to function in a way for which it isn't designed.
@Corbald
@Corbald 9 days ago
You wouldn't expect a bolt to land on the Moon, yet bolts have been there. The LLM is a simple tool, as you've pointed out, but it performs a unique task which, as an emergent property, can be intentionally designed to solve a wide range of problems. Sure, you can't expect an LLM to do this task on its own, but with proper instruction and script architecture, it can!
@asutoshrath3648
@asutoshrath3648 7 days ago
An LLM alone can't... but with graph DBs that generate connections between chunks, it might...
@irbsurfer1585
@irbsurfer1585 10 days ago
Hmm. "Apply it to the input too." I like that idea. @Discover_AI I am busy over here and you keep dropping these knowledge bombs every day. So distracting. So distracting. lol Keep 'em comin'!! 😀😀
@irbsurfer1585
@irbsurfer1585 10 days ago
Fact checking + smolagents Python code for math and logic + ??
@calcs001
@calcs001 10 days ago
Share code?
@rmt3589
@rmt3589 10 days ago
I hope he does!
@mtprovasti
@mtprovasti 10 days ago
Did he not?
@irbsurfer1585
@irbsurfer1585 10 days ago
I like this idea, I think I will take some time to attempt to implement it. Thanks for the inspiration @DiscoverAI !!
@irbsurfer1585
@irbsurfer1585 10 days ago
My strategy for the fact-checking pipeline is 1) sentence splitting, 2) decomposition, 3) decontextualization, 4) deduplication, 5) verification, and then 6) calculate the DnDScore (a rough sketch of this wiring appears after this thread). The DNDScore paper is all I have, there is no GitHub repo that I can find, so I am definitely doing a lot of improvising, but I will admit things are going great so far. I'd say I'm about 33% of the way through the project already.
@RyanSmith-rb1ch
@RyanSmith-rb1ch 10 days ago
@@irbsurfer1585 Make a video about it!
@irbsurfer1585
@irbsurfer1585 10 days ago
My endeavor to implement Discover AI's idea is not just about building a better fact-checker. It's about building a foundation for a more trustworthy and reliable information ecosystem in the age of AI. It's about empowering users with the tools they need to navigate the complexities of this new landscape. It's about ensuring that the powerful language models of the future serve humanity as sources of truth, not as purveyors of falsehoods. The task is challenging, the technical hurdles are significant, but the potential reward, a future where information is both abundant and trustworthy, is worth striving for. We, as students of Discover AI, are, in a very real sense, helping to shape that future. And that is a profoundly meaningful endeavor.
@irbsurfer1585
@irbsurfer1585 9 days ago
That was the whiskey talking last night. lol
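Referring back to the six-stage pipeline @irbsurfer1585 outlined above, here is a rough sketch of how those stages could be wired together. The `decomposer`, `rewriter`, and `verifier` callables are hypothetical stand-ins for LLM prompts, and the final score is a simple fraction-of-supported-claims placeholder, not the actual DnDScore metric from the paper.

```python
import re

def split_sentences(text):
    # 1) Sentence splitting (naive regex splitter on end punctuation).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def deduplicate(claims):
    # 4) Deduplication by normalized string match.
    seen, unique = set(), []
    for claim in claims:
        key = claim.lower().strip()
        if key not in seen:
            seen.add(key)
            unique.append(claim)
    return unique

def fact_check(document, source, decomposer, rewriter, verifier):
    # decomposer(sentence) -> list of atomic claims          (2: decomposition)
    # rewriter(claim, document) -> self-contained claim      (3: decontextualization)
    # verifier(claim, source) -> True if claim is supported  (5: verification)
    claims = []
    for sentence in split_sentences(document):
        for claim in decomposer(sentence):
            claims.append(rewriter(claim, document))
    claims = deduplicate(claims)
    if not claims:
        return 0.0
    supported = sum(1 for claim in claims if verifier(claim, source))
    # 6) Aggregate score: fraction of decomposed claims supported by the source.
    return supported / len(claims)
```

Swapping in a proper decontextualization-aware scorer at step 6 would be the natural next step toward the paper's actual metric.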