Three Ways to Evaluate LLMs
5:49
3 months ago
Comments
@LeonidAndrianov 7 days ago
Interesting, thank you
@SnorkelAI 7 days ago
Glad you think so!
@monostechhelp1143 1 month ago
Thank you, sir
@SnorkelAI 1 month ago
You're welcome!
@robrever 1 month ago
That was a pretty fast multi-modal chart analysis. Was this in real time? Very impressive!
@luizaugustoferreira5286 1 month ago
Really good video
@SnorkelAI 1 month ago
Glad you think so!
@robrever 1 month ago
I've watched like 15 of your videos and I still have no idea what snorkel does. But the videos are great! Keep it up 😊
@SnorkelAI 1 month ago
Ha! Glad you like the videos! Snorkel AI helps large organizations build bespoke models that yield real enterprise value through data-centric AI, and we enable them to do it FAST. Basically, we help organizations use their subject matter experts' knowledge and insight to label their own proprietary data 10-100 times faster than they could manually.
@A392Hz 2 months ago
GPT-4o ("four-oh", where the "o" stands for "omni"), not GPT "forty".
@SnorkelAI 1 month ago
Thanks for the clarification! We are aware of the error. We turned around this video **very** fast. At the time, it was unclear what the proper name was.
@tarlen 2 months ago
Super helpful overview, thank you
@SnorkelAI 1 month ago
Glad it was helpful!
@uuPacific 3 months ago
Actually, your method (labeling functions) can reduce overfitting.
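For readers wondering what a "label function" is: in weak supervision, a labeling function is a small heuristic that either votes for a label or abstains, and many noisy votes are combined into training labels. The sketch below is plain Python under stated assumptions; it is not the Snorkel library's API, and the toy spam task, function names, and keyword rules are all hypothetical. Votes are combined by simple majority, the crudest possible combiner.

```python
# Hypothetical labeling functions for a toy spam task; NOT the Snorkel API.
ABSTAIN, HAM, SPAM = -1, 0, 1


def lf_contains_link(text: str) -> int:
    """Vote SPAM if the message contains a URL-like token, else abstain."""
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_short_reply(text: str) -> int:
    """Very short messages are assumed to be ordinary replies."""
    return HAM if len(text.split()) < 5 else ABSTAIN


def lf_free_offer(text: str) -> int:
    """Vote SPAM if the word 'free' appears anywhere."""
    return SPAM if "free" in text.lower() else ABSTAIN


LFS = [lf_contains_link, lf_short_reply, lf_free_offer]


def apply_lfs(texts):
    """Build a label matrix: one row of votes per text, one column per LF."""
    return [[lf(t) for lf in LFS] for t in texts]


def majority_vote(votes):
    """Combine non-abstaining votes; abstain on a tie or when nobody voted."""
    counts = {}
    for v in votes:
        if v != ABSTAIN:
            counts[v] = counts.get(v, 0) + 1
    if not counts:
        return ABSTAIN
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


if __name__ == "__main__":
    texts = ["Thanks, great video", "Get a FREE gift at http://example.com now"]
    for text, row in zip(texts, apply_lfs(texts)):
        print(text, row, majority_vote(row))
```

Because each function covers only part of the data and makes its own kind of mistake, the resulting labels are noisy rather than memorized ground truth, which is one plausible reading of the commenter's point about overfitting.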
@himanshugarg6062 3 months ago
Why are we trying to reverse all the automation we've already achieved? Didn't we build AI to automate things we couldn't before? Don't we already have systems that can tell you the status of your shipment? It's called a tracking screen. Use the AI to "build" the interface, not "be" the interface, just like we humans already do.
@himanshugarg6062 3 months ago
The same product you're showing might be helpful in creating AI systems that can "build" the interface, but why market it for the wrong use case? AI should not be responding with my balance; rather, it should take me to the right screen or even build one customized to the query.
@SnorkelAI 3 months ago
Hi there! That's a good point! We're not in the business of telling people what to do with their generative AI use cases, but different companies and different users prefer different experiences. You sound like someone who is very technically adept, but not everyone is. We've all struggled to find information we need on a website, even a well-designed one, and websites aren't always well-designed. Making all of that information quickly available in a chat interface can create a much better experience, especially for users who are not as proficient as you are in understanding website architectures.
@jean-charles-AI 4 months ago
Very well explained, thank you.
@SnorkelAI 4 months ago
Glad it was helpful!
@brunoscaglione6831 4 months ago
The title is misleading.
@SnorkelAI 4 months ago
What would you suggest as an alternate title?
@brunoscaglione6831 4 months ago
@SnorkelAI Realistic LLM expectations and a glimpse into the future
@vivekpadman5248 5 months ago
Is this approach used at all three levels of training: base, instruct, and chat fine-tuning? And are there different things to consider for each of them?
@SnorkelAI 5 months ago
I'm not 100% clear on your question. Are you referring to pre-training, fine-tuning and alignment? If so, this approach could be used on fine-tuning and/or alignment. It could also theoretically be used on pre-training, but I suspect that would yield poor results.
@vivekpadman5248 5 months ago
@SnorkelAI Yes, that was exactly my question, thanks 😊. I have one follow-up question. Why do you think it would yield poorer results in the pre-training phase? Any insights on that? And in that case, what kind (size and architecture) of pretrained student model should be used with a specific teacher LLM, or would anything work?
@SnorkelAI 5 months ago
Sorry for the slow reply here. YouTube didn't surface your reply comment the same way it did your initial comment. We're getting a bit outside the bounds of what can reasonably be answered within a YouTube comment, but I think we can reasonably say this: distilling a model means using its output to train a smaller model. For pre-training, that would mean creating an immense volume of raw generated output from the parent model. Several studies have shown that pre-training generative models on other models' generated output tends not to work so well. We don't yet fully understand why, but we understand that it is a questionable practice at present.
@vivekpadman5248 5 months ago
@SnorkelAI No worries, man; getting such a nice, detailed reply is all that matters. Ah, I understand it properly now. I also guess the limits of parameter size would come into the picture if we used it for pre-training. Clean data plus synthetic data is available now anyway. Thanks again 😊🙏
@vivekpadman5248 5 months ago
Very nice, short, informative video. I'm looking to create a distilled model for reasoning tasks in games that could run locally. This will help 😊 Thanks!
@SnorkelAI 5 months ago
Glad it was helpful!
@clashcodes0855 6 months ago
Is it free?
@SnorkelAI 6 months ago
Included in Snorkel Flow. 😃
@RyluRocky 6 months ago
Well done!
@SnorkelAI 6 months ago
Thanks!
@chakpak 6 months ago
Programmatic labeling of images at scale is so cool. 🎉
@SnorkelAI 6 months ago
We think so too!
@chakpak 6 months ago
Wow! Preprocessing is 🔥🔥
@chandra7599 6 months ago
What does Wayfair do? An intro would be helpful to understand and connect with the content.
@SnorkelAI 6 months ago
Wayfair sells furniture and home goods online. That's good feedback, thanks!
@gobdovan 7 months ago
I came here to understand precisely how the Snorkel software assists with the issue. However, you discussed a general RAG system and mentioned that Snorkel expedited your process in different ways without specifying what Snorkel AI actually does. In your description, you stated, 'We explore how Snorkel Flow accelerated development of[...]'
@SnorkelAI 7 months ago
We're in early days on this kind of video content. The intent of this one was to talk about the case broadly and concisely. Were you looking for more of a product demonstration?
@gobdovan 7 months ago
@SnorkelAI, I appreciate the explanation. I was expecting a product demo based on the video's description, which mentioned exploring *how* Snorkel Flow accelerates development. I'm familiar with Snorkel the package, and the pain of developing labeling functions. The video was recommended to me, so I checked your GitHub for updates. It appears the repo hasn't been updated recently, and your focus seems to have shifted to Snorkel Flow. However, the video did not cover it in detail, and I couldn't find any comprehensive product presentations. Is Snorkel Flow aimed at larger corporations, or will it be available as a SaaS for broader access? Could you recommend any videos that present the product, particularly in the context of creating datasets for ASR/translation? I'm looking for more efficient methods to build such datasets.
@SnorkelAI 7 months ago
What you surmised is correct. Snorkel Flow is currently aimed at large enterprises. We will likely have more product demos heading for the channel soon. In the meantime, you can watch this one, which has a bit of product demo. kzbin.info/www/bejne/kGPPemyEYpqAhLM You can also sign up for a product demo here: snorkel.ai/demo/
@arazmalek887 9 months ago
Thanks for the information, but listening to you talking like: 'aaaaaaa eeeeee anddddddddd' was really frustrating
@420_gunna 9 months ago
Thank you Snorkel for putting this channel together! All of your videos + guests have been compact and informative -- really good brand marketing, I think.
@420_gunna 9 months ago
When you talk about distillation requiring large, unlabeled datasets... to be clear, for my understanding: it's not necessarily that they're unlabeled data; it's more that we don't care about the dataset's labels and instead use the teacher model's output distribution as the replacement pseudolabel. I guess you COULD create a distilled model by training against some data distribution that the teacher wasn't itself trained on... but I can't imagine why you would want to do that😄
@SnorkelAI 8 months ago
Sort of. Typically, you would use this for data that is, in fact, unlabeled: think sections of contracts or paragraphs from textbooks. You could also employ this approach for data that has labels that don't fit your desired schema, in which case your statement that "we don't care about the dataset's labels" would be 100% correct. As for your second comment, there could be a number of reasons you might want to do that. Perhaps the teacher LLM does quite well on a particular labeling task when given a highly engineered prompt. This approach would let you transfer that performance into a smaller and cheaper model.
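Using the teacher's output distribution as the pseudolabel, as discussed in this exchange, is the standard soft-target form of knowledge distillation. A minimal pure-Python sketch of that loss follows, assuming a classification setting, a temperature of 2.0, and made-up logits; it illustrates the general technique and is not a description of Snorkel's pipeline. Note that no dataset labels appear anywhere: the teacher's softened probabilities are the only target.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The result is multiplied by T^2, the usual convention for keeping gradient
    magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)  # teacher pseudolabel
    q = softmax(student_logits, temperature)  # student prediction
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


if __name__ == "__main__":
    # Made-up logits for two unlabeled examples; no ground-truth labels are used.
    teacher_batch = [[4.0, 1.0, 0.5], [0.2, 3.1, 0.1]]
    student_batch = [[3.5, 1.2, 0.4], [0.5, 2.0, 0.9]]
    batch_loss = sum(
        soft_target_loss(s, t) for s, t in zip(student_batch, teacher_batch)
    ) / len(student_batch)
    print(f"mean distillation loss: {batch_loss:.4f}")
```

In the highly-engineered-prompt scenario from the reply, the teacher logits (or sampled answers) would come from the prompted large model, and the student would be trained to minimize this loss over a large unlabeled corpus.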
@kendwyer1277 9 months ago
Very informative, thanks
@420_gunna 9 months ago
Awesome video! Data-centric AI is really awesome, and is a tractable space for the open source community to work in.
@SnorkelAI 8 months ago
It really is!
@axe863 11 months ago
Complicated nonstationarity is really horrific for a wide range of methods/models.
@riser9644 1 year ago
A link to the blog, code, or PPT would be good.
@lionhuang9209 1 year ago
Where can we get the PPT?
@mechwarrior83 1 year ago
Please.
@askeletalghost 1 year ago
I simp so hard for Emad
@yhWang-y4j 1 year ago
Hi, thanks for the nice sharing! Could you please provide the slides you use in the video so I can study further?
@yorailevi6747 1 year ago
Tip 4, "Toss out noisy examples. More data is not always better!", should be rephrased as: "Toss out non-decisive/opaque examples while keeping variability of examples."
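One way to act on this suggestion is to filter by label confidence while protecting coverage. The sketch below is an illustration under loose assumptions, not the tip from the video: each example is assumed to come with class probabilities (for instance from a label model), "non-decisive" is approximated as a top probability below a threshold, and "variability" is approximated crudely as class coverage, so every class keeps at least a few of its most confident examples. The input format and the `filter_confident` name are hypothetical.

```python
from collections import defaultdict


def filter_confident(examples, threshold=0.8, min_per_class=1):
    """Drop low-confidence examples without wiping out any class.

    examples: list of (text, probs) pairs, where probs is a list of class
    probabilities (hypothetical format). Returns (text, label) pairs.
    """
    by_class = defaultdict(list)
    for text, probs in examples:
        label = max(range(len(probs)), key=probs.__getitem__)  # argmax class
        by_class[label].append((probs[label], text, label))

    kept = []
    for label, items in by_class.items():
        items.sort(reverse=True)  # most confident first
        confident = [item for item in items if item[0] >= threshold]
        if len(confident) < min_per_class:
            # Keep the class represented even if none of it clears the bar.
            confident = items[:min_per_class]
        kept.extend((text, lbl) for _, text, lbl in confident)
    return kept


if __name__ == "__main__":
    data = [
        ("a", [0.95, 0.05]),
        ("b", [0.55, 0.45]),  # non-decisive: dropped
        ("c", [0.40, 0.60]),  # non-decisive: dropped
        ("d", [0.10, 0.90]),
    ]
    print(filter_confident(data))
```

A real pipeline would likely measure variability on richer features (topics, lengths, sources) rather than on predicted class alone; the class floor here only guards against the most obvious collapse.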
@InquilineKea 1 year ago
Can she train on the video data of my entire life?
@InquilineKea 1 year ago
Why does she pattern-match so hard with Fred Sala?
@annapurnasolutionsllc6463 1 year ago
Does the California Institute of Technology pay women less than men, then, per Anima's comment?
@ayushsharma3148 1 year ago
Hey guys. I want to save this video to my YouTube playlist. Can you please enable the Save / Add to playlist option?
@NukulSharma 1 year ago
I tried HoloClean on bigger datasets, and the tensors just blow up out of memory. Any pointers that can help?
@irshviralvideo 1 year ago
Why use AI when you have simpler models that are easy to explain???
@EuphonicEscapes 1 year ago
It is sad that there is almost no point in using an Apple Pencil any more. Or rather, if you were a digital pencil user... Your job went poof. People simply don't care about digital paintings any more.
@CalvinJKu 1 year ago
This is amazing. YouTube needs to send more traffic to this!
@noraalturayeif996 1 year ago
Thank you for this great summary! Could you please share the slides?
@avinashmahure281 1 year ago
Thank you for sharing this event.
@lionhuang9209 1 year ago
Very useful!
@faithandherghosts 1 year ago
This is brave, important work. I’m grateful to have happened across this article. Just today, I reviewed news of the dismissal of accusations that members of the Fairfax, VA (US) Police Department protected a sex trafficking circle. The charges were dismissed due to evidence that the accuser (a Jane Doe) had identified as a consenting escort worker in her history of involvement with the individual at the center of the alleged trafficking ring. One thing that occurred to me, in thinking about the efficacy of content skimming of sites hosting ads within which recruitment hooks may be nested, is that a lot of trafficking is off-web. It involves youth who are trafficked (sold/bought) through in-person processes involving economic incentive pressure, false-choice coercion, and direct threats levied against vulnerable people by powerful buyers in trafficking networks, as well as activities such as kidnapping and in-person drugging of victims, and luring and enticement that lead to hostage involvement in the most dangerous human/sex trafficking networks. Are there identified characteristics of heightened likelihood of trafficking in certain geographic areas, e.g., your mention of high rates of homeless youth being trafficked? Other possible metrics of likelihood might be having a known sex-tourism market, access to ports and over-water transport (private boats, larger cargo/industry watercraft), and various socioeconomic measures (poverty increases; changes in industries due to disaster events or climate/seasonal tourism increase or decline; loss of protective factors such as NGOs serving vulnerable women and children; local and regional law enforcement presence being protective of victims or protective of perpetrators; stringency of port records and identification of international visitors; conflict or war-related events; etc.).
When I reflected on the strong recruitment-trafficking line between Australia and wealthy Western countries, I wondered whether that activity may be showing secondary-broker activity relating to the conspicuously absent (non-English-speaking, not online) SE Asian and Pacific Island recruitment/victim markets. I’m sure that the investigative authorities are looking into this, especially after the 200+ global child pornography/brutality arrests of last year. And yet, because some of the buyer markets in the trafficking economy are exceptionally covert in their activities, are there ways to identify likely areas of risk of victimization by some array of metrics? Such metrics could help local/national/intl. authorities put measures in place to discourage trafficking and/or apprehend perpetrators: port and water-access security cameras with 3rd-party monitoring (because local law enforcement is known to be protective of some criminal networks in some places), access to responsive helplines and reporting lines in local languages, and in-person safety and environmental protective measures in addition to security cameras at points of entry and exit connecting trafficking markets to the rest of the world (overland and by water), such as lighting or businesses, police kiosks, clearing of possible low-visibility on-foot transit routes for people who may be kidnapping or buying young people for trafficking, and automobile traffic checkpoints. Similar metrics may be of use to discover new or transitory victim markets in the US, such as the Gulf Coast (which has a large population of SE Asian immigrants and workers in the shrimping and other port-based trade+travel+hospitality industries, as well as a lot of vulnerable people who may become increasingly vulnerable following destabilizing weather events, slow-season economies, increased costs of living, etc.).
South American coastal areas (and inland areas with road/river connections to buyer markets and export routes on both Caribbean and Pacific coasts) that have factors that increase risk of predator activity can be easily accessed by boat, and transport to/from the US mainland and secondary transport+sale brokers may not be as heavily investigated as perhaps they ought to be…? I deeply appreciate the work you’ve done on data tracking to help show/predict active trafficking activity. I’m grateful that there are committed intl. investigators working to discover and end sex trafficking and child trafficking. Please be careful out there, because the people involved in some of these networks are very powerful and very dangerous, as you are likely aware. If you ever need someone to do contract work reviewing and labeling data sets, I have training and experience in content analysis, theme-labeling, and rubric-based scaling of similar content to hone specificity and gauge possible secondary/tertiary needs for additional or sub-theme labels. I love that sort of bot-mind work, and I am 💯 % on board with the effort to use remotely available information to shine a spotlight on the nodes and mechanics of online & offline trafficking network activities. Much appreciation, and be safe out there. Godspeed and much protection from all the good in the world.* *this is another way of saying ‘…all those nasty mf’er and all their evil ways are gonna be brought down like a sledgehammer-heavy bolt of lightning hitting the ground, and so they better not f* with anyone ‘cause they got eyes on ‘em from all over the sky.’ Cheers…and thanks for the good thinking and the chance to share thoughts here. 😁
@djethereal99 1 year ago
Paper link?
@SnorkelAI 1 year ago
Here you go: arxiv.org/abs/2205.02318
@jonaslandsgesell4322 1 year ago
Nice summary video
@sachinvernekar6711 1 year ago
At 7:53: the PAC rule doesn't really apply here. What if we are able to label only a few types of easy cases? That means we are not uniformly labeling samples from the original data distribution.
@sinaghotbi 2 years ago
At 28:00, it was not clear to me why the accuracies are independent. Is that based on empirical evidence? Is it a weak assumption?
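One way to see what the independence assumption buys: if labeling functions' errors are conditionally independent given the true label, and the classes are balanced, the posterior log-odds of the label is just a sum of per-function weights of log(a/(1-a)), where a is each function's accuracy. The sketch below is a minimal illustration for binary labels in {-1, +1} with 0 meaning abstain; it assumes the accuracies are already known (in practice they have to be estimated, and independence is part of what makes that estimation tractable) and that abstaining carries no information about the label. The accuracy values are illustrative, not taken from the talk.

```python
import math


def posterior_positive(votes, accuracies):
    """P(Y = +1 | votes) under a naive conditional-independence model.

    Assumptions (all stated, none taken from the talk):
      - balanced classes, i.e. prior P(Y = +1) = 0.5;
      - labeling function j is correct with probability accuracies[j],
        independently of the other functions given Y;
      - an abstention (vote 0) says nothing about Y.
    votes[j] is +1, -1, or 0; each accuracy must lie strictly in (0, 1).
    """
    log_odds = 0.0
    for vote, acc in zip(votes, accuracies):
        if vote != 0:
            # Each non-abstaining vote multiplies the likelihood ratio by
            # acc / (1 - acc) toward its own side.
            log_odds += vote * math.log(acc / (1.0 - acc))
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    votes = [+1, +1, -1]           # two weak LFs say +1, one strong LF says -1
    accuracies = [0.6, 0.6, 0.95]  # illustrative values only
    print(posterior_positive(votes, accuracies))
```

Here majority vote would pick +1, but the weighted combination puts P(Y = +1) well below 0.5 because the 0.95-accuracy function dominates. If two functions made correlated errors, simply adding their log-odds would double-count the same evidence, which is why the independence question matters.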
@astromikael 2 years ago
Great presentation - thank you!
@ayoolafakoya9841 2 years ago
Julien is very awesome
@uncle-millennium 2 years ago
Excellent presentation. Very lucid. Give this lady a raise.
@dermorgendanach93 2 years ago
Holy god!! That's awesome work, thanks for sharing.