That was a pretty fast multi-modal chart analysis. Was this in real time? Very impressive!
@luizaugustoferreira5286 1 month ago
Really good video
@SnorkelAI 1 month ago
Glad you think so!
@robrever 1 month ago
I've watched like 15 of your videos and I still have no idea what snorkel does. But the videos are great! Keep it up 😊
@SnorkelAI 1 month ago
Ha! Glad you like the videos! Snorkel AI helps large organizations build bespoke models that yield real enterprise value through data-centric AI, and we enable them to do it FAST. Basically, we help organizations use their subject matter experts' knowledge and insight to label their own proprietary data 10-100 times faster than they could do it manually.
@A392Hz 2 months ago
GPT 4o (four "Oh" for "omni"), not GPU "forty".
@SnorkelAI 1 month ago
Thanks for the clarification! We are aware of the error. We turned around this video **very** fast. At the time, it was unclear what the proper name was.
@tarlen 2 months ago
Super helpful overview, thank you
@SnorkelAI 1 month ago
Glad it was helpful!
@uuPacific 3 months ago
actually your method (label function) can reduce overfitting
@himanshugarg6062 3 months ago
why are we trying to reverse all the automation that we've already achieved..? didn't we build AI to automate things we couldn't before..? don't we already have systems that can tell you the status of your shipping, it's called a tracking screen. Use the AI to "build" the interface, not "be" the interface. just like we humans do already.
@himanshugarg6062 3 months ago
the same product you're showing might be helpful in creating AI systems that can "build" the interface but why do you need to market it to the wrong use-case, AI should not be responding with my balance, rather take me to the right screen or even build one customized to the query.
@SnorkelAI 3 months ago
Hi there! That's a good point! We're not in the business of telling people what to do with their generative AI use cases, but different companies and different users prefer different experiences. You sound like someone who is very technically adept, but not everyone is. We've all struggled to find information we need on a website even when they're well-designed, and they're not always well-designed. Making all of that information quickly available in a chat interface can build a much better experience, especially for users who are not as proficient as you are in understanding website architectures.
@jean-charles-AI 4 months ago
Very well explained, thank you.
@SnorkelAI 4 months ago
Glad it was helpful!
@brunoscaglione6831 4 months ago
Title is misleading
@SnorkelAI 4 months ago
What would you suggest as an alternate title?
@brunoscaglione6831 4 months ago
@@SnorkelAI Realistic LLM expectations and a glimpse into the future
@vivekpadman5248 5 months ago
Is this approach used on all three levels of training: base, instruct, and chat fine-tuning? And are there different things to be considered for each?
@SnorkelAI 5 months ago
I'm not 100% clear on your question. Are you referring to pre-training, fine-tuning and alignment? If so, this approach could be used on fine-tuning and/or alignment. It could also theoretically be used on pre-training, but I suspect that would yield poor results.
@vivekpadman5248 5 months ago
@@SnorkelAI yes, that was exactly my question, thanks 😊. I have one follow-up question. Why do you think it would yield poorer results in the pre-training phase? Any insights on that? And in that case, what kind (size and arch) of pretrained student model should be used with a specific teacher LLM, or would anything work?
@SnorkelAI 5 months ago
Sorry for the slow reply here. YouTube didn't surface your reply comment the same way it did your initial comment. We're getting a bit outside the bounds of what can be reasonably answered within a YouTube comment, but I think we can reasonably say this: Distilling a model means using its output to train a smaller model. For pre-training, that would mean creating an immense volume of raw generated outputs from the parent model. Several studies have shown that pre-training generative models on other models' generated output tends not to work so well. We don't yet fully understand why, but we understand that it is a questionable practice at present.
@vivekpadman5248 5 months ago
@@SnorkelAI no worries man, getting such a nice detailed reply is all that matters. Ah, I understand it properly now. Also, I guess the limits of the parameter size will come into the picture if we use it for pretraining. Clean data plus synthetic data is available now anyway. Thanks again 😊🙏
@vivekpadman5248 5 months ago
Very nice short informative video. I'm looking to create a distilled model on reasoning tasks for games which could run locally. This will help 😊 thanks
@SnorkelAI 5 months ago
Glad it was helpful!
@clashcodes0855 6 months ago
free?
@SnorkelAI 6 months ago
Included in Snorkel Flow. 😃
@RyluRocky 6 months ago
Well done!
@SnorkelAI 6 months ago
Thanks!
@chakpak 6 months ago
Programmatic labeling of images at scale is so cool. 🎉
@SnorkelAI 6 months ago
We think so too!
@chakpak 6 months ago
Wow! Preprocessing is 🔥🔥
@chandra7599 6 months ago
What does Wayfair do... Intro would be helpful to understand and connect with the content.
@SnorkelAI 6 months ago
Wayfair sells furniture and home goods online. That's good feedback, thanks!
@gobdovan 7 months ago
I came here to understand precisely how the Snorkel software assists with the issue. However, you discussed a general RAG system and mentioned that Snorkel expedited your process in different ways without specifying what Snorkel AI actually does. In your description, you stated, 'We explore how Snorkel Flow accelerated development of[...]'
@SnorkelAI 7 months ago
We're in early days on this kind of video content. The intent of this one was to talk about the case broadly and concisely. Were you looking for more of a product demonstration?
@gobdovan 7 months ago
@SnorkelAI, I appreciate the explanation. I was expecting a product demo based on the video's description, which mentioned exploring *how* Snorkel Flow accelerates development. I'm familiar with Snorkel the package, and the pain of developing labeling functions. The video was recommended to me, so I checked your GitHub for updates. It appears the repo hasn't been updated recently, and your focus seems to have shifted to Snorkel Flow. However, the video did not cover it in detail, and I couldn't find any comprehensive product presentations. Is Snorkel Flow aimed at larger corporations, or will it be available as a SaaS for broader access? Could you recommend any videos that present the product, particularly in the context of creating datasets for ASR/translation? I'm looking for more efficient methods to build such datasets.
@SnorkelAI 7 months ago
What you surmised is correct. Snorkel Flow is currently aimed at large enterprises. We will likely have more product demos heading for the channel soon. In the meantime, you can watch this one, which has a bit of product demo. kzbin.info/www/bejne/kGPPemyEYpqAhLM You can also sign up for a product demo here: snorkel.ai/demo/
@arazmalek887 9 months ago
Thanks for the information, but listening to you talking like: 'aaaaaaa eeeeee anddddddddd' was really frustrating
@420_gunna 9 months ago
Thank you Snorkel for putting this channel together! All of your videos + guests have been compact and informative -- really good brand marketing, I think.
@420_gunna 9 months ago
When you talk about distillation requiring large, unlabeled datasets... to be clear for my understanding, it's not necessarily that they're unlabeled data, it's more like we don't care about the dataset's labels, and instead use the teacher model's output distribution as the replacement pseudolabel. I guess you COULD create a distilled model by training against some data distribution that the teacher wasn't itself trained on... but I can't imagine why you would want to do that😄
@SnorkelAI 8 months ago
Sort of. Typically, you would use this for data that is, in fact, unlabeled: think sections of contracts or paragraphs from textbooks. You could also employ this approach for data that has labels that don't fit your desired schema, in which case your statement of "we don't care about the dataset's labels" would be 100% correct. As for your second comment, there could be a number of reasons you may want to do that. Perhaps the teacher LLM does quite well on a particular labeling task when given a highly-engineered prompt. This approach would let you transfer that performance into a smaller and cheaper model.
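To make the exchange above concrete, here is a minimal sketch of distillation on unlabeled data, assuming a tiny linear softmax student and hard-coded teacher probabilities. The feature vectors, the teacher outputs, and every function name are hypothetical and chosen only for illustration; this is not Snorkel Flow's implementation or any library's API. The point is that the teacher's output distribution serves as the pseudolabel, so no human labels are needed.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_probs(weights, x):
    """Student forward pass: one weight row per class, no bias term."""
    return softmax([sum(w * xi for w, xi in zip(row, x)) for row in weights])

def soft_cross_entropy(teacher_probs, student_probs):
    """Cross-entropy of the student against the teacher's soft pseudolabel."""
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))

def train_student(features, teacher_probs, n_classes, lr=0.5, epochs=200):
    """Fit the student to teacher pseudolabels with plain per-example gradient descent."""
    n_features = len(features[0])
    weights = [[0.0] * n_features for _ in range(n_classes)]
    for _ in range(epochs):
        for x, t in zip(features, teacher_probs):
            s = predict_probs(weights, x)
            for k in range(n_classes):
                grad = s[k] - t[k]  # gradient of the loss w.r.t. logit k
                for j in range(n_features):
                    weights[k][j] -= lr * grad * x[j]
    return weights

def mean_loss(weights, features, teacher_probs):
    losses = [soft_cross_entropy(t, predict_probs(weights, x))
              for x, t in zip(features, teacher_probs)]
    return sum(losses) / len(losses)

# Hypothetical unlabeled examples (as feature vectors) and the
# probabilities a teacher model might assign to two classes.
features = [[1.0, 0.0], [0.0, 1.0]]
teacher_probs = [[0.9, 0.1], [0.2, 0.8]]

untrained = [[0.0, 0.0], [0.0, 0.0]]
loss_before = mean_loss(untrained, features, teacher_probs)
student_weights = train_student(features, teacher_probs, n_classes=2)
loss_after = mean_loss(student_weights, features, teacher_probs)
```

In practice the student would be a smaller neural network and the teacher probabilities would come from prompting the larger model, but the training signal has the same shape: the student is pushed toward the teacher's distribution, not toward a ground-truth label.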
@kendwyer1277 9 months ago
Very informative, thanks
@420_gunna 9 months ago
Awesome video! Data-centric AI is really awesome, and is a tractable space for the open source community to work in.
@SnorkelAI 8 months ago
It really is!
@axe863 11 months ago
Complicated Nonstationarity is really horrific for a wide range of methods/models
@riser9644 1 year ago
Link to the blog code or ppt would be good
@lionhuang9209 1 year ago
where can we get PPT?
@mechwarrior83 1 year ago
please
@askeletalghost 1 year ago
I simp so hard for Emad
@yhWang-y4j 1 year ago
Hi, thanks for the nice sharing! Could you please provide the slides you use in the video so I can study further?
@yorailevi6747 1 year ago
Tip 4: Toss out noisy examples. More data is not always better! Should be rephrased; Toss out non-decisive/opaque examples while keeping variability of examples.
@InquilineKea 1 year ago
Can she train on the video data of my entire life
@InquilineKea 1 year ago
Why does she pattern match so hard with Fred sala?
@annapurnasolutionsllc6463 1 year ago
Does California Institute of Technology pay women less than men, then, per Anima's comment?
@ayushsharma3148 1 year ago
Hey guys. I want to save this video to my youtube playlist. Can you please open save / add to playlist option?
@NukulSharma 1 year ago
Tried HoloClean on bigger datasets; the tensors just explode out of memory. Any pointers that could help?
@irshviralvideo 1 year ago
Why use AI when you have simpler models that can be easy to explain???
@EuphonicEscapes 1 year ago
It is sad that there is almost no point in using an Apple Pencil any more. Or rather, if you were a digital pencil user... Your job went poof. People simply don't care about digital paintings any more.
@CalvinJKu 1 year ago
This is amazing. YouTube needs to send more traffic to this!
@noraalturayeif996 1 year ago
Thank you for this great summary! .. Could you please share the slides?
@avinashmahure281 1 year ago
Thank you for sharing this event.
@lionhuang9209 1 year ago
Very useful!
@faithandherghosts 1 year ago
This is brave, important work. I’m grateful to have happened across this article. Just today, I reviewed news of dismissal of the accusations that members of the Fairfax, VA (US) Police Department protected a sex trafficking circle. The charges were dismissed due to evidence that the accuser (a Jane Doe) had identified as a consenting escort worker in her history of involvement with the individual at the center of the alleged trafficking ring. One thing that occurred to me in thinking about the efficacy of content skimming of sites hosting ads that recruitment hooks may be nested within, is that a lot of trafficking is off-web and involves youth that are trafficked (sold/bought) through in-person processes involving economic incentive pressure, false-choice coercion and direct threats levied against vulnerable people by powerful buyers in trafficking networks, and activities such as kidnapping and in-person drugging of victims, luring and enticement that leads to hostage involvement in the most dangerous networks in human/sex trafficking. Are there identified characteristics of heightened likelihood of trafficking in certain geographic areas - e.g. your mention of high rates of homeless youth being trafficked? Other possible metrics of likelihood might be having a known sex-tourism market, access to ports over-water transport (private boats, larger cargo/industry watercraft), and various socioeconomic measures (poverty increases, changes in industries due to disaster events or climate/seasonal tourism increase or decline/loss of protective factors such as NGOs serving vulnerable women and children, local and regional law enforcement presence being protective of victims or protective of perpetrators, stringency of port records and identification of international visitors, conflict or war-related events…etc. etc. 
When I reflected on the strong recruitment-trafficking line between Australia and wealthy western countries, I wondered about whether that activity may be showing secondary-broker activity relating to the conspicuously absent (non-English speaking, not online) SE Asian and Pacific Island recruitment/victim markets? I’m sure that the investigative authorities are looking into this, especially after the 200+ global child pornography/brutality arrests of last year…and, yet, because some of the buyer-markets in the trafficking economy are exceptionally covert in their activities, are there ways to identify potential likely areas of risk-of-victimization by some array of metrics that could help local/national/intl. authorities put measures in place to discourage trafficking and/or apprehend perpetrators, like port and water-access security cameras w 3rd party monitoring (because local law enforcement is known to be protective of some criminal networks in some places), access to responsive helplines and reporting lines in local languages, in-person safety measures and environmental protective measures in addition to security cameras at points of entry and exits connecting trafficking markets to the rest of the world (overland and by water) - things like lighting or businesses, police kiosks, clearing of possible low-visibility routes of on-foot transit for people who may be kidnapping or buying young people for trafficking, automobile traffic check-points? Similar metrics may be of use to discover new or transitory victim markets in the US - such as the Gulf Coast (which has a large population of SE Asian immigrants and workers in the shrimping and other port-based trade+travel+hospitality industries, as well as a lot of vulnerable people that may become increasingly vulnerable following destabilizing weather events, slow-season economies, increased costs of living, etc. 
South American coastal areas (and inland areas with road/river connections to buyer markets and export routes on both Caribbean and Pacific coasts) that have factors that increase risk of predator activity can be easily accessed by boat and transport to/from the US mainland and secondary transport+sale brokers may not be as heavily investigated as perhaps it ought to be…? I deeply appreciate the work you’ve done on data tracking to help show/predict active trafficking activity. I’m grateful that there are committed intl. investigators working to discover and end sex trafficking and child trafficking. Please be careful out there, because the people involved in some of these networks are very powerful and very dangerous, as you are likely aware. If you ever need someone to contract-work on reviewing and labeling data sets, I have training and experience in content analysis, theme-labeling, rubric-based scaling of similar content to hone specificity and gauge possible secondary/tertiary needs for additional or sub-theme labels. I love that sort of bot-mind work and I am 💯 % on board with the effort to use remotely-available information to shine a spotlight on the nodes and mechanics of online & offline trafficking network activities. Much appreciation - and be safe out there. Godspeed and much protection from all the good in the world.* *this is another way of saying ‘…all those nasty mf’er and all their evil ways are gonna be brought down like a sledgehammer-heavy bolt of lightning hitting the ground, and so they better not f* with anyone ‘cause they got eyes on ‘em from all over the sky.’ Cheers…and thanks for the good thinking and the chance to share thoughts here. 😁
@djethereal99 1 year ago
Paper link?
@SnorkelAI 1 year ago
arxiv.org/abs/2205.02318 here you go
@jonaslandsgesell4322 1 year ago
Nice summary video
@sachinvernekar6711 1 year ago
7:53 The PAC rule doesn't really apply here. What if we are only able to label a few types of easy cases? That means we are not uniformly labeling samples from the original data distribution.
@sinaghotbi 2 years ago
At 28:00, it was not clear to me why the accuracies are independent. Is that based on empirical evidence? Is that a weak assumption?
@astromikael 2 years ago
Great presentation - thank you!
@ayoolafakoya9841 2 years ago
Julien is very awesome
@uncle-millennium 2 years ago
Excellent presentation. Very lucid. Give this lady a raise.
@dermorgendanach93 2 years ago
Holy god!! That's awesome work, thanks for sharing