Writer CEO May Habib talks utilizing synthetic data to train AI models

Views: 8,852

1 day ago

May Habib, Writer CEO, joins 'Closing Bell Overtime' to discuss the company's new AI model and how it is innovating in AI training.

Comments: 15

@TaskSwitcherify 2 days ago

How do you avoid Garbage In - Garbage Out? When training on synthetic data, some of which is already hallucinated, flawed, and misinformed, don't you get even more hallucinated and "machined" outputs and a form of data poisoning?

@Maioubi 1 day ago

Like humans, it's easier for AI to know something is good than produce the good thing. AI is very good at "labeling" stuff (classification) and the tech for that is much more mature from the early days of deep learning. Synthetic data allows the AI to convert its classifying ability into greater intelligence by generating tons of examples and discarding the bad output. The next model will then be slightly smarter and better at generating and labeling. It's not perfect but it can go very far if you have vast computing power, and we're not close to the ceiling.

@KK-pm7ud 4 days ago

Sounds too good to be true

@joe_hoeller_chicago 3 days ago

Synthetic data doesn't work as well as you think for real-world tasks, especially within domains that require you to understand a context within a context.

@alshiferaw925 4 days ago

The entire talk the lady gave was a bunch of air.

@sim-racer 2 days ago

Not really. She is right: smaller models perform much better when trained with high-quality synthetic data generated from LLMs.

@Cellardoor187 1 day ago

No, it was not. She is on point, and this is a very smart venture. That "bunch of air" she produced got her a 2B dollar valuation. So perhaps get off your high horse.

@mymusicpublisher 1 day ago

Not really. Her voice throws me off though.

@bluesque9687 1 day ago

I like your blonde hairstyle and the big ring earrings! A blast from the past!

@MrDonald911 1 day ago

Research already showed it doesn't work, unfortunately.

@DanielKwan-b7g 2 days ago

Training on synthetic data gives the illusion that a model works, but because it's trained on fake data it's less accurate lol. Lady, there is a reason why people don't want to go this route 😅

@Maioubi 1 day ago

Any synthetic data is reviewed by AI as well, and AI is better at knowing good from bad than making good, kinda like us humans. Bad output is discarded. This cycle isn't perfect but it definitely grows more accurate over time, not less so. Look at o1 by OpenAI, mostly trained on synthetic data.

@briandouglas7375 1 day ago

Synthetic data will be flawed.

@Maioubi 1 day ago

It's easier for AI to know something is good than produce the good thing, just like humans. AI is very good at labeling stuff (classification) and the tech for that is much more mature from before deep learning. Synthetic data allows the AI to generalize its classification skills into greater intelligence by generating tons of examples and discarding the bad output. The next model will then be slightly smarter and better at generating and labeling. If you have enough computing power, this cycle seems to have no upper bound.
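
The generate-and-discard cycle described in this thread can be illustrated with a toy sketch. Everything in it is an assumption made for illustration, not Writer's or any lab's actual pipeline: `generate_candidates` is a hypothetical stand-in for a generator model that sometimes "hallucinates" a wrong sum, and `verify` is a hypothetical stand-in for the classifier that labels outputs as good or bad. Here the verifier is exact, which real classifiers are not, so the sketch shows the best case of the filtering step and does not settle the objections raised by other commenters.

```python
import random

def generate_candidates(n, error_rate, rng):
    """Hypothetical generator: emits (a, b, claimed_sum), sometimes with a wrong sum."""
    candidates = []
    for _ in range(n):
        a, b = rng.randint(0, 99), rng.randint(0, 99)
        claimed = a + b
        if rng.random() < error_rate:
            # Simulate a hallucinated answer by shifting the sum by a nonzero amount.
            claimed += rng.choice([-2, -1, 1, 2])
        candidates.append((a, b, claimed))
    return candidates

def verify(example):
    """Hypothetical verifier: checking an answer is easier than producing it."""
    a, b, claimed = example
    return a + b == claimed

def build_training_set(n, error_rate, seed=0):
    """Generate n candidates and keep only those the verifier accepts."""
    rng = random.Random(seed)
    candidates = generate_candidates(n, error_rate, rng)
    kept = [ex for ex in candidates if verify(ex)]
    return candidates, kept

if __name__ == "__main__":
    candidates, kept = build_training_set(n=1000, error_rate=0.3)
    print(f"generated {len(candidates)}, kept {len(kept)} after filtering")
```

With an imperfect verifier, some bad examples would pass the filter and some good ones would be discarded, which is where the "garbage in, garbage out" concern from the top comment applies.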