now we need the full bee movie uploaded, but with the actual audio replaced by your dramatic reading of the script...
@carykh 5 years ago
omg I have the 70 minute video of my voice on my iPhone, I suppose I have no choice but to upload it! check back in 1 hour. I bet somebody will edit it all together
@TastyBaldEagle 5 years ago
@@carykh please
@PlasmaSabre 5 years ago
@@carykh I would watch this :D Great work on the project btw, love your videos.
@kutip1027 5 years ago
Please I still want this
@kutip1027 5 years ago
If I need to I will volunteer as tribute
@rj9959 5 years ago
Only about 40% of words are able to be made out by the best lip readers. The rest of the words are assumed based on context. So this project has huge limitations to start with.
@dylanwijaya1662 5 years ago
@Eric Lee you like cereals>:)?
@dylanwijaya1662 5 years ago
@Eric Lee you like mum buy cereal type >:) ?
@dylanwijaya1662 5 years ago
@Eric Lee ohhhh children school they give milk like teachers to student. it good because I can eat cereal with milk it free. So teacher give milk to children. Okeh?
@GaJ42 5 years ago
Okay not is it
@dylanwijaya1662 5 years ago
@@GaJ42 you like cereals>:)?
@YoshTea 4 years ago
Holy hecc this is useful for animation
@CA19 2 years ago
YES
@boyinaband 5 years ago
I love these videos.
@UmMeAmberE 5 years ago
OOF IVE FOUND YOU
@MrZkitZ 5 years ago
@@UmMeAmberE same
@yoyochinb3742 5 years ago
Wow
@4ltrz555 5 years ago
Hello!
@RubenFedop 5 years ago
So that's how I found your channel
@NeedForMadnessSVK 5 years ago
"We just need to pick the right transcript" Me: Its going to be a Bee movie isnt it? "I read the entire Bee movie script on camera" NAILED IT.
@jurremioch316 5 years ago
It just HAD to be the Bee Movie script, I cheered so hard when he said it.
@hoodlumscraggy1801 5 years ago
kzbin.info/www/bejne/d3uml5qOnaZonMU here is his bee movie script video
@jessdoesstuff6783 5 years ago
thought the exact same thing
@slicerthe84th 2 years ago
NAILY
@ORyan4021 4 days ago
I only guessed it because I have watched the video before
@toasttimestwo 4 years ago
Cary: Read the lips of this guy. Computer: *S U M M O N S S A T A N*
@jobisTheWorst 4 years ago
WHO SUMMONED ME
@72jysmith 4 years ago
Cary:ME
@lameking2839 4 years ago
God: Let me introduce myself
@wolfyowoz 4 years ago
666 likes I'm not gonna ruin that
@reinatr4848 4 years ago
Still 666 likes
@Amaya_Fox_20 5 years ago
"so how tough are you?" "I read the entire bee movie script" "yeah, so?" "I read it in front of my camera" "come right in, sorry for the wait"
@ashleysmith8528 5 years ago
You got a bottle of ketchup? yeah *Fails at opening ketchup cap Could I run this in some hot water?
@azadanzans5359 5 years ago
Kolio Pulio Why doesn't anyone know the last line?
@사다드-j6x 5 years ago
@@azadanzans5359 , no no
@SoshJam 4 years ago
AND SUBMITTED IT FOR A COLLEGE CLASS
@thatonewierdcowboy6792 5 years ago
Funny thing is... I actually correctly guessed “Have you got a moment?”
@ohyeahyeahimasian392 5 years ago
same
@tripodgamer 5 years ago
LIAR
@bensosnowski1128 5 years ago
I guessed it was a question, but that’s it
@ethen1772 5 years ago
I guessed are you being helpful?
@isaacphase2759 5 years ago
That was the only one I got
@legoyoda5776 5 years ago
"Or rather, I should say *OUR* lip reading A.I" *SOVIENT ANTHEM STARTS PLAYING*
@QS1597 5 years ago
Antonio Sustaita ah, the sovieNt union
@QS1597 4 years ago
SPOTILA NAVEKI VELIKAYA RUS
@我恨我自己 4 years ago
Yes
@cailyndempster 4 years ago
Soviet
@legoyoda5776 4 years ago
@@vvg_lol *YES!!!*
@agentstache135 5 years ago
Reverse the program to animate the mouth movements EDIT: If Cary still has the animation files for some of his videos I don't think it'd be too hard to rip the mouth data from them (as a one dimensional matrix representing different mouth positions) and then use that with the audio from those videos
@iritesh 5 years ago
that's what China did with the news anchoring AI
@exm3266 5 years ago
IIRC Adobe Animate recently released a feature that would assist in lip syncing, but I'm not sure if it's anything like the logic used here.
@JeffHykin 5 years ago
You could also reverse the purpose of the AI: give it the original transcript and have it swap real words with similar-looking words. Limit it to only a few words per sentence, give it an oddly specific dictionary for substitutions, and you'd have truly automated the bad lip reading channel. Maybe that's what I'll do for my senior project.
@TheTonyMcD 5 years ago
That would be incredibly useful to the anime industry. And with decent enough cgi, to the entire film dubbing industry.
@@benos1799 too bad Soviet has been gone for almost 30 years
@voltagedrop5899 5 years ago
Daily reminder that communism doesn't work.
@robinr2770 5 years ago
as a linguist, I feel for you, you took on a task way harder than you expected, good job regardless. unfortunately we can not see inside the mouth of someone speaking and that is where so much of speech happens. you can also consider the following: if you have the same vowel after 3 different consonants, your lips will always be in a different position, thus some sounds don't have unique lip positions at all. real life lip reading is mostly context and being able to tell where those highly distinguishable consonants are.
@duck7781 5 years ago
13:00 super easy I memorized the bee movie script
@EmanuilGlavchev 5 years ago
Overfitting in real life :D
@OneFingerYT 5 years ago
I actually read "have you got a moment" easily. The AI needs more training in phrases.
@theepicgamer4578 5 years ago
Your profile pic says it all
@user-vn7ce5ig1z 5 years ago
• The takeaway from this video is to give deaf people lots of kudos. • Decimating twice isn't 20% off, it's 19% off: ((N×0.9)×0.9) Close but no zikal (I think I need more practice lip-reading). • Dubbing words onto politician's mouths has already been done. It's the audio counterpart of deep-fakes (and BadLipReading).
@matthewzeller5026 5 years ago
I was going to comment that but I'm not even sure what the "correct" term is. Sure you could say "20%" but does "bi-decimate" work?
@jakef8913 4 years ago
"For example, after the word 'the' there should always be a noun" adjectives
@devinandcarrietotaldrama505 4 years ago
The cat = The bad cat
@yourtypicalcube2830 2 years ago
@@pinkman_ Gerunds (-ing) are nouns, so you're using a noun there.
@Failzz8 5 years ago
14:14 interesting, so this is what being insane feels like.
@diamondgolem6401 5 years ago
I'm pretty sure it's more like 3:53
@TheKillerGut 5 years ago
*Uses headphone*...ow
@knack3381 5 years ago
My right headphone is broken Which makes me sane, I guess
@PixelBytesPixelArtist 5 years ago
A Traditional to simplified Chinese character converter would be amazing. If you guys want to try that project again I suggest trying to identify radicals and translate those instead of the characters themselves. Most differences between simplified and traditional are in the radicals
@eluisific3255 5 years ago
12:51 Jokes on you! I memorized the whole bee movie script!!!
@ShermyShroomy3101 4 years ago
what did he say then
@bastibob660 4 years ago
Vannesa pull yourself together
@Fuley-la-joo 4 years ago
According
@Crystal_500 3 years ago
@@Fuley-la-joo to
@rebert_reid 3 years ago
@@Crystal_500 all
@OrangeC7 5 years ago
Honestly, and I'm not sure if this is how YouTube does their captions, but I feel like a combination of lip reading and word recognition together would make very accurate captions, especially if it's tuned to be just right.
@sacripudding4586 5 years ago
That causes an issue. It won't know if it sees lips or not. It could just see, as an example, a Fortnite character's lips. A lot of gameplay channels don't have webcams. It may see the wrong thing as lips, and issues like that may screw up subtitles.
@caseygreyson4178 5 years ago
Please use this to translate Jojo Siwa so we know what she’s trying to say Also, don’t worry about the project’s accuracy. I have a Deaf sibling and when they talk to me it’s fine because I learned sign language growing up with them. But they hate lip reading because it’s so hard to read lips. Apparently opinions/studies sort of agree that lip reading is an awful way to communicate cause some sounds look the same. A pretty infamous one is “Olive juice” looking like “I love you”. They say only 30% of words can be read accurately. Pretty weird right?
@badlydrawnturtle8484 5 years ago
It's pretty obvious if you actually stop to think about it. (To quote Wikipedia for briefness) "Organs used for speech include the lips, teeth, alveolar ridge, hard palate, velum (soft palate), uvula, glottis and various parts of the tongue." Out of all of that, the only thing "lip reading" gets you information about is the lips and very occasionally the tip of the tongue; all of the rest of that critical information is invisible from the outside. It's remarkable that anybody ever thought lip reading was effective, really. Did they never stop to consider what their own mouth and throat are doing?
@caseygreyson4178 5 years ago
Badly Drawn Turtle Exactly! Sounds like Fa and Va look exactly the same. As well as Ga and Ka. The whole point of lip reading is that it’s just the shape of the mouth. You don’t have context or the sounds. In ASL we mouth words on most signs, but that’s just cause. If you do the sign for twins and mouth “twins”, no one is going to think you said “wins” because there is that context. But lip reading by itself (when my sibling tries to understand someone who isn’t signing) they struggle so much.
@boggers 5 years ago
@@caseygreyson4178 yeah, there are around 40 phonemes in most languages, but traditional 2D animators use only 10 mouth shapes. eg. M B and P all use the same shape, there is one neutral looking shape that is used for about a quarter of the other sounds.
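The point in this thread (many phonemes, far fewer visible mouth shapes) can be sketched in a few lines of Python. This is only an illustration: the phoneme-to-shape table and the toy pronunciations below are made-up assumptions, not a standard viseme set and not what the video's program uses. It groups words whose mouth-shape sequences come out identical.

```python
from collections import defaultdict

# Hypothetical, heavily simplified phoneme -> mouth-shape table.
# The grouping is an illustrative assumption, not a standard viseme set.
MOUTH_SHAPE = {
    "M": "closed", "B": "closed", "P": "closed",
    "F": "teeth-on-lip", "V": "teeth-on-lip",
    "G": "open-back", "K": "open-back",
    "T": "tongue-up", "D": "tongue-up", "N": "tongue-up",
    "AE": "wide", "IH": "narrow",
}

# Toy pronunciations in ARPAbet-style symbols (assumed for this example).
PRONUNCIATIONS = {
    "bat": ["B", "AE", "T"],
    "pat": ["P", "AE", "T"],
    "mat": ["M", "AE", "T"],
    "bad": ["B", "AE", "D"],
    "fan": ["F", "AE", "N"],
    "van": ["V", "AE", "N"],
    "tick": ["T", "IH", "K"],
    "dig": ["D", "IH", "G"],
}

def shape_signature(phonemes):
    """Reduce a phoneme list to the sequence of mouth shapes a viewer sees."""
    return tuple(MOUTH_SHAPE[p] for p in phonemes)

def group_lookalikes(pronunciations):
    """Return sorted groups of words that share one mouth-shape sequence."""
    groups = defaultdict(list)
    for word, phonemes in pronunciations.items():
        groups[shape_signature(phonemes)].append(word)
    return sorted(sorted(words) for words in groups.values() if len(words) > 1)

lookalikes = group_lookalikes(PRONUNCIATIONS)
for words in lookalikes:
    print(" / ".join(words))
```

With this toy table every word lands in a lookalike group, which is the ambiguity a lip reader (human or AI) has to resolve from context.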
@ZombieGuts15 5 years ago
and, “Alligator food” looks like, “I love you”
@hoper7649 5 years ago
If the computer got 47% right, then it's pretty good.
@txaggyraf 4 years ago
3:53 **Their smiles slowly turning into giant frowns**
@agentstache135 5 years ago
The Gosper Glider Gun (4:20) is one of the smallest guns in Conway’s Game of Life. Like I’m not saying you needed to show a HBK Gun or anything, but at least show a Cordership Gun or something
@carykh 5 years ago
not enough pixels in a YouTube video! And hey at least it's bigger than a queen bee
@tomryan3408 5 years ago
lol 420
@WangleLine 5 years ago
Thanks for the random knowledge, stranger!
@mystery8093 5 years ago
*420 blaze it*
@bornach 5 years ago
Most disappointed that there was no 2001: A Space Odyssey reference to HAL9000's decision to murder the crew based on lip reading evidence.
@microbialdoormat 5 years ago
I, myself, am hard of hearing. As long as I have the tiniest bit of sound, I can read lips. And with dramatic wording, like yours, I read it just fine! So hah!
@sikor02 5 years ago
Dave, although you took very thorough precautions in the pod against my hearing you, I could see your lips move. ~HAL 9000
@bapldap3324 5 years ago
I was looking for this.
@razvanflorea1166 5 years ago
A Space Odyssey fans unite!
@kryswilkins8615 5 years ago
I’m afraid I can’t do that, Dave.
@leehttucec-9985 5 years ago
You said what we were all thinking, thank you
@MarkGamed 5 years ago
We need the entire movie but with the AI instead of the actual audio EDIT: woah that’s a lot of likes
@agentstache135 5 years ago
AI writes the music for the score for the Bee Movie, AI writes the script for the Bee Movie, AI animates the Bee Movie, AI makes a bad lip reading of the AI written Bee Movie, AI takes the bad lip reading of the AI written Bee Movie and writes a script to contextualize the random things, AI animates the contextualized script based on the bad lip reading of the AI written Bee Movie and animates it, and so _ad nauseam_
@alexandramuller9055 4 years ago
I love the conway's game of life reference "bring out the big guns" lmao For anyone wondering, the picture he slams on the table is a glider gun, it produces infinite gliders.
@H_fromDiscord_real 6 months ago
timestamp?
@Calthecool 5 years ago
You had a video of you reading the bee movie script for 10 months? And you didn’t post it? - respect.
@ChristianGates 5 years ago
Your neck moves too when you make certain syllables. Maybe you should incorporate that?
@Predated2 5 years ago
I think angles matter too. If he had done 2 angles, it probably would be able to look at the movements more precise and see where it went wrong. Then having 3-5 people reading the same thing both overly moving and normally, it should figure it out pretty quick.
@ChristianGates 5 years ago
Predated O exactly
@AB-Prince 5 years ago
decimated twice would be 19% off 100-(100/10)=90 90-(90/10)=81
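The arithmetic in the comment above is easy to check in Python. This is a small sketch that assumes "decimate" means "remove one tenth of what is currently left"; the helper name `decimate` is just for illustration.

```python
def decimate(value, times=1):
    """Remove one tenth of the current value, `times` times in a row."""
    for _ in range(times):
        value -= value / 10
    return value

remaining = decimate(100, times=2)
print(remaining)        # 81.0
print(100 - remaining)  # 19.0, i.e. 19% off rather than 20%
```

The second cut takes a tenth of 90 rather than a tenth of 100, which is where the missing 1% goes.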
@KentoNishi 5 years ago
Roses are read Violets are blue AI can read Can Cary too?
@agentstache135 5 years ago
There’s a Cosmo article about the video used titled “YouTuber had one night stand with a woman, she lied afterwards about being pregnant with twins” if anyone wants to know the context of the video
@art1637 5 years ago
Agent Stache what the fuck?
@43Jodo 5 years ago
kzbin.info/www/bejne/lXualXiejtmnmLM Plug this into the Wayback Machine to actually watch the video. Asshole decided to delete it.
@agentstache135 5 years ago
@@43Jodo How does that make him an asshole? Like it's something kinda personal and he probably just wanted it to be more as an update about why he wasn't gonna be a father to those who were following him at the time instead of a video for everyone to be able to see forever
@breakerboy365 5 years ago
what is going on lol
@Crudecoronet 5 years ago
Agent Stache What are you talking about
@kumamedia123 5 years ago
I seriously thought he said “I love bobbies” 13:35
@cavemann_ 5 years ago
What an absolute madlad! He actually read the whole Bee Movie script!
@KrazyKyle-ij9vb 5 years ago
I hope he likes jazz...
@Zorbeltuss 5 years ago
If you could increase or decrease the score of words based on context you could probably reduce the amount of errors that occur, also that can be trained on separate material in the form of text transcripts from other sources, making it easier to see if it hurts or helps.
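One way to read the suggestion above is as rescoring: combine the lip-reading model's per-word guesses with a score for how well neighbouring words fit together. Below is a rough Python sketch of that idea. Every number in it is invented for illustration (the visual probabilities, the bigram table, the floor value), and it is not how the video's AI works.

```python
import math

# Hypothetical visual guesses: for each word slot, candidate words and the
# probability a lip-reading model assigned from video alone (made-up numbers).
candidates = [
    {"have": 0.45, "half": 0.55},
    {"you": 0.60, "ewe": 0.40},
    {"got": 0.40, "cot": 0.60},
]

# Hypothetical bigram probabilities, as if estimated from separate text
# transcripts. Pairs not listed fall back to a small floor value.
bigram_prob = {
    ("<s>", "have"): 0.2,
    ("have", "you"): 0.3,
    ("you", "got"): 0.2,
}

def best_sentence(candidates, bigram_prob, context_weight=1.0, floor=1e-3):
    """Pick the word sequence maximizing visual score plus weighted context score."""
    # paths maps the last chosen word to (log score, words so far).
    paths = {"<s>": (0.0, [])}
    for slot in candidates:
        new_paths = {}
        for word, visual_p in slot.items():
            best = None
            for prev, (score, words) in paths.items():
                context_p = bigram_prob.get((prev, word), floor)
                total = (score + math.log(visual_p)
                         + context_weight * math.log(context_p))
                if best is None or total > best[0]:
                    best = (total, words + [word])
            new_paths[word] = best
        paths = new_paths
    return max(paths.values(), key=lambda item: item[0])[1]

print(best_sentence(candidates, bigram_prob, context_weight=0.0))
print(best_sentence(candidates, bigram_prob, context_weight=1.0))
```

Setting `context_weight` to zero reproduces the purely visual guess, so it is easy to compare runs and see whether the context score hurts or helps, as the comment proposes.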
@Mastaachef 5 years ago
13:39 I ACTUALLY GOT IT RIGHT OMGGG! So this is what ultra instinct feels like?
@joelbraun8584 5 years ago
YEAH HAHA SAME "Both of you did terrible"
@araceli7604 5 years ago
3:53 me trying to have a normal conversation with someone Edit: Woah, that's a lot of likes...
@thecringeking873 5 years ago
Same here
@hanac5586 5 years ago
this sounds exactly like me when I haven't slept in 24 hours but still have a lot to say
@deadbread3459 5 years ago
WhEn LibEarLs sPeAk tO mE tHeY sOuNd LIke ThaT XD XD WOW they ThInk Their so Gr8 :0) 😂😂😂😂😂😂😂
@SreenikethanI 5 years ago
06:08 I swear I was expecting he was gonna read the Bee movie script… AND HE DID! I'm like "YESS!"
@AriaLunaCampbell 4 years ago
My technical mind: "This is pretty interesting." My linguistic mind, watching the section on the algorithm guessing syllables: "Please, for the love of everything, use the IPA! Ahhhhhhhh!" (To be clear, this is mostly a joke. At least he is using a standardized format for syllables. I just have this little part of my brain that's been spoiled by the IPA's unambiguous nature and figured there's probably someone else out there who'll get it.)
@ThePotatoLlamaz 5 years ago
You should try to make a similar program that converts audio into little animated mouth movements for animators
@calebquadrio1131 5 years ago
Just saying I can lip read and the reason I can’t tell what ur saying is because no one talks like that
@RyBrown 5 years ago
caluppy he was over pronouncing words and that made the AI confused I think.
@colex1222 5 years ago
@Radium X I was able to get Vanessa
@Weg002 4 years ago
3:54 when I try to talk/listen to someone talking in a dream
@binaryorbitals 5 years ago
Person: Read My Lips Cary: Say No More
@janeylala 5 years ago
When you didn't understand anything but you still enjoyed the video. *THIS IS AMAZING! SO COOL!* Few mins later... *WHAT DOES THAT MEAN? WATEVER!*
@spikeus3570 5 years ago
14:16 Carykh: Quiet I want to talk! AI: LET ME TALK FIRST Carykh: Let me talk first, please *And then you loop this
@v.6984 5 years ago
carykh: *"On March the 11th, 2018, at 11 PM, I did the unthinkable."* Me: oh no, please tell me he didn't read the entire bee movie scri- carykh: *"I read the entire Bee Movie script on camera"*
@sirclashin 5 years ago
Lmao
@vuxigeck5281 5 years ago
What a nice way to start off the year! Finding _yet another_ awesome channel I'm gonna be enjoying for a pretty long time, I think!
@jansopi6967 4 years ago
I should say *OUR* lip reading AI. Stalin approves
@samkelson7990 5 years ago
I am actually currently trying to do the opposite. Using the Google speech recognition API and gentle (which I found thx to ur vid so thx) I am creating a lip-syncing program that will take audio from the mic, convert it into phonemes, then animate a character. Now that itself isn’t too hard, but I want to do it live (live audio), so I am kind of struggling.
@blasttrash 5 years ago
is the project on github?
@npric2883 5 years ago
Isnt that animoji
@samkelson7990 5 years ago
@@blasttrash no not yet
@Kitulous 5 years ago
@@npric2883 animoji takes your picture and maps your muscle movement to a 3D model on a screen. Their project is to get the audio without the camera part and map it to a character on a screen.
@machodong6552 5 years ago
Like vrchat?
@HappyLeeHL 5 years ago
A really interesting idea. I had a similar idea some months ago but I couldn't do it myself. I think maybe you should focus on the link between words in order to create a meaningful sentence, like the YouTube subtitle algorithm which can correctly transcribe audio to text most of the time. Combining that kind of algorithm with your lip reading idea, it might be good lip reading instead.
@kamaljotsingh6675 5 years ago
hey what about an AI to play Super Mario afap? that may break the wr.
@HappyLeeHL 5 years ago
@@kamaljotsingh6675 I've already made one, that can complete SMB almost as fast as the WR. kzbin.info/www/bejne/poTMgpp-j81-oM0
@nyroysa 5 years ago
Holy Moly you are that super mario TAS man
@HappyLeeHL 5 years ago
@@nyroysa Hi, nice to meet you here.
@galric4270 5 years ago
I got the “have you got a moment” right 😃
@an_annoying_cat 5 years ago
AI should learn to animate so Cary could be able to upload more often
@binaryorbitals 5 years ago
We went all of 2018 without a TWOW vid *That wasn’t very cash money of you*
@carykh 5 years ago
i have 39 minutes left tho just kidding, twow 24a coming january i hope
@kevinlel 5 years ago
carykh come on if you fly to a different time zone you could still get it out by 2018!
@TheRealPunkachu 5 years ago
Wait twow is actually not just cancelled? No way!
@binaryorbitals 5 years ago
carykh You told me the next episode would be between Christmas and New Years Eve. I am more disappointed at that then the fact that nobody made one of their TWOW submissions “The the the the the the the the the the”
@maxw179 5 years ago
@@carykh When do you think we'll have season 2? I missed the beginning of the first season and can't stop bingeing the series.
@EduardoReyes-uz2lt 3 years ago
No one is gonna talk about how he used a different animation style in the beginning
@marcelinadelacruz8826 5 years ago
COMP: LAUREL AI: YANNY I HEARD "THE EARTH IS NOT FLAT"!!!
@serglian8558 5 years ago
You shouldn't reveal that you are deaf!
@greenwolf1363 5 years ago
I hear covfefe
@paranormalstick2289 5 years ago
I heard commit order 66
@Lilli_B 5 years ago
this video is so last year
@krillbilly1435 5 years ago
*C o m e d y*
@sappyme 5 years ago
Yeah I like the cool stuff from 2019 like the sequel to the Logan Paul suicide forest video and a sequel to fortnight
@izzypin942 5 years ago
IN AN HOUR BOI
@zegamingcuber857 5 years ago
Izzy Pin TIMEZONES BOI
@imie-nazwisko 5 years ago
Way to start new year with a dad joke
@KoenDerp 5 years ago
1:38 if you make the words "heaven high poop push" the opposite you get "hell low pee pull" which sounds like "hello people". wow I'm surprised I noticed that.
@ToHellWithReality 5 years ago
9:45 Uhh... What's that censor bar supposed to be covering? Because I don't think it did what it was supposed to do.
@prokaryotesys 5 years ago
ToHellWithReality their emails, I think.
@ToHellWithReality 5 years ago
@@prokaryotesys I know that, but I didn't want to spell it out for two reasons. First, I didn't want to make it obvious for people looking for that kind of info. Second, comedic effect.
@krucible4889 5 years ago
@@ToHellWithReality just r/woosh them
@prokaryotesys 5 years ago
@@krucible4889 oof i got wooshed thats one of my life goals tho
@betin731 5 years ago
@krucible r/itswooooshwithfouros
@bggamingdeluxe5658 5 years ago
What else does K and H mean hmm... Cary *K* omments *H* ere dangit i was close. also this is my first comment from 2019, hehe
@Iwatoda_Dorm 5 years ago
LIES!
@sand1573 5 years ago
BGGAMING Deluxe komrades
@awesomevideosonyoutube 5 years ago
he's a time traveler huzzah
@TheBlacknoodles_ 5 years ago
Cary Killed Him
@tearlach47 5 years ago
Must be nice to be in 2019
@raball 5 years ago
the blurry voice actually sounds great. i would turn that into music so fast
@DanielLopez-up6os 5 years ago
How about sign language Recognition AI? and maybe translation, ASL to UK Sign language etc.?
@johnsensebe3153 5 years ago
ASL to English might be more useful, but I think the idea is great. The tricky part would be the dataset. ASL uses more than the hands, so you'd probably need different types of clothing to train it on, as well as different skin tones, etc.
@elllieeeeeeeeeeeeeeeeeeeeeeeee 5 years ago
@@johnsensebe3153 Just train it on a black and white dataset
@johnsensebe3153 5 years ago
@@elllieeeeeeeeeeeeeeeeeeeeeeeee You're still going to have a variety of shades, short sleeves, long sleeves, no sleeves, frilly cuffs, etc.
@cuckling9031 5 years ago
like the one in unfriended 2?
@DanielLopez-up6os 5 years ago
@@johnsensebe3153 once the basic dataset is created you can create the skeletal system, apply that to the person being interpreted, and it should be fine. The varied dataset you could get from news broadcasts, and the transcripts are usually in the CC/subtitles even if there's an interpreter on screen.
@NativLang 5 years ago
CMUdict strikes again! Looked to me like some successes here. Now you got me wondering if you'd go even further weighting words / word neighborhoods by commonness, or by taking morphosyntax into account. Oh, and so much yes to the sinking smiles at 3:54 - that slow letdown of throwing out a hopeful spike solution and watching it fail.
@hanako-kun22 2 years ago
OH MY GOSH I GOT THE LIP READING RIGHT!!! BOTH OF THEM!! I am *GOD*
@MarcTelang a year ago
wait why isn't your channel verified
@denischikita 5 years ago
I think you need to train the network not only on the lips, but on the throat too, because a lot of sounds come from the vocal cords only.
@migs1336 5 years ago
0:09 cause I'm communist Edit: 2:32 he uses the URSS to convert it to spectrogram two communist references in one video
@Kitulous 5 years ago
URSS = ur SS
@RichardRMM 5 years ago
@@Kitulous mein leben
@bananogamer6972 5 years ago
In Italian that would be easier because every letter has a sound
@Womenooo 5 years ago
Not necessarily. He already incorporated the IPA (International Phonetic Alphabet), which is more accurate than any native language. It truly has one sound matching only one sign. A language with fewer distinctive sounds and fewer allophones would be ideal. Italian has 30, which is an OK low number, but 7 of them are vowels. And you want more vowels (I'd think), because vowels are created by obstructing the airflow in a different manner with the same tone. So an A and an O do the same thing with your vocal cords, but for A you stretch your lips and for O you round them. The last part is the tongue, which you don't see, but your lips move slightly when you go through your vowels. (Also, yes, I know Italian has A E I O U, so just 5 vowels, but phonetic vowels are realized differently from the vowels transcribed in written language. The vowels also include diphthongs (vowels that merge into each other) or vowels with a slightly different realization. In English, bad and bat have a different form of A, just for example.) So if he wants to make the system more accurate, he would need a language with fewer allophones and fewer sounds that differ only in being voiced or unvoiced. E.g. G/K, T/D and many more are basically the same sound, but one is made with your vocal cords vibrating and the other without. You can find that out by looking in a mirror and placing a finger on your throat, then saying ATA and ADA: while you say ATA you will feel nothing, but saying ADA you will feel vibration. But both look exactly the same. And in phonetics both are basically considered the same. There are many more examples of this in the English language, and they carry meaning. Like Tick and Dick... that's a massive one. Or simply Dog and Dock. It basically guts a sentence. So this project is basically doomed to fail by just looking at the lips. The tongue is so very important. Lip reading is hard, and it works by guessing words.
In a sentence, some words do not make sense, so they are tossed out. The AI cannot differentiate between a sensible utterance and a nonsensical one, though it can guess what word was said, and maybe from that one could extrapolate a probable sentence that was uttered.
@bananogamer6972 5 years ago
@@Womenooo In the Italian alphabet a letter is pronounced the same way every time, even if it has a specific letter before or after it. In English, for example, T is read one way while TH is read another way, and they have different sounds. In Italian we don't have this problem: the letter E is always said the same way, even if it has a G or an F before or after it (sorry for my English, but as you can tell I'm Italian)
@Womenooo 5 years ago
@@bananogamer6972 no you don't understand. It is not about how true the phonetics of a language are to its alphabet. It is about how simple the phonetics are. I just have a basic knowledge of Italian at best, but an example of a problem would probably be g and j. Geco and Julia would both look the same at the onset of the word. I am not certain of the example though. It is really a problem of many European languages that have many phonemes that are realized in the mouth and not on the lips, thus it is impossible to read them without contextualization.
@bananogamer6972 5 years ago
@@Womenooo now I understand thanks
@ukkomies100 5 years ago
Emanuele Bonandrini or finnish
@glanni 5 years ago
When you said you would use the transcript of a movie i was getting very excited. When you were talking about doing the unthinkable, i knew it had to be it. When you said you read the entire bee movie script on camera, i literally started clapping before i could care about my family being in the same room. I respect you so much for this, you really gave a big sacrifice.
@ball56 5 years ago
14:04 oh good, I have mono audio setting on.
@alphabbbe8580 5 years ago
HAPPY NEW YEAR!!!
@aidanstg445 5 years ago
TofuMaster83 Happy new year!!! (In 1 hour for me)
@swordchicken5629 5 years ago
and happy birthday bfdi!
@dolloptwerpandorange402 5 years ago
0:08 Cary: Or I should say OUR lip reading AI *Soviet anthem starts playing*
@thatoneguy6139 5 years ago
Welp this is what I’m watching for the first vid of 2019
@a3dg638 5 years ago
Fancy Spider same
@egg4861 5 years ago
Same bruhh
@Zalian 5 years ago
I'm really curious how it would sound if the raw phoneme data was pushed into sound output instead of trying to match it up to specific words.
@official-obama a year ago
maybe it would play every phoneme simultaneously, and the more confident it was in a phoneme, the louder it would be.
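The "play every phoneme at once, louder when more confident" idea can be sketched as a weighted mix. In this Python toy each phoneme is represented by a pure sine tone, which is purely a placeholder assumption (real phoneme recordings would replace it), and the confidence values are invented.

```python
import math

SAMPLE_RATE = 8000  # samples per second, chosen arbitrarily for this sketch

# Hypothetical stand-ins: one pure tone (in Hz) per phoneme. Real phoneme
# recordings would go here instead.
PHONEME_TONE_HZ = {"AA": 220.0, "IY": 440.0, "M": 110.0}

def tone(freq_hz, n_samples):
    """Generate n_samples of a sine wave at freq_hz."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]

def mix_frame(confidences, n_samples):
    """Mix all phoneme sounds for one frame, weighted by model confidence."""
    total = sum(confidences.values())
    mixed = [0.0] * n_samples
    for phoneme, confidence in confidences.items():
        # Normalize so the weights sum to 1 and the mix cannot clip.
        weight = confidence / total if total else 0.0
        for i, sample in enumerate(tone(PHONEME_TONE_HZ[phoneme], n_samples)):
            mixed[i] += weight * sample
    return mixed

# One video frame's made-up confidences: mostly "AA", a little "IY" and "M".
frame_audio = mix_frame({"AA": 0.7, "IY": 0.2, "M": 0.1}, n_samples=80)
print(len(frame_audio), max(abs(s) for s in frame_audio) <= 1.0)
```

Normalizing the weights keeps the output within [-1, 1]; whether the result would be intelligible is a separate question the sketch does not answer.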
@TheJustinator 4 years ago
"Automate their entire channel." That's another hint for your next channel: lazykh
@data5023 5 years ago
As soon as you said, "Which movie to pick," I instantly went, "It's Bee Movie, isn't it?" I've never seen Bee Movie to be honest.
@cassie_e 5 years ago
Do it the other way - generate lip shapes from the audio! Automated lip sync!
@agentstache135 5 years ago
Diordnas Darkunn he’s mentioned the possibility of doing that before to save time on animation, though I personally think a more hard coded approach would work better than a neural network
@jacobfeinland7878 5 years ago
@@agentstache135 I thought for sure that would be what this video was about. I would love to see how well that works, either for generating actual video from audio or for using it to animate the character's lips.
@agentstache135 5 years ago
@@jacobfeinland7878 My idea for how to do it copied from my comment from the video where Cary mentions it (the dance one) because it's an essay I'm not rewriting: Why would you need an AI for animating the lips? Why not just write (or use, I’m sure it already exists) an algorithm that takes a transcript (handwritten or using existing speech recognition (which I know is probably still technically an AI)) of what you’re saying as input and then move the mouth? I’m sure there are some parts that you’d have to manually do, eg screaming, but it’d be a lot more reliable and robust than an AI based on the audio. If I were to code it I’d mine a dictionary for the International Phonetic Alphabet (or some other pronunciation respelling) representation of each word. Then just figure out what mouth shape you make and how long you make it for each sound and put it all together into an animation. Obviously you’d probably still need to tweak it some more, depending on how time-accurate your transcript is, and that might be where an AI could help. But, I still don’t think an AI would be robust enough for the whole process, especially for a pretty discrete animation where if it picks the wrong mouth shape it’s pretty noticeable. Whereas if you were to just use it to help with temporal alignment, it being wrong would only show up as a small offset, less noticeable.
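The hard-coded pipeline described above (transcript, then dictionary lookup, then a mouth shape per sound, then an animation) might look roughly like this in Python. It is a sketch under stated assumptions: the three dictionary entries stand in for a real pronouncing dictionary such as CMUdict, the mouth-shape table is invented, and every phoneme is given the same fixed duration, which is exactly the timing problem the comment says would need tweaking.

```python
# Toy pronouncing dictionary in ARPAbet-style symbols (stress marks dropped).
# A real project might load CMUdict instead; these entries are assumptions.
PRONUNCIATION = {
    "have": ["HH", "AE", "V"],
    "you": ["Y", "UW"],
    "got": ["G", "AA", "T"],
}

# Hypothetical phoneme -> mouth-shape table; an animator would tune this.
MOUTH_SHAPE = {
    "HH": "open", "AE": "wide", "V": "teeth-on-lip",
    "Y": "narrow", "UW": "round",
    "G": "open", "AA": "wide", "T": "open",
}

def mouth_keyframes(transcript, frames_per_phoneme=2):
    """Return (frame_number, mouth_shape) pairs, one per change of shape."""
    keyframes = []
    frame = 0
    for word in transcript.lower().split():
        for phoneme in PRONUNCIATION[word]:
            shape = MOUTH_SHAPE[phoneme]
            # Only emit a keyframe when the mouth actually changes shape.
            if not keyframes or keyframes[-1][1] != shape:
                keyframes.append((frame, shape))
            frame += frames_per_phoneme
    return keyframes

keyframes = mouth_keyframes("Have you got")
for frame, shape in keyframes:
    print(frame, shape)
```

A word missing from the dictionary raises a `KeyError` here, which is one of the places where manual work (or a fallback) would still be needed.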
@tjahjobagaaa 5 years ago
Using those mouths from bfdi.
@RRRR-jr1gp 5 years ago
Wait animators actually lipsync the characters? It feels so dumb I mean who's going to care
@ne01nvader 5 years ago
4:04 Don't blame poor computer, he is just trying to summon satan, nothing special.
@StarForgers 5 years ago
I think that to a large degree this whole thing was flawed simply due to the angle that you are recording your face from. People don't normally look at a person from below. This creates issues with some standard information sets one might normally use, I would think.
@Versaucey 5 years ago
It's not the A.I fault, it's ping is too high.
@GWindows3.1 5 years ago
Vsus what the hell
@TheNerdBird_ 5 years ago
. . . I want to punch you so bad. The AI is run locally, meaning it's sub-instant reading.
@jeeeves 5 years ago
@@TheNerdBird_ no its ping
@themanfromutopia4743 5 years ago
Look buddy, it's is short for "it is" but if you want to signify possession it's "its", not "it's", okay?
@TheNerdBird_ 5 years ago
@@jeeeves If it is run locally, the ping would be less than a millisecond. It is run locally. Quite annoying when people who don't understand technical and networking terms completely try to make statements to sound smart.
@treedeerthethingy4812 5 years ago
I HAVE NO FUCKING IDEA WHAT HALF OF THESE WORDS MEAN, BUT I LIKE IT
@lara4268 5 years ago
I was so proud when I guessed "do you have a moment"
@milesprower3488 5 years ago
0:03 It's The Captain from SpongeBob "are you ready kids, aye-aye captain! I can't hear you! AYE-AYE CAPTAIN! OHHHHHHHHHH!"
@user-en7dx1qp3k 4 years ago
here's my solution to the number theory question at the end: let each number have 2018 digits ranging from 0-24, filling all unused spaces with 0s so if you were to write 1, you would get 00000...0001. each digit is used an equal number of times in this so the average sum for each place value is ((24)(25)/2)/25 = 12, so the answer is 12*2018 = 24216. Correct me if i'm wrong
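The averaging argument above can be sanity-checked by brute force on small cases. The Python sketch below takes the comment's setup at face value (fixed-length strings of digits 0 to base-1, padded with leading zeros); the original question from the video is not quoted here, so whether this setup matches it is an assumption.

```python
from fractions import Fraction
from itertools import product

def average_digit_sum_bruteforce(base, length):
    """Average digit sum over all zero-padded strings of `length` digits."""
    total = sum(sum(digits) for digits in product(range(base), repeat=length))
    return Fraction(total, base ** length)

def average_digit_sum_formula(base, length):
    """Each digit averages (base - 1) / 2, and there are `length` digits."""
    return Fraction(length * (base - 1), 2)

# Brute force is only feasible for small cases, but it matches the formula.
for base, length in [(2, 3), (5, 3), (25, 2)]:
    assert average_digit_sum_bruteforce(base, length) == \
        average_digit_sum_formula(base, length)

# The comment's case: 2018 digits, each ranging over 0..24.
print(average_digit_sum_formula(25, 2018))  # 24216
```

The formula gives 12 per digit for digits 0 to 24, so 12 × 2018 = 24216, agreeing with the comment's arithmetic.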
@sanjaymatsuda45045 жыл бұрын
You could have used the video of the longest word, the full chemical name of titin. Instant 3 hours with a full transcript available everywhere.
@BagelBrain5 жыл бұрын
It would contain the same 5 samples of "words," though.
@thenimalu5 жыл бұрын
I live in Germany. It's Silvester. I am drunk. It's 6 am. I am watching Carykh. I hope I spell3d everything right. Happy new year!!!!
@godofdoor65585 жыл бұрын
best ai
@adamyoung67975 жыл бұрын
hsppy new yere
@LLAWLlET5 жыл бұрын
Frohes neues! (Happy New Year!)
@bobross40825 жыл бұрын
Dude. I just started watching your videos. I don’t know what job you have. But you're a genius. You're literally improving computer programming extremely. I don’t know the actual terminology. But you're gonna be making huge money someday if not already. You're gonna be the reason robots become a reality
@ecicce67495 жыл бұрын
I think the AI works pretty well for the amount of information it has. I guess you could only improve it by choosing the correct words based on grammar, context, and which words are most likely to appear next to each other. Also, an additional system to output back to audio would make the project complete: a network trained to combine lip movement and the detected phonemes into input for a second network (an easily trained autoencoder) that outputs your voice. Would loooove to see that.
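A rough Python sketch of the word-choosing idea, in case it helps picture it. Everything here is hypothetical: the toy corpus, the candidate lists, and the function names are made up for illustration and are not from the video. It scores each possible word sequence by how often neighbouring word pairs occur in the corpus and keeps the best one.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical toy corpus; a real system would need far more text.
corpus = "do you have a moment do you like jazz you have a bee".split()
bigram_counts = Counter(zip(corpus, corpus[1:]))


def sequence_score(words):
    """Sum of log(count + 1) over adjacent word pairs; higher is more plausible."""
    return sum(math.log(bigram_counts[pair] + 1) for pair in zip(words, words[1:]))


def best_sequence(candidates):
    """Try every combination of per-position candidates and keep the top scorer."""
    return list(max(product(*candidates), key=sequence_score))


# Hypothetical lip-reading output: several look-alike guesses per word slot.
candidates = [["to", "do"], ["ewe", "you"], ["half", "have"], ["a"], ["mormon", "moment"]]
print(best_sequence(candidates))  # ['do', 'you', 'have', 'a', 'moment']
```

Trying every combination gets slow fast as the sentence grows, so a longer transcript would probably need something like a beam search instead of the brute force used here.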
@zib3505 жыл бұрын
I strongly agree with the word choosing idea!
@meganbennett9335 жыл бұрын
I actually got the "have you got a moment" one right.
@Spherical_Object Жыл бұрын
BFDI references
3:44, 5:27 "yeah i know she was so surprised" is the first line spoken in bfdi (by match)
12:40 flower's announcer crusher brief
15:39 "take the plunge" is the bfdi 1a name
(yes i did watch the whole video four times [twice with captions], so what?)
@akraus535 жыл бұрын
Decimated twice means -19% not -20% *ugh*
@-epsilon62695 жыл бұрын
0:08 *COMMUNISM INTENSIFIES*
@anthonycaminiti87345 жыл бұрын
Stalin wants to know your location!
@marlon.80515 жыл бұрын
It's socialism not communism!
@d0nnyr0n5 жыл бұрын
@@marlon.8051 Similar thing...
@marlon.80515 жыл бұрын
@@d0nnyr0n socialism tries to convince the population that communism is great and communism doesn't
@d0nnyr0n5 жыл бұрын
@@marlon.8051 That is not correct. See this: www.investopedia.com/video/play/difference-between-communism-and-socialism/
@jondoe53234 жыл бұрын
Thanks for helping with my project on AI-generated video. I need it to read a transcript and create an accurate voice and face. It then creates a video based on images of faces from the internet.
@FoxBlocksHere4 жыл бұрын
"Tower owe wheat and sought owe-induced height eight of lamb late"
@cali0537114 жыл бұрын
[IN TENNIS BALL VOICE]: James-
@teamont55 жыл бұрын
Thumbnail: automate their entire channel. NO THANK YOU.
3:54 !!do not play at night in the docks!! a demon stole my soul. DO NOT PLAY!!
@leofisher12805 жыл бұрын
Decimating twice does not mean dropping by twenty percent. Decreasing by 10% two times leaves you with 90% x 90% = 81% of what you started with, meaning a decrease of 19%. Nerd.
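The same arithmetic as a tiny Python sketch, assuming "decimate" means removing exactly one tenth each time. The function name is made up for illustration.

```python
from fractions import Fraction


def remaining_after_decimations(times: int) -> Fraction:
    """Fraction of the original left after removing one tenth `times` times."""
    return Fraction(9, 10) ** times


remaining = remaining_after_decimations(2)
drop = 1 - remaining
print(remaining, drop)  # 81/100 19/100
```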
@TheVoitel5 жыл бұрын
Come on, this Brilliant problem is not number theory, it’s trivial statistics: the problem is obviously equivalent to finding the expected value of a sum of X1, ..., X2018 iid, where each Xn is uniformly distributed on the discrete set {0, ..., 24}. Since expectation is linear, we get E(X1 + ... + X2018) = E(X1) + ... + E(X2018) = 2018 * E(X1) = 2018 * (24 + 0)/2 = 2018 * 12 = 24216
@MsHojat5 жыл бұрын
That's what I thought as well. Pretty easy if true (it does seem true). However, since I did it all in my head on the first run, I somehow messed up by dividing both 2018 by 2 and 24 by 2 (when I only had to divide one of them), getting literally half the right answer. I know technically it's even improper to divide 2018 by 2 at all, but my brain tends to just manipulate the numbers anywhere as long as it doesn't affect the order of operations. Clearly I suppose it's still a bit problematic.
@LaskyLabs4 жыл бұрын
I think the data you used to train the ai is very useful. Thank you for making it public.
@binaryorbitals5 жыл бұрын
At 6:06 I could tell it would be the Bee Movie script. Wow Cary, ORIGINAL