This is LPC (Linear Predictive Coding):
yₙ = eₙ − ∑(ₖ₌₁..ₚ) (bₖ yₙ₋ₖ)
where
‣ y[] = output signal, e[] = excitation signal (buzz, also called predictor error signal), b[] = the coefficients for the given frame
‣ p = number of coefficients per frame, k = coefficient index, n = output index
Compare with FIR (Finite Impulse Response):
yₙ = ∑(ₖ₌₁..ₚ) (bₖ xₙ₋ₖ)
where
‣ x[] = input signal
The similarities between the two are striking. FIR is used in applications like low-pass filtering, high-pass filtering, band-pass filtering, band-stop filtering, etc. It is an almost magical type of mathematics that is used to generate these filters. For LPC, there are several different algorithms, many of which are implemented in Praat, the software that I used in this video to create my LPC files.
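The two formulas above can be evaluated directly, sample by sample. The following is a minimal sketch, not the code from the video: the function names `lpc_synthesize` and `fir_filter` are made up for illustration, `b[0]` is unused padding so that the indices match the formulas, and samples before the start of the signal are assumed to be zero.

```cpp
#include <cstddef>
#include <vector>

// LPC synthesis: y[n] = e[n] - sum_{k=1..p} b[k] * y[n-k].
// b[0] is unused padding (an assumption of this sketch), so p = b.size() - 1.
std::vector<double> lpc_synthesize(const std::vector<double>& e,
                                   const std::vector<double>& b)
{
    std::vector<double> y(e.size(), 0.0);
    for (std::size_t n = 0; n < e.size(); ++n)
    {
        double acc = e[n];
        // The feedback reads earlier *outputs*; terms with n-k < 0 count as 0.
        for (std::size_t k = 1; k < b.size() && k <= n; ++k)
            acc -= b[k] * y[n - k];
        y[n] = acc;
    }
    return y;
}

// FIR as written above: y[n] = sum_{k=1..p} b[k] * x[n-k].
std::vector<double> fir_filter(const std::vector<double>& x,
                               const std::vector<double>& b)
{
    std::vector<double> y(x.size(), 0.0);
    for (std::size_t n = 0; n < x.size(); ++n)
    {
        double acc = 0.0;
        // The sum reads earlier *inputs* only, so there is no feedback.
        for (std::size_t k = 1; k < b.size() && k <= n; ++k)
            acc += b[k] * x[n - k];
        y[n] = acc;
    }
    return y;
}
```

The only structural difference is which array the inner loop reads from: LPC feeds its own output back in, while FIR looks only at the input.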
@huyvole97246 жыл бұрын
I met that formula when I took the Signals & Systems module (my school calls it Digital Signal Processing).
@BichaelStevens6 жыл бұрын
16-17 minutes in: Please next time lower the audio or give a warning. The popping killed my hearing
@Rennu_the_linux_guy6 жыл бұрын
uhhh
@a1k0n6 жыл бұрын
In fact it's identical to an IIR filter, which has coefficients for both x and y, and your x coefficient is 1 and all your y coefficients are negated.
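To make the IIR connection concrete, here is a sketch of a general IIR filter under one particular sign convention (feedback terms are added). The name `iir_filter` and the parameter layout are assumptions for illustration. Under this convention, LPC synthesis is the special case with a single feed-forward coefficient of 1 and feedback coefficients equal to the negated LPC coefficients.

```cpp
#include <cstddef>
#include <vector>

// General IIR filter, with the convention that feedback terms are ADDED:
//   y[n] = sum_{k=0..M} ff[k] * x[n-k]  +  sum_{k=1..p} fb[k] * y[n-k]
// fb[0] is unused padding. Samples before the start are taken as 0.
std::vector<double> iir_filter(const std::vector<double>& x,
                               const std::vector<double>& ff,
                               const std::vector<double>& fb)
{
    std::vector<double> y(x.size(), 0.0);
    for (std::size_t n = 0; n < x.size(); ++n)
    {
        double acc = 0.0;
        for (std::size_t k = 0; k < ff.size() && k <= n; ++k)
            acc += ff[k] * x[n - k];   // input (FIR) part
        for (std::size_t k = 1; k < fb.size() && k <= n; ++k)
            acc += fb[k] * y[n - k];   // output (feedback) part
        y[n] = acc;
    }
    return y;
}
```

Calling it with `ff = {1}` and `fb[k] = -b[k]` gives the same output as the LPC formula with coefficients `b[k]`.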
@RazorM976 жыл бұрын
How to stop prison radicalization
@KatzRool6 жыл бұрын
I was going to make a joke about how you already sound like speech synthesis when speaking English, but your English gets better every single video. Keep it up man!
@framegrace16 жыл бұрын
I think the clicks are because the program is cutting/pasting at random waveform values. This produces discontinuities in the waveform, which generate those clicks. I think the simple way to solve it is to wait until the sample value crosses the zero line before cutting the audio, and to wait again for a zero crossing before introducing the next piece.
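The zero-crossing idea can be sketched for a plain array of samples as follows. This only illustrates the suggestion for splicing raw waveforms; whether it helps with an LPC filter is a separate question. The function name `next_zero_crossing` is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Scans forward from `start` for the first pair of neighbouring samples
// whose signs differ (or one of which is exactly zero) and returns the
// index of the second sample of that pair. Returns samples.size() if the
// signal never crosses zero after `start`.
std::size_t next_zero_crossing(const std::vector<float>& samples,
                               std::size_t start)
{
    for (std::size_t i = start; i + 1 < samples.size(); ++i)
    {
        if (samples[i] * samples[i + 1] <= 0.0f)
            return i + 1;
    }
    return samples.size();
}
```

A splice would then cut the outgoing sound at such an index and start the incoming sound at one of its own zero crossings, so the joined waveform has no sudden jump.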
@KuraIthys6 жыл бұрын
Interesting theory. That actually matches advice mentioned in the SNES manual in relation to audio samples. What it is trying to say exactly is ambiguous, but it warns against discontinuities in the waveform, which would result in clicking sounds. Of course, given the ADPCM coding, discontinuities on block boundaries would easily result if you're not careful. (since the samples within a block are all expanded using the same parameters, but across block boundaries the parameters change.)
@idk-bv3iw6 жыл бұрын
What about a simple fade-out/fade-in between the samples?
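A fade-out/fade-in between two pieces of audio is usually done as a crossfade over a short overlap. The following is a minimal sketch with a linear ramp; the function name `crossfade` and the choice of a linear ramp are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Joins `a` and `b`, blending the last `overlap` samples of `a` with the
// first `overlap` samples of `b` using a linear ramp.
std::vector<float> crossfade(const std::vector<float>& a,
                             const std::vector<float>& b,
                             std::size_t overlap)
{
    overlap = std::min({overlap, a.size(), b.size()});
    const std::size_t keep = a.size() - overlap;

    // Untouched head of a.
    std::vector<float> out(a.begin(),
                           a.begin() + static_cast<std::ptrdiff_t>(keep));
    // Blended region: the weight of b rises from just above 0 to just below 1.
    for (std::size_t i = 0; i < overlap; ++i)
    {
        const float t = (i + 1.0f) / (overlap + 1.0f);
        out.push_back((1.0f - t) * a[keep + i] + t * b[i]);
    }
    // Untouched tail of b.
    out.insert(out.end(),
               b.begin() + static_cast<std::ptrdiff_t>(overlap), b.end());
    return out;
}
```

The result is `overlap` samples shorter than the two inputs placed end to end, because the blended region is shared.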
@TheBcoolGuy6 жыл бұрын
@@idk-bv3iw That's the method used in video editing.
@crimsun71866 жыл бұрын
You also have to determine a rhythmic pattern dependent on the language and overall delivery, as words are not spoken at a constant pace.
@a1k0n6 жыл бұрын
I don't think that will work, because of all the excitation signal history in the bp[] array. Instantaneously changing the filter coefficients can lead to instability. One thing that might help, or might make it worse (I'm not sure) is to try implementing the transposed version where the bp[] array isn't just past output samples, but partially computed future samples. See the notes here: docs.scipy.org/doc/scipy/reference/generated/scipy.signal.lfilter.html
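For reference, the transposed structure mentioned here (transposed direct form II, the structure described in the linked scipy notes) can be sketched for the all-pole LPC case as below. This is not the code from the video; the class name `TransposedAllPole` is hypothetical, and `coeff[k-1]` is assumed to hold the LPC coefficient bₖ.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// All-pole filter y[n] = e[n] - sum_{k=1..p} b[k] * y[n-k], evaluated in
// transposed direct form II. The state does not hold past outputs; it holds
// partial sums that will become future outputs.
class TransposedAllPole
{
    std::vector<double> coeff;  // coeff[k-1] = b[k], k = 1..p
    std::vector<double> state;  // p partially computed future samples
public:
    explicit TransposedAllPole(std::vector<double> c)
        : coeff(std::move(c)), state(coeff.size(), 0.0) {}

    double step(double e)
    {
        const double y = e + (state.empty() ? 0.0 : state[0]);
        // Shift the partial sums down by one slot, adding this output's
        // contribution to each future sample.
        for (std::size_t i = 0; i < state.size(); ++i)
        {
            const double next = (i + 1 < state.size()) ? state[i + 1] : 0.0;
            state[i] = next - coeff[i] * y;
        }
        return y;
    }
};
```

With fixed coefficients this produces the same output as the direct form. The two structures differ only in what happens to the stored state when the coefficients change between frames, which is the case discussed above.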
@x0j6 жыл бұрын
This doesn't fool me, I know you have a much more advanced synthesizer that you use for your videos. A nice coverup attempt though
@tomh63394 жыл бұрын
Dude. I haven't used Praat since University, was hit by waves of nostalgia in the most unexpected place. Your videos are the best, you're quite the renaissance man.
@educate99466 жыл бұрын
Now I can have Robot Bisqwit wake me up every morning.
@thefoolishgmodcube26446 жыл бұрын
Imagine having “SHALOM! SHALOM!” as a wake-up alarm
@kkeanie5 жыл бұрын
@David Plays Stuff I really need that. it would stop my depression
@PantsYT5 жыл бұрын
"Hyvää huomenta"
@sindavmi4 жыл бұрын
robot bisqwit is a pleonasm
@greasyfingers92506 жыл бұрын
"Yes, I use PHP. Because a programming language, that you know is much more efficient than one that you don't know." This is the truest statement I have ever heard.
@greasyfingers92506 жыл бұрын
@Michael Smith You can debug it line by line with xdebug, but c# or java are usually better for that kind of work.
@Kitulous6 жыл бұрын
in order to debug PHP you have to var_dump every single variable because the stack trace in PHP is a real mess.
@HermanWillems5 жыл бұрын
Short term yes, long term no.
@jlewwis19953 жыл бұрын
Finally a video that actually shows how to ACTUALLY MAKE a TTS voice from scratch, almost everything online about "how to make a text to speech synthesizer from scratch" is just "use this function to call the os TTS library lul"
@BichaelStevens6 жыл бұрын
We have reached peak AI revolution - machines making machines A voice synth making a voice synth
@akj76 жыл бұрын
Haha
@huyvole97246 жыл бұрын
-6.4°C
@Bisqwit6 жыл бұрын
Actually the truth was like -22. I just happened to do the recording a month earlier...
@imlxh7126 Жыл бұрын
Uberduck has a neural-network-based simulation of Microsoft Sam. Talk about overengineered lmao
@metadaat57916 жыл бұрын
I always liked the implication of GSM using LPC, that technically you're not hearing someone's actual voice, but a reconstruction made of a buzzer with a filter and hisses and pops from filtered noise. So, you're actually listening to a speech synthesizer's reconstruction of the other person's voice! :-) :-)
@chooha6 жыл бұрын
Hi bisqwit I don't know if you realize this but you are an inspiration for many of the viewers here, like a hero. So could you make a video about how you reached this insane level of skill, what your journey was like, and maybe some tips on how one can be as good as you ? Thanks for all the amazing content ^_^
@MissNorington6 жыл бұрын
Really outstanding video! Great work Bisqwit!
@magicstix0r5 жыл бұрын
The input signal can't be a pure sine wave because: 1.) The vocal cords don't emit pure sine waves; they emit something more like a buzz. 2.) A pure sinewave would almost be unaffected by the LPC filters because it's a single frequency. A buzz is extremely rich in harmonics, and the human ear keys off the presence or absence of those harmonics in determining what was said. That's why if you look at voice data in a spectrogram, you tend to see lots of streaks that move together or widen/shrink based on what's being said. In a sort of philosophical explanation, the input signal is "sampling" your LPC filters. A single sine wave would result in sampling just a single data point. You need a lot of sine waves to get enough of a picture of the LPC filter to see what it looks like, which is what your brain is keying on to make sense of your words. Think of it kind of like an image. The sine waves are the pixels that you're building a picture of the LPC filter with. A single sine wave is like a single pixel; it doesn't tell you much. A buzz is loaded with lots of sine waves, so analogously it's loaded with a lot of pixels, so it can give you a better picture of the LPC filter, and thus a better picture of the formant it represents.
@Bisqwit5 жыл бұрын
Great explanation! Not an ELI5 though :-) But I would have settled for that.
@magicstix0r5 жыл бұрын
The constant clicks and pops are due to discontinuities at the frame boundaries. With an algorithm like this, they usually fix it using overlap-add. The gist of OLA is that your frames overlap and are weighted by a windowing function, then you sum them together where they overlap.
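The gist of overlap-add can be sketched as follows, assuming equal-length frames, a hop of half the frame length, and a periodic Hann window (a common choice, because overlapping copies of it sum to 1 at that hop). The function names `hann` and `overlap_add` are hypothetical, and the video's synthesizer is not claimed to work this way.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Periodic Hann window: w[i] = 0.5 - 0.5 * cos(2*pi*i / length).
std::vector<double> hann(std::size_t length)
{
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<double> w(length);
    for (std::size_t i = 0; i < length; ++i)
        w[i] = 0.5 - 0.5 * std::cos(two_pi * i / length);
    return w;
}

// Frame f is placed at offset f * hop, weighted by the window, and summed
// into the output wherever frames overlap.
std::vector<double> overlap_add(const std::vector<std::vector<double>>& frames,
                                std::size_t hop)
{
    if (frames.empty()) return {};
    const std::size_t length = frames[0].size();
    const std::vector<double> w = hann(length);
    std::vector<double> out((frames.size() - 1) * hop + length, 0.0);
    for (std::size_t f = 0; f < frames.size(); ++f)
    {
        const std::size_t n = std::min(length, frames[f].size());
        for (std::size_t i = 0; i < n; ++i)
            out[f * hop + i] += w[i] * frames[f][i];
    }
    return out;
}
```

Because every frame fades in and out smoothly, a change of filter parameters between frames is spread over the overlap instead of happening at a single sample.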
@oresteszoupanos6 жыл бұрын
Joel, regarding your question at 8:05, we cannot use a sine wave because it only has audio energy in 1 frequency, whereas to synthesise human speech, we need energies in "all" frequencies, so we can have base pitches and formants happening at the same time. Buzzers have a better spread of frequencies, compared to the more "pure" sine wave. Hope I made sense ^_^
@Bisqwit6 жыл бұрын
Good explanation, but not really an ELI5 :-) I understand the situation as indicated elsewhere in the video, but I was having trouble explaining in layman terms without referring to things like frequency spectrum; I wrote that request for the benefit of audience.
@oresteszoupanos6 жыл бұрын
@@Bisqwit Aha, I'd never heard the term ELI5 (Explain Like I'm 5) before! Here is my second attempt :-) Voice sounds are slightly complicated. Sine wave sounds are simple. Buzzers are super-complicated. We cannot use 1 simple sine wave, filter it, and get a complex voice sound. We have to start with a super-complex buzzer, then filter out some things, to be left with a less-complex voice sound.
@frisosmit89206 жыл бұрын
That's actually a very good explanation. Your first explanation made me understand it. But then again, I'm not 5 years old.
@noneofyourbeeswax34605 жыл бұрын
But you could superimpose sine waves to get all the frequencies?
@Bisqwit5 жыл бұрын
Yes, and in fact all waveforms can be represented as a sum of sinewaves. That is what e.g. the Fourier transform is about, or the discrete cosine transform.
@OverSeasMedia6 жыл бұрын
bisqwit was the inspiration to write my own tools whenever i need one, Great video.
@wallaguest16 жыл бұрын
i cant understand how you have so much knowledge, its crazy
@DudeWatIsThis4 жыл бұрын
Bisqwit you fucking legend man. This is the way to handle the banter. Throw it straight back at them! Genius stuff. You win again, good sir!
@pixelflow5 жыл бұрын
Finally! A Bisqwit Vocaloid :3
@shivisuper6 жыл бұрын
These videos make me respect you even more. You're very knowledgeable!
@prizmarvalschi13194 жыл бұрын
This is kinda like how UTAU users create voicebanks. Except we sing in 5-syllable strings for Japanese, sometimes more for other languages. And sometimes they are recorded in three or more pitches.
@noname-rr7hk2 жыл бұрын
I was searching for this video for half a year. Thankyou...
@fisu515 жыл бұрын
Kyllä
@BeeBaux5 жыл бұрын
Great job, bro! Thanks for making complex things easier.
@d3ibit6 жыл бұрын
Joel, always a pleasure to watch a C++ (related in some way) video. Keep the good work!
@MrGoatflakes5 жыл бұрын
6:34 if you say this five times into a mirror at night you will summon a Bisqwit :P
@tomaszx77604 жыл бұрын
I remember playing with the "Say" speech synthesizer from Workbench 1.3 (on the Amiga 500 computer)
@stennisrl6 жыл бұрын
Wow, what a cool video to wake up to. Excellent work!
@DrSid426 жыл бұрын
Just had an idea that I would make my own speech synth. I wondered if there was some nice example that is low-level enough. And guess what: this guy had the same idea just in time to have it done now. Great job!
@miszczklasykuw30256 жыл бұрын
music in background adds nice atmosphere to video as always x)
@moth.monster5 жыл бұрын
Now we need to record the speech synth speaking and use that to make another synth
@kapiltyagi46396 жыл бұрын
The solution for the clicking in the sound is simply to fade out the very end of each sample, because LPC just converts the audio samples into a simple, low-resolution representation: just a bunch of float values and a gain.
5 жыл бұрын
Super interesting article. Thanks!
@skilz80986 жыл бұрын
Once again; another great video!
@davidcuny70024 жыл бұрын
The red lines in Praat indicate formants, not the overtones. The vocal cords produce pulses, which have a fundamental frequency (pitch) as well as overtones (multiples of the pitch). The tongue forms a series of "tubes" in the mouth, which causes the pulses to resonate at frequencies that depend on the lengths of those various chambers. The resonating frequencies of these "tubes" are formants, and different mouth shapes create different sets of resonating frequencies.
@RamLaska5 жыл бұрын
I did something like this in the early nineties. I recorded my voice on my Mac SE, and wrote a hypercard stack to play the correct sounds together. It didn't translate English into phonemes, you had to write out your own phonemes, but that wasn't quite so unusual at that time. I also only made one recording per phoneme, because ain't nobody got time to record every possible phoneme pair 😂
@thetastefultoastie60776 жыл бұрын
I've never seen `++i %= max` before. That's pretty cool. Edit: it seems this only works in C++ but not in C, Java or Javascript
@Bisqwit6 жыл бұрын
In C++, operator++() returns a reference to the object being modified. This is not the case in C. This has nothing to do with C++17 or about sequence points. If the expression was `i++ %= max`, it would be a different story. `++i %= max` is completely unambiguous in its meaning. The reason it does not work in C is because `++i` returns a non-lvalue copy of the variable in C, not a reference to it. (C does not have references.)
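As an illustration of where `++i %= max` is handy, here is a small fixed-size history buffer that uses it as a rolling index. The `History` type is a hypothetical example, not the code from the video.

```cpp
#include <array>
#include <cstddef>

// Keeps the last N values without ever moving them: instead of shifting the
// whole array after each write, only the write position advances and wraps.
template <std::size_t N>
struct History
{
    std::array<double, N> data{};
    std::size_t pos = 0;  // slot that the next push() will overwrite

    void push(double v)
    {
        data[pos] = v;
        ++pos %= N;  // same effect as: pos = (pos + 1) % N;
    }

    // k = 1 is the most recent value, k = N the oldest one still stored.
    double back(std::size_t k) const
    {
        return data[(pos + N - k) % N];
    }
};
```

In C++ the prefix increment yields the variable itself, so the `%=` applies to the already incremented `pos`.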
@thetastefultoastie60776 жыл бұрын
@@Bisqwit Thanks for the explanation! I used an online compiler to quickly try all versions of C++ and indeed it worked in all of them.
@Smaxx6 жыл бұрын
@@shaurz I'd just write a tiny inline function with a speaking name instead. ;) Like `incmod(v, m)`
@DrSid426 жыл бұрын
@@shaurz It seems weird to you because of different background. Finnish folk did it like this for centuries.
@noneofyourbeeswax34605 жыл бұрын
@@DrSid42I don't think computers have been around for centuries
@yukimoe6 жыл бұрын
So you're basically teaching us how to make Vocaloid-like software? Nice.
@ceablue80376 жыл бұрын
@jj zun Yesssssssssssssssssssssssss
@mattg54615 жыл бұрын
Brilliant. I find this video a week after handing in my dissertation on vocal synthesis... This would have changed everything
@Bisqwit5 жыл бұрын
How so?
@mattg54615 жыл бұрын
There's just a lot of things you've covered in here that I wasn't able to find much concrete information about - things like accents and dialects especially. Lots of things like that which I knew from common sense but couldn't find actual written documentation to back up.
@gero93073 жыл бұрын
I created a voicebank CVVC and VCV type for utau, and while watching this video I experienced deja vu)
@AT-zr9tv3 жыл бұрын
Your videos are fantastic. This one particularly.
@GibusWearingMann5 жыл бұрын
I'm starting to become curious how to stop prison radicalization.
@DynamicFortitude6 жыл бұрын
8:00 Buzzer cannot be pure sine, because then the filtering of the frequencies would make no sense - there would be only one frequency in buzzer to start with. Buzzer needs to have rich frequency spectrum, but at the same time it needs to be harmonic (i.e. all frequencies are natural multiples of some base frequency = there is a defined pitch). You could use any function in form A*sin(x) + B*sin(2x) + C*sin(3x) +..., but of course the easiest way to produce signal like that is to use 1) square wave, 2) sawtooth wave (as you did), 3) triangle wave, 4) exp(sin(x)), etc.
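The form A*sin(x) + B*sin(2x) + C*sin(3x) + ... can be tried out directly. The sketch below picks the amplitudes 1, 1/2, 1/3, ..., which converge towards a sawtooth shape; the function names and the choice to stop at half the sample rate (to avoid aliasing) are assumptions for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sum of the first `harmonics` sine waves with amplitudes 1/k.
// With harmonics == 1 this is a pure sine; with many it approaches a sawtooth.
double harmonic_buzz(double phase, int harmonics)
{
    double sum = 0.0;
    for (int k = 1; k <= harmonics; ++k)
        sum += std::sin(k * phase) / k;
    return sum;
}

// Renders `count` samples of the buzz at the given pitch, using only the
// harmonics that fit at or under half the sample rate.
std::vector<double> make_buzz(double pitch_hz, double sample_rate,
                              std::size_t count)
{
    const double two_pi = 2.0 * std::acos(-1.0);
    const int harmonics = static_cast<int>((sample_rate / 2.0) / pitch_hz);
    std::vector<double> out(count);
    for (std::size_t n = 0; n < count; ++n)
        out[n] = harmonic_buzz(two_pi * pitch_hz * n / sample_rate, harmonics);
    return out;
}
```

Every component is a whole multiple of the base frequency, so the result keeps a defined pitch while still giving the filter many frequencies to shape.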
@Bisqwit6 жыл бұрын
Good explanation, but not an ELI5. I had trouble explaining it in layman terms without invoking mathematics and frequency spectrums... That's why I wrote the annotation.
@DynamicFortitude6 жыл бұрын
@@Bisqwit Vocal cords are the buzzers. Air goes through a buzzer, then through a tube (the vocal tract) which amplifies some frequencies (formants) and dampens others. If the buzzer sound were just one sine wave, the tube could only make it louder or quieter, nothing more. A tube cannot create new frequencies; it acts as a filter only. So the aim of the buzzer is to generate many frequencies, so that the tube (vocal tract) has something to choose from. White noise (during whispering) has all frequencies, so it is OK. Pitched sound is also OK, since it has many sine waves in it, as long as its base frequency is not too high (it is easier to understand bass singing than soprano singing!). A high pitch has fewer sine waves in the formant frequency range (~300-3000 Hz). Try changing VoicePitch to ~1046 Hz (soprano's high C), and you won't be able to distinguish the vowels o from u from a, or e from i.
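The point about high pitches can be checked with simple arithmetic: count how many harmonics of a given pitch land inside the ~300-3000 Hz range. The helper name `harmonics_in_band` is made up for this sketch.

```cpp
// Counts the harmonics k * pitch_hz (k = 1, 2, 3, ...) that fall inside
// the band [low_hz, high_hz].
int harmonics_in_band(double pitch_hz, double low_hz, double high_hz)
{
    int count = 0;
    for (int k = 1; k * pitch_hz <= high_hz; ++k)
        if (k * pitch_hz >= low_hz)
            ++count;
    return count;
}
```

For a 100 Hz voice, harmonics 3 through 30 fall in the band, which is 28 components for the vocal tract to shape. At 1046 Hz only two remain (1046 Hz and 2092 Hz), so there is very little left to tell the vowels apart.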
@XTpF4vaQEp5 жыл бұрын
13:15 accidentally used the whisper effect
@farteryhr5 жыл бұрын
virtual singer Bisqwitoid confirmed (slap have you played with UTAU (singing synthesis software) in which it's very easy to make your own voicebank (and get quality high)? looking forward to that soooo much~ it's just wonderful to find another common interest of you and me.. phonology and speech/singing synthesizing! (but yes to get high quality it needs deeper understanding of singing in timing, rhythm, grammar, and much time to fine-tune pitch, volume, breathiness envelopes for songs)
@JokerCat-x2t7 ай бұрын
I know this is 5 years old, but it's still cool to listen to.
@adam78686 жыл бұрын
I think I remember asking about this at one point, glad to see a video done on it
@edo9k5 жыл бұрын
I wish I had seen this video when I was researching for the master's degree.
@Bisqwit5 жыл бұрын
What did you write about?
@pedropereirapt5 жыл бұрын
So inspiring! Thanks for this video, you got a new sub!
@themcc18796 жыл бұрын
Sample voice frame to C code... the Lisp lover in me says you should have used Lisp, code as data and data as code. Either way this was beyond interesting. I like your accent but to be honest everyone who speaks English has an accent. The voice speaking with an accent was definitely something I wasn't expecting this Monday.
@codeninja18326 жыл бұрын
This is interesting as a programmer, as someone who's trying to learn another language (old english, dead language sure, but fun), and as someone who asked you how to trill about a month ago haha. Still can't trill, but I'm on my way.
@Bisqwit6 жыл бұрын
Thanks for posting!
@1st_ProCactus5 жыл бұрын
Awesome !!!
@robertboran62346 жыл бұрын
Great Project. Thanks for sharing.
@GabrielCrowe6 жыл бұрын
Awesome stuff.
@gandolfphoenix13635 жыл бұрын
You used the speech synthesizer that you made to give the Tutorial!
@Bisqwit5 жыл бұрын
Yes, I used it in the first few seconds of this video.
@uxxlabrute6 жыл бұрын
Earthbound music in the background FeelsgoodMan
@Catbangin6 жыл бұрын
Cheer bisqwit! Almost near to guitar effects tutorial!
@dgmsstuff6 жыл бұрын
I'm speechless. No pun intended.
@Thebasicmaker4 жыл бұрын
I also made a speech synthesizer using the same procedure, but my language was BASIC! And the voice was mine too: I pronounced a word and then cut out the part that I needed, and the program just had to load the sounds and play them one after the other to read out a phrase that I gave to an INPUT instruction.
@firemaniac100105 жыл бұрын
I'm guessing the "buzz" can't be a pure sine wave because a pure sine wave has no harmonics; it's a pure tone. In other words, there's nothing to filter out except for one single frequency.
@alexhauptmann2985 жыл бұрын
ELI5 explanation for why you can't use a sine wave: the human voice is essentially a subtractive synthesizer. Most commercial music synthesizers can do some form of this. It's the same sort of "buzzer in a tube" model, except the tube is generally way simpler (unless you're Plogue, but that's another story). The reason a sine wave can't be used is because subtractive synthesis works by taking away frequencies from a harmonically-rich (i.e. complex waveform) sound. Any given wave can be recreated by an arbitrary number of sine waves, but a sine wave can't be broken down into something simpler. So essentially, a sine wave can't be used because it's not enough data. It mathematically cannot be subtracted from any further. This is...more complex than I was intending but oh well lmao
@Bisqwit5 жыл бұрын
Good explanation, but definitely not something that works for five-year-olds :)
@alexhauptmann2985 жыл бұрын
@@Bisqwit Haha, I figured. Is that a QRIO in the thumbnail btw? I wanted one SO BAD as a little kid and was thoroughly impressed with how realistic the synthesized speech sounded. Of course, now I know (from experience, even) that Japanese is a MUCH easier language to synthesize than English. Also while watching your video on Finnish phonetics, I found it interesting how it's sort of similar to Japanese (vowels with singular pronunciation, lengthened vowels and consonants). I wonder if that would make it technically easier to synthesize than English (at least, native-speaker English)...at the very least, it would make the plaintext dictionary rules much easier :P
@Bisqwit5 жыл бұрын
It’s a Nao, not Qrio. And yes, as a Finnish person who knows the basics of Japanese, I find Japanese much easier and familiar in many aspects compared to English.
@oo8dev6 жыл бұрын
Amazing!!
@smkyone6 жыл бұрын
kiitos
@zeppy131315 жыл бұрын
I can't speak for anyone else, but I was glad when this was Finnished.
@JoLiKMC6 жыл бұрын
I, for one, welcome our new, Finnish robot overlords. _Hail Roboisqwit!_ Seriously, though, this is neat-as-hell. It's also kind of… heartbreaking, in a way. I never considered how speech synthesis works, and now that I know? The magic… is gone. :(
@clearz36006 жыл бұрын
Interesting as always.
@j5679 Жыл бұрын
Very interesting video. I may have missed it but it seems like you are not incorporating stress accent into your synthesis, right? Algorithmically figuring out where the stress lies may be a bit of a challenge depending on the language (or be downright impossible), but the English Wiktionary actually provides this data and they also offer regular HTML dumps that contain IPA transcriptions. Finnish actually happens to be one of the best covered languages on the English Wiktionary, so if you ever decide to do a v2 of this project, incorporating Wiktionary's IPA data might be an idea. I'm not sure how much you know about phonetics but please be aware that IPA does not fully capture how words are pronounced. Phonemic transcriptions don't capture it by a long shot but even a narrow phonetic transcription can be slightly inaccurate (vowel qualities are a continuum, the different durations are on a continuum etc.). This all is to say that even if you use IPA data, the rest of the pipeline still needs to be tailored to a specific language and can't produce accurate output language-agnostically.
@Bisqwit Жыл бұрын
From Wikipedia: ”Since stress can be realised through a wide range of phonetic properties, such as loudness, vowel length, and pitch (which are also used for other linguistic functions), it is difficult to define stress solely phonetically.” In the Finnish language (this synth aims to speak like Finnish speakers do), emphasis (stress) is always on the first syllable. In my speech synthesizer, it is realized by using a slightly higher pitch for stressed phonemes.
@Darksoulmaster6 жыл бұрын
Wow, i dont know what are you even talking about, but its cool.
@Bisqwit6 жыл бұрын
Speech synthesis
@krank38695 жыл бұрын
I always thought these videos were sped up but then i looked at the clock
@Sturmtreiben5 жыл бұрын
Which graphics software do you use for creating pictures like the one in 3:00? They somehow look really good.
@Bisqwit5 жыл бұрын
Thanks. I use LibreOffice Impress. I also do some postprocessing in kdenlive; basically all _animations_ are done in the video editor.
@Sturmtreiben5 жыл бұрын
Thanks, Joel!
@smallgoodwoodoodaddy6 жыл бұрын
I always liked your accent. So I liked it 👍 :D
@EodeseАй бұрын
1:49 this is exactly the process to make a UTAU voicebank
@BisqwitАй бұрын
Interesting. Is there a video about that?
@ruadeil_zabelin6 жыл бұрын
Note that std::wstring_convert is deprecated in C++17, so if you want to be standard conforming, you should replace it with something else.
@Bisqwit6 жыл бұрын
Noted. I used it for 1) its brevity and 2) because I couldn’t figure out a concise replacement that is not deprecated.
@ruadeil_zabelin6 жыл бұрын
@@Bisqwit Unfortunately there isn't a standard way anymore. The standards committee has said that they're working on a replacement, but will only re-add it if it's fully compliant with the Unicode standards (apparently this one didn't work in all cases). The only way seems to be to fully implement it yourself (UTF-8 decoding isn't very hard, luckily), or to use a library like iconv or libicu.
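As a rough idea of what "implement it yourself" might look like, here is a minimal UTF-8 to UTF-32 decoder. It is a simplified sketch: the function name `utf8_to_utf32` is hypothetical, malformed sequences are replaced with U+FFFD, and overlong encodings and surrogate code points are not rejected, which a fully conforming decoder would have to do.

```cpp
#include <cstddef>
#include <string>

// Decodes UTF-8 bytes into UTF-32 code points.
// Simplification: overlong forms and surrogates are NOT rejected.
std::u32string utf8_to_utf32(const std::string& in)
{
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); )
    {
        const unsigned char lead = static_cast<unsigned char>(in[i++]);
        char32_t cp = 0;
        int extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(U'\uFFFD'); continue; }  // invalid lead byte

        bool ok = true;
        for (; extra > 0; --extra)
        {
            // Every continuation byte must look like 10xxxxxx.
            if (i >= in.size() ||
                (static_cast<unsigned char>(in[i]) & 0xC0) != 0x80)
            {
                ok = false;
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(in[i++]) & 0x3F);
        }
        out.push_back(ok ? cp : U'\uFFFD');
    }
    return out;
}
```

For production use, a library such as iconv or ICU, as mentioned above, handles the validation cases that this sketch skips.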
@jfkd28125 жыл бұрын
11:01 Hey, it's imgui! Very nice to use
@Embedonix6 жыл бұрын
+1 for using 'goto' in your code :)
@Armadurapersonal6 жыл бұрын
Perfect for spurdo memes
@ddream2966 жыл бұрын
whoah nice!
@videogamemusicandfunstuff48736 жыл бұрын
11:01 This program looks really nice. What GUI library did you use?
@Kellykellamster6 жыл бұрын
Looks like imgui to me.
@Bisqwit6 жыл бұрын
Yep, correct. Imgui it is.
@minecrafttheobjectno5415 жыл бұрын
Did I hear a turret say "Weeee" when he said "thumbs up the video"?
@Bleenderhead6 жыл бұрын
I want to hear it sing Space Oddity.
@gazehound6 жыл бұрын
I'm early this time. Awesome video!
@Bisqwit6 жыл бұрын
Thank you!
@victorprokop22404 жыл бұрын
3:16 Mongolian throat singing!!! lmao
@Bisqwit4 жыл бұрын
I’m not sure if you are mocking, but the principle is actually similar. The purpose is to enunciate different subtones while keeping the primary tone unchanged.
@yohvh5 жыл бұрын
When you find a problem after you played the audio do you just in real time think of a solution and code it right there at that speed?
@aprilliac6 жыл бұрын
Rolling index, why didn't I think of that... Thanks for the excellent video. :)
@Bisqwit6 жыл бұрын
Yeah, a rolling index is a bit neater solution than doing a copy-backwards-by-1 loop after each iteration. On the other hand, the rolling index makes SIMD optimizations impossible, so it’s a tradeoff.
@NonTwinBrothers3 жыл бұрын
9:30 holy shit that scared me
@ivanbogdasaebersold46905 жыл бұрын
This will be my COVAS in Elite Dangerous...
@vegardertilbake16 жыл бұрын
Ha! This was so much fun!
@jamescumbria44995 жыл бұрын
Are you going to make this speech synthesizer a TTS voice for Windows?
@Bisqwit5 жыл бұрын
I don’t deal with Windows.
@pencrows5 жыл бұрын
was all the audio in this video speech synthesized edit:its not
@icu79925 жыл бұрын
why you don't use namespace std?
@Bisqwit5 жыл бұрын
The standard namespace is not a demon to be vanquished with -a magic spell- boilerplate code. It has a purpose.
@alejandroduarte52456 жыл бұрын
Great video :)
@arcnorj6 жыл бұрын
Can you explain just a bit what you did to generate the LPC sample from David Woods? I guess manually editing the pitch curve with Praat?
@Bisqwit6 жыл бұрын
I dumped the soundtrack of the video into a wav file using MPlayer. Then I opened the soundtrack in Audacity, and cropped it into just those three seconds or so, saved it into a new wav file. (Or maybe I dumped only three seconds from the soundtrack in the first place, using -ss and -endpos options. I don’t remember.) Then I opened the wav file in Praat, and did nothing else but synthesized the LPC from it (Analyze spectrum → To LPC (burg) → Save).
@clementpoon1204 жыл бұрын
Pipe a chatbot to it and add a GLaDOS voice to it, and you've got yourself a GLaDOS.
@distrologic29256 жыл бұрын
I legit thought the speech synthesizer was speaking for the first 3 minutes
@DanieleMarchei6 жыл бұрын
Yes but now we want to listen to the synthesizer's voice
@distrologic29256 жыл бұрын
you *are*
@HerrRussoTragik6 жыл бұрын
Ohhh in the past I've made a pseudo "TTS" using the winmm from windows.h and PlaySound function...
@MESYETI5 жыл бұрын
Wow! I might try to make one, it seems hard though
@maali825 жыл бұрын
rallienglantisuuntaisaiser!
@lubeckable6 жыл бұрын
His C++ level ... is over 9000
@metaorior6 жыл бұрын
Hello would you please do a tutorial series for beginners in developing?
@Bisqwit6 жыл бұрын
Probably, some of these years! When a good enough way to make it comes to my mind. I make my videos mostly through a creative process…
@metaorior6 жыл бұрын
@@Bisqwit i hope soon ! It's really hard finding good content in the internet. so i thought "hey maybe i could ask bisqwit" why not ^^