Comments
@flashn00b 1 month ago
Surprised no one has made an UTAUloid of the Commodore 64's speech synthesis.
@cazb73 1 month ago
This demo, if presented in the year of the C64's release, would have stopped the progress of home computers by a decade. Why 16 bits, if 8 bits are capable of this! (Ideally combined with late Tim Follin music)
@janderogee 3 months ago
Wow, this is awesome! Amazing work! What also amazes me is that this channel, with so much groundbreaking stuff, has only 550 subscribers. Please keep up the good work, it is very inspiring. Is there a place where more info can be found on how you did all this? Can't wait to see this used in games and such. I know there are limitations in how this can be used, but the ability to get such high-quality sound at such a low bandwidth using a stock C64 is mind-boggling!
@AppliedCryogenics 3 months ago
Sounds good! Seems like maybe you did spectrum analysis of the source audio, detecting peaks vs time, then additive synthesis of those peaks using SID voices?
@thealgorithm 3 months ago
Not quite. Each layer parameter has its own phase and custom waveform, and there is analysis of pre/post frequencies with an overall frequency sensitivity curve where the specific layer parameters are chosen. The entire thing is also digitally decoded (no SID oscillators used). However, my past "frodigi" series did indeed do it the way you mentioned (and the quality was pretty subpar for many other reasons too, such as no phase data and only a sine waveform).
@acied6200 3 months ago
Question: in the end, does it still play as a digi over the volume register in d418?
@thealgorithm 3 months ago
Yes, all the decoding is done digitally and the output is sent to d418 (custom setup) to allow roughly 6 bits of resolution.
@acied6200 3 months ago
I have been experimenting with the other technique: playing digis over one voice. The quality is far superior to d418 toggling. It's 8-bit out, but since it's controlled by the frequency, you can even play up to 16-bit digis. The only drawback is that it takes more cycles per push. I am currently working on a project that uses a second stock C64, connected with a parallel user port cable, so that one C64 continuously calculates a pseudo-3D road. I could imagine something similar with this too: let one C64 decode and push its values into dd01, so the other C64 can be dedicated to picking them up from dd01 and playing the digi.
@acied6200 3 months ago
@thealgorithm Or sync two C64s and have stereo :-)
@acied6200 3 months ago
@thealgorithm Had to look it up: the voice-playing digi is the SID Vicious code as found at Codebase.
@thealgorithm 3 months ago
@acied6200 Yes, it's currently the highest-quality method of getting digi output, but it needs to be cycle-exact and uses more CPU time, which is why I did not use it. Ideally you would need to write triangle or sawtooth, then set the test bit, then write the sample value to the high byte of the frequency register, and then set a silent waveform per sample update so that it will seek to the required value in the next update.
@bufferjoetommas 3 months ago
Somehow I feel "old radio" nostalgia. It wasn't unpleasant to hear at all, compared to the muddiness of a badly ripped MP3 back in the day XD
@dinkc64 4 months ago
Super impressive, wowsers!!!
@AftercastGames 4 months ago
Is there any multiplication going on in the C64 decoder, or is it all addition and/or bit shifting? Is there any floating point math going on?
@thealgorithm 4 months ago
There is no floating point maths or multiplication in the decoder. The core of the decoder is additive synthesis, with the layers added together at different volume levels and a phase accumulator per layer.
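A minimal Python sketch of additive mixing with one phase accumulator per layer, as described in the reply above. It is an illustration only, not the author's decoder: the table length, the 8.8 fixed-point phase format and every name here are assumptions, and volume scaling is left out.

```python
# Illustrative only: table length, fixed-point format and names are assumptions.
WAVE_LEN = 32  # entries in one single-cycle waveform (assumed)

def make_saw(n=WAVE_LEN):
    """One cycle of a rising sawtooth with values 0..n-1."""
    return list(range(n))

def mix_layers(layers, num_samples):
    """Sum all layers using only adds, shifts and table lookups.

    Each layer is a dict with:
      wave  - single-cycle waveform table
      phase - 16-bit phase accumulator (8.8 fixed point, assumed)
      step  - value added to the phase every output sample (sets the pitch)
    """
    out = []
    for _ in range(num_samples):
        total = 0
        for layer in layers:
            index = (layer["phase"] >> 8) % WAVE_LEN  # integer part picks the table entry
            total += layer["wave"][index]
            layer["phase"] = (layer["phase"] + layer["step"]) & 0xFFFF  # wrap at 16 bits
        out.append(total)
    return out

layers = [
    {"wave": make_saw(), "phase": 0x0000, "step": 0x0100},  # one table entry per sample
    {"wave": make_saw(), "phase": 0x0800, "step": 0x0080},  # starts 8 entries in, half the pitch
]
print(mix_layers(layers, 4))
```

The starting value of each accumulator is the layer's phase, and the step is its pitch, so a layer needs no multiplication to advance.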
@AftercastGames 4 months ago
@thealgorithm How do you do different volume levels on the decoder side without multiplication or bit shifting?
@thealgorithm 4 months ago
@AftercastGames Tables and indexing :-) 1k of tables for volume (32 levels with 32 values for each). The main core of the decoder reads an indexed channel and points to a volume table which adjusts the value, then accumulates these with the other layers. Outside of the decoder, the parameter updater self-modifies the decoder to point to the new volume tables, waveforms, pitch offsets, etc. On a machine with such low CPU power, tables are the key!
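A minimal Python sketch of the table idea in the reply above: 32 levels of 32 values give 1024 entries, and the per-sample work becomes a lookup plus an add. The linear scaling curve and all names are assumptions; the real tables may be shaped differently, and the self-modifying pointer update is only imitated by picking a table row.

```python
LEVELS = 32   # volume levels ("32 levels")
VALUES = 32   # sample values per level ("32 values for each")

def build_volume_tables():
    """Precompute the scaling once; 32 * 32 = 1024 entries.

    The multiplication happens here, at table-build time, never per sample.
    Linear scaling is an assumption made for this sketch.
    """
    return [[value * level // (LEVELS - 1) for value in range(VALUES)]
            for level in range(LEVELS)]

def mix_sample(tables, layers):
    """layers: list of (volume_level, raw_sample_value) pairs."""
    total = 0
    for level, value in layers:
        total += tables[level][value]  # lookup and add only
    return total

tables = build_volume_tables()
print(sum(len(row) for row in tables))                 # total number of table entries
print(mix_sample(tables, [(31, 20), (16, 31), (0, 9)]))
```

Choosing `tables[level]` once per parameter update, rather than per sample, plays the role that pointing the decoder at a new volume table plays in the description above.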
@acied6200 4 months ago
Very cool!
@antivanti 4 months ago
Imagine showing this to someone back in 1982. Though I suppose that even though the C64 can do the decoding, there might not have been a mainframe in the world at the time capable of doing the encoding?
@thealgorithm 4 months ago
With phase optimisations and accuracy set to low, I can encode around 1 second's worth of data in a minute (on a single core of a Zen 3 5800H). I would guess a supercomputer of that 1982 era (Cray-1) would take several months to do that.
@tsonez 4 months ago
Impressive! Somehow reminds me of the background music while being on hold on the phone.
@julienbraudel7109 4 months ago
Can't wait to hear 11 kHz...
@NickFellows 4 months ago
Imagining a He-Man game with the actual music.
@glokyful 4 months ago
Bravo!
@-Error99 4 months ago
Cool 👍
@pasromano75 4 months ago
Can you tell me the technical details here?
@thealgorithm 4 months ago
In the current implementation (the one in this video; there have been additions since then, but I will talk about this particular one), the decoder on the C64 has 8 channels, each with 11-bit frequency resolution and 7-bit volume resolution, and each can use any one of the 8 predefined waveforms (sawtooth, pulse, triangle, noise, sine, res1, res2, trisaw), each with its own phase. The encoder does the heavy computational work, decomposing a chunk (20/40/60/80 ms) into multiple layers of waveform, frequency, amplitude and phase. The decoder mixes all these in real time to recreate the audio on a stock C64.
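The figures in the reply above allow a rough, back-of-the-envelope size estimate. The Python sketch below assumes that every channel sends frequency, volume and a waveform index on every chunk, that phase costs a configurable number of bits (not stated in the thread, so it defaults to 0), and that no further packing is applied. None of this is the actual stream format; it only shows the arithmetic.

```python
def raw_bytes_per_second(chunk_ms, channels=8, freq_bits=11, volume_bits=7,
                         waveforms=8, phase_bits=0):
    """Unpacked parameter rate under the assumptions stated above."""
    wave_bits = (waveforms - 1).bit_length()  # 8 waveforms -> 3 bits
    bits_per_chunk = channels * (freq_bits + volume_bits + wave_bits + phase_bits)
    chunks_per_second = 1000 / chunk_ms
    return bits_per_chunk / 8 * chunks_per_second

for ms in (20, 40, 60, 80):
    print(ms, "ms ->", round(raw_bytes_per_second(ms)), "bytes/s before packing")
```

Under these assumptions a 40 ms chunk carries 21 bytes of parameters, which lands in the same range as the few hundred bytes per second quoted elsewhere in the thread.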
@pasromano75 4 months ago
OK, so, in brief, this is not digi music, but music recreated with the original SID waveforms, is that correct? Very interesting... I think this has never been done before!
@pasromano75 4 months ago
Is the encoding phase done on a regular PC?
@thealgorithm 4 months ago
@pasromano75 The music is recreated from simple waveforms, yes, and when these are layered together with phase, amplitude and frequency, they recreate the source audio. All waveforms are digitally mixed, hence it can be used on other hardware too.
@thealgorithm 4 months ago
@pasromano75 Yes
@pasromano75 4 months ago
Is it some kind of digi music...?
@tsonez 4 months ago
Great update and what a nice improvement! Keep up the good work. 👍
@FunkyM217 4 months ago
I know someone who'd be very interested in this one...
@parzivalwolfram7084 4 months ago
Damn, this is not bad at all. How long does encoding take on the host machine?
@thealgorithm 4 months ago
It really depends on the parameters and quality settings. The above example took around a minute per second of audio to encode (on a Ryzen 5800H, single-threaded). If I want to maximize the quality further, it can take a massive half an hour just to encode 1 second!
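To put the two encode speeds above in perspective, here is a small Python calculation. The 90-second clip length is a hypothetical example chosen for this sketch; only the two per-second costs come from the reply.

```python
def encode_hours(audio_seconds, encode_seconds_per_audio_second):
    """Total single-threaded encode time in hours."""
    return audio_seconds * encode_seconds_per_audio_second / 3600

clip = 90  # hypothetical clip length in seconds
print(encode_hours(clip, 60))    # at "around a minute per second"
print(encode_hours(clip, 1800))  # at "half an hour just to encode 1 second"
```

So the same clip moves from an afternoon job to roughly two days of computation when the quality settings are pushed to the maximum.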
@fcycles 4 months ago
Very impressive… the high-frequency fidelity you achieved, and as always I enjoy hearing the compressed result… and I'm surprised to hear how the brass instruments sound!
@halabardsznept3897 4 months ago
@JimLeonard 4 months ago
Can we hear a voice-only sample next time?
@thealgorithm 4 months ago
There is a voice sample in this video (tatu), unless it's blocked in your region and muted. But to reiterate, this is not a speech encoder (custom speech codecs will do a lot better for speech only). The whole purpose of this codec is to compress music to a "high" quality relative to its bitrate and to play it back on low-CPU-spec devices. It would be nice to see what other alternatives there are under 400-500 bytes a second playing on '80s retro hardware (without being a garbled mess), but I have not seen any yet. (The exception is the amazing EnCodec or similar, but that requires powerful PC hardware and training data, plus lots of RAM and CPU for decoding.)
@JimLeonard 4 months ago
@thealgorithm By "voice only" I meant normal speech: no singing, no reverb or echo effects. I know you are optimizing for music only, but hearing an example of only talking helps evaluate the codec. Speech-only codecs are not appropriate for music, but music-only codecs are usually also appropriate for speech. (Essentially, I'm trying to evaluate whether your codec is fixed wide band, fixed narrow band, or adaptive wide band.)
@thealgorithm 4 months ago
@JimLeonard Here, this is an old example from a few years back, but it should give you a rough idea of the quality to expect from speech only (at under 300 bytes a second). It is singing, albeit in a rather monotonous voice. Suffice to say, this gives an example of how it will sound quality-wise even with pure monotonous speech: www.dropbox.com/scl/fi/qq3lyxmq863csz5s7vfi4/awd3-diner50hz.mp4?rlkey=agipfhfo3f91onzo3hutpxccq&dl=0 I will encode a speech-only example with no singing and link it to you soon.
@thealgorithm 4 months ago
Totally monotonous speech here (it's actually under 160 bytes a second after packing, as there are gaps between the speech, but if including the gaps in between it's roughly 250 bytes per second): www.dropbox.com/scl/fi/d51bor98i6itgeubtcd3c/speechtesta.avi?rlkey=e0b20kppa2zohtaofz2le9htk&dl=0 The Tom's Diner one does sound considerably better though. By the way, the spectrogram is in real time, so you will have a rough idea of the ranges used (mainly it falls in the mid/low frequencies).
@JimLeonard 4 months ago
@thealgorithm Thanks! It seems to adapt pretty well, all things considered. At 500 B/s it is still competitive with other low-power speech codecs.
@antivanti 4 months ago
How's the quality at higher bitrates? Also, what's the fastest you can stream data from a disk on an unmodified C64 and drive? And when do you hit diminishing returns? It would be fun to see how far the quality can be pushed if one were to dedicate one entire side of a floppy to one song 😊
@thealgorithm 4 months ago
The quality at higher bitrates is considerably higher (high enough to sound nearly the same as the pre-encode source); however, the pack rate in this case is roughly 5:1. (It may be more beneficial to use other codecs such as ADPCM in this case, which would use less CPU time than the AWE method of decoding.) The fastest IRQ loader at the moment manages roughly 7.7k a second on the C64 (however, this is if you give it all the CPU for loading). As the AWE decode uses 90% of CPU time and has non-maskable interrupts for sample playback, this would reduce loading speed to perhaps 600 bytes a second (which is just about enough to stream and play back larger segments of audio).
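The streaming estimate above can be sanity-checked with a few lines of Python. The loader speed, the 90% decoder load and the 600 bytes a second guess come from the reply; treating "7.7k" as 7700 bytes and using 664 free blocks of 254 data bytes for one standard 1541 disk side are assumptions added for this sketch.

```python
LOADER_BYTES_PER_SECOND = 7700   # "roughly 7.7k a second", taken as 7700 bytes (assumed)
DECODER_CPU_PERCENT = 90         # share of CPU time used by the decoder

def cpu_share_ceiling(loader_rate, decoder_percent):
    """Loader throughput if it scaled linearly with the CPU time left over."""
    return loader_rate * (100 - decoder_percent) // 100

def seconds_of_audio(storage_bytes, stream_bytes_per_second):
    """Whole seconds of audio that fit in the given storage at a given stream rate."""
    return storage_bytes // stream_bytes_per_second

DISK_SIDE_BYTES = 664 * 254      # free blocks x data bytes per block (assumed standard format)

print(cpu_share_ceiling(LOADER_BYTES_PER_SECOND, DECODER_CPU_PERCENT))
print(seconds_of_audio(DISK_SIDE_BYTES, 600))
```

The linear ceiling comes out somewhat above the quoted 600 bytes a second, which is consistent with the non-maskable sample interrupts costing the loader extra time, and at 600 bytes a second one disk side would hold a little under five minutes of audio.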
@dajan777 4 months ago
So you can squeeze a whole 2 minutes of song into C64 memory at this quality?
@thealgorithm 4 months ago
It would be roughly one and a half minutes, as there would be roughly 46k or so of spare RAM left, but of course it can be resequenced for repeating sections, and in most cases whole songs should fit (there were a Human League example and a La Isla Bonita example in earlier tests that were each a single load).
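The memory figures above imply a data rate that can be worked out directly. In this Python sketch, "46k" is taken as 46 * 1024 bytes (an assumption), and 400 bytes a second is the alternative rate mentioned elsewhere in the thread.

```python
SPARE_RAM_BYTES = 46 * 1024   # "roughly 46k or so of spare ram", taken as KiB (assumed)

def implied_bytes_per_second(ram_bytes, seconds):
    """Data rate needed for the given duration to exactly fill the RAM."""
    return ram_bytes / seconds

def seconds_that_fit(ram_bytes, bytes_per_second):
    """Playback time that fits in RAM at a given data rate, without resequencing."""
    return ram_bytes / bytes_per_second

print(round(implied_bytes_per_second(SPARE_RAM_BYTES, 90)))  # one and a half minutes
print(round(seconds_that_fit(SPARE_RAM_BYTES, 400)))
```

Resequencing repeated sections, as the reply notes, stretches the playing time beyond what this straight division gives.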
@LPChip 3 months ago
@thealgorithm What if you add a RAM expander, or go the C128 route?
@thealgorithm 3 months ago
@LPChip Then there are more possibilities. For example, I can then write data to the REU at a higher sample rate (and with more layers) and/or with more accuracy, then play the decoded sample from the REU at these high sample rates without using much CPU time. On the C128, 2 MHz mode can be activated in the border areas, which can speed up decoding and allow a higher sample rate and/or more layers. However, neither of these is stock (I prefer coding on a stock machine, even though I have coded some EasyFlash cartridge music video demos before).
@nunyabiznas2696 4 months ago
So many years later, still so sick!
@kanalnamn 4 months ago
Cool. :) How long is "long time" to do the encode? Hours per second of content? More? How does it perform on pure speech?
@demoscenes 4 months ago
Incredible! This is the future of waveforms! <3
@fcycles 4 months ago
Well, that is what I was curious about in your last test: do all these samples come from one single .prg file, or is it a montage of samples?
@thealgorithm 4 months ago
All from one PRG file.
@JimLeonard 4 months ago
The degradation is too high IMO. This might be better for spoken word content without bits wasted on background music.
@thealgorithm 4 months ago
Dedicated speech-only codecs should perform better than this (this encoder is not specifically aimed at speech). The pre-encode source file does not sound that different from the encoded output (and it's still packing the data at 12:1) even with the additional degradation of phase accuracy; the Freddie Mercury part, for example. The others suffer more when comparing the pre-encode source to the decoded output (which gets considerably better with the phase set to full resolution). These examples were also mainly at 40 ms (with transient sensitivity set to very low). Bear in mind that the sample rate is pretty low. At some point I will implement the decoder on the Amiga (this will allow higher sample rates and fewer restrictions on waveform size) and still use the same file size, increasing the compression ratio. There are still a lot of improvements to make and in the pipeline. For example, more updates and fewer layers (and VQ'd volume codebooks) should improve the overall quality (even though the volume tables will then be non-optimal).
@JimLeonard 4 months ago
@thealgorithm I look forward to your Amiga experiments!
@dwsel 4 months ago
Sounds great and the compression rate 🤯
@zvonimirstrucelj6190 4 months ago
Any chance we'll see a final release soon?
@dwsel 4 months ago
All hail @thealgorithm !!!
@CodexPermutatio 5 months ago
Holy shit!
@julienbraudel7109 5 months ago
Great work.
@MichaelHuth 5 months ago
Impressive, will there be a publication describing the details?
@thealgorithm 5 months ago
At some point, yes. There are quite a lot of things to cover on both the decoder and the encoder side.
@youtube-stole-handle-att3nd1 5 months ago
@thealgorithm Can't wait to read about it. Your work keeps pushing the boundaries of 8-bit home computers, and the C64 specifically, not to mention that the results are amazing from the algorithmic viewpoint (no pun intended, at least not by me 😊)
@joecincotta5805 4 months ago
Amazing
@muchkaev 5 months ago
Does it work with voice too?
@thealgorithm 5 months ago
Sure. Look at the earlier examples on my channel, which include various pop songs. Since then I have improved and refined the encoder quite a bit, though.
@muchkaev 5 months ago
@thealgorithm Yep, I know, and I remember that you had problems with publishing commercial songs on KZbin. Maybe next time make something with a voice from someone less well known than Madonna :-)
@dwsel 4 months ago
@thealgorithm Can we hear something with voice encoded in the new algo?
@thealgorithm 4 months ago
@dwsel Here you are (this is using phase quantisation, but expect it to be around 15% better if this option is not used): kzbin.info/www/bejne/gpislJloateHn9k
@julienbraudel7109 5 months ago
Impressive. Does it mean you could also think of a project to play ProTracker songs, for instance, or 8-channel MODs? Congratulations on your stunning work.
@thealgorithm 5 months ago
I have some unreleased code that can play back 4-channel ProTracker songs (and can also depack each sample on the fly while mixing), which would allow large MODs to fit in RAM at one time. There is an earlier version of this in one of my single-file C64 demos: kzbin.info/www/bejne/aWPXaKiZpb-dpbc
@julienbraudel7109 5 months ago
I definitely love your work. No .MOD player on the C64? You could make yours. Congratulations again on your work.
@pannonianbrute 5 months ago
Jaw-dropping!
@plasmaastronaut 5 months ago
The sound effectively switched off after the start, and you have to download it and run it through Audacity's amplifier at x50 to hear it.
@dwsel 5 months ago
I thought I understood what was happening around Frodigi times, but at this point I can't keep up with your genius.
@RudeRud7 6 months ago
👍
@JimLeonard 6 months ago
I think this has more artifacts than the previous experiments. More layers and more volume bit depth sound better in my opinion.
@thealgorithm 5 months ago
Added 5-bit resolution now (without increasing the precalculated wavetable size). Quality is better, though it still needs some improvement. I have some ideas for improving it further at the same size. (Then, when I'm happy with it, I will post some higher-bitrate versions.) kzbin.info/www/bejne/qabceGpuf92Xnas
@fcycles 6 months ago
Your compression research and projects on the C64 are among the best things I see, if not the best! The quality of the sound makes it very interesting to listen to and to try to understand the loss of information decided by the encoder. Hearing this, I try to figure out how my brain encodes it and how this information could be visualized... In fact, the higher the compression gets, the closer we come to a point where we should be able to show a visual display that represents in real time what we hear, in a way that makes sense... By analogy, spoken text can be visualized by displaying the text...
@thealgorithm 6 months ago
Thanks. Yes, this is the idea for this demo (to have lyrics in time with the singing), though I am also still considering encoding at a higher bitrate instead and having snippets of songs. (By doubling the file size to 400 bytes a second, it sounds nearly as good as the resampled original, at least in the same Madonna example.)
@fcycles 6 months ago
@thealgorithm I know this idea is far more complicated, but thinking about how we were doing our digimix back in 1991... our music guy used the original song track and also an instrumental-only version to create a mix of the song. Maybe this kind of idea could be used here, where the refrain has lyrics at high quality and the other parts are instrumental, still digital but with a higher compression rate?
@thealgorithm 6 months ago
@fcycles In the TCBI50k single-filer demo released in December 2021 or so, I used a higher-quality background layer and a lower-quality speech layer. However, I had to specifically choose a track that had many repeating background patterns (and the Rick Astley track was exactly that). It did consist of a suboptimal-quality speech layer though. The only issue here is that there are not many tracks that have such repeated sequences of background audio. I also experimented with separating the bassline, drums, etc. and reconstructing them like a MOD track, which may be somewhat promising and something to look into.
@tsonez 6 months ago
Incredible! Is this song a hard or an easy case for the encoder? Would it work better for other types of songs or genres?
@thealgorithm 6 months ago
The original song has background chords, drums, a (deep) bassline and speech combined, so it is a hard case for the encoder considering the low bitrate used. For more demanding titles, I can increase the layer count and/or parameter updates, though there is a limit on how many layers I can mix on the C64 decoder end (due to the slow processor).
@tsonez 6 months ago
@thealgorithm I see. So an easier case would be something with less layering.
@thealgorithm 6 months ago
@tsonez Less layering is needed for audio that is less complex (e.g. speech), which in turn will use even less space.
@AftercastGames 6 months ago
Unbelievable. I'd love to see more info about the compression and the SID registers involved, if it's documented anywhere.
@thealgorithm 6 months ago
It uses digital mixing without using the SID's oscillators. Of course, the whole thing runs on a stock C64 regardless. A plus point of this is that it is feasible on other machines too. More information on the technique here: kzbin.info/www/bejne/j6PInIeoeLZljNU
@AftercastGames 6 months ago
@thealgorithm Gotcha. So the only register involved is the volume register then?
@thealgorithm 6 months ago
@AftercastGames Correct. It's output via one register only. This also means that it's not machine-specific and can be used on other devices (as it does not rely on SID specifics such as its channels or oscillators).
@SuperWiiBros08 7 months ago
I thought it was N64 but this is amazing too
@Nbrother1607 1 year ago
Music too fast, video even faster
@rokker333 1 year ago
Please, can someone explain the background techniques a bit? From my point of view this is really impossible on original C64 hardware.
@Ic1621 1 year ago
Maria shows no mercy to me! Such a brutal effect! Congrats to the programmer!