How NOT to Sample Audio! - Computerphile

  110,285 views

Computerphile

Could Dave recreate audio from a wav file preview image grabbed from a screen cap?
More about David Domminney Fowler: / daviddomminneyfowler
/ computerphile
/ computer_phile
This video was filmed and edited by Sean Riley.
Computer Science at the University of Nottingham: bit.ly/nottscomputer
Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharan.com
Original Image: photos.app.goo.gl/c4pPY8xpCF6...
Dave's Code:
using System;
using System.Collections.Generic;
using System.Drawing;
using System.IO;
using System.Linq;

namespace Extract_Audio
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            string file = "computerphile.png";
            Bitmap b = new Bitmap(@"g:\Computerphile\" + file);

            // For every pixel column, find the topmost (min) and bottommost (max)
            // bright pixel; the two extremes become two consecutive values.
            var values = new List<int>();
            for (int x = 0; x < b.Width; x++)
            {
                int max = 0;
                int min = b.Height;
                for (int y = 0; y < b.Height; y++)
                {
                    if (b.GetPixel(x, y).GetBrightness() > 0.5)
                    {
                        min = Math.Min(y, min);
                        max = Math.Max(max, y);
                    }
                }
                values.Add(min);
                values.Add(max);
            }

            // Smooth the values with a 4-point moving average.
            int filter = 4;
            for (int x = 0; x < values.Count - filter; x++)
                values[x] = (int)values.GetRange(x, filter).Average();

            // Dump the values as text for inspection.
            using (var f = new StreamWriter($@"g:\Computerphile\{file}.txt"))
            {
                foreach (var v in values)
                    f.WriteLine(v);
            }

            // Write the values out as an 8-bit mono PCM WAV file.
            var wf = File.OpenWrite($@"g:\Computerphile\{file}.wav");
            var RIFF_HEADER = new byte[] { 0x52, 0x49, 0x46, 0x46 };  // "RIFF"
            var FORMAT_WAVE = new byte[] { 0x57, 0x41, 0x56, 0x45 };  // "WAVE"
            var FORMAT_TAG = new byte[] { 0x66, 0x6D, 0x74, 0x20 };   // "fmt "
            var AUDIO_FORMAT = new byte[] { 0x1, 0x0 };               // PCM
            var SUBCHUNK_ID = new byte[] { 0x64, 0x61, 0x74, 0x61 };  // "data"
            int BYTES_PER_SAMPLE = 1;
            int samplerate = 48000;
            int channelcount = 1;
            int lastv2 = 0;
            int stretch = 9;  // output samples written per extracted value
            int datalength = values.Count * stretch * BYTES_PER_SAMPLE;
            int byteRate = samplerate * channelcount * BYTES_PER_SAMPLE;
            int blockAlign = channelcount * BYTES_PER_SAMPLE;

            wf.Write(RIFF_HEADER, 0, RIFF_HEADER.Length);
            // RIFF chunk size is the file size minus 8: a 44-byte header plus the
            // data gives datalength + 36 (the posted code wrote datalength + 40).
            wf.Write(BitConverter.GetBytes(datalength + 36), 0, 4);
            wf.Write(FORMAT_WAVE, 0, FORMAT_WAVE.Length);
            wf.Write(FORMAT_TAG, 0, FORMAT_TAG.Length);
            wf.Write(BitConverter.GetBytes(16), 0, 4);
            wf.Write(AUDIO_FORMAT, 0, AUDIO_FORMAT.Length);
            wf.Write(BitConverter.GetBytes(channelcount), 0, 2);
            wf.Write(BitConverter.GetBytes(samplerate), 0, 4);
            wf.Write(BitConverter.GetBytes(byteRate), 0, 4);
            wf.Write(BitConverter.GetBytes(blockAlign), 0, 2);
            wf.Write(BitConverter.GetBytes(BYTES_PER_SAMPLE * 8), 0, 2);
            wf.Write(SUBCHUNK_ID, 0, SUBCHUNK_ID.Length);
            wf.Write(BitConverter.GetBytes(datalength), 0, 4);

            // Compute the range once instead of on every loop iteration.
            int lowest = values.Min();
            int highest = values.Max();
            foreach (var v in values)
            {
                // Scale the pixel row to 0..255, then interpolate linearly
                // from the previous value to this one over `stretch` samples.
                double v2 = (v - lowest) / (double)(highest - lowest) * 255;
                for (int x = 0; x < stretch; x++)
                {
                    double v3 = x / (double)stretch * v2 + (1 - x / (double)stretch) * lastv2;
                    wf.WriteByte(Convert.ToByte(v3));
                }
                lastv2 = (int)v2;
            }
            wf.Close();
        }
    }
}

Comments: 490
@BillySugger1965 3 years ago
I’m amazed there was enough information in that image to recreate an intelligible reconstruction!
@Meta11axis 3 years ago
Well, it was a one second audio clip. Discussing the waveforms of songs from artists (like they proceeded to do) which would be some minutes compressed into the horizontal pixel size of the screen is another thing entirely, and quite ridiculous IMO.
@AudryConsol 3 years ago
@@Meta11axis Not if the waveform was zoomed in super far at the time of the picture. You wouldn't get the whole song, but you might get some snippets of it.
@ngkktht774 3 years ago
he needed 9x stretch to get right speed at 48KHz sampling rate, so the data he had were kind of 5.3KHz sampling rate => bandwidth up to 2.6KHz, which is just about enough for intelligible voice...
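The arithmetic in the comment above can be checked with a small sketch. The 48000 and 9 are `samplerate` and `stretch` from Dave's code; the class and method names are made up for illustration.

```csharp
using System;

internal static class EffectiveRate
{
    // Rate of the recovered values before each one is stretched into `stretch` output samples.
    internal static double SourceRate(int outputRate, int stretch) => outputRate / (double)stretch;

    // Highest frequency a given sample rate can represent (the Nyquist limit).
    internal static double Nyquist(double sampleRate) => sampleRate / 2.0;

    private static void Main()
    {
        double rate = SourceRate(48000, 9);
        Console.WriteLine($"source rate: {rate:F0} Hz, bandwidth: {Nyquist(rate):F0} Hz");
    }
}
```

This gives roughly 5333 Hz and 2667 Hz, in line with the rounded figures in the comment.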
@Meta11axis 3 years ago
@@ngkktht774 I love the fact that we deduced the same sampling rate using independent information from this video (see my comment above). Great stuff!
@GJ203 3 years ago
When I'd thought casually about this before I didn't think it was possible.
@joachim4660 3 years ago
Honestly.. I didn't expect such a quality! :D
@EmanuilGlavchev 3 years ago
Me, too. I know there should be less loss on the lower frequencies... so some of the voicing does come through, but still... I expected a garble!
@BandanaDrummer95 3 years ago
Same, though, retrospectively, I should have expected something like this because the most crucial information for intelligibility is low frequency (more so than high frequency which fills in additional information). Honestly, it sounds about where some of the worse hearing aids I've seen get
@CmdrKeene 3 years ago
Me neither. I would have thought it would have lost too much data; I had even started wondering about higher-quality input images, like 4K or far higher. But clearly it did not take much to produce intelligible output.
@warmachineuk 3 years ago
This is like scraping data from the output of some legacy program and just as horrific a process. Mind you, a good result considering what was lost.
@brod515 3 years ago
what do you mean. I'm not sure what scraping data from a legacy program means.
@lysdexic 3 years ago
reminds me of the “Visual microphone” project around 2014 :)
@LiEnby 3 years ago
@@brod515 open the program, take a screenshot of it then move the mouse to the buttons and click them 😂
@bandname 3 years ago
As an audio engineer I pondered this for some time. You've answered my question.
@Yaxqb 3 years ago
Ok guys, let's see who this Computerphile video really is ***rips mask off Audiophile***
@Anvilshock 3 years ago
AND HE'D HAVE GOTTEN AWAY WITH IT IF IT WASN'T FOR YOU MEDDLING KIDS!!
@MrJC1 3 years ago
What are you talking about, how NOT to??? This is EXACTLY how I want to start sampling audio. I am always looking for weird sound sources to feed through walls of effects and filters to see what I get out of it. This sounds like something I could get lost in. Oh god... WHAT HAVE YOU DONE, COMPUTERPHILE?
@Yaxqb 3 years ago
I imagine this amazing app:
Encode some song to an image
Edit in your favorite photo editor
Decode back to audio
Profit
We could have a whole new generation of music production tech that does weird stuff like adding glow effects, displacements and so on. Sounds like an interesting mini research area.
@shmunkyman33 3 years ago
@@Yaxqb Like the audio guy said, this effect already exists basically, you'd use a "bitcrusher" plugin which does this process, just without the intermediate step of taking a picture.
@charlieangkor8649 3 years ago
I have working prototypes of:
1) Store audio in digital by printing on B/W laser printer and scanning
2) Store audio in analog by printing on B/W laser printer and scanning
3) Store audio in analog (real analog, with infinitely smooth range of values and noise!) in digital files
4) Store audio in digital in the dithering of an artwork image
I'm open to funding.
@Hamachingo 3 years ago
This is like those old cinema film reels where there's literally two visible waveforms running next to the pictures and it's being reproduced as audio. It works surprisingly well.
@cavalrycome 3 years ago
I was going to leave a similar comment. Extracting audio from a visual representation of the waveform on the soundtrack of a film was standard practice in cinema projection so they're kind of re-inventing the wheel in this video.
@themroc8231 3 years ago
@@cavalrycome Optical sound was used in movie theaters until the turn of the century. If you remember the "Dolby" logo fancy theaters used to show before the movie, it was to tell the audience they used electronic processing on top of optical sound to minimize the noise resulting from the grain of the film in the quiet parts, when it was more noticeable. Although optical sound was not a sinusoidal representation, if you look at the sound track (as in the sound-carrying track) of a film you'll see that it is a full shape where. I think the width of the shape still translates to loudness, but i am not sure how pitch and tone were expressed.
@JavierAlbinarrate 3 years ago
the big difference between this and audio on film is bandwidth, for a given second you had many feet of information. Thus you have the intermediate values that are missing in this experiment.
@fllthdcrb 3 years ago
@@themroc8231 To my knowledge, there were two types of optical sound track: variable width and variable density. In the former, I believe the _envelope_ would be the waveform being reproduced.
@themroc8231 3 years ago
@@fllthdcrb I remember seeing an illustration of a variable density film track in a book, but I seem to remember it was never widely adopted in the movie industry, or maybe only for a short period of time, though maybe it had more use in other applications.
@jordanlin4437 3 years ago
I'm actually a bit curious how different the audio reconstruction would sound if instead of what looked like a linear interpolation he instead used the discrete Fourier transform or something. It probably wouldn't have the 8-bit sound but will still sound muffled, but that would be interesting to hear.
@jalvrus 3 years ago
I think the main thing you'd get by using something like sinusoidal interpolation to reconstruct it would be to reduce the noise/static. It wouldn't help reconstruct the high frequencies that have been compressed out during the graphing process.
@Pystro 3 years ago
You can get the low-frequency component from the average of the minimum and the maximum, while the difference of the minimum and maximum gives you the volume of the high frequency sounds - but sadly no information about their pitch. To reconstruct that you'll probably want to do something like this: take white noise (or some other suitable noise), make a hard cutoff for frequencies that are within the sampling rate of your image, amplitude modulate that noise with the spread. The absolute easiest way to include high frequency sounds would be to just pick random values between the minimum and the maximum instead of the linear interpolation.
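The "absolute easiest way" from the comment above could look something like the sketch below: keep each column's minimum and maximum, and fill the time it covers with random values between them. The `Column` type, the method names and the example numbers are hypothetical; this is not what Dave's code does.

```csharp
using System;
using System.Collections.Generic;

internal static class EnvelopeNoise
{
    // One image column: topmost and bottommost bright pixel (hypothetical type for this sketch).
    internal readonly struct Column
    {
        public readonly int Min, Max;
        public Column(int min, int max) { Min = min; Max = max; }
        public double Centre => (Min + Max) / 2.0;   // low-frequency component
        public int Spread => Max - Min;              // loudness of the unresolved high frequencies
    }

    // Emit `samplesPerColumn` values per column, each drawn uniformly between Min and Max.
    internal static List<double> Fill(IReadOnlyList<Column> columns, int samplesPerColumn, Random rng)
    {
        var output = new List<double>(columns.Count * samplesPerColumn);
        foreach (var c in columns)
            for (int i = 0; i < samplesPerColumn; i++)
                output.Add(c.Min + rng.NextDouble() * c.Spread);
        return output;
    }

    private static void Main()
    {
        var columns = new[] { new Column(40, 60), new Column(10, 90), new Column(45, 55) };
        foreach (var c in columns)
            Console.WriteLine($"centre {c.Centre}, spread {c.Spread}");
        var samples = Fill(columns, 9, new Random(1));
        Console.WriteLine($"{samples.Count} samples");
    }
}
```

The fuller version described in the comment would band-limit the noise and amplitude-modulate it with the spread; the random fill only approximates that.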
@Mr099660 3 years ago
Quadratic interpolation should be enough, don't think something else would give better results
@nhandao8836 3 years ago
@@jalvrus yeh, on top of that he applied the moving average filter it acted like a low-pass filter and removed more high frequency information.
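How strongly a moving average attenuates high frequencies can be estimated from its standard magnitude response, |sin(N·π·f/fs) / (N·sin(π·f/fs))|. The sketch below assumes the extracted values are treated as samples at 48000/9 Hz and uses N = 4, matching `filter` in Dave's code; the class and method names are illustrative.

```csharp
using System;

internal static class MovingAverageResponse
{
    // Magnitude response of an N-point moving average at frequency f (Hz) for sample rate fs (Hz).
    internal static double Gain(double f, double fs, int n)
    {
        double w = Math.PI * f / fs;
        if (Math.Abs(Math.Sin(w)) < 1e-12) return 1.0;   // DC (and exact multiples of fs)
        return Math.Abs(Math.Sin(n * w) / (n * Math.Sin(w)));
    }

    private static void Main()
    {
        double fs = 48000.0 / 9;                          // assumed rate of the recovered values
        foreach (double f in new[] { 0.0, 300.0, 700.0, 1333.0, 2000.0 })
            Console.WriteLine($"{f,6:F0} Hz -> gain {Gain(f, fs, 4):F2}");
    }
}
```

With N = 4 the response has a null at fs/4, so under these assumptions content around 1.3 kHz is almost entirely removed.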
@3dlabs99 3 years ago
Next step: Train a deep learning network using raw sound files and images of the wave to do it.
@MichaelGrundler 3 years ago
I was about to comment the same: throw machine learning at it.
@Slettador 3 years ago
I doubt this would be very successful because this is in no way a new problem. Sampling audio at a lower frequency (which happens here because of the limited number of pixels in the screenshot image width) and limited bits per sample (due to the limited height of the screenshot image) and trying to improve the fidelity is a signal processing problem and generating more data from what limited data you have without context would be quite difficult for a ML algorithm. You could probably substantially improve it for specific purposes, like if you know you recorded someone's voice you could train the network by using recordings of that same person and that should yield decent results
@3dlabs99 3 years ago
@@Slettador Yeah it would surely work best for speech and if its the same person for sure. It will probably totally break for music for example. Speech is fairly forgiving.
@ijknm2531 3 years ago
and put that DNN on a physical (neuromorphic) chip
@ChrisBrennanSF 3 years ago
steganographic Rube Goldberg MIDI sequencer
@jordankokocinski506 3 years ago
This is hilarious. And cool! When I was younger, I remember discovering a feature in Audacity where you could import an image file to audio. I took a screenshot of a sample of audio and imported it, but it was rubbish. This program you've written, however rough, is surely more nuanced. The results were quite surprising to me! I did not expect it to resemble the original sound as much as it did.
@adamsbja 3 years ago
Back in the late 90s I heard chatter about an espionage technique where they could point a laser at a window and get the vibrations of people talking inside. My dad's workplace had to have special panes put in to dampen that effect (I grew up in an interesting town). This is a great demonstration of that idea, via a different process.
@fakename8749 3 years ago
You talk about bit depth at length, but sampling frequency also matters. The screenshot you've got is probably less than 1000px wide, which means 1000 measurements to work with, which isn't nearly enough according to Nyquist-Shannon.
@Computerphile 3 years ago
Actually it's closer to 4000 as I grabbed off twin HD monitors :)
@theIpatix 3 years ago
Nyquist Shannon shouldn't be a problem as long as the signal is properly low pass filtered before it's displayed on the screen (which I'd agree that it most likely isn't, I mean, who cares about the visual fidelity of sound). I wonder how well it would work with properly rendered smooth lines on screen (and an algorithm that can recover from those).
@ObsidianJunkie 3 years ago
To be able to perfectly recreate the signal you need a sampling frequency that is 2x the highest frequency component in the original signal, but you can still get a half decent approximation with a really low sampling frequency, as evident in the result.
@kc9scott 3 years ago
I was also really surprised that it was intelligible. My early assumption was that the screen resolution would be so low that you could really only use it to volume-modulate noise. But with 4000 measurements for the short time length of the sample, it is enough to reproduce the fundamental frequency of his voice. There will be lots of aliasing of the higher frequencies, but for recognizability of speech, the aliasing won’t matter (which actually isn’t surprising at all).
@widicamdotnet 3 years ago
Yeah, the reproduction is intelligible speech at a sampling rate somewhere between 1-4 kHz (a ~1s snippet at around 4000 samples), but still much worse than "telephone" quality which routinely gets lowpassed to about 4 kHz and thus would be perfectly reproduced at 8 kHz. It's not surprising that it worked as well as it did, but still a fun experiment and a nice video :-)
@AI7KTD 3 years ago
I think tweaking the interpolation (using sin(x)/x for example) would drastically increase the quality of the recovered clip.
@neilloughran4437 3 years ago
I guess the extra data you insert in between samples is "interpolation"... I recall Roland did some quite advanced interpolation on their 30 kHz/12-bit samplers to actually modify intermediate point-to-point values in an intelligent way, i.e. a sine wave would retain its inherent shape and wouldn't be a bunch of straight lines.
@domminney 3 years ago
I was looking for the word dither but it escaped me!
@neilloughran4437 3 years ago
@@domminney I wonder what the sound quality would be like if the sampled graphic could be interpolated with smooth sine like wave between the points... probably too much math for my brain to know :D. Hopefully cleverer people than me commenting :D
@neilloughran4437 3 years ago
From Roland W30 manual (circa 1989): "... there was a need for a reliable way of "filling in" the spaces between points sampled. Roland has succeeded in developing a way of carrying out such high-speed calculations, and provide intelligent interpolation for the imaginary points lying between sample points. The sampler looks well beyond the points in question for information, and makes its calculations using the leading-edge technique known as differential interpolation. As a result, noise is much less likely to even appear, assuring high quality sound."
@domminney 3 years ago
@@neilloughran4437 for the sake of speed I made it linear, I've not actually watched the final edit yet as I'm on a zoom call but we did chat about filling in the data with better curves.
@neilloughran4437 3 years ago
@@domminney cool!
@SkyOctopus1 3 years ago
As I understand it, the film used in cinema projectors has the audio recorded both as a visual analogue wave file and digitally. Teeny tiny wave images squished down the side of every frame.
@danieljensen2626 3 years ago
Sort of, but they have a much easier time recovering the signal because you can just shine a light through the film onto a photodiode, and get the audio signal out as a voltage. And of course the strip of film is pretty long so you have much better frequency resolution than with this squashed audio clip in the picture.
@AaronOfMpls 3 years ago
@@danieljensen2626 Yup, and I imagine the light shines through a narrow slit so the photodiode is only "seeing" a tiny fraction of a second at a time. With 35 mm film running at 24 fps, 1 second of audio will be spread out across 18 inches / 46 cm of film. As for digital sound-on-film, that works much like a QR code, which gets scanned inside the projector. SDDS was printed as a long strip on the edge of the film, and Dolby Digital was printed between the sprocket holes. Meanwhile, DTS audio was stored on a CD, and a time code (which kept it in synch) was recorded on the film as a dashed line next to the analog soundtrack. Wikipedia has a picture of all of this on the "35 mm movie film" article ("File:35mm film audio macro.jpg").
@thekaxmax 3 years ago
Note on quality: human voice is a /lot/ easier to understand, and you don't need tone quality, which you do for music
@brod515 3 years ago
I think the audio quality could be somewhat increased if the interpolation for stretching used cubic instead of linear. Since the audio waves are actual sine waves.
@danieljensen2626 3 years ago
Or you could just upsample with an FFT.
@eDoc2020 3 years ago
Or don't interpolate on the time axis. Instead of stretching each sample 9 times to reach 48000Hz sampling just output a WAV file with 5333Hz sampling. Then all the interpolation needed for playback is done by the audio software. Audio files with crazy weird sample rates still play back fine on modern systems.
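The suggestion above, skipping the stretch and declaring the real sample rate in the header, might be sketched as follows. This is an illustration rather than a change Dave made: the `Build` helper, the output file name and the 220 Hz test tone are all made up.

```csharp
using System;
using System.IO;
using System.Text;

internal static class NativeRateWav
{
    // Build an 8-bit mono PCM WAV in memory at whatever sample rate the data really has.
    internal static byte[] Build(byte[] samples, int sampleRate)
    {
        using (var ms = new MemoryStream())
        using (var w = new BinaryWriter(ms))
        {
            w.Write(Encoding.ASCII.GetBytes("RIFF"));
            w.Write(36 + samples.Length);          // bytes that follow this field
            w.Write(Encoding.ASCII.GetBytes("WAVE"));
            w.Write(Encoding.ASCII.GetBytes("fmt "));
            w.Write(16);                           // size of the PCM format chunk
            w.Write((short)1);                     // audio format: PCM
            w.Write((short)1);                     // channels: mono
            w.Write(sampleRate);                   // e.g. 5333 instead of 48000
            w.Write(sampleRate);                   // byte rate = rate * 1 channel * 1 byte
            w.Write((short)1);                     // block align
            w.Write((short)8);                     // bits per sample
            w.Write(Encoding.ASCII.GetBytes("data"));
            w.Write(samples.Length);
            w.Write(samples);
            w.Flush();
            return ms.ToArray();
        }
    }

    private static void Main()
    {
        var samples = new byte[5333];              // one second of placeholder data
        for (int i = 0; i < samples.Length; i++)
            samples[i] = (byte)(128 + 100 * Math.Sin(2 * Math.PI * 220 * i / 5333.0));
        File.WriteAllBytes("native-rate.wav", Build(samples, 5333));
    }
}
```

Whether a given player accepts an unusual rate such as 5333 Hz depends on the player, as the comment notes.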
@brod515 3 years ago
@@danieljensen2626 how does that work?
@SentientTent 3 years ago
@@brod515 I think he is referring to the fast Fourier transform (FFT), which is an efficient way of decomposing a signal into sine waves of varying frequencies.
@brod515 3 years ago
@@SentientTent I've heard of fourier transforms I'm just not sure how it works to upsample. It was my understanding that a fourier transform can take a signal and extract the individual sinusoidal frequencies that make up the signal. so if you apply it to the signal in question would we extract the individual frequencies then combine them as sine waves (thus upsampling).
@RelianceIndustriesLtd 3 years ago
So this is how phone companies transmit audio in phone calls
@SentientSeven 3 years ago
This was great! Also, Dave knows how to set up his camera, great quality video
@busTedOaS 3 years ago
Oh my, C# code in the description... I'm falling in love all over again.
@MrOliver1312 3 years ago
I'm honestly not too keen on the syntax, most of it's okay, but the amount of brackets in any nesting just gets messy
@domminney 3 years ago
C# is not what I do day to day 😉
@notimportant7682 3 years ago
I would have looked at that waveform and thought there was no way to recover anything but amplitude-modulated noise with that technique. Thinking harder about it, I understand why it worked.
@woulg 3 years ago
Same here, when he played it back and it worked it completely blew my mind. What a great episode
@notimportant7682 3 years ago
@@MusicAtAlbionCollege Forget what he said about filling it in later. I think if he did any post-processing at all, it may have been amplifying what used to be the high-frequency sections, but the frequencies he gets, I believe, come directly from the averaging of the min and max values over the x axis, essentially acting like a low-pass filter.
@shmunkyman33 3 years ago
@@MusicAtAlbionCollege Well, all a waveform is is amplitude data. It's just a sampling of the amplitude of the air pressure over time, so the only difference in this setup is that the number of samples has been reduced. The frequency data is just an emergent property of the amplitude changing over time, so as long as he is able to match the time scale over which the amplitudes change (which he does with that "stretch" variable), the frequencies will come out roughly the same (just filtered a lot due to the loss or corruption of information).
@vladpuha 3 years ago
very educational. thank you! Please have an extended interview about audio headers and how to work with sound with some sample.
@EighteenCharacters 3 years ago
I love this episode! This is amazing!
@joseortiz_io 3 years ago
Unbelievably awesome! So creative. I love it!
@kieran.stafford 3 years ago
Love this video. I watched fascinated. Bloody awesome result guys. It'd be very interesting to try this with an ultra high definition image file to see how far you can push the boundaries of quality. Again brilliant video. Many thanks
@rojasbdm 3 years ago
Went much better than I expected!
@patricknelson 3 years ago
Such an amazing result... I seriously thought it’d just be weird loud buzzing sounds. I’m surprised you could tell what was being said. Well done!
@brettbreet 3 years ago
"The last V8. Return to base immediately!"
@Computerphile 3 years ago
Oh my goodness, that was it! -Sean :) (there was a bug that meant you could slip sideways through the map and cheat)
@mortenohlsen7834 3 years ago
I thought it would be Space Taxi or Impossible Mission he was remembering. Though the code using lastv2 made me think of The Last V8 though not remembering voice, just the soundtrack.
@ronnetgrazer362 3 years ago
@@mortenohlsen7834 We have a visitor. Stay a while... staaay foreverrrr.
@deoxal7947 3 years ago
Hm what's this now?
@kellerkind6169 3 years ago
I thought it was: GHOSTBUSTERS! MUAHAHAHAHAHAHAHA! P.S: Or "Stay a while, stay forever !" maybe ;-)
@robertbass682 3 years ago
I will share this with my Computer Science students when we do our audio editing unit. I have had them try to generate samples from a Fourier analysis graph of an instrument playing a single note, but never from a scrunched up wave form. This may help drive home what sampling really is all about, at least numerically. Should be fun!
@SSJfraz 3 years ago
That's outstanding. Great work.
@LegitJDG534 3 years ago
Impressive results, I didn't expect to be able to roughly make out the different syllables. Makes me wonder if you could convert the waveforms generated by musical instruments, approximate the value being played, and generate a MIDI file of the audio segment.
@Kinglink 3 years ago
As I started. "This won't work." As I finished. "Damn! Let's go see the code." I always assumed the visual representation was just an approximation of the wave, or something else. But wow it actually worked! One of the best videos, and that's saying something.
@evgenysavelev837 3 years ago
This has been done before. It is called Shannon-Nyquist theorem, the way to restore the sound to the best quality possible is to use sinc approximations. Sinc is a short for sin(x)/x. There will be problems with aliasing, which is something you will never be able to correct for.
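A direct (and slow) form of the sinc reconstruction mentioned above might look like the sketch below. It uses the normalized sinc, sin(πx)/(πx), which is the form the Whittaker-Shannon interpolation formula uses, sums over a finite block of samples, and makes no attempt to undo aliasing. The names and the test data are illustrative.

```csharp
using System;

internal static class SincUpsampler
{
    internal static double Sinc(double x) =>
        Math.Abs(x) < 1e-12 ? 1.0 : Math.Sin(Math.PI * x) / (Math.PI * x);

    // Whittaker-Shannon reconstruction: output[n] = sum over k of input[k] * sinc(n / factor - k).
    internal static double[] Upsample(double[] input, int factor)
    {
        var output = new double[input.Length * factor];
        for (int n = 0; n < output.Length; n++)
        {
            double t = n / (double)factor;     // position measured in input samples
            double sum = 0;
            for (int k = 0; k < input.Length; k++)
                sum += input[k] * Sinc(t - k);
            output[n] = sum;
        }
        return output;
    }

    private static void Main()
    {
        double[] coarse = { 0, 1, 0, -1, 0, 1, 0, -1 };
        double[] fine = Upsample(coarse, 9);
        Console.WriteLine($"{coarse.Length} samples in, {fine.Length} samples out");
    }
}
```

The double loop is quadratic in the input length; a windowed sinc or an FFT-based resampler would be the usual choice for anything longer than a short clip.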
@MichaelAddlesee 3 years ago
Yes, just what I was going to say. But given the squashed up waveform finding the actual 48kHz sample points is the real problem.
@evgenysavelev837 3 years ago
@@MichaelAddlesee Yep, I would also dare to say it is impossible to restore a high-frequency signal after it has been downsampled this way (or any other way).
@mattstegner 3 years ago
I work on Audition (I'm a Quality Engineer) at Adobe and just shared this with my team who will all get a kick out of it. Great video.
@ClearComplexity 3 years ago
Would be interesting to get a high-resolution macro/microscopic shot of a record and emulate a needle tracing the groove to generate a wave file. You would need a high-quality image with great lighting and a fair bit of processing to get a clean groove guide though.
@BytebroUK 3 years ago
I'd watch that. This whole idea that started out as a bit of a joke has proved really interesting!
@supahfly_uk 3 years ago
YouTube has stopped recommending Computerphile vids; they are literally the highlight of my life lol.
@schifoso 3 years ago
This was very interesting. I hope you do more videos with Mr. Fowler as he's an excellent presenter.
@emuccino 3 years ago
Dave: "At the end of the day, everything becomes a list of numbers.." Me: *existential crisis* 😳
@EdwardMillen 3 years ago
He should have added "in computers". Well, and in maths I guess. And I suppose maths is... oh... nevermind
@matiascardullo9892 3 years ago
I mean, you can decompose each quark in your body into an array of coordinates xyz
@domminney 3 years ago
I’m assuming that “in computers” was implied by the context, but one could argue it in the real world too
@DrorF 3 years ago
@@EdwardMillen Math is not just about numbers. In fact, numbers are just a small part of it, to my understanding.
@MichaelNatrin 3 years ago
So cool. Thanks for sharing your process!
@shadowwalker23901 3 years ago
I have a feeling I was the only one thinking: switch it over to the frequency domain, aka a spectrogram, so you're not wasting so much data in a picture. Using a 512x512 picture you could store a 6-second, 44,100 Hz, 8-bit sound clip in mono with grayscale, and stereo with color.
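The capacity figure works out if each pixel channel carries one 8-bit value, whether that is a raw sample or a spectrogram coefficient (an assumption about what the comment intends). A quick check, with illustrative names:

```csharp
using System;

internal static class ImageCapacity
{
    // Seconds of audio that fit when each used pixel channel stores one 8-bit value.
    internal static double Seconds(int width, int height, int channelsUsed, int audioChannels, int sampleRate) =>
        (double)width * height * channelsUsed / ((double)audioChannels * sampleRate);

    private static void Main()
    {
        Console.WriteLine($"greyscale, mono: {Seconds(512, 512, 1, 1, 44100):F2} s");
        Console.WriteLine($"two colour channels, stereo: {Seconds(512, 512, 2, 2, 44100):F2} s");
    }
}
```

Both cases come to about 5.94 seconds, which matches the "6 second" estimate.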
@charlieangkor8649 3 years ago
or simply dump raw .mp3 data into a matrix of 512x512x3 bytes (768 kB) and encode it as PNG. No need for spectrograms.
@tramsgar 3 years ago
Nice new practice to paste code in the description! 👍
@cokpot635 3 years ago
This is really interesting. Good job!
@cazino4 3 years ago
Super interesting!!! Great video!
@grimreboot 3 years ago
Excellent project, thank you for the video!
@Md2802 3 years ago
The rights to a piece of recorded music are generally split between (1) the publishing, i.e. who wrote the song, and (2) the recording, i.e. who paid to have it recorded. Both have clearly defined ownership, and the photographer (or magazine / stock photo site / whatever) would not have the right to sell licenses for either.
@JeffBlaine 3 years ago
"Stay a while. STAY FOREVER!"
@AdriaanZwemer 3 years ago
HAHAHAHAHAAAaa
@nohjrd 3 years ago
Haha, I was just going to comment that (but starting from "Another visitor..."). There was also "Get him my robots"
@ryan8488 3 years ago
Yes!
@ryan8488 3 years ago
Ghostbusters ahahahaha
@JeffBlaine 3 years ago
@@ryan8488 Did that also have crude speech synthesis? My quote was from Impossible Mission
@Turbo3032 3 years ago
Isn't this basically a simpler version of what people have done to get audio from the vibrations in glass in a video?
@fisch37 3 years ago
It is similar, but the latter is probably a lot more complicated and honestly I have never heard of that happening. You would need a pretty high resolution for that
@victorbarroscoch 3 years ago
@@fisch37 Resolution shouldn't really matter that much. You might need a high speed camera though, the normal frame rate for video is 24-60 fps. With that you can only reproduce sounds that are 30Hz or lower (without running into aliasing issues).
@cadekachelmeier7251 3 years ago
I think the process is really analogous. In this case the sound is compressed by the image resolution. In the vibration video it's compressed by the video framerate. There were more tricks that they pulled out with the audio from video thing though like using the rolling shutter to get higher temporal resolution than you'd originally expect.
@1SmokedTurkey1 3 years ago
@@fisch37 Check out veritasium. It's been done. He has a video about it.
@realcygnus 3 years ago
Cool
@pokepress 3 years ago
A ways back I remember Weird Al posting the picture of a waveform of a song he was working on. It was way longer than this, so at best you might have been able to get the lowest of frequencies out of it this way.
@thydevdom 3 years ago
This was REALLY interesting!
@my4trackmachine 3 years ago
HAHAHA I love this. I was surprised there was enough resolution to get it that clean. I can see this conversion being a rad VST for sound processing.
@gammaray0wn 3 years ago
Really cool video! What you didn't touch on is how this shows how amazing the human brain is at interpreting human speech. Not only are our ears most sensitive to frequencies that match those of human speech, our brain can also extrapolate words and meaning from even heavily distorted and low information pitch, volume, and intonation content!
@horurkristinsson5292 3 years ago
I remember an Amiga program called Octamed (tracker from '90) had ability to draw a waveform with the mouse onto a grid and use it in your song.
@Roxor128 3 years ago
Fast Tracker II on MS-DOS can do that, too. Maybe Triton took inspiration from the earlier Octamed?
@elimalinsky7069 3 years ago
Back in the 1980s some local radio stations were broadcasting computer programs over the air for people to tape and load up on their Spectrums and C64s. Mostly in the UK, where audio tapes as computer data storage were in use the longest.
@veggiet2009 3 years ago
Sean thought 8bit voice, I thought about the very first recording cylinders
@davidyu1813 3 years ago
The reproduced audio reminded me of the English listening tests I took in high school
@fllthdcrb 3 years ago
Sounds like it was torture.
@geiger21 3 years ago
As a Pole I can relate to that xD Questionable voice quality + the shittiest boombox + super reverby classroom. Boom, English lesson in a Polish school xD
@jeromethiel4323 3 years ago
I remember a game from the 80's called sea dragon. And the start of the game had audio that said "sea dragon" 3 times, each speeded up. And that was over an Apple 2 speaker. So basically 1 bit audio, as the speaker was clicked one way, then the other. But at the time, amazing!
@erifetim 3 years ago
The outcome is much better than I've expected, would've loved to hear more examples
@Computerphile 3 years ago
I bet Dave would do you one if you contact him :)
@grover- 3 years ago
08:50 - that is truly amazing! I thought it would just be noise. Maybe next time you could consider the hue of the pixel and deduce an intermediate value from it to add more resolution? It's a new form of data exfiltration too. Sending voice recordings in images.
@AuthenticTerrificRickCastle 3 years ago
Was listening to the conversation you had at the very end: you can make a teleprompter-like DIY contraption that will help you look straight into the face of the person you are talking to
@KrisCalabioMusic 3 years ago
I wanna see LegalEagle do an episode about that!
@tipx2master788 3 years ago
He is an American lawyer
@maighstir3003 3 years ago
Or LawfulMasses
@dougsteel7414 3 years ago
Can't remember what it was called, but years ago on my Mac I had a shareware app that did this. You could do stuff like use Photoshop to make reverb, or weird stuff like a slight rotation
@recklessroges 3 years ago
Dude! I understand the defensiveness, but it's totally not needed for me; it's a joy to see some actual practical (not just an example) programming back on Computerphile. [where I will poke fun] "Would have been better if it was written in Rust."
@taylorh140 3 years ago
To me, it sounds like it might be partially due to the triangle waves (linear interpolation between two points). Pretty common in the old video games.
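A minimal sketch of the linear interpolation mentioned above, assuming the image scan yields a list of alternating min/max points and each gap is filled with a fixed number of output samples. The function name `stretch_linear` and the sample values are hypothetical, chosen for illustration; this is not Dave's actual interpolation code, which is not shown in full in the description.

```python
def stretch_linear(values, stretch):
    """Fill each gap between neighbouring points with `stretch` samples
    on a straight line, which produces triangle-like ramps."""
    if not values:
        return []
    out = []
    for a, b in zip(values, values[1:]):
        for i in range(stretch):
            # i / stretch runs from 0 up to (but not including) 1
            out.append(a + (b - a) * i / stretch)
    out.append(values[-1])  # keep the final point itself
    return out


# Alternating low/high points, as a column-by-column min/max scan might give
points = [0, 8, 0, 8]
samples = stretch_linear(points, 4)
print(samples)
```

Straight ramps between alternating extremes are what give the result its triangle-wave character; a smoother interpolation would change the tone.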
@AndersJackson 3 years ago
Around 13:00 you are talking about the Nyquist frequency, and there is a lot of filtering of the original sound when you convert this.
@lagduck2209 3 years ago
That sounds different than just a bitcrusher/lower sample rate. Linear interpolation brings some nice distortion too. Would be pretty neat to have this as a VST with width/height control, a linear/cubic/parabolic option for interpolation, and a parameter for the discretization of pixel brightness. (It probably wouldn't work well in real time, but I can imagine such a procedural tool being totally fine.)
@EvolWe 3 years ago
Impressive! This reminds me of "The Visual Microphone: Passive Recovery of Sound from Video" where they extract sounds from objects like plants using high-speed cameras and micromovements.
@lucidmoses 3 years ago
Well, that one was just amazing.
@MubashirullahD 3 years ago
Impressive. I didn't know you could do that. I assumed there would be too much overlap.
@bluerizlagirl 3 years ago
The height of the waveform will give you the bit depth. For instance, if the difference between the lowest and highest points was 400 pixels, then the resolution is somewhere between 8 and 9 bits (which would allow for 256 and 512 steps respectively). And the amount of time represented by one pixel in the horizontal direction is the inverse of the sampling rate. For instance, if one pixel represents 100µs, then the sampling rate is 1 000 000 µs in a second / 100 = 10 000 samples/second = 10kHz. Even easier, the original sample rate can be got by dividing the final sample rate by the time stretch factor.
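The arithmetic in this comment can be checked with a short sketch. The 400-pixel height and 100 µs per pixel are the commenter's example figures; the 48000 Hz output rate and stretch factor of 9 are taken from the code in the video description. The function names are hypothetical, and the last calculation assumes one recovered value per output step, which is a simplification because the scan stores both a min and a max per column.

```python
import math


def bits_for_height(pixel_range):
    """Effective bit depth if the waveform spans `pixel_range` distinct rows."""
    return math.log2(pixel_range)


def rate_from_pixel_time(microseconds_per_pixel):
    """Samples per second if one pixel column covers the given time."""
    return 1_000_000 / microseconds_per_pixel


def rate_before_stretch(final_rate_hz, stretch):
    """Rate of recovered values before they were stretched to the output rate."""
    return final_rate_hz / stretch


print(round(bits_for_height(400), 2))          # between 8 and 9 bits
print(rate_from_pixel_time(100))               # 10 kHz
print(round(rate_before_stretch(48000, 9), 2))
```

So a 400-pixel swing lands between 8 and 9 bits, and 48000 / 9 gives roughly 5.3 thousand recovered values per second under the stated assumption.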
@marksterling8286 3 years ago
Great video, very surprised about the quality of the output. Really interesting
@carlociarrocchi2793 3 years ago
I did something like that a few years ago. Instead of audio, I was trying to recover as much information as possible from the picture of a graph showing a single continuous line.
@rillloudmother 3 years ago
I could really use this. I deleted an audio file years ago but I still have the peak file for it, and I was always hoping that it would somehow be possible to reconstruct the audio from the peak file.
@Ping727 3 years ago
I just realized listening to this that computerphile is like computer file...
@Slarti 3 years ago
Visual Studio C#, I will therefore forgive you for any less than perfect code :) The best IDE and the most straightforward programming language and framework.
@BaronVonTacocat 3 years ago
Sweet! Reducing the number of bits would anonymize one's voice... or you could stick to the Arnold Schwarzenegger soundboard, I suppose.
@lgmuk 3 years ago
I learned a lot, thanks!!!
@christoffermedc 3 years ago
Wow, I can't understand how sound is representable that well in just a 2D picture, amazing!
@DrakiniteOfficial 3 years ago
I'm very impressed.
@colinstu 3 years ago
1:00 vs 8:42 ... amazing work. I've actually wondered if this was possible years ago, stunning to actually see it done! But yeah, I was imagining this done with like a 3min song, squished to about that same size... that would lose way more depth I'm sure.
@danieljensen2626 3 years ago
Yeah, this clip is only like a second long. With a 3 minute song it would sound at least 180 times worse, haha.
@Lemon_Inspector 2 years ago
Dave there in the thumbnail is giving me a look like if I sample audio this way, I'm gonna end up in 6 pieces at the bottom of the nearest lake.
@ZT1ST 3 years ago
Is there a recommended place to look up the documentation on how to derive the hexadecimal number arrays for the RIFF_HEADER to SUBCHUNK_ID? Mainly asking because while I know those appear to be doing some backend stuff to get the wav file to work, I'm curious as to what other options are available for those, and/or if there's a matter of necessity to those values for .wav specifically.
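On the question above: the byte arrays in the description are the ASCII chunk identifiers that the WAV (RIFF) container requires, plus a little-endian format code. A small sketch that decodes them, reusing the constant names from the C# code; the decoding itself is added here for illustration and is not part of Dave's program.

```python
import struct

# Same byte values as the arrays in the video description
RIFF_HEADER = bytes([0x52, 0x49, 0x46, 0x46])
FORMAT_WAVE = bytes([0x57, 0x41, 0x56, 0x45])
FORMAT_TAG = bytes([0x66, 0x6D, 0x74, 0x20])
AUDIO_FORMAT = bytes([0x01, 0x00])
SUBCHUNK_ID = bytes([0x64, 0x61, 0x74, 0x61])

# The four-byte arrays are plain ASCII text
for name, raw in [
    ("RIFF_HEADER", RIFF_HEADER),
    ("FORMAT_WAVE", FORMAT_WAVE),
    ("FORMAT_TAG", FORMAT_TAG),
    ("SUBCHUNK_ID", SUBCHUNK_ID),
]:
    print(name, repr(raw.decode("ascii")))

# The two-byte array is a little-endian 16-bit integer; 1 means uncompressed PCM
audio_format = struct.unpack("<H", AUDIO_FORMAT)[0]
print("AUDIO_FORMAT", audio_format)
```

The four-character IDs are fixed by the container, so a file with different values there would not be read as a .wav. The format code is the part with other options, for example 3 for IEEE floating-point samples.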
@nielsdegroot9138 3 years ago
The sound reminded me of Impossible Mission on the C64. @8:45. Good memories.
@cpt_nordbart 3 years ago
Stay forever!
@flochartingham2333 3 years ago
Did he call Audacity "old dusty"?
@karatsurba4791 3 years ago
Yes
@domminney 3 years ago
🤣🤣🤣 my saaarf east London accent getting in the way there
@iau 3 years ago
It is. Its usability is almost none.
@Anvilshock 3 years ago
Well, it is. Looks worse than Netscape Navigator 1, it does.
@ZipplyZane 3 years ago
@iau That's like saying GIMP is unusable. It's just not the highest end product. But, for many cases, you don't need the high end. You just want to record audio, apply some effects, and be done. Though I do hope they'll eventually make crossfading easier. It should be a lot more automatic, and not only in the special live recording mode.
@kalakxfif9473 3 years ago
very interesting! learned something new today
@rene0 3 years ago
I... did not expect that. I was expecting "can maybe barely make it out", not "can even identify the person talking"... Amazing.
@kasuha 3 years ago
The audio was just lacking high frequencies; the low number of bits on amplitude is not such a big deal. I wonder whether, instead of interpolating through the range for each column, replacing the interval with appropriately scaled white noise would help.
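One way the white-noise idea could be sketched, assuming each image column has already been reduced to a (min, max) pair and that a fixed number of output samples is generated per column. The function `noise_fill`, the uniform noise distribution and the example ranges are all assumptions made for illustration; whether this sounds better than interpolation is untested here.

```python
import random


def noise_fill(columns, samples_per_column, seed=0):
    """Replace each column's (lo, hi) interval with uniform random samples
    drawn from that interval, instead of interpolating across it."""
    rng = random.Random(seed)  # seeded so the result is repeatable
    out = []
    for lo, hi in columns:
        for _ in range(samples_per_column):
            out.append(rng.uniform(lo, hi))
    return out


# Hypothetical (min, max) ranges for three columns
columns = [(100, 140), (90, 150), (110, 130)]
noisy = noise_fill(columns, samples_per_column=9)
print(len(noisy))
```

Because the noise is scaled to each column's range, the loudness envelope of the original is kept while broadband high-frequency content is added.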
@lysdexic 3 years ago
Fun project! Sounds like the reconstructed output is crushed to 4-8 bit. I wonder if the Adobe sampling rate for the wave plot is 1:8 or something (plotting 1 out of every 8 samples); maybe it’s being ring modded by your new sampling rate too. How many samples (divided by 9) is your output file?
@webdev10000hours 3 years ago
Could you guys do a video on the technicals/computer mechanics behind the creation of complex games such as Apex Legends, Call Of Duty, Fortnite etc?
@F1ghteR41 3 years ago
I think that Leonard French would be the guy to ask this, he has an engineering background and he's a copyright attorney.
@IlluminatiBG 3 years ago
Well, a wave is just a graph that can be represented as a 2D image with 65536px height (assuming 16 bit) and 44100px width (assuming 1 second at 44100Hz sampling) and 1 bit color depth. That would still be way more info than the WAV file: the wave is a 1D array that does not encode any pixels for the empty part of the graph, so it will still be smaller than a PNG-compressed image.
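The size comparison in this comment works out as follows, in a rough sketch that counts only raw, uncompressed data and ignores file headers and PNG compression. The function names are hypothetical.

```python
def bitmap_bytes(bit_depth, sample_rate, seconds):
    """Raw size of a 1-bit-per-pixel image with one row per amplitude level
    and one column per sample."""
    height = 2 ** bit_depth
    width = sample_rate * seconds
    return height * width // 8  # 8 one-bit pixels per byte


def pcm_bytes(bit_depth, sample_rate, seconds):
    """Raw size of the same audio stored as a 1D array of samples."""
    return sample_rate * seconds * bit_depth // 8


image = bitmap_bytes(16, 44100, 1)
audio = pcm_bytes(16, 44100, 1)
print(image, audio, image // audio)
```

For 16-bit audio the uncompressed image is 4096 times larger than the sample array, since each column spends 65536 bits to mark one position that 16 bits can encode directly.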
@n-o-i-d 3 years ago
very cool experiment
@Valeriano.A.R 3 years ago
Could the audio be sampled as an image using different color intensities for different subsampling frequencies? Might this yield better audio quality in the reconstruction?
@3DCGdesign 3 years ago
Not surprising as I’ve already seen where someone turned their sound wave “I love you” or something like that into a piece of wall sculpture and then you can scan it with an app to hear what the wave contains.
@Nejvyn 3 years ago
Does that work with any depiction of a soundwave tho or only with those sculptures? If it's the latter I'd assume that the app is just checking the pattern via a library and link it to the stored audio file on the manufacturer's servers or sth like that.
@3DCGdesign 3 years ago
@Nejvyn I was under the impression that the app could read any soundwave - but I did not investigate further. You could be right.
@jameswyatt1304 3 years ago
Interested in how large the binary or compressed-text versions would be...
@danieljensen2626 3 years ago
He has something like 4,000 samples. Stored as 8-bit integers (unsigned) the file would be 4kB (8 bits is 1 byte), plus maybe a few bytes for a header. Actually a bit less because 1kB is 1024 bytes, but I can't be bothered. You could even compress that, but I'm not sure why you'd bother for this short clip. My guess is most audio file formats do pretty much store the data like that, but with a higher sample rate and more bits per sample (CDs are 16-bit). And I know most use compression other than raw wav files.
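The estimate above can be checked by writing such a file in memory with Python's standard `wave` module. This is a sketch under the commenter's assumption of roughly 4,000 samples; the helper name `wav_size_bytes` is hypothetical, and the silent payload (value 128, the midpoint of unsigned 8-bit audio) stands in for real data.

```python
import io
import wave


def wav_size_bytes(n_samples, sample_rate=48000):
    """Size in bytes of a mono, 8-bit PCM WAV holding `n_samples` samples."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)            # mono
        w.setsampwidth(1)            # 1 byte per sample = 8-bit
        w.setframerate(sample_rate)
        w.writeframes(bytes([128]) * n_samples)
    return len(buf.getvalue())


size = wav_size_bytes(4000)
print(size, "bytes, about", round(size / 1024, 2), "KiB")
```

With this layout the header adds 44 bytes on top of one byte per sample, so the "few bytes for a header" guess is close.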
@davidg5898 3 years ago
I'd say the aspects of music copyrights to be concerned about here -- at least in the USA -- fall between broadcast and performance rights. Basically, you wouldn't be violating any rights merely by converting the image data into audio and listening to it privately, but if you shared the audio with others then you'd be treading into copyright infringement territory. Extracting audio from a photograph doesn't qualify the audio as a derivative piece from the photograph because you're changing the media type and thus the set of rules governing the rights. For example, sheet music vs. radio broadcast vs. live performance vs. using a song in a movie/show/video/commercial each have different set(s) of licensing/rights involved (with some overlapping). That said, if you wholly owned the rights to such a picture, you might be able to skirt the law by publishing the picture and also publishing your method by which audio can be extracted from it, so long as you're not actually doing the conversion for anyone or sharing anything you've personally extracted with anyone. It's no guarantee, though, because copyright suits can involve a lot of interpretation and a clever lawyer could still sway a judge/jury against you. I am not a lawyer, but have done an immense amount of research into these parts of music copyright law due to public programs an employer put on that I was involved with carrying out.
@jan_h 3 years ago
Another visitor. Stay a while. Stay forever!
@lambdaprog 3 years ago
Next enhancement: Measure the instantaneous frequency and use it to generate a new waveform based on the instantaneous amplitude. This will effectively resuscitate the lost phase information using a few assumptions on the human voice. Have fun!
@Lolwutdesu9000 3 years ago
Instantaneous frequency 😂
@SandeepChatterjee66 3 years ago
very interesting idea
@phoenixdk 3 years ago
I'm amazed at how well this worked. It would be fun to push it to complete failure, like 10 seconds of music, several instruments, in a 4000 px wide image.. would it even be tonal at that point? On the other hand, bit depth could be improved by simply enlarging the y-axis on the screengrab, which might improve articulation. And as mentioned, the code could be tweaked and improved further.
@mbarrio 3 years ago
Would love to know the resolution of that png, assuming a 1080p monitor that would be a 1kHz audio, right?
@GodwynDi 3 years ago
I think he replied to another comment saying the picture was from a dual monitor setup.