But how can it fit so much music? (The FLAC Codec #2 - Lossless Audio Compression)

6,205 views

kleines Filmröllchen

Comments: 28
@ktmf2 (2 years ago)
Thanks for making an educational video on FLAC :) I hope this is not too blunt, but I'd like to point out a few problems in it :(

1) FLAC does not actually use Exp-Golomb coding; it works with Rice coding. The idea is similar, but it is ideal for a different distribution of residuals, and Rice coding is actually simpler. To convert the residual samples to Rice symbols, we take a certain Rice parameter (I'll not discuss how to find the optimal parameter) and split the number according to it. If we need to store the number 9 (1001 in binary) with Rice parameter 3, we split that binary number into a most significant part (1) and a least significant part (001). The Rice parameter determines how many bits go into the least significant part. The 1 is stored unary (so it becomes 01) and the rest is stored directly. A decoder can parse this because the unary part is self-terminating, and the number of bits that follow the unary number is known. The number 7 (111) becomes the symbol 1111 (unary 1, binary 111); the number 34 (100010) becomes 00001010. So, instead of a 3-part code, we only have a 2-part code.

2) At 15:11 you remark that higher-order polynomials take more time to compute and more warm-up samples, but this is often not the reason lower-order fixed predictors are used. The problem is that all derivations you do up to that point assume a stationary signal, but this is often not the case. The high coefficients used in higher-order predictors make them increasingly sensitive to noise. This often results in the noise getting 'amplified', and the residual getting larger instead of smaller when compared with a lower-order predictor.

3) At 15:37 you say that the factorial divisors are missing, and that these are left out because we're not doing exact math. I don't think that is the explanation; I rather think it is a result of using Taylor expansions (which assume continuous data) on sampled data. If you follow the rationale written down at www.ietf.org/archive/id/draft-ietf-cellar-flac-07.html#name-fixed-predictor-subframe there is no missing factor. For example, to derive the third-order fixed predictor, it makes sense to start from the last known sample, the last known first-order discrete derivative and the last known second-order discrete derivative, and add these three together. The last known sample is s[n-1]. The last known first-order discrete derivative is s[n-1] - s[n-2], i.e. the difference between the last two samples. The last known second-order discrete derivative is (s[n-1] - s[n-2]) - (s[n-2] - s[n-3]), i.e. the difference between the last two first-order discrete derivatives. If we put that all together we get 3*s[n-1] - 3*s[n-2] + s[n-3]. So, I don't know where the factorials went, but just trying to explain this without using a Taylor expansion seems to work fine.

I hope this helps you :)
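To make the Rice coding described above concrete, here is a minimal Python sketch, not FLAC's reference implementation (in the actual format, signed residuals are first folded to non-negative values before this step). It reproduces the symbols from the comment for Rice parameter 3.

```python
def rice_encode(value, k):
    """Rice-code a non-negative value with parameter k:
    the quotient (value >> k) in unary (zeros terminated by a 1),
    followed by the k least significant bits verbatim."""
    quotient = value >> k
    remainder = value & ((1 << k) - 1)
    return "0" * quotient + "1" + format(remainder, f"0{k}b")

def rice_decode(bits, k):
    """Inverse: count zeros up to the terminating 1, then read k remainder bits."""
    quotient = bits.index("1")
    remainder = int(bits[quotient + 1 : quotient + 1 + k] or "0", 2)
    return (quotient << k) | remainder

# The examples from the comment: 9 -> 01001, 7 -> 1111, 34 -> 00001010
for value in (9, 7, 34):
    code = rice_encode(value, 3)
    assert rice_decode(code, 3) == value
    print(value, "->", code)
```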
@kleinesfilmroellchen (1 year ago)
Thank you very much, this means a lot to me!

Regarding 1), this doesn't sound new to me, and I think it's just another interpretation. For my own implementation I did follow the "Rice coding" interpretation as the spec also does, since it's of course significantly simpler in software. I wanted to have a discovery process that I was leading the viewer along, and the Exponential Golomb path seemed more natural.

Regarding 2), thanks for pointing that out; I did not properly research why and how orders are chosen (and in fact I still don't know, since I have not (yet) written an encoder).

Regarding 3), this was an oversimplification made to explain via Taylor expansions, which I think quite a few viewers will be familiar with. I have done the derivation myself as per the rationale you linked and found that there's nothing missing, but this is again where I wanted to have a discovery journey as well as a hopefully familiar angle of approach.

I will keep this as the pinned comment so viewers can read it as well :^)
@necroowl3953 (7 months ago)
@kleinesfilmroellchen please put out part 3, you're doing this so well
@artanisax1534 (1 year ago)
Holy, what a fantastic elaboration, and wanting ep. 3 desperately (it has been a year)
@PaulAup62 (1 year ago)
Incredible series. I am NOT a math person, but I found the whole thing incredibly interesting. Can't wait for part 3 if that ever comes out! Very, very good work.
@mylegalname9852 (27 days ago)
Here from the comments on the QOI video. Really glad I read the comments over there! This video is great.
@Kai3Music (2 years ago)
Fantastic. When can we expect part 3?
@xDnator (1 year ago)
Crazy quality video! I was shocked to see you only have 1000 subs.
@grggrgrgg (2 years ago)
Looking forward to part 3 :)
@jayapayax (3 months ago)
Great content! I hope you can upload part 3 soon.
@Live-ws3tl (2 years ago)
Thanks for the overview! I'm looking forward to the final video and also something about how to write a codec for it
@ianbolfa (2 years ago)
Great content. Also I loved the animations and the editing.
@kleinesfilmroellchen (2 years ago)
Source code is now available! Link in the description.
@templumsomnia (1 year ago)
Thanks! You saved me a lot of time!
@phanihishi (1 year ago)
Amazing. Thank you very much.
@uneston (2 years ago)
part 3?
@juanchirino7135 (1 year ago)
Excellent explanation!
@savantshuia (1 year ago)
part 3 when?
@adamkatav9752 (2 years ago)
Great video, very organized
@kulikgabor7624 (7 months ago)
Thank you for all the insights! I don't quite understand the mid/side part of the video. If the left and the right samples are 16-bit integers, the sum will only fit into a 32-bit integer, which is big. If I divide it by 2 it fits, but the result gets rounded: if the left is 0 and the right is 1, the result would be 0, just as if they were both 0. How could I do this properly and take advantage of the side part being very small while keeping the mid part small as well?
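For anyone else wondering about this: as far as I understand the FLAC format, the mid channel stores (left + right) >> 1 and the side channel stores the full 17-bit difference left - right; the bit dropped by the shift can be recovered from the side channel, because left + right and left - right always have the same parity. A minimal Python sketch of that idea, ignoring FLAC's actual bitstream details:

```python
def mid_side_encode(left, right):
    """Mid stays within 16 bits by dropping the sum's least significant bit;
    side keeps the full (up to 17-bit) difference. The dropped bit is
    recoverable because left+right and left-right have the same parity."""
    mid = (left + right) >> 1   # floor((L+R)/2)
    side = left - right
    return mid, side

def mid_side_decode(mid, side):
    total = (mid << 1) | (side & 1)  # restore L+R; the parity comes from side
    left = (total + side) >> 1
    right = (total - side) >> 1
    return left, right

# Round-trip check, including the 0/1 case from the question above.
for l, r in [(0, 1), (-32768, 32767), (1234, -4321), (7, 7)]:
    assert mid_side_decode(*mid_side_encode(l, r)) == (l, r)
```

So the rounding from the division by 2 never loses information: the decoder reads the missing least significant bit off the side channel.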
@user-vn9ld2ce1s (1 year ago)
What great content!
@andrsfch (3 months ago)
Where's part 3?
@AlvaroALorite (14 days ago)
21:23 "all numbers in binary begin in 1" - not entirely true; that's only always the case if they are positive non-zero integers (natural numbers) AND they are written most significant bit first (*big endian*), right? You sort of imply it later, but I feel like it should be clarified :)
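For readers wondering what this is getting at: written most significant bit first, every positive integer does start with a 1, which is exactly why Exp-Golomb-style codes can leave that leading bit implicit once the bit length is known. A tiny Python illustration of that narrower claim:

```python
# Written MSB-first, every positive integer's binary form starts with a 1,
# so that leading bit carries no information once the bit length is known.
for n in [1, 7, 9, 34, 255, 6205]:
    bits = bin(n)[2:]          # e.g. 9 -> "1001"
    assert bits[0] == "1"
    print(f"{n:5d} -> {bits} ({len(bits) - 1} explicit bits besides the implied leading 1)")
```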
@yasunakaikumi (3 months ago)
still waiting for part 3
@kleinesfilmroellchen (1 month ago)
You're not the only one! I promise it will happen eventually, but you know how motivation is.
@priwncess (6 months ago)
FLAC=feminine, wav=masculine :3
@volkerjung4804 (10 months ago)
Hi Filmröllchen, I watched your video with high expectations and I'm delighted and, at the same time, also pretty depressed. The reason: maybe I'm just dumb, because I don't really get it. I've had this topic on my radar for 20 years, but I never TRULY get behind it. In all that time I haven't found a single resource that explains it in a way you simply MUST understand. You are actually probably the closest, but in my eyes many, many details are missing, as well as a placement within the bigger concepts. If you look into LAC, you quickly land at "linear predictive coding", and there it's always something about filters that are computed with some obscure Levinson-Durbin recursion, and so on. What you definitely NEVER find is anyone who presents the matter so exactly that you would have to be hopelessly dumb not to grasp how it all works. I'm not hopelessly dumb, just normally dumb. In short: unfortunately you don't situate the method in any way, and you don't explain how it embeds into mathematics in general and how it is at least roughly compatible with other explanations. Maybe we could exchange some thoughts about that...

When your video started, I thought: hey, finally someone really explains to me how this stuff works. Then came the polynomials snaking through the points, and it all looked quite promising. Then the explanation of the "trivial" case of taking differences, i.e. the order-2 predictor, graphically well worked out. And then you unfortunately make the mistake almost all explainers make: the "trivial" example is treated exhaustively, even though it is TRIVIAL. Then it gets to the real meat, and THAT is no longer broken down in detail? Nothing is SHOWN anymore? Shouldn't one at least be able to WATCH the 3rd- and 4th-order predictors at work? I don't understand in the slightest what those coefficients do for me. Wikipedia, for example, says that in FLAC the audio signal is split into blocks and some filter coefficients are computed for each of them, whereas you say it would be a silly idea to try to predict longer passages...

Long story short: I'm still lost in explanations that more or less only scratch the surface and don't lay the matter out completely, in a way that would make it IMPOSSIBLE not to understand. Maybe we could talk about it sometime and perhaps work together on a more thorough explanation that even the dumbest person would have to get? Your explanation told me more than any before it, but a few very crucial details remain in the dark, presumably because they are practically "trivial" to you, though unfortunately not to someone who isn't deep in the material and who, on top of that, has already been spoiled by other pseudo-explanations. I also didn't get the dropping of the denominators: I compute something, simply throw the denominators away, and that doesn't HURT me? Maybe it really doesn't... but WHY? Whether a value is 6 times larger or not doesn't seem completely irrelevant to me, unless it were so tiny anyway that you might as well skip it. I offer to be the perfect test subject for finding the perfect explanation; maybe you'd like that, especially since part 3 is still outstanding anyway...

THANK YOU in any case for taking me a bit deeper into the subject! Best regards, Volker
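To actually watch the fixed predictors of orders 0 through 4 at work, here is a minimal Python sketch using the fixed-predictor coefficients from the FLAC specification (the order-3 case is the one derived in the pinned comment above). Block splitting, the LPC path with Levinson-Durbin, and Rice coding of the residuals are all left out, and the test signal is an arbitrary choice for illustration.

```python
# Residuals of FLAC's five fixed predictors (orders 0-4), per the coefficients
# in the FLAC specification.
FIXED_COEFFS = {
    0: [],
    1: [1],             # prediction = s[n-1]
    2: [2, -1],         # prediction = 2*s[n-1] - s[n-2]
    3: [3, -3, 1],      # prediction = 3*s[n-1] - 3*s[n-2] + s[n-3]
    4: [4, -6, 4, -1],  # prediction = 4*s[n-1] - 6*s[n-2] + 4*s[n-3] - s[n-4]
}

def fixed_residuals(samples, order):
    """Return the residuals for one block, skipping the `order` warm-up samples."""
    coeffs = FIXED_COEFFS[order]
    residuals = []
    for n in range(order, len(samples)):
        prediction = sum(c * samples[n - 1 - i] for i, c in enumerate(coeffs))
        residuals.append(samples[n] - prediction)
    return residuals

# Watching the predictors at work: on this smooth (parabolic) signal, higher
# orders shrink the residuals all the way to zero. On noisy audio they can just
# as well blow up, which is why an encoder tries the orders and keeps the
# cheapest one.
signal = [20 * n * n for n in range(12)]
for order in range(5):
    print(order, fixed_residuals(signal, order))
```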
@movAX13h (2 years ago)
Thanks for the great content. Much appreciated! Here's a link to part 1 for anyone looking for it: kzbin.info/www/bejne/oIDIqH-dq62SoNU