Thank you very much, buddy. I was lost for 2 days; neither Gemini nor Perplexity could solve this.
@jameslinton6424 1 year ago
Excellent video! I have one question regarding the 'main' function. Why did you mention 'Read open the MIDI file and extract the pitch class distribution' when the script is actually analyzing audio? Does Librosa somehow convert the audio to MIDI before generating a chromagram? Thank you, Daniel, for sharing this valuable code with us.
@dszelogowski 1 year ago
Thanks for the feedback! The chromagram is itself the pitch-class distribution over time: librosa uses the short-time Fourier transform (STFT) to analyze the frequency content of each frame, then maps each frequency bin to one of the 12 pitch classes. There is no need to convert to MIDI, which is great for performance! The chromagram is therefore a time-by-pitch-class picture of the audio; summing it over axis=1 (the time axis) gives the total energy of each pitch class across the piece, a proxy for how often each note occurs, which the algorithm uses to approximate the key.
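A minimal sketch of the axis=1 sum described above, using a toy array in place of a real chromagram. With real audio the array would typically come from `librosa.feature.chroma_stft(y=y, sr=sr)`, which returns shape (12, n_frames); the function name `pitch_class_distribution`, the toy values, and the normalization step are assumptions made for illustration, not taken from the video's script.

```python
import numpy as np

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class_distribution(chroma: np.ndarray) -> np.ndarray:
    """Sum chroma energy over time (axis=1) and normalize so it sums to 1."""
    totals = chroma.sum(axis=1)  # one total per pitch class
    return totals / totals.sum()  # normalization is an assumption of this sketch


# Toy chromagram: 12 pitch classes x 4 frames (hypothetical values, not real audio).
chroma = np.zeros((12, 4))
chroma[0, :] = 1.0   # C sounds in every frame
chroma[4, :2] = 0.5  # E sounds in the first two frames
chroma[7, 2:] = 0.5  # G sounds in the last two frames

distribution = pitch_class_distribution(chroma)
strongest = PITCH_CLASSES[int(np.argmax(distribution))]
```

The result is a 12-element vector of relative energies rather than literal note counts, which is all the key-profile correlation needs.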
@NilsBarbier 1 year ago
Why do you use a circulant matrix? Can you elaborate on this part, please? Have a great day!
@dszelogowski 1 year ago
This implementation uses a circulant matrix to compute the correlation between the pitch class distribution of a piece of music and the key profile in every key at once. A circulant matrix is a square matrix in which each row is the previous row circularly shifted by one position, so the whole matrix is determined by its first row. Both the pitch class distribution and a key profile are 12-element vectors, and transposing a profile to another key is just a circular shift of that vector. The circulant matrix built from a profile therefore contains the profile for all 12 keys, one per row, and a single matrix-vector product with the pitch class distribution gives the correlation score for every key in one step, instead of looping over the keys and summing the products each time. Circulant matrices are also diagonalized by the DFT, so the same product can be computed as a circular cross-correlation: take the DFT of both vectors, multiply one by the complex conjugate of the other, and take the inverse DFT. With only 12 elements either route is fast; the main gain is that all keys are scored in one vectorized operation.
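The idea above can be sketched with NumPy alone. This is an illustration under stated assumptions, not the video's code: it uses the Krumhansl-Kessler major-key weights as the profile (the video may use different ones), and the helper names `circulant_rows`, `key_scores_matrix`, and `key_scores_fft` are hypothetical. Both functions return the Pearson correlation of the distribution with the profile rotated to each of the 12 keys.

```python
import numpy as np

# Krumhansl-Kessler major-key profile (assumed here; the video may use other weights).
MAJOR_PROFILE = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                          2.52, 5.19, 2.39, 3.66, 2.29, 2.88])


def circulant_rows(profile: np.ndarray) -> np.ndarray:
    """Row k is the profile rotated right by k, i.e. transposed up k semitones."""
    return np.stack([np.roll(profile, k) for k in range(len(profile))])


def key_scores_matrix(distribution: np.ndarray, profile: np.ndarray) -> np.ndarray:
    """Correlation with all 12 rotations via one matrix-vector product."""
    x = distribution - distribution.mean()
    p = profile - profile.mean()
    scores = circulant_rows(p) @ x
    return scores / (np.linalg.norm(x) * np.linalg.norm(p))


def key_scores_fft(distribution: np.ndarray, profile: np.ndarray) -> np.ndarray:
    """Same scores via the DFT (circular cross-correlation)."""
    x = distribution - distribution.mean()
    p = profile - profile.mean()
    raw = np.fft.ifft(np.fft.fft(x) * np.conj(np.fft.fft(p))).real
    return raw / (np.linalg.norm(x) * np.linalg.norm(p))


# Usage: a distribution shaped like the profile shifted up 7 semitones (G major).
distribution = np.roll(MAJOR_PROFILE, 7)
scores = key_scores_matrix(distribution, MAJOR_PROFILE)
best_key = int(np.argmax(scores))  # 7, i.e. G major
```

Because rotating a vector changes neither its mean nor its norm, the normalization can be done once outside the matrix product, which is what keeps the whole computation to a single vectorized step.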
@NilsBarbier 1 year ago
@@dszelogowski Wow, thank you so much for your answer! But what does DFT mean? (I'm French, so sorry, haha)
@dszelogowski 1 year ago
@@NilsBarbier no worries! It's the Discrete Fourier Transform.
@voinaalex6424 1 year ago
Amazing video! Did you test the accuracy of the algorithm on more complex musical pieces (e.g., house, pop, or rap songs)?
@obineg5752 8 months ago
haha, he used "more complex" in the same sentence with "house, pop, rap" ;)
@PaulNahay 10 months ago
A little solfege lesson in the correct nomenclature for the "black notes" in any key: "do#" is really "di", "re#" is really "ri", "fa#" is really "fi", "so#" is really "si", "la#" is really "li", "re flat" is really "ra", "mi flat" is really "me", "so flat" is really "se", "la flat" is really "le", and "ti flat" is really "te".