Probably my most favorite design thus far. Was thinking about Shazam yesterday
@almogkurtser45572 жыл бұрын
Great stuff! I think it's good you avoided the whole DFT thing that just diverts the discussion into very specialized algorithmic land. As for partitioning, while it's clearly a viable solution, I think this specific use case has two properties that might lend themself well to relying solely on single leader replication instead: 1. Immutable data 2. Writes (insertions of new songs) do not need to be reflected immediately (= we should be okay with gigantic replication lag caused by a batch job) 3. Even if all reads were in the order of 10ms, song identification might still take as much as 10 seconds. Slow DB reads that hit the disk will be negligible I think. So I guess both strategies are viable (we'll need some sort of replication on top of partitioning in any case to have high availability), so maybe the discussion can be diverted to pros and cons in terms of costs, which can lead to another nest of vipers, namely capacity estimation for how many nodes we'll need with each approach and their cost. :(
@jordanhasnolife51632 жыл бұрын
You make a good point - I'd imagine the matching phase is a bottleneck and as a result those index reads don't have to be as fast! Though based on another paper I had read it seemed it was best to do everything in memory - in which case I'd think partitioning was necessary.
@almogkurtser45572 жыл бұрын
@@jordanhasnolife5163 Could you point me to the paper that mention the significance of keeping everything in memory?
@jordanhasnolife51632 жыл бұрын
@@almogkurtser4557 Just faster than on disk. By keeping things in memory you can use a hash table index as opposed to a B-Tree or LSM-tree + SSTables
@almogkurtser45572 жыл бұрын
@@jordanhasnolife5163 That's true, B-Tree is how I imagined it working in a replication only setup. I Probably just misunderstood your prev comment thinking you were alluding to additional challenges that would be inherent to storing on disk. The main disadvantage is potentially using more nodes to serve the same amount of request due to higher latency per request with the advantage being simpler routing since we only need to know which nodes exist rather than using consistent hashing. Availability would also be better since every key is contained in every node. I guess it comes down to complexity vs cost. I have no idea if say 5x B-Tree node would be cheaper or more expensive than R*X servers with more memory (R being the replication factor for each partition). We can make the partition solution cheaper ( = more complex) by saying that we don't need RX servers but rather just X, each being a partition follower but keeping that partition as a cold storage and only warming it into memory when the partition leader fails.
@jh07202 жыл бұрын
Another banger 🤷♂️
@jordanhasnolife51632 жыл бұрын
Thanks for all the kind words Jonathan, I appreciate it :)
@theinfamous4231 Жыл бұрын
thank you so much for the Explanation man!!
@jordanhasnolife5163 Жыл бұрын
Np! It's a cool Algo!
@cd-ux9ot2 жыл бұрын
What about an ML approach? Train an autoencoder model where the input data is all the songs plus random/background noise and the output is the actual song id. Then use the model weights as embeddings and store them in a vector database. During search, create embedding from the query audio, lookup candidates in the vector database using similarity search. Then another model to score the candidates an select the candidate with the highest score.
@jordanhasnolife51632 жыл бұрын
This could perhaps work, the biggest problem imo is that the clip of the song can be from any starting point. I think something like an RNN would work best here since they are good for sequences, and you could make each fingerprint an element of the sequence that you're trying to classify.
@cd-ux9ot2 жыл бұрын
@@jordanhasnolife5163 An RNN is a great choice. Transformers have recently shown great performance with sequence data. Convolution layers can even work with audio data because a spectrogram is image-like.
@jordanhasnolife51632 жыл бұрын
@@cd-ux9ot Cool point! I will say I don't know too much about machine learning but I'd be interested to see if anyone has taken an inference based approach for this problem - it may be a bit overkill I guess since shazam was already working pretty well without it :)
@jayantprakash64252 жыл бұрын
good explanation
@user-se9zv8hq9r2 жыл бұрын
based and jewfro-pilled
@buildleadinnovate242 жыл бұрын
How singing song app created???
@AyushPanditofficial2 жыл бұрын
I m from india i love your videos i watched all.your videos we need more videos i m learning many thing from your video implementing in my web app
@Jashan771142 жыл бұрын
What is sharding the indexes?
@jordanhasnolife51632 жыл бұрын
We use consistent hashing on the 64 bit fingerprint which is comprised of the start note, end note, and time delta. If you mean what is responsible for upholding the consistent hashing, then we could use a coordination service like zookeeper to do that for us.
@UjjwalBhardwaj9610 ай бұрын
@@jordanhasnolife5163 You have mentioned in the video that we can shard the data such that all the generated hash (fingerprint) with the same anchor point goes in the same node. Can we achieve this using consitent hashing?