26: Robinhood Stock Trading Platform | Systems Design Interview Questions With Ex-Google SWE

Views: 10,068

Comments

@jordanhasnolife5163 7 months ago

To try everything Brilliant has to offer, free, for a full 30 days, visit brilliant.org/Jordanhasnolife/ . You'll also get 20% off an annual premium subscription.

@RS7-123 1 month ago

are we liking our own comments now? 😂

@jordanhasnolife5163 1 month ago

@@RS7-123 Had to for the ad haha

@emenikeanigbogu9368 2 months ago

I feel like you can use windowing with Flink to consume data from the exchange, because that will buffer it before it hits the pricing server, so we reduce the load on the pricing server. Edit: It seems that the pricing server isn't doing anything complex, but I feel that it would require more logic to maintain than just throwing in Flink.

@jordanhasnolife5163 2 months ago

It's effectively the same idea, but if we use Flink you have to put the data in Kafka first, and then I imagine we introduce a bit of a delay there. It's also a question of whether we really need to handle every message, or whether we can do some amount of subsampling if things are coming in too fast.

@ShreeharshaV 24 days ago

Thanks Jordan for the video. A few more questions:
1. How would you alter the design if some users perform large-scale trading with hundreds of trades per second, perhaps algorithmic trading? I assume you would bring in a message queue in between so as not to overwhelm the DB. Would it still be a relational Orders DB in that case?
2. I don't fully understand the reasoning behind WebSocket-based pricing updates for the client device. Wouldn't SSE be a better choice here, as you used for the bidding platform design? What is the client device sending frequently enough to need bidirectional communication? In my understanding, the client can tell the service the list of tickers it is interested in, and the LB can assign the user a user server based on consistent hashing on UserID. The user server keeps a mapping of tickerName to a list of userId SSE connections. For every update it receives for a ticker, it pushes to those userId connections.
3. Can we have a Redis pub/sub solution between the user server and the pricing server/publisher instead of a WebSocket-based connection? The reasoning is that the user server already has WebSocket connections with client devices, and we are then burdening it with more WebSocket connections to the pricing servers.
Let me know your thoughts.

@jordanhasnolife5163 16 days ago

1) I don't think this should put a significant burden on our orders database, which isn't sharded by user. If it really was, then sure, a message queue could be alright, though now we have to give order confirmations asynchronously.
2) Fair point, SSE seems fine.
3) I imagine that Redis pub/sub is just using some sort of socket under the hood, and I had wanted to give myself a bit more flexibility in the middle layer to write custom algorithms to determine the price, but the idea seems fine to me!
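
The per-ticker fan-out described in question 2 could be sketched as follows. This is only an illustration, not the design from the video: `TickerSubscriptions`, `connect`, `disconnect`, and `publish` are hypothetical names, and each SSE connection is modeled as a plain callback that receives an already-framed string.

```python
import json
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set


class TickerSubscriptions:
    """Hypothetical in-memory registry on one user server: ticker -> subscribed user IDs."""

    def __init__(self) -> None:
        self._users_by_ticker: Dict[str, Set[str]] = defaultdict(set)
        self._send_by_user: Dict[str, Callable[[str], None]] = {}

    def connect(self, user_id: str, send: Callable[[str], None], tickers: Iterable[str]) -> None:
        # `send` stands in for writing to that user's open SSE response stream.
        self._send_by_user[user_id] = send
        for ticker in tickers:
            self._users_by_ticker[ticker].add(user_id)

    def disconnect(self, user_id: str) -> None:
        self._send_by_user.pop(user_id, None)
        for users in self._users_by_ticker.values():
            users.discard(user_id)

    def publish(self, ticker: str, price: float) -> int:
        """Push one price update to every subscriber; returns how many were notified."""
        # An SSE frame is a "data: <payload>" line followed by a blank line.
        frame = "data: " + json.dumps({"ticker": ticker, "price": price}) + "\n\n"
        users = self._users_by_ticker.get(ticker, set())
        for user_id in users:
            self._send_by_user[user_id](frame)
        return len(users)
```

The server never needs to read anything from the client after the initial subscription, which is the commenter's argument for a one-way channel.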

@skullTT 1 month ago

For hot assets like AAPL and GOOG, would they have dedicated pricing servers? For example, one layer-1 AAPL pricing server connects directly to the publisher, multiple layer-2 AAPL pricing servers connect to the layer-1 AAPL pricing server, and a user server connects to any one of the layer-2 pricing servers.

@jordanhasnolife5163 1 month ago

Yeah I think that's very reasonable if it's necessary

@foxedex447 7 months ago

would be cool if next video was about exchanges

@RS7-123 1 month ago

I had a question about the choice of UDP and how it may cause packets to be received out of order. However, towards the second half of the video you say that we potentially need to aggregate these prices anyway, so it's not like they are expected to be exact. I guess one way of mitigating this could be to attach a timestamp from the exchange to each price (it doesn't matter if there's clock skew) and ignore messages that arrive late.

@jordanhasnolife5163 1 month ago

UDP does not guarantee ordered delivery of packets, unlike TCP. You're correct on the timestamp part!
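
The timestamp idea from this exchange could be sketched like this. It is a minimal illustration under stated assumptions: `LatestByExchangeTimestamp` and `offer` are hypothetical names, and every message is assumed to carry a timestamp stamped by the exchange. Because timestamps are only compared against each other, skew between the exchange clock and ours does not matter.

```python
from typing import Dict, Optional, Tuple


class LatestByExchangeTimestamp:
    """Keeps the newest price per ticker and drops datagrams that arrive late."""

    def __init__(self) -> None:
        # ticker -> (exchange timestamp, price)
        self._latest: Dict[str, Tuple[int, float]] = {}

    def offer(self, ticker: str, exchange_ts: int, price: float) -> bool:
        """Returns True if the update was accepted, False if it was stale."""
        current = self._latest.get(ticker)
        if current is not None and exchange_ts <= current[0]:
            return False  # An older (or duplicate) message overtaken by a newer one.
        self._latest[ticker] = (exchange_ts, price)
        return True

    def price(self, ticker: str) -> Optional[float]:
        current = self._latest.get(ticker)
        return None if current is None else current[1]
```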

@RS7-123 1 month ago

Great video. Question: what happens when you write to the order DB successfully but fail before making the call to the exchange, or the exchange did receive your order but failed while responding? We obviously can't do a two-phase commit.

@jordanhasnolife5163 1 month ago

Presumably, you'd have some retry here, but in the short term your order just doesn't get filled - honestly pretty similar to if you successfully submitted an order and it never matched with a counterparty.
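
One way to make the retry mentioned here safe is sketched below. It rests on an assumption that is not stated in the video: that the exchange gateway deduplicates on a client-supplied order ID, so resending after a lost response cannot create a second order. `FakeExchange` and `submit_with_retry` are hypothetical stand-ins, not a real exchange API.

```python
from typing import Dict, Optional, Tuple


class FakeExchange:
    """Stand-in gateway that deduplicates on a client-supplied order ID (an assumption)."""

    def __init__(self, lost_responses: int) -> None:
        self.accepted: Dict[str, Tuple[str, int]] = {}
        self._lost_responses = lost_responses

    def submit(self, client_order_id: str, ticker: str, qty: int) -> str:
        # The order is recorded at most once, even if the same ID is resubmitted.
        self.accepted.setdefault(client_order_id, (ticker, qty))
        if self._lost_responses > 0:
            # Simulate the acknowledgement being lost on the way back.
            self._lost_responses -= 1
            raise TimeoutError("no response from exchange")
        return f"EX-{client_order_id}"


def submit_with_retry(
    exchange: FakeExchange,
    client_order_id: str,
    ticker: str,
    qty: int,
    max_attempts: int = 3,
) -> Optional[str]:
    """Resend the same order ID until acknowledged or attempts run out."""
    for _ in range(max_attempts):
        try:
            return exchange.submit(client_order_id, ticker, qty)
        except TimeoutError:
            continue  # Safe to resend: the ID makes the request idempotent.
    return None  # Still unconfirmed; the order row would stay pending for a later retry.
```

If every attempt fails, the order simply stays unfilled from the user's point of view, which matches the reply above.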

@Secret4us 14 days ago

Good design and video, thanks.

@e431215 2 months ago

Did I miss the PositionsDB route explanation? OrderDB -> CDC Kafka -> Stream Consumer -> PositionsDB?

@jordanhasnolife5163 1 month ago

In order to get our current positions we basically just need to be listening to the exchange for which of our orders were actually filled
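
Deriving positions from fills, as this reply describes, could be sketched as below. The `Fill` fields and the `apply_fill` helper are hypothetical; the actual message format of an exchange's private execution feed is not given in the thread.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Tuple


@dataclass(frozen=True)
class Fill:
    """One execution report for one of our orders (field names are illustrative)."""
    user_id: str
    ticker: str
    side: str  # "BUY" or "SELL"
    qty: int


# (user_id, ticker) -> net share count
Positions = DefaultDict[Tuple[str, str], int]


def apply_fill(positions: Positions, fill: Fill) -> None:
    """Adjust the user's net position by the filled quantity."""
    if fill.side == "BUY":
        signed_qty = fill.qty
    elif fill.side == "SELL":
        signed_qty = -fill.qty
    else:
        raise ValueError(f"unknown side: {fill.side}")
    positions[(fill.user_id, fill.ticker)] += signed_qty
```

Only fills change a position; an order that was submitted but never matched leaves it untouched.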

@rishabhgoyal8110 23 days ago

Brilliant content!

@sbahety92 6 months ago

Can we use a message queue between the publisher and the pricing server? This would decouple the two services, and the publisher would not be blocked on the response from the pricing server.

@jordanhasnolife5163 6 months ago

You could, but considering the pricing server only needs to cache the last value, I'm not sure how necessary it is.

@htm332 6 months ago

@@jordanhasnolife5163 by what mechanism would the publishing servers communicate with the pricing servers?

@NBetweenStations 6 months ago

Dope video oh great one! I have a question about the exchange order flow. If I buy a stock and the server makes an order with the exchange, is there some sort of callback from the exchange when it’s filled?? How does Robinhood know when an order is filled or not? Thanks very much

@jordanhasnolife5163 6 months ago

Yeah I believe I showed that towards the beginning of the video. There's a private feed that tells you when you had an execution

@ponsivakumarpalraj7456 6 months ago

Isn't it ideal to use a queue (Kafka) between the publisher servers and the pricing servers? Or is that the intention and just not specifically mentioned in the final design diagram? Thank you!

@jordanhasnolife5163 6 months ago

Don't really think it's necessary, given that if the pricing server is receiving too many messages, it can just hold on to the most recent. If we were using an algorithm that required all messages (such as recreating an order book and using a weighted average), as opposed to caching the most recent, I'd then agree with you that we'd want something like Kafka in between.
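
"Just hold on to the most recent" could look roughly like the sketch below. `ConflatingPriceCache`, `on_tick`, and `flush` are hypothetical names chosen for illustration. Ticks that arrive between flushes overwrite each other, so downstream load depends on the number of tickers rather than on the inbound message rate.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Tick:
    ticker: str
    price: float


class ConflatingPriceCache:
    """Keeps only the newest tick per ticker; unsent older ticks are overwritten."""

    def __init__(self) -> None:
        self._latest: Dict[str, Tick] = {}
        self._dirty: Set[str] = set()

    def on_tick(self, tick: Tick) -> None:
        # Overwrite whatever was cached for this ticker and mark it as changed.
        self._latest[tick.ticker] = tick
        self._dirty.add(tick.ticker)

    def flush(self) -> List[Tick]:
        """Returns one tick per ticker that changed since the previous flush."""
        changed = [self._latest[ticker] for ticker in sorted(self._dirty)]
        self._dirty.clear()
        return changed
```

This only works because the latest price alone is enough. An algorithm that needs every message, like the order book case mentioned above, could not drop intermediate ticks this way.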

@SwagataBasu-ki4bc 7 months ago

Great content, thanks! I have been going through your database videos and they have been really helpful. Could you please do a design video on Multi-engine malware analyser?

@jordanhasnolife5163 7 months ago

Can you elaborate on what this is? Haven't heard of it.

@SwagataBasu-ki4bc 6 months ago

@@jordanhasnolife5163 Sure. A real-life example would be VirusTotal. Following is a short problem description.
Design a multi-engine malware scanner system that allows users to upload files to be scanned by multiple antivirus engines and extraction scripts. It stores and returns metadata and results about the uploaded files. A lot of the focus is on storage of data, the data model, and sharding strategies.
Requirements/constraints:
- Uploaded files must be accessible forever, hence a highly available storage system for raw files
- File size can vary between 100KB and 1GB
- Metadata and scanning services could run on a mixture of Linux and Windows
- Addition of new engines or scripts on the fly with minimal or zero downtime
- As real-time as possible
There is a LeetCode system design interview discussion page on a similar problem titled "FB | System Design | Multi-Engine Malware Analyzer". I tried to post direct links to save you the effort, but YT annoyingly keeps removing my comment!

@SwagataBasu-ki4bc 6 months ago

@@jordanhasnolife5163 Sorry but what's the best way to send you the problem description other than commenting? I have tried adding the problem description around 5 times now and every time my comment gets deleted ¯\_(ツ)_/¯

@guitarMartial 4 months ago

Great video Jordan! I was wondering, though: would it make sense to have a Redis-style distributed cache, as opposed to intermediary cache replicas on the read path, to enable construction of portfolios for users? The key thing here seems to be minimizing latency, and with Redis Enterprise one could scale ops/sec quite effectively for building ticker trackers for end users. Second, as an optimization, the WebSocket could selectively send a message to the read path saying that the stock tracker / pricing page is open, so only then send quote information, to reduce the load on the pricing servers. Finally, would Vitess make sense here for horizontal sharding of the MySQL tier? This way the application does not have to implement much around horizontal sharding itself. Awesome video nonetheless, ty! I tried the same problem ahead of time myself and arrived at a roughly similar solution!

@jordanhasnolife5163 4 months ago

Redis and Vitess: yeah, for sure. I think you're mainly describing existing technologies that implement these ideas. 2) seems like an interesting idea, though you may then have some cache load delay when a user first requests pricing.

@KratosProton 7 months ago

Great content, keep it up!!

@Maffeos 7 months ago

Quick question: in the final diagram you drew the websocket connection as not passing through the pricing service load balancer between the client and the user server. Was that intentional and does it imply websocket connections usually don't work with load balancers?

@Maffeos 7 months ago

Follow-up: you mentioned another load balancer between the user server and the pricing servers earlier in the video, but it's not in the final diagram. Is that just a simplification, or does it again imply websocket connections don't work with load balancers?

@jordanhasnolife5163 7 months ago

The load balancer is just to help a client figure out which server it should connect to. Websockets should be 1:1 between the client and server, otherwise we have an extra network hop and everyone has to go through the LB

@Maffeos 6 months ago

@@jordanhasnolife5163 Making sure I understand: WebSocket connections don't go through the LB, but the client still needs to use the LB first to determine which server to establish the WebSocket connection with, correct?

@jordanhasnolife5163 6 months ago

@@Maffeos Yep!
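
The "ask the LB which server, then connect to it directly" flow confirmed here could be sketched as a lookup like the one below. It assumes the lookup uses consistent hashing on the user ID, as suggested elsewhere in this thread; `HashRing`, `server_for`, and the virtual-node count are hypothetical choices for illustration.

```python
import bisect
import hashlib
from typing import List, Sequence, Tuple


def _hash(key: str) -> int:
    # MD5 is used only to spread keys evenly around the ring, not for security.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring mapping a user ID to the server it should connect to."""

    def __init__(self, servers: Sequence[str], vnodes: int = 64) -> None:
        # Each server owns several points on the ring to even out the load.
        self._ring: List[Tuple[int, str]] = sorted(
            (_hash(f"{server}#{i}"), server) for server in servers for i in range(vnodes)
        )
        self._points: List[int] = [point for point, _ in self._ring]

    def server_for(self, user_id: str) -> str:
        # Walk clockwise to the first ring point after the user's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(user_id)) % len(self._ring)
        return self._ring[idx][1]
```

The client would call the lookup once, receive a server address, and open its WebSocket to that address itself, so price updates never take an extra hop through the LB.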

@nikhilm9494 7 months ago

Keep up the good work Jordan!

@yashaswishetty 7 months ago

Thank you for the great content.. Helps us a lot

@RohitKumar29 7 months ago

Really loved the content

@miry_sof 3 months ago

What tool do you use for the notes?

@jordanhasnolife5163 3 months ago

iPad Air, OneNote, Apple Pencil

@silversurferablaze1 3 months ago

Lot of sharding there, need to clear my concept on this first 👍

@chaitanyatanwar8151 17 days ago

thank you!

@sohansingh2022 7 months ago

Thanks 🌹

@KratosProton 7 months ago

Cool resuming now!!

@dkcjenx 4 months ago

LMAO for tech lead part

@divyaundi1055 7 months ago

exactly what we asked for ;)

@rydmerlin 6 months ago

Where is your finance knowledge coming from?

@jordanhasnolife5163 6 months ago

Muh job

@ShreeharshaV 6 months ago

Thanks Jordan for a great video as usual. I have a small doubt about the order cancellation workflow over here kzbin.info/www/bejne/iXLEZ6t8rqaHmdk:
1. The user provides Robinhood's OrderId to the API Gateway through the cancelOrders API.
2. The service then needs to first find the exchangeId for the corresponding orderId. How does it know which OrderIDExchangeIdMappingDB shard to look in for this orderId? The DB is sharded based on ExchangeID and not OrderId.

@jordanhasnolife5163 6 months ago

Ah jeez really nice catch. At least off the top of my head, I can't think of a better option than a two phase commit to two tables, one from orderId to exchangeId, and the other from exchangeId to orderId

@ShreeharshaV 6 months ago

@@jordanhasnolife5163 Thanks for your response. I was wondering, can the OrderIDExchangeIdMappingDB not be sharded based on OrderId instead of ExchangeId? I can't understand why it needs to be sharded by exchangeId only.

@jordanhasnolife5163 6 months ago

@@ShreeharshaV Well we should have one sharded by exchange order ID because that's what we'll get back from the exchange and we need to know how to map it back to one of our orders.

@eastwest8151 4 months ago

This can be done by a secondary index on internal ID if the DB supports it.

@jordanhasnolife5163 4 months ago

@@eastwest8151 If it's partitioned that's gonna be rough
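
The two-table approach proposed earlier in this thread (one mapping sharded by orderId, one by exchangeId) could be sketched as below. This is an in-memory illustration only: `OrderIdMappings` and `shard_for` are hypothetical names, shards are plain dicts, and the two writes in `record` are not atomic here, whereas a real system would need two-phase commit or a similar mechanism across the two shards.

```python
import zlib
from typing import Dict, List, Optional


def shard_for(key: str, num_shards: int) -> int:
    # Deterministic hash so every service instance picks the same shard for a key.
    return zlib.crc32(key.encode("utf-8")) % num_shards


class OrderIdMappings:
    """Two lookup tables, each sharded by the key it is queried with."""

    def __init__(self, num_shards: int) -> None:
        self._num_shards = num_shards
        self._by_order_id: List[Dict[str, str]] = [{} for _ in range(num_shards)]
        self._by_exchange_id: List[Dict[str, str]] = [{} for _ in range(num_shards)]

    def record(self, order_id: str, exchange_id: str) -> None:
        # These two writes usually land on different shards; atomicity is assumed away here.
        self._by_order_id[shard_for(order_id, self._num_shards)][order_id] = exchange_id
        self._by_exchange_id[shard_for(exchange_id, self._num_shards)][exchange_id] = order_id

    def exchange_id_for(self, order_id: str) -> Optional[str]:
        """Cancel path: the user supplies our order ID, so only one shard is read."""
        return self._by_order_id[shard_for(order_id, self._num_shards)].get(order_id)

    def order_id_for(self, exchange_id: str) -> Optional[str]:
        """Fill path: the exchange reports its own ID, so only one shard is read."""
        return self._by_exchange_id[shard_for(exchange_id, self._num_shards)].get(exchange_id)
```

Each lookup touches exactly one shard, which is the property a secondary index on a partitioned table would not give without querying every partition.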