As always, a lot of it is file processing. It's a bit boring, don't you think? But the way they manage the delivery processes is quite cool.
@antopolskiy 3 years ago
I just stumbled upon this channel. What a great commentary and breakdown. It offers a lot of perspective. Thank you! As a data scientist who wants to be able to manage the data engineering part as well, to some extent, I find it really engaging and fun.
@andreaskayy 3 years ago
Thanks Sergey. Welcome to the channel. Going to do a lot more of these videos :) It's also a lot of fun creating these videos
@swapnikrocks 3 years ago
Hi Andreas, great commentary. One thought: maybe the metadata is ingested via AWS Glue on the S3 objects and maintained in Postgres via Lambda?
@andreaskayy 3 years ago
This could work. As I understand it, the metadata is written through the left SQS. I think the providers of these files fill this in as they drop something into S3. That's what I would do. Then the RDS has the information and EMR knows what to process.
@puneetgupta87 1 year ago
Also, there must be a method he did not describe, maybe handled in SQS, for limiting the number of retries on a file within EMR. You cannot keep on reprocessing it.
@salilmarathponmadom7255 2 years ago
A great, informative use case. Thanks a lot. I have a question: can't the metadata be in the same files that they process initially?
@johnsonfoo 3 years ago
Love your commentary! One question I have: you mentioned NoSQL could be a better choice. Is it because RDS is slower due to transaction overhead?
@andreaskayy 3 years ago
I was thinking more about the flexibility of having a schemaless DB. It could also be better for scaling reasons when there are a lot of writes and reads to it.
@danielolaru2496 3 years ago
This was such a nice and informative video. Please do more, maybe one on Azure? Thanks Andreas!
@andreaskayy 3 years ago
Do you know a source for Azure use cases? Would love to do this, but I don't know a good source.
@puneetgupta87 1 year ago
What about EMR? I've never worked with it. Does it scale vertically or horizontally, and is it automatic?
@fozantalat4509 1 year ago
@andreaskayy please make more videos like this where you discuss the data architecture of different big tech companies. It would be really great if you could make a video on how we can build a smaller version of a project using this architecture.
@andreaskayy 1 year ago
Thanks! I have many different hands-on projects in my academy.
@opherdubrovsky4175 3 years ago
Hi Andreas. Loved your video and the commentary. Since making the "This is My Architecture" video, we've made more improvements; one of them is making our Spark cluster work in a serverless-type way. Quite cool, actually. Would love to get on a call to explain that and other improvements, as well as the constraints we had and the reasoning behind some of our architecture decisions. If you're up for it, we can even do a follow-on video to explain some of the concepts. Let me know what you think. Opher Dubrovsky
@andreaskayy 3 years ago
That would be super interesting Opher! I'll send you a connection request on LinkedIn. Let's chat there
@opherdubrovsky4175 3 years ago
@@andreaskayy sounds good
@desavera 3 years ago
This is interesting! Using SQS as a buffering mechanism for reingestion of state-control data is nice, but yeah, RDS might not be the best choice here; DynamoDB might fit better. It is ultimately a metadata store.
@andreaskayy 3 years ago
Yeah, storing the metadata as "complex" documents might be a good addition here.