55TB A Day: Data Engineering Expert Reacts to Nielsen AWS Data Architecture

Views: 51,364

Comments: 24

@andreaskayy 3 years ago

As always, a lot of it is file processing. It's a bit boring, don't you think? But the way they manage the delivery processes is quite cool.

@mohanmohan-dr3en 1 year ago

Pnkpnnpoppno ok na yanna padhu ki hi z

@shaliniharoon6307 1 year ago

😊

@ayatadilrazashaikh8760 1 year ago

@mohanmohan-dr3en you

@antopolskiy 3 years ago

I just stumbled upon this channel. What a great commentary and breakdown. It offers a lot of perspective. Thank you! As a data scientist who wants to be able to manage the data engineering part as well, to some extent, I find it really engaging and fun.

@andreaskayy 3 years ago

Thanks Sergey. Welcome to the channel. Going to do a lot more of these videos :) It's also a lot of fun creating these videos

@swapnikrocks 3 years ago

Hi Andreas, great commentary. One thought: maybe the metadata is ingested via AWS Glue on the S3 objects and maintained in Postgres via Lambda?

@andreaskayy 3 years ago

This could work. As I understand it, the metadata is written through the SQS queue on the left. I think the providers of these files fill it as they drop something into S3. That's what I would do. Then the RDS has the information and EMR knows what to process.
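The flow described in this reply can be sketched roughly as follows. This is only an illustration under stated assumptions, not Nielsen's implementation: an in-memory `deque` stands in for the SQS queue, an in-memory SQLite database stands in for RDS, and the table name, column names, and function names are all hypothetical.

```python
import json
import sqlite3
from collections import deque


def create_store():
    """Create an in-memory SQLite table standing in for the RDS metadata store."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE file_metadata ("
        "s3_key TEXT PRIMARY KEY, size_bytes INTEGER, status TEXT)"
    )
    return conn


def drain_metadata_queue(queue, conn):
    """Move every metadata message from the queue into the store.

    Messages are JSON strings (hypothetical shape). Duplicate keys are
    ignored, so a message delivered twice does not create a second row.
    """
    while queue:
        msg = json.loads(queue.popleft())
        conn.execute(
            "INSERT OR IGNORE INTO file_metadata VALUES (?, ?, 'pending')",
            (msg["s3_key"], msg["size_bytes"]),
        )
    conn.commit()


def pending_files(conn):
    """Return the keys a processing job (EMR in the video) would pick up next."""
    rows = conn.execute(
        "SELECT s3_key FROM file_metadata "
        "WHERE status = 'pending' ORDER BY s3_key"
    )
    return [row[0] for row in rows]


def mark_processed(conn, s3_key):
    """Record that a file has been handled so it is not picked up again."""
    conn.execute(
        "UPDATE file_metadata SET status = 'processed' WHERE s3_key = ?",
        (s3_key,),
    )
    conn.commit()
```

The point of the sketch is the division of labor: producers only enqueue a small metadata message next to each dropped file, and the processing side only ever asks the store what is still pending.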

@puneetgupta87 1 year ago

Also, there must be a mechanism he did not describe, maybe handled in SQS, that limits the number of retries on a file within EMR. You cannot keep reprocessing it forever.
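One common way to cap retries is a receive-count limit with a dead-letter queue, which SQS supports through a redrive policy. Whether the architecture in the video uses it is not stated. The sketch below only simulates the idea in plain Python; the function name, the `max_receives` parameter, and the list used as a dead-letter queue are all hypothetical.

```python
from collections import deque


def process_with_retry_limit(files, process, max_receives=3):
    """Process each file, retrying failures up to max_receives attempts.

    Files that still fail after max_receives attempts are moved to a
    dead-letter list instead of being re-queued forever.
    """
    queue = deque((name, 0) for name in files)
    done, dead_letter = [], []
    while queue:
        name, receives = queue.popleft()
        receives += 1  # count this delivery attempt
        try:
            process(name)
            done.append(name)
        except Exception:
            if receives >= max_receives:
                dead_letter.append(name)  # give up on this file
            else:
                queue.append((name, receives))  # retry later
    return done, dead_letter
```

A file that fails once and then succeeds ends up in `done`, while a file that fails on every attempt lands in `dead_letter` after exactly `max_receives` tries, where it can be inspected by hand.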

@salilmarathponmadom7255 2 years ago

A great, informative use case. Thanks a lot. I have a question: can't the metadata be in the same files that they process initially?

@johnsonfoo 3 years ago

Love your commentary! One question: you mentioned NoSQL could be a better choice. Is it because RDS is slower due to transaction overhead?

@andreaskayy 3 years ago

I was thinking more about the flexibility of having a schemaless DB. It could also scale better when there are a lot of writes and reads.

@danielolaru2496 3 years ago

This was such a nice and informative video. Please do more, maybe one on Azure? Thanks Andreas!

@andreaskayy 3 years ago

Do you know a source for Azure use cases? I would love to do this, but I don't know a good source.

@puneetgupta87 1 year ago

What about EMR? I have never worked with it. Does it scale vertically or horizontally, and is the scaling automatic?

@fozantalat4509 1 year ago

@andreaskayy please make more videos like this where you discuss the data architecture of different big tech companies. It would also be really great if you could make a video on how we can build a smaller version of a project using this architecture.

@andreaskayy 1 year ago

Thanks! I have many different hands-on projects in my academy.

@opherdubrovsky4175 3 years ago

Hi Andreas. Loved your video and the commentary. Since making the This is My Architecture video we've made more improvements, one of which is making our Spark cluster work in a serverless-style way. Quite cool, actually. I would love to get on a call to explain that and other improvements, as well as the constraints we had and the reasoning behind some of our architecture decisions. If you're up for it, we can even do a follow-on video to explain some of the concepts. Let me know what you think. Opher Dubrovsky

@andreaskayy 3 years ago

That would be super interesting Opher! I'll send you a connection request on LinkedIn. Let's chat there

@opherdubrovsky4175 3 years ago

@andreaskayy sounds good

@desavera 3 years ago

This is interesting! SQS as a buffering mechanism for reingesting state-control data is nice, but yeah, RDS might not be the best choice here; Dynamo might be a better one. It is ultimately a metadata store.

@andreaskayy 3 years ago

Yeah, storing the metadata as "complex" documents might be a good addition here.
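To illustrate what metadata as "complex" documents could look like, here is a minimal sketch in which a plain dictionary stands in for a document store such as DynamoDB. The field names, the key layout, and both helper functions are hypothetical and are not taken from the video.

```python
import json


def put_document(store, doc):
    """Store a metadata document under its S3 key.

    The JSON round trip makes a deep copy and rejects values that a
    document store could not serialize.
    """
    store[doc["s3_key"]] = json.loads(json.dumps(doc))


def keys_with_field(store, field):
    """Return the keys of all documents that carry the given top-level field."""
    return sorted(key for key, doc in store.items() if field in doc)


# Two providers deliver files with different metadata shapes. Neither
# document needs a schema migration to be stored next to the other.
store = {}
put_document(store, {
    "s3_key": "provider-a/2021-01-01.csv",
    "rows": 1200,
    "schema": {"columns": ["device_id", "channel", "ts"]},
})
put_document(store, {
    "s3_key": "provider-b/2021-01-01.json",
    "compression": "gzip",
    "retries": [{"attempt": 1, "error": "timeout"}],
})
```

With a relational table, the nested `schema` and `retries` fields would need extra tables or a JSON column. In a document store each record simply carries whatever its provider supplies.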