55TB A Day: Data Engineering Expert Reacts to Nielsen AWS Data Architecture

  51,364 views

Andreas Kretz

A day ago

Comments: 24
@andreaskayy · 3 years ago
As always, a lot of it is file processing. It's a bit boring, don't you think? But the way they manage the delivery processes is quite cool.
@antopolskiy · 3 years ago
I just stumbled upon this channel. What a great commentary and breakdown; it offers a lot of perspective. Thank you! As a data scientist who wants to be able to handle the data engineering part as well, to some extent, I find it really engaging and fun.
@andreaskayy · 3 years ago
Thanks, Sergey. Welcome to the channel. I'm going to do a lot more of these videos :) It's also a lot of fun creating them.
@swapnikrocks · 3 years ago
Hi Andreas, great commentary. One thought: maybe the metadata is ingested via AWS Glue on the S3 objects and maintained in Postgres via Lambda?
@andreaskayy · 3 years ago
This could work. As I understand it, the metadata is written through the SQS queue on the left. I think the providers of these files fill it as they drop something into S3; that's what I would do. Then RDS has the information and EMR knows what to process.
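To make the flow in that reply concrete, here is a minimal local sketch, assuming the reading given there: providers announce each dropped file on a metadata queue, a consumer writes the metadata into a relational table, and the batch job asks that table what is still new. Everything is a stand-in and hypothetical: `queue.Queue` plays SQS, an in-memory SQLite table plays RDS, and `provider_drop`, `ingest_metadata`, `files_to_process`, and the `file_metadata` schema are illustrative names, not Nielsen's actual design.

```python
import queue
import sqlite3

# Hypothetical local stand-ins: queue.Queue plays the SQS metadata queue,
# and an in-memory SQLite table plays the RDS metadata database.
metadata_queue: "queue.Queue[dict]" = queue.Queue()
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE file_metadata ("
    " s3_key TEXT PRIMARY KEY,"
    " provider TEXT NOT NULL,"
    " size_bytes INTEGER NOT NULL,"
    " status TEXT NOT NULL DEFAULT 'NEW')"
)


def provider_drop(provider: str, s3_key: str, size_bytes: int) -> None:
    """A provider uploads a file to S3 (omitted here) and announces it on the queue."""
    metadata_queue.put({"provider": provider, "s3_key": s3_key, "size_bytes": size_bytes})


def ingest_metadata() -> int:
    """Drain the queue into the metadata table; return how many messages were consumed."""
    consumed = 0
    while True:
        try:
            msg = metadata_queue.get_nowait()
        except queue.Empty:
            break
        # INSERT OR IGNORE keeps ingestion idempotent if the same file is announced twice.
        db.execute(
            "INSERT OR IGNORE INTO file_metadata (s3_key, provider, size_bytes) VALUES (?, ?, ?)",
            (msg["s3_key"], msg["provider"], msg["size_bytes"]),
        )
        consumed += 1
    db.commit()
    return consumed


def files_to_process() -> list[str]:
    """The question a batch job (EMR in the video) would ask the metadata store."""
    rows = db.execute(
        "SELECT s3_key FROM file_metadata WHERE status = 'NEW' ORDER BY s3_key"
    ).fetchall()
    return [row[0] for row in rows]


provider_drop("provider-a", "incoming/a/part-0001.csv", 1024)
provider_drop("provider-b", "incoming/b/part-0001.csv", 2048)
provider_drop("provider-a", "incoming/a/part-0001.csv", 1024)  # duplicate announcement
print(ingest_metadata())   # 3 messages consumed
print(files_to_process())  # ['incoming/a/part-0001.csv', 'incoming/b/part-0001.csv']
```

The `INSERT OR IGNORE` on a primary key is one simple way to tolerate duplicate messages, which matters because SQS standard queues deliver at least once.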
@puneetgupta87 · 1 year ago
Also, there must be a mechanism he did not describe, maybe handled in SQS, for limiting the number of retries on a file in EMR. You cannot keep reprocessing it forever.
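One common way to bound retries is a dead-letter queue: after a file has failed a fixed number of times it is parked for inspection instead of being retried forever. Amazon SQS supports this through a redrive policy (`maxReceiveCount` plus a target dead-letter queue); whether Nielsen uses it is not stated in the video. The sketch below only simulates the idea in memory, and `RetryingQueue`, `Message`, and `process` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    s3_key: str
    receive_count: int = 0


class RetryingQueue:
    """In-memory model of a queue with a retry limit and a dead-letter queue.

    Amazon SQS offers this via a redrive policy; this class only simulates it.
    """

    def __init__(self, max_receive_count: int) -> None:
        self.max_receive_count = max_receive_count
        self.main: deque[Message] = deque()
        self.dead_letter: list[Message] = []

    def send(self, s3_key: str) -> None:
        self.main.append(Message(s3_key))

    def run(self, process: Callable[[str], None]) -> list[str]:
        """Process until the main queue is empty; return the keys that succeeded."""
        succeeded = []
        while self.main:
            msg = self.main.popleft()
            msg.receive_count += 1
            try:
                process(msg.s3_key)
                succeeded.append(msg.s3_key)
            except Exception:
                if msg.receive_count >= self.max_receive_count:
                    self.dead_letter.append(msg)  # stop retrying; park for inspection
                else:
                    self.main.append(msg)  # make it available for another attempt
        return succeeded


def process(s3_key: str) -> None:
    """Hypothetical processing step that always fails on a corrupt file."""
    if s3_key.endswith(".corrupt"):
        raise ValueError(f"cannot parse {s3_key}")


q = RetryingQueue(max_receive_count=3)
q.send("incoming/good.csv")
q.send("incoming/bad.corrupt")
print(q.run(process))                                        # ['incoming/good.csv']
print([(m.s3_key, m.receive_count) for m in q.dead_letter])  # [('incoming/bad.corrupt', 3)]
```

The corrupt file gets exactly three attempts and then stops consuming cluster time, while healthy files keep flowing.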
@salilmarathponmadom7255 · 2 years ago
A great, informative use case. Thanks a lot. I have a question: can't the metadata be in the same files they process initially?
@johnsonfoo · 3 years ago
Love your commentary! One question: you mentioned NoSQL could be a better choice. Is it because RDS is slower due to transaction overhead?
@andreaskayy · 3 years ago
I was thinking more about the flexibility of having a schemaless DB. It could also scale better when there are a lot of reads and writes.
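The flexibility argument can be shown with a toy example: in a schemaless store, each provider's metadata record carries whatever fields it has, so a provider adding `compression` or `schema_version` needs no table migration. This is a minimal sketch, assuming a plain dict of JSON strings as a stand-in for a document database such as DynamoDB or MongoDB; `put_metadata`, `find`, and all field names are hypothetical, and it says nothing about the scaling half of the argument.

```python
import json

# Hypothetical stand-in for a document store: s3_key -> JSON document.
metadata_store: dict[str, str] = {}


def put_metadata(s3_key: str, **fields) -> None:
    """Store whatever fields a provider sends; no schema migration needed."""
    metadata_store[s3_key] = json.dumps({"s3_key": s3_key, **fields})


def find(**criteria) -> list[str]:
    """Return the keys whose document matches all given field values."""
    hits = []
    for key, raw in metadata_store.items():
        doc = json.loads(raw)
        # Missing fields simply don't match, instead of being NULL columns.
        if all(doc.get(k) == v for k, v in criteria.items()):
            hits.append(key)
    return sorted(hits)


# Two providers describe their files with different sets of fields.
put_metadata("a/part-1.csv", provider="a", rows=1000, delimiter=",")
put_metadata("b/part-1.parquet", provider="b", rows=500, compression="snappy", schema_version=2)

print(find(provider="b"))          # ['b/part-1.parquet']
print(find(compression="snappy"))  # ['b/part-1.parquet']
```

In a relational table the second record would typically need an `ALTER TABLE` or a generic JSON column; here each document simply carries its own shape.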
@danielolaru2496 · 3 years ago
This was such a nice and informative video. Please do more, maybe one on Azure? Thanks, Andreas!
@andreaskayy · 3 years ago
Do you know a source for Azure use cases? I'd love to do this, but I don't know a good source.
@puneetgupta87 · 1 year ago
What about EMR? I've never worked with it. Does it scale vertically or horizontally, and is the scaling automatic?
@fozantalat4509 · 1 year ago
@andreaskayy Please make more videos like this where you discuss the data architecture of different big tech companies. It would be really great if you could make a video on how to build a smaller version of a project using this architecture.
@andreaskayy · 1 year ago
Thanks! I have many different hands-on projects in my academy.
@opherdubrovsky4175 · 3 years ago
Hi Andreas. Loved your video and the commentary. Since making the This Is My Architecture video, we've made more improvements; one of them is making our Spark cluster work in a serverless way. Quite cool, actually. I'd love to get on a call to explain that and other improvements, as well as the constraints we had and the reasoning behind some of our architecture decisions. If you're up to it, we can even do a follow-up video to explain some of the concepts. Let me know what you think. Opher Dubrovsky
@andreaskayy · 3 years ago
That would be super interesting, Opher! I'll send you a connection request on LinkedIn. Let's chat there.
@opherdubrovsky4175 · 3 years ago
@andreaskayy Sounds good.
@desavera · 3 years ago
This is interesting! SQS as a buffering mechanism for re-ingesting state-control data is nice, but yeah, RDS might not be the best choice here; DynamoDB might fit better. It is ultimately a metadata store.
@andreaskayy · 3 years ago
Yeah, storing the metadata as "complex" documents might be a good addition here.
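Storing the metadata as "complex" documents could mean one nested document per file that accumulates its whole state history, fed from the buffered state-control messages mentioned above. This is a minimal sketch under that assumption: a `deque` stands in for the SQS buffer, a dict of nested dicts stands in for a document store such as DynamoDB, and `apply_state_messages`, the state names, and the document layout are hypothetical. The `batch_size` default of 10 mirrors the SQS limit of 10 messages per receive call.

```python
from collections import deque

# Hypothetical buffer of state-control messages (the role SQS plays above).
state_buffer: deque[dict] = deque([
    {"s3_key": "a/part-1.csv", "state": "RECEIVED", "at": "10:00"},
    {"s3_key": "a/part-1.csv", "state": "PROCESSING", "at": "10:05"},
    {"s3_key": "b/part-1.csv", "state": "RECEIVED", "at": "10:06"},
    {"s3_key": "a/part-1.csv", "state": "DONE", "at": "10:20"},
])
# Hypothetical document store: one nested document per file.
documents: dict[str, dict] = {}


def apply_state_messages(batch_size: int = 10) -> int:
    """Drain up to batch_size messages and fold them into per-file documents."""
    applied = 0
    while state_buffer and applied < batch_size:
        msg = state_buffer.popleft()
        doc = documents.setdefault(msg["s3_key"], {"s3_key": msg["s3_key"], "history": []})
        doc["history"].append({"state": msg["state"], "at": msg["at"]})
        doc["current_state"] = msg["state"]  # latest state, denormalized for fast lookup
        applied += 1
    return applied


print(apply_state_messages())                      # 4
print(documents["a/part-1.csv"]["current_state"])  # DONE
print(len(documents["a/part-1.csv"]["history"]))   # 3
```

A relational design would usually split this into a files table and a state-events table joined on the key; the document version keeps everything about one file in a single read.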
@raulvallejo7255 · 3 years ago
More commentary on streaming-first architectures, please!
@andreaskayy · 3 years ago
I'll look into it :)