Thanks a lot, homie. Don't know where to start; this video really helps.
@DataAI-k9x · a month ago
Thanks bro, appreciate the detailed video with the wonderful resources... :)
@DataMyselfAI · a month ago
UPDATE: I just passed my certification exam and observed the following topics to be important (some are not even directly mentioned in the exam guide):
- Costs in Bedrock (especially provisioned throughput)
- Evaluation methods (ROUGE & BLEU)
- Types of bias (e.g., sampling bias)
- General ML concepts: confusion matrix, correlation matrix, epochs, GAN, SVM
- SageMaker inference types
- F1 score, accuracy, etc.
- SageMaker Canvas, Data Wrangler, Ground Truth Plus

Next week, I'll add additional questions about these topics and others I encountered during the exam to my practice exam course.
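For anyone brushing up on the metrics mentioned above, here is a quick refresher on how accuracy, precision, recall, and F1 fall out of a binary confusion matrix. The counts are hypothetical example values, not anything from the exam:

```python
# Binary confusion matrix counts (hypothetical example values)
tp, fp, fn, tn = 80, 10, 20, 90

accuracy = (tp + tn) / (tp + fp + fn + tn)          # correct predictions over all
precision = tp / (tp + fp)                          # flagged positives that were real
recall = tp / (tp + fn)                             # real positives that were caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(round(accuracy, 3))   # 0.85
print(round(precision, 3))  # 0.889
print(round(recall, 3))     # 0.8
print(round(f1, 3))         # 0.842
```

Note that F1 is the harmonic mean, so it is pulled toward the weaker of precision and recall, which is why it is preferred over plain accuracy on imbalanced data.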
@gatorpika · a month ago
Watched your first two videos, liked and subscribed. Great stuff! I have never tried CDC as I am old skool batch, but the thing that always freaked me out was: if I had to go back and reload from bronze because something happened to the related target in silver, it seems I would always have to reload from the beginning with the first full load. With batch I could identify the time period that was messed up and just reload that. Is that a correct assumption, and if so, how is that normally handled in practice to avoid huge multi-year reloads? I am assuming the source data is gone due to shorter retention.
@DataMyselfAI · a month ago
Thanks 🙏 Yeah, you're right: production-ready, robust implementations of CDC can be a headache. That's why there are reliable, ready-to-use solutions like Delta Live Tables in Databricks that can handle it efficiently.
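To make the CDC idea concrete, here is a toy Python sketch of what "applying changes" means: replaying ordered change events onto a keyed target table. This is roughly the bookkeeping that managed solutions like Delta Live Tables automate for you; the table, keys, and events below are all hypothetical:

```python
def apply_changes(target, events):
    """Apply CDC events to a keyed target dict, ordered by sequence number.

    Each event is (seq, op, key, row); op is 'upsert' or 'delete'.
    Replaying from a checkpoint only needs the events after it,
    not a full reload -- the core appeal of CDC over full batch loads.
    """
    for _seq, op, key, row in sorted(events):
        if op == "delete":
            target.pop(key, None)
        else:  # upsert: insert a new key or overwrite the existing row
            target[key] = row
    return target

# Hypothetical change feed for a 'hosts' table
events = [
    (1, "upsert", "h1", {"name": "host-1", "status": "up"}),
    (2, "upsert", "h2", {"name": "host-2", "status": "up"}),
    (3, "upsert", "h1", {"name": "host-1", "status": "down"}),
    (4, "delete", "h2", None),
]
silver = apply_changes({}, events)
print(silver)  # {'h1': {'name': 'host-1', 'status': 'down'}}
```

The sequence ordering is what makes partial replays safe: as long as you know the last sequence number applied to silver, you only re-apply events after it instead of reloading from the first full load.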
@rajeshbhosale2008 · a month ago
Thanks, Thomas!!! This helps a lot.
@PhaniBhushan-f5w · a month ago
Can you please make a video on "Use a reusable ETL framework in your AWS lake house architecture" ?
@DataMyselfAI · a month ago
I will put it on my list. You could use dbt for that, or are you interested in an AWS-native solution? :)
@PhaniBhushan-f5w · a month ago
@@DataMyselfAI , here is the reference link : aws.amazon.com/blogs/architecture/use-a-reusable-etl-framework-in-your-aws-lake-house-architecture/
@demohub · 2 months ago
Really helpful. Thanks for the video. Can you do writes from Snowflake, or only reads?
@rolandstuffle2439 · 2 months ago
Thanks, amazing value, man! Also doing your course right now; really good stuff.
@ManishJindalmanisism · 2 months ago
Hi Thomas, I have one question on this. When you create hostsIncrementalInputDF in Glue, you read the full bronze table every time and then do cleaning/transformation over it. Won't that be a waste of resources as the table grows over time? Shouldn't this data frame pick up and process only those records from the bronze table which have changed or are new since the last run?
@DataMyselfAI · 2 months ago
Hi Manish, you are absolutely correct that this would be a waste of resources and incur unnecessary transformations. That's why I activated Glue job bookmarks for the job, so that only new files are picked up compared to the last run. Also, this is more of a proof of concept. In a real scenario, we would need a more robust setup to ensure that everything works correctly, even if the job fails.
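Conceptually, a job bookmark just persists which inputs a job has already processed, so a rerun skips them. Here is a toy Python sketch of that idea (file names hypothetical; in real Glue this is handled for you when bookmarks are enabled and the read carries a transformation_ctx):

```python
def incremental_read(all_files, bookmark):
    """Return only files not seen in previous runs, plus the updated bookmark.

    Mimics what AWS Glue job bookmarks do: persist the set of processed
    inputs so each run picks up only the new bronze files.
    """
    new_files = [f for f in all_files if f not in bookmark]
    return new_files, bookmark | set(new_files)

bookmark = set()

# Run 1: everything is new, so both files are processed
run1, bookmark = incremental_read(["part-000.parquet", "part-001.parquet"], bookmark)

# Run 2: one new file arrived since the last run; only it is processed
run2, bookmark = incremental_read(
    ["part-000.parquet", "part-001.parquet", "part-002.parquet"], bookmark
)
print(run1)  # ['part-000.parquet', 'part-001.parquet']
print(run2)  # ['part-002.parquet']
```

In Glue the bookmark state lives in the service itself rather than in your code, which is why resetting or pausing bookmarks matters if you ever need to deliberately reprocess old files.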
@balaece25 · 2 months ago
Can I get the name of the tool you used to create the flow diagram GIF?
@DataMyselfAI · 2 months ago
Sure, I have used Canva for that :)
@rolandstuffle2439 · 3 months ago
Great video, thanks man!
@DataMyselfAI · 2 months ago
Thanks Roland 🙏
@MrDomenic123 · 3 months ago
Awesome video and well explained! There is not much content out there on Apache XTable, so I'm glad you covered it.
@DataMyselfAI · 3 months ago
Thanks! Happy you enjoyed it 😁
@raghuerumal · 6 months ago
Good job, Thomas! Liked your demo and explanation. Please share the blog with code snippets for the Lambda and Glue job. Thank you.
@DataMyselfAI · 6 months ago
Thank you for the positive feedback :) You can find the blog post with all code shown here: bit.ly/4aONz1M