Thanks a lot, homie. Don't know where to start; this video really helps.
@DataAI-k9x · a month ago
Thanks bro, appreciate the detailed video with the wonderful resources... :)
@DataMyselfAI · a month ago
UPDATE: I just passed my certification exam and observed the following topics to be important (some are not even directly mentioned in the exam guide):
- Costs in Bedrock (especially provisioned throughput)
- Evaluation methods (ROUGE & BLEU)
- Types of bias (e.g., sampling bias)
- General ML concepts: confusion matrix, correlation matrix, epochs, GAN, SVM
- SageMaker inference types
- F1 score, accuracy, etc.
- SageMaker Canvas, Data Wrangler, Ground Truth Plus

Next week, I'll add additional questions about these topics and others I encountered during the exam to my practice exam course.
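For anyone brushing up on the metrics mentioned above, here is a quick refresher on how accuracy, precision, recall, and F1 fall out of a binary confusion matrix. The counts are hypothetical example values, not anything from the exam:

```python
# Binary confusion matrix counts (hypothetical example values)
tp, fp, fn, tn = 80, 10, 20, 90

accuracy = (tp + tn) / (tp + fp + fn + tn)          # correct predictions over all
precision = tp / (tp + fp)                          # flagged positives that were real
recall = tp / (tp + fn)                             # real positives that were caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(round(accuracy, 3))   # 0.85
print(round(precision, 3))  # 0.889
print(round(recall, 3))     # 0.8
print(round(f1, 3))         # 0.842
```

Note that F1 is the harmonic mean, so it is pulled toward the weaker of precision and recall, which is why it is preferred over plain accuracy on imbalanced data.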
@gatorpika · a month ago
Watched your first two videos, liked and subscribed. Great stuff! I have never tried CDC as I am old skool batch, but the thing that always freaked me out was: if I had to go back and reload from bronze because something happened to the related target in silver, it seems I would always have to reload from the beginning with the first full load. With batch I could identify the time period that was messed up and just reload that. Is that a correct assumption, and if so, how is that normally handled in practice to avoid huge multi-year reloads? I am assuming the source data is gone due to shorter retention.
@DataMyselfAI · a month ago
Thanks 🙏 Yeah, you're right: production-ready, robust implementations of CDC can be a headache. That's why there are reliable, ready-to-use solutions like Delta Live Tables in Databricks that can handle it efficiently.
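To make the CDC idea concrete, here is a toy Python sketch of what "applying changes" means: replaying ordered change events onto a keyed target table. This is roughly the bookkeeping that managed solutions like Delta Live Tables automate for you; the table, keys, and events below are all hypothetical:

```python
def apply_changes(target, events):
    """Apply CDC events to a keyed target dict, ordered by sequence number.

    Each event is (seq, op, key, row); op is 'upsert' or 'delete'.
    Replaying from a checkpoint only needs the events after it,
    not a full reload -- the core appeal of CDC over full batch loads.
    """
    for _seq, op, key, row in sorted(events):
        if op == "delete":
            target.pop(key, None)
        else:  # upsert: insert a new key or overwrite the existing row
            target[key] = row
    return target

# Hypothetical change feed for a 'hosts' table
events = [
    (1, "upsert", "h1", {"name": "host-1", "status": "up"}),
    (2, "upsert", "h2", {"name": "host-2", "status": "up"}),
    (3, "upsert", "h1", {"name": "host-1", "status": "down"}),
    (4, "delete", "h2", None),
]
silver = apply_changes({}, events)
print(silver)  # {'h1': {'name': 'host-1', 'status': 'down'}}
```

The sequence ordering is what makes partial replays safe: as long as you know the last sequence number applied to silver, you only re-apply events after it instead of reloading from the first full load.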
@rajeshbhosale2008 · a month ago
Thanks, Thomas!!! This helps a lot.
@PhaniBhushan-f5w · a month ago
Can you please make a video on "Use a reusable ETL framework in your AWS lake house architecture" ?
@DataMyselfAI · a month ago
I will put it on my list. You could use dbt for that, or are you interested in an AWS-native solution? :)
@PhaniBhushan-f5w · a month ago
@@DataMyselfAI , here is the reference link : aws.amazon.com/blogs/architecture/use-a-reusable-etl-framework-in-your-aws-lake-house-architecture/
@demohub · 2 months ago
Really helpful. Thanks for the video. Can you do writes from Snowflake, or only reads?
@rolandstuffle2439 · 2 months ago
Thanks, amazing value, man! Also doing your course right now; really good stuff.
@ManishJindalmanisism · 2 months ago
Hi Thomas, I have one question on this. When you create hostsIncrementalInputDF in Glue, you read the full bronze table every time and then do cleaning/transformation over it. Won't that be a waste of resources as the table grows over time? Shouldn't this data frame pick up and process only those records from the bronze table which have changed or are new since the last run?
@DataMyselfAI · 2 months ago
Hi Manish, you are absolutely correct that this would be a waste of resources and incur unnecessary transformations. That's why I activated Glue job bookmarks for the job, so that only new files are picked up compared to the last run. Also, this is more of a proof of concept. In a real scenario, we would need a more robust setup to ensure that everything works correctly, even if the job fails.
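Conceptually, a job bookmark just persists which inputs a job has already processed, so a rerun skips them. Here is a toy Python sketch of that idea (file names hypothetical; in real Glue this is handled for you when bookmarks are enabled and the read carries a transformation_ctx):

```python
def incremental_read(all_files, bookmark):
    """Return only files not seen in previous runs, plus the updated bookmark.

    Mimics what AWS Glue job bookmarks do: persist the set of processed
    inputs so each run picks up only the new bronze files.
    """
    new_files = [f for f in all_files if f not in bookmark]
    return new_files, bookmark | set(new_files)

bookmark = set()

# Run 1: everything is new, so both files are processed
run1, bookmark = incremental_read(["part-000.parquet", "part-001.parquet"], bookmark)

# Run 2: one new file arrived since the last run; only it is processed
run2, bookmark = incremental_read(
    ["part-000.parquet", "part-001.parquet", "part-002.parquet"], bookmark
)
print(run1)  # ['part-000.parquet', 'part-001.parquet']
print(run2)  # ['part-002.parquet']
```

In Glue the bookmark state lives in the service itself rather than in your code, which is why resetting or pausing bookmarks matters if you ever need to deliberately reprocess old files.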
@balaece25 · 2 months ago
Can I get the name of the tool you used to create the flow diagram GIF?
@DataMyselfAI · 2 months ago
Sure, I have used Canva for that :)
@rolandstuffle2439 · 3 months ago
Great video, thanks man!
@DataMyselfAI · 2 months ago
Thanks Roland 🙏
@MrDomenic123 · 3 months ago
Awesome video and well explained! There is not much content out there on Apache XTable, so I'm glad you covered it.
@DataMyselfAI · 3 months ago
Thanks! Happy you enjoyed it 😁
@raghuerumal · 6 months ago
Good job, Thomas! Liked your demo and explanation. Please share the blog with code snippets for the Lambda and Glue job. Thank you.
@DataMyselfAI · 6 months ago
Thank you for the positive feedback :) You can find the blog post with all code shown here: bit.ly/4aONz1M