Tracking Processed Data Using AWS Glue Job Bookmarks | Incremental ETL In-depth intuition

7,180 views

Knowledge Amplifier

Days ago

Comments: 16

@manojt7012 2 years ago

Your consistency is just inspiring... Fan of your content 👌🏻

@KnowledgeAmplifier1 2 years ago

Thank you Manoj T for your continuous support! Happy Learning :-)

@MatheusRibeiro-or2hq 2 years ago

Great Video!

@KnowledgeAmplifier1 2 years ago

Thank you Matheus Ribeiro! Happy Learning

@balasakiran 1 year ago

Nice explanations, crisp and clear. I have a quick question: over a period of time, say after 2 months, if there is a need to do a history load (process all files), how can this be achieved?

@farookshaik7462 2 years ago

Really useful. Keep going..

@KnowledgeAmplifier1 2 years ago

Thank you Farook Shaik! Happy Learning :-)

@ravikreddy7470 2 years ago

What's the difference between incremental job bookmarking and incremental crawling?

@KnowledgeAmplifier1 2 years ago

Ravi K R, incremental crawls help prevent recrawling the same data from source systems: the crawler instead crawls only new data and makes it available in the Glue Catalog for processing. AWS Glue job bookmarks help prevent reprocessing old data. One helps with crawling incrementally, the other with processing incrementally. Hope this gives you some idea. For more details, you can refer to these links: Incremental crawls in AWS Glue: docs.aws.amazon.com/glue/latest/dg/incremental-crawls.html; Tracking processed data using job bookmarks: docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html. Happy Learning
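To make the "processing incrementally" half of this reply concrete, here is a small pure-Python simulation of timestamp-based bookmarking over S3-like objects. It is a sketch, not the AWS Glue API: `S3Object`, `Bookmark`, and `run_job` are hypothetical names, and the state is reduced to a single high-water-mark timestamp (the Glue docs describe S3 bookmarks as relying on object last-modified times, but the real persisted state is managed by Glue itself). In an actual Glue job, bookmarking is enabled with the `--job-bookmark-option job-bookmark-enable` job argument, sources need a `transformation_ctx`, and state is saved when the script calls `job.commit()`.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class S3Object:
    """Hypothetical stand-in for one entry of an S3 listing."""
    key: str
    last_modified: datetime


@dataclass
class Bookmark:
    """Hypothetical stand-in for the state Glue persists between job runs.

    Simplification: only a high-water-mark timestamp is kept.
    """
    high_water_mark: Optional[datetime] = None

    def pending(self, objects: List[S3Object]) -> List[S3Object]:
        # First run: nothing has been processed yet, so everything is pending.
        if self.high_water_mark is None:
            return list(objects)
        # Later runs: only objects modified after the last committed run.
        return [o for o in objects if o.last_modified > self.high_water_mark]

    def commit(self, processed: List[S3Object]) -> None:
        # Advance the mark only after a successful run (analogous to job.commit()).
        if processed:
            newest = max(o.last_modified for o in processed)
            if self.high_water_mark is None or newest > self.high_water_mark:
                self.high_water_mark = newest


def run_job(objects: List[S3Object], bookmark: Bookmark) -> List[str]:
    """Simulate one incremental run and return the keys it processed."""
    batch = bookmark.pending(objects)
    # ... transform and write `batch` here ...
    bookmark.commit(batch)
    return [o.key for o in batch]


def ts(day: int) -> datetime:
    """Build a UTC timestamp on the given day of January 2023."""
    return datetime(2023, 1, day, tzinfo=timezone.utc)


bucket = [S3Object("raw/a.csv", ts(1)), S3Object("raw/b.csv", ts(2))]
bm = Bookmark()
print(run_job(bucket, bm))  # first run processes a.csv and b.csv
bucket.append(S3Object("raw/c.csv", ts(3)))
print(run_job(bucket, bm))  # second run processes only c.csv
print(run_job(bucket, bm))  # nothing new, prints []
```

Starting again from an empty `Bookmark()` corresponds roughly to resetting the bookmark (in Glue, `aws glue reset-job-bookmark --job-name <name>`), after which the next run treats every file as unprocessed.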

@ravikreddy7470 2 years ago

@@KnowledgeAmplifier1 so crawling and processing are different things?

@KnowledgeAmplifier1 2 years ago

@@ravikreddy7470 yes, the crawler creates the metadata that allows Glue jobs and services such as Athena to view the S3 data as a database with tables and process it.

@yashgangrade5460 9 months ago

I ran the Glue crawler but it's giving the error HIVE_INVALID_METADATA: Hive metadata for table raw is invalid: Table descriptor contains duplicate columns.

@tcsanimesh 1 year ago

Superb explanation!! However, I have one question. When we enable bookmarks for incremental load, let's assume the requirement is still incremental, but weekly rather than daily. Will this concept work in that case too? I mean, does AWS Glue read only a fixed duration back from the bookmarked timestamp, or does it read all files after the last bookmarked timestamp?

@FRUXT 1 year ago

How does the job bookmark know what to increment? Do we need to specify a particular column for it to track?

@basavapn6487 8 months ago

Can you please make a video for a requirement where I get files into an S3 bucket daily and want to process the last 90 days of data present in S3 using Glue?