This is a great video. The visualization helped a lot also. Thank you so much!
@harishnttdata2325 • 3 years ago
Very useful video. A time saver.
@AWSTutorialsOnline • 3 years ago
Glad to hear that
@VishalSharma-hv6ks • 2 years ago
Hi Sir, thanks a lot for this wonderful video. I have a doubt: I am using AWS Glue as ETL, reading data every day from an Oracle RDBMS. But in Oracle I have updates and deletes as well as inserts. You mentioned that we can do incremental reads using bookmarking, but what about the deletes and updates on the Oracle side? How can we handle this situation? Thank you, Sir, in advance.
@sivahanuman4466 • 1 year ago
Excellent, Sir. Very useful.
@AWSTutorialsOnline • 1 year ago
Thanks and welcome
@yusnardo • 3 years ago
Can I run the workflow recursively? I use boundedSize in my Glue job, so I need to run the job multiple times each month until the bookmark is done.
@AWSTutorialsOnline • 3 years ago
A job can start another instance of itself from the job code, as long as the concurrency limit allows. But it is not a true recursive call, so think about the exit condition when doing so.
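As a minimal sketch of that reply: a job can call `start_job_run` on itself via boto3, with a counter passed as a job argument acting as the exit condition. The function name, the `--run_count` argument, and the cap of 5 runs are all hypothetical, not from the video.

```python
def maybe_restart(job_name, run_count, max_runs=5):
    """Re-run the same Glue job unless the run cap is reached.

    run_count would come from the job's own arguments; max_runs is the
    exit condition that keeps this from chaining forever.
    """
    if run_count >= max_runs:
        return False  # exit condition: stop the chain of re-runs
    import boto3  # available in the Glue runtime
    glue = boto3.client("glue")
    glue.start_job_run(
        JobName=job_name,
        Arguments={"--run_count": str(run_count + 1)},  # hypothetical argument
    )
    return True
```

Each run reads its `--run_count` argument, processes a bounded chunk, then calls `maybe_restart` before exiting.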
@tiktok4372 • 2 years ago
Thank you for the video. I have a question: does the job bookmark work with a DataFrame? Suppose I use glueContext.create_data_frame_from_catalog, do some transformations on the DataFrame, and then write the DataFrame to an S3 bucket.
@AWSTutorialsOnline • 2 years ago
yes it does
@veerachegu • 2 years ago
Thank you so much, the explanation is very clear-cut.
@AWSTutorialsOnline • 2 years ago
Welcome 😊
@tylerdurden8692 • 1 year ago
When I try to specify multiple keys in jobBookmarkKeys, it's not working; it always takes only the primary key of the JDBC source. And when there are modifications to existing records, they are not picked up either; it processes them again. Is there anything I am missing here?
@AWSTutorialsOnline • 1 year ago
You can use multiple keys as long as their values are increasing or decreasing. Is that the case in the table?
@tylerdurden8692 • 1 year ago
@@AWSTutorialsOnline No. So you are saying the key field should be an auto-increment kind of field?
@AWSTutorialsOnline • 1 year ago
@@tylerdurden8692 Yes, incrementing or decrementing. Please check this link; it has the rules for JDBC sources - docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html
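To illustrate the reply above, here is a sketch of passing bookmark keys for a JDBC source via `additional_options`. The column, database, and table names are hypothetical; per the linked Glue docs, the key column's values must be strictly increasing or decreasing, and `transformation_ctx` must be set for the bookmark to track the read.

```python
# Bookmark options for a JDBC source. The key column must increase (or
# decrease) monotonically, e.g. an auto-increment id or an update timestamp.
bookmark_options = {
    "jobBookmarkKeys": ["last_modified_ts"],   # hypothetical column name
    "jobBookmarkKeysSortOrder": "asc",
}

def read_incremental(glueContext):
    """Read only rows past the last committed bookmark position."""
    return glueContext.create_dynamic_frame.from_catalog(
        database="mydb",                   # hypothetical catalog names
        table_name="orders",
        additional_options=bookmark_options,
        transformation_ctx="orders_read",  # the bookmark is keyed on this ctx
    )
```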
@pulakhazra5792 • 3 years ago
Very clear and helpful.
@AWSTutorialsOnline • 3 years ago
Glad it was helpful!
@abir9557 • 14 months ago
How does the job bookmark scale on a massive data set?
@YogithaVenna • 3 years ago
Where is the state information stored? Is it persisted in any data store? What happens behind the scenes?
@AWSTutorialsOnline • 3 years ago
That information is not public, so I cannot say with confidence.
@mylikeskeyan2055 • 2 years ago
Please put up a demo of JDBC with bookmarking for a table, showing only the daily updated records in the output.
@mohdshoeb5101 • 3 years ago
How can I manage multiple joined tables through bookmarks? When joining tables I don't have a unique key, so I concatenate multiple IDs to get one. I need to set the bookmark with multiple keys. Please tell me how we can do this.
@AWSTutorialsOnline • 3 years ago
Apologies for the late response due to my summer break. Bookmarking on joined tables is not possible. You might want to create a Glue ETL job which merges these datasets and creates a primary key, then run bookmark-based processing on the merged dataset. Hope it helps.
@deepakshrikanttamhane285 • 2 years ago
Hi Sir, it's very helpful, but how do I configure an S3 timestamp-based job bookmark instead of using a bookmark key?
@AWSTutorialsOnline • 2 years ago
I think when you enable the job bookmark without specifying any key, it uses the timestamp for bookmarking. Please check this link - docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html
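A sketch of the setup implied by that reply: starting a job run with the bookmark enabled and no `jobBookmarkKeys`, so Glue falls back to tracking S3 objects by timestamp. The function name is hypothetical; the `--job-bookmark-option` argument is the documented Glue job parameter.

```python
def start_with_bookmark(job_name):
    """Start a Glue job run with the bookmark enabled and no explicit keys.

    With no jobBookmarkKeys configured, Glue tracks S3 sources by
    object timestamps, per the Glue continuations documentation.
    """
    import boto3  # assumed available where this is run
    glue = boto3.client("glue")
    return glue.start_job_run(
        JobName=job_name,
        Arguments={"--job-bookmark-option": "job-bookmark-enable"},
    )
```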
@deepakshrikanttamhane285 • 2 years ago
Great, it works.
@creativeminds7397 • 3 years ago
Hello, your videos are simply superb 👌. I have PGP-encrypted files in S3 and I need to implement bookmarks. Can you tell me whether it will work or not? If not, is there another approach to follow?
@AWSTutorialsOnline • 3 years ago
Hi, sorry, I have never worked with PGP files. Hard to say without testing.
@abdulhaseeb4980 • 3 years ago
Hi, I hope you are doing great. Currently I save entries for new files in SQS and then read those files from Glue, but now I want to use the bookmark option. I am using a Python shell job and bookmarks are not supported in it. I will now move to a Spark job, but I will not use the Spark context there. Can you please guide me on how I can do this?
@AWSTutorialsOnline • 3 years ago
In order to use the job bookmark, you have to program in a certain way using the Spark context. This link might help - docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html
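The pattern that reply refers to can be sketched as the standard Glue script skeleton: `job.init` restores the bookmark state, every bookmarked source and sink carries a `transformation_ctx`, and `job.commit` persists the new position. Database, table, and bucket names below are hypothetical; the `awsglue` modules exist only inside the Glue runtime, so the imports live in the function.

```python
def run_job():
    # Glue-only imports; these modules are provided by the Glue runtime
    import sys
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glueContext = GlueContext(SparkContext.getOrCreate())
    job = Job(glueContext)
    job.init(args["JOB_NAME"], args)  # restores the last bookmark state

    frame = glueContext.create_dynamic_frame.from_catalog(
        database="mydb",                   # hypothetical names
        table_name="events",
        transformation_ctx="events_read",  # the bookmark tracks this ctx
    )
    glueContext.write_dynamic_frame.from_options(
        frame=frame,
        connection_type="s3",
        connection_options={"path": "s3://my-bucket/out/"},  # hypothetical
        format="parquet",
        transformation_ctx="events_write",
    )
    job.commit()  # persists the new bookmark position
```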
@deepakbhutekar5450 • 1 year ago
Sir, how do we handle updated records using the job bookmark? How does jobBookmarkKeys identify that a given record has been updated? Once a particular record has been processed and bookmarked, if for some reason that record gets updated in the source table, how do we handle this situation using the job bookmark?
@joseabzum3073 • 3 years ago
What if I want to delete a .csv? Can some process automatically delete the corresponding parquet file?
@AWSTutorialsOnline • 3 years ago
You need to use the boto3 S3 API to delete the file. Please check this link - boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.delete_object
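A minimal sketch of the `delete_object` call the reply links to; the function name is just a wrapper for illustration, and bucket/key are placeholders supplied by the caller.

```python
def delete_s3_object(bucket, key):
    """Delete a single object from S3 via the boto3 API from the reply."""
    import boto3  # assumed available where this runs
    s3 = boto3.client("s3")
    s3.delete_object(Bucket=bucket, Key=key)
```

For example: `delete_s3_object("my-bucket", "output/part-0000.parquet")`.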
@joseabzum3073 • 3 years ago
@@AWSTutorialsOnline Hi, but how can I know which parquet file belongs to a deleted .csv?
@vishalrajmane7649 • 3 years ago
Do you have any video on incremental load in AWS Glue for newly inserted, updated, and deleted data from source to target?
@AWSTutorialsOnline • 3 years ago
I don't have a video on this. But if you are ingesting data from a relational database, there are two methods which can work: 1) using a Lake Formation Blueprint, or 2) using the Amazon Database Migration Service (DMS) to move data to S3. I have videos about Blueprints and DMS, but they do not cover the incremental-update scenario. You can check them on my channel.
@vishalrajmane7649 • 3 years ago
Thanks for the help. I will check the options you suggested. 🙂
@selvaganesh2529 • 3 years ago
Hi, when I try to reset the bookmark I get "EntityNotFoundException: continuation for job not found". The source is S3 and I have not altered the transformation_ctx either. What might be the error?
@AWSTutorialsOnline • 3 years ago
Not sure, I have never come across this error. Can you share more details about what you are doing, somehow, so that I can reproduce it?
@selvaganesh2529 • 3 years ago
@@AWSTutorialsOnline I fixed the issue. It was due to the job_name, which I had passed as a parameter and which shouldn't be given, as per the AWS documentation.
@भ्रमंती-ज5ज • 1 year ago
Hello, how can we reset the Glue job state?
@kumark3176 • 2 years ago
Hi Sir, thanks for sharing the information on bookmarks. I have a task to build the bookmark functionality using PySpark, with the bookmark state in DynamoDB. I am new to the big-data framework technologies, and we're moving from Glue bookmarking to our own customized code (written in PySpark or Java). Can you please suggest any material or sample code I can use as a reference? We're trying to update based on lastUpdatedTime and DelayTime as mentioned by you in this tutorial. Please reply and help me. Thank you.
@sukanyabanu6785 • 2 years ago
Hi, were you able to find a solution?
@vishalrajmane7649 • 3 years ago
If you have one, please provide me the link.
@AWSTutorialsOnline • 3 years ago
I don't have any video on incremental updates. But if you are ingesting data from a relational database, there are two methods which can work: 1) using a Lake Formation Blueprint, or 2) using the Amazon Database Migration Service (DMS) to move data to S3. I have videos about Blueprints and DMS, but they do not cover the incremental-update scenario. You can check them on my channel and go through the AWS documentation to understand the incremental-update part.