Awesome!! Best in the entire YouTube inventory. Please don't stop making these types of videos.
@simij851 2 years ago
Thank you, awesome video. Without using Step Functions, and with the same concept, will I be able to read the tables sequentially? I have 150 tables to read, and creating parallel tasks in a Step Function might be tedious, so I was wondering if we can read them in a loop?
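A minimal sketch of the loop idea, assuming all 150 tables live in one Glue Data Catalog database; the database name is a placeholder:

```python
import sys
import boto3
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glueContext = GlueContext(SparkContext())

# List every table in the catalog database, then read them one by one
glue = boto3.client("glue")
paginator = glue.get_paginator("get_tables")
for page in paginator.paginate(DatabaseName="my_database"):  # placeholder name
    for table in page["TableList"]:
        dyf = glueContext.create_dynamic_frame.from_catalog(
            database="my_database",
            table_name=table["Name"],
        )
        # ... transform and write each table sequentially here ...
```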
@afjalahamad2465 2 years ago
Please make videos on the AWS Glue Schema Registry.
@gunjanagrawal7014 A year ago
Hi, that was a really nice explanation. Question: we have multiple in-house JSON source data files which come with a header and footer, and arrive at different times from different sources. What do we need? We want to land these files in S3 and then run a Glue job to write the data to different Aurora PostgreSQL databases. We have 20 sources, so we are looking for a parameterized solution. Please guide, or share a code snippet if you have one.
@AWSTutorialsOnline A year ago
Unless there is a common pattern across these files which can be parameterized, I would recommend you create separate jobs for each file.
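If the files do share a pattern, a minimal sketch of the parameterized approach; the argument names `source_prefix` and `target_table` are hypothetical job parameters, and the JDBC connection details are placeholders:

```python
import sys
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext

# Each of the 20 sources passes its own values for these arguments,
# e.g. --source_prefix s3://bucket/source1/ --target_table schema.table1
args = getResolvedOptions(sys.argv, ["JOB_NAME", "source_prefix", "target_table"])

glueContext = GlueContext(SparkContext())
spark = glueContext.spark_session

df = spark.read.json(args["source_prefix"])

# Write to Aurora PostgreSQL over JDBC
(df.write.format("jdbc")
   .option("url", "jdbc:postgresql://my-aurora-endpoint:5432/mydb")
   .option("dbtable", args["target_table"])
   .option("user", "my_user")
   .option("password", "my_password")
   .mode("append")
   .save())
```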
@faingtoku A year ago
Is it possible to do something similar while streaming different JSONs with Kinesis and storing them to a DB?
@AWSTutorialsOnline A year ago
It might not be possible to do with streaming data, because Glue works with a fixed schema for the data coming in from Kinesis.
@faingtoku A year ago
@@AWSTutorialsOnline Thank you for your response! Then how could I stream different JSONs from multiple sources to Kinesis and dump them into different DB tables with PySpark/Glue? Should I add a special key to each JSON so I can detect which transformation to use?
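One way to sketch the special-key routing the question describes, assuming each producer stamps a hypothetical `record_type` field on its records; the stream ARN, table names, and JDBC details are placeholders:

```python
from pyspark.sql import functions as F
from pyspark.context import SparkContext
from awsglue.context import GlueContext

glueContext = GlueContext(SparkContext())

# Streaming read from Kinesis with schema inference
kinesis_df = glueContext.create_data_frame.from_options(
    connection_type="kinesis",
    connection_options={
        "streamARN": "arn:aws:kinesis:us-east-1:123456789012:stream/my-stream",
        "startingPosition": "TRIM_HORIZON",
        "inferSchema": "true",
    },
)

# Route each micro-batch by the type field stamped on every JSON record
def process_batch(batch_df, batch_id):
    for record_type, table in [("orders", "orders_table"), ("users", "users_table")]:
        subset = batch_df.filter(F.col("record_type") == record_type)
        (subset.write.format("jdbc")
               .option("url", "jdbc:postgresql://my-db:5432/mydb")
               .option("dbtable", table)
               .option("user", "my_user")
               .option("password", "my_password")
               .mode("append")
               .save())

kinesis_df.writeStream.foreachBatch(process_batch).start().awaitTermination()
```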
@peterpan9361 2 years ago
Can you make a video on how to move SharePoint data to AWS S3? This is a common requirement for many big companies, but I could find no automated solution. I believe we can do it using AWS Lambda, making an API call to SharePoint, but I'm not sure how. Can you please assist :)
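A rough sketch of that Lambda idea, assuming the file is fetched through the Microsoft Graph API; the token helper, site/item IDs, bucket, and key are all placeholders, not a tested solution:

```python
import boto3
import urllib.request

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # Hypothetical helper: obtain an OAuth token for Microsoft Graph
    # (e.g. via an Azure AD client-credentials flow).
    token = get_graph_token()

    # Download a SharePoint file via the Graph drive-items endpoint;
    # {site-id} and {item-id} are placeholders to fill in.
    url = ("https://graph.microsoft.com/v1.0/sites/{site-id}"
           "/drive/items/{item-id}/content")
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        body = resp.read()

    # Land the file in S3
    s3.put_object(Bucket="my-landing-bucket", Key="sharepoint/file.xlsx", Body=body)
```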
@sriadityab4794 2 years ago
Can we assign Spark properties like driver and executor memory for a Glue job?
@AWSTutorialsOnline 2 years ago
You cannot set either, as Glue is an AWS managed service. However, you can select WorkerType and NumberOfWorkers as parameters, which decide the overall vCPU, memory, and disk space allocated to the job.
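For example, when starting a run via boto3 (the job name is a placeholder):

```python
import boto3

glue = boto3.client("glue")

# Capacity is controlled through worker type and count rather than
# Spark driver/executor memory settings.
glue.start_job_run(
    JobName="my-etl-job",
    WorkerType="G.1X",      # each G.1X worker: 4 vCPU, 16 GB memory
    NumberOfWorkers=10,
)
```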
@tamara28899hi A year ago
How would you manage version control for the transformation code stored in S3?
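One option is S3 object versioning itself (bucket name is a placeholder); many teams instead keep the scripts in Git and deploy them to S3 through CI/CD:

```python
import boto3

s3 = boto3.client("s3")

# Keep every revision of the transformation scripts uploaded to this bucket
s3.put_bucket_versioning(
    Bucket="my-glue-scripts-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)
```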
@arvindsinha1566 A year ago
I have chart CSV files containing 1-minute-duration OHLC (open, high, low, close) data. I want to generate 5-minute, 30-minute, and 1-hour-duration OHLC data. How do I achieve this using Glue? I can have multiple CSV files.
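A minimal PySpark sketch of the resampling, assuming columns named `timestamp`, `open`, `high`, `low`, `close` (swap in the real names); the struct trick picks the open from the earliest 1-minute bar and the close from the latest:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

df = (spark.read.csv("s3://my-bucket/ohlc/", header=True, inferSchema=True)
           .withColumn("ts", F.to_timestamp("timestamp")))

def resample(df, duration):
    # min/max over (ts, value) structs compare on ts first, so they return
    # the open of the earliest bar and the close of the latest bar.
    return (df.groupBy(F.window("ts", duration).alias("bucket"))
              .agg(F.min(F.struct("ts", "open"))["open"].alias("open"),
                   F.max("high").alias("high"),
                   F.min("low").alias("low"),
                   F.max(F.struct("ts", "close"))["close"].alias("close")))

bars_5m = resample(df, "5 minutes")
bars_30m = resample(df, "30 minutes")
bars_1h = resample(df, "1 hour")
```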
@simij851 2 years ago
Thank you for doing this. I tried it, and it was super helpful. But randomly I would get this error: "An error occurred while calling z:com.amazonaws.services.glue.util.Job.commit. Continuation update failed due to version mismatch. Expected version 103 but found version 105." The reason is that with concurrency and bookmarks both enabled, Glue gets confused when parallel jobs complete and each do a job commit(). If you know how to handle this situation, that would be awesome.
@simij851 2 years ago
Removing bookmarks helps resolve the error, but I need bookmarks enabled for all the tables I'm running concurrently. I'm wondering if I should try changing the Glue job script to job.init(args["JOB_NAME"] + args["ctbl"], args), and within the Step Function, when specifying the job name, give "JobName": "JOBNAME+ctbl".
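That idea sketched out (untested; whether Glue keys bookmark state off the name passed to init is an assumption here, and `ctbl` is the per-table job parameter from the comment):

```python
import sys
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ["JOB_NAME", "ctbl"])

glueContext = GlueContext(SparkContext())
job = Job(glueContext)

# Give each concurrent run its own bookmark context by appending the
# table name, so parallel commits stop racing on one bookmark version.
job.init(args["JOB_NAME"] + "_" + args["ctbl"], args)

# ... per-table ETL here ...

job.commit()
```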