Importing CSV files from S3 into Redshift with AWS Glue

81,035 views

Majestic.cloud

A day ago

Comments: 59

@anandd3081 4 years ago

Just amazing! I can't find more words for this wonderful video. Please keep adding more videos. Thank you, sir.

@johnfromireland7551 4 years ago

Advice for Majestic Cloud: pause your screen recorder earlier in the video while you are waiting for services to be provisioned. Apart from that, you flashed through the various screens, and the text is far too tiny to see.

@Majesticcloud 4 years ago

Thank you for the advice. I will take it into consideration for future videos and when I do a remake of this one.

@tarvinder91 4 years ago

This is amazing. You also explained why we are using this setup. Great job!

@joshvanathan1000 5 years ago

Clear explanation from scratch 😉😁. Thank you!!

@vigneshjaisankar7087 2 years ago

Thanks for the content. I'm new to AWS, so it would be very helpful if you could clarify this. From my understanding, S3 is the source and Redshift is the target. First, we create a table in Redshift. Second, we read the data from S3 and create a data store for it in Glue. Third, we read the schema from Redshift and create a data store for it as well. Then we create a job that connects the S3 data store to the Redshift data store, and when we run the job, the data gets copied from S3 to Redshift. The connection is used to reach the Redshift cluster when the job runs. Is my understanding correct?
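The data stores described in this comment correspond to two Glue crawlers: one over the CSV prefix in S3 and one over the Redshift table through the JDBC connection. A minimal sketch, assuming boto3's Glue client; it only builds the create_crawler arguments unless a client is passed in, and the bucket, role, catalog database, and include path are hypothetical placeholders.

```python
"""Sketch: the two crawlers from the walkthrough (S3 source, Redshift target).

Bucket, role, database, and include-path values are hypothetical placeholders.
"""
from typing import Any, Optional


def s3_crawler_request(name: str, role: str, database: str, s3_path: str) -> dict[str, Any]:
    # Arguments for glue.create_crawler(): catalog the CSV files under s3_path.
    return {
        "Name": name,
        "Role": role,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def redshift_crawler_request(name: str, role: str, database: str,
                             connection: str, include_path: str) -> dict[str, Any]:
    # Arguments for glue.create_crawler(): read the Redshift table's schema
    # through an existing Glue JDBC connection (include_path is db/schema/table).
    return {
        "Name": name,
        "Role": role,
        "DatabaseName": database,
        "Targets": {"JdbcTargets": [{"ConnectionName": connection, "Path": include_path}]},
    }


def create_crawlers(requests: list[dict[str, Any]], glue_client: Optional[Any] = None) -> list[str]:
    # With boto3.client("glue") this creates each crawler; without a client
    # it only returns the names, so the sketch also runs offline.
    if glue_client is not None:
        for request in requests:
            glue_client.create_crawler(**request)
    return [request["Name"] for request in requests]


crawler_requests = [
    s3_crawler_request("csv-source-crawler", "HypotheticalGlueRole",
                       "productdb", "s3://example-bucket/taxables/"),
    redshift_crawler_request("redshift-target-crawler", "HypotheticalGlueRole",
                             "productdb", "myRedshiftConnection", "dev/public/%"),
]
print(create_crawlers(crawler_requests))
```

With a real client, each crawler can then be started with glue.start_crawler(Name=...) before the job runs.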

@manojkumarchallagund 3 years ago

Hi, could you tell me how we promote a Glue job to a higher environment? For example, taking an exported copy of a job from the development environment to the SIT environment. Is there any option for that?
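One scripted way to do this, sketched under assumptions: read the job definition with boto3's get_job in the development environment and recreate it with create_job in the SIT environment, dropping server-side metadata and overriding environment-specific values such as the role. The two clients and any override values are hypothetical; the script file in S3 and any Glue connections the job references must also exist in the target environment.

```python
"""Sketch: copy a Glue job definition between environments with boto3."""
from typing import Any, Optional

# Returned by get_job() but not wanted in create_job(): CreatedOn and
# LastModifiedOn are server-side metadata; AllocatedCapacity is the
# deprecated duplicate of MaxCapacity.
DROP_FIELDS = {"CreatedOn", "LastModifiedOn", "AllocatedCapacity"}


def job_to_create_request(job: dict[str, Any],
                          overrides: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    request = {key: value for key, value in job.items() if key not in DROP_FIELDS}
    # MaxCapacity must not be combined with WorkerType/NumberOfWorkers.
    if "WorkerType" in request:
        request.pop("MaxCapacity", None)
    # Environment-specific values (hypothetical: role, script location, connections).
    request.update(overrides or {})
    return request


def promote_job(name: str, source_glue: Any, target_glue: Any,
                overrides: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    # source_glue / target_glue: boto3 Glue clients for the two environments,
    # e.g. created from different profiles or accounts.
    job = source_glue.get_job(JobName=name)["Job"]
    request = job_to_create_request(job, overrides)
    target_glue.create_job(**request)
    return request
```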

@KhangNguyen-iz9pb 4 years ago

Thanks for the helpful video! I am wondering whether we can make the job write to Redshift without first running the crawler on Redshift.

@SunilBholaTrader 3 years ago

The crawler populates the Glue Data Catalog with table metadata; the job does all the ETL work.

@illiprontimusic9764 4 years ago

Geez... Thanks, it works. Although this is complicated as hell. With Azure ADF, this video would be 10 minutes max... it is WAY easier!

@Leonquenix 2 years ago

This helped a lot, my friend. Thank you very much!

@keshavamugulursrinivasiyen5502 3 years ago

Is it possible to run this scenario on a Free Tier account?

@user-ix1ob1hr1b 4 years ago

Hi, do you need to have the target table created in Redshift before creating the job?

@BabAcademy 4 years ago

Yeah, most likely.

@SunilBholaTrader 3 years ago

The job can create the table, or, if the table already exists, you can choose it on the "target" screen.

@qs6190 4 years ago

Thank you! Very nice and concise!

@Majesticcloud 4 years ago

Glad it was helpful!

@crescentbabu7855 1 year ago

Great tutorial

@Majesticcloud 1 year ago

Glad you think so!

@misatran1107 2 years ago

Hello sir, thanks for your video. Hmm... I wonder why I should create a crawler from Redshift to a database. I think creating a job to transfer from S3 to Redshift is enough.

@RaushanKumar_nitks 4 years ago

Great explanation from scratch. Thank you very much!!!

@makrantim 4 years ago

If the CSV file has integers, are they handled by Glue and Redshift? I get an error. Thanks. The video example uses double.

@SunilBholaTrader 3 years ago

You have the option to modify the schema. That's why data cleansing comes first, before the load.
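In a generated Glue script, that schema change usually lives in the ApplyMapping step, whose mappings are (source_name, source_type, target_name, target_type) tuples. Because the awsglue library only runs inside Glue, the sketch below reproduces the idea in plain Python on CSV text: it casts each column according to mappings of the same shape and reports the lines that would fail, so bad values can be cleansed before the load. The column names are hypothetical.

```python
import csv
import io
from typing import Any, Callable

# Same tuple shape as Glue's ApplyMapping mappings:
# (source_name, source_type, target_name, target_type). Hypothetical columns.
MAPPINGS = [
    ("id", "string", "id", "int"),
    ("amount", "string", "amount", "double"),
    ("name", "string", "name", "string"),
]

CASTS: dict[str, Callable[[str], Any]] = {"int": int, "double": float, "string": str}


def apply_mappings(csv_text: str, mappings: list[tuple[str, str, str, str]]
                   ) -> tuple[list[dict[str, Any]], list[int]]:
    """Cast each CSV record per the mappings; return (good_rows, bad_line_numbers)."""
    rows: list[dict[str, Any]] = []
    bad_lines: list[int] = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Line 1 is the header; assumes one record per line (no embedded newlines).
    for line_no, record in enumerate(reader, start=2):
        try:
            rows.append({target: CASTS[target_type](record[source].strip())
                         for source, _, target, target_type in mappings})
        except (ValueError, KeyError, AttributeError):
            # ValueError: e.g. int("1.0"); AttributeError: missing cell (None).
            bad_lines.append(line_no)
    return rows, bad_lines


good, bad = apply_mappings("id,amount,name\n1,2.5,a\n1.0,3,b\n", MAPPINGS)
print(good, bad)
```

Reporting bad lines instead of stopping at the first one makes it easier to see how much of the file needs cleansing.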

@keshavamugulursrinivasiyen5502 3 years ago

@Majesticcloud Can you or anyone help me load a Date-type column (input file format MM/DD/YYYY) into Redshift using Glue? I mean, how do I change it in the Glue script? I'd appreciate your help.
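One approach is to normalize the value to ISO YYYY-MM-DD, which Redshift accepts for DATE columns, before it is written. Inside a Glue (Spark) script the equivalent would be pyspark's to_date(column, "MM/dd/yyyy"); if the file is instead loaded with Redshift's COPY command, its DATEFORMAT 'MM/DD/YYYY' option does the parsing. The plain-Python function below is only a sketch of the conversion itself.

```python
from datetime import datetime
from typing import Optional


def us_date_to_iso(value: Optional[str]) -> Optional[str]:
    """Convert 'MM/DD/YYYY' (e.g. '03/15/2021') to ISO 'YYYY-MM-DD'."""
    if value is None or not value.strip():
        return None  # empty cells become NULL instead of failing the load
    # strptime raises ValueError for impossible dates such as 13/40/2021.
    return datetime.strptime(value.strip(), "%m/%d/%Y").date().isoformat()


for raw in ["03/15/2021", "3/5/2021", ""]:
    print(repr(raw), "->", us_date_to_iso(raw))
```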

@priyankasuneja4781 4 years ago

Please share an example of loading from CSV into a database where an existing record is updated when it changes and a new record is inserted. Job bookmarks just insert, even in the case of updates... we want to update the existing record.

@SunilBholaTrader 3 years ago

Databases don't have a bookmark feature for updates, but files do.
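For file sources, job bookmarks are switched on per run with the special job parameter --job-bookmark-option; the script also has to pass a transformation_ctx to its sources and call job.commit() for the bookmark state to be saved. Bookmarks only skip files that were already processed; they do not turn inserts into updates. A minimal sketch with boto3's Glue client, where the job name is hypothetical:

```python
from typing import Any, Optional

# Values Glue accepts for the --job-bookmark-option special parameter.
BOOKMARK_OPTIONS = {"job-bookmark-enable", "job-bookmark-disable", "job-bookmark-pause"}


def job_run_request(job_name: str, bookmark: str = "job-bookmark-enable") -> dict[str, Any]:
    # Arguments for glue.start_job_run() with the bookmark option set.
    if bookmark not in BOOKMARK_OPTIONS:
        raise ValueError(f"unknown bookmark option: {bookmark!r}")
    return {"JobName": job_name, "Arguments": {"--job-bookmark-option": bookmark}}


def start_incremental_run(job_name: str, glue_client: Optional[Any] = None) -> Optional[str]:
    # With boto3.client("glue"), start_job_run() returns {"JobRunId": ...};
    # without a client nothing is started and None is returned.
    request = job_run_request(job_name)
    if glue_client is None:
        return None
    return glue_client.start_job_run(**request)["JobRunId"]


print(job_run_request("csv-to-redshift"))  # hypothetical job name
```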

@tusharmayekar9072 4 years ago

How can we set column encodings and also choose the dist key or sort key? Could you please share a video on that? This is a very nice explanation. Thanks.
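One option is to create the target table yourself before running the job and then pick it on the job's target screen, so that compression encodings, the distribution key, and the sort key are under your control. A minimal sketch that renders such Redshift DDL; the table, columns, and encoding choices are hypothetical examples, not recommendations for this dataset.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Column:
    name: str
    sql_type: str
    encode: Optional[str] = None  # e.g. "az64" or "zstd"; None leaves the choice to Redshift


def create_table_ddl(table: str, columns: Sequence[Column],
                     distkey: Optional[str] = None,
                     sortkeys: Sequence[str] = ()) -> str:
    # Refuse keys that are not columns of the table.
    names = {column.name for column in columns}
    for key in ([distkey] if distkey else []) + list(sortkeys):
        if key not in names:
            raise ValueError(f"{key!r} is not a column of {table}")
    lines = []
    for column in columns:
        line = f"    {column.name} {column.sql_type}"
        if column.encode:
            line += f" ENCODE {column.encode}"
        lines.append(line)
    ddl = f"CREATE TABLE IF NOT EXISTS {table} (\n" + ",\n".join(lines) + "\n)"
    if distkey:
        ddl += f"\nDISTSTYLE KEY DISTKEY ({distkey})"
    if sortkeys:
        ddl += f"\nSORTKEY ({', '.join(sortkeys)})"
    return ddl + ";"


print(create_table_ddl(
    "public.taxables",  # hypothetical table and columns
    [Column("id", "INTEGER", "az64"),
     Column("state", "VARCHAR(2)", "zstd"),
     Column("rate", "DOUBLE PRECISION")],
    distkey="id",
    sortkeys=["state"],
))
```

DISTKEY places rows with the same key value on the same slice, and SORTKEY orders rows on disk, so both are usually chosen from the columns the queries join and filter on.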

@anandd3081 4 years ago

Thank you, sir. Very useful video; I tried implementing it myself today. However, I'm facing an issue when testing the connection to Redshift. It reports: "myRedshiftConnection failed. VPC S3 endpoint validation failed for SubnetId nnnnnnnn VPC: vpc-1234abcd21. Reason: Could not find S3 endpoint or NAT gateway for subnetId: subnet-1234abcd13 in Vpc vpc-1234abcd1."

@SunilBholaTrader 3 years ago

Create an endpoint for S3 in the VPC.

@keshavamugulursrinivasiyen5502 3 years ago

@SunilBholaTrader I got the same error too. I created the endpoint for S3 (interface as well as gateway), but I still get the same error. Any suggestions would be appreciated.

@SunilBholaTrader 3 years ago

@keshavamugulursrinivasiyen5502 Create a self-referencing security group.

@keshavamugulursrinivasiyen5502 3 years ago

@SunilBholaTrader Thanks, I will try it and let you know.

@keshavamugulursrinivasiyen5502 3 years ago

@Sunil Bhola, got it. I tried it and it is working fine.
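Both fixes from this thread can be scripted with boto3's EC2 client, sketched below; the VPC, route table, and security group IDs and the region are hypothetical placeholders. Glue expects the connection's security group to have an inbound rule whose source is the group itself, and a gateway endpoint only helps when it is associated with the route table that the connection's subnet actually uses, which may be why creating endpoints alone did not clear the error above.

```python
from typing import Any


def s3_gateway_endpoint_request(vpc_id: str, region: str,
                                route_table_ids: list[str]) -> dict[str, Any]:
    # Arguments for ec2.create_vpc_endpoint(): a Gateway endpoint for S3,
    # attached to the route tables used by the Glue connection's subnet.
    return {
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "VpcEndpointType": "Gateway",
        "RouteTableIds": route_table_ids,
    }


def self_referencing_rule_request(group_id: str) -> dict[str, Any]:
    # Arguments for ec2.authorize_security_group_ingress(): allow all TCP
    # ports from members of the same security group.
    return {
        "GroupId": group_id,
        "IpPermissions": [{
            "IpProtocol": "tcp",
            "FromPort": 0,
            "ToPort": 65535,
            "UserIdGroupPairs": [{"GroupId": group_id}],
        }],
    }


def fix_glue_connection_network(ec2_client: Any, vpc_id: str, region: str,
                                 route_table_ids: list[str], group_id: str) -> None:
    # ec2_client: boto3.client("ec2", region_name=region)
    ec2_client.create_vpc_endpoint(**s3_gateway_endpoint_request(vpc_id, region, route_table_ids))
    ec2_client.authorize_security_group_ingress(**self_referencing_rule_request(group_id))


# Hypothetical IDs; the ones in the error message above are redacted.
print(s3_gateway_endpoint_request("vpc-0example", "us-east-1", ["rtb-0example"]))
```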

@v1ct0rx24 4 years ago

How about from DynamoDB to S3? Please help.

@adityanjsg99 2 years ago

To the point!! No nonsense

@FP-mg5qk 4 years ago

Is it possible to do this but with DynamoDB instead of Redshift?

@SunilBholaTrader 3 years ago

Anything with ODBC/JDBC, and many other options.

@vamsikrishna4691 4 years ago

Good explanation, very clear ... Thank You!!

@piyushsonigra1979 4 years ago

How can I overwrite the existing data, or make Glue take only the incremental data from the S3 file? No luck with bookmarks.

@Majesticcloud 4 years ago

Could you explain in a bit more detail what you're trying to accomplish? I'm not sure I understood the question exactly.

@priyankasuneja4781 4 years ago

Hey Piyush, what approach did you use to accomplish the upsert task?

@vijayravi1189 4 years ago

@priyankasuneja4781 First, create a staging table in Redshift with the same schema as the original table but no data. Then copy the data in S3 into the staging table. Delete the rows that are present in both the staging and the original table (say you are pulling data from the past two weeks). Then insert the data from the staging table into the original table. The original table now holds the updated rows as well as the new ones. This is a classic case of leveraging transactions in Redshift:

    # db is the commenter's Redshift connection wrapper (its setup is not shown)
    merge_qry = """
    begin;
    copy mysql_dwh_staging.orders
    from 's3://mysql-dwh-52820/current/orders.csv'
    iam_role 'arn:aws:iam:::role/redshift-test-role'
    CSV QUOTE '\"' DELIMITER ',' acceptinvchars;
    delete from mysql_dwh.orders
    using mysql_dwh_staging.orders
    where mysql_dwh.orders.order_id = mysql_dwh_staging.orders.order_id;
    insert into mysql_dwh.orders select * from mysql_dwh_staging.orders;
    truncate table mysql_dwh_staging.orders;
    end;
    """
    result = db.query(merge_qry)

@SunilBholaTrader 3 years ago

Use a staging table in Redshift: populate it, then update the main table from it.
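The staging-table merge from this thread can be tried locally. This sketch runs the same delete-then-insert pattern in Python against an in-memory SQLite database as a stand-in for Redshift: SQLite has no COPY, TRUNCATE, or DELETE ... USING, so a plain INSERT fills the staging table, a subquery drives the delete, and DELETE empties the staging table. Table and column names follow the comment above; the rows are made up.

```python
import sqlite3


def upsert_via_staging(conn: sqlite3.Connection, rows: list[tuple[int, str]]) -> None:
    """Delete-then-insert merge through a staging table, in one transaction."""
    with conn:  # commits on success, rolls back if any statement fails
        # Stand-in for COPY ... FROM 's3://...' into the staging table.
        conn.executemany("INSERT INTO orders_staging (order_id, status) VALUES (?, ?)", rows)
        # Redshift form: DELETE FROM orders USING orders_staging WHERE ...
        conn.execute("DELETE FROM orders WHERE order_id IN (SELECT order_id FROM orders_staging)")
        conn.execute("INSERT INTO orders SELECT * FROM orders_staging")
        # Redshift would use TRUNCATE here; SQLite empties the table with DELETE.
        conn.execute("DELETE FROM orders_staging")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE orders_staging (order_id INTEGER, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "new"), (2, "new")])
conn.commit()

upsert_via_staging(conn, [(2, "shipped"), (3, "new")])
print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
# [(1, 'new'), (2, 'shipped'), (3, 'new')]
```

In Redshift itself, TRUNCATE commits the transaction it runs in, which is harmless in the query above only because it is the last statement.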

@timothyzee6592 5 years ago

Can I use MS MySQL to load the database into AWS?

@SunilBholaTrader 3 years ago

Export MySQL to a file and upload it to S3.
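A minimal sketch of that export step in Python, assuming a DB-API 2.0 cursor from whichever MySQL driver is installed (SQLite stands in below so the sketch runs anywhere) and boto3's S3 client for the upload; the table, query, bucket, and key are hypothetical.

```python
import csv
import io
import sqlite3
from typing import Any, Optional


def query_to_csv(cursor: Any, query: str) -> str:
    """Run a query on a DB-API 2.0 cursor and return the result as CSV text."""
    cursor.execute(query)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([column[0] for column in cursor.description])  # header row
    writer.writerows(cursor.fetchall())
    return out.getvalue()


def upload_csv(csv_text: str, bucket: str, key: str, s3_client: Optional[Any] = None) -> bool:
    # With boto3.client("s3") this writes the object; returns False when skipped.
    if s3_client is None:
        return False
    s3_client.put_object(Bucket=bucket, Key=key, Body=csv_text.encode("utf-8"))
    return True


# SQLite stands in for MySQL here; with a MySQL driver only the connection changes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxables (id INTEGER, state TEXT)")
conn.executemany("INSERT INTO taxables VALUES (?, ?)", [(1, "CA"), (2, "NY")])
text = query_to_csv(conn.cursor(), "SELECT id, state FROM taxables ORDER BY id")
print(text)
upload_csv(text, "example-bucket", "taxables/taxables.csv")  # no client given: skipped
```

For large tables, writing to a local file in batches with cursor.fetchmany() and uploading it with s3.upload_file() avoids holding the whole table in memory.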

@advaitz 4 years ago

How do I create a Glue workflow without the console?

@SunilBholaTrader 3 years ago

There is a GUI for Glue workflows; otherwise, use the SDK.
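With the SDK, a workflow is a named container plus triggers that chain its steps. A sketch with boto3's Glue client for a crawler-then-job flow like the one in the video; the workflow, crawler, and job names are hypothetical. Once created, a run can be started with glue.start_workflow_run(Name=...).

```python
from typing import Any


def workflow_requests(workflow: str, crawler: str, job: str) -> list[tuple[str, dict[str, Any]]]:
    """(method, kwargs) pairs for a crawler -> job workflow."""
    return [
        ("create_workflow", {"Name": workflow}),
        # Start the crawler when the workflow is run on demand.
        ("create_trigger", {
            "Name": f"{workflow}-start",
            "WorkflowName": workflow,
            "Type": "ON_DEMAND",
            "Actions": [{"CrawlerName": crawler}],
        }),
        # Run the job once the crawler has succeeded.
        ("create_trigger", {
            "Name": f"{workflow}-after-crawl",
            "WorkflowName": workflow,
            "Type": "CONDITIONAL",
            "StartOnCreation": True,
            "Predicate": {"Conditions": [{
                "LogicalOperator": "EQUALS",
                "CrawlerName": crawler,
                "CrawlState": "SUCCEEDED",
            }]},
            "Actions": [{"JobName": job}],
        }),
    ]


def create_workflow(glue_client: Any, workflow: str, crawler: str, job: str) -> None:
    # glue_client: boto3.client("glue"); calls each method in order.
    for method, kwargs in workflow_requests(workflow, crawler, job):
        getattr(glue_client, method)(**kwargs)


for method, kwargs in workflow_requests("csv-to-redshift-wf", "csv-source-crawler", "csv-to-redshift"):
    print(method, kwargs["Name"])
```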

@vsr1727 3 years ago

Thank you 👍🙏👌

@mujtabahussain2293 4 years ago

Very useful. Thanks a lot.

@MrLangam 4 years ago

Thank you kind sir.

@karennatally8409 2 years ago

Thank you so much!

@jigri_pokhri 4 years ago

Legend

@dalwindersingh5902 4 years ago

This looks confusing and wrong. Why, at the end, are you showing data from the taxable_csv table? The ETL job settings show taxable_csv as the source and productdb_public_taxables_csv as the target, so you should show the target data in productdb_public_taxables_csv. I don't understand the role of productdb_public_taxables_csv.