AWS Tutorials - Access S3 Data in Amazon Redshift using Redshift Spectrum

18,867 views

Comments: 50

@nareshdulam58 3 years ago

A friendly suggestion: sometimes your voice is not clear, but content-wise your video is amazing, and so is your explanation :)

@AWSTutorialsOnline 3 years ago

Thanks Naresh, I will work on improving the quality.

@mallik1232 3 years ago

I regularly watch all your videos. Nice explanations and use cases.

@AWSTutorialsOnline 3 years ago

Glad you like them!

@duggidk 2 years ago

The audio is not good, please fix it. This is very useful.

@勝己加古 3 years ago

Thank you very much. Thanks to your clear explanation, I was able to query data stored in S3 using Redshift Spectrum.

@AWSTutorialsOnline 3 years ago

You're welcome.

@gianniskamakas3578 2 years ago

Someone needs to get this guy a mic. The information and the way it is presented are awesome. Bravo, my friend.

@AWSTutorialsOnline 2 years ago

:) Thanks. Sorry for the bad audio quality. I changed my mic in later videos. I need to find some time to re-record these old videos.

@lizzychen7665 2 years ago

omg your tutorial is awesome! Saving my life!!!

@2kpravin 3 years ago

Very informative video and content! Thanks for sharing it. A friendly suggestion: please use a good microphone if possible.

@AWSTutorialsOnline 3 years ago

Thanks, sure. I will work on the sound quality.

@himanshumehta0703 3 years ago

Very well explained 👍

@mvjrao123 2 years ago

This is what I was looking for for my use case. We use a similar process at work. My question: what is the difference between creating an Athena table and running queries on it, versus creating a table in Redshift as you showed and running the queries there?

@AWSTutorialsOnline 2 years ago

You use Spectrum only when you need to access S3 data from within Redshift. Athena is a way to query S3 data outside Redshift.
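A minimal sketch of how the two can work over the same data, assuming the S3 files are already registered as a table in a Glue Data Catalog database that Athena also queries. The names `spectrum_demo`, `my_glue_db`, `sales` and the role ARN are hypothetical placeholders. Athena would query `my_glue_db.sales` directly; Redshift reaches the same table through an external schema.

```sql
-- Hypothetical names throughout: spectrum_demo (Redshift external schema),
-- my_glue_db (Glue Data Catalog database), sales (table), and the role ARN.
CREATE EXTERNAL SCHEMA IF NOT EXISTS spectrum_demo
FROM DATA CATALOG
DATABASE 'my_glue_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole';

-- The same S3 data that Athena sees as my_glue_db.sales, queried from Redshift.
SELECT COUNT(*) FROM spectrum_demo.sales;
```

Because the table definition lives in the shared catalog, Spectrum mainly earns its place when that S3 data has to be combined with local Redshift tables in the same SQL statement.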

@takeiteasy1868 3 years ago

Hi, I am a big fan of your tutorials. Thank you for your informative videos.

@AWSTutorialsOnline 3 years ago

Thanks for your appreciation

@takeiteasy1868 3 years ago

@AWSTutorialsOnline I sent you an email a few days back and checked your response today. My name is Amit. I will reply to you shortly; I need your guidance. Regards

@AWSTutorialsOnline 3 years ago

Sure, looking forward to it.

@arunanshuchakraborty2192 2 years ago

Thanks for this video. I have a question: if I have multiple external schemas in a database, how can I create schema-specific external tables? I see that a table, once created in a particular schema, gets replicated in all schemas of the same database. I'm not sure how to avoid that. Looking forward to your suggestion. Thanks!!

@AWSTutorialsOnline 2 years ago

I think there is some confusion. An external table is created in one schema within a database; I don't think it gets replicated to all the schemas.
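One possible cause worth checking, offered as an assumption rather than a diagnosis: in Redshift an external schema is a pointer to a database in the external catalog, and external tables are stored in that external database. If several external schemas point to the same catalog database, a table created through one of them shows up in all of them. A sketch that keeps them apart by giving each schema its own catalog database; all schema, database, and role names are hypothetical.

```sql
-- Each external schema maps to a different Glue Data Catalog database.
CREATE EXTERNAL SCHEMA IF NOT EXISTS sales_ext
FROM DATA CATALOG
DATABASE 'sales_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;

CREATE EXTERNAL SCHEMA IF NOT EXISTS hr_ext
FROM DATA CATALOG
DATABASE 'hr_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;

-- Show which catalog database each external schema points to;
-- schemas that share a databasename also share their external tables.
SELECT schemaname, databasename
FROM svv_external_schemas
ORDER BY databasename, schemaname;
```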

@gayathrichakravarthy1056 3 years ago

Thanks for the video! I'd like to ask a question: if I have a crawler updating the Glue Catalog, will the schema changes also be picked up by the Redshift external table?

@AWSTutorialsOnline 3 years ago

To be honest, I have not thought about or tried that. Let me experiment with it and confirm.
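A quick way to run that experiment yourself, as a sketch: assuming the external schema was created `FROM DATA CATALOG` against the database the crawler writes to, record the columns Redshift reports for the table, run the crawler after the files change, then run the same query again. The names `s3dataschema` and `countrybusinessindex` are borrowed from later in this thread; substitute your own schema and table.

```sql
-- Columns Redshift currently sees for the external table.
-- Compare the output before and after the crawler runs.
SELECT columnnum, columnname, external_type
FROM svv_external_columns
WHERE schemaname = 's3dataschema'
  AND tablename  = 'countrybusinessindex'
ORDER BY columnnum;
```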

@pardeep657 3 years ago

Thanks for the video. When you were explaining the costs involved, there was a sound issue. Can you explain the pricing part? Also, what about performance: is it similar to querying a table on the cluster?

@AWSTutorialsOnline 3 years ago

Hi, querying a local table is faster than querying an external table (Redshift Spectrum). There are ways to improve it, such as storing the data in Parquet format and partitioning it. For cost, you can refer to this link: aws.amazon.com/premiumsupport/knowledge-center/redshift-spectrum-query-charges/ It is $5 per TB of data scanned by the query.
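To get a rough idea of what your own Spectrum queries cost at the $5 per TB rate quoted above, you can sum the bytes each query scanned in S3 from the `SVL_S3QUERY_SUMMARY` system view. This is a sketch only: it treats 1 TB as 2^40 bytes (an assumption) and ignores any per-query rounding or minimum charge, so check the pricing page for the exact billing rules.

```sql
-- Approximate Spectrum scan cost per query at $5 per TB scanned.
-- The view can hold several rows per query, so bytes are summed per query.
SELECT query,
       SUM(s3_scanned_bytes)                        AS scanned_bytes,
       SUM(s3_scanned_bytes) / POWER(1024, 4) * 5.0 AS approx_cost_usd
FROM svl_s3query_summary
GROUP BY query
ORDER BY query DESC
LIMIT 10;
```

Comparing `scanned_bytes` for the same query against a CSV copy and a Parquet copy of a table is a simple way to see how much the columnar format reduces the data scanned.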

@vishwarajgupta1963 3 years ago

Thanks for your video. Very informative. Can you please share an example of a join between external and internal schema tables? Thanks.

@AWSTutorialsOnline 3 years ago

A query joining an external and an internal table can look like the following (replace the placeholder schema, table, and column names with your own): SELECT i.column1, i.column2, e.column1, e.column2 FROM internal_schema.internal_table AS i JOIN external_schema.external_table AS e ON i.columnA = e.columnB; Hope it helps.
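A concrete version of that pattern, assuming the `s3dataschema.countrybusinessindex` external table created later in this thread and a hypothetical local table `public.country_sales`:

```sql
-- Hypothetical local (internal) table stored on the Redshift cluster.
CREATE TABLE IF NOT EXISTS public.country_sales (
    country VARCHAR(100),
    revenue DECIMAL(12,2)
);

-- Join local rows with S3 data read through Redshift Spectrum.
SELECT s.country,
       s.revenue,
       b.businessindex
FROM public.country_sales AS s
JOIN s3dataschema.countrybusinessindex AS b
  ON s.country = b.country
ORDER BY b.businessindex DESC;
```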

@vishwarajgupta1963 3 years ago

@AWSTutorialsOnline Amazing, thanks sir.

@dineshsanklecha3226 2 years ago

Hi, how can I handle schema changes from S3 to Redshift? I have a file in S3. I tried loading it from S3 into Redshift after deleting one column, and also after adding a new column, but neither change shows up in the existing table I have in Redshift. Thanks

@AWSTutorialsOnline 2 years ago

If the S3 data structure changes, you need to update the external table in Redshift. Otherwise, try creating the external schema from the Glue Catalog; it might pick up the changes dynamically. I have not tested the scenario you are describing.
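A minimal sketch of updating an external table whose S3 files gained a column, reusing the `s3dataschema.countrybusinessindex` table from later in this thread; the `region` column is a hypothetical addition. Dropping an external table removes only its definition, not the files in S3, so dropping and re-creating it with the new column list is a straightforward approach. For delimited text files, list the columns in the same order as the fields appear in the files.

```sql
-- Removes only the table definition; the objects in s3://dojo-data stay put.
DROP TABLE IF EXISTS s3dataschema.countrybusinessindex;

-- Re-create the table with the new column list.
CREATE EXTERNAL TABLE s3dataschema.countrybusinessindex (
    country       VARCHAR(100),
    businessindex INT,
    region        VARCHAR(50)  -- hypothetical new field at the end of each CSV row
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 's3://dojo-data';
```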

@dineshsanklecha3226 2 years ago

Thank you, I will give it a try.

@sanjeettkumar4746 2 years ago

Will it work for Parquet files?
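Redshift Spectrum can read Parquet files through `STORED AS PARQUET`. A sketch that also partitions the data by date, since the Parquet format and partitioning can both reduce the amount of data a query scans; the table, columns, and bucket paths are hypothetical.

```sql
-- Hypothetical partitioned Parquet table; files are assumed to live under
-- s3://my-bucket/sales/sale_date=YYYY-MM-DD/
CREATE EXTERNAL TABLE s3dataschema.sales_parquet (
    sale_id INT,
    country VARCHAR(100),
    amount  DECIMAL(10,2)
)
PARTITIONED BY (sale_date DATE)
STORED AS PARQUET
LOCATION 's3://my-bucket/sales/';

-- Register a partition; repeat for each date folder.
ALTER TABLE s3dataschema.sales_parquet
ADD IF NOT EXISTS PARTITION (sale_date = '2021-01-01')
LOCATION 's3://my-bucket/sales/sale_date=2021-01-01/';

-- Filtering on the partition column lets Spectrum skip other dates' files.
SELECT country, SUM(amount) AS total
FROM s3dataschema.sales_parquet
WHERE sale_date = '2021-01-01'
GROUP BY country;
```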

@ponniramaiah7747 3 years ago

Big Thanks!

@AWSTutorialsOnline 3 years ago

Glad it helped!

@ketansahu8476 3 years ago

Hey, as always your videos are fantastic. I have a question! I was trying to replicate this, but I'm getting an error while creating an external table. Did you also create a JDBC connection in the Glue Catalog?

@AWSTutorialsOnline 3 years ago

Were you creating the external table for S3 or for the Glue Catalog? The Redshift IAM role permissions work differently in each case. You don't need a JDBC connection for an external table.

@ketansahu8476 3 years ago

@AWSTutorialsOnline I was creating the external table for S3. I was on this step: "create external schema s3dataschema from data catalog database 'dev' iam_role '{Redshift-Role-ARN}' create external database if not exists;" and the error I got was "ERROR: Failed to perform AWS request, curlError=Failed to connect to glue.us-east-1.amazonaws.com port 443: connection timed out". I asked you about the Glue connection because, when I looked up this error, AWS suggested checking the connection in the Glue Catalog. I appreciate your help and response.

@AWSTutorialsOnline 3 years ago

With your syntax, you are creating an external schema for the Glue Catalog, not for S3. Most probably, you are creating the external schema against a Glue Catalog that holds the catalog of the Redshift database itself, hence the connection error. For S3-based data, use syntax like the following to create an external table: create external table s3dataschema.countrybusinessindex( country nvarchar(100), businessindex int ) row format delimited fields terminated by ',' stored as textfile location 's3://dojo-data';

@ketansahu8476 3 years ago

@AWSTutorialsOnline Sorry, I was wrong in my interpretation of S3 versus Glue. But I'm getting this error with Glue. As I understand from your tutorial, to run the code you mentioned above, I first have to create the external schema s3dataschema, and that's where I'm stuck with the error. :( But anyway, thanks for your response and help.

@AWSTutorialsOnline 3 years ago

An external schema for Glue works a little differently in terms of syntax and role permissions. Here is a link for that; hope it helps: docs.aws.amazon.com/lake-formation/latest/dg/tut-query-redshift.html