Joining Files with AWS Glue

2,968 views

Borrowed Cloud

Comments
@ajayshirke6917 7 months ago
Very clear and well-presented video.
@mohamedmubeenazath2651 1 year ago
I have one question: is it possible to combine structured and unstructured data in AWS Glue Studio?
@satishmajji481 2 years ago
Why did you add Transform0.repartition(1) to create an output file?
@joshpowell2194 2 years ago
It appears that Glue does not automatically aggregate files if the source files have nulls. I think this line of code is the workaround for that issue. It has worked that way for me so far, but I haven't tested it extensively.
@satishmajji481 2 years ago
@joshpowell2194 repartition is used to increase or decrease the number of partitions of a dataset. Here, he used it to reduce the partitions from 16 to 1. However, coalesce is generally preferred over repartition for decreasing the number of partitions because it involves less shuffling of data.
@joshpowell2194 2 years ago
Thanks @satishmajji481. Do you coalesce on each line during the mapping step (to handle nulls), or is there a DataFrame-level operation that uses coalesce on the whole set at the end?
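Editor's sketch: the thread mixes two different things that share the name "coalesce", so a small toy model may help. In PySpark, `DataFrame.repartition(n)` and `DataFrame.coalesce(n)` change the number of partitions, while the column function `pyspark.sql.functions.coalesce` returns the first non-null value and is the one that relates to null handling. The pure-Python code below is only an illustration of those ideas, not Spark's implementation; the helper names `toy_repartition`, `toy_coalesce`, and `first_non_null` and the 16-partition sample data are hypothetical.

```python
from itertools import chain


def toy_repartition(partitions, n):
    """Toy full shuffle: every row is redistributed round-robin over n new partitions."""
    rows = list(chain.from_iterable(partitions))
    new_partitions = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        new_partitions[i % n].append(row)
    return new_partitions


def toy_coalesce(partitions, n):
    """Toy merge: whole existing partitions are concatenated into n groups.

    Rows are never split out of the partition they started in, which is why
    this moves less data. Like DataFrame.coalesce, it cannot increase the
    partition count.
    """
    if n >= len(partitions):
        return [list(p) for p in partitions]
    new_partitions = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        new_partitions[i % n].extend(part)
    return new_partitions


def first_non_null(*values):
    """Column-level 'coalesce': return the first value that is not None."""
    return next((v for v in values if v is not None), None)


# Hypothetical data: 16 partitions of 2 rows each, as in the "16 to 1" comment.
partitions = [[2 * i, 2 * i + 1] for i in range(16)]

single_via_repartition = toy_repartition(partitions, 1)
single_via_coalesce = toy_coalesce(partitions, 1)

# Both approaches end with one partition holding all 32 rows,
# which corresponds to a single output file.
print(len(single_via_repartition), len(single_via_repartition[0]))
print(len(single_via_coalesce), len(single_via_coalesce[0]))

# Asking for more partitions: repartition grows, coalesce stays at 16.
print(len(toy_repartition(partitions, 32)), len(toy_coalesce(partitions, 32)))

# Null handling is a per-value operation and is unrelated to partition counts.
print(first_non_null(None, "fallback"))
```

So the answer to the last question is probably "both exist, but they do different jobs": a partition-level coalesce or repartition at the end controls how many output files are written, and a column-level coalesce in the mapping step replaces nulls. One caveat worth knowing: in Spark, coalescing to a single partition can also reduce the parallelism of the preceding stage, so repartition(1) is sometimes chosen deliberately.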