I have one question: is it possible to combine structured and unstructured data in the AWS Glue Studio visual editor?
@satishmajji481 · 2 years ago
Why did you add Transform0.repartition(1) to create a single output file?
@joshpowell2194 · 2 years ago
It appears that Glue does not automatically combine the output files if the source files contain nulls. I think this line of code is the workaround for that issue. It seems to work that way for me so far, but I haven't tested it extensively.
@satishmajji481 · 2 years ago
@joshpowell2194 repartition is used to increase or decrease the number of partitions of a DynamicFrame or DataFrame. Here, he used it to reduce the partitions from 16 to 1. However, coalesce is generally preferred over repartition for decreasing the number of partitions, because it merges existing partitions instead of doing a full shuffle of the data.
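To make the repartition vs. coalesce difference concrete, here is a pure-Python toy model, not Spark code. The names toy_coalesce and toy_repartition are hypothetical; in PySpark the real calls are DataFrame.coalesce(n) and DataFrame.repartition(n), and the Glue DynamicFrame class also documents coalesce(numPartitions) and repartition(numPartitions). The model assumes a simplified round-robin shuffle (Spark's actual round-robin start offsets differ) and only shows the structural difference: coalesce glues whole existing partitions together, while repartition sends every row individually through a shuffle.

```python
from itertools import chain
from typing import List, TypeVar

T = TypeVar("T")


def toy_coalesce(partitions: List[List[T]], n: int) -> List[List[T]]:
    """Toy model of coalesce: merge whole, contiguous input partitions into n groups.

    No row is redistributed on its own, which is why Spark can do this without a shuffle.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not partitions:
        return []
    n = min(n, len(partitions))  # coalesce (without shuffle) never increases the count
    size, extra = divmod(len(partitions), n)
    out: List[List[T]] = []
    start = 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        out.append(list(chain.from_iterable(partitions[start:end])))
        start = end
    return out


def toy_repartition(partitions: List[List[T]], n: int) -> List[List[T]]:
    """Toy model of repartition: deal every row round-robin across n partitions (a full shuffle)."""
    if n <= 0:
        raise ValueError("n must be positive")
    out: List[List[T]] = [[] for _ in range(n)]
    for i, row in enumerate(chain.from_iterable(partitions)):
        out[i % n].append(row)
    return out


parts = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(toy_coalesce(parts, 2))     # whole partitions glued together
print(toy_repartition(parts, 2))  # rows interleaved after the "shuffle"
```

For a target of 1 partition both produce the same single partition (and hence one output file); the difference is cost. The Spark documentation for coalesce notes that a drastic coalesce, such as down to 1, can cause the upstream computation to run on fewer nodes than you would like, and suggests repartition in that case because its shuffle keeps the upstream stages parallel. So repartition(1) is a reasonable choice for small jobs, and coalesce(1) is not automatically better.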
@joshpowell2194 · 2 years ago
Thanks @satishmajji481. Do you coalesce on each line during the mapping step (to handle nulls), or is there a DataFrame-level operation that applies coalesce to the whole set at the end?
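Note that Spark uses the name "coalesce" for two unrelated things: DataFrame.coalesce(n) reduces the number of partitions, while pyspark.sql.functions.coalesce(*cols) returns, for each row, the first non-null value among the given columns (like SQL COALESCE). The null-handling one is a single column expression applied to the whole DataFrame, for example df.withColumn("contact", F.coalesce("email", "phone", F.lit("unknown"))) after converting a DynamicFrame with toDF(); df.fillna(...) is another option for plain defaults. The sketch below is a pure-Python toy model of that per-row behaviour; the functions first_non_null and coalesce_column and the email/phone columns are hypothetical.

```python
from typing import Any, Dict, List, Optional


def first_non_null(*values: Any) -> Optional[Any]:
    """Return the first argument that is not None, mirroring SQL COALESCE."""
    for v in values:
        if v is not None:
            return v
    return None


def coalesce_column(
    rows: List[Dict[str, Any]], out_col: str, cols: List[str], default: Any = None
) -> List[Dict[str, Any]]:
    """Toy model of one column-level COALESCE expression applied to every row.

    Each output row is a copy with out_col set to the first non-null value among
    cols, falling back to default (like adding F.lit(default) as the last argument).
    """
    return [
        {**row, out_col: first_non_null(*(row.get(c) for c in cols), default)}
        for row in rows
    ]


rows = [
    {"email": None, "phone": "555-0100"},
    {"email": "a@example.com", "phone": None},
    {"email": None, "phone": None},
]
for r in coalesce_column(rows, "contact", ["email", "phone"], default="unknown"):
    print(r["contact"])
```

The point of modelling it as one function over all rows is that, in Spark, you write the expression once for the column rather than handling nulls line by line in the mapping step; Spark then evaluates it for every row.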