I started following your playlist 3 days ago. The way you explain things is amazing: concepts I could not understand in the last 7-8 months became clear to me in just 3 days. Thanks a lot for creating such amazing content.
@rajasdataengineering7585 1 year ago
Thanks for your comment, Shashank! Keep watching.
@azarudeena6467 2 years ago
Easy to understand
@rajasdataengineering7585 2 years ago
Thank you
@ajinkyamore8359 2 years ago
Really nice explanation. Thanks!
@rajasdataengineering7585 2 years ago
Thank you
@PavanKumar-tt8mm 2 years ago
Good one, Raja. Today I learned a new topic. Thank you.
@rajasdataengineering7585 2 years ago
Thank you, Pavan
@deevjitsaha3168 2 months ago
I tried creating a Parquet file with gzip compression, but it created multiple part files. Isn't it supposed to create one file?
@karamveersolanki138 2 years ago
Hi Raja, one doubt regarding splittable files: you said more than one core can access a splittable file. Doesn't that mean the file is spread over multiple partitions and is available for parallel processing?
@rajasdataengineering7585 2 years ago
Good question, Karamveer. Data is distributed across nodes in the form of partitions, but that happens within the cluster environment (within on-heap memory, when we talk about Spark). What we are discussing here is file storage in an external system such as DBFS, S3, ADLS, HDFS, etc. When Spark reads data from an external system and a huge file is not in a splittable format, it takes more time to distribute the data across nodes as partitions, because a non-splittable file can't be read by multiple cores at the same time. Hope it is clear. Thanks for this good question.
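To make the reply concrete, here is a minimal Python sketch (standard library only; all function names are hypothetical) of why a plain-text file is splittable and a gzip file is not. Several readers can each take one byte range of an uncompressed file: a reader that lands mid-line skips ahead to the next newline, and the previous reader finishes that line, so every line is read exactly once. This is a simplified model of how Hadoop-style line readers split text, not Spark's actual code. The same trick fails for gzip, because decompression can only begin at the start of the compressed stream.

```python
import gzip
import zlib


def split_ranges(size: int, split_size: int) -> list[tuple[int, int]]:
    """Cut [0, size) into consecutive byte ranges of at most split_size bytes."""
    return [(s, min(s + split_size, size)) for s in range(0, size, split_size)]


def read_split(data: bytes, start: int, end: int) -> list[bytes]:
    """Return the lines whose first byte lies in [start, end).

    A reader that lands mid-line skips to the next line boundary (the
    previous range owns that line) and may read past `end` to finish its
    last line, so every line is read exactly once across all ranges.
    """
    pos = start
    if start > 0:
        # Start of the first line that begins at or after `start`.
        nl = data.find(b"\n", start - 1)
        pos = len(data) if nl == -1 else nl + 1
    lines = []
    while pos < end and pos < len(data):
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl
        lines.append(data[pos:stop])
        pos = stop + 1
    return lines


def can_decompress_from(blob: bytes, offset: int) -> bool:
    """Report whether gzip decompression succeeds when started at `offset`."""
    try:
        gzip.decompress(blob[offset:])
        return True
    except (OSError, EOFError, zlib.error):  # BadGzipFile is an OSError
        return False


if __name__ == "__main__":
    text = b"".join(b"row %d\n" % i for i in range(1000))
    # Plain text: four independent readers, one byte range each, recover every row.
    ranges = split_ranges(len(text), len(text) // 4 + 1)
    rows = [line for s, e in ranges for line in read_split(text, s, e)]
    assert rows == text.splitlines()
    # gzip: a reader can only start at byte 0 of the compressed stream.
    blob = gzip.compress(text)
    print(can_decompress_from(blob, 0), can_decompress_from(blob, len(blob) // 2))
```

The first call succeeds and the second fails, because the bytes in the middle of the compressed file are not a gzip header, so no reader can pick up the work from there.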
@sohelsayyad5572 1 year ago
Thank you, sir. If a huge file is not splittable, can we convert its compression format to make it splittable? If yes, how do we do that? Also, is there any scenario where Parquet/ORC/Avro files are not splittable and need a workaround? How do we resolve it? 👍
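One way to approach the first part, sketched with Python's standard library only: decompress the gzip file and recompress it as bzip2. Spark only splits compressed text-based files (such as CSV) when the codec is one of Hadoop's splittable codecs, and bzip2 is one of them, while gzip is not. The function and file names below are hypothetical. The trade-off is that bzip2 is much slower than gzip; in practice, reading the gzipped CSV once with Spark and rewriting it as Parquet is the more common fix.

```python
import bz2
import gzip
import shutil


def gzip_to_bzip2(src_path: str, dst_path: str, chunk_size: int = 1 << 20) -> None:
    """Recompress a .gz file as .bz2, streaming so the file never sits fully in memory."""
    with gzip.open(src_path, "rb") as src, bz2.open(dst_path, "wb") as dst:
        # Decompressed bytes flow from src to dst in chunk_size pieces.
        shutil.copyfileobj(src, dst, chunk_size)
```

Usage: gzip_to_bzip2("sales.csv.gz", "sales.csv.bz2"), with hypothetical paths.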
@srinubathina7191 1 year ago
Thank you, sir
@rajasdataengineering7585 1 year ago
Most welcome
@Mehtre108 9 months ago
Hello sir, what is the sequence to watch the videos? Some of them are not in the playlist.
@kanstantsinhulevich4313 1 year ago
Hey, Raja. I know that a Parquet file with the gzip codec is splittable. Of course, if we compress a CSV file with the gzip codec, it won't be splittable. It would be nice if you could add some clarification.
@rajasdataengineering7585 1 year ago
Hi Kanstantsin, yes, you are right. A Parquet file with gzip is splittable by default, while a gzipped CSV is non-splittable by default. However, there are some workarounds to split gzipped CSV files, such as reading it through the TextInputFormat API or pre-splitting the gzipped file into multiple pieces.
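Why the difference: Parquet compresses data page by page inside each row group's column chunks, and the file footer records where each row group starts, so different row groups can go to different tasks. A gzipped CSV is one continuous compressed stream that must be read from byte 0. A minimal sketch of the pre-splitting workaround, using only Python's standard library: it re-chunks one gzipped CSV into several independently gzipped files that each repeat the header. The function name, the rows_per_piece parameter and the piece-NNNNN.csv.gz naming are hypothetical, and because it splits on newlines it assumes no quoted field contains a line break.

```python
import gzip
import os


def presplit_gzip_csv(src_path: str, out_dir: str, rows_per_piece: int) -> list[str]:
    """Re-chunk one gzipped CSV into independently gzipped pieces.

    Every piece repeats the header line, so each file can be parsed on its
    own and read by a separate task. Splits on newlines, so it assumes no
    quoted field contains a line break.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    out = None
    try:
        with gzip.open(src_path, "rb") as src:
            header = src.readline()
            for i, line in enumerate(src):
                if i % rows_per_piece == 0:
                    # Close the previous piece and start a new gzip stream.
                    if out is not None:
                        out.close()
                    path = os.path.join(out_dir, f"piece-{len(paths):05d}.csv.gz")
                    out = gzip.open(path, "wb")
                    out.write(header)
                    paths.append(path)
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return paths
```

Each piece is still a non-splittable gzip file, but pointing Spark at the directory (for example spark.read.option("header", True).csv(out_dir)) lets it read the pieces in parallel, one task per file.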
@abhaybisht101 2 years ago
Nice content, Raja 🤟
@rajasdataengineering7585 2 years ago
Thanks, Abhay
@karthickrajachandrasekar8486 2 years ago
Hi Raja, thanks for the amazing explanation. I have one doubt: is there any way to keep the same file name after compressing into .gz?
@rajasdataengineering7585 2 years ago
Yes, the same name will be shown after compression.
@karthickrajachandrasekar8486 2 years ago
@rajasdataengineering7585 It shows up as something like part004. How do I get the same name as the given CSV?
@rajasdataengineering7585 2 years ago
If you want a specific name, the DataFrame can be converted to pandas and written with that name.
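The pandas route in the reply works when the data fits in driver memory (for example df.toPandas().to_csv("sales.csv.gz", compression="gzip", index=False), with a hypothetical file name). For larger data, a common alternative is to write a single partition with coalesce(1) and then rename the one part file Spark produces. A minimal sketch of the rename step with Python's standard library, assuming the output directory is on a local or mounted filesystem; on DBFS or S3 you would use that system's own move command (on Databricks, dbutils.fs.mv). The helper name promote_single_part is hypothetical.

```python
import glob
import os
import shutil


def promote_single_part(out_dir: str, target_path: str, cleanup: bool = True) -> str:
    """Move the single part file Spark wrote into out_dir to target_path.

    Assumes the DataFrame was written as one partition (e.g. after
    coalesce(1)), so exactly one part-* file exists; raises otherwise.
    target_path must lie outside out_dir when cleanup is True.
    """
    # Spark's checksum files start with "." and are not matched by this pattern.
    parts = glob.glob(os.path.join(out_dir, "part-*"))
    if len(parts) != 1:
        raise ValueError(f"expected one part file in {out_dir}, found {len(parts)}")
    shutil.move(parts[0], target_path)
    if cleanup:
        # Drop the now-redundant directory with its _SUCCESS marker and .crc files.
        shutil.rmtree(out_dir)
    return target_path
```

Checking for exactly one part file is deliberate: if the write was not coalesced, silently picking one of several files would lose data.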
@sravankumar1767 2 years ago
Nice explanation, Raj 👌 👍
@rajasdataengineering7585 2 years ago
Thanks, Sravan
@abhinavsingh1173 1 year ago
Your course is the best. But the problem with your course is that you are not attaching the GitHub link for your sample data and code. As your audience, I request you to please do this. Thanks.